linux-kernel.vger.kernel.org archive mirror
* [RFC] [PATCH v3 0/2] cgroups: implement moving a threadgroup's threads atomically with cgroup.procs
@ 2010-06-25  5:45 Ben Blum
  2010-06-25  5:46 ` [RFC] [PATCH v3 1/2] cgroups: read-write lock CLONE_THREAD forking per threadgroup Ben Blum
  2010-06-25  5:47 ` [RFC] [PATCH v3 2/2] cgroups: make procs file writable Ben Blum
  0 siblings, 2 replies; 8+ messages in thread
From: Ben Blum @ 2010-06-25  5:45 UTC (permalink / raw)
  To: linux-kernel, containers
  Cc: akpm, bblum, ebiederm, lizf, matthltc, menage, oleg

This patch series is a revision of http://lkml.org/lkml/2010/5/29/126 .

These patches use an rwsem in signal_struct; accessing it safely depends
on Oleg's recent changes to signal_struct's lifetime rules.

Writing the tid of any task in the threadgroup is allowed. This is
implemented by taking task->group_leader right after find_task_by_vpid(),
while still on the RCU read side. It then becomes necessary to check
thread_group_leader(leader) every time we want to iterate over
->thread_group, in case we raced with de_thread(); each of these checks
can fail with -EAGAIN. Unfortunately this also means the checks have to
go into can_attach() for each subsystem that needs to look at every
thread in the group.
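
For reference, the leader lookup is roughly (condensed from patch 2's
attach_task_by_pid()):

  rcu_read_lock();
  tsk = find_task_by_vpid(pid);
  ...
  /* safe: tsk was found in the tid map, so de_thread() can't have
   * unhashed it while changing leadership */
  tsk = tsk->group_leader;
  get_task_struct(tsk);
  rcu_read_unlock();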

I handle -EAGAIN by retrying inside the file's write handler - since this
is already a very expensive operation, making it unbounded-time as well
costs little. The retry is optional; returning -EAGAIN to userspace would
work just as well.

-- bblum

---
 Documentation/cgroups/cgroups.txt |   13 -
 include/linux/cgroup.h            |   15 -
 include/linux/init_task.h         |    9
 include/linux/sched.h             |   10
 kernel/cgroup.c                   |  449 +++++++++++++++++++++++++++++++++-----
 kernel/cgroup_freezer.c           |    4
 kernel/cpuset.c                   |    4
 kernel/fork.c                     |   10
 kernel/ns_cgroup.c                |    4
 kernel/sched.c                    |    4
 10 files changed, 462 insertions(+), 60 deletions(-)


* [RFC] [PATCH v3 1/2] cgroups: read-write lock CLONE_THREAD forking per threadgroup
  2010-06-25  5:45 [RFC] [PATCH v3 0/2] cgroups: implement moving a threadgroup's threads atomically with cgroup.procs Ben Blum
@ 2010-06-25  5:46 ` Ben Blum
  2010-06-28  7:10   ` KAMEZAWA Hiroyuki
  2010-06-25  5:47 ` [RFC] [PATCH v3 2/2] cgroups: make procs file writable Ben Blum
  1 sibling, 1 reply; 8+ messages in thread
From: Ben Blum @ 2010-06-25  5:46 UTC (permalink / raw)
  To: Ben Blum
  Cc: linux-kernel, containers, akpm, ebiederm, lizf, matthltc, menage, oleg

[-- Attachment #1: cgroup-threadgroup-fork-lock.patch --]
[-- Type: text/plain, Size: 7559 bytes --]

Adds functionality to read/write lock CLONE_THREAD fork()ing per-threadgroup

From: Ben Blum <bblum@andrew.cmu.edu>

This patch adds an rwsem that lives in a threadgroup's signal_struct that's
taken for reading in the fork path, under CONFIG_CGROUPS. If another part of
the kernel later wants to use such a locking mechanism, the CONFIG_CGROUPS
ifdefs should be changed to a higher-up flag that CGROUPS and the other system
would both depend on.
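
In short, the lock pairs up like this (a condensed sketch; the read side
is this patch, the write side is patch 2):

  /* fork path, reader (cgroup_fork .. cgroup_post_fork): */
  if (clone_flags & CLONE_THREAD)
      down_read(&current->signal->threadgroup_fork_lock);
  /* ... the rest of copy_process() sets up the new thread ... */
  if (clone_flags & CLONE_THREAD)
      up_read(&current->signal->threadgroup_fork_lock);

  /* cgroup.procs write path, writer: */
  down_write(&leader->signal->threadgroup_fork_lock);
  /* ... migrate every thread in ->thread_group ... */
  up_write(&leader->signal->threadgroup_fork_lock);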

This is a pre-patch for cgroups-procs-write.patch.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
---
 include/linux/cgroup.h    |   15 ++++++++++-----
 include/linux/init_task.h |    9 +++++++++
 include/linux/sched.h     |   10 ++++++++++
 kernel/cgroup.c           |   23 +++++++++++++++++++++--
 kernel/fork.c             |   10 +++++++---
 5 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 8f78073..196a703 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -31,10 +31,12 @@ extern void cgroup_lock(void);
 extern int cgroup_lock_is_held(void);
 extern bool cgroup_lock_live_group(struct cgroup *cgrp);
 extern void cgroup_unlock(void);
-extern void cgroup_fork(struct task_struct *p);
+extern void cgroup_fork(struct task_struct *p, unsigned long clone_flags);
 extern void cgroup_fork_callbacks(struct task_struct *p);
-extern void cgroup_post_fork(struct task_struct *p);
+extern void cgroup_post_fork(struct task_struct *p, unsigned long clone_flags);
 extern void cgroup_exit(struct task_struct *p, int run_callbacks);
+extern void cgroup_fork_failed(struct task_struct *p, int run_callbacks,
+			       unsigned long clone_flags);
 extern int cgroupstats_build(struct cgroupstats *stats,
 				struct dentry *dentry);
 extern int cgroup_load_subsys(struct cgroup_subsys *ss);
@@ -613,11 +615,14 @@ unsigned short css_depth(struct cgroup_subsys_state *css);
 
 static inline int cgroup_init_early(void) { return 0; }
 static inline int cgroup_init(void) { return 0; }
-static inline void cgroup_fork(struct task_struct *p) {}
+static inline void cgroup_fork(struct task_struct *p,
+			       unsigned long clone_flags) {}
 static inline void cgroup_fork_callbacks(struct task_struct *p) {}
-static inline void cgroup_post_fork(struct task_struct *p) {}
+static inline void cgroup_post_fork(struct task_struct *p,
+				    unsigned long clone_flags) {}
 static inline void cgroup_exit(struct task_struct *p, int callbacks) {}
-
+static inline void cgroup_fork_failed(struct task_struct *p, int callbacks,
+				      unsigned long clone_flags) {}
 static inline void cgroup_lock(void) {}
 static inline void cgroup_unlock(void) {}
 static inline int cgroupstats_build(struct cgroupstats *stats,
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index b1ed1cd..cfb2bc8 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -15,6 +15,14 @@
 extern struct files_struct init_files;
 extern struct fs_struct init_fs;
 
+#ifdef CONFIG_CGROUPS
+#define INIT_THREADGROUP_FORK_LOCK(sig)					\
+	.threadgroup_fork_lock =					\
+		__RWSEM_INITIALIZER(sig.threadgroup_fork_lock),
+#else
+#define INIT_THREADGROUP_FORK_LOCK(sig)
+#endif
+
 #define INIT_SIGNALS(sig) {						\
 	.count		= ATOMIC_INIT(1), 				\
 	.wait_chldexit	= __WAIT_QUEUE_HEAD_INITIALIZER(sig.wait_chldexit),\
@@ -29,6 +37,7 @@ extern struct fs_struct init_fs;
 		.running = 0,						\
 		.lock = __SPIN_LOCK_UNLOCKED(sig.cputimer.lock),	\
 	},								\
+	INIT_THREADGROUP_FORK_LOCK(sig)					\
 }
 
 extern struct nsproxy init_nsproxy;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2b7b81d..2bbcbd2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -627,6 +627,16 @@ struct signal_struct {
 	unsigned audit_tty;
 	struct tty_audit_buf *tty_audit_buf;
 #endif
+#ifdef CONFIG_CGROUPS
+	/*
+	 * The threadgroup_fork_lock prevents threads from forking with
+	 * CLONE_THREAD while held for writing. Use this for fork-sensitive
+	 * threadgroup-wide operations.
+	 * Currently only needed by cgroups, and the fork-side readlock happens
+	 * in cgroup_{fork,post_fork,fork_failed}.
+	 */
+	struct rw_semaphore threadgroup_fork_lock;
+#endif
 
 	int oom_adj;	/* OOM kill score adjustment (bit shift) */
 };
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6d870f2..6c8e46f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4015,8 +4015,10 @@ static const struct file_operations proc_cgroupstats_operations = {
  * At the point that cgroup_fork() is called, 'current' is the parent
  * task, and the passed argument 'child' points to the child task.
  */
-void cgroup_fork(struct task_struct *child)
+void cgroup_fork(struct task_struct *child, unsigned long clone_flags)
 {
+	if (clone_flags & CLONE_THREAD)
+		down_read(&current->signal->threadgroup_fork_lock);
 	task_lock(current);
 	child->cgroups = current->cgroups;
 	get_css_set(child->cgroups);
@@ -4058,7 +4060,7 @@ void cgroup_fork_callbacks(struct task_struct *child)
  * with the first call to cgroup_iter_start() - to guarantee that the
  * new task ends up on its list.
  */
-void cgroup_post_fork(struct task_struct *child)
+void cgroup_post_fork(struct task_struct *child, unsigned long clone_flags)
 {
 	if (use_task_css_set_links) {
 		write_lock(&css_set_lock);
@@ -4068,6 +4070,8 @@ void cgroup_post_fork(struct task_struct *child)
 		task_unlock(child);
 		write_unlock(&css_set_lock);
 	}
+	if (clone_flags & CLONE_THREAD)
+		up_read(&current->signal->threadgroup_fork_lock);
 }
 /**
  * cgroup_exit - detach cgroup from exiting task
@@ -4143,6 +4147,21 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 }
 
 /**
+ * cgroup_fork_failed - undo operations for fork failure
+ * @tsk: pointer to task_struct of the child whose fork failed
+ * @run_callbacks: whether to run exit callbacks
+ *
+ * Wrapper for cgroup_exit() that also drops the threadgroup fork lock.
+ */
+void cgroup_fork_failed(struct task_struct *tsk, int run_callbacks,
+			unsigned long clone_flags)
+{
+	if (clone_flags & CLONE_THREAD)
+		up_read(&current->signal->threadgroup_fork_lock);
+	cgroup_exit(tsk, run_callbacks);
+}
+
+/**
  * cgroup_clone - clone the cgroup the given subsystem is attached to
  * @tsk: the task to be moved
  * @subsys: the given subsystem
diff --git a/kernel/fork.c b/kernel/fork.c
index 4c14942..e2b04ac 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -884,6 +884,10 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 
 	tty_audit_fork(sig);
 
+#ifdef CONFIG_CGROUPS
+	init_rwsem(&sig->threadgroup_fork_lock);
+#endif
+
 	sig->oom_adj = current->signal->oom_adj;
 
 	return 0;
@@ -1069,7 +1073,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	monotonic_to_bootbased(&p->real_start_time);
 	p->io_context = NULL;
 	p->audit_context = NULL;
-	cgroup_fork(p);
+	cgroup_fork(p, clone_flags);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_dup(p->mempolicy);
  	if (IS_ERR(p->mempolicy)) {
@@ -1277,7 +1281,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
 	proc_fork_connector(p);
-	cgroup_post_fork(p);
+	cgroup_post_fork(p, clone_flags);
 	perf_event_fork(p);
 	return p;
 
@@ -1311,7 +1315,7 @@ bad_fork_cleanup_policy:
 	mpol_put(p->mempolicy);
 bad_fork_cleanup_cgroup:
 #endif
-	cgroup_exit(p, cgroup_callbacks_done);
+	cgroup_fork_failed(p, cgroup_callbacks_done, clone_flags);
 	delayacct_tsk_free(p);
 	module_put(task_thread_info(p)->exec_domain->module);
 bad_fork_cleanup_count:


* [RFC] [PATCH v3 2/2] cgroups: make procs file writable
  2010-06-25  5:45 [RFC] [PATCH v3 0/2] cgroups: implement moving a threadgroup's threads atomically with cgroup.procs Ben Blum
  2010-06-25  5:46 ` [RFC] [PATCH v3 1/2] cgroups: read-write lock CLONE_THREAD forking per threadgroup Ben Blum
@ 2010-06-25  5:47 ` Ben Blum
  1 sibling, 0 replies; 8+ messages in thread
From: Ben Blum @ 2010-06-25  5:47 UTC (permalink / raw)
  To: Ben Blum
  Cc: linux-kernel, containers, akpm, ebiederm, lizf, matthltc, menage, oleg

[-- Attachment #1: cgroup-procs-writable.patch --]
[-- Type: text/plain, Size: 19712 bytes --]

Makes procs file writable to move all threads by tgid at once

From: Ben Blum <bblum@andrew.cmu.edu>

This patch adds functionality that enables users to move all threads in a
threadgroup at once to a cgroup by writing the tgid to the 'cgroup.procs'
file. The current implementation uses a per-threadgroup rwsem that is
taken for reading in the fork() path to prevent newly forking threads within
the threadgroup from "escaping" while the move is in progress.
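
In outline, the write path does (condensed from cgroup_attach_proc()
below):

  /* step 1: prefetch a css_set for each thread; may sleep, may fail */
  ...
  /* step 2: no failures past here; block CLONE_THREAD fork and commit */
  down_write(&leader->signal->threadgroup_fork_lock);
  read_lock(&tasklist_lock);
  list_for_each_entry_rcu(tsk, &leader->thread_group, thread_group)
      cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
  /* step 3: per-subsystem attach() callbacks, then unlock */
  read_unlock(&tasklist_lock);
  up_write(&leader->signal->threadgroup_fork_lock);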

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
---
 Documentation/cgroups/cgroups.txt |   13 +
 kernel/cgroup.c                   |  426 +++++++++++++++++++++++++++++++++----
 kernel/cgroup_freezer.c           |    4 
 kernel/cpuset.c                   |    4 
 kernel/ns_cgroup.c                |    4 
 kernel/sched.c                    |    4 
 6 files changed, 405 insertions(+), 50 deletions(-)

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index a1ca592..1d9c81e 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -235,7 +235,8 @@ containing the following files describing that cgroup:
  - cgroup.procs: list of tgids in the cgroup.  This list is not
    guaranteed to be sorted or free of duplicate tgids, and userspace
    should sort/uniquify the list if this property is required.
-   This is a read-only file, for now.
+   Writing a thread group id into this file moves all threads in that
+   group into this cgroup.
  - notify_on_release flag: run the release agent on exit?
  - release_agent: the path to use for release notifications (this file
    exists in the top cgroup only)
@@ -416,6 +417,12 @@ You can attach the current shell task by echoing 0:
 
 # echo 0 > tasks
 
+You can use the cgroup.procs file instead of the tasks file to move all
+threads in a threadgroup at once. Echoing the pid of any task in a
+threadgroup to cgroup.procs causes all tasks in that threadgroup to be
+attached to the cgroup. Writing 0 to cgroup.procs moves all tasks
+in the writing task's threadgroup.
+
 2.3 Mounting hierarchies by name
 --------------------------------
 
@@ -564,7 +571,9 @@ called on a fork. If this method returns 0 (success) then this should
 remain valid while the caller holds cgroup_mutex and it is ensured that either
 attach() or cancel_attach() will be called in future. If threadgroup is
 true, then a successful result indicates that all threads in the given
-thread's threadgroup can be moved together.
+thread's threadgroup can be moved together. If the subsystem wants to
+iterate over task->thread_group, it must take rcu_read_lock then check
+if thread_group_leader(task), returning -EAGAIN if that fails.
 
 void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 	       struct task_struct *task, bool threadgroup)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6c8e46f..526f44b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1686,6 +1686,76 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
 }
 EXPORT_SYMBOL_GPL(cgroup_path);
 
+/*
+ * cgroup_task_migrate - move a task from one cgroup to another.
+ *
+ * 'guarantee' is set if the caller promises that a new css_set for the task
+ * will already exist. If not set, this function might sleep, and can fail with
+ * -ENOMEM. Otherwise, it can only fail with -ESRCH.
+ */
+static int cgroup_task_migrate(struct cgroup *cgrp, struct cgroup *oldcgrp,
+			       struct task_struct *tsk, bool guarantee)
+{
+	struct css_set *oldcg;
+	struct css_set *newcg;
+
+	/*
+	 * get old css_set. we need to take task_lock and refcount it, because
+	 * an exiting task can change its css_set to init_css_set and drop its
+	 * old one without taking cgroup_mutex.
+	 */
+	task_lock(tsk);
+	oldcg = tsk->cgroups;
+	get_css_set(oldcg);
+	task_unlock(tsk);
+
+	/* locate or allocate a new css_set for this task. */
+	if (guarantee) {
+		/* we know the css_set we want already exists. */
+		struct cgroup_subsys_state *template[CGROUP_SUBSYS_COUNT];
+		read_lock(&css_set_lock);
+		newcg = find_existing_css_set(oldcg, cgrp, template);
+		BUG_ON(!newcg);
+		get_css_set(newcg);
+		read_unlock(&css_set_lock);
+	} else {
+		might_sleep();
+		/* find_css_set will give us newcg already referenced. */
+		newcg = find_css_set(oldcg, cgrp);
+		if (!newcg) {
+			put_css_set(oldcg);
+			return -ENOMEM;
+		}
+	}
+	put_css_set(oldcg);
+
+	/* if PF_EXITING is set, the tsk->cgroups pointer is no longer safe. */
+	task_lock(tsk);
+	if (tsk->flags & PF_EXITING) {
+		task_unlock(tsk);
+		put_css_set(newcg);
+		return -ESRCH;
+	}
+	rcu_assign_pointer(tsk->cgroups, newcg);
+	task_unlock(tsk);
+
+	/* Update the css_set linked lists if we're using them */
+	write_lock(&css_set_lock);
+	if (!list_empty(&tsk->cg_list))
+		list_move(&tsk->cg_list, &newcg->tasks);
+	write_unlock(&css_set_lock);
+
+	/*
+	 * We just gained a reference on oldcg by taking it from the task. As
+	 * trading it for newcg is protected by cgroup_mutex, we're safe to drop
+	 * it here; it will be freed under RCU.
+	 */
+	put_css_set(oldcg);
+
+	set_bit(CGRP_RELEASABLE, &oldcgrp->flags);
+	return 0;
+}
+
 /**
  * cgroup_attach_task - attach task 'tsk' to cgroup 'cgrp'
  * @cgrp: the cgroup the task is attaching to
@@ -1696,11 +1766,9 @@ EXPORT_SYMBOL_GPL(cgroup_path);
  */
 int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 {
-	int retval = 0;
+	int retval;
 	struct cgroup_subsys *ss, *failed_ss = NULL;
 	struct cgroup *oldcgrp;
-	struct css_set *cg;
-	struct css_set *newcg;
 	struct cgroupfs_root *root = cgrp->root;
 
 	/* Nothing to do if the task is already in that cgroup */
@@ -1724,46 +1792,16 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 		}
 	}
 
-	task_lock(tsk);
-	cg = tsk->cgroups;
-	get_css_set(cg);
-	task_unlock(tsk);
-	/*
-	 * Locate or allocate a new css_set for this task,
-	 * based on its final set of cgroups
-	 */
-	newcg = find_css_set(cg, cgrp);
-	put_css_set(cg);
-	if (!newcg) {
-		retval = -ENOMEM;
-		goto out;
-	}
-
-	task_lock(tsk);
-	if (tsk->flags & PF_EXITING) {
-		task_unlock(tsk);
-		put_css_set(newcg);
-		retval = -ESRCH;
+	retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, false);
+	if (retval)
 		goto out;
-	}
-	rcu_assign_pointer(tsk->cgroups, newcg);
-	task_unlock(tsk);
-
-	/* Update the css_set linked lists if we're using them */
-	write_lock(&css_set_lock);
-	if (!list_empty(&tsk->cg_list)) {
-		list_del(&tsk->cg_list);
-		list_add(&tsk->cg_list, &newcg->tasks);
-	}
-	write_unlock(&css_set_lock);
 
 	for_each_subsys(root, ss) {
 		if (ss->attach)
 			ss->attach(ss, cgrp, oldcgrp, tsk, false);
 	}
-	set_bit(CGRP_RELEASABLE, &oldcgrp->flags);
+
 	synchronize_rcu();
-	put_css_set(cg);
 
 	/*
 	 * wake up rmdir() waiter. the rmdir should fail since the cgroup
@@ -1789,49 +1827,341 @@ out:
 }
 
 /*
- * Attach task with pid 'pid' to cgroup 'cgrp'. Call with cgroup_mutex
- * held. May take task_lock of task
+ * cgroup_attach_proc works in two stages, the first of which prefetches all
+ * new css_sets needed (to make sure we have enough memory before committing
+ * to the move) and stores them in a list of entries of the following type.
+ * TODO: possible optimization: use css_set->rcu_head for chaining instead
+ */
+struct cg_list_entry {
+	struct css_set *cg;
+	struct list_head links;
+};
+
+static bool css_set_check_fetched(struct cgroup *cgrp,
+				  struct task_struct *tsk, struct css_set *cg,
+				  struct list_head *newcg_list)
+{
+	struct css_set *newcg;
+	struct cg_list_entry *cg_entry;
+	struct cgroup_subsys_state *template[CGROUP_SUBSYS_COUNT];
+
+	read_lock(&css_set_lock);
+	newcg = find_existing_css_set(cg, cgrp, template);
+	if (newcg)
+		get_css_set(newcg);
+	read_unlock(&css_set_lock);
+
+	/* doesn't exist at all? */
+	if (!newcg)
+		return false;
+	/* see if it's already in the list */
+	list_for_each_entry(cg_entry, newcg_list, links) {
+		if (cg_entry->cg == newcg) {
+			put_css_set(newcg);
+			return true;
+		}
+	}
+
+	/* not found */
+	put_css_set(newcg);
+	return false;
+}
+
+/*
+ * Find the new css_set and store it in the list in preparation for moving the
+ * given task to the given cgroup. Returns 0 or -ENOMEM.
  */
-static int attach_task_by_pid(struct cgroup *cgrp, u64 pid)
+static int css_set_prefetch(struct cgroup *cgrp, struct css_set *cg,
+			    struct list_head *newcg_list)
+{
+	struct css_set *newcg;
+	struct cg_list_entry *cg_entry;
+
+	/* ensure a new css_set will exist for this thread */
+	newcg = find_css_set(cg, cgrp);
+	if (!newcg)
+		return -ENOMEM;
+	/* add it to the list */
+	cg_entry = kmalloc(sizeof(struct cg_list_entry), GFP_KERNEL);
+	if (!cg_entry) {
+		put_css_set(newcg);
+		return -ENOMEM;
+	}
+	cg_entry->cg = newcg;
+	list_add(&cg_entry->links, newcg_list);
+	return 0;
+}
+
+/**
+ * cgroup_attach_proc - attach all threads in a threadgroup to a cgroup
+ * @cgrp: the cgroup to attach to
+ * @leader: the threadgroup leader task_struct of the group to be attached
+ *
+ * Call holding cgroup_mutex. Will take task_lock of each thread in leader's
+ * threadgroup individually in turn.
+ */
+int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
+{
+	int retval;
+	struct cgroup_subsys *ss, *failed_ss = NULL;
+	struct cgroup *oldcgrp;
+	struct css_set *oldcg;
+	struct cgroupfs_root *root = cgrp->root;
+	/* threadgroup list cursor */
+	struct task_struct *tsk;
+	/*
+	 * we need to make sure we have css_sets for all the tasks we're
+	 * going to move -before- we actually start moving them, so that in
+	 * case we get an ENOMEM we can bail out before making any changes.
+	 */
+	struct list_head newcg_list;
+	struct cg_list_entry *cg_entry, *temp_nobe;
+
+	/* check that we can legitimately attach to the cgroup. */
+	for_each_subsys(root, ss) {
+		if (ss->can_attach) {
+			retval = ss->can_attach(ss, cgrp, leader, true);
+			if (retval) {
+				failed_ss = ss;
+				goto out;
+			}
+		}
+	}
+
+	/*
+	 * step 1: make sure css_sets exist for all threads to be migrated.
+	 * we use find_css_set, which allocates a new one if necessary.
+	 */
+	INIT_LIST_HEAD(&newcg_list);
+	oldcgrp = task_cgroup_from_root(leader, root);
+	if (cgrp != oldcgrp) {
+		/* get old css_set */
+		task_lock(leader);
+		if (leader->flags & PF_EXITING) {
+			task_unlock(leader);
+			goto prefetch_loop;
+		}
+		oldcg = leader->cgroups;
+		get_css_set(oldcg);
+		task_unlock(leader);
+		/* acquire new one */
+		retval = css_set_prefetch(cgrp, oldcg, &newcg_list);
+		put_css_set(oldcg);
+		if (retval)
+			goto list_teardown;
+	}
+prefetch_loop:
+	rcu_read_lock();
+	/* sanity check - if we raced with de_thread, we must abort */
+	if (!thread_group_leader(leader)) {
+		retval = -EAGAIN;
+		rcu_read_unlock();
+		goto list_teardown;
+	}
+	/*
+	 * if we need to fetch a new css_set for this task, we must exit the
+	 * rcu_read section because allocating it can sleep. afterwards, we'll
+	 * need to restart iteration on the threadgroup list - the whole thing
+	 * will be O(nm) in the number of threads and css_sets; as the typical
+	 * case has only one css_set for all of them, usually O(n). which ones
+	 * we need allocated won't change as long as we hold cgroup_mutex.
+	 */
+	list_for_each_entry_rcu(tsk, &leader->thread_group, thread_group) {
+		/* nothing to do if this task is already in the cgroup */
+		oldcgrp = task_cgroup_from_root(tsk, root);
+		if (cgrp == oldcgrp)
+			continue;
+		/* get old css_set pointer */
+		task_lock(tsk);
+		if (tsk->flags & PF_EXITING) {
+			/* ignore this task if it's going away */
+			task_unlock(tsk);
+			continue;
+		}
+		oldcg = tsk->cgroups;
+		get_css_set(oldcg);
+		task_unlock(tsk);
+		/* see if the new css_set for us is already in the list */
+		if (css_set_check_fetched(cgrp, tsk, oldcg, &newcg_list)) {
+			/* was already there, nothing to do. */
+			put_css_set(oldcg);
+		} else {
+			/* we don't already have it. get new one. */
+			rcu_read_unlock();
+			retval = css_set_prefetch(cgrp, oldcg, &newcg_list);
+			put_css_set(oldcg);
+			if (retval)
+				goto list_teardown;
+			/* begin iteration again. */
+			goto prefetch_loop;
+		}
+	}
+	rcu_read_unlock();
+
+	/*
+	 * step 2: now that we're guaranteed success wrt the css_sets, proceed
+	 * to move all tasks to the new cgroup. we need to lock against possible
+	 * races with fork(). note: we can safely access leader->signal because
+	 * attach_task_by_pid takes a reference on leader, which guarantees that
+	 * the signal_struct will stick around. threadgroup_fork_lock must be
+	 * taken outside of tasklist_lock to match the order in the fork path.
+	 */
+	BUG_ON(!leader->signal);
+	down_write(&leader->signal->threadgroup_fork_lock);
+	read_lock(&tasklist_lock);
+	/* sanity check - if we raced with de_thread, we must abort */
+	if (!thread_group_leader(leader)) {
+		retval = -EAGAIN;
+		read_unlock(&tasklist_lock);
+		up_write(&leader->signal->threadgroup_fork_lock);
+		goto list_teardown;
+	}
+	/*
+	 * No failure cases left, so this is the commit point.
+	 *
+	 * If the leader is already there, skip moving him. Note: even if the
+	 * leader is PF_EXITING, we still move all other threads; if everybody
+	 * is PF_EXITING, we end up doing nothing, which is ok.
+	 */
+	oldcgrp = task_cgroup_from_root(leader, root);
+	if (cgrp != oldcgrp) {
+		retval = cgroup_task_migrate(cgrp, oldcgrp, leader, true);
+		BUG_ON(retval != 0 && retval != -ESRCH);
+	}
+	/* Now iterate over each thread in the group. */
+	list_for_each_entry_rcu(tsk, &leader->thread_group, thread_group) {
+		BUG_ON(tsk->signal != leader->signal);
+		/* leave current thread as it is if it's already there */
+		oldcgrp = task_cgroup_from_root(tsk, root);
+		if (cgrp == oldcgrp)
+			continue;
+		/* we don't care whether these threads are exiting */
+		retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
+		BUG_ON(retval != 0 && retval != -ESRCH);
+	}
+
+	/*
+	 * step 3: attach whole threadgroup to each subsystem
+	 * TODO: if ever a subsystem needs to know the oldcgrp for each task
+	 * being moved, this call will need to be reworked to communicate that.
+	 */
+	for_each_subsys(root, ss) {
+		if (ss->attach)
+			ss->attach(ss, cgrp, oldcgrp, leader, true);
+	}
+	/* holding these until here keeps us safe from exec() and fork(). */
+	read_unlock(&tasklist_lock);
+	up_write(&leader->signal->threadgroup_fork_lock);
+
+	/*
+	 * step 4: success! and cleanup
+	 */
+	synchronize_rcu();
+	cgroup_wakeup_rmdir_waiter(cgrp);
+	retval = 0;
+list_teardown:
+	/* clean up the list of prefetched css_sets. */
+	list_for_each_entry_safe(cg_entry, temp_nobe, &newcg_list, links) {
+		list_del(&cg_entry->links);
+		put_css_set(cg_entry->cg);
+		kfree(cg_entry);
+	}
+out:
+	if (retval) {
+		/* same deal as in cgroup_attach_task, with threadgroup=true */
+		for_each_subsys(root, ss) {
+			if (ss == failed_ss)
+				break;
+			if (ss->cancel_attach)
+				ss->cancel_attach(ss, cgrp, leader, true);
+		}
+	}
+	return retval;
+}
+
+/*
+ * Find the task_struct of the task to attach by vpid and pass it along to the
+ * function to attach either it or all tasks in its threadgroup. Will take
+ * cgroup_mutex; may take task_lock of task.
+ */
+static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
 {
 	struct task_struct *tsk;
 	const struct cred *cred = current_cred(), *tcred;
 	int ret;
 
+	if (!cgroup_lock_live_group(cgrp))
+		return -ENODEV;
+
 	if (pid) {
 		rcu_read_lock();
 		tsk = find_task_by_vpid(pid);
-		if (!tsk || tsk->flags & PF_EXITING) {
+		if (!tsk) {
+			rcu_read_unlock();
+			cgroup_unlock();
+			return -ESRCH;
+		}
+		if (threadgroup) {
+			/*
+			 * it is safe to find group_leader because tsk was found
+			 * in the tid map, meaning it can't have been unhashed
+			 * by someone in de_thread changing the leadership.
+			 */
+			tsk = tsk->group_leader;
+			BUG_ON(!thread_group_leader(tsk));
+		} else if (tsk->flags & PF_EXITING) {
+			/* optimization for the single-task-only case */
 			rcu_read_unlock();
+			cgroup_unlock();
 			return -ESRCH;
 		}
 
+		/*
+		 * even if we're attaching all tasks in the thread group, we
+		 * only need to check permissions on one of them.
+		 */
 		tcred = __task_cred(tsk);
 		if (cred->euid &&
 		    cred->euid != tcred->uid &&
 		    cred->euid != tcred->suid) {
 			rcu_read_unlock();
+			cgroup_unlock();
 			return -EACCES;
 		}
 		get_task_struct(tsk);
 		rcu_read_unlock();
 	} else {
-		tsk = current;
+		if (threadgroup)
+			tsk = current->group_leader;
+		else
+			tsk = current;
 		get_task_struct(tsk);
 	}
 
-	ret = cgroup_attach_task(cgrp, tsk);
+	if (threadgroup)
+		ret = cgroup_attach_proc(cgrp, tsk);
+	else
+		ret = cgroup_attach_task(cgrp, tsk);
 	put_task_struct(tsk);
+	cgroup_unlock();
 	return ret;
 }
 
 static int cgroup_tasks_write(struct cgroup *cgrp, struct cftype *cft, u64 pid)
 {
+	return attach_task_by_pid(cgrp, pid, false);
+}
+
+static int cgroup_procs_write(struct cgroup *cgrp, struct cftype *cft, u64 tgid)
+{
 	int ret;
-	if (!cgroup_lock_live_group(cgrp))
-		return -ENODEV;
-	ret = attach_task_by_pid(cgrp, pid);
-	cgroup_unlock();
+	do {
+		/*
+		 * attach_proc fails with -EAGAIN if threadgroup leadership
+		 * changes in the middle of the operation, in which case we need
+		 * to find the task_struct for the new leader and start over.
+		 */
+		ret = attach_task_by_pid(cgrp, tgid, true);
+	} while (ret == -EAGAIN);
 	return ret;
 }
 
@@ -3167,9 +3497,9 @@ static struct cftype files[] = {
 	{
 		.name = CGROUP_FILE_GENERIC_PREFIX "procs",
 		.open = cgroup_procs_open,
-		/* .write_u64 = cgroup_procs_write, TODO */
+		.write_u64 = cgroup_procs_write,
 		.release = cgroup_pidlist_release,
-		.mode = S_IRUGO,
+		.mode = S_IRUGO | S_IWUSR,
 	},
 	{
 		.name = "notify_on_release",
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e5c0244..dfb9f24 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -185,6 +185,10 @@ static int freezer_can_attach(struct cgroup_subsys *ss,
 		struct task_struct *c;
 
 		rcu_read_lock();
+		if (!thread_group_leader(task)) {
+			rcu_read_unlock();
+			return -EAGAIN;
+		}
 		list_for_each_entry_rcu(c, &task->thread_group, thread_group) {
 			if (is_task_frozen_enough(c)) {
 				rcu_read_unlock();
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index d109467..edc7c01 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1360,6 +1360,10 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
 		struct task_struct *c;
 
 		rcu_read_lock();
+		if (!thread_group_leader(tsk)) {
+			rcu_read_unlock();
+			return -EAGAIN;
+		}
 		list_for_each_entry_rcu(c, &tsk->thread_group, thread_group) {
 			ret = security_task_setscheduler(c, 0, NULL);
 			if (ret) {
diff --git a/kernel/ns_cgroup.c b/kernel/ns_cgroup.c
index 2a5dfec..ecd15d2 100644
--- a/kernel/ns_cgroup.c
+++ b/kernel/ns_cgroup.c
@@ -59,6 +59,10 @@ static int ns_can_attach(struct cgroup_subsys *ss, struct cgroup *new_cgroup,
 	if (threadgroup) {
 		struct task_struct *c;
 		rcu_read_lock();
+		if (!thread_group_leader(task)) {
+			rcu_read_unlock();
+			return -EAGAIN;
+		}
 		list_for_each_entry_rcu(c, &task->thread_group, thread_group) {
 			if (!cgroup_is_descendant(new_cgroup, c)) {
 				rcu_read_unlock();
diff --git a/kernel/sched.c b/kernel/sched.c
index 3c2a54f..1b36bf0 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -8712,6 +8712,10 @@ cpu_cgroup_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 	if (threadgroup) {
 		struct task_struct *c;
 		rcu_read_lock();
+		if (!thread_group_leader(tsk)) {
+			rcu_read_unlock();
+			return -EAGAIN;
+		}
 		list_for_each_entry_rcu(c, &tsk->thread_group, thread_group) {
 			retval = cpu_cgroup_can_attach_task(cgrp, c);
 			if (retval) {


* Re: [RFC] [PATCH v3 1/2] cgroups: read-write lock CLONE_THREAD forking per threadgroup
  2010-06-25  5:46 ` [RFC] [PATCH v3 1/2] cgroups: read-write lock CLONE_THREAD forking per threadgroup Ben Blum
@ 2010-06-28  7:10   ` KAMEZAWA Hiroyuki
  2010-06-28 15:43     ` Ben Blum
  0 siblings, 1 reply; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-06-28  7:10 UTC (permalink / raw)
  To: Ben Blum
  Cc: linux-kernel, containers, akpm, ebiederm, lizf, matthltc, menage, oleg

On Fri, 25 Jun 2010 01:46:54 -0400
Ben Blum <bblum@andrew.cmu.edu> wrote:

> Adds functionality to read/write lock CLONE_THREAD fork()ing per-threadgroup
> 
> From: Ben Blum <bblum@andrew.cmu.edu>
> 
> This patch adds an rwsem that lives in a threadgroup's signal_struct that's
> taken for reading in the fork path, under CONFIG_CGROUPS. If another part of
> the kernel later wants to use such a locking mechanism, the CONFIG_CGROUPS
> ifdefs should be changed to a higher-up flag that CGROUPS and the other system
> would both depend on.
> 
> This is a pre-patch for cgroups-procs-write.patch.
> 

Hmm, when adding a new lock, please describe its semantics.
The following is my understanding - is it right?

Lock range:
This rwsem is read-locked from cgroup_fork() to cgroup_post_fork().
Most of the work for fork() happens between cgroup_fork() and
cgroup_post_fork(). This means that while sig->threadgroup_fork_lock is
held for writing, no new fork() can make progress in this process group.
This rwsem is read-locked only when CLONE_THREAD is in clone_flags; IOW,
it is taken for reading only when creating a new thread.

What we can do with this:
By write-locking sig->threadgroup_fork_lock, code can visit _all_ threads
in a process group without any races. So, if you want to implement an
atomic operation against a whole process, taking this lock is an option.

For what:
To implement an atomic process move in cgroup, we need this lock.

Why this implementation:
Considering cgroups, the threads of one process can be under several
different cgroups. So, we can't implement the lock inside cgroup
internals; we use signal_struct.
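
For example, a write-side user can do (a sketch; the real user in patch 2
also takes tasklist_lock around the thread-list walk):

  down_write(&tsk->signal->threadgroup_fork_lock);
  /* No CLONE_THREAD fork() can complete now, so the thread list is
   * stable and every thread is visited exactly once. */
  list_for_each_entry(c, &tsk->thread_group, thread_group)
      do_something(c);	/* do_something() is a placeholder */
  up_write(&tsk->signal->threadgroup_fork_lock);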


By the way, IMHO, hiding the lock inside cgroup_fork() and
cgroup_post_fork() doesn't seem like a good idea. How about code like
this?

  read_lock_thread_clone(current);
  cgroup_fork();
 .....
  cgroup_post_fork();
  read_unlock_thread_clone(current);

We may have a chance to move these locks to a better position if cgroup
is the only user.

Thanks,
-Kame



* Re: [RFC] [PATCH v3 1/2] cgroups: read-write lock CLONE_THREAD forking per threadgroup
  2010-06-28  7:10   ` KAMEZAWA Hiroyuki
@ 2010-06-28 15:43     ` Ben Blum
  2010-07-28  8:29       ` Ben Blum
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Blum @ 2010-06-28 15:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Ben Blum, linux-kernel, containers, akpm, ebiederm, lizf,
	matthltc, menage, oleg

On Mon, Jun 28, 2010 at 04:10:31PM +0900, KAMEZAWA Hiroyuki wrote:
> On Fri, 25 Jun 2010 01:46:54 -0400
> Ben Blum <bblum@andrew.cmu.edu> wrote:
> 
> > Adds functionality to read/write lock CLONE_THREAD fork()ing per-threadgroup
> > 
> > From: Ben Blum <bblum@andrew.cmu.edu>
> > 
> > This patch adds an rwsem that lives in a threadgroup's signal_struct that's
> > taken for reading in the fork path, under CONFIG_CGROUPS. If another part of
> > the kernel later wants to use such a locking mechanism, the CONFIG_CGROUPS
> > ifdefs should be changed to a higher-up flag that CGROUPS and the other system
> > would both depend on.
> > 
> > This is a pre-patch for cgroups-procs-write.patch.
> > 
> 
> Hmm, when adding a new lock, please describe its semantics.
> The following is my understanding - is it right?
> 
> Lock range:
> This rwsem is read-locked from cgroup_fork() to cgroup_post_fork().
> Most of the work for fork() happens between cgroup_fork() and
> cgroup_post_fork(). This means that while sig->threadgroup_fork_lock is
> held for writing, no new fork() can make progress in this process group.
> This rwsem is read-locked only when CLONE_THREAD is in clone_flags; IOW,
> it is taken for reading only when creating a new thread.
> 
> What we can do with this:
> By write-locking sig->threadgroup_fork_lock, code can visit _all_ threads
> in a process group without any races. So, if you want to implement an
> atomic operation against a whole process, taking this lock is an option.
> 
> For what:
> To implement an atomic process move in cgroup, we need this lock.

All good. Should a description like this go in Documentation/ somewhere,
or above the declaration of the lock?

> Why this implementation:
> Considering cgroups, the threads of one process can be under several
> different cgroups. So, we can't implement the lock inside cgroup
> internals; we use signal_struct.

Not entirely, though that's an additional restriction... The reason it's
in signal_struct: signal_struct is per-threadgroup and has exactly the
lifetime rules we want. Putting the lock in task_struct and taking
current->group_leader->signal... seems like it would also work, but
introduces cacheline conflicts that weren't there before, since fork
doesn't look at group_leader (but it does bump signal's count).
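
(Concretely, that would mean something like
down_read(&current->group_leader->tg_fork_lock) in the fork path, with
tg_fork_lock a hypothetical task_struct field - touching group_leader's
cacheline on every thread clone, which fork currently avoids.)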

> By the way, IMHO, hiding the lock inside cgroup_fork() and
> cgroup_post_fork() doesn't seem like a good idea. How about code like
> this?
> 
>   read_lock_thread_clone(current);
>   cgroup_fork();
>  .....
>   cgroup_post_fork();
>   read_unlock_thread_clone(current);
> 
> We may have a chance to move these locks to a better position if cgroup
> is the only user.

I didn't do that out of a desire to change fork.c as little as possible,
but that does look better than what I've got. Those two functions should
be in fork.c under #ifdef CONFIG_CGROUPS.

> 
> Thanks,
> -Kame

Thanks,
-- Ben


* Re: [RFC] [PATCH v3 1/2] cgroups: read-write lock CLONE_THREAD forking per threadgroup
  2010-06-28 15:43     ` Ben Blum
@ 2010-07-28  8:29       ` Ben Blum
  2010-07-28  9:28         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Blum @ 2010-07-28  8:29 UTC (permalink / raw)
  To: Ben Blum
  Cc: KAMEZAWA Hiroyuki, linux-kernel, containers, akpm, ebiederm,
	lizf, matthltc, menage, oleg

On Mon, Jun 28, 2010 at 11:43:59AM -0400, Ben Blum wrote:
> On Mon, Jun 28, 2010 at 04:10:31PM +0900, KAMEZAWA Hiroyuki wrote:
> > By the way, IMHO, hiding the lock inside cgroup_fork() and
> > cgroup_post_fork() doesn't seem like a good idea. How about code like
> > this?
> > 
> >   read_lock_thread_clone(current);
> >   cgroup_fork();
> >  .....
> >   cgroup_post_fork();
> >   read_unlock_thread_clone(current);
> > 
> > We may have a chance to move these locks to a better position if
> > cgroup is the only user.
> 
> I didn't do that out of a desire to change fork.c as little as possible,
> but that does look better than what I've got. Those two functions should
> be in fork.c under #ifdef CONFIG_CGROUPS.

I'm looking at this now and am not sure where the best place to put
these is:

1) Don't make new functions, just put:

    #ifdef CONFIG_CGROUPS
        if (clone_flags & CLONE_THREAD)
            down/up_read(...);
    #endif

directly in copy_process() in fork.c. Simplest, but uglifies the code.

2) Make static helper functions in fork.c. Good, but not consistent with
directly using the lock on the write side (attach_proc).

3) Define inline functions under #ifdef CONFIG_CGROUPS in sched.h, just
under the declaration of the lock. Most robust, but I'm hesitant to add
unneeded stuff to such a popular header file.
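
For concreteness, option 3 might look something like this (names are
hypothetical):

    #ifdef CONFIG_CGROUPS
    static inline void threadgroup_fork_read_lock(struct task_struct *tsk)
    {
            down_read(&tsk->signal->threadgroup_fork_lock);
    }
    static inline void threadgroup_fork_read_unlock(struct task_struct *tsk)
    {
            up_read(&tsk->signal->threadgroup_fork_lock);
    }
    #else
    static inline void threadgroup_fork_read_lock(struct task_struct *tsk) {}
    static inline void threadgroup_fork_read_unlock(struct task_struct *tsk) {}
    #endif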

Any opinions?

-- Ben

> 
> > 
> > Thanks,
> > -Kame
> 
> Thanks,
> -- Ben
> 


* Re: [RFC] [PATCH v3 1/2] cgroups: read-write lock CLONE_THREAD forking per threadgroup
  2010-07-28  8:29       ` Ben Blum
@ 2010-07-28  9:28         ` KAMEZAWA Hiroyuki
  2010-07-28 15:41           ` Ben Blum
  0 siblings, 1 reply; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-28  9:28 UTC (permalink / raw)
  To: Ben Blum
  Cc: linux-kernel, containers, akpm, ebiederm, lizf, matthltc, menage, oleg

On Wed, 28 Jul 2010 04:29:53 -0400
Ben Blum <bblum@andrew.cmu.edu> wrote:

> On Mon, Jun 28, 2010 at 11:43:59AM -0400, Ben Blum wrote:
> > On Mon, Jun 28, 2010 at 04:10:31PM +0900, KAMEZAWA Hiroyuki wrote:
> > > By the way, IMHO, hiding the lock inside cgroup_fork() and
> > > cgroup_post_fork() doesn't seem like a good idea. How about code
> > > like this?
> > > 
> > >   read_lock_thread_clone(current);
> > >   cgroup_fork();
> > >  .....
> > >   cgroup_post_fork();
> > >   read_unlock_thread_clone(current);
> > > 
> > > We may have a chance to move these locks to a better position if
> > > cgroup is the only user.
> > 
> > I didn't do that out of a desire to change fork.c as little as possible,
> > but that does look better than what I've got. Those two functions should
> > be in fork.c under #ifdef CONFIG_CGROUPS.
> 
> I'm looking at this now and am not sure where the best place to put
> these is:
> 
> 1) Don't make new functions, just put:
> 
>     #ifdef CONFIG_CGROUPS
>         if (clone_flags & CLONE_THREAD)
>             down/up_read(...);
>     #endif
> 
> directly in copy_process() in fork.c. Simplest, but uglifies the code.
> 
> 2) Make static helper functions in fork.c. Good, but not consistent with
> directly using the lock on the write side (attach_proc).
> 
> 3) Define inline functions under #ifdef CONFIG_CGROUPS in sched.h, just
> under the declaration of the lock. Most robust, but I'm hesitant to add
> unneeded stuff to such a popular header file.
> 
> Any opinions?
> 

My point was simple. Because copy_process() is a very important path,
the new lock should be visible in copy_process() or kernel/fork.c:
"if the lock is visible in copy_process(), the reader can notice it".

So, I offer you two options.

Either rename cgroup_fork() and cgroup_post_fork() to
       cgroup_fork_lock() and cgroup_post_fork_unlock(),

so that the lock is visible and the change is minimal;

or
       add the definition of lock/unlock to cgroup.h and include it
       from kernel/fork.c.

Thanks,
-Kame



* Re: [RFC] [PATCH v3 1/2] cgroups: read-write lock CLONE_THREAD forking per threadgroup
  2010-07-28  9:28         ` KAMEZAWA Hiroyuki
@ 2010-07-28 15:41           ` Ben Blum
  0 siblings, 0 replies; 8+ messages in thread
From: Ben Blum @ 2010-07-28 15:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Ben Blum, linux-kernel, containers, akpm, ebiederm, lizf,
	matthltc, menage, oleg

On Wed, Jul 28, 2010 at 06:28:13PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 28 Jul 2010 04:29:53 -0400
> Ben Blum <bblum@andrew.cmu.edu> wrote:
> 
> > On Mon, Jun 28, 2010 at 11:43:59AM -0400, Ben Blum wrote:
> > > On Mon, Jun 28, 2010 at 04:10:31PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > By the way, IMHO, hiding the lock inside cgroup_fork() and
> > > > cgroup_post_fork() doesn't seem like a good idea. How about code
> > > > like this?
> > > > 
> > > >   read_lock_thread_clone(current);
> > > >   cgroup_fork();
> > > >  .....
> > > >   cgroup_post_fork();
> > > >   read_unlock_thread_clone(current);
> > > > 
> > > > We may have a chance to move these locks to a better position if
> > > > cgroup is the only user.
> > > 
> > > I didn't do that out of a desire to change fork.c as little as possible,
> > > but that does look better than what I've got. Those two functions should
> > > be in fork.c under #ifdef CONFIG_CGROUPS.
> > 
> > I'm looking at this now and am not sure where the best place to put
> > these is:
> > 
> > 1) Don't make new functions, just put:
> > 
> >     #ifdef CONFIG_CGROUPS
> >         if (clone_flags & CLONE_THREAD)
> >             down/up_read(...);
> >     #endif
> > 
> > directly in copy_process() in fork.c. Simplest, but uglifies the code.
> > 
> > 2) Make static helper functions in fork.c. Good, but not consistent with
> > directly using the lock on the write side (attach_proc).
> > 
> > 3) Define inline functions under #ifdef CONFIG_CGROUPS in sched.h, just
> > under the declaration of the lock. Most robust, but I'm hesitant to add
> > unneeded stuff to such a popular header file.
> > 
> > Any opinions?
> > 
> 
> My point was simple. Because copy_process() is very important path,
> the new lock should be visible in copy_process() or kernek/fork.c.
> "If the lock is visible in copy_process(), the reader can notice it".
> 
> Then, I offer you 2 options.
> 
> rename cgroup_fork() and cgroup_post_fork() as
>        cgroup_fork_lock() and cgroup_post_fork_unlock() 
> 
> Now, the lock is visible and the change is minimum.
> 
> Or
>        add the definition of lock/unlock to cgroup.h and include it 
>        from kernel/fork.c
> 
> Thanks,
> -Kame

I don't like either of these. Renaming to cgroup_fork_lock() not only
wrongly suggests that a cgroup-specific lock is taken, but also hides the
real purpose of these functions, which is to manipulate cgroup pointers.
And it's not a cgroup-specific lock - only the write side is *currently*
used by cgroups - so it shouldn't go in cgroup.h.

-- Ben

