* [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
@ 2013-08-15 16:02 Tejun Heo
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hello,

Changes from the last take[L] are

* Event handling updated such that it doesn't require meddling with
  internals not normally exposed outside cgroup core proper.  dentry
  reference counting is replaced with css one and cftype handling is
  completely gone.  Hopefully, this addresses Michal's complaints.

* Further simplifications.

* Rebased on top of the current cgroup/for-3.12 and pending patches.

Like many other things in cgroup, cgroup_event is way too flexible and
complex - it strives to provide a completely flexible event monitoring
facility in cgroup proper which allows any number of users to monitor
custom events.  This is essentially a layering violation and leads to
weird issues like worrying about event API mis/abuse from userland in
cgroup controller event source implementation.

The only thing cgroup_event can do better than standard "file changed"
notification is serving many uncoordinated event listeners watching
many different thresholds, which would only make sense if access to the
cgroup hierarchy is widely distributed.  The existing implementation
is ill-equipped to handle such a scenario and is not in the right
layer to tackle such issues, and cgroup as a whole is headed the other
way.  As such, going forward, cgroup core won't support cgroup_event
as the common event mechanism.

Fortunately, memcg along with vmpressure is the only user of the
facility and gets to keep it.  This patchset makes cgroup_event
specific to memcg, moves all related code into mm/memcontrol.c and
renames it to mem_cgroup_event so that its usage can't spread to other
subsystems and later deprecation and cleanup can be localized.

Note that after this patchset, cgroup.event_control file exists only
for the hierarchy which has memcg attached to it.  This is a userland
visible change but unlikely to be noticeable as the file has never
been meaningful outside memcg.  If this ever becomes problematic, we
can add a dummy file on hierarchies w/o memcg when !sane_behavior.
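
For reference, the userland side of this interface is a single line
written to cgroup.event_control.  Below is a minimal sketch, assuming
the "<event_fd> <control_fd> <args>" line format described in
Documentation/cgroups/cgroups.txt.  The helper name
format_event_control() is hypothetical; it only builds the string and
does not touch any cgroup file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the line that userland writes to
 * cgroup.event_control.  Returns 0 on success and -1 if the line
 * doesn't fit in @buf or formatting fails.
 */
static int format_event_control(char *buf, size_t len, int event_fd,
				int control_fd, const char *args)
{
	int n = snprintf(buf, len, "%d %d %s", event_fd, control_fd, args);

	/* snprintf() returns the untruncated length; reject truncation */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

In real use, event_fd would come from eventfd(2) and control_fd from
opening a memcg control file such as memory.usage_in_bytes.  After
this patchset, the write can only succeed on a hierarchy which has
memcg attached, as the file doesn't exist elsewhere.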

This patchset consists of the following 12 patches.

 0001-cgroup-rename-cgroup_css_from_dir-to-css_from_dir-an.patch
 0002-cgroup-make-cgroup_css-take-cgroup_subsys-instead-an.patch
 0003-cgroup-implement-CFTYPE_NO_PREFIX.patch
 0004-cgroup-make-cgroup_event-hold-onto-cgroup_subsys_sta.patch
 0005-cgroup-make-cgroup_write_event_control-use-css_from_.patch
 0006-cgroup-memcg-move-cgroup_event-implementation-to-mem.patch
 0007-memcg-cgroup_write_event_control-now-knows-css-is-fo.patch
 0008-cgroup-memcg-move-cgroup-event_list-_lock-and-event-.patch
 0009-memcg-remove-cgroup_event-cft.patch
 0010-memcg-make-cgroup_event-deal-with-mem_cgroup-instead.patch
 0011-memcg-rename-cgroup_event-to-mem_cgroup_event.patch
 0012-cgroup-unexport-cgroup_css-and-remove-__file_cft.patch

0001-0005 prep for the move.  0006 moves it.  0007-0012 simplify it.

While these are quite a few patches, they are mostly trivial in nature
and some are changes required for the planned unified hierarchy
(e.g. switching to css refcnting).  If cgroup_event were kept generic,
we'd also have to restructure it to handle dynamic css attach/detach,
which would likely be a lot more work for no noticeable gain.

The patches are based on top of

  cgroup/for-3.12 ff58ac0d58 ("cpuset: remove an unncessary forward declaration")
+ [PATCH] cgroup: fix subsystem file accesses on the root cgroup
+ [PATCH] cgroup: fix cgroup_write_event_control()

and available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-memcg_event

diffstat follows.

 Documentation/cgroups/cgroups.txt |   20 -
 include/linux/cgroup.h            |   28 --
 include/linux/vmpressure.h        |    8
 init/Kconfig                      |    3
 kernel/cgroup.c                   |  383 +++++---------------------------------
 kernel/events/core.c              |    2
 mm/memcontrol.c                   |  351 +++++++++++++++++++++++++++++++---
 mm/vmpressure.c                   |   26 --
 8 files changed, 389 insertions(+), 432 deletions(-)

Thanks.

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel.cgroups/8726


* [PATCH 01/12] cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2013-08-15 16:02   ` [PATCH 01/12] cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` [PATCH 02/12] cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys Tejun Heo
                     ` (27 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Steven Rostedt, Ingo Molnar, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA

cgroup_css_from_dir() will grow another user.  In preparation, make
the following changes.

* All css functions are prefixed with just "css_".  Rename it to
  css_from_dir().

* Take dentry * instead of file * as dentry is what ultimately
  identifies a cgroup and file may not always be available.  Note that
  the function now checks whether @dentry->d_inode is NULL as the
  caller now may specify a negative dentry.

* Make it take cgroup_subsys * instead of integer subsys_id.  This
  simplifies the function and allows specifying no subsystem for
  cgroup->dummy_css.

* Make return section a bit less verbose.

This patch doesn't introduce any behavior changes.
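
The lookup and its error returns can be illustrated in userspace.  The
following is only a sketch: the mock_* types, the is_cgroup_dir flag
(standing in for the i_op comparison) and the ERR_PTR helpers are
stand-ins written for this example, not the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR helpers (assumption). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Mock types carrying only the fields the lookup needs. */
struct mock_css { int id; };
struct mock_subsys { int subsys_id; };
struct mock_cgroup { struct mock_css *subsys[2]; };
struct mock_inode { int is_cgroup_dir; struct mock_cgroup *cgrp; };
struct mock_dentry { struct mock_inode *d_inode; };

static struct mock_css *mock_css_from_dir(struct mock_dentry *dentry,
					  struct mock_subsys *ss)
{
	struct mock_css *css;

	/* a negative dentry has no inode; reject it and non-cgroup dirs */
	if (!dentry->d_inode || !dentry->d_inode->is_cgroup_dir)
		return ERR_PTR(-EBADF);

	/* subsystem not enabled on this cgroup -> -ENOENT */
	css = dentry->d_inode->cgrp->subsys[ss->subsys_id];
	return css ? css : ERR_PTR(-ENOENT);
}
```

A caller checks the result with IS_ERR() and extracts the errno with
PTR_ERR(), the same way perf_cgroup_connect() does in the diff below.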

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>
Cc: Frederic Weisbecker <fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 include/linux/cgroup.h |  3 ++-
 kernel/cgroup.c        | 26 ++++++++++----------------
 kernel/events/core.c   |  2 +-
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c24bd0b..5029176 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -919,7 +919,8 @@ bool css_is_ancestor(struct cgroup_subsys_state *cg,
 
 /* Get id and depth of css */
 unsigned short css_id(struct cgroup_subsys_state *css);
-struct cgroup_subsys_state *cgroup_css_from_dir(struct file *f, int id);
+struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
+					 struct cgroup_subsys *ss);
 
 #else /* !CONFIG_CGROUPS */
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6499004..007053d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5700,34 +5700,28 @@ struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)
 EXPORT_SYMBOL_GPL(css_lookup);
 
 /**
- * cgroup_css_from_dir - get corresponding css from file open on cgroup dir
- * @f: directory file of interest
- * @id: subsystem id of interest
+ * css_from_dir - get corresponding css from the dentry of a cgroup dir
+ * @dentry: directory dentry of interest
+ * @ss: subsystem of interest
  *
  * Must be called under RCU read lock.  The caller is responsible for
  * pinning the returned css if it needs to be accessed outside the RCU
  * critical section.
  */
-struct cgroup_subsys_state *cgroup_css_from_dir(struct file *f, int id)
+struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
+					 struct cgroup_subsys *ss)
 {
 	struct cgroup *cgrp;
-	struct inode *inode;
-	struct cgroup_subsys_state *css;
 
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	inode = file_inode(f);
-	/* check in cgroup filesystem dir */
-	if (inode->i_op != &cgroup_dir_inode_operations)
+	/* is @dentry a cgroup dir? */
+	if (!dentry->d_inode ||
+	    dentry->d_inode->i_op != &cgroup_dir_inode_operations)
 		return ERR_PTR(-EBADF);
 
-	if (id < 0 || id >= CGROUP_SUBSYS_COUNT)
-		return ERR_PTR(-EINVAL);
-
-	/* get cgroup */
-	cgrp = __d_cgrp(f->f_dentry);
-	css = cgroup_css(cgrp, id);
-	return css ? css : ERR_PTR(-ENOENT);
+	cgrp = __d_cgrp(dentry);
+	return cgroup_css(cgrp, ss->subsys_id) ?: ERR_PTR(-ENOENT);
 }
 
 #ifdef CONFIG_CGROUP_DEBUG
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 23261f9..b59ab66 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -593,7 +593,7 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 
 	rcu_read_lock();
 
-	css = cgroup_css_from_dir(f.file, perf_subsys_id);
+	css = css_from_dir(f.file->f_dentry, &perf_subsys);
 	if (IS_ERR(css)) {
 		ret = PTR_ERR(css);
 		goto out;
-- 
1.8.3.1


* [PATCH 02/12] cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2013-08-15 16:02   ` [PATCH 01/12] cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax Tejun Heo
  2013-08-15 16:02   ` Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` Tejun Heo
                     ` (26 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

cgroup_css() is no longer used in hot paths.  Make it take struct
cgroup_subsys * and allow the users to specify NULL subsys to obtain
the dummy_css.  This removes open-coded NULL subsystem testing in a
couple users and generally simplifies the code.

After this patch, css_from_dir() also allows NULL @ss and returns the
matching dummy_css.  This behavior change doesn't affect its only user
- perf.
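
The NULL-subsys convention can be sketched in userspace as follows.
All mock_* names are hypothetical stand-ins for this example; RCU and
locking are left out, and mock_css_tryget() is a plain counter rather
than the kernel's percpu refcount.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_SUBSYS_COUNT 2

struct mock_css { int refcnt; };
struct mock_subsys { int subsys_id; };
struct mock_cgroup {
	struct mock_css *subsys[MOCK_SUBSYS_COUNT];
	struct mock_css dummy_css;
};

/* NULL @ss selects the cgroup's own dummy_css, as in the new cgroup_css() */
static struct mock_css *mock_cgroup_css(struct mock_cgroup *cgrp,
					struct mock_subsys *ss)
{
	if (ss)
		return cgrp->subsys[ss->subsys_id];
	return &cgrp->dummy_css;
}

/* Mock tryget: fails once the count has dropped to zero. */
static int mock_css_tryget(struct mock_css *css)
{
	if (!css || css->refcnt <= 0)
		return 0;
	css->refcnt++;
	return 1;
}

/* Mirrors the simplified open path: only a real css gets pinned. */
static struct mock_css *mock_open_css(struct mock_cgroup *cgrp,
				      struct mock_subsys *ss)
{
	struct mock_css *css = mock_cgroup_css(cgrp, ss);

	if (ss && !mock_css_tryget(css))
		css = NULL;
	return css;
}
```

With the NULL case folded into the accessor, callers such as
cgroup_file_open() and css_next_child() no longer need their own
if/else on whether a subsystem is present.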

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 90 +++++++++++++++++++++++++++------------------------------
 1 file changed, 43 insertions(+), 47 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 007053d..f09ce8d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -226,19 +226,22 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
 /**
  * cgroup_css - obtain a cgroup's css for the specified subsystem
  * @cgrp: the cgroup of interest
- * @subsys_id: the subsystem of interest
+ * @ss: the subsystem of interest (%NULL returns the dummy_css)
  *
- * Return @cgrp's css (cgroup_subsys_state) associated with @subsys_id.
- * This function must be called either under cgroup_mutex or
- * rcu_read_lock() and the caller is responsible for pinning the returned
- * css if it wants to keep accessing it outside the said locks.  This
- * function may return %NULL if @cgrp doesn't have @subsys_id enabled.
+ * Return @cgrp's css (cgroup_subsys_state) associated with @ss.  This
+ * function must be called either under cgroup_mutex or rcu_read_lock() and
+ * the caller is responsible for pinning the returned css if it wants to
+ * keep accessing it outside the said locks.  This function may return
+ * %NULL if @cgrp doesn't have @ss enabled.
  */
 static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
-					      int subsys_id)
+					      struct cgroup_subsys *ss)
 {
-	return rcu_dereference_check(cgrp->subsys[subsys_id],
-				     lockdep_is_held(&cgroup_mutex));
+	if (ss)
+		return rcu_dereference_check(cgrp->subsys[ss->subsys_id],
+					     lockdep_is_held(&cgroup_mutex));
+	else
+		return &cgrp->dummy_css;
 }
 
 /* convenient tests for these bits */
@@ -580,7 +583,7 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset,
 			/* Subsystem is in this hierarchy. So we want
 			 * the subsystem state from the new
 			 * cgroup */
-			template[i] = cgroup_css(cgrp, i);
+			template[i] = cgroup_css(cgrp, ss);
 		} else {
 			/* Subsystem is not in this hierarchy, so we
 			 * don't want to change the subsystem state */
@@ -1062,30 +1065,30 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 
 		if (bit & added_mask) {
 			/* We're binding this subsystem to this hierarchy */
-			BUG_ON(cgroup_css(cgrp, i));
-			BUG_ON(!cgroup_css(cgroup_dummy_top, i));
-			BUG_ON(cgroup_css(cgroup_dummy_top, i)->cgroup != cgroup_dummy_top);
+			BUG_ON(cgroup_css(cgrp, ss));
+			BUG_ON(!cgroup_css(cgroup_dummy_top, ss));
+			BUG_ON(cgroup_css(cgroup_dummy_top, ss)->cgroup != cgroup_dummy_top);
 
 			rcu_assign_pointer(cgrp->subsys[i],
-					   cgroup_css(cgroup_dummy_top, i));
-			cgroup_css(cgrp, i)->cgroup = cgrp;
+					   cgroup_css(cgroup_dummy_top, ss));
+			cgroup_css(cgrp, ss)->cgroup = cgrp;
 
 			list_move(&ss->sibling, &root->subsys_list);
 			ss->root = root;
 			if (ss->bind)
-				ss->bind(cgroup_css(cgrp, i));
+				ss->bind(cgroup_css(cgrp, ss));
 
 			/* refcount was already taken, and we're keeping it */
 			root->subsys_mask |= bit;
 		} else if (bit & removed_mask) {
 			/* We're removing this subsystem */
-			BUG_ON(cgroup_css(cgrp, i) != cgroup_css(cgroup_dummy_top, i));
-			BUG_ON(cgroup_css(cgrp, i)->cgroup != cgrp);
+			BUG_ON(cgroup_css(cgrp, ss) != cgroup_css(cgroup_dummy_top, ss));
+			BUG_ON(cgroup_css(cgrp, ss)->cgroup != cgrp);
 
 			if (ss->bind)
-				ss->bind(cgroup_css(cgroup_dummy_top, i));
+				ss->bind(cgroup_css(cgroup_dummy_top, ss));
 
-			cgroup_css(cgroup_dummy_top, i)->cgroup = cgroup_dummy_top;
+			cgroup_css(cgroup_dummy_top, ss)->cgroup = cgroup_dummy_top;
 			RCU_INIT_POINTER(cgrp->subsys[i], NULL);
 
 			cgroup_subsys[i]->root = &cgroup_dummy_root;
@@ -1930,7 +1933,7 @@ EXPORT_SYMBOL_GPL(cgroup_taskset_next);
 struct cgroup_subsys_state *cgroup_taskset_cur_css(struct cgroup_taskset *tset,
 						   int subsys_id)
 {
-	return cgroup_css(tset->cur_cgrp, subsys_id);
+	return cgroup_css(tset->cur_cgrp, cgroup_subsys[subsys_id]);
 }
 EXPORT_SYMBOL_GPL(cgroup_taskset_cur_css);
 
@@ -2071,7 +2074,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk,
 	 * step 1: check that we can legitimately attach to the cgroup.
 	 */
 	for_each_root_subsys(root, ss) {
-		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss->subsys_id);
+		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);
 
 		if (ss->can_attach) {
 			retval = ss->can_attach(css, &tset);
@@ -2113,7 +2116,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk,
 	 * step 4: do subsystem attach callbacks.
 	 */
 	for_each_root_subsys(root, ss) {
-		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss->subsys_id);
+		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);
 
 		if (ss->attach)
 			ss->attach(css, &tset);
@@ -2135,7 +2138,7 @@ out_put_css_set_refs:
 out_cancel_attach:
 	if (retval) {
 		for_each_root_subsys(root, ss) {
-			struct cgroup_subsys_state *css = cgroup_css(cgrp, ss->subsys_id);
+			struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);
 
 			if (ss == failed_ss)
 				break;
@@ -2481,13 +2484,9 @@ static int cgroup_file_open(struct inode *inode, struct file *file)
 	 * @css stays alive for all file operations.
 	 */
 	rcu_read_lock();
-	if (cft->ss) {
-		css = cgroup_css(cgrp, cft->ss->subsys_id);
-		if (!css_tryget(css))
-			css = NULL;
-	} else {
-		css = &cgrp->dummy_css;
-	}
+	css = cgroup_css(cgrp, cft->ss);
+	if (cft->ss && !css_tryget(css))
+		css = NULL;
 	rcu_read_unlock();
 
 	if (!css)
@@ -2878,7 +2877,7 @@ static int cgroup_cfts_commit(struct cftype *cfts, bool is_add)
 
 	/* add/rm files for all cgroups created before */
 	rcu_read_lock();
-	css_for_each_descendant_pre(css, cgroup_css(root, ss->subsys_id)) {
+	css_for_each_descendant_pre(css, cgroup_css(root, ss)) {
 		struct cgroup *cgrp = css->cgroup;
 
 		if (cgroup_is_dead(cgrp))
@@ -3082,10 +3081,7 @@ css_next_child(struct cgroup_subsys_state *pos_css,
 	if (&next->sibling == &cgrp->children)
 		return NULL;
 
-	if (parent_css->ss)
-		return cgroup_css(next, parent_css->ss->subsys_id);
-	else
-		return &next->dummy_css;
+	return cgroup_css(next, parent_css->ss);
 }
 EXPORT_SYMBOL_GPL(css_next_child);
 
@@ -4110,7 +4106,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 	rcu_read_lock();
 
 	ret = -EINVAL;
-	event->css = cgroup_css(cgrp, event->cft->ss->subsys_id);
+	event->css = cgroup_css(cgrp, event->cft->ss);
 	if (event->css)
 		ret = 0;
 
@@ -4266,7 +4262,7 @@ static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask)
 
 	/* This cgroup is ready now */
 	for_each_root_subsys(cgrp->root, ss) {
-		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss->subsys_id);
+		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);
 		struct css_id *id = rcu_dereference_protected(css->id, true);
 
 		/*
@@ -4349,11 +4345,11 @@ static void init_css(struct cgroup_subsys_state *css, struct cgroup_subsys *ss,
 	css->id = NULL;
 
 	if (cgrp->parent)
-		css->parent = cgroup_css(cgrp->parent, ss->subsys_id);
+		css->parent = cgroup_css(cgrp->parent, ss);
 	else
 		css->flags |= CSS_ROOT;
 
-	BUG_ON(cgroup_css(cgrp, ss->subsys_id));
+	BUG_ON(cgroup_css(cgrp, ss));
 }
 
 /* invoke ->css_online() on a new CSS and mark it online if successful */
@@ -4466,7 +4462,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 	for_each_root_subsys(root, ss) {
 		struct cgroup_subsys_state *css;
 
-		css = ss->css_alloc(cgroup_css(parent, ss->subsys_id));
+		css = ss->css_alloc(cgroup_css(parent, ss));
 		if (IS_ERR(css)) {
 			err = PTR_ERR(css);
 			goto err_free_all;
@@ -4712,7 +4708,7 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	 * percpu refs of all css's are confirmed to be killed.
 	 */
 	for_each_root_subsys(cgrp->root, ss)
-		kill_css(cgroup_css(cgrp, ss->subsys_id));
+		kill_css(cgroup_css(cgrp, ss));
 
 	/*
 	 * Mark @cgrp dead.  This prevents further task migration and child
@@ -4839,7 +4835,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
 	/* Create the top cgroup state for this subsystem */
 	list_add(&ss->sibling, &cgroup_dummy_root.subsys_list);
 	ss->root = &cgroup_dummy_root;
-	css = ss->css_alloc(cgroup_css(cgroup_dummy_top, ss->subsys_id));
+	css = ss->css_alloc(cgroup_css(cgroup_dummy_top, ss));
 	/* We don't handle early failures gracefully */
 	BUG_ON(IS_ERR(css));
 	init_css(css, ss, cgroup_dummy_top);
@@ -4918,7 +4914,7 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	 * struct, so this can happen first (i.e. before the dummy root
 	 * attachment).
 	 */
-	css = ss->css_alloc(cgroup_css(cgroup_dummy_top, ss->subsys_id));
+	css = ss->css_alloc(cgroup_css(cgroup_dummy_top, ss));
 	if (IS_ERR(css)) {
 		/* failure case - need to deassign the cgroup_subsys[] slot. */
 		cgroup_subsys[ss->subsys_id] = NULL;
@@ -5000,7 +4996,7 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss)
 
 	mutex_lock(&cgroup_mutex);
 
-	offline_css(cgroup_css(cgroup_dummy_top, ss->subsys_id));
+	offline_css(cgroup_css(cgroup_dummy_top, ss));
 
 	if (ss->use_id)
 		idr_destroy(&ss->idr);
@@ -5034,7 +5030,7 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss)
 	 * the cgrp->subsys pointer to find their state. note that this
 	 * also takes care of freeing the css_id.
 	 */
-	ss->css_free(cgroup_css(cgroup_dummy_top, ss->subsys_id));
+	ss->css_free(cgroup_css(cgroup_dummy_top, ss));
 	RCU_INIT_POINTER(cgroup_dummy_top->subsys[ss->subsys_id], NULL);
 
 	mutex_unlock(&cgroup_mutex);
@@ -5721,7 +5717,7 @@ struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
 		return ERR_PTR(-EBADF);
 
 	cgrp = __d_cgrp(dentry);
-	return cgroup_css(cgrp, ss->subsys_id) ?: ERR_PTR(-ENOENT);
+	return cgroup_css(cgrp, ss) ?: ERR_PTR(-ENOENT);
 }
 
 #ifdef CONFIG_CGROUP_DEBUG
-- 
1.8.3.1


* [PATCH 02/12] cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (2 preceding siblings ...)
  2013-08-15 16:02   ` [PATCH 02/12] cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` [PATCH 03/12] cgroup: implement CFTYPE_NO_PREFIX Tejun Heo
                     ` (25 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Tejun Heo

cgroup_css() is no longer used in hot paths.  Make it take struct
cgroup_subsys * and allow the users to specify NULL subsys to obtain
the dummy_css.  This removes open-coded NULL subsystem testing in a
couple users and generally simplifies the code.

After this patch, css_from_dir() also allows NULL @ss and returns the
matching dummy_css.  This behavior change doesn't affect its only user
- perf.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 90 +++++++++++++++++++++++++++------------------------------
 1 file changed, 43 insertions(+), 47 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 007053d..f09ce8d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -226,19 +226,22 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
 /**
  * cgroup_css - obtain a cgroup's css for the specified subsystem
  * @cgrp: the cgroup of interest
- * @subsys_id: the subsystem of interest
+ * @ss: the subsystem of interest (%NULL returns the dummy_css)
  *
- * Return @cgrp's css (cgroup_subsys_state) associated with @subsys_id.
- * This function must be called either under cgroup_mutex or
- * rcu_read_lock() and the caller is responsible for pinning the returned
- * css if it wants to keep accessing it outside the said locks.  This
- * function may return %NULL if @cgrp doesn't have @subsys_id enabled.
+ * Return @cgrp's css (cgroup_subsys_state) associated with @ss.  This
+ * function must be called either under cgroup_mutex or rcu_read_lock() and
+ * the caller is responsible for pinning the returned css if it wants to
+ * keep accessing it outside the said locks.  This function may return
+ * %NULL if @cgrp doesn't have @ss enabled.
  */
 static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
-					      int subsys_id)
+					      struct cgroup_subsys *ss)
 {
-	return rcu_dereference_check(cgrp->subsys[subsys_id],
-				     lockdep_is_held(&cgroup_mutex));
+	if (ss)
+		return rcu_dereference_check(cgrp->subsys[ss->subsys_id],
+					     lockdep_is_held(&cgroup_mutex));
+	else
+		return &cgrp->dummy_css;
 }
 
 /* convenient tests for these bits */
@@ -580,7 +583,7 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset,
 			/* Subsystem is in this hierarchy. So we want
 			 * the subsystem state from the new
 			 * cgroup */
-			template[i] = cgroup_css(cgrp, i);
+			template[i] = cgroup_css(cgrp, ss);
 		} else {
 			/* Subsystem is not in this hierarchy, so we
 			 * don't want to change the subsystem state */
@@ -1062,30 +1065,30 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 
 		if (bit & added_mask) {
 			/* We're binding this subsystem to this hierarchy */
-			BUG_ON(cgroup_css(cgrp, i));
-			BUG_ON(!cgroup_css(cgroup_dummy_top, i));
-			BUG_ON(cgroup_css(cgroup_dummy_top, i)->cgroup != cgroup_dummy_top);
+			BUG_ON(cgroup_css(cgrp, ss));
+			BUG_ON(!cgroup_css(cgroup_dummy_top, ss));
+			BUG_ON(cgroup_css(cgroup_dummy_top, ss)->cgroup != cgroup_dummy_top);
 
 			rcu_assign_pointer(cgrp->subsys[i],
-					   cgroup_css(cgroup_dummy_top, i));
-			cgroup_css(cgrp, i)->cgroup = cgrp;
+					   cgroup_css(cgroup_dummy_top, ss));
+			cgroup_css(cgrp, ss)->cgroup = cgrp;
 
 			list_move(&ss->sibling, &root->subsys_list);
 			ss->root = root;
 			if (ss->bind)
-				ss->bind(cgroup_css(cgrp, i));
+				ss->bind(cgroup_css(cgrp, ss));
 
 			/* refcount was already taken, and we're keeping it */
 			root->subsys_mask |= bit;
 		} else if (bit & removed_mask) {
 			/* We're removing this subsystem */
-			BUG_ON(cgroup_css(cgrp, i) != cgroup_css(cgroup_dummy_top, i));
-			BUG_ON(cgroup_css(cgrp, i)->cgroup != cgrp);
+			BUG_ON(cgroup_css(cgrp, ss) != cgroup_css(cgroup_dummy_top, ss));
+			BUG_ON(cgroup_css(cgrp, ss)->cgroup != cgrp);
 
 			if (ss->bind)
-				ss->bind(cgroup_css(cgroup_dummy_top, i));
+				ss->bind(cgroup_css(cgroup_dummy_top, ss));
 
-			cgroup_css(cgroup_dummy_top, i)->cgroup = cgroup_dummy_top;
+			cgroup_css(cgroup_dummy_top, ss)->cgroup = cgroup_dummy_top;
 			RCU_INIT_POINTER(cgrp->subsys[i], NULL);
 
 			cgroup_subsys[i]->root = &cgroup_dummy_root;
@@ -1930,7 +1933,7 @@ EXPORT_SYMBOL_GPL(cgroup_taskset_next);
 struct cgroup_subsys_state *cgroup_taskset_cur_css(struct cgroup_taskset *tset,
 						   int subsys_id)
 {
-	return cgroup_css(tset->cur_cgrp, subsys_id);
+	return cgroup_css(tset->cur_cgrp, cgroup_subsys[subsys_id]);
 }
 EXPORT_SYMBOL_GPL(cgroup_taskset_cur_css);
 
@@ -2071,7 +2074,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk,
 	 * step 1: check that we can legitimately attach to the cgroup.
 	 */
 	for_each_root_subsys(root, ss) {
-		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss->subsys_id);
+		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);
 
 		if (ss->can_attach) {
 			retval = ss->can_attach(css, &tset);
@@ -2113,7 +2116,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk,
 	 * step 4: do subsystem attach callbacks.
 	 */
 	for_each_root_subsys(root, ss) {
-		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss->subsys_id);
+		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);
 
 		if (ss->attach)
 			ss->attach(css, &tset);
@@ -2135,7 +2138,7 @@ out_put_css_set_refs:
 out_cancel_attach:
 	if (retval) {
 		for_each_root_subsys(root, ss) {
-			struct cgroup_subsys_state *css = cgroup_css(cgrp, ss->subsys_id);
+			struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);
 
 			if (ss == failed_ss)
 				break;
@@ -2481,13 +2484,9 @@ static int cgroup_file_open(struct inode *inode, struct file *file)
 	 * @css stays alive for all file operations.
 	 */
 	rcu_read_lock();
-	if (cft->ss) {
-		css = cgroup_css(cgrp, cft->ss->subsys_id);
-		if (!css_tryget(css))
-			css = NULL;
-	} else {
-		css = &cgrp->dummy_css;
-	}
+	css = cgroup_css(cgrp, cft->ss);
+	if (cft->ss && !css_tryget(css))
+		css = NULL;
 	rcu_read_unlock();
 
 	if (!css)
@@ -2878,7 +2877,7 @@ static int cgroup_cfts_commit(struct cftype *cfts, bool is_add)
 
 	/* add/rm files for all cgroups created before */
 	rcu_read_lock();
-	css_for_each_descendant_pre(css, cgroup_css(root, ss->subsys_id)) {
+	css_for_each_descendant_pre(css, cgroup_css(root, ss)) {
 		struct cgroup *cgrp = css->cgroup;
 
 		if (cgroup_is_dead(cgrp))
@@ -3082,10 +3081,7 @@ css_next_child(struct cgroup_subsys_state *pos_css,
 	if (&next->sibling == &cgrp->children)
 		return NULL;
 
-	if (parent_css->ss)
-		return cgroup_css(next, parent_css->ss->subsys_id);
-	else
-		return &next->dummy_css;
+	return cgroup_css(next, parent_css->ss);
 }
 EXPORT_SYMBOL_GPL(css_next_child);
 
@@ -4110,7 +4106,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 	rcu_read_lock();
 
 	ret = -EINVAL;
-	event->css = cgroup_css(cgrp, event->cft->ss->subsys_id);
+	event->css = cgroup_css(cgrp, event->cft->ss);
 	if (event->css)
 		ret = 0;
 
@@ -4266,7 +4262,7 @@ static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask)
 
 	/* This cgroup is ready now */
 	for_each_root_subsys(cgrp->root, ss) {
-		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss->subsys_id);
+		struct cgroup_subsys_state *css = cgroup_css(cgrp, ss);
 		struct css_id *id = rcu_dereference_protected(css->id, true);
 
 		/*
@@ -4349,11 +4345,11 @@ static void init_css(struct cgroup_subsys_state *css, struct cgroup_subsys *ss,
 	css->id = NULL;
 
 	if (cgrp->parent)
-		css->parent = cgroup_css(cgrp->parent, ss->subsys_id);
+		css->parent = cgroup_css(cgrp->parent, ss);
 	else
 		css->flags |= CSS_ROOT;
 
-	BUG_ON(cgroup_css(cgrp, ss->subsys_id));
+	BUG_ON(cgroup_css(cgrp, ss));
 }
 
 /* invoke ->css_online() on a new CSS and mark it online if successful */
@@ -4466,7 +4462,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 	for_each_root_subsys(root, ss) {
 		struct cgroup_subsys_state *css;
 
-		css = ss->css_alloc(cgroup_css(parent, ss->subsys_id));
+		css = ss->css_alloc(cgroup_css(parent, ss));
 		if (IS_ERR(css)) {
 			err = PTR_ERR(css);
 			goto err_free_all;
@@ -4712,7 +4708,7 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	 * percpu refs of all css's are confirmed to be killed.
 	 */
 	for_each_root_subsys(cgrp->root, ss)
-		kill_css(cgroup_css(cgrp, ss->subsys_id));
+		kill_css(cgroup_css(cgrp, ss));
 
 	/*
 	 * Mark @cgrp dead.  This prevents further task migration and child
@@ -4839,7 +4835,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
 	/* Create the top cgroup state for this subsystem */
 	list_add(&ss->sibling, &cgroup_dummy_root.subsys_list);
 	ss->root = &cgroup_dummy_root;
-	css = ss->css_alloc(cgroup_css(cgroup_dummy_top, ss->subsys_id));
+	css = ss->css_alloc(cgroup_css(cgroup_dummy_top, ss));
 	/* We don't handle early failures gracefully */
 	BUG_ON(IS_ERR(css));
 	init_css(css, ss, cgroup_dummy_top);
@@ -4918,7 +4914,7 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	 * struct, so this can happen first (i.e. before the dummy root
 	 * attachment).
 	 */
-	css = ss->css_alloc(cgroup_css(cgroup_dummy_top, ss->subsys_id));
+	css = ss->css_alloc(cgroup_css(cgroup_dummy_top, ss));
 	if (IS_ERR(css)) {
 		/* failure case - need to deassign the cgroup_subsys[] slot. */
 		cgroup_subsys[ss->subsys_id] = NULL;
@@ -5000,7 +4996,7 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss)
 
 	mutex_lock(&cgroup_mutex);
 
-	offline_css(cgroup_css(cgroup_dummy_top, ss->subsys_id));
+	offline_css(cgroup_css(cgroup_dummy_top, ss));
 
 	if (ss->use_id)
 		idr_destroy(&ss->idr);
@@ -5034,7 +5030,7 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss)
 	 * the cgrp->subsys pointer to find their state. note that this
 	 * also takes care of freeing the css_id.
 	 */
-	ss->css_free(cgroup_css(cgroup_dummy_top, ss->subsys_id));
+	ss->css_free(cgroup_css(cgroup_dummy_top, ss));
 	RCU_INIT_POINTER(cgroup_dummy_top->subsys[ss->subsys_id], NULL);
 
 	mutex_unlock(&cgroup_mutex);
@@ -5721,7 +5717,7 @@ struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
 		return ERR_PTR(-EBADF);
 
 	cgrp = __d_cgrp(dentry);
-	return cgroup_css(cgrp, ss->subsys_id) ?: ERR_PTR(-ENOENT);
+	return cgroup_css(cgrp, ss) ?: ERR_PTR(-ENOENT);
 }
 
 #ifdef CONFIG_CGROUP_DEBUG
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread
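
The call-site conversions in the diff above depend on cgroup_css() taking a subsystem pointer rather than an ID, with a NULL pointer selecting the cgroup's own dummy css. The following is a minimal user-space sketch of that dispatch, not the kernel code: the model_* names and MODEL_SUBSYS_COUNT are hypothetical, and RCU and locking are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SUBSYS_COUNT 4	/* hypothetical; stands in for the subsys array size */

struct model_css { int placeholder; };
struct model_subsys { int subsys_id; };

struct model_cgroup {
	struct model_css dummy_css;			/* css used for core (non-subsystem) files */
	struct model_css *subsys[MODEL_SUBSYS_COUNT];	/* NULL when a subsystem is not enabled */
};

/*
 * Model of the pointer-based lookup: a NULL @ss selects the dummy css,
 * which lets callers drop their own "if (ss) ... else ..." branches.
 */
static struct model_css *model_cgroup_css(struct model_cgroup *cgrp,
					  const struct model_subsys *ss)
{
	if (ss) {
		assert(ss->subsys_id >= 0 && ss->subsys_id < MODEL_SUBSYS_COUNT);
		return cgrp->subsys[ss->subsys_id];
	}
	return &cgrp->dummy_css;
}
```

With this shape, a caller such as the css_next_child() hunk can pass parent_css->ss straight through without checking it first.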

* [PATCH 03/12] cgroup: implement CFTYPE_NO_PREFIX
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (3 preceding siblings ...)
  2013-08-15 16:02   ` Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` Tejun Heo
                     ` (24 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Glauber Costa

When cgroup files are created, cgroup core automatically prepends the
name of the subsystem as prefix.  This patch adds CFTYPE_NO_PREFIX, which
disables the automatic prefix.  This is to work around historical
baggage and shouldn't be used for new files.

This will be used to move "cgroup.event_control" from cgroup core to
memcg.
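
As a rough illustration of the naming rule described above, here is a user-space sketch of how the file name could be assembled. model_cft_name() and its boolean parameters are hypothetical stand-ins for the CFTYPE_NO_PREFIX and CGRP_ROOT_NOPREFIX flag tests in cgroup_add_file(); the file names passed in are only examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the visible file name.  The subsystem prefix is added only when
 * the file belongs to a subsystem and neither the per-file nor the
 * per-root "no prefix" flag is set.
 */
static void model_cft_name(char *out, size_t outlen,
			   const char *ss_name, const char *cft_name,
			   bool cft_no_prefix, bool root_noprefix)
{
	assert(outlen > 0);

	if (ss_name && !cft_no_prefix && !root_noprefix)
		snprintf(out, outlen, "%s.%s", ss_name, cft_name);
	else
		snprintf(out, outlen, "%s", cft_name);
}
```

Under these assumptions, a memcg file named "usage_in_bytes" shows up as "memory.usage_in_bytes", while a memcg file flagged as no-prefix and named "cgroup.event_control" keeps exactly that name.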

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Glauber Costa <glommer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/cgroup.h | 1 +
 kernel/cgroup.c        | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 5029176..00c6329 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -411,6 +411,7 @@ enum {
 	CFTYPE_ONLY_ON_ROOT	= (1 << 0),	/* only create on root cgrp */
 	CFTYPE_NOT_ON_ROOT	= (1 << 1),	/* don't create on root cgrp */
 	CFTYPE_INSANE		= (1 << 2),	/* don't create if sane_behavior */
+	CFTYPE_NO_PREFIX	= (1 << 3),	/* (DON'T USE FOR NEW FILES) no subsys prefix */
 };
 
 #define MAX_CFTYPE_NAME		64
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f09ce8d..73d1c70 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2756,7 +2756,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
 	umode_t mode;
 	char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
 
-	if (cft->ss && !(cgrp->root->flags & CGRP_ROOT_NOPREFIX)) {
+	if (cft->ss && !(cft->flags & CFTYPE_NO_PREFIX) &&
+	    !(cgrp->root->flags & CGRP_ROOT_NOPREFIX)) {
 		strcpy(name, cft->ss->name);
 		strcat(name, ".");
 	}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 04/12] cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (6 preceding siblings ...)
  2013-08-15 16:02   ` [PATCH 04/12] cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` [PATCH 05/12] cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp() Tejun Heo
                     ` (21 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Currently, each registered cgroup_event holds an extra reference to
the cgroup.  This is a bit weird as events are subsystem specific and
will also be incorrect in the planned unified hierarchy as css
(cgroup_subsys_state) may come and go dynamically across the lifetime
of a cgroup.  Holding onto cgroup won't prevent the target css from
going away.

Update cgroup_event to hold onto the css the target file belongs to
instead of cgroup.
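
The reference handling this implies can be sketched in user space as follows. This is a simplified model under stated assumptions: the model_* names are hypothetical, a plain integer replaces the percpu refcount, and a "dying" flag stands in for a css whose reference count has already been killed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_css {
	int refcnt;
	bool dying;	/* models a css that can no longer be pinned */
};

/* Take a reference only if the css is still alive. */
static bool model_css_tryget(struct model_css *css)
{
	if (css->dying)
		return false;
	css->refcnt++;
	return true;
}

static void model_css_put(struct model_css *css)
{
	assert(css->refcnt > 0);
	css->refcnt--;
}

/*
 * Registration pins the css first; every later failure unwinds through
 * out_put_css so the reference is not leaked.
 */
static int model_register_event(struct model_css *css, bool register_fails)
{
	int ret;

	if (!css || !model_css_tryget(css))
		return -EINVAL;

	if (register_fails) {
		ret = -EINVAL;
		goto out_put_css;
	}
	return 0;	/* the event now owns the reference until removal */

out_put_css:
	model_css_put(css);
	return ret;
}

/* Removal drops the reference taken at registration time. */
static void model_event_remove(struct model_css *css)
{
	model_css_put(css);
}
```

The point of the sketch is the pairing: one tryget per registered event, released either on the error path or when the event is removed.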

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 73d1c70..3e34c1e 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3969,7 +3969,6 @@ static void cgroup_event_remove(struct work_struct *work)
 	struct cgroup_event *event = container_of(work, struct cgroup_event,
 			remove);
 	struct cgroup_subsys_state *css = event->css;
-	struct cgroup *cgrp = css->cgroup;
 
 	remove_wait_queue(event->wqh, &event->wait);
 
@@ -3980,7 +3979,7 @@ static void cgroup_event_remove(struct work_struct *work)
 
 	eventfd_ctx_put(event->eventfd);
 	kfree(event);
-	cgroup_dput(cgrp);
+	css_put(css);
 }
 
 /*
@@ -4103,12 +4102,16 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 		goto out_put_cfile;
 	}
 
-	/* determine the css of @cfile and associate @event with it */
+	/*
+	 * Determine the css of @cfile and associate @event with it.
+	 * Remaining events are automatically removed on cgroup destruction
+	 * but the removal is asynchronous, so take an extra ref.
+	 */
 	rcu_read_lock();
 
 	ret = -EINVAL;
 	event->css = cgroup_css(cgrp, event->cft->ss);
-	if (event->css)
+	if (event->css && css_tryget(event->css))
 		ret = 0;
 
 	rcu_read_unlock();
@@ -4122,28 +4125,21 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 	cgrp_cfile = __d_cgrp(cfile->f_dentry->d_parent);
 	if (cgrp_cfile != cgrp) {
 		ret = -EINVAL;
-		goto out_put_cfile;
+		goto out_put_css;
 	}
 
 	if (!event->cft->register_event || !event->cft->unregister_event) {
 		ret = -EINVAL;
-		goto out_put_cfile;
+		goto out_put_css;
 	}
 
 	ret = event->cft->register_event(event->css, event->cft,
 			event->eventfd, buffer);
 	if (ret)
-		goto out_put_cfile;
+		goto out_put_css;
 
 	efile->f_op->poll(efile, &event->pt);
 
-	/*
-	 * Events should be removed after rmdir of cgroup directory, but before
-	 * destroying subsystem state objects. Let's take reference to cgroup
-	 * directory dentry to do that.
-	 */
-	dget(cgrp->dentry);
-
 	spin_lock(&cgrp->event_list_lock);
 	list_add(&event->list, &cgrp->event_list);
 	spin_unlock(&cgrp->event_list_lock);
@@ -4153,6 +4149,8 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 
 	return 0;
 
+out_put_css:
+	css_put(event->css);
 out_put_cfile:
 	fput(cfile);
 out_put_eventfd:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 05/12] cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp()
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (8 preceding siblings ...)
  2013-08-15 16:02   ` [PATCH 05/12] cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp() Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` [PATCH 06/12] cgroup, memcg: move cgroup_event implementation to memcg Tejun Heo
                     ` (19 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

cgroup_event will be moved to its only user - memcg.  Replace
__d_cgrp() usage with css_from_dir(), which is already exported.  This
also simplifies the code a bit.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3e34c1e..6deed8b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4041,7 +4041,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 {
 	struct cgroup *cgrp = dummy_css->cgroup;
 	struct cgroup_event *event;
-	struct cgroup *cgrp_cfile;
+	struct cgroup_subsys_state *cfile_css;
 	unsigned int efd, cfd;
 	struct file *efile;
 	struct file *cfile;
@@ -4103,7 +4103,8 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 	}
 
 	/*
-	 * Determine the css of @cfile and associate @event with it.
+	 * Determine the css of @cfile, verify it belongs to the same
+	 * cgroup as cgroup.event_control, and associate @event with it.
 	 * Remaining events are automatically removed on cgroup destruction
 	 * but the removal is asynchronous, so take an extra ref.
 	 */
@@ -4111,23 +4112,14 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 
 	ret = -EINVAL;
 	event->css = cgroup_css(cgrp, event->cft->ss);
-	if (event->css && css_tryget(event->css))
+	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
+	if (event->css && event->css == cfile_css && css_tryget(event->css))
 		ret = 0;
 
 	rcu_read_unlock();
 	if (ret)
 		goto out_put_cfile;
 
-	/*
-	 * The file to be monitored must be in the same cgroup as
-	 * cgroup.event_control is.
-	 */
-	cgrp_cfile = __d_cgrp(cfile->f_dentry->d_parent);
-	if (cgrp_cfile != cgrp) {
-		ret = -EINVAL;
-		goto out_put_css;
-	}
-
 	if (!event->cft->register_event || !event->cft->unregister_event) {
 		ret = -EINVAL;
 		goto out_put_css;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 06/12] cgroup, memcg: move cgroup_event implementation to memcg
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (10 preceding siblings ...)
  2013-08-15 16:02   ` [PATCH 06/12] cgroup, memcg: move cgroup_event implementation to memcg Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` [PATCH 07/12] memcg: cgroup_write_event_control() now knows @css is for memcg Tejun Heo
                     ` (17 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

cgroup_event is way over-designed and tries to build a generic
flexible event mechanism into cgroup - fully customizable event
specification for each user of the interface.  This is utterly
unnecessary and overboard, especially in light of the planned
unified hierarchy, where there will be a single agent.  Simply
generating events at fixed points or, if that's too restrictive, at a
configurable cadence or a single set of configurable points should be
enough.

Thankfully, memcg is the only user and gets to keep it.  Replacing it
with something simpler on sane_behavior is strongly recommended.

This patch moves cgroup_event and "cgroup.event_control"
implementation to mm/memcontrol.c.  Clearing of events on cgroup
destruction is moved from cgroup_destroy_locked() to
mem_cgroup_css_offline(), which shouldn't make any noticeable
difference.

cgroup_css() and __file_cft() are exported to enable the move;
however, this will soon be reverted once the event code is updated to
be memcg specific.

Note that "cgroup.event_control" will now exist only on the hierarchy
with memcg attached to it.  While this change is visible to userland,
it is unlikely to be noticeable as the file has never been meaningful
outside memcg.
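
For readers unfamiliar with the file, the code being relocated documents its input as '<event_fd> <control_fd> <args>'. Below is a user-space approximation of that parsing step, offered as a sketch only: model_parse_event_control() is a hypothetical name, and strtoul() stands in for the kernel's simple_strtoul(), which differs in details such as whitespace handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "<event_fd> <control_fd> <args>" into its parts.  <args> is
 * optional and is handed to the control file's handler uninterpreted.
 * Returns 0 on success or -EINVAL on malformed input.
 */
static int model_parse_event_control(const char *buffer,
				     unsigned int *efd, unsigned int *cfd,
				     const char **args)
{
	char *endp;

	*efd = (unsigned int)strtoul(buffer, &endp, 10);
	if (*endp != ' ')
		return -EINVAL;
	buffer = endp + 1;

	*cfd = (unsigned int)strtoul(buffer, &endp, 10);
	if (*endp != ' ' && *endp != '\0')
		return -EINVAL;

	/* Unlike the kernel code, do not step past the terminator when <args> is absent. */
	*args = (*endp == ' ') ? endp + 1 : endp;
	return 0;
}
```

A listener would typically create an eventfd, open the control file it wants to watch (for memcg thresholds, memory.usage_in_bytes), and write the two descriptor numbers plus the threshold to cgroup.event_control in this format.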

Aside from the above change, this is pure code relocation.

v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
    poll.h inclusion moved from cgroup.c to memcontrol.c.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/cgroup.h |   5 +
 init/Kconfig           |   3 +-
 kernel/cgroup.c        | 252 +------------------------------------------------
 mm/memcontrol.c        | 247 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 256 insertions(+), 251 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 00c6329..b83b348 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -923,6 +923,11 @@ unsigned short css_id(struct cgroup_subsys_state *css);
 struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
 					 struct cgroup_subsys *ss);
 
+/* XXX: temporary */
+struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+				       struct cgroup_subsys *ss);
+struct cftype *__file_cft(struct file *file);
+
 #else /* !CONFIG_CGROUPS */
 
 static inline int cgroup_init_early(void) { return 0; }
diff --git a/init/Kconfig b/init/Kconfig
index 54d3fa5..b806453 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -844,7 +844,6 @@ config NUMA_BALANCING
 
 menuconfig CGROUPS
 	boolean "Control Group support"
-	depends on EVENTFD
 	help
 	  This option adds support for grouping sets of processes together, for
 	  use with process control subsystems such as Cpusets, CFS, memory
@@ -911,6 +910,7 @@ config MEMCG
 	bool "Memory Resource Controller for Control Groups"
 	depends on RESOURCE_COUNTERS
 	select MM_OWNER
+	select EVENTFD
 	help
 	  Provides a memory resource controller that manages both anonymous
 	  memory and page cache. (See Documentation/cgroups/memory.txt)
@@ -1163,7 +1163,6 @@ config UIDGID_STRICT_TYPE_CHECKS
 
 config SCHED_AUTOGROUP
 	bool "Automatic process group scheduling"
-	select EVENTFD
 	select CGROUPS
 	select CGROUP_SCHED
 	select FAIR_GROUP_SCHED
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6deed8b..1579ca8 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -56,8 +56,6 @@
 #include <linux/pid_namespace.h>
 #include <linux/idr.h>
 #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
-#include <linux/eventfd.h>
-#include <linux/poll.h>
 #include <linux/flex_array.h> /* used in cgroup_attach_task */
 #include <linux/kthread.h>
 
@@ -155,36 +153,6 @@ struct css_id {
 	unsigned short stack[0]; /* Array of Length (depth+1) */
 };
 
-/*
- * cgroup_event represents events which userspace want to receive.
- */
-struct cgroup_event {
-	/*
-	 * css which the event belongs to.
-	 */
-	struct cgroup_subsys_state *css;
-	/*
-	 * Control file which the event associated.
-	 */
-	struct cftype *cft;
-	/*
-	 * eventfd to signal userspace about the event.
-	 */
-	struct eventfd_ctx *eventfd;
-	/*
-	 * Each of these stored in a list by the cgroup.
-	 */
-	struct list_head list;
-	/*
-	 * All fields below needed to unregister event when
-	 * userspace closes eventfd.
-	 */
-	poll_table pt;
-	wait_queue_head_t *wqh;
-	wait_queue_t wait;
-	struct work_struct remove;
-};
-
 /* The list of hierarchy roots */
 
 static LIST_HEAD(cgroup_roots);
@@ -234,8 +202,8 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
  * keep accessing it outside the said locks.  This function may return
  * %NULL if @cgrp doesn't have @subsys_id enabled.
  */
-static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
-					      struct cgroup_subsys *ss)
+struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+				       struct cgroup_subsys *ss)
 {
 	if (ss)
 		return rcu_dereference_check(cgrp->subsys[ss->subsys_id],
@@ -2671,7 +2639,7 @@ static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, un
 /*
  * Check if a file is a control file
  */
-static inline struct cftype *__file_cft(struct file *file)
+struct cftype *__file_cft(struct file *file)
 {
 	if (file_inode(file)->i_fop != &cgroup_file_operations)
 		return ERR_PTR(-EINVAL);
@@ -3959,202 +3927,6 @@ static void cgroup_dput(struct cgroup *cgrp)
 	deactivate_super(sb);
 }
 
-/*
- * Unregister event and free resources.
- *
- * Gets called from workqueue.
- */
-static void cgroup_event_remove(struct work_struct *work)
-{
-	struct cgroup_event *event = container_of(work, struct cgroup_event,
-			remove);
-	struct cgroup_subsys_state *css = event->css;
-
-	remove_wait_queue(event->wqh, &event->wait);
-
-	event->cft->unregister_event(css, event->cft, event->eventfd);
-
-	/* Notify userspace the event is going away. */
-	eventfd_signal(event->eventfd, 1);
-
-	eventfd_ctx_put(event->eventfd);
-	kfree(event);
-	css_put(css);
-}
-
-/*
- * Gets called on POLLHUP on eventfd when user closes it.
- *
- * Called with wqh->lock held and interrupts disabled.
- */
-static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
-		int sync, void *key)
-{
-	struct cgroup_event *event = container_of(wait,
-			struct cgroup_event, wait);
-	struct cgroup *cgrp = event->css->cgroup;
-	unsigned long flags = (unsigned long)key;
-
-	if (flags & POLLHUP) {
-		/*
-		 * If the event has been detached at cgroup removal, we
-		 * can simply return knowing the other side will cleanup
-		 * for us.
-		 *
-		 * We can't race against event freeing since the other
-		 * side will require wqh->lock via remove_wait_queue(),
-		 * which we hold.
-		 */
-		spin_lock(&cgrp->event_list_lock);
-		if (!list_empty(&event->list)) {
-			list_del_init(&event->list);
-			/*
-			 * We are in atomic context, but cgroup_event_remove()
-			 * may sleep, so we have to call it in workqueue.
-			 */
-			schedule_work(&event->remove);
-		}
-		spin_unlock(&cgrp->event_list_lock);
-	}
-
-	return 0;
-}
-
-static void cgroup_event_ptable_queue_proc(struct file *file,
-		wait_queue_head_t *wqh, poll_table *pt)
-{
-	struct cgroup_event *event = container_of(pt,
-			struct cgroup_event, pt);
-
-	event->wqh = wqh;
-	add_wait_queue(wqh, &event->wait);
-}
-
-/*
- * Parse input and register new cgroup event handler.
- *
- * Input must be in format '<event_fd> <control_fd> <args>'.
- * Interpretation of args is defined by control file implementation.
- */
-static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
-				      struct cftype *cft, const char *buffer)
-{
-	struct cgroup *cgrp = dummy_css->cgroup;
-	struct cgroup_event *event;
-	struct cgroup_subsys_state *cfile_css;
-	unsigned int efd, cfd;
-	struct file *efile;
-	struct file *cfile;
-	char *endp;
-	int ret;
-
-	efd = simple_strtoul(buffer, &endp, 10);
-	if (*endp != ' ')
-		return -EINVAL;
-	buffer = endp + 1;
-
-	cfd = simple_strtoul(buffer, &endp, 10);
-	if ((*endp != ' ') && (*endp != '\0'))
-		return -EINVAL;
-	buffer = endp + 1;
-
-	event = kzalloc(sizeof(*event), GFP_KERNEL);
-	if (!event)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&event->list);
-	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
-	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
-	INIT_WORK(&event->remove, cgroup_event_remove);
-
-	efile = eventfd_fget(efd);
-	if (IS_ERR(efile)) {
-		ret = PTR_ERR(efile);
-		goto out_kfree;
-	}
-
-	event->eventfd = eventfd_ctx_fileget(efile);
-	if (IS_ERR(event->eventfd)) {
-		ret = PTR_ERR(event->eventfd);
-		goto out_put_efile;
-	}
-
-	cfile = fget(cfd);
-	if (!cfile) {
-		ret = -EBADF;
-		goto out_put_eventfd;
-	}
-
-	/* the process need read permission on control file */
-	/* AV: shouldn't we check that it's been opened for read instead? */
-	ret = inode_permission(file_inode(cfile), MAY_READ);
-	if (ret < 0)
-		goto out_put_cfile;
-
-	event->cft = __file_cft(cfile);
-	if (IS_ERR(event->cft)) {
-		ret = PTR_ERR(event->cft);
-		goto out_put_cfile;
-	}
-
-	if (!event->cft->ss) {
-		ret = -EBADF;
-		goto out_put_cfile;
-	}
-
-	/*
-	 * Determine the css of @cfile, verify it belongs to the same
-	 * cgroup as cgroup.event_control, and associate @event with it.
-	 * Remaining events are automatically removed on cgroup destruction
-	 * but the removal is asynchronous, so take an extra ref.
-	 */
-	rcu_read_lock();
-
-	ret = -EINVAL;
-	event->css = cgroup_css(cgrp, event->cft->ss);
-	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
-	if (event->css && event->css == cfile_css && css_tryget(event->css))
-		ret = 0;
-
-	rcu_read_unlock();
-	if (ret)
-		goto out_put_cfile;
-
-	if (!event->cft->register_event || !event->cft->unregister_event) {
-		ret = -EINVAL;
-		goto out_put_css;
-	}
-
-	ret = event->cft->register_event(event->css, event->cft,
-			event->eventfd, buffer);
-	if (ret)
-		goto out_put_css;
-
-	efile->f_op->poll(efile, &event->pt);
-
-	spin_lock(&cgrp->event_list_lock);
-	list_add(&event->list, &cgrp->event_list);
-	spin_unlock(&cgrp->event_list_lock);
-
-	fput(cfile);
-	fput(efile);
-
-	return 0;
-
-out_put_css:
-	css_put(event->css);
-out_put_cfile:
-	fput(cfile);
-out_put_eventfd:
-	eventfd_ctx_put(event->eventfd);
-out_put_efile:
-	fput(efile);
-out_kfree:
-	kfree(event);
-
-	return ret;
-}
-
 static u64 cgroup_clone_children_read(struct cgroup_subsys_state *css,
 				      struct cftype *cft)
 {
@@ -4180,11 +3952,6 @@ static struct cftype cgroup_base_files[] = {
 		.mode = S_IRUGO | S_IWUSR,
 	},
 	{
-		.name = "cgroup.event_control",
-		.write_string = cgroup_write_event_control,
-		.mode = S_IWUGO,
-	},
-	{
 		.name = "cgroup.clone_children",
 		.flags = CFTYPE_INSANE,
 		.read_u64 = cgroup_clone_children_read,
@@ -4676,7 +4443,6 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
 {
 	struct dentry *d = cgrp->dentry;
-	struct cgroup_event *event, *tmp;
 	struct cgroup_subsys *ss;
 	bool empty;
 
@@ -4734,18 +4500,6 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	dget(d);
 	cgroup_d_remove_dir(d);
 
-	/*
-	 * Unregister events and notify userspace.
-	 * Notify userspace about cgroup removing only after rmdir of cgroup
-	 * directory to avoid race between userspace and kernelspace.
-	 */
-	spin_lock(&cgrp->event_list_lock);
-	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
-		list_del_init(&event->list);
-		schedule_work(&event->remove);
-	}
-	spin_unlock(&cgrp->event_list_lock);
-
 	return 0;
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b89d4cb..9e54932 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -45,6 +45,7 @@
 #include <linux/swapops.h>
 #include <linux/spinlock.h>
 #include <linux/eventfd.h>
+#include <linux/poll.h>
 #include <linux/sort.h>
 #include <linux/fs.h>
 #include <linux/seq_file.h>
@@ -239,6 +240,36 @@ struct mem_cgroup_eventfd_list {
 	struct eventfd_ctx *eventfd;
 };
 
+/*
+ * cgroup_event represents events which userspace want to receive.
+ */
+struct cgroup_event {
+	/*
+	 * css which the event belongs to.
+	 */
+	struct cgroup_subsys_state *css;
+	/*
+	 * Control file which the event associated.
+	 */
+	struct cftype *cft;
+	/*
+	 * eventfd to signal userspace about the event.
+	 */
+	struct eventfd_ctx *eventfd;
+	/*
+	 * Each of these stored in a list by the cgroup.
+	 */
+	struct list_head list;
+	/*
+	 * All fields below needed to unregister event when
+	 * userspace closes eventfd.
+	 */
+	poll_table pt;
+	wait_queue_head_t *wqh;
+	wait_queue_t wait;
+	struct work_struct remove;
+};
+
 static void mem_cgroup_threshold(struct mem_cgroup *memcg);
 static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
 
@@ -5919,6 +5950,202 @@ static void kmem_cgroup_css_offline(struct mem_cgroup *memcg)
 }
 #endif
 
+/*
+ * Unregister event and free resources.
+ *
+ * Gets called from workqueue.
+ */
+static void cgroup_event_remove(struct work_struct *work)
+{
+	struct cgroup_event *event = container_of(work, struct cgroup_event,
+			remove);
+	struct cgroup_subsys_state *css = event->css;
+
+	remove_wait_queue(event->wqh, &event->wait);
+
+	event->cft->unregister_event(css, event->cft, event->eventfd);
+
+	/* Notify userspace the event is going away. */
+	eventfd_signal(event->eventfd, 1);
+
+	eventfd_ctx_put(event->eventfd);
+	kfree(event);
+	css_put(css);
+}
+
+/*
+ * Gets called on POLLHUP on eventfd when user closes it.
+ *
+ * Called with wqh->lock held and interrupts disabled.
+ */
+static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
+		int sync, void *key)
+{
+	struct cgroup_event *event = container_of(wait,
+			struct cgroup_event, wait);
+	struct cgroup *cgrp = event->css->cgroup;
+	unsigned long flags = (unsigned long)key;
+
+	if (flags & POLLHUP) {
+		/*
+		 * If the event has been detached at cgroup removal, we
+		 * can simply return knowing the other side will cleanup
+		 * for us.
+		 *
+		 * We can't race against event freeing since the other
+		 * side will require wqh->lock via remove_wait_queue(),
+		 * which we hold.
+		 */
+		spin_lock(&cgrp->event_list_lock);
+		if (!list_empty(&event->list)) {
+			list_del_init(&event->list);
+			/*
+			 * We are in atomic context, but cgroup_event_remove()
+			 * may sleep, so we have to call it in workqueue.
+			 */
+			schedule_work(&event->remove);
+		}
+		spin_unlock(&cgrp->event_list_lock);
+	}
+
+	return 0;
+}
+
+static void cgroup_event_ptable_queue_proc(struct file *file,
+		wait_queue_head_t *wqh, poll_table *pt)
+{
+	struct cgroup_event *event = container_of(pt,
+			struct cgroup_event, pt);
+
+	event->wqh = wqh;
+	add_wait_queue(wqh, &event->wait);
+}
+
+/*
+ * Parse input and register new cgroup event handler.
+ *
+ * Input must be in format '<event_fd> <control_fd> <args>'.
+ * Interpretation of args is defined by control file implementation.
+ */
+static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
+				      struct cftype *cft, const char *buffer)
+{
+	struct cgroup *cgrp = dummy_css->cgroup;
+	struct cgroup_event *event;
+	struct cgroup_subsys_state *cfile_css;
+	unsigned int efd, cfd;
+	struct file *efile;
+	struct file *cfile;
+	char *endp;
+	int ret;
+
+	efd = simple_strtoul(buffer, &endp, 10);
+	if (*endp != ' ')
+		return -EINVAL;
+	buffer = endp + 1;
+
+	cfd = simple_strtoul(buffer, &endp, 10);
+	if ((*endp != ' ') && (*endp != '\0'))
+		return -EINVAL;
+	buffer = endp + 1;
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&event->list);
+	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
+	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
+	INIT_WORK(&event->remove, cgroup_event_remove);
+
+	efile = eventfd_fget(efd);
+	if (IS_ERR(efile)) {
+		ret = PTR_ERR(efile);
+		goto out_kfree;
+	}
+
+	event->eventfd = eventfd_ctx_fileget(efile);
+	if (IS_ERR(event->eventfd)) {
+		ret = PTR_ERR(event->eventfd);
+		goto out_put_efile;
+	}
+
+	cfile = fget(cfd);
+	if (!cfile) {
+		ret = -EBADF;
+		goto out_put_eventfd;
+	}
+
+	/* the process need read permission on control file */
+	/* AV: shouldn't we check that it's been opened for read instead? */
+	ret = inode_permission(file_inode(cfile), MAY_READ);
+	if (ret < 0)
+		goto out_put_cfile;
+
+	event->cft = __file_cft(cfile);
+	if (IS_ERR(event->cft)) {
+		ret = PTR_ERR(event->cft);
+		goto out_put_cfile;
+	}
+
+	if (!event->cft->ss) {
+		ret = -EBADF;
+		goto out_put_cfile;
+	}
+
+	/*
+	 * Determine the css of @cfile, verify it belongs to the same
+	 * cgroup as cgroup.event_control, and associate @event with it.
+	 * Remaining events are automatically removed on cgroup destruction
+	 * but the removal is asynchronous, so take an extra ref.
+	 */
+	rcu_read_lock();
+
+	ret = -EINVAL;
+	event->css = cgroup_css(cgrp, event->cft->ss);
+	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
+	if (event->css && event->css == cfile_css && css_tryget(event->css))
+		ret = 0;
+
+	rcu_read_unlock();
+	if (ret)
+		goto out_put_cfile;
+
+	if (!event->cft->register_event || !event->cft->unregister_event) {
+		ret = -EINVAL;
+		goto out_put_css;
+	}
+
+	ret = event->cft->register_event(event->css, event->cft,
+			event->eventfd, buffer);
+	if (ret)
+		goto out_put_css;
+
+	efile->f_op->poll(efile, &event->pt);
+
+	spin_lock(&cgrp->event_list_lock);
+	list_add(&event->list, &cgrp->event_list);
+	spin_unlock(&cgrp->event_list_lock);
+
+	fput(cfile);
+	fput(efile);
+
+	return 0;
+
+out_put_css:
+	css_put(event->css);
+out_put_cfile:
+	fput(cfile);
+out_put_eventfd:
+	eventfd_ctx_put(event->eventfd);
+out_put_efile:
+	fput(efile);
+out_kfree:
+	kfree(event);
+
+	return ret;
+}
+
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
@@ -5966,6 +6193,12 @@ static struct cftype mem_cgroup_files[] = {
 		.read_u64 = mem_cgroup_hierarchy_read,
 	},
 	{
+		.name = "cgroup.event_control",
+		.write_string = cgroup_write_event_control,
+		.flags = CFTYPE_NO_PREFIX,
+		.mode = S_IWUGO,
+	},
+	{
 		.name = "swappiness",
 		.read_u64 = mem_cgroup_swappiness_read,
 		.write_u64 = mem_cgroup_swappiness_write,
@@ -6298,6 +6531,20 @@ static void mem_cgroup_invalidate_reclaim_iterators(struct mem_cgroup *memcg)
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+	struct cgroup *cgrp = css->cgroup;
+	struct cgroup_event *event, *tmp;
+
+	/*
+	 * Unregister events and notify userspace.
+	 * Notify userspace about cgroup removing only after rmdir of cgroup
+	 * directory to avoid race between userspace and kernelspace.
+	 */
+	spin_lock(&cgrp->event_list_lock);
+	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
+		list_del_init(&event->list);
+		schedule_work(&event->remove);
+	}
+	spin_unlock(&cgrp->event_list_lock);
 
 	kmem_cgroup_css_offline(memcg);
 
-- 
1.8.3.1

* [PATCH 06/12] cgroup, memcg: move cgroup_event implementation to memcg
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Tejun Heo

cgroup_event is way over-designed and tries to build a generic
flexible event mechanism into cgroup - fully customizable event
specification for each user of the interface.  This is utterly
unnecessary and overboard especially in the light of the planned
unified hierarchy as there's gonna be a single agent.  Simply generating
events at fixed points, or if that's too restrictive, at a configurable
cadence or a single set of configurable points should be enough.

Thankfully, memcg is the only user and gets to keep it.  Replacing it
with something simpler on sane_behavior is strongly recommended.

This patch moves cgroup_event and "cgroup.event_control"
implementation to mm/memcontrol.c.  Clearing of events on cgroup
destruction is moved from cgroup_destroy_locked() to
mem_cgroup_css_offline(), which shouldn't make any noticeable
difference.
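
For reference, a minimal userspace sketch of what that teardown looks
like from the listener's side.  The removal path signals the eventfd
one last time before dropping it, so a wakeup alone doesn't tell a
listener whether an event fired or the cgroup went away; checking
whether cgroup.event_control still exists is one way to tell the two
apart.  cgroup_was_removed() is a hypothetical helper name used only
for illustration, it is not something this patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical listener-side helper, not part of this patch.
 *
 * Call after read() on the eventfd returns.  Returns 1 if the
 * cgroup.event_control file at @event_control_path is gone (the
 * cgroup was most likely removed), 0 if it is still there and
 * writable, and -1 on any other error (e.g. no write permission).
 */
static int cgroup_was_removed(const char *event_control_path)
{
	if (access(event_control_path, W_OK) == 0)
		return 0;

	return errno == ENOENT ? 1 : -1;
}
```

A listener that gets 1 back would typically stop polling and close its
eventfd instead of treating the wakeup as a real event.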

cgroup_css() and __file_cft() are exported to enable the move;
however, this will soon be reverted once the event code is updated to
be memcg specific.

Note that "cgroup.event_control" will now exist only on the hierarchy
with memcg attached to it.  While this change is visible to userland,
it is unlikely to be noticeable as the file has never been meaningful
outside memcg.
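
As a reminder of what userland actually writes to the file, here is a
minimal sketch that builds the "<event_fd> <control_fd> <args>" line
parsed by cgroup_write_event_control().  format_event_control() is a
hypothetical userspace helper made up for this example; the fd numbers
and the threshold used with it are placeholders.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace helper, not part of this patch.
 *
 * Build the "<event_fd> <control_fd> <args>" line expected by
 * cgroup.event_control.  @args may be NULL or empty for control
 * files which take no argument.  Returns 0 on success, -1 if the
 * line does not fit in @buf.
 */
static int format_event_control(char *buf, size_t len, int efd, int cfd,
				const char *args)
{
	int n;

	if (args && *args)
		n = snprintf(buf, len, "%d %d %s", efd, cfd, args);
	else
		n = snprintf(buf, len, "%d %d", efd, cfd);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A listener would create the eventfd with eventfd(0, 0), open e.g.
memory.usage_in_bytes read-only, write the resulting line to
cgroup.event_control in the same cgroup directory and then block in
read() on the eventfd.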

Aside from the above change, this is pure code relocation.

v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
    poll.h inclusion moved from cgroup.c to memcontrol.c.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/cgroup.h |   5 +
 init/Kconfig           |   3 +-
 kernel/cgroup.c        | 252 +------------------------------------------------
 mm/memcontrol.c        | 247 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 256 insertions(+), 251 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 00c6329..b83b348 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -923,6 +923,11 @@ unsigned short css_id(struct cgroup_subsys_state *css);
 struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
 					 struct cgroup_subsys *ss);
 
+/* XXX: temporary */
+struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+				       struct cgroup_subsys *ss);
+struct cftype *__file_cft(struct file *file);
+
 #else /* !CONFIG_CGROUPS */
 
 static inline int cgroup_init_early(void) { return 0; }
diff --git a/init/Kconfig b/init/Kconfig
index 54d3fa5..b806453 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -844,7 +844,6 @@ config NUMA_BALANCING
 
 menuconfig CGROUPS
 	boolean "Control Group support"
-	depends on EVENTFD
 	help
 	  This option adds support for grouping sets of processes together, for
 	  use with process control subsystems such as Cpusets, CFS, memory
@@ -911,6 +910,7 @@ config MEMCG
 	bool "Memory Resource Controller for Control Groups"
 	depends on RESOURCE_COUNTERS
 	select MM_OWNER
+	select EVENTFD
 	help
 	  Provides a memory resource controller that manages both anonymous
 	  memory and page cache. (See Documentation/cgroups/memory.txt)
@@ -1163,7 +1163,6 @@ config UIDGID_STRICT_TYPE_CHECKS
 
 config SCHED_AUTOGROUP
 	bool "Automatic process group scheduling"
-	select EVENTFD
 	select CGROUPS
 	select CGROUP_SCHED
 	select FAIR_GROUP_SCHED
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6deed8b..1579ca8 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -56,8 +56,6 @@
 #include <linux/pid_namespace.h>
 #include <linux/idr.h>
 #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
-#include <linux/eventfd.h>
-#include <linux/poll.h>
 #include <linux/flex_array.h> /* used in cgroup_attach_task */
 #include <linux/kthread.h>
 
@@ -155,36 +153,6 @@ struct css_id {
 	unsigned short stack[0]; /* Array of Length (depth+1) */
 };
 
-/*
- * cgroup_event represents events which userspace want to receive.
- */
-struct cgroup_event {
-	/*
-	 * css which the event belongs to.
-	 */
-	struct cgroup_subsys_state *css;
-	/*
-	 * Control file which the event associated.
-	 */
-	struct cftype *cft;
-	/*
-	 * eventfd to signal userspace about the event.
-	 */
-	struct eventfd_ctx *eventfd;
-	/*
-	 * Each of these stored in a list by the cgroup.
-	 */
-	struct list_head list;
-	/*
-	 * All fields below needed to unregister event when
-	 * userspace closes eventfd.
-	 */
-	poll_table pt;
-	wait_queue_head_t *wqh;
-	wait_queue_t wait;
-	struct work_struct remove;
-};
-
 /* The list of hierarchy roots */
 
 static LIST_HEAD(cgroup_roots);
@@ -234,8 +202,8 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
  * keep accessing it outside the said locks.  This function may return
  * %NULL if @cgrp doesn't have @subsys_id enabled.
  */
-static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
-					      struct cgroup_subsys *ss)
+struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+				       struct cgroup_subsys *ss)
 {
 	if (ss)
 		return rcu_dereference_check(cgrp->subsys[ss->subsys_id],
@@ -2671,7 +2639,7 @@ static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, un
 /*
  * Check if a file is a control file
  */
-static inline struct cftype *__file_cft(struct file *file)
+struct cftype *__file_cft(struct file *file)
 {
 	if (file_inode(file)->i_fop != &cgroup_file_operations)
 		return ERR_PTR(-EINVAL);
@@ -3959,202 +3927,6 @@ static void cgroup_dput(struct cgroup *cgrp)
 	deactivate_super(sb);
 }
 
-/*
- * Unregister event and free resources.
- *
- * Gets called from workqueue.
- */
-static void cgroup_event_remove(struct work_struct *work)
-{
-	struct cgroup_event *event = container_of(work, struct cgroup_event,
-			remove);
-	struct cgroup_subsys_state *css = event->css;
-
-	remove_wait_queue(event->wqh, &event->wait);
-
-	event->cft->unregister_event(css, event->cft, event->eventfd);
-
-	/* Notify userspace the event is going away. */
-	eventfd_signal(event->eventfd, 1);
-
-	eventfd_ctx_put(event->eventfd);
-	kfree(event);
-	css_put(css);
-}
-
-/*
- * Gets called on POLLHUP on eventfd when user closes it.
- *
- * Called with wqh->lock held and interrupts disabled.
- */
-static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
-		int sync, void *key)
-{
-	struct cgroup_event *event = container_of(wait,
-			struct cgroup_event, wait);
-	struct cgroup *cgrp = event->css->cgroup;
-	unsigned long flags = (unsigned long)key;
-
-	if (flags & POLLHUP) {
-		/*
-		 * If the event has been detached at cgroup removal, we
-		 * can simply return knowing the other side will cleanup
-		 * for us.
-		 *
-		 * We can't race against event freeing since the other
-		 * side will require wqh->lock via remove_wait_queue(),
-		 * which we hold.
-		 */
-		spin_lock(&cgrp->event_list_lock);
-		if (!list_empty(&event->list)) {
-			list_del_init(&event->list);
-			/*
-			 * We are in atomic context, but cgroup_event_remove()
-			 * may sleep, so we have to call it in workqueue.
-			 */
-			schedule_work(&event->remove);
-		}
-		spin_unlock(&cgrp->event_list_lock);
-	}
-
-	return 0;
-}
-
-static void cgroup_event_ptable_queue_proc(struct file *file,
-		wait_queue_head_t *wqh, poll_table *pt)
-{
-	struct cgroup_event *event = container_of(pt,
-			struct cgroup_event, pt);
-
-	event->wqh = wqh;
-	add_wait_queue(wqh, &event->wait);
-}
-
-/*
- * Parse input and register new cgroup event handler.
- *
- * Input must be in format '<event_fd> <control_fd> <args>'.
- * Interpretation of args is defined by control file implementation.
- */
-static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
-				      struct cftype *cft, const char *buffer)
-{
-	struct cgroup *cgrp = dummy_css->cgroup;
-	struct cgroup_event *event;
-	struct cgroup_subsys_state *cfile_css;
-	unsigned int efd, cfd;
-	struct file *efile;
-	struct file *cfile;
-	char *endp;
-	int ret;
-
-	efd = simple_strtoul(buffer, &endp, 10);
-	if (*endp != ' ')
-		return -EINVAL;
-	buffer = endp + 1;
-
-	cfd = simple_strtoul(buffer, &endp, 10);
-	if ((*endp != ' ') && (*endp != '\0'))
-		return -EINVAL;
-	buffer = endp + 1;
-
-	event = kzalloc(sizeof(*event), GFP_KERNEL);
-	if (!event)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&event->list);
-	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
-	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
-	INIT_WORK(&event->remove, cgroup_event_remove);
-
-	efile = eventfd_fget(efd);
-	if (IS_ERR(efile)) {
-		ret = PTR_ERR(efile);
-		goto out_kfree;
-	}
-
-	event->eventfd = eventfd_ctx_fileget(efile);
-	if (IS_ERR(event->eventfd)) {
-		ret = PTR_ERR(event->eventfd);
-		goto out_put_efile;
-	}
-
-	cfile = fget(cfd);
-	if (!cfile) {
-		ret = -EBADF;
-		goto out_put_eventfd;
-	}
-
-	/* the process need read permission on control file */
-	/* AV: shouldn't we check that it's been opened for read instead? */
-	ret = inode_permission(file_inode(cfile), MAY_READ);
-	if (ret < 0)
-		goto out_put_cfile;
-
-	event->cft = __file_cft(cfile);
-	if (IS_ERR(event->cft)) {
-		ret = PTR_ERR(event->cft);
-		goto out_put_cfile;
-	}
-
-	if (!event->cft->ss) {
-		ret = -EBADF;
-		goto out_put_cfile;
-	}
-
-	/*
-	 * Determine the css of @cfile, verify it belongs to the same
-	 * cgroup as cgroup.event_control, and associate @event with it.
-	 * Remaining events are automatically removed on cgroup destruction
-	 * but the removal is asynchronous, so take an extra ref.
-	 */
-	rcu_read_lock();
-
-	ret = -EINVAL;
-	event->css = cgroup_css(cgrp, event->cft->ss);
-	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
-	if (event->css && event->css == cfile_css && css_tryget(event->css))
-		ret = 0;
-
-	rcu_read_unlock();
-	if (ret)
-		goto out_put_cfile;
-
-	if (!event->cft->register_event || !event->cft->unregister_event) {
-		ret = -EINVAL;
-		goto out_put_css;
-	}
-
-	ret = event->cft->register_event(event->css, event->cft,
-			event->eventfd, buffer);
-	if (ret)
-		goto out_put_css;
-
-	efile->f_op->poll(efile, &event->pt);
-
-	spin_lock(&cgrp->event_list_lock);
-	list_add(&event->list, &cgrp->event_list);
-	spin_unlock(&cgrp->event_list_lock);
-
-	fput(cfile);
-	fput(efile);
-
-	return 0;
-
-out_put_css:
-	css_put(event->css);
-out_put_cfile:
-	fput(cfile);
-out_put_eventfd:
-	eventfd_ctx_put(event->eventfd);
-out_put_efile:
-	fput(efile);
-out_kfree:
-	kfree(event);
-
-	return ret;
-}
-
 static u64 cgroup_clone_children_read(struct cgroup_subsys_state *css,
 				      struct cftype *cft)
 {
@@ -4180,11 +3952,6 @@ static struct cftype cgroup_base_files[] = {
 		.mode = S_IRUGO | S_IWUSR,
 	},
 	{
-		.name = "cgroup.event_control",
-		.write_string = cgroup_write_event_control,
-		.mode = S_IWUGO,
-	},
-	{
 		.name = "cgroup.clone_children",
 		.flags = CFTYPE_INSANE,
 		.read_u64 = cgroup_clone_children_read,
@@ -4676,7 +4443,6 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
 {
 	struct dentry *d = cgrp->dentry;
-	struct cgroup_event *event, *tmp;
 	struct cgroup_subsys *ss;
 	bool empty;
 
@@ -4734,18 +4500,6 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	dget(d);
 	cgroup_d_remove_dir(d);
 
-	/*
-	 * Unregister events and notify userspace.
-	 * Notify userspace about cgroup removing only after rmdir of cgroup
-	 * directory to avoid race between userspace and kernelspace.
-	 */
-	spin_lock(&cgrp->event_list_lock);
-	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
-		list_del_init(&event->list);
-		schedule_work(&event->remove);
-	}
-	spin_unlock(&cgrp->event_list_lock);
-
 	return 0;
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b89d4cb..9e54932 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -45,6 +45,7 @@
 #include <linux/swapops.h>
 #include <linux/spinlock.h>
 #include <linux/eventfd.h>
+#include <linux/poll.h>
 #include <linux/sort.h>
 #include <linux/fs.h>
 #include <linux/seq_file.h>
@@ -239,6 +240,36 @@ struct mem_cgroup_eventfd_list {
 	struct eventfd_ctx *eventfd;
 };
 
+/*
+ * cgroup_event represents events which userspace want to receive.
+ */
+struct cgroup_event {
+	/*
+	 * css which the event belongs to.
+	 */
+	struct cgroup_subsys_state *css;
+	/*
+	 * Control file which the event associated.
+	 */
+	struct cftype *cft;
+	/*
+	 * eventfd to signal userspace about the event.
+	 */
+	struct eventfd_ctx *eventfd;
+	/*
+	 * Each of these stored in a list by the cgroup.
+	 */
+	struct list_head list;
+	/*
+	 * All fields below needed to unregister event when
+	 * userspace closes eventfd.
+	 */
+	poll_table pt;
+	wait_queue_head_t *wqh;
+	wait_queue_t wait;
+	struct work_struct remove;
+};
+
 static void mem_cgroup_threshold(struct mem_cgroup *memcg);
 static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
 
@@ -5919,6 +5950,202 @@ static void kmem_cgroup_css_offline(struct mem_cgroup *memcg)
 }
 #endif
 
+/*
+ * Unregister event and free resources.
+ *
+ * Gets called from workqueue.
+ */
+static void cgroup_event_remove(struct work_struct *work)
+{
+	struct cgroup_event *event = container_of(work, struct cgroup_event,
+			remove);
+	struct cgroup_subsys_state *css = event->css;
+
+	remove_wait_queue(event->wqh, &event->wait);
+
+	event->cft->unregister_event(css, event->cft, event->eventfd);
+
+	/* Notify userspace the event is going away. */
+	eventfd_signal(event->eventfd, 1);
+
+	eventfd_ctx_put(event->eventfd);
+	kfree(event);
+	css_put(css);
+}
+
+/*
+ * Gets called on POLLHUP on eventfd when user closes it.
+ *
+ * Called with wqh->lock held and interrupts disabled.
+ */
+static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
+		int sync, void *key)
+{
+	struct cgroup_event *event = container_of(wait,
+			struct cgroup_event, wait);
+	struct cgroup *cgrp = event->css->cgroup;
+	unsigned long flags = (unsigned long)key;
+
+	if (flags & POLLHUP) {
+		/*
+		 * If the event has been detached at cgroup removal, we
+		 * can simply return knowing the other side will cleanup
+		 * for us.
+		 *
+		 * We can't race against event freeing since the other
+		 * side will require wqh->lock via remove_wait_queue(),
+		 * which we hold.
+		 */
+		spin_lock(&cgrp->event_list_lock);
+		if (!list_empty(&event->list)) {
+			list_del_init(&event->list);
+			/*
+			 * We are in atomic context, but cgroup_event_remove()
+			 * may sleep, so we have to call it in workqueue.
+			 */
+			schedule_work(&event->remove);
+		}
+		spin_unlock(&cgrp->event_list_lock);
+	}
+
+	return 0;
+}
+
+static void cgroup_event_ptable_queue_proc(struct file *file,
+		wait_queue_head_t *wqh, poll_table *pt)
+{
+	struct cgroup_event *event = container_of(pt,
+			struct cgroup_event, pt);
+
+	event->wqh = wqh;
+	add_wait_queue(wqh, &event->wait);
+}
+
+/*
+ * Parse input and register new cgroup event handler.
+ *
+ * Input must be in format '<event_fd> <control_fd> <args>'.
+ * Interpretation of args is defined by control file implementation.
+ */
+static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
+				      struct cftype *cft, const char *buffer)
+{
+	struct cgroup *cgrp = dummy_css->cgroup;
+	struct cgroup_event *event;
+	struct cgroup_subsys_state *cfile_css;
+	unsigned int efd, cfd;
+	struct file *efile;
+	struct file *cfile;
+	char *endp;
+	int ret;
+
+	efd = simple_strtoul(buffer, &endp, 10);
+	if (*endp != ' ')
+		return -EINVAL;
+	buffer = endp + 1;
+
+	cfd = simple_strtoul(buffer, &endp, 10);
+	if ((*endp != ' ') && (*endp != '\0'))
+		return -EINVAL;
+	buffer = endp + 1;
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&event->list);
+	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
+	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
+	INIT_WORK(&event->remove, cgroup_event_remove);
+
+	efile = eventfd_fget(efd);
+	if (IS_ERR(efile)) {
+		ret = PTR_ERR(efile);
+		goto out_kfree;
+	}
+
+	event->eventfd = eventfd_ctx_fileget(efile);
+	if (IS_ERR(event->eventfd)) {
+		ret = PTR_ERR(event->eventfd);
+		goto out_put_efile;
+	}
+
+	cfile = fget(cfd);
+	if (!cfile) {
+		ret = -EBADF;
+		goto out_put_eventfd;
+	}
+
+	/* the process need read permission on control file */
+	/* AV: shouldn't we check that it's been opened for read instead? */
+	ret = inode_permission(file_inode(cfile), MAY_READ);
+	if (ret < 0)
+		goto out_put_cfile;
+
+	event->cft = __file_cft(cfile);
+	if (IS_ERR(event->cft)) {
+		ret = PTR_ERR(event->cft);
+		goto out_put_cfile;
+	}
+
+	if (!event->cft->ss) {
+		ret = -EBADF;
+		goto out_put_cfile;
+	}
+
+	/*
+	 * Determine the css of @cfile, verify it belongs to the same
+	 * cgroup as cgroup.event_control, and associate @event with it.
+	 * Remaining events are automatically removed on cgroup destruction
+	 * but the removal is asynchronous, so take an extra ref.
+	 */
+	rcu_read_lock();
+
+	ret = -EINVAL;
+	event->css = cgroup_css(cgrp, event->cft->ss);
+	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
+	if (event->css && event->css == cfile_css && css_tryget(event->css))
+		ret = 0;
+
+	rcu_read_unlock();
+	if (ret)
+		goto out_put_cfile;
+
+	if (!event->cft->register_event || !event->cft->unregister_event) {
+		ret = -EINVAL;
+		goto out_put_css;
+	}
+
+	ret = event->cft->register_event(event->css, event->cft,
+			event->eventfd, buffer);
+	if (ret)
+		goto out_put_css;
+
+	efile->f_op->poll(efile, &event->pt);
+
+	spin_lock(&cgrp->event_list_lock);
+	list_add(&event->list, &cgrp->event_list);
+	spin_unlock(&cgrp->event_list_lock);
+
+	fput(cfile);
+	fput(efile);
+
+	return 0;
+
+out_put_css:
+	css_put(event->css);
+out_put_cfile:
+	fput(cfile);
+out_put_eventfd:
+	eventfd_ctx_put(event->eventfd);
+out_put_efile:
+	fput(efile);
+out_kfree:
+	kfree(event);
+
+	return ret;
+}
+
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
@@ -5966,6 +6193,12 @@ static struct cftype mem_cgroup_files[] = {
 		.read_u64 = mem_cgroup_hierarchy_read,
 	},
 	{
+		.name = "cgroup.event_control",
+		.write_string = cgroup_write_event_control,
+		.flags = CFTYPE_NO_PREFIX,
+		.mode = S_IWUGO,
+	},
+	{
 		.name = "swappiness",
 		.read_u64 = mem_cgroup_swappiness_read,
 		.write_u64 = mem_cgroup_swappiness_write,
@@ -6298,6 +6531,20 @@ static void mem_cgroup_invalidate_reclaim_iterators(struct mem_cgroup *memcg)
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+	struct cgroup *cgrp = css->cgroup;
+	struct cgroup_event *event, *tmp;
+
+	/*
+	 * Unregister events and notify userspace.
+	 * Notify userspace about cgroup removing only after rmdir of cgroup
+	 * directory to avoid race between userspace and kernelspace.
+	 */
+	spin_lock(&cgrp->event_list_lock);
+	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
+		list_del_init(&event->list);
+		schedule_work(&event->remove);
+	}
+	spin_unlock(&cgrp->event_list_lock);
 
 	kmem_cgroup_css_offline(memcg);
 
-- 
1.8.3.1

* [PATCH 07/12] memcg: cgroup_write_event_control() now knows @css is for memcg
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

@css for cgroup_write_event_control() is now always for memcg and the
target file should be a memcg file too.  Drop code which assumes @css
is dummy_css and the target file may belong to different subsystems.
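
For readers unfamiliar with the interface, the "target file" above is the
control file whose descriptor userspace passes in the string written to
cgroup.event_control, in the '<event_fd> <control_fd> <args>' format quoted
in the function comment.  The following is a minimal userspace sketch of
building that string, not part of this patch; format_event_control() is a
hypothetical helper name, and obtaining the descriptors via eventfd() and
open() is left to the caller.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the string that userspace writes to
 * cgroup.event_control, i.e. "<event_fd> <control_fd> <args>".
 * Returns 0 on success, -1 if the buffer is too small or formatting fails.
 */
static int format_event_control(char *buf, size_t len, int event_fd,
				int control_fd, const char *args)
{
	int n = snprintf(buf, len, "%d %d %s", event_fd, control_fd, args);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this patch, control_fd is expected to refer to a memcg file in the same
cgroup as the cgroup.event_control file being written; otherwise the write
fails with -EINVAL.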

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 mm/memcontrol.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9e54932..ef75925 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6027,10 +6027,10 @@ static void cgroup_event_ptable_queue_proc(struct file *file,
  * Input must be in format '<event_fd> <control_fd> <args>'.
  * Interpretation of args is defined by control file implementation.
  */
-static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
+static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 				      struct cftype *cft, const char *buffer)
 {
-	struct cgroup *cgrp = dummy_css->cgroup;
+	struct cgroup *cgrp = css->cgroup;
 	struct cgroup_event *event;
 	struct cgroup_subsys_state *cfile_css;
 	unsigned int efd, cfd;
@@ -6053,6 +6053,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 	if (!event)
 		return -ENOMEM;
 
+	event->css = css;
 	INIT_LIST_HEAD(&event->list);
 	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
 	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
@@ -6088,23 +6089,16 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 		goto out_put_cfile;
 	}
 
-	if (!event->cft->ss) {
-		ret = -EBADF;
-		goto out_put_cfile;
-	}
-
 	/*
-	 * Determine the css of @cfile, verify it belongs to the same
-	 * cgroup as cgroup.event_control, and associate @event with it.
-	 * Remaining events are automatically removed on cgroup destruction
-	 * but the removal is asynchronous, so take an extra ref.
+	 * Verify @cfile should belong to @css.  Also, remaining events are
+	 * automatically removed on cgroup destruction but the removal is
+	 * asynchronous, so take an extra ref on @css.
 	 */
 	rcu_read_lock();
 
 	ret = -EINVAL;
-	event->css = cgroup_css(cgrp, event->cft->ss);
-	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
-	if (event->css && event->css == cfile_css && css_tryget(event->css))
+	cfile_css = css_from_dir(cfile->f_dentry->d_parent, &mem_cgroup_subsys);
+	if (cfile_css == css && css_tryget(css))
 		ret = 0;
 
 	rcu_read_unlock();
@@ -6116,7 +6110,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 		goto out_put_css;
 	}
 
-	ret = event->cft->register_event(event->css, event->cft,
+	ret = event->cft->register_event(css, event->cft,
 			event->eventfd, buffer);
 	if (ret)
 		goto out_put_css;
@@ -6133,7 +6127,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
 	return 0;
 
 out_put_css:
-	css_put(event->css);
+	css_put(css);
 out_put_cfile:
 	fput(cfile);
 out_put_eventfd:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 08/12] cgroup, memcg: move cgroup->event_list[_lock] and event callbacks into memcg
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (13 preceding siblings ...)
  2013-08-15 16:02   ` Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` Tejun Heo
                     ` (14 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

cgroup_event is being moved from cgroup core to memcg and the
implementation is already moved by the previous patch.  This patch
moves the data fields and callbacks.

* cgroup->event_list[_lock] are moved to mem_cgroup.

* cftype->[un]register_event() are moved to cgroup_event.  This makes
  it impossible for individual cftype definitions to specify their
  event callbacks.  This is worked around by simply hard-coding
  filename to event callback mapping in cgroup_write_event_control().
  This is awkward and inflexible, which is actually desirable given
  that we don't want to grow more usages of this feature.

* The eventfd_ctx declaration is removed from cgroup.h, which leaves
  vmpressure.h without an eventfd_ctx declaration.  Include eventfd.h
  from vmpressure.h.

v2: Use file name from dentry instead of cftype.  This will allow
    removing all cftype handling in the function.
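
As an illustration of the hard-coded mapping described above, here is a
small standalone sketch.  It is not the patch's code: the patch uses a
strcmp() if/else chain inside cgroup_write_event_control(), while this
sketch expresses the same name-to-callback lookup as a table.  The callback
types and the *_register/*_unregister stubs are hypothetical stand-ins for
the kernel functions; only the four file names come from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified callback types (the kernel ones take more args). */
typedef int (*register_fn)(const char *args);
typedef void (*unregister_fn)(void);

struct event_ops {
	const char *name;		/* file name as seen in the dentry */
	register_fn register_event;
	unregister_fn unregister_event;
};

/* Stub callbacks standing in for the real memcg/vmpressure handlers. */
static int usage_register(const char *args) { (void)args; return 0; }
static void usage_unregister(void) {}
static int oom_register(const char *args) { (void)args; return 0; }
static void oom_unregister(void) {}
static int pressure_register(const char *args) { (void)args; return 0; }
static void pressure_unregister(void) {}

static const struct event_ops event_ops_table[] = {
	{ "memory.usage_in_bytes", usage_register, usage_unregister },
	{ "memory.oom_control", oom_register, oom_unregister },
	{ "memory.pressure_level", pressure_register, pressure_unregister },
	{ "memory.memsw.usage_in_bytes", usage_register, usage_unregister },
};

/* Return the callbacks for @name, or NULL (the patch returns -EINVAL). */
static const struct event_ops *lookup_event_ops(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(event_ops_table) / sizeof(event_ops_table[0]); i++)
		if (!strcmp(name, event_ops_table[i].name))
			return &event_ops_table[i];
	return NULL;
}
```

Either form keeps the set of event-capable files closed: a file name not
listed cannot register an event, which matches the stated intent of not
growing more users of this feature.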

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/cgroup.h     | 24 -------------
 include/linux/vmpressure.h |  1 +
 kernel/cgroup.c            |  2 --
 mm/memcontrol.c            | 87 ++++++++++++++++++++++++++++++++--------------
 4 files changed, 61 insertions(+), 53 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b83b348..d2cad3f 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -29,7 +29,6 @@ struct cgroup_subsys;
 struct inode;
 struct cgroup;
 struct css_id;
-struct eventfd_ctx;
 
 extern int cgroup_init_early(void);
 extern int cgroup_init(void);
@@ -239,10 +238,6 @@ struct cgroup {
 	struct rcu_head rcu_head;
 	struct work_struct destroy_work;
 
-	/* List of events which userspace want to receive */
-	struct list_head event_list;
-	spinlock_t event_list_lock;
-
 	/* directory xattrs */
 	struct simple_xattrs xattrs;
 };
@@ -506,25 +501,6 @@ struct cftype {
 	int (*trigger)(struct cgroup_subsys_state *css, unsigned int event);
 
 	int (*release)(struct inode *inode, struct file *file);
-
-	/*
-	 * register_event() callback will be used to add new userspace
-	 * waiter for changes related to the cftype. Implement it if
-	 * you want to provide this functionality. Use eventfd_signal()
-	 * on eventfd to send notification to userspace.
-	 */
-	int (*register_event)(struct cgroup_subsys_state *css,
-			      struct cftype *cft, struct eventfd_ctx *eventfd,
-			      const char *args);
-	/*
-	 * unregister_event() callback will be called when userspace
-	 * closes the eventfd or on cgroup removing.
-	 * This callback must be implemented, if you want provide
-	 * notification functionality.
-	 */
-	void (*unregister_event)(struct cgroup_subsys_state *css,
-				 struct cftype *cft,
-				 struct eventfd_ctx *eventfd);
 };
 
 /*
diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index b239482..324ea7a 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -7,6 +7,7 @@
 #include <linux/gfp.h>
 #include <linux/types.h>
 #include <linux/cgroup.h>
+#include <linux/eventfd.h>
 
 struct vmpressure {
 	unsigned long scanned;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1579ca8..4368a6c 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1353,8 +1353,6 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	INIT_LIST_HEAD(&cgrp->pidlists);
 	mutex_init(&cgrp->pidlist_mutex);
 	cgrp->dummy_css.cgroup = cgrp;
-	INIT_LIST_HEAD(&cgrp->event_list);
-	spin_lock_init(&cgrp->event_list_lock);
 	simple_xattrs_init(&cgrp->xattrs);
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ef75925..9b833e1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -261,6 +261,22 @@ struct cgroup_event {
 	 */
 	struct list_head list;
 	/*
+	 * register_event() callback will be used to add new userspace
+	 * waiter for changes related to this event.  Use eventfd_signal()
+	 * on eventfd to send notification to userspace.
+	 */
+	int (*register_event)(struct cgroup_subsys_state *css,
+			      struct cftype *cft, struct eventfd_ctx *eventfd,
+			      const char *args);
+	/*
+	 * unregister_event() callback will be called when userspace closes
+	 * the eventfd or on cgroup removing.  This callback must be set,
+	 * if you want provide notification functionality.
+	 */
+	void (*unregister_event)(struct cgroup_subsys_state *css,
+				 struct cftype *cft,
+				 struct eventfd_ctx *eventfd);
+	/*
 	 * All fields below needed to unregister event when
 	 * userspace closes eventfd.
 	 */
@@ -373,6 +389,10 @@ struct mem_cgroup {
 	atomic_t	numainfo_updating;
 #endif
 
+	/* List of events which userspace want to receive */
+	struct list_head event_list;
+	spinlock_t event_list_lock;
+
 	struct mem_cgroup_per_node *nodeinfo[0];
 	/* WARNING: nodeinfo must be the last member here */
 };
@@ -5963,7 +5983,7 @@ static void cgroup_event_remove(struct work_struct *work)
 
 	remove_wait_queue(event->wqh, &event->wait);
 
-	event->cft->unregister_event(css, event->cft, event->eventfd);
+	event->unregister_event(css, event->cft, event->eventfd);
 
 	/* Notify userspace the event is going away. */
 	eventfd_signal(event->eventfd, 1);
@@ -5983,7 +6003,7 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
 {
 	struct cgroup_event *event = container_of(wait,
 			struct cgroup_event, wait);
-	struct cgroup *cgrp = event->css->cgroup;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(event->css);
 	unsigned long flags = (unsigned long)key;
 
 	if (flags & POLLHUP) {
@@ -5996,7 +6016,7 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
 		 * side will require wqh->lock via remove_wait_queue(),
 		 * which we hold.
 		 */
-		spin_lock(&cgrp->event_list_lock);
+		spin_lock(&memcg->event_list_lock);
 		if (!list_empty(&event->list)) {
 			list_del_init(&event->list);
 			/*
@@ -6005,7 +6025,7 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
 			 */
 			schedule_work(&event->remove);
 		}
-		spin_unlock(&cgrp->event_list_lock);
+		spin_unlock(&memcg->event_list_lock);
 	}
 
 	return 0;
@@ -6030,12 +6050,13 @@ static void cgroup_event_ptable_queue_proc(struct file *file,
 static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 				      struct cftype *cft, const char *buffer)
 {
-	struct cgroup *cgrp = css->cgroup;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct cgroup_event *event;
 	struct cgroup_subsys_state *cfile_css;
 	unsigned int efd, cfd;
 	struct file *efile;
 	struct file *cfile;
+	const char *name;
 	char *endp;
 	int ret;
 
@@ -6090,6 +6111,31 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 	}
 
 	/*
+	 * Determine the event callbacks and set them in @event.  This used
+	 * to be done via struct cftype but cgroup core no longer knows
+	 * about these events.  The following is crude but the whole thing
+	 * is for compatibility anyway.
+	 */
+	name = cfile->f_dentry->d_name.name;
+
+	if (!strcmp(name, "memory.usage_in_bytes")) {
+		event->register_event = mem_cgroup_usage_register_event;
+		event->unregister_event = mem_cgroup_usage_unregister_event;
+	} else if (!strcmp(name, "memory.oom_control")) {
+		event->register_event = mem_cgroup_oom_register_event;
+		event->unregister_event = mem_cgroup_oom_unregister_event;
+	} else if (!strcmp(name, "memory.pressure_level")) {
+		event->register_event = vmpressure_register_event;
+		event->unregister_event = vmpressure_unregister_event;
+	} else if (!strcmp(name, "memory.memsw.usage_in_bytes")) {
+		event->register_event = mem_cgroup_usage_register_event;
+		event->unregister_event = mem_cgroup_usage_unregister_event;
+	} else {
+		ret = -EINVAL;
+		goto out_put_cfile;
+	}
+
+	/*
 	 * Verify @cfile should belong to @css.  Also, remaining events are
 	 * automatically removed on cgroup destruction but the removal is
 	 * asynchronous, so take an extra ref on @css.
@@ -6105,21 +6151,15 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 	if (ret)
 		goto out_put_cfile;
 
-	if (!event->cft->register_event || !event->cft->unregister_event) {
-		ret = -EINVAL;
-		goto out_put_css;
-	}
-
-	ret = event->cft->register_event(css, event->cft,
-			event->eventfd, buffer);
+	ret = event->register_event(css, event->cft, event->eventfd, buffer);
 	if (ret)
 		goto out_put_css;
 
 	efile->f_op->poll(efile, &event->pt);
 
-	spin_lock(&cgrp->event_list_lock);
-	list_add(&event->list, &cgrp->event_list);
-	spin_unlock(&cgrp->event_list_lock);
+	spin_lock(&memcg->event_list_lock);
+	list_add(&event->list, &memcg->event_list);
+	spin_unlock(&memcg->event_list_lock);
 
 	fput(cfile);
 	fput(efile);
@@ -6145,8 +6185,6 @@ static struct cftype mem_cgroup_files[] = {
 		.name = "usage_in_bytes",
 		.private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
 		.read = mem_cgroup_read,
-		.register_event = mem_cgroup_usage_register_event,
-		.unregister_event = mem_cgroup_usage_unregister_event,
 	},
 	{
 		.name = "max_usage_in_bytes",
@@ -6206,14 +6244,10 @@ static struct cftype mem_cgroup_files[] = {
 		.name = "oom_control",
 		.read_map = mem_cgroup_oom_control_read,
 		.write_u64 = mem_cgroup_oom_control_write,
-		.register_event = mem_cgroup_oom_register_event,
-		.unregister_event = mem_cgroup_oom_unregister_event,
 		.private = MEMFILE_PRIVATE(_OOM_TYPE, OOM_CONTROL),
 	},
 	{
 		.name = "pressure_level",
-		.register_event = vmpressure_register_event,
-		.unregister_event = vmpressure_unregister_event,
 	},
 #ifdef CONFIG_NUMA
 	{
@@ -6261,8 +6295,6 @@ static struct cftype memsw_cgroup_files[] = {
 		.name = "memsw.usage_in_bytes",
 		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
 		.read = mem_cgroup_read,
-		.register_event = mem_cgroup_usage_register_event,
-		.unregister_event = mem_cgroup_usage_unregister_event,
 	},
 	{
 		.name = "memsw.max_usage_in_bytes",
@@ -6453,6 +6485,8 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	mutex_init(&memcg->thresholds_lock);
 	spin_lock_init(&memcg->move_lock);
 	vmpressure_init(&memcg->vmpressure);
+	INIT_LIST_HEAD(&memcg->event_list);
+	spin_lock_init(&memcg->event_list_lock);
 
 	return &memcg->css;
 
@@ -6525,7 +6559,6 @@ static void mem_cgroup_invalidate_reclaim_iterators(struct mem_cgroup *memcg)
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
-	struct cgroup *cgrp = css->cgroup;
 	struct cgroup_event *event, *tmp;
 
 	/*
@@ -6533,12 +6566,12 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	 * Notify userspace about cgroup removing only after rmdir of cgroup
 	 * directory to avoid race between userspace and kernelspace.
 	 */
-	spin_lock(&cgrp->event_list_lock);
-	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
+	spin_lock(&memcg->event_list_lock);
+	list_for_each_entry_safe(event, tmp, &memcg->event_list, list) {
 		list_del_init(&event->list);
 		schedule_work(&event->remove);
 	}
-	spin_unlock(&cgrp->event_list_lock);
+	spin_unlock(&memcg->event_list_lock);
 
 	kmem_cgroup_css_offline(memcg);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 09/12] memcg: remove cgroup_event->cft
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (15 preceding siblings ...)
  2013-08-15 16:02   ` Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` Tejun Heo
                     ` (12 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

The only use of cgroup_event->cft is distinguishing "usage_in_bytes"
and "memsw.usage_in_bytes" for mem_cgroup_usage_[un]register_event(),
which can be done by adding an explicit argument to the function and
implementing two wrappers so that the two cases can be distinguished
from the function alone.

Remove cgroup_event->cft and the related code including
[un]register_events() methods.
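
As a rough userspace illustration of the wrapper approach described
above (not the kernel code itself): the shared implementation takes the
resource type as an explicit argument, and two thin wrappers fix that
argument, so picking a callback also picks the type.  All names below
(struct counter_group, usage_register_event_common(), pick_register_fn()
and so on) are hypothetical stand-ins, and the "memory.usage_in_bytes"
file name is assumed by analogy with the memsw one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
enum res_type { RES_MEM, RES_MEMSWAP };

struct counter_group {
	int nr_mem_events;	/* events registered against memory usage */
	int nr_memsw_events;	/* events registered against mem+swap usage */
};

typedef int (*register_fn)(struct counter_group *grp, const char *args);

/* Shared implementation: the resource type is an explicit argument. */
static int usage_register_event_common(struct counter_group *grp,
				       const char *args, enum res_type type)
{
	(void)args;	/* threshold parsing is omitted in this sketch */

	if (type == RES_MEM)
		grp->nr_mem_events++;
	else
		grp->nr_memsw_events++;
	return 0;
}

/* Thin wrappers: the function identity now carries the type. */
static int mem_usage_register_event(struct counter_group *grp,
				    const char *args)
{
	return usage_register_event_common(grp, args, RES_MEM);
}

static int memsw_usage_register_event(struct counter_group *grp,
				      const char *args)
{
	return usage_register_event_common(grp, args, RES_MEMSWAP);
}

/* Select the callback by file name; no per-file handle is kept around. */
static register_fn pick_register_fn(const char *name)
{
	if (!strcmp(name, "memory.usage_in_bytes"))
		return mem_usage_register_event;
	if (!strcmp(name, "memory.memsw.usage_in_bytes"))
		return memsw_usage_register_event;
	return NULL;
}
```

The callback signature no longer needs any per-file handle, which is
what lets the cft member go away.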

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/vmpressure.h |  2 --
 mm/memcontrol.c            | 65 +++++++++++++++++++++++++---------------------
 mm/vmpressure.c            | 14 +++-------
 3 files changed, 38 insertions(+), 43 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 324ea7a..dd0b025 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -35,11 +35,9 @@ extern struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg);
 extern struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr);
 extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
 extern int vmpressure_register_event(struct cgroup_subsys_state *css,
-				     struct cftype *cft,
 				     struct eventfd_ctx *eventfd,
 				     const char *args);
 extern void vmpressure_unregister_event(struct cgroup_subsys_state *css,
-					struct cftype *cft,
 					struct eventfd_ctx *eventfd);
 #else
 static inline void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9b833e1..18e98ae 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -249,10 +249,6 @@ struct cgroup_event {
 	 */
 	struct cgroup_subsys_state *css;
 	/*
-	 * Control file which the event associated.
-	 */
-	struct cftype *cft;
-	/*
 	 * eventfd to signal userspace about the event.
 	 */
 	struct eventfd_ctx *eventfd;
@@ -266,15 +262,13 @@ struct cgroup_event {
 	 * on eventfd to send notification to userspace.
 	 */
 	int (*register_event)(struct cgroup_subsys_state *css,
-			      struct cftype *cft, struct eventfd_ctx *eventfd,
-			      const char *args);
+			      struct eventfd_ctx *eventfd, const char *args);
 	/*
 	 * unregister_event() callback will be called when userspace closes
 	 * the eventfd or on cgroup removing.  This callback must be set,
 	 * if you want provide notification functionality.
 	 */
 	void (*unregister_event)(struct cgroup_subsys_state *css,
-				 struct cftype *cft,
 				 struct eventfd_ctx *eventfd);
 	/*
 	 * All fields below needed to unregister event when
@@ -5659,13 +5653,12 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
 		mem_cgroup_oom_notify_cb(iter);
 }
 
-static int mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
-	struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
+static int __mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
+	struct eventfd_ctx *eventfd, const char *args, enum res_type type)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_thresholds *thresholds;
 	struct mem_cgroup_threshold_ary *new;
-	enum res_type type = MEMFILE_TYPE(cft->private);
 	u64 threshold, usage;
 	int i, size, ret;
 
@@ -5742,13 +5735,24 @@ unlock:
 	return ret;
 }
 
-static void mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
-	struct cftype *cft, struct eventfd_ctx *eventfd)
+static int mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
+	struct eventfd_ctx *eventfd, const char *args)
+{
+	return __mem_cgroup_usage_register_event(css, eventfd, args, _MEM);
+}
+
+static int memsw_cgroup_usage_register_event(struct cgroup_subsys_state *css,
+	struct eventfd_ctx *eventfd, const char *args)
+{
+	return __mem_cgroup_usage_register_event(css, eventfd, args, _MEMSWAP);
+}
+
+static void __mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
+	struct eventfd_ctx *eventfd, enum res_type type)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_thresholds *thresholds;
 	struct mem_cgroup_threshold_ary *new;
-	enum res_type type = MEMFILE_TYPE(cft->private);
 	u64 usage;
 	int i, j, size;
 
@@ -5821,14 +5825,24 @@ unlock:
 	mutex_unlock(&memcg->thresholds_lock);
 }
 
+static void mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
+	struct eventfd_ctx *eventfd)
+{
+	return __mem_cgroup_usage_unregister_event(css, eventfd, _MEM);
+}
+
+static void memsw_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
+	struct eventfd_ctx *eventfd)
+{
+	return __mem_cgroup_usage_unregister_event(css, eventfd, _MEMSWAP);
+}
+
 static int mem_cgroup_oom_register_event(struct cgroup_subsys_state *css,
-	struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
+	struct eventfd_ctx *eventfd, const char *args)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_eventfd_list *event;
-	enum res_type type = MEMFILE_TYPE(cft->private);
 
-	BUG_ON(type != _OOM_TYPE);
 	event = kmalloc(sizeof(*event),	GFP_KERNEL);
 	if (!event)
 		return -ENOMEM;
@@ -5847,13 +5861,10 @@ static int mem_cgroup_oom_register_event(struct cgroup_subsys_state *css,
 }
 
 static void mem_cgroup_oom_unregister_event(struct cgroup_subsys_state *css,
-	struct cftype *cft, struct eventfd_ctx *eventfd)
+	struct eventfd_ctx *eventfd)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_eventfd_list *ev, *tmp;
-	enum res_type type = MEMFILE_TYPE(cft->private);
-
-	BUG_ON(type != _OOM_TYPE);
 
 	spin_lock(&memcg_oom_lock);
 
@@ -5983,7 +5994,7 @@ static void cgroup_event_remove(struct work_struct *work)
 
 	remove_wait_queue(event->wqh, &event->wait);
 
-	event->unregister_event(css, event->cft, event->eventfd);
+	event->unregister_event(css, event->eventfd);
 
 	/* Notify userspace the event is going away. */
 	eventfd_signal(event->eventfd, 1);
@@ -6104,12 +6115,6 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 	if (ret < 0)
 		goto out_put_cfile;
 
-	event->cft = __file_cft(cfile);
-	if (IS_ERR(event->cft)) {
-		ret = PTR_ERR(event->cft);
-		goto out_put_cfile;
-	}
-
 	/*
 	 * Determine the event callbacks and set them in @event.  This used
 	 * to be done via struct cftype but cgroup core no longer knows
@@ -6128,8 +6133,8 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 		event->register_event = vmpressure_register_event;
 		event->unregister_event = vmpressure_unregister_event;
 	} else if (!strcmp(name, "memory.memsw.usage_in_bytes")) {
-		event->register_event = mem_cgroup_usage_register_event;
-		event->unregister_event = mem_cgroup_usage_unregister_event;
+		event->register_event = memsw_cgroup_usage_register_event;
+		event->unregister_event = memsw_cgroup_usage_unregister_event;
 	} else {
 		ret = -EINVAL;
 		goto out_put_cfile;
@@ -6151,7 +6156,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 	if (ret)
 		goto out_put_cfile;
 
-	ret = event->register_event(css, event->cft, event->eventfd, buffer);
+	ret = event->register_event(css, event->eventfd, buffer);
 	if (ret)
 		goto out_put_css;
 
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 13489b1..40ed6e8 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -279,7 +279,6 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
 /**
  * vmpressure_register_event() - Bind vmpressure notifications to an eventfd
  * @css:	css that is interested in vmpressure notifications
- * @cft:	cgroup control files handle
  * @eventfd:	eventfd context to link notifications with
  * @args:	event arguments (used to set up a pressure level threshold)
  *
@@ -289,13 +288,10 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
  * threshold (one of vmpressure_str_levels, i.e. "low", "medium", or
  * "critical").
  *
- * This function should not be used directly, just pass it to (struct
- * cftype).register_event, and then cgroup core will handle everything by
- * itself.
+ * To be used as memcg event method.
  */
 int vmpressure_register_event(struct cgroup_subsys_state *css,
-			      struct cftype *cft, struct eventfd_ctx *eventfd,
-			      const char *args)
+			      struct eventfd_ctx *eventfd, const char *args)
 {
 	struct vmpressure *vmpr = css_to_vmpressure(css);
 	struct vmpressure_event *ev;
@@ -326,19 +322,15 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
 /**
  * vmpressure_unregister_event() - Unbind eventfd from vmpressure
  * @css:	css handle
- * @cft:	cgroup control files handle
  * @eventfd:	eventfd context that was used to link vmpressure with the @cg
  *
  * This function does internal manipulations to detach the @eventfd from
  * the vmpressure notifications, and then frees internal resources
  * associated with the @eventfd (but the @eventfd itself is not freed).
  *
- * This function should not be used directly, just pass it to (struct
- * cftype).unregister_event, and then cgroup core will handle everything
- * by itself.
+ * To be used as memcg event method.
  */
 void vmpressure_unregister_event(struct cgroup_subsys_state *css,
-				 struct cftype *cft,
 				 struct eventfd_ctx *eventfd)
 {
 	struct vmpressure *vmpr = css_to_vmpressure(css);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 10/12] memcg: make cgroup_event deal with mem_cgroup instead of cgroup_subsys_state
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (17 preceding siblings ...)
  2013-08-15 16:02   ` Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` Tejun Heo
                     ` (10 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

cgroup_event is now memcg specific.  Replace cgroup_event->css with
->memcg and convert the [un]register_event() callbacks to take a
mem_cgroup pointer instead of a cgroup_subsys_state one.  This
simplifies the code slightly and makes css_to_vmpressure() unnecessary,
so it is removed as well.
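
For readers less familiar with the pattern, here is a minimal userspace
sketch of the before/after callback shapes, assuming simplified
stand-in structures (struct css, struct memcg and the helper names
below are hypothetical, not the kernel definitions).  The "before"
callback receives the embedded generic state and has to recover the
container; the "after" callback is handed the container directly.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct css {
	int refcnt;
};

struct vmpressure {
	int level;
};

struct memcg {
	struct css css;			/* embedded generic state */
	struct vmpressure vmpressure;
};

/* Userspace equivalent of the usual container_of() idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Before: each callback converts the css back to its container. */
static struct memcg *memcg_from_css(struct css *css)
{
	return container_of(css, struct memcg, css);
}

static int register_event_css(struct css *css, int level)
{
	struct memcg *memcg = memcg_from_css(css);

	memcg->vmpressure.level = level;
	return 0;
}

/* After: the callback receives the container, no conversion needed. */
static int register_event_memcg(struct memcg *memcg, int level)
{
	memcg->vmpressure.level = level;
	return 0;
}
```

Going the other way stays trivial: where the generic state is still
needed, as with the css_put() call in this patch, taking the address of
the embedded member is enough.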

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/vmpressure.h |  5 ++---
 mm/memcontrol.c            | 53 +++++++++++++++++++---------------------------
 mm/vmpressure.c            | 12 +++++------
 3 files changed, 30 insertions(+), 40 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index dd0b025..3c67eb3 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -33,11 +33,10 @@ extern void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio);
 extern void vmpressure_init(struct vmpressure *vmpr);
 extern struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg);
 extern struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr);
-extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
-extern int vmpressure_register_event(struct cgroup_subsys_state *css,
+extern int vmpressure_register_event(struct mem_cgroup *memcg,
 				     struct eventfd_ctx *eventfd,
 				     const char *args);
-extern void vmpressure_unregister_event(struct cgroup_subsys_state *css,
+extern void vmpressure_unregister_event(struct mem_cgroup *memcg,
 					struct eventfd_ctx *eventfd);
 #else
 static inline void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 18e98ae..8663d6c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -245,9 +245,9 @@ struct mem_cgroup_eventfd_list {
  */
 struct cgroup_event {
 	/*
-	 * css which the event belongs to.
+	 * memcg which the event belongs to.
 	 */
-	struct cgroup_subsys_state *css;
+	struct mem_cgroup *memcg;
 	/*
 	 * eventfd to signal userspace about the event.
 	 */
@@ -261,14 +261,14 @@ struct cgroup_event {
 	 * waiter for changes related to this event.  Use eventfd_signal()
 	 * on eventfd to send notification to userspace.
 	 */
-	int (*register_event)(struct cgroup_subsys_state *css,
+	int (*register_event)(struct mem_cgroup *memcg,
 			      struct eventfd_ctx *eventfd, const char *args);
 	/*
 	 * unregister_event() callback will be called when userspace closes
 	 * the eventfd or on cgroup removing.  This callback must be set,
 	 * if you want provide notification functionality.
 	 */
-	void (*unregister_event)(struct cgroup_subsys_state *css,
+	void (*unregister_event)(struct mem_cgroup *memcg,
 				 struct eventfd_ctx *eventfd);
 	/*
 	 * All fields below needed to unregister event when
@@ -546,11 +546,6 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
 	return &container_of(vmpr, struct mem_cgroup, vmpressure)->css;
 }
 
-struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css)
-{
-	return &mem_cgroup_from_css(css)->vmpressure;
-}
-
 static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
 {
 	return (memcg == root_mem_cgroup);
@@ -5653,10 +5648,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
 		mem_cgroup_oom_notify_cb(iter);
 }
 
-static int __mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
+static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd, const char *args, enum res_type type)
 {
-	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_thresholds *thresholds;
 	struct mem_cgroup_threshold_ary *new;
 	u64 threshold, usage;
@@ -5735,22 +5729,21 @@ unlock:
 	return ret;
 }
 
-static int mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
+static int mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd, const char *args)
 {
-	return __mem_cgroup_usage_register_event(css, eventfd, args, _MEM);
+	return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEM);
 }
 
-static int memsw_cgroup_usage_register_event(struct cgroup_subsys_state *css,
+static int memsw_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd, const char *args)
 {
-	return __mem_cgroup_usage_register_event(css, eventfd, args, _MEMSWAP);
+	return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEMSWAP);
 }
 
-static void __mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
+static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd, enum res_type type)
 {
-	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_thresholds *thresholds;
 	struct mem_cgroup_threshold_ary *new;
 	u64 usage;
@@ -5825,22 +5818,21 @@ unlock:
 	mutex_unlock(&memcg->thresholds_lock);
 }
 
-static void mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
+static void mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd)
 {
-	return __mem_cgroup_usage_unregister_event(css, eventfd, _MEM);
+	return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEM);
 }
 
-static void memsw_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
+static void memsw_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd)
 {
-	return __mem_cgroup_usage_unregister_event(css, eventfd, _MEMSWAP);
+	return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEMSWAP);
 }
 
-static int mem_cgroup_oom_register_event(struct cgroup_subsys_state *css,
+static int mem_cgroup_oom_register_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd, const char *args)
 {
-	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_eventfd_list *event;
 
 	event = kmalloc(sizeof(*event),	GFP_KERNEL);
@@ -5860,10 +5852,9 @@ static int mem_cgroup_oom_register_event(struct cgroup_subsys_state *css,
 	return 0;
 }
 
-static void mem_cgroup_oom_unregister_event(struct cgroup_subsys_state *css,
+static void mem_cgroup_oom_unregister_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd)
 {
-	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_eventfd_list *ev, *tmp;
 
 	spin_lock(&memcg_oom_lock);
@@ -5990,18 +5981,18 @@ static void cgroup_event_remove(struct work_struct *work)
 {
 	struct cgroup_event *event = container_of(work, struct cgroup_event,
 			remove);
-	struct cgroup_subsys_state *css = event->css;
+	struct mem_cgroup *memcg = event->memcg;
 
 	remove_wait_queue(event->wqh, &event->wait);
 
-	event->unregister_event(css, event->eventfd);
+	event->unregister_event(memcg, event->eventfd);
 
 	/* Notify userspace the event is going away. */
 	eventfd_signal(event->eventfd, 1);
 
 	eventfd_ctx_put(event->eventfd);
 	kfree(event);
-	css_put(css);
+	css_put(&memcg->css);
 }
 
 /*
@@ -6014,7 +6005,7 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
 {
 	struct cgroup_event *event = container_of(wait,
 			struct cgroup_event, wait);
-	struct mem_cgroup *memcg = mem_cgroup_from_css(event->css);
+	struct mem_cgroup *memcg = event->memcg;
 	unsigned long flags = (unsigned long)key;
 
 	if (flags & POLLHUP) {
@@ -6085,7 +6076,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 	if (!event)
 		return -ENOMEM;
 
-	event->css = css;
+	event->memcg = memcg;
 	INIT_LIST_HEAD(&event->list);
 	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
 	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
@@ -6156,7 +6147,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 	if (ret)
 		goto out_put_cfile;
 
-	ret = event->register_event(css, event->eventfd, buffer);
+	ret = event->register_event(memcg, event->eventfd, buffer);
 	if (ret)
 		goto out_put_css;
 
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 40ed6e8..96f7509 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -278,7 +278,7 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
 
 /**
  * vmpressure_register_event() - Bind vmpressure notifications to an eventfd
- * @css:	css that is interested in vmpressure notifications
+ * @memcg:	memcg that is interested in vmpressure notifications
  * @eventfd:	eventfd context to link notifications with
  * @args:	event arguments (used to set up a pressure level threshold)
  *
@@ -290,10 +290,10 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
  *
  * To be used as memcg event method.
  */
-int vmpressure_register_event(struct cgroup_subsys_state *css,
+int vmpressure_register_event(struct mem_cgroup *memcg,
 			      struct eventfd_ctx *eventfd, const char *args)
 {
-	struct vmpressure *vmpr = css_to_vmpressure(css);
+	struct vmpressure *vmpr = memcg_to_vmpressure(memcg);
 	struct vmpressure_event *ev;
 	int level;
 
@@ -321,7 +321,7 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
 
 /**
  * vmpressure_unregister_event() - Unbind eventfd from vmpressure
- * @css:	css handle
+ * @memcg:	memcg handle
  * @eventfd:	eventfd context that was used to link vmpressure with the @cg
  *
  * This function does internal manipulations to detach the @eventfd from
@@ -330,10 +330,10 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
  *
  * To be used as memcg event method.
  */
-void vmpressure_unregister_event(struct cgroup_subsys_state *css,
+void vmpressure_unregister_event(struct mem_cgroup *memcg,
 				 struct eventfd_ctx *eventfd)
 {
-	struct vmpressure *vmpr = css_to_vmpressure(css);
+	struct vmpressure *vmpr = memcg_to_vmpressure(memcg);
 	struct vmpressure_event *ev;
 
 	mutex_lock(&vmpr->events_lock);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 10/12] memcg: make cgroup_event deal with mem_cgroup instead of cgroup_subsys_state
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (18 preceding siblings ...)
  2013-08-15 16:02   ` [PATCH 10/12] memcg: make cgroup_event deal with mem_cgroup instead of cgroup_subsys_state Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
       [not found]     ` <1376582550-12548-11-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2013-08-15 16:02   ` [PATCH 11/12] memcg: rename cgroup_event to mem_cgroup_event Tejun Heo
                     ` (9 subsequent siblings)
  29 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Tejun Heo

cgroup_event is now memcg specific.  Replace cgroup_event->css with
->memcg and convert the [un]register_event() callbacks to take a
mem_cgroup pointer instead of a cgroup_subsys_state one.  This
simplifies the code slightly and makes css_to_vmpressure() unnecessary,
so it is removed as well.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/vmpressure.h |  5 ++---
 mm/memcontrol.c            | 53 +++++++++++++++++++---------------------------
 mm/vmpressure.c            | 12 +++++------
 3 files changed, 30 insertions(+), 40 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index dd0b025..3c67eb3 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -33,11 +33,10 @@ extern void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio);
 extern void vmpressure_init(struct vmpressure *vmpr);
 extern struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg);
 extern struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr);
-extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
-extern int vmpressure_register_event(struct cgroup_subsys_state *css,
+extern int vmpressure_register_event(struct mem_cgroup *memcg,
 				     struct eventfd_ctx *eventfd,
 				     const char *args);
-extern void vmpressure_unregister_event(struct cgroup_subsys_state *css,
+extern void vmpressure_unregister_event(struct mem_cgroup *memcg,
 					struct eventfd_ctx *eventfd);
 #else
 static inline void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 18e98ae..8663d6c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -245,9 +245,9 @@ struct mem_cgroup_eventfd_list {
  */
 struct cgroup_event {
 	/*
-	 * css which the event belongs to.
+	 * memcg which the event belongs to.
 	 */
-	struct cgroup_subsys_state *css;
+	struct mem_cgroup *memcg;
 	/*
 	 * eventfd to signal userspace about the event.
 	 */
@@ -261,14 +261,14 @@ struct cgroup_event {
 	 * waiter for changes related to this event.  Use eventfd_signal()
 	 * on eventfd to send notification to userspace.
 	 */
-	int (*register_event)(struct cgroup_subsys_state *css,
+	int (*register_event)(struct mem_cgroup *memcg,
 			      struct eventfd_ctx *eventfd, const char *args);
 	/*
 	 * unregister_event() callback will be called when userspace closes
 	 * the eventfd or on cgroup removing.  This callback must be set,
 	 * if you want provide notification functionality.
 	 */
-	void (*unregister_event)(struct cgroup_subsys_state *css,
+	void (*unregister_event)(struct mem_cgroup *memcg,
 				 struct eventfd_ctx *eventfd);
 	/*
 	 * All fields below needed to unregister event when
@@ -546,11 +546,6 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
 	return &container_of(vmpr, struct mem_cgroup, vmpressure)->css;
 }
 
-struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css)
-{
-	return &mem_cgroup_from_css(css)->vmpressure;
-}
-
 static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
 {
 	return (memcg == root_mem_cgroup);
@@ -5653,10 +5648,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
 		mem_cgroup_oom_notify_cb(iter);
 }
 
-static int __mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
+static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd, const char *args, enum res_type type)
 {
-	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_thresholds *thresholds;
 	struct mem_cgroup_threshold_ary *new;
 	u64 threshold, usage;
@@ -5735,22 +5729,21 @@ unlock:
 	return ret;
 }
 
-static int mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
+static int mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd, const char *args)
 {
-	return __mem_cgroup_usage_register_event(css, eventfd, args, _MEM);
+	return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEM);
 }
 
-static int memsw_cgroup_usage_register_event(struct cgroup_subsys_state *css,
+static int memsw_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd, const char *args)
 {
-	return __mem_cgroup_usage_register_event(css, eventfd, args, _MEMSWAP);
+	return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEMSWAP);
 }
 
-static void __mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
+static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd, enum res_type type)
 {
-	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_thresholds *thresholds;
 	struct mem_cgroup_threshold_ary *new;
 	u64 usage;
@@ -5825,22 +5818,21 @@ unlock:
 	mutex_unlock(&memcg->thresholds_lock);
 }
 
-static void mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
+static void mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd)
 {
-	return __mem_cgroup_usage_unregister_event(css, eventfd, _MEM);
+	return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEM);
 }
 
-static void memsw_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
+static void memsw_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd)
 {
-	return __mem_cgroup_usage_unregister_event(css, eventfd, _MEMSWAP);
+	return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEMSWAP);
 }
 
-static int mem_cgroup_oom_register_event(struct cgroup_subsys_state *css,
+static int mem_cgroup_oom_register_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd, const char *args)
 {
-	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_eventfd_list *event;
 
 	event = kmalloc(sizeof(*event),	GFP_KERNEL);
@@ -5860,10 +5852,9 @@ static int mem_cgroup_oom_register_event(struct cgroup_subsys_state *css,
 	return 0;
 }
 
-static void mem_cgroup_oom_unregister_event(struct cgroup_subsys_state *css,
+static void mem_cgroup_oom_unregister_event(struct mem_cgroup *memcg,
 	struct eventfd_ctx *eventfd)
 {
-	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup_eventfd_list *ev, *tmp;
 
 	spin_lock(&memcg_oom_lock);
@@ -5990,18 +5981,18 @@ static void cgroup_event_remove(struct work_struct *work)
 {
 	struct cgroup_event *event = container_of(work, struct cgroup_event,
 			remove);
-	struct cgroup_subsys_state *css = event->css;
+	struct mem_cgroup *memcg = event->memcg;
 
 	remove_wait_queue(event->wqh, &event->wait);
 
-	event->unregister_event(css, event->eventfd);
+	event->unregister_event(memcg, event->eventfd);
 
 	/* Notify userspace the event is going away. */
 	eventfd_signal(event->eventfd, 1);
 
 	eventfd_ctx_put(event->eventfd);
 	kfree(event);
-	css_put(css);
+	css_put(&memcg->css);
 }
 
 /*
@@ -6014,7 +6005,7 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
 {
 	struct cgroup_event *event = container_of(wait,
 			struct cgroup_event, wait);
-	struct mem_cgroup *memcg = mem_cgroup_from_css(event->css);
+	struct mem_cgroup *memcg = event->memcg;
 	unsigned long flags = (unsigned long)key;
 
 	if (flags & POLLHUP) {
@@ -6085,7 +6076,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 	if (!event)
 		return -ENOMEM;
 
-	event->css = css;
+	event->memcg = memcg;
 	INIT_LIST_HEAD(&event->list);
 	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
 	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
@@ -6156,7 +6147,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 	if (ret)
 		goto out_put_cfile;
 
-	ret = event->register_event(css, event->eventfd, buffer);
+	ret = event->register_event(memcg, event->eventfd, buffer);
 	if (ret)
 		goto out_put_css;
 
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 40ed6e8..96f7509 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -278,7 +278,7 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
 
 /**
  * vmpressure_register_event() - Bind vmpressure notifications to an eventfd
- * @css:	css that is interested in vmpressure notifications
+ * @memcg:	memcg that is interested in vmpressure notifications
  * @eventfd:	eventfd context to link notifications with
  * @args:	event arguments (used to set up a pressure level threshold)
  *
@@ -290,10 +290,10 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
  *
  * To be used as memcg event method.
  */
-int vmpressure_register_event(struct cgroup_subsys_state *css,
+int vmpressure_register_event(struct mem_cgroup *memcg,
 			      struct eventfd_ctx *eventfd, const char *args)
 {
-	struct vmpressure *vmpr = css_to_vmpressure(css);
+	struct vmpressure *vmpr = memcg_to_vmpressure(memcg);
 	struct vmpressure_event *ev;
 	int level;
 
@@ -321,7 +321,7 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
 
 /**
  * vmpressure_unregister_event() - Unbind eventfd from vmpressure
- * @css:	css handle
+ * @memcg:	memcg handle
  * @eventfd:	eventfd context that was used to link vmpressure with the @cg
  *
  * This function does internal manipulations to detach the @eventfd from
@@ -330,10 +330,10 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
  *
  * To be used as memcg event method.
  */
-void vmpressure_unregister_event(struct cgroup_subsys_state *css,
+void vmpressure_unregister_event(struct mem_cgroup *memcg,
 				 struct eventfd_ctx *eventfd)
 {
-	struct vmpressure *vmpr = css_to_vmpressure(css);
+	struct vmpressure *vmpr = memcg_to_vmpressure(memcg);
 	struct vmpressure_event *ev;
 
 	mutex_lock(&vmpr->events_lock);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 11/12] memcg: rename cgroup_event to mem_cgroup_event
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (20 preceding siblings ...)
  2013-08-15 16:02   ` [PATCH 11/12] memcg: rename cgroup_event to mem_cgroup_event Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` [PATCH 12/12] cgroup: unexport cgroup_css() and remove __file_cft() Tejun Heo
                     ` (7 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

cgroup_event is only available in memcg now.  Let's brand it that way.
While at it, add a comment encouraging deprecation of the feature and
remove the respective section from cgroup documentation.
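
For reference, the userspace side of the interface described by the
removed documentation section can be sketched roughly as follows.  This
is a minimal illustration, not tested against a real hierarchy:
format_event_control() and register_usage_threshold() are hypothetical
helper names, and the cgroup directory is supplied by the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Build the "<event_fd> <control_fd> <args>" line.  Returns the length
 * written, or -1 if the line does not fit in the buffer.
 */
static int format_event_control(char *buf, size_t len, int efd, int cfd,
				const char *args)
{
	int n;

	assert(buf && args);
	n = snprintf(buf, len, "%d %d %s", efd, cfd, args);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Register a usage threshold on memory.usage_in_bytes under cgroup_dir.
 * Returns the eventfd to read from, or -1 on failure.
 */
static int register_usage_threshold(const char *cgroup_dir,
				    const char *threshold)
{
	char path[4096], line[128];
	int efd, cfd, ecfd, n, ret = -1;

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cgroup_dir);
	cfd = open(path, O_RDONLY);
	snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	ecfd = open(path, O_WRONLY);

	if (cfd >= 0 && ecfd >= 0) {
		n = format_event_control(line, sizeof(line), efd, cfd,
					 threshold);
		if (n > 0 && write(ecfd, line, n) == n)
			ret = efd;
	}

	/* Assumption: the control fds are only needed during the write. */
	if (cfd >= 0)
		close(cfd);
	if (ecfd >= 0)
		close(ecfd);
	if (ret < 0)
		close(efd);
	return ret;
}
```

A caller would then block in read(2) on the returned eventfd for an
8-byte counter and close the eventfd to unregister, as the removed text
describes.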

This patch is cosmetic.

v2: Index in cgroups.txt updated accordingly as suggested by Li Zefan.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 Documentation/cgroups/cgroups.txt | 20 --------------
 mm/memcontrol.c                   | 57 +++++++++++++++++++++++++--------------
 2 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 638bf17..821de56 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -24,7 +24,6 @@ CONTENTS:
   2.1 Basic Usage
   2.2 Attaching processes
   2.3 Mounting hierarchies by name
-  2.4 Notification API
 3. Kernel API
   3.1 Overview
   3.2 Synchronization
@@ -472,25 +471,6 @@ you give a subsystem a name.
 The name of the subsystem appears as part of the hierarchy description
 in /proc/mounts and /proc/<pid>/cgroups.
 
-2.4 Notification API
---------------------
-
-There is mechanism which allows to get notifications about changing
-status of a cgroup.
-
-To register a new notification handler you need to:
- - create a file descriptor for event notification using eventfd(2);
- - open a control file to be monitored (e.g. memory.usage_in_bytes);
- - write "<event_fd> <control_fd> <args>" to cgroup.event_control.
-   Interpretation of args is defined by control file implementation;
-
-eventfd will be woken up by control file implementation or when the
-cgroup is removed.
-
-To unregister a notification handler just close eventfd.
-
-NOTE: Support of notifications should be implemented for the control
-file. See documentation for the subsystem.
 
 3. Kernel API
 =============
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8663d6c..2f0a8e1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -243,7 +243,7 @@ struct mem_cgroup_eventfd_list {
 /*
  * cgroup_event represents events which userspace want to receive.
  */
-struct cgroup_event {
+struct mem_cgroup_event {
 	/*
 	 * memcg which the event belongs to.
 	 */
@@ -5973,14 +5973,27 @@ static void kmem_cgroup_css_offline(struct mem_cgroup *memcg)
 #endif
 
 /*
+ * DO NOT USE IN NEW FILES.
+ *
+ * "cgroup.event_control" implementation.
+ *
+ * This is way over-engineered.  It tries to support fully configureable
+ * events for each user.  Such level of flexibility is completely
+ * unnecessary especially in the light of the planned unified hierarchy.
+ *
+ * Please deprecate this and replace with something simpler if at all
+ * possible.
+ */
+
+/*
  * Unregister event and free resources.
  *
  * Gets called from workqueue.
  */
-static void cgroup_event_remove(struct work_struct *work)
+static void memcg_event_remove(struct work_struct *work)
 {
-	struct cgroup_event *event = container_of(work, struct cgroup_event,
-			remove);
+	struct mem_cgroup_event *event =
+		container_of(work, struct mem_cgroup_event, remove);
 	struct mem_cgroup *memcg = event->memcg;
 
 	remove_wait_queue(event->wqh, &event->wait);
@@ -6000,11 +6013,11 @@ static void cgroup_event_remove(struct work_struct *work)
  *
  * Called with wqh->lock held and interrupts disabled.
  */
-static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
-		int sync, void *key)
+static int memcg_event_wake(wait_queue_t *wait, unsigned mode,
+			    int sync, void *key)
 {
-	struct cgroup_event *event = container_of(wait,
-			struct cgroup_event, wait);
+	struct mem_cgroup_event *event =
+		container_of(wait, struct mem_cgroup_event, wait);
 	struct mem_cgroup *memcg = event->memcg;
 	unsigned long flags = (unsigned long)key;
 
@@ -6033,27 +6046,29 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
 	return 0;
 }
 
-static void cgroup_event_ptable_queue_proc(struct file *file,
+static void memcg_event_ptable_queue_proc(struct file *file,
 		wait_queue_head_t *wqh, poll_table *pt)
 {
-	struct cgroup_event *event = container_of(pt,
-			struct cgroup_event, pt);
+	struct mem_cgroup_event *event =
+		container_of(pt, struct mem_cgroup_event, pt);
 
 	event->wqh = wqh;
 	add_wait_queue(wqh, &event->wait);
 }
 
 /*
+ * DO NOT USE IN NEW FILES.
+ *
  * Parse input and register new cgroup event handler.
  *
  * Input must be in format '<event_fd> <control_fd> <args>'.
  * Interpretation of args is defined by control file implementation.
  */
-static int cgroup_write_event_control(struct cgroup_subsys_state *css,
-				      struct cftype *cft, const char *buffer)
+static int memcg_write_event_control(struct cgroup_subsys_state *css,
+				     struct cftype *cft, const char *buffer)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
-	struct cgroup_event *event;
+	struct mem_cgroup_event *event;
 	struct cgroup_subsys_state *cfile_css;
 	unsigned int efd, cfd;
 	struct file *efile;
@@ -6078,9 +6093,9 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 
 	event->memcg = memcg;
 	INIT_LIST_HEAD(&event->list);
-	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
-	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
-	INIT_WORK(&event->remove, cgroup_event_remove);
+	init_poll_funcptr(&event->pt, memcg_event_ptable_queue_proc);
+	init_waitqueue_func_entry(&event->wait, memcg_event_wake);
+	INIT_WORK(&event->remove, memcg_event_remove);
 
 	efile = eventfd_fget(efd);
 	if (IS_ERR(efile)) {
@@ -6111,6 +6126,8 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
 	 * to be done via struct cftype but cgroup core no longer knows
 	 * about these events.  The following is crude but the whole thing
 	 * is for compatibility anyway.
+	 *
+	 * DO NOT ADD NEW FILES.
 	 */
 	name = cfile->f_dentry->d_name.name;
 
@@ -6221,8 +6238,8 @@ static struct cftype mem_cgroup_files[] = {
 		.read_u64 = mem_cgroup_hierarchy_read,
 	},
 	{
-		.name = "cgroup.event_control",
-		.write_string = cgroup_write_event_control,
+		.name = "cgroup.event_control",		/* XXX: for compat */
+		.write_string = memcg_write_event_control,
 		.flags = CFTYPE_NO_PREFIX,
 		.mode = S_IWUGO,
 	},
@@ -6555,7 +6572,7 @@ static void mem_cgroup_invalidate_reclaim_iterators(struct mem_cgroup *memcg)
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
-	struct cgroup_event *event, *tmp;
+	struct mem_cgroup_event *event, *tmp;
 
 	/*
 	 * Unregister events and notify userspace.
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 12/12] cgroup: unexport cgroup_css() and remove __file_cft()
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (21 preceding siblings ...)
  2013-08-15 16:02   ` Tejun Heo
@ 2013-08-15 16:02   ` Tejun Heo
  2013-08-15 16:02   ` Tejun Heo
                     ` (6 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Now that cgroup_event is memcg specific, the temporarily exported
functions are no longer necessary.  Unexport cgroup_css() and remove
__file_cft(), which no longer has any users.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup.h |  5 -----
 kernel/cgroup.c        | 14 ++------------
 2 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index d2cad3f..f9a4194 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -899,11 +899,6 @@ unsigned short css_id(struct cgroup_subsys_state *css);
 struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
 					 struct cgroup_subsys *ss);
 
-/* XXX: temporary */
-struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
-				       struct cgroup_subsys *ss);
-struct cftype *__file_cft(struct file *file);
-
 #else /* !CONFIG_CGROUPS */
 
 static inline int cgroup_init_early(void) { return 0; }
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4368a6c..1402366 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -202,8 +202,8 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
  * keep accessing it outside the said locks.  This function may return
  * %NULL if @cgrp doesn't have @subsys_id enabled.
  */
-struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
-				       struct cgroup_subsys *ss)
+static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+					      struct cgroup_subsys *ss)
 {
 	if (ss)
 		return rcu_dereference_check(cgrp->subsys[ss->subsys_id],
@@ -2634,16 +2634,6 @@ static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, un
 	return NULL;
 }
 
-/*
- * Check if a file is a control file
- */
-struct cftype *__file_cft(struct file *file)
-{
-	if (file_inode(file)->i_fop != &cgroup_file_operations)
-		return ERR_PTR(-EINVAL);
-	return __d_cft(file->f_dentry);
-}
-
 static int cgroup_create_file(struct dentry *dentry, umode_t mode,
 				struct super_block *sb)
 {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (24 preceding siblings ...)
  2013-08-21 20:12   ` [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg Tejun Heo
@ 2013-08-21 20:12   ` Tejun Heo
  2013-08-26 14:15   ` Kirill A. Shutemov
                     ` (3 subsequent siblings)
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-21 20:12 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Li, Michal, ping.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 11/12] memcg: rename cgroup_event to mem_cgroup_event
       [not found]     ` <1376582550-12548-12-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2013-08-23  3:42       ` Li Zefan
@ 2013-08-23  3:42       ` Li Zefan
  2013-08-30 11:19       ` Michal Hocko
  2013-08-30 11:19       ` Michal Hocko
  3 siblings, 0 replies; 74+ messages in thread
From: Li Zefan @ 2013-08-23  3:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

>  /*
> + * DO NOT USE IN NEW FILES.
> + *
> + * "cgroup.event_control" implementation.
> + *
> + * This is way over-engineered.  It tries to support fully configureable

s/configureable/configurable

> + * events for each user.  Such level of flexibility is completely
> + * unnecessary especially in the light of the planned unified hierarchy.
> + *
> + * Please deprecate this and replace with something simpler if at all
> + * possible.
> + */
> +
> +/*

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]     ` <20130821201239.GB2436-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  2013-08-23  3:43       ` Li Zefan
@ 2013-08-23  3:43       ` Li Zefan
  2013-08-24 18:20       ` Michal Hocko
  2013-08-24 18:20       ` Michal Hocko
  3 siblings, 0 replies; 74+ messages in thread
From: Li Zefan @ 2013-08-23  3:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 2013/8/22 4:12, Tejun Heo wrote:
> Li, Michal, ping.
> 

I'm fine with the patchset. Feel free to add my acks. :)

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]         ` <5216DA6F.3080508-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-23 12:31           ` Tejun Heo
  0 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-23 12:31 UTC (permalink / raw)
  To: Li Zefan
  Cc: mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello,

On Fri, Aug 23, 2013 at 11:43:43AM +0800, Li Zefan wrote:
> On 2013/8/22 4:12, Tejun Heo wrote:
> > Li, Michal, ping.
> 
> I'm fine with the patchset. Feel free to add my acks. :)

Michal, given that these changes are mostly just moving things around
but can cause headaches between cgroup and memcg trees, I think it
could be better to push it out in this cycle.  It's late but I think
it's niche enough to push for the coming merge window.  What do you
think?  If you agree, how should this be routed?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 11/12] memcg: rename cgroup_event to mem_cgroup_event
       [not found]         ` <5216DA08.8040406-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-23 16:40           ` Tejun Heo
  0 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-23 16:40 UTC (permalink / raw)
  To: Li Zefan
  Cc: mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Fri, Aug 23, 2013 at 11:42:00AM +0800, Li Zefan wrote:
> >  /*
> > + * DO NOT USE IN NEW FILES.
> > + *
> > + * "cgroup.event_control" implementation.
> > + *
> > + * This is way over-engineered.  It tries to support fully configureable
> 
> s/configureable/configurable

Updated.  Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]     ` <20130821201239.GB2436-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
                         ` (2 preceding siblings ...)
  2013-08-24 18:20       ` Michal Hocko
@ 2013-08-24 18:20       ` Michal Hocko
  3 siblings, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2013-08-24 18:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed 21-08-13 16:12:39, Tejun Heo wrote:
> Li, Michal, ping.

I am sorry, I was mostly offline for the last week with very poor
internet connection. I will try to get to this next week.

Sorry about that.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]         ` <20130824182005.GA15897-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2013-08-24 18:25           ` Tejun Heo
  0 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-24 18:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Sat, Aug 24, 2013 at 08:20:05PM +0200, Michal Hocko wrote:
> On Wed 21-08-13 16:12:39, Tejun Heo wrote:
> > Li, Michal, ping.
> 
> I am sorry, I was mostly offline for the last week with very poor
> internet connection. I will try to get to this next week.

Eh well, it's nothing urgent.  Doing this after the merge window is
perfectly fine too.  Enjoy the offline-ness.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (26 preceding siblings ...)
  2013-08-26 14:15   ` Kirill A. Shutemov
@ 2013-08-26 14:15   ` Kirill A. Shutemov
  2013-11-10  4:48   ` Tejun Heo
  2013-11-10  4:48   ` Tejun Heo
  29 siblings, 0 replies; 74+ messages in thread
From: Kirill A. Shutemov @ 2013-08-26 14:15 UTC (permalink / raw)
  To: Tejun Heo
  Cc: mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu, Aug 15, 2013 at 12:02:18PM -0400, Tejun Heo wrote:
> Hello,
> 
> Changes from the last take[L] are
> 
> * Event handling updated such that it doesn't require meddling with
>   internals not normally exposed outside cgroup core proper.  dentry
>   reference counting is replaced with css one and cftype handling is
>   completely gone.  Hopefully, this addresses Michal's complaints.
> 
> * Further simplifications.
> 
> * Rebased on top of the current cgroup/for-3.12 and pending patches.
> 
> Like many other things in cgroup, cgroup_event is way too flexible and
> complex - it strives to provide completely flexible event monitoring
> facility in cgroup proper which allows any number of users to monitor
> custom events.  This essentially is a layering violation and leads to
> weird issues like worrying about event API mis/abuse from userland in
> cgroup controller event source implementation.

Blame me. When I added the interface, I hoped events would be much more
useful, e.g. for delivering 'last task in the cgroup leaves' signals.
But you are right, it's too easy to abuse the interface.

A few questions before I ack the patchset:
What's the plan if we need a similar capability in the future?
How would you implement memory thresholds today?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]         ` <20130826151747.GD25171-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2013-08-26 14:29           ` Kirill A. Shutemov
       [not found]             ` <20130826142918.GB14985-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
  0 siblings, 1 reply; 74+ messages in thread
From: Kirill A. Shutemov @ 2013-08-26 14:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Mon, Aug 26, 2013 at 11:17:47AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Mon, Aug 26, 2013 at 05:15:36PM +0300, Kirill A. Shutemov wrote:
> > A few questions before I ack the patchset:
> > What's the plan if we need a similar capability in the future?
> 
> Just a normal file modified event.
> 
> > How would you implement memory thresholds today?
> 
> Ditto.  Just a normal file modified event when the threshold is
> crossed.

What about setting the threshold itself? Do we have a way to set it in the
inotify interface, or do you have something else in mind?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]                 ` <20130826153028.GE25171-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2013-08-26 14:35                   ` Kirill A. Shutemov
  0 siblings, 0 replies; 74+ messages in thread
From: Kirill A. Shutemov @ 2013-08-26 14:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Mon, Aug 26, 2013 at 11:30:28AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Mon, Aug 26, 2013 at 05:29:18PM +0300, Kirill A. Shutemov wrote:
> > What about setting the threshold itself? Do we have a way to set it in the
> > inotify interface, or do you have something else in mind?
> 
> Just have another file which configs the thresholds or sets the
> cadence?  The whole thing is overdesigned because it wants to allow
> individual listeners to have separate configurations which may be
> useful if access to sysfs hierarchy is open to everybody but that's
> not gonna be the case anymore and configuration of sysfs hierarchy is
> gonna be centralized anyway, so centralizing config of thresholds
> isn't any different from any other knobs.

Makes sense. Thank you.

Acked-by: Kirill A. Shutemov <kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]     ` <20130826141536.GA14985-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
@ 2013-08-26 15:17       ` Tejun Heo
       [not found]         ` <20130826151747.GD25171-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  0 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2013-08-26 15:17 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello,

On Mon, Aug 26, 2013 at 05:15:36PM +0300, Kirill A. Shutemov wrote:
> A few questions before I ack the patchset:
> What's the plan if we need a similar capability in the future?

Just a normal file modified event.

> How would you implement memory thresholds today?

Ditto.  Just a normal file modified event when the threshold is
crossed.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread
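
As an illustration of the "normal file modified event" approach described
above, here is a minimal userspace sketch using inotify. It assumes the
watched file generates ordinary IN_MODIFY events the way a regular file
does; whether a given cgroup file does so depends on the kernel side, which
the thread says is not implemented yet. The helper names watch_open() and
watch_wait() are hypothetical and exist only for this example.

```c
#include <poll.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an inotify instance that watches `path`
 * for modifications.  Returns the inotify fd, or -1 on error.
 */
static int watch_open(const char *path)
{
	int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);

	if (fd < 0)
		return -1;
	if (inotify_add_watch(fd, path, IN_MODIFY) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Hypothetical helper: wait up to timeout_ms for a modification.
 * Returns 1 if at least one event arrived, 0 on timeout, -1 on error.
 */
static int watch_wait(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char buf[4096]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;	/* 0: timeout, -1: error */

	/* Drain queued events; the listener re-reads the file for state. */
	return read(fd, buf, sizeof(buf)) > 0 ? 1 : -1;
}
```

The event carries no payload of interest here: a listener would call
watch_wait() and then re-read the watched file to learn the current state,
which is the same pattern used for any regular file.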

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]             ` <20130826142918.GB14985-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
  2013-08-26 15:30               ` Tejun Heo
@ 2013-08-26 15:30               ` Tejun Heo
  1 sibling, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-26 15:30 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello,

On Mon, Aug 26, 2013 at 05:29:18PM +0300, Kirill A. Shutemov wrote:
> What about setting the threshold itself? Do we have a way to set it in the
> inotify interface, or do you have something else in mind?

Just have another file which configs the thresholds or sets the
cadence?  The whole thing is overdesigned because it wants to allow
individual listeners to have separate configurations which may be
useful if access to sysfs hierarchy is open to everybody but that's
not gonna be the case anymore and configuration of sysfs hierarchy is
gonna be centralized anyway, so centralizing config of thresholds
isn't any different from any other knobs.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/12] cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp()
       [not found]     ` <1376582550-12548-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2013-08-26 22:38       ` Tejun Heo
  0 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-26 22:38 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu, Aug 15, 2013 at 12:02:23PM -0400, Tejun Heo wrote:
> cgroup_event will be moved to its only user - memcg.  Replace
> __d_cgrp() usage with css_from_dir(), which is already exported.  This
> also simplifies the code a bit.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Applied 1-5, the cgroup core updates, to cgroup/for-3.12 with acks
added.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 06/12] cgroup, memcg: move cgroup_event implementation to memcg
       [not found]     ` <1376582550-12548-7-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2013-08-27 14:20       ` Michal Hocko
       [not found]         ` <20130827142002.GC13302-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2013-08-29 18:19       ` [PATCH v3 " Tejun Heo
  2013-08-29 18:19       ` Tejun Heo
  2 siblings, 1 reply; 74+ messages in thread
From: Michal Hocko @ 2013-08-27 14:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

[Sorry for the late reply, I was mostly offline last week]

On Thu 15-08-13 12:02:24, Tejun Heo wrote:
> cgroup_event is way over-designed and tries to build a generic
> flexible event mechanism into cgroup - fully customizable event
> specification for each user of the interface.  This is utterly
> unnecessary and overboard especially in the light of the planned
> unified hierarchy as there's gonna be single agent.  Simply generating
> events at fixed points, or if that's too restrictive, configurable
> cadence or single set of configurable points should be enough.

I guess you are talking about thresholds here. Having a configurable
static table of them will probably work out fine.

But, how do I tell such an interface that I want to get only MEDIUM
vmpressure events? Do I have a special file for each pressure level?

Doing notification and read() every time might be a concern for
embedded-world who are worried about too many wakeups. We have
already seen suggestions to do different modes of event triggering
to reduce wake up costs because of the power consumption (e.g.
http://comments.gmane.org/gmane.linux.kernel.mm/101628).

> Thankfully, memcg is the only user and gets to keep it.  Replacing it
> with something simpler on sane_behavior is strongly recommended.

You just forgot to tell us what is that "something simpler". Does it
exist yet? What is the semantic?

We have been trying to reduce memcg specific things in the past and
this will add a non-trivial chunk of code. I would at least expect some
justification _why_ moving the maintenance burden is worth it. It
certainly won't make memcg life easier. I can bite the bullet though if
this is the roadblock for making important changes in the cgroup core.
But you didn't tell us anything like that, except that you do not like
the interface because like other parts it is over-engineered thus bad.

You have mentioned it will help you clean up code further in the past
but I do not see any mention of it in this patch nor in the
leader email. Could you be more specific? How much? Is this piece of
code blocking those cleanups?

That is what I _really_ dislike about this patch and why I am really
reluctant to ack it.

> This patch moves cgroup_event and "cgroup.event_control"
> implementation to mm/memcontrol.c.

And we might end up having that code there for ever because your new and
yet to be shown interface might turn out to be not the best fit for the
current users.

Tejun, you have done _a lot_ of great work on cleaning up cgroup
core mess and I really appreciate that! I was supporting you in some
of those, I wish I had more time to do more. But I think you are
deprecating some things too easily without caring much about the
current users assuming they will cope with that somehow and that a magic
central authority will do everything for them in a sane way. I am quite
skeptical, to be honest.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 06/12] cgroup, memcg: move cgroup_event implementation to memcg
       [not found]         ` <20130827142002.GC13302-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2013-08-27 20:00           ` Tejun Heo
@ 2013-08-27 20:00           ` Tejun Heo
  1 sibling, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-27 20:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hey, Michal.

On Tue, Aug 27, 2013 at 04:20:02PM +0200, Michal Hocko wrote:
> Doing notification and read() every time might be a concern for
> embedded-world who are worried about too many wakeups. We have
> already seen suggestions to do different modes of event triggering
> to reduce wake up costs because of the power consumption (e.g.
> http://comments.gmane.org/gmane.linux.kernel.mm/101628).

Given the expected frequency of the event, I highly doubt it's gonna
matter.  The patch you linked is different.  That's just the event
designed wrong.  Sigh... so it's mixing events for vmpressure state
changing and reclaim activities happening?  Am I understanding it
correctly?  If so, I really have no idea why so many things are so ill
designed, even the new things.

* As a general principle, events should either be edge-triggered or
  level-triggered with a way to acknowledge the current state;
  otherwise, it's a fundamentally broken design.

* Let's please not slide in some completely unobvious side-channel
  information into an event.  If someone wants to watch reclaim
  events, add a dedicated event for that rather than overloading state
  changed event.

Let's just hope I grossly misunderstood the whole thread.

If it *really* is critical to discern low/mid and mid/high
transitions, we can add a separate file so that the transitions can be
monitored separately but I can't emphasize enough that we shouldn't
chase every single requirement people come up with without the overall
sense of its relative importance and impact on long term maintenance
of the subsystem.  I'm highly skeptical that distinguishing the two
transitions and saving the occasional spurious events would be
justifiable.

> You just forgot to tell us what is that "something simpler". Does it
> exist yet? What is the semantic?

It's gonna be a file modified event, exactly the same as regular
files.  Haven't we gone over this multiple times now?  cgroup
implementation doesn't exist yet but everything about normal file
events including its semantics are already well defined and
understood.

> We have been trying to reduce memcg specific things in the past and
> this will add a non-trivial chunk of code. I would at least expect some
> justification _why_ moving the maintenance burden is worth it. It
> certainly won't make memcg life easier. I can bite the bullet though if
> this is the roadblock for making important changes in the cgroup core.
> But you didn't tell us anything like that, except that you do not like
> the interface because like other parts it is over-engineered thus bad.

Yes, it is a road block in two ways.

* Everything is being made per-css which is necessary as css's
  lifetime would no longer coincide with cgroup's.  Keeping this in
  cgroup proper would mean that it needs to be updated so that it's
  attached to css instead of cgroup.

* The file system interface of cgroup will go through sysfs so that
  cgroup doesn't have to worry about inode locking maze.  This is a
  major issue for nested subsys enable / disable and implementing
  migration as currently we end up having to lock inodes which belong
  to different subtrees and vfs doesn't define locking order between
  them.  So, short of meddling with vfs rename mutex, it'll deadlock.

> You have mentioned it will help you clean up code further in the past
> but I do not see any mention of it in this patch nor in the
> leader email. Could you be more specific? How much? Is this piece of
> code blocking those cleanups?

See above.

> That is what I _really_ dislike about this patch and why I am really
> reluctant to ack it.

See above.

> And we might end up having that code there for ever because your new and
> yet to be shown interface might turn out to be not the best fit for the
> current users.

We're gonna keep that code for years no matter what and you know that
the existing interface has fundamental issues.  Regardless of what we
do in the future, it needs to go.  That much is clear, isn't it?

> Tejun, you have done _a lot_ of great work on cleaning up cgroup
> core mess and I really appreciate that! I was supporting you in some
> of those, I wish I had more time to do more. But I think you are
> deprecating some things too easily without caring much about the
> current users assuming they will cope with that somehow and that a magic
> central authority will do everything for them in a sane way. I am quite
> skeptical, to be honest.

This one eventually has to go.  It's the wrong piece of logic at the
wrong place / layer.  The sequence could be debatable but I don't
wanna introduce a new thing before the filesystem part is converted to
sysfs as it'll need to be re-done then and I can tell you with a
relatively high level of confidence that the new interface will be
implemented in some months and its interface will be something along
the lines of css_notify(css).

And as for the general skepticism, well, I can't make you see
things my way.  I have been trying pretty hard with all involved and
am confident enough that at least enough are looking towards the same
general direction and the progress is pretty healthy too.  As for
memcg, I don't know.  I frankly have no idea what your general
direction and long term plan are and memcg in general still seems to
be lost quite often.  So, ummm, I don't know.  Can you at least
pretend to play along?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread
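
The edge-triggered versus level-triggered distinction raised in the message
above can be sketched in userspace with epoll and an eventfd. This is only
an analogy under stated assumptions, not the vmpressure or cgroup
implementation: the eventfd stands in for "some state that became
interesting", and the helper names make_epoll(), poll_once() and ack() are
hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: register `fd` with a fresh epoll instance.
 * edge != 0 selects edge-triggered mode (EPOLLET); otherwise the
 * default level-triggered mode is used.  Returns the epoll fd or -1.
 */
static int make_epoll(int fd, int edge)
{
	struct epoll_event ev = { .events = EPOLLIN | (edge ? EPOLLET : 0) };
	int ep = epoll_create1(EPOLL_CLOEXEC);

	if (ep < 0)
		return -1;
	ev.data.fd = fd;
	if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(ep);
		return -1;
	}
	return ep;
}

/* Non-blocking check: number of ready events (0 or 1 in this sketch). */
static int poll_once(int ep)
{
	struct epoll_event ev;

	return epoll_wait(ep, &ev, 1, 0);
}

/* Acknowledge the current state by consuming the eventfd counter. */
static int ack(int efd)
{
	uint64_t val;

	return read(efd, &val, sizeof(val)) == (ssize_t)sizeof(val) ? 0 : -1;
}
```

With a level-triggered registration, poll_once() keeps reporting the fd
until ack() consumes the state; with an edge-triggered registration, it is
reported once per change. Either is a coherent contract, which is the point
of the principle quoted above: the listener always knows whether it must
acknowledge or merely react.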

* Re: [PATCH 06/12] cgroup, memcg: move cgroup_event implementation to memcg
       [not found]         ` <20130827142002.GC13302-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2013-08-27 20:00           ` Tejun Heo
       [not found]             ` <20130827200002.GD12212-9pTldWuhBndy/B6EtB590w@public.gmane.org>
  2013-08-27 20:00           ` Tejun Heo
  1 sibling, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2013-08-27 20:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hey, Michal.

On Tue, Aug 27, 2013 at 04:20:02PM +0200, Michal Hocko wrote:
> Doing notification and read() every time might be a concern for
> embedded-world who are worried about too many wakeups. We have
> already seen suggestions to do different modes of event triggering
> to reduce wake up costs because of the power consumption (e.g.
> http://comments.gmane.org/gmane.linux.kernel.mm/101628).

Given the expected frequency of the event, I highly doubt it's gonna
matter.  The patch you linked is different.  That's just the event
designed wrong.  Sigh... so it's mixing events for vmpressure state
changing and reclaim activities happening?  Am I understanding it
correctly?  If so, I really have no idea why so many things are so ill
designed, even the new things.

* As a general principle, events should either be edge-triggered or
  level-triggered with a way to acknowledge the current state;
  otherwise, it's a fundamentally broken design.

* Let's please not slide in some completely unobvious side-channel
  information into an event.  If someone wants to watch reclaim
  events, add a dedicated event for that rather than overloading state
  changed event.

Let's just hope I grossly misunderstood the whole thread.

If it *really* is critical to discern low/mid and mid/high
transitions, we can add a separate file so that the transitions can be
monitored separately but I can't emphasize enough that we shouldn't
chase every single requirement people come up with without the overall
sense of its relative importance and impact on long term maintenance
of the subsystem.  I'm highly skeptical that distinguishing the two
transitions and saving the occasional spurious events would be
justifiable.

> You just forgot to tell us what is that "something simpler". Does it
> exist yet? What is the semantic?

It's gonna be a file modified event, exactly the same as regular
files.  Haven't we gone over this multiple times now?  cgroup
implementation doesn't exist yet but everything about normal file
events including its semantics are already well defined and
understood.

> We have been trying to reduce memcg specific things in the past and
> this will add non trivial chunk of code. I would at least expect some
> justification _why_ moving the maintenance burden is worth it. It
> certainly won't make memcg life easier. I can bite a bullet though if
> this is the roadblock for making important changes in the cgroup core.
> But you didn't tell us anything like that, except that you do not like
> the interface because like other parts it is over-engineered thus bad.

Yes, it is a road block in two ways.

* Everything is being made per-css which is necessary as css's
  lifetime would no longer coincide with cgroup's.  Keeping this in
  cgroup proper would mean that it needs to be updated so that it's
  attached to css instead of cgroup.

* The file system interface of cgroup will go through sysfs so that
  cgroup doesn't have to worry about inode locking maze.  This is a
  major issue for nested subsys enable / disable and implementing
  migration as currently we end up having to lock inodes which belong
  to different subtrees and vfs doesn't define locking order between
  them.  So, short of meddling with vfs rename mutex, it'll deadlock.

> You have mentioned it will help you clean up code further in the past
> but I do not see any mention of it in this patch nor in the
> leader email. Could you be more specific? How much? Is this piece of
> code blocking those cleanups?

See above.

> That is what I _really_ dislike about this patch and why I am really
> reluctant to ack it.

See above.

> And we might end up having that code there for ever because your new and
> yet to be shown interface might turn out to be not the best fit for the
> current users.

We're gonna keep that code for years no matter what and you know that
the existing interface has fundamental issues.  Regardless of what we
do in the future, it needs to go.  That much is clear, isn't it?

> Tejun, you have done _a lot_ of great work on cleaning up cgroup
> core mess and I really appreciate that! I was supporting you in some
> of those, I wish I had more time to do more. But I think you are
> deprecating some things too easily without caring much about the
> current users assuming they will cope with that somehow and that a magic
> central authority will do everything for them in a sane way. I am quite
> skeptical, to be honest.

This one eventually has to go.  It's the wrong piece of logic at the
wrong place / layer.  The sequence could be debatable but I don't
wanna introduce a new thing before the filesystem part is converted to
sysfs as it'll need to be re-done then and I can tell you with a
relatively high level of confidence that the new interface will be
implemented in some months and its interface will be something along
the line of css_notify(css).

And as for the general skepticism, well, I can't make you see
things my way.  I have been trying pretty hard with all involved and
am confident enough that at least enough are looking towards the same
general direction and the progress is pretty healthy too.  As for
memcg, I don't know.  I frankly have no idea what your general
direction and long term plan are and memcg in general still seems to
be lost quite often.  So, ummm, I don't know.  Can you at least
pretend to play along?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 06/12] cgroup, memcg: move cgroup_event implementation to memcg
       [not found]             ` <20130827200002.GD12212-9pTldWuhBndy/B6EtB590w@public.gmane.org>
  2013-08-28 14:29               ` Michal Hocko
@ 2013-08-28 14:29               ` Michal Hocko
  1 sibling, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2013-08-28 14:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue 27-08-13 16:00:02, Tejun Heo wrote:
> Hey, Michal.
> 
> On Tue, Aug 27, 2013 at 04:20:02PM +0200, Michal Hocko wrote:
[...]
> > We have been trying to reduce memcg specific things in the past and
> > this will add non trivial chunk of code. I would at least expect some
> > justification _why_ moving the maintenance burden is worth it. It
> > certainly won't make memcg life easier. I can bite a bullet though if
> > this is the roadblock for making important changes in the cgroup core.
> > But you didn't tell us anything like that, except that you do not like
> > the interface because like other parts it is over-engineered thus bad.
> 
> Yes, it is a road block in two ways.
> 
> * Everything is being made per-css which is necessary as css's
>   lifetime would no longer coincide with cgroup's.  Keeping this in
>   cgroup proper would mean that it needs to be updated so that it's
>   attached to css instead of cgroup.
> 
> * The file system interface of cgroup will go through sysfs so that
>   cgroup doesn't have to worry about inode locking maze.  This is a
>   major issue for nested subsys enable / disable and implementing
>   migration as currently we end up having to lock inodes which belong
>   to different subtrees and vfs doesn't define locking order between
>   them.  So, short of meddling with vfs rename mutex, it'll deadlock.

This is totally new to me. You haven't said this would be a road _block_
before. I am not familiar enough with the current core cgroup
development to figure the above out myself, as you can see.
[...]
> > And we might end up having that code there for ever because your new and
> > yet to be shown interface might turn out to be not the best fit for the
> > current users.
> 
> We're gonna keep that code for years no matter what and you know that
> the existing interface has fundamental issues.  Regardless of what we
> do in the future, it needs to go.  That much is clear, isn't it?

Yes. And I have already said that I do not insist on the interface. I do
not like it much either.  I hope that is clear as well.  Writing a magic
fd into a file and getting events was quite weird from the user POV
too. But that was the interface we had for a long time and people are
using it.
I was merely objecting to moving code somewhere where it doesn't belong
IMO. Your changelog lacked the most important information _why_ it has
to move from cgroup core. At least that wasn't obvious to me. Your
general conclusion:
"
cgroup_event is way over-designed and tries to build a generic
flexible event mechanism into cgroup - fully customizable event
specification for each user of the interface.  This is utterly
unnecessary and overboard especially in the light of the planned
unified hierarchy as there's gonna be single agent. Simply generating
events at fixed points, or if that's too restrictive, configurable
cadence or single set of configurable points should be enough.
"
didn't explain that to me and sounds more like "it is bad, so shove it
off to the user of the interface".
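
[Editor's illustration, not part of the original mail.]  For readers
who have not used the interface under discussion: userspace opens the
control file to watch, creates an eventfd, and writes the line
"<event_fd> <control_fd> <args>" into cgroup.event_control.  The sketch
below registers a memory usage threshold that way.  The function names
and the cgroup directory argument are hypothetical; the file names and
the line format follow the patch under review in this thread.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Build "<event_fd> <control_fd> <args>"; returns its length or -1. */
static int format_event_control(char *buf, size_t size, int efd, int cfd,
				const char *args)
{
	int len = snprintf(buf, size, "%d %d %s", efd, cfd, args);

	return (len < 0 || (size_t)len >= size) ? -1 : len;
}

/*
 * Ask memcg to signal an eventfd when usage crosses @threshold bytes.
 * @cgroup_dir is the (hypothetical) path of a memcg directory.
 * Returns the eventfd to poll/read, or -1 on error.
 */
static int register_usage_threshold(const char *cgroup_dir,
				    unsigned long long threshold)
{
	char path[4096], args[32], line[96];
	int efd = -1, cfd = -1, ctl = -1, len, ret = -1;

	snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cgroup_dir);
	cfd = open(path, O_RDONLY);
	if (cfd < 0)
		goto out;

	snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	ctl = open(path, O_WRONLY);
	if (ctl < 0)
		goto out;

	efd = eventfd(0, 0);
	if (efd < 0)
		goto out;

	snprintf(args, sizeof(args), "%llu", threshold);
	len = format_event_control(line, sizeof(line), efd, cfd, args);
	if (len > 0 && write(ctl, line, len) == len)
		ret = efd;
out:
	/* Only the eventfd has to stay open once registration succeeded. */
	if (ret < 0 && efd >= 0)
		close(efd);
	if (ctl >= 0)
		close(ctl);
	if (cfd >= 0)
		close(cfd);
	return ret;
}
```

A caller would then block in read() on the returned eventfd; each
successful read means the threshold was crossed at least once.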

Update the changelog with the more specific information you have
provided in this email and I will have no problem accepting the patch as
the benefit clearly outweighs the maintenance burden in memcg.

[...]

Thanks for the clarification.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v3 06/12] cgroup, memcg: move cgroup_event implementation to memcg
       [not found]     ` <1376582550-12548-7-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2013-08-27 14:20       ` Michal Hocko
  2013-08-29 18:19       ` [PATCH v3 " Tejun Heo
@ 2013-08-29 18:19       ` Tejun Heo
  2 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-29 18:19 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello, Michal.

Updated as requested.  If there's still something missing, please let
me know.  In case you're okay with the change, how do you wanna route
it?  01-05 are already in cgroup/for-3.12 and the rest can either be
applied to cgroup/for-3.12, which will likely require some, mostly
trivial, adjustments to memcg patches in -mm, or these all can be
routed through -mm.  Any preference?

Thanks.

------ 8< ------
cgroup_event is way over-designed and tries to build a generic
flexible event mechanism into cgroup - fully customizable event
specification for each user of the interface.  This is utterly
unnecessary and overboard especially in the light of the planned
unified hierarchy as there's gonna be a single agent.  Simply generating
events at fixed points, or if that's too restrictive, configurable
cadence or single set of configurable points should be enough.

In addition, it's adding overhead for future changes for cgroup core.

* Everything is being made per-css which is necessary as css's
  lifetime would no longer coincide with cgroup's in the planned
  unified hierarchy.  Keeping cgroup_event in cgroup proper means that
  it should be updated to be managed per-css, which is silly as no
  other subsystems will use this facility.

* The file system interface of cgroup will be restructured so that it
  uses sysfs.  The goal of the conversion is losing duplicate logic
  and taking advantage of features provided by sysfs such as revoke
  semantics on file deletion and decoupled synchronization from vfs
  layer.

  The conversion is likely to involve unifying all different file
  operation types into a single type and having operations which are
  as unusual as event callbacks adds unnecessary burden.  While making
  it specific to memcg doesn't remove all conversion work, it'd lower
  the amount of work necessary.  We don't have to worry how the
  unusual event callbacks would fit into the new structure.

Thankfully, memcg is the only user and gets to keep it.  Replacing it
with something simpler on sane_behavior is strongly recommended.

This patch moves cgroup_event and "cgroup.event_control"
implementation to mm/memcontrol.c.  Clearing of events on cgroup
destruction is moved from cgroup_destroy_locked() to
mem_cgroup_css_offline(), which shouldn't make any noticeable
difference.

cgroup_css() and __file_cft() are exported to enable the move;
however, this will soon be reverted once the event code is updated to
be memcg specific.

Note that "cgroup.event_control" will now exist only on the hierarchy
with memcg attached to it.  While this change is visible to userland,
it is unlikely to be noticeable as the file has never been meaningful
outside memcg.

Aside from the above change, this is pure code relocation.

v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
    poll.h inclusion moved from cgroup.c to memcontrol.c.

v3: Per Michal's request, update patch description.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/cgroup.h |    5 
 init/Kconfig           |    3 
 kernel/cgroup.c        |  252 -------------------------------------------------
 mm/memcontrol.c        |  247 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 256 insertions(+), 251 deletions(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -907,6 +907,11 @@ unsigned short css_id(struct cgroup_subs
 struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
 					 struct cgroup_subsys *ss);
 
+/* XXX: temporary */
+struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+				       struct cgroup_subsys *ss);
+struct cftype *__file_cft(struct file *file);
+
 #else /* !CONFIG_CGROUPS */
 
 static inline int cgroup_init_early(void) { return 0; }
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -844,7 +844,6 @@ config NUMA_BALANCING
 
 menuconfig CGROUPS
 	boolean "Control Group support"
-	depends on EVENTFD
 	help
 	  This option adds support for grouping sets of processes together, for
 	  use with process control subsystems such as Cpusets, CFS, memory
@@ -911,6 +910,7 @@ config MEMCG
 	bool "Memory Resource Controller for Control Groups"
 	depends on RESOURCE_COUNTERS
 	select MM_OWNER
+	select EVENTFD
 	help
 	  Provides a memory resource controller that manages both anonymous
 	  memory and page cache. (See Documentation/cgroups/memory.txt)
@@ -1163,7 +1163,6 @@ config UIDGID_STRICT_TYPE_CHECKS
 
 config SCHED_AUTOGROUP
 	bool "Automatic process group scheduling"
-	select EVENTFD
 	select CGROUPS
 	select CGROUP_SCHED
 	select FAIR_GROUP_SCHED
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -56,8 +56,6 @@
 #include <linux/pid_namespace.h>
 #include <linux/idr.h>
 #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
-#include <linux/eventfd.h>
-#include <linux/poll.h>
 #include <linux/flex_array.h> /* used in cgroup_attach_task */
 #include <linux/kthread.h>
 
@@ -155,36 +153,6 @@ struct css_id {
 	unsigned short stack[0]; /* Array of Length (depth+1) */
 };
 
-/*
- * cgroup_event represents events which userspace want to receive.
- */
-struct cgroup_event {
-	/*
-	 * css which the event belongs to.
-	 */
-	struct cgroup_subsys_state *css;
-	/*
-	 * Control file which the event associated.
-	 */
-	struct cftype *cft;
-	/*
-	 * eventfd to signal userspace about the event.
-	 */
-	struct eventfd_ctx *eventfd;
-	/*
-	 * Each of these stored in a list by the cgroup.
-	 */
-	struct list_head list;
-	/*
-	 * All fields below needed to unregister event when
-	 * userspace closes eventfd.
-	 */
-	poll_table pt;
-	wait_queue_head_t *wqh;
-	wait_queue_t wait;
-	struct work_struct remove;
-};
-
 /* The list of hierarchy roots */
 
 static LIST_HEAD(cgroup_roots);
@@ -234,8 +202,8 @@ static int cgroup_addrm_files(struct cgr
  * keep accessing it outside the said locks.  This function may return
  * %NULL if @cgrp doesn't have @subsys_id enabled.
  */
-static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
-					      struct cgroup_subsys *ss)
+struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+				       struct cgroup_subsys *ss)
 {
 	if (ss)
 		return rcu_dereference_check(cgrp->subsys[ss->subsys_id],
@@ -2671,7 +2639,7 @@ static struct dentry *cgroup_lookup(stru
 /*
  * Check if a file is a control file
  */
-static inline struct cftype *__file_cft(struct file *file)
+struct cftype *__file_cft(struct file *file)
 {
 	if (file_inode(file)->i_fop != &cgroup_file_operations)
 		return ERR_PTR(-EINVAL);
@@ -3959,202 +3927,6 @@ static void cgroup_dput(struct cgroup *c
 	deactivate_super(sb);
 }
 
-/*
- * Unregister event and free resources.
- *
- * Gets called from workqueue.
- */
-static void cgroup_event_remove(struct work_struct *work)
-{
-	struct cgroup_event *event = container_of(work, struct cgroup_event,
-			remove);
-	struct cgroup_subsys_state *css = event->css;
-
-	remove_wait_queue(event->wqh, &event->wait);
-
-	event->cft->unregister_event(css, event->cft, event->eventfd);
-
-	/* Notify userspace the event is going away. */
-	eventfd_signal(event->eventfd, 1);
-
-	eventfd_ctx_put(event->eventfd);
-	kfree(event);
-	css_put(css);
-}
-
-/*
- * Gets called on POLLHUP on eventfd when user closes it.
- *
- * Called with wqh->lock held and interrupts disabled.
- */
-static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
-		int sync, void *key)
-{
-	struct cgroup_event *event = container_of(wait,
-			struct cgroup_event, wait);
-	struct cgroup *cgrp = event->css->cgroup;
-	unsigned long flags = (unsigned long)key;
-
-	if (flags & POLLHUP) {
-		/*
-		 * If the event has been detached at cgroup removal, we
-		 * can simply return knowing the other side will cleanup
-		 * for us.
-		 *
-		 * We can't race against event freeing since the other
-		 * side will require wqh->lock via remove_wait_queue(),
-		 * which we hold.
-		 */
-		spin_lock(&cgrp->event_list_lock);
-		if (!list_empty(&event->list)) {
-			list_del_init(&event->list);
-			/*
-			 * We are in atomic context, but cgroup_event_remove()
-			 * may sleep, so we have to call it in workqueue.
-			 */
-			schedule_work(&event->remove);
-		}
-		spin_unlock(&cgrp->event_list_lock);
-	}
-
-	return 0;
-}
-
-static void cgroup_event_ptable_queue_proc(struct file *file,
-		wait_queue_head_t *wqh, poll_table *pt)
-{
-	struct cgroup_event *event = container_of(pt,
-			struct cgroup_event, pt);
-
-	event->wqh = wqh;
-	add_wait_queue(wqh, &event->wait);
-}
-
-/*
- * Parse input and register new cgroup event handler.
- *
- * Input must be in format '<event_fd> <control_fd> <args>'.
- * Interpretation of args is defined by control file implementation.
- */
-static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
-				      struct cftype *cft, const char *buffer)
-{
-	struct cgroup *cgrp = dummy_css->cgroup;
-	struct cgroup_event *event;
-	struct cgroup_subsys_state *cfile_css;
-	unsigned int efd, cfd;
-	struct file *efile;
-	struct file *cfile;
-	char *endp;
-	int ret;
-
-	efd = simple_strtoul(buffer, &endp, 10);
-	if (*endp != ' ')
-		return -EINVAL;
-	buffer = endp + 1;
-
-	cfd = simple_strtoul(buffer, &endp, 10);
-	if ((*endp != ' ') && (*endp != '\0'))
-		return -EINVAL;
-	buffer = endp + 1;
-
-	event = kzalloc(sizeof(*event), GFP_KERNEL);
-	if (!event)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&event->list);
-	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
-	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
-	INIT_WORK(&event->remove, cgroup_event_remove);
-
-	efile = eventfd_fget(efd);
-	if (IS_ERR(efile)) {
-		ret = PTR_ERR(efile);
-		goto out_kfree;
-	}
-
-	event->eventfd = eventfd_ctx_fileget(efile);
-	if (IS_ERR(event->eventfd)) {
-		ret = PTR_ERR(event->eventfd);
-		goto out_put_efile;
-	}
-
-	cfile = fget(cfd);
-	if (!cfile) {
-		ret = -EBADF;
-		goto out_put_eventfd;
-	}
-
-	/* the process need read permission on control file */
-	/* AV: shouldn't we check that it's been opened for read instead? */
-	ret = inode_permission(file_inode(cfile), MAY_READ);
-	if (ret < 0)
-		goto out_put_cfile;
-
-	event->cft = __file_cft(cfile);
-	if (IS_ERR(event->cft)) {
-		ret = PTR_ERR(event->cft);
-		goto out_put_cfile;
-	}
-
-	if (!event->cft->ss) {
-		ret = -EBADF;
-		goto out_put_cfile;
-	}
-
-	/*
-	 * Determine the css of @cfile, verify it belongs to the same
-	 * cgroup as cgroup.event_control, and associate @event with it.
-	 * Remaining events are automatically removed on cgroup destruction
-	 * but the removal is asynchronous, so take an extra ref.
-	 */
-	rcu_read_lock();
-
-	ret = -EINVAL;
-	event->css = cgroup_css(cgrp, event->cft->ss);
-	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
-	if (event->css && event->css == cfile_css && css_tryget(event->css))
-		ret = 0;
-
-	rcu_read_unlock();
-	if (ret)
-		goto out_put_cfile;
-
-	if (!event->cft->register_event || !event->cft->unregister_event) {
-		ret = -EINVAL;
-		goto out_put_css;
-	}
-
-	ret = event->cft->register_event(event->css, event->cft,
-			event->eventfd, buffer);
-	if (ret)
-		goto out_put_css;
-
-	efile->f_op->poll(efile, &event->pt);
-
-	spin_lock(&cgrp->event_list_lock);
-	list_add(&event->list, &cgrp->event_list);
-	spin_unlock(&cgrp->event_list_lock);
-
-	fput(cfile);
-	fput(efile);
-
-	return 0;
-
-out_put_css:
-	css_put(event->css);
-out_put_cfile:
-	fput(cfile);
-out_put_eventfd:
-	eventfd_ctx_put(event->eventfd);
-out_put_efile:
-	fput(efile);
-out_kfree:
-	kfree(event);
-
-	return ret;
-}
-
 static u64 cgroup_clone_children_read(struct cgroup_subsys_state *css,
 				      struct cftype *cft)
 {
@@ -4180,11 +3952,6 @@ static struct cftype cgroup_base_files[]
 		.mode = S_IRUGO | S_IWUSR,
 	},
 	{
-		.name = "cgroup.event_control",
-		.write_string = cgroup_write_event_control,
-		.mode = S_IWUGO,
-	},
-	{
 		.name = "cgroup.clone_children",
 		.flags = CFTYPE_INSANE,
 		.read_u64 = cgroup_clone_children_read,
@@ -4676,7 +4443,6 @@ static int cgroup_destroy_locked(struct
 	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
 {
 	struct dentry *d = cgrp->dentry;
-	struct cgroup_event *event, *tmp;
 	struct cgroup_subsys *ss;
 	bool empty;
 
@@ -4734,18 +4500,6 @@ static int cgroup_destroy_locked(struct
 	dget(d);
 	cgroup_d_remove_dir(d);
 
-	/*
-	 * Unregister events and notify userspace.
-	 * Notify userspace about cgroup removing only after rmdir of cgroup
-	 * directory to avoid race between userspace and kernelspace.
-	 */
-	spin_lock(&cgrp->event_list_lock);
-	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
-		list_del_init(&event->list);
-		schedule_work(&event->remove);
-	}
-	spin_unlock(&cgrp->event_list_lock);
-
 	return 0;
 };
 
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -45,6 +45,7 @@
 #include <linux/swapops.h>
 #include <linux/spinlock.h>
 #include <linux/eventfd.h>
+#include <linux/poll.h>
 #include <linux/sort.h>
 #include <linux/fs.h>
 #include <linux/seq_file.h>
@@ -239,6 +240,36 @@ struct mem_cgroup_eventfd_list {
 	struct eventfd_ctx *eventfd;
 };
 
+/*
+ * cgroup_event represents events which userspace want to receive.
+ */
+struct cgroup_event {
+	/*
+	 * css which the event belongs to.
+	 */
+	struct cgroup_subsys_state *css;
+	/*
+	 * Control file which the event associated.
+	 */
+	struct cftype *cft;
+	/*
+	 * eventfd to signal userspace about the event.
+	 */
+	struct eventfd_ctx *eventfd;
+	/*
+	 * Each of these stored in a list by the cgroup.
+	 */
+	struct list_head list;
+	/*
+	 * All fields below needed to unregister event when
+	 * userspace closes eventfd.
+	 */
+	poll_table pt;
+	wait_queue_head_t *wqh;
+	wait_queue_t wait;
+	struct work_struct remove;
+};
+
 static void mem_cgroup_threshold(struct mem_cgroup *memcg);
 static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
 
@@ -5919,6 +5950,202 @@ static void kmem_cgroup_css_offline(stru
 }
 #endif
 
+/*
+ * Unregister event and free resources.
+ *
+ * Gets called from workqueue.
+ */
+static void cgroup_event_remove(struct work_struct *work)
+{
+	struct cgroup_event *event = container_of(work, struct cgroup_event,
+			remove);
+	struct cgroup_subsys_state *css = event->css;
+
+	remove_wait_queue(event->wqh, &event->wait);
+
+	event->cft->unregister_event(css, event->cft, event->eventfd);
+
+	/* Notify userspace the event is going away. */
+	eventfd_signal(event->eventfd, 1);
+
+	eventfd_ctx_put(event->eventfd);
+	kfree(event);
+	css_put(css);
+}
+
+/*
+ * Gets called on POLLHUP on eventfd when user closes it.
+ *
+ * Called with wqh->lock held and interrupts disabled.
+ */
+static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
+		int sync, void *key)
+{
+	struct cgroup_event *event = container_of(wait,
+			struct cgroup_event, wait);
+	struct cgroup *cgrp = event->css->cgroup;
+	unsigned long flags = (unsigned long)key;
+
+	if (flags & POLLHUP) {
+		/*
+		 * If the event has been detached at cgroup removal, we
+		 * can simply return knowing the other side will cleanup
+		 * for us.
+		 *
+		 * We can't race against event freeing since the other
+		 * side will require wqh->lock via remove_wait_queue(),
+		 * which we hold.
+		 */
+		spin_lock(&cgrp->event_list_lock);
+		if (!list_empty(&event->list)) {
+			list_del_init(&event->list);
+			/*
+			 * We are in atomic context, but cgroup_event_remove()
+			 * may sleep, so we have to call it in workqueue.
+			 */
+			schedule_work(&event->remove);
+		}
+		spin_unlock(&cgrp->event_list_lock);
+	}
+
+	return 0;
+}
+
+static void cgroup_event_ptable_queue_proc(struct file *file,
+		wait_queue_head_t *wqh, poll_table *pt)
+{
+	struct cgroup_event *event = container_of(pt,
+			struct cgroup_event, pt);
+
+	event->wqh = wqh;
+	add_wait_queue(wqh, &event->wait);
+}
+
+/*
+ * Parse input and register new cgroup event handler.
+ *
+ * Input must be in format '<event_fd> <control_fd> <args>'.
+ * Interpretation of args is defined by control file implementation.
+ */
+static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
+				      struct cftype *cft, const char *buffer)
+{
+	struct cgroup *cgrp = dummy_css->cgroup;
+	struct cgroup_event *event;
+	struct cgroup_subsys_state *cfile_css;
+	unsigned int efd, cfd;
+	struct file *efile;
+	struct file *cfile;
+	char *endp;
+	int ret;
+
+	efd = simple_strtoul(buffer, &endp, 10);
+	if (*endp != ' ')
+		return -EINVAL;
+	buffer = endp + 1;
+
+	cfd = simple_strtoul(buffer, &endp, 10);
+	if ((*endp != ' ') && (*endp != '\0'))
+		return -EINVAL;
+	buffer = endp + 1;
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&event->list);
+	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
+	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
+	INIT_WORK(&event->remove, cgroup_event_remove);
+
+	efile = eventfd_fget(efd);
+	if (IS_ERR(efile)) {
+		ret = PTR_ERR(efile);
+		goto out_kfree;
+	}
+
+	event->eventfd = eventfd_ctx_fileget(efile);
+	if (IS_ERR(event->eventfd)) {
+		ret = PTR_ERR(event->eventfd);
+		goto out_put_efile;
+	}
+
+	cfile = fget(cfd);
+	if (!cfile) {
+		ret = -EBADF;
+		goto out_put_eventfd;
+	}
+
+	/* the process need read permission on control file */
+	/* AV: shouldn't we check that it's been opened for read instead? */
+	ret = inode_permission(file_inode(cfile), MAY_READ);
+	if (ret < 0)
+		goto out_put_cfile;
+
+	event->cft = __file_cft(cfile);
+	if (IS_ERR(event->cft)) {
+		ret = PTR_ERR(event->cft);
+		goto out_put_cfile;
+	}
+
+	if (!event->cft->ss) {
+		ret = -EBADF;
+		goto out_put_cfile;
+	}
+
+	/*
+	 * Determine the css of @cfile, verify it belongs to the same
+	 * cgroup as cgroup.event_control, and associate @event with it.
+	 * Remaining events are automatically removed on cgroup destruction
+	 * but the removal is asynchronous, so take an extra ref.
+	 */
+	rcu_read_lock();
+
+	ret = -EINVAL;
+	event->css = cgroup_css(cgrp, event->cft->ss);
+	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
+	if (event->css && event->css == cfile_css && css_tryget(event->css))
+		ret = 0;
+
+	rcu_read_unlock();
+	if (ret)
+		goto out_put_cfile;
+
+	if (!event->cft->register_event || !event->cft->unregister_event) {
+		ret = -EINVAL;
+		goto out_put_css;
+	}
+
+	ret = event->cft->register_event(event->css, event->cft,
+			event->eventfd, buffer);
+	if (ret)
+		goto out_put_css;
+
+	efile->f_op->poll(efile, &event->pt);
+
+	spin_lock(&cgrp->event_list_lock);
+	list_add(&event->list, &cgrp->event_list);
+	spin_unlock(&cgrp->event_list_lock);
+
+	fput(cfile);
+	fput(efile);
+
+	return 0;
+
+out_put_css:
+	css_put(event->css);
+out_put_cfile:
+	fput(cfile);
+out_put_eventfd:
+	eventfd_ctx_put(event->eventfd);
+out_put_efile:
+	fput(efile);
+out_kfree:
+	kfree(event);
+
+	return ret;
+}
+
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
@@ -5966,6 +6193,12 @@ static struct cftype mem_cgroup_files[]
 		.read_u64 = mem_cgroup_hierarchy_read,
 	},
 	{
+		.name = "cgroup.event_control",
+		.write_string = cgroup_write_event_control,
+		.flags = CFTYPE_NO_PREFIX,
+		.mode = S_IWUGO,
+	},
+	{
 		.name = "swappiness",
 		.read_u64 = mem_cgroup_swappiness_read,
 		.write_u64 = mem_cgroup_swappiness_write,
@@ -6298,6 +6531,20 @@ static void mem_cgroup_invalidate_reclai
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+	struct cgroup *cgrp = css->cgroup;
+	struct cgroup_event *event, *tmp;
+
+	/*
+	 * Unregister events and notify userspace.
+	 * Notify userspace about cgroup removing only after rmdir of cgroup
+	 * directory to avoid race between userspace and kernelspace.
+	 */
+	spin_lock(&cgrp->event_list_lock);
+	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
+		list_del_init(&event->list);
+		schedule_work(&event->remove);
+	}
+	spin_unlock(&cgrp->event_list_lock);
 
 	kmem_cgroup_css_offline(memcg);


* [PATCH v3 06/12] cgroup, memcg: move cgroup_event implementation to memcg
       [not found]     ` <1376582550-12548-7-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2013-08-27 14:20       ` Michal Hocko
@ 2013-08-29 18:19       ` Tejun Heo
       [not found]         ` <20130829181911.GA8517-9pTldWuhBndy/B6EtB590w@public.gmane.org>
  2013-08-29 18:19       ` Tejun Heo
  2 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2013-08-29 18:19 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hello, Michal.

Updated as requested.  If there's still something missing, please let
me know.  In case you're okay with the change, how do you wanna route
it?  01-05 are already in cgroup/for-3.12 and the rest can either be
applied to cgroup/for-3.12, which will likely require some, mostly
trivial, adjustments to memcg patches in -mm, or these all can be
routed through -mm.  Any preference?

Thanks.

------ 8< ------
cgroup_event is way over-designed and tries to build a generic
flexible event mechanism into cgroup - fully customizable event
specification for each user of the interface.  This is utterly
unnecessary and overboard especially in the light of the planned
unified hierarchy as there's gonna be a single agent.  Simply
generating events at fixed points, or if that's too restrictive, at a
configurable cadence or a single set of configurable points should be
enough.

In addition, it adds overhead to future changes in cgroup core.

* Everything is being made per-css which is necessary as css's
  lifetime would no longer coincide with cgroup's in the planned
  unified hierarchy.  Keeping cgroup_event in cgroup proper means that
  it should be updated to be managed per-css, which is silly as no
  other subsystems will use this facility.

* The file system interface of cgroup will be restructured so that it
  uses sysfs.  The goal of the conversion is losing duplicate logic
  and taking advantage of features provided by sysfs such as revoke
  semantics on file deletion and decoupled synchronization from vfs
  layer.

  The conversion is likely to involve unifying all different file
  operation types into a single type and having operations which are
  as unusual as event callbacks adds unnecessary burden.  While making
  it specific to memcg doesn't remove all conversion work, it'd lower
  the amount of work necessary.  We don't have to worry about how the
  unusual event callbacks would fit into the new structure.

Thankfully, memcg is the only user and gets to keep it.  Replacing it
with something simpler on sane_behavior is strongly recommended.

This patch moves cgroup_event and "cgroup.event_control"
implementation to mm/memcontrol.c.  Clearing of events on cgroup
destruction is moved from cgroup_destroy_locked() to
mem_cgroup_css_offline(), which shouldn't make any noticeable
difference.

cgroup_css() and __file_cft() are exported to enable the move;
however, this will soon be reverted once the event code is updated to
be memcg specific.

Note that "cgroup.event_control" will now exist only on the hierarchy
with memcg attached to it.  While this change is visible to userland,
it is unlikely to be noticeable as the file has never been meaningful
outside memcg.

Aside from the above change, this is pure code relocation.
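
Because this is relocation only, the string userland writes to
"cgroup.event_control" keeps the '<event_fd> <control_fd> <args>'
layout documented above cgroup_write_event_control().  A minimal
userland sketch of building that string follows.  The helper name
format_event_control() and its return convention are assumptions made
for illustration and are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): build the string that
 * userland writes to "cgroup.event_control".  The layout is
 * '<event_fd> <control_fd> <args>'; how @args is interpreted is left
 * to the control file implementation.
 *
 * Returns 0 on success, -1 if @buf is too small for the result.
 */
static int format_event_control(char *buf, size_t len, int event_fd,
				int control_fd, const char *args)
{
	int n;

	assert(buf && args);

	n = snprintf(buf, len, "%d %d %s", event_fd, control_fd, args);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or truncated */
	return 0;
}
```

In practice @event_fd would come from eventfd(2) and @control_fd from
opening a memcg control file in the same cgroup; the resulting string
is then written to that cgroup's "cgroup.event_control".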

v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
    poll.h inclusion moved from cgroup.c to memcontrol.c.

v3: Per Michal's request, update patch description.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/cgroup.h |    5 
 init/Kconfig           |    3 
 kernel/cgroup.c        |  252 -------------------------------------------------
 mm/memcontrol.c        |  247 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 256 insertions(+), 251 deletions(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -907,6 +907,11 @@ unsigned short css_id(struct cgroup_subs
 struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
 					 struct cgroup_subsys *ss);
 
+/* XXX: temporary */
+struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+				       struct cgroup_subsys *ss);
+struct cftype *__file_cft(struct file *file);
+
 #else /* !CONFIG_CGROUPS */
 
 static inline int cgroup_init_early(void) { return 0; }
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -844,7 +844,6 @@ config NUMA_BALANCING
 
 menuconfig CGROUPS
 	boolean "Control Group support"
-	depends on EVENTFD
 	help
 	  This option adds support for grouping sets of processes together, for
 	  use with process control subsystems such as Cpusets, CFS, memory
@@ -911,6 +910,7 @@ config MEMCG
 	bool "Memory Resource Controller for Control Groups"
 	depends on RESOURCE_COUNTERS
 	select MM_OWNER
+	select EVENTFD
 	help
 	  Provides a memory resource controller that manages both anonymous
 	  memory and page cache. (See Documentation/cgroups/memory.txt)
@@ -1163,7 +1163,6 @@ config UIDGID_STRICT_TYPE_CHECKS
 
 config SCHED_AUTOGROUP
 	bool "Automatic process group scheduling"
-	select EVENTFD
 	select CGROUPS
 	select CGROUP_SCHED
 	select FAIR_GROUP_SCHED
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -56,8 +56,6 @@
 #include <linux/pid_namespace.h>
 #include <linux/idr.h>
 #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
-#include <linux/eventfd.h>
-#include <linux/poll.h>
 #include <linux/flex_array.h> /* used in cgroup_attach_task */
 #include <linux/kthread.h>
 
@@ -155,36 +153,6 @@ struct css_id {
 	unsigned short stack[0]; /* Array of Length (depth+1) */
 };
 
-/*
- * cgroup_event represents events which userspace want to receive.
- */
-struct cgroup_event {
-	/*
-	 * css which the event belongs to.
-	 */
-	struct cgroup_subsys_state *css;
-	/*
-	 * Control file which the event associated.
-	 */
-	struct cftype *cft;
-	/*
-	 * eventfd to signal userspace about the event.
-	 */
-	struct eventfd_ctx *eventfd;
-	/*
-	 * Each of these stored in a list by the cgroup.
-	 */
-	struct list_head list;
-	/*
-	 * All fields below needed to unregister event when
-	 * userspace closes eventfd.
-	 */
-	poll_table pt;
-	wait_queue_head_t *wqh;
-	wait_queue_t wait;
-	struct work_struct remove;
-};
-
 /* The list of hierarchy roots */
 
 static LIST_HEAD(cgroup_roots);
@@ -234,8 +202,8 @@ static int cgroup_addrm_files(struct cgr
  * keep accessing it outside the said locks.  This function may return
  * %NULL if @cgrp doesn't have @subsys_id enabled.
  */
-static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
-					      struct cgroup_subsys *ss)
+struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+				       struct cgroup_subsys *ss)
 {
 	if (ss)
 		return rcu_dereference_check(cgrp->subsys[ss->subsys_id],
@@ -2671,7 +2639,7 @@ static struct dentry *cgroup_lookup(stru
 /*
  * Check if a file is a control file
  */
-static inline struct cftype *__file_cft(struct file *file)
+struct cftype *__file_cft(struct file *file)
 {
 	if (file_inode(file)->i_fop != &cgroup_file_operations)
 		return ERR_PTR(-EINVAL);
@@ -3959,202 +3927,6 @@ static void cgroup_dput(struct cgroup *c
 	deactivate_super(sb);
 }
 
-/*
- * Unregister event and free resources.
- *
- * Gets called from workqueue.
- */
-static void cgroup_event_remove(struct work_struct *work)
-{
-	struct cgroup_event *event = container_of(work, struct cgroup_event,
-			remove);
-	struct cgroup_subsys_state *css = event->css;
-
-	remove_wait_queue(event->wqh, &event->wait);
-
-	event->cft->unregister_event(css, event->cft, event->eventfd);
-
-	/* Notify userspace the event is going away. */
-	eventfd_signal(event->eventfd, 1);
-
-	eventfd_ctx_put(event->eventfd);
-	kfree(event);
-	css_put(css);
-}
-
-/*
- * Gets called on POLLHUP on eventfd when user closes it.
- *
- * Called with wqh->lock held and interrupts disabled.
- */
-static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
-		int sync, void *key)
-{
-	struct cgroup_event *event = container_of(wait,
-			struct cgroup_event, wait);
-	struct cgroup *cgrp = event->css->cgroup;
-	unsigned long flags = (unsigned long)key;
-
-	if (flags & POLLHUP) {
-		/*
-		 * If the event has been detached at cgroup removal, we
-		 * can simply return knowing the other side will cleanup
-		 * for us.
-		 *
-		 * We can't race against event freeing since the other
-		 * side will require wqh->lock via remove_wait_queue(),
-		 * which we hold.
-		 */
-		spin_lock(&cgrp->event_list_lock);
-		if (!list_empty(&event->list)) {
-			list_del_init(&event->list);
-			/*
-			 * We are in atomic context, but cgroup_event_remove()
-			 * may sleep, so we have to call it in workqueue.
-			 */
-			schedule_work(&event->remove);
-		}
-		spin_unlock(&cgrp->event_list_lock);
-	}
-
-	return 0;
-}
-
-static void cgroup_event_ptable_queue_proc(struct file *file,
-		wait_queue_head_t *wqh, poll_table *pt)
-{
-	struct cgroup_event *event = container_of(pt,
-			struct cgroup_event, pt);
-
-	event->wqh = wqh;
-	add_wait_queue(wqh, &event->wait);
-}
-
-/*
- * Parse input and register new cgroup event handler.
- *
- * Input must be in format '<event_fd> <control_fd> <args>'.
- * Interpretation of args is defined by control file implementation.
- */
-static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
-				      struct cftype *cft, const char *buffer)
-{
-	struct cgroup *cgrp = dummy_css->cgroup;
-	struct cgroup_event *event;
-	struct cgroup_subsys_state *cfile_css;
-	unsigned int efd, cfd;
-	struct file *efile;
-	struct file *cfile;
-	char *endp;
-	int ret;
-
-	efd = simple_strtoul(buffer, &endp, 10);
-	if (*endp != ' ')
-		return -EINVAL;
-	buffer = endp + 1;
-
-	cfd = simple_strtoul(buffer, &endp, 10);
-	if ((*endp != ' ') && (*endp != '\0'))
-		return -EINVAL;
-	buffer = endp + 1;
-
-	event = kzalloc(sizeof(*event), GFP_KERNEL);
-	if (!event)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&event->list);
-	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
-	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
-	INIT_WORK(&event->remove, cgroup_event_remove);
-
-	efile = eventfd_fget(efd);
-	if (IS_ERR(efile)) {
-		ret = PTR_ERR(efile);
-		goto out_kfree;
-	}
-
-	event->eventfd = eventfd_ctx_fileget(efile);
-	if (IS_ERR(event->eventfd)) {
-		ret = PTR_ERR(event->eventfd);
-		goto out_put_efile;
-	}
-
-	cfile = fget(cfd);
-	if (!cfile) {
-		ret = -EBADF;
-		goto out_put_eventfd;
-	}
-
-	/* the process need read permission on control file */
-	/* AV: shouldn't we check that it's been opened for read instead? */
-	ret = inode_permission(file_inode(cfile), MAY_READ);
-	if (ret < 0)
-		goto out_put_cfile;
-
-	event->cft = __file_cft(cfile);
-	if (IS_ERR(event->cft)) {
-		ret = PTR_ERR(event->cft);
-		goto out_put_cfile;
-	}
-
-	if (!event->cft->ss) {
-		ret = -EBADF;
-		goto out_put_cfile;
-	}
-
-	/*
-	 * Determine the css of @cfile, verify it belongs to the same
-	 * cgroup as cgroup.event_control, and associate @event with it.
-	 * Remaining events are automatically removed on cgroup destruction
-	 * but the removal is asynchronous, so take an extra ref.
-	 */
-	rcu_read_lock();
-
-	ret = -EINVAL;
-	event->css = cgroup_css(cgrp, event->cft->ss);
-	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
-	if (event->css && event->css == cfile_css && css_tryget(event->css))
-		ret = 0;
-
-	rcu_read_unlock();
-	if (ret)
-		goto out_put_cfile;
-
-	if (!event->cft->register_event || !event->cft->unregister_event) {
-		ret = -EINVAL;
-		goto out_put_css;
-	}
-
-	ret = event->cft->register_event(event->css, event->cft,
-			event->eventfd, buffer);
-	if (ret)
-		goto out_put_css;
-
-	efile->f_op->poll(efile, &event->pt);
-
-	spin_lock(&cgrp->event_list_lock);
-	list_add(&event->list, &cgrp->event_list);
-	spin_unlock(&cgrp->event_list_lock);
-
-	fput(cfile);
-	fput(efile);
-
-	return 0;
-
-out_put_css:
-	css_put(event->css);
-out_put_cfile:
-	fput(cfile);
-out_put_eventfd:
-	eventfd_ctx_put(event->eventfd);
-out_put_efile:
-	fput(efile);
-out_kfree:
-	kfree(event);
-
-	return ret;
-}
-
 static u64 cgroup_clone_children_read(struct cgroup_subsys_state *css,
 				      struct cftype *cft)
 {
@@ -4180,11 +3952,6 @@ static struct cftype cgroup_base_files[]
 		.mode = S_IRUGO | S_IWUSR,
 	},
 	{
-		.name = "cgroup.event_control",
-		.write_string = cgroup_write_event_control,
-		.mode = S_IWUGO,
-	},
-	{
 		.name = "cgroup.clone_children",
 		.flags = CFTYPE_INSANE,
 		.read_u64 = cgroup_clone_children_read,
@@ -4676,7 +4443,6 @@ static int cgroup_destroy_locked(struct
 	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
 {
 	struct dentry *d = cgrp->dentry;
-	struct cgroup_event *event, *tmp;
 	struct cgroup_subsys *ss;
 	bool empty;
 
@@ -4734,18 +4500,6 @@ static int cgroup_destroy_locked(struct
 	dget(d);
 	cgroup_d_remove_dir(d);
 
-	/*
-	 * Unregister events and notify userspace.
-	 * Notify userspace about cgroup removing only after rmdir of cgroup
-	 * directory to avoid race between userspace and kernelspace.
-	 */
-	spin_lock(&cgrp->event_list_lock);
-	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
-		list_del_init(&event->list);
-		schedule_work(&event->remove);
-	}
-	spin_unlock(&cgrp->event_list_lock);
-
 	return 0;
 };
 
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -45,6 +45,7 @@
 #include <linux/swapops.h>
 #include <linux/spinlock.h>
 #include <linux/eventfd.h>
+#include <linux/poll.h>
 #include <linux/sort.h>
 #include <linux/fs.h>
 #include <linux/seq_file.h>
@@ -239,6 +240,36 @@ struct mem_cgroup_eventfd_list {
 	struct eventfd_ctx *eventfd;
 };
 
+/*
+ * cgroup_event represents events which userspace want to receive.
+ */
+struct cgroup_event {
+	/*
+	 * css which the event belongs to.
+	 */
+	struct cgroup_subsys_state *css;
+	/*
+	 * Control file which the event associated.
+	 */
+	struct cftype *cft;
+	/*
+	 * eventfd to signal userspace about the event.
+	 */
+	struct eventfd_ctx *eventfd;
+	/*
+	 * Each of these stored in a list by the cgroup.
+	 */
+	struct list_head list;
+	/*
+	 * All fields below needed to unregister event when
+	 * userspace closes eventfd.
+	 */
+	poll_table pt;
+	wait_queue_head_t *wqh;
+	wait_queue_t wait;
+	struct work_struct remove;
+};
+
 static void mem_cgroup_threshold(struct mem_cgroup *memcg);
 static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
 
@@ -5919,6 +5950,202 @@ static void kmem_cgroup_css_offline(stru
 }
 #endif
 
+/*
+ * Unregister event and free resources.
+ *
+ * Gets called from workqueue.
+ */
+static void cgroup_event_remove(struct work_struct *work)
+{
+	struct cgroup_event *event = container_of(work, struct cgroup_event,
+			remove);
+	struct cgroup_subsys_state *css = event->css;
+
+	remove_wait_queue(event->wqh, &event->wait);
+
+	event->cft->unregister_event(css, event->cft, event->eventfd);
+
+	/* Notify userspace the event is going away. */
+	eventfd_signal(event->eventfd, 1);
+
+	eventfd_ctx_put(event->eventfd);
+	kfree(event);
+	css_put(css);
+}
+
+/*
+ * Gets called on POLLHUP on eventfd when user closes it.
+ *
+ * Called with wqh->lock held and interrupts disabled.
+ */
+static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
+		int sync, void *key)
+{
+	struct cgroup_event *event = container_of(wait,
+			struct cgroup_event, wait);
+	struct cgroup *cgrp = event->css->cgroup;
+	unsigned long flags = (unsigned long)key;
+
+	if (flags & POLLHUP) {
+		/*
+		 * If the event has been detached at cgroup removal, we
+		 * can simply return knowing the other side will cleanup
+		 * for us.
+		 *
+		 * We can't race against event freeing since the other
+		 * side will require wqh->lock via remove_wait_queue(),
+		 * which we hold.
+		 */
+		spin_lock(&cgrp->event_list_lock);
+		if (!list_empty(&event->list)) {
+			list_del_init(&event->list);
+			/*
+			 * We are in atomic context, but cgroup_event_remove()
+			 * may sleep, so we have to call it in workqueue.
+			 */
+			schedule_work(&event->remove);
+		}
+		spin_unlock(&cgrp->event_list_lock);
+	}
+
+	return 0;
+}
+
+static void cgroup_event_ptable_queue_proc(struct file *file,
+		wait_queue_head_t *wqh, poll_table *pt)
+{
+	struct cgroup_event *event = container_of(pt,
+			struct cgroup_event, pt);
+
+	event->wqh = wqh;
+	add_wait_queue(wqh, &event->wait);
+}
+
+/*
+ * Parse input and register new cgroup event handler.
+ *
+ * Input must be in format '<event_fd> <control_fd> <args>'.
+ * Interpretation of args is defined by control file implementation.
+ */
+static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
+				      struct cftype *cft, const char *buffer)
+{
+	struct cgroup *cgrp = dummy_css->cgroup;
+	struct cgroup_event *event;
+	struct cgroup_subsys_state *cfile_css;
+	unsigned int efd, cfd;
+	struct file *efile;
+	struct file *cfile;
+	char *endp;
+	int ret;
+
+	efd = simple_strtoul(buffer, &endp, 10);
+	if (*endp != ' ')
+		return -EINVAL;
+	buffer = endp + 1;
+
+	cfd = simple_strtoul(buffer, &endp, 10);
+	if ((*endp != ' ') && (*endp != '\0'))
+		return -EINVAL;
+	buffer = endp + 1;
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&event->list);
+	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
+	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
+	INIT_WORK(&event->remove, cgroup_event_remove);
+
+	efile = eventfd_fget(efd);
+	if (IS_ERR(efile)) {
+		ret = PTR_ERR(efile);
+		goto out_kfree;
+	}
+
+	event->eventfd = eventfd_ctx_fileget(efile);
+	if (IS_ERR(event->eventfd)) {
+		ret = PTR_ERR(event->eventfd);
+		goto out_put_efile;
+	}
+
+	cfile = fget(cfd);
+	if (!cfile) {
+		ret = -EBADF;
+		goto out_put_eventfd;
+	}
+
+	/* the process need read permission on control file */
+	/* AV: shouldn't we check that it's been opened for read instead? */
+	ret = inode_permission(file_inode(cfile), MAY_READ);
+	if (ret < 0)
+		goto out_put_cfile;
+
+	event->cft = __file_cft(cfile);
+	if (IS_ERR(event->cft)) {
+		ret = PTR_ERR(event->cft);
+		goto out_put_cfile;
+	}
+
+	if (!event->cft->ss) {
+		ret = -EBADF;
+		goto out_put_cfile;
+	}
+
+	/*
+	 * Determine the css of @cfile, verify it belongs to the same
+	 * cgroup as cgroup.event_control, and associate @event with it.
+	 * Remaining events are automatically removed on cgroup destruction
+	 * but the removal is asynchronous, so take an extra ref.
+	 */
+	rcu_read_lock();
+
+	ret = -EINVAL;
+	event->css = cgroup_css(cgrp, event->cft->ss);
+	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
+	if (event->css && event->css == cfile_css && css_tryget(event->css))
+		ret = 0;
+
+	rcu_read_unlock();
+	if (ret)
+		goto out_put_cfile;
+
+	if (!event->cft->register_event || !event->cft->unregister_event) {
+		ret = -EINVAL;
+		goto out_put_css;
+	}
+
+	ret = event->cft->register_event(event->css, event->cft,
+			event->eventfd, buffer);
+	if (ret)
+		goto out_put_css;
+
+	efile->f_op->poll(efile, &event->pt);
+
+	spin_lock(&cgrp->event_list_lock);
+	list_add(&event->list, &cgrp->event_list);
+	spin_unlock(&cgrp->event_list_lock);
+
+	fput(cfile);
+	fput(efile);
+
+	return 0;
+
+out_put_css:
+	css_put(event->css);
+out_put_cfile:
+	fput(cfile);
+out_put_eventfd:
+	eventfd_ctx_put(event->eventfd);
+out_put_efile:
+	fput(efile);
+out_kfree:
+	kfree(event);
+
+	return ret;
+}
+
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
@@ -5966,6 +6193,12 @@ static struct cftype mem_cgroup_files[]
 		.read_u64 = mem_cgroup_hierarchy_read,
 	},
 	{
+		.name = "cgroup.event_control",
+		.write_string = cgroup_write_event_control,
+		.flags = CFTYPE_NO_PREFIX,
+		.mode = S_IWUGO,
+	},
+	{
 		.name = "swappiness",
 		.read_u64 = mem_cgroup_swappiness_read,
 		.write_u64 = mem_cgroup_swappiness_write,
@@ -6298,6 +6531,20 @@ static void mem_cgroup_invalidate_reclai
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+	struct cgroup *cgrp = css->cgroup;
+	struct cgroup_event *event, *tmp;
+
+	/*
+	 * Unregister events and notify userspace.
+	 * Notify userspace about cgroup removing only after rmdir of cgroup
+	 * directory to avoid race between userspace and kernelspace.
+	 */
+	spin_lock(&cgrp->event_list_lock);
+	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
+		list_del_init(&event->list);
+		schedule_work(&event->remove);
+	}
+	spin_unlock(&cgrp->event_list_lock);
 
 	kmem_cgroup_css_offline(memcg);
 

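For readers following the relocated code, the parsing step at the top
of cgroup_write_event_control() can be tried out in userspace.  The
sketch below is an approximation under stated assumptions:
parse_event_control() and its out-parameters are hypothetical names,
and strtoul() stands in for the kernel's simple_strtoul() (strtoul()
additionally skips leading whitespace and accepts a sign, so it is a
bit more permissive).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace sketch of the '<event_fd> <control_fd> <args>' parsing.
 * Not the kernel function; names are for illustration only.
 *
 * On success returns 0 and points *args at the rest of the input.
 */
static int parse_event_control(const char *buffer, unsigned int *efd,
			       unsigned int *cfd, const char **args)
{
	char *endp;

	assert(buffer && efd && cfd && args);

	/* <event_fd> must be followed by exactly one space */
	*efd = strtoul(buffer, &endp, 10);
	if (*endp != ' ')
		return -EINVAL;
	buffer = endp + 1;

	/* <control_fd> may be followed by a space or end the string */
	*cfd = strtoul(buffer, &endp, 10);
	if (*endp != ' ' && *endp != '\0')
		return -EINVAL;

	/*
	 * Deviation from the patch: do not step past a terminating
	 * NUL.  With no <args> present, hand back an empty string.
	 */
	*args = (*endp == ' ') ? endp + 1 : endp;
	return 0;
}
```

The sketch only covers the string handling; the fd lookups, permission
check and css matching that follow in the kernel function have no
userspace equivalent here.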

* Re: [PATCH v3 06/12] cgroup, memcg: move cgroup_event implementation to memcg
       [not found]         ` <20130829181911.GA8517-9pTldWuhBndy/B6EtB590w@public.gmane.org>
  2013-08-30 10:47           ` Michal Hocko
@ 2013-08-30 10:47           ` Michal Hocko
  1 sibling, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2013-08-30 10:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu 29-08-13 14:19:11, Tejun Heo wrote:
> Hello, Michal.
> 
> Updated as requested.  If there's still something missing, please let
> me know.  In case you're okay with the change, how do you wanna route
> it?  01-05 are already in cgroup/for-3.12 and the rest can either be
> applied to cgroup/for-3.12, which will likely require some, mostly
> trivial, adjustments to memcg patches in -mm, or these all can be
> routed through -mm.  Any preference?

I do not care much, but as Andrew wasn't CCed and this comes quite late,
he might be reluctant to take it now and push it in the next merge
window. So if you are in a hurry with this, route it via your tree.

> 
> Thanks.
> 
> ------ 8< ------
> cgroup_event is way over-designed and tries to build a generic
> flexible event mechanism into cgroup - fully customizable event
> specification for each user of the interface.  This is utterly
> unnecessary and overboard especially in the light of the planned
> unified hierarchy as there's gonna be a single agent.  Simply
> generating events at fixed points, or if that's too restrictive, at a
> configurable cadence or a single set of configurable points should be
> enough.
> 
> In addition, it adds overhead to future changes in cgroup core.
> 
> * Everything is being made per-css which is necessary as css's
>   lifetime would no longer coincide with cgroup's in the planned
>   unified hierarchy.  Keeping cgroup_event in cgroup proper means that
>   it should be updated to be managed per-css, which is silly as no
>   other subsystems will use this facility.
> 
> * The file system interface of cgroup will be restructured so that it
>   uses sysfs.  The goal of the conversion is losing duplicate logic
>   and taking advantage of features provided by sysfs such as revoke
>   semantics on file deletion and decoupled synchronization from vfs
>   layer.
> 
>   The conversion is likely to involve unifying all different file
>   operation types into a single type and having operations which are
>   as unusual as event callbacks adds unnecessary burden.  While making
>   it specific to memcg doesn't remove all conversion work, it'd lower
>   the amount of work necessary.  We don't have to worry about how the
>   unusual event callbacks would fit into the new structure.
> 
> Thankfully, memcg is the only user and gets to keep it.  Replacing it
> with something simpler on sane_behavior is strongly recommended.
> 
> This patch moves cgroup_event and "cgroup.event_control"
> implementation to mm/memcontrol.c.  Clearing of events on cgroup
> destruction is moved from cgroup_destroy_locked() to
> mem_cgroup_css_offline(), which shouldn't make any noticeable
> difference.
> 
> cgroup_css() and __file_cft() are exported to enable the move;
> however, this will soon be reverted once the event code is updated to
> be memcg specific.
> 
> Note that "cgroup.event_control" will now exist only on the hierarchy
> with memcg attached to it.  While this change is visible to userland,
> it is unlikely to be noticeable as the file has never been meaningful
> outside memcg.
> 
> Aside from the above change, this is pure code relocation.
> 
> v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
>     poll.h inclusion moved from cgroup.c to memcontrol.c.
> 
> v3: Per Michal's request, update patch description.

Thanks for the more specific justification.
 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> Acked-by: Kirill A. Shutemov <kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

> Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  include/linux/cgroup.h |    5 
>  init/Kconfig           |    3 
>  kernel/cgroup.c        |  252 -------------------------------------------------
>  mm/memcontrol.c        |  247 ++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 256 insertions(+), 251 deletions(-)
> 
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -907,6 +907,11 @@ unsigned short css_id(struct cgroup_subs
>  struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
>  					 struct cgroup_subsys *ss);
>  
> +/* XXX: temporary */
> +struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
> +				       struct cgroup_subsys *ss);
> +struct cftype *__file_cft(struct file *file);
> +
>  #else /* !CONFIG_CGROUPS */
>  
>  static inline int cgroup_init_early(void) { return 0; }
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -844,7 +844,6 @@ config NUMA_BALANCING
>  
>  menuconfig CGROUPS
>  	boolean "Control Group support"
> -	depends on EVENTFD
>  	help
>  	  This option adds support for grouping sets of processes together, for
>  	  use with process control subsystems such as Cpusets, CFS, memory
> @@ -911,6 +910,7 @@ config MEMCG
>  	bool "Memory Resource Controller for Control Groups"
>  	depends on RESOURCE_COUNTERS
>  	select MM_OWNER
> +	select EVENTFD
>  	help
>  	  Provides a memory resource controller that manages both anonymous
>  	  memory and page cache. (See Documentation/cgroups/memory.txt)
> @@ -1163,7 +1163,6 @@ config UIDGID_STRICT_TYPE_CHECKS
>  
>  config SCHED_AUTOGROUP
>  	bool "Automatic process group scheduling"
> -	select EVENTFD
>  	select CGROUPS
>  	select CGROUP_SCHED
>  	select FAIR_GROUP_SCHED
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -56,8 +56,6 @@
>  #include <linux/pid_namespace.h>
>  #include <linux/idr.h>
>  #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
> -#include <linux/eventfd.h>
> -#include <linux/poll.h>
>  #include <linux/flex_array.h> /* used in cgroup_attach_task */
>  #include <linux/kthread.h>
>  
> @@ -155,36 +153,6 @@ struct css_id {
>  	unsigned short stack[0]; /* Array of Length (depth+1) */
>  };
>  
> -/*
> - * cgroup_event represents events which userspace want to receive.
> - */
> -struct cgroup_event {
> -	/*
> -	 * css which the event belongs to.
> -	 */
> -	struct cgroup_subsys_state *css;
> -	/*
> -	 * Control file which the event associated.
> -	 */
> -	struct cftype *cft;
> -	/*
> -	 * eventfd to signal userspace about the event.
> -	 */
> -	struct eventfd_ctx *eventfd;
> -	/*
> -	 * Each of these stored in a list by the cgroup.
> -	 */
> -	struct list_head list;
> -	/*
> -	 * All fields below needed to unregister event when
> -	 * userspace closes eventfd.
> -	 */
> -	poll_table pt;
> -	wait_queue_head_t *wqh;
> -	wait_queue_t wait;
> -	struct work_struct remove;
> -};
> -
>  /* The list of hierarchy roots */
>  
>  static LIST_HEAD(cgroup_roots);
> @@ -234,8 +202,8 @@ static int cgroup_addrm_files(struct cgr
>   * keep accessing it outside the said locks.  This function may return
>   * %NULL if @cgrp doesn't have @subsys_id enabled.
>   */
> -static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
> -					      struct cgroup_subsys *ss)
> +struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
> +				       struct cgroup_subsys *ss)
>  {
>  	if (ss)
>  		return rcu_dereference_check(cgrp->subsys[ss->subsys_id],
> @@ -2671,7 +2639,7 @@ static struct dentry *cgroup_lookup(stru
>  /*
>   * Check if a file is a control file
>   */
> -static inline struct cftype *__file_cft(struct file *file)
> +struct cftype *__file_cft(struct file *file)
>  {
>  	if (file_inode(file)->i_fop != &cgroup_file_operations)
>  		return ERR_PTR(-EINVAL);
> @@ -3959,202 +3927,6 @@ static void cgroup_dput(struct cgroup *c
>  	deactivate_super(sb);
>  }
>  
> -/*
> - * Unregister event and free resources.
> - *
> - * Gets called from workqueue.
> - */
> -static void cgroup_event_remove(struct work_struct *work)
> -{
> -	struct cgroup_event *event = container_of(work, struct cgroup_event,
> -			remove);
> -	struct cgroup_subsys_state *css = event->css;
> -
> -	remove_wait_queue(event->wqh, &event->wait);
> -
> -	event->cft->unregister_event(css, event->cft, event->eventfd);
> -
> -	/* Notify userspace the event is going away. */
> -	eventfd_signal(event->eventfd, 1);
> -
> -	eventfd_ctx_put(event->eventfd);
> -	kfree(event);
> -	css_put(css);
> -}
> -
> -/*
> - * Gets called on POLLHUP on eventfd when user closes it.
> - *
> - * Called with wqh->lock held and interrupts disabled.
> - */
> -static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
> -		int sync, void *key)
> -{
> -	struct cgroup_event *event = container_of(wait,
> -			struct cgroup_event, wait);
> -	struct cgroup *cgrp = event->css->cgroup;
> -	unsigned long flags = (unsigned long)key;
> -
> -	if (flags & POLLHUP) {
> -		/*
> -		 * If the event has been detached at cgroup removal, we
> -		 * can simply return knowing the other side will cleanup
> -		 * for us.
> -		 *
> -		 * We can't race against event freeing since the other
> -		 * side will require wqh->lock via remove_wait_queue(),
> -		 * which we hold.
> -		 */
> -		spin_lock(&cgrp->event_list_lock);
> -		if (!list_empty(&event->list)) {
> -			list_del_init(&event->list);
> -			/*
> -			 * We are in atomic context, but cgroup_event_remove()
> -			 * may sleep, so we have to call it in workqueue.
> -			 */
> -			schedule_work(&event->remove);
> -		}
> -		spin_unlock(&cgrp->event_list_lock);
> -	}
> -
> -	return 0;
> -}
> -
> -static void cgroup_event_ptable_queue_proc(struct file *file,
> -		wait_queue_head_t *wqh, poll_table *pt)
> -{
> -	struct cgroup_event *event = container_of(pt,
> -			struct cgroup_event, pt);
> -
> -	event->wqh = wqh;
> -	add_wait_queue(wqh, &event->wait);
> -}
> -
> -/*
> - * Parse input and register new cgroup event handler.
> - *
> - * Input must be in format '<event_fd> <control_fd> <args>'.
> - * Interpretation of args is defined by control file implementation.
> - */
> -static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
> -				      struct cftype *cft, const char *buffer)
> -{
> -	struct cgroup *cgrp = dummy_css->cgroup;
> -	struct cgroup_event *event;
> -	struct cgroup_subsys_state *cfile_css;
> -	unsigned int efd, cfd;
> -	struct file *efile;
> -	struct file *cfile;
> -	char *endp;
> -	int ret;
> -
> -	efd = simple_strtoul(buffer, &endp, 10);
> -	if (*endp != ' ')
> -		return -EINVAL;
> -	buffer = endp + 1;
> -
> -	cfd = simple_strtoul(buffer, &endp, 10);
> -	if ((*endp != ' ') && (*endp != '\0'))
> -		return -EINVAL;
> -	buffer = endp + 1;
> -
> -	event = kzalloc(sizeof(*event), GFP_KERNEL);
> -	if (!event)
> -		return -ENOMEM;
> -
> -	INIT_LIST_HEAD(&event->list);
> -	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
> -	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
> -	INIT_WORK(&event->remove, cgroup_event_remove);
> -
> -	efile = eventfd_fget(efd);
> -	if (IS_ERR(efile)) {
> -		ret = PTR_ERR(efile);
> -		goto out_kfree;
> -	}
> -
> -	event->eventfd = eventfd_ctx_fileget(efile);
> -	if (IS_ERR(event->eventfd)) {
> -		ret = PTR_ERR(event->eventfd);
> -		goto out_put_efile;
> -	}
> -
> -	cfile = fget(cfd);
> -	if (!cfile) {
> -		ret = -EBADF;
> -		goto out_put_eventfd;
> -	}
> -
> -	/* the process need read permission on control file */
> -	/* AV: shouldn't we check that it's been opened for read instead? */
> -	ret = inode_permission(file_inode(cfile), MAY_READ);
> -	if (ret < 0)
> -		goto out_put_cfile;
> -
> -	event->cft = __file_cft(cfile);
> -	if (IS_ERR(event->cft)) {
> -		ret = PTR_ERR(event->cft);
> -		goto out_put_cfile;
> -	}
> -
> -	if (!event->cft->ss) {
> -		ret = -EBADF;
> -		goto out_put_cfile;
> -	}
> -
> -	/*
> -	 * Determine the css of @cfile, verify it belongs to the same
> -	 * cgroup as cgroup.event_control, and associate @event with it.
> -	 * Remaining events are automatically removed on cgroup destruction
> -	 * but the removal is asynchronous, so take an extra ref.
> -	 */
> -	rcu_read_lock();
> -
> -	ret = -EINVAL;
> -	event->css = cgroup_css(cgrp, event->cft->ss);
> -	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
> -	if (event->css && event->css == cfile_css && css_tryget(event->css))
> -		ret = 0;
> -
> -	rcu_read_unlock();
> -	if (ret)
> -		goto out_put_cfile;
> -
> -	if (!event->cft->register_event || !event->cft->unregister_event) {
> -		ret = -EINVAL;
> -		goto out_put_css;
> -	}
> -
> -	ret = event->cft->register_event(event->css, event->cft,
> -			event->eventfd, buffer);
> -	if (ret)
> -		goto out_put_css;
> -
> -	efile->f_op->poll(efile, &event->pt);
> -
> -	spin_lock(&cgrp->event_list_lock);
> -	list_add(&event->list, &cgrp->event_list);
> -	spin_unlock(&cgrp->event_list_lock);
> -
> -	fput(cfile);
> -	fput(efile);
> -
> -	return 0;
> -
> -out_put_css:
> -	css_put(event->css);
> -out_put_cfile:
> -	fput(cfile);
> -out_put_eventfd:
> -	eventfd_ctx_put(event->eventfd);
> -out_put_efile:
> -	fput(efile);
> -out_kfree:
> -	kfree(event);
> -
> -	return ret;
> -}
> -
>  static u64 cgroup_clone_children_read(struct cgroup_subsys_state *css,
>  				      struct cftype *cft)
>  {
> @@ -4180,11 +3952,6 @@ static struct cftype cgroup_base_files[]
>  		.mode = S_IRUGO | S_IWUSR,
>  	},
>  	{
> -		.name = "cgroup.event_control",
> -		.write_string = cgroup_write_event_control,
> -		.mode = S_IWUGO,
> -	},
> -	{
>  		.name = "cgroup.clone_children",
>  		.flags = CFTYPE_INSANE,
>  		.read_u64 = cgroup_clone_children_read,
> @@ -4676,7 +4443,6 @@ static int cgroup_destroy_locked(struct
>  	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
>  {
>  	struct dentry *d = cgrp->dentry;
> -	struct cgroup_event *event, *tmp;
>  	struct cgroup_subsys *ss;
>  	bool empty;
>  
> @@ -4734,18 +4500,6 @@ static int cgroup_destroy_locked(struct
>  	dget(d);
>  	cgroup_d_remove_dir(d);
>  
> -	/*
> -	 * Unregister events and notify userspace.
> -	 * Notify userspace about cgroup removing only after rmdir of cgroup
> -	 * directory to avoid race between userspace and kernelspace.
> -	 */
> -	spin_lock(&cgrp->event_list_lock);
> -	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
> -		list_del_init(&event->list);
> -		schedule_work(&event->remove);
> -	}
> -	spin_unlock(&cgrp->event_list_lock);
> -
>  	return 0;
>  };
>  
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -45,6 +45,7 @@
>  #include <linux/swapops.h>
>  #include <linux/spinlock.h>
>  #include <linux/eventfd.h>
> +#include <linux/poll.h>
>  #include <linux/sort.h>
>  #include <linux/fs.h>
>  #include <linux/seq_file.h>
> @@ -239,6 +240,36 @@ struct mem_cgroup_eventfd_list {
>  	struct eventfd_ctx *eventfd;
>  };
>  
> +/*
> + * cgroup_event represents events which userspace want to receive.
> + */
> +struct cgroup_event {
> +	/*
> +	 * css which the event belongs to.
> +	 */
> +	struct cgroup_subsys_state *css;
> +	/*
> +	 * Control file which the event associated.
> +	 */
> +	struct cftype *cft;
> +	/*
> +	 * eventfd to signal userspace about the event.
> +	 */
> +	struct eventfd_ctx *eventfd;
> +	/*
> +	 * Each of these stored in a list by the cgroup.
> +	 */
> +	struct list_head list;
> +	/*
> +	 * All fields below needed to unregister event when
> +	 * userspace closes eventfd.
> +	 */
> +	poll_table pt;
> +	wait_queue_head_t *wqh;
> +	wait_queue_t wait;
> +	struct work_struct remove;
> +};
> +
>  static void mem_cgroup_threshold(struct mem_cgroup *memcg);
>  static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
>  
> @@ -5919,6 +5950,202 @@ static void kmem_cgroup_css_offline(stru
>  }
>  #endif
>  
> +/*
> + * Unregister event and free resources.
> + *
> + * Gets called from workqueue.
> + */
> +static void cgroup_event_remove(struct work_struct *work)
> +{
> +	struct cgroup_event *event = container_of(work, struct cgroup_event,
> +			remove);
> +	struct cgroup_subsys_state *css = event->css;
> +
> +	remove_wait_queue(event->wqh, &event->wait);
> +
> +	event->cft->unregister_event(css, event->cft, event->eventfd);
> +
> +	/* Notify userspace the event is going away. */
> +	eventfd_signal(event->eventfd, 1);
> +
> +	eventfd_ctx_put(event->eventfd);
> +	kfree(event);
> +	css_put(css);
> +}
> +
> +/*
> + * Gets called on POLLHUP on eventfd when user closes it.
> + *
> + * Called with wqh->lock held and interrupts disabled.
> + */
> +static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
> +		int sync, void *key)
> +{
> +	struct cgroup_event *event = container_of(wait,
> +			struct cgroup_event, wait);
> +	struct cgroup *cgrp = event->css->cgroup;
> +	unsigned long flags = (unsigned long)key;
> +
> +	if (flags & POLLHUP) {
> +		/*
> +		 * If the event has been detached at cgroup removal, we
> +		 * can simply return knowing the other side will cleanup
> +		 * for us.
> +		 *
> +		 * We can't race against event freeing since the other
> +		 * side will require wqh->lock via remove_wait_queue(),
> +		 * which we hold.
> +		 */
> +		spin_lock(&cgrp->event_list_lock);
> +		if (!list_empty(&event->list)) {
> +			list_del_init(&event->list);
> +			/*
> +			 * We are in atomic context, but cgroup_event_remove()
> +			 * may sleep, so we have to call it in workqueue.
> +			 */
> +			schedule_work(&event->remove);
> +		}
> +		spin_unlock(&cgrp->event_list_lock);
> +	}
> +
> +	return 0;
> +}
> +
> +static void cgroup_event_ptable_queue_proc(struct file *file,
> +		wait_queue_head_t *wqh, poll_table *pt)
> +{
> +	struct cgroup_event *event = container_of(pt,
> +			struct cgroup_event, pt);
> +
> +	event->wqh = wqh;
> +	add_wait_queue(wqh, &event->wait);
> +}
> +
> +/*
> + * Parse input and register new cgroup event handler.
> + *
> + * Input must be in format '<event_fd> <control_fd> <args>'.
> + * Interpretation of args is defined by control file implementation.
> + */
> +static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
> +				      struct cftype *cft, const char *buffer)
> +{
> +	struct cgroup *cgrp = dummy_css->cgroup;
> +	struct cgroup_event *event;
> +	struct cgroup_subsys_state *cfile_css;
> +	unsigned int efd, cfd;
> +	struct file *efile;
> +	struct file *cfile;
> +	char *endp;
> +	int ret;
> +
> +	efd = simple_strtoul(buffer, &endp, 10);
> +	if (*endp != ' ')
> +		return -EINVAL;
> +	buffer = endp + 1;
> +
> +	cfd = simple_strtoul(buffer, &endp, 10);
> +	if ((*endp != ' ') && (*endp != '\0'))
> +		return -EINVAL;
> +	buffer = endp + 1;
> +
> +	event = kzalloc(sizeof(*event), GFP_KERNEL);
> +	if (!event)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&event->list);
> +	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
> +	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
> +	INIT_WORK(&event->remove, cgroup_event_remove);
> +
> +	efile = eventfd_fget(efd);
> +	if (IS_ERR(efile)) {
> +		ret = PTR_ERR(efile);
> +		goto out_kfree;
> +	}
> +
> +	event->eventfd = eventfd_ctx_fileget(efile);
> +	if (IS_ERR(event->eventfd)) {
> +		ret = PTR_ERR(event->eventfd);
> +		goto out_put_efile;
> +	}
> +
> +	cfile = fget(cfd);
> +	if (!cfile) {
> +		ret = -EBADF;
> +		goto out_put_eventfd;
> +	}
> +
> +	/* the process need read permission on control file */
> +	/* AV: shouldn't we check that it's been opened for read instead? */
> +	ret = inode_permission(file_inode(cfile), MAY_READ);
> +	if (ret < 0)
> +		goto out_put_cfile;
> +
> +	event->cft = __file_cft(cfile);
> +	if (IS_ERR(event->cft)) {
> +		ret = PTR_ERR(event->cft);
> +		goto out_put_cfile;
> +	}
> +
> +	if (!event->cft->ss) {
> +		ret = -EBADF;
> +		goto out_put_cfile;
> +	}
> +
> +	/*
> +	 * Determine the css of @cfile, verify it belongs to the same
> +	 * cgroup as cgroup.event_control, and associate @event with it.
> +	 * Remaining events are automatically removed on cgroup destruction
> +	 * but the removal is asynchronous, so take an extra ref.
> +	 */
> +	rcu_read_lock();
> +
> +	ret = -EINVAL;
> +	event->css = cgroup_css(cgrp, event->cft->ss);
> +	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
> +	if (event->css && event->css == cfile_css && css_tryget(event->css))
> +		ret = 0;
> +
> +	rcu_read_unlock();
> +	if (ret)
> +		goto out_put_cfile;
> +
> +	if (!event->cft->register_event || !event->cft->unregister_event) {
> +		ret = -EINVAL;
> +		goto out_put_css;
> +	}
> +
> +	ret = event->cft->register_event(event->css, event->cft,
> +			event->eventfd, buffer);
> +	if (ret)
> +		goto out_put_css;
> +
> +	efile->f_op->poll(efile, &event->pt);
> +
> +	spin_lock(&cgrp->event_list_lock);
> +	list_add(&event->list, &cgrp->event_list);
> +	spin_unlock(&cgrp->event_list_lock);
> +
> +	fput(cfile);
> +	fput(efile);
> +
> +	return 0;
> +
> +out_put_css:
> +	css_put(event->css);
> +out_put_cfile:
> +	fput(cfile);
> +out_put_eventfd:
> +	eventfd_ctx_put(event->eventfd);
> +out_put_efile:
> +	fput(efile);
> +out_kfree:
> +	kfree(event);
> +
> +	return ret;
> +}
> +
>  static struct cftype mem_cgroup_files[] = {
>  	{
>  		.name = "usage_in_bytes",
> @@ -5966,6 +6193,12 @@ static struct cftype mem_cgroup_files[]
>  		.read_u64 = mem_cgroup_hierarchy_read,
>  	},
>  	{
> +		.name = "cgroup.event_control",
> +		.write_string = cgroup_write_event_control,
> +		.flags = CFTYPE_NO_PREFIX,
> +		.mode = S_IWUGO,
> +	},
> +	{
>  		.name = "swappiness",
>  		.read_u64 = mem_cgroup_swappiness_read,
>  		.write_u64 = mem_cgroup_swappiness_write,
> @@ -6298,6 +6531,20 @@ static void mem_cgroup_invalidate_reclai
>  static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> +	struct cgroup *cgrp = css->cgroup;
> +	struct cgroup_event *event, *tmp;
> +
> +	/*
> +	 * Unregister events and notify userspace.
> +	 * Notify userspace about cgroup removing only after rmdir of cgroup
> +	 * directory to avoid race between userspace and kernelspace.
> +	 */
> +	spin_lock(&cgrp->event_list_lock);
> +	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
> +		list_del_init(&event->list);
> +		schedule_work(&event->remove);
> +	}
> +	spin_unlock(&cgrp->event_list_lock);
>  
>  	kmem_cgroup_css_offline(memcg);
>  
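
For anyone reading the relocated parser without the rest of the file at
hand: the first half of cgroup_write_event_control() only splits
'<event_fd> <control_fd> <args>' into its three fields.  A minimal
userspace sketch of that split follows.  It is not the kernel code:
parse_event_control() and struct event_control_req are hypothetical
names, strtoul() stands in for simple_strtoul() (and, unlike it, skips
leading whitespace and accepts a sign), and mapping a missing <args>
field to an empty string is a choice made for this sketch.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result type for this sketch; not a kernel structure. */
struct event_control_req {
	unsigned int efd;	/* eventfd descriptor number */
	unsigned int cfd;	/* control file descriptor number */
	const char *args;	/* points into the input, "" if absent */
};

/*
 * Split "<event_fd> <control_fd> <args>" the way the quoted parser does:
 * the first number must be followed by a space, the second by a space or
 * the end of the string.  Returns 0 on success, -EINVAL otherwise.
 */
int parse_event_control(const char *buffer, struct event_control_req *req)
{
	char *endp;

	req->efd = (unsigned int)strtoul(buffer, &endp, 10);
	if (*endp != ' ')
		return -EINVAL;
	buffer = endp + 1;

	req->cfd = (unsigned int)strtoul(buffer, &endp, 10);
	if (*endp != ' ' && *endp != '\0')
		return -EINVAL;

	/* Assumption of this sketch: no args means an empty string. */
	req->args = (*endp == ' ') ? endp + 1 : endp;
	return 0;
}
```

Whatever follows the second number is handed to the control file's
register_event() callback untouched, which is why the interpretation of
<args> is left to the control file implementation.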

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3 06/12] cgroup, memcg: move cgroup_event implementation to memcg
@ 2013-08-30 10:47           ` Michal Hocko
From: Michal Hocko @ 2013-08-30 10:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu 29-08-13 14:19:11, Tejun Heo wrote:
> Hello, Michal.
> 
> Updated as requested.  If there's still something missing, please let
> me know.  In case you're okay with the change, how do you wanna route
> it?  01-05 are already in cgroup/for-3.12 and the rest can either be
> applied to cgroup/for-3.12, which will likely require some, mostly
> trivial, adjustments to memcg patches in -mm, or these all can be
> routed through -mm.  Any preference?

I do not care much, but as Andrew wasn't CCed and this comes quite late, he
might be reluctant to take it now and push it in the next merge window. So
if you are in a hurry with this, then route it via your tree.

> 
> Thanks.
> 
> ------ 8< ------
> cgroup_event is way over-designed and tries to build a generic
> flexible event mechanism into cgroup - fully customizable event
> specification for each user of the interface.  This is utterly
> unnecessary and overboard especially in the light of the planned
> unified hierarchy as there's gonna be a single agent.  Simply generating
> events at fixed points, or if that's too restrictive, configurable
> cadence or a single set of configurable points should be enough.
> 
> In addition, it's adding overhead for future changes for cgroup core.
> 
> * Everything is being made per-css which is necessary as css's
>   lifetime would no longer coincide with cgroup's in the planned
>   unified hierarchy.  Keeping cgroup_event in cgroup proper means that
>   it should be updated to be managed per-css, which is silly as no
>   other subsystems will use this facility.
> 
> * The file system interface of cgroup will be restructured so that it
>   uses sysfs.  The goal of the conversion is losing duplicate logic
>   and taking advantage of features provided by sysfs such as revoke
>   semantics on file deletion and decoupled synchronization from vfs
>   layer.
> 
>   The conversion is likely to involve unifying all different file
>   operation types into a single type and having operations which are
>   as unusual as event callbacks adds unnecessary burden.  While making
>   it specific to memcg doesn't remove all conversion work, it'd lower
>   the amount of work necessary.  We don't have to worry how the
>   unusual event callbacks would fit into the new structure.
> 
> Thankfully, memcg is the only user and gets to keep it.  Replacing it
> with something simpler on sane_behavior is strongly recommended.
> 
> This patch moves cgroup_event and "cgroup.event_control"
> implementation to mm/memcontrol.c.  Clearing of events on cgroup
> destruction is moved from cgroup_destroy_locked() to
> mem_cgroup_css_offline(), which shouldn't make any noticeable
> difference.
> 
> cgroup_css() and __file_cft() are exported to enable the move;
> however, this will soon be reverted once the event code is updated to
> be memcg specific.
> 
> Note that "cgroup.event_control" will now exist only on the hierarchy
> with memcg attached to it.  While this change is visible to userland,
> it is unlikely to be noticeable as the file has never been meaningful
> outside memcg.
> 
> Aside from the above change, this is pure code relocation.
> 
> v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
>     poll.h inclusion moved from cgroup.c to memcontrol.c.
> 
> v3: Per Michal's request, update patch description.

Thanks for the more specific justification.
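
For context on what userland sees: the interface stays the same, only
the hierarchy it lives on changes.  A listener creates an eventfd,
opens a control file, and writes '<event_fd> <control_fd> <args>' to
cgroup.event_control.  A minimal sketch of that sequence, assuming a
directory on a hierarchy with memcg attached and using
memory.usage_in_bytes with a threshold as the argument; the helper
names build_event_control_cmd() and register_memcg_threshold() are
made up for illustration:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Build the "<event_fd> <control_fd> <args>" line.  Returns 0 on success,
 * -1 if it does not fit in buf.  Hypothetical helper, not a kernel API.
 */
int build_event_control_cmd(char *buf, size_t len, int efd, int cfd,
			    const char *args)
{
	int n;

	if (args && *args)
		n = snprintf(buf, len, "%d %d %s", efd, cfd, args);
	else
		n = snprintf(buf, len, "%d %d", efd, cfd);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Register a usage threshold on the memcg at cgroup_dir (assumed to be a
 * directory on a hierarchy with memcg attached).  Returns the eventfd on
 * success, -1 on any error.
 */
int register_memcg_threshold(const char *cgroup_dir, const char *threshold)
{
	char path[512], cmd[64];
	int efd, cfd = -1, ctl = -1, ret = -1, n;

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	n = snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		goto out;
	cfd = open(path, O_RDONLY);
	if (cfd < 0)
		goto out;

	n = snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		goto out;
	ctl = open(path, O_WRONLY);
	if (ctl < 0)
		goto out;

	if (build_event_control_cmd(cmd, sizeof(cmd), efd, cfd, threshold))
		goto out;
	if (write(ctl, cmd, strlen(cmd)) == (ssize_t)strlen(cmd))
		ret = efd;
out:
	/* The kernel side does fput(cfile) itself after registration. */
	if (ctl >= 0)
		close(ctl);
	if (cfd >= 0)
		close(cfd);
	if (ret < 0)
		close(efd);
	return ret;
}
```

On success the caller would block in read() on the returned descriptor
(8 bytes, a uint64_t counter) to wait for a notification.  Closing the
eventfd is what raises the POLLHUP that cgroup_event_wake() handles, so
no explicit unregister call is needed from userland.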
 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> Acked-by: Kirill A. Shutemov <kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

> Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  include/linux/cgroup.h |    5 
>  init/Kconfig           |    3 
>  kernel/cgroup.c        |  252 -------------------------------------------------
>  mm/memcontrol.c        |  247 ++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 256 insertions(+), 251 deletions(-)
> 
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -907,6 +907,11 @@ unsigned short css_id(struct cgroup_subs
>  struct cgroup_subsys_state *css_from_dir(struct dentry *dentry,
>  					 struct cgroup_subsys *ss);
>  
> +/* XXX: temporary */
> +struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
> +				       struct cgroup_subsys *ss);
> +struct cftype *__file_cft(struct file *file);
> +
>  #else /* !CONFIG_CGROUPS */
>  
>  static inline int cgroup_init_early(void) { return 0; }
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -844,7 +844,6 @@ config NUMA_BALANCING
>  
>  menuconfig CGROUPS
>  	boolean "Control Group support"
> -	depends on EVENTFD
>  	help
>  	  This option adds support for grouping sets of processes together, for
>  	  use with process control subsystems such as Cpusets, CFS, memory
> @@ -911,6 +910,7 @@ config MEMCG
>  	bool "Memory Resource Controller for Control Groups"
>  	depends on RESOURCE_COUNTERS
>  	select MM_OWNER
> +	select EVENTFD
>  	help
>  	  Provides a memory resource controller that manages both anonymous
>  	  memory and page cache. (See Documentation/cgroups/memory.txt)
> @@ -1163,7 +1163,6 @@ config UIDGID_STRICT_TYPE_CHECKS
>  
>  config SCHED_AUTOGROUP
>  	bool "Automatic process group scheduling"
> -	select EVENTFD
>  	select CGROUPS
>  	select CGROUP_SCHED
>  	select FAIR_GROUP_SCHED
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -56,8 +56,6 @@
>  #include <linux/pid_namespace.h>
>  #include <linux/idr.h>
>  #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
> -#include <linux/eventfd.h>
> -#include <linux/poll.h>
>  #include <linux/flex_array.h> /* used in cgroup_attach_task */
>  #include <linux/kthread.h>
>  
> @@ -155,36 +153,6 @@ struct css_id {
>  	unsigned short stack[0]; /* Array of Length (depth+1) */
>  };
>  
> -/*
> - * cgroup_event represents events which userspace want to receive.
> - */
> -struct cgroup_event {
> -	/*
> -	 * css which the event belongs to.
> -	 */
> -	struct cgroup_subsys_state *css;
> -	/*
> -	 * Control file which the event associated.
> -	 */
> -	struct cftype *cft;
> -	/*
> -	 * eventfd to signal userspace about the event.
> -	 */
> -	struct eventfd_ctx *eventfd;
> -	/*
> -	 * Each of these stored in a list by the cgroup.
> -	 */
> -	struct list_head list;
> -	/*
> -	 * All fields below needed to unregister event when
> -	 * userspace closes eventfd.
> -	 */
> -	poll_table pt;
> -	wait_queue_head_t *wqh;
> -	wait_queue_t wait;
> -	struct work_struct remove;
> -};
> -
>  /* The list of hierarchy roots */
>  
>  static LIST_HEAD(cgroup_roots);
> @@ -234,8 +202,8 @@ static int cgroup_addrm_files(struct cgr
>   * keep accessing it outside the said locks.  This function may return
>   * %NULL if @cgrp doesn't have @subsys_id enabled.
>   */
> -static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
> -					      struct cgroup_subsys *ss)
> +struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
> +				       struct cgroup_subsys *ss)
>  {
>  	if (ss)
>  		return rcu_dereference_check(cgrp->subsys[ss->subsys_id],
> @@ -2671,7 +2639,7 @@ static struct dentry *cgroup_lookup(stru
>  /*
>   * Check if a file is a control file
>   */
> -static inline struct cftype *__file_cft(struct file *file)
> +struct cftype *__file_cft(struct file *file)
>  {
>  	if (file_inode(file)->i_fop != &cgroup_file_operations)
>  		return ERR_PTR(-EINVAL);
> @@ -3959,202 +3927,6 @@ static void cgroup_dput(struct cgroup *c
>  	deactivate_super(sb);
>  }
>  
> -/*
> - * Unregister event and free resources.
> - *
> - * Gets called from workqueue.
> - */
> -static void cgroup_event_remove(struct work_struct *work)
> -{
> -	struct cgroup_event *event = container_of(work, struct cgroup_event,
> -			remove);
> -	struct cgroup_subsys_state *css = event->css;
> -
> -	remove_wait_queue(event->wqh, &event->wait);
> -
> -	event->cft->unregister_event(css, event->cft, event->eventfd);
> -
> -	/* Notify userspace the event is going away. */
> -	eventfd_signal(event->eventfd, 1);
> -
> -	eventfd_ctx_put(event->eventfd);
> -	kfree(event);
> -	css_put(css);
> -}
> -
> -/*
> - * Gets called on POLLHUP on eventfd when user closes it.
> - *
> - * Called with wqh->lock held and interrupts disabled.
> - */
> -static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
> -		int sync, void *key)
> -{
> -	struct cgroup_event *event = container_of(wait,
> -			struct cgroup_event, wait);
> -	struct cgroup *cgrp = event->css->cgroup;
> -	unsigned long flags = (unsigned long)key;
> -
> -	if (flags & POLLHUP) {
> -		/*
> -		 * If the event has been detached at cgroup removal, we
> -		 * can simply return knowing the other side will cleanup
> -		 * for us.
> -		 *
> -		 * We can't race against event freeing since the other
> -		 * side will require wqh->lock via remove_wait_queue(),
> -		 * which we hold.
> -		 */
> -		spin_lock(&cgrp->event_list_lock);
> -		if (!list_empty(&event->list)) {
> -			list_del_init(&event->list);
> -			/*
> -			 * We are in atomic context, but cgroup_event_remove()
> -			 * may sleep, so we have to call it in workqueue.
> -			 */
> -			schedule_work(&event->remove);
> -		}
> -		spin_unlock(&cgrp->event_list_lock);
> -	}
> -
> -	return 0;
> -}
> -
> -static void cgroup_event_ptable_queue_proc(struct file *file,
> -		wait_queue_head_t *wqh, poll_table *pt)
> -{
> -	struct cgroup_event *event = container_of(pt,
> -			struct cgroup_event, pt);
> -
> -	event->wqh = wqh;
> -	add_wait_queue(wqh, &event->wait);
> -}
> -
> -/*
> - * Parse input and register new cgroup event handler.
> - *
> - * Input must be in format '<event_fd> <control_fd> <args>'.
> - * Interpretation of args is defined by control file implementation.
> - */
> -static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
> -				      struct cftype *cft, const char *buffer)
> -{
> -	struct cgroup *cgrp = dummy_css->cgroup;
> -	struct cgroup_event *event;
> -	struct cgroup_subsys_state *cfile_css;
> -	unsigned int efd, cfd;
> -	struct file *efile;
> -	struct file *cfile;
> -	char *endp;
> -	int ret;
> -
> -	efd = simple_strtoul(buffer, &endp, 10);
> -	if (*endp != ' ')
> -		return -EINVAL;
> -	buffer = endp + 1;
> -
> -	cfd = simple_strtoul(buffer, &endp, 10);
> -	if ((*endp != ' ') && (*endp != '\0'))
> -		return -EINVAL;
> -	buffer = endp + 1;
> -
> -	event = kzalloc(sizeof(*event), GFP_KERNEL);
> -	if (!event)
> -		return -ENOMEM;
> -
> -	INIT_LIST_HEAD(&event->list);
> -	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
> -	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
> -	INIT_WORK(&event->remove, cgroup_event_remove);
> -
> -	efile = eventfd_fget(efd);
> -	if (IS_ERR(efile)) {
> -		ret = PTR_ERR(efile);
> -		goto out_kfree;
> -	}
> -
> -	event->eventfd = eventfd_ctx_fileget(efile);
> -	if (IS_ERR(event->eventfd)) {
> -		ret = PTR_ERR(event->eventfd);
> -		goto out_put_efile;
> -	}
> -
> -	cfile = fget(cfd);
> -	if (!cfile) {
> -		ret = -EBADF;
> -		goto out_put_eventfd;
> -	}
> -
> -	/* the process need read permission on control file */
> -	/* AV: shouldn't we check that it's been opened for read instead? */
> -	ret = inode_permission(file_inode(cfile), MAY_READ);
> -	if (ret < 0)
> -		goto out_put_cfile;
> -
> -	event->cft = __file_cft(cfile);
> -	if (IS_ERR(event->cft)) {
> -		ret = PTR_ERR(event->cft);
> -		goto out_put_cfile;
> -	}
> -
> -	if (!event->cft->ss) {
> -		ret = -EBADF;
> -		goto out_put_cfile;
> -	}
> -
> -	/*
> -	 * Determine the css of @cfile, verify it belongs to the same
> -	 * cgroup as cgroup.event_control, and associate @event with it.
> -	 * Remaining events are automatically removed on cgroup destruction
> -	 * but the removal is asynchronous, so take an extra ref.
> -	 */
> -	rcu_read_lock();
> -
> -	ret = -EINVAL;
> -	event->css = cgroup_css(cgrp, event->cft->ss);
> -	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
> -	if (event->css && event->css == cfile_css && css_tryget(event->css))
> -		ret = 0;
> -
> -	rcu_read_unlock();
> -	if (ret)
> -		goto out_put_cfile;
> -
> -	if (!event->cft->register_event || !event->cft->unregister_event) {
> -		ret = -EINVAL;
> -		goto out_put_css;
> -	}
> -
> -	ret = event->cft->register_event(event->css, event->cft,
> -			event->eventfd, buffer);
> -	if (ret)
> -		goto out_put_css;
> -
> -	efile->f_op->poll(efile, &event->pt);
> -
> -	spin_lock(&cgrp->event_list_lock);
> -	list_add(&event->list, &cgrp->event_list);
> -	spin_unlock(&cgrp->event_list_lock);
> -
> -	fput(cfile);
> -	fput(efile);
> -
> -	return 0;
> -
> -out_put_css:
> -	css_put(event->css);
> -out_put_cfile:
> -	fput(cfile);
> -out_put_eventfd:
> -	eventfd_ctx_put(event->eventfd);
> -out_put_efile:
> -	fput(efile);
> -out_kfree:
> -	kfree(event);
> -
> -	return ret;
> -}
> -
>  static u64 cgroup_clone_children_read(struct cgroup_subsys_state *css,
>  				      struct cftype *cft)
>  {
> @@ -4180,11 +3952,6 @@ static struct cftype cgroup_base_files[]
>  		.mode = S_IRUGO | S_IWUSR,
>  	},
>  	{
> -		.name = "cgroup.event_control",
> -		.write_string = cgroup_write_event_control,
> -		.mode = S_IWUGO,
> -	},
> -	{
>  		.name = "cgroup.clone_children",
>  		.flags = CFTYPE_INSANE,
>  		.read_u64 = cgroup_clone_children_read,
> @@ -4676,7 +4443,6 @@ static int cgroup_destroy_locked(struct
>  	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
>  {
>  	struct dentry *d = cgrp->dentry;
> -	struct cgroup_event *event, *tmp;
>  	struct cgroup_subsys *ss;
>  	bool empty;
>  
> @@ -4734,18 +4500,6 @@ static int cgroup_destroy_locked(struct
>  	dget(d);
>  	cgroup_d_remove_dir(d);
>  
> -	/*
> -	 * Unregister events and notify userspace.
> -	 * Notify userspace about cgroup removing only after rmdir of cgroup
> -	 * directory to avoid race between userspace and kernelspace.
> -	 */
> -	spin_lock(&cgrp->event_list_lock);
> -	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
> -		list_del_init(&event->list);
> -		schedule_work(&event->remove);
> -	}
> -	spin_unlock(&cgrp->event_list_lock);
> -
>  	return 0;
>  };
>  
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -45,6 +45,7 @@
>  #include <linux/swapops.h>
>  #include <linux/spinlock.h>
>  #include <linux/eventfd.h>
> +#include <linux/poll.h>
>  #include <linux/sort.h>
>  #include <linux/fs.h>
>  #include <linux/seq_file.h>
> @@ -239,6 +240,36 @@ struct mem_cgroup_eventfd_list {
>  	struct eventfd_ctx *eventfd;
>  };
>  
> +/*
> + * cgroup_event represents events which userspace want to receive.
> + */
> +struct cgroup_event {
> +	/*
> +	 * css which the event belongs to.
> +	 */
> +	struct cgroup_subsys_state *css;
> +	/*
> +	 * Control file which the event associated.
> +	 */
> +	struct cftype *cft;
> +	/*
> +	 * eventfd to signal userspace about the event.
> +	 */
> +	struct eventfd_ctx *eventfd;
> +	/*
> +	 * Each of these stored in a list by the cgroup.
> +	 */
> +	struct list_head list;
> +	/*
> +	 * All fields below needed to unregister event when
> +	 * userspace closes eventfd.
> +	 */
> +	poll_table pt;
> +	wait_queue_head_t *wqh;
> +	wait_queue_t wait;
> +	struct work_struct remove;
> +};
> +
>  static void mem_cgroup_threshold(struct mem_cgroup *memcg);
>  static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
>  
> @@ -5919,6 +5950,202 @@ static void kmem_cgroup_css_offline(stru
>  }
>  #endif
>  
> +/*
> + * Unregister event and free resources.
> + *
> + * Gets called from workqueue.
> + */
> +static void cgroup_event_remove(struct work_struct *work)
> +{
> +	struct cgroup_event *event = container_of(work, struct cgroup_event,
> +			remove);
> +	struct cgroup_subsys_state *css = event->css;
> +
> +	remove_wait_queue(event->wqh, &event->wait);
> +
> +	event->cft->unregister_event(css, event->cft, event->eventfd);
> +
> +	/* Notify userspace the event is going away. */
> +	eventfd_signal(event->eventfd, 1);
> +
> +	eventfd_ctx_put(event->eventfd);
> +	kfree(event);
> +	css_put(css);
> +}
> +
> +/*
> + * Gets called on POLLHUP on eventfd when user closes it.
> + *
> + * Called with wqh->lock held and interrupts disabled.
> + */
> +static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
> +		int sync, void *key)
> +{
> +	struct cgroup_event *event = container_of(wait,
> +			struct cgroup_event, wait);
> +	struct cgroup *cgrp = event->css->cgroup;
> +	unsigned long flags = (unsigned long)key;
> +
> +	if (flags & POLLHUP) {
> +		/*
> +		 * If the event has been detached at cgroup removal, we
> +		 * can simply return knowing the other side will cleanup
> +		 * for us.
> +		 *
> +		 * We can't race against event freeing since the other
> +		 * side will require wqh->lock via remove_wait_queue(),
> +		 * which we hold.
> +		 */
> +		spin_lock(&cgrp->event_list_lock);
> +		if (!list_empty(&event->list)) {
> +			list_del_init(&event->list);
> +			/*
> +			 * We are in atomic context, but cgroup_event_remove()
> +			 * may sleep, so we have to call it in workqueue.
> +			 */
> +			schedule_work(&event->remove);
> +		}
> +		spin_unlock(&cgrp->event_list_lock);
> +	}
> +
> +	return 0;
> +}
> +
> +static void cgroup_event_ptable_queue_proc(struct file *file,
> +		wait_queue_head_t *wqh, poll_table *pt)
> +{
> +	struct cgroup_event *event = container_of(pt,
> +			struct cgroup_event, pt);
> +
> +	event->wqh = wqh;
> +	add_wait_queue(wqh, &event->wait);
> +}
> +
> +/*
> + * Parse input and register new cgroup event handler.
> + *
> + * Input must be in format '<event_fd> <control_fd> <args>'.
> + * Interpretation of args is defined by control file implementation.
> + */
> +static int cgroup_write_event_control(struct cgroup_subsys_state *dummy_css,
> +				      struct cftype *cft, const char *buffer)
> +{
> +	struct cgroup *cgrp = dummy_css->cgroup;
> +	struct cgroup_event *event;
> +	struct cgroup_subsys_state *cfile_css;
> +	unsigned int efd, cfd;
> +	struct file *efile;
> +	struct file *cfile;
> +	char *endp;
> +	int ret;
> +
> +	efd = simple_strtoul(buffer, &endp, 10);
> +	if (*endp != ' ')
> +		return -EINVAL;
> +	buffer = endp + 1;
> +
> +	cfd = simple_strtoul(buffer, &endp, 10);
> +	if ((*endp != ' ') && (*endp != '\0'))
> +		return -EINVAL;
> +	buffer = endp + 1;
> +
> +	event = kzalloc(sizeof(*event), GFP_KERNEL);
> +	if (!event)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&event->list);
> +	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
> +	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
> +	INIT_WORK(&event->remove, cgroup_event_remove);
> +
> +	efile = eventfd_fget(efd);
> +	if (IS_ERR(efile)) {
> +		ret = PTR_ERR(efile);
> +		goto out_kfree;
> +	}
> +
> +	event->eventfd = eventfd_ctx_fileget(efile);
> +	if (IS_ERR(event->eventfd)) {
> +		ret = PTR_ERR(event->eventfd);
> +		goto out_put_efile;
> +	}
> +
> +	cfile = fget(cfd);
> +	if (!cfile) {
> +		ret = -EBADF;
> +		goto out_put_eventfd;
> +	}
> +
> +	/* the process need read permission on control file */
> +	/* AV: shouldn't we check that it's been opened for read instead? */
> +	ret = inode_permission(file_inode(cfile), MAY_READ);
> +	if (ret < 0)
> +		goto out_put_cfile;
> +
> +	event->cft = __file_cft(cfile);
> +	if (IS_ERR(event->cft)) {
> +		ret = PTR_ERR(event->cft);
> +		goto out_put_cfile;
> +	}
> +
> +	if (!event->cft->ss) {
> +		ret = -EBADF;
> +		goto out_put_cfile;
> +	}
> +
> +	/*
> +	 * Determine the css of @cfile, verify it belongs to the same
> +	 * cgroup as cgroup.event_control, and associate @event with it.
> +	 * Remaining events are automatically removed on cgroup destruction
> +	 * but the removal is asynchronous, so take an extra ref.
> +	 */
> +	rcu_read_lock();
> +
> +	ret = -EINVAL;
> +	event->css = cgroup_css(cgrp, event->cft->ss);
> +	cfile_css = css_from_dir(cfile->f_dentry->d_parent, event->cft->ss);
> +	if (event->css && event->css == cfile_css && css_tryget(event->css))
> +		ret = 0;
> +
> +	rcu_read_unlock();
> +	if (ret)
> +		goto out_put_cfile;
> +
> +	if (!event->cft->register_event || !event->cft->unregister_event) {
> +		ret = -EINVAL;
> +		goto out_put_css;
> +	}
> +
> +	ret = event->cft->register_event(event->css, event->cft,
> +			event->eventfd, buffer);
> +	if (ret)
> +		goto out_put_css;
> +
> +	efile->f_op->poll(efile, &event->pt);
> +
> +	spin_lock(&cgrp->event_list_lock);
> +	list_add(&event->list, &cgrp->event_list);
> +	spin_unlock(&cgrp->event_list_lock);
> +
> +	fput(cfile);
> +	fput(efile);
> +
> +	return 0;
> +
> +out_put_css:
> +	css_put(event->css);
> +out_put_cfile:
> +	fput(cfile);
> +out_put_eventfd:
> +	eventfd_ctx_put(event->eventfd);
> +out_put_efile:
> +	fput(efile);
> +out_kfree:
> +	kfree(event);
> +
> +	return ret;
> +}
> +
>  static struct cftype mem_cgroup_files[] = {
>  	{
>  		.name = "usage_in_bytes",
> @@ -5966,6 +6193,12 @@ static struct cftype mem_cgroup_files[]
>  		.read_u64 = mem_cgroup_hierarchy_read,
>  	},
>  	{
> +		.name = "cgroup.event_control",
> +		.write_string = cgroup_write_event_control,
> +		.flags = CFTYPE_NO_PREFIX,
> +		.mode = S_IWUGO,
> +	},
> +	{
>  		.name = "swappiness",
>  		.read_u64 = mem_cgroup_swappiness_read,
>  		.write_u64 = mem_cgroup_swappiness_write,
> @@ -6298,6 +6531,20 @@ static void mem_cgroup_invalidate_reclai
>  static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> +	struct cgroup *cgrp = css->cgroup;
> +	struct cgroup_event *event, *tmp;
> +
> +	/*
> +	 * Unregister events and notify userspace.
> +	 * Notify userspace about cgroup removing only after rmdir of cgroup
> +	 * directory to avoid race between userspace and kernelspace.
> +	 */
> +	spin_lock(&cgrp->event_list_lock);
> +	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
> +		list_del_init(&event->list);
> +		schedule_work(&event->remove);
> +	}
> +	spin_unlock(&cgrp->event_list_lock);
>  
>  	kmem_cgroup_css_offline(memcg);
>  

-- 
Michal Hocko
SUSE Labs
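
The '<event_fd> <control_fd> <args>' format parsed at the top of
cgroup_write_event_control() in the patch quoted above can be modelled in
userspace. This is only a sketch: parse_event_control() is a hypothetical
helper name, and strtoul() stands in for the kernel's simple_strtoul()
(strtoul() is more lenient, e.g. it skips leading whitespace).

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace model of the "<event_fd> <control_fd> <args>" parsing.
 * Returns 0 on success and -EINVAL on malformed input; *args is left
 * pointing at the remainder of the string (possibly empty).
 */
static int parse_event_control(const char *buf, unsigned int *efd,
			       unsigned int *cfd, const char **args)
{
	char *endp;

	*efd = (unsigned int)strtoul(buf, &endp, 10);
	if (*endp != ' ')
		return -EINVAL;		/* a control fd must follow */
	buf = endp + 1;

	*cfd = (unsigned int)strtoul(buf, &endp, 10);
	if (*endp != ' ' && *endp != '\0')
		return -EINVAL;

	/*
	 * The quoted code advances with "buffer = endp + 1" in both
	 * cases; this sketch stays on the terminating NUL instead.
	 */
	*args = (*endp == ' ') ? endp + 1 : endp;
	return 0;
}
```

With this model, "3 4 100M" yields efd 3, cfd 4 and args "100M", while a
lone "3" is rejected because no control fd follows.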

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 06/12] cgroup, memcg: move cgroup_event implementation to memcg
       [not found]             ` <20130830104755.GC28658-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2013-08-30 10:52               ` Tejun Heo
@ 2013-08-30 10:52               ` Tejun Heo
  1 sibling, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-30 10:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello, Michal.

On Fri, Aug 30, 2013 at 12:47:55PM +0200, Michal Hocko wrote:
> > Updated as requested.  If there's still something missing, please let
> > me know.  In case you're okay with the change, how do you wanna route
> > it?  01-05 are already in cgroup/for-3.12 and the rest can either be
> > applied to cgroup/for-3.12, which will likely require some, mostly
> > trivial, adjustments to memcg patches in -mm, or these all can be
> > routed through -mm.  Any preference?
> 
> I do not care much, but as Andrew wasn't CCed and this comes quite late,
> he might be reluctant to take it now and push it in the next merge
> window. So if you are in a hurry with this, route it via your tree.

Hmmm... yeah, it's quite late and I think it'd be better to do it in
the next cycle so that we can avoid disturbing memcg patches (again).
If you ack the changes, I'll queue them in a separate branch after
v3.11 is released so that memcg tree can be based on top of it.  Would
that work?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 06/12] cgroup, memcg: move cgroup_event implementation to memcg
       [not found]                 ` <20130830105210.GA30910-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  2013-08-30 11:05                   ` Michal Hocko
@ 2013-08-30 11:05                   ` Michal Hocko
  1 sibling, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2013-08-30 11:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Fri 30-08-13 06:52:10, Tejun Heo wrote:
> Hello, Michal.
> 
> On Fri, Aug 30, 2013 at 12:47:55PM +0200, Michal Hocko wrote:
> > > Updated as requested.  If there's still something missing, please let
> > > me know.  In case you're okay with the change, how do you wanna route
> > > it?  01-05 are already in cgroup/for-3.12 and the rest can either be
> > > applied to cgroup/for-3.12, which will likely require some, mostly
> > > trivial, adjustments to memcg patches in -mm, or these all can be
> > > routed through -mm.  Any preference?
> > 
> > I do not care much, but as Andrew wasn't CCed and this comes quite late,
> > he might be reluctant to take it now and push it in the next merge
> > window. So if you are in a hurry with this, route it via your tree.
> 
> Hmmm... yeah, it's quite late and I think it'd be better to do it in
> the next cycle so that we can avoid disturbing memcg patches (again).
> If you ack the changes, I'll queue them in a separate branch after
> v3.11 is released so that memcg tree can be based on top of it.  Would
> that work?

Yes. Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 08/12] cgroup, memcg: move cgroup->event_list[_lock] and event callbacks into memcg
       [not found]     ` <1376582550-12548-9-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2013-08-30 11:08       ` Michal Hocko
       [not found]         ` <20130830110846.GB31605-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 74+ messages in thread
From: Michal Hocko @ 2013-08-30 11:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu 15-08-13 12:02:26, Tejun Heo wrote:
[...]
> @@ -6030,12 +6050,13 @@ static void cgroup_event_ptable_queue_proc(struct file *file,
>  static int cgroup_write_event_control(struct cgroup_subsys_state *css,
>  				      struct cftype *cft, const char *buffer)
>  {
> -	struct cgroup *cgrp = css->cgroup;
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  	struct cgroup_event *event;
>  	struct cgroup_subsys_state *cfile_css;
>  	unsigned int efd, cfd;
>  	struct file *efile;
>  	struct file *cfile;
> +	const char *name;
>  	char *endp;
>  	int ret;

AFAICS nothing prevents registering a new event after css_offline has
already run, which would lead to a memory leak. Then we would need to do
css_tryget somewhere at the beginning. Or is there anything that would
prevent that?

Other than that it looks correct to me. After the above concern is
sorted out:
Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
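
The concern above (a registration arriving after css_offline has already
drained the event list) is what a tryget-style reference is meant to close.
A minimal userspace sketch of that idea follows. All names (css_model,
css_model_tryget, register_event_model) are hypothetical, and the
bias-based counter is only an assumption used for illustration, not
necessarily how the kernel implements css_tryget().

```c
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Positive while the object is live, negative once it has been killed. */
struct css_model {
	atomic_int refcnt;
};

static void css_model_init(struct css_model *css)
{
	atomic_init(&css->refcnt, 1);	/* base reference */
}

/* Take a reference only if the object has not been killed yet. */
static bool css_model_tryget(struct css_model *css)
{
	int v = atomic_load(&css->refcnt);

	while (v > 0) {
		/* On failure v is reloaded and the liveness check repeats. */
		if (atomic_compare_exchange_weak(&css->refcnt, &v, v + 1))
			return true;
	}
	return false;
}

/* Models the offline step: every later tryget fails from now on. */
static void css_model_kill(struct css_model *css)
{
	atomic_fetch_add(&css->refcnt, INT_MIN);
}

/* Registration pins the object first and refuses to run when it cannot. */
static int register_event_model(struct css_model *css, int *nr_events)
{
	if (!css_model_tryget(css))
		return -EINVAL;
	(*nr_events)++;
	return 0;
}
```

Because the liveness check and the increment are a single compare-and-swap,
a registration either pins the object before it is killed, in which case
the offline path still sees and removes the event, or fails without
allocating anything.
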
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 09/12] memcg: remove cgroup_event->cft
       [not found]     ` <1376582550-12548-10-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2013-08-30 11:13       ` Michal Hocko
  0 siblings, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2013-08-30 11:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu 15-08-13 12:02:27, Tejun Heo wrote:
> The only use of cgroup_event->cft is distinguishing "usage_in_bytes"
> and "memsw.usage_in_bytes" for mem_cgroup_usage_[un]register_event(),
> which can be done by adding an explicit argument to the function and
> implementing two wrappers so that the two cases can be distinguished
> from the function alone.
> 
> Remove cgroup_event->cft and the related code including
> [un]register_events() methods.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
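
The approach described in the changelog (an explicit type argument plus two
thin wrappers, so the callback pointer alone identifies the case) can be
shown in isolation. This is a userspace sketch; memcg_model, register_fn,
usage_register_common() and pick_register() are hypothetical names, and
RES_MEM/RES_MEMSWAP merely stand in for _MEM/_MEMSWAP.

```c
#include <stddef.h>
#include <string.h>

enum res_type { RES_MEM, RES_MEMSWAP };

struct memcg_model {
	unsigned long thresholds[2];	/* indexed by enum res_type */
};

/* Uniform callback signature: no cftype needed to tell the cases apart. */
typedef int (*register_fn)(struct memcg_model *memcg, unsigned long threshold);

/* Common implementation takes the resource type explicitly. */
static int usage_register_common(struct memcg_model *memcg,
				 unsigned long threshold, enum res_type type)
{
	memcg->thresholds[type] = threshold;
	return 0;
}

static int mem_usage_register(struct memcg_model *memcg, unsigned long threshold)
{
	return usage_register_common(memcg, threshold, RES_MEM);
}

static int memsw_usage_register(struct memcg_model *memcg, unsigned long threshold)
{
	return usage_register_common(memcg, threshold, RES_MEMSWAP);
}

/* Selects the wrapper by control file name; NULL for unknown files. */
static register_fn pick_register(const char *name)
{
	if (!strcmp(name, "memory.usage_in_bytes"))
		return mem_usage_register;
	if (!strcmp(name, "memory.memsw.usage_in_bytes"))
		return memsw_usage_register;
	return NULL;
}
```

The trade-off is one extra trivial function per case in exchange for
dropping the cftype pointer from every callback signature.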

> ---
>  include/linux/vmpressure.h |  2 --
>  mm/memcontrol.c            | 65 +++++++++++++++++++++++++---------------------
>  mm/vmpressure.c            | 14 +++-------
>  3 files changed, 38 insertions(+), 43 deletions(-)
> 
> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
> index 324ea7a..dd0b025 100644
> --- a/include/linux/vmpressure.h
> +++ b/include/linux/vmpressure.h
> @@ -35,11 +35,9 @@ extern struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg);
>  extern struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr);
>  extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>  extern int vmpressure_register_event(struct cgroup_subsys_state *css,
> -				     struct cftype *cft,
>  				     struct eventfd_ctx *eventfd,
>  				     const char *args);
>  extern void vmpressure_unregister_event(struct cgroup_subsys_state *css,
> -					struct cftype *cft,
>  					struct eventfd_ctx *eventfd);
>  #else
>  static inline void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9b833e1..18e98ae 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -249,10 +249,6 @@ struct cgroup_event {
>  	 */
>  	struct cgroup_subsys_state *css;
>  	/*
> -	 * Control file which the event associated.
> -	 */
> -	struct cftype *cft;
> -	/*
>  	 * eventfd to signal userspace about the event.
>  	 */
>  	struct eventfd_ctx *eventfd;
> @@ -266,15 +262,13 @@ struct cgroup_event {
>  	 * on eventfd to send notification to userspace.
>  	 */
>  	int (*register_event)(struct cgroup_subsys_state *css,
> -			      struct cftype *cft, struct eventfd_ctx *eventfd,
> -			      const char *args);
> +			      struct eventfd_ctx *eventfd, const char *args);
>  	/*
>  	 * unregister_event() callback will be called when userspace closes
>  	 * the eventfd or on cgroup removing.  This callback must be set,
>  	 * if you want provide notification functionality.
>  	 */
>  	void (*unregister_event)(struct cgroup_subsys_state *css,
> -				 struct cftype *cft,
>  				 struct eventfd_ctx *eventfd);
>  	/*
>  	 * All fields below needed to unregister event when
> @@ -5659,13 +5653,12 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
>  		mem_cgroup_oom_notify_cb(iter);
>  }
>  
> -static int mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
> -	struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
> +static int __mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
> +	struct eventfd_ctx *eventfd, const char *args, enum res_type type)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  	struct mem_cgroup_thresholds *thresholds;
>  	struct mem_cgroup_threshold_ary *new;
> -	enum res_type type = MEMFILE_TYPE(cft->private);
>  	u64 threshold, usage;
>  	int i, size, ret;
>  
> @@ -5742,13 +5735,24 @@ unlock:
>  	return ret;
>  }
>  
> -static void mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
> -	struct cftype *cft, struct eventfd_ctx *eventfd)
> +static int mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
> +	struct eventfd_ctx *eventfd, const char *args)
> +{
> +	return __mem_cgroup_usage_register_event(css, eventfd, args, _MEM);
> +}
> +
> +static int memsw_cgroup_usage_register_event(struct cgroup_subsys_state *css,
> +	struct eventfd_ctx *eventfd, const char *args)
> +{
> +	return __mem_cgroup_usage_register_event(css, eventfd, args, _MEMSWAP);
> +}
> +
> +static void __mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
> +	struct eventfd_ctx *eventfd, enum res_type type)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  	struct mem_cgroup_thresholds *thresholds;
>  	struct mem_cgroup_threshold_ary *new;
> -	enum res_type type = MEMFILE_TYPE(cft->private);
>  	u64 usage;
>  	int i, j, size;
>  
> @@ -5821,14 +5825,24 @@ unlock:
>  	mutex_unlock(&memcg->thresholds_lock);
>  }
>  
> +static void mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
> +	struct eventfd_ctx *eventfd)
> +{
> +	return __mem_cgroup_usage_unregister_event(css, eventfd, _MEM);
> +}
> +
> +static void memsw_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
> +	struct eventfd_ctx *eventfd)
> +{
> +	return __mem_cgroup_usage_unregister_event(css, eventfd, _MEMSWAP);
> +}
> +
>  static int mem_cgroup_oom_register_event(struct cgroup_subsys_state *css,
> -	struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
> +	struct eventfd_ctx *eventfd, const char *args)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  	struct mem_cgroup_eventfd_list *event;
> -	enum res_type type = MEMFILE_TYPE(cft->private);
>  
> -	BUG_ON(type != _OOM_TYPE);
>  	event = kmalloc(sizeof(*event),	GFP_KERNEL);
>  	if (!event)
>  		return -ENOMEM;
> @@ -5847,13 +5861,10 @@ static int mem_cgroup_oom_register_event(struct cgroup_subsys_state *css,
>  }
>  
>  static void mem_cgroup_oom_unregister_event(struct cgroup_subsys_state *css,
> -	struct cftype *cft, struct eventfd_ctx *eventfd)
> +	struct eventfd_ctx *eventfd)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  	struct mem_cgroup_eventfd_list *ev, *tmp;
> -	enum res_type type = MEMFILE_TYPE(cft->private);
> -
> -	BUG_ON(type != _OOM_TYPE);
>  
>  	spin_lock(&memcg_oom_lock);
>  
> @@ -5983,7 +5994,7 @@ static void cgroup_event_remove(struct work_struct *work)
>  
>  	remove_wait_queue(event->wqh, &event->wait);
>  
> -	event->unregister_event(css, event->cft, event->eventfd);
> +	event->unregister_event(css, event->eventfd);
>  
>  	/* Notify userspace the event is going away. */
>  	eventfd_signal(event->eventfd, 1);
> @@ -6104,12 +6115,6 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
>  	if (ret < 0)
>  		goto out_put_cfile;
>  
> -	event->cft = __file_cft(cfile);
> -	if (IS_ERR(event->cft)) {
> -		ret = PTR_ERR(event->cft);
> -		goto out_put_cfile;
> -	}
> -
>  	/*
>  	 * Determine the event callbacks and set them in @event.  This used
>  	 * to be done via struct cftype but cgroup core no longer knows
> @@ -6128,8 +6133,8 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
>  		event->register_event = vmpressure_register_event;
>  		event->unregister_event = vmpressure_unregister_event;
>  	} else if (!strcmp(name, "memory.memsw.usage_in_bytes")) {
> -		event->register_event = mem_cgroup_usage_register_event;
> -		event->unregister_event = mem_cgroup_usage_unregister_event;
> +		event->register_event = memsw_cgroup_usage_register_event;
> +		event->unregister_event = memsw_cgroup_usage_unregister_event;
>  	} else {
>  		ret = -EINVAL;
>  		goto out_put_cfile;
> @@ -6151,7 +6156,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
>  	if (ret)
>  		goto out_put_cfile;
>  
> -	ret = event->register_event(css, event->cft, event->eventfd, buffer);
> +	ret = event->register_event(css, event->eventfd, buffer);
>  	if (ret)
>  		goto out_put_css;
>  
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 13489b1..40ed6e8 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -279,7 +279,6 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
>  /**
>   * vmpressure_register_event() - Bind vmpressure notifications to an eventfd
>   * @css:	css that is interested in vmpressure notifications
> - * @cft:	cgroup control files handle
>   * @eventfd:	eventfd context to link notifications with
>   * @args:	event arguments (used to set up a pressure level threshold)
>   *
> @@ -289,13 +288,10 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
>   * threshold (one of vmpressure_str_levels, i.e. "low", "medium", or
>   * "critical").
>   *
> - * This function should not be used directly, just pass it to (struct
> - * cftype).register_event, and then cgroup core will handle everything by
> - * itself.
> + * To be used as memcg event method.
>   */
>  int vmpressure_register_event(struct cgroup_subsys_state *css,
> -			      struct cftype *cft, struct eventfd_ctx *eventfd,
> -			      const char *args)
> +			      struct eventfd_ctx *eventfd, const char *args)
>  {
>  	struct vmpressure *vmpr = css_to_vmpressure(css);
>  	struct vmpressure_event *ev;
> @@ -326,19 +322,15 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
>  /**
>   * vmpressure_unregister_event() - Unbind eventfd from vmpressure
>   * @css:	css handle
> - * @cft:	cgroup control files handle
>   * @eventfd:	eventfd context that was used to link vmpressure with the @cg
>   *
>   * This function does internal manipulations to detach the @eventfd from
>   * the vmpressure notifications, and then frees internal resources
>   * associated with the @eventfd (but the @eventfd itself is not freed).
>   *
> - * This function should not be used directly, just pass it to (struct
> - * cftype).unregister_event, and then cgroup core will handle everything
> - * by itself.
> + * To be used as memcg event method.
>   */
>  void vmpressure_unregister_event(struct cgroup_subsys_state *css,
> -				 struct cftype *cft,
>  				 struct eventfd_ctx *eventfd)
>  {
>  	struct vmpressure *vmpr = css_to_vmpressure(css);
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 10/12] memcg: make cgroup_event deal with mem_cgroup instead of cgroup_subsys_state
       [not found]     ` <1376582550-12548-11-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2013-08-30 11:15       ` Michal Hocko
@ 2013-08-30 11:15       ` Michal Hocko
  1 sibling, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2013-08-30 11:15 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu 15-08-13 12:02:28, Tejun Heo wrote:
> cgroup_event is now memcg specific.  Replace cgroup_event->css with
> ->memcg and convert [un]register_event() callbacks to take mem_cgroup
> pointer instead of cgroup_subsys_state one.  This simplifies the code
> slightly and makes css_to_vmpressure() unnecessary which is removed.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
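
The css <-> memcg conversions that this patch moves out of the callbacks
rely on the css being embedded in the memcg structure. A small userspace
sketch of that relationship follows; the *_model types and helper names are
hypothetical, and container_of() is re-defined locally because the kernel
macro is not available in userspace.

```c
#include <stddef.h>

struct css_model { int refcnt; };
struct vmpressure_model { int level; };

/* The css and vmpressure state are embedded, not pointed to. */
struct memcg_model {
	long usage;
	struct css_model css;
	struct vmpressure_model vmpressure;
};

/* Local stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* css -> memcg: step back from the embedded member to its container. */
static struct memcg_model *memcg_from_css(struct css_model *css)
{
	return container_of(css, struct memcg_model, css);
}

/* memcg -> vmpressure: a plain member access, no css detour needed. */
static struct vmpressure_model *memcg_to_vmpr(struct memcg_model *memcg)
{
	return &memcg->vmpressure;
}
```

Storing the memcg pointer in the event means the css -> memcg step happens
once at registration, and the reverse direction is just &memcg->css, as in
the css_put(&memcg->css) call in the patch.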

> ---
>  include/linux/vmpressure.h |  5 ++---
>  mm/memcontrol.c            | 53 +++++++++++++++++++---------------------------
>  mm/vmpressure.c            | 12 +++++------
>  3 files changed, 30 insertions(+), 40 deletions(-)
> 
> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
> index dd0b025..3c67eb3 100644
> --- a/include/linux/vmpressure.h
> +++ b/include/linux/vmpressure.h
> @@ -33,11 +33,10 @@ extern void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio);
>  extern void vmpressure_init(struct vmpressure *vmpr);
>  extern struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg);
>  extern struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr);
> -extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
> -extern int vmpressure_register_event(struct cgroup_subsys_state *css,
> +extern int vmpressure_register_event(struct mem_cgroup *memcg,
>  				     struct eventfd_ctx *eventfd,
>  				     const char *args);
> -extern void vmpressure_unregister_event(struct cgroup_subsys_state *css,
> +extern void vmpressure_unregister_event(struct mem_cgroup *memcg,
>  					struct eventfd_ctx *eventfd);
>  #else
>  static inline void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 18e98ae..8663d6c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -245,9 +245,9 @@ struct mem_cgroup_eventfd_list {
>   */
>  struct cgroup_event {
>  	/*
> -	 * css which the event belongs to.
> +	 * memcg which the event belongs to.
>  	 */
> -	struct cgroup_subsys_state *css;
> +	struct mem_cgroup *memcg;
>  	/*
>  	 * eventfd to signal userspace about the event.
>  	 */
> @@ -261,14 +261,14 @@ struct cgroup_event {
>  	 * waiter for changes related to this event.  Use eventfd_signal()
>  	 * on eventfd to send notification to userspace.
>  	 */
> -	int (*register_event)(struct cgroup_subsys_state *css,
> +	int (*register_event)(struct mem_cgroup *memcg,
>  			      struct eventfd_ctx *eventfd, const char *args);
>  	/*
>  	 * unregister_event() callback will be called when userspace closes
>  	 * the eventfd or on cgroup removing.  This callback must be set,
>  	 * if you want provide notification functionality.
>  	 */
> -	void (*unregister_event)(struct cgroup_subsys_state *css,
> +	void (*unregister_event)(struct mem_cgroup *memcg,
>  				 struct eventfd_ctx *eventfd);
>  	/*
>  	 * All fields below needed to unregister event when
> @@ -546,11 +546,6 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
>  	return &container_of(vmpr, struct mem_cgroup, vmpressure)->css;
>  }
>  
> -struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css)
> -{
> -	return &mem_cgroup_from_css(css)->vmpressure;
> -}
> -
>  static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
>  {
>  	return (memcg == root_mem_cgroup);
> @@ -5653,10 +5648,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg)
>  		mem_cgroup_oom_notify_cb(iter);
>  }
>  
> -static int __mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
> +static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
>  	struct eventfd_ctx *eventfd, const char *args, enum res_type type)
>  {
> -	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  	struct mem_cgroup_thresholds *thresholds;
>  	struct mem_cgroup_threshold_ary *new;
>  	u64 threshold, usage;
> @@ -5735,22 +5729,21 @@ unlock:
>  	return ret;
>  }
>  
> -static int mem_cgroup_usage_register_event(struct cgroup_subsys_state *css,
> +static int mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
>  	struct eventfd_ctx *eventfd, const char *args)
>  {
> -	return __mem_cgroup_usage_register_event(css, eventfd, args, _MEM);
> +	return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEM);
>  }
>  
> -static int memsw_cgroup_usage_register_event(struct cgroup_subsys_state *css,
> +static int memsw_cgroup_usage_register_event(struct mem_cgroup *memcg,
>  	struct eventfd_ctx *eventfd, const char *args)
>  {
> -	return __mem_cgroup_usage_register_event(css, eventfd, args, _MEMSWAP);
> +	return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEMSWAP);
>  }
>  
> -static void __mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
> +static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
>  	struct eventfd_ctx *eventfd, enum res_type type)
>  {
> -	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  	struct mem_cgroup_thresholds *thresholds;
>  	struct mem_cgroup_threshold_ary *new;
>  	u64 usage;
> @@ -5825,22 +5818,21 @@ unlock:
>  	mutex_unlock(&memcg->thresholds_lock);
>  }
>  
> -static void mem_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
> +static void mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
>  	struct eventfd_ctx *eventfd)
>  {
> -	return __mem_cgroup_usage_unregister_event(css, eventfd, _MEM);
> +	return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEM);
>  }
>  
> -static void memsw_cgroup_usage_unregister_event(struct cgroup_subsys_state *css,
> +static void memsw_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
>  	struct eventfd_ctx *eventfd)
>  {
> -	return __mem_cgroup_usage_unregister_event(css, eventfd, _MEMSWAP);
> +	return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEMSWAP);
>  }
>  
> -static int mem_cgroup_oom_register_event(struct cgroup_subsys_state *css,
> +static int mem_cgroup_oom_register_event(struct mem_cgroup *memcg,
>  	struct eventfd_ctx *eventfd, const char *args)
>  {
> -	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  	struct mem_cgroup_eventfd_list *event;
>  
>  	event = kmalloc(sizeof(*event),	GFP_KERNEL);
> @@ -5860,10 +5852,9 @@ static int mem_cgroup_oom_register_event(struct cgroup_subsys_state *css,
>  	return 0;
>  }
>  
> -static void mem_cgroup_oom_unregister_event(struct cgroup_subsys_state *css,
> +static void mem_cgroup_oom_unregister_event(struct mem_cgroup *memcg,
>  	struct eventfd_ctx *eventfd)
>  {
> -	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  	struct mem_cgroup_eventfd_list *ev, *tmp;
>  
>  	spin_lock(&memcg_oom_lock);
> @@ -5990,18 +5981,18 @@ static void cgroup_event_remove(struct work_struct *work)
>  {
>  	struct cgroup_event *event = container_of(work, struct cgroup_event,
>  			remove);
> -	struct cgroup_subsys_state *css = event->css;
> +	struct mem_cgroup *memcg = event->memcg;
>  
>  	remove_wait_queue(event->wqh, &event->wait);
>  
> -	event->unregister_event(css, event->eventfd);
> +	event->unregister_event(memcg, event->eventfd);
>  
>  	/* Notify userspace the event is going away. */
>  	eventfd_signal(event->eventfd, 1);
>  
>  	eventfd_ctx_put(event->eventfd);
>  	kfree(event);
> -	css_put(css);
> +	css_put(&memcg->css);
>  }
>  
>  /*
> @@ -6014,7 +6005,7 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
>  {
>  	struct cgroup_event *event = container_of(wait,
>  			struct cgroup_event, wait);
> -	struct mem_cgroup *memcg = mem_cgroup_from_css(event->css);
> +	struct mem_cgroup *memcg = event->memcg;
>  	unsigned long flags = (unsigned long)key;
>  
>  	if (flags & POLLHUP) {
> @@ -6085,7 +6076,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
>  	if (!event)
>  		return -ENOMEM;
>  
> -	event->css = css;
> +	event->memcg = memcg;
>  	INIT_LIST_HEAD(&event->list);
>  	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
>  	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
> @@ -6156,7 +6147,7 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
>  	if (ret)
>  		goto out_put_cfile;
>  
> -	ret = event->register_event(css, event->eventfd, buffer);
> +	ret = event->register_event(memcg, event->eventfd, buffer);
>  	if (ret)
>  		goto out_put_css;
>  
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 40ed6e8..96f7509 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -278,7 +278,7 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
>  
>  /**
>   * vmpressure_register_event() - Bind vmpressure notifications to an eventfd
> - * @css:	css that is interested in vmpressure notifications
> + * @memcg:	memcg that is interested in vmpressure notifications
>   * @eventfd:	eventfd context to link notifications with
>   * @args:	event arguments (used to set up a pressure level threshold)
>   *
> @@ -290,10 +290,10 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
>   *
>   * To be used as memcg event method.
>   */
> -int vmpressure_register_event(struct cgroup_subsys_state *css,
> +int vmpressure_register_event(struct mem_cgroup *memcg,
>  			      struct eventfd_ctx *eventfd, const char *args)
>  {
> -	struct vmpressure *vmpr = css_to_vmpressure(css);
> +	struct vmpressure *vmpr = memcg_to_vmpressure(memcg);
>  	struct vmpressure_event *ev;
>  	int level;
>  
> @@ -321,7 +321,7 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
>  
>  /**
>   * vmpressure_unregister_event() - Unbind eventfd from vmpressure
> - * @css:	css handle
> + * @memcg:	memcg handle
>   * @eventfd:	eventfd context that was used to link vmpressure with the @cg
>   *
>   * This function does internal manipulations to detach the @eventfd from
> @@ -330,10 +330,10 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
>   *
>   * To be used as memcg event method.
>   */
> -void vmpressure_unregister_event(struct cgroup_subsys_state *css,
> +void vmpressure_unregister_event(struct mem_cgroup *memcg,
>  				 struct eventfd_ctx *eventfd)
>  {
> -	struct vmpressure *vmpr = css_to_vmpressure(css);
> +	struct vmpressure *vmpr = memcg_to_vmpressure(memcg);
>  	struct vmpressure_event *ev;
>  
>  	mutex_lock(&vmpr->events_lock);
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 11/12] memcg: rename cgroup_event to mem_cgroup_event
       [not found]     ` <1376582550-12548-12-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                         ` (2 preceding siblings ...)
  2013-08-30 11:19       ` Michal Hocko
@ 2013-08-30 11:19       ` Michal Hocko
  3 siblings, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2013-08-30 11:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu 15-08-13 12:02:29, Tejun Heo wrote:
> cgroup_event is only available in memcg now.  Let's brand it that way.
> While at it, add a comment encouraging deprecation of the feature and
> remove the respective section from cgroup documentation.
> 
> This patch is cosmetic.
> 
> v2: Index in cgroups.txt updated accordingly as suggested by Li Zefan.

OK, Documentation/cgroups/memory.txt contains documentation for all
interfaces, so this can go.
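
For reference, the registration sequence that the removed "2.4
Notification API" text describes (create an eventfd, open the control
file, write "<event_fd> <control_fd> <args>" to cgroup.event_control)
can be sketched from userspace roughly as below. This is only an
illustration, assuming a legacy (v1) memory cgroup directory: the helper
names, the cgroup_dir argument and the threshold string are made up for
the example and do not come from the patch.

```c
/*
 * Sketch of the legacy cgroup.event_control registration sequence.
 * Hypothetical helpers; not taken from the kernel tree.
 */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Build the "<event_fd> <control_fd> <args>" line.
 * Returns its length, or -1 if it does not fit in buf.
 */
int format_event_control(char *buf, size_t len, int efd, int cfd,
			 const char *args)
{
	int n = snprintf(buf, len, "%d %d %s", efd, cfd, args);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Register a usage threshold on memory.usage_in_bytes in cgroup_dir.
 * Returns the eventfd to read(2) notifications from, or -1 on error.
 */
int register_usage_threshold(const char *cgroup_dir, const char *threshold)
{
	char path[4096], line[128];
	int efd, cfd = -1, ctl = -1, n, ret = -1;

	efd = eventfd(0, 0);		/* step 1: notification fd */
	if (efd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cgroup_dir);
	cfd = open(path, O_RDONLY);	/* step 2: control file to monitor */
	if (cfd < 0)
		goto out;

	snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	ctl = open(path, O_WRONLY);
	if (ctl < 0)
		goto out;

	/* step 3: hand both fds and the arguments to the kernel */
	n = format_event_control(line, sizeof(line), efd, cfd, threshold);
	if (n > 0 && write(ctl, line, n) == n)
		ret = efd;
out:
	if (ctl >= 0)
		close(ctl);
	/* Assumption: the control fd is not needed once registered. */
	if (cfd >= 0)
		close(cfd);
	if (ret < 0)
		close(efd);
	return ret;
}
```

A caller would then block in read(2) on the returned eventfd and, as
the removed text says, close it to unregister.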
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

> ---
>  Documentation/cgroups/cgroups.txt | 20 --------------
>  mm/memcontrol.c                   | 57 +++++++++++++++++++++++++--------------
>  2 files changed, 37 insertions(+), 40 deletions(-)
> 
> diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
> index 638bf17..821de56 100644
> --- a/Documentation/cgroups/cgroups.txt
> +++ b/Documentation/cgroups/cgroups.txt
> @@ -24,7 +24,6 @@ CONTENTS:
>    2.1 Basic Usage
>    2.2 Attaching processes
>    2.3 Mounting hierarchies by name
> -  2.4 Notification API
>  3. Kernel API
>    3.1 Overview
>    3.2 Synchronization
> @@ -472,25 +471,6 @@ you give a subsystem a name.
>  The name of the subsystem appears as part of the hierarchy description
>  in /proc/mounts and /proc/<pid>/cgroups.
>  
> -2.4 Notification API
> ---------------------
> -
> -There is mechanism which allows to get notifications about changing
> -status of a cgroup.
> -
> -To register a new notification handler you need to:
> - - create a file descriptor for event notification using eventfd(2);
> - - open a control file to be monitored (e.g. memory.usage_in_bytes);
> - - write "<event_fd> <control_fd> <args>" to cgroup.event_control.
> -   Interpretation of args is defined by control file implementation;
> -
> -eventfd will be woken up by control file implementation or when the
> -cgroup is removed.
> -
> -To unregister a notification handler just close eventfd.
> -
> -NOTE: Support of notifications should be implemented for the control
> -file. See documentation for the subsystem.
>  
>  3. Kernel API
>  =============
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8663d6c..2f0a8e1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -243,7 +243,7 @@ struct mem_cgroup_eventfd_list {
>  /*
>   * cgroup_event represents events which userspace want to receive.
>   */
> -struct cgroup_event {
> +struct mem_cgroup_event {
>  	/*
>  	 * memcg which the event belongs to.
>  	 */
> @@ -5973,14 +5973,27 @@ static void kmem_cgroup_css_offline(struct mem_cgroup *memcg)
>  #endif
>  
>  /*
> + * DO NOT USE IN NEW FILES.
> + *
> + * "cgroup.event_control" implementation.
> + *
> + * This is way over-engineered.  It tries to support fully configureable
> + * events for each user.  Such level of flexibility is completely
> + * unnecessary especially in the light of the planned unified hierarchy.
> + *
> + * Please deprecate this and replace with something simpler if at all
> + * possible.
> + */
> +
> +/*
>   * Unregister event and free resources.
>   *
>   * Gets called from workqueue.
>   */
> -static void cgroup_event_remove(struct work_struct *work)
> +static void memcg_event_remove(struct work_struct *work)
>  {
> -	struct cgroup_event *event = container_of(work, struct cgroup_event,
> -			remove);
> +	struct mem_cgroup_event *event =
> +		container_of(work, struct mem_cgroup_event, remove);
>  	struct mem_cgroup *memcg = event->memcg;
>  
>  	remove_wait_queue(event->wqh, &event->wait);
> @@ -6000,11 +6013,11 @@ static void cgroup_event_remove(struct work_struct *work)
>   *
>   * Called with wqh->lock held and interrupts disabled.
>   */
> -static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
> -		int sync, void *key)
> +static int memcg_event_wake(wait_queue_t *wait, unsigned mode,
> +			    int sync, void *key)
>  {
> -	struct cgroup_event *event = container_of(wait,
> -			struct cgroup_event, wait);
> +	struct mem_cgroup_event *event =
> +		container_of(wait, struct mem_cgroup_event, wait);
>  	struct mem_cgroup *memcg = event->memcg;
>  	unsigned long flags = (unsigned long)key;
>  
> @@ -6033,27 +6046,29 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
>  	return 0;
>  }
>  
> -static void cgroup_event_ptable_queue_proc(struct file *file,
> +static void memcg_event_ptable_queue_proc(struct file *file,
>  		wait_queue_head_t *wqh, poll_table *pt)
>  {
> -	struct cgroup_event *event = container_of(pt,
> -			struct cgroup_event, pt);
> +	struct mem_cgroup_event *event =
> +		container_of(pt, struct mem_cgroup_event, pt);
>  
>  	event->wqh = wqh;
>  	add_wait_queue(wqh, &event->wait);
>  }
>  
>  /*
> + * DO NOT USE IN NEW FILES.
> + *
>   * Parse input and register new cgroup event handler.
>   *
>   * Input must be in format '<event_fd> <control_fd> <args>'.
>   * Interpretation of args is defined by control file implementation.
>   */
> -static int cgroup_write_event_control(struct cgroup_subsys_state *css,
> -				      struct cftype *cft, const char *buffer)
> +static int memcg_write_event_control(struct cgroup_subsys_state *css,
> +				     struct cftype *cft, const char *buffer)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> -	struct cgroup_event *event;
> +	struct mem_cgroup_event *event;
>  	struct cgroup_subsys_state *cfile_css;
>  	unsigned int efd, cfd;
>  	struct file *efile;
> @@ -6078,9 +6093,9 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
>  
>  	event->memcg = memcg;
>  	INIT_LIST_HEAD(&event->list);
> -	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
> -	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
> -	INIT_WORK(&event->remove, cgroup_event_remove);
> +	init_poll_funcptr(&event->pt, memcg_event_ptable_queue_proc);
> +	init_waitqueue_func_entry(&event->wait, memcg_event_wake);
> +	INIT_WORK(&event->remove, memcg_event_remove);
>  
>  	efile = eventfd_fget(efd);
>  	if (IS_ERR(efile)) {
> @@ -6111,6 +6126,8 @@ static int cgroup_write_event_control(struct cgroup_subsys_state *css,
>  	 * to be done via struct cftype but cgroup core no longer knows
>  	 * about these events.  The following is crude but the whole thing
>  	 * is for compatibility anyway.
> +	 *
> +	 * DO NOT ADD NEW FILES.
>  	 */
>  	name = cfile->f_dentry->d_name.name;
>  
> @@ -6221,8 +6238,8 @@ static struct cftype mem_cgroup_files[] = {
>  		.read_u64 = mem_cgroup_hierarchy_read,
>  	},
>  	{
> -		.name = "cgroup.event_control",
> -		.write_string = cgroup_write_event_control,
> +		.name = "cgroup.event_control",		/* XXX: for compat */
> +		.write_string = memcg_write_event_control,
>  		.flags = CFTYPE_NO_PREFIX,
>  		.mode = S_IWUGO,
>  	},
> @@ -6555,7 +6572,7 @@ static void mem_cgroup_invalidate_reclaim_iterators(struct mem_cgroup *memcg)
>  static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> -	struct cgroup_event *event, *tmp;
> +	struct mem_cgroup_event *event, *tmp;
>  
>  	/*
>  	 * Unregister events and notify userspace.
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 08/12] cgroup, memcg: move cgroup->event_list[_lock] and event callbacks into memcg
       [not found]         ` <20130830110846.GB31605-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2013-09-03 21:56           ` Tejun Heo
@ 2013-09-03 21:56           ` Tejun Heo
  1 sibling, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-09-03 21:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello, Michal.

On Fri, Aug 30, 2013 at 01:08:46PM +0200, Michal Hocko wrote:
> On Thu 15-08-13 12:02:26, Tejun Heo wrote:
> [...]
> > @@ -6030,12 +6050,13 @@ static void cgroup_event_ptable_queue_proc(struct file *file,
> >  static int cgroup_write_event_control(struct cgroup_subsys_state *css,
> >  				      struct cftype *cft, const char *buffer)
> >  {
> > -	struct cgroup *cgrp = css->cgroup;
> > +	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> >  	struct cgroup_event *event;
> >  	struct cgroup_subsys_state *cfile_css;
> >  	unsigned int efd, cfd;
> >  	struct file *efile;
> >  	struct file *cfile;
> > +	const char *name;
> >  	char *endp;
> >  	int ret;
> 
> AFAICS nothing prevents registering a new event after css_offline
> is done already, which would lead to a mem leak. Then we would need to
> do css_tryget somewhere in the beginning. Or is there anything that
> would prevent that?

An earlier patch titled "cgroup: make cgroup_event hold onto
cgroup_subsys_state instead of cgroup" updates cgroup_event such that
each cgroup_event acquires a css reference using css_tryget(), so if a
css has already been offlined, no new event will be created.
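
The tryget behaviour described above can be modelled in userspace
roughly as follows. This is a sketch under stated assumptions, not the
kernel's css code: struct obj, the obj_* helpers, DEAD_BIAS and
register_event() are hypothetical names, and the "bias the counter
negative on offline" scheme is just one simple way to make new
references fail while existing holders can still drop theirs.

```c
/*
 * Userspace model of a "tryget" reference: taking a reference only
 * succeeds while the object is online.  Hypothetical names throughout.
 */
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEAD_BIAS INT_MIN	/* added on offline; makes the count negative */

struct obj {
	atomic_int refcnt;	/* > 0 while online, < 0 once offlined */
};

void obj_init(struct obj *o)
{
	atomic_init(&o->refcnt, 1);	/* base reference */
}

/* Take a reference unless the object has been offlined. */
bool obj_tryget(struct obj *o)
{
	int v = atomic_load(&o->refcnt);

	while (v > 0 && v < INT_MAX) {
		/* On failure v is reloaded and the check is repeated. */
		if (atomic_compare_exchange_weak(&o->refcnt, &v, v + 1))
			return true;
	}
	return false;
}

/* Drop a reference; works before and after offlining. */
void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->refcnt, 1);
}

/* Offline the object: later obj_tryget() calls fail. */
void obj_offline(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, DEAD_BIAS);
}

/*
 * Event registration pins the object first, so nothing new can be
 * registered (and leaked) once the object is offline.
 * Returns 0 on success, -1 if the object is already offline.
 */
int register_event(struct obj *o, int *nr_events)
{
	if (!obj_tryget(o))
		return -1;
	(*nr_events)++;
	return 0;
}
```

In this model each registered event keeps its reference until it is
removed, which mirrors the css_put() at the end of the event removal
path in the quoted patches.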

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 08/12] cgroup, memcg: move cgroup->event_list[_lock] and event callbacks into memcg
       [not found]             ` <20130903215646.GA31091-9pTldWuhBndy/B6EtB590w@public.gmane.org>
@ 2013-09-04  7:11               ` Michal Hocko
  0 siblings, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2013-09-04  7:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue 03-09-13 17:56:46, Tejun Heo wrote:
> Hello, Michal.
> 
> On Fri, Aug 30, 2013 at 01:08:46PM +0200, Michal Hocko wrote:
> > On Thu 15-08-13 12:02:26, Tejun Heo wrote:
> > [...]
> > > @@ -6030,12 +6050,13 @@ static void cgroup_event_ptable_queue_proc(struct file *file,
> > >  static int cgroup_write_event_control(struct cgroup_subsys_state *css,
> > >  				      struct cftype *cft, const char *buffer)
> > >  {
> > > -	struct cgroup *cgrp = css->cgroup;
> > > +	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> > >  	struct cgroup_event *event;
> > >  	struct cgroup_subsys_state *cfile_css;
> > >  	unsigned int efd, cfd;
> > >  	struct file *efile;
> > >  	struct file *cfile;
> > > +	const char *name;
> > >  	char *endp;
> > >  	int ret;
> > 
> > > AFAICS nothing prevents registering a new event after css_offline
> > > is done already, which would lead to a mem leak. Then we would need to do
> > > css_tryget somewhere in the beginning. Or is there anything that would
> > > prevent that?
> 
> An earlier patch titled "cgroup: make cgroup_event hold onto
> cgroup_subsys_state instead of cgroup" updates cgroup_event such that
> each cgroup_event acquires a css reference using css_tryget(), so if a
> css has already been offlined, no new event will be created.

Right you are. Sorry, I missed/forgot about that patch. This patch is
OK then and you can add my Acked-by.

Thanks and sorry about the noise.
-- 
Michal Hocko
SUSE Labs

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found] ` <1376582550-12548-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (28 preceding siblings ...)
  2013-11-10  4:48   ` Tejun Heo
@ 2013-11-10  4:48   ` Tejun Heo
  29 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-11-10  4:48 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hey, guys.

I'll route this series through the cgroup tree with the acks added
once v3.13-rc1 drops.

Thanks.

-- 
tejun

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]     ` <20131110044811.GA25112-9pTldWuhBndy/B6EtB590w@public.gmane.org>
@ 2013-11-11 14:10       ` Michal Hocko
       [not found]         ` <20131111141010.GB14497-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 74+ messages in thread
From: Michal Hocko @ 2013-11-11 14:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Sun 10-11-13 13:48:11, Tejun Heo wrote:
> Hey, guys.
> 
> I'll route this series through the cgroup tree with the acks added
> once v3.13-rc1 drops.

Do you have it in a local branch which I can merge into the mm git tree?

Thanks
-- 
Michal Hocko
SUSE Labs

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]         ` <20131111141010.GB14497-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2013-11-22 23:39           ` Tejun Heo
@ 2013-11-22 23:39           ` Tejun Heo
  1 sibling, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-11-22 23:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello,

On Mon, Nov 11, 2013 at 03:10:10PM +0100, Michal Hocko wrote:
> On Sun 10-11-13 13:48:11, Tejun Heo wrote:
> > Hey, guys.
> > 
> > I'll route this series through the cgroup tree with the acks added
> > once v3.13-rc1 drops.
> 
> Do you have it in a local branch which I can merge into the mm git tree?

Applied to cgroup/memcg_event which is based on top of v3.12 and then
pulled into cgroup/for-3.14.  Both branches are stable.  Please feel
free to use any.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git memcg_event
 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-3.14

Thanks.

-- 
tejun

* Re: [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
       [not found]             ` <20131122233947.GH8981-9pTldWuhBndy/B6EtB590w@public.gmane.org>
  2013-11-25 10:33               ` Michal Hocko
@ 2013-11-25 10:33               ` Michal Hocko
  1 sibling, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2013-11-25 10:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Fri 22-11-13 18:39:47, Tejun Heo wrote:
> Hello,
> 
> On Mon, Nov 11, 2013 at 03:10:10PM +0100, Michal Hocko wrote:
> > On Sun 10-11-13 13:48:11, Tejun Heo wrote:
> > > Hey, guys.
> > > 
> > > I'll route this series through the cgroup tree with the acks added
> > > once v3.13-rc1 drops.
> > 
> > > Do you have it in a local branch which I can merge into the mm git tree?
> 
> Applied to cgroup/memcg_event which is based on top of v3.12 and then
> pulled into cgroup/for-3.14.  Both branches are stable.  Please feel
> free to use any.

Cool! Thanks Tejun.
 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git memcg_event
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-3.14
> 
> Thanks.
> 
> -- 
> tejun

-- 
Michal Hocko
SUSE Labs

* [PATCHSET v2 cgroup/for-3.12] cgroup: make cgroup_event specific to memcg
@ 2013-08-15 16:02 Tejun Heo
  0 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2013-08-15 16:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	hannes-druUgvl0LCNAfugRpC6u6w,
	bsingharora-Re5JQEeQqe8AvxtiuMwx3w,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hello,

Changes from the last take[L] are

* Event handling updated such that it doesn't require meddling with
  internals not normally exposed outside cgroup core proper.  dentry
  reference counting is replaced with css one and cftype handling is
  completely gone.  Hopefully, this addresses Michal's complaints.

* Further simplifications.

* Rebased on top of the current cgroup/for-3.12 and pending patches.

Like many other things in cgroup, cgroup_event is way too flexible and
complex - it strives to provide a completely flexible event monitoring
facility in cgroup proper which allows any number of users to monitor
custom events.  This is essentially a layering violation and leads to
weird issues like worrying about event API mis/abuse from userland in
the cgroup controller event source implementation.

The only thing cgroup_event can do better than the standard "file changed"
notification is serving many uncoordinated event listeners watching
many different thresholds, which would only make sense if access to the
cgroup hierarchy is widely distributed.  The existing implementation
is pretty ill-equipped to handle such a scenario, is not in the right
layer to tackle such issues, and the whole of cgroup is headed the other
way.  As such, going forward, cgroup core won't support cgroup_event
as the common event mechanism.

Fortunately, memcg along with vmpressure is the only user of the
facility and gets to keep it.  This patchset makes cgroup_event
specific to memcg, moves all related code into mm/memcontrol.c and
renames it to mem_cgroup_event so that its usage can't spread to other
subsystems and later deprecation and cleanup can be localized.

Note that after this patchset, cgroup.event_control file exists only
for the hierarchy which has memcg attached to it.  This is a userland
visible change but unlikely to be noticeable as the file has never
been meaningful outside memcg.  If this ever becomes problematic, we
can add a dummy file on hierarchies w/o memcg when !sane_behavior.
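
For reference, a rough userland sketch of how a listener registers through
cgroup.event_control on a memcg hierarchy: it writes
"<event_fd> <control_fd> <args>" to the file.  The helper names
(format_event_control, register_usage_threshold) and the directory passed
in by the caller are assumptions made for this example, not part of this
patchset.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Build the "<event_fd> <control_fd> <args>" line for cgroup.event_control. */
int format_event_control(char *buf, size_t len, int efd, int cfd,
			 const char *args)
{
	int n;

	assert(args != NULL);
	n = snprintf(buf, len, "%d %d %s", efd, cfd, args);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Register a usage threshold for the memcg directory @cgdir (a path chosen
 * by the caller).  Returns the eventfd on success, -1 on failure.
 */
int register_usage_threshold(const char *cgdir, const char *threshold)
{
	char path[4096], line[64];
	int efd, cfd, ctl, n;

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cgdir);
	cfd = open(path, O_RDONLY);
	snprintf(path, sizeof(path), "%s/cgroup.event_control", cgdir);
	ctl = open(path, O_WRONLY);
	if (cfd < 0 || ctl < 0)
		goto fail;

	n = format_event_control(line, sizeof(line), efd, cfd, threshold);
	if (n < 0 || write(ctl, line, (size_t)n) != n)
		goto fail;

	/* Both fds are only needed for the registration itself. */
	close(ctl);
	close(cfd);
	return efd;

fail:
	if (ctl >= 0)
		close(ctl);
	if (cfd >= 0)
		close(cfd);
	close(efd);
	return -1;
}
```

A caller would then read() 8 bytes from the returned eventfd to block
until the threshold is crossed.  On a hierarchy without memcg the open of
cgroup.event_control simply fails after this patchset, which is the
userland-visible change mentioned above.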

This patchset consists of the following 12 patches.

 0001-cgroup-rename-cgroup_css_from_dir-to-css_from_dir-an.patch
 0002-cgroup-make-cgroup_css-take-cgroup_subsys-instead-an.patch
 0003-cgroup-implement-CFTYPE_NO_PREFIX.patch
 0004-cgroup-make-cgroup_event-hold-onto-cgroup_subsys_sta.patch
 0005-cgroup-make-cgroup_write_event_control-use-css_from_.patch
 0006-cgroup-memcg-move-cgroup_event-implementation-to-mem.patch
 0007-memcg-cgroup_write_event_control-now-knows-css-is-fo.patch
 0008-cgroup-memcg-move-cgroup-event_list-_lock-and-event-.patch
 0009-memcg-remove-cgroup_event-cft.patch
 0010-memcg-make-cgroup_event-deal-with-mem_cgroup-instead.patch
 0011-memcg-rename-cgroup_event-to-mem_cgroup_event.patch
 0012-cgroup-unexport-cgroup_css-and-remove-__file_cft.patch

0001-0005 prep for the move.  0006 moves it.  0007-0012 simplify it.

While this is quite a few patches, they are mostly trivial in nature.
Some are changes required anyway for the planned unified hierarchy
(e.g. switching to css refcnting), and if cgroup_event were kept generic
we'd have to restructure it to handle dynamic css attach/detach, which
would likely be a lot more work for no noticeable gain.

The patches are based on top of

  cgroup/for-3.12 ff58ac0d58 ("cpuset: remove an unncessary forward declaration")
+ [PATCH] cgroup: fix subsystem file accesses on the root cgroup
+ [PATCH] cgroup: fix cgroup_write_event_control()

and available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-memcg_event

diffstat follows.

 Documentation/cgroups/cgroups.txt |   20 -
 include/linux/cgroup.h            |   28 --
 include/linux/vmpressure.h        |    8
 init/Kconfig                      |    3
 kernel/cgroup.c                   |  383 +++++---------------------------------
 kernel/events/core.c              |    2
 mm/memcontrol.c                   |  351 +++++++++++++++++++++++++++++++---
 mm/vmpressure.c                   |   26 --
 8 files changed, 389 insertions(+), 432 deletions(-)

Thanks.

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel.cgroups/8726
