* [PATCH -mmotm 0/8] memcg: move charge at task migration (21/Dec)
@ 2009-12-21  5:31 Daisuke Nishimura
  2009-12-21  5:32 ` [PATCH -mmotm 1/8] cgroup: introduce cancel_attach() Daisuke Nishimura
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Daisuke Nishimura @ 2009-12-21  5:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage,
	Daisuke Nishimura, linux-mm

Hi.

This is the latest version of my move-charge-at-task-migration patch set.

As I said in http://marc.info/?l=linux-mm&m=126135930226969&w=2, I've fixed
the BUG I found in the 14/Dec version, and I think these patches are ready to
be merged into mmotm. They are based on mmotm-2009-12-10-17-19, but can be
applied onto 2.6.33-rc1-git1 too.


  [1/8] cgroup: introduce cancel_attach()
  [2/8] cgroup: introduce coalesce css_get() and css_put()
  [3/8] memcg: add interface to move charge at task migration
  [4/8] memcg: move charges of anonymous page
  [5/8] memcg: improve performance in moving charge
  [6/8] memcg: avoid oom during moving charge
  [7/8] memcg: move charges of anonymous swap
  [8/8] memcg: improve performance in moving swap charge

 Documentation/cgroups/cgroups.txt |   13 +-
 Documentation/cgroups/memory.txt  |   56 +++-
 include/linux/cgroup.h            |   14 +-
 include/linux/page_cgroup.h       |    2 +
 include/linux/swap.h              |    1 +
 kernel/cgroup.c                   |   45 ++-
 mm/memcontrol.c                   |  649 +++++++++++++++++++++++++++++++++++--
 mm/page_cgroup.c                  |   35 ++-
 mm/swapfile.c                     |   31 ++
 9 files changed, 796 insertions(+), 50 deletions(-)


Overall history of this patch set:
2009/12/21
- Fix NULL pointer dereference BUG.
2009/12/14
- rebase on mmotm-2009-12-10-17-19.
- split performance improvement patch into cgroup part and memcg part.
- make use of waitq in avoid-oom patch.
- add TODO section in memory.txt.
2009/12/04
- rebase on mmotm-2009-11-24-16-47.
- change the term "recharge" to "move charge".
- improve performance in moving charge.
- parse the page table in the can_attach() phase again (going back to the old
  behavior), because it doesn't add much overhead and it lets us calculate
  the precharge count more accurately.
2009/11/19
- rebase on mmotm-2009-11-17-14-03 + KAMEZAWA-san's show per-process swap usage
  via procfs patch(v3).
- in can_attach(), instead of parsing the page table, make use of per process
  mm_counter(anon_rss, swap_usage).
- handle recharge_at_immigrate as bitmask(as I did in first version)
2009/11/06
- remove "[RFC]".
- rebase on mmotm-2009-11-01-10-01.
- drop support for file cache and shmem/tmpfs(revisit in future).
- update Documentation/cgroup/memory.txt.
2009/10/13
- rebase on mmotm-2009-10-09-01-07 + KAMEZAWA-san's batched charge/uncharge(Oct09) + part
of KAMEZAWA-san's cleanup/fix patches(4,5,7 of Sep25 with some fixes).
- change the term "migrate" to "recharge".
2009/09/24
- change "migrate_charge" flag from "int" to "bool".
- in can_attach(), parse the page table of the task and count only the number
  of target ptes and call try_charge() repeatedly. No isolation at this phase.
- in attach(), parse the page table of the task again, and isolate the target
  page and call move_account() one by one.
- do no swap-in in moving swap account any more.
- add support for shmem/tmpfs's swap.
- update Documentation/cgroup/cgroup.txt.
2009/09/17
- first version


Regards,
Daisuke Nishimura.


* [PATCH -mmotm 1/8] cgroup: introduce cancel_attach()
  2009-12-21  5:31 [PATCH -mmotm 0/8] memcg: move charge at task migration (21/Dec) Daisuke Nishimura
@ 2009-12-21  5:32 ` Daisuke Nishimura
  2009-12-21  5:32 ` [PATCH -mmotm 2/8] cgroup: introduce coalesce css_get() and css_put() Daisuke Nishimura
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: Daisuke Nishimura @ 2009-12-21  5:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage,
	Daisuke Nishimura, linux-mm

This patch adds a cancel_attach() operation to struct cgroup_subsys.
cancel_attach() can be used when a can_attach() operation prepares something
for the subsystem and we need to roll back what can_attach() has prepared
if attaching the task fails after can_attach() has succeeded.
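
As an illustration only (a hedged sketch, not part of this patch; the "foo"
subsystem and its counter are made up), a subsystem whose can_attach() has a
side effect could pair it with cancel_attach() like this:

#include <linux/cgroup.h>
#include <asm/atomic.h>

/* A made-up side effect: count attach operations that are in flight. */
static atomic_t foo_pending_attaches = ATOMIC_INIT(0);

static int foo_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
			  struct task_struct *tsk, bool threadgroup)
{
	atomic_inc(&foo_pending_attaches);	/* the side effect to roll back */
	return 0;
}

/* Called only if foo_can_attach() succeeded but the attach failed later. */
static void foo_cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
			      struct task_struct *tsk, bool threadgroup)
{
	atomic_dec(&foo_pending_attaches);	/* undo the side effect */
}

static void foo_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
		       struct cgroup *old_cgrp, struct task_struct *tsk,
		       bool threadgroup)
{
	atomic_dec(&foo_pending_attaches);	/* attach completed normally */
}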

Changelog: 2009/12/04
- update comments.
Changelog: 2009/11/19
- fix typo and coding style.
Changelog: 2009/09/24
- add explanation about cancel_attach() to Documentation/cgroups/cgroups.txt.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Paul Menage <menage@google.com>
---
 Documentation/cgroups/cgroups.txt |   13 +++++++++++-
 include/linux/cgroup.h            |    2 +
 kernel/cgroup.c                   |   40 ++++++++++++++++++++++++++++++------
 3 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 0b33bfe..d450826 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -536,10 +536,21 @@ returns an error, this will abort the attach operation.  If a NULL
 task is passed, then a successful result indicates that *any*
 unspecified task can be moved into the cgroup. Note that this isn't
 called on a fork. If this method returns 0 (success) then this should
-remain valid while the caller holds cgroup_mutex. If threadgroup is
+remain valid while the caller holds cgroup_mutex and it is ensured that either
+attach() or cancel_attach() will be called in future. If threadgroup is
 true, then a successful result indicates that all threads in the given
 thread's threadgroup can be moved together.
 
+void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+	       struct task_struct *task, bool threadgroup)
+(cgroup_mutex held by caller)
+
+Called when a task attach operation has failed after can_attach() has succeeded.
+A subsystem whose can_attach() has side effects should provide this
+function, so that the subsystem can implement a rollback; otherwise it is not
+necessary. This will be called only for subsystems whose can_attach()
+operation has succeeded.
+
 void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 	    struct cgroup *old_cgrp, struct task_struct *task,
 	    bool threadgroup)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 0008dee..d4cc200 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -427,6 +427,8 @@ struct cgroup_subsys {
 	void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 	int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 			  struct task_struct *tsk, bool threadgroup);
+	void (*cancel_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			  struct task_struct *tsk, bool threadgroup);
 	void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 			struct cgroup *old_cgrp, struct task_struct *tsk,
 			bool threadgroup);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0249f4b..d67d471 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1539,7 +1539,7 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
 int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 {
 	int retval = 0;
-	struct cgroup_subsys *ss;
+	struct cgroup_subsys *ss, *failed_ss = NULL;
 	struct cgroup *oldcgrp;
 	struct css_set *cg;
 	struct css_set *newcg;
@@ -1553,8 +1553,16 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 	for_each_subsys(root, ss) {
 		if (ss->can_attach) {
 			retval = ss->can_attach(ss, cgrp, tsk, false);
-			if (retval)
-				return retval;
+			if (retval) {
+				/*
+				 * Remember on which subsystem the can_attach()
+				 * failed, so that we only call cancel_attach()
+				 * against the subsystems whose can_attach()
+				 * succeeded. (See below)
+				 */
+				failed_ss = ss;
+				goto out;
+			}
 		}
 	}
 
@@ -1568,14 +1576,17 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 	 */
 	newcg = find_css_set(cg, cgrp);
 	put_css_set(cg);
-	if (!newcg)
-		return -ENOMEM;
+	if (!newcg) {
+		retval = -ENOMEM;
+		goto out;
+	}
 
 	task_lock(tsk);
 	if (tsk->flags & PF_EXITING) {
 		task_unlock(tsk);
 		put_css_set(newcg);
-		return -ESRCH;
+		retval = -ESRCH;
+		goto out;
 	}
 	rcu_assign_pointer(tsk->cgroups, newcg);
 	task_unlock(tsk);
@@ -1601,7 +1612,22 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 	 * is no longer empty.
 	 */
 	cgroup_wakeup_rmdir_waiter(cgrp);
-	return 0;
+out:
+	if (retval) {
+		for_each_subsys(root, ss) {
+			if (ss == failed_ss)
+				/*
+				 * This subsystem was the one that failed the
+				 * can_attach() check earlier, so we don't need
+				 * to call cancel_attach() against it or any
+				 * remaining subsystems.
+				 */
+				break;
+			if (ss->cancel_attach)
+				ss->cancel_attach(ss, cgrp, tsk, false);
+		}
+	}
+	return retval;
 }
 
 /*
-- 
1.5.6.1


* [PATCH -mmotm 2/8] cgroup: introduce coalesce css_get() and css_put()
  2009-12-21  5:31 [PATCH -mmotm 0/8] memcg: move charge at task migration (21/Dec) Daisuke Nishimura
  2009-12-21  5:32 ` [PATCH -mmotm 1/8] cgroup: introduce cancel_attach() Daisuke Nishimura
@ 2009-12-21  5:32 ` Daisuke Nishimura
  2009-12-21  5:33 ` [PATCH -mmotm 3/8] memcg: add interface to move charge at task migration Daisuke Nishimura
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: Daisuke Nishimura @ 2009-12-21  5:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage,
	Daisuke Nishimura, linux-mm

Current css_get() and css_put() increment/decrement css->refcnt one by one.

This patch adds a new function, __css_get(), which takes "count" as an argument
and increments css->refcnt by "count". It also adds a new argument ("count")
to __css_put() and changes the function to decrement css->refcnt by "count".

These coalesced versions of __css_get()/__css_put() will be used later to
improve the performance of memcg's moving-charge feature: instead of calling
css_get()/css_put() repeatedly, these new functions will be used.

No change is needed for current users of css_get()/css_put().
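
For example (a hedged sketch, not from this patch), a caller that knows it is
not dealing with the root cgroup's css and needs to take or drop N references
can now pay for a single atomic operation instead of a loop:

#include <linux/cgroup.h>

/* Old style: one atomic operation per reference. */
static void get_refs_one_by_one(struct cgroup_subsys_state *css, int count)
{
	int i;

	for (i = 0; i < count; i++)
		css_get(css);
}

/*
 * New style: the caller has verified that css is not for the root cgroup
 * (css_get()/css_put() skip refcounting for root), so it can use the
 * coalesced helpers.
 */
static void get_refs_coalesced(struct cgroup_subsys_state *css, int count)
{
	__css_get(css, count);	/* one atomic_add() */
}

static void put_refs_coalesced(struct cgroup_subsys_state *css, int count)
{
	__css_put(css, count);	/* one atomic_sub() */
}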

Changelog: 2009/12/14
- new patch (I split "[4/7] memcg: improbe performance in moving charge" of the
  04/Dec version into two parts, a cgroup part and a memcg part. This is the
  cgroup part.)

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Paul Menage <menage@google.com>
---
 include/linux/cgroup.h |   12 +++++++++---
 kernel/cgroup.c        |    5 +++--
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index d4cc200..61f75ae 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -75,6 +75,12 @@ enum {
 	CSS_REMOVED, /* This CSS is dead */
 };
 
+/* Caller must verify that the css is not for root cgroup */
+static inline void __css_get(struct cgroup_subsys_state *css, int count)
+{
+	atomic_add(count, &css->refcnt);
+}
+
 /*
  * Call css_get() to hold a reference on the css; it can be used
  * for a reference obtained via:
@@ -86,7 +92,7 @@ static inline void css_get(struct cgroup_subsys_state *css)
 {
 	/* We don't need to reference count the root state */
 	if (!test_bit(CSS_ROOT, &css->flags))
-		atomic_inc(&css->refcnt);
+		__css_get(css, 1);
 }
 
 static inline bool css_is_removed(struct cgroup_subsys_state *css)
@@ -117,11 +123,11 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
  * css_get() or css_tryget()
  */
 
-extern void __css_put(struct cgroup_subsys_state *css);
+extern void __css_put(struct cgroup_subsys_state *css, int count);
 static inline void css_put(struct cgroup_subsys_state *css)
 {
 	if (!test_bit(CSS_ROOT, &css->flags))
-		__css_put(css);
+		__css_put(css, 1);
 }
 
 /* bits in struct cgroup flags field */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index d67d471..8f89c9d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3729,12 +3729,13 @@ static void check_for_release(struct cgroup *cgrp)
 	}
 }
 
-void __css_put(struct cgroup_subsys_state *css)
+/* Caller must verify that the css is not for root cgroup */
+void __css_put(struct cgroup_subsys_state *css, int count)
 {
 	struct cgroup *cgrp = css->cgroup;
 	int val;
 	rcu_read_lock();
-	val = atomic_dec_return(&css->refcnt);
+	val = atomic_sub_return(count, &css->refcnt);
 	if (val == 1) {
 		if (notify_on_release(cgrp)) {
 			set_bit(CGRP_RELEASABLE, &cgrp->flags);
-- 
1.5.6.1


* [PATCH -mmotm 3/8] memcg: add interface to move charge at task migration
  2009-12-21  5:31 [PATCH -mmotm 0/8] memcg: move charge at task migration (21/Dec) Daisuke Nishimura
  2009-12-21  5:32 ` [PATCH -mmotm 1/8] cgroup: introduce cancel_attach() Daisuke Nishimura
  2009-12-21  5:32 ` [PATCH -mmotm 2/8] cgroup: introduce coalesce css_get() and css_put() Daisuke Nishimura
@ 2009-12-21  5:33 ` Daisuke Nishimura
  2009-12-21  7:00   ` KAMEZAWA Hiroyuki
  2009-12-21  5:35 ` [PATCH -mmotm 4/8] memcg: move charges of anonymous page Daisuke Nishimura
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Daisuke Nishimura @ 2009-12-21  5:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage,
	Daisuke Nishimura, linux-mm

In current memcg, charges associated with a task aren't moved to the new cgroup
at task migration. Some users find this behavior strange.
These patches implement this feature, that is, charging to the new cgroup
and, of course, uncharging from the old cgroup at task migration.

This patch adds a "memory.move_charge_at_immigrate" file, a flag file that
determines whether charges should be moved to the new cgroup at task migration
and, if so, what types of charges should be moved. This patch also adds read
and write handlers for the file.

This patch also adds no-op handlers for this feature; they will be implemented
in later patches. For now, no value other than 0 can be written to
move_charge_at_immigrate.
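
For reference, once later patches in this series wire up bit 0, the interface
is meant to be used from userspace roughly as follows (a hedged sketch; the
cgroup mount point "/cgroups/B" and the pid are examples only):

#include <stdio.h>

/* Write a string to a cgroup control file; returns 0 on success. */
static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	/* bit 0: move charges of anonymous pages used only by the task */
	write_str("/cgroups/B/memory.move_charge_at_immigrate", "1");
	/* moving mm->owner (the thread group leader) triggers the move */
	write_str("/cgroups/B/tasks", "1234");
	return 0;
}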

Changelog: 2009/12/14
- Add TODO section to memory.txt.
Changelog: 2009/12/04
- change the term "recharge" to "move_charge".
- update document.
Changelog: 2009/11/19
- consolidate changes in Documentation/cgroup/memory.txt, which were made in
  other patches separately.
- handle recharge_at_immigrate as bitmask(as I did in first version).
- use mm->owner instead of thread_group_leader().
Changelog: 2009/09/24
- change the term "migration" to "recharge".
- handle the flag as bool not bitmask to make codes simple.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 Documentation/cgroups/memory.txt |   56 +++++++++++++++++++++-
 mm/memcontrol.c                  |   97 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 147 insertions(+), 6 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index b871f25..e726fb0 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -262,10 +262,12 @@ some of the pages cached in the cgroup (page cache pages).
 4.2 Task migration
 
 When a task migrates from one cgroup to another, it's charge is not
-carried forward. The pages allocated from the original cgroup still
+carried forward by default. The pages allocated from the original cgroup still
 remain charged to it, the charge is dropped when the page is freed or
 reclaimed.
 
+Note: You can move charges of a task along with task migration. See 8.
+
 4.3 Removing a cgroup
 
 A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
@@ -414,7 +416,57 @@ NOTE1: Soft limits take effect over a long period of time, since they involve
 NOTE2: It is recommended to set the soft limit always below the hard limit,
        otherwise the hard limit will take precedence.
 
-8. TODO
+8. Move charges at task migration
+
+Users can move charges associated with a task along with task migration, that
+is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
+
+8.1 Interface
+
+This feature is disabled by default. It can be enabled(and disabled again) by
+writing to memory.move_charge_at_immigrate of the destination cgroup.
+
+If you want to enable it:
+
+# echo (some positive value) > memory.move_charge_at_immigrate
+
+Note: Each bit of move_charge_at_immigrate has its own meaning about what type
+      of charges should be moved. See 8.2 for details.
+Note: Charges are moved only when you move mm->owner, IOW, a leader of a thread
+      group.
+Note: If we cannot find enough space for the task in the destination cgroup, we
+      try to make space by reclaiming memory. Task migration may fail if we
+      cannot make enough space.
+Note: It can take several seconds if you move charges on the order of gigabytes.
+
+And if you want to disable it again:
+
+# echo 0 > memory.move_charge_at_immigrate
+
+8.2 Types of charges which can be moved
+
+Each bit of move_charge_at_immigrate has its own meaning about what type of
+charges should be moved.
+
+  bit | what type of charges would be moved ?
+ -----+------------------------------------------------------------------------
+   0  | A charge of an anonymous page (or swap of it) used by the target task.
+      | Those pages and swaps must be used only by the target task. You must
+      | enable Swap Extension (see 2.4) to enable moving of swap charges.
+
+Note: Those pages and swaps must be charged to the old cgroup.
+Note: More types of pages (e.g. file cache, shmem) will be supported by other
+      bits in the future.
+
+8.3 TODO
+
+- Add support for other types of pages (e.g. file cache, shmem, etc.).
+- Implement madvise(2) to let users decide whether a vma should be moved or
+  not.
+- All moving-charge operations are done under cgroup_mutex. It's not good
+  behavior to hold the mutex for so long, so we may need some trick.
+
+9. TODO
 
 1. Add support for accounting huge pages (as a separate controller)
 2. Make per-cgroup scanner reclaim not-shared pages first
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 488b644..2ec0201 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -226,11 +226,26 @@ struct mem_cgroup {
 	bool		memsw_is_minimum;
 
 	/*
+	 * Should we move charges of a task when a task is moved into this
+	 * mem_cgroup ? And what type of charges should we move ?
+	 */
+	unsigned long 	move_charge_at_immigrate;
+
+	/*
 	 * statistics. This must be placed at the end of memcg.
 	 */
 	struct mem_cgroup_stat stat;
 };
 
+/* Stuffs for move charges at task migration. */
+/*
+ * Types of charges to be moved. "move_charge_at_immigrate" is treated as a
+ * left-shifted bitmap of these types.
+ */
+enum move_type {
+	NR_MOVE_TYPE,
+};
+
 /*
  * Maximum loops in mem_cgroup_hierarchical_reclaim(), used for soft
  * limit reclaim to prevent infinite loops, if they ever occur.
@@ -2868,6 +2883,31 @@ static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
 	return 0;
 }
 
+static u64 mem_cgroup_move_charge_read(struct cgroup *cgrp,
+					struct cftype *cft)
+{
+	return mem_cgroup_from_cont(cgrp)->move_charge_at_immigrate;
+}
+
+static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
+					struct cftype *cft, u64 val)
+{
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+
+	if (val >= (1 << NR_MOVE_TYPE))
+		return -EINVAL;
+	/*
+	 * We check this value several times both in can_attach() and
+	 * attach(), so we need cgroup lock to prevent this value from being
+	 * inconsistent.
+	 */
+	cgroup_lock();
+	mem->move_charge_at_immigrate = val;
+	cgroup_unlock();
+
+	return 0;
+}
+
 
 /* For read statistics */
 enum {
@@ -3101,6 +3141,11 @@ static struct cftype mem_cgroup_files[] = {
 		.read_u64 = mem_cgroup_swappiness_read,
 		.write_u64 = mem_cgroup_swappiness_write,
 	},
+	{
+		.name = "move_charge_at_immigrate",
+		.read_u64 = mem_cgroup_move_charge_read,
+		.write_u64 = mem_cgroup_move_charge_write,
+	},
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
@@ -3348,6 +3393,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (parent)
 		mem->swappiness = get_swappiness(parent);
 	atomic_set(&mem->refcnt, 1);
+	mem->move_charge_at_immigrate = 0;
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
@@ -3384,16 +3430,57 @@ static int mem_cgroup_populate(struct cgroup_subsys *ss,
 	return ret;
 }
 
+/* Handlers for move charge at task migration. */
+static int mem_cgroup_can_move_charge(void)
+{
+	return 0;
+}
+
+static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
+				struct cgroup *cgroup,
+				struct task_struct *p,
+				bool threadgroup)
+{
+	int ret = 0;
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
+
+	if (mem->move_charge_at_immigrate) {
+		struct mm_struct *mm;
+		struct mem_cgroup *from = mem_cgroup_from_task(p);
+
+		VM_BUG_ON(from == mem);
+
+		mm = get_task_mm(p);
+		if (!mm)
+			return 0;
+
+		/* We move charges only when we move a owner of the mm */
+		if (mm->owner == p)
+			ret = mem_cgroup_can_move_charge();
+
+		mmput(mm);
+	}
+	return ret;
+}
+
+static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
+				struct cgroup *cgroup,
+				struct task_struct *p,
+				bool threadgroup)
+{
+}
+
+static void mem_cgroup_move_charge(void)
+{
+}
+
 static void mem_cgroup_move_task(struct cgroup_subsys *ss,
 				struct cgroup *cont,
 				struct cgroup *old_cont,
 				struct task_struct *p,
 				bool threadgroup)
 {
-	/*
-	 * FIXME: It's better to move charges of this process from old
-	 * memcg to new memcg. But it's just on TODO-List now.
-	 */
+	mem_cgroup_move_charge();
 }
 
 struct cgroup_subsys mem_cgroup_subsys = {
@@ -3403,6 +3490,8 @@ struct cgroup_subsys mem_cgroup_subsys = {
 	.pre_destroy = mem_cgroup_pre_destroy,
 	.destroy = mem_cgroup_destroy,
 	.populate = mem_cgroup_populate,
+	.can_attach = mem_cgroup_can_attach,
+	.cancel_attach = mem_cgroup_cancel_attach,
 	.attach = mem_cgroup_move_task,
 	.early_init = 0,
 	.use_id = 1,
-- 
1.5.6.1


* [PATCH -mmotm 4/8] memcg: move charges of anonymous page
  2009-12-21  5:31 [PATCH -mmotm 0/8] memcg: move charge at task migration (21/Dec) Daisuke Nishimura
                   ` (2 preceding siblings ...)
  2009-12-21  5:33 ` [PATCH -mmotm 3/8] memcg: add interface to move charge at task migration Daisuke Nishimura
@ 2009-12-21  5:35 ` Daisuke Nishimura
  2009-12-21  7:01   ` KAMEZAWA Hiroyuki
  2009-12-23  0:26   ` Andrew Morton
  2009-12-21  5:36 ` [PATCH -mmotm 5/8] memcg: improve performance in moving charge Daisuke Nishimura
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Daisuke Nishimura @ 2009-12-21  5:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage,
	Daisuke Nishimura, linux-mm

This patch is the core part of this move-charge-at-task-migration feature.
It implements functions to move charges of anonymous pages mapped only by
the target task.

Implementation:
- define struct move_charge_struct and a variable of that type (mc) to remember
  the count of pre-charges and other information.
- At can_attach(), parse the page table of the target mm, call
  __mem_cgroup_try_charge() repeatedly and count up mc.precharge.
- At attach(), parse the page table again, find a target page to be moved, and
  call mem_cgroup_move_account() on the page.
- Cancel all precharges if mc.precharge > 0 on failure or at the end of
  the task move.
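
Both phases rely on the generic page-table walker; a hedged sketch of that
pattern follows (a simplified stand-in for the real callbacks in the patch,
which only counts present ptes instead of calling is_target_pte_for_mc()):

#include <linux/mm.h>
#include <linux/sched.h>

static int count_present_ptes(pmd_t *pmd, unsigned long addr,
			      unsigned long end, struct mm_walk *walk)
{
	unsigned long *count = walk->private;
	spinlock_t *ptl;
	pte_t *pte;

	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
	for (; addr != end; pte++, addr += PAGE_SIZE)
		if (pte_present(*pte))
			(*count)++;
	pte_unmap_unlock(pte - 1, ptl);
	cond_resched();
	return 0;
}

static unsigned long count_ptes(struct mm_struct *mm)
{
	unsigned long count = 0;
	struct vm_area_struct *vma;
	struct mm_walk walk = {
		.pmd_entry	= count_present_ptes,
		.mm		= mm,
		.private	= &count,
	};

	down_read(&mm->mmap_sem);
	for (vma = mm->mmap; vma; vma = vma->vm_next)
		walk_page_range(vma->vm_start, vma->vm_end, &walk);
	up_read(&mm->mmap_sem);
	return count;
}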

Changelog: 2009/12/04
- change the term "recharge" to "move_charge".
- handle a signal in can_attach() phase.
- parse the page table in the can_attach() phase again (going back to the old
  behavior), because it doesn't add much overhead and it lets us calculate
  the precharge count more accurately.
Changelog: 2009/11/19
- in can_attach(), instead of parsing the page table, make use of per process
  mm_counter(anon_rss).
- loosen the valid check in is_target_pte_for_recharge().
Changelog: 2009/11/06
- drop support for file cache, shmem/tmpfs and shared(used by multiple processes)
  pages(revisit in future).
Changelog: 2009/10/13
- change the term "migrate" to "recharge".
Changelog: 2009/09/24
- in can_attach(), parse the page table of the task and count only the number
  of target ptes and call try_charge() repeatedly. No isolation at this phase.
- in attach(), parse the page table of the task again, and isolate the target
  page and call move_account() one by one.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/memcontrol.c |  295 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 285 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2ec0201..c363170 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -21,6 +21,7 @@
 #include <linux/memcontrol.h>
 #include <linux/cgroup.h>
 #include <linux/mm.h>
+#include <linux/hugetlb.h>
 #include <linux/pagemap.h>
 #include <linux/smp.h>
 #include <linux/page-flags.h>
@@ -243,8 +244,17 @@ struct mem_cgroup {
  * left-shifted bitmap of these types.
  */
 enum move_type {
+	MOVE_CHARGE_TYPE_ANON,	/* private anonymous page and swap of it */
 	NR_MOVE_TYPE,
 };
+/* "mc" and its members are protected by cgroup_mutex */
+struct move_charge_struct {
+	struct mem_cgroup *from;
+	struct mem_cgroup *to;
+	unsigned long precharge;
+};
+static struct move_charge_struct mc;
+
 
 /*
  * Maximum loops in mem_cgroup_hierarchical_reclaim(), used for soft
@@ -1513,7 +1523,7 @@ charged:
 	 * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
 	 * if they exceeds softlimit.
 	 */
-	if (mem_cgroup_soft_limit_check(mem))
+	if (page && mem_cgroup_soft_limit_check(mem))
 		mem_cgroup_update_tree(mem, page);
 done:
 	return 0;
@@ -1690,8 +1700,9 @@ static void __mem_cgroup_move_account(struct page_cgroup *pc,
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
 	 * can be under rmdir(). But in current implementation, caller of
-	 * this function is just force_empty() and it's garanteed that
-	 * "to" is never removed. So, we don't check rmdir status here.
+	 * this function is just force_empty() and move charge, so it's
+	 * guaranteed that "to" is never removed. So, we don't check rmdir
+	 * status here.
 	 */
 }
 
@@ -3431,11 +3442,171 @@ static int mem_cgroup_populate(struct cgroup_subsys *ss,
 }
 
 /* Handlers for move charge at task migration. */
-static int mem_cgroup_can_move_charge(void)
+static int mem_cgroup_do_precharge(void)
+{
+	int ret = -ENOMEM;
+	struct mem_cgroup *mem = mc.to;
+
+	ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false, NULL);
+	if (ret || !mem)
+		return -ENOMEM;
+
+	mc.precharge++;
+	return ret;
+}
+
+/**
+ * is_target_pte_for_mc - check a pte whether it is valid for move charge
+ * @vma: the vma the pte to be checked belongs
+ * @addr: the address corresponding to the pte to be checked
+ * @ptent: the pte to be checked
+ * @target: the pointer the target page will be stored(can be NULL)
+ *
+ * Returns
+ *   0(MC_TARGET_NONE): if the pte is not a target for move charge.
+ *   1(MC_TARGET_PAGE): if the page corresponding to this pte is a target for
+ *     move charge. if @target is not NULL, the page is stored in target->page
+ *     with extra refcnt got(Callers should handle it).
+ *
+ * Called with pte lock held.
+ */
+/* We add a new member later. */
+union mc_target {
+	struct page	*page;
+};
+
+/* We add a new type later. */
+enum mc_target_type {
+	MC_TARGET_NONE,	/* not used */
+	MC_TARGET_PAGE,
+};
+
+static int is_target_pte_for_mc(struct vm_area_struct *vma,
+		unsigned long addr, pte_t ptent, union mc_target *target)
+{
+	struct page *page;
+	struct page_cgroup *pc;
+	int ret = 0;
+	bool move_anon = test_bit(MOVE_CHARGE_TYPE_ANON,
+					&mc.to->move_charge_at_immigrate);
+
+	if (!pte_present(ptent))
+		return 0;
+
+	page = vm_normal_page(vma, addr, ptent);
+	if (!page || !page_mapped(page))
+		return 0;
+	/*
+	 * TODO: We don't move charges of file(including shmem/tmpfs) pages for
+	 * now.
+	 */
+	if (!move_anon || !PageAnon(page))
+		return 0;
+	/*
+	 * TODO: We don't move charges of shared(used by multiple processes)
+	 * pages for now.
+	 */
+	if (page_mapcount(page) > 1)
+		return 0;
+	if (!get_page_unless_zero(page))
+		return 0;
+
+	pc = lookup_page_cgroup(page);
+	/*
+	 * Do only loose check w/o page_cgroup lock. mem_cgroup_move_account()
+	 * checks the pc is valid or not under the lock.
+	 */
+	if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+		ret = MC_TARGET_PAGE;
+		if (target)
+			target->page = page;
+	}
+
+	if (!ret || !target)
+		put_page(page);
+
+	return ret;
+}
+
+static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
+					unsigned long addr, unsigned long end,
+					struct mm_walk *walk)
 {
+	struct vm_area_struct *vma = walk->private;
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	for (; addr != end; pte++, addr += PAGE_SIZE)
+		if (is_target_pte_for_mc(vma, addr, *pte, NULL))
+			mc.precharge++;	/* increment precharge temporarily */
+	pte_unmap_unlock(pte - 1, ptl);
+	cond_resched();
+
 	return 0;
 }
 
+static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
+{
+	unsigned long precharge;
+	struct vm_area_struct *vma;
+
+	down_read(&mm->mmap_sem);
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		struct mm_walk mem_cgroup_count_precharge_walk = {
+			.pmd_entry = mem_cgroup_count_precharge_pte_range,
+			.mm = mm,
+			.private = vma,
+		};
+		if (is_vm_hugetlb_page(vma))
+			continue;
+		/* TODO: We don't move charges of shmem/tmpfs pages for now. */
+		if (vma->vm_flags & VM_SHARED)
+			continue;
+		walk_page_range(vma->vm_start, vma->vm_end,
+					&mem_cgroup_count_precharge_walk);
+	}
+	up_read(&mm->mmap_sem);
+
+	precharge = mc.precharge;
+	mc.precharge = 0;
+
+	return precharge;
+}
+
+#define PRECHARGE_AT_ONCE	256
+static int mem_cgroup_precharge_mc(struct mm_struct *mm)
+{
+	int ret = 0;
+	int count = PRECHARGE_AT_ONCE;
+	unsigned long precharge = mem_cgroup_count_precharge(mm);
+
+	while (!ret && precharge--) {
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+		if (!count--) {
+			count = PRECHARGE_AT_ONCE;
+			cond_resched();
+		}
+		ret = mem_cgroup_do_precharge();
+	}
+
+	return ret;
+}
+
+static void mem_cgroup_clear_mc(void)
+{
+	/* we must uncharge all the leftover precharges from mc.to */
+	while (mc.precharge) {
+		mem_cgroup_cancel_charge(mc.to);
+		mc.precharge--;
+	}
+	mc.from = NULL;
+	mc.to = NULL;
+}
+
 static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 				struct cgroup *cgroup,
 				struct task_struct *p,
@@ -3453,11 +3624,19 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 		mm = get_task_mm(p);
 		if (!mm)
 			return 0;
-
 		/* We move charges only when we move a owner of the mm */
-		if (mm->owner == p)
-			ret = mem_cgroup_can_move_charge();
-
+		if (mm->owner == p) {
+			VM_BUG_ON(mc.from);
+			VM_BUG_ON(mc.to);
+			VM_BUG_ON(mc.precharge);
+			mc.from = from;
+			mc.to = mem;
+			mc.precharge = 0;
+
+			ret = mem_cgroup_precharge_mc(mm);
+			if (ret)
+				mem_cgroup_clear_mc();
+		}
 		mmput(mm);
 	}
 	return ret;
@@ -3468,10 +3647,95 @@ static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
 				struct task_struct *p,
 				bool threadgroup)
 {
+	mem_cgroup_clear_mc();
 }
 
-static void mem_cgroup_move_charge(void)
+static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
+				unsigned long addr, unsigned long end,
+				struct mm_walk *walk)
 {
+	int ret = 0;
+	struct vm_area_struct *vma = walk->private;
+	pte_t *pte;
+	spinlock_t *ptl;
+
+retry:
+	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	for (; addr != end; addr += PAGE_SIZE) {
+		pte_t ptent = *(pte++);
+		union mc_target target;
+		int type;
+		struct page *page;
+		struct page_cgroup *pc;
+
+		if (!mc.precharge)
+			break;
+
+		type = is_target_pte_for_mc(vma, addr, ptent, &target);
+		switch (type) {
+		case MC_TARGET_PAGE:
+			page = target.page;
+			if (isolate_lru_page(page))
+				goto put;
+			pc = lookup_page_cgroup(page);
+			if (!mem_cgroup_move_account(pc, mc.from, mc.to)) {
+				css_put(&mc.to->css);
+				mc.precharge--;
+			}
+			putback_lru_page(page);
+put:			/* is_target_pte_for_mc() gets the page */
+			put_page(page);
+			break;
+		default:
+			break;
+		}
+	}
+	pte_unmap_unlock(pte - 1, ptl);
+	cond_resched();
+
+	if (addr != end) {
+		/*
+		 * We have consumed all precharges we got in can_attach().
+		 * We try charge one by one, but don't do any additional
+		 * charges to mc.to if we have failed in charge once in attach()
+		 * phase.
+		 */
+		ret = mem_cgroup_do_precharge();
+		if (!ret)
+			goto retry;
+	}
+
+	return ret;
+}
+
+static void mem_cgroup_move_charge(struct mm_struct *mm)
+{
+	struct vm_area_struct *vma;
+
+	lru_add_drain_all();
+	down_read(&mm->mmap_sem);
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		int ret;
+		struct mm_walk mem_cgroup_move_charge_walk = {
+			.pmd_entry = mem_cgroup_move_charge_pte_range,
+			.mm = mm,
+			.private = vma,
+		};
+		if (is_vm_hugetlb_page(vma))
+			continue;
+		/* TODO: We don't move charges of shmem/tmpfs pages for now. */
+		if (vma->vm_flags & VM_SHARED)
+			continue;
+		ret = walk_page_range(vma->vm_start, vma->vm_end,
+						&mem_cgroup_move_charge_walk);
+		if (ret)
+			/*
+			 * means we have consumed all precharges and failed in
+			 * doing additional charge. Just abandon here.
+			 */
+			break;
+	}
+	up_read(&mm->mmap_sem);
 }
 
 static void mem_cgroup_move_task(struct cgroup_subsys *ss,
@@ -3480,7 +3744,18 @@ static void mem_cgroup_move_task(struct cgroup_subsys *ss,
 				struct task_struct *p,
 				bool threadgroup)
 {
-	mem_cgroup_move_charge();
+	struct mm_struct *mm;
+
+	if (!mc.to)
+		/* no need to move charge */
+		return;
+
+	mm = get_task_mm(p);
+	if (mm) {
+		mem_cgroup_move_charge(mm);
+		mmput(mm);
+	}
+	mem_cgroup_clear_mc();
 }
 
 struct cgroup_subsys mem_cgroup_subsys = {
-- 
1.5.6.1


* [PATCH -mmotm 5/8] memcg: improve performance in moving charge
  2009-12-21  5:31 [PATCH -mmotm 0/8] memcg: move charge at task migration (21/Dec) Daisuke Nishimura
                   ` (3 preceding siblings ...)
  2009-12-21  5:35 ` [PATCH -mmotm 4/8] memcg: move charges of anonymous page Daisuke Nishimura
@ 2009-12-21  5:36 ` Daisuke Nishimura
  2009-12-21  7:02   ` KAMEZAWA Hiroyuki
  2009-12-21  5:37 ` [PATCH -mmotm 6/8] memcg: avoid oom during " Daisuke Nishimura
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Daisuke Nishimura @ 2009-12-21  5:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage,
	Daisuke Nishimura, linux-mm

This patch tries to reduce the overhead of moving charge by:

- Instead of calling res_counter_uncharge() against the old cgroup in
  __mem_cgroup_move_account() every time, call res_counter_uncharge() once at
  the end of task migration.
- Remove css_get(&to->css) from __mem_cgroup_move_account() because callers
  should have already called css_get(), and remove the css_put(&to->css) that
  callers of move_account() used to do on success.
- Instead of calling __mem_cgroup_try_charge(), i.e. res_counter_charge(),
  repeatedly, call res_counter_charge(PAGE_SIZE * count) in can_attach() if
  possible.
- Instead of calling css_get()/css_put() repeatedly, make use of the coalesced
  __css_get()/__css_put() if possible.

These changes reduce the overhead from 1.7sec to 0.6sec for moving the charges
of 1GB of anonymous memory in my test environment.
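
The batching in can_attach() boils down to the following (a hedged, simplified
sketch of mem_cgroup_do_precharge() below; swap accounting and the root-cgroup
special case are omitted):

static int precharge_sketch(struct mem_cgroup *mem, unsigned long count)
{
	struct res_counter *dummy;

	/* one res_counter operation and one atomic_add for the whole batch */
	if (!res_counter_charge(&mem->res, PAGE_SIZE * count, &dummy)) {
		__css_get(&mem->css, (int)count);
		mc.precharge += count;
		return 0;
	}
	/* near the limit: fall back to page-by-page charging (may reclaim) */
	while (count--) {
		int ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem,
						  false, NULL);
		if (ret || !mem)
			return -ENOMEM;	/* leftovers are cancelled later */
		mc.precharge++;
	}
	return 0;
}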

Changelog: 2009/12/14
- move cgroup part to another patch.
- fix some bugs.

Changelog: 2009/12/04
- new patch

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/memcontrol.c |  152 +++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 98 insertions(+), 54 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c363170..86e3202 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -252,6 +252,7 @@ struct move_charge_struct {
 	struct mem_cgroup *from;
 	struct mem_cgroup *to;
 	unsigned long precharge;
+	unsigned long moved_charge;
 };
 static struct move_charge_struct mc;
 
@@ -1537,14 +1538,23 @@ nomem:
  * This function is for that and do uncharge, put css's refcnt.
  * gotten by try_charge().
  */
-static void mem_cgroup_cancel_charge(struct mem_cgroup *mem)
+static void __mem_cgroup_cancel_charge(struct mem_cgroup *mem,
+							unsigned long count)
 {
 	if (!mem_cgroup_is_root(mem)) {
-		res_counter_uncharge(&mem->res, PAGE_SIZE);
+		res_counter_uncharge(&mem->res, PAGE_SIZE * count);
 		if (do_swap_account)
-			res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+			res_counter_uncharge(&mem->memsw, PAGE_SIZE * count);
+		VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags));
+		WARN_ON_ONCE(count > INT_MAX);
+		__css_put(&mem->css, (int)count);
 	}
-	css_put(&mem->css);
+	/* we don't need css_put for root */
+}
+
+static void mem_cgroup_cancel_charge(struct mem_cgroup *mem)
+{
+	__mem_cgroup_cancel_charge(mem, 1);
 }
 
 /*
@@ -1647,17 +1657,20 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
  * @pc:	page_cgroup of the page.
  * @from: mem_cgroup which the page is moved from.
  * @to:	mem_cgroup which the page is moved to. @from != @to.
+ * @uncharge: whether we should call uncharge and css_put against @from.
  *
  * The caller must confirm following.
  * - page is not on LRU (isolate_page() is useful.)
  * - the pc is locked, used, and ->mem_cgroup points to @from.
  *
- * This function does "uncharge" from old cgroup but doesn't do "charge" to
- * new cgroup. It should be done by a caller.
+ * This function doesn't do "charge" nor css_get to the new cgroup. It should
+ * be done by the caller (__mem_cgroup_try_charge would be useful). If
+ * @uncharge is true, this function does "uncharge" from the old cgroup, but
+ * it doesn't if @uncharge is false, so the caller should do "uncharge".
  */
 
 static void __mem_cgroup_move_account(struct page_cgroup *pc,
-	struct mem_cgroup *from, struct mem_cgroup *to)
+	struct mem_cgroup *from, struct mem_cgroup *to, bool uncharge)
 {
 	struct page *page;
 	int cpu;
@@ -1670,10 +1683,6 @@ static void __mem_cgroup_move_account(struct page_cgroup *pc,
 	VM_BUG_ON(!PageCgroupUsed(pc));
 	VM_BUG_ON(pc->mem_cgroup != from);
 
-	if (!mem_cgroup_is_root(from))
-		res_counter_uncharge(&from->res, PAGE_SIZE);
-	mem_cgroup_charge_statistics(from, pc, false);
-
 	page = pc->page;
 	if (page_mapped(page) && !PageAnon(page)) {
 		cpu = smp_processor_id();
@@ -1689,12 +1698,12 @@ static void __mem_cgroup_move_account(struct page_cgroup *pc,
 		__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_FILE_MAPPED,
 						1);
 	}
+	mem_cgroup_charge_statistics(from, pc, false);
+	if (uncharge)
+		/* This is not "cancel", but cancel_charge does all we need. */
+		mem_cgroup_cancel_charge(from);
 
-	if (do_swap_account && !mem_cgroup_is_root(from))
-		res_counter_uncharge(&from->memsw, PAGE_SIZE);
-	css_put(&from->css);
-
-	css_get(&to->css);
+	/* caller should have done css_get */
 	pc->mem_cgroup = to;
 	mem_cgroup_charge_statistics(to, pc, true);
 	/*
@@ -1711,12 +1720,12 @@ static void __mem_cgroup_move_account(struct page_cgroup *pc,
  * __mem_cgroup_move_account()
  */
 static int mem_cgroup_move_account(struct page_cgroup *pc,
-				struct mem_cgroup *from, struct mem_cgroup *to)
+		struct mem_cgroup *from, struct mem_cgroup *to, bool uncharge)
 {
 	int ret = -EINVAL;
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
-		__mem_cgroup_move_account(pc, from, to);
+		__mem_cgroup_move_account(pc, from, to, uncharge);
 		ret = 0;
 	}
 	unlock_page_cgroup(pc);
@@ -1752,11 +1761,9 @@ static int mem_cgroup_move_parent(struct page_cgroup *pc,
 	if (ret || !parent)
 		goto put_back;
 
-	ret = mem_cgroup_move_account(pc, child, parent);
-	if (!ret)
-		css_put(&parent->css);	/* drop extra refcnt by try_charge() */
-	else
-		mem_cgroup_cancel_charge(parent);	/* does css_put */
+	ret = mem_cgroup_move_account(pc, child, parent, true);
+	if (ret)
+		mem_cgroup_cancel_charge(parent);
 put_back:
 	putback_lru_page(page);
 put:
@@ -3442,16 +3449,58 @@ static int mem_cgroup_populate(struct cgroup_subsys *ss,
 }
 
 /* Handlers for move charge at task migration. */
-static int mem_cgroup_do_precharge(void)
+#define PRECHARGE_COUNT_AT_ONCE	256
+static int mem_cgroup_do_precharge(unsigned long count)
 {
-	int ret = -ENOMEM;
+	int ret = 0;
+	int batch_count = PRECHARGE_COUNT_AT_ONCE;
 	struct mem_cgroup *mem = mc.to;
 
-	ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false, NULL);
-	if (ret || !mem)
-		return -ENOMEM;
-
-	mc.precharge++;
+	if (mem_cgroup_is_root(mem)) {
+		mc.precharge += count;
+		/* we don't need css_get for root */
+		return ret;
+	}
+	/* try to charge at once */
+	if (count > 1) {
+		struct res_counter *dummy;
+		/*
+		 * "mem" cannot be under rmdir() because we've already checked
+		 * by cgroup_lock_live_cgroup() that it is not removed and we
+		 * are still under the same cgroup_mutex. So we can postpone
+		 * css_get().
+		 */
+		if (res_counter_charge(&mem->res, PAGE_SIZE * count, &dummy))
+			goto one_by_one;
+		if (do_swap_account && res_counter_charge(&mem->memsw,
+						PAGE_SIZE * count, &dummy)) {
+			res_counter_uncharge(&mem->res, PAGE_SIZE * count);
+			goto one_by_one;
+		}
+		mc.precharge += count;
+		VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags));
+		WARN_ON_ONCE(count > INT_MAX);
+		__css_get(&mem->css, (int)count);
+		return ret;
+	}
+one_by_one:
+	/* fall back to one by one charge */
+	while (count--) {
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+		if (!batch_count--) {
+			batch_count = PRECHARGE_COUNT_AT_ONCE;
+			cond_resched();
+		}
+		ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem,
+								false, NULL);
+		if (ret || !mem)
+			/* mem_cgroup_clear_mc() will do uncharge later */
+			return -ENOMEM;
+		mc.precharge++;
+	}
 	return ret;
 }
 
@@ -3574,34 +3623,25 @@ static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
 	return precharge;
 }
 
-#define PRECHARGE_AT_ONCE	256
 static int mem_cgroup_precharge_mc(struct mm_struct *mm)
 {
-	int ret = 0;
-	int count = PRECHARGE_AT_ONCE;
-	unsigned long precharge = mem_cgroup_count_precharge(mm);
-
-	while (!ret && precharge--) {
-		if (signal_pending(current)) {
-			ret = -EINTR;
-			break;
-		}
-		if (!count--) {
-			count = PRECHARGE_AT_ONCE;
-			cond_resched();
-		}
-		ret = mem_cgroup_do_precharge();
-	}
-
-	return ret;
+	return mem_cgroup_do_precharge(mem_cgroup_count_precharge(mm));
 }
 
 static void mem_cgroup_clear_mc(void)
 {
 	/* we must uncharge all the leftover precharges from mc.to */
-	while (mc.precharge) {
-		mem_cgroup_cancel_charge(mc.to);
-		mc.precharge--;
+	if (mc.precharge) {
+		__mem_cgroup_cancel_charge(mc.to, mc.precharge);
+		mc.precharge = 0;
+	}
+	/*
+	 * we didn't uncharge from mc.from at mem_cgroup_move_account(), so
+	 * we must uncharge here.
+	 */
+	if (mc.moved_charge) {
+		__mem_cgroup_cancel_charge(mc.from, mc.moved_charge);
+		mc.moved_charge = 0;
 	}
 	mc.from = NULL;
 	mc.to = NULL;
@@ -3629,9 +3669,11 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 			VM_BUG_ON(mc.from);
 			VM_BUG_ON(mc.to);
 			VM_BUG_ON(mc.precharge);
+			VM_BUG_ON(mc.moved_charge);
 			mc.from = from;
 			mc.to = mem;
 			mc.precharge = 0;
+			mc.moved_charge = 0;
 
 			ret = mem_cgroup_precharge_mc(mm);
 			if (ret)
@@ -3678,9 +3720,11 @@ retry:
 			if (isolate_lru_page(page))
 				goto put;
 			pc = lookup_page_cgroup(page);
-			if (!mem_cgroup_move_account(pc, mc.from, mc.to)) {
-				css_put(&mc.to->css);
+			if (!mem_cgroup_move_account(pc,
+						mc.from, mc.to, false)) {
 				mc.precharge--;
+				/* we uncharge from mc.from later. */
+				mc.moved_charge++;
 			}
 			putback_lru_page(page);
 put:			/* is_target_pte_for_mc() gets the page */
@@ -3700,7 +3744,7 @@ put:			/* is_target_pte_for_mc() gets the page */
 		 * charges to mc.to if we have failed in charge once in attach()
 		 * phase.
 		 */
-		ret = mem_cgroup_do_precharge();
+		ret = mem_cgroup_do_precharge(1);
 		if (!ret)
 			goto retry;
 	}
-- 
1.5.6.1


* [PATCH -mmotm 6/8] memcg: avoid oom during moving charge
  2009-12-21  5:31 [PATCH -mmotm 0/8] memcg: move charge at task migration (21/Dec) Daisuke Nishimura
                   ` (4 preceding siblings ...)
  2009-12-21  5:36 ` [PATCH -mmotm 5/8] memcg: improve performance in moving charge Daisuke Nishimura
@ 2009-12-21  5:37 ` Daisuke Nishimura
  2009-12-21  7:03   ` KAMEZAWA Hiroyuki
  2009-12-21  5:38 ` [PATCH -mmotm 7/8] memcg: move charges of anonymous swap Daisuke Nishimura
  2009-12-21  5:40 ` [PATCH -mmotm 8/8] memcg: improve performance in moving swap charge Daisuke Nishimura
  7 siblings, 1 reply; 28+ messages in thread
From: Daisuke Nishimura @ 2009-12-21  5:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage,
	Daisuke Nishimura, linux-mm

While charges are being moved, this move-charge-at-task-migration feature holds
extra charges on "to" (pre-charges) and on "from" (left-over charges). This
means an unnecessary oom can happen.

This patch tries to avoid such an oom.
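
The mechanism is the ordinary wait-queue pattern (a hedged sketch with made-up
names; the real code keys off mc.moving_task and mc.waitq as in the patch
below):

#include <linux/wait.h>
#include <linux/sched.h>

static DECLARE_WAIT_QUEUE_HEAD(move_waitq);
static struct task_struct *moving_task;	/* non-NULL while a move is in flight */

/* A charger that hit the limit sleeps here instead of triggering oom. */
static void wait_for_move(void)
{
	DEFINE_WAIT(wait);

	prepare_to_wait(&move_waitq, &wait, TASK_INTERRUPTIBLE);
	if (moving_task)		/* re-check after queueing ourselves */
		schedule();
	finish_wait(&move_waitq, &wait);
}

/* The mover wakes everybody up once the temporary double charges are gone. */
static void move_finished(void)
{
	moving_task = NULL;
	wake_up_all(&move_waitq);
}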

Changelog: 2009/12/21
- minor cleanup.
Changelog: 2009/12/14
- instead of continuing to charge by busy loop, make use of waitq.
Changelog: 2009/12/04
- take account of "from" too, because we uncharge from "from" at once in
  mem_cgroup_clear_mc(), so left-over charges exist during moving charge.
- check use_hierarchy of "mem_over_limit", instead of "to" or "from"(bugfix).

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/memcontrol.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 86e3202..ddb3c6c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -253,8 +253,13 @@ struct move_charge_struct {
 	struct mem_cgroup *to;
 	unsigned long precharge;
 	unsigned long moved_charge;
+	struct task_struct *moving_task;	/* a task moving charges */
+	wait_queue_head_t waitq;		/* a waitq for other context */
+						/* not to cause oom */
+};
+static struct move_charge_struct mc = {
+	.waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq),
 };
-static struct move_charge_struct mc;
 
 
 /*
@@ -1509,6 +1514,48 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
 		if (mem_cgroup_check_under_limit(mem_over_limit))
 			continue;
 
+		/* try to avoid oom while someone is moving charge */
+		if (mc.moving_task && current != mc.moving_task) {
+			struct mem_cgroup *from, *to;
+			bool do_continue = false;
+			/*
+			 * There is a small race that "from" or "to" can be
+			 * freed by rmdir, so we use css_tryget().
+			 */
+			rcu_read_lock();
+			from = mc.from;
+			to = mc.to;
+			if (from && css_tryget(&from->css)) {
+				if (mem_over_limit->use_hierarchy)
+					do_continue = css_is_ancestor(
+							&from->css,
+							&mem_over_limit->css);
+				else
+					do_continue = (from == mem_over_limit);
+				css_put(&from->css);
+			}
+			if (!do_continue && to && css_tryget(&to->css)) {
+				if (mem_over_limit->use_hierarchy)
+					do_continue = css_is_ancestor(
+							&to->css,
+							&mem_over_limit->css);
+				else
+					do_continue = (to == mem_over_limit);
+				css_put(&to->css);
+			}
+			rcu_read_unlock();
+			if (do_continue) {
+				DEFINE_WAIT(wait);
+				prepare_to_wait(&mc.waitq, &wait,
+							TASK_INTERRUPTIBLE);
+				/* moving charge context might have finished. */
+				if (mc.moving_task)
+					schedule();
+				finish_wait(&mc.waitq, &wait);
+				continue;
+			}
+		}
+
 		if (!nr_retries--) {
 			if (oom) {
 				mem_cgroup_out_of_memory(mem_over_limit, gfp_mask);
@@ -3385,7 +3432,6 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 			INIT_WORK(&stock->work, drain_local_stock);
 		}
 		hotcpu_notifier(memcg_stock_cpu_callback, 0);
-
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -3645,6 +3691,8 @@ static void mem_cgroup_clear_mc(void)
 	}
 	mc.from = NULL;
 	mc.to = NULL;
+	mc.moving_task = NULL;
+	wake_up_all(&mc.waitq);
 }
 
 static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
@@ -3670,10 +3718,12 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 			VM_BUG_ON(mc.to);
 			VM_BUG_ON(mc.precharge);
 			VM_BUG_ON(mc.moved_charge);
+			VM_BUG_ON(mc.moving_task);
 			mc.from = from;
 			mc.to = mem;
 			mc.precharge = 0;
 			mc.moved_charge = 0;
+			mc.moving_task = current;
 
 			ret = mem_cgroup_precharge_mc(mm);
 			if (ret)
-- 
1.5.6.1


* [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2009-12-21  5:31 [PATCH -mmotm 0/8] memcg: move charge at task migration (21/Dec) Daisuke Nishimura
                   ` (5 preceding siblings ...)
  2009-12-21  5:37 ` [PATCH -mmotm 6/8] memcg: avoid oom during " Daisuke Nishimura
@ 2009-12-21  5:38 ` Daisuke Nishimura
  2009-12-21  7:04   ` KAMEZAWA Hiroyuki
  2010-02-04  3:31   ` Andrew Morton
  2009-12-21  5:40 ` [PATCH -mmotm 8/8] memcg: improve performance in moving swap charge Daisuke Nishimura
  7 siblings, 2 replies; 28+ messages in thread
From: Daisuke Nishimura @ 2009-12-21  5:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage,
	Daisuke Nishimura, linux-mm

This patch is another core part of this move-charge-at-task-migration feature.
It enables moving charges of anonymous swaps.

To move the charge of swap, we need to exchange swap_cgroup's record.

In current implementation, swap_cgroup's record is protected by:

  - page lock: if the entry is on swap cache.
  - swap_lock: if the entry is not on swap cache.

This works well in usual swap-in/out activity.

But this behavior would make the feature of moving swap charges check many
conditions in order to exchange swap_cgroup's record safely.

So I changed the modification of swap_cgroup's record (swap_cgroup_record())
to use xchg, and defined a new function to cmpxchg swap_cgroup's record.
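
The point of the cmpxchg variant is that the move succeeds only if the record
still names the old owner; a racing swap-in/swap-out simply makes the move fail
cleanly. A hedged caller-side sketch (the real user is
mem_cgroup_move_swap_account() in the diff below):

static int move_swap_record_sketch(swp_entry_t ent, struct mem_cgroup *from,
				   struct mem_cgroup *to)
{
	unsigned short old_id = css_id(&from->css);
	unsigned short new_id = css_id(&to->css);

	/* replaced only if the record still holds old_id */
	if (swap_cgroup_cmpxchg(ent, old_id, new_id) != old_id)
		return -EINVAL;
	/* ...adjust res counters and statistics for "from" and "to" here... */
	return 0;
}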

This patch also enables moving the charge of swap caches that are not
pte_present but not yet uncharged, which can exist on the swap-out path, by
getting the target pages via find_get_page() as do_mincore() does.
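
A hedged sketch of that do_mincore()-style lookup (not the actual helper in
mm/swapfile.c, which also counts users of the entry): a swap entry whose page
is still in the swap cache can be looked up without swapping it in:

#include <linux/swap.h>
#include <linux/pagemap.h>

/* Returns the swap-cache page for "ent" with a reference held, or NULL. */
static struct page *swap_cache_page_sketch(swp_entry_t ent)
{
	/* swapper_space is the address_space that backs the swap cache */
	return find_get_page(&swapper_space, ent.val);
}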

Changelog: 2009/12/21
- move css_put(&to->css) from mem_cgroup_move_charge_pte_range() to
  mem_cgroup_move_swap_account().
Changelog: 2009/12/04
- minor changes in comments and variable names.
Changelog: 2009/11/19
- in can_attach(), instead of parsing the page table, make use of per process
  mm_counter(swap_usage).
Changelog: 2009/11/06
- drop support for shmem's swap(revisit in future).
- add mem_cgroup_count_swap_user() to prevent moving charges of swaps used by
  multiple processes(revisit in future).
Changelog: 2009/09/24
- do no swap-in in moving swap account any more.
- add support for shmem's swap.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 include/linux/page_cgroup.h |    2 +
 include/linux/swap.h        |    1 +
 mm/memcontrol.c             |  152 ++++++++++++++++++++++++++++++++----------
 mm/page_cgroup.c            |   35 +++++++++-
 mm/swapfile.c               |   31 +++++++++
 5 files changed, 183 insertions(+), 38 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index b0e4eb1..30b0813 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -118,6 +118,8 @@ static inline void __init page_cgroup_init_flatmem(void)
 #include <linux/swap.h>
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
+extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
+					unsigned short old, unsigned short new);
 extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id);
 extern unsigned short lookup_swap_cgroup(swp_entry_t ent);
 extern int swap_cgroup_swapon(int type, unsigned long max_pages);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index a2602a8..1bb1506 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -355,6 +355,7 @@ static inline void disable_swap_token(void)
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 extern void
 mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout);
+extern int mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep);
 #else
 static inline void
 mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ddb3c6c..006f4b6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -33,6 +33,7 @@
 #include <linux/rbtree.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
+#include <linux/swapops.h>
 #include <linux/spinlock.h>
 #include <linux/fs.h>
 #include <linux/seq_file.h>
@@ -2272,6 +2273,54 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
 	}
 	rcu_read_unlock();
 }
+
+/**
+ * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
+ * @entry: swap entry to be moved
+ * @from:  mem_cgroup which the entry is moved from
+ * @to:  mem_cgroup which the entry is moved to
+ *
+ * It succeeds only when the swap_cgroup's record for this entry is the same
+ * as the mem_cgroup's id of @from.
+ *
+ * Returns 0 on success, -EINVAL on failure.
+ *
+ * The caller must have charged to @to, IOW, called res_counter_charge() about
+ * both res and memsw, and called css_get().
+ */
+static int mem_cgroup_move_swap_account(swp_entry_t entry,
+				struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	unsigned short old_id, new_id;
+
+	old_id = css_id(&from->css);
+	new_id = css_id(&to->css);
+
+	if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
+		if (!mem_cgroup_is_root(from))
+			res_counter_uncharge(&from->memsw, PAGE_SIZE);
+		mem_cgroup_swap_statistics(from, false);
+		mem_cgroup_put(from);
+		/*
+		 * we charged both to->res and to->memsw, so we should uncharge
+		 * to->res.
+		 */
+		if (!mem_cgroup_is_root(to))
+			res_counter_uncharge(&to->res, PAGE_SIZE);
+		mem_cgroup_swap_statistics(to, true);
+		mem_cgroup_get(to);
+		css_put(&to->css);
+
+		return 0;
+	}
+	return -EINVAL;
+}
+#else
+static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
+				struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	return -EINVAL;
+}
 #endif
 
 /*
@@ -3555,71 +3604,96 @@ one_by_one:
  * @vma: the vma the pte to be checked belongs
  * @addr: the address corresponding to the pte to be checked
  * @ptent: the pte to be checked
- * @target: the pointer the target page will be stored(can be NULL)
+ * @target: the pointer the target page or swap ent will be stored(can be NULL)
  *
  * Returns
  *   0(MC_TARGET_NONE): if the pte is not a target for move charge.
  *   1(MC_TARGET_PAGE): if the page corresponding to this pte is a target for
  *     move charge. if @target is not NULL, the page is stored in target->page
  *     with extra refcnt got(Callers should handle it).
+ *   2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a
+ *     target for move charge. if @target is not NULL, the entry is stored
+ *     in target->ent.
  *
  * Called with pte lock held.
  */
-/* We add a new member later. */
 union mc_target {
 	struct page	*page;
+	swp_entry_t	ent;
 };
 
-/* We add a new type later. */
 enum mc_target_type {
 	MC_TARGET_NONE,	/* not used */
 	MC_TARGET_PAGE,
+	MC_TARGET_SWAP,
 };
 
 static int is_target_pte_for_mc(struct vm_area_struct *vma,
 		unsigned long addr, pte_t ptent, union mc_target *target)
 {
-	struct page *page;
+	struct page *page = NULL;
 	struct page_cgroup *pc;
 	int ret = 0;
+	swp_entry_t ent = { .val = 0 };
+	int usage_count = 0;
 	bool move_anon = test_bit(MOVE_CHARGE_TYPE_ANON,
 					&mc.to->move_charge_at_immigrate);
 
-	if (!pte_present(ptent))
-		return 0;
-
-	page = vm_normal_page(vma, addr, ptent);
-	if (!page || !page_mapped(page))
-		return 0;
-	/*
-	 * TODO: We don't move charges of file(including shmem/tmpfs) pages for
-	 * now.
-	 */
-	if (!move_anon || !PageAnon(page))
-		return 0;
-	/*
-	 * TODO: We don't move charges of shared(used by multiple processes)
-	 * pages for now.
-	 */
-	if (page_mapcount(page) > 1)
-		return 0;
-	if (!get_page_unless_zero(page))
+	if (!pte_present(ptent)) {
+		/* TODO: handle swap of shmem/tmpfs */
+		if (pte_none(ptent) || pte_file(ptent))
+			return 0;
+		else if (is_swap_pte(ptent)) {
+			ent = pte_to_swp_entry(ptent);
+			if (!move_anon || non_swap_entry(ent))
+				return 0;
+			usage_count = mem_cgroup_count_swap_user(ent, &page);
+		}
+	} else {
+		page = vm_normal_page(vma, addr, ptent);
+		if (!page || !page_mapped(page))
+			return 0;
+		/*
+		 * TODO: We don't move charges of file(including shmem/tmpfs)
+		 * pages for now.
+		 */
+		if (!move_anon || !PageAnon(page))
+			return 0;
+		if (!get_page_unless_zero(page))
+			return 0;
+		usage_count = page_mapcount(page);
+	}
+	if (usage_count > 1) {
+		/*
+		 * TODO: We don't move charges of shared(used by multiple
+		 * processes) pages for now.
+		 */
+		if (page)
+			put_page(page);
 		return 0;
-
-	pc = lookup_page_cgroup(page);
-	/*
-	 * Do only loose check w/o page_cgroup lock. mem_cgroup_move_account()
-	 * checks the pc is valid or not under the lock.
-	 */
-	if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
-		ret = MC_TARGET_PAGE;
+	}
+	if (page) {
+		pc = lookup_page_cgroup(page);
+		/*
+		 * Do only loose check w/o page_cgroup lock.
+		 * mem_cgroup_move_account() checks the pc is valid or not under
+		 * the lock.
+		 */
+		if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+			ret = MC_TARGET_PAGE;
+			if (target)
+				target->page = page;
+		}
+		if (!ret || !target)
+			put_page(page);
+	}
+	/* fall through */
+	if (ent.val && do_swap_account && !ret &&
+			css_id(&mc.from->css) == lookup_swap_cgroup(ent)) {
+		ret = MC_TARGET_SWAP;
 		if (target)
-			target->page = page;
+			target->ent = ent;
 	}
-
-	if (!ret || !target)
-		put_page(page);
-
 	return ret;
 }
 
@@ -3759,6 +3833,7 @@ retry:
 		int type;
 		struct page *page;
 		struct page_cgroup *pc;
+		swp_entry_t ent;
 
 		if (!mc.precharge)
 			break;
@@ -3780,6 +3855,11 @@ retry:
 put:			/* is_target_pte_for_mc() gets the page */
 			put_page(page);
 			break;
+		case MC_TARGET_SWAP:
+			ent = target.ent;
+			if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to))
+				mc.precharge--;
+			break;
 		default:
 			break;
 		}
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 3d535d5..213b0ee 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -9,6 +9,7 @@
 #include <linux/vmalloc.h>
 #include <linux/cgroup.h>
 #include <linux/swapops.h>
+#include <asm/cmpxchg.h>
 
 static void __meminit
 __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
@@ -335,6 +336,37 @@ not_enough_page:
 }
 
 /**
+ * swap_cgroup_cmpxchg - cmpxchg mem_cgroup's id for this swp_entry.
+ * @ent: swap entry to be cmpxchged
+ * @old: old id
+ * @new: new id
+ *
+ * Returns the old id on success, 0 on failure.
+ * (There is no mem_cgroup using 0 as its id.)
+ */
+unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
+					unsigned short old, unsigned short new)
+{
+	int type = swp_type(ent);
+	unsigned long offset = swp_offset(ent);
+	unsigned long idx = offset / SC_PER_PAGE;
+	unsigned long pos = offset & SC_POS_MASK;
+	struct swap_cgroup_ctrl *ctrl;
+	struct page *mappage;
+	struct swap_cgroup *sc;
+
+	ctrl = &swap_cgroup_ctrl[type];
+
+	mappage = ctrl->map[idx];
+	sc = page_address(mappage);
+	sc += pos;
+	if (cmpxchg(&sc->id, old, new) == old)
+		return old;
+	else
+		return 0;
+}
+
+/**
  * swap_cgroup_record - record mem_cgroup for this swp_entry.
  * @ent: swap entry to be recorded into
  * @mem: mem_cgroup to be recorded
@@ -358,8 +390,7 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id)
 	mappage = ctrl->map[idx];
 	sc = page_address(mappage);
 	sc += pos;
-	old = sc->id;
-	sc->id = id;
+	old = xchg(&sc->id, id);
 
 	return old;
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6c0585b..fae08aa 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -723,6 +723,37 @@ int free_swap_and_cache(swp_entry_t entry)
 	return p != NULL;
 }
 
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR
+/**
+ * mem_cgroup_count_swap_user - count the users of a swap entry
+ * @ent: the swap entry to be checked
+ * @pagep: the pointer where the swap cache page of the entry is stored
+ *
+ * Returns the number of users of the swap entry. The count is valid only
+ * for swaps of anonymous pages.
+ * If the entry is found in the swap cache, the page is stored in *pagep with
+ * its refcount incremented.
+ */
+int mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep)
+{
+	struct page *page;
+	struct swap_info_struct *p;
+	int count = 0;
+
+	page = find_get_page(&swapper_space, ent.val);
+	if (page)
+		count += page_mapcount(page);
+	p = swap_info_get(ent);
+	if (p) {
+		count += swap_count(p->swap_map[swp_offset(ent)]);
+		spin_unlock(&swap_lock);
+	}
+
+	*pagep = page;
+	return count;
+}
+#endif
+
 #ifdef CONFIG_HIBERNATION
 /*
  * Find the swap type that corresponds to given device (if any).
-- 
1.5.6.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -mmotm 8/8] memcg: improve performance in moving swap charge
  2009-12-21  5:31 [PATCH -mmotm 0/8] memcg: move charge at task migration (21/Dec) Daisuke Nishimura
                   ` (6 preceding siblings ...)
  2009-12-21  5:38 ` [PATCH -mmotm 7/8] memcg: move charges of anonymous swap Daisuke Nishimura
@ 2009-12-21  5:40 ` Daisuke Nishimura
  2009-12-21  7:05   ` KAMEZAWA Hiroyuki
  7 siblings, 1 reply; 28+ messages in thread
From: Daisuke Nishimura @ 2009-12-21  5:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage,
	Daisuke Nishimura, linux-mm

This patch tries to reduce the overhead of moving swap charges by:

- Adding a new function (__mem_cgroup_put()), which takes "count" as an arg and
  decrements mem->refcnt by "count".
- Removing res_counter_uncharge(), css_put(), and mem_cgroup_put() from the path
  of moving swap account, and consolidating all of them into
  mem_cgroup_clear_mc(). We cannot do this for mc.to->refcnt.

These changes reduce the overhead from 1.35sec to 0.9sec for moving the charges
of 1GB of anonymous memory (including 500MB of swap) in my test environment.
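
A minimal userspace sketch of the batching idea, for illustration only (this is
not the kernel code, and the names here are hypothetical): drop N references
with a single atomic subtraction instead of N separate puts.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct obj {
	atomic_int refcnt;
};

/* drop "count" references at once; returns true when the last ref is gone */
static bool obj_put_many(struct obj *o, int count)
{
	return atomic_fetch_sub(&o->refcnt, count) == count;
}

int main(void)
{
	struct obj o;

	atomic_init(&o.refcnt, 5);
	/* equivalent to five single puts, but only one atomic operation */
	if (obj_put_many(&o, 5))
		printf("last reference dropped, object can be freed\n");
	return 0;
}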

Changelog: 2009/12/21
- don't postpone calling mem_cgroup_get() against the new cgroup(bug fix).
Changelog: 2009/12/04
- new patch

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/memcontrol.c |   73 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 006f4b6..ffca2ab 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -254,6 +254,7 @@ struct move_charge_struct {
 	struct mem_cgroup *to;
 	unsigned long precharge;
 	unsigned long moved_charge;
+	unsigned long moved_swap;
 	struct task_struct *moving_task;	/* a task moving charges */
 	wait_queue_head_t waitq;		/* a waitq for other context */
 						/* not to cause oom */
@@ -2279,6 +2280,7 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
  * @entry: swap entry to be moved
  * @from:  mem_cgroup which the entry is moved from
  * @to:  mem_cgroup which the entry is moved to
+ * @need_fixup: whether we should fixup res_counters and refcounts.
  *
  * It succeeds only when the swap_cgroup's record for this entry is the same
  * as the mem_cgroup's id of @from.
@@ -2289,7 +2291,7 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
  * both res and memsw, and called css_get().
  */
 static int mem_cgroup_move_swap_account(swp_entry_t entry,
-				struct mem_cgroup *from, struct mem_cgroup *to)
+		struct mem_cgroup *from, struct mem_cgroup *to, bool need_fixup)
 {
 	unsigned short old_id, new_id;
 
@@ -2297,20 +2299,29 @@ static int mem_cgroup_move_swap_account(swp_entry_t entry,
 	new_id = css_id(&to->css);
 
 	if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
-		if (!mem_cgroup_is_root(from))
-			res_counter_uncharge(&from->memsw, PAGE_SIZE);
 		mem_cgroup_swap_statistics(from, false);
-		mem_cgroup_put(from);
+		mem_cgroup_swap_statistics(to, true);
 		/*
-		 * we charged both to->res and to->memsw, so we should uncharge
-		 * to->res.
+		 * This function is only called from task migration context now.
+		 * It postpones res_counter and refcount handling till the end
+		 * of task migration(mem_cgroup_clear_mc()) for performance
+		 * improvement. But we cannot postpone mem_cgroup_get(to)
+		 * because if the process that has been moved to @to does
+		 * swap-in, the refcount of @to might be decreased to 0.
 		 */
-		if (!mem_cgroup_is_root(to))
-			res_counter_uncharge(&to->res, PAGE_SIZE);
-		mem_cgroup_swap_statistics(to, true);
 		mem_cgroup_get(to);
-		css_put(&to->css);
-
+		if (need_fixup) {
+			if (!mem_cgroup_is_root(from))
+				res_counter_uncharge(&from->memsw, PAGE_SIZE);
+			mem_cgroup_put(from);
+			/*
+			 * we charged both to->res and to->memsw, so we should
+			 * uncharge to->res.
+			 */
+			if (!mem_cgroup_is_root(to))
+				res_counter_uncharge(&to->res, PAGE_SIZE);
+			css_put(&to->css);
+		}
 		return 0;
 	}
 	return -EINVAL;
@@ -3395,9 +3406,9 @@ static void mem_cgroup_get(struct mem_cgroup *mem)
 	atomic_inc(&mem->refcnt);
 }
 
-static void mem_cgroup_put(struct mem_cgroup *mem)
+static void __mem_cgroup_put(struct mem_cgroup *mem, int count)
 {
-	if (atomic_dec_and_test(&mem->refcnt)) {
+	if (atomic_sub_and_test(count, &mem->refcnt)) {
 		struct mem_cgroup *parent = parent_mem_cgroup(mem);
 		__mem_cgroup_free(mem);
 		if (parent)
@@ -3405,6 +3416,11 @@ static void mem_cgroup_put(struct mem_cgroup *mem)
 	}
 }
 
+static void mem_cgroup_put(struct mem_cgroup *mem)
+{
+	__mem_cgroup_put(mem, 1);
+}
+
 /*
  * Returns the parent mem_cgroup in memcgroup hierarchy with hierarchy enabled.
  */
@@ -3763,6 +3779,29 @@ static void mem_cgroup_clear_mc(void)
 		__mem_cgroup_cancel_charge(mc.from, mc.moved_charge);
 		mc.moved_charge = 0;
 	}
+	/* we must fixup refcnts and charges */
+	if (mc.moved_swap) {
+		WARN_ON_ONCE(mc.moved_swap > INT_MAX);
+		/* uncharge swap account from the old cgroup */
+		if (!mem_cgroup_is_root(mc.from))
+			res_counter_uncharge(&mc.from->memsw,
+						PAGE_SIZE * mc.moved_swap);
+		__mem_cgroup_put(mc.from, mc.moved_swap);
+
+		if (!mem_cgroup_is_root(mc.to)) {
+			/*
+			 * we charged both to->res and to->memsw, so we should
+			 * uncharge to->res.
+			 */
+			res_counter_uncharge(&mc.to->res,
+						PAGE_SIZE * mc.moved_swap);
+			VM_BUG_ON(test_bit(CSS_ROOT, &mc.to->css.flags));
+			__css_put(&mc.to->css, mc.moved_swap);
+		}
+		/* we've already done mem_cgroup_get(mc.to) */
+
+		mc.moved_swap = 0;
+	}
 	mc.from = NULL;
 	mc.to = NULL;
 	mc.moving_task = NULL;
@@ -3792,11 +3831,13 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 			VM_BUG_ON(mc.to);
 			VM_BUG_ON(mc.precharge);
 			VM_BUG_ON(mc.moved_charge);
+			VM_BUG_ON(mc.moved_swap);
 			VM_BUG_ON(mc.moving_task);
 			mc.from = from;
 			mc.to = mem;
 			mc.precharge = 0;
 			mc.moved_charge = 0;
+			mc.moved_swap = 0;
 			mc.moving_task = current;
 
 			ret = mem_cgroup_precharge_mc(mm);
@@ -3857,8 +3898,12 @@ put:			/* is_target_pte_for_mc() gets the page */
 			break;
 		case MC_TARGET_SWAP:
 			ent = target.ent;
-			if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to))
+			if (!mem_cgroup_move_swap_account(ent,
+						mc.from, mc.to, false)) {
 				mc.precharge--;
+				/* we fixup refcnts and charges later. */
+				mc.moved_swap++;
+			}
 			break;
 		default:
 			break;
-- 
1.5.6.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 3/8] memcg: add interface to move charge at task migration
  2009-12-21  5:33 ` [PATCH -mmotm 3/8] memcg: add interface to move charge at task migration Daisuke Nishimura
@ 2009-12-21  7:00   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-12-21  7:00 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Balbir Singh, Li Zefan, Paul Menage, linux-mm

On Mon, 21 Dec 2009 14:33:46 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> In current memcg, charges associated with a task aren't moved to the new cgroup
> at task migration. Some users feel this behavior to be strange.
> These patches are for this feature, that is, for charging to the new cgroup
> and, of course, uncharging from the old cgroup at task migration.
> 
> This patch adds "memory.move_charge_at_immigrate" file, which is a flag file to
> determine whether charges should be moved to the new cgroup at task migration or
> not and what type of charges should be moved. This patch also adds read and
> write handlers of the file.
> 
> This patch also adds no-op handlers for this feature. These handlers will be
> implemented in later patches. And you cannot write any values other than 0
> to move_charge_at_immigrate yet.
> 
> Changelog: 2009/12/14
> - Add TODO section to meory.txt.
> Changelog: 2009/12/04
> - change the term "recharge" to "move_charge".
> - update document.
> Changelog: 2009/11/19
> - consolidate changes in Documentation/cgroup/memory.txt, which were made in
>   other patches separately.
> - handle recharge_at_immigrate as bitmask(as I did in first version).
> - use mm->owner instead of thread_group_leader().
> Changelog: 2009/09/24
> - change the term "migration" to "recharge".
> - handle the flag as bool not bitmask to make codes simple.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
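
For illustration, a minimal userspace sketch of driving the interface described
in the quoted patch above. This is only a sketch, not part of the patch set;
the cgroup mount point, group name, pid, and the assumption that writing "1"
sets the anonymous-page bit (MOVE_CHARGE_TYPE_ANON) are hypothetical here.

#include <stdio.h>
#include <stdlib.h>

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(1);
	}
	fprintf(f, "%s\n", val);
	fclose(f);
}

int main(void)
{
	/* enable moving charges of anonymous pages into the destination group */
	write_str("/cgroups/B/memory.move_charge_at_immigrate", "1");
	/* migrating the task now also moves its charges from the old group */
	write_str("/cgroups/B/tasks", "1234");
	return 0;
}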


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 4/8] memcg: move charges of anonymous page
  2009-12-21  5:35 ` [PATCH -mmotm 4/8] memcg: move charges of anonymous page Daisuke Nishimura
@ 2009-12-21  7:01   ` KAMEZAWA Hiroyuki
  2009-12-23  0:26   ` Andrew Morton
  1 sibling, 0 replies; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-12-21  7:01 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Balbir Singh, Li Zefan, Paul Menage, linux-mm

On Mon, 21 Dec 2009 14:35:03 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> This patch is the core part of this move-charge-at-task-migration feature.
> It implements functions to move charges of anonymous pages mapped only by
> the target task.
> 
> Implementation:
> - define struct move_charge_struct and a valuable of it(mc) to remember the
>   count of pre-charges and other information.
> - At can_attach(), get anon_rss of the target mm, call __mem_cgroup_try_charge()
>   repeatedly and count up mc.precharge.
> - At attach(), parse the page table, find a target page to be move, and call
>   mem_cgroup_move_account() about the page.
> - Cancel all precharges if mc.precharge > 0 on failure or at the end of
>   task move.
> 
> Changelog: 2009/12/04
> - change the term "recharge" to "move_charge".
> - handle a signal in can_attach() phase.
> - parse the page table in can_attach() phase again(go back to the old behavior),
>   because it doesn't add so big overheads, so it would be better to calculate
>   the precharge count more accurately.
> Changelog: 2009/11/19
> - in can_attach(), instead of parsing the page table, make use of per process
>   mm_counter(anon_rss).
> - loosen the valid check in is_target_pte_for_recharge().
> Changelog: 2009/11/06
> - drop support for file cache, shmem/tmpfs and shared(used by multiple processes)
>   pages(revisit in future).
> Changelog: 2009/10/13
> - change the term "migrate" to "recharge".
> Changelog: 2009/09/24
> - in can_attach(), parse the page table of the task and count only the number
>   of target ptes and call try_charge() repeatedly. No isolation at this phase.
> - in attach(), parse the page table of the task again, and isolate the target
>   page and call move_account() one by one.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 5/8] memcg: improve performance in moving charge
  2009-12-21  5:36 ` [PATCH -mmotm 5/8] memcg: improve performance in moving charge Daisuke Nishimura
@ 2009-12-21  7:02   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-12-21  7:02 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Balbir Singh, Li Zefan, Paul Menage, linux-mm

On Mon, 21 Dec 2009 14:36:20 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> This patch tries to reduce overheads in moving charge by:
> 
> - Instead of calling res_counter_uncharge() against the old cgroup in
>   __mem_cgroup_move_account() everytime, call res_counter_uncharge() at the end
>   of task migration once.
> - removed css_get(&to->css) from __mem_cgroup_move_account() because callers
>   should have already called css_get(). And removed css_put(&to->css) too,
>   which was called by callers of move_account on success of move_account.
> - Instead of calling __mem_cgroup_try_charge(), i.e. res_counter_charge(),
>   repeatedly, call res_counter_charge(PAGE_SIZE * count) in can_attach() if
>   possible.
> - Instead of calling css_get()/css_put() repeatedly, make use of coalesce
>   __css_get()/__css_put() if possible.
> 
> These changes reduces the overhead from 1.7sec to 0.6sec to move charges of 1G
> anonymous memory in my test environment.
> 
> Changelog: 2009/12/14
> - move cgroup part to another patch.
> - fix some bugs.
> 
> Changelog: 2009/12/04
> - new patch
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 6/8] memcg: avoid oom during moving charge
  2009-12-21  5:37 ` [PATCH -mmotm 6/8] memcg: avoid oom during " Daisuke Nishimura
@ 2009-12-21  7:03   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-12-21  7:03 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Balbir Singh, Li Zefan, Paul Menage, linux-mm

On Mon, 21 Dec 2009 14:37:09 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> This move-charge-at-task-migration feature has extra charges on "to"(pre-charges)
> and "from"(left-over charges) during moving charge. This means unnecessary oom
> can happen.
> 
> This patch tries to avoid such oom.
> 
> Changelog: 2009/12/21
> - minor cleanup.
> Changelog: 2009/12/14
> - instead of continuing to charge by busy loop, make use of waitq.
> Changelog: 2009/12/04
> - take account of "from" too, because we uncharge from "from" at once in
>   mem_cgroup_clear_mc(), so left-over charges exist during moving charge.
> - check use_hierarchy of "mem_over_limit", instead of "to" or "from"(bugfix).
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2009-12-21  5:38 ` [PATCH -mmotm 7/8] memcg: move charges of anonymous swap Daisuke Nishimura
@ 2009-12-21  7:04   ` KAMEZAWA Hiroyuki
  2010-02-04  3:31   ` Andrew Morton
  1 sibling, 0 replies; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-12-21  7:04 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Balbir Singh, Li Zefan, Paul Menage, linux-mm

On Mon, 21 Dec 2009 14:38:16 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> This patch is another core part of this move-charge-at-task-migration feature.
> It enables moving charges of anonymous swaps.
> 
> To move the charge of swap, we need to exchange swap_cgroup's record.
> 
> In current implementation, swap_cgroup's record is protected by:
> 
>   - page lock: if the entry is on swap cache.
>   - swap_lock: if the entry is not on swap cache.
> 
> This works well in usual swap-in/out activity.
> 
> But this behavior make the feature of moving swap charge check many conditions
> to exchange swap_cgroup's record safely.
> 
> So I changed modification of swap_cgroup's recored(swap_cgroup_record())
> to use xchg, and define a new function to cmpxchg swap_cgroup's record.
> 
> This patch also enables moving charge of non pte_present but not uncharged swap
> caches, which can be exist on swap-out path, by getting the target pages via
> find_get_page() as do_mincore() does.
> 
> Changelog: 2009/12/21
> - move css_put(&to->css) from mem_cgroup_move_charge_pte_range() to
>   mem_cgroup_move_swap_account().
> Changelog: 2009/12/04
> - minor changes in comments and valuable names.
> Changelog: 2009/11/19
> - in can_attach(), instead of parsing the page table, make use of per process
>   mm_counter(swap_usage).
> Changelog: 2009/11/06
> - drop support for shmem's swap(revisit in future).
> - add mem_cgroup_count_swap_user() to prevent moving charges of swaps used by
>   multiple processes(revisit in future).
> Changelog: 2009/09/24
> - do no swap-in in moving swap account any more.
> - add support for shmem's swap.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 8/8] memcg: improve performance in moving swap charge
  2009-12-21  5:40 ` [PATCH -mmotm 8/8] memcg: improve performance in moving swap charge Daisuke Nishimura
@ 2009-12-21  7:05   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-12-21  7:05 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Balbir Singh, Li Zefan, Paul Menage, linux-mm

On Mon, 21 Dec 2009 14:40:06 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> This patch tries to reduce overheads in moving swap charge by:
> 
> - Adds a new function(__mem_cgroup_put), which takes "count" as a arg and
>   decrement mem->refcnt by "count".
> - Removed res_counter_uncharge, css_put, and mem_cgroup_put from the path
>   of moving swap account, and consolidate all of them into mem_cgroup_clear_mc.
>   We cannot do that about mc.to->refcnt.
> 
> These changes reduces the overhead from 1.35sec to 0.9sec to move charges of 1G
> anonymous memory(including 500MB swap) in my test environment.
> 
> Changelog: 2009/12/21
> - don't postpone calling mem_cgroup_get() against the new cgroup(bug fix).
> Changelog: 2009/12/04
> - new patch
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 4/8] memcg: move charges of anonymous page
  2009-12-21  5:35 ` [PATCH -mmotm 4/8] memcg: move charges of anonymous page Daisuke Nishimura
  2009-12-21  7:01   ` KAMEZAWA Hiroyuki
@ 2009-12-23  0:26   ` Andrew Morton
  1 sibling, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2009-12-23  0:26 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage, linux-mm

On Mon, 21 Dec 2009 14:35:03 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> +/* "mc" and its members are protected by cgroup_mutex */
> +struct move_charge_struct {
> +	struct mem_cgroup *from;
> +	struct mem_cgroup *to;
> +	unsigned long precharge;
> +};
> +static struct move_charge_struct mc;

This is neater:

/* "mc" and its members are protected by cgroup_mutex */
static struct move_charge_struct {
	struct mem_cgroup *from;
	struct mem_cgroup *to;
	unsigned long precharge;
} mc;


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2009-12-21  5:38 ` [PATCH -mmotm 7/8] memcg: move charges of anonymous swap Daisuke Nishimura
  2009-12-21  7:04   ` KAMEZAWA Hiroyuki
@ 2010-02-04  3:31   ` Andrew Morton
  2010-02-04  5:09     ` Daisuke Nishimura
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2010-02-04  3:31 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage, linux-mm

On Mon, 21 Dec 2009 14:38:16 +0900 Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> This patch is another core part of this move-charge-at-task-migration feature.
> It enables moving charges of anonymous swaps.
> 
> To move the charge of swap, we need to exchange swap_cgroup's record.
> 
> In current implementation, swap_cgroup's record is protected by:
> 
>   - page lock: if the entry is on swap cache.
>   - swap_lock: if the entry is not on swap cache.
> 
> This works well in usual swap-in/out activity.
> 
> But this behavior make the feature of moving swap charge check many conditions
> to exchange swap_cgroup's record safely.
> 
> So I changed modification of swap_cgroup's recored(swap_cgroup_record())
> to use xchg, and define a new function to cmpxchg swap_cgroup's record.
> 
> This patch also enables moving charge of non pte_present but not uncharged swap
> caches, which can be exist on swap-out path, by getting the target pages via
> find_get_page() as do_mincore() does.
> 
>
> ...
>
> +		else if (is_swap_pte(ptent)) {

is_swap_pte() isn't implemented for CONFIG_MMU=n, so the build breaks.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2010-02-04  3:31   ` Andrew Morton
@ 2010-02-04  5:09     ` Daisuke Nishimura
  2010-02-04  5:27       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 28+ messages in thread
From: Daisuke Nishimura @ 2010-02-04  5:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, Paul Menage, linux-mm,
	Daisuke Nishimura

On Wed, 3 Feb 2010 19:31:27 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 21 Dec 2009 14:38:16 +0900 Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > This patch is another core part of this move-charge-at-task-migration feature.
> > It enables moving charges of anonymous swaps.
> > 
> > To move the charge of swap, we need to exchange swap_cgroup's record.
> > 
> > In current implementation, swap_cgroup's record is protected by:
> > 
> >   - page lock: if the entry is on swap cache.
> >   - swap_lock: if the entry is not on swap cache.
> > 
> > This works well in usual swap-in/out activity.
> > 
> > But this behavior make the feature of moving swap charge check many conditions
> > to exchange swap_cgroup's record safely.
> > 
> > So I changed modification of swap_cgroup's recored(swap_cgroup_record())
> > to use xchg, and define a new function to cmpxchg swap_cgroup's record.
> > 
> > This patch also enables moving charge of non pte_present but not uncharged swap
> > caches, which can be exist on swap-out path, by getting the target pages via
> > find_get_page() as do_mincore() does.
> > 
> >
> > ...
> >
> > +		else if (is_swap_pte(ptent)) {
> 
> is_swap_pte() isn't implemented for CONFIG_MMU=n, so the build breaks.
Ah, you're right. I'm sorry, I don't have any environment to test !CONFIG_MMU.

Using #ifdef like below would be the simplest fix (SWAP depends on MMU),
but hmm, #ifdef is ugly.

I'll prepare another fix.

===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

build fix in !CONFIG_MMU case.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/memcontrol.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 953b18b..85b48cb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3635,6 +3635,7 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
 					&mc.to->move_charge_at_immigrate);
 
 	if (!pte_present(ptent)) {
+#ifdef CONFIG_SWAP
 		/* TODO: handle swap of shmem/tmpfs */
 		if (pte_none(ptent) || pte_file(ptent))
 			return 0;
@@ -3644,6 +3645,7 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
 				return 0;
 			usage_count = mem_cgroup_count_swap_user(ent, &page);
 		}
+#endif
 	} else {
 		page = vm_normal_page(vma, addr, ptent);
 		if (!page || !page_mapped(page))
-- 
1.6.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2010-02-04  5:09     ` Daisuke Nishimura
@ 2010-02-04  5:27       ` KAMEZAWA Hiroyuki
  2010-02-04  7:18         ` Paul Mundt
  0 siblings, 1 reply; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-04  5:27 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Balbir Singh, Li Zefan, Paul Menage, linux-mm

On Thu, 4 Feb 2010 14:09:42 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Wed, 3 Feb 2010 19:31:27 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Mon, 21 Dec 2009 14:38:16 +0900 Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > 
> > > This patch is another core part of this move-charge-at-task-migration feature.
> > > It enables moving charges of anonymous swaps.
> > > 
> > > To move the charge of swap, we need to exchange swap_cgroup's record.
> > > 
> > > In current implementation, swap_cgroup's record is protected by:
> > > 
> > >   - page lock: if the entry is on swap cache.
> > >   - swap_lock: if the entry is not on swap cache.
> > > 
> > > This works well in usual swap-in/out activity.
> > > 
> > > But this behavior make the feature of moving swap charge check many conditions
> > > to exchange swap_cgroup's record safely.
> > > 
> > > So I changed modification of swap_cgroup's recored(swap_cgroup_record())
> > > to use xchg, and define a new function to cmpxchg swap_cgroup's record.
> > > 
> > > This patch also enables moving charge of non pte_present but not uncharged swap
> > > caches, which can be exist on swap-out path, by getting the target pages via
> > > find_get_page() as do_mincore() does.
> > > 
> > >
> > > ...
> > >
> > > +		else if (is_swap_pte(ptent)) {
> > 
> > is_swap_pte() isn't implemented for CONFIG_MMU=n, so the build breaks.
> Ah, you're right. I'm sorry I don't have any evironment to test !CONFIG_MMU.
> 
> Using #ifdef like below would be the simplest fix(SWAP is depend on MMU),
> but hmm, #ifdef is ugly.
> 
> I'll prepare another fix.
> 
Hmm.. is there any user of memcg in a !CONFIG_MMU environment?
Maybe memcg can be used for controlling the amount of file cache (per cgroup)..
but..

I think memcg should depend on CONFIG_MMU.

What do you think?

Thanks,
-Kame

> ===
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> build fix in !CONFIG_MMU case.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> ---
>  mm/memcontrol.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 953b18b..85b48cb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3635,6 +3635,7 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
>  					&mc.to->move_charge_at_immigrate);
>  
>  	if (!pte_present(ptent)) {
> +#ifdef CONFIG_SWAP
>  		/* TODO: handle swap of shmes/tmpfs */
>  		if (pte_none(ptent) || pte_file(ptent))
>  			return 0;
> @@ -3644,6 +3645,7 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
>  				return 0;
>  			usage_count = mem_cgroup_count_swap_user(ent, &page);
>  		}
> +#endif
>  	} else {
>  		page = vm_normal_page(vma, addr, ptent);
>  		if (!page || !page_mapped(page))
> -- 
> 1.6.4
> 
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2010-02-04  5:27       ` KAMEZAWA Hiroyuki
@ 2010-02-04  7:18         ` Paul Mundt
  2010-02-04  7:44           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Mundt @ 2010-02-04  7:18 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Balbir Singh, Li Zefan,
	Paul Menage, linux-mm

On Thu, Feb 04, 2010 at 02:27:36PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 4 Feb 2010 14:09:42 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > On Wed, 3 Feb 2010 19:31:27 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> > > On Mon, 21 Dec 2009 14:38:16 +0900 Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > 
> > > > This patch is another core part of this move-charge-at-task-migration feature.
> > > > It enables moving charges of anonymous swaps.
> > > > 
> > > > To move the charge of swap, we need to exchange swap_cgroup's record.
> > > > 
> > > > In current implementation, swap_cgroup's record is protected by:
> > > > 
> > > >   - page lock: if the entry is on swap cache.
> > > >   - swap_lock: if the entry is not on swap cache.
> > > > 
> > > > This works well in usual swap-in/out activity.
> > > > 
> > > > But this behavior make the feature of moving swap charge check many conditions
> > > > to exchange swap_cgroup's record safely.
> > > > 
> > > > So I changed modification of swap_cgroup's recored(swap_cgroup_record())
> > > > to use xchg, and define a new function to cmpxchg swap_cgroup's record.
> > > > 
> > > > This patch also enables moving charge of non pte_present but not uncharged swap
> > > > caches, which can be exist on swap-out path, by getting the target pages via
> > > > find_get_page() as do_mincore() does.
> > > > 
> > > >
> > > > ...
> > > >
> > > > +		else if (is_swap_pte(ptent)) {
> > > 
> > > is_swap_pte() isn't implemented for CONFIG_MMU=n, so the build breaks.
> > Ah, you're right. I'm sorry I don't have any evironment to test !CONFIG_MMU.
> > 
> > Using #ifdef like below would be the simplest fix(SWAP is depend on MMU),
> > but hmm, #ifdef is ugly.
> > 
> > I'll prepare another fix.
> > 
> Hmm..is there any user of memcg in !CONFIG_MMU environment ?
> Maybe memcg can be used for controling amount of file cache (per cgroup)..
> but..
> 
> I think memcg should depends on CONIFG_MMU.
> 
> How do you think ?
> 
Unless there's a real technical reason to make it depend on CONFIG_MMU,
that's just papering over the problem, and means that some nommu person
will have to come back and fix it properly at a later point in time.

CONFIG_SWAP itself is configurable even with CONFIG_MMU=y, so having
stubbed out helpers for the CONFIG_SWAP=n case would give the compiler a
chance to optimize things away in those cases, too. Embedded systems
especially will often have MMU=y and BLOCK=n, resulting in SWAP being
unset but swap cache encodings still defined.

How about just changing the is_swap_pte() definition to depend on SWAP
instead?

---

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index cd42e30..45b5b65 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -42,12 +42,17 @@ static inline pgoff_t swp_offset(swp_entry_t entry)
 	return entry.val & SWP_OFFSET_MASK(entry);
 }
 
-#ifdef CONFIG_MMU
+#ifdef CONFIG_SWAP
 /* check whether a pte points to a swap entry */
 static inline int is_swap_pte(pte_t pte)
 {
 	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
 }
+#else
+static inline int is_swap_pte(pte_t pte)
+{
+	return 0;
+}
 #endif
 
 /*


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2010-02-04  7:18         ` Paul Mundt
@ 2010-02-04  7:44           ` KAMEZAWA Hiroyuki
  2010-02-04 15:32             ` Balbir Singh
  2010-02-05  0:38             ` Daisuke Nishimura
  0 siblings, 2 replies; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-04  7:44 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Daisuke Nishimura, Andrew Morton, Balbir Singh, Li Zefan,
	Paul Menage, linux-mm

On Thu, 4 Feb 2010 16:18:40 +0900
Paul Mundt <lethal@linux-sh.org> wrote:

> On Thu, Feb 04, 2010 at 02:27:36PM +0900, KAMEZAWA Hiroyuki wrote:

> > I think memcg should depends on CONIFG_MMU.
> > 
> > How do you think ?
> > 
> Unless there's a real technical reason to make it depend on CONFIG_MMU,
> that's just papering over the problem, and means that some nommu person
> will have to come back and fix it properly at a later point in time.
> 
I have no strong opinion on this. It's ok to support as much as possible.
My concern is that there is no !MMU architecture developer around memcg. So,
error reports will be delayed.


> CONFIG_SWAP itself is configurable even with CONFIG_MMU=y, so having
> stubbed out helpers for the CONFIG_SWAP=n case would give the compiler a
> chance to optimize things away in those cases, too. Embedded systems
> especially will often have MMU=y and BLOCK=n, resulting in SWAP being
> unset but swap cache encodings still defined.
> 
> How about just changing the is_swap_pte() definition to depend on SWAP
> instead?
> 
I think the new "move task charge" feature itself depends on CONFIG_MMU
because it walks a process's page table.

Then, how about this? (sorry, I can't test this in a valid way..)

==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Now, the "move charges at task move" feature depends on page tables. So,
it doesn't work in !CONFIG_MMU environments.
This patch moves the "task move" code under CONFIG_MMU.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/cgroups/memory.txt |    2 ++
 mm/memcontrol.c                  |   39 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 38 insertions(+), 3 deletions(-)

Index: mmotm-2.6.33-Feb3/Documentation/cgroups/memory.txt
===================================================================
--- mmotm-2.6.33-Feb3.orig/Documentation/cgroups/memory.txt
+++ mmotm-2.6.33-Feb3/Documentation/cgroups/memory.txt
@@ -420,6 +420,8 @@ NOTE2: It is recommended to set the soft
 
 Users can move charges associated with a task along with task migration, that
 is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
+This feature is not supported in !CONFIG_MMU environments because of lack of
+page tables.
 
 8.1 Interface
 
Index: mmotm-2.6.33-Feb3/mm/memcontrol.c
===================================================================
--- mmotm-2.6.33-Feb3.orig/mm/memcontrol.c
+++ mmotm-2.6.33-Feb3/mm/memcontrol.c
@@ -20,7 +20,6 @@
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  */
-
 #include <linux/res_counter.h>
 #include <linux/memcontrol.h>
 #include <linux/cgroup.h>
@@ -2281,6 +2280,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
 	rcu_read_unlock();
 }
 
+#ifdef CONFIG_MMU /* this is used for task_move */
 /**
  * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
  * @entry: swap entry to be moved
@@ -2332,6 +2332,7 @@ static int mem_cgroup_move_swap_account(
 	}
 	return -EINVAL;
 }
+#endif
 #else
 static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
 		struct mem_cgroup *from, struct mem_cgroup *to, bool need_fixup)
@@ -3027,6 +3028,7 @@ static u64 mem_cgroup_move_charge_read(s
 	return mem_cgroup_from_cont(cgrp)->move_charge_at_immigrate;
 }
 
+#ifdef CONFIG_MMU
 static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
 					struct cftype *cft, u64 val)
 {
@@ -3045,7 +3047,13 @@ static int mem_cgroup_move_charge_write(
 
 	return 0;
 }
-
+#else
+static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
+				struct cftype *cft, u64 val)
+{
+	return -EINVAL;
+}
+#endif
 
 /* For read statistics */
 enum {
@@ -3846,6 +3854,7 @@ static int mem_cgroup_populate(struct cg
 	return ret;
 }
 
+#ifdef CONFIG_MMU
 /* Handlers for move charge at task migration. */
 #define PRECHARGE_COUNT_AT_ONCE	256
 static int mem_cgroup_do_precharge(unsigned long count)
@@ -3901,7 +3910,6 @@ one_by_one:
 	}
 	return ret;
 }
-
 /**
  * is_target_pte_for_mc - check a pte whether it is valid for move charge
  * @vma: the vma the pte to be checked belongs
@@ -4243,6 +4251,31 @@ static void mem_cgroup_move_charge(struc
 	}
 	up_read(&mm->mmap_sem);
 }
+#else
+
+static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
+	struct cgroup *cgroup,
+	struct task_struct *p,
+	bool threadgroup)
+{
+	return 0;
+}
+
+static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
+		struct cgroup *cgroup,
+		struct task_struct *p,
+		bool threadgroup)
+{
+}
+
+static void mem_cgroup_move_charge(struct mm_struct *mm)
+{
+}
+
+static void mem_cgroup_clear_mc(void)
+{
+}
+#endif
 
 static void mem_cgroup_move_task(struct cgroup_subsys *ss,
 				struct cgroup *cont,


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2010-02-04  7:44           ` KAMEZAWA Hiroyuki
@ 2010-02-04 15:32             ` Balbir Singh
  2010-02-05  0:38             ` Daisuke Nishimura
  1 sibling, 0 replies; 28+ messages in thread
From: Balbir Singh @ 2010-02-04 15:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Paul Mundt, Daisuke Nishimura, Andrew Morton, Li Zefan,
	Paul Menage, linux-mm

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-02-04 16:44:41]:

> On Thu, 4 Feb 2010 16:18:40 +0900
> Paul Mundt <lethal@linux-sh.org> wrote:
> 
> > On Thu, Feb 04, 2010 at 02:27:36PM +0900, KAMEZAWA Hiroyuki wrote:
> 
> > > I think memcg should depends on CONIFG_MMU.
> > > 
> > > How do you think ?
> > > 
> > Unless there's a real technical reason to make it depend on CONFIG_MMU,
> > that's just papering over the problem, and means that some nommu person
> > will have to come back and fix it properly at a later point in time.
> > 
> I have no strong opinion this. It's ok to support as much as possible.
> My concern is that there is no !MMU architecture developper around memcg. So,
> error report will be delayed.
>

I don't mind making it depend on CONFIG_MMU. Enabling it for
!CONFIG_MMU will require some careful thought and work, so I'd rather
make that obvious by making it depend on CONFIG_MMU.
 
> 
> > CONFIG_SWAP itself is configurable even with CONFIG_MMU=y, so having
> > stubbed out helpers for the CONFIG_SWAP=n case would give the compiler a
> > chance to optimize things away in those cases, too. Embedded systems
> > especially will often have MMU=y and BLOCK=n, resulting in SWAP being
> > unset but swap cache encodings still defined.
> > 
> > How about just changing the is_swap_pte() definition to depend on SWAP
> > instead?
> > 
> I think the new feature as "move task charge" itself depends on CONFIG_MMU
> because it walks a process's page table. 
> 
-- 
	Balbir


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2010-02-04  7:44           ` KAMEZAWA Hiroyuki
  2010-02-04 15:32             ` Balbir Singh
@ 2010-02-05  0:38             ` Daisuke Nishimura
  2010-02-05  0:54               ` KAMEZAWA Hiroyuki
  2010-02-05  1:16               ` Paul Mundt
  1 sibling, 2 replies; 28+ messages in thread
From: Daisuke Nishimura @ 2010-02-05  0:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Paul Mundt, Andrew Morton, Balbir Singh, Li Zefan, Paul Menage,
	linux-mm, Daisuke Nishimura

On Thu, 4 Feb 2010 16:44:41 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 4 Feb 2010 16:18:40 +0900
> Paul Mundt <lethal@linux-sh.org> wrote:
> 
> > On Thu, Feb 04, 2010 at 02:27:36PM +0900, KAMEZAWA Hiroyuki wrote:
> 
> > > I think memcg should depends on CONIFG_MMU.
> > > 
> > > How do you think ?
> > > 
> > Unless there's a real technical reason to make it depend on CONFIG_MMU,
> > that's just papering over the problem, and means that some nommu person
> > will have to come back and fix it properly at a later point in time.
> > 
> I have no strong opinion this. It's ok to support as much as possible.
> My concern is that there is no !MMU architecture developper around memcg. So,
> error report will be delayed.
> 
I agree with you and Paul.

> 
> > CONFIG_SWAP itself is configurable even with CONFIG_MMU=y, so having
> > stubbed out helpers for the CONFIG_SWAP=n case would give the compiler a
> > chance to optimize things away in those cases, too. Embedded systems
> > especially will often have MMU=y and BLOCK=n, resulting in SWAP being
> > unset but swap cache encodings still defined.
> > 
> > How about just changing the is_swap_pte() definition to depend on SWAP
> > instead?
> > 
> I think the new feature as "move task charge" itself depends on CONFIG_MMU
> because it walks a process's page table. 
> 
> Then, how about this ? (sorry, I can't test this in valid way..)
> 
I agree with this direction of making "move charge" depend on CONFIG_MMU,
although I can't test the !CONFIG_MMU case either.

Several comments are inline below.
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Now, "move charges at task move" feature depends on page tables. So,
> it doesn't work in !CONIFG_MMU enviroments.
> This patch moves "task move" codes under CONIFG_MMU.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  Documentation/cgroups/memory.txt |    2 ++
>  mm/memcontrol.c                  |   39 ++++++++++++++++++++++++++++++++++++---
>  2 files changed, 38 insertions(+), 3 deletions(-)
> 
> Index: mmotm-2.6.33-Feb3/Documentation/cgroups/memory.txt
> ===================================================================
> --- mmotm-2.6.33-Feb3.orig/Documentation/cgroups/memory.txt
> +++ mmotm-2.6.33-Feb3/Documentation/cgroups/memory.txt
> @@ -420,6 +420,8 @@ NOTE2: It is recommended to set the soft
>  
>  Users can move charges associated with a task along with task migration, that
>  is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
> +This feature is not supporetd in !CONFIG_MMU environmetns because of lack of
> +page tables.
>  
>  8.1 Interface
>  
> Index: mmotm-2.6.33-Feb3/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.33-Feb3.orig/mm/memcontrol.c
> +++ mmotm-2.6.33-Feb3/mm/memcontrol.c
> @@ -20,7 +20,6 @@
>   * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>   * GNU General Public License for more details.
>   */
> -
>  #include <linux/res_counter.h>
>  #include <linux/memcontrol.h>
>  #include <linux/cgroup.h>
Is this deletion necessary ? ;)

> @@ -2281,6 +2280,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
>  	rcu_read_unlock();
>  }
>  
> +#ifdef CONFIG_MMU /* this is used for task_move */
>  /**
>   * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
>   * @entry: swap entry to be moved
> @@ -2332,6 +2332,7 @@ static int mem_cgroup_move_swap_account(
>  	}
>  	return -EINVAL;
>  }
> +#endif
>  #else
>  static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>  		struct mem_cgroup *from, struct mem_cgroup *to, bool need_fixup)
I think these ifdefs are inside CONFIG_CGROUP_MEM_RES_CTLR_SWAP, which depends
on CONFIG_SWAP, so they are not needed.

> @@ -3027,6 +3028,7 @@ static u64 mem_cgroup_move_charge_read(s
>  	return mem_cgroup_from_cont(cgrp)->move_charge_at_immigrate;
>  }
>  
> +#ifdef CONIFG_MMU
>  static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
>  					struct cftype *cft, u64 val)
>  {
> @@ -3045,7 +3047,13 @@ static int mem_cgroup_move_charge_write(
>  
>  	return 0;
>  }
> -
> +#else
> +static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
> +				struct cftype *cft, u64 val)
> +{
> +	return -EINVAL;
> +}
> +#endif
>  
>  /* For read statistics */
>  enum {
It's a good idea, but I'm not sure whether we should return -EINVAL or -ENOSYS.

> @@ -3846,6 +3854,7 @@ static int mem_cgroup_populate(struct cg
>  	return ret;
>  }
>  
> +#ifdef CONFIG_MMU
>  /* Handlers for move charge at task migration. */
>  #define PRECHARGE_COUNT_AT_ONCE	256
>  static int mem_cgroup_do_precharge(unsigned long count)
> @@ -3901,7 +3910,6 @@ one_by_one:
>  	}
>  	return ret;
>  }
> -
>  /**
>   * is_target_pte_for_mc - check a pte whether it is valid for move charge
>   * @vma: the vma the pte to be checked belongs
> @@ -4243,6 +4251,31 @@ static void mem_cgroup_move_charge(struc
>  	}
>  	up_read(&mm->mmap_sem);
>  }
> +#else
> +
> +static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
> +	struct cgroup *cgroup,
> +	struct task_struct *p,
> +	bool threadgroup)
> +{
> +	return 0;
> +}
> +
> +static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
> +		struct cgroup *cgroup,
> +		struct task_struct *p,
> +		bool threadgroup)
> +{
> +}
> +
> +static void mem_cgroup_move_charge(struct mm_struct *mm)
> +{
> +}
> +
> +static void mem_cgroup_clear_mc(void)
> +{
> +}
> +#endif
>  
>  static void mem_cgroup_move_task(struct cgroup_subsys *ss,
>  				struct cgroup *cont,
> 
Other parts look good to me.


Thanks,
Daisuke Nishimura.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2010-02-05  0:38             ` Daisuke Nishimura
@ 2010-02-05  0:54               ` KAMEZAWA Hiroyuki
  2010-02-05  1:16               ` Paul Mundt
  1 sibling, 0 replies; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-05  0:54 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Paul Mundt, Andrew Morton, Balbir Singh, Li Zefan, Paul Menage, linux-mm

On Fri, 5 Feb 2010 09:38:06 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Thu, 4 Feb 2010 16:44:41 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Thu, 4 Feb 2010 16:18:40 +0900
> > Paul Mundt <lethal@linux-sh.org> wrote:
> > 
> > > On Thu, Feb 04, 2010 at 02:27:36PM +0900, KAMEZAWA Hiroyuki wrote:
> > 
> > > > I think memcg should depends on CONIFG_MMU.
> > > > 
> > > > How do you think ?
> > > > 
> > > Unless there's a real technical reason to make it depend on CONFIG_MMU,
> > > that's just papering over the problem, and means that some nommu person
> > > will have to come back and fix it properly at a later point in time.
> > > 
> > I have no strong opinion this. It's ok to support as much as possible.
> > My concern is that there is no !MMU architecture developper around memcg. So,
> > error report will be delayed.
> > 
> I agree with you and Paul.
> 
> > 
> > > CONFIG_SWAP itself is configurable even with CONFIG_MMU=y, so having
> > > stubbed out helpers for the CONFIG_SWAP=n case would give the compiler a
> > > chance to optimize things away in those cases, too. Embedded systems
> > > especially will often have MMU=y and BLOCK=n, resulting in SWAP being
> > > unset but swap cache encodings still defined.
> > > 
> > > How about just changing the is_swap_pte() definition to depend on SWAP
> > > instead?
> > > 
> > I think the new feature as "move task charge" itself depends on CONFIG_MMU
> > because it walks a process's page table. 
> > 
> > Then, how about this ? (sorry, I can't test this in valid way..)
> > 
> I agree to this direction of making "move charge" depend on CONFIG_MMU,
> although I can't test !CONFIG_MMU case either.
> 
ya, that's a problem.

> Several comments are inlined.
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Now, "move charges at task move" feature depends on page tables. So,
> > it doesn't work in !CONIFG_MMU enviroments.
> > This patch moves "task move" codes under CONIFG_MMU.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  Documentation/cgroups/memory.txt |    2 ++
> >  mm/memcontrol.c                  |   39 ++++++++++++++++++++++++++++++++++++---
> >  2 files changed, 38 insertions(+), 3 deletions(-)
> > 
> > Index: mmotm-2.6.33-Feb3/Documentation/cgroups/memory.txt
> > ===================================================================
> > --- mmotm-2.6.33-Feb3.orig/Documentation/cgroups/memory.txt
> > +++ mmotm-2.6.33-Feb3/Documentation/cgroups/memory.txt
> > @@ -420,6 +420,8 @@ NOTE2: It is recommended to set the soft
> >  
> >  Users can move charges associated with a task along with task migration, that
> >  is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
> > +This feature is not supporetd in !CONFIG_MMU environmetns because of lack of
> > +page tables.
> >  
> >  8.1 Interface
> >  
> > Index: mmotm-2.6.33-Feb3/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.33-Feb3.orig/mm/memcontrol.c
> > +++ mmotm-2.6.33-Feb3/mm/memcontrol.c
> > @@ -20,7 +20,6 @@
> >   * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >   * GNU General Public License for more details.
> >   */
> > -
> >  #include <linux/res_counter.h>
> >  #include <linux/memcontrol.h>
> >  #include <linux/cgroup.h>
> Is this deletion necessary ? ;)
> 
no ;(

> > @@ -2281,6 +2280,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
> >  	rcu_read_unlock();
> >  }
> >  
> > +#ifdef CONFIG_MMU /* this is used for task_move */
> >  /**
> >   * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
> >   * @entry: swap entry to be moved
> > @@ -2332,6 +2332,7 @@ static int mem_cgroup_move_swap_account(
> >  	}
> >  	return -EINVAL;
> >  }
> > +#endif
> >  #else
> >  static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
> >  		struct mem_cgroup *from, struct mem_cgroup *to, bool need_fixup)
> I think these ifdefs are inside CONFIG_CGROUP_MEM_RES_CTLR_SWAP, which depends
> on CONFIG_SWAP, so they are not needed.
> 
OK. Maybe my manual test with !MMU set caught a wrong result.


> > @@ -3027,6 +3028,7 @@ static u64 mem_cgroup_move_charge_read(s
> >  	return mem_cgroup_from_cont(cgrp)->move_charge_at_immigrate;
> >  }
> >  
> > +#ifdef CONFIG_MMU
> >  static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
> >  					struct cftype *cft, u64 val)
> >  {
> > @@ -3045,7 +3047,13 @@ static int mem_cgroup_move_charge_write(
> >  
> >  	return 0;
> >  }
> > -
> > +#else
> > +static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
> > +				struct cftype *cft, u64 val)
> > +{
> > +	return -EINVAL;
> > +}
> > +#endif
> >  
> >  /* For read statistics */
> >  enum {
> It's a good idea, but I'm not sure which we should return: -EINVAL or -ENOSYS.
> 
Hmm,  -ENOTSUPP ?


> > @@ -3846,6 +3854,7 @@ static int mem_cgroup_populate(struct cg
> >  	return ret;
> >  }
> >  
> > +#ifdef CONFIG_MMU
> >  /* Handlers for move charge at task migration. */
> >  #define PRECHARGE_COUNT_AT_ONCE	256
> >  static int mem_cgroup_do_precharge(unsigned long count)
> > @@ -3901,7 +3910,6 @@ one_by_one:
> >  	}
> >  	return ret;
> >  }
> > -
> >  /**
> >   * is_target_pte_for_mc - check a pte whether it is valid for move charge
> >   * @vma: the vma the pte to be checked belongs
> > @@ -4243,6 +4251,31 @@ static void mem_cgroup_move_charge(struc
> >  	}
> >  	up_read(&mm->mmap_sem);
> >  }
> > +#else
> > +
> > +static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
> > +	struct cgroup *cgroup,
> > +	struct task_struct *p,
> > +	bool threadgroup)
> > +{
> > +	return 0;
> > +}
> > +
> > +static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
> > +		struct cgroup *cgroup,
> > +		struct task_struct *p,
> > +		bool threadgroup)
> > +{
> > +}
> > +
> > +static void mem_cgroup_move_charge(struct mm_struct *mm)
> > +{
> > +}
> > +
> > +static void mem_cgroup_clear_mc(void)
> > +{
> > +}
> > +#endif
> >  
> >  static void mem_cgroup_move_task(struct cgroup_subsys *ss,
> >  				struct cgroup *cont,
> > 
> Other parts look good to me.
> 
Thanks, I'll rewrite.

-Kame

> 
> Thanks,
> Daisuke Nishimura.
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2010-02-05  0:38             ` Daisuke Nishimura
  2010-02-05  0:54               ` KAMEZAWA Hiroyuki
@ 2010-02-05  1:16               ` Paul Mundt
  2010-03-09 23:13                 ` Andrew Morton
  1 sibling, 1 reply; 28+ messages in thread
From: Paul Mundt @ 2010-02-05  1:16 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Balbir Singh, Li Zefan,
	Paul Menage, linux-mm

On Fri, Feb 05, 2010 at 09:38:06AM +0900, Daisuke Nishimura wrote:
> On Thu, 4 Feb 2010 16:44:41 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Thu, 4 Feb 2010 16:18:40 +0900
> > Paul Mundt <lethal@linux-sh.org> wrote:
> > > CONFIG_SWAP itself is configurable even with CONFIG_MMU=y, so having
> > > stubbed out helpers for the CONFIG_SWAP=n case would give the compiler a
> > > chance to optimize things away in those cases, too. Embedded systems
> > > especially will often have MMU=y and BLOCK=n, resulting in SWAP being
> > > unset but swap cache encodings still defined.
> > > 
> > > How about just changing the is_swap_pte() definition to depend on SWAP
> > > instead?
> > > 
> > I think the new feature as "move task charge" itself depends on CONFIG_MMU
> > because it walks a process's page table. 
> > 
> > Then, how about this ? (sorry, I can't test this in valid way..)
> > 
> I agree to this direction of making "move charge" depend on CONFIG_MMU,
> although I can't test !CONFIG_MMU case either.
> 
I'll try to give it a test on nommu today and see how it goes.
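
To make the CONFIG_SWAP=n suggestion quoted above concrete, here is a hedged
sketch of a SWAP-guarded is_swap_pte(). The helper body is an assumption based
on the 2.6.33-era swapops.h definition; only the #ifdef split is the point:

#ifdef CONFIG_SWAP
/* assumed 2.6.33-era test: neither empty, present, nor a file pte */
static inline int is_swap_pte(pte_t pte)
{
	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
}
#else
/* CONFIG_SWAP=n: never a swap pte, so callers can be optimized away */
static inline int is_swap_pte(pte_t pte)
{
	return 0;
}
#endif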


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2010-02-05  1:16               ` Paul Mundt
@ 2010-03-09 23:13                 ` Andrew Morton
  2010-03-10  2:50                   ` Daisuke Nishimura
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2010-03-09 23:13 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Daisuke Nishimura, KAMEZAWA Hiroyuki, Balbir Singh, Li Zefan,
	Paul Menage, linux-mm

On Fri, 5 Feb 2010 10:16:02 +0900
Paul Mundt <lethal@linux-sh.org> wrote:

> On Fri, Feb 05, 2010 at 09:38:06AM +0900, Daisuke Nishimura wrote:
> > On Thu, 4 Feb 2010 16:44:41 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > On Thu, 4 Feb 2010 16:18:40 +0900
> > > Paul Mundt <lethal@linux-sh.org> wrote:
> > > > CONFIG_SWAP itself is configurable even with CONFIG_MMU=y, so having
> > > > stubbed out helpers for the CONFIG_SWAP=n case would give the compiler a
> > > > chance to optimize things away in those cases, too. Embedded systems
> > > > especially will often have MMU=y and BLOCK=n, resulting in SWAP being
> > > > unset but swap cache encodings still defined.
> > > > 
> > > > How about just changing the is_swap_pte() definition to depend on SWAP
> > > > instead?
> > > > 
> > > I think the new feature as "move task charge" itself depends on CONFIG_MMU
> > > because it walks a process's page table. 
> > > 
> > > Then, how about this ? (sorry, I can't test this in valid way..)
> > > 
> > I agree to this direction of making "move charge" depend on CONFIG_MMU,
> > although I can't test !CONFIG_MMU case either.
> > 
> I'll try to give it a test on nommu today and see how it goes.

The patch is still breaking the NOMMU build for me:

mm/memcontrol.c: In function `is_target_pte_for_mc':
mm/memcontrol.c:3641: error: implicit declaration of function `is_swap_pte'


btw, sh allmodconfig gives me a huge spew related to the missing
definition of pt_regs:

/usr/src/25/arch/sh/include/asm/system.h:166: warning: parameter has incomplete type
/usr/src/25/arch/sh/include/asm/system.h:167: warning: parameter has incomplete type
/usr/src/25/arch/sh/include/asm/system.h:168: warning: parameter has incomplete type
/usr/src/25/arch/sh/include/asm/system.h:169: warning: parameter has incomplete type
/usr/src/25/arch/sh/include/asm/system.h:170: warning: parameter has incomplete type
/usr/src/25/arch/sh/include/asm/system.h:171: warning: parameter has incomplete type
/usr/src/25/arch/sh/include/asm/system.h:172: warning: parameter has incomplete type
/usr/src/25/arch/sh/include/asm/system.h:173: warning: parameter has incomplete type

The `struct pt_regs;' declaration isn't enough - it wants to see the
definition.  I think.
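
A minimal, self-contained illustration of the warning class quoted above (this
is not the sh code itself, and the function names are made up for illustration):
a forward declaration is enough while the struct is only used through a pointer,
but a by-value parameter needs the complete definition, so compiling the
following with "gcc -c" reproduces the warning on the second prototype.

struct pt_regs;				/* forward declaration only: incomplete type */

void show_regs(struct pt_regs *regs);	/* fine: pointer to an incomplete type */
void copy_regs(struct pt_regs regs);	/* warning: parameter has incomplete type */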


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -mmotm 7/8] memcg: move charges of anonymous swap
  2010-03-09 23:13                 ` Andrew Morton
@ 2010-03-10  2:50                   ` Daisuke Nishimura
  0 siblings, 0 replies; 28+ messages in thread
From: Daisuke Nishimura @ 2010-03-10  2:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Paul Mundt, KAMEZAWA Hiroyuki, Balbir Singh, Li Zefan,
	Paul Menage, linux-mm, Daisuke Nishimura

On Tue, 9 Mar 2010 15:13:34 -0800, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 5 Feb 2010 10:16:02 +0900
> Paul Mundt <lethal@linux-sh.org> wrote:
> 
> > On Fri, Feb 05, 2010 at 09:38:06AM +0900, Daisuke Nishimura wrote:
> > > On Thu, 4 Feb 2010 16:44:41 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > > On Thu, 4 Feb 2010 16:18:40 +0900
> > > > Paul Mundt <lethal@linux-sh.org> wrote:
> > > > > CONFIG_SWAP itself is configurable even with CONFIG_MMU=y, so having
> > > > > stubbed out helpers for the CONFIG_SWAP=n case would give the compiler a
> > > > > chance to optimize things away in those cases, too. Embedded systems
> > > > > especially will often have MMU=y and BLOCK=n, resulting in SWAP being
> > > > > unset but swap cache encodings still defined.
> > > > > 
> > > > > How about just changing the is_swap_pte() definition to depend on SWAP
> > > > > instead?
> > > > > 
> > > > I think the new feature as "move task charge" itself depends on CONFIG_MMU
> > > > because it walks a process's page table. 
> > > > 
> > > > Then, how about this ? (sorry, I can't test this in valid way..)
> > > > 
> > > I agree to this direction of making "move charge" depend on CONFIG_MMU,
> > > although I can't test !CONFIG_MMU case either.
> > > 
> > I'll try to give it a test on nommu today and see how it goes.
> 
> The patch is still breaking the NOMMU build for me:
> 
> mm/memcontrol.c: In function `is_target_pte_for_mc':
> mm/memcontrol.c:3641: error: implicit declaration of function `is_swap_pte'
> 
This is a fix patch based on KAMEZAWA-san's.

This patch makes the "move charge" feature depend on CONFIG_MMU. Because of its
nature, I think it is more appropriate to treat this patch as a fix for
memcg-add-interface-to-move-charge-at-task-migration.patch, so I have prepared it
as such. All of the following patches still apply properly in my environment.

===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

"move charges at task migration" feature depends on page tables. So, it doesn't
work in !CONFIG_MMU environments.
This patch moves the "task move" code under CONFIG_MMU.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 Documentation/cgroups/memory.txt |    2 ++
 mm/memcontrol.c                  |   31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index e726fb0..b8b6b12 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -420,6 +420,8 @@ NOTE2: It is recommended to set the soft limit always below the hard limit,
 
 Users can move charges associated with a task along with task migration, that
 is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
+This feature is not supported in !CONFIG_MMU environments because of the lack
+of page tables.
 
 8.1 Interface
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 59ffaf5..88a6880 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2886,6 +2886,7 @@ static u64 mem_cgroup_move_charge_read(struct cgroup *cgrp,
 	return mem_cgroup_from_cont(cgrp)->move_charge_at_immigrate;
 }
 
+#ifdef CONFIG_MMU
 static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
 					struct cftype *cft, u64 val)
 {
@@ -2904,6 +2905,13 @@ static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
 
 	return 0;
 }
+#else
+static int mem_cgroup_move_charge_write(struct cgroup *cgrp,
+					struct cftype *cft, u64 val)
+{
+	return -ENOSYS;
+}
+#endif
 
 
 /* For read statistics */
@@ -3427,6 +3435,7 @@ static int mem_cgroup_populate(struct cgroup_subsys *ss,
 	return ret;
 }
 
+#ifdef CONFIG_MMU
 /* Handlers for move charge at task migration. */
 static int mem_cgroup_can_move_charge(void)
 {
@@ -3479,6 +3488,28 @@ static void mem_cgroup_move_task(struct cgroup_subsys *ss,
 {
 	mem_cgroup_move_charge();
 }
+#else	/* !CONFIG_MMU */
+static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
+				struct cgroup *cgroup,
+				struct task_struct *p,
+				bool threadgroup)
+{
+	return 0;
+}
+static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
+				struct cgroup *cgroup,
+				struct task_struct *p,
+				bool threadgroup)
+{
+}
+static void mem_cgroup_move_task(struct cgroup_subsys *ss,
+				struct cgroup *cont,
+				struct cgroup *old_cont,
+				struct task_struct *p,
+				bool threadgroup)
+{
+}
+#endif
 
 struct cgroup_subsys mem_cgroup_subsys = {
 	.name = "memory",
-- 
1.6.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH -mmotm 4/8] memcg: move charges of anonymous page
  2009-12-14  6:17 [PATCH -mmotm 0/8] memcg: move charge at task migration (14/Dec) Daisuke Nishimura
@ 2009-12-14  6:21 ` Daisuke Nishimura
  0 siblings, 0 replies; 28+ messages in thread
From: Daisuke Nishimura @ 2009-12-14  6:21 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan,
	Paul Menage, Daisuke Nishimura

This patch is the core part of this move-charge-at-task-migration feature.
It implements functions to move charges of anonymous pages mapped only by
the target task.

Implementation:
- define struct move_charge_struct and a variable of that type (mc) to remember
  the count of precharges and other information.
- At can_attach(), parse the target mm's page table to count the target ptes,
  call __mem_cgroup_try_charge() repeatedly, and count up mc.precharge.
- At attach(), parse the page table again, find each target page to be moved, and
  call mem_cgroup_move_account() on it.
- Cancel all remaining precharges (mc.precharge > 0) on failure or at the end of
  the task move.
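
For reference, a minimal user-space sketch of driving this feature once the
series is applied: set bit 0 (private anonymous pages) in the destination
cgroup's memory.move_charge_at_immigrate, then write the task's pid to that
cgroup's tasks file, which triggers the can_attach()/attach() path described
above. The /cgroup/dst path and the write_str() helper are assumptions for
illustration; adjust the path to wherever the memory controller is mounted.

#include <stdio.h>

/* hypothetical helper: write a small string to a cgroup control file */
static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f || fputs(val, f) == EOF) {
		perror(path);
		if (f)
			fclose(f);
		return -1;
	}
	return fclose(f);
}

int main(int argc, char **argv)
{
	char pid[32];

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	snprintf(pid, sizeof(pid), "%s\n", argv[1]);

	/* bit 0 == MOVE_CHARGE_TYPE_ANON: move charges of private anonymous pages */
	if (write_str("/cgroup/dst/memory.move_charge_at_immigrate", "1\n"))
		return 1;
	/* migrating the task is what triggers the charge move */
	if (write_str("/cgroup/dst/tasks", pid))
		return 1;
	return 0;
}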

Changelog: 2009/12/04
- change the term "recharge" to "move_charge".
- handle a signal in can_attach() phase.
- parse the page table in can_attach() phase again(go back to the old behavior),
  because it doesn't add so big overheads, so it would be better to calculate
  the precharge count more accurately.
Changelog: 2009/11/19
- in can_attach(), instead of parsing the page table, make use of per process
  mm_counter(anon_rss).
- loosen the valid check in is_target_pte_for_recharge().
Changelog: 2009/11/06
- drop support for file cache, shmem/tmpfs and shared(used by multiple processes)
  pages(revisit in future).
Changelog: 2009/10/13
- change the term "migrate" to "recharge".
Changelog: 2009/09/24
- in can_attach(), parse the page table of the task and count only the number
  of target ptes and call try_charge() repeatedly. No isolation at this phase.
- in attach(), parse the page table of the task again, and isolate the target
  page and call move_account() one by one.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/memcontrol.c |  295 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 285 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2ec0201..3284c0e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -21,6 +21,7 @@
 #include <linux/memcontrol.h>
 #include <linux/cgroup.h>
 #include <linux/mm.h>
+#include <linux/hugetlb.h>
 #include <linux/pagemap.h>
 #include <linux/smp.h>
 #include <linux/page-flags.h>
@@ -243,8 +244,17 @@ struct mem_cgroup {
  * left-shifted bitmap of these types.
  */
 enum move_type {
+	MOVE_CHARGE_TYPE_ANON,	/* private anonymous page and swap of it */
 	NR_MOVE_TYPE,
 };
+/* "mc" and its members are protected by cgroup_mutex */
+struct move_charge_struct {
+	struct mem_cgroup *from;
+	struct mem_cgroup *to;
+	unsigned long precharge;
+};
+static struct move_charge_struct mc;
+
 
 /*
  * Maximum loops in mem_cgroup_hierarchical_reclaim(), used for soft
@@ -1513,7 +1523,7 @@ charged:
 	 * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
 	 * if they exceeds softlimit.
 	 */
-	if (mem_cgroup_soft_limit_check(mem))
+	if (page && mem_cgroup_soft_limit_check(mem))
 		mem_cgroup_update_tree(mem, page);
 done:
 	return 0;
@@ -1690,8 +1700,9 @@ static void __mem_cgroup_move_account(struct page_cgroup *pc,
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
 	 * can be under rmdir(). But in current implementation, caller of
-	 * this function is just force_empty() and it's garanteed that
-	 * "to" is never removed. So, we don't check rmdir status here.
+	 * this function is just force_empty() and move charge, so it's
+	 * garanteed that "to" is never removed. So, we don't check rmdir
+	 * status here.
 	 */
 }
 
@@ -3431,11 +3442,171 @@ static int mem_cgroup_populate(struct cgroup_subsys *ss,
 }
 
 /* Handlers for move charge at task migration. */
-static int mem_cgroup_can_move_charge(void)
+static int mem_cgroup_do_precharge(void)
+{
+	int ret = -ENOMEM;
+	struct mem_cgroup *mem = mc.to;
+
+	ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false, NULL);
+	if (ret || !mem)
+		return -ENOMEM;
+
+	mc.precharge++;
+	return ret;
+}
+
+/**
+ * is_target_pte_for_mc - check a pte whether it is valid for move charge
+ * @vma: the vma the pte to be checked belongs
+ * @addr: the address corresponding to the pte to be checked
+ * @ptent: the pte to be checked
+ * @target: the pointer the target page will be stored(can be NULL)
+ *
+ * Returns
+ *   0(MC_TARGET_NONE): if the pte is not a target for move charge.
+ *   1(MC_TARGET_PAGE): if the page corresponding to this pte is a target for
+ *     move charge. if @target is not NULL, the page is stored in target->page
+ *     with extra refcnt got(Callers should handle it).
+ *
+ * Called with pte lock held.
+ */
+/* We add a new member later. */
+union mc_target {
+	struct page	*page;
+};
+
+/* We add a new type later. */
+enum mc_target_type {
+	MC_TARGET_NONE,	/* not used */
+	MC_TARGET_PAGE,
+};
+
+static int is_target_pte_for_mc(struct vm_area_struct *vma,
+		unsigned long addr, pte_t ptent, union mc_target *target)
+{
+	struct page *page;
+	struct page_cgroup *pc;
+	int ret = 0;
+	bool move_anon = test_bit(MOVE_CHARGE_TYPE_ANON,
+					&mc.to->move_charge_at_immigrate);
+
+	if (!pte_present(ptent))
+		return 0;
+
+	page = vm_normal_page(vma, addr, ptent);
+	if (!page || !page_mapped(page))
+		return 0;
+	/*
+	 * TODO: We don't move charges of file(including shmem/tmpfs) pages for
+	 * now.
+	 */
+	if (!move_anon || !PageAnon(page))
+		return 0;
+	/*
+	 * TODO: We don't move charges of shared(used by multiple processes)
+	 * pages for now.
+	 */
+	if (page_mapcount(page) > 1)
+		return 0;
+	if (!get_page_unless_zero(page))
+		return 0;
+
+	pc = lookup_page_cgroup(page);
+	/*
+	 * Do only loose check w/o page_cgroup lock. mem_cgroup_move_account()
+	 * checks the pc is valid or not under the lock.
+	 */
+	if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+		ret = MC_TARGET_PAGE;
+		if (target)
+			target->page = page;
+	}
+
+	if (!ret || !target)
+		put_page(page);
+
+	return ret;
+}
+
+static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
+					unsigned long addr, unsigned long end,
+					struct mm_walk *walk)
 {
+	struct vm_area_struct *vma = walk->private;
+	pte_t *pte;
+	spinlock_t *ptl;
+
+	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	for (; addr != end; pte++, addr += PAGE_SIZE)
+		if (is_target_pte_for_mc(vma, addr, *pte, NULL))
+			mc.precharge++;	/* increment precharge temporarily */
+	pte_unmap_unlock(pte - 1, ptl);
+	cond_resched();
+
 	return 0;
 }
 
+static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
+{
+	unsigned long precharge;
+	struct vm_area_struct *vma;
+
+	down_read(&mm->mmap_sem);
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		struct mm_walk mem_cgroup_count_precharge_walk = {
+			.pmd_entry = mem_cgroup_count_precharge_pte_range,
+			.mm = mm,
+			.private = vma,
+		};
+		if (is_vm_hugetlb_page(vma))
+			continue;
+		/* TODO: We don't move charges of shmem/tmpfs pages for now. */
+		if (vma->vm_flags & VM_SHARED)
+			continue;
+		walk_page_range(vma->vm_start, vma->vm_end,
+					&mem_cgroup_count_precharge_walk);
+	}
+	up_read(&mm->mmap_sem);
+
+	precharge = mc.precharge;
+	mc.precharge = 0;
+
+	return precharge;
+}
+
+#define PRECHARGE_AT_ONCE	256
+static int mem_cgroup_precharge_mc(struct mm_struct *mm)
+{
+	int ret = 0;
+	int count = PRECHARGE_AT_ONCE;
+	unsigned long precharge = mem_cgroup_count_precharge(mm);
+
+	while (!ret && precharge--) {
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+		if (!count--) {
+			count = PRECHARGE_AT_ONCE;
+			cond_resched();
+		}
+		ret = mem_cgroup_do_precharge();
+	}
+
+	return ret;
+}
+
+static void mem_cgroup_clear_mc(void)
+{
+	/* we must uncharge all the leftover precharges from mc.to */
+	while (mc.precharge) {
+		mem_cgroup_cancel_charge(mc.to);
+		mc.precharge--;
+	}
+	mc.from = NULL;
+	mc.to = NULL;
+}
+
 static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 				struct cgroup *cgroup,
 				struct task_struct *p,
@@ -3453,11 +3624,19 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 		mm = get_task_mm(p);
 		if (!mm)
 			return 0;
-
 		/* We move charges only when we move a owner of the mm */
-		if (mm->owner == p)
-			ret = mem_cgroup_can_move_charge();
-
+		if (mm->owner == p) {
+			VM_BUG_ON(mc.from);
+			VM_BUG_ON(mc.to);
+			VM_BUG_ON(mc.precharge);
+			mc = (struct move_charge_struct) {
+				.from = from, .to = mem, .precharge = 0
+			};
+
+			ret = mem_cgroup_precharge_mc(mm);
+			if (ret)
+				mem_cgroup_clear_mc();
+		}
 		mmput(mm);
 	}
 	return ret;
@@ -3468,10 +3647,95 @@ static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
 				struct task_struct *p,
 				bool threadgroup)
 {
+	mem_cgroup_clear_mc();
 }
 
-static void mem_cgroup_move_charge(void)
+static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
+				unsigned long addr, unsigned long end,
+				struct mm_walk *walk)
 {
+	int ret = 0;
+	struct vm_area_struct *vma = walk->private;
+	pte_t *pte;
+	spinlock_t *ptl;
+
+retry:
+	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	for (; addr != end; addr += PAGE_SIZE) {
+		pte_t ptent = *(pte++);
+		union mc_target target;
+		int type;
+		struct page *page;
+		struct page_cgroup *pc;
+
+		if (!mc.precharge)
+			break;
+
+		type = is_target_pte_for_mc(vma, addr, ptent, &target);
+		switch (type) {
+		case MC_TARGET_PAGE:
+			page = target.page;
+			if (isolate_lru_page(page))
+				goto put;
+			pc = lookup_page_cgroup(page);
+			if (!mem_cgroup_move_account(pc, mc.from, mc.to)) {
+				css_put(&mc.to->css);
+				mc.precharge--;
+			}
+			putback_lru_page(page);
+put:			/* is_target_pte_for_mc() gets the page */
+			put_page(page);
+			break;
+		default:
+			break;
+		}
+	}
+	pte_unmap_unlock(pte - 1, ptl);
+	cond_resched();
+
+	if (addr != end) {
+		/*
+		 * We have consumed all precharges we got in can_attach().
+		 * We try charge one by one, but don't do any additional
+		 * charges to mc.to if we have failed in charge once in attach()
+		 * phase.
+		 */
+		ret = mem_cgroup_do_precharge();
+		if (!ret)
+			goto retry;
+	}
+
+	return ret;
+}
+
+static void mem_cgroup_move_charge(struct mm_struct *mm)
+{
+	struct vm_area_struct *vma;
+
+	lru_add_drain_all();
+	down_read(&mm->mmap_sem);
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		int ret;
+		struct mm_walk mem_cgroup_move_charge_walk = {
+			.pmd_entry = mem_cgroup_move_charge_pte_range,
+			.mm = mm,
+			.private = vma,
+		};
+		if (is_vm_hugetlb_page(vma))
+			continue;
+		/* TODO: We don't move charges of shmem/tmpfs pages for now. */
+		if (vma->vm_flags & VM_SHARED)
+			continue;
+		ret = walk_page_range(vma->vm_start, vma->vm_end,
+						&mem_cgroup_move_charge_walk);
+		if (ret)
+			/*
+			 * means we have consumed all precharges and failed in
+			 * doing additional charge. Just abandon here.
+			 */
+			break;
+	}
+	up_read(&mm->mmap_sem);
 }
 
 static void mem_cgroup_move_task(struct cgroup_subsys *ss,
@@ -3480,7 +3744,18 @@ static void mem_cgroup_move_task(struct cgroup_subsys *ss,
 				struct task_struct *p,
 				bool threadgroup)
 {
-	mem_cgroup_move_charge();
+	struct mm_struct *mm;
+
+	if (!mc.to)
+		/* no need to move charge */
+		return;
+
+	mm = get_task_mm(p);
+	if (mm) {
+		mem_cgroup_move_charge(mm);
+		mmput(mm);
+	}
+	mem_cgroup_clear_mc();
 }
 
 struct cgroup_subsys mem_cgroup_subsys = {
-- 
1.5.6.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2010-03-10  2:59 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-21  5:31 [PATCH -mmotm 0/8] memcg: move charge at task migration (21/Dec) Daisuke Nishimura
2009-12-21  5:32 ` [PATCH -mmotm 1/8] cgroup: introduce cancel_attach() Daisuke Nishimura
2009-12-21  5:32 ` [PATCH -mmotm 2/8] cgroup: introduce coalesce css_get() and css_put() Daisuke Nishimura
2009-12-21  5:33 ` [PATCH -mmotm 3/8] memcg: add interface to move charge at task migration Daisuke Nishimura
2009-12-21  7:00   ` KAMEZAWA Hiroyuki
2009-12-21  5:35 ` [PATCH -mmotm 4/8] memcg: move charges of anonymous page Daisuke Nishimura
2009-12-21  7:01   ` KAMEZAWA Hiroyuki
2009-12-23  0:26   ` Andrew Morton
2009-12-21  5:36 ` [PATCH -mmotm 5/8] memcg: improve performance in moving charge Daisuke Nishimura
2009-12-21  7:02   ` KAMEZAWA Hiroyuki
2009-12-21  5:37 ` [PATCH -mmotm 6/8] memcg: avoid oom during " Daisuke Nishimura
2009-12-21  7:03   ` KAMEZAWA Hiroyuki
2009-12-21  5:38 ` [PATCH -mmotm 7/8] memcg: move charges of anonymous swap Daisuke Nishimura
2009-12-21  7:04   ` KAMEZAWA Hiroyuki
2010-02-04  3:31   ` Andrew Morton
2010-02-04  5:09     ` Daisuke Nishimura
2010-02-04  5:27       ` KAMEZAWA Hiroyuki
2010-02-04  7:18         ` Paul Mundt
2010-02-04  7:44           ` KAMEZAWA Hiroyuki
2010-02-04 15:32             ` Balbir Singh
2010-02-05  0:38             ` Daisuke Nishimura
2010-02-05  0:54               ` KAMEZAWA Hiroyuki
2010-02-05  1:16               ` Paul Mundt
2010-03-09 23:13                 ` Andrew Morton
2010-03-10  2:50                   ` Daisuke Nishimura
2009-12-21  5:40 ` [PATCH -mmotm 8/8] memcg: improve performance in moving swap charge Daisuke Nishimura
2009-12-21  7:05   ` KAMEZAWA Hiroyuki
  -- strict thread matches above, loose matches on Subject: below --
2009-12-14  6:17 [PATCH -mmotm 0/8] memcg: move charge at task migration (14/Dec) Daisuke Nishimura
2009-12-14  6:21 ` [PATCH -mmotm 4/8] memcg: move charges of anonymous page Daisuke Nishimura

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).