* [PATCH -V3 0/8] memcg: Add memcg extension to control HugeTLB allocation
@ 2012-03-13  7:07 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-13  7:07 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups

Hi,

This patchset implements a memory controller extension to control
HugeTLB allocations. The extension allows limiting HugeTLB usage per
control group and enforces the limit during page fault. Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time
implies that the application will get a SIGBUS signal if it tries to
access HugeTLB pages beyond its limit. This requires the application
to know beforehand how many HugeTLB pages it will need.
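
For illustration only (not part of the patchset), below is a minimal
user-space sketch of the scenario described above. It assumes a 2MB huge
page size and that the task's cgroup has a HugeTLB limit (set through the
hugetlb.<size>.limit_in_bytes file added later in this series) smaller
than LENGTH; in that case a fault past the limit is expected to deliver
SIGBUS, because the limit is enforced at fault time and HugeTLB pages are
not reclaimed.

  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>

  #ifndef MAP_HUGETLB
  #define MAP_HUGETLB 0x40000	/* x86 value, for older libc headers */
  #endif

  #define LENGTH	(4UL * 2 * 1024 * 1024)	/* four 2MB huge pages (assumed size) */

  int main(void)
  {
  	char *addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
  			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

  	if (addr == MAP_FAILED) {
  		perror("mmap");
  		return 1;
  	}
  	/*
  	 * Each first touch of a huge page takes a hugetlb fault, which is
  	 * where the new per-cgroup limit is checked.
  	 */
  	memset(addr, 0, LENGTH);
  	munmap(addr, LENGTH);
  	return 0;
  }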

Changes from V2:
* Changed the implementation to limit HugeTLB usage at page fault
  time. This simplifies the extension and keeps it closer to the memcg
  design. It also allows supporting cgroup removal with less
  complexity. The only caveat is that the application must ensure its
  HugeTLB usage doesn't cross the cgroup limit.

Changes from V1:
* Reworked the implementation as a memcg extension. We still use
  the same logic to track the cgroup and range.

Changes from RFC post:
* Added support for HugeTLB cgroup hierarchy
* Added support for task migration
* Added documentation patch
* Other bug fixes

-aneesh



^ permalink raw reply	[flat|nested] 69+ messages in thread


* [PATCH -V3 1/8] hugetlb: rename max_hstate to hugetlb_max_hstate
  2012-03-13  7:07 ` Aneesh Kumar K.V
@ 2012-03-13  7:07   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-13  7:07 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We will be using this from other subsystems like memcg
in later patches.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5f34bd8..d623e71 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int max_hstate;
+static int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
@@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
 static unsigned long __initdata default_hstate_size;
 
 #define for_each_hstate(h) \
-	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
+	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
 /*
  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
@@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order)
 		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
 		return;
 	}
-	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
+	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
 	BUG_ON(order == 0);
-	h = &hstates[max_hstate++];
+	h = &hstates[hugetlb_max_hstate++];
 	h->order = order;
 	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
 	h->nr_huge_pages = 0;
@@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s)
 	static unsigned long *last_mhp;
 
 	/*
-	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
+	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
 	 * so this hugepages= parameter goes to the "default hstate".
 	 */
-	if (!max_hstate)
+	if (!hugetlb_max_hstate)
 		mhp = &default_hstate_max_huge_pages;
 	else
 		mhp = &parsed_hstate->max_huge_pages;
@@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s)
 	 * But we need to allocate >= MAX_ORDER hstates here early to still
 	 * use the bootmem allocator.
 	 */
-	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
+	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
 		hugetlb_hstate_alloc_pages(parsed_hstate);
 
 	last_mhp = mhp;
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 69+ messages in thread


* [PATCH -V3 2/8] memcg: Add HugeTLB extension
  2012-03-13  7:07 ` Aneesh Kumar K.V
@ 2012-03-13  7:07   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-13  7:07 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch implements a memcg extension that allows us to control
HugeTLB allocations via the memory controller.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h    |    1 +
 include/linux/memcontrol.h |   42 +++++++++++++
 init/Kconfig               |    8 +++
 mm/hugetlb.c               |    2 +-
 mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 190 insertions(+), 1 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d9d6c86..5ed0ad7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -243,6 +243,7 @@ struct hstate *size_to_hstate(unsigned long size);
 #define HUGE_MAX_HSTATE 1
 #endif
 
+extern int hugetlb_max_hstate;
 extern struct hstate hstates[HUGE_MAX_HSTATE];
 extern unsigned int default_hstate_idx;
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4d34356..320dbad 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -429,5 +429,47 @@ static inline void sock_release_memcg(struct sock *sk)
 {
 }
 #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
+
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+extern int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
+					  struct mem_cgroup **ptr);
+extern void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
+					     struct mem_cgroup *memcg,
+					     struct page *page);
+extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
+					     struct page *page);
+extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
+					      struct mem_cgroup *memcg);
+
+#else
+static inline int
+mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
+						 struct mem_cgroup **ptr)
+{
+	return 0;
+}
+
+static inline void
+mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
+				 struct mem_cgroup *memcg,
+				 struct page *page)
+{
+	return;
+}
+
+static inline void
+mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
+				 struct page *page)
+{
+	return;
+}
+
+static inline void
+mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
+				  struct mem_cgroup *memcg)
+{
+	return;
+}
+#endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif /* _LINUX_MEMCONTROL_H */
 
diff --git a/init/Kconfig b/init/Kconfig
index 3f42cd6..f0eb8aa 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -725,6 +725,14 @@ config CGROUP_PERF
 
 	  Say N if unsure.
 
+config MEM_RES_CTLR_HUGETLB
+	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
+	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
+	default n
+	help
+	  Add HugeTLB management to memory resource controller. When you
+	  enable this, you can put a per cgroup limit on HugeTLB usage.
+
 menuconfig CGROUP_SCHED
 	bool "Group CPU scheduler"
 	default n
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d623e71..fe7aefd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int hugetlb_max_hstate;
+int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6728a7a..8cac77b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -235,6 +235,10 @@ struct mem_cgroup {
 	 */
 	struct res_counter memsw;
 	/*
+	 * the counter to account for hugepages from hugetlb.
+	 */
+	struct res_counter hugepage[HUGE_MAX_HSTATE];
+	/*
 	 * Per cgroup active and inactive list, similar to the
 	 * per zone LRU lists.
 	 */
@@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
 }
 #endif
 
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+static int mem_cgroup_hugetlb_usage(struct mem_cgroup *memcg)
+{
+	int idx;
+	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
+		if (memcg->hugepage[idx].usage > 0)
+			return memcg->hugepage[idx].usage;
+	}
+	return 0;
+}
+
+int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
+				   struct mem_cgroup **ptr)
+{
+	int ret = 0;
+	struct mem_cgroup *memcg;
+	struct res_counter *fail_res;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (mem_cgroup_disabled())
+		return 0;
+again:
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	if (!memcg)
+		memcg = root_mem_cgroup;
+	if (mem_cgroup_is_root(memcg)) {
+		rcu_read_unlock();
+		goto done;
+	}
+	if (!css_tryget(&memcg->css)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+
+	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
+	css_put(&memcg->css);
+done:
+	*ptr = memcg;
+	return ret;
+}
+
+void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
+				      struct mem_cgroup *memcg,
+				      struct page *page)
+{
+	struct page_cgroup *pc;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	pc = lookup_page_cgroup(page);
+	lock_page_cgroup(pc);
+	if (unlikely(PageCgroupUsed(pc))) {
+		unlock_page_cgroup(pc);
+		mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg);
+		return;
+	}
+	pc->mem_cgroup = memcg;
+	/*
+	 * We access a page_cgroup asynchronously without lock_page_cgroup().
+	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
+	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
+	 * before USED bit, we need memory barrier here.
+	 * See mem_cgroup_add_lru_list(), etc.
+	 */
+	smp_wmb();
+	SetPageCgroupUsed(pc);
+
+	unlock_page_cgroup(pc);
+	return;
+}
+
+void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
+				      struct page *page)
+{
+	struct page_cgroup *pc;
+	struct mem_cgroup *memcg;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	pc = lookup_page_cgroup(page);
+	if (unlikely(!PageCgroupUsed(pc)))
+		return;
+
+	lock_page_cgroup(pc);
+	if (!PageCgroupUsed(pc)) {
+		unlock_page_cgroup(pc);
+		return;
+	}
+	memcg = pc->mem_cgroup;
+	pc->mem_cgroup = root_mem_cgroup;
+	ClearPageCgroupUsed(pc);
+	unlock_page_cgroup(pc);
+
+	if (!mem_cgroup_is_root(memcg))
+		res_counter_uncharge(&memcg->hugepage[idx], csize);
+	return;
+}
+
+void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
+				       struct mem_cgroup *memcg)
+{
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	if (!mem_cgroup_is_root(memcg))
+		res_counter_uncharge(&memcg->hugepage[idx], csize);
+	return;
+}
+#else
+static int mem_cgroup_hugetlb_usage(struct mem_cgroup *memcg)
+{
+	return 0;
+}
+#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
+
 /*
  * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
  * page belongs to.
@@ -4887,6 +5013,7 @@ err_cleanup:
 static struct cgroup_subsys_state * __ref
 mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 {
+	int idx;
 	struct mem_cgroup *memcg, *parent;
 	long error = -ENOMEM;
 	int node;
@@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 		 * mem_cgroup(see mem_cgroup_put).
 		 */
 		mem_cgroup_get(parent);
+		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+			res_counter_init(&memcg->hugepage[idx],
+					 &parent->hugepage[idx]);
 	} else {
 		res_counter_init(&memcg->res, NULL);
 		res_counter_init(&memcg->memsw, NULL);
+		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+			res_counter_init(&memcg->hugepage[idx], NULL);
 	}
 	memcg->last_scanned_node = MAX_NUMNODES;
 	INIT_LIST_HEAD(&memcg->oom_notify);
@@ -4951,6 +5083,12 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
 					struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
+	/*
+	 * Don't allow memcg removal if we have HugeTLB resource
+	 * usage.
+	 */
+	if (mem_cgroup_hugetlb_usage(memcg) > 0)
+		return -EBUSY;
 
 	return mem_cgroup_force_empty(memcg, false);
 }
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 69+ messages in thread


* [PATCH -V3 3/8] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-13  7:07 ` Aneesh Kumar K.V
@ 2012-03-13  7:07   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-13  7:07 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This adds the necessary charge/uncharge calls in the HugeTLB code.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c    |   21 ++++++++++++++++++++-
 mm/memcontrol.c |    5 +++++
 2 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fe7aefd..b7152d1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -21,6 +21,8 @@
 #include <linux/rmap.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/memcontrol.h>
+#include <linux/page_cgroup.h>
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -542,6 +544,9 @@ static void free_huge_page(struct page *page)
 	BUG_ON(page_mapcount(page));
 	INIT_LIST_HEAD(&page->lru);
 
+	if (mapping)
+		mem_cgroup_hugetlb_uncharge_page(h - hstates,
+						 pages_per_huge_page(h), page);
 	spin_lock(&hugetlb_lock);
 	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
 		update_and_free_page(h, page);
@@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h,
 static struct page *alloc_huge_page(struct vm_area_struct *vma,
 				    unsigned long addr, int avoid_reserve)
 {
+	int ret, idx;
 	struct hstate *h = hstate_vma(vma);
 	struct page *page;
+	struct mem_cgroup *memcg = NULL;
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	struct inode *inode = mapping->host;
 	long chg;
 
+	idx = h - hstates;
 	/*
 	 * Processes that did not create the mapping will have no reserves and
 	 * will not have accounted against quota. Check that the quota can be
@@ -1039,6 +1047,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		if (hugetlb_get_quota(inode->i_mapping, chg))
 			return ERR_PTR(-VM_FAULT_SIGBUS);
 
+	ret = mem_cgroup_hugetlb_charge_page(idx, pages_per_huge_page(h),
+					     &memcg);
+	if (ret) {
+		hugetlb_put_quota(inode->i_mapping, chg);
+		return ERR_PTR(-VM_FAULT_SIGBUS);
+	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
 	spin_unlock(&hugetlb_lock);
@@ -1046,6 +1060,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	if (!page) {
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
+			mem_cgroup_hugetlb_uncharge_memcg(idx,
+							 pages_per_huge_page(h),
+							 memcg);
 			hugetlb_put_quota(inode->i_mapping, chg);
 			return ERR_PTR(-VM_FAULT_SIGBUS);
 		}
@@ -1054,7 +1071,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	set_page_private(page, (unsigned long) mapping);
 
 	vma_commit_reservation(h, vma, addr);
-
+	/* update page cgroup details */
+	mem_cgroup_hugetlb_commit_charge(idx, pages_per_huge_page(h),
+					 memcg, page);
 	return page;
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8cac77b..f4aa11c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2901,6 +2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 
 	if (PageSwapCache(page))
 		return NULL;
+	/*
+	 * HugeTLB page uncharge happen in the HugeTLB compound page destructor
+	 */
+	if (PageHuge(page))
+		return NULL;
 
 	if (PageTransHuge(page)) {
 		nr_pages <<= compound_order(page);
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 69+ messages in thread


* [PATCH -V3 4/8] memcg: track resource index in cftype private
  2012-03-13  7:07 ` Aneesh Kumar K.V
@ 2012-03-13  7:07   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-13  7:07 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This helps in using the same memcg callbacks for non-reclaim resource
control files.
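
As an aside (not part of the diff below), here is a standalone sketch of
the cftype->private encoding introduced by this patch. The MEMFILE macros
are copied from the patch so the example compiles on its own; the
RES_LIMIT value is assumed for illustration and is not taken from the
kernel headers.

  #include <stdio.h>

  #define _MEMHUGETLB			(3)
  #define __MEMFILE_PRIVATE(idx, x, val)	(((idx) << 24) | ((x) << 16) | (val))
  #define MEMFILE_TYPE(val)		(((val) >> 16) & 0xff)
  #define MEMFILE_IDX(val)		(((val) >> 24) & 0xff)
  #define MEMFILE_ATTR(val)		((val) & 0xffff)

  #define RES_LIMIT	2	/* assumed value, for illustration only */

  int main(void)
  {
  	/* hstate index 1 (e.g. the 1GB hstate), hugetlb type, limit attribute */
  	int priv = __MEMFILE_PRIVATE(1, _MEMHUGETLB, RES_LIMIT);

  	/* decoding recovers each field independently */
  	printf("idx=%d type=%d attr=%d\n",
  	       MEMFILE_IDX(priv), MEMFILE_TYPE(priv), MEMFILE_ATTR(priv));
  	return 0;
  }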

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/memcontrol.c |   27 +++++++++++++++++++++------
 1 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f4aa11c..7ac8489 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -358,9 +358,14 @@ enum charge_type {
 #define _MEM			(0)
 #define _MEMSWAP		(1)
 #define _OOM_TYPE		(2)
-#define MEMFILE_PRIVATE(x, val)	(((x) << 16) | (val))
-#define MEMFILE_TYPE(val)	(((val) >> 16) & 0xffff)
-#define MEMFILE_ATTR(val)	((val) & 0xffff)
+#define _MEMHUGETLB		(3)
+
+/*  0 ... val ...16.... x...24...idx...32*/
+#define __MEMFILE_PRIVATE(idx, x, val)	(((idx) << 24) | ((x) << 16) | (val))
+#define MEMFILE_PRIVATE(x, val)		__MEMFILE_PRIVATE(0, x, val)
+#define MEMFILE_TYPE(val)		(((val) >> 16) & 0xff)
+#define MEMFILE_IDX(val)		(((val) >> 24) & 0xff)
+#define MEMFILE_ATTR(val)		((val) & 0xffff)
 /* Used for OOM nofiier */
 #define OOM_CONTROL		(0)
 
@@ -3954,7 +3959,7 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 	u64 val;
-	int type, name;
+	int type, name, idx;
 
 	type = MEMFILE_TYPE(cft->private);
 	name = MEMFILE_ATTR(cft->private);
@@ -3971,6 +3976,10 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
 		else
 			val = res_counter_read_u64(&memcg->memsw, name);
 		break;
+	case _MEMHUGETLB:
+		idx = MEMFILE_IDX(cft->private);
+		val = res_counter_read_u64(&memcg->hugepage[idx], name);
+		break;
 	default:
 		BUG();
 		break;
@@ -4003,7 +4012,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 			break;
 		if (type == _MEM)
 			ret = mem_cgroup_resize_limit(memcg, val);
-		else
+		else if (type == _MEMHUGETLB) {
+			int idx = MEMFILE_IDX(cft->private);
+			ret = res_counter_set_limit(&memcg->hugepage[idx], val);
+		} else
 			ret = mem_cgroup_resize_memsw_limit(memcg, val);
 		break;
 	case RES_SOFT_LIMIT:
@@ -4067,7 +4079,10 @@ static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
 	case RES_MAX_USAGE:
 		if (type == _MEM)
 			res_counter_reset_max(&memcg->res);
-		else
+		else if (type == _MEMHUGETLB) {
+			int idx = MEMFILE_IDX(event);
+			res_counter_reset_max(&memcg->hugepage[idx]);
+		} else
 			res_counter_reset_max(&memcg->memsw);
 		break;
 	case RES_FAILCNT:
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 69+ messages in thread


* [PATCH -V3 5/8] hugetlbfs: Add memcg control files for hugetlbfs
  2012-03-13  7:07 ` Aneesh Kumar K.V
@ 2012-03-13  7:07   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-13  7:07 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This adds control files for hugetlbfs in memcg.
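
For reference (not part of the diff below), a standalone sketch of the
control file names this ends up creating; mem_fmt() is copied from the
patch, and the 2MB and 1GB hstate sizes are just the common examples.

  #include <stdio.h>

  static char *mem_fmt(char *buf, unsigned long n)
  {
  	if (n >= (1UL << 30))
  		sprintf(buf, "%luGB", n >> 30);
  	else if (n >= (1UL << 20))
  		sprintf(buf, "%luMB", n >> 20);
  	else
  		sprintf(buf, "%luKB", n >> 10);
  	return buf;
  }

  int main(void)
  {
  	unsigned long sizes[] = { 2UL << 20, 1UL << 30 };	/* 2MB, 1GB */
  	char buf[32];
  	int i;

  	for (i = 0; i < 2; i++) {
  		mem_fmt(buf, sizes[i]);
  		/* prints e.g. hugetlb.2MB.limit_in_bytes */
  		printf("hugetlb.%s.limit_in_bytes\n", buf);
  		printf("hugetlb.%s.usage_in_bytes\n", buf);
  		printf("hugetlb.%s.max_usage_in_bytes\n", buf);
  	}
  	return 0;
  }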

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |   15 +++++++++++++++
 mm/hugetlb.c            |   32 +++++++++++++++++++++++++++++++-
 mm/memcontrol.c         |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 93 insertions(+), 1 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 5ed0ad7..8c1e855 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -4,6 +4,7 @@
 #include <linux/mm_types.h>
 #include <linux/fs.h>
 #include <linux/hugetlb_inline.h>
+#include <linux/cgroup.h>
 
 struct ctl_table;
 struct user_struct;
@@ -220,6 +221,10 @@ struct hstate {
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+	/* cgroup control files */
+	struct cftype cgroup_limit_file;
+	struct cftype cgroup_usage_file;
+	struct cftype cgroup_max_usage_file;
 	char name[HSTATE_NAME_LEN];
 };
 
@@ -332,4 +337,14 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 #define hstate_index_to_shift(index) 0
 #endif
 
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+extern int register_hugetlb_memcg_files(struct cgroup *cgroup,
+					struct cgroup_subsys *ss);
+#else
+static inline int register_hugetlb_memcg_files(struct cgroup *cgroup,
+					       struct cgroup_subsys *ss)
+{
+	return 0;
+}
+#endif
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b7152d1..30f66f1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1817,6 +1817,36 @@ static int __init hugetlb_init(void)
 }
 module_init(hugetlb_init);
 
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+int register_hugetlb_memcg_files(struct cgroup *cgroup,
+				 struct cgroup_subsys *ss)
+{
+	int ret = 0;
+	struct hstate *h;
+
+	for_each_hstate(h) {
+		ret = cgroup_add_file(cgroup, ss, &h->cgroup_limit_file);
+		if (ret)
+			return ret;
+		ret = cgroup_add_file(cgroup, ss, &h->cgroup_usage_file);
+		if (ret)
+			return ret;
+		ret = cgroup_add_file(cgroup, ss, &h->cgroup_max_usage_file);
+		if (ret)
+			return ret;
+
+	}
+	return ret;
+}
+/* mm/memcontrol.c because mem_cgroup_read/write is not availabel outside */
+int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx);
+#else
+static int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx)
+{
+	return 0;
+}
+#endif
+
 /* Should be called on processing a hugepagesz=... option */
 void __init hugetlb_add_hstate(unsigned order)
 {
@@ -1840,7 +1870,7 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 					huge_page_size(h)/1024);
-
+	mem_cgroup_hugetlb_file_init(h, hugetlb_max_hstate - 1);
 	parsed_hstate = h;
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7ac8489..405e17d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5123,6 +5123,50 @@ static void mem_cgroup_destroy(struct cgroup_subsys *ss,
 	mem_cgroup_put(memcg);
 }
 
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+static char *mem_fmt(char *buf, unsigned long n)
+{
+	if (n >= (1UL << 30))
+		sprintf(buf, "%luGB", n >> 30);
+	else if (n >= (1UL << 20))
+		sprintf(buf, "%luMB", n >> 20);
+	else
+		sprintf(buf, "%luKB", n >> 10);
+	return buf;
+}
+
+int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx)
+{
+	char buf[32];
+	struct cftype *cft;
+
+	/* format the size */
+	mem_fmt(buf, huge_page_size(h));
+
+	/* Add the limit file */
+	cft = &h->cgroup_limit_file;
+	snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.limit_in_bytes", buf);
+	cft->private = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_LIMIT);
+	cft->read_u64 = mem_cgroup_read;
+	cft->write_string = mem_cgroup_write;
+
+	/* Add the usage file */
+	cft = &h->cgroup_usage_file;
+	snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.usage_in_bytes", buf);
+	cft->private  = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_USAGE);
+	cft->read_u64 = mem_cgroup_read;
+
+	/* Add the MAX usage file */
+	cft = &h->cgroup_max_usage_file;
+	snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.max_usage_in_bytes", buf);
+	cft->private  = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_MAX_USAGE);
+	cft->trigger  = mem_cgroup_reset;
+	cft->read_u64 = mem_cgroup_read;
+
+	return 0;
+}
+#endif
+
 static int mem_cgroup_populate(struct cgroup_subsys *ss,
 				struct cgroup *cont)
 {
@@ -5137,6 +5181,9 @@ static int mem_cgroup_populate(struct cgroup_subsys *ss,
 	if (!ret)
 		ret = register_kmem_files(cont, ss);
 
+	if (!ret)
+		ret = register_hugetlb_memcg_files(cont, ss);
+
 	return ret;
 }
 
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 69+ messages in thread


* [PATCH -V3 6/8] hugetlbfs: Add a list for tracking in-use HugeTLB pages
  2012-03-13  7:07 ` Aneesh Kumar K.V
@ 2012-03-13  7:07   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-13  7:07 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

hugepage_activelist will be used to track HugeTLB pages that are
currently in use. We need to find the in-use HugeTLB pages to support
memcg removal: on memcg removal we update each page's memory cgroup to
point to the parent cgroup.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |    1 +
 mm/hugetlb.c            |   23 ++++++++++++++++++-----
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8c1e855..e62eee7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -217,6 +217,7 @@ struct hstate {
 	unsigned long resv_huge_pages;
 	unsigned long surplus_huge_pages;
 	unsigned long nr_overcommit_huge_pages;
+	struct list_head hugepage_activelist;
 	struct list_head hugepage_freelists[MAX_NUMNODES];
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 30f66f1..e038fdc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -433,7 +433,7 @@ void copy_huge_page(struct page *dst, struct page *src)
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
-	list_add(&page->lru, &h->hugepage_freelists[nid]);
+	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
 }
@@ -445,7 +445,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
 	if (list_empty(&h->hugepage_freelists[nid]))
 		return NULL;
 	page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
-	list_del(&page->lru);
+	list_move(&page->lru, &h->hugepage_activelist);
 	set_page_refcounted(page);
 	h->free_huge_pages--;
 	h->free_huge_pages_node[nid]--;
@@ -542,13 +542,14 @@ static void free_huge_page(struct page *page)
 	page->mapping = NULL;
 	BUG_ON(page_count(page));
 	BUG_ON(page_mapcount(page));
-	INIT_LIST_HEAD(&page->lru);
 
 	if (mapping)
 		mem_cgroup_hugetlb_uncharge_page(h - hstates,
 						 pages_per_huge_page(h), page);
 	spin_lock(&hugetlb_lock);
 	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
+		/* remove the page from active list */
+		list_del(&page->lru);
 		update_and_free_page(h, page);
 		h->surplus_huge_pages--;
 		h->surplus_huge_pages_node[nid]--;
@@ -562,6 +563,7 @@ static void free_huge_page(struct page *page)
 
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
+	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, free_huge_page);
 	spin_lock(&hugetlb_lock);
 	h->nr_huge_pages++;
@@ -1866,6 +1868,7 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->free_huge_pages = 0;
 	for (i = 0; i < MAX_NUMNODES; ++i)
 		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
+	INIT_LIST_HEAD(&h->hugepage_activelist);
 	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
@@ -2324,14 +2327,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 		page = pte_page(pte);
 		if (pte_dirty(pte))
 			set_page_dirty(page);
-		list_add(&page->lru, &page_list);
+
+		spin_lock(&hugetlb_lock);
+		list_move(&page->lru, &page_list);
+		spin_unlock(&hugetlb_lock);
 	}
 	spin_unlock(&mm->page_table_lock);
 	flush_tlb_range(vma, start, end);
 	mmu_notifier_invalidate_range_end(mm, start, end);
 	list_for_each_entry_safe(page, tmp, &page_list, lru) {
 		page_remove_rmap(page);
-		list_del(&page->lru);
+		/*
+		 * We need to move it back huge page active list. If we are
+		 * holding the last reference, below put_page will move it
+		 * back to free list.
+		 */
+		spin_lock(&hugetlb_lock);
+		list_move(&page->lru, &h->hugepage_activelist);
+		spin_unlock(&hugetlb_lock);
 		put_page(page);
 	}
 }
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 69+ messages in thread


* [PATCH -V3 7/8] memcg: move HugeTLB resource count to parent cgroup on memcg removal
  2012-03-13  7:07 ` Aneesh Kumar K.V
@ 2012-03-13  7:07   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-13  7:07 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This adds support for removing a memcg that still has HugeTLB resource
usage; the outstanding charges are moved to the parent cgroup.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h    |    6 ++++
 include/linux/memcontrol.h |   15 ++++++++++++
 mm/hugetlb.c               |   32 +++++++++++++++++++++++++
 mm/memcontrol.c            |   55 ++++++++++++++++++++++++++++++++++++--------
 4 files changed, 98 insertions(+), 10 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e62eee7..f98b29e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -341,11 +341,17 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 #ifdef CONFIG_MEM_RES_CTLR_HUGETLB
 extern int register_hugetlb_memcg_files(struct cgroup *cgroup,
 					struct cgroup_subsys *ss);
+extern int hugetlb_force_memcg_empty(struct cgroup *cgroup);
 #else
 static inline int register_hugetlb_memcg_files(struct cgroup *cgroup,
 					       struct cgroup_subsys *ss)
 {
 	return 0;
 }
+
+static inline int hugetlb_force_memcg_empty(struct cgroup *cgroup)
+{
+	return 0;
+}
 #endif
 #endif /* _LINUX_HUGETLB_H */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 320dbad..cf837de 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -440,6 +440,9 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
 					     struct page *page);
 extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
 					      struct mem_cgroup *memcg);
+extern int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
+					  struct page *page);
+extern int mem_cgroup_hugetlb_usage(struct cgroup *cgroup);
 
 #else
 static inline int
@@ -470,6 +473,18 @@ mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
 {
 	return;
 }
+static inline int
+mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
+			       struct page *page)
+{
+	return 0;
+}
+
+static inline int
+mem_cgroup_hugetlb_usage(struct cgroup *cgroup)
+{
+	return 0;
+}
 #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif /* _LINUX_MEMCONTROL_H */
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e038fdc..c7a1046 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1840,6 +1840,38 @@ int register_hugetlb_memcg_files(struct cgroup *cgroup,
 	}
 	return ret;
 }
+
+int hugetlb_force_memcg_empty(struct cgroup *cgroup)
+{
+	struct hstate *h;
+	struct page *page;
+	int ret = 0, idx = 0;
+
+	do {
+		if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children))
+			goto out;
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			goto out;
+		}
+		for_each_hstate(h) {
+			spin_lock(&hugetlb_lock);
+			list_for_each_entry(page, &h->hugepage_activelist, lru) {
+				ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page);
+				if (ret) {
+					spin_unlock(&hugetlb_lock);
+					goto out;
+				}
+			}
+			spin_unlock(&hugetlb_lock);
+			idx++;
+		}
+		cond_resched();
+	} while (mem_cgroup_hugetlb_usage(cgroup) > 0);
+out:
+	return ret;
+}
+
 /* mm/memcontrol.c because mem_cgroup_read/write is not availabel outside */
 int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx);
 #else
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 405e17d..b77e0bf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3171,9 +3171,11 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
 #endif
 
 #ifdef CONFIG_MEM_RES_CTLR_HUGETLB
-static int mem_cgroup_hugetlb_usage(struct mem_cgroup *memcg)
+int mem_cgroup_hugetlb_usage(struct cgroup *cgroup)
 {
 	int idx;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
+
 	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
 		if (memcg->hugepage[idx].usage > 0)
 			return memcg->hugepage[idx].usage;
@@ -3285,10 +3287,44 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
 		res_counter_uncharge(&memcg->hugepage[idx], csize);
 	return;
 }
-#else
-static int mem_cgroup_hugetlb_usage(struct mem_cgroup *memcg)
+
+int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
+				   struct page *page)
 {
-	return 0;
+	struct page_cgroup *pc;
+	int csize,  ret = 0;
+	struct res_counter *fail_res;
+	struct cgroup *pcgrp = cgroup->parent;
+	struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp);
+	struct mem_cgroup *memcg  = mem_cgroup_from_cont(cgroup);
+
+	if (!get_page_unless_zero(page))
+		goto out;
+
+	pc = lookup_page_cgroup(page);
+	lock_page_cgroup(pc);
+	if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg)
+		goto err_out;
+
+	csize = PAGE_SIZE << compound_order(page);
+	/*
+	 * uncharge from child and charge parent
+	 */
+	if (!mem_cgroup_is_root(parent)) {
+		ret = res_counter_charge(&parent->hugepage[idx], csize, &fail_res);
+		if (ret)
+			goto err_out;
+	}
+	res_counter_uncharge(&memcg->hugepage[idx], csize);
+	/*
+	 * caller should have done css_get
+	 */
+	pc->mem_cgroup = parent;
+err_out:
+	unlock_page_cgroup(pc);
+	put_page(page);
+out:
+	return ret;
 }
 #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
 
@@ -3806,6 +3842,11 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all)
 	/* should free all ? */
 	if (free_all)
 		goto try_to_free;
+
+	/* move the hugetlb charges */
+	ret = hugetlb_force_memcg_empty(cgrp);
+	if (ret)
+		goto out;
 move_account:
 	do {
 		ret = -EBUSY;
@@ -5103,12 +5144,6 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
 					struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
-	/*
-	 * Don't allow memcg removal if we have HugeTLB resource
-	 * usage.
-	 */
-	if (mem_cgroup_hugetlb_usage(memcg) > 0)
-		return -EBUSY;
 
 	return mem_cgroup_force_empty(memcg, false);
 }
-- 
1.7.9
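
The reparenting done by mem_cgroup_move_hugetlb_parent() in the patch above
boils down to three steps: charge the parent's counter, uncharge the child's,
and repoint the page's owner. A toy userspace sketch of just that arithmetic
follows; the types are mocks, there is no locking or page refcounting, and
unlike the real code it does not skip the charge when the parent is the root
memcg:

#include <stdio.h>

struct mock_cgroup {
	const char *name;
	struct mock_cgroup *parent;
	long long usage;	/* stands in for res_counter.usage */
};

struct mock_page_cgroup { struct mock_cgroup *owner; };

static int move_to_parent(struct mock_page_cgroup *pc, long long csize)
{
	struct mock_cgroup *child = pc->owner;
	struct mock_cgroup *parent = child->parent;

	if (!parent)
		return -1;
	parent->usage += csize;	/* res_counter_charge() on the parent  */
	child->usage -= csize;	/* res_counter_uncharge() on the child */
	pc->owner = parent;	/* pc->mem_cgroup = parent             */
	return 0;
}

int main(void)
{
	struct mock_cgroup root  = { "root",  NULL,  0 };
	struct mock_cgroup child = { "child", &root, 16LL << 20 };
	struct mock_page_cgroup pc = { &child };

	move_to_parent(&pc, 16LL << 20);	/* one 16MB hugepage */
	printf("child=%lld parent=%lld owner=%s\n",
	       child.usage, root.usage, pc.owner->name);
	return 0;
}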


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH -V3 8/8] memcg: Add memory controller documentation for hugetlb management
  2012-03-13  7:07 ` Aneesh Kumar K.V
@ 2012-03-13  7:07   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-13  7:07 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 Documentation/cgroups/memory.txt |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 4c95c00..d99c41b 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -43,6 +43,7 @@ Features:
  - usage threshold notifier
  - oom-killer disable knob and oom-notifier
  - Root cgroup has no limit controls.
+ - resource accounting for HugeTLB pages
 
  Kernel memory support is work in progress, and the current version provides
 basic functionality. (See Section 2.7)
@@ -75,6 +76,12 @@ Brief summary of control files.
  memory.kmem.tcp.limit_in_bytes  # set/show hard limit for tcp buf memory
  memory.kmem.tcp.usage_in_bytes  # show current tcp buf memory allocation
 
+
+ memory.hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
+ memory.hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
+ memory.hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
+						  # see 5.7 for details
+
 1. History
 
 The memory controller has a long history. A request for comments for the memory
@@ -279,6 +286,15 @@ per cgroup, instead of globally.
 
 * tcp memory pressure: sockets memory pressure for the tcp protocol.
 
+2.8 HugeTLB extension
+
+This extension allows limiting the HugeTLB usage per control group and
+enforces the controller limit during page fault. Since HugeTLB doesn't
+support page reclaim, enforcing the limit at page fault time implies that
+the application will get a SIGBUS signal if it tries to access HugeTLB pages
+beyond its limit. This requires the application to know beforehand how many
+HugeTLB pages it would require for its use.
+
 3. User Interface
 
 0. Configuration
@@ -287,6 +303,7 @@ a. Enable CONFIG_CGROUPS
 b. Enable CONFIG_RESOURCE_COUNTERS
 c. Enable CONFIG_CGROUP_MEM_RES_CTLR
 d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
+e. Enable CONFIG_MEM_RES_CTLR_HUGETLB (to use HugeTLB extension)
 
 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
 # mount -t tmpfs none /sys/fs/cgroup
@@ -510,6 +527,18 @@ unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
 
 And we have total = file + anon + unevictable.
 
+5.7 HugeTLB resource control files
+For a system supporting two hugepage sizes (16MB and 16GB), the control
+files include:
+
+ memory.hugetlb.16GB.limit_in_bytes
+ memory.hugetlb.16GB.max_usage_in_bytes
+ memory.hugetlb.16GB.usage_in_bytes
+ memory.hugetlb.16MB.limit_in_bytes
+ memory.hugetlb.16MB.max_usage_in_bytes
+ memory.hugetlb.16MB.usage_in_bytes
+
+
 6. Hierarchy support
 
 The memory controller supports a deep hierarchy and hierarchical accounting.
-- 
1.7.9
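
To make the behaviour documented in section 2.8 above concrete, here is a
hedged userspace illustration: run inside a cgroup whose
memory.hugetlb.<hugepagesize>.limit_in_bytes is smaller than LEN, and with
hugepages configured on the system, the memset below should be killed by
SIGBUS instead of triggering any reclaim. The size and the MAP_HUGETLB
fallback value are assumptions; adjust for the system at hand:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LEN (32UL << 20)	/* 32MB, i.e. sixteen 2MB hugepages */

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000	/* value on x86; illustrative fallback */
#endif

static void bus_handler(int sig)
{
	(void)sig;
	/* We faulted on a hugepage beyond the cgroup limit. */
	write(2, "SIGBUS: over hugetlb limit\n", 27);
	_exit(1);
}

int main(void)
{
	char *p;

	signal(SIGBUS, bus_handler);
	p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0, LEN);	/* faults hugepages in one by one */
	puts("all hugepages faulted in within the limit");
	return 0;
}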


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 1/8] hugetlb: rename max_hstate to hugetlb_max_hstate
  2012-03-13  7:07   ` Aneesh Kumar K.V
  (?)
@ 2012-03-13 13:13     ` Hillf Danton
  -1 siblings, 0 replies; 69+ messages in thread
From: Hillf Danton @ 2012-03-13 13:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, aarcange, mhocko, akpm,
	hannes, linux-kernel, cgroups

On Tue, Mar 13, 2012 at 3:07 PM, Aneesh Kumar K.V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> We will be using this from other subsystems like memcg
> in later patches.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---

Acked-by: Hillf Danton <dhillf@gmail.com>

>  mm/hugetlb.c |   14 +++++++-------
>  1 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5f34bd8..d623e71 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
>  static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
>  unsigned long hugepages_treat_as_movable;
>
> -static int max_hstate;
> +static int hugetlb_max_hstate;
>  unsigned int default_hstate_idx;
>  struct hstate hstates[HUGE_MAX_HSTATE];
>
> @@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
>  static unsigned long __initdata default_hstate_size;
>
>  #define for_each_hstate(h) \
> -       for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
> +       for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
>
>  /*
>  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
> @@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order)
>                printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
>                return;
>        }
> -       BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
> +       BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>        BUG_ON(order == 0);
> -       h = &hstates[max_hstate++];
> +       h = &hstates[hugetlb_max_hstate++];
>        h->order = order;
>        h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
>        h->nr_huge_pages = 0;
> @@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s)
>        static unsigned long *last_mhp;
>
>        /*
> -        * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
> +        * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
>         * so this hugepages= parameter goes to the "default hstate".
>         */
> -       if (!max_hstate)
> +       if (!hugetlb_max_hstate)
>                mhp = &default_hstate_max_huge_pages;
>        else
>                mhp = &parsed_hstate->max_huge_pages;
> @@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s)
>         * But we need to allocate >= MAX_ORDER hstates here early to still
>         * use the bootmem allocator.
>         */
> -       if (max_hstate && parsed_hstate->order >= MAX_ORDER)
> +       if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
>                hugetlb_hstate_alloc_pages(parsed_hstate);
>
>        last_mhp = mhp;
> --
> 1.7.9
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 3/8] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-13  7:07   ` Aneesh Kumar K.V
@ 2012-03-13 13:20     ` Hillf Danton
  -1 siblings, 0 replies; 69+ messages in thread
From: Hillf Danton @ 2012-03-13 13:20 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, aarcange, mhocko, akpm,
	hannes, linux-kernel, cgroups

On Tue, Mar 13, 2012 at 3:07 PM, Aneesh Kumar K.V
<aneesh.kumar@linux.vnet.ibm.com> wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> This adds necessary charge/uncharge calls in the HugeTLB code
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  mm/hugetlb.c    |   21 ++++++++++++++++++++-
>  mm/memcontrol.c |    5 +++++
>  2 files changed, 25 insertions(+), 1 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fe7aefd..b7152d1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -21,6 +21,8 @@
>  #include <linux/rmap.h>
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
> +#include <linux/memcontrol.h>
> +#include <linux/page_cgroup.h>
>
>  #include <asm/page.h>
>  #include <asm/pgtable.h>
> @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page)
>        BUG_ON(page_mapcount(page));
>        INIT_LIST_HEAD(&page->lru);
>
> +       if (mapping)
> +               mem_cgroup_hugetlb_uncharge_page(h - hstates,
> +                                                pages_per_huge_page(h), page);
>        spin_lock(&hugetlb_lock);
>        if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
>                update_and_free_page(h, page);
> @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h,
>  static struct page *alloc_huge_page(struct vm_area_struct *vma,
>                                    unsigned long addr, int avoid_reserve)
>  {
> +       int ret, idx;
>        struct hstate *h = hstate_vma(vma);
>        struct page *page;
> +       struct mem_cgroup *memcg = NULL;
>        struct address_space *mapping = vma->vm_file->f_mapping;
>        struct inode *inode = mapping->host;
>        long chg;
>
> +       idx = h - hstates;

Better if hstate index is computed with a tiny inline helper?
Other than that,

Acked-by: Hillf Danton <dhillf@gmail.com>
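
A sketch of the kind of helper being suggested here, as a userspace-compilable
mock with illustrative names; in the series the index is open-coded as
"h - hstates":

#include <stdio.h>

#define HUGE_MAX_HSTATE 2

struct hstate { unsigned int order; };

static struct hstate hstates[HUGE_MAX_HSTATE];

static inline int hstate_index(struct hstate *h)
{
	return h - hstates;	/* what "idx = h - hstates" open-codes */
}

int main(void)
{
	printf("index of second hstate: %d\n", hstate_index(&hstates[1]));
	return 0;
}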

>        /*
>         * Processes that did not create the mapping will have no reserves and
>         * will not have accounted against quota. Check that the quota can be
> @@ -1039,6 +1047,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>                if (hugetlb_get_quota(inode->i_mapping, chg))
>                        return ERR_PTR(-VM_FAULT_SIGBUS);
>
> +       ret = mem_cgroup_hugetlb_charge_page(idx, pages_per_huge_page(h),
> +                                            &memcg);
> +       if (ret) {
> +               hugetlb_put_quota(inode->i_mapping, chg);
> +               return ERR_PTR(-VM_FAULT_SIGBUS);
> +       }
>        spin_lock(&hugetlb_lock);
>        page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
>        spin_unlock(&hugetlb_lock);
> @@ -1046,6 +1060,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>        if (!page) {
>                page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>                if (!page) {
> +                       mem_cgroup_hugetlb_uncharge_memcg(idx,
> +                                                        pages_per_huge_page(h),
> +                                                        memcg);
>                        hugetlb_put_quota(inode->i_mapping, chg);
>                        return ERR_PTR(-VM_FAULT_SIGBUS);
>                }
> @@ -1054,7 +1071,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>        set_page_private(page, (unsigned long) mapping);
>
>        vma_commit_reservation(h, vma, addr);
> -
> +       /* update page cgroup details */
> +       mem_cgroup_hugetlb_commit_charge(idx, pages_per_huge_page(h),
> +                                        memcg, page);
>        return page;
>  }
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8cac77b..f4aa11c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2901,6 +2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>
>        if (PageSwapCache(page))
>                return NULL;
> +       /*
> +        * HugeTLB page uncharge happen in the HugeTLB compound page destructor
> +        */
> +       if (PageHuge(page))
> +               return NULL;
>
>        if (PageTransHuge(page)) {
>                nr_pages <<= compound_order(page);
> --
> 1.7.9
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 2/8] memcg: Add HugeTLB extension
  2012-03-13  7:07   ` Aneesh Kumar K.V
  (?)
@ 2012-03-13 13:28   ` Glauber Costa
  2012-03-13 13:42     ` Glauber Costa
  -1 siblings, 1 reply; 69+ messages in thread
From: Glauber Costa @ 2012-03-13 13:28 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes, linux-kernel, cgroups

On 03/13/2012 11:07 AM, Aneesh Kumar K.V wrote:
> @@ -4951,6 +5083,12 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
>   					struct cgroup *cont)
>   {
>   	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +	/*
> +	 * Don't allow memcg removal if we have HugeTLB resource
> +	 * usage.
> +	 */
> +	if (mem_cgroup_hugetlb_usage(memcg)>  0)
> +		return -EBUSY;
>
>   	return mem_cgroup_force_empty(memcg, false);

Why can't you move the charges like everyone else?


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 3/8] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-13  7:07   ` Aneesh Kumar K.V
  (?)
  (?)
@ 2012-03-13 13:31   ` Glauber Costa
  2012-03-14 10:41       ` Aneesh Kumar K.V
  -1 siblings, 1 reply; 69+ messages in thread
From: Glauber Costa @ 2012-03-13 13:31 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes, linux-kernel, cgroups

On 03/13/2012 11:07 AM, Aneesh Kumar K.V wrote:
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8cac77b..f4aa11c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2901,6 +2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>
>   	if (PageSwapCache(page))
>   		return NULL;
> +	/*
> +	 * HugeTLB page uncharge happen in the HugeTLB compound page destructor
> +	 */
> +	if (PageHuge(page))
> +		return NULL;

Maybe it is better to call uncharge_common from the compound destructor,
so we can have all the uncharge code in a single place.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 4/8] memcg: track resource index in cftype private
  2012-03-13  7:07   ` Aneesh Kumar K.V
  (?)
@ 2012-03-13 13:34   ` Glauber Costa
  2012-03-14 10:48       ` Aneesh Kumar K.V
  -1 siblings, 1 reply; 69+ messages in thread
From: Glauber Costa @ 2012-03-13 13:34 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes, linux-kernel, cgroups

On 03/13/2012 11:07 AM, Aneesh Kumar K.V wrote:
>   		if (type == _MEM)
>   			ret = mem_cgroup_resize_limit(memcg, val);
> -		else
> +		else if (type == _MEMHUGETLB) {
> +			int idx = MEMFILE_IDX(cft->private);
> +			ret = res_counter_set_limit(&memcg->hugepage[idx], val);
> +		} else
>   			ret = mem_cgroup_resize_memsw_limit(memcg, val);
>   		break;
>   	case RES_SOFT_LIMIT:

What if a user tries to set limit < usage? Isn't there any reclaim that
we could possibly do, as is done by normal memcg?


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 2/8] memcg: Add HugeTLB extension
  2012-03-13 13:28   ` Glauber Costa
@ 2012-03-13 13:42     ` Glauber Costa
  0 siblings, 0 replies; 69+ messages in thread
From: Glauber Costa @ 2012-03-13 13:42 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes, linux-kernel, cgroups

On 03/13/2012 05:28 PM, Glauber Costa wrote:
> On 03/13/2012 11:07 AM, Aneesh Kumar K.V wrote:
>> @@ -4951,6 +5083,12 @@ static int mem_cgroup_pre_destroy(struct
>> cgroup_subsys *ss,
>> struct cgroup *cont)
>> {
>> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>> + /*
>> + * Don't allow memcg removal if we have HugeTLB resource
>> + * usage.
>> + */
>> + if (mem_cgroup_hugetlb_usage(memcg)> 0)
>> + return -EBUSY;
>>
>> return mem_cgroup_force_empty(memcg, false);
>
> Why can't you move the charges like everyone else?
>

Nevermind, just saw your last patch.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 2/8] memcg: Add HugeTLB extension
  2012-03-13  7:07   ` Aneesh Kumar K.V
@ 2012-03-13 21:33     ` Andrew Morton
  -1 siblings, 0 replies; 69+ messages in thread
From: Andrew Morton @ 2012-03-13 21:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 12:37:06 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> +static int mem_cgroup_hugetlb_usage(struct mem_cgroup *memcg)
> +{
> +	int idx;
> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> +		if (memcg->hugepage[idx].usage > 0)
> +			return memcg->hugepage[idx].usage;
> +	}
> +	return 0;
> +}

Please document the function?  Had you done this, I might have been
able to work out why the function bales out on the first used hugepage
size, but I can't :(

This could have used for_each_hstate(), had that macro been better
designed (or updated).

Upon return this function coerces an unsigned long long into an "int". 
We decided last week that more than 2^32 hugepages was not
inconceivable, so I guess that's a bug.
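
One way to read the loop, sketched in userspace with mock types rather than as
the actual fix: keep the full counter width so a charge larger than what fits
in an int is not truncated, and say in a comment that the caller only needs to
know whether any hstate still holds charges:

#include <stdio.h>

#define HUGETLB_MAX_HSTATE 2

struct mock_memcg {
	unsigned long long hugepage_usage[HUGETLB_MAX_HSTATE];
};

/*
 * Return the usage of the first hstate that still holds charges, or 0 if
 * none does: the caller only asks "is anything still charged?", so there
 * is no need to sum across hstates.
 */
static unsigned long long hugetlb_usage(struct mock_memcg *memcg)
{
	int idx;

	for (idx = 0; idx < HUGETLB_MAX_HSTATE; idx++) {
		if (memcg->hugepage_usage[idx] > 0)
			return memcg->hugepage_usage[idx];
	}
	return 0;
}

int main(void)
{
	/* 16GB charged against the second hstate: does not fit in an int */
	struct mock_memcg m = { { 0, 1ULL << 34 } };

	printf("usage=%llu (an int return type would truncate this)\n",
	       hugetlb_usage(&m));
	return 0;
}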


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 3/8] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-13  7:07   ` Aneesh Kumar K.V
  (?)
@ 2012-03-13 21:36     ` Andrew Morton
  -1 siblings, 0 replies; 69+ messages in thread
From: Andrew Morton @ 2012-03-13 21:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 12:37:07 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> +		return ERR_PTR(-VM_FAULT_SIGBUS);

whee, so we have (ab)used the err.h infrastructure to carry
VM_FAULT_foo codes, thus creating a secret requirement that the
VM_FAULT_foo values not exceed MAX_ERRNO.

What a hack, whodidthat?
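
A userspace re-creation of the err.h trick being complained about, simplified
but showing where the hidden limit comes from: IS_ERR() only treats the top
MAX_ERRNO (4095) values as errors, so anything smuggled through ERR_PTR(),
VM_FAULT_* codes included, must stay within that range:

#include <stdio.h>

#define MAX_ERRNO	4095

#define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return IS_ERR_VALUE((unsigned long)ptr);
}

int main(void)
{
	void *p = ERR_PTR(-2);		/* a small error/fault code */

	printf("IS_ERR=%d code=%ld\n", IS_ERR(p), -PTR_ERR(p));
	printf("a code of 4096 would not be seen as an error: IS_ERR=%d\n",
	       IS_ERR(ERR_PTR(-4096)));
	return 0;
}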

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 5/8] hugetlbfs: Add memcg control files for hugetlbfs
  2012-03-13  7:07   ` Aneesh Kumar K.V
  (?)
@ 2012-03-13 21:42     ` Andrew Morton
  -1 siblings, 0 replies; 69+ messages in thread
From: Andrew Morton @ 2012-03-13 21:42 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 12:37:09 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This add control files for hugetlbfs in memcg
> 
> ...
>
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -220,6 +221,10 @@ struct hstate {
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
>  	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
> +	/* cgroup control files */
> +	struct cftype cgroup_limit_file;
> +	struct cftype cgroup_usage_file;
> +	struct cftype cgroup_max_usage_file;
>  	char name[HSTATE_NAME_LEN];
>  };

We don't need all these in here if, for example, cgroups is disabled?

> ...
>
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1817,6 +1817,36 @@ static int __init hugetlb_init(void)
>  }
>  module_init(hugetlb_init);
>  
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +int register_hugetlb_memcg_files(struct cgroup *cgroup,
> +				 struct cgroup_subsys *ss)
> +{
> +	int ret = 0;
> +	struct hstate *h;
> +
> +	for_each_hstate(h) {
> +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_limit_file);
> +		if (ret)
> +			return ret;
> +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_usage_file);
> +		if (ret)
> +			return ret;
> +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_max_usage_file);
> +		if (ret)
> +			return ret;
> +
> +	}
> +	return ret;
> +}
> +/* mm/memcontrol.c because mem_cgroup_read/write is not availabel outside */

Comment has a spelling mistake.

> +int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx);


No, please put it in a header file.  Always.  Where both callers and
the implementation see the same prototype.

> +#else
> +static int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx)
> +{
> +	return 0;
> +}
> +#endif

So this will go into the same header file.
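
A sketch of the arrangement being asked for, with the prototype and the
!CONFIG stub side by side in a shared header; the placement (for example
include/linux/hugetlb.h) is illustrative, not the actual follow-up patch:

/* visible to both mm/hugetlb.c and any caller */
struct hstate;

#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
extern int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx);
#else
static inline int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx)
{
	return 0;
}
#endif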


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 5/8] hugetlbfs: Add memcg control files for hugetlbfs
@ 2012-03-13 21:42     ` Andrew Morton
  0 siblings, 0 replies; 69+ messages in thread
From: Andrew Morton @ 2012-03-13 21:42 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, mgorman-l3A5Bk7waGM,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, aarcange-H+wXaHxf7aLQT0dZR+AlfA,
	mhocko-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Tue, 13 Mar 2012 12:37:09 +0530
"Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> This add control files for hugetlbfs in memcg
> 
> ...
>
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -220,6 +221,10 @@ struct hstate {
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
>  	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
> +	/* cgroup control files */
> +	struct cftype cgroup_limit_file;
> +	struct cftype cgroup_usage_file;
> +	struct cftype cgroup_max_usage_file;
>  	char name[HSTATE_NAME_LEN];
>  };

We don't need all these in here if, for example, cgroups is disabled?

> ...
>
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1817,6 +1817,36 @@ static int __init hugetlb_init(void)
>  }
>  module_init(hugetlb_init);
>  
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +int register_hugetlb_memcg_files(struct cgroup *cgroup,
> +				 struct cgroup_subsys *ss)
> +{
> +	int ret = 0;
> +	struct hstate *h;
> +
> +	for_each_hstate(h) {
> +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_limit_file);
> +		if (ret)
> +			return ret;
> +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_usage_file);
> +		if (ret)
> +			return ret;
> +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_max_usage_file);
> +		if (ret)
> +			return ret;
> +
> +	}
> +	return ret;
> +}
> +/* mm/memcontrol.c because mem_cgroup_read/write is not availabel outside */

Comment has a spelling mistake.

> +int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx);


No, please put it in a header file.  Always.  Where both callers and
the implementation see the same propotype.

> +#else
> +static int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx)
> +{
> +	return 0;
> +}
> +#endif

So this will go into the same header file.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 7/8] memcg: move HugeTLB resource count to parent cgroup on memcg removal
  2012-03-13  7:07   ` Aneesh Kumar K.V
  (?)
@ 2012-03-13 21:47     ` Andrew Morton
  -1 siblings, 0 replies; 69+ messages in thread
From: Andrew Morton @ 2012-03-13 21:47 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 12:37:11 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This adds support for memcg removal with HugeTLB resource usage.
> 
> ...
>
> +int hugetlb_force_memcg_empty(struct cgroup *cgroup)

It's useful to document things, you know.  For a major function like
this, a nice little description of why it exists, what its role is,
etc.  Programming is not just an act of telling a computer what to do:
it is also an act of telling other programmers what you wished the
computer to do.  Both are important, and the latter deserves care.

> +{
> +	struct hstate *h;
> +	struct page *page;
> +	int ret = 0, idx = 0;
> +
> +	do {
> +		if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children))
> +			goto out;
> +		if (signal_pending(current)) {
> +			ret = -EINTR;
> +			goto out;
> +		}

Why is its behaviour altered by signal_pending()?  This seems broken.

> +		for_each_hstate(h) {
> +			spin_lock(&hugetlb_lock);
> +			list_for_each_entry(page, &h->hugepage_activelist, lru) {
> +				ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page);
> +				if (ret) {
> +					spin_unlock(&hugetlb_lock);
> +					goto out;
> +				}
> +			}
> +			spin_unlock(&hugetlb_lock);
> +			idx++;
> +		}
> +		cond_resched();
> +	} while (mem_cgroup_hugetlb_usage(cgroup) > 0);
> +out:
> +	return ret;
> +}


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 0/8] memcg: Add memcg extension to control HugeTLB allocation
  2012-03-13  7:07 ` Aneesh Kumar K.V
  (?)
@ 2012-03-13 21:49   ` Andrew Morton
  -1 siblings, 0 replies; 69+ messages in thread
From: Andrew Morton @ 2012-03-13 21:49 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 12:37:04 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> This patchset implements a memory controller extension to control
> HugeTLB allocations.

Well, why?  What are the use cases?  Who is asking for this?  Why do
they need it and how will they use it?  etcetera.

Please explain, with some care, why you think we should add this
feature to the kernel.  So that others can assess whether the value it
adds is worth the cost of adding and maintaining it.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 2/8] memcg: Add HugeTLB extension
  2012-03-13 21:33     ` Andrew Morton
@ 2012-03-14 10:21       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-14 10:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 14:33:16 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 13 Mar 2012 12:37:06 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > +static int mem_cgroup_hugetlb_usage(struct mem_cgroup *memcg)
> > +{
> > +	int idx;
> > +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> > +		if (memcg->hugepage[idx].usage > 0)
> > +			return memcg->hugepage[idx].usage;
> > +	}
> > +	return 0;
> > +}
> 
> Please document the function?  Had you done this, I might have been
> able to work out why the function bales out on the first used hugepage
> size, but I can't :(

I guess the function is badly named. I will rename it to
mem_cgroup_have_hugetlb_usage() in the next iteration. The function
will return (bool) 1 if the cgroup has any HugeTLB resource usage.

> 
> This could have used for_each_hstate(), had that macro been better
> designed (or updated).
> 

Can you explain this? for_each_hstate() allows iterating over the
different hstates, but here we need to look at the different hugepage
resource counters in memcg. I could still use for_each_hstate(), find
the hstate index (h - hstates) and use that to index the memcg resource
counter array, but wouldn't that make it more complex?
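
For reference, a minimal sketch of what the renamed helper could look
like (the bool return type and the res_counter_read_u64() accessor are
assumptions for illustration, not the final patch; the accessor also
avoids the unsigned long long -> int truncation noted below):

static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
{
	int idx;

	/* any non-zero usage for any huge page size counts */
	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
		if (res_counter_read_u64(&memcg->hugepage[idx], RES_USAGE))
			return true;
	}
	return false;
}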

> Upon return this function coerces an unsigned long long into an "int". 
> We decided last week that more than 2^32 hugepages was not
> inconceivable, so I guess that's a bug.
> 

-aneesh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 3/8] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-13 13:20     ` Hillf Danton
  (?)
@ 2012-03-14 10:22       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-14 10:22 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, mgorman, kamezawa.hiroyu, aarcange, mhocko, akpm,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 21:20:21 +0800, Hillf Danton <dhillf@gmail.com> wrote:
> On Tue, Mar 13, 2012 at 3:07 PM, Aneesh Kumar K.V
> <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >
> > This adds necessary charge/uncharge calls in the HugeTLB code
> >
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> >  mm/hugetlb.c    |   21 ++++++++++++++++++++-
> >  mm/memcontrol.c |    5 +++++
> >  2 files changed, 25 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index fe7aefd..b7152d1 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -21,6 +21,8 @@
> >  #include <linux/rmap.h>
> >  #include <linux/swap.h>
> >  #include <linux/swapops.h>
> > +#include <linux/memcontrol.h>
> > +#include <linux/page_cgroup.h>
> >
> >  #include <asm/page.h>
> >  #include <asm/pgtable.h>
> > @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page)
> >        BUG_ON(page_mapcount(page));
> >        INIT_LIST_HEAD(&page->lru);
> >
> > +       if (mapping)
> > +               mem_cgroup_hugetlb_uncharge_page(h - hstates,
> > +                                                pages_per_huge_page(h), page);
> >        spin_lock(&hugetlb_lock);
> >        if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
> >                update_and_free_page(h, page);
> > @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h,
> >  static struct page *alloc_huge_page(struct vm_area_struct *vma,
> >                                    unsigned long addr, int avoid_reserve)
> >  {
> > +       int ret, idx;
> >        struct hstate *h = hstate_vma(vma);
> >        struct page *page;
> > +       struct mem_cgroup *memcg = NULL;
> >        struct address_space *mapping = vma->vm_file->f_mapping;
> >        struct inode *inode = mapping->host;
> >        long chg;
> >
> > +       idx = h - hstates;
> 
> Better if hstate index is computed with a tiny inline helper?
> Other than that,

Will update in the next iteration.
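
Something along the lines of the following is what I had in mind
(sketch only, the final name may differ):

static inline int hstate_index(struct hstate *h)
{
	return h - hstates;
}

so that alloc_huge_page() and free_huge_page() can use hstate_index(h)
instead of open-coding the pointer arithmetic.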

> 
> Acked-by: Hillf Danton <dhillf@gmail.com>
> 

-aneesh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 3/8] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-13 13:31   ` Glauber Costa
  2012-03-14 10:41       ` Aneesh Kumar K.V
@ 2012-03-14 10:41       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-14 10:41 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 17:31:52 +0400, Glauber Costa <glommer@parallels.com> wrote:
> On 03/13/2012 11:07 AM, Aneesh Kumar K.V wrote:
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 8cac77b..f4aa11c 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2901,6 +2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> >
> >   	if (PageSwapCache(page))
> >   		return NULL;
> > +	/*
> > +	 * HugeTLB page uncharge happen in the HugeTLB compound page destructor
> > +	 */
> > +	if (PageHuge(page))
> > +		return NULL;
> 
> Maybe it is better to call uncharge_common from the compound destructor,
> so we can have all the uncharge code in a single place.
> 

PageHuge is not represented by a page flag the way SwapCache is, hence
I was not able to call uncharge_common from the compound destructor.
For SwapCache, we clear the flag and call uncharge_common again. Also,
I would have to update those functions to take the resource counter
index as an argument so that we end up updating the right resource
counter in the counter array. That would result in more code changes
and I was not sure about that.

-aneesh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 3/8] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-13 21:36     ` Andrew Morton
  (?)
@ 2012-03-14 10:47       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-14 10:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 14:36:54 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 13 Mar 2012 12:37:07 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > +		return ERR_PTR(-VM_FAULT_SIGBUS);
> 
> whee, so we have (ab)used the err.h infrastructure to carry
> VM_FAULT_foo codes, thus creating a secret requirement that the
> VM_FAULT_foo values not exceed MAX_ERRNO.
> 
> What a hack, whodidthat?

e0dcd8a05be438b3d2e49ef61441ea3a463663f8. We only do that in hugetlb.
I will add a cleanup patch that returns proper error codes and maps
them back to SIGBUS in the caller.
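
A rough sketch of the cleanup (the errno values and the exact failure
sites are only illustrative, not the final patch):

	/* alloc_huge_page(): return plain errnos instead of fault codes */
	if (chg < 0)
		return ERR_PTR(-ENOMEM);

	/* hugetlb_no_page()/hugetlb_cow(): translate on the way out */
	page = alloc_huge_page(vma, address, 0);
	if (IS_ERR(page)) {
		ret = VM_FAULT_SIGBUS;
		goto out;
	}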

-aneesh 


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 4/8] memcg: track resource index in cftype private
  2012-03-13 13:34   ` Glauber Costa
  2012-03-14 10:48       ` Aneesh Kumar K.V
@ 2012-03-14 10:48       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-14 10:48 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 17:34:08 +0400, Glauber Costa <glommer@parallels.com> wrote:
> On 03/13/2012 11:07 AM, Aneesh Kumar K.V wrote:
> >   		if (type == _MEM)
> >   			ret = mem_cgroup_resize_limit(memcg, val);
> > -		else
> > +		else if (type == _MEMHUGETLB) {
> > +			int idx = MEMFILE_IDX(cft->private);
> > +			ret = res_counter_set_limit(&memcg->hugepage[idx], val);
> > +		} else
> >   			ret = mem_cgroup_resize_memsw_limit(memcg, val);
> >   		break;
> >   	case RES_SOFT_LIMIT:
> 
> What if a user tries to set limit < usage? Isn't there any reclaim that
> we could possibly do, like is done by normal memcg?

No, HugeTLB doesn't support reclaim. If we set the limit to a value
below current usage, future allocations will fail, but we don't reclaim.
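
In other words, the fault-time charge is a plain res_counter_charge()
with no reclaim fallback. A sketch of the core (the helper name and the
surrounding plumbing are assumed here purely for illustration):

static int hugetlb_try_charge(struct mem_cgroup *memcg, int idx,
			      unsigned long nr_pages)
{
	struct res_counter *fail_res;

	/* fails with -ENOMEM once the limit is exceeded; no reclaim */
	return res_counter_charge(&memcg->hugepage[idx],
				  nr_pages * PAGE_SIZE, &fail_res);
}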


-aneesh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 5/8] hugetlbfs: Add memcg control files for hugetlbfs
  2012-03-13 21:42     ` Andrew Morton
@ 2012-03-14 11:10       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-14 11:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 14:42:33 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 13 Mar 2012 12:37:09 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > This adds control files for hugetlbfs in memcg
> > 
> > ...
> >
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -220,6 +221,10 @@ struct hstate {
> >  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
> >  	unsigned int free_huge_pages_node[MAX_NUMNODES];
> >  	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
> > +	/* cgroup control files */
> > +	struct cftype cgroup_limit_file;
> > +	struct cftype cgroup_usage_file;
> > +	struct cftype cgroup_max_usage_file;
> >  	char name[HSTATE_NAME_LEN];
> >  };
> 
> We don't need all these in here if, for example, cgroups is disabled?

Will fix.
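
Most likely by just wrapping the new fields, reusing the config symbol
the patch already uses in mm/hugetlb.c (sketch only):

	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
	/* cgroup control files */
	struct cftype cgroup_limit_file;
	struct cftype cgroup_usage_file;
	struct cftype cgroup_max_usage_file;
#endif
	char name[HSTATE_NAME_LEN];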

> 
> > ...
> >
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1817,6 +1817,36 @@ static int __init hugetlb_init(void)
> >  }
> >  module_init(hugetlb_init);
> >  
> > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> > +int register_hugetlb_memcg_files(struct cgroup *cgroup,
> > +				 struct cgroup_subsys *ss)
> > +{
> > +	int ret = 0;
> > +	struct hstate *h;
> > +
> > +	for_each_hstate(h) {
> > +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_limit_file);
> > +		if (ret)
> > +			return ret;
> > +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_usage_file);
> > +		if (ret)
> > +			return ret;
> > +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_max_usage_file);
> > +		if (ret)
> > +			return ret;
> > +
> > +	}
> > +	return ret;
> > +}
> > +/* mm/memcontrol.c because mem_cgroup_read/write is not availabel outside */
> 
> Comment has a spelling mistake.

Will fix

> 
> > +int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx);
> 
> 
> No, please put it in a header file.  Always.  Where both callers and
> the implementation see the same prototype.
> 
> > +#else
> > +static int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx)
> > +{
> > +	return 0;
> > +}
> > +#endif
> 
> So this will go into the same header file.
> 

I was not sure whether I want to put mem_cgroup_hugetlb_file_init in
linux/memcontrol.h. Ideally I want to have that in mm/hugetlb.c and in
linux/hugetlb.h. That would require me to make mem_cgroup_read and
others non-static and move a few #defines to memcontrol.h. That would
involve larger code movement which I didn't want to do. What do you
suggest? Just move mem_cgroup_hugetlb_file_init to memcontrol.h?


-aneesh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 5/8] hugetlbfs: Add memcg control files for hugetlbfs
  2012-03-14 11:10       ` Aneesh Kumar K.V
@ 2012-03-14 11:35         ` Andrew Morton
  -1 siblings, 0 replies; 69+ messages in thread
From: Andrew Morton @ 2012-03-14 11:35 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Wed, 14 Mar 2012 16:40:58 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> > 
> > > +int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx);
> > 
> > 
> > No, please put it in a header file.  Always.  Where both callers and
> > the implementation see the same prototype.
> > 
> > > +#else
> > > +static int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx)
> > > +{
> > > +	return 0;
> > > +}
> > > +#endif
> > 
> > So this will go into the same header file.
> > 
> 
> I was not sure whether I want to put mem_cgroup_hugetlb_file_init in
> linux/memcontrol.h.

The above is a declaration, not the definition (implementation).

> Ideally I want to have that in mm/hugetlb.c and in
> linux/hugetlb.h. That would require me to make mem_cgroup_read and
> others non-static and move a few #defines to memcontrol.h. That would
> involve larger code movement which I didn't want to do. What do you
> suggest? Just move mem_cgroup_hugetlb_file_init to memcontrol.h?

In memcontrol.h:

#ifdef CONFIG_FOO
extern int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx);
#else
static inline int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx)
{
	return 0;
}
#endif

?

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 7/8] memcg: move HugeTLB resource count to parent cgroup on memcg removal
  2012-03-13 21:47     ` Andrew Morton
@ 2012-03-14 12:22       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-14 12:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 14:47:05 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 13 Mar 2012 12:37:11 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > This adds support for memcg removal with HugeTLB resource usage.
> > 
> > ...
> >
> > +int hugetlb_force_memcg_empty(struct cgroup *cgroup)
> 
> It's useful to document things, you know.  For a major function like
> this, a nice little description of why it exists, what its role is,
> etc.  Programming is not just an act of telling a computer what to do:
> it is also an act of telling other programmers what you wished the
> computer to do.  Both are important, and the latter deserves care.
> 


Will do.

> > +{
> > +	struct hstate *h;
> > +	struct page *page;
> > +	int ret = 0, idx = 0;
> > +
> > +	do {
> > +		if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children))
> > +			goto out;
> > +		if (signal_pending(current)) {
> > +			ret = -EINTR;
> > +			goto out;
> > +		}
> 
> Why is its behaviour altered by signal_pending()?  This seems broken.

If the task that is doing the cgroup_rmdir got a signal, we don't
really need to keep looping until the HugeTLB resource usage becomes
zero.
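
Something along these lines for the function comment (the wording is
only a suggestion):

/*
 * hugetlb_force_memcg_empty - reparent HugeTLB charges before cgroup removal
 * @cgroup: the cgroup being removed (rmdir)
 *
 * HugeTLB pages cannot be reclaimed, so on rmdir any remaining per-hstate
 * charges are moved to the parent cgroup instead of being dropped.  If the
 * caller receives a signal we bail out with -EINTR rather than looping
 * until the usage reaches zero, and the rmdir simply fails in that case.
 */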


> 
> > +		for_each_hstate(h) {
> > +			spin_lock(&hugetlb_lock);
> > +			list_for_each_entry(page, &h->hugepage_activelist, lru) {
> > +				ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page);
> > +				if (ret) {
> > +					spin_unlock(&hugetlb_lock);
> > +					goto out;
> > +				}
> > +			}
> > +			spin_unlock(&hugetlb_lock);
> > +			idx++;
> > +		}
> > +		cond_resched();
> > +	} while (mem_cgroup_hugetlb_usage(cgroup) > 0);
> > +out:
> > +	return ret;
> > +}

-aneesh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 0/8] memcg: Add memcg extension to control HugeTLB allocation
  2012-03-13 21:49   ` Andrew Morton
@ 2012-03-14 13:01     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-14 13:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Tue, 13 Mar 2012 14:49:30 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 13 Mar 2012 12:37:04 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > This patchset implements a memory controller extension to control
> > HugeTLB allocations.
> 
> Well, why?  What are the use cases?  Who is asking for this?  Why do
> they need it and how will they use it?  etcetera.
> 
> Please explain, with some care, why you think we should add this
> feature to the kernel.  So that others can assess whether the value it
> adds is worth the cost of adding and maintaining it.
> 

The goal is to control how many HugeTLB pages a group of tasks can
allocate. It can be looked at as an extension of the existing quota
interface which limits the number of HugeTLB pages per hugetlbfs
superblock. An HPC job scheduler requires jobs to specify their
resource requirements in the job file. Once those requirements can be
met, job schedulers (like SLURM) will schedule the job. We need to make
sure that the jobs won't consume more resources than requested; if they
do, we should error out or kill the application.

-aneesh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 5/8] hugetlbfs: Add memcg control files for hugetlbfs
  2012-03-14 11:35         ` Andrew Morton
  (?)
@ 2012-03-14 15:57           ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 69+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-14 15:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Wed, 14 Mar 2012 04:35:30 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 14 Mar 2012 16:40:58 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > > 
> > > > +int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx);
> > > 
> > > 
> > > No, please put it in a header file.  Always.  Where both callers and
> > > the implementation see the same prototype.
> > > 
> > > > +#else
> > > > +static int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx)
> > > > +{
> > > > +	return 0;
> > > > +}
> > > > +#endif
> > > 
> > > So this will go into the same header file.
> > > 
> > 
> > I was not sure whether I wanted to put mem_cgroup_hugetlb_file_init in
> > linux/memcontrol.h.
> 
> The above is a declaration, not the definition (implementation).
> 
> > Ideally I want to have that in mm/hugetlb.c and in
> > linux/hugetlb.h. That would require me to make mem_cgroup_read and
> > others non-static and move a few #defines to memcontrol.h. That would
> > involve larger code movement, which I didn't want to do. What do you
> > suggest? Just move mem_cgroup_hugetlb_file_init to memcontrol.h?
> 
> In memcontrol.h:
> 
> #ifdef CONFIG_FOO
> extern int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx);
> #else
> static inline int mem_cgroup_hugetlb_file_init(struct hstate *h, int idx)
> {
> 	return 0;
> }
> #endif
> 

Will do that in the next iteration.
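
To illustrate why the static inline stub matters, here is a minimal
sketch of the caller side in mm/hugetlb.c once the declaration lives in
memcontrol.h; the loop body is illustrative only, not taken from the
patch, and CONFIG_FOO above stands in for whatever config symbol the
series ends up using:

	struct hstate *h;
	int err;

	/* No #ifdef needed: the !CONFIG_FOO stub compiles to a no-op. */
	for_each_hstate(h) {
		err = mem_cgroup_hugetlb_file_init(h, h - hstates);
		if (err)
			break;
	}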

-aneesh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH -V3 2/8] memcg: Add HugeTLB extension
  2012-03-14 10:21       ` Aneesh Kumar K.V
  (?)
@ 2012-03-14 23:43         ` Andrew Morton
  -1 siblings, 0 replies; 69+ messages in thread
From: Andrew Morton @ 2012-03-14 23:43 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	hannes, linux-kernel, cgroups

On Wed, 14 Mar 2012 15:51:50 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> On Tue, 13 Mar 2012 14:33:16 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Tue, 13 Mar 2012 12:37:06 +0530
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > 
> > > +static int mem_cgroup_hugetlb_usage(struct mem_cgroup *memcg)
> > > +{
> > > +	int idx;
> > > +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> > > +		if (memcg->hugepage[idx].usage > 0)
> > > +			return memcg->hugepage[idx].usage;
> > > +	}
> > > +	return 0;
> > > +}
> > 
> > Please document the function?  Had you done this, I might have been
> > able to work out why the function bales out on the first used hugepage
> > size, but I can't :(
> 
> I guess the function is badly named. I will rename it to
> mem_cgroup_have_hugetlb_usage() in the next iteration. The function
> will return true if the cgroup has any HugeTLB resource usage.
> 
> > 
> > This could have used for_each_hstate(), had that macro been better
> > designed (or updated).
> > 
> 
> Can you explain this? for_each_hstate() allows iterating over the
> different hstates, but here we need to look at the different hugepage
> res_counters in the memcg. I can still use for_each_hstate(), find the
> hstate index (h - hstates) and use that to index the memcg res_counter
> array, but that would make it more complex.

If the for_each_hstate() macro took an additional arg which holds the
base address of the array, that macro could have been used here.

Or perhaps not - I didn't look too closely ;)  It isn't important.
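
For what it is worth, a hedged sketch of what such an iterator and the
renamed helper could look like; for_each_hstate_idx() is hypothetical
(not in the tree), and the .usage access simply mirrors the quoted
patch code.  The hstate pointer ends up unused in this particular
helper, which is arguably why a plain index loop is the simpler choice
here:

#define for_each_hstate_idx(h, idx)				\
	for ((idx) = 0, (h) = &hstates[0];			\
	     (idx) < hugetlb_max_hstate;			\
	     (idx)++, (h) = &hstates[(idx)])

static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
{
	int idx;
	struct hstate *h;

	for_each_hstate_idx(h, idx) {
		if (memcg->hugepage[idx].usage > 0)
			return true;
	}
	return false;
}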

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2012-03-14 23:43 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-13  7:07 [PATCH -V3 0/8] memcg: Add memcg extension to control HugeTLB allocation Aneesh Kumar K.V
2012-03-13  7:07 ` Aneesh Kumar K.V
2012-03-13  7:07 ` [PATCH -V3 1/8] hugetlb: rename max_hstate to hugetlb_max_hstate Aneesh Kumar K.V
2012-03-13  7:07   ` Aneesh Kumar K.V
2012-03-13 13:13   ` Hillf Danton
2012-03-13 13:13     ` Hillf Danton
2012-03-13 13:13     ` Hillf Danton
2012-03-13  7:07 ` [PATCH -V3 2/8] memcg: Add HugeTLB extension Aneesh Kumar K.V
2012-03-13  7:07   ` Aneesh Kumar K.V
2012-03-13 13:28   ` Glauber Costa
2012-03-13 13:42     ` Glauber Costa
2012-03-13 21:33   ` Andrew Morton
2012-03-13 21:33     ` Andrew Morton
2012-03-14 10:21     ` Aneesh Kumar K.V
2012-03-14 10:21       ` Aneesh Kumar K.V
2012-03-14 23:43       ` Andrew Morton
2012-03-14 23:43         ` Andrew Morton
2012-03-14 23:43         ` Andrew Morton
2012-03-13  7:07 ` [PATCH -V3 3/8] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Aneesh Kumar K.V
2012-03-13  7:07   ` Aneesh Kumar K.V
2012-03-13 13:20   ` Hillf Danton
2012-03-13 13:20     ` Hillf Danton
2012-03-14 10:22     ` Aneesh Kumar K.V
2012-03-14 10:22       ` Aneesh Kumar K.V
2012-03-14 10:22       ` Aneesh Kumar K.V
2012-03-13 13:31   ` Glauber Costa
2012-03-14 10:41     ` Aneesh Kumar K.V
2012-03-14 10:41       ` Aneesh Kumar K.V
2012-03-14 10:41       ` Aneesh Kumar K.V
2012-03-13 21:36   ` Andrew Morton
2012-03-13 21:36     ` Andrew Morton
2012-03-13 21:36     ` Andrew Morton
2012-03-14 10:47     ` Aneesh Kumar K.V
2012-03-14 10:47       ` Aneesh Kumar K.V
2012-03-14 10:47       ` Aneesh Kumar K.V
2012-03-13  7:07 ` [PATCH -V3 4/8] memcg: track resource index in cftype private Aneesh Kumar K.V
2012-03-13  7:07   ` Aneesh Kumar K.V
2012-03-13 13:34   ` Glauber Costa
2012-03-14 10:48     ` Aneesh Kumar K.V
2012-03-14 10:48       ` Aneesh Kumar K.V
2012-03-14 10:48       ` Aneesh Kumar K.V
2012-03-13  7:07 ` [PATCH -V3 5/8] hugetlbfs: Add memcg control files for hugetlbfs Aneesh Kumar K.V
2012-03-13  7:07   ` Aneesh Kumar K.V
2012-03-13 21:42   ` Andrew Morton
2012-03-13 21:42     ` Andrew Morton
2012-03-13 21:42     ` Andrew Morton
2012-03-14 11:10     ` Aneesh Kumar K.V
2012-03-14 11:10       ` Aneesh Kumar K.V
2012-03-14 11:35       ` Andrew Morton
2012-03-14 11:35         ` Andrew Morton
2012-03-14 15:57         ` Aneesh Kumar K.V
2012-03-14 15:57           ` Aneesh Kumar K.V
2012-03-14 15:57           ` Aneesh Kumar K.V
2012-03-13  7:07 ` [PATCH -V3 6/8] hugetlbfs: Add a list for tracking in-use HugeTLB pages Aneesh Kumar K.V
2012-03-13  7:07   ` Aneesh Kumar K.V
2012-03-13  7:07 ` [PATCH -V3 7/8] memcg: move HugeTLB resource count to parent cgroup on memcg removal Aneesh Kumar K.V
2012-03-13  7:07   ` Aneesh Kumar K.V
2012-03-13 21:47   ` Andrew Morton
2012-03-13 21:47     ` Andrew Morton
2012-03-13 21:47     ` Andrew Morton
2012-03-14 12:22     ` Aneesh Kumar K.V
2012-03-14 12:22       ` Aneesh Kumar K.V
2012-03-13  7:07 ` [PATCH -V3 8/8] memcg: Add memory controller documentation for hugetlb management Aneesh Kumar K.V
2012-03-13  7:07   ` Aneesh Kumar K.V
2012-03-13 21:49 ` [PATCH -V3 0/8] memcg: Add memcg extension to control HugeTLB allocation Andrew Morton
2012-03-13 21:49   ` Andrew Morton
2012-03-13 21:49   ` Andrew Morton
2012-03-14 13:01   ` Aneesh Kumar K.V
2012-03-14 13:01     ` Aneesh Kumar K.V
