* [PATCH -V4 00/10] memcg: Add memcg extension to control HugeTLB allocation
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups

Hi,

This patchset implements a memory controller extension to control
HugeTLB allocations. The extension allows HugeTLB usage to be limited
per control group and enforces the limit at page fault time. Since
HugeTLB doesn't support page reclaim, enforcing the limit at page
fault time means the application gets a SIGBUS signal if it tries to
access HugeTLB pages beyond its limit. This requires the application
to know beforehand how many HugeTLB pages it needs.

The goal is to control how many HugeTLB pages a group of tasks can
allocate. It can be seen as an extension of the existing quota
interface, which limits the number of HugeTLB pages per hugetlbfs
superblock. HPC job schedulers require jobs to specify their resource
requirements in the job file. Once the requirements can be met, job
schedulers like SLURM will schedule the job. We need to make sure that
jobs won't consume more resources than requested; if they do, we
should either error out or kill the application.

Changes from V3:
* Addressed review feedback.
* Fixed a bug in parent charging during cgroup removal with
  use_hierarchy set.

Changes from V2:
* Changed the implementation to limit HugeTLB usage at page fault
  time. This simplifies the extension and keeps it closer to the
  memcg design. It also allows cgroup removal to be supported with
  less complexity. The only caveat is that the application must
  ensure its HugeTLB usage doesn't cross the cgroup limit.

Changes from V1:
* Changed the implementation to be a memcg extension. We still use
  the same logic to track the cgroup and range.

Changes from RFC post:
* Added support for HugeTLB cgroup hierarchy
* Added support for task migration
* Added documentation patch
* Other bug fixes

-aneesh

* [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We will be using this from other subsystems like memcg
in later patches.

Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5f34bd8..d623e71 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int max_hstate;
+static int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
@@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
 static unsigned long __initdata default_hstate_size;
 
 #define for_each_hstate(h) \
-	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
+	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
 /*
  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
@@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order)
 		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
 		return;
 	}
-	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
+	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
 	BUG_ON(order == 0);
-	h = &hstates[max_hstate++];
+	h = &hstates[hugetlb_max_hstate++];
 	h->order = order;
 	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
 	h->nr_huge_pages = 0;
@@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s)
 	static unsigned long *last_mhp;
 
 	/*
-	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
+	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
 	 * so this hugepages= parameter goes to the "default hstate".
 	 */
-	if (!max_hstate)
+	if (!hugetlb_max_hstate)
 		mhp = &default_hstate_max_huge_pages;
 	else
 		mhp = &parsed_hstate->max_huge_pages;
@@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s)
 	 * But we need to allocate >= MAX_ORDER hstates here early to still
 	 * use the bootmem allocator.
 	 */
-	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
+	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
 		hugetlb_hstate_alloc_pages(parsed_hstate);
 
 	last_mhp = mhp;
-- 
1.7.9



* [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Using VM_FAULT_* codes with ERR_PTR would require us to ensure that
VM_FAULT_* values never exceed the MAX_ERRNO value.
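
For reference, a small user-space model of why that encoding is
fragile (MAX_ERRNO and the IS_ERR_VALUE() test mirror the kernel's
definitions; the bit-12 flag is hypothetical):

#include <stdio.h>

#define MAX_ERRNO	4095
#define IS_ERR_VALUE(x)	((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error) { return (void *)error; }

int main(void)
{
	/* errno-style values always land in the top MAX_ERRNO range */
	printf("-ENOMEM recognized: %d\n", IS_ERR_VALUE(ERR_PTR(-12)));
	/* a VM_FAULT_* style bit flag is only safe while it stays below
	 * MAX_ERRNO; a hypothetical bit-12 flag (4096) is not recognized
	 * as an error pointer at all */
	printf("bit-12 flag recognized: %d\n", IS_ERR_VALUE(ERR_PTR(-4096)));
	return 0;
}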

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c |   18 +++++++++++++-----
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d623e71..3782da8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1034,10 +1034,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	 */
 	chg = vma_needs_reservation(h, vma, addr);
 	if (chg < 0)
-		return ERR_PTR(-VM_FAULT_OOM);
+		return ERR_PTR(-ENOMEM);
 	if (chg)
 		if (hugetlb_get_quota(inode->i_mapping, chg))
-			return ERR_PTR(-VM_FAULT_SIGBUS);
+			return ERR_PTR(-ENOSPC);
 
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
@@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
 			hugetlb_put_quota(inode->i_mapping, chg);
-			return ERR_PTR(-VM_FAULT_SIGBUS);
+			return ERR_PTR(-ENOSPC);
 		}
 	}
 
@@ -2395,6 +2395,7 @@ retry_avoidcopy:
 	new_page = alloc_huge_page(vma, address, outside_reserve);
 
 	if (IS_ERR(new_page)) {
+		int err = PTR_ERR(new_page);
 		page_cache_release(old_page);
 
 		/*
@@ -2424,7 +2425,10 @@ retry_avoidcopy:
 
 		/* Caller expects lock to be held */
 		spin_lock(&mm->page_table_lock);
-		return -PTR_ERR(new_page);
+		if (err == -ENOMEM)
+			return VM_FAULT_OOM;
+		else
+			return VM_FAULT_SIGBUS;
 	}
 
 	/*
@@ -2542,7 +2546,11 @@ retry:
 			goto out;
 		page = alloc_huge_page(vma, address, 0);
 		if (IS_ERR(page)) {
-			ret = -PTR_ERR(page);
+			ret = PTR_ERR(page);
+			if (ret == -ENOMEM)
+				ret = VM_FAULT_OOM;
+			else
+				ret = VM_FAULT_SIGBUS;
 			goto out;
 		}
 		clear_huge_page(page, address, pages_per_huge_page(h));
-- 
1.7.9



* [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Add an inline helper and use it in the code.
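
The helper is plain pointer arithmetic; a stand-alone illustration
(with a toy hstates array, not the kernel's) behaves like this:

#include <stdio.h>

struct hstate { unsigned int order; };
static struct hstate hstates[3];

/* the index of an hstate is its offset from the base of the array */
static inline int hstate_index(struct hstate *h)
{
	return h - hstates;
}

int main(void)
{
	printf("%d\n", hstate_index(&hstates[2]));	/* prints 2 */
	return 0;
}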

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |    6 ++++++
 mm/hugetlb.c            |   18 ++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d9d6c86..a2675b0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -311,6 +311,11 @@ static inline unsigned hstate_index_to_shift(unsigned index)
 	return hstates[index].order + PAGE_SHIFT;
 }
 
+static inline int hstate_index(struct hstate *h)
+{
+	return h - hstates;
+}
+
 #else
 struct hstate {};
 #define alloc_huge_page_node(h, nid) NULL
@@ -329,6 +334,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 	return 1;
 }
 #define hstate_index_to_shift(index) 0
+#define hstate_index(h) 0
 #endif
 
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3782da8..ebe245c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1557,7 +1557,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
 				    struct attribute_group *hstate_attr_group)
 {
 	int retval;
-	int hi = h - hstates;
+	int hi = hstate_index(h);
 
 	hstate_kobjs[hi] = kobject_create_and_add(h->name, parent);
 	if (!hstate_kobjs[hi])
@@ -1652,11 +1652,13 @@ void hugetlb_unregister_node(struct node *node)
 	if (!nhs->hugepages_kobj)
 		return;		/* no hstate attributes */
 
-	for_each_hstate(h)
-		if (nhs->hstate_kobjs[h - hstates]) {
-			kobject_put(nhs->hstate_kobjs[h - hstates]);
-			nhs->hstate_kobjs[h - hstates] = NULL;
+	for_each_hstate(h) {
+		int idx = hstate_index(h);
+		if (nhs->hstate_kobjs[idx]) {
+			kobject_put(nhs->hstate_kobjs[idx]);
+			nhs->hstate_kobjs[idx] = NULL;
 		}
+	}
 
 	kobject_put(nhs->hugepages_kobj);
 	nhs->hugepages_kobj = NULL;
@@ -1759,7 +1761,7 @@ static void __exit hugetlb_exit(void)
 	hugetlb_unregister_all_nodes();
 
 	for_each_hstate(h) {
-		kobject_put(hstate_kobjs[h - hstates]);
+		kobject_put(hstate_kobjs[hstate_index(h)]);
 	}
 
 	kobject_put(hugepages_kobj);
@@ -2587,7 +2589,7 @@ retry:
 		 */
 		if (unlikely(PageHWPoison(page))) {
 			ret = VM_FAULT_HWPOISON |
-			      VM_FAULT_SET_HINDEX(h - hstates);
+				VM_FAULT_SET_HINDEX(hstate_index(h));
 			goto backout_unlocked;
 		}
 	}
@@ -2660,7 +2662,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
-			       VM_FAULT_SET_HINDEX(h - hstates);
+				VM_FAULT_SET_HINDEX(hstate_index(h));
 	}
 
 	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
-- 
1.7.9



* [PATCH -V4 04/10] memcg: Add HugeTLB extension
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch implements a memcg extension that allows us to control
HugeTLB allocations via the memory controller.
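
The per-hstate counters are res_counters wired to the parent cgroup's
counters, so charges propagate up the hierarchy. A simplified
user-space model of that behaviour (illustrative only; the kernel's
res_counter also takes locks, tracks maximum usage, and unwinds
partial charges on failure):

#include <stdio.h>
#include <errno.h>

struct res_counter {
	unsigned long usage, limit;
	struct res_counter *parent;
};

static int res_counter_charge(struct res_counter *c, unsigned long val)
{
	struct res_counter *i;
	/* every ancestor must have room for the charge */
	for (i = c; i; i = i->parent)
		if (i->usage + val > i->limit)
			return -ENOMEM;
	for (i = c; i; i = i->parent)
		i->usage += val;
	return 0;
}

int main(void)
{
	struct res_counter parent = { 0, 4UL << 20, NULL };
	struct res_counter child  = { 0, 8UL << 20, &parent };

	/* the child allows 8MB, but the 4MB parent limit still binds */
	printf("%d\n", res_counter_charge(&child, 6UL << 20)); /* -12 */
	return 0;
}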

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h    |    1 +
 include/linux/memcontrol.h |   42 +++++++++++++
 init/Kconfig               |    8 +++
 mm/hugetlb.c               |    2 +-
 mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 190 insertions(+), 1 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a2675b0..1f70068 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -243,6 +243,7 @@ struct hstate *size_to_hstate(unsigned long size);
 #define HUGE_MAX_HSTATE 1
 #endif
 
+extern int hugetlb_max_hstate;
 extern struct hstate hstates[HUGE_MAX_HSTATE];
 extern unsigned int default_hstate_idx;
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4d34356..320dbad 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -429,5 +429,47 @@ static inline void sock_release_memcg(struct sock *sk)
 {
 }
 #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
+
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+extern int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
+					  struct mem_cgroup **ptr);
+extern void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
+					     struct mem_cgroup *memcg,
+					     struct page *page);
+extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
+					     struct page *page);
+extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
+					      struct mem_cgroup *memcg);
+
+#else
+static inline int
+mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
+						 struct mem_cgroup **ptr)
+{
+	return 0;
+}
+
+static inline void
+mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
+				 struct mem_cgroup *memcg,
+				 struct page *page)
+{
+	return;
+}
+
+static inline void
+mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
+				 struct page *page)
+{
+	return;
+}
+
+static inline void
+mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
+				  struct mem_cgroup *memcg)
+{
+	return;
+}
+#endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif /* _LINUX_MEMCONTROL_H */
 
diff --git a/init/Kconfig b/init/Kconfig
index 3f42cd6..f0eb8aa 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -725,6 +725,14 @@ config CGROUP_PERF
 
 	  Say N if unsure.
 
+config MEM_RES_CTLR_HUGETLB
+	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
+	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
+	default n
+	help
+	  Add HugeTLB management to memory resource controller. When you
+	  enable this, you can put a per cgroup limit on HugeTLB usage.
+
 menuconfig CGROUP_SCHED
 	bool "Group CPU scheduler"
 	default n
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ebe245c..c672187 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int hugetlb_max_hstate;
+int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6728a7a..4b36c5e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -235,6 +235,10 @@ struct mem_cgroup {
 	 */
 	struct res_counter memsw;
 	/*
+	 * the counter to account for hugepages from hugetlb.
+	 */
+	struct res_counter hugepage[HUGE_MAX_HSTATE];
+	/*
 	 * Per cgroup active and inactive list, similar to the
 	 * per zone LRU lists.
 	 */
@@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
 }
 #endif
 
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
+{
+	int idx;
+	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
+		if (memcg->hugepage[idx].usage > 0)
+			return 1;
+	}
+	return 0;
+}
+
+int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
+				   struct mem_cgroup **ptr)
+{
+	int ret = 0;
+	struct mem_cgroup *memcg;
+	struct res_counter *fail_res;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (mem_cgroup_disabled())
+		return 0;
+again:
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	if (!memcg)
+		memcg = root_mem_cgroup;
+	if (mem_cgroup_is_root(memcg)) {
+		rcu_read_unlock();
+		goto done;
+	}
+	if (!css_tryget(&memcg->css)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+
+	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
+	css_put(&memcg->css);
+done:
+	*ptr = memcg;
+	return ret;
+}
+
+void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
+				      struct mem_cgroup *memcg,
+				      struct page *page)
+{
+	struct page_cgroup *pc;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	pc = lookup_page_cgroup(page);
+	lock_page_cgroup(pc);
+	if (unlikely(PageCgroupUsed(pc))) {
+		unlock_page_cgroup(pc);
+		mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg);
+		return;
+	}
+	pc->mem_cgroup = memcg;
+	/*
+	 * We access a page_cgroup asynchronously without lock_page_cgroup().
+	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
+	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
+	 * before USED bit, we need memory barrier here.
+	 * See mem_cgroup_add_lru_list(), etc.
+	 */
+	smp_wmb();
+	SetPageCgroupUsed(pc);
+
+	unlock_page_cgroup(pc);
+	return;
+}
+
+void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
+				      struct page *page)
+{
+	struct page_cgroup *pc;
+	struct mem_cgroup *memcg;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	pc = lookup_page_cgroup(page);
+	if (unlikely(!PageCgroupUsed(pc)))
+		return;
+
+	lock_page_cgroup(pc);
+	if (!PageCgroupUsed(pc)) {
+		unlock_page_cgroup(pc);
+		return;
+	}
+	memcg = pc->mem_cgroup;
+	pc->mem_cgroup = root_mem_cgroup;
+	ClearPageCgroupUsed(pc);
+	unlock_page_cgroup(pc);
+
+	if (!mem_cgroup_is_root(memcg))
+		res_counter_uncharge(&memcg->hugepage[idx], csize);
+	return;
+}
+
+void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
+				       struct mem_cgroup *memcg)
+{
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	if (!mem_cgroup_is_root(memcg))
+		res_counter_uncharge(&memcg->hugepage[idx], csize);
+	return;
+}
+#else
+static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
+{
+	return 0;
+}
+#endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
+
 /*
  * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
  * page belongs to.
@@ -4887,6 +5013,7 @@ err_cleanup:
 static struct cgroup_subsys_state * __ref
 mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 {
+	int idx;
 	struct mem_cgroup *memcg, *parent;
 	long error = -ENOMEM;
 	int node;
@@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 		 * mem_cgroup(see mem_cgroup_put).
 		 */
 		mem_cgroup_get(parent);
+		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+			res_counter_init(&memcg->hugepage[idx],
+					 &parent->hugepage[idx]);
 	} else {
 		res_counter_init(&memcg->res, NULL);
 		res_counter_init(&memcg->memsw, NULL);
+		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+			res_counter_init(&memcg->hugepage[idx], NULL);
 	}
 	memcg->last_scanned_node = MAX_NUMNODES;
 	INIT_LIST_HEAD(&memcg->oom_notify);
@@ -4951,6 +5083,12 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
 					struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
+	/*
+	 * Don't allow memcg removal if we have HugeTLB resource
+	 * usage.
+	 */
+	if (mem_cgroup_have_hugetlb_usage(memcg))
+		return -EBUSY;
 
 	return mem_cgroup_force_empty(memcg, false);
 }
-- 
1.7.9



* [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This adds the necessary charge/uncharge calls in the HugeTLB code.
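
A toy user-space model of the ordering these calls establish (charge
before a page is handed out, uncharge from the compound-page
destructor; the limit and helpers are made up for illustration):

#include <stdio.h>
#include <errno.h>

static long usage, limit = 2;		/* counted in huge pages here */

static int charge(void)
{
	if (usage + 1 > limit)
		return -ENOSPC;
	usage++;
	return 0;
}

static void uncharge(void) { usage--; }

static int alloc_huge_page(void)
{
	if (charge())
		return -ENOSPC;	/* the fault path turns this into SIGBUS */
	/* ... dequeue or allocate; on failure the charge is returned ... */
	return 0;
}

static void free_huge_page(void)	/* the compound-page destructor */
{
	uncharge();
}

int main(void)
{
	printf("%d %d %d\n", alloc_huge_page(), alloc_huge_page(),
	       alloc_huge_page());		/* 0 0 -28 */
	free_huge_page();
	printf("%d\n", alloc_huge_page());	/* 0: room again */
	return 0;
}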

Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c    |   21 ++++++++++++++++++++-
 mm/memcontrol.c |    5 +++++
 2 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c672187..91361a0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -21,6 +21,8 @@
 #include <linux/rmap.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/memcontrol.h>
+#include <linux/page_cgroup.h>
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -542,6 +544,9 @@ static void free_huge_page(struct page *page)
 	BUG_ON(page_mapcount(page));
 	INIT_LIST_HEAD(&page->lru);
 
+	if (mapping)
+		mem_cgroup_hugetlb_uncharge_page(hstate_index(h),
+						 pages_per_huge_page(h), page);
 	spin_lock(&hugetlb_lock);
 	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
 		update_and_free_page(h, page);
@@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h,
 static struct page *alloc_huge_page(struct vm_area_struct *vma,
 				    unsigned long addr, int avoid_reserve)
 {
+	int ret, idx;
 	struct hstate *h = hstate_vma(vma);
 	struct page *page;
+	struct mem_cgroup *memcg = NULL;
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	struct inode *inode = mapping->host;
 	long chg;
 
+	idx = hstate_index(h);
 	/*
 	 * Processes that did not create the mapping will have no reserves and
 	 * will not have accounted against quota. Check that the quota can be
@@ -1039,6 +1047,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		if (hugetlb_get_quota(inode->i_mapping, chg))
 			return ERR_PTR(-ENOSPC);
 
+	ret = mem_cgroup_hugetlb_charge_page(idx, pages_per_huge_page(h),
+					     &memcg);
+	if (ret) {
+		hugetlb_put_quota(inode->i_mapping, chg);
+		return ERR_PTR(-ENOSPC);
+	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
 	spin_unlock(&hugetlb_lock);
@@ -1046,6 +1060,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	if (!page) {
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
+			mem_cgroup_hugetlb_uncharge_memcg(idx,
+							 pages_per_huge_page(h),
+							 memcg);
 			hugetlb_put_quota(inode->i_mapping, chg);
 			return ERR_PTR(-ENOSPC);
 		}
@@ -1054,7 +1071,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	set_page_private(page, (unsigned long) mapping);
 
 	vma_commit_reservation(h, vma, addr);
-
+	/* update page cgroup details */
+	mem_cgroup_hugetlb_commit_charge(idx, pages_per_huge_page(h),
+					 memcg, page);
 	return page;
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4b36c5e..7a9ea94 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2901,6 +2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 
 	if (PageSwapCache(page))
 		return NULL;
+	/*
+	 * HugeTLB page uncharge happen in the HugeTLB compound page destructor
+	 */
+	if (PageHuge(page))
+		return NULL;
 
 	if (PageTransHuge(page)) {
 		nr_pages <<= compound_order(page);
-- 
1.7.9



* [PATCH -V4 06/10] memcg: track resource index in cftype private
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This helps in using the same memcg callbacks for non-reclaim
resource control files.
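
The packing stores the res_counter index in the top byte of
cft->private; a stand-alone check of the encode/decode round trip
(the macros mirror the patch, the example values are arbitrary):

#include <stdio.h>

/* |31 idx 24|23 type 16|15 attr 0|, as in the patch */
#define __MEMFILE_PRIVATE(idx, x, val)	(((idx) << 24) | ((x) << 16) | (val))
#define MEMFILE_TYPE(val)		(((val) >> 16) & 0xff)
#define MEMFILE_IDX(val)		(((val) >> 24) & 0xff)
#define MEMFILE_ATTR(val)		((val) & 0xffff)

int main(void)
{
	int priv = __MEMFILE_PRIVATE(2, 3 /* _MEMHUGETLB */, 0 /* RES_USAGE */);
	printf("idx=%d type=%d attr=%d\n",	/* idx=2 type=3 attr=0 */
	       MEMFILE_IDX(priv), MEMFILE_TYPE(priv), MEMFILE_ATTR(priv));
	return 0;
}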

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/memcontrol.c |   27 +++++++++++++++++++++------
 1 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7a9ea94..d8b3513 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -358,9 +358,14 @@ enum charge_type {
 #define _MEM			(0)
 #define _MEMSWAP		(1)
 #define _OOM_TYPE		(2)
-#define MEMFILE_PRIVATE(x, val)	(((x) << 16) | (val))
-#define MEMFILE_TYPE(val)	(((val) >> 16) & 0xffff)
-#define MEMFILE_ATTR(val)	((val) & 0xffff)
+#define _MEMHUGETLB		(3)
+
+/*  0 ... val ...16.... x...24...idx...32*/
+#define __MEMFILE_PRIVATE(idx, x, val)	(((idx) << 24) | ((x) << 16) | (val))
+#define MEMFILE_PRIVATE(x, val)		__MEMFILE_PRIVATE(0, x, val)
+#define MEMFILE_TYPE(val)		(((val) >> 16) & 0xff)
+#define MEMFILE_IDX(val)		(((val) >> 24) & 0xff)
+#define MEMFILE_ATTR(val)		((val) & 0xffff)
 /* Used for OOM nofiier */
 #define OOM_CONTROL		(0)
 
@@ -3954,7 +3959,7 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 	u64 val;
-	int type, name;
+	int type, name, idx;
 
 	type = MEMFILE_TYPE(cft->private);
 	name = MEMFILE_ATTR(cft->private);
@@ -3971,6 +3976,10 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
 		else
 			val = res_counter_read_u64(&memcg->memsw, name);
 		break;
+	case _MEMHUGETLB:
+		idx = MEMFILE_IDX(cft->private);
+		val = res_counter_read_u64(&memcg->hugepage[idx], name);
+		break;
 	default:
 		BUG();
 		break;
@@ -4003,7 +4012,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 			break;
 		if (type == _MEM)
 			ret = mem_cgroup_resize_limit(memcg, val);
-		else
+		else if (type == _MEMHUGETLB) {
+			int idx = MEMFILE_IDX(cft->private);
+			ret = res_counter_set_limit(&memcg->hugepage[idx], val);
+		} else
 			ret = mem_cgroup_resize_memsw_limit(memcg, val);
 		break;
 	case RES_SOFT_LIMIT:
@@ -4067,7 +4079,10 @@ static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
 	case RES_MAX_USAGE:
 		if (type == _MEM)
 			res_counter_reset_max(&memcg->res);
-		else
+		else if (type == _MEMHUGETLB) {
+			int idx = MEMFILE_IDX(event);
+			res_counter_reset_max(&memcg->hugepage[idx]);
+		} else
 			res_counter_reset_max(&memcg->memsw);
 		break;
 	case RES_FAILCNT:
-- 
1.7.9



* [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This adds memcg control files for hugetlbfs.
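
The file names are derived from the huge page size, so each hstate
gets its own trio of files. A stand-alone run of the naming logic
(mem_fmt() is the same as in the patch below; the page sizes are
example values):

#include <stdio.h>

static char *mem_fmt(char *buf, unsigned long n)
{
	if (n >= (1UL << 30))
		sprintf(buf, "%luGB", n >> 30);
	else if (n >= (1UL << 20))
		sprintf(buf, "%luMB", n >> 20);
	else
		sprintf(buf, "%luKB", n >> 10);
	return buf;
}

int main(void)
{
	char buf[32];
	/* a 2MB and a 1GB hstate produce these control files: */
	printf("hugetlb.%s.limit_in_bytes\n", mem_fmt(buf, 2UL << 20));
	printf("hugetlb.%s.usage_in_bytes\n", mem_fmt(buf, 1UL << 30));
	printf("hugetlb.%s.max_usage_in_bytes\n", mem_fmt(buf, 1UL << 30));
	return 0;
}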

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h    |   17 +++++++++++++++
 include/linux/memcontrol.h |    7 ++++++
 mm/hugetlb.c               |   25 ++++++++++++++++++++++-
 mm/memcontrol.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+), 1 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 1f70068..cbd8dc5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -4,6 +4,7 @@
 #include <linux/mm_types.h>
 #include <linux/fs.h>
 #include <linux/hugetlb_inline.h>
+#include <linux/cgroup.h>
 
 struct ctl_table;
 struct user_struct;
@@ -220,6 +221,12 @@ struct hstate {
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+	/* mem cgroup control files */
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+	struct cftype cgroup_limit_file;
+	struct cftype cgroup_usage_file;
+	struct cftype cgroup_max_usage_file;
+#endif
 	char name[HSTATE_NAME_LEN];
 };
 
@@ -338,4 +345,14 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 #define hstate_index(h) 0
 #endif
 
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+extern int register_hugetlb_memcg_files(struct cgroup *cgroup,
+					struct cgroup_subsys *ss);
+#else
+static inline int register_hugetlb_memcg_files(struct cgroup *cgroup,
+					       struct cgroup_subsys *ss)
+{
+	return 0;
+}
+#endif
 #endif /* _LINUX_HUGETLB_H */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 320dbad..73900b9 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -440,6 +440,7 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
 					     struct page *page);
 extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
 					      struct mem_cgroup *memcg);
+extern int mem_cgroup_hugetlb_file_init(int idx);
 
 #else
 static inline int
@@ -470,6 +471,12 @@ mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
 {
 	return;
 }
+
+static inline int mem_cgroup_hugetlb_file_init(int idx)
+{
+	return 0;
+}
+
 #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif /* _LINUX_MEMCONTROL_H */
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 91361a0..684849a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1819,6 +1819,29 @@ static int __init hugetlb_init(void)
 }
 module_init(hugetlb_init);
 
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+int register_hugetlb_memcg_files(struct cgroup *cgroup,
+				 struct cgroup_subsys *ss)
+{
+	int ret = 0;
+	struct hstate *h;
+
+	for_each_hstate(h) {
+		ret = cgroup_add_file(cgroup, ss, &h->cgroup_limit_file);
+		if (ret)
+			return ret;
+		ret = cgroup_add_file(cgroup, ss, &h->cgroup_usage_file);
+		if (ret)
+			return ret;
+		ret = cgroup_add_file(cgroup, ss, &h->cgroup_max_usage_file);
+		if (ret)
+			return ret;
+
+	}
+	return ret;
+}
+#endif
+
 /* Should be called on processing a hugepagesz=... option */
 void __init hugetlb_add_hstate(unsigned order)
 {
@@ -1842,7 +1865,7 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 					huge_page_size(h)/1024);
-
+	mem_cgroup_hugetlb_file_init(hugetlb_max_hstate - 1);
 	parsed_hstate = h;
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d8b3513..4900b72 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5123,6 +5123,51 @@ static void mem_cgroup_destroy(struct cgroup_subsys *ss,
 	mem_cgroup_put(memcg);
 }
 
+#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
+static char *mem_fmt(char *buf, unsigned long n)
+{
+	if (n >= (1UL << 30))
+		sprintf(buf, "%luGB", n >> 30);
+	else if (n >= (1UL << 20))
+		sprintf(buf, "%luMB", n >> 20);
+	else
+		sprintf(buf, "%luKB", n >> 10);
+	return buf;
+}
+
+int mem_cgroup_hugetlb_file_init(int idx)
+{
+	char buf[32];
+	struct cftype *cft;
+	struct hstate *h = &hstates[idx];
+
+	/* format the size */
+	mem_fmt(buf, huge_page_size(h));
+
+	/* Add the limit file */
+	cft = &h->cgroup_limit_file;
+	snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.limit_in_bytes", buf);
+	cft->private = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_LIMIT);
+	cft->read_u64 = mem_cgroup_read;
+	cft->write_string = mem_cgroup_write;
+
+	/* Add the usage file */
+	cft = &h->cgroup_usage_file;
+	snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.usage_in_bytes", buf);
+	cft->private  = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_USAGE);
+	cft->read_u64 = mem_cgroup_read;
+
+	/* Add the MAX usage file */
+	cft = &h->cgroup_max_usage_file;
+	snprintf(cft->name, MAX_CFTYPE_NAME, "hugetlb.%s.max_usage_in_bytes", buf);
+	cft->private  = __MEMFILE_PRIVATE(idx, _MEMHUGETLB, RES_MAX_USAGE);
+	cft->trigger  = mem_cgroup_reset;
+	cft->read_u64 = mem_cgroup_read;
+
+	return 0;
+}
+#endif
+
 static int mem_cgroup_populate(struct cgroup_subsys *ss,
 				struct cgroup *cont)
 {
@@ -5137,6 +5182,9 @@ static int mem_cgroup_populate(struct cgroup_subsys *ss,
 	if (!ret)
 		ret = register_kmem_files(cont, ss);
 
+	if (!ret)
+		ret = register_hugetlb_memcg_files(cont, ss);
+
 	return ret;
 }
 
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 130+ messages in thread
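
For reference, the control files added above are generated per hstate
rather than declared statically: mem_cgroup_hugetlb_file_init() builds one
limit/usage/max_usage trio for every huge page size found at boot. The
standalone C sketch below reproduces just that naming logic; the 2MB and
1GB sizes are illustrative assumptions, not values taken from the patch.

#include <stdio.h>

/*
 * Userspace sketch of the name construction done by
 * mem_cgroup_hugetlb_file_init(): format the huge page size the way
 * mem_fmt() does, then derive the three per-hstate control file names.
 */
static char *mem_fmt(char *buf, unsigned long n)
{
	if (n >= (1UL << 30))
		sprintf(buf, "%luGB", n >> 30);
	else if (n >= (1UL << 20))
		sprintf(buf, "%luMB", n >> 20);
	else
		sprintf(buf, "%luKB", n >> 10);
	return buf;
}

int main(void)
{
	static const char *suffix[] = {
		"limit_in_bytes", "usage_in_bytes", "max_usage_in_bytes",
	};
	unsigned long size[] = { 2UL << 20, 1UL << 30 };	/* 2MB, 1GB */
	char buf[32];
	unsigned int i, j;

	for (i = 0; i < 2; i++)
		for (j = 0; j < 3; j++)
			printf("hugetlb.%s.%s\n",
			       mem_fmt(buf, size[i]), suffix[j]);
	return 0;
}

Compiled with plain cc, this prints hugetlb.2MB.limit_in_bytes and
friends, which is exactly the set of names the loop in
register_hugetlb_memcg_files() later registers with cgroup_add_file().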


* [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages
  2012-03-16 17:39 ` Aneesh Kumar K.V
@ 2012-03-16 17:39   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

hugepage_activelist will be used to track HugeTLB pages that are
currently in use. We need to be able to find the in-use HugeTLB pages
to support memcg removal: on memcg removal we update each page's
memory cgroup to point to the parent cgroup.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |    1 +
 mm/hugetlb.c            |   23 ++++++++++++++++++-----
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cbd8dc5..6919100 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -217,6 +217,7 @@ struct hstate {
 	unsigned long resv_huge_pages;
 	unsigned long surplus_huge_pages;
 	unsigned long nr_overcommit_huge_pages;
+	struct list_head hugepage_activelist;
 	struct list_head hugepage_freelists[MAX_NUMNODES];
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 684849a..8fd465d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -433,7 +433,7 @@ void copy_huge_page(struct page *dst, struct page *src)
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
-	list_add(&page->lru, &h->hugepage_freelists[nid]);
+	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
 }
@@ -445,7 +445,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
 	if (list_empty(&h->hugepage_freelists[nid]))
 		return NULL;
 	page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
-	list_del(&page->lru);
+	list_move(&page->lru, &h->hugepage_activelist);
 	set_page_refcounted(page);
 	h->free_huge_pages--;
 	h->free_huge_pages_node[nid]--;
@@ -542,13 +542,14 @@ static void free_huge_page(struct page *page)
 	page->mapping = NULL;
 	BUG_ON(page_count(page));
 	BUG_ON(page_mapcount(page));
-	INIT_LIST_HEAD(&page->lru);
 
 	if (mapping)
 		mem_cgroup_hugetlb_uncharge_page(hstate_index(h),
 						 pages_per_huge_page(h), page);
 	spin_lock(&hugetlb_lock);
 	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
+		/* remove the page from active list */
+		list_del(&page->lru);
 		update_and_free_page(h, page);
 		h->surplus_huge_pages--;
 		h->surplus_huge_pages_node[nid]--;
@@ -562,6 +563,7 @@ static void free_huge_page(struct page *page)
 
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
+	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, free_huge_page);
 	spin_lock(&hugetlb_lock);
 	h->nr_huge_pages++;
@@ -1861,6 +1863,7 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->free_huge_pages = 0;
 	for (i = 0; i < MAX_NUMNODES; ++i)
 		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
+	INIT_LIST_HEAD(&h->hugepage_activelist);
 	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
@@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 		page = pte_page(pte);
 		if (pte_dirty(pte))
 			set_page_dirty(page);
-		list_add(&page->lru, &page_list);
+
+		spin_lock(&hugetlb_lock);
+		list_move(&page->lru, &page_list);
+		spin_unlock(&hugetlb_lock);
 	}
 	spin_unlock(&mm->page_table_lock);
 	flush_tlb_range(vma, start, end);
 	mmu_notifier_invalidate_range_end(mm, start, end);
 	list_for_each_entry_safe(page, tmp, &page_list, lru) {
 		page_remove_rmap(page);
-		list_del(&page->lru);
+		/*
+		 * We need to move it back to the huge page active list. If
+		 * we are holding the last reference, the put_page below will
+		 * move it back to the free list.
+		 */
+		spin_lock(&hugetlb_lock);
+		list_move(&page->lru, &h->hugepage_activelist);
+		spin_unlock(&hugetlb_lock);
 		put_page(page);
 	}
 }
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 130+ messages in thread
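
The switch from list_add()/list_del() to list_move() above relies on an
invariant this patch establishes: once prep_new_huge_page() has called
INIT_LIST_HEAD(), a huge page's ->lru is always linked on exactly one
list, either a per-node free list or hugepage_activelist. Below is a
self-contained userspace model of that invariant, re-implementing just
enough of <linux/list.h>; it is a sketch, not the kernel code itself.

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* legal whenever n is linked; a fresh INIT_LIST_HEAD node links to itself */
static void list_move(struct list_head *n, struct list_head *h)
{
	list_del(n);
	list_add(n, h);
}

static int list_empty(const struct list_head *h) { return h->next == h; }

int main(void)
{
	struct list_head freelist, activelist, page;

	INIT_LIST_HEAD(&freelist);
	INIT_LIST_HEAD(&activelist);

	INIT_LIST_HEAD(&page);		/* prep_new_huge_page() */
	list_move(&page, &freelist);	/* enqueue_huge_page() */
	list_move(&page, &activelist);	/* dequeue_huge_page_node() */
	list_move(&page, &freelist);	/* free_huge_page() */

	printf("active empty: %d, free empty: %d\n",
	       list_empty(&activelist), list_empty(&freelist));
	return 0;
}

Because a freshly initialized node points at itself, even the very first
list_move() is safe, which is why the patch can use list_move() uniformly.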


* [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal
  2012-03-16 17:39 ` Aneesh Kumar K.V
@ 2012-03-16 17:39   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This adds support for removing a memcg that still has HugeTLB
resource usage; the charges are moved to the parent cgroup.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h    |    6 ++++
 include/linux/memcontrol.h |   15 +++++++++-
 mm/hugetlb.c               |   41 ++++++++++++++++++++++++++
 mm/memcontrol.c            |   68 +++++++++++++++++++++++++++++++++++++------
 4 files changed, 119 insertions(+), 11 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6919100..32e948c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -349,11 +349,17 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 #ifdef CONFIG_MEM_RES_CTLR_HUGETLB
 extern int register_hugetlb_memcg_files(struct cgroup *cgroup,
 					struct cgroup_subsys *ss);
+extern int hugetlb_force_memcg_empty(struct cgroup *cgroup);
 #else
 static inline int register_hugetlb_memcg_files(struct cgroup *cgroup,
 					       struct cgroup_subsys *ss)
 {
 	return 0;
 }
+
+static inline int hugetlb_force_memcg_empty(struct cgroup *cgroup)
+{
+	return 0;
+}
 #endif
 #endif /* _LINUX_HUGETLB_H */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 73900b9..0980122 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -441,7 +441,9 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
 extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
 					      struct mem_cgroup *memcg);
 extern int mem_cgroup_hugetlb_file_init(int idx);
-
+extern int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
+					  struct page *page);
+extern bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup);
 #else
 static inline int
 mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
@@ -477,6 +479,17 @@ static inline int mem_cgroup_hugetlb_file_init(int idx)
 	return 0;
 }
 
+static inline int
+mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
+			       struct page *page)
+{
+	return 0;
+}
+
+static inline bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup)
+{
+	return 0;
+}
 #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif /* _LINUX_MEMCONTROL_H */
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8fd465d..685f0d5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1842,6 +1842,47 @@ int register_hugetlb_memcg_files(struct cgroup *cgroup,
 	}
 	return ret;
 }
+
+/*
+ * Force the memcg to empty its hugetlb resources by moving them to
+ * the parent cgroup. This can fail if the parent cgroup's limit
+ * prevents the charge; that should only happen if use_hierarchy is not set.
+ */
+int hugetlb_force_memcg_empty(struct cgroup *cgroup)
+{
+	struct hstate *h;
+	struct page *page;
+	int ret = 0, idx = 0;
+
+	do {
+		if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children))
+			goto out;
+		/*
+		 * If the task doing the cgroup_rmdir got a signal,
+		 * we don't really need to loop until the hugetlb
+		 * resource usage becomes zero.
+		 */
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			goto out;
+		}
+		for_each_hstate(h) {
+			spin_lock(&hugetlb_lock);
+			list_for_each_entry(page, &h->hugepage_activelist, lru) {
+				ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page);
+				if (ret) {
+					spin_unlock(&hugetlb_lock);
+					goto out;
+				}
+			}
+			spin_unlock(&hugetlb_lock);
+			idx++;
+		}
+		cond_resched();
+	} while (mem_cgroup_have_hugetlb_usage(cgroup));
+out:
+	return ret;
+}
 #endif
 
 /* Should be called on processing a hugepagesz=... option */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4900b72..e29d86d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3171,9 +3171,11 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
 #endif
 
 #ifdef CONFIG_MEM_RES_CTLR_HUGETLB
-static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
+bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup)
 {
 	int idx;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
+
 	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
 		if (memcg->hugepage[idx].usage > 0)
 			return 1;
@@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
 		res_counter_uncharge(&memcg->hugepage[idx], csize);
 	return;
 }
-#else
-static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
+
+int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
+				   struct page *page)
 {
-	return 0;
+	struct page_cgroup *pc;
+	int csize,  ret = 0;
+	struct res_counter *fail_res;
+	struct cgroup *pcgrp = cgroup->parent;
+	struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp);
+	struct mem_cgroup *memcg  = mem_cgroup_from_cont(cgroup);
+
+	if (!get_page_unless_zero(page))
+		goto out;
+
+	pc = lookup_page_cgroup(page);
+	lock_page_cgroup(pc);
+	if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg)
+		goto err_out;
+
+	csize = PAGE_SIZE << compound_order(page);
+	/*
+	 * Uncharge the child and charge the parent. With use_hierarchy
+	 * set this can never fail: to make sure we don't get -ENOMEM on
+	 * the parent charge, we first uncharge the child and then charge
+	 * the parent.
+	 */
+	if (parent->use_hierarchy) {
+		res_counter_uncharge(&memcg->hugepage[idx], csize);
+		if (!mem_cgroup_is_root(parent))
+			ret = res_counter_charge(&parent->hugepage[idx],
+						 csize, &fail_res);
+	} else {
+		if (!mem_cgroup_is_root(parent)) {
+			ret = res_counter_charge(&parent->hugepage[idx],
+						 csize, &fail_res);
+			if (ret) {
+				ret = -EBUSY;
+				goto err_out;
+			}
+		}
+		res_counter_uncharge(&memcg->hugepage[idx], csize);
+	}
+	/*
+	 * caller should have done css_get
+	 */
+	pc->mem_cgroup = parent;
+err_out:
+	unlock_page_cgroup(pc);
+	put_page(page);
+out:
+	return ret;
 }
 #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
 
@@ -3806,6 +3855,11 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all)
 	/* should free all ? */
 	if (free_all)
 		goto try_to_free;
+
+	/* move the hugetlb charges */
+	ret = hugetlb_force_memcg_empty(cgrp);
+	if (ret)
+		goto out;
 move_account:
 	do {
 		ret = -EBUSY;
@@ -5103,12 +5157,6 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
 					struct cgroup *cont)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
-	/*
-	 * Don't allow memcg removal if we have HugeTLB resource
-	 * usage.
-	 */
-	if (mem_cgroup_have_hugetlb_usage(memcg))
-		return -EBUSY;
 
 	return mem_cgroup_force_empty(memcg, false);
 }
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 130+ messages in thread
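
The subtle part of this patch is the ordering inside
mem_cgroup_move_hugetlb_parent(): with use_hierarchy set the child is
uncharged before the parent is charged, while without it the parent charge
is attempted first and the move is refused with -EBUSY when the parent's
limit would be exceeded. The toy model below captures just that ordering;
the counters are deliberately simplified stand-ins for res_counter and the
sizes are illustrative.

#include <stdio.h>

#define EBUSY 16

struct toy_counter {
	unsigned long usage;
	unsigned long limit;
};

static int toy_charge(struct toy_counter *c, unsigned long sz)
{
	if (c->usage + sz > c->limit)
		return -1;
	c->usage += sz;
	return 0;
}

static void toy_uncharge(struct toy_counter *c, unsigned long sz)
{
	c->usage -= sz;
}

static int toy_move_parent(struct toy_counter *child,
			   struct toy_counter *parent,
			   unsigned long csize, int use_hierarchy)
{
	if (use_hierarchy) {
		/* uncharge the child first; see the patch comment */
		toy_uncharge(child, csize);
		(void)toy_charge(parent, csize);
		return 0;
	}
	/* no hierarchy: the parent's limit can refuse the move */
	if (toy_charge(parent, csize))
		return -EBUSY;
	toy_uncharge(child, csize);
	return 0;
}

int main(void)
{
	struct toy_counter child  = { 1UL << 24, 1UL << 26 }; /* 16MB of 64MB */
	struct toy_counter parent = { 0,         1UL << 23 }; /* 8MB limit */

	printf("move without hierarchy: %d (expect -EBUSY = -16)\n",
	       toy_move_parent(&child, &parent, 1UL << 24, 0));
	return 0;
}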


* [PATCH -V4 10/10] memcg: Add memory controller documentation for hugetlb management
  2012-03-16 17:39 ` Aneesh Kumar K.V
@ 2012-03-16 17:39   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-16 17:39 UTC (permalink / raw)
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, mhocko,
	akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 Documentation/cgroups/memory.txt |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 4c95c00..d99c41b 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -43,6 +43,7 @@ Features:
  - usage threshold notifier
  - oom-killer disable knob and oom-notifier
  - Root cgroup has no limit controls.
+ - resource accounting for HugeTLB pages
 
  Kernel memory support is work in progress, and the current version provides
  basically functionality. (See Section 2.7)
@@ -75,6 +76,12 @@ Brief summary of control files.
  memory.kmem.tcp.limit_in_bytes  # set/show hard limit for tcp buf memory
  memory.kmem.tcp.usage_in_bytes  # show current tcp buf memory allocation
 
+
+ memory.hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
+ memory.hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
+ memory.hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
+						  # see 5.7 for details
+
 1. History
 
 The memory controller has a long history. A request for comments for the memory
@@ -279,6 +286,15 @@ per cgroup, instead of globally.
 
 * tcp memory pressure: sockets memory pressure for the tcp protocol.
 
+2.8 HugeTLB extension
+
+This extension makes it possible to limit HugeTLB usage per control group
+and enforces the controller limit during page faults. Since HugeTLB doesn't
+support page reclaim, enforcing the limit at page fault time implies that
+the application will get a SIGBUS signal if it tries to access HugeTLB
+pages beyond its limit. This requires the application to know beforehand
+how many HugeTLB pages it would need.
+
 3. User Interface
 
 0. Configuration
@@ -287,6 +303,7 @@ a. Enable CONFIG_CGROUPS
 b. Enable CONFIG_RESOURCE_COUNTERS
 c. Enable CONFIG_CGROUP_MEM_RES_CTLR
 d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
+e. Enable CONFIG_MEM_RES_CTLR_HUGETLB (to use HugeTLB extension)
 
 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
 # mount -t tmpfs none /sys/fs/cgroup
@@ -510,6 +527,18 @@ unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
 
 And we have total = file + anon + unevictable.
 
+5.7 HugeTLB resource control files
+For a system supporting two hugepage sizes (16MB and 16GB), the control
+files include:
+
+ memory.hugetlb.16GB.limit_in_bytes
+ memory.hugetlb.16GB.max_usage_in_bytes
+ memory.hugetlb.16GB.usage_in_bytes
+ memory.hugetlb.16MB.limit_in_bytes
+ memory.hugetlb.16MB.max_usage_in_bytes
+ memory.hugetlb.16MB.usage_in_bytes
+
+
 6. Hierarchy support
 
 The memory controller supports a deep hierarchy and hierarchical accounting.
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 130+ messages in thread
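
To make the interface concrete: once the controller is mounted and a
group has been created, a job scheduler can cap a job's HugeTLB usage by
writing the limit file before launching the job; any fault beyond the
limit then raises SIGBUS, as described in section 2.8. The userspace
sketch below shows one way to do that; the cgroup path
(/sys/fs/cgroup/memory/job42) and the 2MB huge page size are assumptions
for illustration, not part of the patch.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	const char *path =
	    "/sys/fs/cgroup/memory/job42/memory.hugetlb.2MB.limit_in_bytes";
	unsigned long limit = 64UL << 21;	/* 64 huge pages of 2MB */
	FILE *f = fopen(path, "w");

	if (!f) {
		perror("fopen");
		return EXIT_FAILURE;
	}
	fprintf(f, "%lu\n", limit);	/* parsed by the kernel's write handler */
	fclose(f);
	return EXIT_SUCCESS;
}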


* Re: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-19  2:07     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  2:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We will be using this from other subsystems like memcg
> in later patches.
> 
> Acked-by: Hillf Danton <dhillf@gmail.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 130+ messages in thread


* Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-19  2:11     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  2:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Using VM_FAULT_* codes with ERR_PTR will require us to make sure
> VM_FAULT_* values will not exceed MAX_ERRNO value.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


Is this a bug fix?
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 130+ messages in thread


* Re: [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-19  2:15     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  2:15 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Add an inline helper and use it in the code.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 130+ messages in thread


* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-19  2:38     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  2:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch implements a memcg extension that allows us to control
> HugeTLB allocations via memory controller.
> 


If you write some details here, it will be helpful for review and for
reading the log after the merge.


> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  include/linux/hugetlb.h    |    1 +
>  include/linux/memcontrol.h |   42 +++++++++++++
>  init/Kconfig               |    8 +++
>  mm/hugetlb.c               |    2 +-
>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 190 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index a2675b0..1f70068 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -243,6 +243,7 @@ struct hstate *size_to_hstate(unsigned long size);
>  #define HUGE_MAX_HSTATE 1
>  #endif
>  
> +extern int hugetlb_max_hstate;
>  extern struct hstate hstates[HUGE_MAX_HSTATE];
>  extern unsigned int default_hstate_idx;
>  
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 4d34356..320dbad 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -429,5 +429,47 @@ static inline void sock_release_memcg(struct sock *sk)
>  {
>  }
>  #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
> +
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +extern int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> +					  struct mem_cgroup **ptr);
> +extern void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
> +					     struct mem_cgroup *memcg,
> +					     struct page *page);
> +extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
> +					     struct page *page);
> +extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
> +					      struct mem_cgroup *memcg);
> +
> +#else
> +static inline int
> +mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> +						 struct mem_cgroup **ptr)
> +{
> +	return 0;
> +}
> +
> +static inline void
> +mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
> +				 struct mem_cgroup *memcg,
> +				 struct page *page)
> +{
> +	return;
> +}
> +
> +static inline void
> +mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
> +				 struct page *page)
> +{
> +	return;
> +}
> +
> +static inline void
> +mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
> +				  struct mem_cgroup *memcg)
> +{
> +	return;
> +}
> +#endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
>  #endif /* _LINUX_MEMCONTROL_H */
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index 3f42cd6..f0eb8aa 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -725,6 +725,14 @@ config CGROUP_PERF
>  
>  	  Say N if unsure.
>  
> +config MEM_RES_CTLR_HUGETLB
> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
> +	default n
> +	help
> +	  Add HugeTLB management to memory resource controller. When you
> +	  enable this, you can put a per cgroup limit on HugeTLB usage.
> +
>  menuconfig CGROUP_SCHED
>  	bool "Group CPU scheduler"
>  	default n
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ebe245c..c672187 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
>  static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
>  unsigned long hugepages_treat_as_movable;
>  
> -static int hugetlb_max_hstate;
> +int hugetlb_max_hstate;
>  unsigned int default_hstate_idx;
>  struct hstate hstates[HUGE_MAX_HSTATE];
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6728a7a..4b36c5e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -235,6 +235,10 @@ struct mem_cgroup {
>  	 */
>  	struct res_counter memsw;
>  	/*
> +	 * the counter to account for hugepages from hugetlb.
> +	 */
> +	struct res_counter hugepage[HUGE_MAX_HSTATE];




> +	/*
>  	 * Per cgroup active and inactive list, similar to the
>  	 * per zone LRU lists.
>  	 */
> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>  }
>  #endif
>  
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +{
> +	int idx;
> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> +		if (memcg->hugepage[idx].usage > 0)
> +			return 1;
> +	}
> +	return 0;
> +}


Please use res_counter_read_u64() rather than reading the value directly.
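
For the quoted check, the suggested form would read roughly as follows
(a sketch against the res_counter API of that time, not a tested change):

	if (res_counter_read_u64(&memcg->hugepage[idx], RES_USAGE) > 0)
		return 1;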


> +
> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> +				   struct mem_cgroup **ptr)
> +{
> +	int ret = 0;
> +	struct mem_cgroup *memcg;
> +	struct res_counter *fail_res;
> +	unsigned long csize = nr_pages * PAGE_SIZE;
> +
> +	if (mem_cgroup_disabled())
> +		return 0;
> +again:
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(current);
> +	if (!memcg)
> +		memcg = root_mem_cgroup;
> +	if (mem_cgroup_is_root(memcg)) {
> +		rcu_read_unlock();
> +		goto done;
> +	}


One concern: right now the memory cgroup doesn't account the root cgroup
and doesn't update res->usage, to avoid the shared-counter update overhead
when memcg is not mounted. But the memory.usage_in_bytes file still works
for the root memcg by reading the per-cpu statistics.

So, how about counting usage for the root cgroup even though it cannot be
limited? Considering typical hugetlbfs usage, updating the res_counter here
shouldn't have a false-sharing performance problem. Then you could remove
the root_mem_cgroup() checks inserted in several places.
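
Concretely, the suggestion would turn the quoted charge path into
something like the sketch below. It assumes the root memcg's res_counter
keeps its default RESOURCE_MAX limit, so the charge cannot fail for the
root cgroup; this is a rearrangement for discussion, not a tested patch.

	rcu_read_lock();
	memcg = mem_cgroup_from_task(current);
	if (!memcg)
		memcg = root_mem_cgroup;
	/*
	 * No mem_cgroup_is_root() short-circuit: usage is accounted for
	 * the root memcg too, and since its limit stays at RESOURCE_MAX
	 * the charge can only fail against a real limit.
	 */
	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
	rcu_read_unlock();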

<snip>

>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +	/*
> +	 * Don't allow memcg removal if we have HugeTLB resource
> +	 * usage.
> +	 */
> +	if (mem_cgroup_have_hugetlb_usage(memcg))
> +		return -EBUSY;
>  
>  	return mem_cgroup_force_empty(memcg, false);
>  }


Is this fixed by patches 8 and 9?



Thanks,
-Kame


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
@ 2012-03-19  2:38     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  2:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch implements a memcg extension that allows us to control
> HugeTLB allocations via memory controller.
> 


If you write some details here, it will be helpful for review and
seeing log after merge.


> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  include/linux/hugetlb.h    |    1 +
>  include/linux/memcontrol.h |   42 +++++++++++++
>  init/Kconfig               |    8 +++
>  mm/hugetlb.c               |    2 +-
>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 190 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index a2675b0..1f70068 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -243,6 +243,7 @@ struct hstate *size_to_hstate(unsigned long size);
>  #define HUGE_MAX_HSTATE 1
>  #endif
>  
> +extern int hugetlb_max_hstate;
>  extern struct hstate hstates[HUGE_MAX_HSTATE];
>  extern unsigned int default_hstate_idx;
>  
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 4d34356..320dbad 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -429,5 +429,47 @@ static inline void sock_release_memcg(struct sock *sk)
>  {
>  }
>  #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
> +
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +extern int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> +					  struct mem_cgroup **ptr);
> +extern void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
> +					     struct mem_cgroup *memcg,
> +					     struct page *page);
> +extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
> +					     struct page *page);
> +extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
> +					      struct mem_cgroup *memcg);
> +
> +#else
> +static inline int
> +mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> +						 struct mem_cgroup **ptr)
> +{
> +	return 0;
> +}
> +
> +static inline void
> +mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
> +				 struct mem_cgroup *memcg,
> +				 struct page *page)
> +{
> +	return;
> +}
> +
> +static inline void
> +mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
> +				 struct page *page)
> +{
> +	return;
> +}
> +
> +static inline void
> +mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
> +				  struct mem_cgroup *memcg)
> +{
> +	return;
> +}
> +#endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
>  #endif /* _LINUX_MEMCONTROL_H */
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index 3f42cd6..f0eb8aa 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -725,6 +725,14 @@ config CGROUP_PERF
>  
>  	  Say N if unsure.
>  
> +config MEM_RES_CTLR_HUGETLB
> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
> +	default n
> +	help
> +	  Add HugeTLB management to memory resource controller. When you
> +	  enable this, you can put a per cgroup limit on HugeTLB usage.
> +
>  menuconfig CGROUP_SCHED
>  	bool "Group CPU scheduler"
>  	default n
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ebe245c..c672187 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
>  static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
>  unsigned long hugepages_treat_as_movable;
>  
> -static int hugetlb_max_hstate;
> +int hugetlb_max_hstate;
>  unsigned int default_hstate_idx;
>  struct hstate hstates[HUGE_MAX_HSTATE];
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6728a7a..4b36c5e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -235,6 +235,10 @@ struct mem_cgroup {
>  	 */
>  	struct res_counter memsw;
>  	/*
> +	 * the counter to account for hugepages from hugetlb.
> +	 */
> +	struct res_counter hugepage[HUGE_MAX_HSTATE];




> +	/*
>  	 * Per cgroup active and inactive list, similar to the
>  	 * per zone LRU lists.
>  	 */
> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>  }
>  #endif
>  
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +{
> +	int idx;
> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> +		if (memcg->hugepage[idx].usage > 0)
> +			return 1;
> +	}
> +	return 0;
> +}


Please use res_counter_read_u64() rather than reading the value directly.


> +
> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> +				   struct mem_cgroup **ptr)
> +{
> +	int ret = 0;
> +	struct mem_cgroup *memcg;
> +	struct res_counter *fail_res;
> +	unsigned long csize = nr_pages * PAGE_SIZE;
> +
> +	if (mem_cgroup_disabled())
> +		return 0;
> +again:
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(current);
> +	if (!memcg)
> +		memcg = root_mem_cgroup;
> +	if (mem_cgroup_is_root(memcg)) {
> +		rcu_read_unlock();
> +		goto done;
> +	}


One concern: today the memory cgroup doesn't account the root cgroup and
doesn't update res->usage, to avoid the overhead of a shared counter when
memcg is not mounted. But the memory.usage_in_bytes file still works for
the root memcg by reading per-cpu statistics.

So, how about counting usage for the root cgroup even though it cannot be
limited? Given how hugetlbfs is used, updating the res_counter here won't
cause a false-sharing performance problem. Then you could remove the
root_mem_cgroup() checks inserted in several places.
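
Concretely, the charge path might then start like this (sketch only, with
the css reference handling elided):

	rcu_read_lock();
	memcg = mem_cgroup_from_task(current);
	if (!memcg)
		memcg = root_mem_cgroup;
	rcu_read_unlock();

	/*
	 * No mem_cgroup_is_root() bail-out: the root counter never gets
	 * a limit, so this charge cannot fail for the root memcg, and
	 * its usage keeps reporting real numbers.
	 */
	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
	if (!ret)
		*ptr = memcg;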

<snip>

>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +	/*
> +	 * Don't allow memcg removal if we have HugeTLB resource
> +	 * usage.
> +	 */
> +	if (mem_cgroup_have_hugetlb_usage(memcg))
> +		return -EBUSY;
>  
>  	return mem_cgroup_force_empty(memcg, false);
>  }


Is this fixed by patches 8 and 9?



Thanks,
-Kame

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-16 17:39   ` Aneesh Kumar K.V
  (?)
@ 2012-03-19  2:41     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  2:41 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This adds necessary charge/uncharge calls in the HugeTLB code
> 
> Acked-by: Hillf Danton <dhillf@gmail.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
A nitpick below.

> ---
>  mm/hugetlb.c    |   21 ++++++++++++++++++++-
>  mm/memcontrol.c |    5 +++++
>  2 files changed, 25 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c672187..91361a0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -21,6 +21,8 @@
>  #include <linux/rmap.h>
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
> +#include <linux/memcontrol.h>
> +#include <linux/page_cgroup.h>
>  
>  #include <asm/page.h>
>  #include <asm/pgtable.h>
> @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page)
>  	BUG_ON(page_mapcount(page));
>  	INIT_LIST_HEAD(&page->lru);
>  
> +	if (mapping)
> +		mem_cgroup_hugetlb_uncharge_page(hstate_index(h),
> +						 pages_per_huge_page(h), page);
>  	spin_lock(&hugetlb_lock);
>  	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
>  		update_and_free_page(h, page);
> @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h,
>  static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  				    unsigned long addr, int avoid_reserve)
>  {
> +	int ret, idx;
>  	struct hstate *h = hstate_vma(vma);
>  	struct page *page;
> +	struct mem_cgroup *memcg = NULL;


Can't we do this initialization in mem_cgroup_hugetlb_charge_page()?
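
For illustration, the nitpick amounts to something like this (sketch only):

int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
				   struct mem_cgroup **ptr)
{
	*ptr = NULL;	/* callers can then drop their "= NULL" initializer */
	if (mem_cgroup_disabled())
		return 0;
	/* ... rest of the charge path unchanged ... */
	return 0;
}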

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 06/10] memcg: track resource index in cftype private
  2012-03-16 17:39   ` Aneesh Kumar K.V
  (?)
@ 2012-03-19  2:43     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  2:43 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This helps in using the same memcg callbacks for non-reclaim resource
> control files.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

As mentioned, I'd be glad if you could handle usage_in_bytes for the root memcg as well.



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-19  2:56     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  2:56 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This adds control files for hugetlbfs in memcg
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


I have a question. When a user does

	1. create a memory cgroup, e.g.
		/cgroup/A
	2. insmod hugetlb.ko
	3. ls /cgroup/A

will the files be shown then? And don't we have a problem at rmdir of A?

I'm sorry if hugetlb can never be used as a module.

A comment below.

> ---
>  include/linux/hugetlb.h    |   17 +++++++++++++++
>  include/linux/memcontrol.h |    7 ++++++
>  mm/hugetlb.c               |   25 ++++++++++++++++++++++-
>  mm/memcontrol.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 96 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 1f70068..cbd8dc5 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -4,6 +4,7 @@
>  #include <linux/mm_types.h>
>  #include <linux/fs.h>
>  #include <linux/hugetlb_inline.h>
> +#include <linux/cgroup.h>
>  
>  struct ctl_table;
>  struct user_struct;
> @@ -220,6 +221,12 @@ struct hstate {
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
>  	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
> +	/* mem cgroup control files */
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +	struct cftype cgroup_limit_file;
> +	struct cftype cgroup_usage_file;
> +	struct cftype cgroup_max_usage_file;
> +#endif
>  	char name[HSTATE_NAME_LEN];
>  };
>  
> @@ -338,4 +345,14 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  #define hstate_index(h) 0
>  #endif
>  
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +extern int register_hugetlb_memcg_files(struct cgroup *cgroup,
> +					struct cgroup_subsys *ss);
> +#else
> +static inline int register_hugetlb_memcg_files(struct cgroup *cgroup,
> +					       struct cgroup_subsys *ss)
> +{
> +	return 0;
> +}
> +#endif
>  #endif /* _LINUX_HUGETLB_H */
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 320dbad..73900b9 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -440,6 +440,7 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
>  					     struct page *page);
>  extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
>  					      struct mem_cgroup *memcg);
> +extern int mem_cgroup_hugetlb_file_init(int idx);
>  
>  #else
>  static inline int
> @@ -470,6 +471,12 @@ mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
>  {
>  	return;
>  }
> +
> +static inline int mem_cgroup_hugetlb_file_init(int idx)
> +{
> +	return 0;
> +}
> +
>  #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
>  #endif /* _LINUX_MEMCONTROL_H */
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 91361a0..684849a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1819,6 +1819,29 @@ static int __init hugetlb_init(void)
>  }
>  module_init(hugetlb_init);
>  
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +int register_hugetlb_memcg_files(struct cgroup *cgroup,
> +				 struct cgroup_subsys *ss)
> +{

> +	int ret = 0;
> +	struct hstate *h;
> +
> +	for_each_hstate(h) {
> +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_limit_file);
> +		if (ret)
> +			return ret;
> +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_usage_file);
> +		if (ret)
> +			return ret;
> +		ret = cgroup_add_file(cgroup, ss, &h->cgroup_max_usage_file);
> +		if (ret)
> +			return ret;
> +
> +	}
> +	return ret;
> +}
> +#endif
> +
>  /* Should be called on processing a hugepagesz=... option */
>  void __init hugetlb_add_hstate(unsigned order)
>  {
> @@ -1842,7 +1865,7 @@ void __init hugetlb_add_hstate(unsigned order)
>  	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
>  					huge_page_size(h)/1024);
> -
> +	mem_cgroup_hugetlb_file_init(hugetlb_max_hstate - 1);
>  	parsed_hstate = h;
>  }
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d8b3513..4900b72 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5123,6 +5123,51 @@ static void mem_cgroup_destroy(struct cgroup_subsys *ss,
>  	mem_cgroup_put(memcg);
>  }
>  
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +static char *mem_fmt(char *buf, unsigned long n)
> +{
> +	if (n >= (1UL << 30))
> +		sprintf(buf, "%luGB", n >> 30);
> +	else if (n >= (1UL << 20))
> +		sprintf(buf, "%luMB", n >> 20);
> +	else
> +		sprintf(buf, "%luKB", n >> 10);
> +	return buf;
> +}
> +
> +int mem_cgroup_hugetlb_file_init(int idx)
> +{


__init? And do we have a guarantee that this function is called before
the root mem cgroup is created, even when CONFIG_HUGETLBFS=y?
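
The first half of the question is just a one-line annotation (assuming all
callers really do run at boot time):

-int mem_cgroup_hugetlb_file_init(int idx)
+int __init mem_cgroup_hugetlb_file_init(int idx)

The ordering half still needs an answer, since the root memcg is set up
from cgroup_init() at boot.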

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages
  2012-03-16 17:39   ` Aneesh Kumar K.V
  (?)
@ 2012-03-19  3:00     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  3:00 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> hugepage_activelist will be used to track currently used HugeTLB pages.
> We need to find the in-use HugeTLB pages to support memcg removal.
> On memcg removal we update the page's memory cgroup to point to
> the parent cgroup.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
 
Seems OK to me, but why is the new list not per node? Is there no benefit?
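
For reference, the per-node variant being asked about would look roughly
like this (hypothetical sketch, not part of the patch):

	/* in struct hstate, mirroring the free lists */
	struct list_head hugepage_activelist[MAX_NUMNODES];

static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
{
	struct page *page;

	if (list_empty(&h->hugepage_freelists[nid]))
		return NULL;
	page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
	/* keep active pages on the list of the node they live on */
	list_move(&page->lru, &h->hugepage_activelist[nid]);
	set_page_refcounted(page);
	h->free_huge_pages--;
	h->free_huge_pages_node[nid]--;
	return page;
}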

Thanks,
-Kame

> ---
>  include/linux/hugetlb.h |    1 +
>  mm/hugetlb.c            |   23 ++++++++++++++++++-----
>  2 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index cbd8dc5..6919100 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -217,6 +217,7 @@ struct hstate {
>  	unsigned long resv_huge_pages;
>  	unsigned long surplus_huge_pages;
>  	unsigned long nr_overcommit_huge_pages;
> +	struct list_head hugepage_activelist;
>  	struct list_head hugepage_freelists[MAX_NUMNODES];
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 684849a..8fd465d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -433,7 +433,7 @@ void copy_huge_page(struct page *dst, struct page *src)
>  static void enqueue_huge_page(struct hstate *h, struct page *page)
>  {
>  	int nid = page_to_nid(page);
> -	list_add(&page->lru, &h->hugepage_freelists[nid]);
> +	list_move(&page->lru, &h->hugepage_freelists[nid]);
>  	h->free_huge_pages++;
>  	h->free_huge_pages_node[nid]++;
>  }
> @@ -445,7 +445,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
>  	if (list_empty(&h->hugepage_freelists[nid]))
>  		return NULL;
>  	page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
> -	list_del(&page->lru);
> +	list_move(&page->lru, &h->hugepage_activelist);
>  	set_page_refcounted(page);
>  	h->free_huge_pages--;
>  	h->free_huge_pages_node[nid]--;
> @@ -542,13 +542,14 @@ static void free_huge_page(struct page *page)
>  	page->mapping = NULL;
>  	BUG_ON(page_count(page));
>  	BUG_ON(page_mapcount(page));
> -	INIT_LIST_HEAD(&page->lru);
>  
>  	if (mapping)
>  		mem_cgroup_hugetlb_uncharge_page(hstate_index(h),
>  						 pages_per_huge_page(h), page);
>  	spin_lock(&hugetlb_lock);
>  	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
> +		/* remove the page from active list */
> +		list_del(&page->lru);
>  		update_and_free_page(h, page);
>  		h->surplus_huge_pages--;
>  		h->surplus_huge_pages_node[nid]--;
> @@ -562,6 +563,7 @@ static void free_huge_page(struct page *page)
>  
>  static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  {
> +	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, free_huge_page);
>  	spin_lock(&hugetlb_lock);
>  	h->nr_huge_pages++;
> @@ -1861,6 +1863,7 @@ void __init hugetlb_add_hstate(unsigned order)
>  	h->free_huge_pages = 0;
>  	for (i = 0; i < MAX_NUMNODES; ++i)
>  		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
> +	INIT_LIST_HEAD(&h->hugepage_activelist);
>  	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
>  	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
> @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
>  		page = pte_page(pte);
>  		if (pte_dirty(pte))
>  			set_page_dirty(page);
> -		list_add(&page->lru, &page_list);
> +
> +		spin_lock(&hugetlb_lock);
> +		list_move(&page->lru, &page_list);
> +		spin_unlock(&hugetlb_lock);
>  	}
>  	spin_unlock(&mm->page_table_lock);
>  	flush_tlb_range(vma, start, end);
>  	mmu_notifier_invalidate_range_end(mm, start, end);
>  	list_for_each_entry_safe(page, tmp, &page_list, lru) {
>  		page_remove_rmap(page);
> -		list_del(&page->lru);
> +		/*
> +		 * We need to move it back to the huge page active list. If we
> +		 * are holding the last reference, the put_page below will move
> +		 * it back to the free list.
> +		 */
> +		spin_lock(&hugetlb_lock);
> +		list_move(&page->lru, &h->hugepage_activelist);
> +		spin_unlock(&hugetlb_lock);
>  		put_page(page);
>  	}
>  }




^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal
  2012-03-16 17:39   ` Aneesh Kumar K.V
  (?)
@ 2012-03-19  3:04     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  3:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This adds support for memcg removal with HugeTLB resource usage.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


Seems OK for now.

Tejun, Costa, and I are discussing removing -EBUSY from rmdir().
We're now considering: if use_hierarchy=false and the parent looks full,
reclaim everything or move the charges to the root cgroup; then -EBUSY
goes away.

Is that acceptable for hugetlb? Or do you have another idea?
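
To make that concrete, the use_hierarchy=false branch of
mem_cgroup_move_hugetlb_parent() could fall back roughly like this
(a hypothetical sketch of the idea under discussion, not posted code):

	if (!mem_cgroup_is_root(parent)) {
		ret = res_counter_charge(&parent->hugepage[idx],
					 csize, &fail_res);
		if (ret) {
			/*
			 * Flat parent is full: reparent the charge to the
			 * unlimited root memcg instead of failing rmdir
			 * with -EBUSY.
			 */
			parent = root_mem_cgroup;
			ret = 0;
		}
	}
	res_counter_uncharge(&memcg->hugepage[idx], csize);
	pc->mem_cgroup = parent;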

Thanks,
-Kame 


> ---
>  include/linux/hugetlb.h    |    6 ++++
>  include/linux/memcontrol.h |   15 +++++++++-
>  mm/hugetlb.c               |   41 ++++++++++++++++++++++++++
>  mm/memcontrol.c            |   68 +++++++++++++++++++++++++++++++++++++------
>  4 files changed, 119 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6919100..32e948c 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -349,11 +349,17 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  #ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>  extern int register_hugetlb_memcg_files(struct cgroup *cgroup,
>  					struct cgroup_subsys *ss);
> +extern int hugetlb_force_memcg_empty(struct cgroup *cgroup);
>  #else
>  static inline int register_hugetlb_memcg_files(struct cgroup *cgroup,
>  					       struct cgroup_subsys *ss)
>  {
>  	return 0;
>  }
> +
> +static inline int hugetlb_force_memcg_empty(struct cgroup *cgroup)
> +{
> +	return 0;
> +}
>  #endif
>  #endif /* _LINUX_HUGETLB_H */
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 73900b9..0980122 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -441,7 +441,9 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
>  extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
>  					      struct mem_cgroup *memcg);
>  extern int mem_cgroup_hugetlb_file_init(int idx);
> -
> +extern int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
> +					  struct page *page);
> +extern bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup);
>  #else
>  static inline int
>  mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> @@ -477,6 +479,17 @@ static inline int mem_cgroup_hugetlb_file_init(int idx)
>  	return 0;
>  }
>  
> +static inline int
> +mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
> +			       struct page *page)
> +{
> +	return 0;
> +}
> +
> +static inline bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup)
> +{
> +	return 0;
> +}
>  #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
>  #endif /* _LINUX_MEMCONTROL_H */
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8fd465d..685f0d5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1842,6 +1842,47 @@ int register_hugetlb_memcg_files(struct cgroup *cgroup,
>  	}
>  	return ret;
>  }
> +
> +/*
> + * Force the memcg to empty the hugetlb resources by moving them to
> + * the parent cgroup. We can fail if the parent cgroup's limit prevented
> + * the charging. This should only happen if use_hierarchy is not set.
> + */
> +int hugetlb_force_memcg_empty(struct cgroup *cgroup)
> +{
> +	struct hstate *h;
> +	struct page *page;
> +	int ret = 0, idx = 0;
> +
> +	do {
> +		if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children))
> +			goto out;
> +		/*
> +		 * If the task doing the cgroup_rmdir got a signal
> +		 * we don't really need to loop till the hugetlb resource
> +		 * usage become zero.
> +		 */
> +		if (signal_pending(current)) {
> +			ret = -EINTR;
> +			goto out;
> +		}
> +		for_each_hstate(h) {
> +			spin_lock(&hugetlb_lock);
> +			list_for_each_entry(page, &h->hugepage_activelist, lru) {
> +				ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page);
> +				if (ret) {
> +					spin_unlock(&hugetlb_lock);
> +					goto out;
> +				}
> +			}
> +			spin_unlock(&hugetlb_lock);
> +			idx++;
> +		}
> +		cond_resched();
> +	} while (mem_cgroup_have_hugetlb_usage(cgroup));
> +out:
> +	return ret;
> +}
>  #endif
>  
>  /* Should be called on processing a hugepagesz=... option */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4900b72..e29d86d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3171,9 +3171,11 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>  #endif
>  
>  #ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup)
>  {
>  	int idx;
> +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
> +
>  	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>  		if (memcg->hugepage[idx].usage > 0)
>  			return 1;
> @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
>  		res_counter_uncharge(&memcg->hugepage[idx], csize);
>  	return;
>  }
> -#else
> -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +
> +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
> +				   struct page *page)
>  {
> -	return 0;
> +	struct page_cgroup *pc;
> +	int csize,  ret = 0;
> +	struct res_counter *fail_res;
> +	struct cgroup *pcgrp = cgroup->parent;
> +	struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp);
> +	struct mem_cgroup *memcg  = mem_cgroup_from_cont(cgroup);
> +
> +	if (!get_page_unless_zero(page))
> +		goto out;
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg)
> +		goto err_out;
> +
> +	csize = PAGE_SIZE << compound_order(page);
> +	/*
> +	 * uncharge from child and charge the parent. If we have
> +	 * use_hierarchy set, we can never fail here. In order to make
> +	 * sure we don't get -ENOMEM on parent charge, we first uncharge
> +	 * the child and then charge the parent.
> +	 */
> +	if (parent->use_hierarchy) {
> +		res_counter_uncharge(&memcg->hugepage[idx], csize);
> +		if (!mem_cgroup_is_root(parent))
> +			ret = res_counter_charge(&parent->hugepage[idx],
> +						 csize, &fail_res);
> +	} else {
> +		if (!mem_cgroup_is_root(parent)) {
> +			ret = res_counter_charge(&parent->hugepage[idx],
> +						 csize, &fail_res);
> +			if (ret) {
> +				ret = -EBUSY;
> +				goto err_out;
> +			}
> +		}
> +		res_counter_uncharge(&memcg->hugepage[idx], csize);
> +	}
> +	/*
> +	 * caller should have done css_get
> +	 */
> +	pc->mem_cgroup = parent;
> +err_out:
> +	unlock_page_cgroup(pc);
> +	put_page(page);
> +out:
> +	return ret;
>  }
>  #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
>  
> @@ -3806,6 +3855,11 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all)
>  	/* should free all ? */
>  	if (free_all)
>  		goto try_to_free;
> +
> +	/* move the hugetlb charges */
> +	ret = hugetlb_force_memcg_empty(cgrp);
> +	if (ret)
> +		goto out;
>  move_account:
>  	do {
>  		ret = -EBUSY;
> @@ -5103,12 +5157,6 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
>  					struct cgroup *cont)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> -	/*
> -	 * Don't allow memcg removal if we have HugeTLB resource
> -	 * usage.
> -	 */
> -	if (mem_cgroup_have_hugetlb_usage(memcg))
> -		return -EBUSY;
>  
>  	return mem_cgroup_force_empty(memcg, false);
>  }




^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal
@ 2012-03-19  3:04     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  3:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/17 2:39), Aneesh Kumar K.V wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This add support for memcg removal with HugeTLB resource usage.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


seems ok for now.

Now, Tejun and Costa, and I are discussing removeing -EBUSY from rmdir().
We're now considering 'if use_hierarchy=false and parent seems full, 
reclaim all or move charges to the root cgroup.' then -EBUSY will go away.

Is it accesptable for hugetlb ? Do you have another idea ?

Thanks,
-Kame 


> ---
>  include/linux/hugetlb.h    |    6 ++++
>  include/linux/memcontrol.h |   15 +++++++++-
>  mm/hugetlb.c               |   41 ++++++++++++++++++++++++++
>  mm/memcontrol.c            |   68 +++++++++++++++++++++++++++++++++++++------
>  4 files changed, 119 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6919100..32e948c 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -349,11 +349,17 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  #ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>  extern int register_hugetlb_memcg_files(struct cgroup *cgroup,
>  					struct cgroup_subsys *ss);
> +extern int hugetlb_force_memcg_empty(struct cgroup *cgroup);
>  #else
>  static inline int register_hugetlb_memcg_files(struct cgroup *cgroup,
>  					       struct cgroup_subsys *ss)
>  {
>  	return 0;
>  }
> +
> +static inline int hugetlb_force_memcg_empty(struct cgroup *cgroup)
> +{
> +	return 0;
> +}
>  #endif
>  #endif /* _LINUX_HUGETLB_H */
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 73900b9..0980122 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -441,7 +441,9 @@ extern void mem_cgroup_hugetlb_uncharge_page(int idx, unsigned long nr_pages,
>  extern void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
>  					      struct mem_cgroup *memcg);
>  extern int mem_cgroup_hugetlb_file_init(int idx);
> -
> +extern int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
> +					  struct page *page);
> +extern bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup);
>  #else
>  static inline int
>  mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> @@ -477,6 +479,17 @@ static inline int mem_cgroup_hugetlb_file_init(int idx)
>  	return 0;
>  }
>  
> +static inline int
> +mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
> +			       struct page *page)
> +{
> +	return 0;
> +}
> +
> +static inline bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup)
> +{
> +	return 0;
> +}
>  #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
>  #endif /* _LINUX_MEMCONTROL_H */
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8fd465d..685f0d5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1842,6 +1842,47 @@ int register_hugetlb_memcg_files(struct cgroup *cgroup,
>  	}
>  	return ret;
>  }
> +
> +/*
> + * Force the memcg to empty the hugetlb resources by moving them to
> + * the parent cgroup. We can fail if the parent cgroup's limit prevented
> + * the charging. This should only happen if use_hierarchy is not set.
> + */
> +int hugetlb_force_memcg_empty(struct cgroup *cgroup)
> +{
> +	struct hstate *h;
> +	struct page *page;
> +	int ret = 0, idx = 0;
> +
> +	do {
> +		if (cgroup_task_count(cgroup) || !list_empty(&cgroup->children))
> +			goto out;
> +		/*
> +		 * If the task doing the cgroup_rmdir got a signal
> +		 * we don't really need to loop till the hugetlb resource
> +		 * usage become zero.
> +		 */
> +		if (signal_pending(current)) {
> +			ret = -EINTR;
> +			goto out;
> +		}
> +		for_each_hstate(h) {
> +			spin_lock(&hugetlb_lock);
> +			list_for_each_entry(page, &h->hugepage_activelist, lru) {
> +				ret = mem_cgroup_move_hugetlb_parent(idx, cgroup, page);
> +				if (ret) {
> +					spin_unlock(&hugetlb_lock);
> +					goto out;
> +				}
> +			}
> +			spin_unlock(&hugetlb_lock);
> +			idx++;
> +		}
> +		cond_resched();
> +	} while (mem_cgroup_have_hugetlb_usage(cgroup));
> +out:
> +	return ret;
> +}
>  #endif
>  
>  /* Should be called on processing a hugepagesz=... option */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4900b72..e29d86d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3171,9 +3171,11 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>  #endif
>  
>  #ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +bool mem_cgroup_have_hugetlb_usage(struct cgroup *cgroup)
>  {
>  	int idx;
> +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgroup);
> +
>  	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>  		if (memcg->hugepage[idx].usage > 0)
>  			return 1;
> @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
>  		res_counter_uncharge(&memcg->hugepage[idx], csize);
>  	return;
>  }
> -#else
> -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +
> +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
> +				   struct page *page)
>  {
> -	return 0;
> +	struct page_cgroup *pc;
> +	int csize,  ret = 0;
> +	struct res_counter *fail_res;
> +	struct cgroup *pcgrp = cgroup->parent;
> +	struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp);
> +	struct mem_cgroup *memcg  = mem_cgroup_from_cont(cgroup);
> +
> +	if (!get_page_unless_zero(page))
> +		goto out;
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg)
> +		goto err_out;
> +
> +	csize = PAGE_SIZE << compound_order(page);
> +	/*
> +	 * Uncharge the child and charge the parent. If use_hierarchy
> +	 * is set, we can never fail here. To make sure we don't get
> +	 * -ENOMEM on the parent charge, we uncharge the child first
> +	 * and then charge the parent.
> +	 */
> +	if (parent->use_hierarchy) {
> +		res_counter_uncharge(&memcg->hugepage[idx], csize);
> +		if (!mem_cgroup_is_root(parent))
> +			ret = res_counter_charge(&parent->hugepage[idx],
> +						 csize, &fail_res);
> +	} else {
> +		if (!mem_cgroup_is_root(parent)) {
> +			ret = res_counter_charge(&parent->hugepage[idx],
> +						 csize, &fail_res);
> +			if (ret) {
> +				ret = -EBUSY;
> +				goto err_out;
> +			}
> +		}
> +		res_counter_uncharge(&memcg->hugepage[idx], csize);
> +	}
> +	/*
> +	 * caller should have done css_get
> +	 */
> +	pc->mem_cgroup = parent;
> +err_out:
> +	unlock_page_cgroup(pc);
> +	put_page(page);
> +out:
> +	return ret;
>  }
>  #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
>  
> @@ -3806,6 +3855,11 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg, bool free_all)
>  	/* should free all ? */
>  	if (free_all)
>  		goto try_to_free;
> +
> +	/* move the hugetlb charges */
> +	ret = hugetlb_force_memcg_empty(cgrp);
> +	if (ret)
> +		goto out;
>  move_account:
>  	do {
>  		ret = -EBUSY;
> @@ -5103,12 +5157,6 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
>  					struct cgroup *cont)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> -	/*
> -	 * Don't allow memcg removal if we have HugeTLB resource
> -	 * usage.
> -	 */
> -	if (mem_cgroup_have_hugetlb_usage(memcg))
> -		return -EBUSY;
>  
>  	return mem_cgroup_force_empty(memcg, false);
>  }
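
As a quick sanity check of the charge size used above: csize is derived
from the compound order, so (sketch, assuming x86-64 page sizes):

	csize = PAGE_SIZE << compound_order(page);
	/* 4096 << 9  == 2MB for a 2MB huge page */
	/* 4096 << 18 == 1GB for a 1GB huge page */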




^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
  2012-03-19  2:11     ` KAMEZAWA Hiroyuki
@ 2012-03-19  6:37       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-19  6:37 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

On Mon, 19 Mar 2012 11:11:56 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > Using VM_FAULT_* codes with ERR_PTR will require us to make sure
> > VM_FAULT_* values will not exceed MAX_ERRNO value.
> > 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> 
> Is this a bug fix ?

No. Currently all the VM_FAULT_* codes are below MAX_ERRNO. The
changes in this patch are based on a suggestion from Andrew:

http://article.gmane.org/gmane.linux.kernel.cgroups/1160
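
For context, a minimal illustration of the constraint (a sketch; it only
assumes MAX_ERRNO == 4095 from linux/err.h and the VM_FAULT_* values
from linux/mm.h):

	#include <linux/err.h>
	#include <linux/mm.h>

	/*
	 * ERR_PTR()/IS_ERR() treat only the last MAX_ERRNO (4095) values
	 * of the address space as errors. VM_FAULT_* codes fit today
	 * (VM_FAULT_OOM == 0x0001, VM_FAULT_SIGBUS == 0x0002), but
	 * nothing enforces that they stay below MAX_ERRNO:
	 */
	struct page *page = ERR_PTR(-(long)VM_FAULT_SIGBUS);
	BUG_ON(!IS_ERR(page));	/* holds only while the code <= MAX_ERRNO */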

> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-19  2:38     ` KAMEZAWA Hiroyuki
@ 2012-03-19  6:52       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-19  6:52 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > This patch implements a memcg extension that allows us to control
> > HugeTLB allocations via memory controller.
> > 
> 
> 
> If you write some details here, it will be helpful for review and
> seeing log after merge.

Will add more info.

> 
> 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> >  include/linux/hugetlb.h    |    1 +
> >  include/linux/memcontrol.h |   42 +++++++++++++
> >  init/Kconfig               |    8 +++
> >  mm/hugetlb.c               |    2 +-
> >  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
> >  5 files changed, 190 insertions(+), 1 deletions(-)

....

> > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> > +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> > +{
> > +	int idx;
> > +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> > +		if (memcg->hugepage[idx].usage > 0)
> > +			return 1;
> > +	}
> > +	return 0;
> > +}
> 
> 
> Please use res_counter_read_u64() rather than reading the value directly.
> 

The open-coded variant is mostly derived from mem_cgroup_force_empty. I
have updated the patch to use res_counter_read_u64. 
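
For reference, the reworked check can be as simple as this (a sketch
against the memcg->hugepage[] counters introduced earlier in the series):

	static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
	{
		int idx;

		for (idx = 0; idx < hugetlb_max_hstate; idx++) {
			/* read RES_USAGE under the counter's own lock */
			if (res_counter_read_u64(&memcg->hugepage[idx],
						 RES_USAGE) > 0)
				return true;
		}
		return false;
	}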

> 
> > +
> > +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> > +				   struct mem_cgroup **ptr)
> > +{
> > +	int ret = 0;
> > +	struct mem_cgroup *memcg;
> > +	struct res_counter *fail_res;
> > +	unsigned long csize = nr_pages * PAGE_SIZE;
> > +
> > +	if (mem_cgroup_disabled())
> > +		return 0;
> > +again:
> > +	rcu_read_lock();
> > +	memcg = mem_cgroup_from_task(current);
> > +	if (!memcg)
> > +		memcg = root_mem_cgroup;
> > +	if (mem_cgroup_is_root(memcg)) {
> > +		rcu_read_unlock();
> > +		goto done;
> > +	}
> 
> 
> One concern: right now the memory cgroup doesn't account the root cgroup
> and doesn't update res->usage, to avoid shared-counter update overhead
> when memcg is not mounted. But the memory.usage_in_bytes file works
> for the root memcg by reading percpu statistics.
> 
> So, how about counting usage for the root cgroup even if it cannot be limited ?
> Considering hugetlbfs usage patterns, updating the res_counter here doesn't
> have the false-sharing performance problem.
> Then you can remove the root_mem_cgroup() checks inserted in several places.
> 

Yes. That is a good idea. Will update the patch.
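
Roughly along these lines, I think (a sketch, not the final patch; it
assumes the root memcg's hugepage counters simply keep their default
unlimited limit, so the charge can never fail for root):

	int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
					   struct mem_cgroup **ptr)
	{
		int ret;
		struct mem_cgroup *memcg;
		struct res_counter *fail_res;
		unsigned long csize = nr_pages * PAGE_SIZE;

		if (mem_cgroup_disabled())
			return 0;
	again:
		rcu_read_lock();
		memcg = mem_cgroup_from_task(current);
		if (!memcg)
			memcg = root_mem_cgroup;
		if (!css_tryget(&memcg->css)) {
			rcu_read_unlock();
			goto again;
		}
		rcu_read_unlock();

		/* charge unconditionally: no root_mem_cgroup special case,
		 * which also makes usage visible for the root memcg */
		ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
		if (ret)
			css_put(&memcg->css);
		else
			*ptr = memcg;
		return ret;
	}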


> <snip>
> 
> >  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> > +	/*
> > +	 * Don't allow memcg removal if we have HugeTLB resource
> > +	 * usage.
> > +	 */
> > +	if (mem_cgroup_have_hugetlb_usage(memcg))
> > +		return -EBUSY;
> >  
> >  	return mem_cgroup_force_empty(memcg, false);
> >  }
> 
> 
> Is this fixed by patch 8+9 ?

Yes. 

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-19  6:52       ` Aneesh Kumar K.V
@ 2012-03-19  7:00         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  7:00 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

(2012/03/19 15:52), Aneesh Kumar K.V wrote:

> On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
>>
>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>>
>>> This patch implements a memcg extension that allows us to control
>>> HugeTLB allocations via memory controller.
>>>
>>
>>
>> If you write some details here, it will be helpful for review and
>> seeing log after merge.
> 
> Will add more info.
> 
>>
>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>> ---
>>>  include/linux/hugetlb.h    |    1 +
>>>  include/linux/memcontrol.h |   42 +++++++++++++
>>>  init/Kconfig               |    8 +++
>>>  mm/hugetlb.c               |    2 +-
>>>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>>>  5 files changed, 190 insertions(+), 1 deletions(-)
> 
> ....
> 
>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>> +{
>>> +	int idx;
>>> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>>> +		if (memcg->hugepage[idx].usage > 0)
>>> +			return 1;
>>> +	}
>>> +	return 0;
>>> +}
>>
>>
>> Please use res_counter_read_u64() rather than reading the value directly.
>>
> 
> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
> have updated the patch to use res_counter_read_u64. 
> 

Ah, OK. It's (maybe) my bad. I'll schedule a fix.


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-19  2:41     ` KAMEZAWA Hiroyuki
@ 2012-03-19  7:01       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-19  7:01 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

On Mon, 19 Mar 2012 11:41:07 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > This adds necessary charge/uncharge calls in the HugeTLB code
> > 
> > Acked-by: Hillf Danton <dhillf@gmail.com>
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> A nitpick below.
> 
> > ---
> >  mm/hugetlb.c    |   21 ++++++++++++++++++++-
> >  mm/memcontrol.c |    5 +++++
> >  2 files changed, 25 insertions(+), 1 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index c672187..91361a0 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -21,6 +21,8 @@
> >  #include <linux/rmap.h>
> >  #include <linux/swap.h>
> >  #include <linux/swapops.h>
> > +#include <linux/memcontrol.h>
> > +#include <linux/page_cgroup.h>
> >  
> >  #include <asm/page.h>
> >  #include <asm/pgtable.h>
> > @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page)
> >  	BUG_ON(page_mapcount(page));
> >  	INIT_LIST_HEAD(&page->lru);
> >  
> > +	if (mapping)
> > +		mem_cgroup_hugetlb_uncharge_page(hstate_index(h),
> > +						 pages_per_huge_page(h), page);
> >  	spin_lock(&hugetlb_lock);
> >  	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
> >  		update_and_free_page(h, page);
> > @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h,
> >  static struct page *alloc_huge_page(struct vm_area_struct *vma,
> >  				    unsigned long addr, int avoid_reserve)
> >  {
> > +	int ret, idx;
> >  	struct hstate *h = hstate_vma(vma);
> >  	struct page *page;
> > +	struct mem_cgroup *memcg = NULL;
> 
> 
> Can't we do this initialization in mem_cgroup_hugetlb_charge_page() ?
> 

Will update in the next iteration.
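
Something like this, I expect (a sketch of the two deltas only):

	@@ alloc_huge_page @@
	-	struct mem_cgroup *memcg = NULL;
	+	struct mem_cgroup *memcg;

	@@ mem_cgroup_hugetlb_charge_page @@
	+	*ptr = NULL;	/* initialize for every caller here */
	 	if (mem_cgroup_disabled())
	 		return 0;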

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
  2012-03-19  2:56     ` KAMEZAWA Hiroyuki
@ 2012-03-19  7:14       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-19  7:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

On Mon, 19 Mar 2012 11:56:25 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > This add control files for hugetlbfs in memcg
> > 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> 
> I have a question. When a user does
> 
> 	1. create memory cgroup as
> 		/cgroup/A
> 	2. insmod hugetlb.ko
> 	3. ls /cgroup/A
> 
> and then, will the files be shown ? Don't we have any problem at rmdir A ?
> 
> I'm sorry if hugetlb never be used as module.

HUGETLBFS cannot be built as a kernel module.


> 
> a comment below.
> 
> > ---
> >  include/linux/hugetlb.h    |   17 +++++++++++++++
> >  include/linux/memcontrol.h |    7 ++++++
> >  mm/hugetlb.c               |   25 ++++++++++++++++++++++-
> >  mm/memcontrol.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 96 insertions(+), 1 deletions(-)


......

> > 
> > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> > +static char *mem_fmt(char *buf, unsigned long n)
> > +{
> > +	if (n >= (1UL << 30))
> > +		sprintf(buf, "%luGB", n >> 30);
> > +	else if (n >= (1UL << 20))
> > +		sprintf(buf, "%luMB", n >> 20);
> > +	else
> > +		sprintf(buf, "%luKB", n >> 10);
> > +	return buf;
> > +}
> > +
> > +int mem_cgroup_hugetlb_file_init(int idx)
> > +{
> 
> 
> __init ? 

Added.
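
That is (a sketch; only the declaration changes, the body stays as in
this patch):

	int __init mem_cgroup_hugetlb_file_init(int idx)

__init drops the function after boot, which is safe here because the
per-hstate files are registered exactly once from hugetlb's boot-time
setup.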

> And... do we have a guarantee that this function is called before
> creating the root mem cgroup even if CONFIG_HUGETLBFS=y ?
> 

Yes. This is called before the root mem cgroup is created.

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
@ 2012-03-19  7:14       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-19  7:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

On Mon, 19 Mar 2012 11:56:25 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > This add control files for hugetlbfs in memcg
> > 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> 
> I have a question. When a user does
> 
> 	1. create memory cgroup as
> 		/cgroup/A
> 	2. insmod hugetlb.ko
> 	3. ls /cgroup/A
> 
> and then, files can be shown ? Don't we have any problem at rmdir A ?
> 
> I'm sorry if hugetlb never be used as module.

HUGETLBFS cannot be build as kernel module


> 
> a comment below.
> 
> > ---
> >  include/linux/hugetlb.h    |   17 +++++++++++++++
> >  include/linux/memcontrol.h |    7 ++++++
> >  mm/hugetlb.c               |   25 ++++++++++++++++++++++-
> >  mm/memcontrol.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 96 insertions(+), 1 deletions(-)


......

> > 
> > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> > +static char *mem_fmt(char *buf, unsigned long n)
> > +{
> > +	if (n >= (1UL << 30))
> > +		sprintf(buf, "%luGB", n >> 30);
> > +	else if (n >= (1UL << 20))
> > +		sprintf(buf, "%luMB", n >> 20);
> > +	else
> > +		sprintf(buf, "%luKB", n >> 10);
> > +	return buf;
> > +}
> > +
> > +int mem_cgroup_hugetlb_file_init(int idx)
> > +{
> 
> 
> __init ? 

Added .

>And... do we have guarantee that this function is called before
> creating root mem cgroup even if CONFIG_HUGETLBFS=y ?
> 

Yes. This should be called before creating root mem cgroup.

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
@ 2012-03-19  7:14       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-19  7:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, mgorman-l3A5Bk7waGM,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, aarcange-H+wXaHxf7aLQT0dZR+AlfA,
	mhocko-AlSwsSmVLrQ, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Mon, 19 Mar 2012 11:56:25 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> > 
> > This add control files for hugetlbfs in memcg
> > 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> 
> I have a question. When a user does
> 
> 	1. create memory cgroup as
> 		/cgroup/A
> 	2. insmod hugetlb.ko
> 	3. ls /cgroup/A
> 
> and then, files can be shown ? Don't we have any problem at rmdir A ?
> 
> I'm sorry if hugetlb never be used as module.

HUGETLBFS cannot be build as kernel module


> 
> a comment below.
> 
> > ---
> >  include/linux/hugetlb.h    |   17 +++++++++++++++
> >  include/linux/memcontrol.h |    7 ++++++
> >  mm/hugetlb.c               |   25 ++++++++++++++++++++++-
> >  mm/memcontrol.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 96 insertions(+), 1 deletions(-)


......

> > 
> > +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> > +static char *mem_fmt(char *buf, unsigned long n)
> > +{
> > +	if (n >= (1UL << 30))
> > +		sprintf(buf, "%luGB", n >> 30);
> > +	else if (n >= (1UL << 20))
> > +		sprintf(buf, "%luMB", n >> 20);
> > +	else
> > +		sprintf(buf, "%luKB", n >> 10);
> > +	return buf;
> > +}
> > +
> > +int mem_cgroup_hugetlb_file_init(int idx)
> > +{
> 
> 
> __init ? 

Added .

>And... do we have guarantee that this function is called before
> creating root mem cgroup even if CONFIG_HUGETLBFS=y ?
> 

Yes. This should be called before creating root mem cgroup.

-aneesh

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
  2012-03-19  7:14       ` Aneesh Kumar K.V
  (?)
@ 2012-03-19  7:34         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  7:34 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups, Tejun Heo

(2012/03/19 16:14), Aneesh Kumar K.V wrote:

> On Mon, 19 Mar 2012 11:56:25 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
>>
>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>>
>>> This add control files for hugetlbfs in memcg
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>
>>
>> I have a question. When a user does
>>
>> 	1. create memory cgroup as
>> 		/cgroup/A
>> 	2. insmod hugetlb.ko
>> 	3. ls /cgroup/A
>>
>> and then, files can be shown ? Don't we have any problem at rmdir A ?
>>
>> I'm sorry if hugetlb never be used as module.
> 
> HUGETLBFS cannot be build as kernel module
> 
> 
>>
>> a comment below.
>>
>>> ---
>>>  include/linux/hugetlb.h    |   17 +++++++++++++++
>>>  include/linux/memcontrol.h |    7 ++++++
>>>  mm/hugetlb.c               |   25 ++++++++++++++++++++++-
>>>  mm/memcontrol.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++
>>>  4 files changed, 96 insertions(+), 1 deletions(-)
> 
> 
> ......
> 
>>>
>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>> +static char *mem_fmt(char *buf, unsigned long n)
>>> +{
>>> +	if (n >= (1UL << 30))
>>> +		sprintf(buf, "%luGB", n >> 30);
>>> +	else if (n >= (1UL << 20))
>>> +		sprintf(buf, "%luMB", n >> 20);
>>> +	else
>>> +		sprintf(buf, "%luKB", n >> 10);
>>> +	return buf;
>>> +}
>>> +
>>> +int mem_cgroup_hugetlb_file_init(int idx)
>>> +{
>>
>>
>> __init ? 
> 
> Added .
> 
>> And... do we have guarantee that this function is called before
>> creating root mem cgroup even if CONFIG_HUGETLBFS=y ?
>>
> 
> Yes. This should be called before creating root mem cgroup.
> 


O.K. BTW, please read Tejun's recent post:
 
https://lkml.org/lkml/2012/3/16/522

Can you use his methods ?

I guess you can write...

CGROUP_SUBSYS_CFTYPES_COND(mem_cgroup_subsys,
			hugetlb_cgroup_files,
			if XXXXMB hugetlb is allowed);

Hmm.
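
For reference, with the cftype interface from that series it could look
roughly like this (a sketch; it assumes cgroup_add_cftypes() as posted
in Tejun's thread, and hugetlb_cgroup_files is a hypothetical array name):

	static struct cftype hugetlb_cgroup_files[5];

	/* register the per-hstate files only when hugetlb is configured */
	if (hugetlb_max_hstate)
		cgroup_add_cftypes(&mem_cgroup_subsys, hugetlb_cgroup_files);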

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages
  2012-03-19  3:00     ` KAMEZAWA Hiroyuki
@ 2012-03-19  8:59       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-19  8:59 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

On Mon, 19 Mar 2012 12:00:43 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > hugepage_activelist will be used to track currently used HugeTLB pages.
> > We need to find the in-use HugeTLB pages to support memcg removal.
> > On memcg removal we update the page's memory cgroup to point to
> > parent cgroup.
> > 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> seems ok to me, but... why is the new list not per node ? No benefit ?
> 

I am not sure whether making the list per-node would bring any
performance benefit. For cgroup removal we need to walk all the list
entries anyway.
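
For comparison (a sketch): the patch keeps a single global list,

	list_add(&page->lru, &h->hugepage_activelist);

while a per-node variant, mirroring hugepage_freelists[], would be

	list_add(&page->lru, &h->hugepage_activelist[page_to_nid(page)]);

and cgroup removal would then have to walk MAX_NUMNODES lists instead
of one.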

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal
  2012-03-19  3:04     ` KAMEZAWA Hiroyuki
@ 2012-03-19  9:00       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-19  9:00 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

On Mon, 19 Mar 2012 12:04:56 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
> 
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > This add support for memcg removal with HugeTLB resource usage.
> > 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> 
> seems ok for now.
> 
> Now, Tejun, Costa, and I are discussing removing -EBUSY from rmdir().
> We're now considering 'if use_hierarchy=false and the parent seems full,
> reclaim all or move charges to the root cgroup'; then -EBUSY will go away.
> 
> Is it acceptable for hugetlb ? Do you have another idea ?
> 

That should work even for hugetlb. 
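
For hugetlb the fallback would presumably reduce to something like this
in the move-to-parent path (a sketch of the idea under discussion, not
an actual patch):

	/* rmdir with use_hierarchy=false and a full parent: move the
	 * charge to the root memcg instead of returning -EBUSY */
	if (res_counter_charge(&parent->hugepage[idx], csize, &fail_res))
		parent = root_mem_cgroup;	/* root is not limited */
	res_counter_uncharge(&memcg->hugepage[idx], csize);
	pc->mem_cgroup = parent;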

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-19  7:00         ` KAMEZAWA Hiroyuki
@ 2012-03-19 11:39         ` Glauber Costa
  2012-03-19 12:07             ` KAMEZAWA Hiroyuki
  2012-03-21  4:48             ` Aneesh Kumar K.V
  0 siblings, 2 replies; 130+ messages in thread
From: Glauber Costa @ 2012-03-19 11:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Aneesh Kumar K.V, linux-mm, mgorman, dhillf, aarcange, mhocko,
	akpm, hannes, linux-kernel, cgroups

On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>
>> On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>>> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
>>>
>>>> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
>>>>
>>>> This patch implements a memcg extension that allows us to control
>>>> HugeTLB allocations via memory controller.
>>>>
>>>
>>>
>>> If you write some details here, it will be helpful for review and
>>> seeing log after merge.
>>
>> Will add more info.
>>
>>>
>>>
>>>> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>>> ---
>>>>   include/linux/hugetlb.h    |    1 +
>>>>   include/linux/memcontrol.h |   42 +++++++++++++
>>>>   init/Kconfig               |    8 +++
>>>>   mm/hugetlb.c               |    2 +-
>>>>   mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>>>>   5 files changed, 190 insertions(+), 1 deletions(-)
>>
>> ....
>>
>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>> +{
>>>> +	int idx;
>>>> +	for (idx = 0; idx<  hugetlb_max_hstate; idx++) {
>>>> +		if (memcg->hugepage[idx].usage>  0)
>>>> +			return 1;
>>>> +	}
>>>> +	return 0;
>>>> +}
>>>
>>>
>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>
>>
>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>> have updated the patch to use res_counter_read_u64.
>>
>
> Ah, ok. it's (maybe) my bad. I'll schedule a fix.
>
Kame,

I actually have it ready here. I can submit it if you want.

This one has bitten me as well when I was trying to experiment with the 
res_counter performance...
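
For reference, the fixed helper would read something like this
(illustrative only; RES_USAGE is the member index from res_counter.h):

	static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
	{
		int idx;

		for (idx = 0; idx < hugetlb_max_hstate; idx++) {
			if (res_counter_read_u64(&memcg->hugepage[idx],
						 RES_USAGE) > 0)
				return true;
		}
		return false;
	}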


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-19 11:39         ` Glauber Costa
@ 2012-03-19 12:07             ` KAMEZAWA Hiroyuki
  2012-03-21  4:48             ` Aneesh Kumar K.V
  1 sibling, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19 12:07 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Aneesh Kumar K.V, linux-mm, mgorman, dhillf, aarcange, mhocko,
	akpm, hannes, linux-kernel, cgroups

(2012/03/19 20:39), Glauber Costa wrote:

> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>
>>> On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>>>> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
>>>>
>>>>> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
>>>>>
>>>>> This patch implements a memcg extension that allows us to control
>>>>> HugeTLB allocations via memory controller.
>>>>>
>>>>
>>>>
>>>> If you write some details here, it will be helpful for review and
>>>> for reading the log after the merge.
>>>
>>> Will add more info.
>>>
>>>>
>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>>>> ---
>>>>>   include/linux/hugetlb.h    |    1 +
>>>>>   include/linux/memcontrol.h |   42 +++++++++++++
>>>>>   init/Kconfig               |    8 +++
>>>>>   mm/hugetlb.c               |    2 +-
>>>>>   mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>>>>>   5 files changed, 190 insertions(+), 1 deletions(-)
>>>
>>> ....
>>>
>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>> +{
>>>>> +	int idx;
>>>>> +	for (idx = 0; idx<  hugetlb_max_hstate; idx++) {
>>>>> +		if (memcg->hugepage[idx].usage>  0)
>>>>> +			return 1;
>>>>> +	}
>>>>> +	return 0;
>>>>> +}
>>>>
>>>>
>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>
>>>
>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>> have updated the patch to use res_counter_read_u64.
>>>
>>
>> Ah, ok. it's (maybe) my bad. I'll schedule a fix.
>>
> Kame,
> 
> I actually have it ready here. I can submit it if you want.
> 


That's good :) please post.
(But I'm sorry I'll be absent tomorrow.)

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
@ 2012-03-19 12:07             ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19 12:07 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Aneesh Kumar K.V, linux-mm, mgorman, dhillf, aarcange, mhocko,
	akpm, hannes, linux-kernel, cgroups

(2012/03/19 20:39), Glauber Costa wrote:

> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>
>>> On Mon, 19 Mar 2012 11:38:38 +0900, KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>  wrote:
>>>> (2012/03/17 2:39), Aneesh Kumar K.V wrote:
>>>>
>>>>> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
>>>>>
>>>>> This patch implements a memcg extension that allows us to control
>>>>> HugeTLB allocations via memory controller.
>>>>>
>>>>
>>>>
>>>> If you write some details here, it will be helpful for review and
>>>> for reading the log after the merge.
>>>
>>> Will add more info.
>>>
>>>>
>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>>>> ---
>>>>>   include/linux/hugetlb.h    |    1 +
>>>>>   include/linux/memcontrol.h |   42 +++++++++++++
>>>>>   init/Kconfig               |    8 +++
>>>>>   mm/hugetlb.c               |    2 +-
>>>>>   mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>>>>>   5 files changed, 190 insertions(+), 1 deletions(-)
>>>
>>> ....
>>>
>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>> +{
>>>>> +	int idx;
>>>>> +	for (idx = 0; idx<  hugetlb_max_hstate; idx++) {
>>>>> +		if (memcg->hugepage[idx].usage>  0)
>>>>> +			return 1;
>>>>> +	}
>>>>> +	return 0;
>>>>> +}
>>>>
>>>>
>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>
>>>
>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>> have updated the patch to use res_counter_read_u64.
>>>
>>
>> Ah, ok. it's (maybe) my bad. I'll schedule a fix.
>>
> Kame,
> 
> I actually have it ready here. I can submit it if you want.
> 


That's good :) please post.
(But I'm sorry I'll be absent tomorrow.)

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
  2012-03-19  7:34         ` KAMEZAWA Hiroyuki
@ 2012-03-20  9:22           ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-20  9:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups, Tejun Heo

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

>
> O.K. BTW, please read Tejun's recent post..
>
> https://lkml.org/lkml/2012/3/16/522
>
> Can you use his methods?
>
> I guess you can write...
>
> CGROUP_SUBSYS_CFTYPES_COND(mem_cgroup_subsys,
> 			hugetlb_cgroup_files,
> 			if XXXXMB hugetlb is allowed);
>

I may not be able to do CGROUP_SUBSYS_CFTYPES_COND(). But as long as we
are able to dynamically add new control files, we should be ok.
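
Something along these lines should work (a sketch only --
cgroup_add_cftypes() and the cftype layout are taken from Tejun's RFC
above and may still change, and mem_cgroup_hugetlb_read is a
hypothetical handler):

	/* registered late, once we know which hstates actually exist */
	static struct cftype hugetlb_cgroup_files[] = {
		{
			.name = "hugetlb.2MB.limit_in_bytes",
			.read_u64 = mem_cgroup_hugetlb_read,
		},
		{ }	/* terminate */
	};

	cgroup_add_cftypes(&mem_cgroup_subsys, hugetlb_cgroup_files);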

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs
@ 2012-03-20  9:22           ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-20  9:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups, Tejun Heo

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

>
> O.K. BTW, please read Tejun's recent post..
>
> https://lkml.org/lkml/2012/3/16/522
>
> Can you use his methods?
>
> I guess you can write...
>
> CGROUP_SUBSYS_CFTYPES_COND(mem_cgroup_subsys,
> 			hugetlb_cgroup_files,
> 			if XXXXMB hugetlb is allowed);
>

I may not be able to do CGROUP_SUBSYS_CFTYPES_COND(). But as long as we
are able to dynamically add new control files, we should be ok.

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-19 11:39         ` Glauber Costa
  2012-03-19 12:07             ` KAMEZAWA Hiroyuki
@ 2012-03-21  4:48             ` Aneesh Kumar K.V
  1 sibling, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-21  4:48 UTC (permalink / raw)
  To: Glauber Costa, KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

Glauber Costa <glommer@parallels.com> writes:

> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>
>>>
>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>> +{
>>>>> +	int idx;
>>>>> +	for (idx = 0; idx<  hugetlb_max_hstate; idx++) {
>>>>> +		if (memcg->hugepage[idx].usage>  0)
>>>>> +			return 1;
>>>>> +	}
>>>>> +	return 0;
>>>>> +}
>>>>
>>>>
>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>
>>>
>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>> have updated the patch to use res_counter_read_u64.
>>>
>>
>> Ah, ok. it's (maybe) my bad. I'll schedule a fix.
>>
> Kame,
>
> I actually have it ready here. I can submit it if you want.
>
> This one has bitten me as well when I was trying to experiment with the 
> res_counter performance...

Do we really need memcg.res.usage to be accurate in that while loop? If
we miss a zero update because we encountered a partial update, we will
find it zero in the next iteration, right?

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
@ 2012-03-21  4:48             ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-21  4:48 UTC (permalink / raw)
  To: Glauber Costa, KAMEZAWA Hiroyuki
  Cc: linux-mm, mgorman, dhillf, aarcange, mhocko, akpm, hannes,
	linux-kernel, cgroups

Glauber Costa <glommer@parallels.com> writes:

> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>
>>>
>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>> +{
>>>>> +	int idx;
>>>>> +	for (idx = 0; idx<  hugetlb_max_hstate; idx++) {
>>>>> +		if (memcg->hugepage[idx].usage>  0)
>>>>> +			return 1;
>>>>> +	}
>>>>> +	return 0;
>>>>> +}
>>>>
>>>>
>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>
>>>
>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>> have updated the patch to use res_counter_read_u64.
>>>
>>
>> Ah, ok. it's (maybe) my bad. I'll schedule a fix.
>>
> Kame,
>
> I actually have it ready here. I can submit it if you want.
>
> This one has bitten me as well when I was trying to experiment with the 
> res_counter performance...

Do we really need memcg.res.usage to be accurate in that while loop? If
we miss a zero update because we encountered a partial update, we will
find it zero in the next iteration, right?

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
@ 2012-03-21  4:48             ` Aneesh Kumar K.V
  0 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-21  4:48 UTC (permalink / raw)
  To: Glauber Costa, KAMEZAWA Hiroyuki
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, mgorman-l3A5Bk7waGM,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, aarcange-H+wXaHxf7aLQT0dZR+AlfA,
	mhocko-AlSwsSmVLrQ, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>
>>>
>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>> +{
>>>>> +	int idx;
>>>>> +	for (idx = 0; idx<  hugetlb_max_hstate; idx++) {
>>>>> +		if (memcg->hugepage[idx].usage>  0)
>>>>> +			return 1;
>>>>> +	}
>>>>> +	return 0;
>>>>> +}
>>>>
>>>>
>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>
>>>
>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>> have updated the patch to use res_counter_read_u64.
>>>
>>
>> Ah, ok. it's (maybe) my bad. I'll schedule a fix.
>>
> Kame,
>
> I actually have it ready here. I can submit it if you want.
>
> This one has bitten me as well when I was trying to experiment with the 
> res_counter performance...

Do we really need memcg.res.usage to be accurate in that while loop? If
we miss a zero update because we encountered a partial update, we will
find it zero in the next iteration, right?

-aneesh

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-21  4:48             ` Aneesh Kumar K.V
  (?)
@ 2012-03-21  5:22               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-21  5:22 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Glauber Costa, linux-mm, mgorman, dhillf, aarcange, mhocko, akpm,
	hannes, linux-kernel, cgroups

(2012/03/21 13:48), Aneesh Kumar K.V wrote:

> Glauber Costa <glommer@parallels.com> writes:
> 
>> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>>
>>>>
>>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>>> +{
>>>>>> +	int idx;
>>>>>> +	for (idx = 0; idx<  hugetlb_max_hstate; idx++) {
>>>>>> +		if (memcg->hugepage[idx].usage>  0)
>>>>>> +			return 1;
>>>>>> +	}
>>>>>> +	return 0;
>>>>>> +}
>>>>>
>>>>>
>>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>>
>>>>
>>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>>> have updated the patch to use res_counter_read_u64.
>>>>
>>>
>>> Ah, ok. it's (maybe) my bad. I'll schedule a fix.
>>>
>> Kame,
>>
>> I actually have it ready here. I can submit it if you want.
>>
>> This one has bitten me as well when I was trying to experiment with the 
>> res_counter performance...
> 
> Do we really need memcg.res.usage to be accurate in that while loop ? If
> we miss a zero update because we encountered a partial update; in the
> next loop we will find it zero right ?
> 

At rmdir(), I assume there is no task left in the memcg. That means
res->usage never increases, and no thread other than force_empty will
touch res->counter. So I think 'memcg->res.usage > 0' can never be
wrong, and we'll reach the correct comparison by continuing the loop.

But the recent kmem accounting work et al. may break that assumption
(I'm not fully sure..), so I think it will be good to use
res_counter_read_u64(). This part is not performance-critical, anyway.

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
@ 2012-03-21  5:22               ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-21  5:22 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Glauber Costa, linux-mm, mgorman, dhillf, aarcange, mhocko, akpm,
	hannes, linux-kernel, cgroups

(2012/03/21 13:48), Aneesh Kumar K.V wrote:

> Glauber Costa <glommer@parallels.com> writes:
> 
>> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>>
>>>>
>>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>>> +{
>>>>>> +	int idx;
>>>>>> +	for (idx = 0; idx<  hugetlb_max_hstate; idx++) {
>>>>>> +		if (memcg->hugepage[idx].usage>  0)
>>>>>> +			return 1;
>>>>>> +	}
>>>>>> +	return 0;
>>>>>> +}
>>>>>
>>>>>
>>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>>
>>>>
>>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>>> have updated the patch to use res_counter_read_u64.
>>>>
>>>
>>> Ah, ok. it's (maybe) my bad. I'll schedule a fix.
>>>
>> Kame,
>>
>> I actually have it ready here. I can submit it if you want.
>>
>> This one has bitten me as well when I was trying to experiment with the 
>> res_counter performance...
> 
> Do we really need memcg.res.usage to be accurate in that while loop ? If
> we miss a zero update because we encountered a partial update; in the
> next loop we will find it zero right ?
> 

At rmdir(), I assume there is no task left in the memcg. That means
res->usage never increases, and no thread other than force_empty will
touch res->counter. So I think 'memcg->res.usage > 0' can never be
wrong, and we'll reach the correct comparison by continuing the loop.

But the recent kmem accounting work et al. may break that assumption
(I'm not fully sure..), so I think it will be good to use
res_counter_read_u64(). This part is not performance-critical, anyway.

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
@ 2012-03-21  5:22               ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-21  5:22 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Glauber Costa, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	mgorman-l3A5Bk7waGM, dhillf-Re5JQEeQqe8AvxtiuMwx3w,
	aarcange-H+wXaHxf7aLQT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

(2012/03/21 13:48), Aneesh Kumar K.V wrote:

> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
> 
>> On 03/19/2012 11:00 AM, KAMEZAWA Hiroyuki wrote:
>>> (2012/03/19 15:52), Aneesh Kumar K.V wrote:
>>>
>>>>
>>>>>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>>>>>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>>>>>> +{
>>>>>> +	int idx;
>>>>>> +	for (idx = 0; idx<  hugetlb_max_hstate; idx++) {
>>>>>> +		if (memcg->hugepage[idx].usage>  0)
>>>>>> +			return 1;
>>>>>> +	}
>>>>>> +	return 0;
>>>>>> +}
>>>>>
>>>>>
>>>>> Please use res_counter_read_u64() rather than reading the value directly.
>>>>>
>>>>
>>>> The open-coded variant is mostly derived from mem_cgroup_force_empty. I
>>>> have updated the patch to use res_counter_read_u64.
>>>>
>>>
>>> Ah, ok. it's (maybe) my bad. I'll schedule a fix.
>>>
>> Kame,
>>
>> I actually have it ready here. I can submit it if you want.
>>
>> This one has bitten me as well when I was trying to experiment with the 
>> res_counter performance...
> 
> Do we really need memcg.res.usage to be accurate in that while loop ? If
> we miss a zero update because we encountered a partial update; in the
> next loop we will find it zero right ?
> 

At rmdir(), I assume there is no task left in the memcg. That means
res->usage never increases, and no thread other than force_empty will
touch res->counter. So I think 'memcg->res.usage > 0' can never be
wrong, and we'll reach the correct comparison by continuing the loop.

But the recent kmem accounting work et al. may break that assumption
(I'm not fully sure..), so I think it will be good to use
res_counter_read_u64(). This part is not performance-critical, anyway.

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate
  2012-03-16 17:39   ` Aneesh Kumar K.V
  (?)
@ 2012-03-28  9:18     ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28  9:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

[Sorry for the late review]

On Fri 16-03-12 23:09:21, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We will be using this from other subsystems like memcg
> in later patches.

OK, why not. I would probably have preferred an accessor function, but
whatever.
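
(Something like the sketch below is what I had in mind -- purely
illustrative, the name is made up:)

	/* hypothetical accessor, so the variable itself could stay static */
	int hugetlb_nr_hstates(void)
	{
		return hugetlb_max_hstate;
	}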

Acked-by: Michal Hocko <mhocko@suse.cz>

> 
> Acked-by: Hillf Danton <dhillf@gmail.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  mm/hugetlb.c |   14 +++++++-------
>  1 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5f34bd8..d623e71 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
>  static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
>  unsigned long hugepages_treat_as_movable;
>  
> -static int max_hstate;
> +static int hugetlb_max_hstate;
> 
>  unsigned int default_hstate_idx;
>  struct hstate hstates[HUGE_MAX_HSTATE];
>  
> @@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
>  static unsigned long __initdata default_hstate_size;
>  
>  #define for_each_hstate(h) \
> -	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
> +	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
>  
>  /*
>   * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
> @@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order)
>  		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
>  		return;
>  	}
> -	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
> +	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>  	BUG_ON(order == 0);
> -	h = &hstates[max_hstate++];
> +	h = &hstates[hugetlb_max_hstate++];
>  	h->order = order;
>  	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
>  	h->nr_huge_pages = 0;
> @@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s)
>  	static unsigned long *last_mhp;
>  
>  	/*
> -	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
> +	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
>  	 * so this hugepages= parameter goes to the "default hstate".
>  	 */
> -	if (!max_hstate)
> +	if (!hugetlb_max_hstate)
>  		mhp = &default_hstate_max_huge_pages;
>  	else
>  		mhp = &parsed_hstate->max_huge_pages;
> @@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s)
>  	 * But we need to allocate >= MAX_ORDER hstates here early to still
>  	 * use the bootmem allocator.
>  	 */
> -	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
> +	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
>  		hugetlb_hstate_alloc_pages(parsed_hstate);
>  
>  	last_mhp = mhp;
> -- 
> 1.7.9
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate
@ 2012-03-28  9:18     ` Michal Hocko
  0 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28  9:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

[Sorry for the late review]

On Fri 16-03-12 23:09:21, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We will be using this from other subsystems like memcg
> in later patches.

OK, why not. I would probably have preferred an accessor function, but
whatever.

Acked-by: Michal Hocko <mhocko@suse.cz>

> 
> Acked-by: Hillf Danton <dhillf@gmail.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  mm/hugetlb.c |   14 +++++++-------
>  1 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5f34bd8..d623e71 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
>  static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
>  unsigned long hugepages_treat_as_movable;
>  
> -static int max_hstate;
> +static int hugetlb_max_hstate;
> 
>  unsigned int default_hstate_idx;
>  struct hstate hstates[HUGE_MAX_HSTATE];
>  
> @@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
>  static unsigned long __initdata default_hstate_size;
>  
>  #define for_each_hstate(h) \
> -	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
> +	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
>  
>  /*
>   * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
> @@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order)
>  		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
>  		return;
>  	}
> -	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
> +	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>  	BUG_ON(order == 0);
> -	h = &hstates[max_hstate++];
> +	h = &hstates[hugetlb_max_hstate++];
>  	h->order = order;
>  	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
>  	h->nr_huge_pages = 0;
> @@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s)
>  	static unsigned long *last_mhp;
>  
>  	/*
> -	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
> +	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
>  	 * so this hugepages= parameter goes to the "default hstate".
>  	 */
> -	if (!max_hstate)
> +	if (!hugetlb_max_hstate)
>  		mhp = &default_hstate_max_huge_pages;
>  	else
>  		mhp = &parsed_hstate->max_huge_pages;
> @@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s)
>  	 * But we need to allocate >= MAX_ORDER hstates here early to still
>  	 * use the bootmem allocator.
>  	 */
> -	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
> +	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
>  		hugetlb_hstate_alloc_pages(parsed_hstate);
>  
>  	last_mhp = mhp;
> -- 
> 1.7.9
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate
@ 2012-03-28  9:18     ` Michal Hocko
  0 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28  9:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, mgorman-l3A5Bk7waGM,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, aarcange-H+wXaHxf7aLQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

[Sorry for the late review]

On Fri 16-03-12 23:09:21, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> We will be using this from other subsystems like memcg
> in later patches.

OK, why not. I would probably have preferred an accessor function, but
whatever.

Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

> 
> Acked-by: Hillf Danton <dhillf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> ---
>  mm/hugetlb.c |   14 +++++++-------
>  1 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5f34bd8..d623e71 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
>  static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
>  unsigned long hugepages_treat_as_movable;
>  
> -static int max_hstate;
> +static int hugetlb_max_hstate;
> 
>  unsigned int default_hstate_idx;
>  struct hstate hstates[HUGE_MAX_HSTATE];
>  
> @@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
>  static unsigned long __initdata default_hstate_size;
>  
>  #define for_each_hstate(h) \
> -	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
> +	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
>  
>  /*
>   * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
> @@ -1808,9 +1808,9 @@ void __init hugetlb_add_hstate(unsigned order)
>  		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
>  		return;
>  	}
> -	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
> +	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>  	BUG_ON(order == 0);
> -	h = &hstates[max_hstate++];
> +	h = &hstates[hugetlb_max_hstate++];
>  	h->order = order;
>  	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
>  	h->nr_huge_pages = 0;
> @@ -1831,10 +1831,10 @@ static int __init hugetlb_nrpages_setup(char *s)
>  	static unsigned long *last_mhp;
>  
>  	/*
> -	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
> +	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
>  	 * so this hugepages= parameter goes to the "default hstate".
>  	 */
> -	if (!max_hstate)
> +	if (!hugetlb_max_hstate)
>  		mhp = &default_hstate_max_huge_pages;
>  	else
>  		mhp = &parsed_hstate->max_huge_pages;
> @@ -1853,7 +1853,7 @@ static int __init hugetlb_nrpages_setup(char *s)
>  	 * But we need to allocate >= MAX_ORDER hstates here early to still
>  	 * use the bootmem allocator.
>  	 */
> -	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
> +	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
>  		hugetlb_hstate_alloc_pages(parsed_hstate);
>  
>  	last_mhp = mhp;
> -- 
> 1.7.9
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-28  9:25     ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28  9:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:22, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Using VM_FAULT_* codes with ERR_PTR will require us to make sure
> VM_FAULT_* values will not exceed MAX_ERRNO value.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  mm/hugetlb.c |   18 +++++++++++++-----
>  1 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d623e71..3782da8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
[...]
> @@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>  		if (!page) {
>  			hugetlb_put_quota(inode->i_mapping, chg);
> -			return ERR_PTR(-VM_FAULT_SIGBUS);
> +			return ERR_PTR(-ENOSPC);

Hmm, so one error-code abuse replaced by another?
I know that ENOMEM would revert 4a6018f7, which would be unfortunate,
but ENOSPC doesn't feel right either.
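
If we stay with the ERR_PTR convention, the translation could at least
live in one place; a sketch (hugetlb_fault_error is a hypothetical
helper, not in the patch):

	/* map alloc_huge_page()'s ERR_PTR value to a VM_FAULT_* code */
	static inline int hugetlb_fault_error(struct page *page)
	{
		if (PTR_ERR(page) == -ENOMEM)
			return VM_FAULT_OOM;
		return VM_FAULT_SIGBUS;
	}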

>  		}
>  	}
>  
> @@ -2395,6 +2395,7 @@ retry_avoidcopy:
>  	new_page = alloc_huge_page(vma, address, outside_reserve);
>  
>  	if (IS_ERR(new_page)) {
> +		int err = PTR_ERR(new_page);
>  		page_cache_release(old_page);
>  
>  		/*
> @@ -2424,7 +2425,10 @@ retry_avoidcopy:
>  
>  		/* Caller expects lock to be held */
>  		spin_lock(&mm->page_table_lock);
> -		return -PTR_ERR(new_page);
> +		if (err == -ENOMEM)
> +			return VM_FAULT_OOM;
> +		else
> +			return VM_FAULT_SIGBUS;
>  	}
>  
>  	/*
> @@ -2542,7 +2546,11 @@ retry:
>  			goto out;
>  		page = alloc_huge_page(vma, address, 0);
>  		if (IS_ERR(page)) {
> -			ret = -PTR_ERR(page);
> +			ret = PTR_ERR(page);
> +			if (ret == -ENOMEM)
> +				ret = VM_FAULT_OOM;
> +			else
> +				ret = VM_FAULT_SIGBUS;
>  			goto out;
>  		}
>  		clear_huge_page(page, address, pages_per_huge_page(h));
> -- 
> 1.7.9
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
@ 2012-03-28  9:25     ` Michal Hocko
  0 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28  9:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:22, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Using VM_FAULT_* codes with ERR_PTR will require us to make sure
> VM_FAULT_* values will not exceed MAX_ERRNO value.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  mm/hugetlb.c |   18 +++++++++++++-----
>  1 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d623e71..3782da8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
[...]
> @@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>  		if (!page) {
>  			hugetlb_put_quota(inode->i_mapping, chg);
> -			return ERR_PTR(-VM_FAULT_SIGBUS);
> +			return ERR_PTR(-ENOSPC);

Hmm, so one error-code abuse replaced by another?
I know that ENOMEM would revert 4a6018f7, which would be unfortunate,
but ENOSPC doesn't feel right either.

>  		}
>  	}
>  
> @@ -2395,6 +2395,7 @@ retry_avoidcopy:
>  	new_page = alloc_huge_page(vma, address, outside_reserve);
>  
>  	if (IS_ERR(new_page)) {
> +		int err = PTR_ERR(new_page);
>  		page_cache_release(old_page);
>  
>  		/*
> @@ -2424,7 +2425,10 @@ retry_avoidcopy:
>  
>  		/* Caller expects lock to be held */
>  		spin_lock(&mm->page_table_lock);
> -		return -PTR_ERR(new_page);
> +		if (err == -ENOMEM)
> +			return VM_FAULT_OOM;
> +		else
> +			return VM_FAULT_SIGBUS;
>  	}
>  
>  	/*
> @@ -2542,7 +2546,11 @@ retry:
>  			goto out;
>  		page = alloc_huge_page(vma, address, 0);
>  		if (IS_ERR(page)) {
> -			ret = -PTR_ERR(page);
> +			ret = PTR_ERR(page);
> +			if (ret == -ENOMEM)
> +				ret = VM_FAULT_OOM;
> +			else
> +				ret = VM_FAULT_SIGBUS;
>  			goto out;
>  		}
>  		clear_huge_page(page, address, pages_per_huge_page(h));
> -- 
> 1.7.9
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-28  9:41     ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28  9:41 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:23, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Add an inline helper and use it in the code.

OK, the helper function looks much nicer.

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/hugetlb.h |    6 ++++++
>  mm/hugetlb.c            |   18 ++++++++++--------
>  2 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index d9d6c86..a2675b0 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -311,6 +311,11 @@ static inline unsigned hstate_index_to_shift(unsigned index)
>  	return hstates[index].order + PAGE_SHIFT;
>  }
>  
> +static inline int hstate_index(struct hstate *h)
> +{
> +	return h - hstates;
> +}
> +
>  #else
>  struct hstate {};
>  #define alloc_huge_page_node(h, nid) NULL
> @@ -329,6 +334,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  	return 1;
>  }
>  #define hstate_index_to_shift(index) 0
> +#define hstate_index(h) 0
>  #endif
>  
>  #endif /* _LINUX_HUGETLB_H */
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3782da8..ebe245c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1557,7 +1557,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
>  				    struct attribute_group *hstate_attr_group)
>  {
>  	int retval;
> -	int hi = h - hstates;
> +	int hi = hstate_index(h);
>  
>  	hstate_kobjs[hi] = kobject_create_and_add(h->name, parent);
>  	if (!hstate_kobjs[hi])
> @@ -1652,11 +1652,13 @@ void hugetlb_unregister_node(struct node *node)
>  	if (!nhs->hugepages_kobj)
>  		return;		/* no hstate attributes */
>  
> -	for_each_hstate(h)
> -		if (nhs->hstate_kobjs[h - hstates]) {
> -			kobject_put(nhs->hstate_kobjs[h - hstates]);
> -			nhs->hstate_kobjs[h - hstates] = NULL;
> +	for_each_hstate(h) {
> +		int idx = hstate_index(h);
> +		if (nhs->hstate_kobjs[idx]) {
> +			kobject_put(nhs->hstate_kobjs[idx]);
> +			nhs->hstate_kobjs[idx] = NULL;
>  		}
> +	}
>  
>  	kobject_put(nhs->hugepages_kobj);
>  	nhs->hugepages_kobj = NULL;
> @@ -1759,7 +1761,7 @@ static void __exit hugetlb_exit(void)
>  	hugetlb_unregister_all_nodes();
>  
>  	for_each_hstate(h) {
> -		kobject_put(hstate_kobjs[h - hstates]);
> +		kobject_put(hstate_kobjs[hstate_index(h)]);
>  	}
>  
>  	kobject_put(hugepages_kobj);
> @@ -2587,7 +2589,7 @@ retry:
>  		 */
>  		if (unlikely(PageHWPoison(page))) {
>  			ret = VM_FAULT_HWPOISON |
> -			      VM_FAULT_SET_HINDEX(h - hstates);
> +				VM_FAULT_SET_HINDEX(hstate_index(h));
>  			goto backout_unlocked;
>  		}
>  	}
> @@ -2660,7 +2662,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  			return 0;
>  		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
>  			return VM_FAULT_HWPOISON_LARGE |
> -			       VM_FAULT_SET_HINDEX(h - hstates);
> +				VM_FAULT_SET_HINDEX(hstate_index(h));
>  	}
>  
>  	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
> -- 
> 1.7.9
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index
@ 2012-03-28  9:41     ` Michal Hocko
  0 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28  9:41 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:23, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Add an inline helper and use it in the code.

OK, the helper function looks much nicer.

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/hugetlb.h |    6 ++++++
>  mm/hugetlb.c            |   18 ++++++++++--------
>  2 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index d9d6c86..a2675b0 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -311,6 +311,11 @@ static inline unsigned hstate_index_to_shift(unsigned index)
>  	return hstates[index].order + PAGE_SHIFT;
>  }
>  
> +static inline int hstate_index(struct hstate *h)
> +{
> +	return h - hstates;
> +}
> +
>  #else
>  struct hstate {};
>  #define alloc_huge_page_node(h, nid) NULL
> @@ -329,6 +334,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  	return 1;
>  }
>  #define hstate_index_to_shift(index) 0
> +#define hstate_index(h) 0
>  #endif
>  
>  #endif /* _LINUX_HUGETLB_H */
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3782da8..ebe245c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1557,7 +1557,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
>  				    struct attribute_group *hstate_attr_group)
>  {
>  	int retval;
> -	int hi = h - hstates;
> +	int hi = hstate_index(h);
>  
>  	hstate_kobjs[hi] = kobject_create_and_add(h->name, parent);
>  	if (!hstate_kobjs[hi])
> @@ -1652,11 +1652,13 @@ void hugetlb_unregister_node(struct node *node)
>  	if (!nhs->hugepages_kobj)
>  		return;		/* no hstate attributes */
>  
> -	for_each_hstate(h)
> -		if (nhs->hstate_kobjs[h - hstates]) {
> -			kobject_put(nhs->hstate_kobjs[h - hstates]);
> -			nhs->hstate_kobjs[h - hstates] = NULL;
> +	for_each_hstate(h) {
> +		int idx = hstate_index(h);
> +		if (nhs->hstate_kobjs[idx]) {
> +			kobject_put(nhs->hstate_kobjs[idx]);
> +			nhs->hstate_kobjs[idx] = NULL;
>  		}
> +	}
>  
>  	kobject_put(nhs->hugepages_kobj);
>  	nhs->hugepages_kobj = NULL;
> @@ -1759,7 +1761,7 @@ static void __exit hugetlb_exit(void)
>  	hugetlb_unregister_all_nodes();
>  
>  	for_each_hstate(h) {
> -		kobject_put(hstate_kobjs[h - hstates]);
> +		kobject_put(hstate_kobjs[hstate_index(h)]);
>  	}
>  
>  	kobject_put(hugepages_kobj);
> @@ -2587,7 +2589,7 @@ retry:
>  		 */
>  		if (unlikely(PageHWPoison(page))) {
>  			ret = VM_FAULT_HWPOISON |
> -			      VM_FAULT_SET_HINDEX(h - hstates);
> +				VM_FAULT_SET_HINDEX(hstate_index(h));
>  			goto backout_unlocked;
>  		}
>  	}
> @@ -2660,7 +2662,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  			return 0;
>  		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
>  			return VM_FAULT_HWPOISON_LARGE |
> -			       VM_FAULT_SET_HINDEX(h - hstates);
> +				VM_FAULT_SET_HINDEX(hstate_index(h));
>  	}
>  
>  	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
> -- 
> 1.7.9
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-16 17:39   ` Aneesh Kumar K.V
  (?)
@ 2012-03-28 11:33     ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28 11:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch implements a memcg extension that allows us to control
> HugeTLB allocations via memory controller.

And the infrastructure is not used at this stage (you forgot to
mention that).
The changelog should be much more descriptive.

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  include/linux/hugetlb.h    |    1 +
>  include/linux/memcontrol.h |   42 +++++++++++++
>  init/Kconfig               |    8 +++
>  mm/hugetlb.c               |    2 +-
>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 190 insertions(+), 1 deletions(-)
> 
[...]
> diff --git a/init/Kconfig b/init/Kconfig
> index 3f42cd6..f0eb8aa 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -725,6 +725,14 @@ config CGROUP_PERF
>  
>  	  Say N if unsure.
>  
> +config MEM_RES_CTLR_HUGETLB
> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
> +	default n
> +	help
> +	  Add HugeTLB management to memory resource controller. When you
> +	  enable this, you can put a per cgroup limit on HugeTLB usage.

How does it interact with the hard/soft limits, etc.?

[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6728a7a..4b36c5e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -235,6 +235,10 @@ struct mem_cgroup {
>  	 */
>  	struct res_counter memsw;
>  	/*
> +	 * the counter to account for hugepages from hugetlb.
> +	 */
> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
> +	/*
>  	 * Per cgroup active and inactive list, similar to the
>  	 * per zone LRU lists.
>  	 */
> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>  }
>  #endif
>  
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +{
> +	int idx;
> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {

Maybe we should expose for_each_hstate as well...
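i.e. something like this (sketch; assumes for_each_hstate and
hstate_index were made visible to memcontrol.c):

	struct hstate *h;

	for_each_hstate(h) {
		if (res_counter_read_u64(&memcg->hugepage[hstate_index(h)],
					 RES_USAGE) > 0)
			return true;
	}
	return false;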

> +		if (memcg->hugepage[idx].usage > 0)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> +				   struct mem_cgroup **ptr)
> +{
> +	int ret = 0;
> +	struct mem_cgroup *memcg;
> +	struct res_counter *fail_res;
> +	unsigned long csize = nr_pages * PAGE_SIZE;
> +
> +	if (mem_cgroup_disabled())
> +		return 0;
> +again:
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(current);
> +	if (!memcg)
> +		memcg = root_mem_cgroup;
> +	if (mem_cgroup_is_root(memcg)) {
> +		rcu_read_unlock();
> +		goto done;
> +	}
> +	if (!css_tryget(&memcg->css)) {
> +		rcu_read_unlock();
> +		goto again;
> +	}
> +	rcu_read_unlock();
> +
> +	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
> +	css_put(&memcg->css);
> +done:
> +	*ptr = memcg;

Why do we set ptr even for the failure case after we dropped a
reference?
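
A sketch of what I would have expected instead:

	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
	css_put(&memcg->css);
done:
	/* don't hand back a memcg the caller cannot rely on */
	*ptr = ret ? NULL : memcg;
	return ret;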

> +	return ret;
> +}
> +
> +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
> +				      struct mem_cgroup *memcg,
> +				      struct page *page)
> +{
> +	struct page_cgroup *pc;
> +
> +	if (mem_cgroup_disabled())
> +		return;
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	if (unlikely(PageCgroupUsed(pc))) {
> +		unlock_page_cgroup(pc);
> +		mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg);
> +		return;
> +	}
> +	pc->mem_cgroup = memcg;
> +	/*
> +	 * We access a page_cgroup asynchronously without lock_page_cgroup().
> +	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
> +	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
> +	 * before USED bit, we need memory barrier here.
> +	 * See mem_cgroup_add_lru_list(), etc.
> +	 */
> +	smp_wmb();

Is this really necessary for hugetlb pages as well?

> +	SetPageCgroupUsed(pc);
> +
> +	unlock_page_cgroup(pc);
> +	return;
> +}
> +
[...]
> @@ -4887,6 +5013,7 @@ err_cleanup:
>  static struct cgroup_subsys_state * __ref
>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  {
> +	int idx;
>  	struct mem_cgroup *memcg, *parent;
>  	long error = -ENOMEM;
>  	int node;
> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  		 * mem_cgroup(see mem_cgroup_put).
>  		 */
>  		mem_cgroup_get(parent);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)

Do we have to init all hstates or is hugetlb_max_hstate enough?
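i.e. (a sketch, assuming hugetlb_max_hstate is already set up by the
time cgroups are created):

	for (idx = 0; idx < hugetlb_max_hstate; idx++)
		res_counter_init(&memcg->hugepage[idx],
				 &parent->hugepage[idx]);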

> +			res_counter_init(&memcg->hugepage[idx],
> +					 &parent->hugepage[idx]);
>  	} else {
>  		res_counter_init(&memcg->res, NULL);
>  		res_counter_init(&memcg->memsw, NULL);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&memcg->hugepage[idx], NULL);

Same here
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
@ 2012-03-28 11:33     ` Michal Hocko
  0 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28 11:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch implements a memcg extension that allows us to control
> HugeTLB allocations via memory controller.

And the infrastructure is not used at this stage (you forgot to
mention that).
The changelog should be much more descriptive.

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  include/linux/hugetlb.h    |    1 +
>  include/linux/memcontrol.h |   42 +++++++++++++
>  init/Kconfig               |    8 +++
>  mm/hugetlb.c               |    2 +-
>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 190 insertions(+), 1 deletions(-)
> 
[...]
> diff --git a/init/Kconfig b/init/Kconfig
> index 3f42cd6..f0eb8aa 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -725,6 +725,14 @@ config CGROUP_PERF
>  
>  	  Say N if unsure.
>  
> +config MEM_RES_CTLR_HUGETLB
> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
> +	default n
> +	help
> +	  Add HugeTLB management to memory resource controller. When you
> +	  enable this, you can put a per cgroup limit on HugeTLB usage.

How does it interact with the hard/soft limists etc...

[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6728a7a..4b36c5e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -235,6 +235,10 @@ struct mem_cgroup {
>  	 */
>  	struct res_counter memsw;
>  	/*
> +	 * the counter to account for hugepages from hugetlb.
> +	 */
> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
> +	/*
>  	 * Per cgroup active and inactive list, similar to the
>  	 * per zone LRU lists.
>  	 */
> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>  }
>  #endif
>  
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +{
> +	int idx;
> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {

Maybe we should expose for_each_hstate as well...

> +		if (memcg->hugepage[idx].usage > 0)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> +				   struct mem_cgroup **ptr)
> +{
> +	int ret = 0;
> +	struct mem_cgroup *memcg;
> +	struct res_counter *fail_res;
> +	unsigned long csize = nr_pages * PAGE_SIZE;
> +
> +	if (mem_cgroup_disabled())
> +		return 0;
> +again:
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(current);
> +	if (!memcg)
> +		memcg = root_mem_cgroup;
> +	if (mem_cgroup_is_root(memcg)) {
> +		rcu_read_unlock();
> +		goto done;
> +	}
> +	if (!css_tryget(&memcg->css)) {
> +		rcu_read_unlock();
> +		goto again;
> +	}
> +	rcu_read_unlock();
> +
> +	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
> +	css_put(&memcg->css);
> +done:
> +	*ptr = memcg;

Why do we set ptr even for the failure case after we dropped a
reference?

> +	return ret;
> +}
> +
> +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
> +				      struct mem_cgroup *memcg,
> +				      struct page *page)
> +{
> +	struct page_cgroup *pc;
> +
> +	if (mem_cgroup_disabled())
> +		return;
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	if (unlikely(PageCgroupUsed(pc))) {
> +		unlock_page_cgroup(pc);
> +		mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg);
> +		return;
> +	}
> +	pc->mem_cgroup = memcg;
> +	/*
> +	 * We access a page_cgroup asynchronously without lock_page_cgroup().
> +	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
> +	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
> +	 * before USED bit, we need memory barrier here.
> +	 * See mem_cgroup_add_lru_list(), etc.
> +	 */
> +	smp_wmb();

Is this really necessary for hugetlb pages as well?

> +	SetPageCgroupUsed(pc);
> +
> +	unlock_page_cgroup(pc);
> +	return;
> +}
> +
[...]
> @@ -4887,6 +5013,7 @@ err_cleanup:
>  static struct cgroup_subsys_state * __ref
>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  {
> +	int idx;
>  	struct mem_cgroup *memcg, *parent;
>  	long error = -ENOMEM;
>  	int node;
> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  		 * mem_cgroup(see mem_cgroup_put).
>  		 */
>  		mem_cgroup_get(parent);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)

Do we have to init all hstates or is hugetlb_max_hstate enough?

> +			res_counter_init(&memcg->hugepage[idx],
> +					 &parent->hugepage[idx]);
>  	} else {
>  		res_counter_init(&memcg->res, NULL);
>  		res_counter_init(&memcg->memsw, NULL);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&memcg->hugepage[idx], NULL);

Same here
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
@ 2012-03-28 11:33     ` Michal Hocko
  0 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28 11:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, mgorman-l3A5Bk7waGM,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, aarcange-H+wXaHxf7aLQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> This patch implements a memcg extension that allows us to control
> HugeTLB allocations via memory controller.

And the infrastructure is not used at this stage (you forgot to
mention that).
The changelog should be much more descriptive.

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> ---
>  include/linux/hugetlb.h    |    1 +
>  include/linux/memcontrol.h |   42 +++++++++++++
>  init/Kconfig               |    8 +++
>  mm/hugetlb.c               |    2 +-
>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 190 insertions(+), 1 deletions(-)
> 
[...]
> diff --git a/init/Kconfig b/init/Kconfig
> index 3f42cd6..f0eb8aa 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -725,6 +725,14 @@ config CGROUP_PERF
>  
>  	  Say N if unsure.
>  
> +config MEM_RES_CTLR_HUGETLB
> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
> +	default n
> +	help
> +	  Add HugeTLB management to memory resource controller. When you
> +	  enable this, you can put a per cgroup limit on HugeTLB usage.

How does it interact with the hard/soft limits, etc.?

[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6728a7a..4b36c5e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -235,6 +235,10 @@ struct mem_cgroup {
>  	 */
>  	struct res_counter memsw;
>  	/*
> +	 * the counter to account for hugepages from hugetlb.
> +	 */
> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
> +	/*
>  	 * Per cgroup active and inactive list, similar to the
>  	 * per zone LRU lists.
>  	 */
> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>  }
>  #endif
>  
> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +{
> +	int idx;
> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {

Maybe we should expose for_each_hstate as well...

> +		if (memcg->hugepage[idx].usage > 0)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> +				   struct mem_cgroup **ptr)
> +{
> +	int ret = 0;
> +	struct mem_cgroup *memcg;
> +	struct res_counter *fail_res;
> +	unsigned long csize = nr_pages * PAGE_SIZE;
> +
> +	if (mem_cgroup_disabled())
> +		return 0;
> +again:
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(current);
> +	if (!memcg)
> +		memcg = root_mem_cgroup;
> +	if (mem_cgroup_is_root(memcg)) {
> +		rcu_read_unlock();
> +		goto done;
> +	}
> +	if (!css_tryget(&memcg->css)) {
> +		rcu_read_unlock();
> +		goto again;
> +	}
> +	rcu_read_unlock();
> +
> +	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
> +	css_put(&memcg->css);
> +done:
> +	*ptr = memcg;

Why do we set ptr even for the failure case after we dropped a
reference?

> +	return ret;
> +}
> +
> +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
> +				      struct mem_cgroup *memcg,
> +				      struct page *page)
> +{
> +	struct page_cgroup *pc;
> +
> +	if (mem_cgroup_disabled())
> +		return;
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	if (unlikely(PageCgroupUsed(pc))) {
> +		unlock_page_cgroup(pc);
> +		mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg);
> +		return;
> +	}
> +	pc->mem_cgroup = memcg;
> +	/*
> +	 * We access a page_cgroup asynchronously without lock_page_cgroup().
> +	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
> +	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
> +	 * before USED bit, we need memory barrier here.
> +	 * See mem_cgroup_add_lru_list(), etc.
> +	 */
> +	smp_wmb();

Is this really necessary for hugetlb pages as well?

> +	SetPageCgroupUsed(pc);
> +
> +	unlock_page_cgroup(pc);
> +	return;
> +}
> +
[...]
> @@ -4887,6 +5013,7 @@ err_cleanup:
>  static struct cgroup_subsys_state * __ref
>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  {
> +	int idx;
>  	struct mem_cgroup *memcg, *parent;
>  	long error = -ENOMEM;
>  	int node;
> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  		 * mem_cgroup(see mem_cgroup_put).
>  		 */
>  		mem_cgroup_get(parent);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)

Do we have to init all hstates or is hugetlb_max_hstate enough?

> +			res_counter_init(&memcg->hugepage[idx],
> +					 &parent->hugepage[idx]);
>  	} else {
>  		res_counter_init(&memcg->res, NULL);
>  		res_counter_init(&memcg->memsw, NULL);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&memcg->hugepage[idx], NULL);

Same here
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values
  2012-03-28  9:25     ` Michal Hocko
@ 2012-03-28 11:35       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-28 11:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Fri 16-03-12 23:09:22, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> Using VM_FAULT_* codes with ERR_PTR will require us to make sure
>> VM_FAULT_* values will not exceed MAX_ERRNO value.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>>  mm/hugetlb.c |   18 +++++++++++++-----
>>  1 files changed, 13 insertions(+), 5 deletions(-)
>> 
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index d623e71..3782da8 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
> [...]
>> @@ -1047,7 +1047,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>>  		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>>  		if (!page) {
>>  			hugetlb_put_quota(inode->i_mapping, chg);
>> -			return ERR_PTR(-VM_FAULT_SIGBUS);
>> +			return ERR_PTR(-ENOSPC);
>
> Hmm, so one error code abuse replaced by another?
> I know that ENOMEM would revert 4a6018f7 which would be unfortunate but
> ENOSPC doesn't feel right as well.
>

File systems do map ENOSPC to SIGBUS. block_page_mkwrite_return() does
that.
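
For reference, block_page_mkwrite_return() is roughly the following
(paraphrased from include/linux/buffer_head.h, not the exact source):

	static inline int block_page_mkwrite_return(int err)
	{
		if (err == 0)
			return VM_FAULT_LOCKED;
		if (err == -EFAULT)
			return VM_FAULT_NOPAGE;
		if (err == -ENOMEM)
			return VM_FAULT_OOM;
		if (err == -EAGAIN)
			return VM_FAULT_RETRY;
		/* -ENOSPC, -EDQUOT, -EIO, etc: everything else -> SIGBUS */
		return VM_FAULT_SIGBUS;
	}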

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-28 13:17     ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28 13:17 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This adds necessary charge/uncharge calls in the HugeTLB code

This begs for more description...
Other than that it looks correct.
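
To spell out the flow the hunks below add (my rough summary, not the
exact code):

	/*
	 * alloc_huge_page():
	 *	mem_cgroup_hugetlb_charge_page(idx, npages, &memcg);
	 *	page = dequeue_huge_page_vma(), falling back to
	 *	       alloc_buddy_huge_page();
	 *	if (!page)
	 *		mem_cgroup_hugetlb_uncharge_memcg(idx, npages, memcg);
	 *	else
	 *		mem_cgroup_hugetlb_commit_charge(idx, npages,
	 *						 memcg, page);
	 *
	 * free_huge_page():
	 *	mem_cgroup_hugetlb_uncharge_page(idx, npages, page);
	 */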

> Acked-by: Hillf Danton <dhillf@gmail.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  mm/hugetlb.c    |   21 ++++++++++++++++++++-
>  mm/memcontrol.c |    5 +++++
>  2 files changed, 25 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c672187..91361a0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -21,6 +21,8 @@
>  #include <linux/rmap.h>
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
> +#include <linux/memcontrol.h>
> +#include <linux/page_cgroup.h>
>  
>  #include <asm/page.h>
>  #include <asm/pgtable.h>
> @@ -542,6 +544,9 @@ static void free_huge_page(struct page *page)
>  	BUG_ON(page_mapcount(page));
>  	INIT_LIST_HEAD(&page->lru);
>  
> +	if (mapping)
> +		mem_cgroup_hugetlb_uncharge_page(hstate_index(h),
> +						 pages_per_huge_page(h), page);
>  	spin_lock(&hugetlb_lock);
>  	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
>  		update_and_free_page(h, page);
> @@ -1019,12 +1024,15 @@ static void vma_commit_reservation(struct hstate *h,
>  static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  				    unsigned long addr, int avoid_reserve)
>  {
> +	int ret, idx;
>  	struct hstate *h = hstate_vma(vma);
>  	struct page *page;
> +	struct mem_cgroup *memcg = NULL;
>  	struct address_space *mapping = vma->vm_file->f_mapping;
>  	struct inode *inode = mapping->host;
>  	long chg;
>  
> +	idx = hstate_index(h);
>  	/*
>  	 * Processes that did not create the mapping will have no reserves and
>  	 * will not have accounted against quota. Check that the quota can be
> @@ -1039,6 +1047,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  		if (hugetlb_get_quota(inode->i_mapping, chg))
>  			return ERR_PTR(-ENOSPC);
>  
> +	ret = mem_cgroup_hugetlb_charge_page(idx, pages_per_huge_page(h),
> +					     &memcg);
> +	if (ret) {
> +		hugetlb_put_quota(inode->i_mapping, chg);
> +		return ERR_PTR(-ENOSPC);
> +	}
>  	spin_lock(&hugetlb_lock);
>  	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
>  	spin_unlock(&hugetlb_lock);
> @@ -1046,6 +1060,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	if (!page) {
>  		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>  		if (!page) {
> +			mem_cgroup_hugetlb_uncharge_memcg(idx,
> +							 pages_per_huge_page(h),
> +							 memcg);
>  			hugetlb_put_quota(inode->i_mapping, chg);
>  			return ERR_PTR(-ENOSPC);
>  		}
> @@ -1054,7 +1071,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	set_page_private(page, (unsigned long) mapping);
>  
>  	vma_commit_reservation(h, vma, addr);
> -
> +	/* update page cgroup details */
> +	mem_cgroup_hugetlb_commit_charge(idx, pages_per_huge_page(h),
> +					 memcg, page);
>  	return page;
>  }
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4b36c5e..7a9ea94 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2901,6 +2901,11 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  
>  	if (PageSwapCache(page))
>  		return NULL;
> +	/*
> +	 * HugeTLB page uncharge happen in the HugeTLB compound page destructor
> +	 */
> +	if (PageHuge(page))
> +		return NULL;
>  
>  	if (PageTransHuge(page)) {
>  		nr_pages <<= compound_order(page);
> -- 
> 1.7.9
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-28 13:40     ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28 13:40 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6728a7a..4b36c5e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
[...]
> @@ -4887,6 +5013,7 @@ err_cleanup:
>  static struct cgroup_subsys_state * __ref
>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  {
> +	int idx;
>  	struct mem_cgroup *memcg, *parent;
>  	long error = -ENOMEM;
>  	int node;
> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  		 * mem_cgroup(see mem_cgroup_put).
>  		 */
>  		mem_cgroup_get(parent);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&memcg->hugepage[idx],
> +					 &parent->hugepage[idx]);

Hmm, I do not think we want to make groups deeper in the hierarchy
unlimited, as we cannot reclaim. Shouldn't we copy the limit from the
parent? Still not ideal, but slightly more expected behavior IMO.
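Something along these lines, maybe (untested sketch, just to illustrate
the idea; res_counter_init() leaves the counter at RESOURCE_MAX):

	for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
		unsigned long long limit;

		res_counter_init(&memcg->hugepage[idx],
				 &parent->hugepage[idx]);
		/* inherit the parent's limit instead of RESOURCE_MAX */
		limit = res_counter_read_u64(&parent->hugepage[idx],
					     RES_LIMIT);
		res_counter_set_limit(&memcg->hugepage[idx], limit);
	}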

The hierarchy setups are still interesting and the limitations should be
described in the documentation...

>  	} else {
>  		res_counter_init(&memcg->res, NULL);
>  		res_counter_init(&memcg->memsw, NULL);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&memcg->hugepage[idx], NULL);
>  	}
>  	memcg->last_scanned_node = MAX_NUMNODES;
>  	INIT_LIST_HEAD(&memcg->oom_notify);
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-28 11:33     ` Michal Hocko
@ 2012-03-28 13:40       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-28 13:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> This patch implements a memcg extension that allows us to control
>> HugeTLB allocations via memory controller.
>
> And the infrastructure is not used at this stage (you forgot to
> mention).
> The changelog should be much more descriptive.


Will update the changelog.

>
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>>  include/linux/hugetlb.h    |    1 +
>>  include/linux/memcontrol.h |   42 +++++++++++++
>>  init/Kconfig               |    8 +++
>>  mm/hugetlb.c               |    2 +-
>>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 190 insertions(+), 1 deletions(-)
>> 
> [...]
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 3f42cd6..f0eb8aa 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -725,6 +725,14 @@ config CGROUP_PERF
>>  
>>  	  Say N if unsure.
>>  
>> +config MEM_RES_CTLR_HUGETLB
>> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
>> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
>> +	default n
>> +	help
>> +	  Add HugeTLB management to memory resource controller. When you
>> +	  enable this, you can put a per cgroup limit on HugeTLB usage.
>
> How does it interact with the hard/soft limists etc...


There is no soft limit support in the HugeTLB extension.

>
> [...]
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 6728a7a..4b36c5e 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -235,6 +235,10 @@ struct mem_cgroup {
>>  	 */
>>  	struct res_counter memsw;
>>  	/*
>> +	 * the counter to account for hugepages from hugetlb.
>> +	 */
>> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
>> +	/*
>>  	 * Per cgroup active and inactive list, similar to the
>>  	 * per zone LRU lists.
>>  	 */
>> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>>  }
>>  #endif
>>  
>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>> +{
>> +	int idx;
>> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>
> Maybe we should expose for_each_hstate as well...


That will not really help here. If we use for_each_hstate, we still
need hstate_index to get the array index.
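i.e. it would end up looking something like this (sketch):

	struct hstate *h;

	for_each_hstate(h) {
		int idx = hstate_index(h);

		if (memcg->hugepage[idx].usage > 0)
			return 1;
	}
	return 0;

which is no simpler than iterating up to hugetlb_max_hstate directly.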

>
>> +		if (memcg->hugepage[idx].usage > 0)
>> +			return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
>> +				   struct mem_cgroup **ptr)
>> +{
>> +	int ret = 0;
>> +	struct mem_cgroup *memcg;
>> +	struct res_counter *fail_res;
>> +	unsigned long csize = nr_pages * PAGE_SIZE;
>> +
>> +	if (mem_cgroup_disabled())
>> +		return 0;
>> +again:
>> +	rcu_read_lock();
>> +	memcg = mem_cgroup_from_task(current);
>> +	if (!memcg)
>> +		memcg = root_mem_cgroup;
>> +	if (mem_cgroup_is_root(memcg)) {
>> +		rcu_read_unlock();
>> +		goto done;
>> +	}
>> +	if (!css_tryget(&memcg->css)) {
>> +		rcu_read_unlock();
>> +		goto again;
>> +	}
>> +	rcu_read_unlock();
>> +
>> +	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
>> +	css_put(&memcg->css);
>> +done:
>> +	*ptr = memcg;
>
> Why do we set ptr even for the failure case after we dropped a
> reference?


That ensures that *ptr is NULL. 

>
>> +	return ret;
>> +}
>> +
>> +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
>> +				      struct mem_cgroup *memcg,
>> +				      struct page *page)
>> +{
>> +	struct page_cgroup *pc;
>> +
>> +	if (mem_cgroup_disabled())
>> +		return;
>> +
>> +	pc = lookup_page_cgroup(page);
>> +	lock_page_cgroup(pc);
>> +	if (unlikely(PageCgroupUsed(pc))) {
>> +		unlock_page_cgroup(pc);
>> +		mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg);
>> +		return;
>> +	}
>> +	pc->mem_cgroup = memcg;
>> +	/*
>> +	 * We access a page_cgroup asynchronously without lock_page_cgroup().
>> +	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
>> +	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
>> +	 * before USED bit, we need memory barrier here.
>> +	 * See mem_cgroup_add_lru_list(), etc.
>> +	 */
>> +	smp_wmb();
>
> Is this really necessary for hugetlb pages as well?

I used to do that in the cgroup_rmdir path, but I later changed that
part of the code. I will look at the patches again to see whether we
really need this.


>
>> +	SetPageCgroupUsed(pc);
>> +
>> +	unlock_page_cgroup(pc);
>> +	return;
>> +}
>> +
> [...]
>> @@ -4887,6 +5013,7 @@ err_cleanup:
>>  static struct cgroup_subsys_state * __ref
>>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>>  {
>> +	int idx;
>>  	struct mem_cgroup *memcg, *parent;
>>  	long error = -ENOMEM;
>>  	int node;
>> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>>  		 * mem_cgroup(see mem_cgroup_put).
>>  		 */
>>  		mem_cgroup_get(parent);
>> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
>
> Do we have to init all hstates or is hugetlb_max_hstate enough?


Yes. We call mem_cgroup_create for the root cgroup before initializing
the hugetlb hstates, so hugetlb_max_hstate is still zero at that point.

>
>> +			res_counter_init(&memcg->hugepage[idx],
>> +					 &parent->hugepage[idx]);
>>  	} else {
>>  		res_counter_init(&memcg->res, NULL);
>>  		res_counter_init(&memcg->memsw, NULL);
>> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
>> +			res_counter_init(&memcg->hugepage[idx], NULL);
>
> Same here
> -- 

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-28 13:58     ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28 13:58 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:28, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> hugepage_activelist will be used to track currently used HugeTLB pages.
> We need to find the in-use HugeTLB pages to support memcg removal.
> On memcg removal we update the page's memory cgroup to point to
> parent cgroup.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  include/linux/hugetlb.h |    1 +
>  mm/hugetlb.c            |   23 ++++++++++++++++++-----
>  2 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index cbd8dc5..6919100 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
[...]
> @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
>  		page = pte_page(pte);
>  		if (pte_dirty(pte))
>  			set_page_dirty(page);
> -		list_add(&page->lru, &page_list);
> +
> +		spin_lock(&hugetlb_lock);
> +		list_move(&page->lru, &page_list);
> +		spin_unlock(&hugetlb_lock);

Why do we really need the spinlock here?

>  	}
>  	spin_unlock(&mm->page_table_lock);
>  	flush_tlb_range(vma, start, end);
>  	mmu_notifier_invalidate_range_end(mm, start, end);
>  	list_for_each_entry_safe(page, tmp, &page_list, lru) {
>  		page_remove_rmap(page);
> -		list_del(&page->lru);
> +		/*
> +		 * We need to move it back huge page active list. If we are
> +		 * holding the last reference, below put_page will move it
> +		 * back to free list.
> +		 */
> +		spin_lock(&hugetlb_lock);
> +		list_move(&page->lru, &h->hugepage_activelist);
> +		spin_unlock(&hugetlb_lock);

This spinlock usage doesn't look nice but I guess we do not have many
other options.
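Presumably because page->lru is also manipulated under hugetlb_lock
elsewhere -- the free lists, plus the activelist walk this series adds
for memcg removal. Roughly, the lock serializes against something like
(sketch only):

	spin_lock(&hugetlb_lock);
	list_for_each_entry(page, &h->hugepage_activelist, lru) {
		/* e.g. move this page's charge to the parent cgroup */
	}
	spin_unlock(&hugetlb_lock);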

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-28 14:07     ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28 14:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:29, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This add support for memcg removal with HugeTLB resource usage.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  include/linux/hugetlb.h    |    6 ++++
>  include/linux/memcontrol.h |   15 +++++++++-
>  mm/hugetlb.c               |   41 ++++++++++++++++++++++++++
>  mm/memcontrol.c            |   68 +++++++++++++++++++++++++++++++++++++------
>  4 files changed, 119 insertions(+), 11 deletions(-)
> 
[...]
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8fd465d..685f0d5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
[...]
> @@ -3285,10 +3287,57 @@ void mem_cgroup_hugetlb_uncharge_memcg(int idx, unsigned long nr_pages,
>  		res_counter_uncharge(&memcg->hugepage[idx], csize);
>  	return;
>  }
> -#else
> -static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> +
> +int mem_cgroup_move_hugetlb_parent(int idx, struct cgroup *cgroup,
> +				   struct page *page)
>  {
> -	return 0;
> +	struct page_cgroup *pc;
> +	int csize,  ret = 0;
> +	struct res_counter *fail_res;
> +	struct cgroup *pcgrp = cgroup->parent;
> +	struct mem_cgroup *parent = mem_cgroup_from_cont(pcgrp);
> +	struct mem_cgroup *memcg  = mem_cgroup_from_cont(cgroup);
> +
> +	if (!get_page_unless_zero(page))
> +		goto out;
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	if (!PageCgroupUsed(pc) || pc->mem_cgroup != memcg)
> +		goto err_out;
> +
> +	csize = PAGE_SIZE << compound_order(page);
> +	/*
> +	 * uncharge from child and charge the parent. If we have
> +	 * use_hierarchy set, we can never fail here. In-order to make
> +	 * sure we don't get -ENOMEM on parent charge, we first uncharge
> +	 * the child and then charge the parent.
> +	 */
> +	if (parent->use_hierarchy) {
> +		res_counter_uncharge(&memcg->hugepage[idx], csize);
> +		if (!mem_cgroup_is_root(parent))
> +			ret = res_counter_charge(&parent->hugepage[idx],
> +						 csize, &fail_res);

You can still race with another hugetlb charge, which would make this fail.
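Roughly this interleaving, I mean (illustrative only; assumes the
parent-chain charge/uncharge semantics of res_counter in this tree):

	/*
	 * rmdir path (use_hierarchy)	concurrent fault in a sibling group
	 *
	 * res_counter_uncharge(child)
	 *  - frees headroom up the
	 *    parent chain
	 *				res_counter_charge() consumes the
	 *				parent's headroom again
	 * res_counter_charge(parent)
	 *  - fails, although the page was
	 *    already hierarchically charged
	 *    to the parent before the rmdir
	 */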

> +	} else {
> +		if (!mem_cgroup_is_root(parent)) {
> +			ret = res_counter_charge(&parent->hugepage[idx],
> +						 csize, &fail_res);
> +			if (ret) {
> +				ret = -EBUSY;
> +				goto err_out;
> +			}
> +		}
> +		res_counter_uncharge(&memcg->hugepage[idx], csize);
> +	}
> +	/*
> +	 * caller should have done css_get
> +	 */
> +	pc->mem_cgroup = parent;
> +err_out:
> +	unlock_page_cgroup(pc);
> +	put_page(page);
> +out:
> +	return ret;
>  }
>  #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */
[...]
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 10/10] memcg: Add memory controller documentation for hugetlb management
  2012-03-16 17:39   ` Aneesh Kumar K.V
@ 2012-03-28 14:36     ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28 14:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 16-03-12 23:09:30, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  Documentation/cgroups/memory.txt |   29 +++++++++++++++++++++++++++++
>  1 files changed, 29 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> index 4c95c00..d99c41b 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -43,6 +43,7 @@ Features:
>   - usage threshold notifier
>   - oom-killer disable knob and oom-notifier
>   - Root cgroup has no limit controls.
> + - resource accounting for HugeTLB pages
>  
>   Kernel memory support is work in progress, and the current version provides
>   basically functionality. (See Section 2.7)
> @@ -75,6 +76,12 @@ Brief summary of control files.
>   memory.kmem.tcp.limit_in_bytes  # set/show hard limit for tcp buf memory
>   memory.kmem.tcp.usage_in_bytes  # show current tcp buf memory allocation
>  
> +
> + memory.hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
> + memory.hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
> + memory.hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
> +						  # see 5.7 for details
> +
>  1. History
>  
>  The memory controller has a long history. A request for comments for the memory
> @@ -279,6 +286,15 @@ per cgroup, instead of globally.
>  
>  * tcp memory pressure: sockets memory pressure for the tcp protocol.
>  
> +2.8 HugeTLB extension
> +
> +This extension allows to limit the HugeTLB usage per control group and
> +enforces the controller limit during page fault. Since HugeTLB doesn't
> +support page reclaim, enforcing the limit at page fault time implies that,
> +the application will get SIGBUS signal if it tries to access HugeTLB pages
> +beyond its limit. 

This is consistent with the quota behavior, so we should mention that.
We should also add a note on how we interact with quotas.

Another important thing to note is that the hugetlb limit/usage are
unrelated to the memcg hard/soft limits and usage.

> This requires the application to know beforehand how much
> +HugeTLB pages it would require for its use.
> +
>  3. User Interface
>  
>  0. Configuration
> @@ -287,6 +303,7 @@ a. Enable CONFIG_CGROUPS
>  b. Enable CONFIG_RESOURCE_COUNTERS
>  c. Enable CONFIG_CGROUP_MEM_RES_CTLR
>  d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
> +f. Enable CONFIG_MEM_RES_CTLR_HUGETLB (to use HugeTLB extension)
>  
>  1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
>  # mount -t tmpfs none /sys/fs/cgroup
> @@ -510,6 +527,18 @@ unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
>  
>  And we have total = file + anon + unevictable.
>  
> +5.7 HugeTLB resource control files
> +For a system supporting two hugepage size (16M and 16G) the control
> +files include:
> +
> + memory.hugetlb.16GB.limit_in_bytes
> + memory.hugetlb.16GB.max_usage_in_bytes
> + memory.hugetlb.16GB.usage_in_bytes
> + memory.hugetlb.16MB.limit_in_bytes
> + memory.hugetlb.16MB.max_usage_in_bytes
> + memory.hugetlb.16MB.usage_in_bytes
> +
> +
>  6. Hierarchy support
>  
>  The memory controller supports a deep hierarchy and hierarchical accounting.
> -- 
> 1.7.9
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 10/10] memcg: Add memory controller documentation for hugetlb management
@ 2012-03-28 14:36     ` Michal Hocko
  0 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28 14:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, mgorman-l3A5Bk7waGM,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, aarcange-H+wXaHxf7aLQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Fri 16-03-12 23:09:30, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> ---
>  Documentation/cgroups/memory.txt |   29 +++++++++++++++++++++++++++++
>  1 files changed, 29 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> index 4c95c00..d99c41b 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -43,6 +43,7 @@ Features:
>   - usage threshold notifier
>   - oom-killer disable knob and oom-notifier
>   - Root cgroup has no limit controls.
> + - resource accounting for HugeTLB pages
>  
>   Kernel memory support is work in progress, and the current version provides
>   basically functionality. (See Section 2.7)
> @@ -75,6 +76,12 @@ Brief summary of control files.
>   memory.kmem.tcp.limit_in_bytes  # set/show hard limit for tcp buf memory
>   memory.kmem.tcp.usage_in_bytes  # show current tcp buf memory allocation
>  
> +
> + memory.hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
> + memory.hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
> + memory.hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
> +						  # see 5.7 for details
> +
>  1. History
>  
>  The memory controller has a long history. A request for comments for the memory
> @@ -279,6 +286,15 @@ per cgroup, instead of globally.
>  
>  * tcp memory pressure: sockets memory pressure for the tcp protocol.
>  
> +2.8 HugeTLB extension
> +
> +This extension allows to limit the HugeTLB usage per control group and
> +enforces the controller limit during page fault. Since HugeTLB doesn't
> +support page reclaim, enforcing the limit at page fault time implies that,
> +the application will get SIGBUS signal if it tries to access HugeTLB pages
> +beyond its limit. 

This is consistent with the quota so we should mention that. We should
also add a note how we interact with quotas.

Another important thing to note is that the limit/usage are
unrelated to memcg hard/soft limit/usage.

> This requires the application to know beforehand how much
> +HugeTLB pages it would require for its use.
> +
>  3. User Interface
>  
>  0. Configuration
> @@ -287,6 +303,7 @@ a. Enable CONFIG_CGROUPS
>  b. Enable CONFIG_RESOURCE_COUNTERS
>  c. Enable CONFIG_CGROUP_MEM_RES_CTLR
>  d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
> +f. Enable CONFIG_MEM_RES_CTLR_HUGETLB (to use HugeTLB extension)
>  
>  1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
>  # mount -t tmpfs none /sys/fs/cgroup
> @@ -510,6 +527,18 @@ unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
>  
>  And we have total = file + anon + unevictable.
>  
> +5.7 HugeTLB resource control files
> +For a system supporting two hugepage size (16M and 16G) the control
> +files include:
> +
> + memory.hugetlb.16GB.limit_in_bytes
> + memory.hugetlb.16GB.max_usage_in_bytes
> + memory.hugetlb.16GB.usage_in_bytes
> + memory.hugetlb.16MB.limit_in_bytes
> + memory.hugetlb.16MB.max_usage_in_bytes
> + memory.hugetlb.16MB.usage_in_bytes
> +
> +
>  6. Hierarchy support
>  
>  The memory controller supports a deep hierarchy and hierarchical accounting.
> -- 
> 1.7.9
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-28 13:40       ` Aneesh Kumar K.V
@ 2012-03-28 15:44         ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-28 15:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Wed 28-03-12 19:10:36, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> This patch implements a memcg extension that allows us to control
> >> HugeTLB allocations via memory controller.
> >
> > And the infrastructure is not used at this stage (you forgot to
> > mention).
> > The changelog should be much more descriptive.
> 
> 
> Will update the changelog.

Thx

> 
> >
> >> 
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >> ---
> >>  include/linux/hugetlb.h    |    1 +
> >>  include/linux/memcontrol.h |   42 +++++++++++++
> >>  init/Kconfig               |    8 +++
> >>  mm/hugetlb.c               |    2 +-
> >>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
> >>  5 files changed, 190 insertions(+), 1 deletions(-)
> >> 
> > [...]
> >> diff --git a/init/Kconfig b/init/Kconfig
> >> index 3f42cd6..f0eb8aa 100644
> >> --- a/init/Kconfig
> >> +++ b/init/Kconfig
> >> @@ -725,6 +725,14 @@ config CGROUP_PERF
> >>  
> >>  	  Say N if unsure.
> >>  
> >> +config MEM_RES_CTLR_HUGETLB
> >> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
> >> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
> >> +	default n
> >> +	help
> >> +	  Add HugeTLB management to memory resource controller. When you
> >> +	  enable this, you can put a per cgroup limit on HugeTLB usage.
> >
> > How does it interact with the hard/soft limists etc...
> 
> 
> There is no softlimit support for HugeTLB extension.

Sure, sorry for not being precise. The point was how this interacts with
memcg hard/soft limit (they are independent) etc...

> > [...]
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 6728a7a..4b36c5e 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -235,6 +235,10 @@ struct mem_cgroup {
> >>  	 */
> >>  	struct res_counter memsw;
> >>  	/*
> >> +	 * the counter to account for hugepages from hugetlb.
> >> +	 */
> >> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
> >> +	/*
> >>  	 * Per cgroup active and inactive list, similar to the
> >>  	 * per zone LRU lists.
> >>  	 */
> >> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
> >>  }
> >>  #endif
> >>  
> >> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> >> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> >> +{
> >> +	int idx;
> >> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> >
> > Maybe we should expose for_each_hstate as well...
> 
> 
> That will not really help here. If we use for_each_hstate then we will
> need to use hstate_index to get the index.

Fair enough

> >> +		if (memcg->hugepage[idx].usage > 0)
> >> +			return 1;
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> >> +				   struct mem_cgroup **ptr)
> >> +{
> >> +	int ret = 0;
> >> +	struct mem_cgroup *memcg;
> >> +	struct res_counter *fail_res;
> >> +	unsigned long csize = nr_pages * PAGE_SIZE;
> >> +
> >> +	if (mem_cgroup_disabled())
> >> +		return 0;
> >> +again:
> >> +	rcu_read_lock();
> >> +	memcg = mem_cgroup_from_task(current);
> >> +	if (!memcg)
> >> +		memcg = root_mem_cgroup;
> >> +	if (mem_cgroup_is_root(memcg)) {
> >> +		rcu_read_unlock();
> >> +		goto done;
> >> +	}
> >> +	if (!css_tryget(&memcg->css)) {
> >> +		rcu_read_unlock();
> >> +		goto again;
> >> +	}
> >> +	rcu_read_unlock();
> >> +
> >> +	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
> >> +	css_put(&memcg->css);
> >> +done:
> >> +	*ptr = memcg;
> >
> > Why do we set ptr even for the failure case after we dropped a
> > reference?
> 
> That ensures that *ptr is NULL. 

Does it? AFAICS res_counter_charge might fail and you would use a non-NULL
memcg (with a dropped reference).
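
A sketch of one way the tail of the function could avoid handing back a
memcg after a failed charge (an illustrative rework of the quoted code, not
the final fix):

	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
	css_put(&memcg->css);
	if (ret) {
		/* charge failed: don't hand back a memcg whose
		 * reference we have already dropped */
		*ptr = NULL;
		return ret;
	}
done:
	*ptr = memcg;
	return ret;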

[...]
> >> +	SetPageCgroupUsed(pc);
> >> +
> >> +	unlock_page_cgroup(pc);
> >> +	return;
> >> +}
> >> +
> > [...]
> >> @@ -4887,6 +5013,7 @@ err_cleanup:
> >>  static struct cgroup_subsys_state * __ref
> >>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >>  {
> >> +	int idx;
> >>  	struct mem_cgroup *memcg, *parent;
> >>  	long error = -ENOMEM;
> >>  	int node;
> >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >>  		 * mem_cgroup(see mem_cgroup_put).
> >>  		 */
> >>  		mem_cgroup_get(parent);
> >> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> >
> > Do we have to init all hstates or is hugetlb_max_hstate enough?
> 
> 
Yes. We do call mem_cgroup_create for the root cgroup before initializing
the hugetlb hstates.

drop a comment?
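
A sketch of the comment being asked for, based on the explanation above:

		/*
		 * Use HUGE_MAX_HSTATE rather than hugetlb_max_hstate:
		 * mem_cgroup_create() runs for the root cgroup before the
		 * hugetlb hstates are set up, so the final number of
		 * hstates is not yet known at this point.
		 */
		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
			res_counter_init(&memcg->hugepage[idx],
					 &parent->hugepage[idx]);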


-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-28 13:40     ` Michal Hocko
  (?)
@ 2012-03-28 17:37       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-28 17:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
> [...]
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 6728a7a..4b36c5e 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
> [...]
>> @@ -4887,6 +5013,7 @@ err_cleanup:
>>  static struct cgroup_subsys_state * __ref
>>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>>  {
>> +	int idx;
>>  	struct mem_cgroup *memcg, *parent;
>>  	long error = -ENOMEM;
>>  	int node;
>> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>>  		 * mem_cgroup(see mem_cgroup_put).
>>  		 */
>>  		mem_cgroup_get(parent);
>> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
>> +			res_counter_init(&memcg->hugepage[idx],
>> +					 &parent->hugepage[idx]);
>
> Hmm, I do not think we want to make groups deeper in the hierarchy
> unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent?
> Still not ideal but slightly more expected behavior IMO.

But we should be limiting the child group based on the parent's limit only
when hierarchy is set, right?

>
> The hierarchy setups are still interesting and the limitations should be
> described in the documentation...
>

It should behave similarly to memcg, i.e. if hierarchy is set, then we limit
using MIN(parent's limit, child's limit). Maybe I am missing some of
the details of the memcg use_hierarchy config. My goal was to keep it
similar to memcg. Can you explain why you think the patch would
make it any different?
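
For reference, the MIN(parent's limit, child's limit) behaviour falls out of
res_counter_charge() walking the parent chain, roughly as below (simplified
from kernel/res_counter.c of this era; per-counter locking elided):

	int res_counter_charge(struct res_counter *counter, unsigned long val,
			       struct res_counter **limit_fail_at)
	{
		struct res_counter *c, *u;

		*limit_fail_at = NULL;
		for (c = counter; c != NULL; c = c->parent) {
			/* fails at whichever level hits its limit first */
			if (res_counter_charge_locked(c, val)) {
				*limit_fail_at = c;
				goto undo;
			}
		}
		return 0;
	undo:
		for (u = counter; u != c; u = u->parent)
			res_counter_uncharge_locked(u, val);
		return -ENOMEM;
	}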

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages
  2012-03-28 13:58     ` Michal Hocko
@ 2012-03-28 17:38       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-28 17:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Fri 16-03-12 23:09:28, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> hugepage_activelist will be used to track currently used HugeTLB pages.
>> We need to find the in-use HugeTLB pages to support memcg removal.
>> On memcg removal we update the page's memory cgroup to point to
>> parent cgroup.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>>  include/linux/hugetlb.h |    1 +
>>  mm/hugetlb.c            |   23 ++++++++++++++++++-----
>>  2 files changed, 19 insertions(+), 5 deletions(-)
>> 
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index cbd8dc5..6919100 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
> [...]
>> @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
>>  		page = pte_page(pte);
>>  		if (pte_dirty(pte))
>>  			set_page_dirty(page);
>> -		list_add(&page->lru, &page_list);
>> +
>> +		spin_lock(&hugetlb_lock);
>> +		list_move(&page->lru, &page_list);
>> +		spin_unlock(&hugetlb_lock);
>
> Why do we really need the spinlock here?


It does a list_del from hugepage_activelist.
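
For reference, list_move() in this era is a delete plus an add, so it is the
unlink from the shared active list that needs hugetlb_lock:

	/* include/linux/list.h */
	static inline void list_move(struct list_head *list,
				     struct list_head *head)
	{
		__list_del_entry(list);	/* here: unlink from hugepage_activelist */
		list_add(list, head);	/* relink onto the local page_list */
	}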


>
>>  	}
>>  	spin_unlock(&mm->page_table_lock);
>>  	flush_tlb_range(vma, start, end);
>>  	mmu_notifier_invalidate_range_end(mm, start, end);
>>  	list_for_each_entry_safe(page, tmp, &page_list, lru) {
>>  		page_remove_rmap(page);
>> -		list_del(&page->lru);
>> +		/*
>> +		 * We need to move it back huge page active list. If we are
>> +		 * holding the last reference, below put_page will move it
>> +		 * back to free list.
>> +		 */
>> +		spin_lock(&hugetlb_lock);
>> +		list_move(&page->lru, &h->hugepage_activelist);
>> +		spin_unlock(&hugetlb_lock);
>
> This spinlock usage doesn't look nice but I guess we do not have many
> other options.
>

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-28 13:17     ` Michal Hocko
  (?)
@ 2012-03-28 17:39       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-28 17:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> This adds necessary charge/uncharge calls in the HugeTLB code
>
> This begs for more description...
> Other than that it looks correct.
>

Updated as below

    hugetlb: add charge/uncharge calls for HugeTLB alloc/free
    
    This adds the necessary charge/uncharge calls in the HugeTLB code. We do
    the memcg charge in page alloc and the uncharge in the compound page
    destructor. We also need to ignore HugeTLB pages in
    __mem_cgroup_uncharge_common because that gets called from
    delete_from_page_cache.
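
A rough sketch of that placement, using the charge helper quoted from patch
04/10 (the uncharge helper name and the error handling here are assumed for
illustration):

	/* alloc side: charge the cgroup before the page is handed out;
	 * there is no reclaim, so a failed charge is final */
	struct page *alloc_huge_page(struct vm_area_struct *vma,
				     unsigned long addr, int avoid_reserve)
	{
		struct hstate *h = hstate_vma(vma);
		struct mem_cgroup *memcg;
		struct page *page;

		if (mem_cgroup_hugetlb_charge_page(hstate_index(h),
						   pages_per_huge_page(h),
						   &memcg))
			return ERR_PTR(-VM_FAULT_SIGBUS);
		page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
		/* ... commit the charge to the page, or uncharge and
		 * fall back to the buddy allocator as before ... */
		return page;
	}

	/* free side: uncharge from the compound page destructor */
	static void free_huge_page(struct page *page)
	{
		struct hstate *h = page_hstate(page);

		mem_cgroup_hugetlb_uncharge_page(hstate_index(h),
						 pages_per_huge_page(h), page);
		/* ... return the page to the hstate free lists ... */
	}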
    
-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-28 17:37       ` Aneesh Kumar K.V
  (?)
@ 2012-03-29  0:18         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 130+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  0:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michal Hocko, linux-mm, mgorman, dhillf, aarcange, akpm, hannes,
	linux-kernel, cgroups

(2012/03/29 2:37), Aneesh Kumar K.V wrote:

> Michal Hocko <mhocko@suse.cz> writes:
> 
>> On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
>> [...]
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index 6728a7a..4b36c5e 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>> [...]
>>> @@ -4887,6 +5013,7 @@ err_cleanup:
>>>  static struct cgroup_subsys_state * __ref
>>>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>>>  {
>>> +	int idx;
>>>  	struct mem_cgroup *memcg, *parent;
>>>  	long error = -ENOMEM;
>>>  	int node;
>>> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>>>  		 * mem_cgroup(see mem_cgroup_put).
>>>  		 */
>>>  		mem_cgroup_get(parent);
>>> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
>>> +			res_counter_init(&memcg->hugepage[idx],
>>> +					 &parent->hugepage[idx]);
>>
>> Hmm, I do not think we want to make groups deeper in the hierarchy
>> unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent?
>> Still not ideal but slightly more expected behavior IMO.
> 
> But we should be limiting the child group based on parent's limit only
> when hierarchy is set right ?
> 
>>
>> The hierarchy setups are still interesting and the limitations should be
>> described in the documentation...
>>
> 
> It should behave similar to memcg. ie, if hierarchy is set, then we limit
> using MIN(parent's limit, child's limit). May be I am missing some of
> the details of memcg use_hierarchy config. My goal was to keep it
> similar to memcg. Can you explain why do you think the patch would
> make it any different ?
> 


Maybe this is a different story but....

Tejun (the cgroup maintainer) asked us to remove the 'use_hierarchy' setting
because most other cgroups are hierarchical(*). I answered that an
improvement in res_counter latency is required first. And now we have some
ideas for improving res_counter. (I'd like to try this after the page_cgroup
diet series.)

If we drop use_hierarchy, a setup similar to the current use_hierarchy=0
would look like:

	/cgroup/memory/			       = unlimited
			level1		       = unlimited
				level2	       = unlimited
					level3 = limit

To do this, after the res_counter improvement, we will put use_hierarchy on
the feature-removal list and wait for two releases. So this will not affect
your development, anyway.
 
Thanks,
-Kame

(*) AFAIK, blkio cgroup needs tons of work to be hierarchical...



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension
  2012-03-28 17:37       ` Aneesh Kumar K.V
  (?)
@ 2012-03-29  7:57         ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-29  7:57 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Wed 28-03-12 23:07:14, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
> > [...]
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 6728a7a..4b36c5e 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> > [...]
> >> @@ -4887,6 +5013,7 @@ err_cleanup:
> >>  static struct cgroup_subsys_state * __ref
> >>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >>  {
> >> +	int idx;
> >>  	struct mem_cgroup *memcg, *parent;
> >>  	long error = -ENOMEM;
> >>  	int node;
> >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >>  		 * mem_cgroup(see mem_cgroup_put).
> >>  		 */
> >>  		mem_cgroup_get(parent);
> >> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> >> +			res_counter_init(&memcg->hugepage[idx],
> >> +					 &parent->hugepage[idx]);
> >
> > Hmm, I do not think we want to make groups deeper in the hierarchy
> > unlimited as we cannot reclaim. Shouldn't we copy the limit from the parent?
> > Still not ideal but slightly more expected behavior IMO.
> 
> But we should be limiting the child group based on parent's limit only
> when hierarchy is set right ?

Yes. Everything else should be unlimited by default.

> 
> >
> > The hierarchy setups are still interesting and the limitations should be
> > described in the documentation...
> >
> 
> It should behave similar to memcg. ie, if hierarchy is set, then we limit
> using MIN(parent's limit, child's limit). May be I am missing some of
> the details of memcg use_hierarchy config. My goal was to keep it
> similar to memcg. Can you explain why do you think the patch would
> make it any different ?

Yes, the patch tries to be consistent with the memcg limits. That is OK
and I have no objections to that. It is just that the consequences are
different: the hugetlb limit is really hard, because there is no reclaim
to fall back on...

> 
> -aneesh
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-28 17:39       ` Aneesh Kumar K.V
@ 2012-03-29  8:10         ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-29  8:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Wed 28-03-12 23:09:34, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> This adds necessary charge/uncharge calls in the HugeTLB code
> >
> > This begs for more description...
> > Other than that it looks correct.
> >
> 
> Updated as below
> 
>     hugetlb: add charge/uncharge calls for HugeTLB alloc/free
>     
>     This adds necessary charge/uncharge calls in the HugeTLB code. We do
>     memcg charge in page alloc and uncharge in compound page destructor.
>     We also need to ignore HugeTLB pages in __mem_cgroup_uncharge_common
>     because that get called from delete_from_page_cache

and from mem_cgroup_end_migration used during soft_offline_page.

Btw, while looking at mem_cgroup_end_migration, I noticed that you need
to take care of mem_cgroup_prepare_migration as well, otherwise the page
would get charged as a normal (shmem) page.
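
That is, something along these lines at the top of
mem_cgroup_prepare_migration() would be needed (a sketch; PageHuge() is the
existing test for hugetlb compound pages):

	/* at the top of mem_cgroup_prepare_migration(): */
	if (PageHuge(page))	/* charged by the hugetlb extension */
		return 0;	/* don't recharge as a normal (shmem) page */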

>     
> -aneesh
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages
  2012-03-28 17:38       ` Aneesh Kumar K.V
@ 2012-03-29  8:11         ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-29  8:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Wed 28-03-12 23:08:34, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > On Fri 16-03-12 23:09:28, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> hugepage_activelist will be used to track currently used HugeTLB pages.
> >> We need to find the in-use HugeTLB pages to support memcg removal.
> >> On memcg removal we update the page's memory cgroup to point to
> >> parent cgroup.
> >> 
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >> ---
> >>  include/linux/hugetlb.h |    1 +
> >>  mm/hugetlb.c            |   23 ++++++++++++++++++-----
> >>  2 files changed, 19 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> >> index cbd8dc5..6919100 100644
> >> --- a/include/linux/hugetlb.h
> >> +++ b/include/linux/hugetlb.h
> > [...]
> >> @@ -2319,14 +2322,24 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
> >>  		page = pte_page(pte);
> >>  		if (pte_dirty(pte))
> >>  			set_page_dirty(page);
> >> -		list_add(&page->lru, &page_list);
> >> +
> >> +		spin_lock(&hugetlb_lock);
> >> +		list_move(&page->lru, &page_list);
> >> +		spin_unlock(&hugetlb_lock);
> >
> > Why do we really need the spinlock here?
> 
> 
> It does a list_del from hugepage_activelist.

Right you are.
Sorry.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-29  8:10         ` Michal Hocko
@ 2012-03-30 10:40           ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 130+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-30 10:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Wed 28-03-12 23:09:34, Aneesh Kumar K.V wrote:
>> Michal Hocko <mhocko@suse.cz> writes:
>> 
>> > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote:
>> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> 
>> >> This adds necessary charge/uncharge calls in the HugeTLB code
>> >
>> > This begs for more description...
>> > Other than that it looks correct.
>> >
>> 
>> Updated as below
>> 
>>     hugetlb: add charge/uncharge calls for HugeTLB alloc/free
>>     
>>     This adds the necessary charge/uncharge calls in the HugeTLB code.
>>     We do the memcg charge in page alloc and the uncharge in the compound
>>     page destructor. We also need to ignore HugeTLB pages in
>>     __mem_cgroup_uncharge_common because that gets called from
>>     delete_from_page_cache
>
> and from mem_cgroup_end_migration, which is used during soft_offline_page.
>
> Btw., while looking at mem_cgroup_end_migration, I noticed that you
> need to take care of mem_cgroup_prepare_migration as well; otherwise
> the page would get charged as a normal (shmem) page.
>

Won't we skip HugeTLB pages in migrate? check_range does check for
is_vm_hugetlb_page.
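
Roughly, the check in mm/mempolicy.c looks like the following (a sketch
from memory, so the exact condition may differ):

	for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) {
		...
		if (!is_vm_hugetlb_page(vma) &&
		    ((flags & MPOL_MF_STRICT) ||
		     ((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
		      vma_migratable(vma)))) {
			/* HugeTLB VMAs never get here, so their pages are
			 * never queued for migration */
			err = check_pgd_range(vma, start, endvma,
					      nodes, flags, private);
		}
	}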

-aneesh


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free
  2012-03-30 10:40           ` Aneesh Kumar K.V
@ 2012-03-30 10:46             ` Michal Hocko
  -1 siblings, 0 replies; 130+ messages in thread
From: Michal Hocko @ 2012-03-30 10:46 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, aarcange, akpm,
	hannes, linux-kernel, cgroups

On Fri 30-03-12 16:10:00, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > On Wed 28-03-12 23:09:34, Aneesh Kumar K.V wrote:
> >> Michal Hocko <mhocko@suse.cz> writes:
> >> 
> >> > On Fri 16-03-12 23:09:25, Aneesh Kumar K.V wrote:
> >> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> >> 
> >> >> This adds necessary charge/uncharge calls in the HugeTLB code
> >> >
> >> > This begs for more description...
> >> > Other than that it looks correct.
> >> >
> >> 
> >> Updated as below
> >> 
> >>     hugetlb: add charge/uncharge calls for HugeTLB alloc/free
> >>     
> >>     This adds the necessary charge/uncharge calls in the HugeTLB code.
> >>     We do the memcg charge in page alloc and the uncharge in the compound
> >>     page destructor. We also need to ignore HugeTLB pages in
> >>     __mem_cgroup_uncharge_common because that gets called from
> >>     delete_from_page_cache
> >
> > and from mem_cgroup_end_migration, which is used during soft_offline_page.
> >
> > Btw., while looking at mem_cgroup_end_migration, I noticed that you
> > need to take care of mem_cgroup_prepare_migration as well; otherwise
> > the page would get charged as a normal (shmem) page.
> >
> 
> Won't we skip HugeTLB pages in migrate?

Yes, but we still migrate during memory-failure handling (see
soft_offline_page).
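
The path is roughly the following (a call-chain sketch from memory;
soft offlining goes through the normal migration code and therefore
through the memcg migration hooks):

	soft_offline_page()
	  isolate_lru_page()
	  migrate_pages()
	    unmap_and_move()
	      mem_cgroup_prepare_migration()	/* charges the new page */
	      move_to_new_page()
	      mem_cgroup_end_migration()	/* commits or rolls back */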

> check_range does check for is_vm_hugetlb_page.
> 
> -aneesh
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 130+ messages in thread

end of thread, other threads:[~2012-03-30 10:55 UTC | newest]

Thread overview: 130+ messages
2012-03-16 17:39 [PATCH -V4 00/10] memcg: Add memcg extension to control HugeTLB allocation Aneesh Kumar K.V
2012-03-16 17:39 ` [PATCH -V4 01/10] hugetlb: rename max_hstate to hugetlb_max_hstate Aneesh Kumar K.V
2012-03-19  2:07   ` KAMEZAWA Hiroyuki
2012-03-28  9:18   ` Michal Hocko
2012-03-16 17:39 ` [PATCH -V4 02/10] hugetlbfs: don't use ERR_PTR with VM_FAULT* values Aneesh Kumar K.V
2012-03-19  2:11   ` KAMEZAWA Hiroyuki
2012-03-19  6:37     ` Aneesh Kumar K.V
2012-03-28  9:25   ` Michal Hocko
2012-03-28 11:35     ` Aneesh Kumar K.V
2012-03-16 17:39 ` [PATCH -V4 03/10] hugetlbfs: Add an inline helper for finding hstate index Aneesh Kumar K.V
2012-03-19  2:15   ` KAMEZAWA Hiroyuki
2012-03-28  9:41   ` Michal Hocko
2012-03-16 17:39 ` [PATCH -V4 04/10] memcg: Add HugeTLB extension Aneesh Kumar K.V
2012-03-19  2:38   ` KAMEZAWA Hiroyuki
2012-03-19  6:52     ` Aneesh Kumar K.V
2012-03-19  7:00       ` KAMEZAWA Hiroyuki
2012-03-19 11:39         ` Glauber Costa
2012-03-19 12:07           ` KAMEZAWA Hiroyuki
2012-03-21  4:48           ` Aneesh Kumar K.V
2012-03-21  5:22             ` KAMEZAWA Hiroyuki
2012-03-28 11:33   ` Michal Hocko
2012-03-28 13:40     ` Aneesh Kumar K.V
2012-03-28 15:44       ` Michal Hocko
2012-03-28 13:40   ` Michal Hocko
2012-03-28 17:37     ` Aneesh Kumar K.V
2012-03-29  0:18       ` KAMEZAWA Hiroyuki
2012-03-29  7:57       ` Michal Hocko
2012-03-16 17:39 ` [PATCH -V4 05/10] hugetlb: add charge/uncharge calls for HugeTLB alloc/free Aneesh Kumar K.V
2012-03-19  2:41   ` KAMEZAWA Hiroyuki
2012-03-19  7:01     ` Aneesh Kumar K.V
2012-03-28 13:17   ` Michal Hocko
2012-03-28 17:39     ` Aneesh Kumar K.V
2012-03-29  8:10       ` Michal Hocko
2012-03-30 10:40         ` Aneesh Kumar K.V
2012-03-30 10:46           ` Michal Hocko
2012-03-16 17:39 ` [PATCH -V4 06/10] memcg: track resource index in cftype private Aneesh Kumar K.V
2012-03-19  2:43   ` KAMEZAWA Hiroyuki
2012-03-16 17:39 ` [PATCH -V4 07/10] hugetlbfs: Add memcg control files for hugetlbfs Aneesh Kumar K.V
2012-03-19  2:56   ` KAMEZAWA Hiroyuki
2012-03-19  7:14     ` Aneesh Kumar K.V
2012-03-19  7:34       ` KAMEZAWA Hiroyuki
2012-03-20  9:22         ` Aneesh Kumar K.V
2012-03-16 17:39 ` [PATCH -V4 08/10] hugetlbfs: Add a list for tracking in-use HugeTLB pages Aneesh Kumar K.V
2012-03-19  3:00   ` KAMEZAWA Hiroyuki
2012-03-19  8:59     ` Aneesh Kumar K.V
2012-03-28 13:58   ` Michal Hocko
2012-03-28 17:38     ` Aneesh Kumar K.V
2012-03-29  8:11       ` Michal Hocko
2012-03-16 17:39 ` [PATCH -V4 09/10] memcg: move HugeTLB resource count to parent cgroup on memcg removal Aneesh Kumar K.V
2012-03-19  3:04   ` KAMEZAWA Hiroyuki
2012-03-19  9:00     ` Aneesh Kumar K.V
2012-03-28 14:07   ` Michal Hocko
2012-03-16 17:39 ` [PATCH -V4 10/10] memcg: Add memory controller documentation for hugetlb management Aneesh Kumar K.V
2012-03-28 14:36   ` Michal Hocko
