[PATCH -V9 00/15] hugetlb: Add HugeTLB controller to control HugeTLB allocation

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH -V9 00/15] hugetlb: Add HugeTLB controller to control HugeTLB allocation
@ 2012-06-13 10:27 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups

Hi,

This patchset implements a cgroup resource controller for HugeTLB
pages. The controller allows to limit the HugeTLB usage per control
group and enforces the controller limit during page fault. Since
HugeTLB doesn't support page reclaim, enforcing the limit at page
fault time implies that, the application will get SIGBUS signal if
it tries to access HugeTLB pages beyond its limit. This requires
the application to know beforehand how much HugeTLB pages it would
require for its use.

The goal is to control how many HugeTLB pages a group of task can
allocate. It can be looked at as an extension of the existing quota
interface which limits the number of HugeTLB pages per hugetlbfs
superblock. HPC job scheduler requires jobs to specify their resource
requirements in the job file. Once their requirements can be met,
job schedulers like (SLURM) will schedule the job. We need to make sure
that the jobs won't consume more resources than requested. If they do
we should either error out or kill the application.

Patches are on top of v3.5-rc2

Changes from V8:
  * Address review feedback

Changes from V7:
  * Remove dependency on page_cgroup.
  * Use page[2].lru.next to store HugeTLB cgroup information.

Changes from V6:
 * Implement the controller as a seperate HugeTLB cgroup.
 * Folded fixup patches in -mm to the original patches

Changes from V5:
 * Address review feedback.

Changes from V4:
 * Add support for charge/uncharge during page migration
 * Drop the usage of page->lru in unmap_hugepage_range.

Changes from v3:
 * Address review feedback.
 * Fix a bug in cgroup removal related parent charging with use_hierarchy set

Changes from V2:
* Changed the implementation to limit the HugeTLB usage during page
  fault time. This simplifies the extension and keep it closer to
  memcg design. This also allows to support cgroup removal with less
  complexity. Only caveat is the application should ensure its HugeTLB
  usage doesn't cross the cgroup limit.

Changes from V1:
* Changed the implementation as a memcg extension. We still use
  the same logic to track the cgroup and range.

Changes from RFC post:
* Added support for HugeTLB cgroup hierarchy
* Added support for task migration
* Added documentation patch
* Other bug fixes

-aneesh

^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH -V9 00/15] hugetlb: Add HugeTLB controller to control HugeTLB allocation
@ 2012-06-13 10:27 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups

Hi,

This patchset implements a cgroup resource controller for HugeTLB
pages. The controller allows to limit the HugeTLB usage per control
group and enforces the controller limit during page fault. Since
HugeTLB doesn't support page reclaim, enforcing the limit at page
fault time implies that, the application will get SIGBUS signal if
it tries to access HugeTLB pages beyond its limit. This requires
the application to know beforehand how much HugeTLB pages it would
require for its use.

The goal is to control how many HugeTLB pages a group of task can
allocate. It can be looked at as an extension of the existing quota
interface which limits the number of HugeTLB pages per hugetlbfs
superblock. HPC job scheduler requires jobs to specify their resource
requirements in the job file. Once their requirements can be met,
job schedulers like (SLURM) will schedule the job. We need to make sure
that the jobs won't consume more resources than requested. If they do
we should either error out or kill the application.

Patches are on top of v3.5-rc2

Changes from V8:
  * Address review feedback

Changes from V7:
  * Remove dependency on page_cgroup.
  * Use page[2].lru.next to store HugeTLB cgroup information.

Changes from V6:
 * Implement the controller as a seperate HugeTLB cgroup.
 * Folded fixup patches in -mm to the original patches

Changes from V5:
 * Address review feedback.

Changes from V4:
 * Add support for charge/uncharge during page migration
 * Drop the usage of page->lru in unmap_hugepage_range.

Changes from v3:
 * Address review feedback.
 * Fix a bug in cgroup removal related parent charging with use_hierarchy set

Changes from V2:
* Changed the implementation to limit the HugeTLB usage during page
  fault time. This simplifies the extension and keep it closer to
  memcg design. This also allows to support cgroup removal with less
  complexity. Only caveat is the application should ensure its HugeTLB
  usage doesn't cross the cgroup limit.

Changes from V1:
* Changed the implementation as a memcg extension. We still use
  the same logic to track the cgroup and range.

Changes from RFC post:
* Added support for HugeTLB cgroup hierarchy
* Added support for task migration
* Added documentation patch
* Other bug fixes

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH -V9 01/15] hugetlb: rename max_hstate to hugetlb_max_hstate
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Rename max_hstate to hugetlb_max_hstate. We will be using this from other
subsystems like hugetlb controller in later patches.

Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e198831..c868309 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int max_hstate;
+static int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
@@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
 static unsigned long __initdata default_hstate_size;
 
 #define for_each_hstate(h) \
-	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
+	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
 /*
  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
@@ -1897,9 +1897,9 @@ void __init hugetlb_add_hstate(unsigned order)
 		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
 		return;
 	}
-	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
+	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
 	BUG_ON(order == 0);
-	h = &hstates[max_hstate++];
+	h = &hstates[hugetlb_max_hstate++];
 	h->order = order;
 	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
 	h->nr_huge_pages = 0;
@@ -1920,10 +1920,10 @@ static int __init hugetlb_nrpages_setup(char *s)
 	static unsigned long *last_mhp;
 
 	/*
-	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
+	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
 	 * so this hugepages= parameter goes to the "default hstate".
 	 */
-	if (!max_hstate)
+	if (!hugetlb_max_hstate)
 		mhp = &default_hstate_max_huge_pages;
 	else
 		mhp = &parsed_hstate->max_huge_pages;
@@ -1942,7 +1942,7 @@ static int __init hugetlb_nrpages_setup(char *s)
 	 * But we need to allocate >= MAX_ORDER hstates here early to still
 	 * use the bootmem allocator.
 	 */
-	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
+	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
 		hugetlb_hstate_alloc_pages(parsed_hstate);
 
 	last_mhp = mhp;
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 01/15] hugetlb: rename max_hstate to hugetlb_max_hstate
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Rename max_hstate to hugetlb_max_hstate. We will be using this from other
subsystems like hugetlb controller in later patches.

Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e198831..c868309 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,7 +34,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int max_hstate;
+static int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
@@ -46,7 +46,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
 static unsigned long __initdata default_hstate_size;
 
 #define for_each_hstate(h) \
-	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
+	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
 /*
  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
@@ -1897,9 +1897,9 @@ void __init hugetlb_add_hstate(unsigned order)
 		printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
 		return;
 	}
-	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
+	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
 	BUG_ON(order == 0);
-	h = &hstates[max_hstate++];
+	h = &hstates[hugetlb_max_hstate++];
 	h->order = order;
 	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
 	h->nr_huge_pages = 0;
@@ -1920,10 +1920,10 @@ static int __init hugetlb_nrpages_setup(char *s)
 	static unsigned long *last_mhp;
 
 	/*
-	 * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
+	 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet,
 	 * so this hugepages= parameter goes to the "default hstate".
 	 */
-	if (!max_hstate)
+	if (!hugetlb_max_hstate)
 		mhp = &default_hstate_max_huge_pages;
 	else
 		mhp = &parsed_hstate->max_huge_pages;
@@ -1942,7 +1942,7 @@ static int __init hugetlb_nrpages_setup(char *s)
 	 * But we need to allocate >= MAX_ORDER hstates here early to still
 	 * use the bootmem allocator.
 	 */
-	if (max_hstate && parsed_hstate->order >= MAX_ORDER)
+	if (hugetlb_max_hstate && parsed_hstate->order >= MAX_ORDER)
 		hugetlb_hstate_alloc_pages(parsed_hstate);
 
 	last_mhp = mhp;
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 02/15] hugetlb: don't use ERR_PTR with VM_FAULT* values
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

The current use of VM_FAULT_* codes with ERR_PTR requires us to ensure
VM_FAULT_* values will not exceed MAX_ERRNO value. Decouple the
VM_FAULT_* values from MAX_ERRNO.

Acked-by: Hillf Danton <dhillf@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c868309..34a7e23 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1123,10 +1123,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	 */
 	chg = vma_needs_reservation(h, vma, addr);
 	if (chg < 0)
-		return ERR_PTR(-VM_FAULT_OOM);
+		return ERR_PTR(-ENOMEM);
 	if (chg)
 		if (hugepage_subpool_get_pages(spool, chg))
-			return ERR_PTR(-VM_FAULT_SIGBUS);
+			return ERR_PTR(-ENOSPC);
 
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
@@ -1136,7 +1136,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
 			hugepage_subpool_put_pages(spool, chg);
-			return ERR_PTR(-VM_FAULT_SIGBUS);
+			return ERR_PTR(-ENOSPC);
 		}
 	}
 
@@ -2496,6 +2496,7 @@ retry_avoidcopy:
 	new_page = alloc_huge_page(vma, address, outside_reserve);
 
 	if (IS_ERR(new_page)) {
+		long err = PTR_ERR(new_page);
 		page_cache_release(old_page);
 
 		/*
@@ -2524,7 +2525,10 @@ retry_avoidcopy:
 
 		/* Caller expects lock to be held */
 		spin_lock(&mm->page_table_lock);
-		return -PTR_ERR(new_page);
+		if (err == -ENOMEM)
+			return VM_FAULT_OOM;
+		else
+			return VM_FAULT_SIGBUS;
 	}
 
 	/*
@@ -2642,7 +2646,11 @@ retry:
 			goto out;
 		page = alloc_huge_page(vma, address, 0);
 		if (IS_ERR(page)) {
-			ret = -PTR_ERR(page);
+			ret = PTR_ERR(page);
+			if (ret == -ENOMEM)
+				ret = VM_FAULT_OOM;
+			else
+				ret = VM_FAULT_SIGBUS;
 			goto out;
 		}
 		clear_huge_page(page, address, pages_per_huge_page(h));
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 02/15] hugetlb: don't use ERR_PTR with VM_FAULT* values
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

The current use of VM_FAULT_* codes with ERR_PTR requires us to ensure
VM_FAULT_* values will not exceed MAX_ERRNO value. Decouple the
VM_FAULT_* values from MAX_ERRNO.

Acked-by: Hillf Danton <dhillf@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c868309..34a7e23 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1123,10 +1123,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	 */
 	chg = vma_needs_reservation(h, vma, addr);
 	if (chg < 0)
-		return ERR_PTR(-VM_FAULT_OOM);
+		return ERR_PTR(-ENOMEM);
 	if (chg)
 		if (hugepage_subpool_get_pages(spool, chg))
-			return ERR_PTR(-VM_FAULT_SIGBUS);
+			return ERR_PTR(-ENOSPC);
 
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
@@ -1136,7 +1136,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
 			hugepage_subpool_put_pages(spool, chg);
-			return ERR_PTR(-VM_FAULT_SIGBUS);
+			return ERR_PTR(-ENOSPC);
 		}
 	}
 
@@ -2496,6 +2496,7 @@ retry_avoidcopy:
 	new_page = alloc_huge_page(vma, address, outside_reserve);
 
 	if (IS_ERR(new_page)) {
+		long err = PTR_ERR(new_page);
 		page_cache_release(old_page);
 
 		/*
@@ -2524,7 +2525,10 @@ retry_avoidcopy:
 
 		/* Caller expects lock to be held */
 		spin_lock(&mm->page_table_lock);
-		return -PTR_ERR(new_page);
+		if (err == -ENOMEM)
+			return VM_FAULT_OOM;
+		else
+			return VM_FAULT_SIGBUS;
 	}
 
 	/*
@@ -2642,7 +2646,11 @@ retry:
 			goto out;
 		page = alloc_huge_page(vma, address, 0);
 		if (IS_ERR(page)) {
-			ret = -PTR_ERR(page);
+			ret = PTR_ERR(page);
+			if (ret == -ENOMEM)
+				ret = VM_FAULT_OOM;
+			else
+				ret = VM_FAULT_SIGBUS;
 			goto out;
 		}
 		clear_huge_page(page, address, pages_per_huge_page(h));
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 03/15] hugetlb: add an inline helper for finding hstate index
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Add an inline helper and use it in the code.

Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |    6 ++++++
 mm/hugetlb.c            |   20 +++++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d5d6bbe..217f528 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -302,6 +302,11 @@ static inline unsigned hstate_index_to_shift(unsigned index)
 	return hstates[index].order + PAGE_SHIFT;
 }
 
+static inline int hstate_index(struct hstate *h)
+{
+	return h - hstates;
+}
+
 #else
 struct hstate {};
 #define alloc_huge_page_node(h, nid) NULL
@@ -320,6 +325,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 	return 1;
 }
 #define hstate_index_to_shift(index) 0
+#define hstate_index(h) 0
 #endif
 
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 34a7e23..b1e0ed1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1646,7 +1646,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
 				    struct attribute_group *hstate_attr_group)
 {
 	int retval;
-	int hi = h - hstates;
+	int hi = hstate_index(h);
 
 	hstate_kobjs[hi] = kobject_create_and_add(h->name, parent);
 	if (!hstate_kobjs[hi])
@@ -1741,11 +1741,13 @@ void hugetlb_unregister_node(struct node *node)
 	if (!nhs->hugepages_kobj)
 		return;		/* no hstate attributes */
 
-	for_each_hstate(h)
-		if (nhs->hstate_kobjs[h - hstates]) {
-			kobject_put(nhs->hstate_kobjs[h - hstates]);
-			nhs->hstate_kobjs[h - hstates] = NULL;
+	for_each_hstate(h) {
+		int idx = hstate_index(h);
+		if (nhs->hstate_kobjs[idx]) {
+			kobject_put(nhs->hstate_kobjs[idx]);
+			nhs->hstate_kobjs[idx] = NULL;
 		}
+	}
 
 	kobject_put(nhs->hugepages_kobj);
 	nhs->hugepages_kobj = NULL;
@@ -1848,7 +1850,7 @@ static void __exit hugetlb_exit(void)
 	hugetlb_unregister_all_nodes();
 
 	for_each_hstate(h) {
-		kobject_put(hstate_kobjs[h - hstates]);
+		kobject_put(hstate_kobjs[hstate_index(h)]);
 	}
 
 	kobject_put(hugepages_kobj);
@@ -1869,7 +1871,7 @@ static int __init hugetlb_init(void)
 		if (!size_to_hstate(default_hstate_size))
 			hugetlb_add_hstate(HUGETLB_PAGE_ORDER);
 	}
-	default_hstate_idx = size_to_hstate(default_hstate_size) - hstates;
+	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_size));
 	if (default_hstate_max_huge_pages)
 		default_hstate.max_huge_pages = default_hstate_max_huge_pages;
 
@@ -2687,7 +2689,7 @@ retry:
 		 */
 		if (unlikely(PageHWPoison(page))) {
 			ret = VM_FAULT_HWPOISON |
-			      VM_FAULT_SET_HINDEX(h - hstates);
+				VM_FAULT_SET_HINDEX(hstate_index(h));
 			goto backout_unlocked;
 		}
 	}
@@ -2760,7 +2762,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
-			       VM_FAULT_SET_HINDEX(h - hstates);
+				VM_FAULT_SET_HINDEX(hstate_index(h));
 	}
 
 	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 03/15] hugetlb: add an inline helper for finding hstate index
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Add an inline helper and use it in the code.

Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |    6 ++++++
 mm/hugetlb.c            |   20 +++++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d5d6bbe..217f528 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -302,6 +302,11 @@ static inline unsigned hstate_index_to_shift(unsigned index)
 	return hstates[index].order + PAGE_SHIFT;
 }
 
+static inline int hstate_index(struct hstate *h)
+{
+	return h - hstates;
+}
+
 #else
 struct hstate {};
 #define alloc_huge_page_node(h, nid) NULL
@@ -320,6 +325,7 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 	return 1;
 }
 #define hstate_index_to_shift(index) 0
+#define hstate_index(h) 0
 #endif
 
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 34a7e23..b1e0ed1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1646,7 +1646,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
 				    struct attribute_group *hstate_attr_group)
 {
 	int retval;
-	int hi = h - hstates;
+	int hi = hstate_index(h);
 
 	hstate_kobjs[hi] = kobject_create_and_add(h->name, parent);
 	if (!hstate_kobjs[hi])
@@ -1741,11 +1741,13 @@ void hugetlb_unregister_node(struct node *node)
 	if (!nhs->hugepages_kobj)
 		return;		/* no hstate attributes */
 
-	for_each_hstate(h)
-		if (nhs->hstate_kobjs[h - hstates]) {
-			kobject_put(nhs->hstate_kobjs[h - hstates]);
-			nhs->hstate_kobjs[h - hstates] = NULL;
+	for_each_hstate(h) {
+		int idx = hstate_index(h);
+		if (nhs->hstate_kobjs[idx]) {
+			kobject_put(nhs->hstate_kobjs[idx]);
+			nhs->hstate_kobjs[idx] = NULL;
 		}
+	}
 
 	kobject_put(nhs->hugepages_kobj);
 	nhs->hugepages_kobj = NULL;
@@ -1848,7 +1850,7 @@ static void __exit hugetlb_exit(void)
 	hugetlb_unregister_all_nodes();
 
 	for_each_hstate(h) {
-		kobject_put(hstate_kobjs[h - hstates]);
+		kobject_put(hstate_kobjs[hstate_index(h)]);
 	}
 
 	kobject_put(hugepages_kobj);
@@ -1869,7 +1871,7 @@ static int __init hugetlb_init(void)
 		if (!size_to_hstate(default_hstate_size))
 			hugetlb_add_hstate(HUGETLB_PAGE_ORDER);
 	}
-	default_hstate_idx = size_to_hstate(default_hstate_size) - hstates;
+	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_size));
 	if (default_hstate_max_huge_pages)
 		default_hstate.max_huge_pages = default_hstate_max_huge_pages;
 
@@ -2687,7 +2689,7 @@ retry:
 		 */
 		if (unlikely(PageHWPoison(page))) {
 			ret = VM_FAULT_HWPOISON |
-			      VM_FAULT_SET_HINDEX(h - hstates);
+				VM_FAULT_SET_HINDEX(hstate_index(h));
 			goto backout_unlocked;
 		}
 	}
@@ -2760,7 +2762,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
-			       VM_FAULT_SET_HINDEX(h - hstates);
+				VM_FAULT_SET_HINDEX(hstate_index(h));
 	}
 
 	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
  2012-06-13 10:27 ` Aneesh Kumar K.V
  (?)
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Use a mmu_gather instead of a temporary linked list for accumulating
pages when we unmap a hugepage range

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/hugetlbfs/inode.c    |    4 ++--
 include/linux/hugetlb.h |   22 ++++++++++++++----
 mm/hugetlb.c            |   59 ++++++++++++++++++++++++++++-------------------
 mm/memory.c             |    7 ++++--
 4 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index cc9281b..ff233e4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -416,8 +416,8 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff)
 		else
 			v_offset = 0;
 
-		__unmap_hugepage_range(vma,
-				vma->vm_start + v_offset, vma->vm_end, NULL);
+		unmap_hugepage_range(vma, vma->vm_start + v_offset,
+				     vma->vm_end, NULL);
 	}
 }
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 217f528..0f23c18 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -7,6 +7,7 @@
 
 struct ctl_table;
 struct user_struct;
+struct mmu_gather;
 
 #ifdef CONFIG_HUGETLB_PAGE
 
@@ -40,9 +41,10 @@ int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			struct page **, struct vm_area_struct **,
 			unsigned long *, int *, int, unsigned int flags);
 void unmap_hugepage_range(struct vm_area_struct *,
-			unsigned long, unsigned long, struct page *);
-void __unmap_hugepage_range(struct vm_area_struct *,
-			unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *);
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+				unsigned long start, unsigned long end,
+				struct page *ref_page);
 int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(int, char *);
@@ -98,7 +100,6 @@ static inline unsigned long hugetlb_total_pages(void)
 #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
 #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
 #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
-#define unmap_hugepage_range(vma, start, end, page)	BUG()
 static inline void hugetlb_report_meminfo(struct seq_file *m)
 {
 }
@@ -112,13 +113,24 @@ static inline void hugetlb_report_meminfo(struct seq_file *m)
 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
 #define hugetlb_fault(mm, vma, addr, flags)	({ BUG(); 0; })
 #define huge_pte_offset(mm, address)	0
-#define dequeue_hwpoisoned_huge_page(page)	0
+static inline int dequeue_hwpoisoned_huge_page(struct page *page)
+{
+	return 0;
+}
+
 static inline void copy_huge_page(struct page *dst, struct page *src)
 {
 }
 
 #define hugetlb_change_protection(vma, address, end, newprot)
 
+static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
+			struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, struct page *ref_page)
+{
+	BUG();
+}
+
 #endif /* !CONFIG_HUGETLB_PAGE */
 
 #define HUGETLB_ANON_FILE "anon_hugepage"
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b1e0ed1..e54b695 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -24,8 +24,9 @@
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
-#include <linux/io.h>
+#include <asm/tlb.h>
 
+#include <linux/io.h>
 #include <linux/hugetlb.h>
 #include <linux/node.h>
 #include "internal.h"
@@ -2310,30 +2311,26 @@ static int is_hugetlb_entry_hwpoisoned(pte_t pte)
 		return 0;
 }
 
-void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			    unsigned long end, struct page *ref_page)
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+			    unsigned long start, unsigned long end,
+			    struct page *ref_page)
 {
+	int force_flush = 0;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
 	pte_t *ptep;
 	pte_t pte;
 	struct page *page;
-	struct page *tmp;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 
-	/*
-	 * A page gathering list, protected by per file i_mmap_mutex. The
-	 * lock is used to avoid list corruption from multiple unmapping
-	 * of the same page since we are using page->lru.
-	 */
-	LIST_HEAD(page_list);
-
 	WARN_ON(!is_vm_hugetlb_page(vma));
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
+	tlb_start_vma(tlb, vma);
 	mmu_notifier_invalidate_range_start(mm, start, end);
+again:
 	spin_lock(&mm->page_table_lock);
 	for (address = start; address < end; address += sz) {
 		ptep = huge_pte_offset(mm, address);
@@ -2372,30 +2369,45 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 		}
 
 		pte = huge_ptep_get_and_clear(mm, address, ptep);
+		tlb_remove_tlb_entry(tlb, ptep, address);
 		if (pte_dirty(pte))
 			set_page_dirty(page);
-		list_add(&page->lru, &page_list);
 
+		page_remove_rmap(page);
+		force_flush = !__tlb_remove_page(tlb, page);
+		if (force_flush)
+			break;
 		/* Bail out after unmapping reference page if supplied */
 		if (ref_page)
 			break;
 	}
-	flush_tlb_range(vma, start, end);
 	spin_unlock(&mm->page_table_lock);
-	mmu_notifier_invalidate_range_end(mm, start, end);
-	list_for_each_entry_safe(page, tmp, &page_list, lru) {
-		page_remove_rmap(page);
-		list_del(&page->lru);
-		put_page(page);
+	/*
+	 * mmu_gather ran out of room to batch pages, we break out of
+	 * the PTE lock to avoid doing the potential expensive TLB invalidate
+	 * and page-free while holding it.
+	 */
+	if (force_flush) {
+		force_flush = 0;
+		tlb_flush_mmu(tlb);
+		if (address < end && !ref_page)
+			goto again;
 	}
+	mmu_notifier_invalidate_range_end(mm, start, end);
+	tlb_end_vma(tlb, vma);
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 			  unsigned long end, struct page *ref_page)
 {
-	mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
-	__unmap_hugepage_range(vma, start, end, ref_page);
-	mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+	struct mm_struct *mm;
+	struct mmu_gather tlb;
+
+	mm = vma->vm_mm;
+
+	tlb_gather_mmu(&tlb, mm, 0);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	tlb_finish_mmu(&tlb, start, end);
 }
 
 /*
@@ -2440,9 +2452,8 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * from the time of fork. This would look like data corruption
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
-			__unmap_hugepage_range(iter_vma,
-				address, address + huge_page_size(h),
-				page);
+			unmap_hugepage_range(iter_vma, address,
+					     address + huge_page_size(h), page);
 	}
 	mutex_unlock(&mapping->i_mmap_mutex);
 
diff --git a/mm/memory.c b/mm/memory.c
index 1b7dc66..545e18a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * Since no pte has actually been setup, it is
 			 * safe to do nothing in this case.
 			 */
-			if (vma->vm_file)
-				unmap_hugepage_range(vma, start, end, NULL);
+			if (vma->vm_file) {
+				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
+				__unmap_hugepage_range(tlb, vma, start, end, NULL);
+				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+			}
 		} else
 			unmap_page_range(tlb, vma, start, end, details);
 	}
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Use a mmu_gather instead of a temporary linked list for accumulating
pages when we unmap a hugepage range

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/hugetlbfs/inode.c    |    4 ++--
 include/linux/hugetlb.h |   22 ++++++++++++++----
 mm/hugetlb.c            |   59 ++++++++++++++++++++++++++++-------------------
 mm/memory.c             |    7 ++++--
 4 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index cc9281b..ff233e4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -416,8 +416,8 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff)
 		else
 			v_offset = 0;
 
-		__unmap_hugepage_range(vma,
-				vma->vm_start + v_offset, vma->vm_end, NULL);
+		unmap_hugepage_range(vma, vma->vm_start + v_offset,
+				     vma->vm_end, NULL);
 	}
 }
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 217f528..0f23c18 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -7,6 +7,7 @@
 
 struct ctl_table;
 struct user_struct;
+struct mmu_gather;
 
 #ifdef CONFIG_HUGETLB_PAGE
 
@@ -40,9 +41,10 @@ int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			struct page **, struct vm_area_struct **,
 			unsigned long *, int *, int, unsigned int flags);
 void unmap_hugepage_range(struct vm_area_struct *,
-			unsigned long, unsigned long, struct page *);
-void __unmap_hugepage_range(struct vm_area_struct *,
-			unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *);
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+				unsigned long start, unsigned long end,
+				struct page *ref_page);
 int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(int, char *);
@@ -98,7 +100,6 @@ static inline unsigned long hugetlb_total_pages(void)
 #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
 #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
 #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
-#define unmap_hugepage_range(vma, start, end, page)	BUG()
 static inline void hugetlb_report_meminfo(struct seq_file *m)
 {
 }
@@ -112,13 +113,24 @@ static inline void hugetlb_report_meminfo(struct seq_file *m)
 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
 #define hugetlb_fault(mm, vma, addr, flags)	({ BUG(); 0; })
 #define huge_pte_offset(mm, address)	0
-#define dequeue_hwpoisoned_huge_page(page)	0
+static inline int dequeue_hwpoisoned_huge_page(struct page *page)
+{
+	return 0;
+}
+
 static inline void copy_huge_page(struct page *dst, struct page *src)
 {
 }
 
 #define hugetlb_change_protection(vma, address, end, newprot)
 
+static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
+			struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, struct page *ref_page)
+{
+	BUG();
+}
+
 #endif /* !CONFIG_HUGETLB_PAGE */
 
 #define HUGETLB_ANON_FILE "anon_hugepage"
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b1e0ed1..e54b695 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -24,8 +24,9 @@
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
-#include <linux/io.h>
+#include <asm/tlb.h>
 
+#include <linux/io.h>
 #include <linux/hugetlb.h>
 #include <linux/node.h>
 #include "internal.h"
@@ -2310,30 +2311,26 @@ static int is_hugetlb_entry_hwpoisoned(pte_t pte)
 		return 0;
 }
 
-void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			    unsigned long end, struct page *ref_page)
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+			    unsigned long start, unsigned long end,
+			    struct page *ref_page)
 {
+	int force_flush = 0;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
 	pte_t *ptep;
 	pte_t pte;
 	struct page *page;
-	struct page *tmp;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 
-	/*
-	 * A page gathering list, protected by per file i_mmap_mutex. The
-	 * lock is used to avoid list corruption from multiple unmapping
-	 * of the same page since we are using page->lru.
-	 */
-	LIST_HEAD(page_list);
-
 	WARN_ON(!is_vm_hugetlb_page(vma));
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
+	tlb_start_vma(tlb, vma);
 	mmu_notifier_invalidate_range_start(mm, start, end);
+again:
 	spin_lock(&mm->page_table_lock);
 	for (address = start; address < end; address += sz) {
 		ptep = huge_pte_offset(mm, address);
@@ -2372,30 +2369,45 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 		}
 
 		pte = huge_ptep_get_and_clear(mm, address, ptep);
+		tlb_remove_tlb_entry(tlb, ptep, address);
 		if (pte_dirty(pte))
 			set_page_dirty(page);
-		list_add(&page->lru, &page_list);
 
+		page_remove_rmap(page);
+		force_flush = !__tlb_remove_page(tlb, page);
+		if (force_flush)
+			break;
 		/* Bail out after unmapping reference page if supplied */
 		if (ref_page)
 			break;
 	}
-	flush_tlb_range(vma, start, end);
 	spin_unlock(&mm->page_table_lock);
-	mmu_notifier_invalidate_range_end(mm, start, end);
-	list_for_each_entry_safe(page, tmp, &page_list, lru) {
-		page_remove_rmap(page);
-		list_del(&page->lru);
-		put_page(page);
+	/*
+	 * mmu_gather ran out of room to batch pages, we break out of
+	 * the PTE lock to avoid doing the potential expensive TLB invalidate
+	 * and page-free while holding it.
+	 */
+	if (force_flush) {
+		force_flush = 0;
+		tlb_flush_mmu(tlb);
+		if (address < end && !ref_page)
+			goto again;
 	}
+	mmu_notifier_invalidate_range_end(mm, start, end);
+	tlb_end_vma(tlb, vma);
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 			  unsigned long end, struct page *ref_page)
 {
-	mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
-	__unmap_hugepage_range(vma, start, end, ref_page);
-	mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+	struct mm_struct *mm;
+	struct mmu_gather tlb;
+
+	mm = vma->vm_mm;
+
+	tlb_gather_mmu(&tlb, mm, 0);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	tlb_finish_mmu(&tlb, start, end);
 }
 
 /*
@@ -2440,9 +2452,8 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * from the time of fork. This would look like data corruption
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
-			__unmap_hugepage_range(iter_vma,
-				address, address + huge_page_size(h),
-				page);
+			unmap_hugepage_range(iter_vma, address,
+					     address + huge_page_size(h), page);
 	}
 	mutex_unlock(&mapping->i_mmap_mutex);
 
diff --git a/mm/memory.c b/mm/memory.c
index 1b7dc66..545e18a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * Since no pte has actually been setup, it is
 			 * safe to do nothing in this case.
 			 */
-			if (vma->vm_file)
-				unmap_hugepage_range(vma, start, end, NULL);
+			if (vma->vm_file) {
+				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
+				__unmap_hugepage_range(tlb, vma, start, end, NULL);
+				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+			}
 		} else
 			unmap_page_range(tlb, vma, start, end, details);
 	}
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	mhocko-AlSwsSmVLrQ, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Use a mmu_gather instead of a temporary linked list for accumulating
pages when we unmap a hugepage range

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/hugetlbfs/inode.c    |    4 ++--
 include/linux/hugetlb.h |   22 ++++++++++++++----
 mm/hugetlb.c            |   59 ++++++++++++++++++++++++++++-------------------
 mm/memory.c             |    7 ++++--
 4 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index cc9281b..ff233e4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -416,8 +416,8 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, pgoff_t pgoff)
 		else
 			v_offset = 0;
 
-		__unmap_hugepage_range(vma,
-				vma->vm_start + v_offset, vma->vm_end, NULL);
+		unmap_hugepage_range(vma, vma->vm_start + v_offset,
+				     vma->vm_end, NULL);
 	}
 }
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 217f528..0f23c18 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -7,6 +7,7 @@
 
 struct ctl_table;
 struct user_struct;
+struct mmu_gather;
 
 #ifdef CONFIG_HUGETLB_PAGE
 
@@ -40,9 +41,10 @@ int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			struct page **, struct vm_area_struct **,
 			unsigned long *, int *, int, unsigned int flags);
 void unmap_hugepage_range(struct vm_area_struct *,
-			unsigned long, unsigned long, struct page *);
-void __unmap_hugepage_range(struct vm_area_struct *,
-			unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *);
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+				unsigned long start, unsigned long end,
+				struct page *ref_page);
 int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(int, char *);
@@ -98,7 +100,6 @@ static inline unsigned long hugetlb_total_pages(void)
 #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
 #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
 #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
-#define unmap_hugepage_range(vma, start, end, page)	BUG()
 static inline void hugetlb_report_meminfo(struct seq_file *m)
 {
 }
@@ -112,13 +113,24 @@ static inline void hugetlb_report_meminfo(struct seq_file *m)
 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
 #define hugetlb_fault(mm, vma, addr, flags)	({ BUG(); 0; })
 #define huge_pte_offset(mm, address)	0
-#define dequeue_hwpoisoned_huge_page(page)	0
+static inline int dequeue_hwpoisoned_huge_page(struct page *page)
+{
+	return 0;
+}
+
 static inline void copy_huge_page(struct page *dst, struct page *src)
 {
 }
 
 #define hugetlb_change_protection(vma, address, end, newprot)
 
+static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
+			struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, struct page *ref_page)
+{
+	BUG();
+}
+
 #endif /* !CONFIG_HUGETLB_PAGE */
 
 #define HUGETLB_ANON_FILE "anon_hugepage"
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b1e0ed1..e54b695 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -24,8 +24,9 @@
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
-#include <linux/io.h>
+#include <asm/tlb.h>
 
+#include <linux/io.h>
 #include <linux/hugetlb.h>
 #include <linux/node.h>
 #include "internal.h"
@@ -2310,30 +2311,26 @@ static int is_hugetlb_entry_hwpoisoned(pte_t pte)
 		return 0;
 }
 
-void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			    unsigned long end, struct page *ref_page)
+void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+			    unsigned long start, unsigned long end,
+			    struct page *ref_page)
 {
+	int force_flush = 0;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
 	pte_t *ptep;
 	pte_t pte;
 	struct page *page;
-	struct page *tmp;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 
-	/*
-	 * A page gathering list, protected by per file i_mmap_mutex. The
-	 * lock is used to avoid list corruption from multiple unmapping
-	 * of the same page since we are using page->lru.
-	 */
-	LIST_HEAD(page_list);
-
 	WARN_ON(!is_vm_hugetlb_page(vma));
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
+	tlb_start_vma(tlb, vma);
 	mmu_notifier_invalidate_range_start(mm, start, end);
+again:
 	spin_lock(&mm->page_table_lock);
 	for (address = start; address < end; address += sz) {
 		ptep = huge_pte_offset(mm, address);
@@ -2372,30 +2369,45 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 		}
 
 		pte = huge_ptep_get_and_clear(mm, address, ptep);
+		tlb_remove_tlb_entry(tlb, ptep, address);
 		if (pte_dirty(pte))
 			set_page_dirty(page);
-		list_add(&page->lru, &page_list);
 
+		page_remove_rmap(page);
+		force_flush = !__tlb_remove_page(tlb, page);
+		if (force_flush)
+			break;
 		/* Bail out after unmapping reference page if supplied */
 		if (ref_page)
 			break;
 	}
-	flush_tlb_range(vma, start, end);
 	spin_unlock(&mm->page_table_lock);
-	mmu_notifier_invalidate_range_end(mm, start, end);
-	list_for_each_entry_safe(page, tmp, &page_list, lru) {
-		page_remove_rmap(page);
-		list_del(&page->lru);
-		put_page(page);
+	/*
+	 * mmu_gather ran out of room to batch pages, we break out of
+	 * the PTE lock to avoid doing the potential expensive TLB invalidate
+	 * and page-free while holding it.
+	 */
+	if (force_flush) {
+		force_flush = 0;
+		tlb_flush_mmu(tlb);
+		if (address < end && !ref_page)
+			goto again;
 	}
+	mmu_notifier_invalidate_range_end(mm, start, end);
+	tlb_end_vma(tlb, vma);
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 			  unsigned long end, struct page *ref_page)
 {
-	mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
-	__unmap_hugepage_range(vma, start, end, ref_page);
-	mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+	struct mm_struct *mm;
+	struct mmu_gather tlb;
+
+	mm = vma->vm_mm;
+
+	tlb_gather_mmu(&tlb, mm, 0);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	tlb_finish_mmu(&tlb, start, end);
 }
 
 /*
@@ -2440,9 +2452,8 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * from the time of fork. This would look like data corruption
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
-			__unmap_hugepage_range(iter_vma,
-				address, address + huge_page_size(h),
-				page);
+			unmap_hugepage_range(iter_vma, address,
+					     address + huge_page_size(h), page);
 	}
 	mutex_unlock(&mapping->i_mmap_mutex);
 
diff --git a/mm/memory.c b/mm/memory.c
index 1b7dc66..545e18a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * Since no pte has actually been setup, it is
 			 * safe to do nothing in this case.
 			 */
-			if (vma->vm_file)
-				unmap_hugepage_range(vma, start, end, NULL);
+			if (vma->vm_file) {
+				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
+				__unmap_hugepage_range(tlb, vma, start, end, NULL);
+				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+			}
 		} else
 			unmap_page_range(tlb, vma, start, end, details);
 	}
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
fix linked list corruption in unmap_hugepage_range()") but we don't use
page->lru in unmap_hugepage_range any more.  Also the lock was taken
higher up in the stack in some code path.  That would result in deadlock.

unmap_mapping_range (i_mmap_mutex)
 -> unmap_mapping_range_tree
    -> unmap_mapping_range_vma
       -> zap_page_range_single
         -> unmap_single_vma
	      -> unmap_hugepage_range (i_mmap_mutex)

For shared pagetable support for huge pages, since pagetable pages are ref
counted we don't need any lock during huge_pmd_unshare.  We do take
i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
(39dde65c9940c97f ("shared page table for hugetlb page")).

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/memory.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 545e18a..f6bc04f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,11 +1326,8 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * Since no pte has actually been setup, it is
 			 * safe to do nothing in this case.
 			 */
-			if (vma->vm_file) {
-				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
+			if (vma->vm_file)
 				__unmap_hugepage_range(tlb, vma, start, end, NULL);
-				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
-			}
 		} else
 			unmap_page_range(tlb, vma, start, end, details);
 	}
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
fix linked list corruption in unmap_hugepage_range()") but we don't use
page->lru in unmap_hugepage_range any more.  Also the lock was taken
higher up in the stack in some code path.  That would result in deadlock.

unmap_mapping_range (i_mmap_mutex)
 -> unmap_mapping_range_tree
    -> unmap_mapping_range_vma
       -> zap_page_range_single
         -> unmap_single_vma
	      -> unmap_hugepage_range (i_mmap_mutex)

For shared pagetable support for huge pages, since pagetable pages are ref
counted we don't need any lock during huge_pmd_unshare.  We do take
i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
(39dde65c9940c97f ("shared page table for hugetlb page")).

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/memory.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 545e18a..f6bc04f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,11 +1326,8 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * Since no pte has actually been setup, it is
 			 * safe to do nothing in this case.
 			 */
-			if (vma->vm_file) {
-				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
+			if (vma->vm_file)
 				__unmap_hugepage_range(tlb, vma, start, end, NULL);
-				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
-			}
 		} else
 			unmap_page_range(tlb, vma, start, end, details);
 	}
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page()
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Since we migrate only one hugepage, don't use linked list for passing the
page around.  Directly pass the page that need to be migrated as argument.
This also remove the usage page->lru in migrate path.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/migrate.h |    4 +--
 mm/memory-failure.c     |   13 ++--------
 mm/migrate.c            |   65 +++++++++++++++--------------------------------
 3 files changed, 25 insertions(+), 57 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 855c337..ce7e667 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *,
 extern int migrate_pages(struct list_head *l, new_page_t x,
 			unsigned long private, bool offlining,
 			enum migrate_mode mode);
-extern int migrate_huge_pages(struct list_head *l, new_page_t x,
+extern int migrate_huge_page(struct page *, new_page_t x,
 			unsigned long private, bool offlining,
 			enum migrate_mode mode);
 
@@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
 		unsigned long private, bool offlining,
 		enum migrate_mode mode) { return -ENOSYS; }
-static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
+static inline int migrate_huge_page(struct page *page, new_page_t x,
 		unsigned long private, bool offlining,
 		enum migrate_mode mode) { return -ENOSYS; }
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ab1e714..53a1495 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	int ret;
 	unsigned long pfn = page_to_pfn(page);
 	struct page *hpage = compound_head(page);
-	LIST_HEAD(pagelist);
 
 	ret = get_any_page(page, pfn, flags);
 	if (ret < 0)
@@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	}
 
 	/* Keep page count to indicate a given hugepage is isolated. */
-
-	list_add(&hpage->lru, &pagelist);
-	ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0,
-				true);
+	ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true);
+	put_page(hpage);
 	if (ret) {
-		struct page *page1, *page2;
-		list_for_each_entry_safe(page1, page2, &pagelist, lru)
-			put_page(page1);
-
 		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 			pfn, ret, page->flags);
-		if (ret > 0)
-			ret = -EIO;
 		return ret;
 	}
 done:
diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..fdce3a2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -932,15 +932,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	if (anon_vma)
 		put_anon_vma(anon_vma);
 	unlock_page(hpage);
-
 out:
-	if (rc != -EAGAIN) {
-		list_del(&hpage->lru);
-		put_page(hpage);
-	}
-
 	put_page(new_hpage);
-
 	if (result) {
 		if (rc)
 			*result = rc;
@@ -1016,48 +1009,32 @@ out:
 	return nr_failed + retry;
 }
 
-int migrate_huge_pages(struct list_head *from,
-		new_page_t get_new_page, unsigned long private, bool offlining,
-		enum migrate_mode mode)
+int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
+		      unsigned long private, bool offlining,
+		      enum migrate_mode mode)
 {
-	int retry = 1;
-	int nr_failed = 0;
-	int pass = 0;
-	struct page *page;
-	struct page *page2;
-	int rc;
-
-	for (pass = 0; pass < 10 && retry; pass++) {
-		retry = 0;
-
-		list_for_each_entry_safe(page, page2, from, lru) {
+	int pass, rc;
+
+	for (pass = 0; pass < 10; pass++) {
+		rc = unmap_and_move_huge_page(get_new_page,
+					      private, hpage, pass > 2, offlining,
+					      mode);
+		switch (rc) {
+		case -ENOMEM:
+			goto out;
+		case -EAGAIN:
+			/* try again */
 			cond_resched();
-
-			rc = unmap_and_move_huge_page(get_new_page,
-					private, page, pass > 2, offlining,
-					mode);
-
-			switch(rc) {
-			case -ENOMEM:
-				goto out;
-			case -EAGAIN:
-				retry++;
-				break;
-			case 0:
-				break;
-			default:
-				/* Permanent failure */
-				nr_failed++;
-				break;
-			}
+			break;
+		case 0:
+			goto out;
+		default:
+			rc = -EIO;
+			goto out;
 		}
 	}
-	rc = 0;
 out:
-	if (rc)
-		return rc;
-
-	return nr_failed + retry;
+	return rc;
 }
 
 #ifdef CONFIG_NUMA
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page()
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Since we migrate only one hugepage, don't use linked list for passing the
page around.  Directly pass the page that need to be migrated as argument.
This also remove the usage page->lru in migrate path.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/migrate.h |    4 +--
 mm/memory-failure.c     |   13 ++--------
 mm/migrate.c            |   65 +++++++++++++++--------------------------------
 3 files changed, 25 insertions(+), 57 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 855c337..ce7e667 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *,
 extern int migrate_pages(struct list_head *l, new_page_t x,
 			unsigned long private, bool offlining,
 			enum migrate_mode mode);
-extern int migrate_huge_pages(struct list_head *l, new_page_t x,
+extern int migrate_huge_page(struct page *, new_page_t x,
 			unsigned long private, bool offlining,
 			enum migrate_mode mode);
 
@@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
 		unsigned long private, bool offlining,
 		enum migrate_mode mode) { return -ENOSYS; }
-static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
+static inline int migrate_huge_page(struct page *page, new_page_t x,
 		unsigned long private, bool offlining,
 		enum migrate_mode mode) { return -ENOSYS; }
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ab1e714..53a1495 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	int ret;
 	unsigned long pfn = page_to_pfn(page);
 	struct page *hpage = compound_head(page);
-	LIST_HEAD(pagelist);
 
 	ret = get_any_page(page, pfn, flags);
 	if (ret < 0)
@@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	}
 
 	/* Keep page count to indicate a given hugepage is isolated. */
-
-	list_add(&hpage->lru, &pagelist);
-	ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0,
-				true);
+	ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true);
+	put_page(hpage);
 	if (ret) {
-		struct page *page1, *page2;
-		list_for_each_entry_safe(page1, page2, &pagelist, lru)
-			put_page(page1);
-
 		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 			pfn, ret, page->flags);
-		if (ret > 0)
-			ret = -EIO;
 		return ret;
 	}
 done:
diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..fdce3a2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -932,15 +932,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	if (anon_vma)
 		put_anon_vma(anon_vma);
 	unlock_page(hpage);
-
 out:
-	if (rc != -EAGAIN) {
-		list_del(&hpage->lru);
-		put_page(hpage);
-	}
-
 	put_page(new_hpage);
-
 	if (result) {
 		if (rc)
 			*result = rc;
@@ -1016,48 +1009,32 @@ out:
 	return nr_failed + retry;
 }
 
-int migrate_huge_pages(struct list_head *from,
-		new_page_t get_new_page, unsigned long private, bool offlining,
-		enum migrate_mode mode)
+int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
+		      unsigned long private, bool offlining,
+		      enum migrate_mode mode)
 {
-	int retry = 1;
-	int nr_failed = 0;
-	int pass = 0;
-	struct page *page;
-	struct page *page2;
-	int rc;
-
-	for (pass = 0; pass < 10 && retry; pass++) {
-		retry = 0;
-
-		list_for_each_entry_safe(page, page2, from, lru) {
+	int pass, rc;
+
+	for (pass = 0; pass < 10; pass++) {
+		rc = unmap_and_move_huge_page(get_new_page,
+					      private, hpage, pass > 2, offlining,
+					      mode);
+		switch (rc) {
+		case -ENOMEM:
+			goto out;
+		case -EAGAIN:
+			/* try again */
 			cond_resched();
-
-			rc = unmap_and_move_huge_page(get_new_page,
-					private, page, pass > 2, offlining,
-					mode);
-
-			switch(rc) {
-			case -ENOMEM:
-				goto out;
-			case -EAGAIN:
-				retry++;
-				break;
-			case 0:
-				break;
-			default:
-				/* Permanent failure */
-				nr_failed++;
-				break;
-			}
+			break;
+		case 0:
+			goto out;
+		default:
+			rc = -EIO;
+			goto out;
 		}
 	}
-	rc = 0;
 out:
-	if (rc)
-		return rc;
-
-	return nr_failed + retry;
+	return rc;
 }
 
 #ifdef CONFIG_NUMA
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

hugepage_activelist will be used to track currently used HugeTLB pages.
We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal.
On cgroup removal we update the page's HugeTLB cgroup to point to parent
cgroup.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |    1 +
 mm/hugetlb.c            |   12 +++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 0f23c18..ed550d8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -211,6 +211,7 @@ struct hstate {
 	unsigned long resv_huge_pages;
 	unsigned long surplus_huge_pages;
 	unsigned long nr_overcommit_huge_pages;
+	struct list_head hugepage_activelist;
 	struct list_head hugepage_freelists[MAX_NUMNODES];
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e54b695..b5b6e15 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src)
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
-	list_add(&page->lru, &h->hugepage_freelists[nid]);
+	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
 }
@@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
 	if (list_empty(&h->hugepage_freelists[nid]))
 		return NULL;
 	page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
-	list_del(&page->lru);
+	list_move(&page->lru, &h->hugepage_activelist);
 	set_page_refcounted(page);
 	h->free_huge_pages--;
 	h->free_huge_pages_node[nid]--;
@@ -626,10 +626,11 @@ static void free_huge_page(struct page *page)
 	page->mapping = NULL;
 	BUG_ON(page_count(page));
 	BUG_ON(page_mapcount(page));
-	INIT_LIST_HEAD(&page->lru);
 
 	spin_lock(&hugetlb_lock);
 	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
+		/* remove the page from active list */
+		list_del(&page->lru);
 		update_and_free_page(h, page);
 		h->surplus_huge_pages--;
 		h->surplus_huge_pages_node[nid]--;
@@ -642,6 +643,7 @@ static void free_huge_page(struct page *page)
 
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
+	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, free_huge_page);
 	spin_lock(&hugetlb_lock);
 	h->nr_huge_pages++;
@@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 
 	spin_lock(&hugetlb_lock);
 	if (page) {
+		INIT_LIST_HEAD(&page->lru);
 		r_nid = page_to_nid(page);
 		set_compound_page_dtor(page, free_huge_page);
 		/*
@@ -994,7 +997,6 @@ retry:
 	list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
 		if ((--needed) < 0)
 			break;
-		list_del(&page->lru);
 		/*
 		 * This page is now managed by the hugetlb allocator and has
 		 * no users -- drop the buddy allocator's reference.
@@ -1009,7 +1011,6 @@ free:
 	/* Free unnecessary surplus pages to the buddy allocator */
 	if (!list_empty(&surplus_list)) {
 		list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
-			list_del(&page->lru);
 			put_page(page);
 		}
 	}
@@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->free_huge_pages = 0;
 	for (i = 0; i < MAX_NUMNODES; ++i)
 		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
+	INIT_LIST_HEAD(&h->hugepage_activelist);
 	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

hugepage_activelist will be used to track currently used HugeTLB pages.
We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal.
On cgroup removal we update the page's HugeTLB cgroup to point to parent
cgroup.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |    1 +
 mm/hugetlb.c            |   12 +++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 0f23c18..ed550d8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -211,6 +211,7 @@ struct hstate {
 	unsigned long resv_huge_pages;
 	unsigned long surplus_huge_pages;
 	unsigned long nr_overcommit_huge_pages;
+	struct list_head hugepage_activelist;
 	struct list_head hugepage_freelists[MAX_NUMNODES];
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e54b695..b5b6e15 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src)
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
-	list_add(&page->lru, &h->hugepage_freelists[nid]);
+	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
 }
@@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
 	if (list_empty(&h->hugepage_freelists[nid]))
 		return NULL;
 	page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
-	list_del(&page->lru);
+	list_move(&page->lru, &h->hugepage_activelist);
 	set_page_refcounted(page);
 	h->free_huge_pages--;
 	h->free_huge_pages_node[nid]--;
@@ -626,10 +626,11 @@ static void free_huge_page(struct page *page)
 	page->mapping = NULL;
 	BUG_ON(page_count(page));
 	BUG_ON(page_mapcount(page));
-	INIT_LIST_HEAD(&page->lru);
 
 	spin_lock(&hugetlb_lock);
 	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
+		/* remove the page from active list */
+		list_del(&page->lru);
 		update_and_free_page(h, page);
 		h->surplus_huge_pages--;
 		h->surplus_huge_pages_node[nid]--;
@@ -642,6 +643,7 @@ static void free_huge_page(struct page *page)
 
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
+	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, free_huge_page);
 	spin_lock(&hugetlb_lock);
 	h->nr_huge_pages++;
@@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 
 	spin_lock(&hugetlb_lock);
 	if (page) {
+		INIT_LIST_HEAD(&page->lru);
 		r_nid = page_to_nid(page);
 		set_compound_page_dtor(page, free_huge_page);
 		/*
@@ -994,7 +997,6 @@ retry:
 	list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
 		if ((--needed) < 0)
 			break;
-		list_del(&page->lru);
 		/*
 		 * This page is now managed by the hugetlb allocator and has
 		 * no users -- drop the buddy allocator's reference.
@@ -1009,7 +1011,6 @@ free:
 	/* Free unnecessary surplus pages to the buddy allocator */
 	if (!list_empty(&surplus_list)) {
 		list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
-			list_del(&page->lru);
 			put_page(page);
 		}
 	}
@@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->free_huge_pages = 0;
 	for (i = 0; i < MAX_NUMNODES; ++i)
 		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
+	INIT_LIST_HEAD(&h->hugepage_activelist);
 	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 08/15] hugetlb: Make some static variables global
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We will use them later in hugetlb_cgroup.c

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |    5 +++++
 mm/hugetlb.c            |    7 ++-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ed550d8..4aca057 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -21,6 +21,11 @@ struct hugepage_subpool {
 	long max_hpages, used_hpages;
 };
 
+extern spinlock_t hugetlb_lock;
+extern int hugetlb_max_hstate;
+#define for_each_hstate(h) \
+	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
+
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b5b6e15..e899a2d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -35,7 +35,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int hugetlb_max_hstate;
+int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
@@ -46,13 +46,10 @@ static struct hstate * __initdata parsed_hstate;
 static unsigned long __initdata default_hstate_max_huge_pages;
 static unsigned long __initdata default_hstate_size;
 
-#define for_each_hstate(h) \
-	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
-
 /*
  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
  */
-static DEFINE_SPINLOCK(hugetlb_lock);
+DEFINE_SPINLOCK(hugetlb_lock);
 
 static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 {
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 08/15] hugetlb: Make some static variables global
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We will use them later in hugetlb_cgroup.c

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |    5 +++++
 mm/hugetlb.c            |    7 ++-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ed550d8..4aca057 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -21,6 +21,11 @@ struct hugepage_subpool {
 	long max_hpages, used_hpages;
 };
 
+extern spinlock_t hugetlb_lock;
+extern int hugetlb_max_hstate;
+#define for_each_hstate(h) \
+	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
+
 struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b5b6e15..e899a2d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -35,7 +35,7 @@ const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
 unsigned long hugepages_treat_as_movable;
 
-static int hugetlb_max_hstate;
+int hugetlb_max_hstate;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
 
@@ -46,13 +46,10 @@ static struct hstate * __initdata parsed_hstate;
 static unsigned long __initdata default_hstate_max_huge_pages;
 static unsigned long __initdata default_hstate_size;
 
-#define for_each_hstate(h) \
-	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
-
 /*
  * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
  */
-static DEFINE_SPINLOCK(hugetlb_lock);
+DEFINE_SPINLOCK(hugetlb_lock);
 
 static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 {
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch implements a new controller that allows us to control HugeTLB
allocations. The extension allows to limit the HugeTLB usage per control
group and enforces the controller limit during page fault.  Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time implies
that, the application will get SIGBUS signal if it tries to access HugeTLB
pages beyond its limit. This requires the application to know beforehand
how much HugeTLB pages it would require for its use.

The charge/uncharge calls will be added to HugeTLB code in later patch.
Support for cgroup removal will be added in later patches.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/cgroup_subsys.h  |    6 ++
 include/linux/hugetlb_cgroup.h |   37 ++++++++++++
 init/Kconfig                   |   15 +++++
 mm/Makefile                    |    1 +
 mm/hugetlb_cgroup.c            |  122 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 181 insertions(+)
 create mode 100644 include/linux/hugetlb_cgroup.h
 create mode 100644 mm/hugetlb_cgroup.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 0bd390c..895923a 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -72,3 +72,9 @@ SUBSYS(net_prio)
 #endif
 
 /* */
+
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+SUBSYS(hugetlb)
+#endif
+
+/* */
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
new file mode 100644
index 0000000..e9944b4
--- /dev/null
+++ b/include/linux/hugetlb_cgroup.h
@@ -0,0 +1,37 @@
+/*
+ * Copyright IBM Corporation, 2012
+ * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#ifndef _LINUX_HUGETLB_CGROUP_H
+#define _LINUX_HUGETLB_CGROUP_H
+
+#include <linux/res_counter.h>
+
+struct hugetlb_cgroup;
+
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+static inline bool hugetlb_cgroup_disabled(void)
+{
+	if (hugetlb_subsys.disabled)
+		return true;
+	return false;
+}
+
+#else
+static inline bool hugetlb_cgroup_disabled(void)
+{
+	return true;
+}
+
+#endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index d07dcf9..da05fae 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -751,6 +751,21 @@ config CGROUP_MEM_RES_CTLR_KMEM
 	  the kmem extension can use it to guarantee that no group of processes
 	  will ever exhaust kernel resources alone.
 
+config CGROUP_HUGETLB_RES_CTLR
+	bool "HugeTLB Resource Controller for Control Groups"
+	depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL
+	default n
+	help
+	  Provides a cgroup Resource Controller for HugeTLB pages.
+	  When you enable this, you can put a per cgroup limit on HugeTLB usage.
+	  The limit is enforced during page fault. Since HugeTLB doesn't
+	  support page reclaim, enforcing the limit at page fault time implies
+	  that, the application will get SIGBUS signal if it tries to access
+	  HugeTLB pages beyond its limit. This requires the application to know
+	  beforehand how much HugeTLB pages it would require for its use. The
+	  control group is tracked in the third page lru pointer. This means
+	  that we cannot use the controller with huge page less than 3 pages.
+
 config CGROUP_PERF
 	bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
 	depends on PERF_EVENTS && CGROUPS
diff --git a/mm/Makefile b/mm/Makefile
index 2e2fbbe..25e8002 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -49,6 +49,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o
 obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
+obj-$(CONFIG_CGROUP_HUGETLB_RES_CTLR) += hugetlb_cgroup.o
 obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
 obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
 obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
new file mode 100644
index 0000000..5a4e71c
--- /dev/null
+++ b/mm/hugetlb_cgroup.c
@@ -0,0 +1,122 @@
+/*
+ *
+ * Copyright IBM Corporation, 2012
+ * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#include <linux/cgroup.h>
+#include <linux/slab.h>
+#include <linux/hugetlb.h>
+#include <linux/hugetlb_cgroup.h>
+
+struct hugetlb_cgroup {
+	struct cgroup_subsys_state css;
+	/*
+	 * the counter to account for hugepages from hugetlb.
+	 */
+	struct res_counter hugepage[HUGE_MAX_HSTATE];
+};
+
+struct cgroup_subsys hugetlb_subsys __read_mostly;
+struct hugetlb_cgroup *root_h_cgroup __read_mostly;
+
+static inline
+struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
+{
+	if (s)
+		return container_of(s, struct hugetlb_cgroup, css);
+	return NULL;
+}
+
+static inline
+struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup)
+{
+	return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup,
+							   hugetlb_subsys_id));
+}
+
+static inline
+struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task)
+{
+	return hugetlb_cgroup_from_css(task_subsys_state(task,
+							 hugetlb_subsys_id));
+}
+
+static inline bool hugetlb_cgroup_is_root(struct hugetlb_cgroup *h_cg)
+{
+	return (h_cg == root_h_cgroup);
+}
+
+static inline struct hugetlb_cgroup *parent_hugetlb_cgroup(struct cgroup *cg)
+{
+	if (!cg->parent)
+		return NULL;
+	return hugetlb_cgroup_from_cgroup(cg->parent);
+}
+
+static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
+{
+	int idx;
+	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
+
+	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
+		if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
+			return true;
+	}
+	return false;
+}
+
+static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
+{
+	int idx;
+	struct cgroup *parent_cgroup;
+	struct hugetlb_cgroup *h_cgroup, *parent_h_cgroup;
+
+	h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL);
+	if (!h_cgroup)
+		return ERR_PTR(-ENOMEM);
+
+	parent_cgroup = cgroup->parent;
+	if (parent_cgroup) {
+		parent_h_cgroup = hugetlb_cgroup_from_cgroup(parent_cgroup);
+		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+			res_counter_init(&h_cgroup->hugepage[idx],
+					 &parent_h_cgroup->hugepage[idx]);
+	} else {
+		root_h_cgroup = h_cgroup;
+		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+			res_counter_init(&h_cgroup->hugepage[idx], NULL);
+	}
+	return &h_cgroup->css;
+}
+
+static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
+{
+	struct hugetlb_cgroup *h_cgroup;
+
+	h_cgroup = hugetlb_cgroup_from_cgroup(cgroup);
+	kfree(h_cgroup);
+}
+
+static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
+{
+	/* We will add the cgroup removal support in later patches */
+	   return -EBUSY;
+}
+
+struct cgroup_subsys hugetlb_subsys = {
+	.name = "hugetlb",
+	.create     = hugetlb_cgroup_create,
+	.pre_destroy = hugetlb_cgroup_pre_destroy,
+	.destroy    = hugetlb_cgroup_destroy,
+	.subsys_id  = hugetlb_subsys_id,
+};
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch implements a new controller that allows us to control HugeTLB
allocations. The extension allows to limit the HugeTLB usage per control
group and enforces the controller limit during page fault.  Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time implies
that, the application will get SIGBUS signal if it tries to access HugeTLB
pages beyond its limit. This requires the application to know beforehand
how much HugeTLB pages it would require for its use.

The charge/uncharge calls will be added to HugeTLB code in later patch.
Support for cgroup removal will be added in later patches.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/cgroup_subsys.h  |    6 ++
 include/linux/hugetlb_cgroup.h |   37 ++++++++++++
 init/Kconfig                   |   15 +++++
 mm/Makefile                    |    1 +
 mm/hugetlb_cgroup.c            |  122 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 181 insertions(+)
 create mode 100644 include/linux/hugetlb_cgroup.h
 create mode 100644 mm/hugetlb_cgroup.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 0bd390c..895923a 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -72,3 +72,9 @@ SUBSYS(net_prio)
 #endif
 
 /* */
+
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+SUBSYS(hugetlb)
+#endif
+
+/* */
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
new file mode 100644
index 0000000..e9944b4
--- /dev/null
+++ b/include/linux/hugetlb_cgroup.h
@@ -0,0 +1,37 @@
+/*
+ * Copyright IBM Corporation, 2012
+ * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#ifndef _LINUX_HUGETLB_CGROUP_H
+#define _LINUX_HUGETLB_CGROUP_H
+
+#include <linux/res_counter.h>
+
+struct hugetlb_cgroup;
+
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+static inline bool hugetlb_cgroup_disabled(void)
+{
+	if (hugetlb_subsys.disabled)
+		return true;
+	return false;
+}
+
+#else
+static inline bool hugetlb_cgroup_disabled(void)
+{
+	return true;
+}
+
+#endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index d07dcf9..da05fae 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -751,6 +751,21 @@ config CGROUP_MEM_RES_CTLR_KMEM
 	  the kmem extension can use it to guarantee that no group of processes
 	  will ever exhaust kernel resources alone.
 
+config CGROUP_HUGETLB_RES_CTLR
+	bool "HugeTLB Resource Controller for Control Groups"
+	depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL
+	default n
+	help
+	  Provides a cgroup Resource Controller for HugeTLB pages.
+	  When you enable this, you can put a per cgroup limit on HugeTLB usage.
+	  The limit is enforced during page fault. Since HugeTLB doesn't
+	  support page reclaim, enforcing the limit at page fault time implies
+	  that, the application will get SIGBUS signal if it tries to access
+	  HugeTLB pages beyond its limit. This requires the application to know
+	  beforehand how much HugeTLB pages it would require for its use. The
+	  control group is tracked in the third page lru pointer. This means
+	  that we cannot use the controller with huge page less than 3 pages.
+
 config CGROUP_PERF
 	bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
 	depends on PERF_EVENTS && CGROUPS
diff --git a/mm/Makefile b/mm/Makefile
index 2e2fbbe..25e8002 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -49,6 +49,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o
 obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
+obj-$(CONFIG_CGROUP_HUGETLB_RES_CTLR) += hugetlb_cgroup.o
 obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
 obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
 obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
new file mode 100644
index 0000000..5a4e71c
--- /dev/null
+++ b/mm/hugetlb_cgroup.c
@@ -0,0 +1,122 @@
+/*
+ *
+ * Copyright IBM Corporation, 2012
+ * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#include <linux/cgroup.h>
+#include <linux/slab.h>
+#include <linux/hugetlb.h>
+#include <linux/hugetlb_cgroup.h>
+
+struct hugetlb_cgroup {
+	struct cgroup_subsys_state css;
+	/*
+	 * the counter to account for hugepages from hugetlb.
+	 */
+	struct res_counter hugepage[HUGE_MAX_HSTATE];
+};
+
+struct cgroup_subsys hugetlb_subsys __read_mostly;
+struct hugetlb_cgroup *root_h_cgroup __read_mostly;
+
+static inline
+struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
+{
+	if (s)
+		return container_of(s, struct hugetlb_cgroup, css);
+	return NULL;
+}
+
+static inline
+struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup)
+{
+	return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup,
+							   hugetlb_subsys_id));
+}
+
+static inline
+struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task)
+{
+	return hugetlb_cgroup_from_css(task_subsys_state(task,
+							 hugetlb_subsys_id));
+}
+
+static inline bool hugetlb_cgroup_is_root(struct hugetlb_cgroup *h_cg)
+{
+	return (h_cg == root_h_cgroup);
+}
+
+static inline struct hugetlb_cgroup *parent_hugetlb_cgroup(struct cgroup *cg)
+{
+	if (!cg->parent)
+		return NULL;
+	return hugetlb_cgroup_from_cgroup(cg->parent);
+}
+
+static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
+{
+	int idx;
+	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
+
+	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
+		if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
+			return true;
+	}
+	return false;
+}
+
+static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
+{
+	int idx;
+	struct cgroup *parent_cgroup;
+	struct hugetlb_cgroup *h_cgroup, *parent_h_cgroup;
+
+	h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL);
+	if (!h_cgroup)
+		return ERR_PTR(-ENOMEM);
+
+	parent_cgroup = cgroup->parent;
+	if (parent_cgroup) {
+		parent_h_cgroup = hugetlb_cgroup_from_cgroup(parent_cgroup);
+		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+			res_counter_init(&h_cgroup->hugepage[idx],
+					 &parent_h_cgroup->hugepage[idx]);
+	} else {
+		root_h_cgroup = h_cgroup;
+		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
+			res_counter_init(&h_cgroup->hugepage[idx], NULL);
+	}
+	return &h_cgroup->css;
+}
+
+static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
+{
+	struct hugetlb_cgroup *h_cgroup;
+
+	h_cgroup = hugetlb_cgroup_from_cgroup(cgroup);
+	kfree(h_cgroup);
+}
+
+static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
+{
+	/* We will add the cgroup removal support in later patches */
+	   return -EBUSY;
+}
+
+struct cgroup_subsys hugetlb_subsys = {
+	.name = "hugetlb",
+	.create     = hugetlb_cgroup_create,
+	.pre_destroy = hugetlb_cgroup_pre_destroy,
+	.destroy    = hugetlb_cgroup_destroy,
+	.subsys_id  = hugetlb_subsys_id,
+};
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
the usage to hugetlb cgroup to only hugepages with 3 or more
normal pages. I guess that is an acceptable limitation.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb_cgroup.h |   37 +++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c                   |    4 ++++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e9944b4..be1a9f8 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -20,6 +20,32 @@
 struct hugetlb_cgroup;
 
 #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+/*
+ * Minimum page order trackable by hugetlb cgroup.
+ * At least 3 pages are necessary for all the tracking information.
+ */
+#define HUGETLB_CGROUP_MIN_ORDER	2
+
+static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
+{
+	VM_BUG_ON(!PageHuge(page));
+
+	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
+		return NULL;
+	return (struct hugetlb_cgroup *)page[2].lru.next;
+}
+
+static inline
+int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
+{
+	VM_BUG_ON(!PageHuge(page));
+
+	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
+		return -1;
+	page[2].lru.next = (void *)h_cg;
+	return 0;
+}
+
 static inline bool hugetlb_cgroup_disabled(void)
 {
 	if (hugetlb_subsys.disabled)
@@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void)
 }
 
 #else
+static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
+{
+	return NULL;
+}
+
+static inline
+int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
+{
+	return 0;
+}
+
 static inline bool hugetlb_cgroup_disabled(void)
 {
 	return true;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e899a2d..6a449c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -28,6 +28,7 @@
 
 #include <linux/io.h>
 #include <linux/hugetlb.h>
+#include <linux/hugetlb_cgroup.h>
 #include <linux/node.h>
 #include "internal.h"
 
@@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 				1 << PG_active | 1 << PG_reserved |
 				1 << PG_private | 1 << PG_writeback);
 	}
+	VM_BUG_ON(hugetlb_cgroup_from_page(page));
 	set_compound_page_dtor(page, NULL);
 	set_page_refcounted(page);
 	arch_release_hugepage(page);
@@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, free_huge_page);
 	spin_lock(&hugetlb_lock);
+	set_hugetlb_cgroup(page, NULL);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
 	spin_unlock(&hugetlb_lock);
@@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 		INIT_LIST_HEAD(&page->lru);
 		r_nid = page_to_nid(page);
 		set_compound_page_dtor(page, free_huge_page);
+		set_hugetlb_cgroup(page, NULL);
 		/*
 		 * We incremented the global counters already
 		 */
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
the usage to hugetlb cgroup to only hugepages with 3 or more
normal pages. I guess that is an acceptable limitation.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb_cgroup.h |   37 +++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c                   |    4 ++++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e9944b4..be1a9f8 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -20,6 +20,32 @@
 struct hugetlb_cgroup;
 
 #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+/*
+ * Minimum page order trackable by hugetlb cgroup.
+ * At least 3 pages are necessary for all the tracking information.
+ */
+#define HUGETLB_CGROUP_MIN_ORDER	2
+
+static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
+{
+	VM_BUG_ON(!PageHuge(page));
+
+	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
+		return NULL;
+	return (struct hugetlb_cgroup *)page[2].lru.next;
+}
+
+static inline
+int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
+{
+	VM_BUG_ON(!PageHuge(page));
+
+	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
+		return -1;
+	page[2].lru.next = (void *)h_cg;
+	return 0;
+}
+
 static inline bool hugetlb_cgroup_disabled(void)
 {
 	if (hugetlb_subsys.disabled)
@@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void)
 }
 
 #else
+static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
+{
+	return NULL;
+}
+
+static inline
+int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
+{
+	return 0;
+}
+
 static inline bool hugetlb_cgroup_disabled(void)
 {
 	return true;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e899a2d..6a449c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -28,6 +28,7 @@
 
 #include <linux/io.h>
 #include <linux/hugetlb.h>
+#include <linux/hugetlb_cgroup.h>
 #include <linux/node.h>
 #include "internal.h"
 
@@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 				1 << PG_active | 1 << PG_reserved |
 				1 << PG_private | 1 << PG_writeback);
 	}
+	VM_BUG_ON(hugetlb_cgroup_from_page(page));
 	set_compound_page_dtor(page, NULL);
 	set_page_refcounted(page);
 	arch_release_hugepage(page);
@@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, free_huge_page);
 	spin_lock(&hugetlb_lock);
+	set_hugetlb_cgroup(page, NULL);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
 	spin_unlock(&hugetlb_lock);
@@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 		INIT_LIST_HEAD(&page->lru);
 		r_nid = page_to_nid(page);
 		set_compound_page_dtor(page, free_huge_page);
+		set_hugetlb_cgroup(page, NULL);
 		/*
 		 * We incremented the global counters already
 		 */
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
  2012-06-13 10:27 ` Aneesh Kumar K.V
  (?)
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patchset add the charge and uncharge routines for hugetlb cgroup.
We do cgroup charging in page alloc and uncharge in compound page
destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb_cgroup.h |   38 +++++++++++++++++++
 mm/hugetlb.c                   |   16 +++++++-
 mm/hugetlb_cgroup.c            |   80 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index be1a9f8..e05871c 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -53,6 +53,16 @@ static inline bool hugetlb_cgroup_disabled(void)
 	return false;
 }
 
+extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
+					struct hugetlb_cgroup **ptr);
+extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+					 struct hugetlb_cgroup *h_cg,
+					 struct page *page);
+extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
+					 struct page *page);
+extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+					   struct hugetlb_cgroup *h_cg);
+
 #else
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
 {
@@ -70,5 +80,33 @@ static inline bool hugetlb_cgroup_disabled(void)
 	return true;
 }
 
+static inline int
+hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
+			     struct hugetlb_cgroup **ptr)
+{
+	return 0;
+}
+
+static inline void
+hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+			     struct hugetlb_cgroup *h_cg,
+			     struct page *page)
+{
+	return;
+}
+
+static inline void
+hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, struct page *page)
+{
+	return;
+}
+
+static inline void
+hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+			       struct hugetlb_cgroup *h_cg)
+{
+	return;
+}
+
 #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6a449c5..59720b1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -627,6 +627,8 @@ static void free_huge_page(struct page *page)
 	BUG_ON(page_mapcount(page));
 
 	spin_lock(&hugetlb_lock);
+	hugetlb_cgroup_uncharge_page(hstate_index(h),
+				     pages_per_huge_page(h), page);
 	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
 		/* remove the page from active list */
 		list_del(&page->lru);
@@ -1115,7 +1117,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	struct hstate *h = hstate_vma(vma);
 	struct page *page;
 	long chg;
+	int ret, idx;
+	struct hugetlb_cgroup *h_cg;
 
+	idx = hstate_index(h);
 	/*
 	 * Processes that did not create the mapping will have no
 	 * reserves and will not have accounted against subpool
@@ -1131,6 +1136,11 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		if (hugepage_subpool_get_pages(spool, chg))
 			return ERR_PTR(-ENOSPC);
 
+	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
+	if (ret) {
+		hugepage_subpool_put_pages(spool, chg);
+		return ERR_PTR(-ENOSPC);
+	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
 	spin_unlock(&hugetlb_lock);
@@ -1138,6 +1148,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	if (!page) {
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
+			hugetlb_cgroup_uncharge_cgroup(idx,
+						       pages_per_huge_page(h),
+						       h_cg);
 			hugepage_subpool_put_pages(spool, chg);
 			return ERR_PTR(-ENOSPC);
 		}
@@ -1146,7 +1159,8 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	set_page_private(page, (unsigned long)spool);
 
 	vma_commit_reservation(h, vma, addr);
-
+	/* update page cgroup details */
+	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 	return page;
 }
 
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 5a4e71c..0f2f6ac 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -113,6 +113,86 @@ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
 	   return -EBUSY;
 }
 
+int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
+				 struct hugetlb_cgroup **ptr)
+{
+	int ret = 0;
+	struct res_counter *fail_res;
+	struct hugetlb_cgroup *h_cg = NULL;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (hugetlb_cgroup_disabled())
+		goto done;
+	/*
+	 * We don't charge any cgroup if the compound page have less
+	 * than 3 pages.
+	 */
+	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
+		goto done;
+again:
+	rcu_read_lock();
+	h_cg = hugetlb_cgroup_from_task(current);
+	if (!h_cg)
+		h_cg = root_h_cgroup;
+
+	if (!css_tryget(&h_cg->css)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+
+	ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res);
+	css_put(&h_cg->css);
+done:
+	*ptr = h_cg;
+	return ret;
+}
+
+void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+				  struct hugetlb_cgroup *h_cg,
+				  struct page *page)
+{
+	if (hugetlb_cgroup_disabled() || !h_cg)
+		return;
+
+	spin_lock(&hugetlb_lock);
+	set_hugetlb_cgroup(page, h_cg);
+	spin_unlock(&hugetlb_lock);
+	return;
+}
+
+/*
+ * Should be called with hugetlb_lock held
+ */
+void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
+				  struct page *page)
+{
+	struct hugetlb_cgroup *h_cg;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (hugetlb_cgroup_disabled())
+		return;
+	VM_BUG_ON(!spin_is_locked(&hugetlb_lock));
+	h_cg = hugetlb_cgroup_from_page(page);
+	if (unlikely(!h_cg))
+		return;
+	set_hugetlb_cgroup(page, NULL);
+	res_counter_uncharge(&h_cg->hugepage[idx], csize);
+	return;
+}
+
+void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+				    struct hugetlb_cgroup *h_cg)
+{
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (hugetlb_cgroup_disabled() || !h_cg)
+		return;
+
+	res_counter_uncharge(&h_cg->hugepage[idx], csize);
+	return;
+}
+
 struct cgroup_subsys hugetlb_subsys = {
 	.name = "hugetlb",
 	.create     = hugetlb_cgroup_create,
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patchset add the charge and uncharge routines for hugetlb cgroup.
We do cgroup charging in page alloc and uncharge in compound page
destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb_cgroup.h |   38 +++++++++++++++++++
 mm/hugetlb.c                   |   16 +++++++-
 mm/hugetlb_cgroup.c            |   80 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index be1a9f8..e05871c 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -53,6 +53,16 @@ static inline bool hugetlb_cgroup_disabled(void)
 	return false;
 }
 
+extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
+					struct hugetlb_cgroup **ptr);
+extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+					 struct hugetlb_cgroup *h_cg,
+					 struct page *page);
+extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
+					 struct page *page);
+extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+					   struct hugetlb_cgroup *h_cg);
+
 #else
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
 {
@@ -70,5 +80,33 @@ static inline bool hugetlb_cgroup_disabled(void)
 	return true;
 }
 
+static inline int
+hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
+			     struct hugetlb_cgroup **ptr)
+{
+	return 0;
+}
+
+static inline void
+hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+			     struct hugetlb_cgroup *h_cg,
+			     struct page *page)
+{
+	return;
+}
+
+static inline void
+hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, struct page *page)
+{
+	return;
+}
+
+static inline void
+hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+			       struct hugetlb_cgroup *h_cg)
+{
+	return;
+}
+
 #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6a449c5..59720b1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -627,6 +627,8 @@ static void free_huge_page(struct page *page)
 	BUG_ON(page_mapcount(page));
 
 	spin_lock(&hugetlb_lock);
+	hugetlb_cgroup_uncharge_page(hstate_index(h),
+				     pages_per_huge_page(h), page);
 	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
 		/* remove the page from active list */
 		list_del(&page->lru);
@@ -1115,7 +1117,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	struct hstate *h = hstate_vma(vma);
 	struct page *page;
 	long chg;
+	int ret, idx;
+	struct hugetlb_cgroup *h_cg;
 
+	idx = hstate_index(h);
 	/*
 	 * Processes that did not create the mapping will have no
 	 * reserves and will not have accounted against subpool
@@ -1131,6 +1136,11 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		if (hugepage_subpool_get_pages(spool, chg))
 			return ERR_PTR(-ENOSPC);
 
+	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
+	if (ret) {
+		hugepage_subpool_put_pages(spool, chg);
+		return ERR_PTR(-ENOSPC);
+	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
 	spin_unlock(&hugetlb_lock);
@@ -1138,6 +1148,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	if (!page) {
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
+			hugetlb_cgroup_uncharge_cgroup(idx,
+						       pages_per_huge_page(h),
+						       h_cg);
 			hugepage_subpool_put_pages(spool, chg);
 			return ERR_PTR(-ENOSPC);
 		}
@@ -1146,7 +1159,8 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	set_page_private(page, (unsigned long)spool);
 
 	vma_commit_reservation(h, vma, addr);
-
+	/* update page cgroup details */
+	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 	return page;
 }
 
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 5a4e71c..0f2f6ac 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -113,6 +113,86 @@ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
 	   return -EBUSY;
 }
 
+int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
+				 struct hugetlb_cgroup **ptr)
+{
+	int ret = 0;
+	struct res_counter *fail_res;
+	struct hugetlb_cgroup *h_cg = NULL;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (hugetlb_cgroup_disabled())
+		goto done;
+	/*
+	 * We don't charge any cgroup if the compound page have less
+	 * than 3 pages.
+	 */
+	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
+		goto done;
+again:
+	rcu_read_lock();
+	h_cg = hugetlb_cgroup_from_task(current);
+	if (!h_cg)
+		h_cg = root_h_cgroup;
+
+	if (!css_tryget(&h_cg->css)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+
+	ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res);
+	css_put(&h_cg->css);
+done:
+	*ptr = h_cg;
+	return ret;
+}
+
+void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+				  struct hugetlb_cgroup *h_cg,
+				  struct page *page)
+{
+	if (hugetlb_cgroup_disabled() || !h_cg)
+		return;
+
+	spin_lock(&hugetlb_lock);
+	set_hugetlb_cgroup(page, h_cg);
+	spin_unlock(&hugetlb_lock);
+	return;
+}
+
+/*
+ * Should be called with hugetlb_lock held
+ */
+void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
+				  struct page *page)
+{
+	struct hugetlb_cgroup *h_cg;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (hugetlb_cgroup_disabled())
+		return;
+	VM_BUG_ON(!spin_is_locked(&hugetlb_lock));
+	h_cg = hugetlb_cgroup_from_page(page);
+	if (unlikely(!h_cg))
+		return;
+	set_hugetlb_cgroup(page, NULL);
+	res_counter_uncharge(&h_cg->hugepage[idx], csize);
+	return;
+}
+
+void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+				    struct hugetlb_cgroup *h_cg)
+{
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (hugetlb_cgroup_disabled() || !h_cg)
+		return;
+
+	res_counter_uncharge(&h_cg->hugepage[idx], csize);
+	return;
+}
+
 struct cgroup_subsys hugetlb_subsys = {
 	.name = "hugetlb",
 	.create     = hugetlb_cgroup_create,
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	mhocko-AlSwsSmVLrQ, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

This patchset add the charge and uncharge routines for hugetlb cgroup.
We do cgroup charging in page alloc and uncharge in compound page
destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 include/linux/hugetlb_cgroup.h |   38 +++++++++++++++++++
 mm/hugetlb.c                   |   16 +++++++-
 mm/hugetlb_cgroup.c            |   80 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index be1a9f8..e05871c 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -53,6 +53,16 @@ static inline bool hugetlb_cgroup_disabled(void)
 	return false;
 }
 
+extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
+					struct hugetlb_cgroup **ptr);
+extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+					 struct hugetlb_cgroup *h_cg,
+					 struct page *page);
+extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
+					 struct page *page);
+extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+					   struct hugetlb_cgroup *h_cg);
+
 #else
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
 {
@@ -70,5 +80,33 @@ static inline bool hugetlb_cgroup_disabled(void)
 	return true;
 }
 
+static inline int
+hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
+			     struct hugetlb_cgroup **ptr)
+{
+	return 0;
+}
+
+static inline void
+hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+			     struct hugetlb_cgroup *h_cg,
+			     struct page *page)
+{
+	return;
+}
+
+static inline void
+hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, struct page *page)
+{
+	return;
+}
+
+static inline void
+hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+			       struct hugetlb_cgroup *h_cg)
+{
+	return;
+}
+
 #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6a449c5..59720b1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -627,6 +627,8 @@ static void free_huge_page(struct page *page)
 	BUG_ON(page_mapcount(page));
 
 	spin_lock(&hugetlb_lock);
+	hugetlb_cgroup_uncharge_page(hstate_index(h),
+				     pages_per_huge_page(h), page);
 	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
 		/* remove the page from active list */
 		list_del(&page->lru);
@@ -1115,7 +1117,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	struct hstate *h = hstate_vma(vma);
 	struct page *page;
 	long chg;
+	int ret, idx;
+	struct hugetlb_cgroup *h_cg;
 
+	idx = hstate_index(h);
 	/*
 	 * Processes that did not create the mapping will have no
 	 * reserves and will not have accounted against subpool
@@ -1131,6 +1136,11 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		if (hugepage_subpool_get_pages(spool, chg))
 			return ERR_PTR(-ENOSPC);
 
+	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
+	if (ret) {
+		hugepage_subpool_put_pages(spool, chg);
+		return ERR_PTR(-ENOSPC);
+	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
 	spin_unlock(&hugetlb_lock);
@@ -1138,6 +1148,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	if (!page) {
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
+			hugetlb_cgroup_uncharge_cgroup(idx,
+						       pages_per_huge_page(h),
+						       h_cg);
 			hugepage_subpool_put_pages(spool, chg);
 			return ERR_PTR(-ENOSPC);
 		}
@@ -1146,7 +1159,8 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	set_page_private(page, (unsigned long)spool);
 
 	vma_commit_reservation(h, vma, addr);
-
+	/* update page cgroup details */
+	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 	return page;
 }
 
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 5a4e71c..0f2f6ac 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -113,6 +113,86 @@ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
 	   return -EBUSY;
 }
 
+int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
+				 struct hugetlb_cgroup **ptr)
+{
+	int ret = 0;
+	struct res_counter *fail_res;
+	struct hugetlb_cgroup *h_cg = NULL;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (hugetlb_cgroup_disabled())
+		goto done;
+	/*
+	 * We don't charge any cgroup if the compound page have less
+	 * than 3 pages.
+	 */
+	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
+		goto done;
+again:
+	rcu_read_lock();
+	h_cg = hugetlb_cgroup_from_task(current);
+	if (!h_cg)
+		h_cg = root_h_cgroup;
+
+	if (!css_tryget(&h_cg->css)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+
+	ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res);
+	css_put(&h_cg->css);
+done:
+	*ptr = h_cg;
+	return ret;
+}
+
+void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
+				  struct hugetlb_cgroup *h_cg,
+				  struct page *page)
+{
+	if (hugetlb_cgroup_disabled() || !h_cg)
+		return;
+
+	spin_lock(&hugetlb_lock);
+	set_hugetlb_cgroup(page, h_cg);
+	spin_unlock(&hugetlb_lock);
+	return;
+}
+
+/*
+ * Should be called with hugetlb_lock held
+ */
+void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
+				  struct page *page)
+{
+	struct hugetlb_cgroup *h_cg;
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (hugetlb_cgroup_disabled())
+		return;
+	VM_BUG_ON(!spin_is_locked(&hugetlb_lock));
+	h_cg = hugetlb_cgroup_from_page(page);
+	if (unlikely(!h_cg))
+		return;
+	set_hugetlb_cgroup(page, NULL);
+	res_counter_uncharge(&h_cg->hugepage[idx], csize);
+	return;
+}
+
+void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
+				    struct hugetlb_cgroup *h_cg)
+{
+	unsigned long csize = nr_pages * PAGE_SIZE;
+
+	if (hugetlb_cgroup_disabled() || !h_cg)
+		return;
+
+	res_counter_uncharge(&h_cg->hugepage[idx], csize);
+	return;
+}
+
 struct cgroup_subsys hugetlb_subsys = {
 	.name = "hugetlb",
 	.create     = hugetlb_cgroup_create,
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch add support for cgroup removal. If we don't have parent
cgroup, the charges are moved to root cgroup.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb_cgroup.c |   70 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 0f2f6ac..a3a68a4 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -107,10 +107,76 @@ static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
 	kfree(h_cgroup);
 }
 
+
+/*
+ * Should be called with hugetlb_lock held.
+ * Since we are holding hugetlb_lock, pages cannot get moved from
+ * active list or uncharged from the cgroup, So no need to get
+ * page reference and test for page active here. This function
+ * cannot fail.
+ */
+static void hugetlb_cgroup_move_parent(int idx, struct cgroup *cgroup,
+				       struct page *page)
+{
+	int csize;
+	struct res_counter *counter;
+	struct res_counter *fail_res;
+	struct hugetlb_cgroup *page_hcg;
+	struct hugetlb_cgroup *h_cg   = hugetlb_cgroup_from_cgroup(cgroup);
+	struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(cgroup);
+
+	page_hcg = hugetlb_cgroup_from_page(page);
+	/*
+	 * We can have pages in active list without any cgroup
+	 * ie, hugepage with less than 3 pages. We can safely
+	 * ignore those pages.
+	 */
+	if (!page_hcg || page_hcg != h_cg)
+		goto out;
+
+	csize = PAGE_SIZE << compound_order(page);
+	if (!parent) {
+		parent = root_h_cgroup;
+		/* root has no limit */
+		res_counter_charge_nofail(&parent->hugepage[idx],
+					  csize, &fail_res);
+	}
+	counter = &h_cg->hugepage[idx];
+	res_counter_uncharge_until(counter, counter->parent, csize);
+
+	set_hugetlb_cgroup(page, parent);
+out:
+	return;
+}
+
+/*
+ * Force the hugetlb cgroup to empty the hugetlb resources by moving them to
+ * the parent cgroup.
+ */
 static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
 {
-	/* We will add the cgroup removal support in later patches */
-	   return -EBUSY;
+	struct hstate *h;
+	struct page *page;
+	int ret = 0, idx = 0;
+
+	do {
+		if (cgroup_task_count(cgroup) ||
+		    !list_empty(&cgroup->children)) {
+			ret = -EBUSY;
+			goto out;
+		}
+		for_each_hstate(h) {
+			spin_lock(&hugetlb_lock);
+			list_for_each_entry(page, &h->hugepage_activelist, lru)
+				hugetlb_cgroup_move_parent(idx, cgroup, page);
+
+			spin_unlock(&hugetlb_lock);
+			idx++;
+		}
+		cond_resched();
+	} while (hugetlb_cgroup_have_usage(cgroup));
+out:
+	return ret;
 }
 
 int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch add support for cgroup removal. If we don't have parent
cgroup, the charges are moved to root cgroup.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb_cgroup.c |   70 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 0f2f6ac..a3a68a4 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -107,10 +107,76 @@ static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
 	kfree(h_cgroup);
 }
 
+
+/*
+ * Should be called with hugetlb_lock held.
+ * Since we are holding hugetlb_lock, pages cannot get moved from
+ * active list or uncharged from the cgroup, So no need to get
+ * page reference and test for page active here. This function
+ * cannot fail.
+ */
+static void hugetlb_cgroup_move_parent(int idx, struct cgroup *cgroup,
+				       struct page *page)
+{
+	int csize;
+	struct res_counter *counter;
+	struct res_counter *fail_res;
+	struct hugetlb_cgroup *page_hcg;
+	struct hugetlb_cgroup *h_cg   = hugetlb_cgroup_from_cgroup(cgroup);
+	struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(cgroup);
+
+	page_hcg = hugetlb_cgroup_from_page(page);
+	/*
+	 * We can have pages in active list without any cgroup
+	 * ie, hugepage with less than 3 pages. We can safely
+	 * ignore those pages.
+	 */
+	if (!page_hcg || page_hcg != h_cg)
+		goto out;
+
+	csize = PAGE_SIZE << compound_order(page);
+	if (!parent) {
+		parent = root_h_cgroup;
+		/* root has no limit */
+		res_counter_charge_nofail(&parent->hugepage[idx],
+					  csize, &fail_res);
+	}
+	counter = &h_cg->hugepage[idx];
+	res_counter_uncharge_until(counter, counter->parent, csize);
+
+	set_hugetlb_cgroup(page, parent);
+out:
+	return;
+}
+
+/*
+ * Force the hugetlb cgroup to empty the hugetlb resources by moving them to
+ * the parent cgroup.
+ */
 static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
 {
-	/* We will add the cgroup removal support in later patches */
-	   return -EBUSY;
+	struct hstate *h;
+	struct page *page;
+	int ret = 0, idx = 0;
+
+	do {
+		if (cgroup_task_count(cgroup) ||
+		    !list_empty(&cgroup->children)) {
+			ret = -EBUSY;
+			goto out;
+		}
+		for_each_hstate(h) {
+			spin_lock(&hugetlb_lock);
+			list_for_each_entry(page, &h->hugepage_activelist, lru)
+				hugetlb_cgroup_move_parent(idx, cgroup, page);
+
+			spin_unlock(&hugetlb_lock);
+			idx++;
+		}
+		cond_resched();
+	} while (hugetlb_cgroup_have_usage(cgroup));
+out:
+	return ret;
 }
 
 int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Add the control files for hugetlb controller

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h        |    5 ++
 include/linux/hugetlb_cgroup.h |    6 ++
 mm/hugetlb.c                   |    8 +++
 mm/hugetlb_cgroup.c            |  129 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 148 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 4aca057..9650bb1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -4,6 +4,7 @@
 #include <linux/mm_types.h>
 #include <linux/fs.h>
 #include <linux/hugetlb_inline.h>
+#include <linux/cgroup.h>
 
 struct ctl_table;
 struct user_struct;
@@ -221,6 +222,10 @@ struct hstate {
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+	/* cgroup control files */
+	struct cftype cgroup_files[5];
+#endif
 	char name[HSTATE_NAME_LEN];
 };
 
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e05871c..bd8bc98 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -62,6 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
 					 struct page *page);
 extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 					   struct hugetlb_cgroup *h_cg);
+extern int hugetlb_cgroup_file_init(int idx) __init;
 
 #else
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
@@ -108,5 +109,10 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 	return;
 }
 
+static inline int __init hugetlb_cgroup_file_init(int idx)
+{
+	return 0;
+}
+
 #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 59720b1..a5a30bf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -30,6 +30,7 @@
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
 #include <linux/node.h>
+#include <linux/hugetlb_cgroup.h>
 #include "internal.h"
 
 const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
@@ -1930,6 +1931,13 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 					huge_page_size(h)/1024);
+	/*
+	 * Add cgroup control files only if the huge page consists
+	 * of more than two normal pages. This is because we use
+	 * page[2].lru.next for storing cgoup details.
+	 */
+	if (order >= HUGETLB_CGROUP_MIN_ORDER)
+		hugetlb_cgroup_file_init(hugetlb_max_hstate - 1);
 
 	parsed_hstate = h;
 }
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index a3a68a4..64e93e0 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -26,6 +26,10 @@ struct hugetlb_cgroup {
 	struct res_counter hugepage[HUGE_MAX_HSTATE];
 };
 
+#define MEMFILE_PRIVATE(x, val)	(((x) << 16) | (val))
+#define MEMFILE_IDX(val)	(((val) >> 16) & 0xffff)
+#define MEMFILE_ATTR(val)	((val) & 0xffff)
+
 struct cgroup_subsys hugetlb_subsys __read_mostly;
 struct hugetlb_cgroup *root_h_cgroup __read_mostly;
 
@@ -259,6 +263,131 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 	return;
 }
 
+static ssize_t hugetlb_cgroup_read(struct cgroup *cgroup, struct cftype *cft,
+				   struct file *file, char __user *buf,
+				   size_t nbytes, loff_t *ppos)
+{
+	u64 val;
+	char str[64];
+	int idx, name, len;
+	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
+
+	idx = MEMFILE_IDX(cft->private);
+	name = MEMFILE_ATTR(cft->private);
+
+	val = res_counter_read_u64(&h_cg->hugepage[idx], name);
+	len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val);
+	return simple_read_from_buffer(buf, nbytes, ppos, str, len);
+}
+
+static int hugetlb_cgroup_write(struct cgroup *cgroup, struct cftype *cft,
+				const char *buffer)
+{
+	int idx, name, ret;
+	unsigned long long val;
+	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
+
+	idx = MEMFILE_IDX(cft->private);
+	name = MEMFILE_ATTR(cft->private);
+
+	switch (name) {
+	case RES_LIMIT:
+		if (hugetlb_cgroup_is_root(h_cg)) {
+			/* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
+		/* This function does all necessary parse...reuse it */
+		ret = res_counter_memparse_write_strategy(buffer, &val);
+		if (ret)
+			break;
+		ret = res_counter_set_limit(&h_cg->hugepage[idx], val);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int hugetlb_cgroup_reset(struct cgroup *cgroup, unsigned int event)
+{
+	int idx, name, ret = 0;
+	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
+
+	idx = MEMFILE_IDX(event);
+	name = MEMFILE_ATTR(event);
+
+	switch (name) {
+	case RES_MAX_USAGE:
+		res_counter_reset_max(&h_cg->hugepage[idx]);
+		break;
+	case RES_FAILCNT:
+		res_counter_reset_failcnt(&h_cg->hugepage[idx]);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static char *mem_fmt(char *buf, int size, unsigned long hsize)
+{
+	if (hsize >= (1UL << 30))
+		snprintf(buf, size, "%luGB", hsize >> 30);
+	else if (hsize >= (1UL << 20))
+		snprintf(buf, size, "%luMB", hsize >> 20);
+	else
+		snprintf(buf, size, "%luKB", hsize >> 10);
+	return buf;
+}
+
+int __init hugetlb_cgroup_file_init(int idx)
+{
+	char buf[32];
+	struct cftype *cft;
+	struct hstate *h = &hstates[idx];
+
+	/* format the size */
+	mem_fmt(buf, 32, huge_page_size(h));
+
+	/* Add the limit file */
+	cft = &h->cgroup_files[0];
+	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf);
+	cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT);
+	cft->read = hugetlb_cgroup_read;
+	cft->write_string = hugetlb_cgroup_write;
+
+	/* Add the usage file */
+	cft = &h->cgroup_files[1];
+	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf);
+	cft->private = MEMFILE_PRIVATE(idx, RES_USAGE);
+	cft->read = hugetlb_cgroup_read;
+
+	/* Add the MAX usage file */
+	cft = &h->cgroup_files[2];
+	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf);
+	cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE);
+	cft->trigger = hugetlb_cgroup_reset;
+	cft->read = hugetlb_cgroup_read;
+
+	/* Add the failcntfile */
+	cft = &h->cgroup_files[3];
+	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf);
+	cft->private  = MEMFILE_PRIVATE(idx, RES_FAILCNT);
+	cft->trigger  = hugetlb_cgroup_reset;
+	cft->read = hugetlb_cgroup_read;
+
+	/* NULL terminate the last cft */
+	cft = &h->cgroup_files[4];
+	memset(cft, 0, sizeof(*cft));
+
+	WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files));
+
+	return 0;
+}
+
 struct cgroup_subsys hugetlb_subsys = {
 	.name = "hugetlb",
 	.create     = hugetlb_cgroup_create,
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Add the control files for hugetlb controller

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h        |    5 ++
 include/linux/hugetlb_cgroup.h |    6 ++
 mm/hugetlb.c                   |    8 +++
 mm/hugetlb_cgroup.c            |  129 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 148 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 4aca057..9650bb1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -4,6 +4,7 @@
 #include <linux/mm_types.h>
 #include <linux/fs.h>
 #include <linux/hugetlb_inline.h>
+#include <linux/cgroup.h>
 
 struct ctl_table;
 struct user_struct;
@@ -221,6 +222,10 @@ struct hstate {
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+	/* cgroup control files */
+	struct cftype cgroup_files[5];
+#endif
 	char name[HSTATE_NAME_LEN];
 };
 
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e05871c..bd8bc98 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -62,6 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
 					 struct page *page);
 extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 					   struct hugetlb_cgroup *h_cg);
+extern int hugetlb_cgroup_file_init(int idx) __init;
 
 #else
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
@@ -108,5 +109,10 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 	return;
 }
 
+static inline int __init hugetlb_cgroup_file_init(int idx)
+{
+	return 0;
+}
+
 #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 59720b1..a5a30bf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -30,6 +30,7 @@
 #include <linux/hugetlb.h>
 #include <linux/hugetlb_cgroup.h>
 #include <linux/node.h>
+#include <linux/hugetlb_cgroup.h>
 #include "internal.h"
 
 const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
@@ -1930,6 +1931,13 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 					huge_page_size(h)/1024);
+	/*
+	 * Add cgroup control files only if the huge page consists
+	 * of more than two normal pages. This is because we use
+	 * page[2].lru.next for storing cgoup details.
+	 */
+	if (order >= HUGETLB_CGROUP_MIN_ORDER)
+		hugetlb_cgroup_file_init(hugetlb_max_hstate - 1);
 
 	parsed_hstate = h;
 }
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index a3a68a4..64e93e0 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -26,6 +26,10 @@ struct hugetlb_cgroup {
 	struct res_counter hugepage[HUGE_MAX_HSTATE];
 };
 
+#define MEMFILE_PRIVATE(x, val)	(((x) << 16) | (val))
+#define MEMFILE_IDX(val)	(((val) >> 16) & 0xffff)
+#define MEMFILE_ATTR(val)	((val) & 0xffff)
+
 struct cgroup_subsys hugetlb_subsys __read_mostly;
 struct hugetlb_cgroup *root_h_cgroup __read_mostly;
 
@@ -259,6 +263,131 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 	return;
 }
 
+static ssize_t hugetlb_cgroup_read(struct cgroup *cgroup, struct cftype *cft,
+				   struct file *file, char __user *buf,
+				   size_t nbytes, loff_t *ppos)
+{
+	u64 val;
+	char str[64];
+	int idx, name, len;
+	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
+
+	idx = MEMFILE_IDX(cft->private);
+	name = MEMFILE_ATTR(cft->private);
+
+	val = res_counter_read_u64(&h_cg->hugepage[idx], name);
+	len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val);
+	return simple_read_from_buffer(buf, nbytes, ppos, str, len);
+}
+
+static int hugetlb_cgroup_write(struct cgroup *cgroup, struct cftype *cft,
+				const char *buffer)
+{
+	int idx, name, ret;
+	unsigned long long val;
+	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
+
+	idx = MEMFILE_IDX(cft->private);
+	name = MEMFILE_ATTR(cft->private);
+
+	switch (name) {
+	case RES_LIMIT:
+		if (hugetlb_cgroup_is_root(h_cg)) {
+			/* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
+		/* This function does all necessary parse...reuse it */
+		ret = res_counter_memparse_write_strategy(buffer, &val);
+		if (ret)
+			break;
+		ret = res_counter_set_limit(&h_cg->hugepage[idx], val);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int hugetlb_cgroup_reset(struct cgroup *cgroup, unsigned int event)
+{
+	int idx, name, ret = 0;
+	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
+
+	idx = MEMFILE_IDX(event);
+	name = MEMFILE_ATTR(event);
+
+	switch (name) {
+	case RES_MAX_USAGE:
+		res_counter_reset_max(&h_cg->hugepage[idx]);
+		break;
+	case RES_FAILCNT:
+		res_counter_reset_failcnt(&h_cg->hugepage[idx]);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static char *mem_fmt(char *buf, int size, unsigned long hsize)
+{
+	if (hsize >= (1UL << 30))
+		snprintf(buf, size, "%luGB", hsize >> 30);
+	else if (hsize >= (1UL << 20))
+		snprintf(buf, size, "%luMB", hsize >> 20);
+	else
+		snprintf(buf, size, "%luKB", hsize >> 10);
+	return buf;
+}
+
+int __init hugetlb_cgroup_file_init(int idx)
+{
+	char buf[32];
+	struct cftype *cft;
+	struct hstate *h = &hstates[idx];
+
+	/* format the size */
+	mem_fmt(buf, 32, huge_page_size(h));
+
+	/* Add the limit file */
+	cft = &h->cgroup_files[0];
+	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf);
+	cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT);
+	cft->read = hugetlb_cgroup_read;
+	cft->write_string = hugetlb_cgroup_write;
+
+	/* Add the usage file */
+	cft = &h->cgroup_files[1];
+	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf);
+	cft->private = MEMFILE_PRIVATE(idx, RES_USAGE);
+	cft->read = hugetlb_cgroup_read;
+
+	/* Add the MAX usage file */
+	cft = &h->cgroup_files[2];
+	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf);
+	cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE);
+	cft->trigger = hugetlb_cgroup_reset;
+	cft->read = hugetlb_cgroup_read;
+
+	/* Add the failcntfile */
+	cft = &h->cgroup_files[3];
+	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf);
+	cft->private  = MEMFILE_PRIVATE(idx, RES_FAILCNT);
+	cft->trigger  = hugetlb_cgroup_reset;
+	cft->read = hugetlb_cgroup_read;
+
+	/* NULL terminate the last cft */
+	cft = &h->cgroup_files[4];
+	memset(cft, 0, sizeof(*cft));
+
+	WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files));
+
+	return 0;
+}
+
 struct cgroup_subsys hugetlb_subsys = {
 	.name = "hugetlb",
 	.create     = hugetlb_cgroup_create,
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
we are holding a hugepage reference, we can be sure that old page won't
get uncharged till the last put_page().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb_cgroup.h |    8 ++++++++
 mm/hugetlb_cgroup.c            |   20 ++++++++++++++++++++
 mm/migrate.c                   |    5 +++++
 3 files changed, 33 insertions(+)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index bd8bc98..e9e6d74 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -63,6 +63,8 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
 extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 					   struct hugetlb_cgroup *h_cg);
 extern int hugetlb_cgroup_file_init(int idx) __init;
+extern void hugetlb_cgroup_migrate(struct page *oldhpage,
+				   struct page *newhpage);
 
 #else
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
@@ -114,5 +116,11 @@ static inline int __init hugetlb_cgroup_file_init(int idx)
 	return 0;
 }
 
+static inline void hugetlb_cgroup_migrate(struct page *oldhpage,
+					  struct page *newhpage)
+{
+	return;
+}
+
 #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 64e93e0..8e7ca0a 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -388,6 +388,26 @@ int __init hugetlb_cgroup_file_init(int idx)
 	return 0;
 }
 
+void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
+{
+	struct hugetlb_cgroup *h_cg;
+
+	if (hugetlb_cgroup_disabled())
+		return;
+
+	VM_BUG_ON(!PageHuge(oldhpage));
+	spin_lock(&hugetlb_lock);
+	h_cg = hugetlb_cgroup_from_page(oldhpage);
+	set_hugetlb_cgroup(oldhpage, NULL);
+	cgroup_exclude_rmdir(&h_cg->css);
+
+	/* move the h_cg details to new cgroup */
+	set_hugetlb_cgroup(newhpage, h_cg);
+	spin_unlock(&hugetlb_lock);
+	cgroup_release_and_wakeup_rmdir(&h_cg->css);
+	return;
+}
+
 struct cgroup_subsys hugetlb_subsys = {
 	.name = "hugetlb",
 	.create     = hugetlb_cgroup_create,
diff --git a/mm/migrate.c b/mm/migrate.c
index fdce3a2..6c37c51 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -33,6 +33,7 @@
 #include <linux/memcontrol.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
+#include <linux/hugetlb_cgroup.h>
 #include <linux/gfp.h>
 
 #include <asm/tlbflush.h>
@@ -931,6 +932,10 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 
 	if (anon_vma)
 		put_anon_vma(anon_vma);
+
+	if (!rc)
+		hugetlb_cgroup_migrate(hpage, new_hpage);
+
 	unlock_page(hpage);
 out:
 	put_page(new_hpage);
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
we are holding a hugepage reference, we can be sure that old page won't
get uncharged till the last put_page().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb_cgroup.h |    8 ++++++++
 mm/hugetlb_cgroup.c            |   20 ++++++++++++++++++++
 mm/migrate.c                   |    5 +++++
 3 files changed, 33 insertions(+)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index bd8bc98..e9e6d74 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -63,6 +63,8 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
 extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 					   struct hugetlb_cgroup *h_cg);
 extern int hugetlb_cgroup_file_init(int idx) __init;
+extern void hugetlb_cgroup_migrate(struct page *oldhpage,
+				   struct page *newhpage);
 
 #else
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
@@ -114,5 +116,11 @@ static inline int __init hugetlb_cgroup_file_init(int idx)
 	return 0;
 }
 
+static inline void hugetlb_cgroup_migrate(struct page *oldhpage,
+					  struct page *newhpage)
+{
+	return;
+}
+
 #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
 #endif
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 64e93e0..8e7ca0a 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -388,6 +388,26 @@ int __init hugetlb_cgroup_file_init(int idx)
 	return 0;
 }
 
+void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
+{
+	struct hugetlb_cgroup *h_cg;
+
+	if (hugetlb_cgroup_disabled())
+		return;
+
+	VM_BUG_ON(!PageHuge(oldhpage));
+	spin_lock(&hugetlb_lock);
+	h_cg = hugetlb_cgroup_from_page(oldhpage);
+	set_hugetlb_cgroup(oldhpage, NULL);
+	cgroup_exclude_rmdir(&h_cg->css);
+
+	/* move the h_cg details to new cgroup */
+	set_hugetlb_cgroup(newhpage, h_cg);
+	spin_unlock(&hugetlb_lock);
+	cgroup_release_and_wakeup_rmdir(&h_cg->css);
+	return;
+}
+
 struct cgroup_subsys hugetlb_subsys = {
 	.name = "hugetlb",
 	.create     = hugetlb_cgroup_create,
diff --git a/mm/migrate.c b/mm/migrate.c
index fdce3a2..6c37c51 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -33,6 +33,7 @@
 #include <linux/memcontrol.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
+#include <linux/hugetlb_cgroup.h>
 #include <linux/gfp.h>
 
 #include <asm/tlbflush.h>
@@ -931,6 +932,10 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 
 	if (anon_vma)
 		put_anon_vma(anon_vma);
+
+	if (!rc)
+		hugetlb_cgroup_migrate(hpage, new_hpage);
+
 	unlock_page(hpage);
 out:
 	put_page(new_hpage);
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation
  2012-06-13 10:27 ` Aneesh Kumar K.V
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 Documentation/cgroups/hugetlb.txt |   45 +++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 Documentation/cgroups/hugetlb.txt

diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
new file mode 100644
index 0000000..a9faaca
--- /dev/null
+++ b/Documentation/cgroups/hugetlb.txt
@@ -0,0 +1,45 @@
+HugeTLB Controller
+-------------------
+
+The HugeTLB controller allows to limit the HugeTLB usage per control group and
+enforces the controller limit during page fault. Since HugeTLB doesn't
+support page reclaim, enforcing the limit at page fault time implies that,
+the application will get SIGBUS signal if it tries to access HugeTLB pages
+beyond its limit. This requires the application to know beforehand how much
+HugeTLB pages it would require for its use.
+
+HugeTLB controller can be created by first mounting the cgroup filesystem.
+
+# mount -t cgroup -o hugetlb none /sys/fs/cgroup
+
+With the above step, the initial or the parent HugeTLB group becomes
+visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
+the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
+
+New groups can be created under the parent group /sys/fs/cgroup.
+
+# cd /sys/fs/cgroup
+# mkdir g1
+# echo $$ > g1/tasks
+
+The above steps create a new group g1 and move the current shell
+process (bash) into it.
+
+Brief summary of control files
+
+ hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
+ hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
+ hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.failcnt		   # show the number of allocation failure due to HugeTLB limit
+
+For a system supporting two hugepage size (16M and 16G) the control
+files include:
+
+hugetlb.16GB.limit_in_bytes
+hugetlb.16GB.max_usage_in_bytes
+hugetlb.16GB.usage_in_bytes
+hugetlb.16GB.failcnt
+hugetlb.16MB.limit_in_bytes
+hugetlb.16MB.max_usage_in_bytes
+hugetlb.16MB.usage_in_bytes
+hugetlb.16MB.failcnt
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation
@ 2012-06-13 10:27   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 10:27 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 Documentation/cgroups/hugetlb.txt |   45 +++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 Documentation/cgroups/hugetlb.txt

diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
new file mode 100644
index 0000000..a9faaca
--- /dev/null
+++ b/Documentation/cgroups/hugetlb.txt
@@ -0,0 +1,45 @@
+HugeTLB Controller
+-------------------
+
+The HugeTLB controller allows to limit the HugeTLB usage per control group and
+enforces the controller limit during page fault. Since HugeTLB doesn't
+support page reclaim, enforcing the limit at page fault time implies that,
+the application will get SIGBUS signal if it tries to access HugeTLB pages
+beyond its limit. This requires the application to know beforehand how much
+HugeTLB pages it would require for its use.
+
+HugeTLB controller can be created by first mounting the cgroup filesystem.
+
+# mount -t cgroup -o hugetlb none /sys/fs/cgroup
+
+With the above step, the initial or the parent HugeTLB group becomes
+visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
+the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
+
+New groups can be created under the parent group /sys/fs/cgroup.
+
+# cd /sys/fs/cgroup
+# mkdir g1
+# echo $$ > g1/tasks
+
+The above steps create a new group g1 and move the current shell
+process (bash) into it.
+
+Brief summary of control files
+
+ hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
+ hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
+ hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.failcnt		   # show the number of allocation failure due to HugeTLB limit
+
+For a system supporting two hugepage size (16M and 16G) the control
+files include:
+
+hugetlb.16GB.limit_in_bytes
+hugetlb.16GB.max_usage_in_bytes
+hugetlb.16GB.usage_in_bytes
+hugetlb.16GB.failcnt
+hugetlb.16MB.limit_in_bytes
+hugetlb.16MB.max_usage_in_bytes
+hugetlb.16MB.usage_in_bytes
+hugetlb.16MB.failcnt
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
  2012-06-13 10:27   ` Aneesh Kumar K.V
  (?)
@ 2012-06-13 11:32     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 11:32 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups


Need this patch for hugetlb cgroup disabled. I will send an updated patch in
reply.

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e9e6d74..bc30413 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -18,14 +18,14 @@
 #include <linux/res_counter.h>
 
 struct hugetlb_cgroup;
-
-#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
 /*
  * Minimum page order trackable by hugetlb cgroup.
  * At least 3 pages are necessary for all the tracking information.
  */
 #define HUGETLB_CGROUP_MIN_ORDER	2
 
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
 {
 	VM_BUG_ON(!PageHuge(page));


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
@ 2012-06-13 11:32     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 11:32 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups


Need this patch for hugetlb cgroup disabled. I will send an updated patch in
reply.

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e9e6d74..bc30413 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -18,14 +18,14 @@
 #include <linux/res_counter.h>
 
 struct hugetlb_cgroup;
-
-#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
 /*
  * Minimum page order trackable by hugetlb cgroup.
  * At least 3 pages are necessary for all the tracking information.
  */
 #define HUGETLB_CGROUP_MIN_ORDER	2
 
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
 {
 	VM_BUG_ON(!PageHuge(page));

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
@ 2012-06-13 11:32     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 11:32 UTC (permalink / raw)
  To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	mhocko-AlSwsSmVLrQ, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA


Need this patch for hugetlb cgroup disabled. I will send an updated patch in
reply.

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e9e6d74..bc30413 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -18,14 +18,14 @@
 #include <linux/res_counter.h>
 
 struct hugetlb_cgroup;
-
-#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
 /*
  * Minimum page order trackable by hugetlb cgroup.
  * At least 3 pages are necessary for all the tracking information.
  */
 #define HUGETLB_CGROUP_MIN_ORDER	2
 
+#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+
 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
 {
 	VM_BUG_ON(!PageHuge(page));

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-13 11:34     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 11:34 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
the usage to hugetlb cgroup to only hugepages with 3 or more
normal pages. I guess that is an acceptable limitation.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb_cgroup.h |   37 +++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c                   |    4 ++++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e9944b4..2e4cb6b 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -18,8 +18,34 @@
 #include <linux/res_counter.h>
 
 struct hugetlb_cgroup;
+/*
+ * Minimum page order trackable by hugetlb cgroup.
+ * At least 3 pages are necessary for all the tracking information.
+ */
+#define HUGETLB_CGROUP_MIN_ORDER	2
 
 #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+
+static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
+{
+	VM_BUG_ON(!PageHuge(page));
+
+	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
+		return NULL;
+	return (struct hugetlb_cgroup *)page[2].lru.next;
+}
+
+static inline
+int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
+{
+	VM_BUG_ON(!PageHuge(page));
+
+	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
+		return -1;
+	page[2].lru.next = (void *)h_cg;
+	return 0;
+}
+
 static inline bool hugetlb_cgroup_disabled(void)
 {
 	if (hugetlb_subsys.disabled)
@@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void)
 }
 
 #else
+static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
+{
+	return NULL;
+}
+
+static inline
+int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
+{
+	return 0;
+}
+
 static inline bool hugetlb_cgroup_disabled(void)
 {
 	return true;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e899a2d..6a449c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -28,6 +28,7 @@
 
 #include <linux/io.h>
 #include <linux/hugetlb.h>
+#include <linux/hugetlb_cgroup.h>
 #include <linux/node.h>
 #include "internal.h"
 
@@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 				1 << PG_active | 1 << PG_reserved |
 				1 << PG_private | 1 << PG_writeback);
 	}
+	VM_BUG_ON(hugetlb_cgroup_from_page(page));
 	set_compound_page_dtor(page, NULL);
 	set_page_refcounted(page);
 	arch_release_hugepage(page);
@@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, free_huge_page);
 	spin_lock(&hugetlb_lock);
+	set_hugetlb_cgroup(page, NULL);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
 	spin_unlock(&hugetlb_lock);
@@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 		INIT_LIST_HEAD(&page->lru);
 		r_nid = page_to_nid(page);
 		set_compound_page_dtor(page, free_huge_page);
+		set_hugetlb_cgroup(page, NULL);
 		/*
 		 * We incremented the global counters already
 		 */
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
@ 2012-06-13 11:34     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 11:34 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm, hannes
  Cc: linux-kernel, cgroups, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
the usage to hugetlb cgroup to only hugepages with 3 or more
normal pages. I guess that is an acceptable limitation.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/hugetlb_cgroup.h |   37 +++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c                   |    4 ++++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index e9944b4..2e4cb6b 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -18,8 +18,34 @@
 #include <linux/res_counter.h>
 
 struct hugetlb_cgroup;
+/*
+ * Minimum page order trackable by hugetlb cgroup.
+ * At least 3 pages are necessary for all the tracking information.
+ */
+#define HUGETLB_CGROUP_MIN_ORDER	2
 
 #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
+
+static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
+{
+	VM_BUG_ON(!PageHuge(page));
+
+	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
+		return NULL;
+	return (struct hugetlb_cgroup *)page[2].lru.next;
+}
+
+static inline
+int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
+{
+	VM_BUG_ON(!PageHuge(page));
+
+	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
+		return -1;
+	page[2].lru.next = (void *)h_cg;
+	return 0;
+}
+
 static inline bool hugetlb_cgroup_disabled(void)
 {
 	if (hugetlb_subsys.disabled)
@@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void)
 }
 
 #else
+static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
+{
+	return NULL;
+}
+
+static inline
+int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
+{
+	return 0;
+}
+
 static inline bool hugetlb_cgroup_disabled(void)
 {
 	return true;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e899a2d..6a449c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -28,6 +28,7 @@
 
 #include <linux/io.h>
 #include <linux/hugetlb.h>
+#include <linux/hugetlb_cgroup.h>
 #include <linux/node.h>
 #include "internal.h"
 
@@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 				1 << PG_active | 1 << PG_reserved |
 				1 << PG_private | 1 << PG_writeback);
 	}
+	VM_BUG_ON(hugetlb_cgroup_from_page(page));
 	set_compound_page_dtor(page, NULL);
 	set_page_refcounted(page);
 	arch_release_hugepage(page);
@@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, free_huge_page);
 	spin_lock(&hugetlb_lock);
+	set_hugetlb_cgroup(page, NULL);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
 	spin_unlock(&hugetlb_lock);
@@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
 		INIT_LIST_HEAD(&page->lru);
 		r_nid = page_to_nid(page);
 		set_compound_page_dtor(page, free_huge_page);
+		set_hugetlb_cgroup(page, NULL);
 		/*
 		 * We incremented the global counters already
 		 */
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
  2012-06-13 10:27   ` Aneesh Kumar K.V
  (?)
@ 2012-06-13 14:59     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-13 14:59 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Use a mmu_gather instead of a temporary linked list for accumulating
> pages when we unmap a hugepage range

Sorry for coming up with the comment that late but you owe us an
explanation _why_ you are doing this.

I assume that this fixes a real problem when we take i_mmap_mutex
already up in 
unmap_mapping_range
  mutex_lock(&mapping->i_mmap_mutex);
  unmap_mapping_range_tree | unmap_mapping_range_list 
    unmap_mapping_range_vma
      zap_page_range_single
        unmap_single_vma
	  unmap_hugepage_range
	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);

And that this should have been marked for stable as well (I haven't
checked when this has been introduced).

But then I do not see how this help when you still do this:
[...]
> diff --git a/mm/memory.c b/mm/memory.c
> index 1b7dc66..545e18a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>  			 * Since no pte has actually been setup, it is
>  			 * safe to do nothing in this case.
>  			 */
> -			if (vma->vm_file)
> -				unmap_hugepage_range(vma, start, end, NULL);
> +			if (vma->vm_file) {
> +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +			}
>  		} else
>  			unmap_page_range(tlb, vma, start, end, details);
>  	}

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-13 14:59     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-13 14:59 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Use a mmu_gather instead of a temporary linked list for accumulating
> pages when we unmap a hugepage range

Sorry for coming up with the comment that late but you owe us an
explanation _why_ you are doing this.

I assume that this fixes a real problem when we take i_mmap_mutex
already up in 
unmap_mapping_range
  mutex_lock(&mapping->i_mmap_mutex);
  unmap_mapping_range_tree | unmap_mapping_range_list 
    unmap_mapping_range_vma
      zap_page_range_single
        unmap_single_vma
	  unmap_hugepage_range
	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);

And that this should have been marked for stable as well (I haven't
checked when this has been introduced).

But then I do not see how this help when you still do this:
[...]
> diff --git a/mm/memory.c b/mm/memory.c
> index 1b7dc66..545e18a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>  			 * Since no pte has actually been setup, it is
>  			 * safe to do nothing in this case.
>  			 */
> -			if (vma->vm_file)
> -				unmap_hugepage_range(vma, start, end, NULL);
> +			if (vma->vm_file) {
> +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +			}
>  		} else
>  			unmap_page_range(tlb, vma, start, end, details);
>  	}

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-13 14:59     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-13 14:59 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> Use a mmu_gather instead of a temporary linked list for accumulating
> pages when we unmap a hugepage range

Sorry for coming up with the comment that late but you owe us an
explanation _why_ you are doing this.

I assume that this fixes a real problem when we take i_mmap_mutex
already up in 
unmap_mapping_range
  mutex_lock(&mapping->i_mmap_mutex);
  unmap_mapping_range_tree | unmap_mapping_range_list 
    unmap_mapping_range_vma
      zap_page_range_single
        unmap_single_vma
	  unmap_hugepage_range
	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);

And that this should have been marked for stable as well (I haven't
checked when this has been introduced).

But then I do not see how this help when you still do this:
[...]
> diff --git a/mm/memory.c b/mm/memory.c
> index 1b7dc66..545e18a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>  			 * Since no pte has actually been setup, it is
>  			 * safe to do nothing in this case.
>  			 */
> -			if (vma->vm_file)
> -				unmap_hugepage_range(vma, start, end, NULL);
> +			if (vma->vm_file) {
> +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +			}
>  		} else
>  			unmap_page_range(tlb, vma, start, end, details);
>  	}

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
  2012-06-13 14:59     ` Michal Hocko
  (?)
@ 2012-06-13 15:03       ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-13 15:03 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 16:59:23, Michal Hocko wrote:
> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > Use a mmu_gather instead of a temporary linked list for accumulating
> > pages when we unmap a hugepage range
> 
> Sorry for coming up with the comment that late but you owe us an
> explanation _why_ you are doing this.
> 
> I assume that this fixes a real problem when we take i_mmap_mutex
> already up in 
> unmap_mapping_range
>   mutex_lock(&mapping->i_mmap_mutex);
>   unmap_mapping_range_tree | unmap_mapping_range_list 
>     unmap_mapping_range_vma
>       zap_page_range_single
>         unmap_single_vma
> 	  unmap_hugepage_range
> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> 
> And that this should have been marked for stable as well (I haven't
> checked when this has been introduced).
> 
> But then I do not see how this help when you still do this:
> [...]
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 1b7dc66..545e18a 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> >  			 * Since no pte has actually been setup, it is
> >  			 * safe to do nothing in this case.
> >  			 */
> > -			if (vma->vm_file)
> > -				unmap_hugepage_range(vma, start, end, NULL);
> > +			if (vma->vm_file) {
> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> > +			}
> >  		} else
> >  			unmap_page_range(tlb, vma, start, end, details);
> >  	}

Ahhh, you are removing the lock in the next patch. Really confusing and
not nice for the stable backport.
Could you merge those two patches and add Cc: stable? 
Then you can add my
Reviewed-by: Michal Hocko <mhocko@suse.cz>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-13 15:03       ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-13 15:03 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 16:59:23, Michal Hocko wrote:
> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > Use a mmu_gather instead of a temporary linked list for accumulating
> > pages when we unmap a hugepage range
> 
> Sorry for coming up with the comment that late but you owe us an
> explanation _why_ you are doing this.
> 
> I assume that this fixes a real problem when we take i_mmap_mutex
> already up in 
> unmap_mapping_range
>   mutex_lock(&mapping->i_mmap_mutex);
>   unmap_mapping_range_tree | unmap_mapping_range_list 
>     unmap_mapping_range_vma
>       zap_page_range_single
>         unmap_single_vma
> 	  unmap_hugepage_range
> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> 
> And that this should have been marked for stable as well (I haven't
> checked when this has been introduced).
> 
> But then I do not see how this help when you still do this:
> [...]
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 1b7dc66..545e18a 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> >  			 * Since no pte has actually been setup, it is
> >  			 * safe to do nothing in this case.
> >  			 */
> > -			if (vma->vm_file)
> > -				unmap_hugepage_range(vma, start, end, NULL);
> > +			if (vma->vm_file) {
> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> > +			}
> >  		} else
> >  			unmap_page_range(tlb, vma, start, end, details);
> >  	}

Ahhh, you are removing the lock in the next patch. Really confusing and
not nice for the stable backport.
Could you merge those two patches and add Cc: stable? 
Then you can add my
Reviewed-by: Michal Hocko <mhocko@suse.cz>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-13 15:03       ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-13 15:03 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed 13-06-12 16:59:23, Michal Hocko wrote:
> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> > 
> > Use a mmu_gather instead of a temporary linked list for accumulating
> > pages when we unmap a hugepage range
> 
> Sorry for coming up with the comment that late but you owe us an
> explanation _why_ you are doing this.
> 
> I assume that this fixes a real problem when we take i_mmap_mutex
> already up in 
> unmap_mapping_range
>   mutex_lock(&mapping->i_mmap_mutex);
>   unmap_mapping_range_tree | unmap_mapping_range_list 
>     unmap_mapping_range_vma
>       zap_page_range_single
>         unmap_single_vma
> 	  unmap_hugepage_range
> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> 
> And that this should have been marked for stable as well (I haven't
> checked when this has been introduced).
> 
> But then I do not see how this help when you still do this:
> [...]
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 1b7dc66..545e18a 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> >  			 * Since no pte has actually been setup, it is
> >  			 * safe to do nothing in this case.
> >  			 */
> > -			if (vma->vm_file)
> > -				unmap_hugepage_range(vma, start, end, NULL);
> > +			if (vma->vm_file) {
> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> > +			}
> >  		} else
> >  			unmap_page_range(tlb, vma, start, end, details);
> >  	}

Ahhh, you are removing the lock in the next patch. Really confusing and
not nice for the stable backport.
Could you merge those two patches and add Cc: stable? 
Then you can add my
Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
  2012-06-13 14:59     ` Michal Hocko
  (?)
@ 2012-06-13 16:37       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 16:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> Use a mmu_gather instead of a temporary linked list for accumulating
>> pages when we unmap a hugepage range
>
> Sorry for coming up with the comment that late but you owe us an
> explanation _why_ you are doing this.
>
> I assume that this fixes a real problem when we take i_mmap_mutex
> already up in 
> unmap_mapping_range
>   mutex_lock(&mapping->i_mmap_mutex);
>   unmap_mapping_range_tree | unmap_mapping_range_list 
>     unmap_mapping_range_vma
>       zap_page_range_single
>         unmap_single_vma
> 	  unmap_hugepage_range
> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>
> And that this should have been marked for stable as well (I haven't
> checked when this has been introduced).

Switch to mmu_gather is to get rid of the use of page->lru so that i can use it for
active list.


-aneesh


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-13 16:37       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 16:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> Use a mmu_gather instead of a temporary linked list for accumulating
>> pages when we unmap a hugepage range
>
> Sorry for coming up with the comment that late but you owe us an
> explanation _why_ you are doing this.
>
> I assume that this fixes a real problem when we take i_mmap_mutex
> already up in 
> unmap_mapping_range
>   mutex_lock(&mapping->i_mmap_mutex);
>   unmap_mapping_range_tree | unmap_mapping_range_list 
>     unmap_mapping_range_vma
>       zap_page_range_single
>         unmap_single_vma
> 	  unmap_hugepage_range
> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>
> And that this should have been marked for stable as well (I haven't
> checked when this has been introduced).

Switch to mmu_gather is to get rid of the use of page->lru so that i can use it for
active list.


-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-13 16:37       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 16:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:

> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>> 
>> Use a mmu_gather instead of a temporary linked list for accumulating
>> pages when we unmap a hugepage range
>
> Sorry for coming up with the comment that late but you owe us an
> explanation _why_ you are doing this.
>
> I assume that this fixes a real problem when we take i_mmap_mutex
> already up in 
> unmap_mapping_range
>   mutex_lock(&mapping->i_mmap_mutex);
>   unmap_mapping_range_tree | unmap_mapping_range_list 
>     unmap_mapping_range_vma
>       zap_page_range_single
>         unmap_single_vma
> 	  unmap_hugepage_range
> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>
> And that this should have been marked for stable as well (I haven't
> checked when this has been introduced).

Switch to mmu_gather is to get rid of the use of page->lru so that i can use it for
active list.


-aneesh

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
  2012-06-13 15:03       ` Michal Hocko
@ 2012-06-13 16:43         ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 16:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Wed 13-06-12 16:59:23, Michal Hocko wrote:
>> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
>> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> > 
>> > Use a mmu_gather instead of a temporary linked list for accumulating
>> > pages when we unmap a hugepage range
>> 
>> Sorry for coming up with the comment that late but you owe us an
>> explanation _why_ you are doing this.
>> 
>> I assume that this fixes a real problem when we take i_mmap_mutex
>> already up in 
>> unmap_mapping_range
>>   mutex_lock(&mapping->i_mmap_mutex);
>>   unmap_mapping_range_tree | unmap_mapping_range_list 
>>     unmap_mapping_range_vma
>>       zap_page_range_single
>>         unmap_single_vma
>> 	  unmap_hugepage_range
>> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>> 
>> And that this should have been marked for stable as well (I haven't
>> checked when this has been introduced).
>> 
>> But then I do not see how this help when you still do this:
>> [...]
>> > diff --git a/mm/memory.c b/mm/memory.c
>> > index 1b7dc66..545e18a 100644
>> > --- a/mm/memory.c
>> > +++ b/mm/memory.c
>> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>> >  			 * Since no pte has actually been setup, it is
>> >  			 * safe to do nothing in this case.
>> >  			 */
>> > -			if (vma->vm_file)
>> > -				unmap_hugepage_range(vma, start, end, NULL);
>> > +			if (vma->vm_file) {
>> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
>> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
>> > +			}
>> >  		} else
>> >  			unmap_page_range(tlb, vma, start, end, details);
>> >  	}
>
> Ahhh, you are removing the lock in the next patch. Really confusing and
> not nice for the stable backport.
> Could you merge those two patches and add Cc: stable? 
> Then you can add my
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
>

In the last review cycle I was asked to see if we can get a lockdep
report for the above and what I found was we don't really cause the
above deadlock with the current codebase because for hugetlb we don't
directly call unmap_mapping_range. But still it is good to remove the
i_mmap_mutex, because we don't need that protection now. I didn't
mark it for stable because of the above reason.

-aneesh


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-13 16:43         ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-13 16:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Wed 13-06-12 16:59:23, Michal Hocko wrote:
>> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
>> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> > 
>> > Use a mmu_gather instead of a temporary linked list for accumulating
>> > pages when we unmap a hugepage range
>> 
>> Sorry for coming up with the comment that late but you owe us an
>> explanation _why_ you are doing this.
>> 
>> I assume that this fixes a real problem when we take i_mmap_mutex
>> already up in 
>> unmap_mapping_range
>>   mutex_lock(&mapping->i_mmap_mutex);
>>   unmap_mapping_range_tree | unmap_mapping_range_list 
>>     unmap_mapping_range_vma
>>       zap_page_range_single
>>         unmap_single_vma
>> 	  unmap_hugepage_range
>> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>> 
>> And that this should have been marked for stable as well (I haven't
>> checked when this has been introduced).
>> 
>> But then I do not see how this help when you still do this:
>> [...]
>> > diff --git a/mm/memory.c b/mm/memory.c
>> > index 1b7dc66..545e18a 100644
>> > --- a/mm/memory.c
>> > +++ b/mm/memory.c
>> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>> >  			 * Since no pte has actually been setup, it is
>> >  			 * safe to do nothing in this case.
>> >  			 */
>> > -			if (vma->vm_file)
>> > -				unmap_hugepage_range(vma, start, end, NULL);
>> > +			if (vma->vm_file) {
>> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
>> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
>> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
>> > +			}
>> >  		} else
>> >  			unmap_page_range(tlb, vma, start, end, details);
>> >  	}
>
> Ahhh, you are removing the lock in the next patch. Really confusing and
> not nice for the stable backport.
> Could you merge those two patches and add Cc: stable? 
> Then you can add my
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
>

In the last review cycle I was asked to see if we can get a lockdep
report for the above and what I found was we don't really cause the
above deadlock with the current codebase because for hugetlb we don't
directly call unmap_mapping_range. But still it is good to remove the
i_mmap_mutex, because we don't need that protection now. I didn't
mark it for stable because of the above reason.

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
  2012-06-13 10:27   ` Aneesh Kumar K.V
  (?)
@ 2012-06-14  3:09     ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  3:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
> fix linked list corruption in unmap_hugepage_range()") but we don't use
> page->lru in unmap_hugepage_range any more.  Also the lock was taken
> higher up in the stack in some code path.  That would result in deadlock.
> 
> unmap_mapping_range (i_mmap_mutex)
>   ->  unmap_mapping_range_tree
>      ->  unmap_mapping_range_vma
>         ->  zap_page_range_single
>           ->  unmap_single_vma
> 	      ->  unmap_hugepage_range (i_mmap_mutex)
> 
> For shared pagetable support for huge pages, since pagetable pages are ref
> counted we don't need any lock during huge_pmd_unshare.  We do take
> i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
> (39dde65c9940c97f ("shared page table for hugetlb page")).
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
@ 2012-06-14  3:09     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  3:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
> fix linked list corruption in unmap_hugepage_range()") but we don't use
> page->lru in unmap_hugepage_range any more.  Also the lock was taken
> higher up in the stack in some code path.  That would result in deadlock.
> 
> unmap_mapping_range (i_mmap_mutex)
>   ->  unmap_mapping_range_tree
>      ->  unmap_mapping_range_vma
>         ->  zap_page_range_single
>           ->  unmap_single_vma
> 	      ->  unmap_hugepage_range (i_mmap_mutex)
> 
> For shared pagetable support for huge pages, since pagetable pages are ref
> counted we don't need any lock during huge_pmd_unshare.  We do take
> i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
> (39dde65c9940c97f ("shared page table for hugetlb page")).
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
@ 2012-06-14  3:09     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  3:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, dhillf-Re5JQEeQqe8AvxtiuMwx3w,
	rientjes-hpIqsD4AKlfQT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
> fix linked list corruption in unmap_hugepage_range()") but we don't use
> page->lru in unmap_hugepage_range any more.  Also the lock was taken
> higher up in the stack in some code path.  That would result in deadlock.
> 
> unmap_mapping_range (i_mmap_mutex)
>   ->  unmap_mapping_range_tree
>      ->  unmap_mapping_range_vma
>         ->  zap_page_range_single
>           ->  unmap_single_vma
> 	      ->  unmap_hugepage_range (i_mmap_mutex)
> 
> For shared pagetable support for huge pages, since pagetable pages are ref
> counted we don't need any lock during huge_pmd_unshare.  We do take
> i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
> (39dde65c9940c97f ("shared page table for hugetlb page")).
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-LdfC7J4mv27QFUHtdCDX3A@public.gmane.org>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 08/15] hugetlb: Make some static variables global
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-14  3:11     ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  3:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> We will use them later in hugetlb_cgroup.c
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 08/15] hugetlb: Make some static variables global
@ 2012-06-14  3:11     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  3:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> We will use them later in hugetlb_cgroup.c
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
  2012-06-13 11:34     ` Aneesh Kumar K.V
  (?)
@ 2012-06-14  4:04       ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 20:34), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
> the usage to hugetlb cgroup to only hugepages with 3 or more
> normal pages. I guess that is an acceptable limitation.
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
@ 2012-06-14  4:04       ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 20:34), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
> the usage to hugetlb cgroup to only hugepages with 3 or more
> normal pages. I guess that is an acceptable limitation.
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
@ 2012-06-14  4:04       ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, dhillf-Re5JQEeQqe8AvxtiuMwx3w,
	rientjes-hpIqsD4AKlfQT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

(2012/06/13 20:34), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
> the usage to hugetlb cgroup to only hugepages with 3 or more
> normal pages. I guess that is an acceptable limitation.
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
  2012-06-13 10:27   ` Aneesh Kumar K.V
  (?)
@ 2012-06-14  4:07     ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> This patchset add the charge and uncharge routines for hugetlb cgroup.
> We do cgroup charging in page alloc and uncharge in compound page
> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
@ 2012-06-14  4:07     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> This patchset add the charge and uncharge routines for hugetlb cgroup.
> We do cgroup charging in page alloc and uncharge in compound page
> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
@ 2012-06-14  4:07     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, dhillf-Re5JQEeQqe8AvxtiuMwx3w,
	rientjes-hpIqsD4AKlfQT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> This patchset add the charge and uncharge routines for hugetlb cgroup.
> We do cgroup charging in page alloc and uncharge in compound page
> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal
  2012-06-13 10:27   ` Aneesh Kumar K.V
  (?)
@ 2012-06-14  4:09     ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch add support for cgroup removal. If we don't have parent
> cgroup, the charges are moved to root cgroup.
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal
@ 2012-06-14  4:09     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch add support for cgroup removal. If we don't have parent
> cgroup, the charges are moved to root cgroup.
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal
@ 2012-06-14  4:09     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, dhillf-Re5JQEeQqe8AvxtiuMwx3w,
	rientjes-hpIqsD4AKlfQT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> This patch add support for cgroup removal. If we don't have parent
> cgroup, the charges are moved to root cgroup.
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-14  4:10     ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> Add the control files for hugetlb controller
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files
@ 2012-06-14  4:10     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> Add the control files for hugetlb controller
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
  2012-06-13 10:27   ` Aneesh Kumar K.V
  (?)
@ 2012-06-14  4:13     ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
> we are holding a hugepage reference, we can be sure that old page won't
> get uncharged till the last put_page().
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
@ 2012-06-14  4:13     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, dhillf, rientjes, mhocko, akpm, hannes, linux-kernel, cgroups

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
> we are holding a hugepage reference, we can be sure that old page won't
> get uncharged till the last put_page().
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
@ 2012-06-14  4:13     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-14  4:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, dhillf-Re5JQEeQqe8AvxtiuMwx3w,
	rientjes-hpIqsD4AKlfQT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

(2012/06/13 19:27), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
> we are holding a hugepage reference, we can be sure that old page won't
> get uncharged till the last put_page().
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
  2012-06-13 16:43         ` Aneesh Kumar K.V
  (?)
@ 2012-06-14  7:14           ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 22:13:00, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > On Wed 13-06-12 16:59:23, Michal Hocko wrote:
> >> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> >> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> > 
> >> > Use a mmu_gather instead of a temporary linked list for accumulating
> >> > pages when we unmap a hugepage range
> >> 
> >> Sorry for coming up with the comment that late but you owe us an
> >> explanation _why_ you are doing this.
> >> 
> >> I assume that this fixes a real problem when we take i_mmap_mutex
> >> already up in 
> >> unmap_mapping_range
> >>   mutex_lock(&mapping->i_mmap_mutex);
> >>   unmap_mapping_range_tree | unmap_mapping_range_list 
> >>     unmap_mapping_range_vma
> >>       zap_page_range_single
> >>         unmap_single_vma
> >> 	  unmap_hugepage_range
> >> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> 
> >> And that this should have been marked for stable as well (I haven't
> >> checked when this has been introduced).
> >> 
> >> But then I do not see how this help when you still do this:
> >> [...]
> >> > diff --git a/mm/memory.c b/mm/memory.c
> >> > index 1b7dc66..545e18a 100644
> >> > --- a/mm/memory.c
> >> > +++ b/mm/memory.c
> >> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> >> >  			 * Since no pte has actually been setup, it is
> >> >  			 * safe to do nothing in this case.
> >> >  			 */
> >> > -			if (vma->vm_file)
> >> > -				unmap_hugepage_range(vma, start, end, NULL);
> >> > +			if (vma->vm_file) {
> >> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> >> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> > +			}
> >> >  		} else
> >> >  			unmap_page_range(tlb, vma, start, end, details);
> >> >  	}
> >
> > Ahhh, you are removing the lock in the next patch. Really confusing and
> > not nice for the stable backport.
> > Could you merge those two patches and add Cc: stable? 
> > Then you can add my
> > Reviewed-by: Michal Hocko <mhocko@suse.cz>
> >
> 
> In the last review cycle I was asked to see if we can get a lockdep
> report for the above and what I found was we don't really cause the
> above deadlock with the current codebase because for hugetlb we don't
> directly call unmap_mapping_range. 

Ahh, ok I missed that.

> But still it is good to remove the i_mmap_mutex, because we don't need
> that protection now. I didn't mark it for stable because of the above
> reason.

Thanks for clarification

> 
> -aneesh
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-14  7:14           ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 22:13:00, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > On Wed 13-06-12 16:59:23, Michal Hocko wrote:
> >> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> >> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> > 
> >> > Use a mmu_gather instead of a temporary linked list for accumulating
> >> > pages when we unmap a hugepage range
> >> 
> >> Sorry for coming up with the comment that late but you owe us an
> >> explanation _why_ you are doing this.
> >> 
> >> I assume that this fixes a real problem when we take i_mmap_mutex
> >> already up in 
> >> unmap_mapping_range
> >>   mutex_lock(&mapping->i_mmap_mutex);
> >>   unmap_mapping_range_tree | unmap_mapping_range_list 
> >>     unmap_mapping_range_vma
> >>       zap_page_range_single
> >>         unmap_single_vma
> >> 	  unmap_hugepage_range
> >> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> 
> >> And that this should have been marked for stable as well (I haven't
> >> checked when this has been introduced).
> >> 
> >> But then I do not see how this help when you still do this:
> >> [...]
> >> > diff --git a/mm/memory.c b/mm/memory.c
> >> > index 1b7dc66..545e18a 100644
> >> > --- a/mm/memory.c
> >> > +++ b/mm/memory.c
> >> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> >> >  			 * Since no pte has actually been setup, it is
> >> >  			 * safe to do nothing in this case.
> >> >  			 */
> >> > -			if (vma->vm_file)
> >> > -				unmap_hugepage_range(vma, start, end, NULL);
> >> > +			if (vma->vm_file) {
> >> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> >> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> > +			}
> >> >  		} else
> >> >  			unmap_page_range(tlb, vma, start, end, details);
> >> >  	}
> >
> > Ahhh, you are removing the lock in the next patch. Really confusing and
> > not nice for the stable backport.
> > Could you merge those two patches and add Cc: stable? 
> > Then you can add my
> > Reviewed-by: Michal Hocko <mhocko@suse.cz>
> >
> 
> In the last review cycle I was asked to see if we can get a lockdep
> report for the above and what I found was we don't really cause the
> above deadlock with the current codebase because for hugetlb we don't
> directly call unmap_mapping_range. 

Ahh, ok I missed that.

> But still it is good to remove the i_mmap_mutex, because we don't need
> that protection now. I didn't mark it for stable because of the above
> reason.

Thanks for clarification

> 
> -aneesh
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-14  7:14           ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed 13-06-12 22:13:00, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:
> 
> > On Wed 13-06-12 16:59:23, Michal Hocko wrote:
> >> On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> >> > From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> >> > 
> >> > Use a mmu_gather instead of a temporary linked list for accumulating
> >> > pages when we unmap a hugepage range
> >> 
> >> Sorry for coming up with the comment that late but you owe us an
> >> explanation _why_ you are doing this.
> >> 
> >> I assume that this fixes a real problem when we take i_mmap_mutex
> >> already up in 
> >> unmap_mapping_range
> >>   mutex_lock(&mapping->i_mmap_mutex);
> >>   unmap_mapping_range_tree | unmap_mapping_range_list 
> >>     unmap_mapping_range_vma
> >>       zap_page_range_single
> >>         unmap_single_vma
> >> 	  unmap_hugepage_range
> >> 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> 
> >> And that this should have been marked for stable as well (I haven't
> >> checked when this has been introduced).
> >> 
> >> But then I do not see how this help when you still do this:
> >> [...]
> >> > diff --git a/mm/memory.c b/mm/memory.c
> >> > index 1b7dc66..545e18a 100644
> >> > --- a/mm/memory.c
> >> > +++ b/mm/memory.c
> >> > @@ -1326,8 +1326,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> >> >  			 * Since no pte has actually been setup, it is
> >> >  			 * safe to do nothing in this case.
> >> >  			 */
> >> > -			if (vma->vm_file)
> >> > -				unmap_hugepage_range(vma, start, end, NULL);
> >> > +			if (vma->vm_file) {
> >> > +				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> > +				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> >> > +				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >> > +			}
> >> >  		} else
> >> >  			unmap_page_range(tlb, vma, start, end, details);
> >> >  	}
> >
> > Ahhh, you are removing the lock in the next patch. Really confusing and
> > not nice for the stable backport.
> > Could you merge those two patches and add Cc: stable? 
> > Then you can add my
> > Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> >
> 
> In the last review cycle I was asked to see if we can get a lockdep
> report for the above and what I found was we don't really cause the
> above deadlock with the current codebase because for hugetlb we don't
> directly call unmap_mapping_range. 

Ahh, ok I missed that.

> But still it is good to remove the i_mmap_mutex, because we don't need
> that protection now. I didn't mark it for stable because of the above
> reason.

Thanks for clarification

> 
> -aneesh
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
  2012-06-13 16:37       ` Aneesh Kumar K.V
  (?)
@ 2012-06-14  7:16         ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 22:07:06, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> Use a mmu_gather instead of a temporary linked list for accumulating
> >> pages when we unmap a hugepage range
> >
> > Sorry for coming up with the comment that late but you owe us an
> > explanation _why_ you are doing this.
> >
> > I assume that this fixes a real problem when we take i_mmap_mutex
> > already up in 
> > unmap_mapping_range
> >   mutex_lock(&mapping->i_mmap_mutex);
> >   unmap_mapping_range_tree | unmap_mapping_range_list 
> >     unmap_mapping_range_vma
> >       zap_page_range_single
> >         unmap_single_vma
> > 	  unmap_hugepage_range
> > 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >
> > And that this should have been marked for stable as well (I haven't
> > checked when this has been introduced).
> 
> Switch to mmu_gather is to get rid of the use of page->lru so that i can use it for
> active list.

So can we get this to the changelog please?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-14  7:16         ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 22:07:06, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> Use a mmu_gather instead of a temporary linked list for accumulating
> >> pages when we unmap a hugepage range
> >
> > Sorry for coming up with the comment that late but you owe us an
> > explanation _why_ you are doing this.
> >
> > I assume that this fixes a real problem when we take i_mmap_mutex
> > already up in 
> > unmap_mapping_range
> >   mutex_lock(&mapping->i_mmap_mutex);
> >   unmap_mapping_range_tree | unmap_mapping_range_list 
> >     unmap_mapping_range_vma
> >       zap_page_range_single
> >         unmap_single_vma
> > 	  unmap_hugepage_range
> > 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >
> > And that this should have been marked for stable as well (I haven't
> > checked when this has been introduced).
> 
> Switch to mmu_gather is to get rid of the use of page->lru so that i can use it for
> active list.

So can we get this to the changelog please?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages
@ 2012-06-14  7:16         ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed 13-06-12 22:07:06, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:
> 
> > On Wed 13-06-12 15:57:23, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> >> 
> >> Use a mmu_gather instead of a temporary linked list for accumulating
> >> pages when we unmap a hugepage range
> >
> > Sorry for coming up with the comment that late but you owe us an
> > explanation _why_ you are doing this.
> >
> > I assume that this fixes a real problem when we take i_mmap_mutex
> > already up in 
> > unmap_mapping_range
> >   mutex_lock(&mapping->i_mmap_mutex);
> >   unmap_mapping_range_tree | unmap_mapping_range_list 
> >     unmap_mapping_range_vma
> >       zap_page_range_single
> >         unmap_single_vma
> > 	  unmap_hugepage_range
> > 	    mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> >
> > And that this should have been marked for stable as well (I haven't
> > checked when this has been introduced).
> 
> Switch to mmu_gather is to get rid of the use of page->lru so that i can use it for
> active list.

So can we get this to the changelog please?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-14  7:20     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:20 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:24, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
> fix linked list corruption in unmap_hugepage_range()") but we don't use
> page->lru in unmap_hugepage_range any more.  Also the lock was taken
> higher up in the stack in some code path.  That would result in deadlock.

This sounds like the deadlock is real but in the other email you wrote
that the deadlock cannot happen so it would be good to mention it here.
 
> unmap_mapping_range (i_mmap_mutex)
>  -> unmap_mapping_range_tree
>     -> unmap_mapping_range_vma
>        -> zap_page_range_single
>          -> unmap_single_vma
> 	      -> unmap_hugepage_range (i_mmap_mutex)
> 
> For shared pagetable support for huge pages, since pagetable pages are ref
> counted we don't need any lock during huge_pmd_unshare.  We do take
> i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
> (39dde65c9940c97f ("shared page table for hugetlb page")).
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  mm/memory.c |    5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 545e18a..f6bc04f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1326,11 +1326,8 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>  			 * Since no pte has actually been setup, it is
>  			 * safe to do nothing in this case.
>  			 */
> -			if (vma->vm_file) {
> -				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +			if (vma->vm_file)
>  				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> -				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> -			}
>  		} else
>  			unmap_page_range(tlb, vma, start, end, details);
>  	}
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb
@ 2012-06-14  7:20     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:20 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:24, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> i_mmap_mutex lock was added in unmap_single_vma by 502717f4e ("hugetlb:
> fix linked list corruption in unmap_hugepage_range()") but we don't use
> page->lru in unmap_hugepage_range any more.  Also the lock was taken
> higher up in the stack in some code path.  That would result in deadlock.

This sounds like the deadlock is real but in the other email you wrote
that the deadlock cannot happen so it would be good to mention it here.
 
> unmap_mapping_range (i_mmap_mutex)
>  -> unmap_mapping_range_tree
>     -> unmap_mapping_range_vma
>        -> zap_page_range_single
>          -> unmap_single_vma
> 	      -> unmap_hugepage_range (i_mmap_mutex)
> 
> For shared pagetable support for huge pages, since pagetable pages are ref
> counted we don't need any lock during huge_pmd_unshare.  We do take
> i_mmap_mutex in huge_pmd_share while walking the vma_prio_tree in mapping.
> (39dde65c9940c97f ("shared page table for hugetlb page")).
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  mm/memory.c |    5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 545e18a..f6bc04f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1326,11 +1326,8 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>  			 * Since no pte has actually been setup, it is
>  			 * safe to do nothing in this case.
>  			 */
> -			if (vma->vm_file) {
> -				mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
> +			if (vma->vm_file)
>  				__unmap_hugepage_range(tlb, vma, start, end, NULL);
> -				mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
> -			}
>  		} else
>  			unmap_page_range(tlb, vma, start, end, details);
>  	}
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page()
  2012-06-13 10:27   ` Aneesh Kumar K.V
  (?)
@ 2012-06-14  7:28     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:28 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:25, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Since we migrate only one hugepage, don't use linked list for passing the
> page around.  Directly pass the page that need to be migrated as argument.
> This also remove the usage page->lru in migrate path.
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Yes nice.
Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/migrate.h |    4 +--
>  mm/memory-failure.c     |   13 ++--------
>  mm/migrate.c            |   65 +++++++++++++++--------------------------------
>  3 files changed, 25 insertions(+), 57 deletions(-)
> 
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 855c337..ce7e667 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *,
>  extern int migrate_pages(struct list_head *l, new_page_t x,
>  			unsigned long private, bool offlining,
>  			enum migrate_mode mode);
> -extern int migrate_huge_pages(struct list_head *l, new_page_t x,
> +extern int migrate_huge_page(struct page *, new_page_t x,
>  			unsigned long private, bool offlining,
>  			enum migrate_mode mode);
>  
> @@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {}
>  static inline int migrate_pages(struct list_head *l, new_page_t x,
>  		unsigned long private, bool offlining,
>  		enum migrate_mode mode) { return -ENOSYS; }
> -static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
> +static inline int migrate_huge_page(struct page *page, new_page_t x,
>  		unsigned long private, bool offlining,
>  		enum migrate_mode mode) { return -ENOSYS; }
>  
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ab1e714..53a1495 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	int ret;
>  	unsigned long pfn = page_to_pfn(page);
>  	struct page *hpage = compound_head(page);
> -	LIST_HEAD(pagelist);
>  
>  	ret = get_any_page(page, pfn, flags);
>  	if (ret < 0)
> @@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	}
>  
>  	/* Keep page count to indicate a given hugepage is isolated. */
> -
> -	list_add(&hpage->lru, &pagelist);
> -	ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0,
> -				true);
> +	ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true);
> +	put_page(hpage);
>  	if (ret) {
> -		struct page *page1, *page2;
> -		list_for_each_entry_safe(page1, page2, &pagelist, lru)
> -			put_page(page1);
> -
>  		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>  			pfn, ret, page->flags);
> -		if (ret > 0)
> -			ret = -EIO;
>  		return ret;
>  	}
>  done:
> diff --git a/mm/migrate.c b/mm/migrate.c
> index be26d5c..fdce3a2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -932,15 +932,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  	if (anon_vma)
>  		put_anon_vma(anon_vma);
>  	unlock_page(hpage);
> -
>  out:
> -	if (rc != -EAGAIN) {
> -		list_del(&hpage->lru);
> -		put_page(hpage);
> -	}
> -
>  	put_page(new_hpage);
> -
>  	if (result) {
>  		if (rc)
>  			*result = rc;
> @@ -1016,48 +1009,32 @@ out:
>  	return nr_failed + retry;
>  }
>  
> -int migrate_huge_pages(struct list_head *from,
> -		new_page_t get_new_page, unsigned long private, bool offlining,
> -		enum migrate_mode mode)
> +int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
> +		      unsigned long private, bool offlining,
> +		      enum migrate_mode mode)
>  {
> -	int retry = 1;
> -	int nr_failed = 0;
> -	int pass = 0;
> -	struct page *page;
> -	struct page *page2;
> -	int rc;
> -
> -	for (pass = 0; pass < 10 && retry; pass++) {
> -		retry = 0;
> -
> -		list_for_each_entry_safe(page, page2, from, lru) {
> +	int pass, rc;
> +
> +	for (pass = 0; pass < 10; pass++) {
> +		rc = unmap_and_move_huge_page(get_new_page,
> +					      private, hpage, pass > 2, offlining,
> +					      mode);
> +		switch (rc) {
> +		case -ENOMEM:
> +			goto out;
> +		case -EAGAIN:
> +			/* try again */
>  			cond_resched();
> -
> -			rc = unmap_and_move_huge_page(get_new_page,
> -					private, page, pass > 2, offlining,
> -					mode);
> -
> -			switch(rc) {
> -			case -ENOMEM:
> -				goto out;
> -			case -EAGAIN:
> -				retry++;
> -				break;
> -			case 0:
> -				break;
> -			default:
> -				/* Permanent failure */
> -				nr_failed++;
> -				break;
> -			}
> +			break;
> +		case 0:
> +			goto out;
> +		default:
> +			rc = -EIO;
> +			goto out;
>  		}
>  	}
> -	rc = 0;
>  out:
> -	if (rc)
> -		return rc;
> -
> -	return nr_failed + retry;
> +	return rc;
>  }
>  
>  #ifdef CONFIG_NUMA
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page()
@ 2012-06-14  7:28     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:28 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:25, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Since we migrate only one hugepage, don't use linked list for passing the
> page around.  Directly pass the page that need to be migrated as argument.
> This also remove the usage page->lru in migrate path.
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Yes nice.
Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/migrate.h |    4 +--
>  mm/memory-failure.c     |   13 ++--------
>  mm/migrate.c            |   65 +++++++++++++++--------------------------------
>  3 files changed, 25 insertions(+), 57 deletions(-)
> 
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 855c337..ce7e667 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *,
>  extern int migrate_pages(struct list_head *l, new_page_t x,
>  			unsigned long private, bool offlining,
>  			enum migrate_mode mode);
> -extern int migrate_huge_pages(struct list_head *l, new_page_t x,
> +extern int migrate_huge_page(struct page *, new_page_t x,
>  			unsigned long private, bool offlining,
>  			enum migrate_mode mode);
>  
> @@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {}
>  static inline int migrate_pages(struct list_head *l, new_page_t x,
>  		unsigned long private, bool offlining,
>  		enum migrate_mode mode) { return -ENOSYS; }
> -static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
> +static inline int migrate_huge_page(struct page *page, new_page_t x,
>  		unsigned long private, bool offlining,
>  		enum migrate_mode mode) { return -ENOSYS; }
>  
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ab1e714..53a1495 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	int ret;
>  	unsigned long pfn = page_to_pfn(page);
>  	struct page *hpage = compound_head(page);
> -	LIST_HEAD(pagelist);
>  
>  	ret = get_any_page(page, pfn, flags);
>  	if (ret < 0)
> @@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	}
>  
>  	/* Keep page count to indicate a given hugepage is isolated. */
> -
> -	list_add(&hpage->lru, &pagelist);
> -	ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0,
> -				true);
> +	ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true);
> +	put_page(hpage);
>  	if (ret) {
> -		struct page *page1, *page2;
> -		list_for_each_entry_safe(page1, page2, &pagelist, lru)
> -			put_page(page1);
> -
>  		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>  			pfn, ret, page->flags);
> -		if (ret > 0)
> -			ret = -EIO;
>  		return ret;
>  	}
>  done:
> diff --git a/mm/migrate.c b/mm/migrate.c
> index be26d5c..fdce3a2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -932,15 +932,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  	if (anon_vma)
>  		put_anon_vma(anon_vma);
>  	unlock_page(hpage);
> -
>  out:
> -	if (rc != -EAGAIN) {
> -		list_del(&hpage->lru);
> -		put_page(hpage);
> -	}
> -
>  	put_page(new_hpage);
> -
>  	if (result) {
>  		if (rc)
>  			*result = rc;
> @@ -1016,48 +1009,32 @@ out:
>  	return nr_failed + retry;
>  }
>  
> -int migrate_huge_pages(struct list_head *from,
> -		new_page_t get_new_page, unsigned long private, bool offlining,
> -		enum migrate_mode mode)
> +int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
> +		      unsigned long private, bool offlining,
> +		      enum migrate_mode mode)
>  {
> -	int retry = 1;
> -	int nr_failed = 0;
> -	int pass = 0;
> -	struct page *page;
> -	struct page *page2;
> -	int rc;
> -
> -	for (pass = 0; pass < 10 && retry; pass++) {
> -		retry = 0;
> -
> -		list_for_each_entry_safe(page, page2, from, lru) {
> +	int pass, rc;
> +
> +	for (pass = 0; pass < 10; pass++) {
> +		rc = unmap_and_move_huge_page(get_new_page,
> +					      private, hpage, pass > 2, offlining,
> +					      mode);
> +		switch (rc) {
> +		case -ENOMEM:
> +			goto out;
> +		case -EAGAIN:
> +			/* try again */
>  			cond_resched();
> -
> -			rc = unmap_and_move_huge_page(get_new_page,
> -					private, page, pass > 2, offlining,
> -					mode);
> -
> -			switch(rc) {
> -			case -ENOMEM:
> -				goto out;
> -			case -EAGAIN:
> -				retry++;
> -				break;
> -			case 0:
> -				break;
> -			default:
> -				/* Permanent failure */
> -				nr_failed++;
> -				break;
> -			}
> +			break;
> +		case 0:
> +			goto out;
> +		default:
> +			rc = -EIO;
> +			goto out;
>  		}
>  	}
> -	rc = 0;
>  out:
> -	if (rc)
> -		return rc;
> -
> -	return nr_failed + retry;
> +	return rc;
>  }
>  
>  #ifdef CONFIG_NUMA
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page()
@ 2012-06-14  7:28     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:28 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed 13-06-12 15:57:25, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> Since we migrate only one hugepage, don't use linked list for passing the
> page around.  Directly pass the page that need to be migrated as argument.
> This also remove the usage page->lru in migrate path.
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Yes nice.
Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

> ---
>  include/linux/migrate.h |    4 +--
>  mm/memory-failure.c     |   13 ++--------
>  mm/migrate.c            |   65 +++++++++++++++--------------------------------
>  3 files changed, 25 insertions(+), 57 deletions(-)
> 
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 855c337..ce7e667 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *,
>  extern int migrate_pages(struct list_head *l, new_page_t x,
>  			unsigned long private, bool offlining,
>  			enum migrate_mode mode);
> -extern int migrate_huge_pages(struct list_head *l, new_page_t x,
> +extern int migrate_huge_page(struct page *, new_page_t x,
>  			unsigned long private, bool offlining,
>  			enum migrate_mode mode);
>  
> @@ -36,7 +36,7 @@ static inline void putback_lru_pages(struct list_head *l) {}
>  static inline int migrate_pages(struct list_head *l, new_page_t x,
>  		unsigned long private, bool offlining,
>  		enum migrate_mode mode) { return -ENOSYS; }
> -static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
> +static inline int migrate_huge_page(struct page *page, new_page_t x,
>  		unsigned long private, bool offlining,
>  		enum migrate_mode mode) { return -ENOSYS; }
>  
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ab1e714..53a1495 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1414,7 +1414,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	int ret;
>  	unsigned long pfn = page_to_pfn(page);
>  	struct page *hpage = compound_head(page);
> -	LIST_HEAD(pagelist);
>  
>  	ret = get_any_page(page, pfn, flags);
>  	if (ret < 0)
> @@ -1429,19 +1428,11 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  	}
>  
>  	/* Keep page count to indicate a given hugepage is isolated. */
> -
> -	list_add(&hpage->lru, &pagelist);
> -	ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0,
> -				true);
> +	ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, 0, true);
> +	put_page(hpage);
>  	if (ret) {
> -		struct page *page1, *page2;
> -		list_for_each_entry_safe(page1, page2, &pagelist, lru)
> -			put_page(page1);
> -
>  		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>  			pfn, ret, page->flags);
> -		if (ret > 0)
> -			ret = -EIO;
>  		return ret;
>  	}
>  done:
> diff --git a/mm/migrate.c b/mm/migrate.c
> index be26d5c..fdce3a2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -932,15 +932,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  	if (anon_vma)
>  		put_anon_vma(anon_vma);
>  	unlock_page(hpage);
> -
>  out:
> -	if (rc != -EAGAIN) {
> -		list_del(&hpage->lru);
> -		put_page(hpage);
> -	}
> -
>  	put_page(new_hpage);
> -
>  	if (result) {
>  		if (rc)
>  			*result = rc;
> @@ -1016,48 +1009,32 @@ out:
>  	return nr_failed + retry;
>  }
>  
> -int migrate_huge_pages(struct list_head *from,
> -		new_page_t get_new_page, unsigned long private, bool offlining,
> -		enum migrate_mode mode)
> +int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
> +		      unsigned long private, bool offlining,
> +		      enum migrate_mode mode)
>  {
> -	int retry = 1;
> -	int nr_failed = 0;
> -	int pass = 0;
> -	struct page *page;
> -	struct page *page2;
> -	int rc;
> -
> -	for (pass = 0; pass < 10 && retry; pass++) {
> -		retry = 0;
> -
> -		list_for_each_entry_safe(page, page2, from, lru) {
> +	int pass, rc;
> +
> +	for (pass = 0; pass < 10; pass++) {
> +		rc = unmap_and_move_huge_page(get_new_page,
> +					      private, hpage, pass > 2, offlining,
> +					      mode);
> +		switch (rc) {
> +		case -ENOMEM:
> +			goto out;
> +		case -EAGAIN:
> +			/* try again */
>  			cond_resched();
> -
> -			rc = unmap_and_move_huge_page(get_new_page,
> -					private, page, pass > 2, offlining,
> -					mode);
> -
> -			switch(rc) {
> -			case -ENOMEM:
> -				goto out;
> -			case -EAGAIN:
> -				retry++;
> -				break;
> -			case 0:
> -				break;
> -			default:
> -				/* Permanent failure */
> -				nr_failed++;
> -				break;
> -			}
> +			break;
> +		case 0:
> +			goto out;
> +		default:
> +			rc = -EIO;
> +			goto out;
>  		}
>  	}
> -	rc = 0;
>  out:
> -	if (rc)
> -		return rc;
> -
> -	return nr_failed + retry;
> +	return rc;
>  }
>  
>  #ifdef CONFIG_NUMA
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages
  2012-06-13 10:27   ` Aneesh Kumar K.V
  (?)
@ 2012-06-14  7:33     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:26, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> hugepage_activelist will be used to track currently used HugeTLB pages.
> We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal.
> On cgroup removal we update the page's HugeTLB cgroup to point to parent
> cgroup.
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/hugetlb.h |    1 +
>  mm/hugetlb.c            |   12 +++++++-----
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 0f23c18..ed550d8 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -211,6 +211,7 @@ struct hstate {
>  	unsigned long resv_huge_pages;
>  	unsigned long surplus_huge_pages;
>  	unsigned long nr_overcommit_huge_pages;
> +	struct list_head hugepage_activelist;
>  	struct list_head hugepage_freelists[MAX_NUMNODES];
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e54b695..b5b6e15 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src)
>  static void enqueue_huge_page(struct hstate *h, struct page *page)
>  {
>  	int nid = page_to_nid(page);
> -	list_add(&page->lru, &h->hugepage_freelists[nid]);
> +	list_move(&page->lru, &h->hugepage_freelists[nid]);
>  	h->free_huge_pages++;
>  	h->free_huge_pages_node[nid]++;
>  }
> @@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
>  	if (list_empty(&h->hugepage_freelists[nid]))
>  		return NULL;
>  	page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
> -	list_del(&page->lru);
> +	list_move(&page->lru, &h->hugepage_activelist);
>  	set_page_refcounted(page);
>  	h->free_huge_pages--;
>  	h->free_huge_pages_node[nid]--;
> @@ -626,10 +626,11 @@ static void free_huge_page(struct page *page)
>  	page->mapping = NULL;
>  	BUG_ON(page_count(page));
>  	BUG_ON(page_mapcount(page));
> -	INIT_LIST_HEAD(&page->lru);
>  
>  	spin_lock(&hugetlb_lock);
>  	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
> +		/* remove the page from active list */
> +		list_del(&page->lru);
>  		update_and_free_page(h, page);
>  		h->surplus_huge_pages--;
>  		h->surplus_huge_pages_node[nid]--;
> @@ -642,6 +643,7 @@ static void free_huge_page(struct page *page)
>  
>  static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  {
> +	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, free_huge_page);
>  	spin_lock(&hugetlb_lock);
>  	h->nr_huge_pages++;
> @@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>  
>  	spin_lock(&hugetlb_lock);
>  	if (page) {
> +		INIT_LIST_HEAD(&page->lru);
>  		r_nid = page_to_nid(page);
>  		set_compound_page_dtor(page, free_huge_page);
>  		/*
> @@ -994,7 +997,6 @@ retry:
>  	list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
>  		if ((--needed) < 0)
>  			break;
> -		list_del(&page->lru);
>  		/*
>  		 * This page is now managed by the hugetlb allocator and has
>  		 * no users -- drop the buddy allocator's reference.
> @@ -1009,7 +1011,6 @@ free:
>  	/* Free unnecessary surplus pages to the buddy allocator */
>  	if (!list_empty(&surplus_list)) {
>  		list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
> -			list_del(&page->lru);
>  			put_page(page);
>  		}
>  	}
> @@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order)
>  	h->free_huge_pages = 0;
>  	for (i = 0; i < MAX_NUMNODES; ++i)
>  		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
> +	INIT_LIST_HEAD(&h->hugepage_activelist);
>  	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
>  	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages
@ 2012-06-14  7:33     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:26, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> hugepage_activelist will be used to track currently used HugeTLB pages.
> We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal.
> On cgroup removal we update the page's HugeTLB cgroup to point to parent
> cgroup.
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/hugetlb.h |    1 +
>  mm/hugetlb.c            |   12 +++++++-----
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 0f23c18..ed550d8 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -211,6 +211,7 @@ struct hstate {
>  	unsigned long resv_huge_pages;
>  	unsigned long surplus_huge_pages;
>  	unsigned long nr_overcommit_huge_pages;
> +	struct list_head hugepage_activelist;
>  	struct list_head hugepage_freelists[MAX_NUMNODES];
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e54b695..b5b6e15 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src)
>  static void enqueue_huge_page(struct hstate *h, struct page *page)
>  {
>  	int nid = page_to_nid(page);
> -	list_add(&page->lru, &h->hugepage_freelists[nid]);
> +	list_move(&page->lru, &h->hugepage_freelists[nid]);
>  	h->free_huge_pages++;
>  	h->free_huge_pages_node[nid]++;
>  }
> @@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
>  	if (list_empty(&h->hugepage_freelists[nid]))
>  		return NULL;
>  	page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
> -	list_del(&page->lru);
> +	list_move(&page->lru, &h->hugepage_activelist);
>  	set_page_refcounted(page);
>  	h->free_huge_pages--;
>  	h->free_huge_pages_node[nid]--;
> @@ -626,10 +626,11 @@ static void free_huge_page(struct page *page)
>  	page->mapping = NULL;
>  	BUG_ON(page_count(page));
>  	BUG_ON(page_mapcount(page));
> -	INIT_LIST_HEAD(&page->lru);
>  
>  	spin_lock(&hugetlb_lock);
>  	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
> +		/* remove the page from active list */
> +		list_del(&page->lru);
>  		update_and_free_page(h, page);
>  		h->surplus_huge_pages--;
>  		h->surplus_huge_pages_node[nid]--;
> @@ -642,6 +643,7 @@ static void free_huge_page(struct page *page)
>  
>  static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  {
> +	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, free_huge_page);
>  	spin_lock(&hugetlb_lock);
>  	h->nr_huge_pages++;
> @@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>  
>  	spin_lock(&hugetlb_lock);
>  	if (page) {
> +		INIT_LIST_HEAD(&page->lru);
>  		r_nid = page_to_nid(page);
>  		set_compound_page_dtor(page, free_huge_page);
>  		/*
> @@ -994,7 +997,6 @@ retry:
>  	list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
>  		if ((--needed) < 0)
>  			break;
> -		list_del(&page->lru);
>  		/*
>  		 * This page is now managed by the hugetlb allocator and has
>  		 * no users -- drop the buddy allocator's reference.
> @@ -1009,7 +1011,6 @@ free:
>  	/* Free unnecessary surplus pages to the buddy allocator */
>  	if (!list_empty(&surplus_list)) {
>  		list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
> -			list_del(&page->lru);
>  			put_page(page);
>  		}
>  	}
> @@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order)
>  	h->free_huge_pages = 0;
>  	for (i = 0; i < MAX_NUMNODES; ++i)
>  		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
> +	INIT_LIST_HEAD(&h->hugepage_activelist);
>  	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
>  	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages
@ 2012-06-14  7:33     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed 13-06-12 15:57:26, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> hugepage_activelist will be used to track currently used HugeTLB pages.
> We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal.
> On cgroup removal we update the page's HugeTLB cgroup to point to parent
> cgroup.
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

> ---
>  include/linux/hugetlb.h |    1 +
>  mm/hugetlb.c            |   12 +++++++-----
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 0f23c18..ed550d8 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -211,6 +211,7 @@ struct hstate {
>  	unsigned long resv_huge_pages;
>  	unsigned long surplus_huge_pages;
>  	unsigned long nr_overcommit_huge_pages;
> +	struct list_head hugepage_activelist;
>  	struct list_head hugepage_freelists[MAX_NUMNODES];
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e54b695..b5b6e15 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -510,7 +510,7 @@ void copy_huge_page(struct page *dst, struct page *src)
>  static void enqueue_huge_page(struct hstate *h, struct page *page)
>  {
>  	int nid = page_to_nid(page);
> -	list_add(&page->lru, &h->hugepage_freelists[nid]);
> +	list_move(&page->lru, &h->hugepage_freelists[nid]);
>  	h->free_huge_pages++;
>  	h->free_huge_pages_node[nid]++;
>  }
> @@ -522,7 +522,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
>  	if (list_empty(&h->hugepage_freelists[nid]))
>  		return NULL;
>  	page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
> -	list_del(&page->lru);
> +	list_move(&page->lru, &h->hugepage_activelist);
>  	set_page_refcounted(page);
>  	h->free_huge_pages--;
>  	h->free_huge_pages_node[nid]--;
> @@ -626,10 +626,11 @@ static void free_huge_page(struct page *page)
>  	page->mapping = NULL;
>  	BUG_ON(page_count(page));
>  	BUG_ON(page_mapcount(page));
> -	INIT_LIST_HEAD(&page->lru);
>  
>  	spin_lock(&hugetlb_lock);
>  	if (h->surplus_huge_pages_node[nid] && huge_page_order(h) < MAX_ORDER) {
> +		/* remove the page from active list */
> +		list_del(&page->lru);
>  		update_and_free_page(h, page);
>  		h->surplus_huge_pages--;
>  		h->surplus_huge_pages_node[nid]--;
> @@ -642,6 +643,7 @@ static void free_huge_page(struct page *page)
>  
>  static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  {
> +	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, free_huge_page);
>  	spin_lock(&hugetlb_lock);
>  	h->nr_huge_pages++;
> @@ -890,6 +892,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>  
>  	spin_lock(&hugetlb_lock);
>  	if (page) {
> +		INIT_LIST_HEAD(&page->lru);
>  		r_nid = page_to_nid(page);
>  		set_compound_page_dtor(page, free_huge_page);
>  		/*
> @@ -994,7 +997,6 @@ retry:
>  	list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
>  		if ((--needed) < 0)
>  			break;
> -		list_del(&page->lru);
>  		/*
>  		 * This page is now managed by the hugetlb allocator and has
>  		 * no users -- drop the buddy allocator's reference.
> @@ -1009,7 +1011,6 @@ free:
>  	/* Free unnecessary surplus pages to the buddy allocator */
>  	if (!list_empty(&surplus_list)) {
>  		list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
> -			list_del(&page->lru);
>  			put_page(page);
>  		}
>  	}
> @@ -1909,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order)
>  	h->free_huge_pages = 0;
>  	for (i = 0; i < MAX_NUMNODES; ++i)
>  		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
> +	INIT_LIST_HEAD(&h->hugepage_activelist);
>  	h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
>  	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 08/15] hugetlb: Make some static variables global
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-14  7:38     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:27, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We will use them later in hugetlb_cgroup.c
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

Just a nit
[...]
> +extern int hugetlb_max_hstate;

Maybe we can mark it __read_mostly as it is modified only during
initialization and then it is just a constant.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 08/15] hugetlb: Make some static variables global
@ 2012-06-14  7:38     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  7:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:27, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We will use them later in hugetlb_cgroup.c
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

Just a nit
[...]
> +extern int hugetlb_max_hstate;

Maybe we can mark it __read_mostly as it is modified only during
initialization and then it is just a constant.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-14  8:24     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  8:24 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:28, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch implements a new controller that allows us to control HugeTLB
> allocations. The extension allows to limit the HugeTLB usage per control
> group and enforces the controller limit during page fault.  Since HugeTLB
> doesn't support page reclaim, enforcing the limit at page fault time implies
> that, the application will get SIGBUS signal if it tries to access HugeTLB
> pages beyond its limit. This requires the application to know beforehand
> how much HugeTLB pages it would require for its use.
> 
> The charge/uncharge calls will be added to HugeTLB code in later patch.
> Support for cgroup removal will be added in later patches.
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Looks good
Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/cgroup_subsys.h  |    6 ++
>  include/linux/hugetlb_cgroup.h |   37 ++++++++++++
>  init/Kconfig                   |   15 +++++
>  mm/Makefile                    |    1 +
>  mm/hugetlb_cgroup.c            |  122 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 181 insertions(+)
>  create mode 100644 include/linux/hugetlb_cgroup.h
>  create mode 100644 mm/hugetlb_cgroup.c
> 
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index 0bd390c..895923a 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -72,3 +72,9 @@ SUBSYS(net_prio)
>  #endif
>  
>  /* */
> +
> +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +SUBSYS(hugetlb)
> +#endif
> +
> +/* */
> diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
> new file mode 100644
> index 0000000..e9944b4
> --- /dev/null
> +++ b/include/linux/hugetlb_cgroup.h
> @@ -0,0 +1,37 @@
> +/*
> + * Copyright IBM Corporation, 2012
> + * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of version 2.1 of the GNU Lesser General Public License
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> + *
> + */
> +
> +#ifndef _LINUX_HUGETLB_CGROUP_H
> +#define _LINUX_HUGETLB_CGROUP_H
> +
> +#include <linux/res_counter.h>
> +
> +struct hugetlb_cgroup;
> +
> +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +static inline bool hugetlb_cgroup_disabled(void)
> +{
> +	if (hugetlb_subsys.disabled)
> +		return true;
> +	return false;
> +}
> +
> +#else
> +static inline bool hugetlb_cgroup_disabled(void)
> +{
> +	return true;
> +}
> +
> +#endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
> +#endif
> diff --git a/init/Kconfig b/init/Kconfig
> index d07dcf9..da05fae 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -751,6 +751,21 @@ config CGROUP_MEM_RES_CTLR_KMEM
>  	  the kmem extension can use it to guarantee that no group of processes
>  	  will ever exhaust kernel resources alone.
>  
> +config CGROUP_HUGETLB_RES_CTLR
> +	bool "HugeTLB Resource Controller for Control Groups"
> +	depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL
> +	default n
> +	help
> +	  Provides a cgroup Resource Controller for HugeTLB pages.
> +	  When you enable this, you can put a per cgroup limit on HugeTLB usage.
> +	  The limit is enforced during page fault. Since HugeTLB doesn't
> +	  support page reclaim, enforcing the limit at page fault time implies
> +	  that, the application will get SIGBUS signal if it tries to access
> +	  HugeTLB pages beyond its limit. This requires the application to know
> +	  beforehand how much HugeTLB pages it would require for its use. The
> +	  control group is tracked in the third page lru pointer. This means
> +	  that we cannot use the controller with huge page less than 3 pages.
> +
>  config CGROUP_PERF
>  	bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
>  	depends on PERF_EVENTS && CGROUPS
> diff --git a/mm/Makefile b/mm/Makefile
> index 2e2fbbe..25e8002 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -49,6 +49,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o
>  obj-$(CONFIG_QUICKLIST) += quicklist.o
>  obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o
>  obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
> +obj-$(CONFIG_CGROUP_HUGETLB_RES_CTLR) += hugetlb_cgroup.o
>  obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
>  obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
>  obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> new file mode 100644
> index 0000000..5a4e71c
> --- /dev/null
> +++ b/mm/hugetlb_cgroup.c
> @@ -0,0 +1,122 @@
> +/*
> + *
> + * Copyright IBM Corporation, 2012
> + * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of version 2.1 of the GNU Lesser General Public License
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> + *
> + */
> +
> +#include <linux/cgroup.h>
> +#include <linux/slab.h>
> +#include <linux/hugetlb.h>
> +#include <linux/hugetlb_cgroup.h>
> +
> +struct hugetlb_cgroup {
> +	struct cgroup_subsys_state css;
> +	/*
> +	 * the counter to account for hugepages from hugetlb.
> +	 */
> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
> +};
> +
> +struct cgroup_subsys hugetlb_subsys __read_mostly;
> +struct hugetlb_cgroup *root_h_cgroup __read_mostly;
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
> +{
> +	if (s)
> +		return container_of(s, struct hugetlb_cgroup, css);
> +	return NULL;
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup)
> +{
> +	return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup,
> +							   hugetlb_subsys_id));
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task)
> +{
> +	return hugetlb_cgroup_from_css(task_subsys_state(task,
> +							 hugetlb_subsys_id));
> +}
> +
> +static inline bool hugetlb_cgroup_is_root(struct hugetlb_cgroup *h_cg)
> +{
> +	return (h_cg == root_h_cgroup);
> +}
> +
> +static inline struct hugetlb_cgroup *parent_hugetlb_cgroup(struct cgroup *cg)
> +{
> +	if (!cg->parent)
> +		return NULL;
> +	return hugetlb_cgroup_from_cgroup(cg->parent);
> +}
> +
> +static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
> +{
> +	int idx;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
> +
> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> +		if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
> +{
> +	int idx;
> +	struct cgroup *parent_cgroup;
> +	struct hugetlb_cgroup *h_cgroup, *parent_h_cgroup;
> +
> +	h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL);
> +	if (!h_cgroup)
> +		return ERR_PTR(-ENOMEM);
> +
> +	parent_cgroup = cgroup->parent;
> +	if (parent_cgroup) {
> +		parent_h_cgroup = hugetlb_cgroup_from_cgroup(parent_cgroup);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&h_cgroup->hugepage[idx],
> +					 &parent_h_cgroup->hugepage[idx]);
> +	} else {
> +		root_h_cgroup = h_cgroup;
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&h_cgroup->hugepage[idx], NULL);
> +	}
> +	return &h_cgroup->css;
> +}
> +
> +static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
> +{
> +	struct hugetlb_cgroup *h_cgroup;
> +
> +	h_cgroup = hugetlb_cgroup_from_cgroup(cgroup);
> +	kfree(h_cgroup);
> +}
> +
> +static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
> +{
> +	/* We will add the cgroup removal support in later patches */
> +	   return -EBUSY;
> +}
> +
> +struct cgroup_subsys hugetlb_subsys = {
> +	.name = "hugetlb",
> +	.create     = hugetlb_cgroup_create,
> +	.pre_destroy = hugetlb_cgroup_pre_destroy,
> +	.destroy    = hugetlb_cgroup_destroy,
> +	.subsys_id  = hugetlb_subsys_id,
> +};
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
@ 2012-06-14  8:24     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  8:24 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:28, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch implements a new controller that allows us to control HugeTLB
> allocations. The extension allows to limit the HugeTLB usage per control
> group and enforces the controller limit during page fault.  Since HugeTLB
> doesn't support page reclaim, enforcing the limit at page fault time implies
> that, the application will get SIGBUS signal if it tries to access HugeTLB
> pages beyond its limit. This requires the application to know beforehand
> how much HugeTLB pages it would require for its use.
> 
> The charge/uncharge calls will be added to HugeTLB code in later patch.
> Support for cgroup removal will be added in later patches.
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Looks good
Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/cgroup_subsys.h  |    6 ++
>  include/linux/hugetlb_cgroup.h |   37 ++++++++++++
>  init/Kconfig                   |   15 +++++
>  mm/Makefile                    |    1 +
>  mm/hugetlb_cgroup.c            |  122 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 181 insertions(+)
>  create mode 100644 include/linux/hugetlb_cgroup.h
>  create mode 100644 mm/hugetlb_cgroup.c
> 
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index 0bd390c..895923a 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -72,3 +72,9 @@ SUBSYS(net_prio)
>  #endif
>  
>  /* */
> +
> +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +SUBSYS(hugetlb)
> +#endif
> +
> +/* */
> diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
> new file mode 100644
> index 0000000..e9944b4
> --- /dev/null
> +++ b/include/linux/hugetlb_cgroup.h
> @@ -0,0 +1,37 @@
> +/*
> + * Copyright IBM Corporation, 2012
> + * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of version 2.1 of the GNU Lesser General Public License
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> + *
> + */
> +
> +#ifndef _LINUX_HUGETLB_CGROUP_H
> +#define _LINUX_HUGETLB_CGROUP_H
> +
> +#include <linux/res_counter.h>
> +
> +struct hugetlb_cgroup;
> +
> +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +static inline bool hugetlb_cgroup_disabled(void)
> +{
> +	if (hugetlb_subsys.disabled)
> +		return true;
> +	return false;
> +}
> +
> +#else
> +static inline bool hugetlb_cgroup_disabled(void)
> +{
> +	return true;
> +}
> +
> +#endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
> +#endif
> diff --git a/init/Kconfig b/init/Kconfig
> index d07dcf9..da05fae 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -751,6 +751,21 @@ config CGROUP_MEM_RES_CTLR_KMEM
>  	  the kmem extension can use it to guarantee that no group of processes
>  	  will ever exhaust kernel resources alone.
>  
> +config CGROUP_HUGETLB_RES_CTLR
> +	bool "HugeTLB Resource Controller for Control Groups"
> +	depends on RESOURCE_COUNTERS && HUGETLB_PAGE && EXPERIMENTAL
> +	default n
> +	help
> +	  Provides a cgroup Resource Controller for HugeTLB pages.
> +	  When you enable this, you can put a per cgroup limit on HugeTLB usage.
> +	  The limit is enforced during page fault. Since HugeTLB doesn't
> +	  support page reclaim, enforcing the limit at page fault time implies
> +	  that, the application will get SIGBUS signal if it tries to access
> +	  HugeTLB pages beyond its limit. This requires the application to know
> +	  beforehand how much HugeTLB pages it would require for its use. The
> +	  control group is tracked in the third page lru pointer. This means
> +	  that we cannot use the controller with huge page less than 3 pages.
> +
>  config CGROUP_PERF
>  	bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
>  	depends on PERF_EVENTS && CGROUPS
> diff --git a/mm/Makefile b/mm/Makefile
> index 2e2fbbe..25e8002 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -49,6 +49,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o
>  obj-$(CONFIG_QUICKLIST) += quicklist.o
>  obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o
>  obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
> +obj-$(CONFIG_CGROUP_HUGETLB_RES_CTLR) += hugetlb_cgroup.o
>  obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
>  obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
>  obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> new file mode 100644
> index 0000000..5a4e71c
> --- /dev/null
> +++ b/mm/hugetlb_cgroup.c
> @@ -0,0 +1,122 @@
> +/*
> + *
> + * Copyright IBM Corporation, 2012
> + * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of version 2.1 of the GNU Lesser General Public License
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> + *
> + */
> +
> +#include <linux/cgroup.h>
> +#include <linux/slab.h>
> +#include <linux/hugetlb.h>
> +#include <linux/hugetlb_cgroup.h>
> +
> +struct hugetlb_cgroup {
> +	struct cgroup_subsys_state css;
> +	/*
> +	 * the counter to account for hugepages from hugetlb.
> +	 */
> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
> +};
> +
> +struct cgroup_subsys hugetlb_subsys __read_mostly;
> +struct hugetlb_cgroup *root_h_cgroup __read_mostly;
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
> +{
> +	if (s)
> +		return container_of(s, struct hugetlb_cgroup, css);
> +	return NULL;
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup)
> +{
> +	return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup,
> +							   hugetlb_subsys_id));
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task)
> +{
> +	return hugetlb_cgroup_from_css(task_subsys_state(task,
> +							 hugetlb_subsys_id));
> +}
> +
> +static inline bool hugetlb_cgroup_is_root(struct hugetlb_cgroup *h_cg)
> +{
> +	return (h_cg == root_h_cgroup);
> +}
> +
> +static inline struct hugetlb_cgroup *parent_hugetlb_cgroup(struct cgroup *cg)
> +{
> +	if (!cg->parent)
> +		return NULL;
> +	return hugetlb_cgroup_from_cgroup(cg->parent);
> +}
> +
> +static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
> +{
> +	int idx;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
> +
> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> +		if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
> +{
> +	int idx;
> +	struct cgroup *parent_cgroup;
> +	struct hugetlb_cgroup *h_cgroup, *parent_h_cgroup;
> +
> +	h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL);
> +	if (!h_cgroup)
> +		return ERR_PTR(-ENOMEM);
> +
> +	parent_cgroup = cgroup->parent;
> +	if (parent_cgroup) {
> +		parent_h_cgroup = hugetlb_cgroup_from_cgroup(parent_cgroup);
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&h_cgroup->hugepage[idx],
> +					 &parent_h_cgroup->hugepage[idx]);
> +	} else {
> +		root_h_cgroup = h_cgroup;
> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> +			res_counter_init(&h_cgroup->hugepage[idx], NULL);
> +	}
> +	return &h_cgroup->css;
> +}
> +
> +static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
> +{
> +	struct hugetlb_cgroup *h_cgroup;
> +
> +	h_cgroup = hugetlb_cgroup_from_cgroup(cgroup);
> +	kfree(h_cgroup);
> +}
> +
> +static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
> +{
> +	/* We will add the cgroup removal support in later patches */
> +	   return -EBUSY;
> +}
> +
> +struct cgroup_subsys hugetlb_subsys = {
> +	.name = "hugetlb",
> +	.create     = hugetlb_cgroup_create,
> +	.pre_destroy = hugetlb_cgroup_pre_destroy,
> +	.destroy    = hugetlb_cgroup_destroy,
> +	.subsys_id  = hugetlb_subsys_id,
> +};
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
  2012-06-13 11:34     ` Aneesh Kumar K.V
  (?)
@ 2012-06-14  8:44       ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  8:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 17:04:30, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
> the usage to hugetlb cgroup to only hugepages with 3 or more
> normal pages. I guess that is an acceptable limitation.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

I would be happier if you explicitely mentioned that both
hugetlb_cgroup_from_page and set_hugetlb_cgroup need hugetlb_lock held,
but

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/hugetlb_cgroup.h |   37 +++++++++++++++++++++++++++++++++++++
>  mm/hugetlb.c                   |    4 ++++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
> index e9944b4..2e4cb6b 100644
> --- a/include/linux/hugetlb_cgroup.h
> +++ b/include/linux/hugetlb_cgroup.h
> @@ -18,8 +18,34 @@
>  #include <linux/res_counter.h>
>  
>  struct hugetlb_cgroup;
> +/*
> + * Minimum page order trackable by hugetlb cgroup.
> + * At least 3 pages are necessary for all the tracking information.
> + */
> +#define HUGETLB_CGROUP_MIN_ORDER	2
>  
>  #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +
> +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
> +{
> +	VM_BUG_ON(!PageHuge(page));
> +
> +	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
> +		return NULL;
> +	return (struct hugetlb_cgroup *)page[2].lru.next;
> +}
> +
> +static inline
> +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
> +{
> +	VM_BUG_ON(!PageHuge(page));
> +
> +	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
> +		return -1;
> +	page[2].lru.next = (void *)h_cg;
> +	return 0;
> +}
> +
>  static inline bool hugetlb_cgroup_disabled(void)
>  {
>  	if (hugetlb_subsys.disabled)
> @@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void)
>  }
>  
>  #else
> +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
> +{
> +	return NULL;
> +}
> +
> +static inline
> +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
> +{
> +	return 0;
> +}
> +
>  static inline bool hugetlb_cgroup_disabled(void)
>  {
>  	return true;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e899a2d..6a449c5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -28,6 +28,7 @@
>  
>  #include <linux/io.h>
>  #include <linux/hugetlb.h>
> +#include <linux/hugetlb_cgroup.h>
>  #include <linux/node.h>
>  #include "internal.h"
>  
> @@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
>  				1 << PG_active | 1 << PG_reserved |
>  				1 << PG_private | 1 << PG_writeback);
>  	}
> +	VM_BUG_ON(hugetlb_cgroup_from_page(page));
>  	set_compound_page_dtor(page, NULL);
>  	set_page_refcounted(page);
>  	arch_release_hugepage(page);
> @@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, free_huge_page);
>  	spin_lock(&hugetlb_lock);
> +	set_hugetlb_cgroup(page, NULL);
>  	h->nr_huge_pages++;
>  	h->nr_huge_pages_node[nid]++;
>  	spin_unlock(&hugetlb_lock);
> @@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>  		INIT_LIST_HEAD(&page->lru);
>  		r_nid = page_to_nid(page);
>  		set_compound_page_dtor(page, free_huge_page);
> +		set_hugetlb_cgroup(page, NULL);
>  		/*
>  		 * We incremented the global counters already
>  		 */
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
@ 2012-06-14  8:44       ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  8:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 17:04:30, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
> the usage to hugetlb cgroup to only hugepages with 3 or more
> normal pages. I guess that is an acceptable limitation.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

I would be happier if you explicitely mentioned that both
hugetlb_cgroup_from_page and set_hugetlb_cgroup need hugetlb_lock held,
but

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/hugetlb_cgroup.h |   37 +++++++++++++++++++++++++++++++++++++
>  mm/hugetlb.c                   |    4 ++++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
> index e9944b4..2e4cb6b 100644
> --- a/include/linux/hugetlb_cgroup.h
> +++ b/include/linux/hugetlb_cgroup.h
> @@ -18,8 +18,34 @@
>  #include <linux/res_counter.h>
>  
>  struct hugetlb_cgroup;
> +/*
> + * Minimum page order trackable by hugetlb cgroup.
> + * At least 3 pages are necessary for all the tracking information.
> + */
> +#define HUGETLB_CGROUP_MIN_ORDER	2
>  
>  #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +
> +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
> +{
> +	VM_BUG_ON(!PageHuge(page));
> +
> +	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
> +		return NULL;
> +	return (struct hugetlb_cgroup *)page[2].lru.next;
> +}
> +
> +static inline
> +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
> +{
> +	VM_BUG_ON(!PageHuge(page));
> +
> +	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
> +		return -1;
> +	page[2].lru.next = (void *)h_cg;
> +	return 0;
> +}
> +
>  static inline bool hugetlb_cgroup_disabled(void)
>  {
>  	if (hugetlb_subsys.disabled)
> @@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void)
>  }
>  
>  #else
> +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
> +{
> +	return NULL;
> +}
> +
> +static inline
> +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
> +{
> +	return 0;
> +}
> +
>  static inline bool hugetlb_cgroup_disabled(void)
>  {
>  	return true;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e899a2d..6a449c5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -28,6 +28,7 @@
>  
>  #include <linux/io.h>
>  #include <linux/hugetlb.h>
> +#include <linux/hugetlb_cgroup.h>
>  #include <linux/node.h>
>  #include "internal.h"
>  
> @@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
>  				1 << PG_active | 1 << PG_reserved |
>  				1 << PG_private | 1 << PG_writeback);
>  	}
> +	VM_BUG_ON(hugetlb_cgroup_from_page(page));
>  	set_compound_page_dtor(page, NULL);
>  	set_page_refcounted(page);
>  	arch_release_hugepage(page);
> @@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, free_huge_page);
>  	spin_lock(&hugetlb_lock);
> +	set_hugetlb_cgroup(page, NULL);
>  	h->nr_huge_pages++;
>  	h->nr_huge_pages_node[nid]++;
>  	spin_unlock(&hugetlb_lock);
> @@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>  		INIT_LIST_HEAD(&page->lru);
>  		r_nid = page_to_nid(page);
>  		set_compound_page_dtor(page, free_huge_page);
> +		set_hugetlb_cgroup(page, NULL);
>  		/*
>  		 * We incremented the global counters already
>  		 */
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 [updated] 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru
@ 2012-06-14  8:44       ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  8:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed 13-06-12 17:04:30, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> Add the hugetlb cgroup pointer to 3rd page lru.next. This limit
> the usage to hugetlb cgroup to only hugepages with 3 or more
> normal pages. I guess that is an acceptable limitation.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

I would be happier if you explicitely mentioned that both
hugetlb_cgroup_from_page and set_hugetlb_cgroup need hugetlb_lock held,
but

Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

> ---
>  include/linux/hugetlb_cgroup.h |   37 +++++++++++++++++++++++++++++++++++++
>  mm/hugetlb.c                   |    4 ++++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
> index e9944b4..2e4cb6b 100644
> --- a/include/linux/hugetlb_cgroup.h
> +++ b/include/linux/hugetlb_cgroup.h
> @@ -18,8 +18,34 @@
>  #include <linux/res_counter.h>
>  
>  struct hugetlb_cgroup;
> +/*
> + * Minimum page order trackable by hugetlb cgroup.
> + * At least 3 pages are necessary for all the tracking information.
> + */
> +#define HUGETLB_CGROUP_MIN_ORDER	2
>  
>  #ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +
> +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
> +{
> +	VM_BUG_ON(!PageHuge(page));
> +
> +	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
> +		return NULL;
> +	return (struct hugetlb_cgroup *)page[2].lru.next;
> +}
> +
> +static inline
> +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
> +{
> +	VM_BUG_ON(!PageHuge(page));
> +
> +	if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER)
> +		return -1;
> +	page[2].lru.next = (void *)h_cg;
> +	return 0;
> +}
> +
>  static inline bool hugetlb_cgroup_disabled(void)
>  {
>  	if (hugetlb_subsys.disabled)
> @@ -28,6 +54,17 @@ static inline bool hugetlb_cgroup_disabled(void)
>  }
>  
>  #else
> +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
> +{
> +	return NULL;
> +}
> +
> +static inline
> +int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg)
> +{
> +	return 0;
> +}
> +
>  static inline bool hugetlb_cgroup_disabled(void)
>  {
>  	return true;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e899a2d..6a449c5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -28,6 +28,7 @@
>  
>  #include <linux/io.h>
>  #include <linux/hugetlb.h>
> +#include <linux/hugetlb_cgroup.h>
>  #include <linux/node.h>
>  #include "internal.h"
>  
> @@ -591,6 +592,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
>  				1 << PG_active | 1 << PG_reserved |
>  				1 << PG_private | 1 << PG_writeback);
>  	}
> +	VM_BUG_ON(hugetlb_cgroup_from_page(page));
>  	set_compound_page_dtor(page, NULL);
>  	set_page_refcounted(page);
>  	arch_release_hugepage(page);
> @@ -643,6 +645,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, free_huge_page);
>  	spin_lock(&hugetlb_lock);
> +	set_hugetlb_cgroup(page, NULL);
>  	h->nr_huge_pages++;
>  	h->nr_huge_pages_node[nid]++;
>  	spin_unlock(&hugetlb_lock);
> @@ -892,6 +895,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>  		INIT_LIST_HEAD(&page->lru);
>  		r_nid = page_to_nid(page);
>  		set_compound_page_dtor(page, free_huge_page);
> +		set_hugetlb_cgroup(page, NULL);
>  		/*
>  		 * We incremented the global counters already
>  		 */
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-14  8:54     ` Li Zefan
  -1 siblings, 0 replies; 122+ messages in thread
From: Li Zefan @ 2012-06-14  8:54 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm,
	hannes, linux-kernel, cgroups

> +static inline

> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
> +{
> +	if (s)


Neither cgroup_subsys_state() or task_subsys_state() will ever return NULL,
so here 's' won't be NULL.

> +		return container_of(s, struct hugetlb_cgroup, css);
> +	return NULL;
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup)
> +{
> +	return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup,
> +							   hugetlb_subsys_id));
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task)
> +{
> +	return hugetlb_cgroup_from_css(task_subsys_state(task,
> +							 hugetlb_subsys_id));
> +}


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
@ 2012-06-14  8:54     ` Li Zefan
  0 siblings, 0 replies; 122+ messages in thread
From: Li Zefan @ 2012-06-14  8:54 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm,
	hannes, linux-kernel, cgroups

> +static inline

> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
> +{
> +	if (s)


Neither cgroup_subsys_state() or task_subsys_state() will ever return NULL,
so here 's' won't be NULL.

> +		return container_of(s, struct hugetlb_cgroup, css);
> +	return NULL;
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_cgroup(struct cgroup *cgroup)
> +{
> +	return hugetlb_cgroup_from_css(cgroup_subsys_state(cgroup,
> +							   hugetlb_subsys_id));
> +}
> +
> +static inline
> +struct hugetlb_cgroup *hugetlb_cgroup_from_task(struct task_struct *task)
> +{
> +	return hugetlb_cgroup_from_css(task_subsys_state(task,
> +							 hugetlb_subsys_id));
> +}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-14  8:58     ` Li Zefan
  -1 siblings, 0 replies; 122+ messages in thread
From: Li Zefan @ 2012-06-14  8:58 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm,
	hannes, linux-kernel, cgroups

> +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,

> +				 struct hugetlb_cgroup **ptr)
> +{
> +	int ret = 0;
> +	struct res_counter *fail_res;
> +	struct hugetlb_cgroup *h_cg = NULL;
> +	unsigned long csize = nr_pages * PAGE_SIZE;
> +
> +	if (hugetlb_cgroup_disabled())
> +		goto done;
> +	/*
> +	 * We don't charge any cgroup if the compound page have less
> +	 * than 3 pages.
> +	 */
> +	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
> +		goto done;
> +again:
> +	rcu_read_lock();
> +	h_cg = hugetlb_cgroup_from_task(current);
> +	if (!h_cg)


In no circumstances should h_cg be NULL.

> +		h_cg = root_h_cgroup;
> +
> +	if (!css_tryget(&h_cg->css)) {
> +		rcu_read_unlock();
> +		goto again;
> +	}
> +	rcu_read_unlock();
> +
> +	ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res);
> +	css_put(&h_cg->css);
> +done:
> +	*ptr = h_cg;
> +	return ret;
> +}



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
@ 2012-06-14  8:58     ` Li Zefan
  0 siblings, 0 replies; 122+ messages in thread
From: Li Zefan @ 2012-06-14  8:58 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm,
	hannes, linux-kernel, cgroups

> +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,

> +				 struct hugetlb_cgroup **ptr)
> +{
> +	int ret = 0;
> +	struct res_counter *fail_res;
> +	struct hugetlb_cgroup *h_cg = NULL;
> +	unsigned long csize = nr_pages * PAGE_SIZE;
> +
> +	if (hugetlb_cgroup_disabled())
> +		goto done;
> +	/*
> +	 * We don't charge any cgroup if the compound page have less
> +	 * than 3 pages.
> +	 */
> +	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
> +		goto done;
> +again:
> +	rcu_read_lock();
> +	h_cg = hugetlb_cgroup_from_task(current);
> +	if (!h_cg)


In no circumstances should h_cg be NULL.

> +		h_cg = root_h_cgroup;
> +
> +	if (!css_tryget(&h_cg->css)) {
> +		rcu_read_unlock();
> +		goto again;
> +	}
> +	rcu_read_unlock();
> +
> +	ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res);
> +	css_put(&h_cg->css);
> +done:
> +	*ptr = h_cg;
> +	return ret;
> +}


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-14  9:25     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  9:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:30, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patchset add the charge and uncharge routines for hugetlb cgroup.
> We do cgroup charging in page alloc and uncharge in compound page
> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

One minor comment
[...]
> +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
> +				  struct hugetlb_cgroup *h_cg,
> +				  struct page *page)
> +{
> +	if (hugetlb_cgroup_disabled() || !h_cg)
> +		return;
> +
> +	spin_lock(&hugetlb_lock);
> +	set_hugetlb_cgroup(page, h_cg);
> +	spin_unlock(&hugetlb_lock);
> +	return;
> +}

I guess we can remove the lock here because nobody can see the page yet,
right?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
@ 2012-06-14  9:25     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  9:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:30, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patchset add the charge and uncharge routines for hugetlb cgroup.
> We do cgroup charging in page alloc and uncharge in compound page
> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

One minor comment
[...]
> +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
> +				  struct hugetlb_cgroup *h_cg,
> +				  struct page *page)
> +{
> +	if (hugetlb_cgroup_disabled() || !h_cg)
> +		return;
> +
> +	spin_lock(&hugetlb_lock);
> +	set_hugetlb_cgroup(page, h_cg);
> +	spin_unlock(&hugetlb_lock);
> +	return;
> +}

I guess we can remove the lock here because nobody can see the page yet,
right?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-14  9:31     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  9:31 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:31, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch add support for cgroup removal. If we don't have parent
> cgroup, the charges are moved to root cgroup.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/hugetlb_cgroup.c |   70 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 68 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> index 0f2f6ac..a3a68a4 100644
> --- a/mm/hugetlb_cgroup.c
> +++ b/mm/hugetlb_cgroup.c
> @@ -107,10 +107,76 @@ static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
>  	kfree(h_cgroup);
>  }
>  
> +
> +/*
> + * Should be called with hugetlb_lock held.
> + * Since we are holding hugetlb_lock, pages cannot get moved from
> + * active list or uncharged from the cgroup, So no need to get
> + * page reference and test for page active here. This function
> + * cannot fail.
> + */
> +static void hugetlb_cgroup_move_parent(int idx, struct cgroup *cgroup,
> +				       struct page *page)
> +{
> +	int csize;
> +	struct res_counter *counter;
> +	struct res_counter *fail_res;
> +	struct hugetlb_cgroup *page_hcg;
> +	struct hugetlb_cgroup *h_cg   = hugetlb_cgroup_from_cgroup(cgroup);
> +	struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(cgroup);
> +
> +	page_hcg = hugetlb_cgroup_from_page(page);
> +	/*
> +	 * We can have pages in active list without any cgroup
> +	 * ie, hugepage with less than 3 pages. We can safely
> +	 * ignore those pages.
> +	 */
> +	if (!page_hcg || page_hcg != h_cg)
> +		goto out;
> +
> +	csize = PAGE_SIZE << compound_order(page);
> +	if (!parent) {
> +		parent = root_h_cgroup;
> +		/* root has no limit */
> +		res_counter_charge_nofail(&parent->hugepage[idx],
> +					  csize, &fail_res);
> +	}
> +	counter = &h_cg->hugepage[idx];
> +	res_counter_uncharge_until(counter, counter->parent, csize);
> +
> +	set_hugetlb_cgroup(page, parent);
> +out:
> +	return;
> +}
> +
> +/*
> + * Force the hugetlb cgroup to empty the hugetlb resources by moving them to
> + * the parent cgroup.
> + */
>  static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
>  {
> -	/* We will add the cgroup removal support in later patches */
> -	   return -EBUSY;
> +	struct hstate *h;
> +	struct page *page;
> +	int ret = 0, idx = 0;
> +
> +	do {
> +		if (cgroup_task_count(cgroup) ||
> +		    !list_empty(&cgroup->children)) {
> +			ret = -EBUSY;
> +			goto out;
> +		}
> +		for_each_hstate(h) {
> +			spin_lock(&hugetlb_lock);
> +			list_for_each_entry(page, &h->hugepage_activelist, lru)
> +				hugetlb_cgroup_move_parent(idx, cgroup, page);
> +
> +			spin_unlock(&hugetlb_lock);
> +			idx++;
> +		}
> +		cond_resched();
> +	} while (hugetlb_cgroup_have_usage(cgroup));
> +out:
> +	return ret;
>  }
>  
>  int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal
@ 2012-06-14  9:31     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  9:31 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:31, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> This patch add support for cgroup removal. If we don't have parent
> cgroup, the charges are moved to root cgroup.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/hugetlb_cgroup.c |   70 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 68 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> index 0f2f6ac..a3a68a4 100644
> --- a/mm/hugetlb_cgroup.c
> +++ b/mm/hugetlb_cgroup.c
> @@ -107,10 +107,76 @@ static void hugetlb_cgroup_destroy(struct cgroup *cgroup)
>  	kfree(h_cgroup);
>  }
>  
> +
> +/*
> + * Should be called with hugetlb_lock held.
> + * Since we are holding hugetlb_lock, pages cannot get moved from
> + * active list or uncharged from the cgroup, So no need to get
> + * page reference and test for page active here. This function
> + * cannot fail.
> + */
> +static void hugetlb_cgroup_move_parent(int idx, struct cgroup *cgroup,
> +				       struct page *page)
> +{
> +	int csize;
> +	struct res_counter *counter;
> +	struct res_counter *fail_res;
> +	struct hugetlb_cgroup *page_hcg;
> +	struct hugetlb_cgroup *h_cg   = hugetlb_cgroup_from_cgroup(cgroup);
> +	struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(cgroup);
> +
> +	page_hcg = hugetlb_cgroup_from_page(page);
> +	/*
> +	 * We can have pages in active list without any cgroup
> +	 * ie, hugepage with less than 3 pages. We can safely
> +	 * ignore those pages.
> +	 */
> +	if (!page_hcg || page_hcg != h_cg)
> +		goto out;
> +
> +	csize = PAGE_SIZE << compound_order(page);
> +	if (!parent) {
> +		parent = root_h_cgroup;
> +		/* root has no limit */
> +		res_counter_charge_nofail(&parent->hugepage[idx],
> +					  csize, &fail_res);
> +	}
> +	counter = &h_cg->hugepage[idx];
> +	res_counter_uncharge_until(counter, counter->parent, csize);
> +
> +	set_hugetlb_cgroup(page, parent);
> +out:
> +	return;
> +}
> +
> +/*
> + * Force the hugetlb cgroup to empty the hugetlb resources by moving them to
> + * the parent cgroup.
> + */
>  static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
>  {
> -	/* We will add the cgroup removal support in later patches */
> -	   return -EBUSY;
> +	struct hstate *h;
> +	struct page *page;
> +	int ret = 0, idx = 0;
> +
> +	do {
> +		if (cgroup_task_count(cgroup) ||
> +		    !list_empty(&cgroup->children)) {
> +			ret = -EBUSY;
> +			goto out;
> +		}
> +		for_each_hstate(h) {
> +			spin_lock(&hugetlb_lock);
> +			list_for_each_entry(page, &h->hugepage_activelist, lru)
> +				hugetlb_cgroup_move_parent(idx, cgroup, page);
> +
> +			spin_unlock(&hugetlb_lock);
> +			idx++;
> +		}
> +		cond_resched();
> +	} while (hugetlb_cgroup_have_usage(cgroup));
> +out:
> +	return ret;
>  }
>  
>  int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files
  2012-06-13 10:27   ` Aneesh Kumar K.V
@ 2012-06-14  9:36     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  9:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:32, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Add the control files for hugetlb controller
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/hugetlb.h        |    5 ++
>  include/linux/hugetlb_cgroup.h |    6 ++
>  mm/hugetlb.c                   |    8 +++
>  mm/hugetlb_cgroup.c            |  129 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 148 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 4aca057..9650bb1 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -4,6 +4,7 @@
>  #include <linux/mm_types.h>
>  #include <linux/fs.h>
>  #include <linux/hugetlb_inline.h>
> +#include <linux/cgroup.h>
>  
>  struct ctl_table;
>  struct user_struct;
> @@ -221,6 +222,10 @@ struct hstate {
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
>  	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
> +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +	/* cgroup control files */
> +	struct cftype cgroup_files[5];
> +#endif
>  	char name[HSTATE_NAME_LEN];
>  };
>  
> diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
> index e05871c..bd8bc98 100644
> --- a/include/linux/hugetlb_cgroup.h
> +++ b/include/linux/hugetlb_cgroup.h
> @@ -62,6 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
>  					 struct page *page);
>  extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
>  					   struct hugetlb_cgroup *h_cg);
> +extern int hugetlb_cgroup_file_init(int idx) __init;
>  
>  #else
>  static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
> @@ -108,5 +109,10 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
>  	return;
>  }
>  
> +static inline int __init hugetlb_cgroup_file_init(int idx)
> +{
> +	return 0;
> +}
> +
>  #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
>  #endif
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 59720b1..a5a30bf 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -30,6 +30,7 @@
>  #include <linux/hugetlb.h>
>  #include <linux/hugetlb_cgroup.h>
>  #include <linux/node.h>
> +#include <linux/hugetlb_cgroup.h>
>  #include "internal.h"
>  
>  const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
> @@ -1930,6 +1931,13 @@ void __init hugetlb_add_hstate(unsigned order)
>  	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
>  					huge_page_size(h)/1024);
> +	/*
> +	 * Add cgroup control files only if the huge page consists
> +	 * of more than two normal pages. This is because we use
> +	 * page[2].lru.next for storing cgoup details.
> +	 */
> +	if (order >= HUGETLB_CGROUP_MIN_ORDER)
> +		hugetlb_cgroup_file_init(hugetlb_max_hstate - 1);
>  
>  	parsed_hstate = h;
>  }
> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> index a3a68a4..64e93e0 100644
> --- a/mm/hugetlb_cgroup.c
> +++ b/mm/hugetlb_cgroup.c
> @@ -26,6 +26,10 @@ struct hugetlb_cgroup {
>  	struct res_counter hugepage[HUGE_MAX_HSTATE];
>  };
>  
> +#define MEMFILE_PRIVATE(x, val)	(((x) << 16) | (val))
> +#define MEMFILE_IDX(val)	(((val) >> 16) & 0xffff)
> +#define MEMFILE_ATTR(val)	((val) & 0xffff)
> +
>  struct cgroup_subsys hugetlb_subsys __read_mostly;
>  struct hugetlb_cgroup *root_h_cgroup __read_mostly;
>  
> @@ -259,6 +263,131 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
>  	return;
>  }
>  
> +static ssize_t hugetlb_cgroup_read(struct cgroup *cgroup, struct cftype *cft,
> +				   struct file *file, char __user *buf,
> +				   size_t nbytes, loff_t *ppos)
> +{
> +	u64 val;
> +	char str[64];
> +	int idx, name, len;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
> +
> +	idx = MEMFILE_IDX(cft->private);
> +	name = MEMFILE_ATTR(cft->private);
> +
> +	val = res_counter_read_u64(&h_cg->hugepage[idx], name);
> +	len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val);
> +	return simple_read_from_buffer(buf, nbytes, ppos, str, len);
> +}
> +
> +static int hugetlb_cgroup_write(struct cgroup *cgroup, struct cftype *cft,
> +				const char *buffer)
> +{
> +	int idx, name, ret;
> +	unsigned long long val;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
> +
> +	idx = MEMFILE_IDX(cft->private);
> +	name = MEMFILE_ATTR(cft->private);
> +
> +	switch (name) {
> +	case RES_LIMIT:
> +		if (hugetlb_cgroup_is_root(h_cg)) {
> +			/* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
> +		/* This function does all necessary parse...reuse it */
> +		ret = res_counter_memparse_write_strategy(buffer, &val);
> +		if (ret)
> +			break;
> +		ret = res_counter_set_limit(&h_cg->hugepage[idx], val);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +static int hugetlb_cgroup_reset(struct cgroup *cgroup, unsigned int event)
> +{
> +	int idx, name, ret = 0;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
> +
> +	idx = MEMFILE_IDX(event);
> +	name = MEMFILE_ATTR(event);
> +
> +	switch (name) {
> +	case RES_MAX_USAGE:
> +		res_counter_reset_max(&h_cg->hugepage[idx]);
> +		break;
> +	case RES_FAILCNT:
> +		res_counter_reset_failcnt(&h_cg->hugepage[idx]);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +static char *mem_fmt(char *buf, int size, unsigned long hsize)
> +{
> +	if (hsize >= (1UL << 30))
> +		snprintf(buf, size, "%luGB", hsize >> 30);
> +	else if (hsize >= (1UL << 20))
> +		snprintf(buf, size, "%luMB", hsize >> 20);
> +	else
> +		snprintf(buf, size, "%luKB", hsize >> 10);
> +	return buf;
> +}
> +
> +int __init hugetlb_cgroup_file_init(int idx)
> +{
> +	char buf[32];
> +	struct cftype *cft;
> +	struct hstate *h = &hstates[idx];
> +
> +	/* format the size */
> +	mem_fmt(buf, 32, huge_page_size(h));
> +
> +	/* Add the limit file */
> +	cft = &h->cgroup_files[0];
> +	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf);
> +	cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT);
> +	cft->read = hugetlb_cgroup_read;
> +	cft->write_string = hugetlb_cgroup_write;
> +
> +	/* Add the usage file */
> +	cft = &h->cgroup_files[1];
> +	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf);
> +	cft->private = MEMFILE_PRIVATE(idx, RES_USAGE);
> +	cft->read = hugetlb_cgroup_read;
> +
> +	/* Add the MAX usage file */
> +	cft = &h->cgroup_files[2];
> +	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf);
> +	cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE);
> +	cft->trigger = hugetlb_cgroup_reset;
> +	cft->read = hugetlb_cgroup_read;
> +
> +	/* Add the failcntfile */
> +	cft = &h->cgroup_files[3];
> +	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf);
> +	cft->private  = MEMFILE_PRIVATE(idx, RES_FAILCNT);
> +	cft->trigger  = hugetlb_cgroup_reset;
> +	cft->read = hugetlb_cgroup_read;
> +
> +	/* NULL terminate the last cft */
> +	cft = &h->cgroup_files[4];
> +	memset(cft, 0, sizeof(*cft));
> +
> +	WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files));
> +
> +	return 0;
> +}
> +
>  struct cgroup_subsys hugetlb_subsys = {
>  	.name = "hugetlb",
>  	.create     = hugetlb_cgroup_create,
> -- 
> 1.7.10
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files
@ 2012-06-14  9:36     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14  9:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:32, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Add the control files for hugetlb controller
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/hugetlb.h        |    5 ++
>  include/linux/hugetlb_cgroup.h |    6 ++
>  mm/hugetlb.c                   |    8 +++
>  mm/hugetlb_cgroup.c            |  129 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 148 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 4aca057..9650bb1 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -4,6 +4,7 @@
>  #include <linux/mm_types.h>
>  #include <linux/fs.h>
>  #include <linux/hugetlb_inline.h>
> +#include <linux/cgroup.h>
>  
>  struct ctl_table;
>  struct user_struct;
> @@ -221,6 +222,10 @@ struct hstate {
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
>  	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
> +#ifdef CONFIG_CGROUP_HUGETLB_RES_CTLR
> +	/* cgroup control files */
> +	struct cftype cgroup_files[5];
> +#endif
>  	char name[HSTATE_NAME_LEN];
>  };
>  
> diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
> index e05871c..bd8bc98 100644
> --- a/include/linux/hugetlb_cgroup.h
> +++ b/include/linux/hugetlb_cgroup.h
> @@ -62,6 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
>  					 struct page *page);
>  extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
>  					   struct hugetlb_cgroup *h_cg);
> +extern int hugetlb_cgroup_file_init(int idx) __init;
>  
>  #else
>  static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page)
> @@ -108,5 +109,10 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
>  	return;
>  }
>  
> +static inline int __init hugetlb_cgroup_file_init(int idx)
> +{
> +	return 0;
> +}
> +
>  #endif  /* CONFIG_MEM_RES_CTLR_HUGETLB */
>  #endif
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 59720b1..a5a30bf 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -30,6 +30,7 @@
>  #include <linux/hugetlb.h>
>  #include <linux/hugetlb_cgroup.h>
>  #include <linux/node.h>
> +#include <linux/hugetlb_cgroup.h>
>  #include "internal.h"
>  
>  const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
> @@ -1930,6 +1931,13 @@ void __init hugetlb_add_hstate(unsigned order)
>  	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
>  					huge_page_size(h)/1024);
> +	/*
> +	 * Add cgroup control files only if the huge page consists
> +	 * of more than two normal pages. This is because we use
> +	 * page[2].lru.next for storing cgoup details.
> +	 */
> +	if (order >= HUGETLB_CGROUP_MIN_ORDER)
> +		hugetlb_cgroup_file_init(hugetlb_max_hstate - 1);
>  
>  	parsed_hstate = h;
>  }
> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> index a3a68a4..64e93e0 100644
> --- a/mm/hugetlb_cgroup.c
> +++ b/mm/hugetlb_cgroup.c
> @@ -26,6 +26,10 @@ struct hugetlb_cgroup {
>  	struct res_counter hugepage[HUGE_MAX_HSTATE];
>  };
>  
> +#define MEMFILE_PRIVATE(x, val)	(((x) << 16) | (val))
> +#define MEMFILE_IDX(val)	(((val) >> 16) & 0xffff)
> +#define MEMFILE_ATTR(val)	((val) & 0xffff)
> +
>  struct cgroup_subsys hugetlb_subsys __read_mostly;
>  struct hugetlb_cgroup *root_h_cgroup __read_mostly;
>  
> @@ -259,6 +263,131 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
>  	return;
>  }
>  
> +static ssize_t hugetlb_cgroup_read(struct cgroup *cgroup, struct cftype *cft,
> +				   struct file *file, char __user *buf,
> +				   size_t nbytes, loff_t *ppos)
> +{
> +	u64 val;
> +	char str[64];
> +	int idx, name, len;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
> +
> +	idx = MEMFILE_IDX(cft->private);
> +	name = MEMFILE_ATTR(cft->private);
> +
> +	val = res_counter_read_u64(&h_cg->hugepage[idx], name);
> +	len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val);
> +	return simple_read_from_buffer(buf, nbytes, ppos, str, len);
> +}
> +
> +static int hugetlb_cgroup_write(struct cgroup *cgroup, struct cftype *cft,
> +				const char *buffer)
> +{
> +	int idx, name, ret;
> +	unsigned long long val;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
> +
> +	idx = MEMFILE_IDX(cft->private);
> +	name = MEMFILE_ATTR(cft->private);
> +
> +	switch (name) {
> +	case RES_LIMIT:
> +		if (hugetlb_cgroup_is_root(h_cg)) {
> +			/* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
> +		/* This function does all necessary parse...reuse it */
> +		ret = res_counter_memparse_write_strategy(buffer, &val);
> +		if (ret)
> +			break;
> +		ret = res_counter_set_limit(&h_cg->hugepage[idx], val);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +static int hugetlb_cgroup_reset(struct cgroup *cgroup, unsigned int event)
> +{
> +	int idx, name, ret = 0;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
> +
> +	idx = MEMFILE_IDX(event);
> +	name = MEMFILE_ATTR(event);
> +
> +	switch (name) {
> +	case RES_MAX_USAGE:
> +		res_counter_reset_max(&h_cg->hugepage[idx]);
> +		break;
> +	case RES_FAILCNT:
> +		res_counter_reset_failcnt(&h_cg->hugepage[idx]);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +static char *mem_fmt(char *buf, int size, unsigned long hsize)
> +{
> +	if (hsize >= (1UL << 30))
> +		snprintf(buf, size, "%luGB", hsize >> 30);
> +	else if (hsize >= (1UL << 20))
> +		snprintf(buf, size, "%luMB", hsize >> 20);
> +	else
> +		snprintf(buf, size, "%luKB", hsize >> 10);
> +	return buf;
> +}
> +
> +int __init hugetlb_cgroup_file_init(int idx)
> +{
> +	char buf[32];
> +	struct cftype *cft;
> +	struct hstate *h = &hstates[idx];
> +
> +	/* format the size */
> +	mem_fmt(buf, 32, huge_page_size(h));
> +
> +	/* Add the limit file */
> +	cft = &h->cgroup_files[0];
> +	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf);
> +	cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT);
> +	cft->read = hugetlb_cgroup_read;
> +	cft->write_string = hugetlb_cgroup_write;
> +
> +	/* Add the usage file */
> +	cft = &h->cgroup_files[1];
> +	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf);
> +	cft->private = MEMFILE_PRIVATE(idx, RES_USAGE);
> +	cft->read = hugetlb_cgroup_read;
> +
> +	/* Add the MAX usage file */
> +	cft = &h->cgroup_files[2];
> +	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf);
> +	cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE);
> +	cft->trigger = hugetlb_cgroup_reset;
> +	cft->read = hugetlb_cgroup_read;
> +
> +	/* Add the failcntfile */
> +	cft = &h->cgroup_files[3];
> +	snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf);
> +	cft->private  = MEMFILE_PRIVATE(idx, RES_FAILCNT);
> +	cft->trigger  = hugetlb_cgroup_reset;
> +	cft->read = hugetlb_cgroup_read;
> +
> +	/* NULL terminate the last cft */
> +	cft = &h->cgroup_files[4];
> +	memset(cft, 0, sizeof(*cft));
> +
> +	WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files));
> +
> +	return 0;
> +}
> +
>  struct cgroup_subsys hugetlb_subsys = {
>  	.name = "hugetlb",
>  	.create     = hugetlb_cgroup_create,
> -- 
> 1.7.10
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
  2012-06-13 10:27   ` Aneesh Kumar K.V
  (?)
@ 2012-06-14 10:04     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14 10:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
> we are holding a hugepage reference, we can be sure that old page won't
> get uncharged till the last put_page().
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

One question below
[...]
> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
> +{
> +	struct hugetlb_cgroup *h_cg;
> +
> +	if (hugetlb_cgroup_disabled())
> +		return;
> +
> +	VM_BUG_ON(!PageHuge(oldhpage));
> +	spin_lock(&hugetlb_lock);
> +	h_cg = hugetlb_cgroup_from_page(oldhpage);
> +	set_hugetlb_cgroup(oldhpage, NULL);
> +	cgroup_exclude_rmdir(&h_cg->css);
> +
> +	/* move the h_cg details to new cgroup */
> +	set_hugetlb_cgroup(newhpage, h_cg);
> +	spin_unlock(&hugetlb_lock);
> +	cgroup_release_and_wakeup_rmdir(&h_cg->css);
> +	return;
> +}
> +

The changelog says that the old page won't get uncharged - which means
that the the cgroup cannot go away (even if we raced with the move
parent, hugetlb_lock makes sure we either see old or new cgroup) so why
do we need to play with css ref. counting?
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
@ 2012-06-14 10:04     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14 10:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
> we are holding a hugepage reference, we can be sure that old page won't
> get uncharged till the last put_page().
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

One question below
[...]
> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
> +{
> +	struct hugetlb_cgroup *h_cg;
> +
> +	if (hugetlb_cgroup_disabled())
> +		return;
> +
> +	VM_BUG_ON(!PageHuge(oldhpage));
> +	spin_lock(&hugetlb_lock);
> +	h_cg = hugetlb_cgroup_from_page(oldhpage);
> +	set_hugetlb_cgroup(oldhpage, NULL);
> +	cgroup_exclude_rmdir(&h_cg->css);
> +
> +	/* move the h_cg details to new cgroup */
> +	set_hugetlb_cgroup(newhpage, h_cg);
> +	spin_unlock(&hugetlb_lock);
> +	cgroup_release_and_wakeup_rmdir(&h_cg->css);
> +	return;
> +}
> +

The changelog says that the old page won't get uncharged - which means
that the the cgroup cannot go away (even if we raced with the move
parent, hugetlb_lock makes sure we either see old or new cgroup) so why
do we need to play with css ref. counting?
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
@ 2012-06-14 10:04     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14 10:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
> we are holding a hugepage reference, we can be sure that old page won't
> get uncharged till the last put_page().
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

One question below
[...]
> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
> +{
> +	struct hugetlb_cgroup *h_cg;
> +
> +	if (hugetlb_cgroup_disabled())
> +		return;
> +
> +	VM_BUG_ON(!PageHuge(oldhpage));
> +	spin_lock(&hugetlb_lock);
> +	h_cg = hugetlb_cgroup_from_page(oldhpage);
> +	set_hugetlb_cgroup(oldhpage, NULL);
> +	cgroup_exclude_rmdir(&h_cg->css);
> +
> +	/* move the h_cg details to new cgroup */
> +	set_hugetlb_cgroup(newhpage, h_cg);
> +	spin_unlock(&hugetlb_lock);
> +	cgroup_release_and_wakeup_rmdir(&h_cg->css);
> +	return;
> +}
> +

The changelog says that the old page won't get uncharged - which means
that the the cgroup cannot go away (even if we raced with the move
parent, hugetlb_lock makes sure we either see old or new cgroup) so why
do we need to play with css ref. counting?
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation
  2012-06-13 10:27   ` Aneesh Kumar K.V
  (?)
@ 2012-06-14 10:07     ` Michal Hocko
  -1 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14 10:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:34, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

Minor nid below
> ---
>  Documentation/cgroups/hugetlb.txt |   45 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>  create mode 100644 Documentation/cgroups/hugetlb.txt
> 
> diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
> new file mode 100644
> index 0000000..a9faaca
> --- /dev/null
> +++ b/Documentation/cgroups/hugetlb.txt
[...]
> +With the above step, the initial or the parent HugeTLB group becomes
> +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
> +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
> +
> +New groups can be created under the parent group /sys/fs/cgroup.
> +
> +# cd /sys/fs/cgroup
> +# mkdir g1
> +# echo $$ > g1/tasks
> +
> +The above steps create a new group g1 and move the current shell
> +process (bash) into it.

This is probably not needed as it is already described in the generic
cgroups description

> +
> +Brief summary of control files
> +
> + hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
> + hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
> + hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
> + hugetlb.<hugepagesize>.failcnt		   # show the number of allocation failure due to HugeTLB limit
> +
> +For a system supporting two hugepage size (16M and 16G) the control
> +files include:
> +
> +hugetlb.16GB.limit_in_bytes
> +hugetlb.16GB.max_usage_in_bytes
> +hugetlb.16GB.usage_in_bytes
> +hugetlb.16GB.failcnt
> +hugetlb.16MB.limit_in_bytes
> +hugetlb.16MB.max_usage_in_bytes
> +hugetlb.16MB.usage_in_bytes
> +hugetlb.16MB.failcnt
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation
@ 2012-06-14 10:07     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14 10:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

On Wed 13-06-12 15:57:34, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

Minor nid below
> ---
>  Documentation/cgroups/hugetlb.txt |   45 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>  create mode 100644 Documentation/cgroups/hugetlb.txt
> 
> diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
> new file mode 100644
> index 0000000..a9faaca
> --- /dev/null
> +++ b/Documentation/cgroups/hugetlb.txt
[...]
> +With the above step, the initial or the parent HugeTLB group becomes
> +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
> +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
> +
> +New groups can be created under the parent group /sys/fs/cgroup.
> +
> +# cd /sys/fs/cgroup
> +# mkdir g1
> +# echo $$ > g1/tasks
> +
> +The above steps create a new group g1 and move the current shell
> +process (bash) into it.

This is probably not needed as it is already described in the generic
cgroups description

> +
> +Brief summary of control files
> +
> + hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
> + hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
> + hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
> + hugetlb.<hugepagesize>.failcnt		   # show the number of allocation failure due to HugeTLB limit
> +
> +For a system supporting two hugepage size (16M and 16G) the control
> +files include:
> +
> +hugetlb.16GB.limit_in_bytes
> +hugetlb.16GB.max_usage_in_bytes
> +hugetlb.16GB.usage_in_bytes
> +hugetlb.16GB.failcnt
> +hugetlb.16MB.limit_in_bytes
> +hugetlb.16MB.max_usage_in_bytes
> +hugetlb.16MB.usage_in_bytes
> +hugetlb.16MB.failcnt
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation
@ 2012-06-14 10:07     ` Michal Hocko
  0 siblings, 0 replies; 122+ messages in thread
From: Michal Hocko @ 2012-06-14 10:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed 13-06-12 15:57:34, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

Minor nid below
> ---
>  Documentation/cgroups/hugetlb.txt |   45 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>  create mode 100644 Documentation/cgroups/hugetlb.txt
> 
> diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
> new file mode 100644
> index 0000000..a9faaca
> --- /dev/null
> +++ b/Documentation/cgroups/hugetlb.txt
[...]
> +With the above step, the initial or the parent HugeTLB group becomes
> +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
> +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
> +
> +New groups can be created under the parent group /sys/fs/cgroup.
> +
> +# cd /sys/fs/cgroup
> +# mkdir g1
> +# echo $$ > g1/tasks
> +
> +The above steps create a new group g1 and move the current shell
> +process (bash) into it.

This is probably not needed as it is already described in the generic
cgroups description

> +
> +Brief summary of control files
> +
> + hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
> + hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
> + hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
> + hugetlb.<hugepagesize>.failcnt		   # show the number of allocation failure due to HugeTLB limit
> +
> +For a system supporting two hugepage size (16M and 16G) the control
> +files include:
> +
> +hugetlb.16GB.limit_in_bytes
> +hugetlb.16GB.max_usage_in_bytes
> +hugetlb.16GB.usage_in_bytes
> +hugetlb.16GB.failcnt
> +hugetlb.16MB.limit_in_bytes
> +hugetlb.16MB.max_usage_in_bytes
> +hugetlb.16MB.usage_in_bytes
> +hugetlb.16MB.failcnt
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
  2012-06-14  8:54     ` Li Zefan
  (?)
@ 2012-06-15  6:20       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15  6:20 UTC (permalink / raw)
  To: Li Zefan
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm,
	hannes, linux-kernel, cgroups

Li Zefan <lizefan@huawei.com> writes:

>> +static inline
>
>> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
>> +{
>> +	if (s)
>
>
> Neither cgroup_subsys_state() or task_subsys_state() will ever return NULL,
> so here 's' won't be NULL.
>

That is a change that didn't get updated when i dropped page_cgroup
changes. I had a series that tracked in page_cgroup
cgroup_subsys_state. I will send an fix on top.

Thanks for the review.
-aneesh


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
@ 2012-06-15  6:20       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15  6:20 UTC (permalink / raw)
  To: Li Zefan
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, akpm,
	hannes, linux-kernel, cgroups

Li Zefan <lizefan@huawei.com> writes:

>> +static inline
>
>> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
>> +{
>> +	if (s)
>
>
> Neither cgroup_subsys_state() or task_subsys_state() will ever return NULL,
> so here 's' won't be NULL.
>

That is a change that didn't get updated when i dropped page_cgroup
changes. I had a series that tracked in page_cgroup
cgroup_subsys_state. I will send an fix on top.

Thanks for the review.
-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup
@ 2012-06-15  6:20       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15  6:20 UTC (permalink / raw)
  To: Li Zefan
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	mhocko-AlSwsSmVLrQ, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> writes:

>> +static inline
>
>> +struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
>> +{
>> +	if (s)
>
>
> Neither cgroup_subsys_state() or task_subsys_state() will ever return NULL,
> so here 's' won't be NULL.
>

That is a change that didn't get updated when i dropped page_cgroup
changes. I had a series that tracked in page_cgroup
cgroup_subsys_state. I will send an fix on top.

Thanks for the review.
-aneesh

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
  2012-06-14  9:25     ` Michal Hocko
  (?)
@ 2012-06-15 10:06       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15 10:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Wed 13-06-12 15:57:30, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> This patchset add the charge and uncharge routines for hugetlb cgroup.
>> We do cgroup charging in page alloc and uncharge in compound page
>> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
>
> One minor comment
> [...]
>> +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
>> +				  struct hugetlb_cgroup *h_cg,
>> +				  struct page *page)
>> +{
>> +	if (hugetlb_cgroup_disabled() || !h_cg)
>> +		return;
>> +
>> +	spin_lock(&hugetlb_lock);
>> +	set_hugetlb_cgroup(page, h_cg);
>> +	spin_unlock(&hugetlb_lock);
>> +	return;
>> +}
>
> I guess we can remove the lock here because nobody can see the page yet,
> right?
>

We need that to make sure when we remove cgroup we find correct page
hugetlb cgroup values. But i guess we have a bug here. How about the
below ?

NOTE: We also need another patch to update active list during soft
offline. I will send that in reply.

commit e4c3fd3cc0f0faa30ea283cb48ba478a5c0d3e74
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Fri Jun 15 14:42:27 2012 +0530

    hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page to active list.
    
    page's hugetlb cgroup assign and moving to active list should happen with
    hugetlb_lock held. Otherwise when we remove the hugetlb cgroup we would
    iterate the active list and will find page with NULL hugetlb cgroup values.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ee4da3b..b90dfb4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1146,9 +1146,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
-	spin_unlock(&hugetlb_lock);
-
-	if (!page) {
+	if (page) {
+		/* update page cgroup details */
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
+		spin_unlock(&hugetlb_lock);
+	} else {
+		spin_unlock(&hugetlb_lock);
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
 			hugetlb_cgroup_uncharge_cgroup(idx,
@@ -1159,14 +1162,13 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		}
 		spin_lock(&hugetlb_lock);
 		list_move(&page->lru, &h->hugepage_activelist);
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 		spin_unlock(&hugetlb_lock);
 	}
 
 	set_page_private(page, (unsigned long)spool);
 
 	vma_commit_reservation(h, vma, addr);
-	/* update page cgroup details */
-	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 	return page;
 }
 
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 8e7ca0a..d4f3f7b 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -218,6 +218,7 @@ done:
 	return ret;
 }
 
+/* Should be called with hugetlb_lock held */
 void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 				  struct hugetlb_cgroup *h_cg,
 				  struct page *page)
@@ -225,9 +226,7 @@ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 	if (hugetlb_cgroup_disabled() || !h_cg)
 		return;
 
-	spin_lock(&hugetlb_lock);
 	set_hugetlb_cgroup(page, h_cg);
-	spin_unlock(&hugetlb_lock);
 	return;
 }
 


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
@ 2012-06-15 10:06       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15 10:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Wed 13-06-12 15:57:30, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> This patchset add the charge and uncharge routines for hugetlb cgroup.
>> We do cgroup charging in page alloc and uncharge in compound page
>> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
>
> One minor comment
> [...]
>> +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
>> +				  struct hugetlb_cgroup *h_cg,
>> +				  struct page *page)
>> +{
>> +	if (hugetlb_cgroup_disabled() || !h_cg)
>> +		return;
>> +
>> +	spin_lock(&hugetlb_lock);
>> +	set_hugetlb_cgroup(page, h_cg);
>> +	spin_unlock(&hugetlb_lock);
>> +	return;
>> +}
>
> I guess we can remove the lock here because nobody can see the page yet,
> right?
>

We need that to make sure when we remove cgroup we find correct page
hugetlb cgroup values. But i guess we have a bug here. How about the
below ?

NOTE: We also need another patch to update active list during soft
offline. I will send that in reply.

commit e4c3fd3cc0f0faa30ea283cb48ba478a5c0d3e74
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Fri Jun 15 14:42:27 2012 +0530

    hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page to active list.
    
    page's hugetlb cgroup assign and moving to active list should happen with
    hugetlb_lock held. Otherwise when we remove the hugetlb cgroup we would
    iterate the active list and will find page with NULL hugetlb cgroup values.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ee4da3b..b90dfb4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1146,9 +1146,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
-	spin_unlock(&hugetlb_lock);
-
-	if (!page) {
+	if (page) {
+		/* update page cgroup details */
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
+		spin_unlock(&hugetlb_lock);
+	} else {
+		spin_unlock(&hugetlb_lock);
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
 			hugetlb_cgroup_uncharge_cgroup(idx,
@@ -1159,14 +1162,13 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		}
 		spin_lock(&hugetlb_lock);
 		list_move(&page->lru, &h->hugepage_activelist);
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 		spin_unlock(&hugetlb_lock);
 	}
 
 	set_page_private(page, (unsigned long)spool);
 
 	vma_commit_reservation(h, vma, addr);
-	/* update page cgroup details */
-	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 	return page;
 }
 
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 8e7ca0a..d4f3f7b 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -218,6 +218,7 @@ done:
 	return ret;
 }
 
+/* Should be called with hugetlb_lock held */
 void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 				  struct hugetlb_cgroup *h_cg,
 				  struct page *page)
@@ -225,9 +226,7 @@ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 	if (hugetlb_cgroup_disabled() || !h_cg)
 		return;
 
-	spin_lock(&hugetlb_lock);
 	set_hugetlb_cgroup(page, h_cg);
-	spin_unlock(&hugetlb_lock);
 	return;
 }
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
@ 2012-06-15 10:06       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15 10:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:

> On Wed 13-06-12 15:57:30, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>> 
>> This patchset add the charge and uncharge routines for hugetlb cgroup.
>> We do cgroup charging in page alloc and uncharge in compound page
>> destructor. Assigning page's hugetlb cgroup is protected by hugetlb_lock.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>
> Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
>
> One minor comment
> [...]
>> +void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
>> +				  struct hugetlb_cgroup *h_cg,
>> +				  struct page *page)
>> +{
>> +	if (hugetlb_cgroup_disabled() || !h_cg)
>> +		return;
>> +
>> +	spin_lock(&hugetlb_lock);
>> +	set_hugetlb_cgroup(page, h_cg);
>> +	spin_unlock(&hugetlb_lock);
>> +	return;
>> +}
>
> I guess we can remove the lock here because nobody can see the page yet,
> right?
>

We need that to make sure when we remove cgroup we find correct page
hugetlb cgroup values. But i guess we have a bug here. How about the
below ?

NOTE: We also need another patch to update active list during soft
offline. I will send that in reply.

commit e4c3fd3cc0f0faa30ea283cb48ba478a5c0d3e74
Author: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Date:   Fri Jun 15 14:42:27 2012 +0530

    hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page to active list.
    
    page's hugetlb cgroup assign and moving to active list should happen with
    hugetlb_lock held. Otherwise when we remove the hugetlb cgroup we would
    iterate the active list and will find page with NULL hugetlb cgroup values.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ee4da3b..b90dfb4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1146,9 +1146,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
-	spin_unlock(&hugetlb_lock);
-
-	if (!page) {
+	if (page) {
+		/* update page cgroup details */
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
+		spin_unlock(&hugetlb_lock);
+	} else {
+		spin_unlock(&hugetlb_lock);
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
 			hugetlb_cgroup_uncharge_cgroup(idx,
@@ -1159,14 +1162,13 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		}
 		spin_lock(&hugetlb_lock);
 		list_move(&page->lru, &h->hugepage_activelist);
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 		spin_unlock(&hugetlb_lock);
 	}
 
 	set_page_private(page, (unsigned long)spool);
 
 	vma_commit_reservation(h, vma, addr);
-	/* update page cgroup details */
-	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 	return page;
 }
 
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 8e7ca0a..d4f3f7b 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -218,6 +218,7 @@ done:
 	return ret;
 }
 
+/* Should be called with hugetlb_lock held */
 void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 				  struct hugetlb_cgroup *h_cg,
 				  struct page *page)
@@ -225,9 +226,7 @@ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 	if (hugetlb_cgroup_disabled() || !h_cg)
 		return;
 
-	spin_lock(&hugetlb_lock);
 	set_hugetlb_cgroup(page, h_cg);
-	spin_unlock(&hugetlb_lock);
 	return;
 }
 

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH 1/2] hugetlb: Move all the in use pages to active list
  2012-06-15 10:06       ` Aneesh Kumar K.V
  (?)
  (?)
@ 2012-06-15 10:08       ` Aneesh Kumar K.V
  2012-06-15 10:08         ` [PATCH 2/2] hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page " Aneesh Kumar K.V
  -1 siblings, 1 reply; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15 10:08 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, mhocko, akpm; +Cc: Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

When we fail to allocate pages from the reserve pool, hugetlb
do try to allocate huge pages using alloc_buddy_huge_page.
Add these to the active list. We also need to add the huge
page we allocate when we soft offline the oldpage to active
list.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c57740b..ee4da3b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -928,8 +928,10 @@ struct page *alloc_huge_page_node(struct hstate *h, int nid)
 	page = dequeue_huge_page_node(h, nid);
 	spin_unlock(&hugetlb_lock);
 
-	if (!page)
+	if (!page) {
 		page = alloc_buddy_huge_page(h, nid);
+		list_move(&page->lru, &h->hugepage_activelist);
+	}
 
 	return page;
 }
@@ -1155,6 +1157,9 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 			hugepage_subpool_put_pages(spool, chg);
 			return ERR_PTR(-ENOSPC);
 		}
+		spin_lock(&hugetlb_lock);
+		list_move(&page->lru, &h->hugepage_activelist);
+		spin_unlock(&hugetlb_lock);
 	}
 
 	set_page_private(page, (unsigned long)spool);
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* [PATCH 2/2] hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page to active list.
  2012-06-15 10:08       ` [PATCH 1/2] hugetlb: Move all the in use pages to active list Aneesh Kumar K.V
@ 2012-06-15 10:08         ` Aneesh Kumar K.V
  2012-06-16  6:24           ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15 10:08 UTC (permalink / raw)
  To: linux-mm, kamezawa.hiroyu, mhocko, akpm; +Cc: Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

page's hugetlb cgroup assign and moving to active list should happen with
hugetlb_lock held. Otherwise when we remove the hugetlb cgroup we would
iterate the active list and will find page with NULL hugetlb cgroup values.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 mm/hugetlb.c        |   12 +++++++-----
 mm/hugetlb_cgroup.c |    3 +--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ee4da3b..b90dfb4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1146,9 +1146,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
-	spin_unlock(&hugetlb_lock);
-
-	if (!page) {
+	if (page) {
+		/* update page cgroup details */
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
+		spin_unlock(&hugetlb_lock);
+	} else {
+		spin_unlock(&hugetlb_lock);
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
 			hugetlb_cgroup_uncharge_cgroup(idx,
@@ -1159,14 +1162,13 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		}
 		spin_lock(&hugetlb_lock);
 		list_move(&page->lru, &h->hugepage_activelist);
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 		spin_unlock(&hugetlb_lock);
 	}
 
 	set_page_private(page, (unsigned long)spool);
 
 	vma_commit_reservation(h, vma, addr);
-	/* update page cgroup details */
-	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
 	return page;
 }
 
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 8e7ca0a..d4f3f7b 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -218,6 +218,7 @@ done:
 	return ret;
 }
 
+/* Should be called with hugetlb_lock held */
 void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 				  struct hugetlb_cgroup *h_cg,
 				  struct page *page)
@@ -225,9 +226,7 @@ void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 	if (hugetlb_cgroup_disabled() || !h_cg)
 		return;
 
-	spin_lock(&hugetlb_lock);
 	set_hugetlb_cgroup(page, h_cg);
-	spin_unlock(&hugetlb_lock);
 	return;
 }
 
-- 
1.7.10

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
  2012-06-14 10:04     ` Michal Hocko
  (?)
@ 2012-06-15 10:50       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15 10:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
>> we are holding a hugepage reference, we can be sure that old page won't
>> get uncharged till the last put_page().
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
>
> One question below
> [...]
>> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
>> +{
>> +	struct hugetlb_cgroup *h_cg;
>> +
>> +	if (hugetlb_cgroup_disabled())
>> +		return;
>> +
>> +	VM_BUG_ON(!PageHuge(oldhpage));
>> +	spin_lock(&hugetlb_lock);
>> +	h_cg = hugetlb_cgroup_from_page(oldhpage);
>> +	set_hugetlb_cgroup(oldhpage, NULL);
>> +	cgroup_exclude_rmdir(&h_cg->css);
>> +
>> +	/* move the h_cg details to new cgroup */
>> +	set_hugetlb_cgroup(newhpage, h_cg);
>> +	spin_unlock(&hugetlb_lock);
>> +	cgroup_release_and_wakeup_rmdir(&h_cg->css);
>> +	return;
>> +}
>> +
>
> The changelog says that the old page won't get uncharged - which means
> that the the cgroup cannot go away (even if we raced with the move
> parent, hugetlb_lock makes sure we either see old or new cgroup) so why
> do we need to play with css ref. counting?

Ok hugetlb_lock should be sufficient here i guess. I will send a patch
on top to remove the exclude_rmdir and release_and_wakeup_rmdir 

-aneesh


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
@ 2012-06-15 10:50       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15 10:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, akpm, hannes,
	linux-kernel, cgroups

Michal Hocko <mhocko@suse.cz> writes:

> On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
>> we are holding a hugepage reference, we can be sure that old page won't
>> get uncharged till the last put_page().
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
>
> One question below
> [...]
>> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
>> +{
>> +	struct hugetlb_cgroup *h_cg;
>> +
>> +	if (hugetlb_cgroup_disabled())
>> +		return;
>> +
>> +	VM_BUG_ON(!PageHuge(oldhpage));
>> +	spin_lock(&hugetlb_lock);
>> +	h_cg = hugetlb_cgroup_from_page(oldhpage);
>> +	set_hugetlb_cgroup(oldhpage, NULL);
>> +	cgroup_exclude_rmdir(&h_cg->css);
>> +
>> +	/* move the h_cg details to new cgroup */
>> +	set_hugetlb_cgroup(newhpage, h_cg);
>> +	spin_unlock(&hugetlb_lock);
>> +	cgroup_release_and_wakeup_rmdir(&h_cg->css);
>> +	return;
>> +}
>> +
>
> The changelog says that the old page won't get uncharged - which means
> that the the cgroup cannot go away (even if we raced with the move
> parent, hugetlb_lock makes sure we either see old or new cgroup) so why
> do we need to play with css ref. counting?

Ok hugetlb_lock should be sufficient here i guess. I will send a patch
on top to remove the exclude_rmdir and release_and_wakeup_rmdir 

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration
@ 2012-06-15 10:50       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-15 10:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	dhillf-Re5JQEeQqe8AvxtiuMwx3w, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:

> On Wed 13-06-12 15:57:33, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>> 
>> With HugeTLB pages, hugetlb cgroup is uncharged in compound page destructor.  Since
>> we are holding a hugepage reference, we can be sure that old page won't
>> get uncharged till the last put_page().
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>
> Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
>
> One question below
> [...]
>> +void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
>> +{
>> +	struct hugetlb_cgroup *h_cg;
>> +
>> +	if (hugetlb_cgroup_disabled())
>> +		return;
>> +
>> +	VM_BUG_ON(!PageHuge(oldhpage));
>> +	spin_lock(&hugetlb_lock);
>> +	h_cg = hugetlb_cgroup_from_page(oldhpage);
>> +	set_hugetlb_cgroup(oldhpage, NULL);
>> +	cgroup_exclude_rmdir(&h_cg->css);
>> +
>> +	/* move the h_cg details to new cgroup */
>> +	set_hugetlb_cgroup(newhpage, h_cg);
>> +	spin_unlock(&hugetlb_lock);
>> +	cgroup_release_and_wakeup_rmdir(&h_cg->css);
>> +	return;
>> +}
>> +
>
> The changelog says that the old page won't get uncharged - which means
> that the the cgroup cannot go away (even if we raced with the move
> parent, hugetlb_lock makes sure we either see old or new cgroup) so why
> do we need to play with css ref. counting?

Ok hugetlb_lock should be sufficient here i guess. I will send a patch
on top to remove the exclude_rmdir and release_and_wakeup_rmdir 

-aneesh

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH 2/2] hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page to active list.
  2012-06-15 10:08         ` [PATCH 2/2] hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page " Aneesh Kumar K.V
@ 2012-06-16  6:24           ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 122+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-16  6:24 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-mm, mhocko, akpm

(2012/06/15 19:08), Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>
> 
> page's hugetlb cgroup assign and moving to active list should happen with
> hugetlb_lock held. Otherwise when we remove the hugetlb cgroup we would
> iterate the active list and will find page with NULL hugetlb cgroup values.
> 
> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
  2012-06-14  8:58     ` Li Zefan
@ 2012-06-22 22:11       ` Andrew Morton
  -1 siblings, 0 replies; 122+ messages in thread
From: Andrew Morton @ 2012-06-22 22:11 UTC (permalink / raw)
  To: Li Zefan
  Cc: Aneesh Kumar K.V, linux-mm, kamezawa.hiroyu, dhillf, rientjes,
	mhocko, hannes, linux-kernel, cgroups

On Thu, 14 Jun 2012 16:58:05 +0800
Li Zefan <lizefan@huawei.com> wrote:

> > +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
> 
> > +				 struct hugetlb_cgroup **ptr)
> > +{
> > +	int ret = 0;
> > +	struct res_counter *fail_res;
> > +	struct hugetlb_cgroup *h_cg = NULL;
> > +	unsigned long csize = nr_pages * PAGE_SIZE;
> > +
> > +	if (hugetlb_cgroup_disabled())
> > +		goto done;
> > +	/*
> > +	 * We don't charge any cgroup if the compound page have less
> > +	 * than 3 pages.
> > +	 */
> > +	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
> > +		goto done;
> > +again:
> > +	rcu_read_lock();
> > +	h_cg = hugetlb_cgroup_from_task(current);
> > +	if (!h_cg)
> 
> 
> In no circumstances should h_cg be NULL.
> 

Aneesh?

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
@ 2012-06-22 22:11       ` Andrew Morton
  0 siblings, 0 replies; 122+ messages in thread
From: Andrew Morton @ 2012-06-22 22:11 UTC (permalink / raw)
  To: Li Zefan
  Cc: Aneesh Kumar K.V, linux-mm, kamezawa.hiroyu, dhillf, rientjes,
	mhocko, hannes, linux-kernel, cgroups

On Thu, 14 Jun 2012 16:58:05 +0800
Li Zefan <lizefan@huawei.com> wrote:

> > +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
> 
> > +				 struct hugetlb_cgroup **ptr)
> > +{
> > +	int ret = 0;
> > +	struct res_counter *fail_res;
> > +	struct hugetlb_cgroup *h_cg = NULL;
> > +	unsigned long csize = nr_pages * PAGE_SIZE;
> > +
> > +	if (hugetlb_cgroup_disabled())
> > +		goto done;
> > +	/*
> > +	 * We don't charge any cgroup if the compound page have less
> > +	 * than 3 pages.
> > +	 */
> > +	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
> > +		goto done;
> > +again:
> > +	rcu_read_lock();
> > +	h_cg = hugetlb_cgroup_from_task(current);
> > +	if (!h_cg)
> 
> 
> In no circumstances should h_cg be NULL.
> 

Aneesh?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
  2012-06-22 22:11       ` Andrew Morton
@ 2012-06-24 16:44         ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-24 16:44 UTC (permalink / raw)
  To: Andrew Morton, Li Zefan
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, hannes,
	linux-kernel, cgroups



Hi Andrew,

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 14 Jun 2012 16:58:05 +0800
> Li Zefan <lizefan@huawei.com> wrote:
>
>> > +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
>> 
>> > +				 struct hugetlb_cgroup **ptr)
>> > +{
>> > +	int ret = 0;
>> > +	struct res_counter *fail_res;
>> > +	struct hugetlb_cgroup *h_cg = NULL;
>> > +	unsigned long csize = nr_pages * PAGE_SIZE;
>> > +
>> > +	if (hugetlb_cgroup_disabled())
>> > +		goto done;
>> > +	/*
>> > +	 * We don't charge any cgroup if the compound page have less
>> > +	 * than 3 pages.
>> > +	 */
>> > +	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
>> > +		goto done;
>> > +again:
>> > +	rcu_read_lock();
>> > +	h_cg = hugetlb_cgroup_from_task(current);
>> > +	if (!h_cg)
>> 
>> 
>> In no circumstances should h_cg be NULL.
>> 
>
> Aneesh?

I missed this in the last review. Thanks for reminding. I will send a
patch addressing this and another related comment in
4FD9A6B6.50503@huawei.com as a separate mail.

-aneesh


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup
@ 2012-06-24 16:44         ` Aneesh Kumar K.V
  0 siblings, 0 replies; 122+ messages in thread
From: Aneesh Kumar K.V @ 2012-06-24 16:44 UTC (permalink / raw)
  To: Andrew Morton, Li Zefan
  Cc: linux-mm, kamezawa.hiroyu, dhillf, rientjes, mhocko, hannes,
	linux-kernel, cgroups



Hi Andrew,

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 14 Jun 2012 16:58:05 +0800
> Li Zefan <lizefan@huawei.com> wrote:
>
>> > +int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
>> 
>> > +				 struct hugetlb_cgroup **ptr)
>> > +{
>> > +	int ret = 0;
>> > +	struct res_counter *fail_res;
>> > +	struct hugetlb_cgroup *h_cg = NULL;
>> > +	unsigned long csize = nr_pages * PAGE_SIZE;
>> > +
>> > +	if (hugetlb_cgroup_disabled())
>> > +		goto done;
>> > +	/*
>> > +	 * We don't charge any cgroup if the compound page have less
>> > +	 * than 3 pages.
>> > +	 */
>> > +	if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER)
>> > +		goto done;
>> > +again:
>> > +	rcu_read_lock();
>> > +	h_cg = hugetlb_cgroup_from_task(current);
>> > +	if (!h_cg)
>> 
>> 
>> In no circumstances should h_cg be NULL.
>> 
>
> Aneesh?

I missed this in the last review. Thanks for reminding. I will send a
patch addressing this and another related comment in
4FD9A6B6.50503@huawei.com as a separate mail.

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 122+ messages in thread

end of thread, other threads:[~2012-06-24 16:45 UTC | newest]

Thread overview: 122+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-13 10:27 [PATCH -V9 00/15] hugetlb: Add HugeTLB controller to control HugeTLB allocation Aneesh Kumar K.V
2012-06-13 10:27 ` Aneesh Kumar K.V
2012-06-13 10:27 ` [PATCH -V9 01/15] hugetlb: rename max_hstate to hugetlb_max_hstate Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-13 10:27 ` [PATCH -V9 02/15] hugetlb: don't use ERR_PTR with VM_FAULT* values Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-13 10:27 ` [PATCH -V9 03/15] hugetlb: add an inline helper for finding hstate index Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-13 10:27 ` [PATCH -V9 04/15] hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-13 14:59   ` Michal Hocko
2012-06-13 14:59     ` Michal Hocko
2012-06-13 14:59     ` Michal Hocko
2012-06-13 15:03     ` Michal Hocko
2012-06-13 15:03       ` Michal Hocko
2012-06-13 15:03       ` Michal Hocko
2012-06-13 16:43       ` Aneesh Kumar K.V
2012-06-13 16:43         ` Aneesh Kumar K.V
2012-06-14  7:14         ` Michal Hocko
2012-06-14  7:14           ` Michal Hocko
2012-06-14  7:14           ` Michal Hocko
2012-06-13 16:37     ` Aneesh Kumar K.V
2012-06-13 16:37       ` Aneesh Kumar K.V
2012-06-13 16:37       ` Aneesh Kumar K.V
2012-06-14  7:16       ` Michal Hocko
2012-06-14  7:16         ` Michal Hocko
2012-06-14  7:16         ` Michal Hocko
2012-06-13 10:27 ` [PATCH -V9 05/15] hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-14  3:09   ` Kamezawa Hiroyuki
2012-06-14  3:09     ` Kamezawa Hiroyuki
2012-06-14  3:09     ` Kamezawa Hiroyuki
2012-06-14  7:20   ` Michal Hocko
2012-06-14  7:20     ` Michal Hocko
2012-06-13 10:27 ` [PATCH -V9 06/15] hugetlb: simplify migrate_huge_page() Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-14  7:28   ` Michal Hocko
2012-06-14  7:28     ` Michal Hocko
2012-06-14  7:28     ` Michal Hocko
2012-06-13 10:27 ` [PATCH -V9 07/15] hugetlb: add a list for tracking in-use HugeTLB pages Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-14  7:33   ` Michal Hocko
2012-06-14  7:33     ` Michal Hocko
2012-06-14  7:33     ` Michal Hocko
2012-06-13 10:27 ` [PATCH -V9 08/15] hugetlb: Make some static variables global Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-14  3:11   ` Kamezawa Hiroyuki
2012-06-14  3:11     ` Kamezawa Hiroyuki
2012-06-14  7:38   ` Michal Hocko
2012-06-14  7:38     ` Michal Hocko
2012-06-13 10:27 ` [PATCH -V9 09/15] mm/hugetlb: Add new HugeTLB cgroup Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-14  8:24   ` Michal Hocko
2012-06-14  8:24     ` Michal Hocko
2012-06-14  8:54   ` Li Zefan
2012-06-14  8:54     ` Li Zefan
2012-06-15  6:20     ` Aneesh Kumar K.V
2012-06-15  6:20       ` Aneesh Kumar K.V
2012-06-15  6:20       ` Aneesh Kumar K.V
2012-06-13 10:27 ` [PATCH -V9 10/15] hugetlb/cgroup: Add the cgroup pointer to page lru Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-13 11:32   ` Aneesh Kumar K.V
2012-06-13 11:32     ` Aneesh Kumar K.V
2012-06-13 11:32     ` Aneesh Kumar K.V
2012-06-13 11:34   ` [PATCH -V9 [updated] " Aneesh Kumar K.V
2012-06-13 11:34     ` Aneesh Kumar K.V
2012-06-14  4:04     ` Kamezawa Hiroyuki
2012-06-14  4:04       ` Kamezawa Hiroyuki
2012-06-14  4:04       ` Kamezawa Hiroyuki
2012-06-14  8:44     ` Michal Hocko
2012-06-14  8:44       ` Michal Hocko
2012-06-14  8:44       ` Michal Hocko
2012-06-13 10:27 ` [PATCH -V9 11/15] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-14  4:07   ` Kamezawa Hiroyuki
2012-06-14  4:07     ` Kamezawa Hiroyuki
2012-06-14  4:07     ` Kamezawa Hiroyuki
2012-06-14  8:58   ` Li Zefan
2012-06-14  8:58     ` Li Zefan
2012-06-22 22:11     ` Andrew Morton
2012-06-22 22:11       ` Andrew Morton
2012-06-24 16:44       ` Aneesh Kumar K.V
2012-06-24 16:44         ` Aneesh Kumar K.V
2012-06-14  9:25   ` Michal Hocko
2012-06-14  9:25     ` Michal Hocko
2012-06-15 10:06     ` Aneesh Kumar K.V
2012-06-15 10:06       ` Aneesh Kumar K.V
2012-06-15 10:06       ` Aneesh Kumar K.V
2012-06-15 10:08       ` [PATCH 1/2] hugetlb: Move all the in use pages to active list Aneesh Kumar K.V
2012-06-15 10:08         ` [PATCH 2/2] hugetlb/cgroup: Assign the page hugetlb cgroup when we move the page " Aneesh Kumar K.V
2012-06-16  6:24           ` Kamezawa Hiroyuki
2012-06-13 10:27 ` [PATCH -V9 12/15] hugetlb/cgroup: Add support for cgroup removal Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-14  4:09   ` Kamezawa Hiroyuki
2012-06-14  4:09     ` Kamezawa Hiroyuki
2012-06-14  4:09     ` Kamezawa Hiroyuki
2012-06-14  9:31   ` Michal Hocko
2012-06-14  9:31     ` Michal Hocko
2012-06-13 10:27 ` [PATCH -V9 13/15] hugetlb/cgroup: add hugetlb cgroup control files Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-14  4:10   ` Kamezawa Hiroyuki
2012-06-14  4:10     ` Kamezawa Hiroyuki
2012-06-14  9:36   ` Michal Hocko
2012-06-14  9:36     ` Michal Hocko
2012-06-13 10:27 ` [PATCH -V9 14/15] hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-14  4:13   ` Kamezawa Hiroyuki
2012-06-14  4:13     ` Kamezawa Hiroyuki
2012-06-14  4:13     ` Kamezawa Hiroyuki
2012-06-14 10:04   ` Michal Hocko
2012-06-14 10:04     ` Michal Hocko
2012-06-14 10:04     ` Michal Hocko
2012-06-15 10:50     ` Aneesh Kumar K.V
2012-06-15 10:50       ` Aneesh Kumar K.V
2012-06-15 10:50       ` Aneesh Kumar K.V
2012-06-13 10:27 ` [PATCH -V9 15/15] hugetlb/cgroup: add HugeTLB controller documentation Aneesh Kumar K.V
2012-06-13 10:27   ` Aneesh Kumar K.V
2012-06-14 10:07   ` Michal Hocko
2012-06-14 10:07     ` Michal Hocko
2012-06-14 10:07     ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.