* [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix
@ 2013-07-22  8:36 Joonsoo Kim
  2013-07-22  8:36 ` [PATCH v2 01/10] mm, hugetlb: move up the code which check availability of free huge page Joonsoo Kim
                   ` (10 more replies)
  0 siblings, 11 replies; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

The first 6 patches are almost trivial clean-up patches.

The others fix three bugs.
These problems are probably minor, since this code has been used
for a long time and there have been no bug reports about them.

These patches are based on v3.10.0 and
passed the libhugetlbfs test suite.

Changes from v1:
Split patch 1 into two patches to clarify its purpose.
Remove useless indentation changes in 'clean-up alloc_huge_page()'.
Fix a bug in the new iteration code.
Add Reviewed-by and Acked-by tags.

Joonsoo Kim (10):
  mm, hugetlb: move up the code which check availability of free huge
    page
  mm, hugetlb: remove err label in dequeue_huge_page_vma()
  mm, hugetlb: trivial commenting fix
  mm, hugetlb: clean-up alloc_huge_page()
  mm, hugetlb: fix and clean-up node iteration code to alloc or free
  mm, hugetlb: remove redundant list_empty check in
    gather_surplus_pages()
  mm, hugetlb: do not use a page in page cache for cow optimization
  mm, hugetlb: add VM_NORESERVE check in vma_has_reserves()
  mm, hugetlb: remove decrement_hugepage_resv_vma()
  mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cache

 mm/hugetlb.c |  250 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 112 insertions(+), 138 deletions(-)

-- 
1.7.9.5



* [PATCH v2 01/10] mm, hugetlb: move up the code which check availability of free huge page
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
@ 2013-07-22  8:36 ` Joonsoo Kim
  2013-07-22 15:45   ` Aneesh Kumar K.V
  2013-07-22  8:36 ` [PATCH v2 02/10] mm, hugetlb: remove err label in dequeue_huge_page_vma() Joonsoo Kim
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

At this point we are holding the hugetlb_lock, so the hstate values
can't change. If we don't have any usable free huge page at this point,
there is no need to proceed further, so move this check up.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e2bfbf7..fc4988c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -539,10 +539,6 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 	struct zoneref *z;
 	unsigned int cpuset_mems_cookie;
 
-retry_cpuset:
-	cpuset_mems_cookie = get_mems_allowed();
-	zonelist = huge_zonelist(vma, address,
-					htlb_alloc_mask, &mpol, &nodemask);
 	/*
 	 * A child process with MAP_PRIVATE mappings created by their parent
 	 * have no page reserves. This check ensures that reservations are
@@ -556,6 +552,11 @@ retry_cpuset:
 	if (avoid_reserve && h->free_huge_pages - h->resv_huge_pages == 0)
 		goto err;
 
+retry_cpuset:
+	cpuset_mems_cookie = get_mems_allowed();
+	zonelist = huge_zonelist(vma, address,
+					htlb_alloc_mask, &mpol, &nodemask);
+
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 						MAX_NR_ZONES - 1, nodemask) {
 		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
@@ -574,7 +575,6 @@ retry_cpuset:
 	return page;
 
 err:
-	mpol_cond_put(mpol);
 	return NULL;
 }
 
-- 
1.7.9.5



* [PATCH v2 02/10] mm, hugetlb: remove err label in dequeue_huge_page_vma()
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
  2013-07-22  8:36 ` [PATCH v2 01/10] mm, hugetlb: move up the code which check availability of free huge page Joonsoo Kim
@ 2013-07-22  8:36 ` Joonsoo Kim
  2013-07-22 16:11   ` Michal Hocko
  2013-07-22  8:36 ` [PATCH v2 03/10] mm, hugetlb: trivial commenting fix Joonsoo Kim
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

This label is no longer needed, because there is no error handling
except returning NULL.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fc4988c..d87f70b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -546,11 +546,11 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 	 */
 	if (!vma_has_reserves(vma) &&
 			h->free_huge_pages - h->resv_huge_pages == 0)
-		goto err;
+		return NULL;
 
 	/* If reserves cannot be used, ensure enough pages are in the pool */
 	if (avoid_reserve && h->free_huge_pages - h->resv_huge_pages == 0)
-		goto err;
+		return NULL;
 
 retry_cpuset:
 	cpuset_mems_cookie = get_mems_allowed();
@@ -573,9 +573,6 @@ retry_cpuset:
 	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
 		goto retry_cpuset;
 	return page;
-
-err:
-	return NULL;
 }
 
 static void update_and_free_page(struct hstate *h, struct page *page)
-- 
1.7.9.5



* [PATCH v2 03/10] mm, hugetlb: trivial commenting fix
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
  2013-07-22  8:36 ` [PATCH v2 01/10] mm, hugetlb: move up the code which check availability of free huge page Joonsoo Kim
  2013-07-22  8:36 ` [PATCH v2 02/10] mm, hugetlb: remove err label in dequeue_huge_page_vma() Joonsoo Kim
@ 2013-07-22  8:36 ` Joonsoo Kim
       [not found]   ` <51ef274e.0605e00a.246a.ffffa5c9SMTPIN_ADDED_BROKEN@mx.google.com>
  2013-07-22  8:36 ` [PATCH v2 04/10] mm, hugetlb: clean-up alloc_huge_page() Joonsoo Kim
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

The name of the mutex written in the comment is wrong.
Fix it.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d87f70b..d21a33a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -135,9 +135,9 @@ static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
  *                    across the pages in a mapping.
  *
  * The region data structures are protected by a combination of the mmap_sem
- * and the hugetlb_instantion_mutex.  To access or modify a region the caller
+ * and the hugetlb_instantiation_mutex.  To access or modify a region the caller
  * must either hold the mmap_sem for write, or the mmap_sem for read and
- * the hugetlb_instantiation mutex:
+ * the hugetlb_instantiation_mutex:
  *
  *	down_write(&mm->mmap_sem);
  * or
-- 
1.7.9.5



* [PATCH v2 04/10] mm, hugetlb: clean-up alloc_huge_page()
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
                   ` (2 preceding siblings ...)
  2013-07-22  8:36 ` [PATCH v2 03/10] mm, hugetlb: trivial commenting fix Joonsoo Kim
@ 2013-07-22  8:36 ` Joonsoo Kim
  2013-07-22  8:36 ` [PATCH v2 05/10] mm, hugetlb: fix and clean-up node iteration code to alloc or free Joonsoo Kim
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

We can unify some code for the successful allocation paths.
This makes the code more readable.
There is no functional difference.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d21a33a..83edd17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1146,12 +1146,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	}
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
-	if (page) {
-		/* update page cgroup details */
-		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h),
-					     h_cg, page);
-		spin_unlock(&hugetlb_lock);
-	} else {
+	if (!page) {
 		spin_unlock(&hugetlb_lock);
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page) {
@@ -1162,11 +1157,11 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 			return ERR_PTR(-ENOSPC);
 		}
 		spin_lock(&hugetlb_lock);
-		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h),
-					     h_cg, page);
 		list_move(&page->lru, &h->hugepage_activelist);
-		spin_unlock(&hugetlb_lock);
+		/* Fall through */
 	}
+	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page);
+	spin_unlock(&hugetlb_lock);
 
 	set_page_private(page, (unsigned long)spool);
 
-- 
1.7.9.5



* [PATCH v2 05/10] mm, hugetlb: fix and clean-up node iteration code to alloc or free
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
                   ` (3 preceding siblings ...)
  2013-07-22  8:36 ` [PATCH v2 04/10] mm, hugetlb: clean-up alloc_huge_page() Joonsoo Kim
@ 2013-07-22  8:36 ` Joonsoo Kim
  2013-07-22 15:50   ` Aneesh Kumar K.V
  2013-07-22 16:23   ` Michal Hocko
  2013-07-22  8:36 ` [PATCH v2 06/10] mm, hugetlb: remove redundant list_empty check in gather_surplus_pages() Joonsoo Kim
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

The current node iteration code has a minor problem: it does one extra
node rotation if the allocation fails. For example, if we start
allocating at node 0, we stop iterating at node 0, but the next
allocation then starts at node 1.

Introduce new macros "for_each_node_mask_to_[alloc|free]" and use them
to fix and clean up the node iteration code for allocating and freeing.
This makes the code easier to understand.
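
For illustration only, here is a minimal userspace model of the
iteration the new macros implement; NR_NODES, next_to_alloc and
next_node_to_alloc() below are simplified stand-ins assumed for this
sketch, not the kernel code:

        #include <stdio.h>

        #define NR_NODES 4

        static int next_to_alloc;       /* models h->next_nid_to_alloc */

        static int next_node_to_alloc(void)
        {
                int nid = next_to_alloc;

                next_to_alloc = (next_to_alloc + 1) % NR_NODES;
                return nid;
        }

        int main(void)
        {
                int nr_nodes, node;

                /*
                 * Same shape as for_each_node_mask_to_alloc(): visit each
                 * node exactly once, starting from the saved position.
                 */
                for (nr_nodes = NR_NODES;
                     nr_nodes > 0 && ((node = next_node_to_alloc()) || 1);
                     nr_nodes--)
                        printf("try node %d\n", node);

                /*
                 * A failed pass over all nodes advances the saved position
                 * by exactly NR_NODES, so the next attempt begins where
                 * this one did, instead of one node further.
                 */
                printf("next attempt starts at node %d\n", next_to_alloc);
                return 0;
        }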

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 83edd17..3ac0a6f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -752,33 +752,6 @@ static int hstate_next_node_to_alloc(struct hstate *h,
 	return nid;
 }
 
-static int alloc_fresh_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
-{
-	struct page *page;
-	int start_nid;
-	int next_nid;
-	int ret = 0;
-
-	start_nid = hstate_next_node_to_alloc(h, nodes_allowed);
-	next_nid = start_nid;
-
-	do {
-		page = alloc_fresh_huge_page_node(h, next_nid);
-		if (page) {
-			ret = 1;
-			break;
-		}
-		next_nid = hstate_next_node_to_alloc(h, nodes_allowed);
-	} while (next_nid != start_nid);
-
-	if (ret)
-		count_vm_event(HTLB_BUDDY_PGALLOC);
-	else
-		count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
-
-	return ret;
-}
-
 /*
  * helper for free_pool_huge_page() - return the previously saved
  * node ["this node"] from which to free a huge page.  Advance the
@@ -797,6 +770,40 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
 	return nid;
 }
 
+#define for_each_node_mask_to_alloc(hs, nr_nodes, node, mask)		\
+	for (nr_nodes = nodes_weight(*mask);				\
+		nr_nodes > 0 &&						\
+		((node = hstate_next_node_to_alloc(hs, mask)) || 1);	\
+		nr_nodes--)
+
+#define for_each_node_mask_to_free(hs, nr_nodes, node, mask)		\
+	for (nr_nodes = nodes_weight(*mask);				\
+		nr_nodes > 0 &&						\
+		((node = hstate_next_node_to_free(hs, mask)) || 1);	\
+		nr_nodes--)
+
+static int alloc_fresh_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
+{
+	struct page *page;
+	int nr_nodes, node;
+	int ret = 0;
+
+	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
+		page = alloc_fresh_huge_page_node(h, node);
+		if (page) {
+			ret = 1;
+			break;
+		}
+	}
+
+	if (ret)
+		count_vm_event(HTLB_BUDDY_PGALLOC);
+	else
+		count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
+
+	return ret;
+}
+
 /*
  * Free huge page from pool from next node to free.
  * Attempt to keep persistent huge pages more or less
@@ -806,36 +813,31 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
 static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 							 bool acct_surplus)
 {
-	int start_nid;
-	int next_nid;
+	int nr_nodes, node;
 	int ret = 0;
 
-	start_nid = hstate_next_node_to_free(h, nodes_allowed);
-	next_nid = start_nid;
-
-	do {
+	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
 		/*
 		 * If we're returning unused surplus pages, only examine
 		 * nodes with surplus pages.
 		 */
-		if ((!acct_surplus || h->surplus_huge_pages_node[next_nid]) &&
-		    !list_empty(&h->hugepage_freelists[next_nid])) {
+		if ((!acct_surplus || h->surplus_huge_pages_node[node]) &&
+		    !list_empty(&h->hugepage_freelists[node])) {
 			struct page *page =
-				list_entry(h->hugepage_freelists[next_nid].next,
+				list_entry(h->hugepage_freelists[node].next,
 					  struct page, lru);
 			list_del(&page->lru);
 			h->free_huge_pages--;
-			h->free_huge_pages_node[next_nid]--;
+			h->free_huge_pages_node[node]--;
 			if (acct_surplus) {
 				h->surplus_huge_pages--;
-				h->surplus_huge_pages_node[next_nid]--;
+				h->surplus_huge_pages_node[node]--;
 			}
 			update_and_free_page(h, page);
 			ret = 1;
 			break;
 		}
-		next_nid = hstate_next_node_to_free(h, nodes_allowed);
-	} while (next_nid != start_nid);
+	}
 
 	return ret;
 }
@@ -1172,14 +1174,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 int __weak alloc_bootmem_huge_page(struct hstate *h)
 {
 	struct huge_bootmem_page *m;
-	int nr_nodes = nodes_weight(node_states[N_MEMORY]);
+	int nr_nodes, node;
 
-	while (nr_nodes) {
+	for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
 		void *addr;
 
-		addr = __alloc_bootmem_node_nopanic(
-				NODE_DATA(hstate_next_node_to_alloc(h,
-						&node_states[N_MEMORY])),
+		addr = __alloc_bootmem_node_nopanic(NODE_DATA(node),
 				huge_page_size(h), huge_page_size(h), 0);
 
 		if (addr) {
@@ -1191,7 +1191,6 @@ int __weak alloc_bootmem_huge_page(struct hstate *h)
 			m = addr;
 			goto found;
 		}
-		nr_nodes--;
 	}
 	return 0;
 
@@ -1330,48 +1329,28 @@ static inline void try_to_free_low(struct hstate *h, unsigned long count,
 static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 				int delta)
 {
-	int start_nid, next_nid;
-	int ret = 0;
+	int nr_nodes, node;
 
 	VM_BUG_ON(delta != -1 && delta != 1);
 
-	if (delta < 0)
-		start_nid = hstate_next_node_to_alloc(h, nodes_allowed);
-	else
-		start_nid = hstate_next_node_to_free(h, nodes_allowed);
-	next_nid = start_nid;
-
-	do {
-		int nid = next_nid;
-		if (delta < 0)  {
-			/*
-			 * To shrink on this node, there must be a surplus page
-			 */
-			if (!h->surplus_huge_pages_node[nid]) {
-				next_nid = hstate_next_node_to_alloc(h,
-								nodes_allowed);
-				continue;
-			}
+	if (delta < 0) {
+		for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
+			if (h->surplus_huge_pages_node[node])
+				goto found;
 		}
-		if (delta > 0) {
-			/*
-			 * Surplus cannot exceed the total number of pages
-			 */
-			if (h->surplus_huge_pages_node[nid] >=
-						h->nr_huge_pages_node[nid]) {
-				next_nid = hstate_next_node_to_free(h,
-								nodes_allowed);
-				continue;
-			}
+	} else {
+		for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
+			if (h->surplus_huge_pages_node[node] <
+					h->nr_huge_pages_node[node])
+				goto found;
 		}
+	}
+	return 0;
 
-		h->surplus_huge_pages += delta;
-		h->surplus_huge_pages_node[nid] += delta;
-		ret = 1;
-		break;
-	} while (next_nid != start_nid);
-
-	return ret;
+found:
+	h->surplus_huge_pages += delta;
+	h->surplus_huge_pages_node[node] += delta;
+	return 1;
 }
 
 #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages)
-- 
1.7.9.5



* [PATCH v2 06/10] mm, hugetlb: remove redundant list_empty check in gather_surplus_pages()
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
                   ` (4 preceding siblings ...)
  2013-07-22  8:36 ` [PATCH v2 05/10] mm, hugetlb: fix and clean-up node iteration code to alloc or free Joonsoo Kim
@ 2013-07-22  8:36 ` Joonsoo Kim
  2013-07-22  8:36 ` [PATCH v2 07/10] mm, hugetlb: do not use a page in page cache for cow optimization Joonsoo Kim
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

If the list is empty, list_for_each_entry_safe() doesn't do anything,
so this check is redundant. Remove it.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3ac0a6f..7ca8733 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1017,11 +1017,8 @@ free:
 	spin_unlock(&hugetlb_lock);
 
 	/* Free unnecessary surplus pages to the buddy allocator */
-	if (!list_empty(&surplus_list)) {
-		list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
-			put_page(page);
-		}
-	}
+	list_for_each_entry_safe(page, tmp, &surplus_list, lru)
+		put_page(page);
 	spin_lock(&hugetlb_lock);
 
 	return ret;
-- 
1.7.9.5



* [PATCH v2 07/10] mm, hugetlb: do not use a page in page cache for cow optimization
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
                   ` (5 preceding siblings ...)
  2013-07-22  8:36 ` [PATCH v2 06/10] mm, hugetlb: remove redundant list_empty check in gather_surplus_pages() Joonsoo Kim
@ 2013-07-22  8:36 ` Joonsoo Kim
  2013-07-23  4:26   ` Naoya Horiguchi
  2013-07-23 11:45   ` Michal Hocko
  2013-07-22  8:36 ` [PATCH v2 08/10] mm, hugetlb: add VM_NORESERVE check in vma_has_reserves() Joonsoo Kim
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

Currently, we use a page in the page cache with map count 1 for the cow
optimization. If we find this condition, we don't allocate a new page
and copy the contents; instead, we map this page directly. This can
introduce a problem where writing to a private mapping overwrites the
hugetlb file directly. You can reproduce this situation with the
following code.

        size = 20 * MB;
        flag = MAP_SHARED;
        p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
        if (p == MAP_FAILED) {
                fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
                return -1;
        }
        p[0] = 's';
        fprintf(stdout, "BEFORE STEAL PRIVATE WRITE: %c\n", p[0]);
        munmap(p, size);

        flag = MAP_PRIVATE;
        p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
        if (p == MAP_FAILED) {
                fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
        }
        p[0] = 'c';
        munmap(p, size);

        flag = MAP_SHARED;
        p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
        if (p == MAP_FAILED) {
                fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
                return -1;
        }
        fprintf(stdout, "AFTER STEAL PRIVATE WRITE: %c\n", p[0]);
        munmap(p, size);

We can see "AFTER STEAL PRIVATE WRITE: c", not "AFTER STEAL
PRIVATE WRITE: s". If we turn off this optimization for a page
in the page cache, the problem disappears.
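
The snippet above omits its setup for brevity. Below is a minimal
sketch of the assumed context: hugetlbfs_unlinked_fd() comes from
libhugetlbfs (it also appears in a later example in this series), and
the <hugetlbfs.h> header name and the MB constant are assumptions of
this sketch, not part of the patch:

        #include <stdio.h>
        #include <sys/mman.h>
        #include <hugetlbfs.h>          /* assumed: hugetlbfs_unlinked_fd() */

        #define MB (1024UL * 1024UL)

        int main(void)
        {
                int fd = hugetlbfs_unlinked_fd();

                if (fd < 0) {
                        fprintf(stderr, "hugetlbfs_unlinked_fd() failed\n");
                        return 1;
                }

                /*
                 * ...the three mmap()/munmap() steps from the commit
                 * message go here, all using this fd...
                 */
                return 0;
        }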

Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7ca8733..8a61638 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2508,7 +2508,6 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 {
 	struct hstate *h = hstate_vma(vma);
 	struct page *old_page, *new_page;
-	int avoidcopy;
 	int outside_reserve = 0;
 	unsigned long mmun_start;	/* For mmu_notifiers */
 	unsigned long mmun_end;		/* For mmu_notifiers */
@@ -2518,10 +2517,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 retry_avoidcopy:
 	/* If no-one else is actually using this page, avoid the copy
 	 * and just make the page writable */
-	avoidcopy = (page_mapcount(old_page) == 1);
-	if (avoidcopy) {
-		if (PageAnon(old_page))
-			page_move_anon_rmap(old_page, vma, address);
+	if (page_mapcount(old_page) == 1 && PageAnon(old_page)) {
+		page_move_anon_rmap(old_page, vma, address);
 		set_huge_ptep_writable(vma, address, ptep);
 		return 0;
 	}
-- 
1.7.9.5



* [PATCH v2 08/10] mm, hugetlb: add VM_NORESERVE check in vma_has_reserves()
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
                   ` (6 preceding siblings ...)
  2013-07-22  8:36 ` [PATCH v2 07/10] mm, hugetlb: do not use a page in page cache for cow optimization Joonsoo Kim
@ 2013-07-22  8:36 ` Joonsoo Kim
  2013-07-23 13:23   ` Michal Hocko
  2013-07-22  8:36 ` [PATCH v2 09/10] mm, hugetlb: remove decrement_hugepage_resv_vma() Joonsoo Kim
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

If we map a region with MAP_NORESERVE and MAP_SHARED,
the reserve count check is skipped, so we are not guaranteed to be able
to allocate a huge page at fault time.
You can easily reproduce this situation with the following example code.

Assume a 2MB huge page size and nr_hugepages = 100.

        fd = hugetlbfs_unlinked_fd();
        if (fd < 0)
                return 1;

        size = 200 * MB;
        flag = MAP_SHARED;
        p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
        if (p == MAP_FAILED) {
                fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
                return -1;
        }

        size = 2 * MB;
        flag = MAP_ANONYMOUS | MAP_SHARED | MAP_HUGETLB | MAP_NORESERVE;
        p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, -1, 0);
        if (p == MAP_FAILED) {
                fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
        }
        p[0] = '0';
        sleep(10);

While the program is in sleep(10), run 'cat /proc/meminfo' from another
process and you will see the problem.

A non-reserved shared mapping should not eat into the reserve space, so
return an error when we don't find enough free space.
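
For convenience, the observation step can also be a small program
instead of 'cat /proc/meminfo'. The following is a generic sketch (an
illustration, not part of the patch) that prints the HugePages_*
counters while the test program above sits in sleep(10):

        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
                char line[128];
                FILE *f = fopen("/proc/meminfo", "r");

                if (!f)
                        return 1;

                /* Print only the hugepage accounting lines. */
                while (fgets(line, sizeof(line), f)) {
                        if (!strncmp(line, "HugePages_", 10))
                                fputs(line, stdout);
                }

                fclose(f);
                return 0;
        }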

Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8a61638..87e73bd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -464,6 +464,8 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
 /* Returns true if the VMA has associated reserve pages */
 static int vma_has_reserves(struct vm_area_struct *vma)
 {
+	if (vma->vm_flags & VM_NORESERVE)
+		return 0;
 	if (vma->vm_flags & VM_MAYSHARE)
 		return 1;
 	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER))
-- 
1.7.9.5



* [PATCH v2 09/10] mm, hugetlb: remove decrement_hugepage_resv_vma()
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
                   ` (7 preceding siblings ...)
  2013-07-22  8:36 ` [PATCH v2 08/10] mm, hugetlb: add VM_NORESERVE check in vma_has_reserves() Joonsoo Kim
@ 2013-07-22  8:36 ` Joonsoo Kim
  2013-07-22  8:36 ` [PATCH v2 10/10] mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cache Joonsoo Kim
  2013-07-22 15:51 ` [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Aneesh Kumar K.V
  10 siblings, 0 replies; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

Now the check in decrement_hugepage_resv_vma() is the same as the one in
vma_has_reserves(), so we can clean up this function by using
vma_has_reserves(). Additionally, decrement_hugepage_resv_vma() has only
one call site, so we can remove the function and embed it into
dequeue_huge_page_vma() directly. This patch implements that.

Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87e73bd..2ea6afd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -434,25 +434,6 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
 	return (get_vma_private_data(vma) & flag) != 0;
 }
 
-/* Decrement the reserved pages in the hugepage pool by one */
-static void decrement_hugepage_resv_vma(struct hstate *h,
-			struct vm_area_struct *vma)
-{
-	if (vma->vm_flags & VM_NORESERVE)
-		return;
-
-	if (vma->vm_flags & VM_MAYSHARE) {
-		/* Shared mappings always use reserves */
-		h->resv_huge_pages--;
-	} else if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
-		/*
-		 * Only the process that called mmap() has reserves for
-		 * private mappings.
-		 */
-		h->resv_huge_pages--;
-	}
-}
-
 /* Reset counters to 0 and clear all HPAGE_RESV_* flags */
 void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
 {
@@ -466,10 +447,18 @@ static int vma_has_reserves(struct vm_area_struct *vma)
 {
 	if (vma->vm_flags & VM_NORESERVE)
 		return 0;
+
+	/* Shared mappings always use reserves */
 	if (vma->vm_flags & VM_MAYSHARE)
 		return 1;
+
+	/*
+	 * Only the process that called mmap() has reserves for
+	 * private mappings.
+	 */
 	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		return 1;
+
 	return 0;
 }
 
@@ -564,8 +553,8 @@ retry_cpuset:
 		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
 			page = dequeue_huge_page_node(h, zone_to_nid(zone));
 			if (page) {
-				if (!avoid_reserve)
-					decrement_hugepage_resv_vma(h, vma);
+				if (!avoid_reserve && vma_has_reserves(vma))
+					h->resv_huge_pages--;
 				break;
 			}
 		}
-- 
1.7.9.5



* [PATCH v2 10/10] mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cache
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
                   ` (8 preceding siblings ...)
  2013-07-22  8:36 ` [PATCH v2 09/10] mm, hugetlb: remove decrement_hugepage_resv_vma() Joonsoo Kim
@ 2013-07-22  8:36 ` Joonsoo Kim
  2013-07-23 13:37   ` Michal Hocko
  2013-07-22 15:51 ` [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Aneesh Kumar K.V
  10 siblings, 1 reply; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-22  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim, Joonsoo Kim

If a vma with VM_NORESERVE allocates a new page for the page cache, we
should check whether this area is reserved or not. If this address is
already reserved by another process (the chg == 0 case), we should
decrement the reserve count, because this allocated page will go into
the page cache and there is currently no way to tell whether the page
came from the reserved pool when the inode is released. This can
introduce an over-counting problem for the reserve count. You can
easily reproduce this situation with the following example code.

        size = 20 * MB;
        flag = MAP_SHARED;
        p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
        if (p == MAP_FAILED) {
                fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
                return -1;
        }

        flag = MAP_SHARED | MAP_NORESERVE;
        q = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
        if (q == MAP_FAILED) {
                fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
        }
        q[0] = 'c';
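
One crude way to observe the over-count described above is to compare
HugePages_Rsvd before running the example and after it exits and the
inode is released; per the description above, the count can remain
elevated without this patch. A minimal sketch for reading the counter
(an illustration, not part of the patch):

        #include <stdio.h>

        int main(void)
        {
                unsigned long rsvd = 0;
                char line[128];
                FILE *f = fopen("/proc/meminfo", "r");

                if (!f)
                        return 1;

                /* Find the HugePages_Rsvd line and parse its value. */
                while (fgets(line, sizeof(line), f)) {
                        if (sscanf(line, "HugePages_Rsvd: %lu", &rsvd) == 1)
                                break;
                }
                fclose(f);

                printf("HugePages_Rsvd: %lu\n", rsvd);
                return 0;
        }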

This patch solves the problem.

Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2ea6afd..6782b41 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -443,10 +443,23 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
 }
 
 /* Returns true if the VMA has associated reserve pages */
-static int vma_has_reserves(struct vm_area_struct *vma)
+static int vma_has_reserves(struct vm_area_struct *vma, long chg)
 {
-	if (vma->vm_flags & VM_NORESERVE)
-		return 0;
+	if (vma->vm_flags & VM_NORESERVE) {
+		/*
+		 * This address is already reserved by other process(chg == 0),
+		 * This address is already reserved by another process (chg == 0),
+		 * so we should decrement the reserve count. Without
+		 * decrementing it, the reserve count remains after releasing
+		 * the inode, because this allocated page will go into the page
+		 * cache and is regarded as coming from the reserved pool when
+		 * it is released. Currently, we don't have any other way to
+		 * deal with this situation properly, so add a work-around here.
+		if (vma->vm_flags & VM_MAYSHARE && chg == 0)
+			return 1;
+		else
+			return 0;
+	}
 
 	/* Shared mappings always use reserves */
 	if (vma->vm_flags & VM_MAYSHARE)
@@ -520,7 +533,8 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
 
 static struct page *dequeue_huge_page_vma(struct hstate *h,
 				struct vm_area_struct *vma,
-				unsigned long address, int avoid_reserve)
+				unsigned long address, int avoid_reserve,
+				long chg)
 {
 	struct page *page = NULL;
 	struct mempolicy *mpol;
@@ -535,7 +549,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 	 * have no page reserves. This check ensures that reservations are
 	 * not "stolen". The child may still get SIGKILLed
 	 */
-	if (!vma_has_reserves(vma) &&
+	if (!vma_has_reserves(vma, chg) &&
 			h->free_huge_pages - h->resv_huge_pages == 0)
 		return NULL;
 
@@ -553,8 +567,12 @@ retry_cpuset:
 		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
 			page = dequeue_huge_page_node(h, zone_to_nid(zone));
 			if (page) {
-				if (!avoid_reserve && vma_has_reserves(vma))
-					h->resv_huge_pages--;
+				if (avoid_reserve)
+					break;
+				if (!vma_has_reserves(vma, chg))
+					break;
+
+				h->resv_huge_pages--;
 				break;
 			}
 		}
@@ -1135,7 +1153,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		return ERR_PTR(-ENOSPC);
 	}
 	spin_lock(&hugetlb_lock);
-	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
+	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, chg);
 	if (!page) {
 		spin_unlock(&hugetlb_lock);
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
-- 
1.7.9.5



* Re: [PATCH v2 01/10] mm, hugetlb: move up the code which check availability of free huge page
  2013-07-22  8:36 ` [PATCH v2 01/10] mm, hugetlb: move up the code which check availability of free huge page Joonsoo Kim
@ 2013-07-22 15:45   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 25+ messages in thread
From: Aneesh Kumar K.V @ 2013-07-22 15:45 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, KAMEZAWA Hiroyuki,
	Hugh Dickins, Davidlohr Bueso, David Gibson, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:

> At this point we are holding the hugetlb_lock, so the hstate values
> can't change. If we don't have any usable free huge page at this point,
> there is no need to proceed further, so move this check up.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e2bfbf7..fc4988c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -539,10 +539,6 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>  	struct zoneref *z;
>  	unsigned int cpuset_mems_cookie;
>
> -retry_cpuset:
> -	cpuset_mems_cookie = get_mems_allowed();
> -	zonelist = huge_zonelist(vma, address,
> -					htlb_alloc_mask, &mpol, &nodemask);
>  	/*
>  	 * A child process with MAP_PRIVATE mappings created by their parent
>  	 * have no page reserves. This check ensures that reservations are
> @@ -556,6 +552,11 @@ retry_cpuset:
>  	if (avoid_reserve && h->free_huge_pages - h->resv_huge_pages == 0)
>  		goto err;
>
> +retry_cpuset:
> +	cpuset_mems_cookie = get_mems_allowed();
> +	zonelist = huge_zonelist(vma, address,
> +					htlb_alloc_mask, &mpol, &nodemask);
> +
>  	for_each_zone_zonelist_nodemask(zone, z, zonelist,
>  						MAX_NR_ZONES - 1, nodemask) {
>  		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
> @@ -574,7 +575,6 @@ retry_cpuset:
>  	return page;
>
>  err:
> -	mpol_cond_put(mpol);
>  	return NULL;
>  }
>
> -- 
> 1.7.9.5



* Re: [PATCH v2 05/10] mm, hugetlb: fix and clean-up node iteration code to alloc or free
  2013-07-22  8:36 ` [PATCH v2 05/10] mm, hugetlb: fix and clean-up node iteration code to alloc or free Joonsoo Kim
@ 2013-07-22 15:50   ` Aneesh Kumar K.V
  2013-07-22 16:23   ` Michal Hocko
  1 sibling, 0 replies; 25+ messages in thread
From: Aneesh Kumar K.V @ 2013-07-22 15:50 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, KAMEZAWA Hiroyuki,
	Hugh Dickins, Davidlohr Bueso, David Gibson, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:

> The current node iteration code has a minor problem: it does one extra
> node rotation if the allocation fails. For example, if we start
> allocating at node 0, we stop iterating at node 0, but the next
> allocation then starts at node 1.
>
> Introduce new macros "for_each_node_mask_to_[alloc|free]" and use them
> to fix and clean up the node iteration code for allocating and freeing.
> This makes the code easier to understand.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>


Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 83edd17..3ac0a6f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -752,33 +752,6 @@ static int hstate_next_node_to_alloc(struct hstate *h,
>  	return nid;
>  }
>
> -static int alloc_fresh_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
> -{
> -	struct page *page;
> -	int start_nid;
> -	int next_nid;
> -	int ret = 0;
> -
> -	start_nid = hstate_next_node_to_alloc(h, nodes_allowed);
> -	next_nid = start_nid;
> -
> -	do {
> -		page = alloc_fresh_huge_page_node(h, next_nid);
> -		if (page) {
> -			ret = 1;
> -			break;
> -		}
> -		next_nid = hstate_next_node_to_alloc(h, nodes_allowed);
> -	} while (next_nid != start_nid);
> -
> -	if (ret)
> -		count_vm_event(HTLB_BUDDY_PGALLOC);
> -	else
> -		count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
> -
> -	return ret;
> -}
> -
>  /*
>   * helper for free_pool_huge_page() - return the previously saved
>   * node ["this node"] from which to free a huge page.  Advance the
> @@ -797,6 +770,40 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
>  	return nid;
>  }
>
> +#define for_each_node_mask_to_alloc(hs, nr_nodes, node, mask)		\
> +	for (nr_nodes = nodes_weight(*mask);				\
> +		nr_nodes > 0 &&						\
> +		((node = hstate_next_node_to_alloc(hs, mask)) || 1);	\
> +		nr_nodes--)
> +
> +#define for_each_node_mask_to_free(hs, nr_nodes, node, mask)		\
> +	for (nr_nodes = nodes_weight(*mask);				\
> +		nr_nodes > 0 &&						\
> +		((node = hstate_next_node_to_free(hs, mask)) || 1);	\
> +		nr_nodes--)
> +
> +static int alloc_fresh_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
> +{
> +	struct page *page;
> +	int nr_nodes, node;
> +	int ret = 0;
> +
> +	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
> +		page = alloc_fresh_huge_page_node(h, node);
> +		if (page) {
> +			ret = 1;
> +			break;
> +		}
> +	}
> +
> +	if (ret)
> +		count_vm_event(HTLB_BUDDY_PGALLOC);
> +	else
> +		count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
> +
> +	return ret;
> +}
> +
>  /*
>   * Free huge page from pool from next node to free.
>   * Attempt to keep persistent huge pages more or less
> @@ -806,36 +813,31 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
>  static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>  							 bool acct_surplus)
>  {
> -	int start_nid;
> -	int next_nid;
> +	int nr_nodes, node;
>  	int ret = 0;
>
> -	start_nid = hstate_next_node_to_free(h, nodes_allowed);
> -	next_nid = start_nid;
> -
> -	do {
> +	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
>  		/*
>  		 * If we're returning unused surplus pages, only examine
>  		 * nodes with surplus pages.
>  		 */
> -		if ((!acct_surplus || h->surplus_huge_pages_node[next_nid]) &&
> -		    !list_empty(&h->hugepage_freelists[next_nid])) {
> +		if ((!acct_surplus || h->surplus_huge_pages_node[node]) &&
> +		    !list_empty(&h->hugepage_freelists[node])) {
>  			struct page *page =
> -				list_entry(h->hugepage_freelists[next_nid].next,
> +				list_entry(h->hugepage_freelists[node].next,
>  					  struct page, lru);
>  			list_del(&page->lru);
>  			h->free_huge_pages--;
> -			h->free_huge_pages_node[next_nid]--;
> +			h->free_huge_pages_node[node]--;
>  			if (acct_surplus) {
>  				h->surplus_huge_pages--;
> -				h->surplus_huge_pages_node[next_nid]--;
> +				h->surplus_huge_pages_node[node]--;
>  			}
>  			update_and_free_page(h, page);
>  			ret = 1;
>  			break;
>  		}
> -		next_nid = hstate_next_node_to_free(h, nodes_allowed);
> -	} while (next_nid != start_nid);
> +	}
>
>  	return ret;
>  }
> @@ -1172,14 +1174,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  int __weak alloc_bootmem_huge_page(struct hstate *h)
>  {
>  	struct huge_bootmem_page *m;
> -	int nr_nodes = nodes_weight(node_states[N_MEMORY]);
> +	int nr_nodes, node;
>
> -	while (nr_nodes) {
> +	for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
>  		void *addr;
>
> -		addr = __alloc_bootmem_node_nopanic(
> -				NODE_DATA(hstate_next_node_to_alloc(h,
> -						&node_states[N_MEMORY])),
> +		addr = __alloc_bootmem_node_nopanic(NODE_DATA(node),
>  				huge_page_size(h), huge_page_size(h), 0);
>
>  		if (addr) {
> @@ -1191,7 +1191,6 @@ int __weak alloc_bootmem_huge_page(struct hstate *h)
>  			m = addr;
>  			goto found;
>  		}
> -		nr_nodes--;
>  	}
>  	return 0;
>
> @@ -1330,48 +1329,28 @@ static inline void try_to_free_low(struct hstate *h, unsigned long count,
>  static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
>  				int delta)
>  {
> -	int start_nid, next_nid;
> -	int ret = 0;
> +	int nr_nodes, node;
>
>  	VM_BUG_ON(delta != -1 && delta != 1);
>
> -	if (delta < 0)
> -		start_nid = hstate_next_node_to_alloc(h, nodes_allowed);
> -	else
> -		start_nid = hstate_next_node_to_free(h, nodes_allowed);
> -	next_nid = start_nid;
> -
> -	do {
> -		int nid = next_nid;
> -		if (delta < 0)  {
> -			/*
> -			 * To shrink on this node, there must be a surplus page
> -			 */
> -			if (!h->surplus_huge_pages_node[nid]) {
> -				next_nid = hstate_next_node_to_alloc(h,
> -								nodes_allowed);
> -				continue;
> -			}
> +	if (delta < 0) {
> +		for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
> +			if (h->surplus_huge_pages_node[node])
> +				goto found;
>  		}
> -		if (delta > 0) {
> -			/*
> -			 * Surplus cannot exceed the total number of pages
> -			 */
> -			if (h->surplus_huge_pages_node[nid] >=
> -						h->nr_huge_pages_node[nid]) {
> -				next_nid = hstate_next_node_to_free(h,
> -								nodes_allowed);
> -				continue;
> -			}
> +	} else {
> +		for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
> +			if (h->surplus_huge_pages_node[node] <
> +					h->nr_huge_pages_node[node])
> +				goto found;
>  		}
> +	}
> +	return 0;
>
> -		h->surplus_huge_pages += delta;
> -		h->surplus_huge_pages_node[nid] += delta;
> -		ret = 1;
> -		break;
> -	} while (next_nid != start_nid);
> -
> -	return ret;
> +found:
> +	h->surplus_huge_pages += delta;
> +	h->surplus_huge_pages_node[node] += delta;
> +	return 1;
>  }
>
>  #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages)
> -- 
> 1.7.9.5



* Re: [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix
  2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
                   ` (9 preceding siblings ...)
  2013-07-22  8:36 ` [PATCH v2 10/10] mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cache Joonsoo Kim
@ 2013-07-22 15:51 ` Aneesh Kumar K.V
  2013-07-23  7:31   ` Joonsoo Kim
  10 siblings, 1 reply; 25+ messages in thread
From: Aneesh Kumar K.V @ 2013-07-22 15:51 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Mel Gorman, Michal Hocko, KAMEZAWA Hiroyuki,
	Hugh Dickins, Davidlohr Bueso, David Gibson, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:

> The first 6 patches are almost trivial clean-up patches.
>
> The others fix three bugs.
> These problems are probably minor, since this code has been used
> for a long time and there have been no bug reports about them.
>
> These patches are based on v3.10.0 and
> passed the libhugetlbfs test suite.

Please also add the new tests you have as part of this patch series to
the test suite.

-aneesh



* Re: [PATCH v2 02/10] mm, hugetlb: remove err label in dequeue_huge_page_vma()
  2013-07-22  8:36 ` [PATCH v2 02/10] mm, hugetlb: remove err label in dequeue_huge_page_vma() Joonsoo Kim
@ 2013-07-22 16:11   ` Michal Hocko
  2013-07-23  7:27     ` Joonsoo Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2013-07-22 16:11 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim

On Mon 22-07-13 17:36:23, Joonsoo Kim wrote:
> This label is no longer needed, because there is no error handling
> except returning NULL.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fc4988c..d87f70b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -546,11 +546,11 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>  	 */
>  	if (!vma_has_reserves(vma) &&
>  			h->free_huge_pages - h->resv_huge_pages == 0)
> -		goto err;
> +		return NULL;
>  
>  	/* If reserves cannot be used, ensure enough pages are in the pool */
>  	if (avoid_reserve && h->free_huge_pages - h->resv_huge_pages == 0)
> -		goto err;
> +		return NULL;
>  
>  retry_cpuset:
>  	cpuset_mems_cookie = get_mems_allowed();
> @@ -573,9 +573,6 @@ retry_cpuset:
>  	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
>  		goto retry_cpuset;
>  	return page;
> -
> -err:
> -	return NULL;

This doesn't give us anything IMO. It is a matter of taste but if there
is no cleanup I would prefer no err label.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 05/10] mm, hugetlb: fix and clean-up node iteration code to alloc or free
  2013-07-22  8:36 ` [PATCH v2 05/10] mm, hugetlb: fix and clean-up node iteration code to alloc or free Joonsoo Kim
  2013-07-22 15:50   ` Aneesh Kumar K.V
@ 2013-07-22 16:23   ` Michal Hocko
  2013-07-23  1:05     ` Aneesh Kumar K.V
  1 sibling, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2013-07-22 16:23 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim

On Mon 22-07-13 17:36:26, Joonsoo Kim wrote:
> The current node iteration code has a minor problem: it does one extra
> node rotation if the allocation fails. For example, if we start
> allocating at node 0, we stop iterating at node 0, but the next
> allocation then starts at node 1.
>
> Introduce new macros "for_each_node_mask_to_[alloc|free]" and use them
> to fix and clean up the node iteration code for allocating and freeing.
> This makes the code easier to understand.

I don't know, but it feels like you are trying to fix an awkward
interface with another one. Why can't hstate_next_node_to_alloc simply
return MAX_NUMNODES once the loop is done and start from first_node the
next time it is called? We wouldn't have the bug you are mentioning and
you would not need scary-looking macros.
 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 83edd17..3ac0a6f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -752,33 +752,6 @@ static int hstate_next_node_to_alloc(struct hstate *h,
>  	return nid;
>  }
>  
> -static int alloc_fresh_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
> -{
> -	struct page *page;
> -	int start_nid;
> -	int next_nid;
> -	int ret = 0;
> -
> -	start_nid = hstate_next_node_to_alloc(h, nodes_allowed);
> -	next_nid = start_nid;
> -
> -	do {
> -		page = alloc_fresh_huge_page_node(h, next_nid);
> -		if (page) {
> -			ret = 1;
> -			break;
> -		}
> -		next_nid = hstate_next_node_to_alloc(h, nodes_allowed);
> -	} while (next_nid != start_nid);
> -
> -	if (ret)
> -		count_vm_event(HTLB_BUDDY_PGALLOC);
> -	else
> -		count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
> -
> -	return ret;
> -}
> -
>  /*
>   * helper for free_pool_huge_page() - return the previously saved
>   * node ["this node"] from which to free a huge page.  Advance the
> @@ -797,6 +770,40 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
>  	return nid;
>  }
>  
> +#define for_each_node_mask_to_alloc(hs, nr_nodes, node, mask)		\
> +	for (nr_nodes = nodes_weight(*mask);				\
> +		nr_nodes > 0 &&						\
> +		((node = hstate_next_node_to_alloc(hs, mask)) || 1);	\
> +		nr_nodes--)
> +
> +#define for_each_node_mask_to_free(hs, nr_nodes, node, mask)		\
> +	for (nr_nodes = nodes_weight(*mask);				\
> +		nr_nodes > 0 &&						\
> +		((node = hstate_next_node_to_free(hs, mask)) || 1);	\
> +		nr_nodes--)
> +
> +static int alloc_fresh_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
> +{
> +	struct page *page;
> +	int nr_nodes, node;
> +	int ret = 0;
> +
> +	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
> +		page = alloc_fresh_huge_page_node(h, node);
> +		if (page) {
> +			ret = 1;
> +			break;
> +		}
> +	}
> +
> +	if (ret)
> +		count_vm_event(HTLB_BUDDY_PGALLOC);
> +	else
> +		count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
> +
> +	return ret;
> +}
> +
>  /*
>   * Free huge page from pool from next node to free.
>   * Attempt to keep persistent huge pages more or less
> @@ -806,36 +813,31 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
>  static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>  							 bool acct_surplus)
>  {
> -	int start_nid;
> -	int next_nid;
> +	int nr_nodes, node;
>  	int ret = 0;
>  
> -	start_nid = hstate_next_node_to_free(h, nodes_allowed);
> -	next_nid = start_nid;
> -
> -	do {
> +	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
>  		/*
>  		 * If we're returning unused surplus pages, only examine
>  		 * nodes with surplus pages.
>  		 */
> -		if ((!acct_surplus || h->surplus_huge_pages_node[next_nid]) &&
> -		    !list_empty(&h->hugepage_freelists[next_nid])) {
> +		if ((!acct_surplus || h->surplus_huge_pages_node[node]) &&
> +		    !list_empty(&h->hugepage_freelists[node])) {
>  			struct page *page =
> -				list_entry(h->hugepage_freelists[next_nid].next,
> +				list_entry(h->hugepage_freelists[node].next,
>  					  struct page, lru);
>  			list_del(&page->lru);
>  			h->free_huge_pages--;
> -			h->free_huge_pages_node[next_nid]--;
> +			h->free_huge_pages_node[node]--;
>  			if (acct_surplus) {
>  				h->surplus_huge_pages--;
> -				h->surplus_huge_pages_node[next_nid]--;
> +				h->surplus_huge_pages_node[node]--;
>  			}
>  			update_and_free_page(h, page);
>  			ret = 1;
>  			break;
>  		}
> -		next_nid = hstate_next_node_to_free(h, nodes_allowed);
> -	} while (next_nid != start_nid);
> +	}
>  
>  	return ret;
>  }
> @@ -1172,14 +1174,12 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  int __weak alloc_bootmem_huge_page(struct hstate *h)
>  {
>  	struct huge_bootmem_page *m;
> -	int nr_nodes = nodes_weight(node_states[N_MEMORY]);
> +	int nr_nodes, node;
>  
> -	while (nr_nodes) {
> +	for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
>  		void *addr;
>  
> -		addr = __alloc_bootmem_node_nopanic(
> -				NODE_DATA(hstate_next_node_to_alloc(h,
> -						&node_states[N_MEMORY])),
> +		addr = __alloc_bootmem_node_nopanic(NODE_DATA(node),
>  				huge_page_size(h), huge_page_size(h), 0);
>  
>  		if (addr) {
> @@ -1191,7 +1191,6 @@ int __weak alloc_bootmem_huge_page(struct hstate *h)
>  			m = addr;
>  			goto found;
>  		}
> -		nr_nodes--;
>  	}
>  	return 0;
>  
> @@ -1330,48 +1329,28 @@ static inline void try_to_free_low(struct hstate *h, unsigned long count,
>  static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
>  				int delta)
>  {
> -	int start_nid, next_nid;
> -	int ret = 0;
> +	int nr_nodes, node;
>  
>  	VM_BUG_ON(delta != -1 && delta != 1);
>  
> -	if (delta < 0)
> -		start_nid = hstate_next_node_to_alloc(h, nodes_allowed);
> -	else
> -		start_nid = hstate_next_node_to_free(h, nodes_allowed);
> -	next_nid = start_nid;
> -
> -	do {
> -		int nid = next_nid;
> -		if (delta < 0)  {
> -			/*
> -			 * To shrink on this node, there must be a surplus page
> -			 */
> -			if (!h->surplus_huge_pages_node[nid]) {
> -				next_nid = hstate_next_node_to_alloc(h,
> -								nodes_allowed);
> -				continue;
> -			}
> +	if (delta < 0) {
> +		for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
> +			if (h->surplus_huge_pages_node[node])
> +				goto found;
>  		}
> -		if (delta > 0) {
> -			/*
> -			 * Surplus cannot exceed the total number of pages
> -			 */
> -			if (h->surplus_huge_pages_node[nid] >=
> -						h->nr_huge_pages_node[nid]) {
> -				next_nid = hstate_next_node_to_free(h,
> -								nodes_allowed);
> -				continue;
> -			}
> +	} else {
> +		for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
> +			if (h->surplus_huge_pages_node[node] <
> +					h->nr_huge_pages_node[node])
> +				goto found;
>  		}
> +	}
> +	return 0;
>  
> -		h->surplus_huge_pages += delta;
> -		h->surplus_huge_pages_node[nid] += delta;
> -		ret = 1;
> -		break;
> -	} while (next_nid != start_nid);
> -
> -	return ret;
> +found:
> +	h->surplus_huge_pages += delta;
> +	h->surplus_huge_pages_node[node] += delta;
> +	return 1;
>  }
>  
>  #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages)
> -- 
> 1.7.9.5
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 05/10] mm, hugetlb: fix and clean-up node iteration code to alloc or free
  2013-07-22 16:23   ` Michal Hocko
@ 2013-07-23  1:05     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 25+ messages in thread
From: Aneesh Kumar K.V @ 2013-07-23  1:05 UTC (permalink / raw)
  To: Michal Hocko, Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki,
	Hugh Dickins, Davidlohr Bueso, David Gibson, linux-mm,
	linux-kernel, Joonsoo Kim

Michal Hocko <mhocko@suse.cz> writes:

> On Mon 22-07-13 17:36:26, Joonsoo Kim wrote:
>> The current node iteration code has a minor problem: it does one extra
>> node rotation if the allocation fails. For example, if we start
>> allocating at node 0, we stop iterating at node 0, but the next
>> allocation then starts at node 1.
>>
>> Introduce new macros "for_each_node_mask_to_[alloc|free]" and use them
>> to fix and clean up the node iteration code for allocating and freeing.
>> This makes the code easier to understand.
>
> I don't know but it feels like you are trying to fix an awkward
> interface with another one. Why hstate_next_node_to_alloc cannot simply
> return MAX_NUMNODES once the loop is done and start from first_node next
> time it is called? We wouldn't have the bug you are mentioning and you
> do not need scary looking macros.
>

Even though the macros look confusing, the changes do help the rest of
the code. For example, I liked how it made alloc simpler.

 +	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
 +		page = alloc_fresh_huge_page_node(h, node);

-aneesh



* Re: [PATCH v2 07/10] mm, hugetlb: do not use a page in page cache for cow optimization
  2013-07-22  8:36 ` [PATCH v2 07/10] mm, hugetlb: do not use a page in page cache for cow optimization Joonsoo Kim
@ 2013-07-23  4:26   ` Naoya Horiguchi
  2013-07-23 11:45   ` Michal Hocko
  1 sibling, 0 replies; 25+ messages in thread
From: Naoya Horiguchi @ 2013-07-23  4:26 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Michal Hocko,
	Aneesh Kumar K.V, KAMEZAWA Hiroyuki, Hugh Dickins,
	Davidlohr Bueso, David Gibson, linux-mm, linux-kernel,
	Joonsoo Kim

On Mon, Jul 22, 2013 at 05:36:28PM +0900, Joonsoo Kim wrote:
> Currently, we use a page in the page cache with map count 1 for the cow
> optimization. If we find this condition, we don't allocate a new page
> and copy the contents; instead, we map this page directly. This can
> introduce a problem where writing to a private mapping overwrites the
> hugetlb file directly. You can reproduce this situation with the
> following code.
> 
>         size = 20 * MB;
>         flag = MAP_SHARED;
>         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
>         if (p == MAP_FAILED) {
>                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
>                 return -1;
>         }
>         p[0] = 's';
>         fprintf(stdout, "BEFORE STEAL PRIVATE WRITE: %c\n", p[0]);
>         munmap(p, size);
> 
>         flag = MAP_PRIVATE;
>         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
>         if (p == MAP_FAILED) {
>                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
>         }
>         p[0] = 'c';
>         munmap(p, size);
> 
>         flag = MAP_SHARED;
>         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
>         if (p == MAP_FAILED) {
>                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
>                 return -1;
>         }
>         fprintf(stdout, "AFTER STEAL PRIVATE WRITE: %c\n", p[0]);
>         munmap(p, size);
> 
> We can see "AFTER STEAL PRIVATE WRITE: c", not "AFTER STEAL
> PRIVATE WRITE: s". If we turn off this optimization for a page
> in the page cache, the problem disappears.

Looks good to me. Thanks for the fix.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7ca8733..8a61638 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2508,7 +2508,6 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  {
>  	struct hstate *h = hstate_vma(vma);
>  	struct page *old_page, *new_page;
> -	int avoidcopy;
>  	int outside_reserve = 0;
>  	unsigned long mmun_start;	/* For mmu_notifiers */
>  	unsigned long mmun_end;		/* For mmu_notifiers */
> @@ -2518,10 +2517,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  retry_avoidcopy:
>  	/* If no-one else is actually using this page, avoid the copy
>  	 * and just make the page writable */
> -	avoidcopy = (page_mapcount(old_page) == 1);
> -	if (avoidcopy) {
> -		if (PageAnon(old_page))
> -			page_move_anon_rmap(old_page, vma, address);
> +	if (page_mapcount(old_page) == 1 && PageAnon(old_page)) {
> +		page_move_anon_rmap(old_page, vma, address);
>  		set_huge_ptep_writable(vma, address, ptep);
>  		return 0;
>  	}
> -- 
> 1.7.9.5
> 
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 02/10] mm, hugetlb: remove err label in dequeue_huge_page_vma()
  2013-07-22 16:11   ` Michal Hocko
@ 2013-07-23  7:27     ` Joonsoo Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-23  7:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel

On Mon, Jul 22, 2013 at 06:11:11PM +0200, Michal Hocko wrote:
> On Mon 22-07-13 17:36:23, Joonsoo Kim wrote:
> > This label is not needed now, because there is no error handling
> > except returing NULL.
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index fc4988c..d87f70b 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -546,11 +546,11 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
> >  	 */
> >  	if (!vma_has_reserves(vma) &&
> >  			h->free_huge_pages - h->resv_huge_pages == 0)
> > -		goto err;
> > +		return NULL;
> >  
> >  	/* If reserves cannot be used, ensure enough pages are in the pool */
> >  	if (avoid_reserve && h->free_huge_pages - h->resv_huge_pages == 0)
> > -		goto err;
> > +		return NULL;
> >  
> >  retry_cpuset:
> >  	cpuset_mems_cookie = get_mems_allowed();
> > @@ -573,9 +573,6 @@ retry_cpuset:
> >  	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
> >  		goto retry_cpuset;
> >  	return page;
> > -
> > -err:
> > -	return NULL;
> 
> This doesn't give us anything IMO. It is a matter of taste but if there
> is no cleanup I would prefer no err label.

Okay. If this patchset needs a respin, I will omit this one.

Thanks.

> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix
  2013-07-22 15:51 ` [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Aneesh Kumar K.V
@ 2013-07-23  7:31   ` Joonsoo Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-23  7:31 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Michal Hocko,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel

On Mon, Jul 22, 2013 at 09:21:38PM +0530, Aneesh Kumar K.V wrote:
> Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
> 
> > First 6 patches are almost trivial clean-up patches.
> >
> > The others are for fixing three bugs.
> > Perhaps, these problems are minor, because this codes are used
> > for a long time, and there is no bug reporting for these problems.
> >
> > These patches are based on v3.10.0 and
> > passed the libhugetlbfs test suite.
> 
> Please also add the new tests you have as part of this patch series to
> the test suite.

Okay.

Thanks for reviewing this patchset.

> 
> -aneesh
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 07/10] mm, hugetlb: do not use a page in page cache for cow optimization
  2013-07-22  8:36 ` [PATCH v2 07/10] mm, hugetlb: do not use a page in page cache for cow optimization Joonsoo Kim
  2013-07-23  4:26   ` Naoya Horiguchi
@ 2013-07-23 11:45   ` Michal Hocko
  2013-07-24  8:51     ` Joonsoo Kim
  1 sibling, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2013-07-23 11:45 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim

On Mon 22-07-13 17:36:28, Joonsoo Kim wrote:
> Currently, we use a page in the page cache with a map count of 1 for the
> cow optimization. If we find this condition, we don't allocate a new
> page and copy the contents; instead, we map this page directly. This may
> introduce a problem: a write to a private mapping overwrites the hugetlb
> file directly. You can reproduce this situation with the following code.
> 
>         size = 20 * MB;
>         flag = MAP_SHARED;
>         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
>         if (p == MAP_FAILED) {
>                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
>                 return -1;
>         }
>         p[0] = 's';
>         fprintf(stdout, "BEFORE STEAL PRIVATE WRITE: %c\n", p[0]);
>         munmap(p, size);
> 
>         flag = MAP_PRIVATE;
>         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
>         if (p == MAP_FAILED) {
>                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
>         }
>         p[0] = 'c';
>         munmap(p, size);
> 
>         flag = MAP_SHARED;
>         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
>         if (p == MAP_FAILED) {
>                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
>                 return -1;
>         }
>         fprintf(stdout, "AFTER STEAL PRIVATE WRITE: %c\n", p[0]);
>         munmap(p, size);
> 
> We see "AFTER STEAL PRIVATE WRITE: c", not "AFTER STEAL PRIVATE
> WRITE: s". If we turn off this optimization for a page in the page
> cache, the problem disappears.

It would be nice to describe the fix here as well. It is far from being
intuitive and trivial.

The fix seems to be correct.

> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Michal Hocko <mhocko@suse.cz>

> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7ca8733..8a61638 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2508,7 +2508,6 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  {
>  	struct hstate *h = hstate_vma(vma);
>  	struct page *old_page, *new_page;
> -	int avoidcopy;
>  	int outside_reserve = 0;
>  	unsigned long mmun_start;	/* For mmu_notifiers */
>  	unsigned long mmun_end;		/* For mmu_notifiers */
> @@ -2518,10 +2517,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  retry_avoidcopy:
>  	/* If no-one else is actually using this page, avoid the copy
>  	 * and just make the page writable */
> -	avoidcopy = (page_mapcount(old_page) == 1);
> -	if (avoidcopy) {
> -		if (PageAnon(old_page))
> -			page_move_anon_rmap(old_page, vma, address);
> +	if (page_mapcount(old_page) == 1 && PageAnon(old_page)) {
> +		page_move_anon_rmap(old_page, vma, address);
>  		set_huge_ptep_writable(vma, address, ptep);
>  		return 0;
>  	}
> -- 
> 1.7.9.5
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/10] mm, hugetlb: add VM_NORESERVE check in vma_has_reserves()
  2013-07-22  8:36 ` [PATCH v2 08/10] mm, hugetlb: add VM_NORESERVE check in vma_has_reserves() Joonsoo Kim
@ 2013-07-23 13:23   ` Michal Hocko
  0 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2013-07-23 13:23 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim

On Mon 22-07-13 17:36:29, Joonsoo Kim wrote:
> If we map a region with MAP_NORESERVE and MAP_SHARED, the reserve
> counting check is skipped, so we are not guaranteed to be able to
> allocate a huge page at fault time. With the following example code,
> you can easily reproduce this situation.
> 
> Assume a 2MB huge page size and nr_hugepages = 100
> 
>         fd = hugetlbfs_unlinked_fd();
>         if (fd < 0)
>                 return 1;
> 
>         size = 200 * MB;
>         flag = MAP_SHARED;
>         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
>         if (p == MAP_FAILED) {
>                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
>                 return -1;
>         }
> 
>         size = 2 * MB;
>         flag = MAP_ANONYMOUS | MAP_SHARED | MAP_HUGETLB | MAP_NORESERVE;
>         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, -1, 0);
>         if (p == MAP_FAILED) {
>                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
>         }
>         p[0] = '0';
>         sleep(10);
> 
> While sleep(10) is executing, run 'cat /proc/meminfo' from another
> process. You'll see the problem mentioned above.
> 
> A non-reserved shared mapping should not eat into the reserve space, so
> return an error when we don't find enough free space.

I guess I understand what you are trying to say, but the changelog is
really hard to read. Please be explicit about what the expected result
of the test is and what the fix is doing. The reservation code is
quite complex already.
 
> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8a61638..87e73bd 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -464,6 +464,8 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
>  /* Returns true if the VMA has associated reserve pages */
>  static int vma_has_reserves(struct vm_area_struct *vma)
>  {
> +	if (vma->vm_flags & VM_NORESERVE)
> +		return 0;
>  	if (vma->vm_flags & VM_MAYSHARE)
>  		return 1;
>  	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER))
> -- 
> 1.7.9.5
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 10/10] mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cache
  2013-07-22  8:36 ` [PATCH v2 10/10] mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cache Joonsoo Kim
@ 2013-07-23 13:37   ` Michal Hocko
  0 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2013-07-23 13:37 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel, Joonsoo Kim

On Mon 22-07-13 17:36:31, Joonsoo Kim wrote:
> If a vma with VM_NORESERVE allocates a new page for the page cache, we
> should check whether this area is reserved or not. If this address is
> already reserved by another process (the chg == 0 case), we should
> decrement the reserve count, because the allocated page will go into the
> page cache and there is currently no way to tell, when the inode is
> released, whether it came from the reserved pool. This may introduce an
> over-counting problem in the reserve count. With the following example
> code, you can easily reproduce this situation.
> 
>         size = 20 * MB;
>         flag = MAP_SHARED;
>         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
>         if (p == MAP_FAILED) {
>                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
>                 return -1;
>         }
> 
>         flag = MAP_SHARED | MAP_NORESERVE;
>         q = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
>         if (q == MAP_FAILED) {
>                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
>         }
>         q[0] = 'c';
> 
> This patch solves this problem.

Again, please describe _how_ it solves the problem.

> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 2ea6afd..6782b41 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -443,10 +443,23 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
>  }
>  
>  /* Returns true if the VMA has associated reserve pages */
> -static int vma_has_reserves(struct vm_area_struct *vma)
> +static int vma_has_reserves(struct vm_area_struct *vma, long chg)
>  {
> -	if (vma->vm_flags & VM_NORESERVE)
> -		return 0;
> +	if (vma->vm_flags & VM_NORESERVE) {
> +		/*
> +		 * This address is already reserved by other process(chg == 0),
> +		 * so, we should decreament reserved count. Without
> +		 * decreamenting, reserve count is remained after releasing
> +		 * inode, because this allocated page will go into page cache
> +		 * and is regarded as coming from reserved pool in releasing
> +		 * step. Currently, we don't have any other solution to deal
> +		 * with this situation properly, so add work-around here.
> +		 */
> +		if (vma->vm_flags & VM_MAYSHARE && chg == 0)
> +			return 1;
> +		else
> +			return 0;
> +	}
>  
>  	/* Shared mappings always use reserves */
>  	if (vma->vm_flags & VM_MAYSHARE)
> @@ -520,7 +533,8 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
>  
>  static struct page *dequeue_huge_page_vma(struct hstate *h,
>  				struct vm_area_struct *vma,
> -				unsigned long address, int avoid_reserve)
> +				unsigned long address, int avoid_reserve,
> +				long chg)
>  {
>  	struct page *page = NULL;
>  	struct mempolicy *mpol;
> @@ -535,7 +549,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>  	 * have no page reserves. This check ensures that reservations are
>  	 * not "stolen". The child may still get SIGKILLed
>  	 */
> -	if (!vma_has_reserves(vma) &&
> +	if (!vma_has_reserves(vma, chg) &&
>  			h->free_huge_pages - h->resv_huge_pages == 0)
>  		return NULL;
>  
> @@ -553,8 +567,12 @@ retry_cpuset:
>  		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
>  			page = dequeue_huge_page_node(h, zone_to_nid(zone));
>  			if (page) {
> -				if (!avoid_reserve && vma_has_reserves(vma))
> -					h->resv_huge_pages--;
> +				if (avoid_reserve)
> +					break;
> +				if (!vma_has_reserves(vma, chg))
> +					break;
> +
> +				h->resv_huge_pages--;
>  				break;
>  			}
>  		}
> @@ -1135,7 +1153,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  		return ERR_PTR(-ENOSPC);
>  	}
>  	spin_lock(&hugetlb_lock);
> -	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
> +	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, chg);
>  	if (!page) {
>  		spin_unlock(&hugetlb_lock);
>  		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
> -- 
> 1.7.9.5
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 07/10] mm, hugetlb: do not use a page in page cache for cow optimization
  2013-07-23 11:45   ` Michal Hocko
@ 2013-07-24  8:51     ` Joonsoo Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-24  8:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Aneesh Kumar K.V,
	KAMEZAWA Hiroyuki, Hugh Dickins, Davidlohr Bueso, David Gibson,
	linux-mm, linux-kernel

On Tue, Jul 23, 2013 at 01:45:50PM +0200, Michal Hocko wrote:
> On Mon 22-07-13 17:36:28, Joonsoo Kim wrote:
> > Currently, we use a page in the page cache with a map count of 1 for the
> > cow optimization. If we find this condition, we don't allocate a new
> > page and copy the contents; instead, we map this page directly. This may
> > introduce a problem: a write to a private mapping overwrites the hugetlb
> > file directly. You can reproduce this situation with the following code.
> > 
> >         size = 20 * MB;
> >         flag = MAP_SHARED;
> >         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
> >         if (p == MAP_FAILED) {
> >                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
> >                 return -1;
> >         }
> >         p[0] = 's';
> >         fprintf(stdout, "BEFORE STEAL PRIVATE WRITE: %c\n", p[0]);
> >         munmap(p, size);
> > 
> >         flag = MAP_PRIVATE;
> >         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
> >         if (p == MAP_FAILED) {
> >                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
> >         }
> >         p[0] = 'c';
> >         munmap(p, size);
> > 
> >         flag = MAP_SHARED;
> >         p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
> >         if (p == MAP_FAILED) {
> >                 fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
> >                 return -1;
> >         }
> >         fprintf(stdout, "AFTER STEAL PRIVATE WRITE: %c\n", p[0]);
> >         munmap(p, size);
> > 
> > We see "AFTER STEAL PRIVATE WRITE: c", not "AFTER STEAL PRIVATE
> > WRITE: s". If we turn off this optimization for a page in the page
> > cache, the problem disappears.
> 
> It would be nice to describe the fix here as well. It is far from being
> intuitive and trivial.

Okay. I will describe how the problem is fixed in all of the patches you pointed out.

Thanks for reviewing!

> 
> The fix seems to be correct.
> 
> > Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.cz>
> 
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 7ca8733..8a61638 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2508,7 +2508,6 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
> >  {
> >  	struct hstate *h = hstate_vma(vma);
> >  	struct page *old_page, *new_page;
> > -	int avoidcopy;
> >  	int outside_reserve = 0;
> >  	unsigned long mmun_start;	/* For mmu_notifiers */
> >  	unsigned long mmun_end;		/* For mmu_notifiers */
> > @@ -2518,10 +2517,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
> >  retry_avoidcopy:
> >  	/* If no-one else is actually using this page, avoid the copy
> >  	 * and just make the page writable */
> > -	avoidcopy = (page_mapcount(old_page) == 1);
> > -	if (avoidcopy) {
> > -		if (PageAnon(old_page))
> > -			page_move_anon_rmap(old_page, vma, address);
> > +	if (page_mapcount(old_page) == 1 && PageAnon(old_page)) {
> > +		page_move_anon_rmap(old_page, vma, address);
> >  		set_huge_ptep_writable(vma, address, ptep);
> >  		return 0;
> >  	}
> > -- 
> > 1.7.9.5
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 03/10] mm, hugetlb: trivial commenting fix
       [not found]   ` <51ef274e.0605e00a.246a.ffffa5c9SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2013-07-24  8:52     ` Joonsoo Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Joonsoo Kim @ 2013-07-24  8:52 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Rik van Riel, Mel Gorman, Michal Hocko,
	Aneesh Kumar K.V, KAMEZAWA Hiroyuki, Hugh Dickins,
	Davidlohr Bueso, David Gibson, linux-mm, linux-kernel

On Wed, Jul 24, 2013 at 09:00:41AM +0800, Wanpeng Li wrote:
> On Mon, Jul 22, 2013 at 05:36:24PM +0900, Joonsoo Kim wrote:
> >The name of the mutex written in the comment is wrong.
> >Fix it.
> >
> >Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >Acked-by: Hillf Danton <dhillf@gmail.com>
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> >diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >index d87f70b..d21a33a 100644
> >--- a/mm/hugetlb.c
> >+++ b/mm/hugetlb.c
> >@@ -135,9 +135,9 @@ static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
> >  *                    across the pages in a mapping.
> >  *
> >  * The region data structures are protected by a combination of the mmap_sem
> >- * and the hugetlb_instantion_mutex.  To access or modify a region the caller
> >+ * and the hugetlb_instantiation_mutex.  To access or modify a region the caller
> >  * must either hold the mmap_sem for write, or the mmap_sem for read and
> >- * the hugetlb_instantiation mutex:
> >+ * the hugetlb_instantiation_mutex:
> 
> What changed?

hugetlb_instantiation_mutex
                     ^ here!

Thanks for the review!

> 
> >  *
> >  *	down_write(&mm->mmap_sem);
> >  * or
> >-- 
> >1.7.9.5
> >
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-07-24  8:52 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-22  8:36 [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Joonsoo Kim
2013-07-22  8:36 ` [PATCH v2 01/10] mm, hugetlb: move up the code which check availability of free huge page Joonsoo Kim
2013-07-22 15:45   ` Aneesh Kumar K.V
2013-07-22  8:36 ` [PATCH v2 02/10] mm, hugetlb: remove err label in dequeue_huge_page_vma() Joonsoo Kim
2013-07-22 16:11   ` Michal Hocko
2013-07-23  7:27     ` Joonsoo Kim
2013-07-22  8:36 ` [PATCH v2 03/10] mm, hugetlb: trivial commenting fix Joonsoo Kim
     [not found]   ` <51ef274e.0605e00a.246a.ffffa5c9SMTPIN_ADDED_BROKEN@mx.google.com>
2013-07-24  8:52     ` Joonsoo Kim
2013-07-22  8:36 ` [PATCH v2 04/10] mm, hugetlb: clean-up alloc_huge_page() Joonsoo Kim
2013-07-22  8:36 ` [PATCH v2 05/10] mm, hugetlb: fix and clean-up node iteration code to alloc or free Joonsoo Kim
2013-07-22 15:50   ` Aneesh Kumar K.V
2013-07-22 16:23   ` Michal Hocko
2013-07-23  1:05     ` Aneesh Kumar K.V
2013-07-22  8:36 ` [PATCH v2 06/10] mm, hugetlb: remove redundant list_empty check in gather_surplus_pages() Joonsoo Kim
2013-07-22  8:36 ` [PATCH v2 07/10] mm, hugetlb: do not use a page in page cache for cow optimization Joonsoo Kim
2013-07-23  4:26   ` Naoya Horiguchi
2013-07-23 11:45   ` Michal Hocko
2013-07-24  8:51     ` Joonsoo Kim
2013-07-22  8:36 ` [PATCH v2 08/10] mm, hugetlb: add VM_NORESERVE check in vma_has_reserves() Joonsoo Kim
2013-07-23 13:23   ` Michal Hocko
2013-07-22  8:36 ` [PATCH v2 09/10] mm, hugetlb: remove decrement_hugepage_resv_vma() Joonsoo Kim
2013-07-22  8:36 ` [PATCH v2 10/10] mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cache Joonsoo Kim
2013-07-23 13:37   ` Michal Hocko
2013-07-22 15:51 ` [PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix Aneesh Kumar K.V
2013-07-23  7:31   ` Joonsoo Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).