* [PATCH 0/2] mm: fix the "counter.sh" failure for libhugetlbfs
@ 2016-11-03  2:51 ` Huang Shijie
  0 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-03  2:51 UTC (permalink / raw)
  To: akpm, catalin.marinas
  Cc: n-horiguchi, mhocko, kirill.shutemov, aneesh.kumar,
	gerald.schaefer, mike.kravetz, linux-mm, will.deacon,
	steve.capper, kaly.xin, nd, linux-arm-kernel, Huang Shijie

(1) Background
   On arm64, the hugetlb page size can be 32M (PMD + Contiguous bit).
   With 4K base pages, the maximum buddy allocator order is 10
   (MAX_ORDER - 1), so a 32M page is a gigantic page.

   The arm64 MMU supports a Contiguous bit, which is a hint that the
   translation table entry (TTE) is one of a set of contiguous entries
   that can be cached in a single TLB entry.  Please refer to the ARMv8
   Architecture Reference Manual:
       DDI0487A_f_armv8_arm.pdf (page D4-1811)
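
   As a quick sanity check of the order arithmetic (a sketch added for
   illustration; MAX_ORDER, huge_page_order() and hstate_is_gigantic()
   are the kernel's names, and the numbers assume a 4K base page):

       /* A 32M huge page built from 4K base pages:
        *   order = log2(32M / 4K) = log2(8192) = 13
        * The buddy allocator only serves orders up to MAX_ORDER - 1
        * (10 here), and hugetlb classifies anything beyond that as
        * gigantic:
        *   hstate_is_gigantic(h) := huge_page_order(h) >= MAX_ORDER
        */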

(2) The bug
   While testing libhugetlbfs, I found that the test case "counter.sh"
   fails with gigantic pages (32M pages on an arm64 board).

   This patch set adds support for gigantic surplus hugetlb pages,
   allowing the counter.sh unit test to pass.


Huang Shijie (2):
  mm: hugetlb: rename some allocation functions
  mm: hugetlb: support gigantic surplus pages

 mm/hugetlb.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/2] mm: hugetlb: rename some allocation functions
  2016-11-03  2:51 ` Huang Shijie
@ 2016-11-03  2:51   ` Huang Shijie
  -1 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-03  2:51 UTC (permalink / raw)
  To: akpm, catalin.marinas
  Cc: n-horiguchi, mhocko, kirill.shutemov, aneesh.kumar,
	gerald.schaefer, mike.kravetz, linux-mm, will.deacon,
	steve.capper, kaly.xin, nd, linux-arm-kernel, Huang Shijie

After a later patch in this series, __alloc_buddy_huge_page() will not
necessarily use the buddy allocator anymore.

So this patch drops "buddy" from the names of these functions:
	__alloc_buddy_huge_page -> __alloc_huge_page
	__alloc_buddy_huge_page_no_mpol -> __alloc_huge_page_no_mpol
	__alloc_buddy_huge_page_with_mpol -> __alloc_huge_page_with_mpol

This is a preparation patch; it introduces no functional change.

Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
---
 mm/hugetlb.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 32594f1..0bf4444 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1568,7 +1568,7 @@ static struct page *__hugetlb_alloc_buddy_huge_page(struct hstate *h,
  * For (2), we ignore 'vma' and 'addr' and use 'nid' exclusively. This
  * implies that memory policies will not be taken in to account.
  */
-static struct page *__alloc_buddy_huge_page(struct hstate *h,
+static struct page *__alloc_huge_page(struct hstate *h,
 		struct vm_area_struct *vma, unsigned long addr, int nid)
 {
 	struct page *page;
@@ -1649,21 +1649,21 @@ static struct page *__alloc_buddy_huge_page(struct hstate *h,
  * anywhere.
  */
 static
-struct page *__alloc_buddy_huge_page_no_mpol(struct hstate *h, int nid)
+struct page *__alloc_huge_page_no_mpol(struct hstate *h, int nid)
 {
 	unsigned long addr = -1;
 
-	return __alloc_buddy_huge_page(h, NULL, addr, nid);
+	return __alloc_huge_page(h, NULL, addr, nid);
 }
 
 /*
  * Use the VMA's mpolicy to allocate a huge page from the buddy.
  */
 static
-struct page *__alloc_buddy_huge_page_with_mpol(struct hstate *h,
+struct page *__alloc_huge_page_with_mpol(struct hstate *h,
 		struct vm_area_struct *vma, unsigned long addr)
 {
-	return __alloc_buddy_huge_page(h, vma, addr, NUMA_NO_NODE);
+	return __alloc_huge_page(h, vma, addr, NUMA_NO_NODE);
 }
 
 /*
@@ -1681,7 +1681,7 @@ struct page *alloc_huge_page_node(struct hstate *h, int nid)
 	spin_unlock(&hugetlb_lock);
 
 	if (!page)
-		page = __alloc_buddy_huge_page_no_mpol(h, nid);
+		page = __alloc_huge_page_no_mpol(h, nid);
 
 	return page;
 }
@@ -1711,7 +1711,7 @@ static int gather_surplus_pages(struct hstate *h, int delta)
 retry:
 	spin_unlock(&hugetlb_lock);
 	for (i = 0; i < needed; i++) {
-		page = __alloc_buddy_huge_page_no_mpol(h, NUMA_NO_NODE);
+		page = __alloc_huge_page_no_mpol(h, NUMA_NO_NODE);
 		if (!page) {
 			alloc_ok = false;
 			break;
@@ -1963,7 +1963,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, gbl_chg);
 	if (!page) {
 		spin_unlock(&hugetlb_lock);
-		page = __alloc_buddy_huge_page_with_mpol(h, vma, addr);
+		page = __alloc_huge_page_with_mpol(h, vma, addr);
 		if (!page)
 			goto out_uncharge_cgroup;
 		if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
@@ -2221,7 +2221,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
 	 * First take pages out of surplus state.  Then make up the
 	 * remaining difference by allocating fresh huge pages.
 	 *
-	 * We might race with __alloc_buddy_huge_page() here and be unable
+	 * We might race with __alloc_huge_page() here and be unable
 	 * to convert a surplus huge page to a normal huge page. That is
 	 * not critical, though, it just means the overall size of the
 	 * pool might be one hugepage larger than it needs to be, but
@@ -2267,7 +2267,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
 	 * By placing pages into the surplus state independent of the
 	 * overcommit value, we are allowing the surplus pool size to
 	 * exceed overcommit. There are few sane options here. Since
-	 * __alloc_buddy_huge_page() is checking the global counter,
+	 * __alloc_huge_page() is checking the global counter,
 	 * though, we'll note that we're not allowed to exceed surplus
 	 * and won't grow the pool anywhere else. Not until one of the
 	 * sysctls are changed, or the surplus pages go out of use.
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-03  2:51 ` Huang Shijie
@ 2016-11-03  2:51   ` Huang Shijie
  -1 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-03  2:51 UTC (permalink / raw)
  To: akpm, catalin.marinas
  Cc: n-horiguchi, mhocko, kirill.shutemov, aneesh.kumar,
	gerald.schaefer, mike.kravetz, linux-mm, will.deacon,
	steve.capper, kaly.xin, nd, linux-arm-kernel, Huang Shijie

When testing gigantic pages, whose order is too large for the buddy
allocator, the libhugetlbfs test case "counter.sh" fails.

The failure has two causes:
 1) The kernel fails to allocate a gigantic page for the surplus case,
    so gather_surplus_pages() eventually returns NULL.

 2) The condition checks for "over-commit" are wrong.

This patch adds code to allocate gigantic pages in __alloc_huge_page().
After this patch, gather_surplus_pages() can return a gigantic page
for the surplus case.

This patch also changes the condition checks in:
     return_unused_surplus_pages()
     nr_overcommit_hugepages_store()

After this patch, counter.sh passes for gigantic pages.
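
For illustration, here is a minimal userspace sketch of the scenario
that counter.sh exercises (my own example, not part of the patch; it
assumes 32M is the default hugetlb page size, nr_hugepages is 0, and
nr_overcommit_hugepages has been raised via sysfs):

	#include <stdio.h>
	#include <sys/mman.h>

	#define SZ_32M	(32UL << 20)

	int main(void)
	{
		/* With no pre-allocated huge pages, this reservation can
		 * only be satisfied by a surplus huge page, which after
		 * this patch may be a gigantic page.
		 */
		void *p = mmap(NULL, SZ_32M, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
			       -1, 0);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		*(char *)p = 1;		/* touch it to fault the page in */
		munmap(p, SZ_32M);
		return 0;
	}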

Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
---
 mm/hugetlb.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0bf4444..2b67aff 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1574,7 +1574,7 @@ static struct page *__alloc_huge_page(struct hstate *h,
 	struct page *page;
 	unsigned int r_nid;
 
-	if (hstate_is_gigantic(h))
+	if (hstate_is_gigantic(h) && !gigantic_page_supported())
 		return NULL;
 
 	/*
@@ -1619,7 +1619,13 @@ static struct page *__alloc_huge_page(struct hstate *h,
 	}
 	spin_unlock(&hugetlb_lock);
 
-	page = __hugetlb_alloc_buddy_huge_page(h, vma, addr, nid);
+	if (hstate_is_gigantic(h))  {
+		page = alloc_gigantic_page(nid, huge_page_order(h));
+		if (page)
+			prep_compound_gigantic_page(page, huge_page_order(h));
+	} else {
+		page = __hugetlb_alloc_buddy_huge_page(h, vma, addr, nid);
+	}
 
 	spin_lock(&hugetlb_lock);
 	if (page) {
@@ -1786,8 +1792,7 @@ static void return_unused_surplus_pages(struct hstate *h,
 	/* Uncommit the reservation */
 	h->resv_huge_pages -= unused_resv_pages;
 
-	/* Cannot return gigantic pages currently */
-	if (hstate_is_gigantic(h))
+	if (hstate_is_gigantic(h) && !gigantic_page_supported())
 		return;
 
 	nr_pages = min(unused_resv_pages, h->surplus_huge_pages);
@@ -2439,7 +2444,7 @@ static ssize_t nr_overcommit_hugepages_store(struct kobject *kobj,
 	unsigned long input;
 	struct hstate *h = kobj_to_hstate(kobj, NULL);
 
-	if (hstate_is_gigantic(h))
+	if (hstate_is_gigantic(h) && !gigantic_page_supported())
 		return -EINVAL;
 
 	err = kstrtoul(buf, 10, &input);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-03  2:51   ` Huang Shijie
@ 2016-11-03  3:13     ` kbuild test robot
  -1 siblings, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2016-11-03  3:13 UTC (permalink / raw)
  To: Huang Shijie
  Cc: kbuild-all, akpm, catalin.marinas, n-horiguchi, mhocko,
	kirill.shutemov, aneesh.kumar, gerald.schaefer, mike.kravetz,
	linux-mm, will.deacon, steve.capper, kaly.xin, nd,
	linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 1643 bytes --]

Hi Huang,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.9-rc3 next-20161028]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Huang-Shijie/mm-hugetlb-rename-some-allocation-functions/20161103-105611
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-x011-201644 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

   mm/hugetlb.c: In function '__alloc_huge_page':
>> mm/hugetlb.c:1623:10: error: implicit declaration of function 'alloc_gigantic_page' [-Werror=implicit-function-declaration]
      page = alloc_gigantic_page(nid, huge_page_order(h));
             ^~~~~~~~~~~~~~~~~~~
>> mm/hugetlb.c:1623:8: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      page = alloc_gigantic_page(nid, huge_page_order(h));
           ^
   cc1: some warnings being treated as errors

vim +/alloc_gigantic_page +1623 mm/hugetlb.c

  1617			h->nr_huge_pages++;
  1618			h->surplus_huge_pages++;
  1619		}
  1620		spin_unlock(&hugetlb_lock);
  1621	
  1622		if (hstate_is_gigantic(h))  {
> 1623			page = alloc_gigantic_page(nid, huge_page_order(h));
  1624			if (page)
  1625				prep_compound_gigantic_page(page, huge_page_order(h));
  1626		} else {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25105 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/2] mm: fix the "counter.sh" failure for libhugetlbfs
  2016-11-03  2:51 ` Huang Shijie
@ 2016-11-03 17:22   ` Randy Dunlap
  -1 siblings, 0 replies; 32+ messages in thread
From: Randy Dunlap @ 2016-11-03 17:22 UTC (permalink / raw)
  To: Huang Shijie, akpm, catalin.marinas
  Cc: n-horiguchi, mhocko, kirill.shutemov, aneesh.kumar,
	gerald.schaefer, mike.kravetz, linux-mm, will.deacon,
	steve.capper, kaly.xin, nd, linux-arm-kernel

On 11/02/16 19:51, Huang Shijie wrote:
> 
> (2) The bug
>    While testing libhugetlbfs, I found that the test case "counter.sh"
>    fails with gigantic pages (32M pages on an arm64 board).
> 
>    This patch set adds support for gigantic surplus hugetlb pages,
>    allowing the counter.sh unit test to pass.

Hi,
Where is the counter.sh test? Where can I find it?

thanks.
-- 
~Randy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/2] mm: fix the "counter.sh" failure for libhugetlbfs
  2016-11-03 17:22   ` Randy Dunlap
@ 2016-11-04  1:59     ` Huang Shijie
  -1 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-04  1:59 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: akpm, catalin.marinas, n-horiguchi, mhocko, kirill.shutemov,
	aneesh.kumar, gerald.schaefer, mike.kravetz, linux-mm,
	will.deacon, steve.capper, kaly.xin, nd, linux-arm-kernel

On Thu, Nov 03, 2016 at 10:22:39AM -0700, Randy Dunlap wrote:
> On 11/02/16 19:51, Huang Shijie wrote:
> > 
> > (2) The bug
> >    While testing libhugetlbfs, I found that the test case "counter.sh"
> >    fails with gigantic pages (32M pages on an arm64 board).
> > 
> >    This patch set adds support for gigantic surplus hugetlb pages,
> >    allowing the counter.sh unit test to pass.
> 
> Hi,
> Where is the counter.sh test? Where can I find it?
You can get libhugetlbfs from:
   https://github.com/libhugetlbfs/libhugetlbfs.git

Run the tests with "make func". Note that stock libhugetlbfs cannot run
with a 32M hugetlb page size; it has several bugs. I have an extra patch
set that fixes those libhugetlbfs bugs and may send it out later.

For the 2M page size, though, you can already test "counter.sh" with
"make func".

thanks
Huang Shijie

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH] mm: hugetlb: rename some allocation functions
  2016-11-03  2:51   ` Huang Shijie
@ 2016-11-04  3:11     ` Huang Shijie
  -1 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-04  3:11 UTC (permalink / raw)
  To: akpm, catalin.marinas
  Cc: n-horiguchi, mhocko, kirill.shutemov, aneesh.kumar,
	gerald.schaefer, mike.kravetz, linux-mm, will.deacon,
	steve.capper, kaly.xin, nd, linux-arm-kernel, Huang Shijie

After a later patch in this series, __alloc_buddy_huge_page() will not
necessarily use the buddy allocator anymore.

So this patch drops "buddy" from the names of these functions:
	__alloc_buddy_huge_page -> __alloc_huge_page
	__alloc_buddy_huge_page_no_mpol -> __alloc_huge_page_no_mpol
	__alloc_buddy_huge_page_with_mpol -> __alloc_huge_page_with_mpol

This is a preparation patch; it introduces no functional change.

Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
---

Fix the build error on the x86 platform.

---
 mm/hugetlb.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 32594f1..9fdfc24 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1163,6 +1163,10 @@ static inline void destroy_compound_gigantic_page(struct page *page,
 						unsigned int order) { }
 static inline int alloc_fresh_gigantic_page(struct hstate *h,
 					nodemask_t *nodes_allowed) { return 0; }
+static struct page *alloc_gigantic_page(int nid, unsigned int order)
+{
+	return NULL;
+}
 #endif
 
 static void update_and_free_page(struct hstate *h, struct page *page)
@@ -1568,7 +1572,7 @@ static struct page *__hugetlb_alloc_buddy_huge_page(struct hstate *h,
  * For (2), we ignore 'vma' and 'addr' and use 'nid' exclusively. This
  * implies that memory policies will not be taken in to account.
  */
-static struct page *__alloc_buddy_huge_page(struct hstate *h,
+static struct page *__alloc_huge_page(struct hstate *h,
 		struct vm_area_struct *vma, unsigned long addr, int nid)
 {
 	struct page *page;
@@ -1649,21 +1653,21 @@ static struct page *__alloc_buddy_huge_page(struct hstate *h,
  * anywhere.
  */
 static
-struct page *__alloc_buddy_huge_page_no_mpol(struct hstate *h, int nid)
+struct page *__alloc_huge_page_no_mpol(struct hstate *h, int nid)
 {
 	unsigned long addr = -1;
 
-	return __alloc_buddy_huge_page(h, NULL, addr, nid);
+	return __alloc_huge_page(h, NULL, addr, nid);
 }
 
 /*
  * Use the VMA's mpolicy to allocate a huge page from the buddy.
  */
 static
-struct page *__alloc_buddy_huge_page_with_mpol(struct hstate *h,
+struct page *__alloc_huge_page_with_mpol(struct hstate *h,
 		struct vm_area_struct *vma, unsigned long addr)
 {
-	return __alloc_buddy_huge_page(h, vma, addr, NUMA_NO_NODE);
+	return __alloc_huge_page(h, vma, addr, NUMA_NO_NODE);
 }
 
 /*
@@ -1681,7 +1685,7 @@ struct page *alloc_huge_page_node(struct hstate *h, int nid)
 	spin_unlock(&hugetlb_lock);
 
 	if (!page)
-		page = __alloc_buddy_huge_page_no_mpol(h, nid);
+		page = __alloc_huge_page_no_mpol(h, nid);
 
 	return page;
 }
@@ -1711,7 +1715,7 @@ static int gather_surplus_pages(struct hstate *h, int delta)
 retry:
 	spin_unlock(&hugetlb_lock);
 	for (i = 0; i < needed; i++) {
-		page = __alloc_buddy_huge_page_no_mpol(h, NUMA_NO_NODE);
+		page = __alloc_huge_page_no_mpol(h, NUMA_NO_NODE);
 		if (!page) {
 			alloc_ok = false;
 			break;
@@ -1963,7 +1967,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 	page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, gbl_chg);
 	if (!page) {
 		spin_unlock(&hugetlb_lock);
-		page = __alloc_buddy_huge_page_with_mpol(h, vma, addr);
+		page = __alloc_huge_page_with_mpol(h, vma, addr);
 		if (!page)
 			goto out_uncharge_cgroup;
 		if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
@@ -2221,7 +2225,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
 	 * First take pages out of surplus state.  Then make up the
 	 * remaining difference by allocating fresh huge pages.
 	 *
-	 * We might race with __alloc_buddy_huge_page() here and be unable
+	 * We might race with __alloc_huge_page() here and be unable
 	 * to convert a surplus huge page to a normal huge page. That is
 	 * not critical, though, it just means the overall size of the
 	 * pool might be one hugepage larger than it needs to be, but
@@ -2267,7 +2271,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
 	 * By placing pages into the surplus state independent of the
 	 * overcommit value, we are allowing the surplus pool size to
 	 * exceed overcommit. There are few sane options here. Since
-	 * __alloc_buddy_huge_page() is checking the global counter,
+	 * __alloc_huge_page() is checking the global counter,
 	 * though, we'll note that we're not allowed to exceed surplus
 	 * and won't grow the pool anywhere else. Not until one of the
 	 * sysctls are changed, or the surplus pages go out of use.
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-03  2:51   ` Huang Shijie
@ 2016-11-07 15:25     ` Gerald Schaefer
  -1 siblings, 0 replies; 32+ messages in thread
From: Gerald Schaefer @ 2016-11-07 15:25 UTC (permalink / raw)
  To: Huang Shijie
  Cc: akpm, catalin.marinas, n-horiguchi, mhocko, kirill.shutemov,
	aneesh.kumar, mike.kravetz, linux-mm, will.deacon, steve.capper,
	kaly.xin, nd, linux-arm-kernel

On Thu, 3 Nov 2016 10:51:38 +0800
Huang Shijie <shijie.huang@arm.com> wrote:

> When testing gigantic pages, whose order is too large for the buddy
> allocator, the libhugetlbfs test case "counter.sh" fails.
> 
> The failure has two causes:
>  1) The kernel fails to allocate a gigantic page for the surplus case,
>     so gather_surplus_pages() eventually returns NULL.
> 
>  2) The condition checks for "over-commit" are wrong.
> 
> This patch adds code to allocate gigantic pages in __alloc_huge_page().
> After this patch, gather_surplus_pages() can return a gigantic page
> for the surplus case.
> 
> This patch also changes the condition checks in:
>      return_unused_surplus_pages()
>      nr_overcommit_hugepages_store()
> 
> After this patch, counter.sh passes for gigantic pages.
> 
> Acked-by: Steve Capper <steve.capper@arm.com>
> Signed-off-by: Huang Shijie <shijie.huang@arm.com>
> ---
>  mm/hugetlb.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 0bf4444..2b67aff 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1574,7 +1574,7 @@ static struct page *__alloc_huge_page(struct hstate *h,
>  	struct page *page;
>  	unsigned int r_nid;
> 
> -	if (hstate_is_gigantic(h))
> +	if (hstate_is_gigantic(h) && !gigantic_page_supported())
>  		return NULL;

Is it really possible to stumble over gigantic pages without having
gigantic_page_supported()?

Also, I've just tried this on s390 and counter.sh still fails after these
patches, and it should fail on all archs as long as the gigantic hugepage
size is used as the default hugepage size. This is because you only changed
nr_overcommit_hugepages_store(), which handles nr_overcommit_hugepages
in sysfs, and missed hugetlb_overcommit_handler(), which handles
/proc/sys/vm/nr_overcommit_hugepages for the default-sized hugepages.
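
A minimal sketch of the missing change (added for illustration; this is
an assumption of what the equivalent guard would look like, not code
posted in this thread):

	/* in hugetlb_overcommit_handler(), relax the existing check,
	 * which in 4.9-era code reads
	 * "if (write && hstate_is_gigantic(h)) return -EINVAL;":
	 */
	if (write && hstate_is_gigantic(h) && !gigantic_page_supported())
		return -EINVAL;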

However, changing hugetlb_overcommit_handler() in a similar way
produces a lockdep warning (see below), and counters.sh then fails with:
FAIL	mmap failed: Cannot allocate memory
So I guess this needs more thought (or perhaps just a proper lockdep
annotation, as the warning suggests; I didn't really look into it):

[  129.595054] INFO: trying to register non-static key.
[  129.595060] the code is fine but needs lockdep annotation.
[  129.595062] turning off the locking correctness validator.
[  129.595066] CPU: 4 PID: 1108 Comm: counters Not tainted 4.9.0-rc3-00261-g577f12c-dirty #12
[  129.595067] Hardware name: IBM              2964 N96              704              (LPAR)
[  129.595069] Stack:
[  129.595070]        00000003b4833688 00000003b4833718 0000000000000003 0000000000000000
[  129.595075]        00000003b48337b8 00000003b4833730 00000003b4833730 0000000000000020
[  129.595078]        0000000000000000 0000000000000020 000000000000000a 000000000000000a
[  129.595082]        000000000000000c 00000003b4833780 0000000000000000 00000003b4830000
[  129.595086]        0000000000000000 0000000000112d90 00000003b4833718 00000003b4833770
[  129.595089] Call Trace:
[  129.595095] ([<0000000000112c6a>] show_trace+0x8a/0xe0)
[  129.595098]  [<0000000000112d40>] show_stack+0x80/0xd8 
[  129.595103]  [<0000000000744eec>] dump_stack+0x9c/0xe0 
[  129.595106]  [<00000000001b0760>] register_lock_class+0x1a8/0x530 
[  129.595109]  [<00000000001b59fa>] __lock_acquire+0x10a/0x7f0 
[  129.595110]  [<00000000001b69b8>] lock_acquire+0x2e0/0x330 
[  129.595115]  [<0000000000a44920>] _raw_spin_lock_irqsave+0x70/0xb8 
[  129.595118]  [<000000000031cdce>] alloc_gigantic_page+0x8e/0x2c8 
[  129.595120]  [<000000000031e95a>] __alloc_huge_page+0xea/0x4d8 
[  129.595122]  [<000000000031f4c6>] hugetlb_acct_memory+0xa6/0x418 
[  129.595125]  [<0000000000323b32>] hugetlb_reserve_pages+0x132/0x240 
[  129.595152]  [<000000000048be62>] hugetlbfs_file_mmap+0xd2/0x130 
[  129.595155]  [<0000000000303918>] mmap_region+0x368/0x6e0 
[  129.595157]  [<0000000000303fb8>] do_mmap+0x328/0x400 
[  129.595160]  [<00000000002dc1aa>] vm_mmap_pgoff+0x9a/0xe8 
[  129.595162]  [<00000000003016dc>] SyS_mmap_pgoff+0x23c/0x288 
[  129.595164]  [<00000000003017b6>] SyS_old_mmap+0x8e/0xb0 
[  129.595166]  [<0000000000a45b06>] system_call+0xd6/0x270 
[  129.595167] INFO: lockdep is turned off.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-07 15:25     ` Gerald Schaefer
@ 2016-11-08  2:19       ` Huang Shijie
  -1 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-08  2:19 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: akpm, catalin.marinas, n-horiguchi, mhocko, kirill.shutemov,
	aneesh.kumar, mike.kravetz, linux-mm, will.deacon, steve.capper,
	kaly.xin, nd, linux-arm-kernel

On Mon, Nov 07, 2016 at 04:25:04PM +0100, Gerald Schaefer wrote:
> On Thu, 3 Nov 2016 10:51:38 +0800
> Huang Shijie <shijie.huang@arm.com> wrote:
> 
> > When testing gigantic pages, whose order is too large for the buddy
> > allocator, the libhugetlbfs test case "counter.sh" fails.
> > 
> > The failure has two causes:
> >  1) The kernel fails to allocate a gigantic page for the surplus case,
> >     so gather_surplus_pages() eventually returns NULL.
> > 
> >  2) The condition checks for "over-commit" are wrong.
> > 
> > This patch adds code to allocate gigantic pages in __alloc_huge_page().
> > After this patch, gather_surplus_pages() can return a gigantic page
> > for the surplus case.
> > 
> > This patch also changes the condition checks in:
> >      return_unused_surplus_pages()
> >      nr_overcommit_hugepages_store()
> > 
> > After this patch, counter.sh passes for gigantic pages.
> > 
> > Acked-by: Steve Capper <steve.capper@arm.com>
> > Signed-off-by: Huang Shijie <shijie.huang@arm.com>
> > ---
> >  mm/hugetlb.c | 15 ++++++++++-----
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 0bf4444..2b67aff 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1574,7 +1574,7 @@ static struct page *__alloc_huge_page(struct hstate *h,
> >  	struct page *page;
> >  	unsigned int r_nid;
> > 
> > -	if (hstate_is_gigantic(h))
> > +	if (hstate_is_gigantic(h) && !gigantic_page_supported())
> >  		return NULL;
> 
> Is it really possible to stumble over gigantic pages w/o having
> gigantic_page_supported()?
> 
> Also, I've just tried this on s390 and counter.sh still fails after these
> patches, and it should fail on all archs as long as the gigantic
I suspect the failure you hit is caused by libhugetlbfs itself; it has
several bugs. I have a patch set for libhugetlbfs too and will send it
out as soon as possible.

> hugepage size is used as the default hugepage size. This is because you
> only changed nr_overcommit_hugepages_store(), which handles
> nr_overcommit_hugepages in sysfs, and missed hugetlb_overcommit_handler(),
> which handles /proc/sys/vm/nr_overcommit_hugepages for the default-sized
> hugepages.
This is wrong. :)

I do have an extra patch that fixes hugetlb_overcommit_handler(), but
counters.sh does not use /proc/sys/vm/nr_overcommit_hugepages. Please
grep for it in the code.

Thanks
Huang Shijie

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-08  2:19       ` Huang Shijie
@ 2016-11-08  7:08         ` Huang Shijie
  -1 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-08  7:08 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: akpm, catalin.marinas, n-horiguchi, mhocko, kirill.shutemov,
	aneesh.kumar, mike.kravetz, linux-mm, will.deacon, steve.capper,
	kaly.xin, nd, linux-arm-kernel

On Tue, Nov 08, 2016 at 10:19:30AM +0800, Huang Shijie wrote:
> On Mon, Nov 07, 2016 at 04:25:04PM +0100, Gerald Schaefer wrote:
> > On Thu, 3 Nov 2016 10:51:38 +0800
> > Huang Shijie <shijie.huang@arm.com> wrote:
> > 
> > > When testing gigantic pages, whose order is too large for the buddy
> > > allocator, the libhugetlbfs test case "counter.sh" fails.
> > > 
> > > The failure has two causes:
> > >  1) The kernel fails to allocate a gigantic page for the surplus case,
> > >     so gather_surplus_pages() eventually returns NULL.
> > > 
> > >  2) The condition checks for "over-commit" are wrong.
> > > 
> > > This patch adds code to allocate gigantic pages in __alloc_huge_page().
> > > After this patch, gather_surplus_pages() can return a gigantic page
> > > for the surplus case.
> > > 
> > > This patch also changes the condition checks in:
> > >      return_unused_surplus_pages()
> > >      nr_overcommit_hugepages_store()
> > > 
> > > After this patch, counter.sh passes for gigantic pages.
> > > 
> > > Acked-by: Steve Capper <steve.capper@arm.com>
> > > Signed-off-by: Huang Shijie <shijie.huang@arm.com>
> > > ---
> > >  mm/hugetlb.c | 15 ++++++++++-----
> > >  1 file changed, 10 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 0bf4444..2b67aff 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -1574,7 +1574,7 @@ static struct page *__alloc_huge_page(struct hstate *h,
> > >  	struct page *page;
> > >  	unsigned int r_nid;
> > > 
> > > -	if (hstate_is_gigantic(h))
> > > +	if (hstate_is_gigantic(h) && !gigantic_page_supported())
> > >  		return NULL;
> > 
> > Is it really possible to stumble over gigantic pages without having
> > gigantic_page_supported()?
> > 
> > Also, I've just tried this on s390 and counter.sh still fails after these
> > patches, and it should fail on all archs as long as the gigantic
> I suspect the failure you hit is caused by libhugetlbfs itself; it has
> several bugs. I have a patch set for libhugetlbfs too and will send it
> out as soon as possible.
> 
> > hugepage size is used as the default hugepage size. This is because you
> > only changed nr_overcommit_hugepages_store(), which handles
> > nr_overcommit_hugepages in sysfs, and missed hugetlb_overcommit_handler(),
> > which handles /proc/sys/vm/nr_overcommit_hugepages for the default-sized
> > hugepages.
> This is wrong. :)
Sorry, I was wrong :). The counters test does use
/proc/sys/vm/nr_overcommit_hugepages, but on arm64 it does not trigger
a failure for the counters test. In other words, even though I did not
change hugetlb_overcommit_handler(), counters.sh still passes on arm64.

I will look into the lockdep issue.

Thanks
Huang Shijie


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-08  7:08         ` Huang Shijie
@ 2016-11-08  9:17           ` Huang Shijie
  -1 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-08  9:17 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: akpm, catalin.marinas, n-horiguchi, mhocko, kirill.shutemov,
	aneesh.kumar, mike.kravetz, linux-mm, will.deacon, steve.capper,
	kaly.xin, nd, linux-arm-kernel

Hi Gerald,
On Tue, Nov 08, 2016 at 03:08:52PM +0800, Huang Shijie wrote:
> On Tue, Nov 08, 2016 at 10:19:30AM +0800, Huang Shijie wrote:
> > On Mon, Nov 07, 2016 at 04:25:04PM +0100, Gerald Schaefer wrote:
> > > On Thu, 3 Nov 2016 10:51:38 +0800
> > > Huang Shijie <shijie.huang@arm.com> wrote:
> > > 
> > > > When testing the gigantic page whose order is too large for the buddy
> > > > allocator, the libhugetlbfs test case "counter.sh" will fail.
> > > > 
> > > > The failure is caused by:
> > > >  1) kernel fails to allocate a gigantic page for the surplus case.
> > > >     And the gather_surplus_pages() will return NULL in the end.
> > > > 
> > > >  2) The condition checks for "over-commit" is wrong.
> > > > 
> > > > This patch adds code to allocate the gigantic page in the
> > > > __alloc_huge_page(). After this patch, gather_surplus_pages()
> > > > can return a gigantic page for the surplus case.
> > > > 
> > > > This patch also changes the condition checks for:
> > > >      return_unused_surplus_pages()
> > > >      nr_overcommit_hugepages_store()
> > > > 
> > > > After this patch, the counter.sh can pass for the gigantic page.
> > > > 
> > > > Acked-by: Steve Capper <steve.capper@arm.com>
> > > > Signed-off-by: Huang Shijie <shijie.huang@arm.com>
> > > > ---
> > > >  mm/hugetlb.c | 15 ++++++++++-----
> > > >  1 file changed, 10 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > > index 0bf4444..2b67aff 100644
> > > > --- a/mm/hugetlb.c
> > > > +++ b/mm/hugetlb.c
> > > > @@ -1574,7 +1574,7 @@ static struct page *__alloc_huge_page(struct hstate *h,
> > > >  	struct page *page;
> > > >  	unsigned int r_nid;
> > > > 
> > > > -	if (hstate_is_gigantic(h))
> > > > +	if (hstate_is_gigantic(h) && !gigantic_page_supported())
> > > >  		return NULL;
> > > 
> > > Is it really possible to stumble over gigantic pages w/o having
> > > gigantic_page_supported()?
> > > 
> > > Also, I've just tried this on s390 and counter.sh still fails after these
> > > patches, and it should fail on all archs as long as you use the gigantic
> > I guess the failure you met is caused by the libhugetlbfs itself, there are
> > several bugs in the libhugetlbfs. I have a patch set for the
> > libhugetlbfs too. I will send it as soon as possible.
> > 
> > > hugepage size as default hugepage size. This is because you only changed
> > > nr_overcommit_hugepages_store(), which handles nr_overcommit_hugepages
> > > in sysfs, and missed hugetlb_overcommit_handler() which handles
> > > /proc/sys/vm/nr_overcommit_hugepages for the default sized hugepages.
> > This is wrong. :)
> Sorry, I was wrong :). The counters test does use /proc/sys/vm/nr_overcommit_hugepages.
> But on arm64 it does not trigger a failure in the counters test.
> In other words, even though I did not change hugetlb_overcommit_handler(),
> counters.sh still passes on arm64.
After adding "default_hugepagesz=32M" to the kernel command line, I can
reproduce this issue. Thanks for pointing this out.
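(For reference, I am assuming a boot setup along the lines of
"default_hugepagesz=32M hugepagesz=32M hugepages=0" here; the extra
parameters are only an example, default_hugepagesz=32M is the part
that makes the gigantic size the default.)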

> 
> I will look at the lockdep issue.
I tested the new patch (to be sent out later) on the arm64 platform,
and I did not hit the lockdep issue with lockdep enabled.
The following is my config:

	CONFIG_LOCKD=y
	CONFIG_LOCKD_V4=y
	CONFIG_LOCKUP_DETECTOR=y
        # CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
	CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
	CONFIG_DEBUG_SPINLOCK=y
	CONFIG_DEBUG_LOCK_ALLOC=y
	CONFIG_PROVE_LOCKING=y
	CONFIG_LOCKDEP=y
	CONFIG_LOCK_STAT=y
	CONFIG_DEBUG_LOCKDEP=y
	CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
	
So am I missing something?

Thanks	
Huang Shijie


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-08  9:17           ` Huang Shijie
@ 2016-11-08 19:27             ` Gerald Schaefer
  -1 siblings, 0 replies; 32+ messages in thread
From: Gerald Schaefer @ 2016-11-08 19:27 UTC (permalink / raw)
  To: Huang Shijie
  Cc: akpm, catalin.marinas, n-horiguchi, mhocko, kirill.shutemov,
	aneesh.kumar, mike.kravetz, linux-mm, will.deacon, steve.capper,
	kaly.xin, nd, linux-arm-kernel

On Tue, 8 Nov 2016 17:17:28 +0800
Huang Shijie <shijie.huang@arm.com> wrote:

> > I will look at the lockdep issue.
> I tested the new patch (to be sent out later) on the arm64 platform,
> and I did not hit the lockdep issue with lockdep enabled.
> The following is my config:
> 
> 	CONFIG_LOCKD=y
> 	CONFIG_LOCKD_V4=y
> 	CONFIG_LOCKUP_DETECTOR=y
>         # CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
> 	CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
> 	CONFIG_DEBUG_SPINLOCK=y
> 	CONFIG_DEBUG_LOCK_ALLOC=y
> 	CONFIG_PROVE_LOCKING=y
> 	CONFIG_LOCKDEP=y
> 	CONFIG_LOCK_STAT=y
> 	CONFIG_DEBUG_LOCKDEP=y
> 	CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
> 	
> So am I missing something?

Those options should be OK. Meanwhile I looked into this a little more,
and the problematic line/lock is spin_lock_irqsave(&z->lock, flags) at
the top of alloc_gigantic_page(). From the lockdep trace we see that
it is triggered by an mmap(), and then hugetlb_acct_memory() ->
__alloc_huge_page() -> alloc_gigantic_page().

However, in between those functions (inside gather_surplus_pages())
a NUMA_NO_NODE node id comes into play. And this finally results in
alloc_gigantic_page() being called with NUMA_NO_NODE as nid (which is
-1), and NODE_DATA(nid)->node_zones will then reach into Nirvana.

So, I guess the problem is a missing NUMA_NO_NODE check in
alloc_gigantic_page(), similar to the one in
__hugetlb_alloc_buddy_huge_page(). And somehow this was not a problem
before the gigantic surplus change.
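
Something like the following at the top of alloc_gigantic_page() might
do it (just a sketch, mirroring the existing NUMA_NO_NODE handling in
__hugetlb_alloc_buddy_huge_page()):

	/* gather_surplus_pages() may pass NUMA_NO_NODE (-1) down here,
	 * and NODE_DATA(-1) is not a valid pg_data_t, so pick the
	 * memory node of the current CPU before touching node_zones.
	 */
	if (nid == NUMA_NO_NODE)
		nid = numa_mem_id();

	z = NODE_DATA(nid)->node_zones;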


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-03  2:51   ` Huang Shijie
@ 2016-11-09  7:08     ` Huang Shijie
  -1 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-09  7:08 UTC (permalink / raw)
  To: akpm, catalin.marinas
  Cc: n-horiguchi, mhocko, kirill.shutemov, aneesh.kumar,
	gerald.schaefer, mike.kravetz, linux-mm, will.deacon,
	steve.capper, kaly.xin, nd, linux-arm-kernel, Huang Shijie

When testing gigantic pages (whose order is too large for the buddy
allocator), the libhugetlbfs test case "counter.sh" fails.

The failure is caused by:
 1) The kernel fails to allocate a gigantic page for the surplus case,
    so gather_surplus_pages() ends up returning NULL.

 2) The condition checks for "over-commit" are wrong.

This patch adds code to allocate the gigantic page in
__alloc_huge_page(). After this patch, gather_surplus_pages()
can return a gigantic page for the surplus case.

This patch changes the condition checks in:
     return_unused_surplus_pages()
     nr_overcommit_hugepages_store()
     hugetlb_overcommit_handler()

This patch also sets @nid to a proper value when NUMA_NO_NODE is
passed to alloc_gigantic_page().

After this patch, counter.sh passes for gigantic pages.

Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
---
  1. fix the wrong check in hugetlb_overcommit_handler();
  2. try to fix the s390 issue.
---
 mm/hugetlb.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9fdfc24..5dbfd62 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1095,6 +1095,9 @@ static struct page *alloc_gigantic_page(int nid, unsigned int order)
 	unsigned long ret, pfn, flags;
 	struct zone *z;
 
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
+
 	z = NODE_DATA(nid)->node_zones;
 	for (; z - NODE_DATA(nid)->node_zones < MAX_NR_ZONES; z++) {
 		spin_lock_irqsave(&z->lock, flags);
@@ -1578,7 +1581,7 @@ static struct page *__alloc_huge_page(struct hstate *h,
 	struct page *page;
 	unsigned int r_nid;
 
-	if (hstate_is_gigantic(h))
+	if (hstate_is_gigantic(h) && !gigantic_page_supported())
 		return NULL;
 
 	/*
@@ -1623,7 +1626,13 @@ static struct page *__alloc_huge_page(struct hstate *h,
 	}
 	spin_unlock(&hugetlb_lock);
 
-	page = __hugetlb_alloc_buddy_huge_page(h, vma, addr, nid);
+	if (hstate_is_gigantic(h))  {
+		page = alloc_gigantic_page(nid, huge_page_order(h));
+		if (page)
+			prep_compound_gigantic_page(page, huge_page_order(h));
+	} else {
+		page = __hugetlb_alloc_buddy_huge_page(h, vma, addr, nid);
+	}
 
 	spin_lock(&hugetlb_lock);
 	if (page) {
@@ -1790,8 +1799,7 @@ static void return_unused_surplus_pages(struct hstate *h,
 	/* Uncommit the reservation */
 	h->resv_huge_pages -= unused_resv_pages;
 
-	/* Cannot return gigantic pages currently */
-	if (hstate_is_gigantic(h))
+	if (hstate_is_gigantic(h) && !gigantic_page_supported())
 		return;
 
 	nr_pages = min(unused_resv_pages, h->surplus_huge_pages);
@@ -2443,7 +2451,7 @@ static ssize_t nr_overcommit_hugepages_store(struct kobject *kobj,
 	unsigned long input;
 	struct hstate *h = kobj_to_hstate(kobj, NULL);
 
-	if (hstate_is_gigantic(h))
+	if (hstate_is_gigantic(h) && !gigantic_page_supported())
 		return -EINVAL;
 
 	err = kstrtoul(buf, 10, &input);
@@ -2884,7 +2892,7 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write,
 
 	tmp = h->nr_overcommit_huge_pages;
 
-	if (write && hstate_is_gigantic(h))
+	if (write && hstate_is_gigantic(h) && !gigantic_page_supported())
 		return -EINVAL;
 
 	table->data = &tmp;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-08 19:27             ` Gerald Schaefer
@ 2016-11-09  7:12               ` Huang Shijie
  -1 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-09  7:12 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: akpm, catalin.marinas, n-horiguchi, mhocko, kirill.shutemov,
	aneesh.kumar, mike.kravetz, linux-mm, will.deacon, steve.capper,
	kaly.xin, nd, linux-arm-kernel

On Tue, Nov 08, 2016 at 08:27:42PM +0100, Gerald Schaefer wrote:
> On Tue, 8 Nov 2016 17:17:28 +0800
> Huang Shijie <shijie.huang@arm.com> wrote:
> 
> > > I will look at the lockdep issue.
> > I tested the new patch (will be sent out later) on the arm64 platform,
> > and I did not meet the lockdep issue when I enabled the lockdep.
> > The following is my config:
> > 
> > 	CONFIG_LOCKD=y
> > 	CONFIG_LOCKD_V4=y
> > 	CONFIG_LOCKUP_DETECTOR=y
> >         # CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
> > 	CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
> > 	CONFIG_DEBUG_SPINLOCK=y
> > 	CONFIG_DEBUG_LOCK_ALLOC=y
> > 	CONFIG_PROVE_LOCKING=y
> > 	CONFIG_LOCKDEP=y
> > 	CONFIG_LOCK_STAT=y
> > 	CONFIG_DEBUG_LOCKDEP=y
> > 	CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
> > 	
> > So do I miss something? 
> 
> Those options should be OK. Meanwhile I looked into this a little more,
> and the problematic line/lock is spin_lock_irqsave(&z->lock, flags) at
> the top of alloc_gigantic_page(). From the lockdep trace we see that
> it is triggered by an mmap(), and then hugetlb_acct_memory() ->
> __alloc_huge_page() -> alloc_gigantic_page().
> 
> However, in between those functions (inside gather_surplus_pages())
> a NUMA_NO_NODE node id comes into play. And this finally results in
> alloc_gigantic_page() being called with NUMA_NO_NODE as nid (which is
> -1), and NODE_DATA(nid)->node_zones will then reach into Nirvana.
Thanks for pointing this out.
I sent out the new patch just now. Could you please try it again?
I added a NUMA_NO_NODE check to alloc_gigantic_page().

thanks
Huang Shijie


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-09  7:08     ` Huang Shijie
@ 2016-11-09 15:55       ` Gerald Schaefer
  -1 siblings, 0 replies; 32+ messages in thread
From: Gerald Schaefer @ 2016-11-09 15:55 UTC (permalink / raw)
  To: Huang Shijie
  Cc: akpm, catalin.marinas, n-horiguchi, mhocko, kirill.shutemov,
	aneesh.kumar, mike.kravetz, linux-mm, will.deacon, steve.capper,
	kaly.xin, nd, linux-arm-kernel

On Wed, 9 Nov 2016 15:08:14 +0800
Huang Shijie <shijie.huang@arm.com> wrote:

> When testing gigantic pages (whose order is too large for the buddy
> allocator), the libhugetlbfs test case "counter.sh" fails.
> 
> The failure is caused by:
>  1) The kernel fails to allocate a gigantic page for the surplus case,
>     so gather_surplus_pages() ends up returning NULL.
> 
>  2) The condition checks for "over-commit" are wrong.
> 
> This patch adds code to allocate the gigantic page in
> __alloc_huge_page(). After this patch, gather_surplus_pages()
> can return a gigantic page for the surplus case.
> 
> This patch changes the condition checks in:
>      return_unused_surplus_pages()
>      nr_overcommit_hugepages_store()
>      hugetlb_overcommit_handler()
> 
> This patch also sets @nid to a proper value when NUMA_NO_NODE is
> passed to alloc_gigantic_page().
> 
> After this patch, counter.sh passes for gigantic pages.
> 
> Acked-by: Steve Capper <steve.capper@arm.com>
> Signed-off-by: Huang Shijie <shijie.huang@arm.com>
> ---
>   1. fix the wrong check in hugetlb_overcommit_handler();
>   2. try to fix the s390 issue.
> ---
>  mm/hugetlb.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 9fdfc24..5dbfd62 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1095,6 +1095,9 @@ static struct page *alloc_gigantic_page(int nid, unsigned int order)
>  	unsigned long ret, pfn, flags;
>  	struct zone *z;
> 
> +	if (nid == NUMA_NO_NODE)
> +		nid = numa_mem_id();
> +

Now counter.sh works (on s390) w/o the lockdep warning. However, it looks
like this change will now result in inconsistent behavior compared to the
normal sized hugepages, regarding surplus page allocation. Setting nid to
numa_mem_id() means that only the node of the current CPU will be considered
for allocating a gigantic page, as opposed to just "preferring" the current
node in the normal size case (__hugetlb_alloc_buddy_huge_page() ->
alloc_pages_node()) with a fallback to using other nodes.

I am not really familiar with NUMA, and I might be wrong here, but if
this is true then gigantic pages, which may be hard to allocate at
runtime in general, will be even harder to find (as surplus pages)
because you only look on the current node.

I honestly do not understand why alloc_gigantic_page() needs a nid
parameter at all, since it looks like it will only be called from
alloc_fresh_gigantic_page_node(), which in turn is only called
from alloc_fresh_gigantic_page() in a "for_each_node" loop (at least
before your patch).

Now it could be an option to also use alloc_fresh_gigantic_page()
in your patch, instead of directly calling alloc_gigantic_page(),
in __alloc_huge_page(). This would fix the "local node only" issue,
but I am not sure how to handle the required nodes_allowed parameter.

Maybe someone with more NUMA insight could have a look at this.
The patch as it is also seems to work, with the "local node only"
restriction, so it may be an option to just accept this restriction.
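
Just to illustrate the idea (a rough sketch only, with the required
nodes_allowed handling left open), the fallback could look like:

	/*
	 * Sketch: prefer the requested/local node, but fall back to the
	 * other online nodes instead of failing outright, similar to
	 * the for_each_node loop in alloc_fresh_gigantic_page().
	 */
	int node, start = (nid == NUMA_NO_NODE) ? numa_mem_id() : nid;
	struct page *page;

	page = alloc_gigantic_page(start, huge_page_order(h));
	if (!page) {
		for_each_online_node(node) {
			if (node == start)
				continue;
			page = alloc_gigantic_page(node,
						   huge_page_order(h));
			if (page)
				break;
		}
	}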


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mm: hugetlb: support gigantic surplus pages
  2016-11-09 15:55       ` Gerald Schaefer
@ 2016-11-10  7:03         ` Huang Shijie
  -1 siblings, 0 replies; 32+ messages in thread
From: Huang Shijie @ 2016-11-10  7:03 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: akpm, catalin.marinas, n-horiguchi, mhocko, kirill.shutemov,
	aneesh.kumar, mike.kravetz, linux-mm, will.deacon, steve.capper,
	kaly.xin, nd, linux-arm-kernel

On Wed, Nov 09, 2016 at 04:55:49PM +0100, Gerald Schaefer wrote:
> > index 9fdfc24..5dbfd62 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1095,6 +1095,9 @@ static struct page *alloc_gigantic_page(int nid, unsigned int order)
> >  	unsigned long ret, pfn, flags;
> >  	struct zone *z;
> > 
> > +	if (nid == NUMA_NO_NODE)
> > +		nid = numa_mem_id();
> > +
> 
> Now counter.sh works (on s390) w/o the lockdep warning. However, it looks
Good news to me :)
We have found the root cause of the s390 issue.

> like this change will now result in inconsistent behavior compared to the
> normal sized hugepages, regarding surplus page allocation. Setting nid to
> numa_mem_id() means that only the node of the current CPU will be considered
> for allocating a gigantic page, as opposed to just "preferring" the current
> node in the normal size case (__hugetlb_alloc_buddy_huge_page() ->
> alloc_pages_node()) with a fallback to using other nodes.
Yes.

> 
> I am not really familiar with NUMA, and I might be wrong here, but if
> this is true then gigantic pages, which may be hard allocate at runtime
> in general, will be even harder to find (as surplus pages) because you
> only look on the current node.
Okay, I will try to fix this in the next version.

> 
> I honestly do not understand why alloc_gigantic_page() needs a nid
> parameter at all, since it looks like it will only be called from
> alloc_fresh_gigantic_page_node(), which in turn is only called
> from alloc_fresh_gigantic_page() in a "for_each_node" loop (at least
> before your patch).
> 
> Now it could be an option to also use alloc_fresh_gigantic_page()
> in your patch, instead of directly calling alloc_gigantic_page(),
Yes, that is a good suggestion, but I need to make some changes to
alloc_fresh_gigantic_page() first.


Thanks
Huang Shijie


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-11-10  7:04 UTC | newest]

Thread overview: 32+ messages
2016-11-03  2:51 [PATCH 0/2] mm: fix the "counter.sh" failure for libhugetlbfs Huang Shijie
2016-11-03  2:51 ` [PATCH 1/2] mm: hugetlb: rename some allocation functions Huang Shijie
2016-11-04  3:11   ` [PATCH] " Huang Shijie
2016-11-03  2:51 ` [PATCH 2/2] mm: hugetlb: support gigantic surplus pages Huang Shijie
2016-11-03  3:13   ` kbuild test robot
2016-11-07 15:25   ` Gerald Schaefer
2016-11-08  2:19     ` Huang Shijie
2016-11-08  7:08       ` Huang Shijie
2016-11-08  9:17         ` Huang Shijie
2016-11-08 19:27           ` Gerald Schaefer
2016-11-09  7:12             ` Huang Shijie
2016-11-09  7:08   ` [PATCH v2 " Huang Shijie
2016-11-09 15:55     ` Gerald Schaefer
2016-11-10  7:03       ` Huang Shijie
2016-11-03 17:22 ` [PATCH 0/2] mm: fix the "counter.sh" failure for libhugetlbfs Randy Dunlap
2016-11-04  1:59   ` Huang Shijie
