All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level
@ 2017-04-25 11:13 ` Anshuman Khandual
  0 siblings, 0 replies; 3+ messages in thread
From: Anshuman Khandual @ 2017-04-25 11:13 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: n-horiguchi, akpm, aneesh.kumar

Though migrating gigantic HugeTLB pages does not sound much like real
world use case, they can be affected by memory errors. Hence migration
at the PGD level HugeTLB pages should be supported just to enable soft
and hard offline use cases.

While allocating the new gigantic HugeTLB page, it should not matter
whether new page comes from the same node or not. There would be very
few gigantic pages on the system afterall, we should not be bothered
about node locality when trying to save a big page from crashing.

This introduces a new HugeTLB allocator called alloc_huge_page_nonid()
which will scan over all online nodes on the system and allocate a
single HugeTLB page.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Tested on a POWER8 machine with 16GB pages along with Aneesh's
recent HugeTLB enablement patch series on powerpc which can
be found here.

https://lkml.org/lkml/2017/4/17/225

Here, we directly call alloc_huge_page_nonid() which ignores the
node locality. But we can also first call normal alloc_huge_page()
with the node number and if that fails to allocate only then call
alloc_huge_page_nonid() as a fallback option.

Aneesh mentioned about the waste of memory if we just have to
soft offline a single page. The problem persists both on PGD
as well as PMD level HugeTLB pages. Tried solving the problem
with https://patchwork.kernel.org/patch/9690119/ but right now
madvise splits the entire range of HugeTLB pages (if the page
is HugeTLB one) and calls soft_offline_page() on the head page
of each HugeTLB page as soft_offline_page() acts on the entire
HugeTLB range not just the individual pages. Changing the iterator
in madvise() scan over individual pages solves the problem but
then it creates multiple HugeTLB migrations (HUGE_PAGE_SIZE /
PAGE_SIZE times to be precise) if we really have to soft offline
a single HugeTLB page which is not optimal.

Hence for now, lets just enable PGD level HugeTLB soft offline
at par with the PMD level HugeTLB before we can go back and
address the memory wastage problem comprehensively for both
PGD and PMD level HugeTLB page as mentioned above.

 include/linux/hugetlb.h |  8 +++++++-
 mm/hugetlb.c            | 17 +++++++++++++++++
 mm/memory-failure.c     |  8 ++++++--
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 04b73a9c8b4b..882e6241da71 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -347,6 +347,7 @@ struct huge_bootmem_page {
 
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
+struct page *alloc_huge_page_nonid(struct hstate *h);
 struct page *alloc_huge_page_node(struct hstate *h, int nid);
 struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
@@ -473,7 +474,11 @@ extern int dissolve_free_huge_pages(unsigned long start_pfn,
 static inline bool hugepage_migration_supported(struct hstate *h)
 {
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
-	return huge_page_shift(h) == PMD_SHIFT;
+	if ((huge_page_shift(h) == PMD_SHIFT) ||
+		(huge_page_shift(h) == PGDIR_SHIFT))
+		return true;
+	else
+		return false;
 #else
 	return false;
 #endif
@@ -511,6 +516,7 @@ static inline void hugetlb_count_sub(long l, struct mm_struct *mm)
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
+#define alloc_huge_page_nonid(h) NULL
 #define alloc_huge_page_node(h, nid) NULL
 #define alloc_huge_page_noerr(v, a, r) NULL
 #define alloc_bootmem_huge_page(h) NULL
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 97a44db06850..bd96fff2bc09 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1669,6 +1669,23 @@ struct page *__alloc_buddy_huge_page_with_mpol(struct hstate *h,
 	return __alloc_buddy_huge_page(h, vma, addr, NUMA_NO_NODE);
 }
 
+struct page *alloc_huge_page_nonid(struct hstate *h)
+{
+	struct page *page = NULL;
+	int nid = 0;
+
+	spin_lock(&hugetlb_lock);
+	if (h->free_huge_pages - h->resv_huge_pages > 0) {
+		for_each_online_node(nid) {
+			page = dequeue_huge_page_node(h, nid);
+			if (page)
+				break;
+		}
+	}
+	spin_unlock(&hugetlb_lock);
+	return page;
+}
+
 /*
  * This allocation function is useful in the context where vma is irrelevant.
  * E.g. soft-offlining uses this function because it only cares physical
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fe64d7729a8e..d4f5710cf3f7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1481,11 +1481,15 @@ EXPORT_SYMBOL(unpoison_memory);
 static struct page *new_page(struct page *p, unsigned long private, int **x)
 {
 	int nid = page_to_nid(p);
-	if (PageHuge(p))
+	if (PageHuge(p)) {
+		if (hstate_is_gigantic(page_hstate(compound_head(p))))
+			return alloc_huge_page_nonid(page_hstate(compound_head(p)));
+
 		return alloc_huge_page_node(page_hstate(compound_head(p)),
 						   nid);
-	else
+	} else {
 		return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
+	}
 }
 
 /*
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level
@ 2017-04-25 11:13 ` Anshuman Khandual
  0 siblings, 0 replies; 3+ messages in thread
From: Anshuman Khandual @ 2017-04-25 11:13 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: n-horiguchi, akpm, aneesh.kumar

Though migrating gigantic HugeTLB pages does not sound much like real
world use case, they can be affected by memory errors. Hence migration
at the PGD level HugeTLB pages should be supported just to enable soft
and hard offline use cases.

While allocating the new gigantic HugeTLB page, it should not matter
whether new page comes from the same node or not. There would be very
few gigantic pages on the system afterall, we should not be bothered
about node locality when trying to save a big page from crashing.

This introduces a new HugeTLB allocator called alloc_huge_page_nonid()
which will scan over all online nodes on the system and allocate a
single HugeTLB page.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Tested on a POWER8 machine with 16GB pages along with Aneesh's
recent HugeTLB enablement patch series on powerpc which can
be found here.

https://lkml.org/lkml/2017/4/17/225

Here, we directly call alloc_huge_page_nonid() which ignores the
node locality. But we can also first call normal alloc_huge_page()
with the node number and if that fails to allocate only then call
alloc_huge_page_nonid() as a fallback option.

Aneesh mentioned about the waste of memory if we just have to
soft offline a single page. The problem persists both on PGD
as well as PMD level HugeTLB pages. Tried solving the problem
with https://patchwork.kernel.org/patch/9690119/ but right now
madvise splits the entire range of HugeTLB pages (if the page
is HugeTLB one) and calls soft_offline_page() on the head page
of each HugeTLB page as soft_offline_page() acts on the entire
HugeTLB range not just the individual pages. Changing the iterator
in madvise() scan over individual pages solves the problem but
then it creates multiple HugeTLB migrations (HUGE_PAGE_SIZE /
PAGE_SIZE times to be precise) if we really have to soft offline
a single HugeTLB page which is not optimal.

Hence for now, lets just enable PGD level HugeTLB soft offline
at par with the PMD level HugeTLB before we can go back and
address the memory wastage problem comprehensively for both
PGD and PMD level HugeTLB page as mentioned above.

 include/linux/hugetlb.h |  8 +++++++-
 mm/hugetlb.c            | 17 +++++++++++++++++
 mm/memory-failure.c     |  8 ++++++--
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 04b73a9c8b4b..882e6241da71 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -347,6 +347,7 @@ struct huge_bootmem_page {
 
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
+struct page *alloc_huge_page_nonid(struct hstate *h);
 struct page *alloc_huge_page_node(struct hstate *h, int nid);
 struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
@@ -473,7 +474,11 @@ extern int dissolve_free_huge_pages(unsigned long start_pfn,
 static inline bool hugepage_migration_supported(struct hstate *h)
 {
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
-	return huge_page_shift(h) == PMD_SHIFT;
+	if ((huge_page_shift(h) == PMD_SHIFT) ||
+		(huge_page_shift(h) == PGDIR_SHIFT))
+		return true;
+	else
+		return false;
 #else
 	return false;
 #endif
@@ -511,6 +516,7 @@ static inline void hugetlb_count_sub(long l, struct mm_struct *mm)
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
+#define alloc_huge_page_nonid(h) NULL
 #define alloc_huge_page_node(h, nid) NULL
 #define alloc_huge_page_noerr(v, a, r) NULL
 #define alloc_bootmem_huge_page(h) NULL
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 97a44db06850..bd96fff2bc09 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1669,6 +1669,23 @@ struct page *__alloc_buddy_huge_page_with_mpol(struct hstate *h,
 	return __alloc_buddy_huge_page(h, vma, addr, NUMA_NO_NODE);
 }
 
+struct page *alloc_huge_page_nonid(struct hstate *h)
+{
+	struct page *page = NULL;
+	int nid = 0;
+
+	spin_lock(&hugetlb_lock);
+	if (h->free_huge_pages - h->resv_huge_pages > 0) {
+		for_each_online_node(nid) {
+			page = dequeue_huge_page_node(h, nid);
+			if (page)
+				break;
+		}
+	}
+	spin_unlock(&hugetlb_lock);
+	return page;
+}
+
 /*
  * This allocation function is useful in the context where vma is irrelevant.
  * E.g. soft-offlining uses this function because it only cares physical
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fe64d7729a8e..d4f5710cf3f7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1481,11 +1481,15 @@ EXPORT_SYMBOL(unpoison_memory);
 static struct page *new_page(struct page *p, unsigned long private, int **x)
 {
 	int nid = page_to_nid(p);
-	if (PageHuge(p))
+	if (PageHuge(p)) {
+		if (hstate_is_gigantic(page_hstate(compound_head(p))))
+			return alloc_huge_page_nonid(page_hstate(compound_head(p)));
+
 		return alloc_huge_page_node(page_hstate(compound_head(p)),
 						   nid);
-	else
+	} else {
 		return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
+	}
 }
 
 /*
-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level
  2017-04-25 11:13 ` Anshuman Khandual
  (?)
@ 2017-04-26  1:31 ` kbuild test robot
  -1 siblings, 0 replies; 3+ messages in thread
From: kbuild test robot @ 2017-04-26  1:31 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: kbuild-all, linux-kernel, linux-mm, n-horiguchi, akpm, aneesh.kumar

[-- Attachment #1: Type: text/plain, Size: 1520 bytes --]

Hi Anshuman,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.11-rc8 next-20170424]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-madvise-Enable-soft-hard-offline-of-HugeTLB-pages-at-PGD-level/20170425-224016
config: x86_64-kexec (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   mm/memory-failure.c: In function 'new_page':
>> mm/memory-failure.c:1485:7: error: implicit declaration of function 'hstate_is_gigantic' [-Werror=implicit-function-declaration]
      if (hstate_is_gigantic(page_hstate(compound_head(p))))
          ^~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/hstate_is_gigantic +1485 mm/memory-failure.c

  1479	EXPORT_SYMBOL(unpoison_memory);
  1480	
  1481	static struct page *new_page(struct page *p, unsigned long private, int **x)
  1482	{
  1483		int nid = page_to_nid(p);
  1484		if (PageHuge(p)) {
> 1485			if (hstate_is_gigantic(page_hstate(compound_head(p))))
  1486				return alloc_huge_page_nonid(page_hstate(compound_head(p)));
  1487	
  1488			return alloc_huge_page_node(page_hstate(compound_head(p)),

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25151 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-04-26  1:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-25 11:13 [PATCH] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level Anshuman Khandual
2017-04-25 11:13 ` Anshuman Khandual
2017-04-26  1:31 ` kbuild test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.