linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level
@ 2017-04-19  3:27 Anshuman Khandual
  2017-04-19  6:20 ` Aneesh Kumar K.V
  0 siblings, 1 reply; 8+ messages in thread
From: Anshuman Khandual @ 2017-04-19  3:27 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: n-horiguchi, akpm, aneesh.kumar

Though migrating gigantic HugeTLB pages does not sound much like real
world use case, they can be affected by memory errors. Hence migration
at the PGD level HugeTLB pages should be supported just to enable soft
and hard offline use cases.

While allocating the new gigantic HugeTLB page, it should not matter
whether new page comes from the same node or not. There would be very
few gigantic pages on the system afterall, we should not be bothered
about node locality when trying to save a big page from crashing.

This introduces a new HugeTLB allocator called alloc_gigantic_page()
which will scan over all online nodes on the system and allocate a
single HugeTLB page.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Tested on a POWER8 machine with 16GB pages along with Aneesh's
recent HugeTLB enablement patch series on powerpc which can
be found here.

https://lkml.org/lkml/2017/4/17/225

Here, we directly call alloc_gigantic_page() which ignores node
locality. But we can also first call normal alloc_huge_page()
with the node number and if that fails to allocate then call
alloc_gigantic_page() as a fallback option.

 include/linux/hugetlb.h |  8 +++++++-
 mm/hugetlb.c            | 17 +++++++++++++++++
 mm/memory-failure.c     |  8 ++++++--
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 04b73a9c8b4b..ee75197e6ed8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -347,6 +347,7 @@ struct huge_bootmem_page {
 
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
+struct page *alloc_gigantic_page(struct hstate *h);
 struct page *alloc_huge_page_node(struct hstate *h, int nid);
 struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
@@ -473,7 +474,11 @@ extern int dissolve_free_huge_pages(unsigned long start_pfn,
 static inline bool hugepage_migration_supported(struct hstate *h)
 {
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
-	return huge_page_shift(h) == PMD_SHIFT;
+	if ((huge_page_shift(h) == PMD_SHIFT) ||
+		(huge_page_shift(h) == PGDIR_SHIFT))
+		return true;
+	else
+		return false;
 #else
 	return false;
 #endif
@@ -511,6 +516,7 @@ static inline void hugetlb_count_sub(long l, struct mm_struct *mm)
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
+#define alloc_gigantic_page(h) NULL
 #define alloc_huge_page_node(h, nid) NULL
 #define alloc_huge_page_noerr(v, a, r) NULL
 #define alloc_bootmem_huge_page(h) NULL
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 97a44db06850..f2b31dddb1bc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1669,6 +1669,23 @@ struct page *__alloc_buddy_huge_page_with_mpol(struct hstate *h,
 	return __alloc_buddy_huge_page(h, vma, addr, NUMA_NO_NODE);
 }
 
+struct page *alloc_gigantic_page(struct hstate *h)
+{
+	struct page *page = NULL;
+	int nid = 0;
+
+	spin_lock(&hugetlb_lock);
+	if (h->free_huge_pages - h->resv_huge_pages > 0) {
+		for_each_online_node(nid) {
+			page = dequeue_huge_page_node(h, nid);
+			if (page)
+				break;
+		}
+	}
+	spin_unlock(&hugetlb_lock);
+	return page;
+}
+
 /*
  * This allocation function is useful in the context where vma is irrelevant.
  * E.g. soft-offlining uses this function because it only cares physical
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fe64d7729a8e..619650969fe5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1481,11 +1481,15 @@ EXPORT_SYMBOL(unpoison_memory);
 static struct page *new_page(struct page *p, unsigned long private, int **x)
 {
 	int nid = page_to_nid(p);
-	if (PageHuge(p))
+	if (PageHuge(p)) {
+		if (hstate_is_gigantic(page_hstate(compound_head(p))))
+			return alloc_gigantic_page(page_hstate(compound_head(p)));
+
 		return alloc_huge_page_node(page_hstate(compound_head(p)),
 						   nid);
-	else
+	} else {
 		return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
+	}
 }
 
 /*
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level
  2017-04-19  3:27 [RFC] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level Anshuman Khandual
@ 2017-04-19  6:20 ` Aneesh Kumar K.V
  2017-04-19  6:42   ` Anshuman Khandual
  0 siblings, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2017-04-19  6:20 UTC (permalink / raw)
  To: Anshuman Khandual, linux-kernel, linux-mm; +Cc: n-horiguchi, akpm

Anshuman Khandual <khandual@linux.vnet.ibm.com> writes:

> Though migrating gigantic HugeTLB pages does not sound much like real
> world use case, they can be affected by memory errors. Hence migration
> at the PGD level HugeTLB pages should be supported just to enable soft
> and hard offline use cases.

In that case do we want to isolated the entire 16GB range ? Should we
just dequeue the page from hugepage pool convert them to regular 64K
pages and then isolate the 64K that had memory error ?

>
> While allocating the new gigantic HugeTLB page, it should not matter
> whether new page comes from the same node or not. There would be very
> few gigantic pages on the system afterall, we should not be bothered
> about node locality when trying to save a big page from crashing.
>
> This introduces a new HugeTLB allocator called alloc_gigantic_page()
> which will scan over all online nodes on the system and allocate a
> single HugeTLB page.
>


-aneesh

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level
  2017-04-19  6:20 ` Aneesh Kumar K.V
@ 2017-04-19  6:42   ` Anshuman Khandual
  2017-04-20  5:05     ` Anshuman Khandual
  0 siblings, 1 reply; 8+ messages in thread
From: Anshuman Khandual @ 2017-04-19  6:42 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Anshuman Khandual, linux-kernel, linux-mm
  Cc: n-horiguchi, akpm

On 04/19/2017 11:50 AM, Aneesh Kumar K.V wrote:
> Anshuman Khandual <khandual@linux.vnet.ibm.com> writes:
> 
>> Though migrating gigantic HugeTLB pages does not sound much like real
>> world use case, they can be affected by memory errors. Hence migration
>> at the PGD level HugeTLB pages should be supported just to enable soft
>> and hard offline use cases.
> 
> In that case do we want to isolated the entire 16GB range ? Should we
> just dequeue the page from hugepage pool convert them to regular 64K
> pages and then isolate the 64K that had memory error ?

Though its a better thing to do, assuming that we can actually dequeue
the huge page and push it to the buddy allocator as normal 64K pages
(need to check on this as the original allocation happened from the
memblock instead of the buddy allocator, guess it should be possible
given that we do similar stuff during memory hot plug). In that case
we will also have to consider the same for the PMD based HugeTLB pages
as well or it should be only for these gigantic huge pages ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level
  2017-04-19  6:42   ` Anshuman Khandual
@ 2017-04-20  5:05     ` Anshuman Khandual
  2017-04-20 11:06       ` [PATCH] mm/madvise: Dont poison entire HugeTLB page for single page errors Anshuman Khandual
  0 siblings, 1 reply; 8+ messages in thread
From: Anshuman Khandual @ 2017-04-20  5:05 UTC (permalink / raw)
  To: Anshuman Khandual, Aneesh Kumar K.V, linux-kernel, linux-mm
  Cc: n-horiguchi, akpm

On 04/19/2017 12:12 PM, Anshuman Khandual wrote:
> On 04/19/2017 11:50 AM, Aneesh Kumar K.V wrote:
>> Anshuman Khandual <khandual@linux.vnet.ibm.com> writes:
>>
>>> Though migrating gigantic HugeTLB pages does not sound much like real
>>> world use case, they can be affected by memory errors. Hence migration
>>> at the PGD level HugeTLB pages should be supported just to enable soft
>>> and hard offline use cases.
>> In that case do we want to isolated the entire 16GB range ? Should we
>> just dequeue the page from hugepage pool convert them to regular 64K
>> pages and then isolate the 64K that had memory error ?
> Though its a better thing to do, assuming that we can actually dequeue
> the huge page and push it to the buddy allocator as normal 64K pages
> (need to check on this as the original allocation happened from the
> memblock instead of the buddy allocator, guess it should be possible
> given that we do similar stuff during memory hot plug). In that case
> we will also have to consider the same for the PMD based HugeTLB pages
> as well or it should be only for these gigantic huge pages ?

If we look at the code inside the function soft_offline_huge_page(),
if the source huge page has been freed to the active_freelist then
we mark the *entire* hugepage as poisoned but if the huge page has
been released back to the buddy allocator then only the page in
question is marked poisoned not the entire huge page. This was
part was added with the commit a49ecbcd7 ("mm/memory-failure.c:
recheck PageHuge() after hugetlb page migrate successfully"). But
when I look at the migrate_pages() handling of huge pages, it always
calls putback_active_hugepage() after successful migration to release
the huge page back the active list not to the buddy allocator. I am 
not sure if the second half of 'if' block is ever getting executed
at all.

I am starting to wonder whats the point of releasing the huge page
to the active list in migrate_pages() when we will go and mark the
entire huge page as *poisoned*, put it in a dangling state (page->lru
pointing to itself) which can not be allocated anyway.

After migrate_pages() is successful and the source huge page is
release to the active list. We just mark the single normal page
has poisoned, get the source page from the active list and free
it to the buddy allocator. This should just take care both PMD
and PGD based huge pages.

----------------------------------------------------------------------
ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
				MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (ret) {
	pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
		pfn, ret, page->flags);
	/*
	 * We know that soft_offline_huge_page() tries to migrate
	 * only one hugepage pointed to by hpage, so we need not
	 * run through the pagelist here.
	 */
	putback_active_hugepage(hpage);
	if (ret > 0)
		ret = -EIO;
} else {
	/* overcommit hugetlb page will be freed to buddy */
	if (PageHuge(page)) {
		set_page_hwpoison_huge_page(hpage);
		dequeue_hwpoisoned_huge_page(hpage);
		num_poisoned_pages_add(1 << compound_order(hpage));
	} else {
		SetPageHWPoison(page);
		num_poisoned_pages_inc();
	}
}
----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] mm/madvise: Dont poison entire HugeTLB page for single page errors
  2017-04-20  5:05     ` Anshuman Khandual
@ 2017-04-20 11:06       ` Anshuman Khandual
  2017-04-20 13:48         ` kbuild test robot
  2017-05-12  8:10         ` Naoya Horiguchi
  0 siblings, 2 replies; 8+ messages in thread
From: Anshuman Khandual @ 2017-04-20 11:06 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: n-horiguchi, akpm, aneesh.kumar

Currently soft_offline_page() migrates the entire HugeTLB page, then
dequeues it from the active list by making it a dangling HugeTLB page
which ofcourse can not be used further and marks the entire HugeTLB
page as poisoned. This might be a costly waste of memory if the error
involved affects only small section of the entire page.

This changes the behaviour so that only the affected page is marked
poisoned and then the HugeTLB page is released back to buddy system.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
The number of poisoned pages on the system has reduced as seen from
dmesg triggered with 'echo m > /proc/sysrq-enter' on powerpc.

 include/linux/hugetlb.h | 1 +
 mm/hugetlb.c            | 2 +-
 mm/memory-failure.c     | 9 ++++-----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7a5917d..f6b80a4 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -470,6 +470,7 @@ static inline pgoff_t basepage_index(struct page *page)
 	return __basepage_index(page);
 }
 
+extern int dissolve_free_huge_page(struct page *page);
 extern int dissolve_free_huge_pages(unsigned long start_pfn,
 				    unsigned long end_pfn);
 static inline bool hugepage_migration_supported(struct hstate *h)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1edfdb8..2fb9ba3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1444,7 +1444,7 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
  * number of free hugepages would be reduced below the number of reserved
  * hugepages.
  */
-static int dissolve_free_huge_page(struct page *page)
+int dissolve_free_huge_page(struct page *page)
 {
 	int rc = 0;
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 27f7210..1e377fd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1597,13 +1597,12 @@ static int soft_offline_huge_page(struct page *page, int flags)
 			ret = -EIO;
 	} else {
 		/* overcommit hugetlb page will be freed to buddy */
+		SetPageHWPoison(page);
+		num_poisoned_pages_inc();
+
 		if (PageHuge(page)) {
-			set_page_hwpoison_huge_page(hpage);
 			dequeue_hwpoisoned_huge_page(hpage);
-			num_poisoned_pages_add(1 << compound_order(hpage));
-		} else {
-			SetPageHWPoison(page);
-			num_poisoned_pages_inc();
+			dissolve_free_huge_page(hpage);
 		}
 	}
 	return ret;
-- 
1.8.5.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/madvise: Dont poison entire HugeTLB page for single page errors
  2017-04-20 11:06       ` [PATCH] mm/madvise: Dont poison entire HugeTLB page for single page errors Anshuman Khandual
@ 2017-04-20 13:48         ` kbuild test robot
  2017-05-12  8:10         ` Naoya Horiguchi
  1 sibling, 0 replies; 8+ messages in thread
From: kbuild test robot @ 2017-04-20 13:48 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: kbuild-all, linux-kernel, linux-mm, n-horiguchi, akpm, aneesh.kumar

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

Hi Anshuman,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.11-rc7 next-20170420]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-madvise-Dont-poison-entire-HugeTLB-page-for-single-page-errors/20170420-204009
config: x86_64-randconfig-x012-201716 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   mm/memory-failure.c: In function 'soft_offline_huge_page':
>> mm/memory-failure.c:1605:4: error: implicit declaration of function 'dissolve_free_huge_page' [-Werror=implicit-function-declaration]
       dissolve_free_huge_page(hpage);
       ^~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/dissolve_free_huge_page +1605 mm/memory-failure.c

  1599			/* overcommit hugetlb page will be freed to buddy */
  1600			SetPageHWPoison(page);
  1601			num_poisoned_pages_inc();
  1602	
  1603			if (PageHuge(page)) {
  1604				dequeue_hwpoisoned_huge_page(hpage);
> 1605				dissolve_free_huge_page(hpage);
  1606			}
  1607		}
  1608		return ret;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32493 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/madvise: Dont poison entire HugeTLB page for single page errors
  2017-04-20 11:06       ` [PATCH] mm/madvise: Dont poison entire HugeTLB page for single page errors Anshuman Khandual
  2017-04-20 13:48         ` kbuild test robot
@ 2017-05-12  8:10         ` Naoya Horiguchi
  2017-05-14  2:34           ` Anshuman Khandual
  1 sibling, 1 reply; 8+ messages in thread
From: Naoya Horiguchi @ 2017-05-12  8:10 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-kernel, linux-mm, akpm, aneesh.kumar

On Thu, Apr 20, 2017 at 04:36:27PM +0530, Anshuman Khandual wrote:
> Currently soft_offline_page() migrates the entire HugeTLB page, then
> dequeues it from the active list by making it a dangling HugeTLB page
> which ofcourse can not be used further and marks the entire HugeTLB
> page as poisoned. This might be a costly waste of memory if the error
> involved affects only small section of the entire page.
> 
> This changes the behaviour so that only the affected page is marked
> poisoned and then the HugeTLB page is released back to buddy system.

Hi Anshuman,

This is a good catch, and we can solve this issue now because freeing
hwpoisoned page (previously forbidden) is available now.

And I'm thinking that the same issue for hard/soft-offline on free
hugepages can be solved, so I'll submit a patchset which includes
updated version of your patch.

Thanks,
Naoya Horiguchi

> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
> The number of poisoned pages on the system has reduced as seen from
> dmesg triggered with 'echo m > /proc/sysrq-enter' on powerpc.
> 
>  include/linux/hugetlb.h | 1 +
>  mm/hugetlb.c            | 2 +-
>  mm/memory-failure.c     | 9 ++++-----
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 7a5917d..f6b80a4 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -470,6 +470,7 @@ static inline pgoff_t basepage_index(struct page *page)
>  	return __basepage_index(page);
>  }
>  
> +extern int dissolve_free_huge_page(struct page *page);
>  extern int dissolve_free_huge_pages(unsigned long start_pfn,
>  				    unsigned long end_pfn);
>  static inline bool hugepage_migration_supported(struct hstate *h)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1edfdb8..2fb9ba3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1444,7 +1444,7 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>   * number of free hugepages would be reduced below the number of reserved
>   * hugepages.
>   */
> -static int dissolve_free_huge_page(struct page *page)
> +int dissolve_free_huge_page(struct page *page)
>  {
>  	int rc = 0;
>  
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 27f7210..1e377fd 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1597,13 +1597,12 @@ static int soft_offline_huge_page(struct page *page, int flags)
>  			ret = -EIO;
>  	} else {
>  		/* overcommit hugetlb page will be freed to buddy */
> +		SetPageHWPoison(page);
> +		num_poisoned_pages_inc();
> +
>  		if (PageHuge(page)) {
> -			set_page_hwpoison_huge_page(hpage);
>  			dequeue_hwpoisoned_huge_page(hpage);
> -			num_poisoned_pages_add(1 << compound_order(hpage));
> -		} else {
> -			SetPageHWPoison(page);
> -			num_poisoned_pages_inc();
> +			dissolve_free_huge_page(hpage);
>  		}
>  	}
>  	return ret;
> -- 
> 1.8.5.2
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/madvise: Dont poison entire HugeTLB page for single page errors
  2017-05-12  8:10         ` Naoya Horiguchi
@ 2017-05-14  2:34           ` Anshuman Khandual
  0 siblings, 0 replies; 8+ messages in thread
From: Anshuman Khandual @ 2017-05-14  2:34 UTC (permalink / raw)
  To: Naoya Horiguchi, Anshuman Khandual
  Cc: linux-kernel, linux-mm, akpm, aneesh.kumar

On 05/12/2017 01:40 PM, Naoya Horiguchi wrote:
> On Thu, Apr 20, 2017 at 04:36:27PM +0530, Anshuman Khandual wrote:
>> Currently soft_offline_page() migrates the entire HugeTLB page, then
>> dequeues it from the active list by making it a dangling HugeTLB page
>> which ofcourse can not be used further and marks the entire HugeTLB
>> page as poisoned. This might be a costly waste of memory if the error
>> involved affects only small section of the entire page.
>>
>> This changes the behaviour so that only the affected page is marked
>> poisoned and then the HugeTLB page is released back to buddy system.
> Hi Anshuman,
> 
> This is a good catch, and we can solve this issue now because freeing
> hwpoisoned page (previously forbidden) is available now.
> 
> And I'm thinking that the same issue for hard/soft-offline on free
> hugepages can be solved, so I'll submit a patchset which includes
> updated version of your patch.

Sure, thanks Naoya.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-05-14  2:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-19  3:27 [RFC] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level Anshuman Khandual
2017-04-19  6:20 ` Aneesh Kumar K.V
2017-04-19  6:42   ` Anshuman Khandual
2017-04-20  5:05     ` Anshuman Khandual
2017-04-20 11:06       ` [PATCH] mm/madvise: Dont poison entire HugeTLB page for single page errors Anshuman Khandual
2017-04-20 13:48         ` kbuild test robot
2017-05-12  8:10         ` Naoya Horiguchi
2017-05-14  2:34           ` Anshuman Khandual

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).