linux-mm.kvack.org archive mirror
* [PATCH v2 0/2] Make alloc_contig_range handle Hugetlb pages
@ 2021-02-18 12:00 Oscar Salvador
  2021-02-18 12:00 ` [PATCH v2 1/2] mm: Make alloc_contig_range handle free hugetlb pages Oscar Salvador
  2021-02-18 12:00 ` [PATCH v2 2/2] mm: Make alloc_contig_range handle in-use " Oscar Salvador
  0 siblings, 2 replies; 5+ messages in thread
From: Oscar Salvador @ 2021-02-18 12:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, David Hildenbrand, Michal Hocko, Muchun Song,
	linux-mm, linux-kernel, Oscar Salvador

v1 -> v2:
 - Addressed feedback by Michal
 - Restrict the allocation to a node with __GFP_THISNODE
 - Drop PageHuge check in alloc_and_dissolve_huge_page
 - Re-order comments in isolate_or_dissolve_huge_page
 - Extend comment in isolate_migratepages_block
 - Place put_page right after we get the page, otherwise
   dissolve_free_huge_page will fail

 RFC -> v1:
 - Drop RFC
 - Addressed feedback from David and Mike
 - Fence off gigantic pages as there is a cyclic dependency between
   them and alloc_contig_range
 - Re-organize the code to make race-window smaller and to put
   all details in hugetlb code
 - Drop nodemask initialization. First a node will be tried and then we
   will fall back to other nodes containing memory (N_MEMORY). Details in
   patch#1's changelog
 - Count new page as surplus in case we failed to dissolve the old page
   and the new one. Details in patch#1.

Cover letter:

alloc_contig_range lacks the ability to handle HugeTLB pages.
This can be problematic for some users, e.g.: CMA and virtio-mem, whose
calls will fail if alloc_contig_range ever sees a HugeTLB page, even when
those pages lie in ZONE_MOVABLE and are free.
That problem can easily be solved by replacing the hugepage: allocating
a new one and dissolving the old one.

In-use HugeTLB pages are no exception though, as they can be isolated and
migrated like any other LRU or Movable page.

This patchset aims to improve alloc_contig_range->isolate_migratepages_block,
so HugeTLB pages can be recognized and handled.
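
For the record, here is a rough sketch (not part of the series; the wrapper
function and its arguments are made up for illustration) of the kind of call
such users end up issuing:

#include <linux/gfp.h>
#include <linux/mmzone.h>

/*
 * Illustrative only: grab a physically contiguous PFN range roughly the
 * way virtio-mem does (CMA passes MIGRATE_CMA instead). If
 * isolate_migratepages_block() stumbles over a HugeTLB page inside
 * [start_pfn, end_pfn), the call currently fails with the "PFNs busy"
 * messages shown below, even when that page is free and sits in
 * ZONE_MOVABLE.
 */
static int try_grab_pfn_range(unsigned long start_pfn, unsigned long end_pfn)
{
	return alloc_contig_range(start_pfn, end_pfn,
				  MIGRATE_MOVABLE, GFP_KERNEL);
}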

Below is an insight from David (thanks), where the problem can clearly be seen:

"Start a VM with 4G. Hotplug 1G via virtio-mem and online it to
ZONE_MOVABLE. Allocate 512 huge pages.

[root@localhost ~]# cat /proc/meminfo
MemTotal:        5061512 kB
MemFree:         3319396 kB
MemAvailable:    3457144 kB
...
HugePages_Total:     512
HugePages_Free:      512
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB


The huge pages get partially allocated from ZONE_MOVABLE. Try unplugging
1G via virtio-mem (remember, all ZONE_MOVABLE). Inside the guest:

[  180.058992] alloc_contig_range: [1b8000, 1c0000) PFNs busy
[  180.060531] alloc_contig_range: [1b8000, 1c0000) PFNs busy
[  180.061972] alloc_contig_range: [1b8000, 1c0000) PFNs busy
[  180.063413] alloc_contig_range: [1b8000, 1c0000) PFNs busy
[  180.064838] alloc_contig_range: [1b8000, 1c0000) PFNs busy
[  180.065848] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
[  180.066794] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
[  180.067738] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
[  180.068669] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
[  180.069598] alloc_contig_range: [1bfc00, 1c0000) PFNs busy"

Oscar Salvador (2):
  mm: Make alloc_contig_range handle free hugetlb pages
  mm: Make alloc_contig_range handle in-use hugetlb pages

 include/linux/hugetlb.h |  7 +++++
 mm/compaction.c         | 22 ++++++++++++++
 mm/hugetlb.c            | 77 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/vmscan.c             |  5 ++--
 4 files changed, 109 insertions(+), 2 deletions(-)

-- 
2.16.3




* [PATCH v2 1/2] mm: Make alloc_contig_range handle free hugetlb pages
  2021-02-18 12:00 [PATCH v2 0/2] Make alloc_contig_range handle Hugetlb pages Oscar Salvador
@ 2021-02-18 12:00 ` Oscar Salvador
  2021-02-19  2:10   ` Mike Kravetz
  2021-02-18 12:00 ` [PATCH v2 2/2] mm: Make alloc_contig_range handle in-use " Oscar Salvador
  1 sibling, 1 reply; 5+ messages in thread
From: Oscar Salvador @ 2021-02-18 12:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, David Hildenbrand, Michal Hocko, Muchun Song,
	linux-mm, linux-kernel, Oscar Salvador

alloc_contig_range will fail if it ever sees a HugeTLB page within the
range we are trying to allocate, even when that page is free and can be
easily reallocated.
This has proven to be problematic for some users of alloc_contig_range,
e.g.: CMA and virtio-mem, which would fail the call even when those
pages lie in ZONE_MOVABLE and are free.

We can do better by trying to dissolve such pages.

Free hugepages are tricky to handle: so that no userspace application
notices disruption, we need to replace the current free hugepage with
a new one.

In order to do that, a new function called alloc_and_dissolve_huge_page
is introduced.
This function will first try to get a new fresh hugepage, and if it
succeeds, it will dissolve the old one.

If the old hugepage cannot be dissolved, we have to dissolve the new
hugepage we just got.
Should that fail as well, we count it as a surplus, so the pool will be
re-balanced when a hugepage gets freed instead of being enqueued again.

With regard to the allocation, we restrict it to the node the page belongs
to with __GFP_THISNODE, meaning we do not fall back to other nodes' zones.

Note that gigantic hugetlb pages are fenced off since there is a cyclic
dependency between them and alloc_contig_range.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/hugetlb.h |  6 ++++
 mm/compaction.c         | 12 ++++++++
 mm/hugetlb.c            | 75 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 93 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b5807f23caf8..72352d718829 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -505,6 +505,7 @@ struct huge_bootmem_page {
 	struct hstate *hstate;
 };
 
+bool isolate_or_dissolve_huge_page(struct page *page);
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
@@ -775,6 +776,11 @@ void set_page_huge_active(struct page *page);
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
+static inline bool isolate_or_dissolve_huge_page(struct page *page)
+{
+	return false;
+}
+
 static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
 					   unsigned long addr,
 					   int avoid_reserve)
diff --git a/mm/compaction.c b/mm/compaction.c
index 190ccdaa6c19..d52506ed9db7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -905,6 +905,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			valid_page = page;
 		}
 
+		if (PageHuge(page) && cc->alloc_contig) {
+			if (!isolate_or_dissolve_huge_page(page))
+				goto isolate_fail;
+
+			/*
+			 * Ok, the hugepage was dissolved. Now these pages are
+			 * Buddy and cannot be re-allocated because they are
+			 * isolated. Fall-through as the check below handles
+			 * Buddy pages.
+			 */
+		}
+
 		/*
 		 * Skip if free. We read page order here without zone lock
 		 * which is generally unsafe, but the race window is small and
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..a4fbbe924a55 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2294,6 +2294,81 @@ static void restore_reserve_on_error(struct hstate *h,
 	}
 }
 
+static bool alloc_and_dissolve_huge_page(struct hstate *h, struct page *page)
+{
+	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
+	int nid = page_to_nid(page);
+	struct page *new_page;
+	bool ret = false;
+
+	/*
+	 * Before dissolving the page, we need to allocate a new one,
+	 * so the pool remains stable.
+	 */
+	new_page = alloc_fresh_huge_page(h, gfp_mask, nid, NULL, NULL);
+	if (new_page) {
+		/*
+		 * Free it into the hugepage allocator
+		 */
+		put_page(new_page);
+
+		/*
+		 * Ok, we got a new free hugepage to replace this one. Try to
+		 * dissolve the old page.
+		 */
+		if (!dissolve_free_huge_page(page)) {
+			ret = true;
+		} else if (dissolve_free_huge_page(new_page)) {
+			/*
+			 * Seems the old page could not be dissolved, so try to
+			 * dissolve the freshly allocated page. If that fails
+			 * too, let us count the new page as a surplus. Doing so
+			 * allows the pool to be re-balanced when pages are freed
+			 * instead of enqueued again.
+			 */
+			spin_lock(&hugetlb_lock);
+			h->surplus_huge_pages++;
+			h->surplus_huge_pages_node[nid]++;
+			spin_unlock(&hugetlb_lock);
+		}
+	}
+
+	return ret;
+}
+
+bool isolate_or_dissolve_huge_page(struct page *page)
+{
+	struct hstate *h = NULL;
+	struct page *head;
+	bool ret = false;
+
+	spin_lock(&hugetlb_lock);
+	if (PageHuge(page)) {
+		head = compound_head(page);
+		h = page_hstate(head);
+	}
+	spin_unlock(&hugetlb_lock);
+
+	/*
+	 * The page might have been dissolved from under our feet.
+	 * If that is the case, return success as if we dissolved it ourselves.
+	 */
+	if (!h)
+		return true;
+
+	/*
+	 * Fence off gigantic pages as there is a cyclic dependency
+	 * between alloc_contig_range and them.
+	 */
+	if (hstate_is_gigantic(h))
+		return ret;
+
+	if(!page_count(head) && alloc_and_dissolve_huge_page(h, head))
+		ret = true;
+
+	return ret;
+}
+
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				    unsigned long addr, int avoid_reserve)
 {
-- 
2.16.3




* [PATCH v2 2/2] mm: Make alloc_contig_range handle in-use hugetlb pages
  2021-02-18 12:00 [PATCH v2 0/2] Make alloc_contig_range handle Hugetlb pages Oscar Salvador
  2021-02-18 12:00 ` [PATCH v2 1/2] mm: Make alloc_contig_range handle free hugetlb pages Oscar Salvador
@ 2021-02-18 12:00 ` Oscar Salvador
  1 sibling, 0 replies; 5+ messages in thread
From: Oscar Salvador @ 2021-02-18 12:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, David Hildenbrand, Michal Hocko, Muchun Song,
	linux-mm, linux-kernel, Oscar Salvador

alloc_contig_range() will fail miserably if it finds a HugeTLB page
within the range, without a chance to handle it. Since HugeTLB pages can
be migrated like any other page (LRU and Movable), it does not make
sense to bail out without trying.
Enable the interface to recognize in-use HugeTLB pages so we can migrate
them, and have a much better chance of succeeding the call.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/hugetlb.h |  5 +++--
 mm/compaction.c         | 12 +++++++++++-
 mm/hugetlb.c            |  6 ++++--
 mm/vmscan.c             |  5 +++--
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 72352d718829..8c17d0dbc87c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -505,7 +505,7 @@ struct huge_bootmem_page {
 	struct hstate *hstate;
 };
 
-bool isolate_or_dissolve_huge_page(struct page *page);
+bool isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
@@ -776,7 +776,8 @@ void set_page_huge_active(struct page *page);
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
-static inline bool isolate_or_dissolve_huge_page(struct page *page)
+static inline bool isolate_or_dissolve_huge_page(struct page *page,
+						 struct list_head *list)
 {
 	return false;
 }
diff --git a/mm/compaction.c b/mm/compaction.c
index d52506ed9db7..3394ab385915 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -906,9 +906,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		if (PageHuge(page) && cc->alloc_contig) {
-			if (!isolate_or_dissolve_huge_page(page))
+			if (!isolate_or_dissolve_huge_page(page, &cc->migratepages))
 				goto isolate_fail;
 
+			if (PageHuge(page)) {
+				/*
+				 * Hugepage was successfully isolated and placed
+				 * on the cc->migratepages list.
+				 */
+				low_pfn += compound_nr(page) - 1;
+				goto isolate_success_no_list;
+			}
+
 			/*
 			 * Ok, the hugepage was dissolved. Now these pages are
 			 * Buddy and cannot be re-allocated because they are
@@ -1053,6 +1062,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 isolate_success:
 		list_add(&page->lru, &cc->migratepages);
+isolate_success_no_list:
 		cc->nr_migratepages += compound_nr(page);
 		nr_isolated += compound_nr(page);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a4fbbe924a55..1208b5f278b0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2336,7 +2336,7 @@ static bool alloc_and_dissolve_huge_page(struct hstate *h, struct page *page)
 	return ret;
 }
 
-bool isolate_or_dissolve_huge_page(struct page *page)
+bool isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 {
 	struct hstate *h = NULL;
 	struct page *head;
@@ -2363,7 +2363,9 @@ bool isolate_or_dissolve_huge_page(struct page *page)
 	if (hstate_is_gigantic(h))
 		return ret;
 
-	if(!page_count(head) && alloc_and_dissolve_huge_page(h, head))
+	if (page_count(head) && isolate_huge_page(head, list))
+		ret = true;
+	else if(!page_count(head) && alloc_and_dissolve_huge_page(h, head))
 		ret = true;
 
 	return ret;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b1b574ad199d..0803adca4469 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1506,8 +1506,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 	LIST_HEAD(clean_pages);
 
 	list_for_each_entry_safe(page, next, page_list, lru) {
-		if (page_is_file_lru(page) && !PageDirty(page) &&
-		    !__PageMovable(page) && !PageUnevictable(page)) {
+		if (!PageHuge(page) && page_is_file_lru(page) &&
+		    !PageDirty(page) && !__PageMovable(page) &&
+		    !PageUnevictable(page)) {
 			ClearPageActive(page);
 			list_move(&page->lru, &clean_pages);
 		}
-- 
2.16.3




* Re: [PATCH v2 1/2] mm: Make alloc_contig_range handle free hugetlb pages
  2021-02-18 12:00 ` [PATCH v2 1/2] mm: Make alloc_contig_range handle free hugetlb pages Oscar Salvador
@ 2021-02-19  2:10   ` Mike Kravetz
  2021-02-19  6:08     ` Oscar Salvador
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Kravetz @ 2021-02-19  2:10 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Muchun Song, linux-mm, linux-kernel

On 2/18/21 4:00 AM, Oscar Salvador wrote:
> alloc_contig_range will fail if it ever sees a HugeTLB page within the
> range we are trying to allocate, even when that page is free and can be
> easily reallocated.
> This has proven to be problematic for some users of alloc_contig_range,
> e.g.: CMA and virtio-mem, which would fail the call even when those
> pages lie in ZONE_MOVABLE and are free.
> 
> We can do better by trying to dissolve such pages.
> 
> Free hugepages are tricky to handle: so that no userspace application
> notices disruption, we need to replace the current free hugepage with
> a new one.
> 
> In order to do that, a new function called alloc_and_dissolve_huge_page
> is introduced.
> This function will first try to get a new fresh hugepage, and if it
> succeeds, it will dissolve the old one.
> 
> If the old hugepage cannot be dissolved, we have to dissolve the new
> hugepage we just got.
> Should that fail as well, we count it as a surplus, so the pool will be
> re-balanced when a hugepage gets freed instead of being enqueued again.
> 
> With regard to the allocation, we restrict it to the node the page belongs
> to with __GFP_THISNODE, meaning we do not fall back to other nodes' zones.
> 
> Note that gigantic hugetlb pages are fenced off since there is a cyclic
> dependency between them and alloc_contig_range.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  include/linux/hugetlb.h |  6 ++++
>  mm/compaction.c         | 12 ++++++++
>  mm/hugetlb.c            | 75 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 93 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index b5807f23caf8..72352d718829 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -505,6 +505,7 @@ struct huge_bootmem_page {
>  	struct hstate *hstate;
>  };
>  
> +bool isolate_or_dissolve_huge_page(struct page *page);
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>  				unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> @@ -775,6 +776,11 @@ void set_page_huge_active(struct page *page);
>  #else	/* CONFIG_HUGETLB_PAGE */
>  struct hstate {};
>  
> +static inline bool isolate_or_dissolve_huge_page(struct page *page)
> +{
> +	return false;
> +}
> +
>  static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
>  					   unsigned long addr,
>  					   int avoid_reserve)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 190ccdaa6c19..d52506ed9db7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -905,6 +905,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			valid_page = page;
>  		}
>  
> +		if (PageHuge(page) && cc->alloc_contig) {
> +			if (!isolate_or_dissolve_huge_page(page))
> +				goto isolate_fail;
> +
> +			/*
> +			 * Ok, the hugepage was dissolved. Now these pages are
> +			 * Buddy and cannot be re-allocated because they are
> +			 * isolated. Fall-through as the check below handles
> +			 * Buddy pages.
> +			 */
> +		}
> +
>  		/*
>  		 * Skip if free. We read page order here without zone lock
>  		 * which is generally unsafe, but the race window is small and
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4bdb58ab14cb..a4fbbe924a55 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2294,6 +2294,81 @@ static void restore_reserve_on_error(struct hstate *h,
>  	}
>  }
>  
> +static bool alloc_and_dissolve_huge_page(struct hstate *h, struct page *page)
> +{
> +	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
> +	int nid = page_to_nid(page);
> +	struct page *new_page;
> +	bool ret = false;
> +
> +	/*
> +	 * Before dissolving the page, we need to allocate a new one,
> +	 * so the pool remains stable.
> +	 */
> +	new_page = alloc_fresh_huge_page(h, gfp_mask, nid, NULL, NULL);
> +	if (new_page) {
> +		/*
> +		 * Free it into the hugepage allocator
> +		 */
> +		put_page(new_page);
> +

Suppose an admin does

echo 0 > \
/sys/devices/system/node/node<nid>/hugepages/hugepages-2048kB/nr_hugepages

right now and dissolves both the original and new page.

> +		/*
> +		 * Ok, we got a new free hugepage to replace this one. Try to
> +		 * dissolve the old page.
> +		 */
> +		if (!dissolve_free_huge_page(page)) {
> +			ret = true;

dissolve_free_huge_page will fail for the original page

> +		} else if (dissolve_free_huge_page(new_page)) {

and will fail for the new page

> +			/*
> +			 * Seems the old page could not be dissolved, so try to
> +			 * dissolve the freshly allocated page. If that fails
> +			 * too, let us count the new page as a surplus. Doing so
> +			 * allows the pool to be re-balanced when pages are freed
> +			 * instead of enqueued again.
> +			 */
> +			spin_lock(&hugetlb_lock);
> +			h->surplus_huge_pages++;
> +			h->surplus_huge_pages_node[nid]++;
> +			spin_unlock(&hugetlb_lock);

Those counts will be wrong as there are no huge pages on the node.

I'll think about this more tomorrow.
Pretty sure this is an issue, but I could be wrong.  Just wanted to give
a heads up.
-- 
Mike Kravetz

> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +bool isolate_or_dissolve_huge_page(struct page *page)
> +{
> +	struct hstate *h = NULL;
> +	struct page *head;
> +	bool ret = false;
> +
> +	spin_lock(&hugetlb_lock);
> +	if (PageHuge(page)) {
> +		head = compound_head(page);
> +		h = page_hstate(head);
> +	}
> +	spin_unlock(&hugetlb_lock);
> +
> +	/*
> +	 * The page might have been dissolved from under our feet.
> +	 * If that is the case, return success as if we dissolved it ourselves.
> +	 */
> +	if (!h)
> +		return true;
> +
> +	/*
> +	 * Fence off gigantic pages as there is a cyclic dependency
> +	 * between alloc_contig_range and them.
> +	 */
> +	if (hstate_is_gigantic(h))
> +		return ret;
> +
> +	if(!page_count(head) && alloc_and_dissolve_huge_page(h, head))
> +		ret = true;
> +
> +	return ret;
> +}
> +
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>  				    unsigned long addr, int avoid_reserve)
>  {
> 



* Re: [PATCH v2 1/2] mm: Make alloc_contig_range handle free hugetlb pages
  2021-02-19  2:10   ` Mike Kravetz
@ 2021-02-19  6:08     ` Oscar Salvador
  0 siblings, 0 replies; 5+ messages in thread
From: Oscar Salvador @ 2021-02-19  6:08 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, David Hildenbrand, Michal Hocko, Muchun Song,
	linux-mm, linux-kernel

On 2021-02-19 03:10, Mike Kravetz wrote:
> Those counts will be wrong as there are no huge pages on the node.
> 
> I'll think about this more tomorrow.
> Pretty sure this is an issue, but I could be wrong.  Just wanted to 
> give
> a heads up.

Yes, this is a problem, although the fixup would be to check whether we 
have any hugepages.
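
Something along these lines (completely untested, and the helper name is
made up), to be called from alloc_and_dissolve_huge_page() instead of
bumping the counters unconditionally:

/*
 * Only account the stranded new page as surplus if the node still holds
 * hugepages of this size; otherwise we would end up with
 * surplus_huge_pages_node[nid] > nr_huge_pages_node[nid].
 */
static void account_surplus_if_populated(struct hstate *h, int nid)
{
	spin_lock(&hugetlb_lock);
	if (h->nr_huge_pages_node[nid]) {
		h->surplus_huge_pages++;
		h->surplus_huge_pages_node[nid]++;
	}
	spin_unlock(&hugetlb_lock);
}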

Nevertheless, I think we should not be touching surplus at all but 
rather make the page temporary.
I am exploring making migrate_pages() handle free hugepages as Michal
suggested, so the approach is cleaner and we do not need extra
functions. I have yet to see whether that is feasible, as some issues
come to mind, e.g.: the page needs to be on a list to go through
migrate_pages(), but if it is on that list, it is not on the hugepage
freelist, and that could disrupt userspace, which could no longer
dequeue hugepages on demand.
I have to check. Should that not be possible, we can always make the
page temporary here.

> --
> Mike Kravetz
> 
>> +		}
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +bool isolate_or_dissolve_huge_page(struct page *page)
>> +{
>> +	struct hstate *h = NULL;
>> +	struct page *head;
>> +	bool ret = false;
>> +
>> +	spin_lock(&hugetlb_lock);
>> +	if (PageHuge(page)) {
>> +		head = compound_head(page);
>> +		h = page_hstate(head);
>> +	}
>> +	spin_unlock(&hugetlb_lock);
>> +
>> +	/*
>> +	 * The page might have been dissolved from under our feet.
>> +	 * If that is the case, return success as if we dissolved it 
>> ourselves.
>> +	 */
>> +	if (!h)
>> +		return true;
>> +
>> +	/*
>> +	 * Fence off gigantic pages as there is a cyclic dependency
>> +	 * between alloc_contig_range and them.
>> +	 */
>> +	if (hstate_is_gigantic(h))
>> +		return ret;
>> +
>> +	if(!page_count(head) && alloc_and_dissolve_huge_page(h, head))
>> +		ret = true;
>> +
>> +	return ret;
>> +}
>> +
>>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>>  				    unsigned long addr, int avoid_reserve)
>>  {
>> 

-- 
Oscar Salvador
SUSE L3


