linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages
@ 2021-03-10 15:08 Oscar Salvador
  2021-03-10 15:08 ` [PATCH v4 1/4] mm,page_alloc: Bail out earlier on -ENOMEM in alloc_contig_migrate_range Oscar Salvador
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Oscar Salvador @ 2021-03-10 15:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Hildenbrand, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Oscar Salvador

v3 -> v4:
 - Addressed some feedback from David and Michal
 - Make it clearer what hugetlb_lock protects in isolate_or_dissolve_huge_page
 - Start reporting proper error codes from isolate_migratepages_{range,block}
 - Bail out earlier in __alloc_contig_migrate_range on -ENOMEM
 - Addressed internal feedback from Vlastimil wrt. compaction code changes

v2 -> v3:
 - Drop usage of high-level generic helpers in favour of
   low-level approach (per Michal)
 - Check for the page to be marked as PageHugeFreed
 - Add a one-time retry in case someone grabbed the free huge page
   from under us

v1 -> v2:
 - Addressed feedback by Michal
 - Restrict the allocation to a node with __GFP_THISNODE
 - Drop PageHuge check in alloc_and_dissolve_huge_page
 - Re-order comments in isolate_or_dissolve_huge_page
 - Extend comment in isolate_migratepages_block
 - Place put_page right after we got the page, otherwise
   dissolve_free_huge_page will fail

 RFC -> v1:
 - Drop RFC
 - Addressed feedback from David and Mike
 - Fence off gigantic pages as there is a cyclic dependency between
   them and alloc_contig_range
 - Re-organize the code to make race-window smaller and to put
   all details in hugetlb code
 - Drop nodemask initialization. First a node will be tried and then we
   will fall back to other nodes containing memory (N_MEMORY). Details in
   patch#1's changelog
 - Count new page as surplus in case we failed to dissolve the old page
   and the new one. Details in patch#1.

Cover letter:

 alloc_contig_range lacks the ability to handle HugeTLB pages.
 This can be problematic for some users, e.g. CMA and virtio-mem, where those
 users will fail the call if alloc_contig_range ever sees a HugeTLB page, even
 when those pages lie in ZONE_MOVABLE and are free.
 That problem can be easily solved by replacing the page in the free hugepage
 pool.

 In-use HugeTLB pages are no exception though, as those can be isolated and
 migrated like any other LRU or Movable page.

 This patchset aims to improve alloc_contig_range->isolate_migratepages_block,
 so HugeTLB pages can be recognized and handled.

 Since we also need to start reporting errors down the chain (e.g. -ENOMEM due to
 not being able to allocate a new hugetlb page), the isolate_migratepages_{range,block}
 interfaces need to change to start reporting error codes instead of the pfn == 0
 vs pfn != 0 scheme they are using right now.
 From now on, isolate_migratepages_block will not return the next pfn to be scanned
 anymore, but -EINTR, -ENOMEM or 0, so the next pfn to be scanned will be recorded
 in the cc->migrate_pfn field (as is already done in isolate_migratepages_range()).
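
 The interface change described above can be illustrated with a small userspace
 sketch. This is not kernel code: struct scan_ctl, scan_block() and the
 fail_pfn/fail_err knobs are hypothetical names made up for the illustration.
 It only shows the idea of returning 0 or a negative errno while recording the
 resume position in a control structure, rather than overloading the return
 value as "pfn == 0 means error".

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for struct compact_control; only the field
 * discussed in the cover letter is modelled.
 */
struct scan_ctl {
	unsigned long migrate_pfn;	/* next pfn to scan */
};

/*
 * Toy scanner (an assumption for illustration, not the kernel function):
 * walks [low_pfn, end_pfn) and returns 0 or a negative errno. The resume
 * position is always recorded in ctl->migrate_pfn instead of being encoded
 * in the return value. fail_pfn/fail_err inject an error at a chosen pfn.
 */
static int scan_block(struct scan_ctl *ctl, unsigned long low_pfn,
		      unsigned long end_pfn, unsigned long fail_pfn,
		      int fail_err)
{
	int ret = 0;

	assert(low_pfn <= end_pfn);

	for (; low_pfn < end_pfn; low_pfn++) {
		if (low_pfn == fail_pfn) {
			ret = fail_err;
			break;
		}
	}

	/* The caller reads the next pfn from here, whatever ret is. */
	ctl->migrate_pfn = low_pfn;
	return ret;
}
```

 With this shape a caller can tell -EINTR from -ENOMEM and still knows where to
 resume, which the old zero-vs-nonzero pfn convention could not express.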

 Below is an insight from David (thanks), where the problem can clearly be seen:

 "Start a VM with 4G. Hotplug 1G via virtio-mem and online it to
  ZONE_MOVABLE. Allocate 512 huge pages.

  [root@localhost ~]# cat /proc/meminfo
  MemTotal:        5061512 kB
  MemFree:         3319396 kB
  MemAvailable:    3457144 kB
  ...
  HugePages_Total:     512
  HugePages_Free:      512
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:       2048 kB

  The huge pages get partially allocated from ZONE_MOVABLE. Try unplugging
  1G via virtio-mem (remember, all ZONE_MOVABLE). Inside the guest:

  [  180.058992] alloc_contig_range: [1b8000, 1c0000) PFNs busy
  [  180.060531] alloc_contig_range: [1b8000, 1c0000) PFNs busy
  [  180.061972] alloc_contig_range: [1b8000, 1c0000) PFNs busy
  [  180.063413] alloc_contig_range: [1b8000, 1c0000) PFNs busy
  [  180.064838] alloc_contig_range: [1b8000, 1c0000) PFNs busy
  [  180.065848] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
  [  180.066794] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
  [  180.067738] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
  [  180.068669] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
  [  180.069598] alloc_contig_range: [1bfc00, 1c0000) PFNs busy"

 And then with this patchset running:

 "Same experiment with ZONE_MOVABLE:

  a) Free huge pages: all memory can get unplugged again.

  b) Allocated/populated but idle huge pages: all memory can get unplugged
     again.

  c) Allocated/populated but all 512 huge pages are read/written in a
     loop: all memory can get unplugged again, but I get a single

  [  121.192345] alloc_contig_range: [180000, 188000) PFNs busy

  Most probably because it happened to try migrating a huge page while it
  was busy. As virtio-mem retries on ZONE_MOVABLE a couple of times, it
  can deal with this temporary failure.

  Last but not least, I did something extreme:

  # cat /proc/meminfo
  MemTotal:        5061568 kB
  MemFree:          186560 kB
  MemAvailable:     354524 kB
  ...
  HugePages_Total:    2048
  HugePages_Free:     2048
  HugePages_Rsvd:        0
  HugePages_Surp:        0

  Triggering unplug would require to dissolve+alloc - which now fails when
  trying to allocate an additional ~512 huge pages (1G).

  As expected, I can properly see memory unplug not fully succeeding. + I
  get a fairly continuous stream of

  [  226.611584] alloc_contig_range: [19f400, 19f800) PFNs busy
  ...

  But more importantly, the hugepage count remains stable, as configured
  by the admin (me):

  HugePages_Total:    2048
  HugePages_Free:     2048
  HugePages_Rsvd:        0
  HugePages_Surp:        0"

Oscar Salvador (4):
  mm,page_alloc: Bail out earlier on -ENOMEM in
    alloc_contig_migrate_range
  mm,compaction: Let isolate_migratepages_{range,block} return error
    codes
  mm: Make alloc_contig_range handle free hugetlb pages
  mm: Make alloc_contig_range handle in-use hugetlb pages

 include/linux/hugetlb.h |   7 +++
 mm/compaction.c         |  89 ++++++++++++++++++++++++----------
 mm/hugetlb.c            | 125 +++++++++++++++++++++++++++++++++++++++++++++++-
 mm/internal.h           |   2 +-
 mm/page_alloc.c         |  15 ++++--
 mm/vmscan.c             |   5 +-
 6 files changed, 209 insertions(+), 34 deletions(-)

-- 
2.16.3



* [PATCH v4 1/4] mm,page_alloc: Bail out earlier on -ENOMEM in alloc_contig_migrate_range
  2021-03-10 15:08 [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages Oscar Salvador
@ 2021-03-10 15:08 ` Oscar Salvador
  2021-03-15 11:00   ` Vlastimil Babka
  2021-03-15 11:10   ` David Hildenbrand
  2021-03-10 15:08 ` [PATCH v4 2/4] mm,compaction: Let isolate_migratepages_{range,block} return error codes Oscar Salvador
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Oscar Salvador @ 2021-03-10 15:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Hildenbrand, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Oscar Salvador

Currently, __alloc_contig_migrate_range can generate -EINTR, -ENOMEM or -EBUSY,
and report them down the chain.
The problem is that when migrate_pages() reports -ENOMEM, we keep going until we
exhaust all the retry attempts (5 at the moment) instead of bailing out.

migrate_pages() bails out right away on -ENOMEM because it is considered a fatal
error. Do the same here instead of retrying.
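
As a rough userspace illustration of the behaviour described above (not the
kernel loop itself), the sketch below retries a failing operation up to five
times but stops immediately on -ENOMEM. The names migrate_fn,
migrate_with_retries, fake_migrator and fake_migrate are hypothetical, and the
model is simplified so that any non-zero result other than -ENOMEM counts as a
transient failure.

```c
#include <assert.h>
#include <errno.h>

#define MAX_TRIES 5	/* mirrors the "5 at the moment" retry budget */

/*
 * Hypothetical callback standing in for migrate_pages(): returns 0 on
 * success and a negative errno on failure.
 */
typedef int (*migrate_fn)(void *arg);

static int migrate_with_retries(migrate_fn migrate, void *arg)
{
	int tries = 0;
	int ret;

	assert(migrate);

	for (;;) {
		ret = migrate(arg);
		if (ret == 0)
			return 0;
		/* -ENOMEM is fatal: retrying cannot help, so bail out. */
		if (ret == -ENOMEM)
			return ret;
		/* Any other failure is retried until the budget runs out. */
		if (++tries == MAX_TRIES)
			return -EBUSY;
	}
}

/* Illustrative fake: always fails with 'err' and counts its calls. */
struct fake_migrator {
	int err;
	int calls;
};

static int fake_migrate(void *arg)
{
	struct fake_migrator *m = arg;

	m->calls++;
	return m->err;
}
```

With a fake that always returns -ENOMEM the loop gives up after a single call,
whereas a fake that always returns -EAGAIN is called five times before the
result is reported as -EBUSY.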

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/page_alloc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e4b29ee2b1e..94467f1b85ff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8484,7 +8484,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 			}
 			tries = 0;
 		} else if (++tries == 5) {
-			ret = ret < 0 ? ret : -EBUSY;
+			ret = -EBUSY;
 			break;
 		}
 
@@ -8494,6 +8494,12 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 
 		ret = migrate_pages(&cc->migratepages, alloc_migration_target,
 				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
+		/*
+		 * On -ENOMEM, migrate_pages() bails out right away. It is pointless
+		 * to retry again over this error, so do the same here.
+		 */
+		if (ret == -ENOMEM)
+			break;
 	}
 	if (ret < 0) {
 		putback_movable_pages(&cc->migratepages);
-- 
2.16.3



* [PATCH v4 2/4] mm,compaction: Let isolate_migratepages_{range,block} return error codes
  2021-03-10 15:08 [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages Oscar Salvador
  2021-03-10 15:08 ` [PATCH v4 1/4] mm,page_alloc: Bail out earlier on -ENOMEM in alloc_contig_migrate_range Oscar Salvador
@ 2021-03-10 15:08 ` Oscar Salvador
  2021-03-15 11:03   ` Vlastimil Babka
  2021-03-10 15:08 ` [PATCH v4 3/4] mm: Make alloc_contig_range handle free hugetlb pages Oscar Salvador
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2021-03-10 15:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Hildenbrand, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Oscar Salvador

Currently, isolate_migratepages_{range,block} and their callers use
a pfn == 0 vs pfn != 0 scheme to let the caller know whether there was
any error during isolation.
This does not work as soon as we need to start reporting different error
codes and make sure we pass them down the chain, so they are properly
interpreted by functions such as alloc_contig_range.

Let us rework isolate_migratepages_{range,block} so we can report error
codes.
Since isolate_migratepages_block will stop returning the next pfn to be
scanned, we reuse the cc->migrate_pfn field to keep track of that.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/compaction.c | 48 ++++++++++++++++++++++++------------------------
 mm/internal.h   |  2 +-
 mm/page_alloc.c |  7 +++----
 3 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index e04f4476e68e..5769753a8f60 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -787,15 +787,16 @@ static bool too_many_isolated(pg_data_t *pgdat)
  *
  * Isolate all pages that can be migrated from the range specified by
  * [low_pfn, end_pfn). The range is expected to be within same pageblock.
- * Returns zero if there is a fatal signal pending, otherwise PFN of the
- * first page that was not scanned (which may be both less, equal to or more
- * than end_pfn).
+ * Returns -EINTR in case we need to abort when we have too many isolated pages
+ * due to e.g: signal pending, async mode or having still pages to migrate, or 0.
+ * cc->migrate_pfn will contain the next pfn to scan (which may be both less,
+ * equal to or more than end_pfn).
  *
  * The pages are isolated on cc->migratepages list (not required to be empty),
  * and cc->nr_migratepages is updated accordingly. The cc->migrate_pfn field
  * is neither read nor updated.
  */
-static unsigned long
+static int
 isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			unsigned long end_pfn, isolate_mode_t isolate_mode)
 {
@@ -810,6 +811,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	unsigned long next_skip_pfn = 0;
 	bool skip_updated = false;
 
+	cc->migrate_pfn = low_pfn;
+
 	/*
 	 * Ensure that there are not too many pages isolated from the LRU
 	 * list by either parallel reclaimers or compaction. If there are,
@@ -818,16 +821,16 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	while (unlikely(too_many_isolated(pgdat))) {
 		/* stop isolation if there are still pages not migrated */
 		if (cc->nr_migratepages)
-			return 0;
+			return -EINTR;
 
 		/* async migration should just abort */
 		if (cc->mode == MIGRATE_ASYNC)
-			return 0;
+			return -EINTR;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
 
 		if (fatal_signal_pending(current))
-			return 0;
+			return -EINTR;
 	}
 
 	cond_resched();
@@ -1130,7 +1133,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	if (nr_isolated)
 		count_compact_events(COMPACTISOLATED, nr_isolated);
 
-	return low_pfn;
+	cc->migrate_pfn = low_pfn;
+
+	return 0;
 }
 
 /**
@@ -1139,15 +1144,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
  * @start_pfn: The first PFN to start isolating.
  * @end_pfn:   The one-past-last PFN.
  *
- * Returns zero if isolation fails fatally due to e.g. pending signal.
- * Otherwise, function returns one-past-the-last PFN of isolated page
- * (which may be greater than end_pfn if end fell in a middle of a THP page).
+ * Returns -EINTR in case isolation fails fatally due to e.g. pending signal,
+ * or 0.
  */
-unsigned long
+int
 isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 							unsigned long end_pfn)
 {
 	unsigned long pfn, block_start_pfn, block_end_pfn;
+	int ret = 0;
 
 	/* Scan block by block. First and last block may be incomplete */
 	pfn = start_pfn;
@@ -1166,17 +1171,17 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 					block_end_pfn, cc->zone))
 			continue;
 
-		pfn = isolate_migratepages_block(cc, pfn, block_end_pfn,
-							ISOLATE_UNEVICTABLE);
+		ret = isolate_migratepages_block(cc, pfn, block_end_pfn,
+						 ISOLATE_UNEVICTABLE);
 
-		if (!pfn)
+		if (ret)
 			break;
 
 		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX)
 			break;
 	}
 
-	return pfn;
+	return ret;
 }
 
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
@@ -1847,7 +1852,7 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
 	 */
 	for (; block_end_pfn <= cc->free_pfn;
 			fast_find_block = false,
-			low_pfn = block_end_pfn,
+			cc->migrate_pfn = low_pfn = block_end_pfn,
 			block_start_pfn = block_end_pfn,
 			block_end_pfn += pageblock_nr_pages) {
 
@@ -1889,10 +1894,8 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
 		}
 
 		/* Perform the isolation */
-		low_pfn = isolate_migratepages_block(cc, low_pfn,
-						block_end_pfn, isolate_mode);
-
-		if (!low_pfn)
+		if (isolate_migratepages_block(cc, low_pfn, block_end_pfn,
+						isolate_mode))
 			return ISOLATE_ABORT;
 
 		/*
@@ -1903,9 +1906,6 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
 		break;
 	}
 
-	/* Record where migration scanner will be restarted. */
-	cc->migrate_pfn = low_pfn;
-
 	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 9902648f2206..e670abb1154c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -261,7 +261,7 @@ struct capture_control {
 unsigned long
 isolate_freepages_range(struct compact_control *cc,
 			unsigned long start_pfn, unsigned long end_pfn);
-unsigned long
+int
 isolate_migratepages_range(struct compact_control *cc,
 			   unsigned long low_pfn, unsigned long end_pfn);
 int find_suitable_fallback(struct free_area *area, unsigned int order,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 94467f1b85ff..2184e3ef4116 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8477,11 +8477,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 
 		if (list_empty(&cc->migratepages)) {
 			cc->nr_migratepages = 0;
-			pfn = isolate_migratepages_range(cc, pfn, end);
-			if (!pfn) {
-				ret = -EINTR;
+			ret = isolate_migratepages_range(cc, pfn, end);
+			if (ret)
 				break;
-			}
+			pfn = cc->migrate_pfn;
 			tries = 0;
 		} else if (++tries == 5) {
 			ret = -EBUSY;
-- 
2.16.3



* [PATCH v4 3/4] mm: Make alloc_contig_range handle free hugetlb pages
  2021-03-10 15:08 [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages Oscar Salvador
  2021-03-10 15:08 ` [PATCH v4 1/4] mm,page_alloc: Bail out earlier on -ENOMEM in alloc_contig_migrate_range Oscar Salvador
  2021-03-10 15:08 ` [PATCH v4 2/4] mm,compaction: Let isolate_migratepages_{range,block} return error codes Oscar Salvador
@ 2021-03-10 15:08 ` Oscar Salvador
  2021-03-10 15:08 ` [PATCH v4 4/4] mm: Make alloc_contig_range handle in-use " Oscar Salvador
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2021-03-10 15:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Hildenbrand, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Oscar Salvador

alloc_contig_range will fail if it ever sees a HugeTLB page within the
range we are trying to allocate, even when that page is free and can be
easily reallocated.
This has proved to be problematic for some users of alloc_contig_range,
e.g. CMA and virtio-mem, where those would fail the call even when those
pages lie in ZONE_MOVABLE and are free.

We can do better by trying to replace such a page.

Free hugepages are tricky to handle: so that no userspace application
notices any disruption, we need to replace the current free hugepage with
a new one.

In order to do that, a new function called alloc_and_dissolve_huge_page
is introduced.
This function will first try to get a new fresh hugepage, and if it
succeeds, it will replace the old one in the free hugepage pool.

All operations are handled under hugetlb_lock, so no races are
possible. The only exception is when the page's refcount is 0, but it has
not yet been flagged as PageHugeFreed.
E.g., the scenario below:

CPU0				CPU1
__free_huge_page()		isolate_or_dissolve_huge_page
				  PageHuge() == T
				  alloc_and_dissolve_huge_page
				    alloc_fresh_huge_page()
				    spin_lock(hugetlb_lock)
				    // PageHuge() && !PageHugeFreed &&
				    // !PageCount()
				    spin_unlock(hugetlb_lock)
  spin_lock(hugetlb_lock)
  1) update_and_free_page
       PageHuge() == F
       __free_pages()
  2) enqueue_huge_page
       SetPageHugeFreed()
  spin_unlock(&hugetlb_lock)
				  spin_lock(hugetlb_lock)
                                   1) PageHuge() == F (freed by case#1 from CPU0)
				   2) PageHuge() == T
                                       PageHugeFreed() == T
                                       - proceed with replacing the page

In the case above we retry, as the race window is quite small and we have a high
chance of succeeding next time.

With regard to the allocation, we restrict it to the node the page belongs
to with __GFP_THISNODE, meaning we do not fall back to other nodes' zones.

Note that gigantic hugetlb pages are fenced off since there is a cyclic
dependency between them and alloc_contig_range.
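
The checks made on the old page once hugetlb_lock is held can be summarised
with a small userspace sketch. It is a model only: struct old_page_state,
enum replace_action and classify_old_page() are hypothetical names, and the
three fields stand in for what PageHuge(), page_count() and HPageFreed()
would report in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the old page's state, taken under the lock. */
struct old_page_state {
	bool is_huge;	/* models PageHuge() */
	int refcount;	/* models page_count() */
	bool freed;	/* models HPageFreed(): page sits on the free list */
};

enum replace_action {
	DROP_NEW_PAGE,	/* old page was freed from under us, nothing to replace */
	FAIL_BUSY,	/* someone holds a reference, report -EBUSY */
	RETRY,		/* refcount is 0 but page is not enqueued yet, retry */
	REPLACE,	/* genuine free hugepage, swap in the new page */
};

/* Order of the checks follows the description in the changelog. */
static enum replace_action classify_old_page(const struct old_page_state *s)
{
	assert(s);

	if (!s->is_huge)
		return DROP_NEW_PAGE;
	if (s->refcount)
		return FAIL_BUSY;
	if (!s->freed)
		return RETRY;
	return REPLACE;
}
```

Only the RETRY case corresponds to the race shown in the CPU0/CPU1 diagram:
the refcount has already dropped to 0, but the page has not been enqueued yet.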

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/hugetlb.h |   6 +++
 mm/compaction.c         |  33 ++++++++++++++-
 mm/hugetlb.c            | 109 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cccd1aab69dd..bcff86ca616f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -583,6 +583,7 @@ struct huge_bootmem_page {
 	struct hstate *hstate;
 };
 
+int isolate_or_dissolve_huge_page(struct page *page);
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
@@ -865,6 +866,11 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
+static inline int isolate_or_dissolve_huge_page(struct page *page)
+{
+	return -ENOMEM;
+}
+
 static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
 					   unsigned long addr,
 					   int avoid_reserve)
diff --git a/mm/compaction.c b/mm/compaction.c
index 5769753a8f60..9f253fc3b4f9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -810,6 +810,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	bool skip_on_failure = false;
 	unsigned long next_skip_pfn = 0;
 	bool skip_updated = false;
+	bool fatal_error = false;
+	int ret = 0;
 
 	cc->migrate_pfn = low_pfn;
 
@@ -907,6 +909,32 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			valid_page = page;
 		}
 
+		if (PageHuge(page) && cc->alloc_contig) {
+			ret = isolate_or_dissolve_huge_page(page);
+
+			/*
+			 * Fail isolation in case isolate_or_dissolve_huge_page
+			 * reports an error. In case of -ENOMEM, abort right away.
+			 */
+			if (ret < 0) {
+				/*
+				 * Do not report -EBUSY down the chain.
+				 */
+				if (ret == -ENOMEM)
+					fatal_error = true;
+				else
+					ret = 0;
+				goto isolate_fail;
+			}
+
+			/*
+			 * Ok, the hugepage was dissolved. Now these pages are
+			 * Buddy and cannot be re-allocated because they are
+			 * isolated. Fall-through as the check below handles
+			 * Buddy pages.
+			 */
+		}
+
 		/*
 		 * Skip if free. We read page order here without zone lock
 		 * which is generally unsafe, but the race window is small and
@@ -1092,6 +1120,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			 */
 			next_skip_pfn += 1UL << cc->order;
 		}
+
+		if (fatal_error)
+			break;
 	}
 
 	/*
@@ -1135,7 +1166,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 	cc->migrate_pfn = low_pfn;
 
-	return 0;
+	return ret;
 }
 
 /**
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8fb42c6dd74b..80dd1b3b80fb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1032,13 +1032,18 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	return false;
 }
 
+static void __enqueue_huge_page(struct list_head *list, struct page *page)
+{
+	list_move(&page->lru, list);
+	SetHPageFreed(page);
+}
+
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
-	list_move(&page->lru, &h->hugepage_freelists[nid]);
+	__enqueue_huge_page(&h->hugepage_freelists[nid], page);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
-	SetHPageFreed(page);
 }
 
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -2242,6 +2247,106 @@ static void restore_reserve_on_error(struct hstate *h,
 	}
 }
 
+/*
+ * alloc_and_dissolve_huge_page - Allocate a new page and dissolve the old one
+ * @h: struct hstate old page belongs to
+ * @old_page: Old page to dissolve
+ * Returns 0 on success, otherwise negated error.
+ */
+
+static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page)
+{
+	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
+	int nid = page_to_nid(old_page);
+	struct page *new_page;
+	int ret = 0;
+
+	/*
+	 * Before dissolving the page, we need to allocate a new one,
+	 * so the pool remains stable.
+	 */
+	new_page = alloc_fresh_huge_page(h, gfp_mask, nid, NULL, NULL);
+	if (!new_page)
+		return -ENOMEM;
+
+	/*
+	 * Pages got from Buddy are self-refcounted, but free hugepages
+	 * need to have a refcount of 0.
+	 */
+	page_ref_dec(new_page);
+retry:
+	spin_lock(&hugetlb_lock);
+	if (!PageHuge(old_page)) {
+		/*
+		 * Freed from under us. Drop new_page too.
+		 */
+		update_and_free_page(h, new_page);
+		goto unlock;
+	} else if (page_count(old_page)) {
+		/*
+		 * Someone has grabbed the page, fail for now.
+		 */
+		ret = -EBUSY;
+		update_and_free_page(h, new_page);
+		goto unlock;
+	} else if (!HPageFreed(old_page)) {
+		/*
+		 * Page's refcount is 0 but it has not been enqueued in the
+		 * freelist yet. Race window is small, so we can succeed here if
+		 * we retry.
+		 */
+		spin_unlock(&hugetlb_lock);
+		cond_resched();
+		goto retry;
+	} else {
+		/*
+		 * Ok, old_page is still a genuine free hugepage. Replace it
+		 * with the new one.
+		 */
+		list_del(&old_page->lru);
+		update_and_free_page(h, old_page);
+		/*
+		 * h->free_huge_pages{_node} counters do not need to be updated.
+		 */
+		__enqueue_huge_page(&h->hugepage_freelists[nid], new_page);
+	}
+unlock:
+	spin_unlock(&hugetlb_lock);
+
+	return ret;
+}
+
+int isolate_or_dissolve_huge_page(struct page *page)
+{
+	struct hstate *h;
+	struct page *head;
+
+	/*
+	 * The page might have been dissolved from under our feet, so make sure
+	 * to carefully check the state under the lock.
+	 * Return success when racing as if we dissolved the page ourselves.
+	 */
+	spin_lock(&hugetlb_lock);
+	if (PageHuge(page)) {
+		head = compound_head(page);
+		h = page_hstate(head);
+	} else {
+		spin_unlock(&hugetlb_lock);
+		return 0;
+	}
+	spin_unlock(&hugetlb_lock);
+
+	/*
+	 * Fence off gigantic pages as there is a cyclic dependency between
+	 * alloc_contig_range and them. Return -ENOMEM as this has the effect
+	 * of bailing out right away without further retrying.
+	 */
+	if (hstate_is_gigantic(h))
+		return -ENOMEM;
+
+	return alloc_and_dissolve_huge_page(h, head);
+}
+
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				    unsigned long addr, int avoid_reserve)
 {
-- 
2.16.3



* [PATCH v4 4/4] mm: Make alloc_contig_range handle in-use hugetlb pages
  2021-03-10 15:08 [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages Oscar Salvador
                   ` (2 preceding siblings ...)
  2021-03-10 15:08 ` [PATCH v4 3/4] mm: Make alloc_contig_range handle free hugetlb pages Oscar Salvador
@ 2021-03-10 15:08 ` Oscar Salvador
  2021-03-15  9:06 ` [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages David Hildenbrand
  2021-03-15 10:23 ` Oscar Salvador
  5 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2021-03-10 15:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Hildenbrand, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Oscar Salvador

alloc_contig_range() will fail if it finds a HugeTLB page within the range,
without a chance to handle it. Since HugeTLB pages can be migrated like any
LRU or Movable page, it does not make sense to bail out without trying.
Enable the interface to recognize in-use HugeTLB pages so we can migrate
them, giving the call a much better chance of succeeding.
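
The dispatch between the in-use and free cases, including the one-time retry,
can be modelled in userspace roughly as follows. This is a sketch under stated
assumptions, not the kernel implementation: struct toy_hpage and the toy_*
functions are hypothetical, and grab_on_dissolve is an artificial knob that
simulates another user taking a reference while we try to dissolve the page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page model; only the fields needed for the dispatch. */
struct toy_hpage {
	int refcount;		/* models page_count() of the head page */
	bool can_isolate;	/* whether isolation would succeed */
	bool grab_on_dissolve;	/* simulate a racing user during dissolve */
};

static bool toy_isolate(struct toy_hpage *p)
{
	return p->can_isolate;
}

static int toy_dissolve(struct toy_hpage *p)
{
	if (p->grab_on_dissolve) {
		p->grab_on_dissolve = false;
		p->refcount = 1;	/* the racing user took the page */
	}
	return p->refcount ? -EBUSY : 0;
}

static int toy_isolate_or_dissolve(struct toy_hpage *p)
{
	bool try_again = true;
	int ret = -EBUSY;

	assert(p);
retry:
	if (p->refcount && toy_isolate(p)) {
		/* In-use page: isolated, ready to be migrated. */
		ret = 0;
	} else if (!p->refcount) {
		/* Free page: replace it, retry once if it became busy. */
		ret = toy_dissolve(p);
		if (ret == -EBUSY && try_again) {
			try_again = false;
			goto retry;
		}
	}
	return ret;
}
```

In this model a page that becomes busy during the dissolve attempt gets one
more pass, on which it is handled by the isolation branch instead.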

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h |  5 +++--
 mm/compaction.c         | 12 +++++++++++-
 mm/hugetlb.c            | 22 +++++++++++++++++++---
 mm/vmscan.c             |  5 +++--
 4 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index bcff86ca616f..a37b4ce86e58 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -583,7 +583,7 @@ struct huge_bootmem_page {
 	struct hstate *hstate;
 };
 
-int isolate_or_dissolve_huge_page(struct page *page);
+int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
@@ -866,7 +866,8 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
-static inline int isolate_or_dissolve_huge_page(struct page *page)
+static inline int isolate_or_dissolve_huge_page(struct page *page,
+						struct list_head *list)
 {
 	return -ENOMEM;
 }
diff --git a/mm/compaction.c b/mm/compaction.c
index 9f253fc3b4f9..6e47855fd154 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -910,7 +910,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		if (PageHuge(page) && cc->alloc_contig) {
-			ret = isolate_or_dissolve_huge_page(page);
+			ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
 
 			/*
 			 * Fail isolation in case isolate_or_dissolve_huge_page
@@ -927,6 +927,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 				goto isolate_fail;
 			}
 
+			if (PageHuge(page)) {
+				/*
+				 * Hugepage was successfully isolated and placed
+				 * on the cc->migratepages list.
+				 */
+				low_pfn += compound_nr(page) - 1;
+				goto isolate_success_no_list;
+			}
+
 			/*
 			 * Ok, the hugepage was dissolved. Now these pages are
 			 * Buddy and cannot be re-allocated because they are
@@ -1068,6 +1077,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 isolate_success:
 		list_add(&page->lru, &cc->migratepages);
+isolate_success_no_list:
 		cc->nr_migratepages += compound_nr(page);
 		nr_isolated += compound_nr(page);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 80dd1b3b80fb..64caffc504d1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2284,7 +2284,9 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page)
 		goto unlock;
 	} else if (page_count(old_page)) {
 		/*
-		 * Someone has grabbed the page, fail for now.
+		 * Someone has grabbed the page, return -EBUSY so we give
+		 * isolate_or_dissolve_huge_page a chance to handle an in-use
+		 * page.
 		 */
 		ret = -EBUSY;
 		update_and_free_page(h, new_page);
@@ -2316,10 +2318,12 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page)
 	return ret;
 }
 
-int isolate_or_dissolve_huge_page(struct page *page)
+int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 {
 	struct hstate *h;
 	struct page *head;
+	bool try_again = true;
+	int ret = -EBUSY;
 
 	/*
 	 * The page might have been dissolved from under our feet, so make sure
@@ -2344,7 +2348,19 @@ int isolate_or_dissolve_huge_page(struct page *page)
 	if (hstate_is_gigantic(h))
 		return -ENOMEM;
 
-	return alloc_and_dissolve_huge_page(h, head);
+retry:
+	if (page_count(head) && isolate_huge_page(head, list)) {
+		ret = 0;
+	} else if (!page_count(head)) {
+		ret = alloc_and_dissolve_huge_page(h, head);
+
+		if (ret == -EBUSY && try_again) {
+			try_again = false;
+			goto retry;
+		}
+	}
+
+	return ret;
 }
 
 struct page *alloc_huge_page(struct vm_area_struct *vma,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 562e87cbd7a1..42aaef30633e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1507,8 +1507,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 	LIST_HEAD(clean_pages);
 
 	list_for_each_entry_safe(page, next, page_list, lru) {
-		if (page_is_file_lru(page) && !PageDirty(page) &&
-		    !__PageMovable(page) && !PageUnevictable(page)) {
+		if (!PageHuge(page) && page_is_file_lru(page) &&
+		    !PageDirty(page) && !__PageMovable(page) &&
+		    !PageUnevictable(page)) {
 			ClearPageActive(page);
 			list_move(&page->lru, &clean_pages);
 		}
-- 
2.16.3



* Re: [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages
  2021-03-10 15:08 [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages Oscar Salvador
                   ` (3 preceding siblings ...)
  2021-03-10 15:08 ` [PATCH v4 4/4] mm: Make alloc_contig_range handle in-use " Oscar Salvador
@ 2021-03-15  9:06 ` David Hildenbrand
  2021-03-15 10:27   ` Oscar Salvador
  2021-03-17  9:43   ` Oscar Salvador
  2021-03-15 10:23 ` Oscar Salvador
  5 siblings, 2 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-03-15  9:06 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Vlastimil Babka, Michal Hocko, Muchun Song, Mike Kravetz,
	linux-mm, linux-kernel, Anshuman Khandual

On 10.03.21 16:08, Oscar Salvador wrote:
> v3 -> v4:
>   - Addressed some feedback from David and Michal
>   - Make it clearer what hugetlb_lock protects in isolate_or_dissolve_huge_page
>   - Start reporting proper error codes from isolate_migratepages_{range,block}
>   - Bail out earlier in __alloc_contig_migrate_range on -ENOMEM
>   - Addressed internal feedback from Vlastimil wrt. compaction code changes
> 
> v2 -> v3:
>   - Drop usage of high-level generic helpers in favour of
>     low-level approach (per Michal)
>   - Check for the page to be marked as PageHugeFreed
>   - Add a one-time retry in case someone grabbed the free huge page
>     from under us
> 
> v1 -> v2:
>   - Addressed feedback by Michal
>   - Restrict the allocation to a node with __GFP_THISNODE
>   - Drop PageHuge check in alloc_and_dissolve_huge_page
>   - Re-order comments in isolate_or_dissolve_huge_page
>   - Extend comment in isolate_migratepages_block
>   - Place put_page right after we got the page, otherwise
>     dissolve_free_huge_page will fail
> 
>   RFC -> v1:
>   - Drop RFC
>   - Addressed feedback from David and Mike
>   - Fence off gigantic pages as there is a cyclic dependency between
>     them and alloc_contig_range
>   - Re-organize the code to make race-window smaller and to put
>     all details in hugetlb code
>   - Drop nodemask initialization. First a node will be tried and then we
>     will fall back to other nodes containing memory (N_MEMORY). Details in
>     patch#1's changelog
>   - Count new page as surplus in case we failed to dissolve the old page
>     and the new one. Details in patch#1.
> 
> Cover letter:
> 
>   alloc_contig_range lacks the ability to handle HugeTLB pages.
>   This can be problematic for some users, e.g: CMA and virtio-mem, where those
>   users will fail the call if alloc_contig_range ever sees a HugeTLB page, even
>   when those pages lie in ZONE_MOVABLE and are free.
>   That problem can be easily solved by replacing the page in the free hugepage
>   pool.
> 
>   In-use HugeTLB are no exception though, as those can be isolated and migrated
>   as any other LRU or Movable page.
> 
>   This patchset aims for improving alloc_contig_range->isolate_migratepages_block,
>   so HugeTLB pages can be recognized and handled.
> 
>   Since we also need to start reporting errors down the chain (e.g: -ENOMEM due to
>   not being able to allocate a new hugetlb page), isolate_migratepages_{range,block}
>   interfaces need to change to start reporting error codes instead of the pfn == 0
>   vs pfn != 0 scheme they are using right now.
>   From now on, isolate_migratepages_block will not return the next pfn to be scanned
>   anymore, but -EINTR, -ENOMEM or 0, so the next pfn to be scanned will be recorded
>   in cc->migrate_pfn field (as it is already done in isolate_migratepages_range()).
> 
>   Below is an insight from David (thanks), where the problem can clearly be seen:
> 
>   "Start a VM with 4G. Hotplug 1G via virtio-mem and online it to
>    ZONE_MOVABLE. Allocate 512 huge pages.
> 
>    [root@localhost ~]# cat /proc/meminfo
>    MemTotal:        5061512 kB
>    MemFree:         3319396 kB
>    MemAvailable:    3457144 kB
>    ...
>    HugePages_Total:     512
>    HugePages_Free:      512
>    HugePages_Rsvd:        0
>    HugePages_Surp:        0
>    Hugepagesize:       2048 kB
> 
>    The huge pages get partially allocated from ZONE_MOVABLE. Try unplugging
>    1G via virtio-mem (remember, all ZONE_MOVABLE). Inside the guest:
> 
>    [  180.058992] alloc_contig_range: [1b8000, 1c0000) PFNs busy
>    [  180.060531] alloc_contig_range: [1b8000, 1c0000) PFNs busy
>    [  180.061972] alloc_contig_range: [1b8000, 1c0000) PFNs busy
>    [  180.063413] alloc_contig_range: [1b8000, 1c0000) PFNs busy
>    [  180.064838] alloc_contig_range: [1b8000, 1c0000) PFNs busy
>    [  180.065848] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
>    [  180.066794] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
>    [  180.067738] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
>    [  180.068669] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
>    [  180.069598] alloc_contig_range: [1bfc00, 1c0000) PFNs busy"
> 
>   And then with this patchset running:
> 
>   "Same experiment with ZONE_MOVABLE:
> 
>    a) Free huge pages: all memory can get unplugged again.
> 
>    b) Allocated/populated but idle huge pages: all memory can get unplugged
>       again.
> 
>    c) Allocated/populated but all 512 huge pages are read/written in a
>       loop: all memory can get unplugged again, but I get a single
> 
>    [  121.192345] alloc_contig_range: [180000, 188000) PFNs busy
> 
>    Most probably because it happened to try migrating a huge page while it
>    was busy. As virtio-mem retries on ZONE_MOVABLE a couple of times, it
>    can deal with this temporary failure.
> 
>    Last but not least, I did something extreme:
> 
>    # cat /proc/meminfo
>    MemTotal:        5061568 kB
>    MemFree:          186560 kB
>    MemAvailable:     354524 kB
>    ...
>    HugePages_Total:    2048
>    HugePages_Free:     2048
>    HugePages_Rsvd:        0
>    HugePages_Surp:        0
> 
>    Triggering unplug would require to dissolve+alloc - which now fails when
>    trying to allocate an additional ~512 huge pages (1G).
> 
>    As expected, I can properly see memory unplug not fully succeeding. + I
>    get a fairly continuous stream of
> 
>    [  226.611584] alloc_contig_range: [19f400, 19f800) PFNs busy
>    ...
> 
>    But more importantly, the hugepage count remains stable, as configured
>    by the admin (me):
> 
>    HugePages_Total:    2048
>    HugePages_Free:     2048
>    HugePages_Rsvd:        0
>    HugePages_Surp:        0"
> 
> Oscar Salvador (4):
>    mm,page_alloc: Bail out earlier on -ENOMEM in
>      alloc_contig_migrate_range
>    mm,compaction: Let isolate_migratepages_{range,block} return error
>      codes
>    mm: Make alloc_contig_range handle free hugetlb pages
>    mm: Make alloc_contig_range handle in-use hugetlb pages
> 
>   include/linux/hugetlb.h |   7 +++
>   mm/compaction.c         |  89 ++++++++++++++++++++++++----------
>   mm/hugetlb.c            | 125 +++++++++++++++++++++++++++++++++++++++++++++++-
>   mm/internal.h           |   2 +-
>   mm/page_alloc.c         |  15 ++++--
>   mm/vmscan.c             |   5 +-
>   6 files changed, 209 insertions(+), 34 deletions(-)
> 

BTW, I stumbled yesterday over

alloc_contig_pages()->pfn_range_valid_contig():

	if (page_count(page) > 0)
		return false;
	if (PageHuge(page))
		return false;

As used by memtrace and for gigantic pages. We can now

a) Drop these checks completely, as it's best-effort only and racy.
alloc_contig_pages()/alloc_contig_range() will handle it properly.

b) Similarly, check for gigantic pages and/or movability/migratability.

Dropping both checks might be the right thing to do: it might significantly
increase allocation chances -- as we actually end up migrating busy
pages ...

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages
  2021-03-10 15:08 [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages Oscar Salvador
                   ` (4 preceding siblings ...)
  2021-03-15  9:06 ` [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages David Hildenbrand
@ 2021-03-15 10:23 ` Oscar Salvador
  5 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2021-03-15 10:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Hildenbrand, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel

On Wed, Mar 10, 2021 at 04:08:49PM +0100, Oscar Salvador wrote:
> [...]

Kindly ping :-)


-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages
  2021-03-15  9:06 ` [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages David Hildenbrand
@ 2021-03-15 10:27   ` Oscar Salvador
  2021-03-15 10:28     ` David Hildenbrand
  2021-03-17  9:43   ` Oscar Salvador
  1 sibling, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2021-03-15 10:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Vlastimil Babka, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Anshuman Khandual

On Mon, Mar 15, 2021 at 10:06:40AM +0100, David Hildenbrand wrote:
> 
> BTW, I stumbled yesterday over
> 
> alloc_contig_pages()->pfn_range_valid_contig():
> 
> 	if (page_count(page) > 0)
> 		return false;
> 	if (PageHuge(page))
> 		return false;
> 
> As used by memtrace and for gigantic pages. We can now
> 
> a) Drop these checks completely, as it's best-effort only and racy.
> alloc_contig_pages()/alloc_contig_range() will handle it properly.
> 
> b) Similarly, check for gigantic pages and/or movability/migratability.
> 
> Dropping both checks might be the right thing to do: might significantly
> increase allocation chances -- as we actually end up migrating busy pages
> ...

Oh, sorry David, my mail client tricked me and I did not see this till now.

I will have a look, but I would like to collect some more feedback on all
pieces before going any further and writing a new version.
Vlastimil looked at patch#1 and patch#2 and he was ok with them, but let's see
what others think as well.

 

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages
  2021-03-15 10:27   ` Oscar Salvador
@ 2021-03-15 10:28     ` David Hildenbrand
  0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-03-15 10:28 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, Vlastimil Babka, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Anshuman Khandual

On 15.03.21 11:27, Oscar Salvador wrote:
> On Mon, Mar 15, 2021 at 10:06:40AM +0100, David Hildenbrand wrote:
>>
>> BTW, I stumbled yesterday over
>>
>> alloc_contig_pages()->pfn_range_valid_contig():
>>
>> 	if (page_count(page) > 0)
>> 		return false;
>> 	if (PageHuge(page))
>> 		return false;
>>
>> As used by memtrace and for gigantic pages. We can now
>>
>> a) Drop these checks completely, as it's best-effort only and racy.
>> alloc_contig_pages()/alloc_contig_range() will handle it properly.
>>
>> b) Similarly, check for gigantic pages and/or movability/migratability.
>>
>> Dropping both checks might be the right thing to do: might significantly
>> increase allocation chances -- as we actually end up migrating busy pages
>> ...
> 
> Oh, sorry David, my mail client tricked me and I did not see this till now.
> 
> I will have a look, but I would like to collect some more feedback on all
> pieces before going any further and writing a new version.
> Vlastimil looked at patch#1 and patch#2 and he was ok with them, but let's see
> what others think as well.

Planning on having a detailed look at the patches. Fairly busy though 
... :(

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/4] mm,page_alloc: Bail out earlier on -ENOMEM in alloc_contig_migrate_range
  2021-03-10 15:08 ` [PATCH v4 1/4] mm,page_alloc: Bail out earlier on -ENOMEM in alloc_contig_migrate_range Oscar Salvador
@ 2021-03-15 11:00   ` Vlastimil Babka
  2021-03-15 11:10   ` David Hildenbrand
  1 sibling, 0 replies; 17+ messages in thread
From: Vlastimil Babka @ 2021-03-15 11:00 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Muchun Song, Mike Kravetz,
	linux-mm, linux-kernel

On 3/10/21 4:08 PM, Oscar Salvador wrote:
> Currently, __alloc_contig_migrate_range can generate -EINTR, -ENOMEM or -EBUSY,
> and report them down the chain.
> The problem is that when migrate_pages() reports -ENOMEM, we keep going till we
> exhaust all the try-attempts (5 at the moment) instead of bailing out.
> 
> migrate_pages bails out right away on -ENOMEM because it is considered a fatal
> error. Do the same here instead of keep going and retrying.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_alloc.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e4b29ee2b1e..94467f1b85ff 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8484,7 +8484,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>  			}
>  			tries = 0;
>  		} else if (++tries == 5) {
> -			ret = ret < 0 ? ret : -EBUSY;
> +			ret = -EBUSY;
>  			break;
>  		}
>  
> @@ -8494,6 +8494,12 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>  
>  		ret = migrate_pages(&cc->migratepages, alloc_migration_target,
>  				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
> +		/*
> +		 * On -ENOMEM, migrate_pages() bails out right away. It is pointless
> +		 * to retry again over this error, so do the same here.
> +		 */
> +		if (ret == -ENOMEM)
> +			break;
>  	}
>  	if (ret < 0) {
>  		putback_movable_pages(&cc->migratepages);
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread
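
The behaviour patch 1/4 aims for, retrying transient failures a bounded
number of times but giving up immediately on -ENOMEM, can be sketched in
user space as below. This is a simplified model under stated assumptions,
not the kernel loop: it handles a single batch with no isolation step, and
migrate_fn, migrate_with_bailout() and the three example callbacks are
hypothetical names. The callback follows the migrate_pages() convention of
returning a negative errno, 0 on success, or a positive count of pages that
could not be migrated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_TRIES 5	/* mirrors the "5 at the moment" retry limit */

/* Hypothetical stand-in for one migrate_pages() call on the current batch. */
typedef int (*migrate_fn)(void *ctx);

/*
 * Try to migrate one batch. Transient failures (ret > 0, pages left over)
 * are retried up to MAX_TRIES attempts and then reported as -EBUSY.
 * -ENOMEM is treated as fatal and returned right away.
 */
static int migrate_with_bailout(migrate_fn migrate, void *ctx, int *calls)
{
	int tries;
	int ret;

	assert(migrate != NULL && calls != NULL);
	*calls = 0;

	for (tries = 0; tries < MAX_TRIES; tries++) {
		ret = migrate(ctx);
		(*calls)++;

		if (ret == 0)
			return 0;
		/* Retrying cannot help when the target allocation failed. */
		if (ret == -ENOMEM)
			return ret;
	}

	return -EBUSY;
}

/* Example callbacks used to exercise the loop. */
static int always_enomem(void *ctx)
{
	(void)ctx;
	return -ENOMEM;
}

static int always_busy(void *ctx)
{
	(void)ctx;
	return 1;	/* one page could not be migrated */
}

static int succeed_on_third(void *ctx)
{
	int *attempts = ctx;

	return ++(*attempts) >= 3 ? 0 : 1;
}
```

With always_enomem the loop stops after a single call, whereas always_busy
consumes all MAX_TRIES attempts before -EBUSY is reported.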

* Re: [PATCH v4 2/4] mm,compaction: Let isolate_migratepages_{range,block} return error codes
  2021-03-10 15:08 ` [PATCH v4 2/4] mm,compaction: Let isolate_migratepages_{range,block} return error codes Oscar Salvador
@ 2021-03-15 11:03   ` Vlastimil Babka
  0 siblings, 0 replies; 17+ messages in thread
From: Vlastimil Babka @ 2021-03-15 11:03 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: David Hildenbrand, Michal Hocko, Muchun Song, Mike Kravetz,
	linux-mm, linux-kernel

On 3/10/21 4:08 PM, Oscar Salvador wrote:
> Currently, isolate_migratepages_{range,block} and their callers use
> a pfn == 0 vs pfn != 0 scheme to let the caller know whether there was
> any error during isolation.
> This does not work as soon as we need to start reporting different error
> codes and make sure we pass them down the chain, so they are properly
> interpreted by functions like e.g: alloc_contig_range.
> 
> Let us rework isolate_migratepages_{range,block} so we can report error
> codes.
> Since isolate_migratepages_block will stop returning the next pfn to be
> scanned, we reuse the cc->migrate_pfn field to keep track of that.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 48 ++++++++++++++++++++++++------------------------
>  mm/internal.h   |  2 +-
>  mm/page_alloc.c |  7 +++----
>  3 files changed, 28 insertions(+), 29 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e04f4476e68e..5769753a8f60 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -787,15 +787,16 @@ static bool too_many_isolated(pg_data_t *pgdat)
>   *
>   * Isolate all pages that can be migrated from the range specified by
>   * [low_pfn, end_pfn). The range is expected to be within same pageblock.
> - * Returns zero if there is a fatal signal pending, otherwise PFN of the
> - * first page that was not scanned (which may be both less, equal to or more
> - * than end_pfn).
> + * Returns -EINTR in case we need to abort when we have too many isolated pages
> + * due to e.g: signal pending, async mode or having still pages to migrate, or 0.
> + * cc->migrate_pfn will contain the next pfn to scan (which may be both less,
> + * equal to or more than end_pfn).
>   *
>   * The pages are isolated on cc->migratepages list (not required to be empty),
>   * and cc->nr_migratepages is updated accordingly. The cc->migrate_pfn field
>   * is neither read nor updated.
>   */
> -static unsigned long
> +static int
>  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			unsigned long end_pfn, isolate_mode_t isolate_mode)
>  {
> @@ -810,6 +811,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  	unsigned long next_skip_pfn = 0;
>  	bool skip_updated = false;
>  
> +	cc->migrate_pfn = low_pfn;
> +
>  	/*
>  	 * Ensure that there are not too many pages isolated from the LRU
>  	 * list by either parallel reclaimers or compaction. If there are,
> @@ -818,16 +821,16 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  	while (unlikely(too_many_isolated(pgdat))) {
>  		/* stop isolation if there are still pages not migrated */
>  		if (cc->nr_migratepages)
> -			return 0;
> +			return -EINTR;
>  
>  		/* async migration should just abort */
>  		if (cc->mode == MIGRATE_ASYNC)
> -			return 0;
> +			return -EINTR;
>  
>  		congestion_wait(BLK_RW_ASYNC, HZ/10);
>  
>  		if (fatal_signal_pending(current))
> -			return 0;
> +			return -EINTR;
>  	}
>  
>  	cond_resched();
> @@ -1130,7 +1133,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  	if (nr_isolated)
>  		count_compact_events(COMPACTISOLATED, nr_isolated);
>  
> -	return low_pfn;
> +	cc->migrate_pfn = low_pfn;
> +
> +	return 0;
>  }
>  
>  /**
> @@ -1139,15 +1144,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   * @start_pfn: The first PFN to start isolating.
>   * @end_pfn:   The one-past-last PFN.
>   *
> - * Returns zero if isolation fails fatally due to e.g. pending signal.
> - * Otherwise, function returns one-past-the-last PFN of isolated page
> - * (which may be greater than end_pfn if end fell in a middle of a THP page).
> + * Returns -EINTR in case isolation fails fatally due to e.g. pending signal,
> + * or 0.
>   */
> -unsigned long
> +int
>  isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>  							unsigned long end_pfn)
>  {
>  	unsigned long pfn, block_start_pfn, block_end_pfn;
> +	int ret = 0;
>  
>  	/* Scan block by block. First and last block may be incomplete */
>  	pfn = start_pfn;
> @@ -1166,17 +1171,17 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>  					block_end_pfn, cc->zone))
>  			continue;
>  
> -		pfn = isolate_migratepages_block(cc, pfn, block_end_pfn,
> -							ISOLATE_UNEVICTABLE);
> +		ret = isolate_migratepages_block(cc, pfn, block_end_pfn,
> +						 ISOLATE_UNEVICTABLE);
>  
> -		if (!pfn)
> +		if (ret)
>  			break;
>  
>  		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX)
>  			break;
>  	}
>  
> -	return pfn;
> +	return ret;
>  }
>  
>  #endif /* CONFIG_COMPACTION || CONFIG_CMA */
> @@ -1847,7 +1852,7 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>  	 */
>  	for (; block_end_pfn <= cc->free_pfn;
>  			fast_find_block = false,
> -			low_pfn = block_end_pfn,
> +			cc->migrate_pfn = low_pfn = block_end_pfn,
>  			block_start_pfn = block_end_pfn,
>  			block_end_pfn += pageblock_nr_pages) {
>  
> @@ -1889,10 +1894,8 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>  		}
>  
>  		/* Perform the isolation */
> -		low_pfn = isolate_migratepages_block(cc, low_pfn,
> -						block_end_pfn, isolate_mode);
> -
> -		if (!low_pfn)
> +		if (isolate_migratepages_block(cc, low_pfn, block_end_pfn,
> +						isolate_mode))
>  			return ISOLATE_ABORT;
>  
>  		/*
> @@ -1903,9 +1906,6 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>  		break;
>  	}
>  
> -	/* Record where migration scanner will be restarted. */
> -	cc->migrate_pfn = low_pfn;
> -
>  	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
>  }
>  
> diff --git a/mm/internal.h b/mm/internal.h
> index 9902648f2206..e670abb1154c 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -261,7 +261,7 @@ struct capture_control {
>  unsigned long
>  isolate_freepages_range(struct compact_control *cc,
>  			unsigned long start_pfn, unsigned long end_pfn);
> -unsigned long
> +int
>  isolate_migratepages_range(struct compact_control *cc,
>  			   unsigned long low_pfn, unsigned long end_pfn);
>  int find_suitable_fallback(struct free_area *area, unsigned int order,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 94467f1b85ff..2184e3ef4116 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8477,11 +8477,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>  
>  		if (list_empty(&cc->migratepages)) {
>  			cc->nr_migratepages = 0;
> -			pfn = isolate_migratepages_range(cc, pfn, end);
> -			if (!pfn) {
> -				ret = -EINTR;
> +			ret = isolate_migratepages_range(cc, pfn, end);
> +			if (ret)
>  				break;
> -			}
> +			pfn = cc->migrate_pfn;
>  			tries = 0;
>  		} else if (++tries == 5) {
>  			ret = -EBUSY;
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread
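
The interface change in patch 2/4 (return an error code, record the next
pfn in cc->migrate_pfn) can be illustrated with a small user-space model.
This is only a sketch under stated assumptions: struct fake_cc,
FAKE_BLOCK_PAGES, fake_isolate_block() and fake_isolate_range() are
hypothetical names, blocks are not aligned to real pageblock boundaries,
and the "abort" flag stands in for conditions such as a pending fatal
signal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_BLOCK_PAGES 512UL	/* assumed block size, for illustration only */

/* Hypothetical, trimmed-down stand-in for struct compact_control. */
struct fake_cc {
	unsigned long migrate_pfn;	/* next pfn to scan, set by the scanner */
	bool abort;			/* models e.g. a pending fatal signal */
};

/* Returns 0 or -EINTR; the scan position is reported via cc->migrate_pfn. */
static int fake_isolate_block(struct fake_cc *cc, unsigned long low_pfn,
			      unsigned long end_pfn)
{
	assert(cc != NULL);

	/* Record the starting point first, so an early abort leaves it valid. */
	cc->migrate_pfn = low_pfn;

	if (cc->abort)
		return -EINTR;

	/* Pretend the whole block was scanned. */
	cc->migrate_pfn = end_pfn;
	return 0;
}

/* Walks the range block by block and propagates the first error. */
static int fake_isolate_range(struct fake_cc *cc, unsigned long start_pfn,
			      unsigned long end_pfn)
{
	unsigned long pfn = start_pfn;
	int ret = 0;

	while (pfn < end_pfn) {
		unsigned long block_end = pfn + FAKE_BLOCK_PAGES;

		if (block_end > end_pfn)
			block_end = end_pfn;

		ret = fake_isolate_block(cc, pfn, block_end);
		if (ret)
			break;

		/* The next position comes from the control structure. */
		pfn = cc->migrate_pfn;
	}

	return ret;
}
```

Because the error and the scan position travel separately, a caller can
tell an abort apart from any particular pfn value, and further error codes
such as -ENOMEM can be added without touching the position reporting.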

* Re: [PATCH v4 1/4] mm,page_alloc: Bail out earlier on -ENOMEM in alloc_contig_migrate_range
  2021-03-10 15:08 ` [PATCH v4 1/4] mm,page_alloc: Bail out earlier on -ENOMEM in alloc_contig_migrate_range Oscar Salvador
  2021-03-15 11:00   ` Vlastimil Babka
@ 2021-03-15 11:10   ` David Hildenbrand
  1 sibling, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-03-15 11:10 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Vlastimil Babka, Michal Hocko, Muchun Song, Mike Kravetz,
	linux-mm, linux-kernel

On 10.03.21 16:08, Oscar Salvador wrote:
> Currently, __alloc_contig_migrate_range can generate -EINTR, -ENOMEM or -EBUSY,
> and report them down the chain.
> The problem is that when migrate_pages() reports -ENOMEM, we keep going till we
> exhaust all the try-attempts (5 at the moment) instead of bailing out.
> 
> migrate_pages bails out right away on -ENOMEM because it is considered a fatal
> error. Do the same here instead of keep going and retrying.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>   mm/page_alloc.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e4b29ee2b1e..94467f1b85ff 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8484,7 +8484,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   			}
>   			tries = 0;
>   		} else if (++tries == 5) {
> -			ret = ret < 0 ? ret : -EBUSY;
> +			ret = -EBUSY;
>   			break;
>   		}
>   
> @@ -8494,6 +8494,12 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   
>   		ret = migrate_pages(&cc->migratepages, alloc_migration_target,
>   				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
> +		/*
> +		 * On -ENOMEM, migrate_pages() bails out right away. It is pointless
> +		 * to retry again over this error, so do the same here.
> +		 */
> +		if (ret == -ENOMEM)
> +			break;

I would have thought we could also get other fatal errors
from migrate_pages(), but it doesn't seem like it.

Reviewed-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages
  2021-03-15  9:06 ` [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages David Hildenbrand
  2021-03-15 10:27   ` Oscar Salvador
@ 2021-03-17  9:43   ` Oscar Salvador
  2021-03-17  9:48     ` David Hildenbrand
  1 sibling, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2021-03-17  9:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Vlastimil Babka, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Anshuman Khandual

On 2021-03-15 10:06, David Hildenbrand wrote:
> BTW, I stumbled yesterday over
> 
> alloc_contig_pages()->pfn_range_valid_contig():
> 
> 	if (page_count(page) > 0)
> 		return false;
> 	if (PageHuge(page))
> 		return false;
> 
> As used by memtrace and for gigantic pages. We can now
> 
> a) Drop these checks completely, as it's best-effort only and racy.
> alloc_contig_pages()/alloc_contig_range() will handle it properly.

I was preparing v5, and I wanted to be sure I understood you here.

Right you are that the in-use page check can be dropped, as those pages can
be migrated away, and the Hugetlb page check can also be dropped since
isolate_migratepages_range is now capable of dealing with those kinds of
pages.

> b) Similarly, check for gigantic pages and/or movability/migratability.

I lost you here.

isolate_or_dissolve_huge_page() already bails out on hugetlb-gigantic pages.

Or do you mean to place an upfront check here (hstate_is_gigantic())?


-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages
  2021-03-17  9:43   ` Oscar Salvador
@ 2021-03-17  9:48     ` David Hildenbrand
  2021-03-17 10:05       ` Oscar Salvador
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2021-03-17  9:48 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, Vlastimil Babka, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Anshuman Khandual

On 17.03.21 10:43, Oscar Salvador wrote:
> On 2021-03-15 10:06, David Hildenbrand wrote:
>> BTW, I stumbled yesterday over
>>
>> alloc_contig_pages()->pfn_range_valid_contig():
>>
>> 	if (page_count(page) > 0)
>> 		return false;
>> 	if (PageHuge(page))
>> 		return false;
>>
>> As used by memtrace and for gigantic pages. We can now
>>
>> a) Drop these checks completely, as it's best-effort only and racy.
>> alloc_contig_pages()/alloc_contig_range() will handle it properly.
> 
> I was preparing v5, and I wanted to be sure I understood you here.
> 
> Right you are that the in-use page check can be dropped, as those pages can
> be migrated away, and the Hugetlb page check can also be dropped since
> isolate_migratepages_range is now capable of dealing with those kinds of
> pages.
> 
>> b) Similarly, check for gigantic pages and/or movability/migratability.
> 
> I lost you here.
> 
> isolate_or_dissolve_huge_page() already bails out on hugetlb-gigantic
> pages.
> 
> Or do you mean to place an upfront check here? (hstate_is_gigantic())?

Yes. But I prefer a) and keeping it simple here -- just doing basic 
sanity checks (online, zone, PageReserved()) that are absolutely necessary.



-- 
Thanks,

David / dhildenb



* Re: [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages
  2021-03-17  9:48     ` David Hildenbrand
@ 2021-03-17 10:05       ` Oscar Salvador
  2021-03-17 10:06         ` Oscar Salvador
  2021-03-17 10:07         ` David Hildenbrand
  0 siblings, 2 replies; 17+ messages in thread
From: Oscar Salvador @ 2021-03-17 10:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Vlastimil Babka, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Anshuman Khandual

On Wed, Mar 17, 2021 at 10:48:31AM +0100, David Hildenbrand wrote:
> > I was preparing v5, and I wanted to be sure I understood you here.
> > 
> > Right you are that the in-use page check can be dropped, as those pages can
> > be migrated away, and the Hugetlb page check can also be dropped since
> > isolate_migratepages_range is now capable of dealing with those kinds of
> > pages.
> > 
> > > b) Similarly, check for gigantic pages and/or movability/migratability.
> > 
> > I lost you here.
> > 
> > isolate_or_dissolve_huge_page() already bails out on hugetlb-gigantic
> > pages.
> > 
> > Or do you mean to place an upfront check here? (hstate_is_gigantic())?
> 
> Yes. But I prefer a) and keeping it simple here -- just doing basic sanity
> checks (online, zone, PageReserved()) that are absolutely necessary.

Ok, I am probably dense, as I understood you to be leaning towards having
a) + b).

That is what I have as the last patch of the patchset:

From e97175b7d4970cbdcbafcf8c398f72a571e817b0 Mon Sep 17 00:00:00 2001
From: Oscar Salvador <osalvador@suse.de>
Date: Thu, 18 Mar 2021 05:03:18 +0100
Subject: [PATCH] mm,page_alloc: Drop unnecessary checks from
 pfn_range_valid_contig

pfn_range_valid_contig() bails out when it finds an in-use page or a
hugetlb page, among other things.
We can drop the in-use page check since __alloc_contig_pages can migrate
away those pages, and the hugetlb page check can go too since
isolate_migratepages_range is now capable of dealing with hugetlb pages.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/page_alloc.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4cb455355f6d..50d73e68b79e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8685,12 +8685,6 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,

                if (PageReserved(page))
                        return false;
-
-               if (page_count(page) > 0)
-                       return false;
-
-               if (PageHuge(page))
-                       return false;
        }
        return true;
 }
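
For anyone following along outside the kernel tree, the checks that remain
after this change can be modelled in userspace roughly as below. This is only
a sketch: struct mock_page, mock_range_valid_contig() and mock_example() are
hypothetical stand-ins invented for illustration, with plain fields assumed in
place of pfn_to_online_page(), page_zone() and PageReserved(). It is not the
code in mm/page_alloc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not a kernel type. */
struct mock_page {
	bool online;	/* models pfn_to_online_page() returning a page */
	int zone_id;	/* models page_zone(page) */
	bool reserved;	/* models PageReserved(page) */
	int refcount;	/* models page_count(page); deliberately ignored */
	bool hugetlb;	/* models PageHuge(page); deliberately ignored */
};

/*
 * Keep only the basic sanity checks (online, zone, reserved). In-use and
 * hugetlb pages are not rejected here; in the real code they are left to
 * the later isolation/migration step.
 */
bool mock_range_valid_contig(const struct mock_page *pages, size_t nr_pages,
			     int zone_id)
{
	for (size_t i = 0; i < nr_pages; i++) {
		const struct mock_page *page = &pages[i];

		if (!page->online)
			return false;

		if (page->zone_id != zone_id)
			return false;

		if (page->reserved)
			return false;
	}
	return true;
}

/* Usage: an in-use page and a hugetlb page no longer disqualify a range. */
int mock_example(void)
{
	struct mock_page range[2] = {
		{ .online = true, .zone_id = 0, .refcount = 1 },
		{ .online = true, .zone_id = 0, .hugetlb = true },
	};

	assert(mock_range_valid_contig(range, 2, 0));

	/* A reserved page still rejects the whole range. */
	range[1].reserved = true;
	assert(!mock_range_valid_contig(range, 2, 0));
	return 0;
}
```

The point of the model is that the pre-check only filters ranges that can
never succeed; whether an in-use or hugetlb page can actually be moved is
decided later, when the range is isolated and migrated.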

-- 
Oscar Salvador
SUSE L3


* Re: [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages
  2021-03-17 10:05       ` Oscar Salvador
@ 2021-03-17 10:06         ` Oscar Salvador
  2021-03-17 10:07         ` David Hildenbrand
  1 sibling, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2021-03-17 10:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Vlastimil Babka, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Anshuman Khandual

On Wed, Mar 17, 2021 at 11:05:23AM +0100, Oscar Salvador wrote:
> From e97175b7d4970cbdcbafcf8c398f72a571e817b0 Mon Sep 17 00:00:00 2001
> From: Oscar Salvador <osalvador@suse.de>
> Date: Thu, 18 Mar 2021 05:03:18 +0100
> Subject: [PATCH] mm,page_alloc: Drop unnecessary checks from
>  pfn_range_valid_contig
> 
> pfn_range_valid_contig() bails out when it finds an in-use page or a
> hugetlb page, among other things.
> We can drop the in-use page check since __alloc_contig_pages can migrate
> away those pages, and the hugetlb page check can go too since
> isolate_migratepages_range is now capable of dealing with hugetlb pages.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Of course, missing a Suggested-by: David Hildenbrand <david@redhat.com>

 

-- 
Oscar Salvador
SUSE L3


* Re: [PATCH v4 0/4] Make alloc_contig_range handle Hugetlb pages
  2021-03-17 10:05       ` Oscar Salvador
  2021-03-17 10:06         ` Oscar Salvador
@ 2021-03-17 10:07         ` David Hildenbrand
  1 sibling, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-03-17 10:07 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, Vlastimil Babka, Michal Hocko, Muchun Song,
	Mike Kravetz, linux-mm, linux-kernel, Anshuman Khandual

On 17.03.21 11:05, Oscar Salvador wrote:
> On Wed, Mar 17, 2021 at 10:48:31AM +0100, David Hildenbrand wrote:
>>> I was preparing v5, and I wanted to be sure I understood you here.
>>>
>>> Right you are that the in-use page check can be dropped, as those pages can
>>> be migrated away, and the Hugetlb page check can also be dropped since
>>> isolate_migratepages_range is now capable of dealing with those kinds of
>>> pages.
>>>
>>>> b) Similarly, check for gigantic pages and/or movability/migratability.
>>>
>>> I lost you here.
>>>
>>> isolate_or_dissolve_huge_page() already bails out on hugetlb-gigantic
>>> pages.
>>>
>>> Or do you mean to place an upfront check here? (hstate_is_gigantic())?
>>
>> Yes. But I prefer a) and keeping it simple here -- just doing basic sanity
>> checks (online, zone, PageReserved()) that are absolutely necessary.
> 
> Ok, I am probably dense, as I understood you to be leaning towards having
> a) + b).

Sorry, I meant either a) or b) :)

> 
> That is what I have as the last patch of the patchset:
> 
>  From e97175b7d4970cbdcbafcf8c398f72a571e817b0 Mon Sep 17 00:00:00 2001
> From: Oscar Salvador <osalvador@suse.de>
> Date: Thu, 18 Mar 2021 05:03:18 +0100
> Subject: [PATCH] mm,page_alloc: Drop unnecessary checks from
>   pfn_range_valid_contig
> 
> pfn_range_valid_contig() bails out when it finds an in-use page or a
> hugetlb page, among other things.
> We can drop the in-use page check since __alloc_contig_pages can migrate
> away those pages, and the hugetlb page check can go too since
> isolate_migratepages_range is now capable of dealing with hugetlb pages.
> 

Might want to mention that the existing checks were racy either way :)

> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>   mm/page_alloc.c | 6 ------
>   1 file changed, 6 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4cb455355f6d..50d73e68b79e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8685,12 +8685,6 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> 
>                  if (PageReserved(page))
>                          return false;
> -
> -               if (page_count(page) > 0)
> -                       return false;
> -
> -               if (PageHuge(page))
> -                       return false;
>          }
>          return true;
>   }
> 


-- 
Thanks,

David / dhildenb



