* [PATCH 0/3] Interface for higher order contiguous allocations
@ 2018-04-17  2:09 Mike Kravetz
  2018-04-17  2:09 ` [PATCH 1/3] mm: change type of free_contig_range(nr_pages) to unsigned long Mike Kravetz
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Mike Kravetz @ 2018-04-17  2:09 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-api
  Cc: Reinette Chatre, Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Andrew Morton, Mike Kravetz

These patches came out of the "[RFC] mmap(MAP_CONTIG)" discussions at:
http://lkml.kernel.org/r/21f1ec96-2822-1189-1c95-79a2bb491571@oracle.com

One suggestion in that thread was to create a friendlier interface that
could be used by drivers and others outside core mm code to allocate a
contiguous set of pages.  The alloc_contig_range() interface is used for
this purpose today by CMA and gigantic page allocation.  However, this is
not a general purpose interface.  So, wrap alloc_contig_range() in the
more general interface:

struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp, int nid,
                                        nodemask_t *nodemask)

This interface provides essentially the same functionality as the
hugetlb-specific routine alloc_gigantic_page().  After creating the
interface, change alloc_gigantic_page() to call find_alloc_contig_pages()
and delete all the supporting code in hugetlb.c.
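
To make the intent concrete, here is a minimal usage sketch; it is
illustrative only (the calling driver and gfp choice are assumptions,
only find_alloc_contig_pages() and free_contig_pages() come from this
series):

#include <linux/gfp.h>

/* Sketch: allocate 2^order physically contiguous pages near 'nid'. */
static struct page *demo_alloc_contig_buffer(unsigned int order, int nid)
{
	/* gfp limits the zones searched and is used during migration */
	return find_alloc_contig_pages(order, GFP_KERNEL, nid, NULL);
}

static void demo_free_contig_buffer(struct page *page, unsigned int order)
{
	free_contig_pages(page, 1UL << order);
}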

A new use case for allocating contiguous memory has been identified in
Intel(R) Resource Director Technology Cache Pseudo-Locking.

Mike Kravetz (3):
  mm: change type of free_contig_range(nr_pages) to unsigned long
  mm: add find_alloc_contig_pages() interface
  mm/hugetlb: use find_alloc_contig_pages() to allocate gigantic pages

 include/linux/gfp.h | 14 +++++++-
 mm/cma.c            |  2 +-
 mm/hugetlb.c        | 87 ++++--------------------------------------------
 mm/page_alloc.c     | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 110 insertions(+), 88 deletions(-)

-- 
2.13.6


* [PATCH 1/3] mm: change type of free_contig_range(nr_pages) to unsigned long
  2018-04-17  2:09 [PATCH 0/3] Interface for higher order contiguous allocations Mike Kravetz
@ 2018-04-17  2:09 ` Mike Kravetz
  2018-04-17  2:09 ` [PATCH 2/3] mm: add find_alloc_contig_pages() interface Mike Kravetz
  2018-04-17  2:09 ` [PATCH 3/3] mm/hugetlb: use find_alloc_contig_pages() to allocate gigantic pages Mike Kravetz
  2 siblings, 0 replies; 13+ messages in thread
From: Mike Kravetz @ 2018-04-17  2:09 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-api
  Cc: Reinette Chatre, Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Andrew Morton, Mike Kravetz

free_contig_range() is currently defined as:
void free_contig_range(unsigned long pfn, unsigned nr_pages);
Change it to:
void free_contig_range(unsigned long pfn, unsigned long nr_pages);

Some callers today pass an unsigned long that is silently truncated
to an unsigned int.  It is highly unlikely that these values will
overflow an unsigned int; however, the parameter should be an unsigned
long to be consistent with other page counts.
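
As a sketch of the truncation issue (hypothetical values; no current
caller actually passes a count this large):

/* Illustrative only: the old 'unsigned nr_pages' prototype silently
 * drops the high bits of a 64-bit page count. */
static void truncation_demo(void)
{
	unsigned long nr_pages = 1UL << 32;	/* one more than UINT_MAX */
	unsigned int as_old_param = nr_pages;	/* truncated to 0 */

	/* free_contig_range(pfn, as_old_param) would free no pages. */
}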

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/gfp.h | 2 +-
 mm/cma.c            | 2 +-
 mm/hugetlb.c        | 2 +-
 mm/page_alloc.c     | 6 +++---
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 1a4582b44d32..86a0d06463ab 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -572,7 +572,7 @@ static inline bool pm_suspended_storage(void)
 /* The below functions must be run on a range from a single zone. */
 extern int alloc_contig_range(unsigned long start, unsigned long end,
 			      unsigned migratetype, gfp_t gfp_mask);
-extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
+extern void free_contig_range(unsigned long pfn, unsigned long nr_pages);
 #endif
 
 #ifdef CONFIG_CMA
diff --git a/mm/cma.c b/mm/cma.c
index aa40e6c7b042..f473fc2b7cbd 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -563,7 +563,7 @@ bool cma_release(struct cma *cma, const struct page *pages, unsigned int count)
 
 	VM_BUG_ON(pfn + count > cma->base_pfn + cma->count);
 
-	free_contig_range(pfn, count);
+	free_contig_range(pfn, (unsigned long)count);
 	cma_clear_bitmap(cma, pfn, count);
 	trace_cma_release(pfn, pages, count);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 218679138255..c81072ce7510 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1055,7 +1055,7 @@ static void destroy_compound_gigantic_page(struct page *page,
 
 static void free_gigantic_page(struct page *page, unsigned int order)
 {
-	free_contig_range(page_to_pfn(page), 1 << order);
+	free_contig_range(page_to_pfn(page), 1UL << order);
 }
 
 static int __alloc_gigantic_page(unsigned long start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 905db9d7962f..0fd5e8e2456e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7937,9 +7937,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	return ret;
 }
 
-void free_contig_range(unsigned long pfn, unsigned nr_pages)
+void free_contig_range(unsigned long pfn, unsigned long nr_pages)
 {
-	unsigned int count = 0;
+	unsigned long count = 0;
 
 	for (; nr_pages--; pfn++) {
 		struct page *page = pfn_to_page(pfn);
@@ -7947,7 +7947,7 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 		count += page_count(page) != 1;
 		__free_page(page);
 	}
-	WARN(count != 0, "%d pages are still in use!\n", count);
+	WARN(count != 0, "%ld pages are still in use!\n", count);
 }
 #endif
 
-- 
2.13.6


* [PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-04-17  2:09 [PATCH 0/3] Interface for higher order contiguous allocations Mike Kravetz
  2018-04-17  2:09 ` [PATCH 1/3] mm: change type of free_contig_range(nr_pages) to unsigned long Mike Kravetz
@ 2018-04-17  2:09 ` Mike Kravetz
  2018-04-17 12:10   ` kbuild test robot
                     ` (3 more replies)
  2018-04-17  2:09 ` [PATCH 3/3] mm/hugetlb: use find_alloc_contig_pages() to allocate gigantic pages Mike Kravetz
  2 siblings, 4 replies; 13+ messages in thread
From: Mike Kravetz @ 2018-04-17  2:09 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-api
  Cc: Reinette Chatre, Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Andrew Morton, Mike Kravetz

find_alloc_contig_pages() is a new interface that attempts to locate
and allocate a contiguous range of pages.  It is provided as a more
convenient interface than alloc_contig_range() which is currently
used by CMA and gigantic huge pages.

When attempting to allocate a range of pages, migration is employed
if possible.  There is no guarantee that the routine will succeed.
So, the user must be prepared for failure and have a fallback plan.
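
A minimal sketch of that calling pattern (the fallback path below is
an assumption for illustration, not something this patch prescribes):

/* Sketch: try for a contiguous range, fall back to a single page. */
static struct page *get_buffer_with_fallback(unsigned int order, int nid)
{
	struct page *page;

	page = find_alloc_contig_pages(order, GFP_KERNEL, nid, NULL);
	if (page)
		return page;

	/* Migration can fail; plan B is caller policy. */
	return alloc_pages_node(nid, GFP_KERNEL, 0);
}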

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/gfp.h | 12 ++++++++
 mm/page_alloc.c     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 86a0d06463ab..528b194cc266 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -573,6 +573,18 @@ static inline bool pm_suspended_storage(void)
 extern int alloc_contig_range(unsigned long start, unsigned long end,
 			      unsigned migratetype, gfp_t gfp_mask);
 extern void free_contig_range(unsigned long pfn, unsigned long nr_pages);
+extern struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
+						int nid, nodemask_t *nodemask);
+extern void free_contig_pages(struct page *page, unsigned long nr_pages);
+#else
+static inline page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
+						int nid, nodemask_t *nodemask)
+{
+	return NULL;
+}
+static void free_contig_pages(struct page *page, unsigned long nr_pages)
+{
+}
 #endif
 
 #ifdef CONFIG_CMA
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0fd5e8e2456e..81070fe55c44 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -67,6 +67,7 @@
 #include <linux/ftrace.h>
 #include <linux/lockdep.h>
 #include <linux/nmi.h>
+#include <linux/mmzone.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
@@ -2010,9 +2011,13 @@ static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
 {
 	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
 }
+#define contig_alloc_migratetype_ok(migratetype) \
+	((migratetype) == MIGRATE_CMA || (migratetype) == MIGRATE_MOVABLE)
 #else
 static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
 					unsigned int order) { return NULL; }
+#define contig_alloc_migratetype_ok(migratetype) \
+	((migratetype) == MIGRATE_MOVABLE)
 #endif
 
 /*
@@ -7822,6 +7827,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 
+	if (!contig_alloc_migratetype_ok(migratetype))
+		return -EINVAL;
+
 	/*
 	 * What we do here is we mark all pageblocks in range as
 	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
@@ -7912,8 +7920,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	/* Make sure the range is really isolated. */
 	if (test_pages_isolated(outer_start, end, false)) {
-		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
-			__func__, outer_start, end);
+		if (!(migratetype == MIGRATE_MOVABLE)) /* only print for CMA */
+			pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
+				__func__, outer_start, end);
 		ret = -EBUSY;
 		goto done;
 	}
@@ -7949,6 +7958,82 @@ void free_contig_range(unsigned long pfn, unsigned long nr_pages)
 	}
 	WARN(count != 0, "%ld pages are still in use!\n", count);
 }
+
+static bool contig_pfn_range_valid(struct zone *z, unsigned long start_pfn,
+					unsigned long nr_pages)
+{
+	unsigned long i, end_pfn = start_pfn + nr_pages;
+	struct page *page;
+
+	for (i = start_pfn; i < end_pfn; i++) {
+		if (!pfn_valid(i))
+			return false;
+
+		page = pfn_to_page(i);
+
+		if (page_zone(page) != z)
+			return false;
+
+	}
+
+	return true;
+}
+
+/**
+ * find_alloc_contig_pages() -- attempt to find and allocate a contiguous
+ *				range of pages
+ * @order:	number of pages
+ * @gfp:	gfp mask used to limit search as well as during compaction
+ * @nid:	target node
+ * @nodemask:	mask of other possible nodes
+ *
+ * Pages can be freed with a call to free_contig_pages(), or by manually
+ * calling __free_page() for each page allocated.
+ *
+ * Return: pointer to 'order' pages on success, or NULL if not successful.
+ */
+struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
+					int nid, nodemask_t *nodemask)
+{
+	unsigned long pfn, nr_pages, flags;
+	struct page *ret_page = NULL;
+	struct zonelist *zonelist;
+	struct zoneref *z;
+	struct zone *zone;
+	int rc;
+
+	nr_pages = 1 << order;
+	zonelist = node_zonelist(nid, gfp);
+	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp),
+					nodemask) {
+		spin_lock_irqsave(&zone->lock, flags);
+		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
+		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
+			if (contig_pfn_range_valid(zone, pfn, nr_pages)) {
+				spin_unlock_irqrestore(&zone->lock, flags);
+
+				rc = alloc_contig_range(pfn, pfn + nr_pages,
+							MIGRATE_MOVABLE, gfp);
+				if (!rc) {
+					ret_page = pfn_to_page(pfn);
+					return ret_page;
+				}
+				spin_lock_irqsave(&zone->lock, flags);
+			}
+			pfn += nr_pages;
+		}
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+
+	return ret_page;
+}
+EXPORT_SYMBOL_GPL(find_alloc_contig_pages);
+
+void free_contig_pages(struct page *page, unsigned long nr_pages)
+{
+	free_contig_range(page_to_pfn(page), nr_pages);
+}
+EXPORT_SYMBOL_GPL(free_contig_pages);
 #endif
 
 #if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
-- 
2.13.6


* [PATCH 3/3] mm/hugetlb: use find_alloc_contig_pages() to allocate gigantic pages
  2018-04-17  2:09 [PATCH 0/3] Interface for higher order contiguous allocations Mike Kravetz
  2018-04-17  2:09 ` [PATCH 1/3] mm: change type of free_contig_range(nr_pages) to unsigned long Mike Kravetz
  2018-04-17  2:09 ` [PATCH 2/3] mm: add find_alloc_contig_pages() interface Mike Kravetz
@ 2018-04-17  2:09 ` Mike Kravetz
  2 siblings, 0 replies; 13+ messages in thread
From: Mike Kravetz @ 2018-04-17  2:09 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-api
  Cc: Reinette Chatre, Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Andrew Morton, Mike Kravetz

Use the new find_alloc_contig_pages() interface for the allocation of
gigantic pages and remove associated code in hugetlb.c.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 87 +++++-------------------------------------------------------
 1 file changed, 6 insertions(+), 81 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c81072ce7510..a209767cb808 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1053,91 +1053,16 @@ static void destroy_compound_gigantic_page(struct page *page,
 	__ClearPageHead(page);
 }
 
-static void free_gigantic_page(struct page *page, unsigned int order)
+static void free_gigantic_page(struct page *page, struct hstate *h)
 {
-	free_contig_range(page_to_pfn(page), 1UL << order);
-}
-
-static int __alloc_gigantic_page(unsigned long start_pfn,
-				unsigned long nr_pages, gfp_t gfp_mask)
-{
-	unsigned long end_pfn = start_pfn + nr_pages;
-	return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
-				  gfp_mask);
-}
-
-static bool pfn_range_valid_gigantic(struct zone *z,
-			unsigned long start_pfn, unsigned long nr_pages)
-{
-	unsigned long i, end_pfn = start_pfn + nr_pages;
-	struct page *page;
-
-	for (i = start_pfn; i < end_pfn; i++) {
-		if (!pfn_valid(i))
-			return false;
-
-		page = pfn_to_page(i);
-
-		if (page_zone(page) != z)
-			return false;
-
-		if (PageReserved(page))
-			return false;
-
-		if (page_count(page) > 0)
-			return false;
-
-		if (PageHuge(page))
-			return false;
-	}
-
-	return true;
-}
-
-static bool zone_spans_last_pfn(const struct zone *zone,
-			unsigned long start_pfn, unsigned long nr_pages)
-{
-	unsigned long last_pfn = start_pfn + nr_pages - 1;
-	return zone_spans_pfn(zone, last_pfn);
+	free_contig_pages(page, (unsigned long)pages_per_huge_page(h));
 }
 
 static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
 		int nid, nodemask_t *nodemask)
 {
-	unsigned int order = huge_page_order(h);
-	unsigned long nr_pages = 1 << order;
-	unsigned long ret, pfn, flags;
-	struct zonelist *zonelist;
-	struct zone *zone;
-	struct zoneref *z;
-
-	zonelist = node_zonelist(nid, gfp_mask);
-	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp_mask), nodemask) {
-		spin_lock_irqsave(&zone->lock, flags);
-
-		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
-		while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
-			if (pfn_range_valid_gigantic(zone, pfn, nr_pages)) {
-				/*
-				 * We release the zone lock here because
-				 * alloc_contig_range() will also lock the zone
-				 * at some point. If there's an allocation
-				 * spinning on this lock, it may win the race
-				 * and cause alloc_contig_range() to fail...
-				 */
-				spin_unlock_irqrestore(&zone->lock, flags);
-				ret = __alloc_gigantic_page(pfn, nr_pages, gfp_mask);
-				if (!ret)
-					return pfn_to_page(pfn);
-				spin_lock_irqsave(&zone->lock, flags);
-			}
-			pfn += nr_pages;
-		}
-
-		spin_unlock_irqrestore(&zone->lock, flags);
-	}
-
-	return NULL;
+	return find_alloc_contig_pages(huge_page_order(h), gfp_mask, nid,
+					nodemask);
 }
 
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid);
@@ -1147,7 +1072,7 @@ static void prep_compound_gigantic_page(struct page *page, unsigned int order);
 static inline bool gigantic_page_supported(void) { return false; }
 static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
 		int nid, nodemask_t *nodemask) { return NULL; }
-static inline void free_gigantic_page(struct page *page, unsigned int order) { }
+static inline void free_gigantic_page(struct page *page, struct hstate *h) { }
 static inline void destroy_compound_gigantic_page(struct page *page,
 						unsigned int order) { }
 #endif
@@ -1172,7 +1097,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 	set_page_refcounted(page);
 	if (hstate_is_gigantic(h)) {
 		destroy_compound_gigantic_page(page, huge_page_order(h));
-		free_gigantic_page(page, huge_page_order(h));
+		free_gigantic_page(page, h);
 	} else {
 		__free_pages(page, huge_page_order(h));
 	}
-- 
2.13.6


* Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-04-17  2:09 ` [PATCH 2/3] mm: add find_alloc_contig_pages() interface Mike Kravetz
@ 2018-04-17 12:10   ` kbuild test robot
  2018-04-18  1:39     ` Mike Kravetz
  2018-04-17 14:19   ` kbuild test robot
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: kbuild test robot @ 2018-04-17 12:10 UTC (permalink / raw)
  Cc: kbuild-all, linux-mm, linux-kernel, linux-api, Reinette Chatre,
	Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Andrew Morton, Mike Kravetz


Hi Mike,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.17-rc1 next-20180417]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Mike-Kravetz/mm-change-type-of-free_contig_range-nr_pages-to-unsigned-long/20180417-194309
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from include/linux/slab.h:15:0,
                    from include/linux/crypto.h:24,
                    from arch/x86/kernel/asm-offsets.c:9:
>> include/linux/gfp.h:580:15: error: unknown type name 'page'
    static inline page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
                  ^~~~
   include/linux/gfp.h:585:13: warning: 'free_contig_pages' defined but not used [-Wunused-function]
    static void free_contig_pages(struct page *page, unsigned long nr_pages)
                ^~~~~~~~~~~~~~~~~
   make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +/page +580 include/linux/gfp.h

   570	
   571	#if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
   572	/* The below functions must be run on a range from a single zone. */
   573	extern int alloc_contig_range(unsigned long start, unsigned long end,
   574				      unsigned migratetype, gfp_t gfp_mask);
   575	extern void free_contig_range(unsigned long pfn, unsigned long nr_pages);
   576	extern struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
   577							int nid, nodemask_t *nodemask);
   578	extern void free_contig_pages(struct page *page, unsigned long nr_pages);
   579	#else
 > 580	static inline page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
   581							int nid, nodemask_t *nodemask)
   582	{
   583		return NULL;
   584	}
   585	static void free_contig_pages(struct page *page, unsigned long nr_pages)
   586	{
   587	}
   588	#endif
   589	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



* Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-04-17  2:09 ` [PATCH 2/3] mm: add find_alloc_contig_pages() interface Mike Kravetz
  2018-04-17 12:10   ` kbuild test robot
@ 2018-04-17 14:19   ` kbuild test robot
  2018-04-21 16:16   ` Vlastimil Babka
  2018-04-23  0:09   ` Michal Hocko
  3 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2018-04-17 14:19 UTC (permalink / raw)
  Cc: kbuild-all, linux-mm, linux-kernel, linux-api, Reinette Chatre,
	Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Andrew Morton, Mike Kravetz


Hi Mike,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.17-rc1 next-20180417]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Mike-Kravetz/mm-change-type-of-free_contig_range-nr_pages-to-unsigned-long/20180417-194309
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: sparc-defconfig (attached as .config)
compiler: sparc-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc 

All errors (new ones prefixed by >>):

   In file included from include/linux/slab.h:15:0,
                    from include/linux/irq.h:21,
                    from include/asm-generic/hardirq.h:13,
                    from arch/sparc/include/asm/hardirq_32.h:11,
                    from arch/sparc/include/asm/hardirq.h:7,
                    from include/linux/hardirq.h:9,
                    from include/linux/interrupt.h:11,
                    from include/linux/kernel_stat.h:9,
                    from arch/sparc/kernel/irq_32.c:15:
   include/linux/gfp.h:580:15: error: unknown type name 'page'
    static inline page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
                  ^~~~
>> include/linux/gfp.h:585:13: error: 'free_contig_pages' defined but not used [-Werror=unused-function]
    static void free_contig_pages(struct page *page, unsigned long nr_pages)
                ^~~~~~~~~~~~~~~~~
   cc1: all warnings being treated as errors

vim +/free_contig_pages +585 include/linux/gfp.h

   570	
   571	#if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
   572	/* The below functions must be run on a range from a single zone. */
   573	extern int alloc_contig_range(unsigned long start, unsigned long end,
   574				      unsigned migratetype, gfp_t gfp_mask);
   575	extern void free_contig_range(unsigned long pfn, unsigned long nr_pages);
   576	extern struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
   577							int nid, nodemask_t *nodemask);
   578	extern void free_contig_pages(struct page *page, unsigned long nr_pages);
   579	#else
 > 580	static inline page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
   581							int nid, nodemask_t *nodemask)
   582	{
   583		return NULL;
   584	}
 > 585	static void free_contig_pages(struct page *page, unsigned long nr_pages)
   586	{
   587	}
   588	#endif
   589	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



* Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-04-17 12:10   ` kbuild test robot
@ 2018-04-18  1:39     ` Mike Kravetz
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Kravetz @ 2018-04-18  1:39 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, linux-mm, linux-kernel, linux-api, Reinette Chatre,
	Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Andrew Morton

On 04/17/2018 05:10 AM, kbuild test robot wrote:
> All errors (new ones prefixed by >>):
> 
>    In file included from include/linux/slab.h:15:0,
>                     from include/linux/crypto.h:24,
>                     from arch/x86/kernel/asm-offsets.c:9:
>>> include/linux/gfp.h:580:15: error: unknown type name 'page'
>     static inline page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
>                   ^~~~
>    include/linux/gfp.h:585:13: warning: 'free_contig_pages' defined but not used [-Wunused-function]
>     static void free_contig_pages(struct page *page, unsigned long nr_pages)
>                 ^~~~~~~~~~~~~~~~~

Build issues fixed in updated patch below,

>From 28efabb5625c079573a821c8c5cfc19cc73a86bd Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Mon, 16 Apr 2018 18:41:36 -0700
Subject: [PATCH 2/3] mm: add find_alloc_contig_pages() interface

find_alloc_contig_pages() is a new interface that attempts to locate
and allocate a contiguous range of pages.  It is provided as a more
convenient interface than alloc_contig_range() which is currently
used by CMA and gigantic huge pages.

When attempting to allocate a range of pages, migration is employed
if possible.  There is no guarantee that the routine will succeed.
So, the user must be prepared for failure and have a fallback plan.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/gfp.h | 12 ++++++++
 mm/page_alloc.c     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 86a0d06463ab..7d1ea4e659dc 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -573,6 +573,18 @@ static inline bool pm_suspended_storage(void)
 extern int alloc_contig_range(unsigned long start, unsigned long end,
 			      unsigned migratetype, gfp_t gfp_mask);
 extern void free_contig_range(unsigned long pfn, unsigned long nr_pages);
+extern struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
+						int nid, nodemask_t *nodemask);
+extern void free_contig_pages(struct page *page, unsigned long nr_pages);
+#else
+static inline struct page *find_alloc_contig_pages(unsigned int order,
+				gfp_t gfp, int nid, nodemask_t *nodemask)
+{
+	return NULL;
+}
+static inline void free_contig_pages(struct page *page, unsigned long nr_pages)
+{
+}
 #endif
 
 #ifdef CONFIG_CMA
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0fd5e8e2456e..81070fe55c44 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -67,6 +67,7 @@
 #include <linux/ftrace.h>
 #include <linux/lockdep.h>
 #include <linux/nmi.h>
+#include <linux/mmzone.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
@@ -2010,9 +2011,13 @@ static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
 {
 	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
 }
+#define contig_alloc_migratetype_ok(migratetype) \
+	((migratetype) == MIGRATE_CMA || (migratetype) == MIGRATE_MOVABLE)
 #else
 static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
 					unsigned int order) { return NULL; }
+#define contig_alloc_migratetype_ok(migratetype) \
+	((migratetype) == MIGRATE_MOVABLE)
 #endif
 
 /*
@@ -7822,6 +7827,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 
+	if (!contig_alloc_migratetype_ok(migratetype))
+		return -EINVAL;
+
 	/*
 	 * What we do here is we mark all pageblocks in range as
 	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
@@ -7912,8 +7920,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	/* Make sure the range is really isolated. */
 	if (test_pages_isolated(outer_start, end, false)) {
-		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
-			__func__, outer_start, end);
+		if (!(migratetype == MIGRATE_MOVABLE)) /* only print for CMA */
+			pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
+				__func__, outer_start, end);
 		ret = -EBUSY;
 		goto done;
 	}
@@ -7949,6 +7958,82 @@ void free_contig_range(unsigned long pfn, unsigned long nr_pages)
 	}
 	WARN(count != 0, "%ld pages are still in use!\n", count);
 }
+
+static bool contig_pfn_range_valid(struct zone *z, unsigned long start_pfn,
+					unsigned long nr_pages)
+{
+	unsigned long i, end_pfn = start_pfn + nr_pages;
+	struct page *page;
+
+	for (i = start_pfn; i < end_pfn; i++) {
+		if (!pfn_valid(i))
+			return false;
+
+		page = pfn_to_page(i);
+
+		if (page_zone(page) != z)
+			return false;
+
+	}
+
+	return true;
+}
+
+/**
+ * find_alloc_contig_pages() -- attempt to find and allocate a contiguous
+ *				range of pages
+ * @order:	number of pages
+ * @gfp:	gfp mask used to limit search as well as during compaction
+ * @nid:	target node
+ * @nodemask:	mask of other possible nodes
+ *
+ * Pages can be freed with a call to free_contig_pages(), or by manually
+ * calling __free_page() for each page allocated.
+ *
+ * Return: pointer to 'order' pages on success, or NULL if not successful.
+ */
+struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
+					int nid, nodemask_t *nodemask)
+{
+	unsigned long pfn, nr_pages, flags;
+	struct page *ret_page = NULL;
+	struct zonelist *zonelist;
+	struct zoneref *z;
+	struct zone *zone;
+	int rc;
+
+	nr_pages = 1 << order;
+	zonelist = node_zonelist(nid, gfp);
+	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp),
+					nodemask) {
+		spin_lock_irqsave(&zone->lock, flags);
+		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
+		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
+			if (contig_pfn_range_valid(zone, pfn, nr_pages)) {
+				spin_unlock_irqrestore(&zone->lock, flags);
+
+				rc = alloc_contig_range(pfn, pfn + nr_pages,
+							MIGRATE_MOVABLE, gfp);
+				if (!rc) {
+					ret_page = pfn_to_page(pfn);
+					return ret_page;
+				}
+				spin_lock_irqsave(&zone->lock, flags);
+			}
+			pfn += nr_pages;
+		}
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+
+	return ret_page;
+}
+EXPORT_SYMBOL_GPL(find_alloc_contig_pages);
+
+void free_contig_pages(struct page *page, unsigned long nr_pages)
+{
+	free_contig_range(page_to_pfn(page), nr_pages);
+}
+EXPORT_SYMBOL_GPL(free_contig_pages);
 #endif
 
 #if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
-- 
2.13.6


* [PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-04-17  2:09 ` [PATCH 2/3] mm: add find_alloc_contig_pages() interface Mike Kravetz
  2018-04-17 12:10   ` kbuild test robot
  2018-04-17 14:19   ` kbuild test robot
@ 2018-04-21 16:16   ` Vlastimil Babka
  2018-05-02 21:13     ` Mike Kravetz
  2018-04-23  0:09   ` Michal Hocko
  3 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2018-04-21 16:16 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel, linux-api
  Cc: Reinette Chatre, Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, David Nellans,
	Laura Abbott, Pavel Machek, Dave Hansen, Andrew Morton

On 04/17/2018 04:09 AM, Mike Kravetz wrote:
> find_alloc_contig_pages() is a new interface that attempts to locate
> and allocate a contiguous range of pages.  It is provided as a more
> convenient interface than alloc_contig_range() which is currently
> used by CMA and gigantic huge pages.
> 
> When attempting to allocate a range of pages, migration is employed
> if possible.  There is no guarantee that the routine will succeed.
> So, the user must be prepared for failure and have a fallback plan.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Hi, just two quick observations, maybe discussion pointers for the
LSF/MM session:
- it's weird that find_alloc_contig_pages() takes an order, and
free_contig_pages() takes a nr_pages. I suspect the interface would be
more future-proof with both using nr_pages? Perhaps also minimum
alignment for the allocation side? Order is fine for hugetlb, but what
about other potential users? (A hypothetical signature sketch follows
this list.)
- contig_alloc_migratetype_ok() says that MIGRATE_CMA blocks are OK to
allocate from. This silently assumes that everything allocated by this
will be migratable itself, or it might eat CMA reserves. Is it the case?
Also you then call alloc_contig_range() with MIGRATE_MOVABLE, so it will
skip/fail on MIGRATE_CMA anyway IIRC.
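
For concreteness, the first point suggests a signature along these
lines (a hypothetical sketch for discussion, not what the posted patch
implements):

extern struct page *find_alloc_contig_pages(unsigned long nr_pages,
					unsigned long align, gfp_t gfp,
					int nid, nodemask_t *nodemask);
extern void free_contig_pages(struct page *page, unsigned long nr_pages);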

Vlastimil

> ---
>  include/linux/gfp.h | 12 ++++++++
>  mm/page_alloc.c     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 99 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 86a0d06463ab..528b194cc266 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -573,6 +573,18 @@ static inline bool pm_suspended_storage(void)
>  extern int alloc_contig_range(unsigned long start, unsigned long end,
>  			      unsigned migratetype, gfp_t gfp_mask);
>  extern void free_contig_range(unsigned long pfn, unsigned long nr_pages);
> +extern struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
> +						int nid, nodemask_t *nodemask);
> +extern void free_contig_pages(struct page *page, unsigned long nr_pages);
> +#else
> +static inline page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
> +						int nid, nodemask_t *nodemask)
> +{
> +	return NULL;
> +}
> +static void free_contig_pages(struct page *page, unsigned long nr_pages)
> +{
> +}
>  #endif
>  
>  #ifdef CONFIG_CMA
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0fd5e8e2456e..81070fe55c44 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -67,6 +67,7 @@
>  #include <linux/ftrace.h>
>  #include <linux/lockdep.h>
>  #include <linux/nmi.h>
> +#include <linux/mmzone.h>
>  
>  #include <asm/sections.h>
>  #include <asm/tlbflush.h>
> @@ -2010,9 +2011,13 @@ static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
>  {
>  	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
>  }
> +#define contig_alloc_migratetype_ok(migratetype) \
> +	((migratetype) == MIGRATE_CMA || (migratetype) == MIGRATE_MOVABLE)
>  #else
>  static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
>  					unsigned int order) { return NULL; }
> +#define contig_alloc_migratetype_ok(migratetype) \
> +	((migratetype) == MIGRATE_MOVABLE)
>  #endif
>  
>  /*
> @@ -7822,6 +7827,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  	};
>  	INIT_LIST_HEAD(&cc.migratepages);
>  
> +	if (!contig_alloc_migratetype_ok(migratetype))
> +		return -EINVAL;
> +
>  	/*
>  	 * What we do here is we mark all pageblocks in range as
>  	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
> @@ -7912,8 +7920,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  
>  	/* Make sure the range is really isolated. */
>  	if (test_pages_isolated(outer_start, end, false)) {
> -		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
> -			__func__, outer_start, end);
> +		if (!(migratetype == MIGRATE_MOVABLE)) /* only print for CMA */
> +			pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
> +				__func__, outer_start, end);
>  		ret = -EBUSY;
>  		goto done;
>  	}
> @@ -7949,6 +7958,82 @@ void free_contig_range(unsigned long pfn, unsigned long nr_pages)
>  	}
>  	WARN(count != 0, "%ld pages are still in use!\n", count);
>  }
> +
> +static bool contig_pfn_range_valid(struct zone *z, unsigned long start_pfn,
> +					unsigned long nr_pages)
> +{
> +	unsigned long i, end_pfn = start_pfn + nr_pages;
> +	struct page *page;
> +
> +	for (i = start_pfn; i < end_pfn; i++) {
> +		if (!pfn_valid(i))
> +			return false;
> +
> +		page = pfn_to_page(i);
> +
> +		if (page_zone(page) != z)
> +			return false;
> +
> +	}
> +
> +	return true;
> +}
> +
> +/**
> + * find_alloc_contig_pages() -- attempt to find and allocate a contiguous
> + *				range of pages
> + * @order:	number of pages
> + * @gfp:	gfp mask used to limit search as well as during compaction
> + * @nid:	target node
> + * @nodemask:	mask of other possible nodes
> + *
> + * Pages can be freed with a call to free_contig_pages(), or by manually
> + * calling __free_page() for each page allocated.
> + *
> + * Return: pointer to 'order' pages on success, or NULL if not successful.
> + */
> +struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
> +					int nid, nodemask_t *nodemask)
> +{
> +	unsigned long pfn, nr_pages, flags;
> +	struct page *ret_page = NULL;
> +	struct zonelist *zonelist;
> +	struct zoneref *z;
> +	struct zone *zone;
> +	int rc;
> +
> +	nr_pages = 1 << order;
> +	zonelist = node_zonelist(nid, gfp);
> +	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp),
> +					nodemask) {
> +		spin_lock_irqsave(&zone->lock, flags);
> +		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
> +		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
> +			if (contig_pfn_range_valid(zone, pfn, nr_pages)) {
> +				spin_unlock_irqrestore(&zone->lock, flags);
> +
> +				rc = alloc_contig_range(pfn, pfn + nr_pages,
> +							MIGRATE_MOVABLE, gfp);
> +				if (!rc) {
> +					ret_page = pfn_to_page(pfn);
> +					return ret_page;
> +				}
> +				spin_lock_irqsave(&zone->lock, flags);
> +			}
> +			pfn += nr_pages;
> +		}
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +	}
> +
> +	return ret_page;
> +}
> +EXPORT_SYMBOL_GPL(find_alloc_contig_pages);
> +
> +void free_contig_pages(struct page *page, unsigned long nr_pages)
> +{
> +	free_contig_range(page_to_pfn(page), nr_pages);
> +}
> +EXPORT_SYMBOL_GPL(free_contig_pages);
>  #endif
>  
>  #if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
> 


* Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-04-17  2:09 ` [PATCH 2/3] mm: add find_alloc_contig_pages() interface Mike Kravetz
                     ` (2 preceding siblings ...)
  2018-04-21 16:16   ` Vlastimil Babka
@ 2018-04-23  0:09   ` Michal Hocko
  2018-04-23  4:22     ` Mike Kravetz
  3 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2018-04-23  0:09 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, linux-api, Reinette Chatre,
	Christopher Lameter, Guy Shattah, Anshuman Khandual,
	Michal Nazarewicz, Vlastimil Babka, David Nellans, Laura Abbott,
	Pavel Machek, Dave Hansen, Andrew Morton

On Mon 16-04-18 19:09:14, Mike Kravetz wrote:
[...]
> @@ -2010,9 +2011,13 @@ static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
>  {
>  	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
>  }
> +#define contig_alloc_migratetype_ok(migratetype) \
> +	((migratetype) == MIGRATE_CMA || (migratetype) == MIGRATE_MOVABLE)
>  #else
>  static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
>  					unsigned int order) { return NULL; }
> +#define contig_alloc_migratetype_ok(migratetype) \
> +	((migratetype) == MIGRATE_MOVABLE)
>  #endif
>  
>  /*
> @@ -7822,6 +7827,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  	};
>  	INIT_LIST_HEAD(&cc.migratepages);
>  
> +	if (!contig_alloc_migratetype_ok(migratetype))
> +		return -EINVAL;
> +
>
>  	/*
>  	 * What we do here is we mark all pageblocks in range as
>  	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
> @@ -7912,8 +7920,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  
>  	/* Make sure the range is really isolated. */
>  	if (test_pages_isolated(outer_start, end, false)) {
> -		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
> -			__func__, outer_start, end);
> +		if (!(migratetype == MIGRATE_MOVABLE)) /* only print for CMA */
> +			pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
> +				__func__, outer_start, end);
>  		ret = -EBUSY;
>  		goto done;
>  	}

This probably belongs to a separate patch. I would be tempted to say
that we should get rid of this migratetype thingy altogether. I confess
I have forgot everything about why this is required actually but it is
ugly as hell. Not your fault of course.

> @@ -7949,6 +7958,82 @@ void free_contig_range(unsigned long pfn, unsigned long nr_pages)
>  	}
>  	WARN(count != 0, "%ld pages are still in use!\n", count);
>  }
> +
> +static bool contig_pfn_range_valid(struct zone *z, unsigned long start_pfn,
> +					unsigned long nr_pages)
> +{
> +	unsigned long i, end_pfn = start_pfn + nr_pages;
> +	struct page *page;
> +
> +	for (i = start_pfn; i < end_pfn; i++) {
> +		if (!pfn_valid(i))
> +			return false;
> +
> +		page = pfn_to_page(i);

It believe we want pfn_to_online_page here. The old giga pages code is
buggy in that regard but nothing really critical because the
alloc_contig_range will notice that.

Also do we want to check other usual suspects? E.g. PageReserved? And
generally migrateable pages if page count > 0. Or do we want to leave
everything to the alloc_contig_range?

> +
> +		if (page_zone(page) != z)
> +			return false;
> +
> +	}
> +
> +	return true;
> +}
> +
> +/**
> + * find_alloc_contig_pages() -- attempt to find and allocate a contiguous
> + *				range of pages
> + * @order:	number of pages
> + * @gfp:	gfp mask used to limit search as well as during compaction
> + * @nid:	target node
> + * @nodemask:	mask of other possible nodes
> + *
> + * Pages can be freed with a call to free_contig_pages(), or by manually
> + * calling __free_page() for each page allocated.
> + *
> + * Return: pointer to 'order' pages on success, or NULL if not successful.
> + */
> +struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
> +					int nid, nodemask_t *nodemask)

Vlastimil asked about this but I would even say that we do not want to
make this order based. Why would we want to restrict the api to 2^order
sizes in the first place? What if somebody wants to allocate 123 pages?

> +{
> +	unsigned long pfn, nr_pages, flags;
> +	struct page *ret_page = NULL;
> +	struct zonelist *zonelist;
> +	struct zoneref *z;
> +	struct zone *zone;
> +	int rc;
> +
> +	nr_pages = 1 << order;
> +	zonelist = node_zonelist(nid, gfp);
> +	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp),
> +					nodemask) {
> +		spin_lock_irqsave(&zone->lock, flags);
> +		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
> +		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
> +			if (contig_pfn_range_valid(zone, pfn, nr_pages)) {
> +				spin_unlock_irqrestore(&zone->lock, flags);

I know that the giga page allocation does use the zone lock but why? I
suspect it wants to stabilize zone_start_pfn but zone lock doesn't do
that.

> +
> +				rc = alloc_contig_range(pfn, pfn + nr_pages,
> +							MIGRATE_MOVABLE, gfp);
> +				if (!rc) {
> +					ret_page = pfn_to_page(pfn);
> +					return ret_page;
> +				}
> +				spin_lock_irqsave(&zone->lock, flags);
> +			}
> +			pfn += nr_pages;
> +		}
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +	}

Other than that this API looks much saner than alloc_contig_range. We
still need to sort out some details (e.g. alignment) but it should be an
improvement.

Thanks!
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-04-23  0:09   ` Michal Hocko
@ 2018-04-23  4:22     ` Mike Kravetz
  2018-04-24 13:37       ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2018-04-23  4:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-api, Reinette Chatre,
	Christopher Lameter, Guy Shattah, Anshuman Khandual,
	Michal Nazarewicz, Vlastimil Babka, David Nellans, Laura Abbott,
	Pavel Machek, Dave Hansen, Andrew Morton

On 04/22/2018 05:09 PM, Michal Hocko wrote:
> On Mon 16-04-18 19:09:14, Mike Kravetz wrote:
> [...]
>> @@ -2010,9 +2011,13 @@ static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
>>  {
>>  	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
>>  }
>> +#define contig_alloc_migratetype_ok(migratetype) \
>> +	((migratetype) == MIGRATE_CMA || (migratetype) == MIGRATE_MOVABLE)
>>  #else
>>  static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
>>  					unsigned int order) { return NULL; }
>> +#define contig_alloc_migratetype_ok(migratetype) \
>> +	((migratetype) == MIGRATE_MOVABLE)
>>  #endif
>>  
>>  /*
>> @@ -7822,6 +7827,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>  	};
>>  	INIT_LIST_HEAD(&cc.migratepages);
>>  
>> +	if (!contig_alloc_migratetype_ok(migratetype))
>> +		return -EINVAL;
>> +
>>
>>  	/*
>>  	 * What we do here is we mark all pageblocks in range as
>>  	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
>> @@ -7912,8 +7920,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>  
>>  	/* Make sure the range is really isolated. */
>>  	if (test_pages_isolated(outer_start, end, false)) {
>> -		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
>> -			__func__, outer_start, end);
>> +		if (!(migratetype == MIGRATE_MOVABLE)) /* only print for CMA */
>> +			pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
>> +				__func__, outer_start, end);
>>  		ret = -EBUSY;
>>  		goto done;
>>  	}
> 
> This probably belongs to a separate patch. I would be tempted to say
> that we should get rid of this migratetype thingy altogether. I confess
> I have forgot everything about why this is required actually but it is
> ugly as hell. Not your fault of course.
> 
>> @@ -7949,6 +7958,82 @@ void free_contig_range(unsigned long pfn, unsigned long nr_pages)
>>  	}
>>  	WARN(count != 0, "%ld pages are still in use!\n", count);
>>  }
>> +
>> +static bool contig_pfn_range_valid(struct zone *z, unsigned long start_pfn,
>> +					unsigned long nr_pages)
>> +{
>> +	unsigned long i, end_pfn = start_pfn + nr_pages;
>> +	struct page *page;
>> +
>> +	for (i = start_pfn; i < end_pfn; i++) {
>> +		if (!pfn_valid(i))
>> +			return false;
>> +
>> +		page = pfn_to_page(i);
> 
> It believe we want pfn_to_online_page here. The old giga pages code is
> buggy in that regard but nothing really critical because the
> alloc_contig_range will notice that.

Ok

> Also do we want to check other usual suspects? E.g. PageReserved? And
> generally migrateable pages if page count > 0. Or do we want to leave
> everything to the alloc_contig_range?

I think you proposed something like the above with limited checking at
some time in the past.  In my testing, allocations were more likely to
succeed if we did limited testing here and let alloc_contig_range take
a shot at migration/allocation.  There really are two ways to approach
this: do as much checking up front, or let alloc_contig_range handle it.

>> +
>> +		if (page_zone(page) != z)
>> +			return false;
>> +
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +/**
>> + * find_alloc_contig_pages() -- attempt to find and allocate a contiguous
>> + *				range of pages
>> + * @order:	number of pages
>> + * @gfp:	gfp mask used to limit search as well as during compaction
>> + * @nid:	target node
>> + * @nodemask:	mask of other possible nodes
>> + *
>> + * Pages can be freed with a call to free_contig_pages(), or by manually
>> + * calling __free_page() for each page allocated.
>> + *
>> + * Return: pointer to 'order' pages on success, or NULL if not successful.
>> + */
>> +struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
>> +					int nid, nodemask_t *nodemask)
> 
> Vlastimil asked about this but I would even say that we do not want to
> make this order based. Why would we want to restrict the api to 2^order
> sizes in the first place? What if somebody wants to allocate 123 pages?

After Vlastimil's question and again here, I realized that this routine
has a HUGE gap.  For allocation sizes less than MAX_ORDER, it should be
using the traditional paga allocation routines.  It is only when allocation
size is greater than MAX_ORDER that we need to call into alloc_contig_range.

Unless I am missing something, calls to alloc_contig_range need to have
a size that is a multiple of page block.  This is because isolation needs
to take place at a page block level.  We can easily 'round up' and release
excess pages.
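
Something like the following (a rough sketch of the round-up idea; it
assumes pfn is already pageblock aligned and ignores error reporting):

/* Sketch: isolate a pageblock-aligned superset, then trim the tail. */
static struct page *alloc_contig_trimmed(unsigned long pfn,
				unsigned long nr_pages, gfp_t gfp)
{
	unsigned long aligned = ALIGN(nr_pages, pageblock_nr_pages);

	if (alloc_contig_range(pfn, pfn + aligned, MIGRATE_MOVABLE, gfp))
		return NULL;

	if (aligned > nr_pages)		/* release the excess pages */
		free_contig_range(pfn + nr_pages, aligned - nr_pages);

	return pfn_to_page(pfn);
}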

Using number of pages instead of order makes sense.   I will rewrite with
this in mind as well as taking the other issues into account.

> 
>> +{
>> +	unsigned long pfn, nr_pages, flags;
>> +	struct page *ret_page = NULL;
>> +	struct zonelist *zonelist;
>> +	struct zoneref *z;
>> +	struct zone *zone;
>> +	int rc;
>> +
>> +	nr_pages = 1 << order;
>> +	zonelist = node_zonelist(nid, gfp);
>> +	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp),
>> +					nodemask) {
>> +		spin_lock_irqsave(&zone->lock, flags);
>> +		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
>> +		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
>> +			if (contig_pfn_range_valid(zone, pfn, nr_pages)) {
>> +				spin_unlock_irqrestore(&zone->lock, flags);
> 
> I know that the giga page allocation does use the zone lock but why? I
> suspect it wants to stabilize zone_start_pfn but zone lock doesn't do
> that.
> 

I am not sure.  As you suspected, this was copied from the giga page
allocation code.  I'll look into it.

>> +
>> +				rc = alloc_contig_range(pfn, pfn + nr_pages,
>> +							MIGRATE_MOVABLE, gfp);
>> +				if (!rc) {
>> +					ret_page = pfn_to_page(pfn);
>> +					return ret_page;
>> +				}
>> +				spin_lock_irqsave(&zone->lock, flags);
>> +			}
>> +			pfn += nr_pages;
>> +		}
>> +		spin_unlock_irqrestore(&zone->lock, flags);
>> +	}
> 
> Other than that this API looks much saner than alloc_contig_range. We
> still need to sort out some details (e.g. alignment) but it should be an
> improvement.

Ok, I will continue to refine.  We now have at least one immediate use
case for this type of functionality.

-- 
Mike Kravetz


* Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-04-23  4:22     ` Mike Kravetz
@ 2018-04-24 13:37       ` Michal Hocko
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Hocko @ 2018-04-24 13:37 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, linux-api, Reinette Chatre,
	Christopher Lameter, Guy Shattah, Anshuman Khandual,
	Michal Nazarewicz, Vlastimil Babka, David Nellans, Laura Abbott,
	Pavel Machek, Dave Hansen, Andrew Morton

On Sun 22-04-18 21:22:07, Mike Kravetz wrote:
> On 04/22/2018 05:09 PM, Michal Hocko wrote:
[...]
> > Also do we want to check other usual suspects? E.g. PageReserved? And
> > generally migrateable pages if page count > 0. Or do we want to leave
> > everything to the alloc_contig_range?
> 
> I think you proposed something like the above with limited checking at
> some time in the past.  In my testing, allocations were more likely to
> succeed if we did limited testing here and let alloc_contig_range take
> a shot at migration/allocation.  There really are two ways to approach
> this, do as much checking up front or let it be handled by alloc_contig_range.

OK, it would be great to have a comment mentioning that. The discrepancy
will just hit eyes otherwise.

[...]
> Unless I am missing something, calls to alloc_contig range need to have
> a size that is a multiple of page block.  This is because isolation needs
> to take place at a page block level.  We can easily 'round up' and release
> excess pages.

I am not sure but can we simply leave a part of the page block behind? I
mean it might have a misleading migrate type but that shouldn't matter
much, no?

Thanks!
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-04-21 16:16   ` Vlastimil Babka
@ 2018-05-02 21:13     ` Mike Kravetz
  2018-06-06  9:32       ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2018-05-02 21:13 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm, linux-kernel, linux-api
  Cc: Reinette Chatre, Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, David Nellans,
	Laura Abbott, Pavel Machek, Dave Hansen, Andrew Morton

On 04/21/2018 09:16 AM, Vlastimil Babka wrote:
> On 04/17/2018 04:09 AM, Mike Kravetz wrote:
>> find_alloc_contig_pages() is a new interface that attempts to locate
>> and allocate a contiguous range of pages.  It is provided as a more
>> convenient interface than alloc_contig_range() which is currently
>> used by CMA and gigantic huge pages.
>>
>> When attempting to allocate a range of pages, migration is employed
>> if possible.  There is no guarantee that the routine will succeed.
>> So, the user must be prepared for failure and have a fallback plan.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Hi, just two quick observations, maybe discussion pointers for the
> LSF/MM session:
> - it's weird that find_alloc_contig_pages() takes an order, and
> free_contig_pages() takes a nr_pages. I suspect the interface would be
> more future-proof with both using nr_pages? Perhaps also minimum
> alignment for the allocation side? Order is fine for hugetlb, but what
> about other potential users?

Agreed, and I am changing this to nr_pages and adding alignment.

> - contig_alloc_migratetype_ok() says that MIGRATE_CMA blocks are OK to
> allocate from. This silently assumes that everything allocated by this
> will be migratable itself, or it might eat CMA reserves. Is it the case?
> Also you then call alloc_contig_range() with MIGRATE_MOVABLE, so it will
> skip/fail on MIGRATE_CMA anyway IIRC.

When looking closer at the code, alloc_contig_range currently has comments
saying migratetype must be MIGRATE_MOVABLE or MIGRATE_CMA.  However, this
is not checked/enforced anywhere in the code (that I can see).  The
migratetype passed to alloc_contig_range() will be used to set the migrate
type of all pageblocks in the range.  If there is an error, one side effect
is that some pageblocks may have their migrate type changed to migratetype.
Depending on how far we got before hitting the error, the number of pageblocks
changed is unknown.  This actually can happen at the lower level routine
start_isolate_page_range().

My first thought was to make start_isolate_page_range/set_migratetype_isolate
check that the migrate type of a pageblock was migratetype before isolating.
This would work for CMA, and I could make it work for the new allocator.
However, offline_pages also calls start_isolate_page_range and I believe we
do not want to enforce such a rule (all pageblocks must be of the same migrate
type) for memory hotplug/offline?

Should we be concerned at all about this potential changing of migrate type
on error?  The only way I can think to avoid this is to save the original
migrate type before isolation.
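
One possible shape for that (hypothetical sketch; the helper names are
made up and start_isolate_page_range() itself is untouched here):

/* Sketch: snapshot each pageblock's migratetype so an error path can
 * restore the previous state instead of forcing 'migratetype'. */
static void save_pageblock_types(unsigned long start_pfn,
				unsigned long end_pfn, int *saved)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
		*saved++ = get_pageblock_migratetype(pfn_to_page(pfn));
}

static void restore_pageblock_types(unsigned long start_pfn,
				unsigned long end_pfn, const int *saved)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
		set_pageblock_migratetype(pfn_to_page(pfn), *saved++);
}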

-- 
Mike Kravetz


* Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-05-02 21:13     ` Mike Kravetz
@ 2018-06-06  9:32       ` Michal Hocko
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Hocko @ 2018-06-06  9:32 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Vlastimil Babka, linux-mm, linux-kernel, linux-api,
	Reinette Chatre, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, David Nellans,
	Laura Abbott, Pavel Machek, Dave Hansen, Andrew Morton

On Wed 02-05-18 14:13:32, Mike Kravetz wrote:
> On 04/21/2018 09:16 AM, Vlastimil Babka wrote:
> > On 04/17/2018 04:09 AM, Mike Kravetz wrote:
> >> find_alloc_contig_pages() is a new interface that attempts to locate
> >> and allocate a contiguous range of pages.  It is provided as a more
> >> convenient interface than alloc_contig_range() which is currently
> >> used by CMA and gigantic huge pages.
> >>
> >> When attempting to allocate a range of pages, migration is employed
> >> if possible.  There is no guarantee that the routine will succeed.
> >> So, the user must be prepared for failure and have a fallback plan.
> >>
> >> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> > 
> > Hi, just two quick observations, maybe discussion pointers for the
> > LSF/MM session:
> > - it's weird that find_alloc_contig_pages() takes an order, and
> > free_contig_pages() takes a nr_pages. I suspect the interface would be
> > more future-proof with both using nr_pages? Perhaps also minimum
> > alignment for the allocation side? Order is fine for hugetlb, but what
> > about other potential users?
> 
> Agreed, and I am changing this to nr_pages and adding alignment.
> 
> > - contig_alloc_migratetype_ok() says that MIGRATE_CMA blocks are OK to
> > allocate from. This silently assumes that everything allocated by this
> > will be migratable itself, or it might eat CMA reserves. Is it the case?
> > Also you then call alloc_contig_range() with MIGRATE_MOVABLE, so it will
> > skip/fail on MIGRATE_CMA anyway IIRC.
> 
> When looking closer at the code, alloc_contig_range currently has comments
> saying migratetype must be MIGRATE_MOVABLE or MIGRATE_CMA.  However, this
> is not checked/enforced anywhere in the code (that I can see).  The
> migratetype passed to alloc_contig_range() will be used to set the migrate
> type of all pageblocks in the range.  If there is an error, one side effect
> is that some pageblocks may have their migrate type changed to migratetype.
> Depending on how far we got before hitting the error, the number of pageblocks
> changed is unknown.  This actually can happen at the lower level routine
> start_isolate_page_range().
> 
> My first thought was to make start_isolate_page_range/set_migratetype_isolate
> check that the migrate type of a pageblock was migratetype before isolating.
> This would work for CMA, and I could make it work for the new allocator.
> However, offline_pages also calls start_isolate_page_range and I believe we
> do not want to enforce such a rule (all pageblocks must be of the same migrate
> type) for memory hotplug/offline?
> 
> Should we be concerned at all about this potential changing of migrate type
> on error?  The only way I can think to avoid this is to save the original
> migrate type before isolation.

This is more a question to Vlastimil, Joonsoo. But my understanding is
that it doesn't matter. MIGRATE_MOVABLE will not block other
allocations. So we seem to need it only for MIGRATE_CMA. The later
should die sooner or later hopefully so this awful kludge should just
die with it.
-- 
Michal Hocko
SUSE Labs

