* [RFC PATCH 0/3] Interface for higher order contiguous allocations
@ 2018-02-12 22:20 ` Mike Kravetz
  0 siblings, 0 replies; 19+ messages in thread
From: Mike Kravetz @ 2018-02-12 22:20 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Mike Kravetz

These patches came out of the "[RFC] mmap(MAP_CONTIG)" discussions at:
http://lkml.kernel.org/r/21f1ec96-2822-1189-1c95-79a2bb491571@oracle.com

One suggestion in that thread was to create a friendlier interface that
could be used by drivers and others outside core mm code to allocate a
contiguous set of pages.  The alloc_contig_range() interface is used for
this purpose today by CMA and gigantic page allocation.  However, this is
not a general purpose interface.  So, wrap alloc_contig_range() in the
more general interface:

struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp, int nid,
					nodemask_t *nodemask)

No underlying changes are made to increase the likelihood that a contiguous
set of pages can be found and allocated.  Therefore, any user of this
interface must deal with failure.  The hope is that this interface will be
able to satisfy some use cases today.
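
As a rough illustration only (not part of these patches), a caller might
look something like the sketch below.  The function name, order, gfp flags
and node choice are arbitrary example values:

static int example_get_contig_buffer(void)
{
        struct page *page;

        /* try for 2^4 = 16 physically contiguous pages on this node */
        page = find_alloc_contig_pages(4, GFP_KERNEL | __GFP_THISNODE,
                                       numa_node_id(), NULL);
        if (!page)
                return -ENOMEM;    /* no contiguous range; caller must fall back */

        /* ... use the pages ... */

        free_contig_pages(page, 1 << 4);
        return 0;
}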

If the "rate of failure" is too high to be useful, then more work can be put
into methods to help increase the rate of successful allocations.  Such a
proposal was recently sent by Christoph Lameter "[RFC] Protect larger order
pages from breaking up":
http://lkml.kernel.org/r/alpine.DEB.2.20.1802091311090.3059@nuc-kabylake

find_alloc_contig_pages() uses the same logic that exists today for scanning
zones to look for contiguous ranges suitable for gigantic pages.  The last
patch in the series changes gigantic page allocation to use the new interface.

Mike Kravetz (3):
  mm: make start_isolate_page_range() fail if already isolated
  mm: add find_alloc_contig_pages() interface
  mm/hugetlb: use find_alloc_contig_pages() to allocate gigantic pages

 include/linux/gfp.h | 12 +++++++
 mm/hugetlb.c        | 88 ++++--------------------------------------------
 mm/page_alloc.c     | 97 +++++++++++++++++++++++++++++++++++++++++++++++++----
 mm/page_isolation.c | 10 +++++-
 4 files changed, 118 insertions(+), 89 deletions(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 1/3] mm: make start_isolate_page_range() fail if already isolated
  2018-02-12 22:20 ` Mike Kravetz
@ 2018-02-12 22:20   ` Mike Kravetz
  -1 siblings, 0 replies; 19+ messages in thread
From: Mike Kravetz @ 2018-02-12 22:20 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Mike Kravetz

start_isolate_page_range() is used to set the migrate type of a
page block to MIGRATE_ISOLATE while attempting to start a
migration operation.  It is assumed that only one thread is
attempting such an operation, and due to the limited number of
callers this is generally the case.  However, there are no
guarantees and it is 'possible' for two threads to operate on
the same range.

Since start_isolate_page_range() is called at the beginning of
such operations, have it return -EBUSY if MIGRATE_ISOLATE is
already set.

This allows start_isolate_page_range() to serve as a
synchronization mechanism, and allows these interfaces to be
used more generally by additional callers.
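
For illustration only (not part of this patch), a hypothetical caller
can now treat -EBUSY as "another thread owns this range" and back off
instead of racing; start_pfn, end_pfn and ret below are the caller's:

        ret = start_isolate_page_range(start_pfn, end_pfn,
                                       MIGRATE_MOVABLE, false);
        if (ret)
                return ret;     /* includes -EBUSY: already isolated elsewhere */

        /* ... migrate and allocate pages in [start_pfn, end_pfn) ... */

        undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);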

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/page_alloc.c     |  8 ++++----
 mm/page_isolation.c | 10 +++++++++-
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 76c9688b6a0a..064458f317bf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7605,11 +7605,11 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
  * @gfp_mask:	GFP mask to use during compaction
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned, however it's the caller's responsibility to guarantee that
- * we are the only thread that changes migrate type of pageblocks the
- * pages fall in.
+ * aligned.  The PFN range must belong to a single zone.
  *
- * The PFN range must belong to a single zone.
+ * The first thing this routine does is attempt to MIGRATE_ISOLATE all
+ * pageblocks in the range.  Once isolated, the pageblocks should not
+ * be modified by others.
  *
  * Returns zero on success or negative error code.  On success all
  * pages which PFN is in [start, end) are allocated for the caller and
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 165ed8117bd1..e815879d525f 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -28,6 +28,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype,
 
 	spin_lock_irqsave(&zone->lock, flags);
 
+	/*
+	 * We assume we are the only ones trying to isolate this block.
+	 * If MIGRATE_ISOLATE already set, return -EBUSY
+	 */
+	if (is_migrate_isolate_page(page))
+		goto out;
+
 	pfn = page_to_pfn(page);
 	arg.start_pfn = pfn;
 	arg.nr_pages = pageblock_nr_pages;
@@ -166,7 +173,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * future will not be allocated again.
  *
  * start_pfn/end_pfn must be aligned to pageblock_order.
- * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
+ * Returns 0 on success and -EBUSY if any part of range cannot be isolated
+ * or any part of the range is already set to MIGRATE_ISOLATE.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			     unsigned migratetype, bool skip_hwpoisoned_pages)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-02-12 22:20 ` Mike Kravetz
@ 2018-02-12 22:20   ` Mike Kravetz
  -1 siblings, 0 replies; 19+ messages in thread
From: Mike Kravetz @ 2018-02-12 22:20 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Mike Kravetz

find_alloc_contig_pages() is a new interface that attempts to locate
and allocate a contiguous range of pages.  It is provided as a more
convenient wrapper around the existing alloc_contig_range() interface,
which is used today by CMA, memory hotplug and gigantic huge pages.

When attempting to allocate a range of pages, migration is employed
if possible.  There is no guarantee that the routine will succeed.
So, the user must be prepared for failure and have a fallback plan.
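
As a hedged sketch of that expectation (the size, nid, buf and vmalloc()
fallback below are illustrative only, not part of this patch):

        unsigned int order = get_order(SZ_2M);  /* bytes -> page order */
        struct page *page;
        void *buf;

        page = find_alloc_contig_pages(order, GFP_KERNEL, nid, NULL);
        if (page)
                buf = page_address(page);       /* physically contiguous */
        else
                buf = vmalloc(SZ_2M);           /* virtually contiguous fallback */

        /*
         * The caller must remember which path was taken and later use
         * free_contig_pages(page, 1 << order) or vfree(buf) accordingly.
         */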

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/gfp.h | 12 ++++++++
 mm/page_alloc.c     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 1a4582b44d32..456979022956 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -573,6 +573,18 @@ static inline bool pm_suspended_storage(void)
 extern int alloc_contig_range(unsigned long start, unsigned long end,
 			      unsigned migratetype, gfp_t gfp_mask);
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
+extern struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
+						int nid, nodemask_t *nodemask);
+extern void free_contig_pages(struct page *page, unsigned nr_pages);
+#else
+static inline struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
+						int nid, nodemask_t *nodemask)
+{
+	return NULL;
+}
+static inline void free_contig_pages(struct page *page, unsigned nr_pages)
+{
+}
 #endif
 
 #ifdef CONFIG_CMA
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 064458f317bf..0a5a547acdbf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -67,6 +67,7 @@
 #include <linux/ftrace.h>
 #include <linux/lockdep.h>
 #include <linux/nmi.h>
+#include <linux/mmzone.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
@@ -1873,9 +1874,13 @@ static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
 {
 	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
 }
+#define contig_alloc_migratetype_ok(migratetype) \
+	((migratetype) == MIGRATE_CMA || (migratetype) == MIGRATE_MOVABLE)
 #else
 static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
 					unsigned int order) { return NULL; }
+#define contig_alloc_migratetype_ok(migratetype) \
+	((migratetype) == MIGRATE_MOVABLE)
 #endif
 
 /*
@@ -7633,6 +7638,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 
+	if (!contig_alloc_migratetype_ok(migratetype))
+		return -EINVAL;
+
 	/*
 	 * What we do here is we mark all pageblocks in range as
 	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
@@ -7723,8 +7731,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	/* Make sure the range is really isolated. */
 	if (test_pages_isolated(outer_start, end, false)) {
-		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
-			__func__, outer_start, end);
+		if (!(migratetype == MIGRATE_MOVABLE)) /* only print for CMA */
+			pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
+				__func__, outer_start, end);
 		ret = -EBUSY;
 		goto done;
 	}
@@ -7760,6 +7769,82 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 	}
 	WARN(count != 0, "%d pages are still in use!\n", count);
 }
+
+static bool contig_pfn_range_valid(struct zone *z, unsigned long start_pfn,
+					unsigned long nr_pages)
+{
+	unsigned long i, end_pfn = start_pfn + nr_pages;
+	struct page *page;
+
+	for (i = start_pfn; i < end_pfn; i++) {
+		if (!pfn_valid(i))
+			return false;
+
+		page = pfn_to_page(i);
+
+		if (page_zone(page) != z)
+			return false;
+
+	}
+
+	return true;
+}
+
+/**
+ * find_alloc_contig_pages() -- attempt to find and allocate a contiguous
+ *				range of pages
+ * @order:	number of pages
+ * @gfp:	gfp mask used to limit search as well as during compaction
+ * @nid:	target node
+ * @nodemask:	mask of other possible nodes
+ *
+ * Returns pointer to 'order' pages on success, or NULL if not successful.
+ *
+ * Pages can be freed with a call to free_contig_pages(), or by manually
+ * calling __free_page() for each page allocated.
+ */
+struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
+					int nid, nodemask_t *nodemask)
+{
+	unsigned long pfn, nr_pages, flags;
+	struct page *ret_page = NULL;
+	struct zonelist *zonelist;
+	struct zoneref *z;
+	struct zone *zone;
+	int rc;
+
+	nr_pages = 1 << order;
+	zonelist = node_zonelist(nid, gfp);
+	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp),
+					nodemask) {
+		spin_lock_irqsave(&zone->lock, flags);
+		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
+		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
+			if (contig_pfn_range_valid(zone, pfn, nr_pages)) {
+				spin_unlock_irqrestore(&zone->lock, flags);
+
+				rc = alloc_contig_range(pfn, pfn + nr_pages,
+							MIGRATE_MOVABLE, gfp);
+				if (!rc) {
+					ret_page = pfn_to_page(pfn);
+					return ret_page;
+				}
+				spin_lock_irqsave(&zone->lock, flags);
+			}
+			pfn += nr_pages;
+		}
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+
+	return ret_page;
+}
+EXPORT_SYMBOL_GPL(find_alloc_contig_pages);
+
+void free_contig_pages(struct page *page, unsigned nr_pages)
+{
+	free_contig_range(page_to_pfn(page), nr_pages);
+}
+EXPORT_SYMBOL_GPL(free_contig_pages);
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 3/3] mm/hugetlb: use find_alloc_contig_pages() to allocate gigantic pages
  2018-02-12 22:20 ` Mike Kravetz
@ 2018-02-12 22:20   ` Mike Kravetz
  -1 siblings, 0 replies; 19+ messages in thread
From: Mike Kravetz @ 2018-02-12 22:20 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen,
	Mike Kravetz

Use the new find_alloc_contig_pages() interface for the allocation of
gigantic pages.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 88 +++++-------------------------------------------------------
 1 file changed, 6 insertions(+), 82 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9a334f5fb730..4c0c4f86dcda 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1060,92 +1060,16 @@ static void destroy_compound_gigantic_page(struct page *page,
 	__ClearPageHead(page);
 }
 
-static void free_gigantic_page(struct page *page, unsigned int order)
+static void free_gigantic_page(struct page *page, struct hstate *h)
 {
-	free_contig_range(page_to_pfn(page), 1 << order);
-}
-
-static int __alloc_gigantic_page(unsigned long start_pfn,
-				unsigned long nr_pages, gfp_t gfp_mask)
-{
-	unsigned long end_pfn = start_pfn + nr_pages;
-	return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
-				  gfp_mask);
-}
-
-static bool pfn_range_valid_gigantic(struct zone *z,
-			unsigned long start_pfn, unsigned long nr_pages)
-{
-	unsigned long i, end_pfn = start_pfn + nr_pages;
-	struct page *page;
-
-	for (i = start_pfn; i < end_pfn; i++) {
-		if (!pfn_valid(i))
-			return false;
-
-		page = pfn_to_page(i);
-
-		if (page_zone(page) != z)
-			return false;
-
-		if (PageReserved(page))
-			return false;
-
-		if (page_count(page) > 0)
-			return false;
-
-		if (PageHuge(page))
-			return false;
-	}
-
-	return true;
-}
-
-static bool zone_spans_last_pfn(const struct zone *zone,
-			unsigned long start_pfn, unsigned long nr_pages)
-{
-	unsigned long last_pfn = start_pfn + nr_pages - 1;
-	return zone_spans_pfn(zone, last_pfn);
+	free_contig_pages(page, pages_per_huge_page(h));
 }
 
 static struct page *alloc_gigantic_page(int nid, struct hstate *h)
 {
-	unsigned int order = huge_page_order(h);
-	unsigned long nr_pages = 1 << order;
-	unsigned long ret, pfn, flags;
-	struct zonelist *zonelist;
-	struct zone *zone;
-	struct zoneref *z;
-	gfp_t gfp_mask;
-
-	gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
-	zonelist = node_zonelist(nid, gfp_mask);
-	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp_mask), NULL) {
-		spin_lock_irqsave(&zone->lock, flags);
+	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
 
-		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
-		while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
-			if (pfn_range_valid_gigantic(zone, pfn, nr_pages)) {
-				/*
-				 * We release the zone lock here because
-				 * alloc_contig_range() will also lock the zone
-				 * at some point. If there's an allocation
-				 * spinning on this lock, it may win the race
-				 * and cause alloc_contig_range() to fail...
-				 */
-				spin_unlock_irqrestore(&zone->lock, flags);
-				ret = __alloc_gigantic_page(pfn, nr_pages, gfp_mask);
-				if (!ret)
-					return pfn_to_page(pfn);
-				spin_lock_irqsave(&zone->lock, flags);
-			}
-			pfn += nr_pages;
-		}
-
-		spin_unlock_irqrestore(&zone->lock, flags);
-	}
-
-	return NULL;
+	return find_alloc_contig_pages(huge_page_order(h), gfp_mask, nid, NULL);
 }
 
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid);
@@ -1181,7 +1105,7 @@ static int alloc_fresh_gigantic_page(struct hstate *h,
 
 #else /* !CONFIG_ARCH_HAS_GIGANTIC_PAGE */
 static inline bool gigantic_page_supported(void) { return false; }
-static inline void free_gigantic_page(struct page *page, unsigned int order) { }
+static void free_gigantic_page(struct page *page, struct hstate *h) { }
 static inline void destroy_compound_gigantic_page(struct page *page,
 						unsigned int order) { }
 static inline int alloc_fresh_gigantic_page(struct hstate *h,
@@ -1208,7 +1132,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 	set_page_refcounted(page);
 	if (hstate_is_gigantic(h)) {
 		destroy_compound_gigantic_page(page, huge_page_order(h));
-		free_gigantic_page(page, huge_page_order(h));
+		free_gigantic_page(page, h);
 	} else {
 		__free_pages(page, huge_page_order(h));
 	}
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 1/3] mm: make start_isolate_page_range() fail if already isolated
  2018-02-12 22:20   ` Mike Kravetz
@ 2018-02-13  9:46     ` Mike Rapoport
  -1 siblings, 0 replies; 19+ messages in thread
From: Mike Rapoport @ 2018-02-13  9:46 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Michal Hocko, Christopher Lameter,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen

On Mon, Feb 12, 2018 at 02:20:54PM -0800, Mike Kravetz wrote:
> start_isolate_page_range() is used to set the migrate type of a
> page block to MIGRATE_ISOLATE while attempting to start a
> migration operation.  It is assumed that only one thread is
> attempting such an operation, and due to the limited number of
> callers this is generally the case.  However, there are no
> guarantees and it is 'possible' for two threads to operate on
> the same range.
> 
> Since start_isolate_page_range() is called at the beginning of
> such operations, have it return -EBUSY if MIGRATE_ISOLATE is
> already set.
> 
> This will allow start_isolate_page_range to serve as a
> synchronization mechanism and will allow for more general use
> of callers making use of these interfaces.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/page_alloc.c     |  8 ++++----
>  mm/page_isolation.c | 10 +++++++++-
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 76c9688b6a0a..064458f317bf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7605,11 +7605,11 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   * @gfp_mask:	GFP mask to use during compaction
>   *
>   * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> - * aligned, however it's the caller's responsibility to guarantee that
> - * we are the only thread that changes migrate type of pageblocks the
> - * pages fall in.
> + * aligned.  The PFN range must belong to a single zone.
>   *
> - * The PFN range must belong to a single zone.
> + * The first thing this routine does is attempt to MIGRATE_ISOLATE all
> + * pageblocks in the range.  Once isolated, the pageblocks should not
> + * be modified by others.
>   *
>   * Returns zero on success or negative error code.  On success all
>   * pages which PFN is in [start, end) are allocated for the caller and
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 165ed8117bd1..e815879d525f 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -28,6 +28,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype,
> 
>  	spin_lock_irqsave(&zone->lock, flags);
> 
> +	/*
> +	 * We assume we are the only ones trying to isolate this block.
> +	 * If MIGRATE_ISOLATE already set, return -EBUSY
> +	 */
> +	if (is_migrate_isolate_page(page))
> +		goto out;
> +
>  	pfn = page_to_pfn(page);
>  	arg.start_pfn = pfn;
>  	arg.nr_pages = pageblock_nr_pages;
> @@ -166,7 +173,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * future will not be allocated again.
>   *
>   * start_pfn/end_pfn must be aligned to pageblock_order.
> - * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
> + * Returns 0 on success and -EBUSY if any part of range cannot be isolated

Nit: please s/Returns/Return:/ and keep the period at the end.

> + * or any part of the range is already set to MIGRATE_ISOLATE.
>   */
>  int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  			     unsigned migratetype, bool skip_hwpoisoned_pages)
> -- 
> 2.13.6
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/3] mm: add find_alloc_contig_pages() interface
  2018-02-12 22:20   ` Mike Kravetz
@ 2018-02-13  9:53     ` Mike Rapoport
  -1 siblings, 0 replies; 19+ messages in thread
From: Mike Rapoport @ 2018-02-13  9:53 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Michal Hocko, Christopher Lameter,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen

On Mon, Feb 12, 2018 at 02:20:55PM -0800, Mike Kravetz wrote:
> find_alloc_contig_pages() is a new interface that attempts to locate
> and allocate a contiguous range of pages.  It is provided as a more
> convenient interface to the existing alloc_contig_range() interface
> which is used by CMA, memory hotplug and gigantic huge pages.
> 
> When attempting to allocate a range of pages, migration is employed
> if possible.  There is no guarantee that the routine will succeed.
> So, the user must be prepared for failure and have a fall back plan.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  include/linux/gfp.h | 12 ++++++++
>  mm/page_alloc.c     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 99 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 1a4582b44d32..456979022956 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -573,6 +573,18 @@ static inline bool pm_suspended_storage(void)
>  extern int alloc_contig_range(unsigned long start, unsigned long end,
>  			      unsigned migratetype, gfp_t gfp_mask);
>  extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
> +extern struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
> +						int nid, nodemask_t *nodemask);
> +extern void free_contig_pages(struct page *page, unsigned nr_pages);
> +#else
> +static inline page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
> +						int nid, nodemask_t *nodemask)
> +{
> +	return NULL;
> +}
> +static void free_contig_pages(struct page *page, unsigned nr_pages)
> +{
> +}
>  #endif
> 
>  #ifdef CONFIG_CMA
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 064458f317bf..0a5a547acdbf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -67,6 +67,7 @@
>  #include <linux/ftrace.h>
>  #include <linux/lockdep.h>
>  #include <linux/nmi.h>
> +#include <linux/mmzone.h>
> 
>  #include <asm/sections.h>
>  #include <asm/tlbflush.h>
> @@ -1873,9 +1874,13 @@ static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
>  {
>  	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
>  }
> +#define contig_alloc_migratetype_ok(migratetype) \
> +	((migratetype) == MIGRATE_CMA || (migratetype) == MIGRATE_MOVABLE)
>  #else
>  static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
>  					unsigned int order) { return NULL; }
> +#define contig_alloc_migratetype_ok(migratetype) \
> +	((migratetype) == MIGRATE_MOVABLE)
>  #endif
> 
>  /*
> @@ -7633,6 +7638,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  	};
>  	INIT_LIST_HEAD(&cc.migratepages);
> 
> +	if (!contig_alloc_migratetype_ok(migratetype))
> +		return -EINVAL;
> +
>  	/*
>  	 * What we do here is we mark all pageblocks in range as
>  	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
> @@ -7723,8 +7731,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> 
>  	/* Make sure the range is really isolated. */
>  	if (test_pages_isolated(outer_start, end, false)) {
> -		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
> -			__func__, outer_start, end);
> +		if (!(migratetype == MIGRATE_MOVABLE)) /* only print for CMA */
> +			pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
> +				__func__, outer_start, end);
>  		ret = -EBUSY;
>  		goto done;
>  	}
> @@ -7760,6 +7769,82 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
>  	}
>  	WARN(count != 0, "%d pages are still in use!\n", count);
>  }
> +
> +static bool contig_pfn_range_valid(struct zone *z, unsigned long start_pfn,
> +					unsigned long nr_pages)
> +{
> +	unsigned long i, end_pfn = start_pfn + nr_pages;
> +	struct page *page;
> +
> +	for (i = start_pfn; i < end_pfn; i++) {
> +		if (!pfn_valid(i))
> +			return false;
> +
> +		page = pfn_to_page(i);
> +
> +		if (page_zone(page) != z)
> +			return false;
> +
> +	}
> +
> +	return true;
> +}
> +
> +/**
> + * find_alloc_contig_pages() -- attempt to find and allocate a contiguous
> + *				range of pages
> + * @order:	number of pages
> + * @gfp:	gfp mask used to limit search as well as during compaction
> + * @nid:	target node
> + * @nodemask:	mask of other possible nodes
> + *
> + * Returns pointer to 'order' pages on success, or NULL if not successful.

Please s/Returns/Return:/ and move the return value description to the end
of the comment block.

> + *
> + * Pages can be freed with a call to free_contig_pages(), or by manually
> + * calling __free_page() for each page allocated.
> + */
> +struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
> +					int nid, nodemask_t *nodemask)
> +{
> +	unsigned long pfn, nr_pages, flags;
> +	struct page *ret_page = NULL;
> +	struct zonelist *zonelist;
> +	struct zoneref *z;
> +	struct zone *zone;
> +	int rc;
> +
> +	nr_pages = 1 << order;
> +	zonelist = node_zonelist(nid, gfp);
> +	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp),
> +					nodemask) {
> +		spin_lock_irqsave(&zone->lock, flags);
> +		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
> +		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
> +			if (contig_pfn_range_valid(zone, pfn, nr_pages)) {
> +				spin_unlock_irqrestore(&zone->lock, flags);
> +
> +				rc = alloc_contig_range(pfn, pfn + nr_pages,
> +							MIGRATE_MOVABLE, gfp);
> +				if (!rc) {
> +					ret_page = pfn_to_page(pfn);
> +					return ret_page;
> +				}
> +				spin_lock_irqsave(&zone->lock, flags);
> +			}
> +			pfn += nr_pages;
> +		}
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +	}
> +
> +	return ret_page;
> +}
> +EXPORT_SYMBOL_GPL(find_alloc_contig_pages);
> +
> +void free_contig_pages(struct page *page, unsigned nr_pages)
> +{
> +	free_contig_range(page_to_pfn(page), nr_pages);
> +}
> +EXPORT_SYMBOL_GPL(free_contig_pages);
>  #endif
> 
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -- 
> 2.13.6
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] Interface for higher order contiguous allocations
  2018-02-12 22:20 ` Mike Kravetz
@ 2018-02-15 20:22   ` Reinette Chatre
  -1 siblings, 0 replies; 19+ messages in thread
From: Reinette Chatre @ 2018-02-15 20:22 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel
  Cc: Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen

Hi Mike,

On 2/12/2018 2:20 PM, Mike Kravetz wrote:
> These patches came out of the "[RFC] mmap(MAP_CONTIG)" discussions at:
> http://lkml.kernel.org/r/21f1ec96-2822-1189-1c95-79a2bb491571@oracle.com
> 
> One suggestion in that thread was to create a friendlier interface that
> could be used by drivers and others outside core mm code to allocate a
> contiguous set of pages.  The alloc_contig_range() interface is used for
> this purpose today by CMA and gigantic page allocation.  However, this is
> not a general purpose interface.  So, wrap alloc_contig_range() in the
> more general interface:
> 
> struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp, int nid,
> 					nodemask_t *nodemask)
> 
> No underlying changes are made to increase the likelihood that a contiguous
> set of pages can be found and allocated.  Therefore, any user of this
> interface must deal with failure.  The hope is that this interface will be
> able to satisfy some use cases today.

As discussed in another thread, a new feature, Cache Pseudo-Locking,
requires large contiguous regions. Until now I just exposed
alloc_gigantic_page() to handle these allocations in my testing. I now
moved to using find_alloc_contig_pages() as introduced here and all my
tests passed. I do hope that an API supporting large contiguous regions
become available.

Thank you very much for creating this.

Tested-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 1/3] mm: make start_isolate_page_range() fail if already isolated
  2018-02-12 22:20   ` Mike Kravetz
@ 2018-02-16  0:40     ` Mike Kravetz
  -1 siblings, 0 replies; 19+ messages in thread
From: Mike Kravetz @ 2018-02-16  0:40 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen

On 02/12/2018 02:20 PM, Mike Kravetz wrote:
> start_isolate_page_range() is used to set the migrate type of a
> page block to MIGRATE_ISOLATE while attempting to start a
> migration operation.  It is assumed that only one thread is
> attempting such an operation, and due to the limited number of
> callers this is generally the case.  However, there are no
> guarantees and it is 'possible' for two threads to operate on
> the same range.

I confirmed my suspicions that this is possible today.

As a test, I created a large CMA area at boot time.  I wrote some
code to exercise large allocations and frees via cma_alloc()/cma_release().
At the same time, I allocated and freed gigantic pages via the
sysfs interface.

After a little bit of running, 'free memory' on the system went to
zero.  After 'stopping' the tests, I observed that most zone normal
page blocks were marked as MIGRATE_ISOLATE.  Hence 'not available'.

As mentioned in the commit message, I doubt we will see this in
normal operations.  But my testing confirms that it is possible.
Therefore, we should consider a patch like this or some other form
of mitigation even if we don't move forward with adding the new
interface.

-- 
Mike Kravetz

> 
> Since start_isolate_page_range() is called at the beginning of
> such operations, have it return -EBUSY if MIGRATE_ISOLATE is
> already set.
> 
> This will allow start_isolate_page_range to serve as a
> synchronization mechanism and will allow for more general use
> of callers making use of these interfaces.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/page_alloc.c     |  8 ++++----
>  mm/page_isolation.c | 10 +++++++++-
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 76c9688b6a0a..064458f317bf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7605,11 +7605,11 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   * @gfp_mask:	GFP mask to use during compaction
>   *
>   * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> - * aligned, however it's the caller's responsibility to guarantee that
> - * we are the only thread that changes migrate type of pageblocks the
> - * pages fall in.
> + * aligned.  The PFN range must belong to a single zone.
>   *
> - * The PFN range must belong to a single zone.
> + * The first thing this routine does is attempt to MIGRATE_ISOLATE all
> + * pageblocks in the range.  Once isolated, the pageblocks should not
> + * be modified by others.
>   *
>   * Returns zero on success or negative error code.  On success all
>   * pages which PFN is in [start, end) are allocated for the caller and
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 165ed8117bd1..e815879d525f 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -28,6 +28,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype,
>  
>  	spin_lock_irqsave(&zone->lock, flags);
>  
> +	/*
> +	 * We assume we are the only ones trying to isolate this block.
> +	 * If MIGRATE_ISOLATE already set, return -EBUSY
> +	 */
> +	if (is_migrate_isolate_page(page))
> +		goto out;
> +
>  	pfn = page_to_pfn(page);
>  	arg.start_pfn = pfn;
>  	arg.nr_pages = pageblock_nr_pages;
> @@ -166,7 +173,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * future will not be allocated again.
>   *
>   * start_pfn/end_pfn must be aligned to pageblock_order.
> - * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
> + * Returns 0 on success and -EBUSY if any part of range cannot be isolated
> + * or any part of the range is already set to MIGRATE_ISOLATE.
>   */
>  int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  			     unsigned migratetype, bool skip_hwpoisoned_pages)
> 
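
To illustrate the serialization this provides, a hedged sketch of how a
caller might treat the new -EBUSY return.  The helper name and retry
policy are hypothetical; only the start_isolate_page_range() signature
comes from the patch quoted above:

	#include <linux/delay.h>
	#include <linux/mmzone.h>
	#include <linux/page-isolation.h>

	/*
	 * start_pfn/end_pfn must be pageblock aligned, as before.  With
	 * this patch, losing a race to another isolating thread shows up
	 * as -EBUSY instead of two threads silently changing the migrate
	 * type of the same pageblocks, so the caller can back off and retry.
	 */
	static int isolate_range_with_backoff(unsigned long start_pfn,
					      unsigned long end_pfn)
	{
		int tries;

		for (tries = 0; tries < 5; tries++) {
			int ret = start_isolate_page_range(start_pfn, end_pfn,
							   MIGRATE_MOVABLE,
							   false);
			if (ret != -EBUSY)
				return ret;	/* 0 == range is now ours */
			msleep(10);		/* another thread holds part of it */
		}
		return -EBUSY;
	}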

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] Interface for higher order contiguous allocations
  2018-02-15 20:22   ` Reinette Chatre
  (?)
@ 2018-04-12 20:40   ` Reinette Chatre
  2018-04-12 20:58     ` Mike Kravetz
  -1 siblings, 1 reply; 19+ messages in thread
From: Reinette Chatre @ 2018-04-12 20:40 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel
  Cc: Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen

Hi Mike,

On 2/15/2018 12:22 PM, Reinette Chatre wrote:
> On 2/12/2018 2:20 PM, Mike Kravetz wrote:
>> These patches came out of the "[RFC] mmap(MAP_CONTIG)" discussions at:
>> http://lkml.kernel.org/r/21f1ec96-2822-1189-1c95-79a2bb491571@oracle.com
>>
>> One suggestion in that thread was to create a friendlier interface that
>> could be used by drivers and others outside core mm code to allocate a
>> contiguous set of pages.  The alloc_contig_range() interface is used for
>> this purpose today by CMA and gigantic page allocation.  However, this is
>> not a general purpose interface.  So, wrap alloc_contig_range() in the
>> more general interface:
>>
>> struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp, int nid,
>> 					nodemask_t *nodemask)
>>
>> No underlying changes are made to increase the likelihood that a contiguous
>> set of pages can be found and allocated.  Therefore, any user of this
>> interface must deal with failure.  The hope is that this interface will be
>> able to satisfy some use cases today.
> 
> As discussed in another thread a new feature, Cache Pseudo-Locking,
> requires large contiguous regions. Until now I just exposed
> alloc_gigantic_page() to handle these allocations in my testing. I now
> moved to using find_alloc_contig_pages() as introduced here and all my
> tests passed. I do hope that an API supporting large contiguous regions
> become available.
> 
> Thank you very much for creating this.
> 
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>

Do you still intend to submit these changes for inclusion?

I would really like to use this work but unfortunately the original
patches submitted here do not apply anymore. I am encountering conflicts
with, for example:

commit d9cc948f6fa1c3384037f500e0acd35f03850d15
Author: Michal Hocko <mhocko@suse.com>
Date:   Wed Jan 31 16:20:44 2018 -0800

    mm, hugetlb: integrate giga hugetlb more naturally to the allocation
path

Thank you very much

Reinette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] Interface for higher order contiguous allocations
  2018-04-12 20:40   ` Reinette Chatre
@ 2018-04-12 20:58     ` Mike Kravetz
  2018-04-16 13:14       ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Kravetz @ 2018-04-12 20:58 UTC (permalink / raw)
  To: Reinette Chatre, linux-mm, linux-kernel
  Cc: Michal Hocko, Christopher Lameter, Guy Shattah,
	Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka,
	David Nellans, Laura Abbott, Pavel Machek, Dave Hansen

On 04/12/2018 01:40 PM, Reinette Chatre wrote:
> Hi Mike,
> 
> On 2/15/2018 12:22 PM, Reinette Chatre wrote:
>> On 2/12/2018 2:20 PM, Mike Kravetz wrote:
>>> These patches came out of the "[RFC] mmap(MAP_CONTIG)" discussions at:
>>> http://lkml.kernel.org/r/21f1ec96-2822-1189-1c95-79a2bb491571@oracle.com
>>>
>>> One suggestion in that thread was to create a friendlier interface that
>>> could be used by drivers and others outside core mm code to allocate a
>>> contiguous set of pages.  The alloc_contig_range() interface is used for
>>> this purpose today by CMA and gigantic page allocation.  However, this is
>>> not a general purpose interface.  So, wrap alloc_contig_range() in the
>>> more general interface:
>>>
>>> struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp, int nid,
>>> 					nodemask_t *nodemask)
>>>
>>> No underlying changes are made to increase the likelihood that a contiguous
>>> set of pages can be found and allocated.  Therefore, any user of this
>>> interface must deal with failure.  The hope is that this interface will be
>>> able to satisfy some use cases today.
>>
>> As discussed in another thread a new feature, Cache Pseudo-Locking,
>> requires large contiguous regions. Until now I just exposed
>> alloc_gigantic_page() to handle these allocations in my testing. I now
>> moved to using find_alloc_contig_pages() as introduced here and all my
>> tests passed. I do hope that an API supporting large contiguous regions
>> become available.
>>
>> Thank you very much for creating this.
>>
>> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> 
> Do you still intend on submitting these changes for inclusion?
> 
> I would really like to use this work but unfortunately the original
> patches submitted here do not apply anymore. I am encountering conflicts
> with, for example:
> 
> commit d9cc948f6fa1c3384037f500e0acd35f03850d15
> Author: Michal Hocko <mhocko@suse.com>
> Date:   Wed Jan 31 16:20:44 2018 -0800
> 
>     mm, hugetlb: integrate giga hugetlb more naturally to the allocation
> path
> 
> Thank you very much

Thanks for the reminder Reinette.

You were the only one to comment on the original proposal.  In addition,
my original use case may have gone away.  So, this effort went to the
bottom of my priority list.

I am happy to rebase the patches, but would really like to get additional
comments.  Allocation of hugetlbfs gigantic pages is the only existing
user.  Perhaps this is a natural progression of Michal's patch above
as it moves all that special pfn range scanning out of hugetlb code.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] Interface for higher order contiguous allocations
  2018-04-12 20:58     ` Mike Kravetz
@ 2018-04-16 13:14       ` Michal Hocko
  0 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2018-04-16 13:14 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Reinette Chatre, linux-mm, linux-kernel, Christopher Lameter,
	Guy Shattah, Anshuman Khandual, Michal Nazarewicz,
	Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek,
	Dave Hansen

On Thu 12-04-18 13:58:47, Mike Kravetz wrote:
> On 04/12/2018 01:40 PM, Reinette Chatre wrote:
> > Hi Mike,
> > 
> > On 2/15/2018 12:22 PM, Reinette Chatre wrote:
> >> On 2/12/2018 2:20 PM, Mike Kravetz wrote:
> >>> These patches came out of the "[RFC] mmap(MAP_CONTIG)" discussions at:
> >>> http://lkml.kernel.org/r/21f1ec96-2822-1189-1c95-79a2bb491571@oracle.com
> >>>
> >>> One suggestion in that thread was to create a friendlier interface that
> >>> could be used by drivers and others outside core mm code to allocate a
> >>> contiguous set of pages.  The alloc_contig_range() interface is used for
> >>> this purpose today by CMA and gigantic page allocation.  However, this is
> >>> not a general purpose interface.  So, wrap alloc_contig_range() in the
> >>> more general interface:
> >>>
> >>> struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp, int nid,
> >>> 					nodemask_t *nodemask)
> >>>
> >>> No underlying changes are made to increase the likelihood that a contiguous
> >>> set of pages can be found and allocated.  Therefore, any user of this
> >>> interface must deal with failure.  The hope is that this interface will be
> >>> able to satisfy some use cases today.
> >>
> >> As discussed in another thread a new feature, Cache Pseudo-Locking,
> >> requires large contiguous regions. Until now I just exposed
> >> alloc_gigantic_page() to handle these allocations in my testing. I now
> >> moved to using find_alloc_contig_pages() as introduced here and all my
> >> tests passed. I do hope that an API supporting large contiguous regions
> >> become available.
> >>
> >> Thank you very much for creating this.
> >>
> >> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> > 
> > Do you still intend on submitting these changes for inclusion?
> > 
> > I would really like to use this work but unfortunately the original
> > patches submitted here do not apply anymore. I am encountering conflicts
> > with, for example:
> > 
> > commit d9cc948f6fa1c3384037f500e0acd35f03850d15
> > Author: Michal Hocko <mhocko@suse.com>
> > Date:   Wed Jan 31 16:20:44 2018 -0800
> > 
> >     mm, hugetlb: integrate giga hugetlb more naturally to the allocation
> > path
> > 
> > Thank you very much
> 
> Thanks for the reminder Reinette.
> 
> You were the only one to comment on the original proposal.  In addition,
> my original use case may have gone away.  So, this effort went to the
> bottom of my priority list.
> 
> I am happy rebase the patches, but would really like to get additional
> comments.  Allocation of hugetlbfs gigantic pages is the only existing
> user.  Perhaps this is a natural progression of Michal's patch above
> as it moves all that special pfn range scanning out of hugetlb code.

Yes, that was and still is the plan: turn the hackish contig allocator
into something more usable, so I guess it would be in line with what
Reinette is after.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-04-16 13:14 UTC | newest]

Thread overview: 19+ messages
2018-02-12 22:20 [RFC PATCH 0/3] Interface for higher order contiguous allocations Mike Kravetz
2018-02-12 22:20 ` Mike Kravetz
2018-02-12 22:20 ` [RFC PATCH 1/3] mm: make start_isolate_page_range() fail if already isolated Mike Kravetz
2018-02-12 22:20   ` Mike Kravetz
2018-02-13  9:46   ` Mike Rapoport
2018-02-13  9:46     ` Mike Rapoport
2018-02-16  0:40   ` Mike Kravetz
2018-02-16  0:40     ` Mike Kravetz
2018-02-12 22:20 ` [RFC PATCH 2/3] mm: add find_alloc_contig_pages() interface Mike Kravetz
2018-02-12 22:20   ` Mike Kravetz
2018-02-13  9:53   ` Mike Rapoport
2018-02-13  9:53     ` Mike Rapoport
2018-02-12 22:20 ` [RFC PATCH 3/3] mm/hugetlb: use find_alloc_contig_pages() to allocate gigantic pages Mike Kravetz
2018-02-12 22:20   ` Mike Kravetz
2018-02-15 20:22 ` [RFC PATCH 0/3] Interface for higher order contiguous allocations Reinette Chatre
2018-02-15 20:22   ` Reinette Chatre
2018-04-12 20:40   ` Reinette Chatre
2018-04-12 20:58     ` Mike Kravetz
2018-04-16 13:14       ` Michal Hocko
