* [PATCH 1/2] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn
@ 2015-12-21  6:13 ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2015-12-21  6:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, linux-kernel, linux-mm, Joonsoo Kim

free_pfn and compact_cached_free_pfn are the pointers that remember the
restart position of the freepage scanner. When they are reset or invalid,
we set them to zone_end_pfn because the freepage scanner works in the
reverse direction. But, because the zone range is defined as
[zone_start_pfn, zone_end_pfn), zone_end_pfn is invalid to access.
Therefore, we should not store it in free_pfn and compact_cached_free_pfn;
instead, we need to store zone_end_pfn - 1 in them. There is one more
thing to consider: the freepage scanner scans in reverse, one pageblock at
a time. If free_pfn and compact_cached_free_pfn are set to the middle of a
pageblock, the scanner regards that situation as if the front part of the
pageblock had already been scanned, so we lose the opportunity to scan
there. To fix this up, this patch does a round_down() to guarantee that
the reset position is pageblock aligned.

Note that, thanks to the current pageblock_pfn_to_page() implementation,
no actual access to zone_end_pfn happens today. But the following patch
will change pageblock_pfn_to_page(), so this patch is needed from now on.
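
As a minimal illustration of the new reset value, here is a standalone
sketch (the zone boundaries and the 512-page pageblock size below are
assumptions chosen only for this example, and round_down() is redefined
locally):

#include <stdio.h>

/* Stand-ins for the kernel definitions, for illustration only. */
#define pageblock_nr_pages	512UL			/* e.g. 2MB blocks with 4KB pages */
#define round_down(x, y)	((x) & ~((y) - 1))	/* y must be a power of two */

int main(void)
{
	unsigned long zone_start_pfn = 0x100000;	/* assumed zone range */
	unsigned long zone_end_pfn = 0x1f5e00;		/* one past the last valid pfn */

	/* Old reset value: outside the zone and therefore invalid to access. */
	unsigned long old_reset = zone_end_pfn;

	/* New reset value: last valid pfn, rounded down to a pageblock boundary. */
	unsigned long new_reset = round_down(zone_end_pfn - 1, pageblock_nr_pages);

	printf("old reset: %#lx (not in [%#lx, %#lx))\n",
	       old_reset, zone_start_pfn, zone_end_pfn);
	printf("new reset: %#lx (valid and pageblock aligned)\n", new_reset);
	return 0;
}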

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/compaction.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 585de54..56fa321 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -200,7 +200,8 @@ static void reset_cached_positions(struct zone *zone)
 {
 	zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
 	zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
-	zone->compact_cached_free_pfn = zone_end_pfn(zone);
+	zone->compact_cached_free_pfn =
+			round_down(zone_end_pfn(zone) - 1, pageblock_nr_pages);
 }
 
 /*
@@ -1371,11 +1372,11 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 	 */
 	cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
 	cc->free_pfn = zone->compact_cached_free_pfn;
-	if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
-		cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
+	if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
+		cc->free_pfn = round_down(end_pfn - 1, pageblock_nr_pages);
 		zone->compact_cached_free_pfn = cc->free_pfn;
 	}
-	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
+	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
 		cc->migrate_pfn = start_pfn;
 		zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
 		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
-- 
1.9.1


* [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-21  6:13 ` Joonsoo Kim
@ 2015-12-21  6:13   ` Joonsoo Kim
  -1 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2015-12-21  6:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, linux-kernel, linux-mm, Joonsoo Kim

There is a report of a performance drop during hugepage allocation in
which half of the cpu time is spent in pageblock_pfn_to_page() during
compaction [1]. In that workload, compaction is triggered to make
hugepages, but most pageblocks are unavailable for compaction due to the
pageblock type and the skip bit, so compaction usually fails. The most
costly operation in this case is finding a valid pageblock while scanning
the whole zone range. To check whether a pageblock is valid to compact, a
valid pfn within the pageblock is required, and we can obtain it by
calling pageblock_pfn_to_page(). This function checks whether the
pageblock lies in a single zone and returns a valid pfn if possible. The
problem is that we need to perform this check every time before scanning a
pageblock, even when we re-visit it, and this turns out to be very
expensive in this workload.

Although we have no way to skip this pageblock check on a system where
holes exist at arbitrary positions, we can cache the zone's contiguity and
just do pfn_to_page() on a system where no hole exists. This optimization
considerably speeds up the above workload.
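
As a rough, self-contained model of the idea (a simplified userspace
sketch, not the kernel code; the real change is the
pageblock_pfn_to_page() wrapper in the diff below): validate the zone
once, cache the answer, and let the hot path skip the per-pageblock check
whenever the cached answer says the zone is contiguous.

#include <stdbool.h>
#include <stdio.h>

#define NR_BLOCKS 8

/* Toy zone: per-pageblock validity; all valid here (no holes assumed). */
static bool block_valid[NR_BLOCKS] = {
	true, true, true, true, true, true, true, true
};

static bool zone_contiguous;	/* cached result; "hotplug" would clear it */

/* Done once (e.g. at init): expensive scan of every pageblock. */
static void set_zone_contiguous_model(void)
{
	for (int i = 0; i < NR_BLOCKS; i++)
		if (!block_valid[i])
			return;		/* leave the cached flag false */
	zone_contiguous = true;
}

/* Hot path: skip the per-call check when the zone is known contiguous. */
static bool pageblock_usable(int block)
{
	if (zone_contiguous)
		return true;
	return block_valid[block];
}

int main(void)
{
	set_zone_contiguous_model();
	printf("block 3 usable: %d\n", pageblock_usable(3));
	return 0;
}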

Before vs After
Max: 1096 MB/s vs 1325 MB/s
Min:  635 MB/s vs 1015 MB/s
Avg:  899 MB/s vs 1194 MB/s

Avg is improved by roughly 30% [2].

[1]: http://www.spinics.net/lists/linux-mm/msg97378.html
[2]: https://lkml.org/lkml/2015/12/9/23

v2
o checking zone continuity after initialization
o handle memory-hotplug case

Reported and Tested-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/gfp.h            |  6 ---
 include/linux/memory_hotplug.h |  3 ++
 include/linux/mmzone.h         |  2 +
 mm/compaction.c                | 43 ---------------------
 mm/internal.h                  | 12 ++++++
 mm/memory_hotplug.c            | 10 +++++
 mm/page_alloc.c                | 85 +++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 111 insertions(+), 50 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 91f74e7..6eb3eca 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -515,13 +515,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
-#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 void page_alloc_init_late(void);
-#else
-static inline void page_alloc_init_late(void)
-{
-}
-#endif
 
 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2ea574f..18c2676 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -196,6 +196,9 @@ void put_online_mems(void);
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
+extern void set_zone_contiguous(struct zone *zone);
+extern void clear_zone_contiguous(struct zone *zone);
+
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 68cc063..eb5d88e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -523,6 +523,8 @@ struct zone {
 	bool			compact_blockskip_flush;
 #endif
 
+	bool			contiguous;
+
 	ZONE_PADDING(_pad3_)
 	/* Zone statistics */
 	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
diff --git a/mm/compaction.c b/mm/compaction.c
index 56fa321..9c89d46 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -71,49 +71,6 @@ static inline bool migrate_async_suitable(int migratetype)
 	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
 }
 
-/*
- * Check that the whole (or subset of) a pageblock given by the interval of
- * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
- * with the migration of free compaction scanner. The scanners then need to
- * use only pfn_valid_within() check for arches that allow holes within
- * pageblocks.
- *
- * Return struct page pointer of start_pfn, or NULL if checks were not passed.
- *
- * It's possible on some configurations to have a setup like node0 node1 node0
- * i.e. it's possible that all pages within a zones range of pages do not
- * belong to a single zone. We assume that a border between node0 and node1
- * can occur within a single pageblock, but not a node0 node1 node0
- * interleaving within a single pageblock. It is therefore sufficient to check
- * the first and last page of a pageblock and avoid checking each individual
- * page in a pageblock.
- */
-static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
-				unsigned long end_pfn, struct zone *zone)
-{
-	struct page *start_page;
-	struct page *end_page;
-
-	/* end_pfn is one past the range we are checking */
-	end_pfn--;
-
-	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
-		return NULL;
-
-	start_page = pfn_to_page(start_pfn);
-
-	if (page_zone(start_page) != zone)
-		return NULL;
-
-	end_page = pfn_to_page(end_pfn);
-
-	/* This gives a shorter code than deriving page_zone(end_page) */
-	if (page_zone_id(start_page) != page_zone_id(end_page))
-		return NULL;
-
-	return start_page;
-}
-
 #ifdef CONFIG_COMPACTION
 
 /* Do not skip compaction more than 64 times */
diff --git a/mm/internal.h b/mm/internal.h
index d01a41c..bc9d337 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -137,6 +137,18 @@ __find_buddy_index(unsigned long page_idx, unsigned int order)
 	return page_idx ^ (1 << order);
 }
 
+extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
+				unsigned long end_pfn, struct zone *zone);
+
+static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
+				unsigned long end_pfn, struct zone *zone)
+{
+	if (zone->contiguous)
+		return pfn_to_page(start_pfn);
+
+	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
+}
+
 extern int __isolate_free_page(struct page *page, unsigned int order);
 extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
 					unsigned int order);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d8016a2..f7b6e6b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -505,6 +505,9 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
 	unsigned long i;
 	int err = 0;
 	int start_sec, end_sec;
+
+	clear_zone_contiguous(zone);
+
 	/* during initialize mem_map, align hot-added range to section */
 	start_sec = pfn_to_section_nr(phys_start_pfn);
 	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
@@ -523,6 +526,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
 	}
 	vmemmap_populate_print_last();
 
+	set_zone_contiguous(zone);
+
 	return err;
 }
 EXPORT_SYMBOL_GPL(__add_pages);
@@ -770,6 +775,8 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	resource_size_t start, size;
 	int ret = 0;
 
+	clear_zone_contiguous(zone);
+
 	/*
 	 * We can only remove entire sections
 	 */
@@ -796,6 +803,9 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 		if (ret)
 			break;
 	}
+
+	set_zone_contiguous(zone);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(__remove_pages);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bac8842..4f5ad2b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1271,9 +1271,13 @@ free_range:
 	pgdat_init_report_one_done();
 	return 0;
 }
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 void __init page_alloc_init_late(void)
 {
+	struct zone *zone;
+
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 	int nid;
 
 	/* There will be num_node_state(N_MEMORY) threads */
@@ -1287,8 +1291,87 @@ void __init page_alloc_init_late(void)
 
 	/* Reinit limits that are based on free pages after the kernel is up */
 	files_maxfiles_init();
+#endif
+
+	for_each_populated_zone(zone)
+		set_zone_contiguous(zone);
+}
+
+/*
+ * Check that the whole (or subset of) a pageblock given by the interval of
+ * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
+ * with the migration of free compaction scanner. The scanners then need to
+ * use only pfn_valid_within() check for arches that allow holes within
+ * pageblocks.
+ *
+ * Return struct page pointer of start_pfn, or NULL if checks were not passed.
+ *
+ * It's possible on some configurations to have a setup like node0 node1 node0
+ * i.e. it's possible that all pages within a zones range of pages do not
+ * belong to a single zone. We assume that a border between node0 and node1
+ * can occur within a single pageblock, but not a node0 node1 node0
+ * interleaving within a single pageblock. It is therefore sufficient to check
+ * the first and last page of a pageblock and avoid checking each individual
+ * page in a pageblock.
+ */
+struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
+				unsigned long end_pfn, struct zone *zone)
+{
+	struct page *start_page;
+	struct page *end_page;
+
+	/* end_pfn is one past the range we are checking */
+	end_pfn--;
+
+	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
+		return NULL;
+
+	start_page = pfn_to_page(start_pfn);
+
+	if (page_zone(start_page) != zone)
+		return NULL;
+
+	end_page = pfn_to_page(end_pfn);
+
+	/* This gives a shorter code than deriving page_zone(end_page) */
+	if (page_zone_id(start_page) != page_zone_id(end_page))
+		return NULL;
+
+	return start_page;
+}
+
+void set_zone_contiguous(struct zone *zone)
+{
+	unsigned long block_start_pfn = zone->zone_start_pfn;
+	unsigned long block_end_pfn;
+	unsigned long pfn;
+
+	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
+	for (; block_start_pfn < zone_end_pfn(zone);
+		block_start_pfn = block_end_pfn,
+		block_end_pfn += pageblock_nr_pages) {
+
+		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
+
+		if (!__pageblock_pfn_to_page(block_start_pfn,
+					block_end_pfn, zone))
+			return;
+
+		/* Check validity of pfn within pageblock */
+		for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
+			if (!pfn_valid_within(pfn))
+				return;
+		}
+	}
+
+	/* We confirm that there is no hole */
+	zone->contiguous = true;
+}
+
+void clear_zone_contiguous(struct zone *zone)
+{
+	zone->contiguous = false;
 }
-#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
-- 
1.9.1


* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-21  6:13   ` Joonsoo Kim
@ 2015-12-21 10:46     ` Vlastimil Babka
  -1 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2015-12-21 10:46 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Aaron Lu, Mel Gorman, Rik van Riel, David Rientjes, linux-kernel,
	linux-mm, Joonsoo Kim, Gu Zheng, Tang Chen, Naoya Horiguchi,
	Toshi Kani

On 12/21/2015 07:13 AM, Joonsoo Kim wrote:
> There is a report of a performance drop during hugepage allocation in
> which half of the cpu time is spent in pageblock_pfn_to_page() during
> compaction [1]. In that workload, compaction is triggered to make
> hugepages, but most pageblocks are unavailable for compaction due to the
> pageblock type and the skip bit, so compaction usually fails. The most
> costly operation in this case is finding a valid pageblock while scanning
> the whole zone range. To check whether a pageblock is valid to compact, a
> valid pfn within the pageblock is required, and we can obtain it by
> calling pageblock_pfn_to_page(). This function checks whether the
> pageblock lies in a single zone and returns a valid pfn if possible. The
> problem is that we need to perform this check every time before scanning
> a pageblock, even when we re-visit it, and this turns out to be very
> expensive in this workload.
>
> Although we have no way to skip this pageblock check on a system where
> holes exist at arbitrary positions, we can cache the zone's contiguity
> and just do pfn_to_page() on a system where no hole exists. This
> optimization considerably speeds up the above workload.
>
> Before vs After
> Max: 1096 MB/s vs 1325 MB/s
> Min:  635 MB/s vs 1015 MB/s
> Avg:  899 MB/s vs 1194 MB/s
>
> Avg is improved by roughly 30% [2].
>
> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
> [2]: https://lkml.org/lkml/2015/12/9/23
>
> v2
> o checking zone continuity after initialization
> o handle memory-hotplug case
>
> Reported and Tested-by: Aaron Lu <aaron.lu@intel.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
[...]
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -505,6 +505,9 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>   	unsigned long i;
>   	int err = 0;
>   	int start_sec, end_sec;
> +
> +	clear_zone_contiguous(zone);
> +
>   	/* during initialize mem_map, align hot-added range to section */
>   	start_sec = pfn_to_section_nr(phys_start_pfn);
>   	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> @@ -523,6 +526,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>   	}
>   	vmemmap_populate_print_last();
>
> +	set_zone_contiguous(zone);
> +
>   	return err;
>   }
>   EXPORT_SYMBOL_GPL(__add_pages);
> @@ -770,6 +775,8 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>   	resource_size_t start, size;
>   	int ret = 0;
>
> +	clear_zone_contiguous(zone);
> +
>   	/*
>   	 * We can only remove entire sections
>   	 */
> @@ -796,6 +803,9 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>   		if (ret)
>   			break;
>   	}
> +
> +	set_zone_contiguous(zone);
> +
>   	return ret;

Hm I wonder how many __add_ or __remove_pages calls there might be per
major hotplug event (e.g. whole node). IIRC there may be many subranges
that are onlined/offlined separately? Doing a full zone rescan on each
sub-operation could be quite costly, no? You should have added the
mm/memory_hotplug.c people to CC to comment, as you did in the [RFC]
theoretical race... mail. Doing that now.
If the hotplug people confirm it might be an issue, I guess one solution 
is to call set_zone_contiguous() lazily on-demand as you did in the v1 
(but not relying on cached pfn initialization to determine whether 
contiguous was already evaluated). Add another variable like 
zone->contiguous_evaluated and make hotplug code just set it to false.
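
A rough sketch of what that lazy evaluation could look like (the type,
field and helper names below are invented purely for illustration and are
not code from any posted patch):

#include <stdbool.h>

struct zone_cache_sketch {
	bool contiguous;
	bool contiguous_evaluated;	/* hotplug would just clear this */
};

/* Stub standing in for the expensive full-zone rescan; returns true
 * when no hole was found. */
static bool zone_rescan_contiguous(struct zone_cache_sketch *zone)
{
	(void)zone;
	return true;
}

static bool zone_is_contiguous(struct zone_cache_sketch *zone)
{
	if (!zone->contiguous_evaluated) {
		/* Pay for the full scan only when someone actually asks. */
		zone->contiguous = zone_rescan_contiguous(zone);
		zone->contiguous_evaluated = true;
	}
	return zone->contiguous;
}

int main(void)
{
	struct zone_cache_sketch zone = { 0 };
	return zone_is_contiguous(&zone) ? 0 : 1;
}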

>   }
>   EXPORT_SYMBOL_GPL(__remove_pages);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bac8842..4f5ad2b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1271,9 +1271,13 @@ free_range:
>   	pgdat_init_report_one_done();
>   	return 0;
>   }
> +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>
>   void __init page_alloc_init_late(void)
>   {
> +	struct zone *zone;
> +
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>   	int nid;
>
>   	/* There will be num_node_state(N_MEMORY) threads */
> @@ -1287,8 +1291,87 @@ void __init page_alloc_init_late(void)
>
>   	/* Reinit limits that are based on free pages after the kernel is up */
>   	files_maxfiles_init();
> +#endif
> +
> +	for_each_populated_zone(zone)
> +		set_zone_contiguous(zone);
> +}
> +
> +/*
> + * Check that the whole (or subset of) a pageblock given by the interval of
> + * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
> + * with the migration of free compaction scanner. The scanners then need to
> + * use only pfn_valid_within() check for arches that allow holes within
> + * pageblocks.
> + *
> + * Return struct page pointer of start_pfn, or NULL if checks were not passed.
> + *
> + * It's possible on some configurations to have a setup like node0 node1 node0
> + * i.e. it's possible that all pages within a zones range of pages do not
> + * belong to a single zone. We assume that a border between node0 and node1
> + * can occur within a single pageblock, but not a node0 node1 node0
> + * interleaving within a single pageblock. It is therefore sufficient to check
> + * the first and last page of a pageblock and avoid checking each individual
> + * page in a pageblock.
> + */
> +struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> +				unsigned long end_pfn, struct zone *zone)
> +{
> +	struct page *start_page;
> +	struct page *end_page;
> +
> +	/* end_pfn is one past the range we are checking */
> +	end_pfn--;
> +
> +	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
> +		return NULL;
> +
> +	start_page = pfn_to_page(start_pfn);
> +
> +	if (page_zone(start_page) != zone)
> +		return NULL;
> +
> +	end_page = pfn_to_page(end_pfn);
> +
> +	/* This gives a shorter code than deriving page_zone(end_page) */
> +	if (page_zone_id(start_page) != page_zone_id(end_page))
> +		return NULL;
> +
> +	return start_page;
> +}
> +
> +void set_zone_contiguous(struct zone *zone)
> +{
> +	unsigned long block_start_pfn = zone->zone_start_pfn;
> +	unsigned long block_end_pfn;
> +	unsigned long pfn;
> +
> +	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
> +	for (; block_start_pfn < zone_end_pfn(zone);
> +		block_start_pfn = block_end_pfn,
> +		block_end_pfn += pageblock_nr_pages) {
> +
> +		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> +
> +		if (!__pageblock_pfn_to_page(block_start_pfn,
> +					block_end_pfn, zone))
> +			return;
> +
> +		/* Check validity of pfn within pageblock */
> +		for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
> +			if (!pfn_valid_within(pfn))
> +				return;
> +		}

Hm this is suboptimal and misleading. The result of pfn_valid_within() 
doesn't affect whether we need to use __pageblock_pfn_to_page() or not, 
so zone->contiguous shouldn't depend on it.

On the other hand, if we knew that pfn_valid_within() is true 
everywhere, we wouldn't need to check it inside isolate_*pages_block().
So you could add another patch that adds another bool to struct zone and 
test for that (with #ifdef CONFIG_HOLES_IN_ZONE at appropriate places).
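
A rough sketch of that follow-up idea (the field and helper names are
invented for illustration and are not part of the posted series; only the
use of pfn_valid_within() and CONFIG_HOLES_IN_ZONE comes from the
discussion above):

#include <stdbool.h>

#ifdef CONFIG_HOLES_IN_ZONE
bool pfn_valid_within(unsigned long pfn);	/* existing kernel helper */
#endif

struct zone_sketch {
	bool contiguous;	/* no pageblock fails __pageblock_pfn_to_page() */
#ifdef CONFIG_HOLES_IN_ZONE
	bool no_pfn_holes;	/* every pfn passes pfn_valid_within() */
#endif
};

static inline bool zone_pfn_valid_within(struct zone_sketch *zone,
					 unsigned long pfn)
{
#ifdef CONFIG_HOLES_IN_ZONE
	if (!zone->no_pfn_holes)
		return pfn_valid_within(pfn);	/* keep the per-pfn check */
#else
	(void)zone;
	(void)pfn;
#endif
	return true;	/* no holes possible here: the check can be skipped */
}

int main(void)
{
	struct zone_sketch zone = { .contiguous = true };
	return zone_pfn_valid_within(&zone, 0) ? 0 : 1;
}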

Thanks,
Vlastimil

> +	}
> +
> +	/* We confirm that there is no hole */
> +	zone->contiguous = true;
> +}
> +
> +void clear_zone_contiguous(struct zone *zone)
> +{
> +	zone->contiguous = false;
>   }
> -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>
>   #ifdef CONFIG_CMA
>   /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>


* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-21 10:46     ` Vlastimil Babka
@ 2015-12-21 12:18       ` Joonsoo Kim
  -1 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2015-12-21 12:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, LKML, Linux Memory Management List, Joonsoo Kim,
	Gu Zheng, Tang Chen, Naoya Horiguchi, Toshi Kani

2015-12-21 19:46 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 12/21/2015 07:13 AM, Joonsoo Kim wrote:
>>
>> There is a report of a performance drop during hugepage allocation in
>> which half of the cpu time is spent in pageblock_pfn_to_page() during
>> compaction [1]. In that workload, compaction is triggered to make
>> hugepages, but most pageblocks are unavailable for compaction due to the
>> pageblock type and the skip bit, so compaction usually fails. The most
>> costly operation in this case is finding a valid pageblock while
>> scanning the whole zone range. To check whether a pageblock is valid to
>> compact, a valid pfn within the pageblock is required, and we can obtain
>> it by calling pageblock_pfn_to_page(). This function checks whether the
>> pageblock lies in a single zone and returns a valid pfn if possible. The
>> problem is that we need to perform this check every time before scanning
>> a pageblock, even when we re-visit it, and this turns out to be very
>> expensive in this workload.
>>
>> Although we have no way to skip this pageblock check on a system where
>> holes exist at arbitrary positions, we can cache the zone's contiguity
>> and just do pfn_to_page() on a system where no hole exists. This
>> optimization considerably speeds up the above workload.
>>
>> Before vs After
>> Max: 1096 MB/s vs 1325 MB/s
>> Min:  635 MB/s vs 1015 MB/s
>> Avg:  899 MB/s vs 1194 MB/s
>>
>> Avg is improved by roughly 30% [2].
>>
>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
>> [2]: https://lkml.org/lkml/2015/12/9/23
>>
>> v2
>> o checking zone continuity after initialization
>> o handle memory-hotplug case
>>
>> Reported and Tested-by: Aaron Lu <aaron.lu@intel.com>
>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> [...]
>
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -505,6 +505,9 @@ int __ref __add_pages(int nid, struct zone *zone,
>> unsigned long phys_start_pfn,
>>         unsigned long i;
>>         int err = 0;
>>         int start_sec, end_sec;
>> +
>> +       clear_zone_contiguous(zone);
>> +
>>         /* during initialize mem_map, align hot-added range to section */
>>         start_sec = pfn_to_section_nr(phys_start_pfn);
>>         end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
>> @@ -523,6 +526,8 @@ int __ref __add_pages(int nid, struct zone *zone,
>> unsigned long phys_start_pfn,
>>         }
>>         vmemmap_populate_print_last();
>>
>> +       set_zone_contiguous(zone);
>> +
>>         return err;
>>   }
>>   EXPORT_SYMBOL_GPL(__add_pages);
>> @@ -770,6 +775,8 @@ int __remove_pages(struct zone *zone, unsigned long
>> phys_start_pfn,
>>         resource_size_t start, size;
>>         int ret = 0;
>>
>> +       clear_zone_contiguous(zone);
>> +
>>         /*
>>          * We can only remove entire sections
>>          */
>> @@ -796,6 +803,9 @@ int __remove_pages(struct zone *zone, unsigned long
>> phys_start_pfn,
>>                 if (ret)
>>                         break;
>>         }
>> +
>> +       set_zone_contiguous(zone);
>> +
>>         return ret;
>
>
> Hm I wonder how many __add_ or __remove_pages calls there might be per a
> major hotplug event (e.g. whole node). IIRC there may be many subranges that
> are onlined/offlined separately?

__add_pages() or __remove_pages() is called whenever a memory device is
added or removed, that is, one call per device add/remove. If the device
is 1GB, it is called for a 1GB range, and if the device is 2GB, it is
called for a 2GB range. I think they are a reasonable place to check
continuity.

> Doing a full zone rescan on each
> sub-operation could be quite costly, no?

Doing a full zone rescan makes the code much simpler. It can be optimized
further, but at this point there is no reason to take on that complexity.

> You should have added the
> mm/memory_hotplug.c people to CC to comment, as you did in the [RFC]
> theoretical race... mail. Doing that now.

Okay. Thanks.

> If the hotplug people confirm it might be an issue, I guess one solution is
> to call set_zone_contiguous() lazily on-demand as you did in the v1 (but not
> relying on cached pfn initialization to determine whether contiguous was
> already evaluated). Add another variable like zone->contiguous_evaluated and
> make hotplug code just set it to false.

It could be, but I guess this is not too expensive for memory hotplug. It
can tolerate 120 sec for offlining memory, and rescanning the full zone to
check continuity is very cheap compared to that.

Anyway, it's better to have these things confirmed by the memory hotplug people.

Thanks.

>
>>   }
>>   EXPORT_SYMBOL_GPL(__remove_pages);
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index bac8842..4f5ad2b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1271,9 +1271,13 @@ free_range:
>>         pgdat_init_report_one_done();
>>         return 0;
>>   }
>> +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>>
>>   void __init page_alloc_init_late(void)
>>   {
>> +       struct zone *zone;
>> +
>> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>>         int nid;
>>
>>         /* There will be num_node_state(N_MEMORY) threads */
>> @@ -1287,8 +1291,87 @@ void __init page_alloc_init_late(void)
>>
>>         /* Reinit limits that are based on free pages after the kernel is
>> up */
>>         files_maxfiles_init();
>> +#endif
>> +
>> +       for_each_populated_zone(zone)
>> +               set_zone_contiguous(zone);
>> +}
>> +
>> +/*
>> + * Check that the whole (or subset of) a pageblock given by the interval
>> of
>> + * [start_pfn, end_pfn) is valid and within the same zone, before
>> scanning it
>> + * with the migration of free compaction scanner. The scanners then need
>> to
>> + * use only pfn_valid_within() check for arches that allow holes within
>> + * pageblocks.
>> + *
>> + * Return struct page pointer of start_pfn, or NULL if checks were not
>> passed.
>> + *
>> + * It's possible on some configurations to have a setup like node0 node1
>> node0
>> + * i.e. it's possible that all pages within a zones range of pages do not
>> + * belong to a single zone. We assume that a border between node0 and
>> node1
>> + * can occur within a single pageblock, but not a node0 node1 node0
>> + * interleaving within a single pageblock. It is therefore sufficient to
>> check
>> + * the first and last page of a pageblock and avoid checking each
>> individual
>> + * page in a pageblock.
>> + */
>> +struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>> +                               unsigned long end_pfn, struct zone *zone)
>> +{
>> +       struct page *start_page;
>> +       struct page *end_page;
>> +
>> +       /* end_pfn is one past the range we are checking */
>> +       end_pfn--;
>> +
>> +       if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
>> +               return NULL;
>> +
>> +       start_page = pfn_to_page(start_pfn);
>> +
>> +       if (page_zone(start_page) != zone)
>> +               return NULL;
>> +
>> +       end_page = pfn_to_page(end_pfn);
>> +
>> +       /* This gives a shorter code than deriving page_zone(end_page) */
>> +       if (page_zone_id(start_page) != page_zone_id(end_page))
>> +               return NULL;
>> +
>> +       return start_page;
>> +}
>> +
>> +void set_zone_contiguous(struct zone *zone)
>> +{
>> +       unsigned long block_start_pfn = zone->zone_start_pfn;
>> +       unsigned long block_end_pfn;
>> +       unsigned long pfn;
>> +
>> +       block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
>> +       for (; block_start_pfn < zone_end_pfn(zone);
>> +               block_start_pfn = block_end_pfn,
>> +               block_end_pfn += pageblock_nr_pages) {
>> +
>> +               block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
>> +
>> +               if (!__pageblock_pfn_to_page(block_start_pfn,
>> +                                       block_end_pfn, zone))
>> +                       return;
>> +
>> +               /* Check validity of pfn within pageblock */
>> +               for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
>> +                       if (!pfn_valid_within(pfn))
>> +                               return;
>> +               }
>
>
> Hm this is suboptimal and misleading. The result of pfn_valid_within()
> doesn't affect whether we need to use __pageblock_pfn_to_page() or not, so
> zone->contiguous shouldn't depend on it.
>
> On the other hand, if we knew that pfn_valid_within() is true everywhere, we
> wouldn't need to check it inside isolate_*pages_block().
> So you could add another patch that adds another bool to struct zone and
> test for that (with #ifdef CONFIG_HOLES_IN_ZONE at appropriate places).
>
> Thanks,
> Vlastimil
>
>
>> +       }
>> +
>> +       /* We confirm that there is no hole */
>> +       zone->contiguous = true;
>> +}
>> +
>> +void clear_zone_contiguous(struct zone *zone)
>> +{
>> +       zone->contiguous = false;
>>   }
>> -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>>
>>   #ifdef CONFIG_CMA
>>   /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>>
>

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
@ 2015-12-21 12:18       ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2015-12-21 12:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, LKML, Linux Memory Management List, Joonsoo Kim,
	Gu Zheng, Tang Chen, Naoya Horiguchi, Toshi Kani

2015-12-21 19:46 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 12/21/2015 07:13 AM, Joonsoo Kim wrote:
>>
>> There is a performance drop report due to hugepage allocation and in there
>> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
>> In that workload, compaction is triggered to make hugepage but most of
>> pageblocks are un-available for compaction due to pageblock type and
>> skip bit so compaction usually fails. Most costly operations in this case
>> is to find valid pageblock while scanning whole zone range. To check
>> if pageblock is valid to compact, valid pfn within pageblock is required
>> and we can obtain it by calling pageblock_pfn_to_page(). This function
>> checks whether pageblock is in a single zone and return valid pfn
>> if possible. Problem is that we need to check it every time before
>> scanning pageblock even if we re-visit it and this turns out to
>> be very expensive in this workload.
>>
>> Although we have no way to skip this pageblock check in the system
>> where hole exists at arbitrary position, we can use cached value for
>> zone continuity and just do pfn_to_page() in the system where hole doesn't
>> exist. This optimization considerably speeds up in above workload.
>>
>> Before vs After
>> Max: 1096 MB/s vs 1325 MB/s
>> Min: 635 MB/s 1015 MB/s
>> Avg: 899 MB/s 1194 MB/s
>>
>> Avg is improved by roughly 30% [2].
>>
>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
>> [2]: https://lkml.org/lkml/2015/12/9/23
>>
>> v2
>> o checking zone continuity after initialization
>> o handle memory-hotplug case
>>
>> Reported and Tested-by: Aaron Lu <aaron.lu@intel.com>
>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> [...]
>
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -505,6 +505,9 @@ int __ref __add_pages(int nid, struct zone *zone,
>> unsigned long phys_start_pfn,
>>         unsigned long i;
>>         int err = 0;
>>         int start_sec, end_sec;
>> +
>> +       clear_zone_contiguous(zone);
>> +
>>         /* during initialize mem_map, align hot-added range to section */
>>         start_sec = pfn_to_section_nr(phys_start_pfn);
>>         end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
>> @@ -523,6 +526,8 @@ int __ref __add_pages(int nid, struct zone *zone,
>> unsigned long phys_start_pfn,
>>         }
>>         vmemmap_populate_print_last();
>>
>> +       set_zone_contiguous(zone);
>> +
>>         return err;
>>   }
>>   EXPORT_SYMBOL_GPL(__add_pages);
>> @@ -770,6 +775,8 @@ int __remove_pages(struct zone *zone, unsigned long
>> phys_start_pfn,
>>         resource_size_t start, size;
>>         int ret = 0;
>>
>> +       clear_zone_contiguous(zone);
>> +
>>         /*
>>          * We can only remove entire sections
>>          */
>> @@ -796,6 +803,9 @@ int __remove_pages(struct zone *zone, unsigned long
>> phys_start_pfn,
>>                 if (ret)
>>                         break;
>>         }
>> +
>> +       set_zone_contiguous(zone);
>> +
>>         return ret;
>
>
> Hm I wonder how many __add_ or __remove_pages calls there might be per a
> major hotplug event (e.g. whole node). IIRC there may be many subranges that
> are onlined/offlined separately?

__add_pages() or __remove_pages() is called whenever a memory device is added
or removed, that is, one call per device add/remove. If the device is 1GB, it
is called for a 1GB range, and if the device is 2GB, it is called for a 2GB
range. I think these are reasonable places to check continuity.

> Doing a full zone rescan on each
> sub-operation could be quite costly, no?

Doing a full zone rescan makes the code much simpler. It can be optimized
further, but at this point there is no reason to handle that complexity.

> You should have added
> mm/hotplug_memory.c people to CC to comment, as you did in the [RFC]
> theoretical race... mail. Doing that now.

Okay. Thanks.

> If the hotplug people confirm it might be an issue, I guess one solution is
> to call set_zone_contiguous() lazily on-demand as you did in the v1 (but not
> relying on cached pfn initialization to determine whether contiguous was
> already evaluated). Add another variable like zone->contiguous_evaluated and
> make hotplug code just set it to false.

It could be, but I guess this is not too expensive for memory hotplug. It can
tolerate 120 sec for offlining memory, and rescanning the full zone to check
continuity is very cheap compared to that.

Anyway, it's better to have these things confirmed by the memory hotplug people.

Thanks.

>
>>   }
>>   EXPORT_SYMBOL_GPL(__remove_pages);
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index bac8842..4f5ad2b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1271,9 +1271,13 @@ free_range:
>>         pgdat_init_report_one_done();
>>         return 0;
>>   }
>> +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>>
>>   void __init page_alloc_init_late(void)
>>   {
>> +       struct zone *zone;
>> +
>> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>>         int nid;
>>
>>         /* There will be num_node_state(N_MEMORY) threads */
>> @@ -1287,8 +1291,87 @@ void __init page_alloc_init_late(void)
>>
>>         /* Reinit limits that are based on free pages after the kernel is
>> up */
>>         files_maxfiles_init();
>> +#endif
>> +
>> +       for_each_populated_zone(zone)
>> +               set_zone_contiguous(zone);
>> +}
>> +
>> +/*
>> + * Check that the whole (or subset of) a pageblock given by the interval
>> of
>> + * [start_pfn, end_pfn) is valid and within the same zone, before
>> scanning it
>> + * with the migration of free compaction scanner. The scanners then need
>> to
>> + * use only pfn_valid_within() check for arches that allow holes within
>> + * pageblocks.
>> + *
>> + * Return struct page pointer of start_pfn, or NULL if checks were not
>> passed.
>> + *
>> + * It's possible on some configurations to have a setup like node0 node1
>> node0
>> + * i.e. it's possible that all pages within a zones range of pages do not
>> + * belong to a single zone. We assume that a border between node0 and
>> node1
>> + * can occur within a single pageblock, but not a node0 node1 node0
>> + * interleaving within a single pageblock. It is therefore sufficient to
>> check
>> + * the first and last page of a pageblock and avoid checking each
>> individual
>> + * page in a pageblock.
>> + */
>> +struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>> +                               unsigned long end_pfn, struct zone *zone)
>> +{
>> +       struct page *start_page;
>> +       struct page *end_page;
>> +
>> +       /* end_pfn is one past the range we are checking */
>> +       end_pfn--;
>> +
>> +       if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
>> +               return NULL;
>> +
>> +       start_page = pfn_to_page(start_pfn);
>> +
>> +       if (page_zone(start_page) != zone)
>> +               return NULL;
>> +
>> +       end_page = pfn_to_page(end_pfn);
>> +
>> +       /* This gives a shorter code than deriving page_zone(end_page) */
>> +       if (page_zone_id(start_page) != page_zone_id(end_page))
>> +               return NULL;
>> +
>> +       return start_page;
>> +}
>> +
>> +void set_zone_contiguous(struct zone *zone)
>> +{
>> +       unsigned long block_start_pfn = zone->zone_start_pfn;
>> +       unsigned long block_end_pfn;
>> +       unsigned long pfn;
>> +
>> +       block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
>> +       for (; block_start_pfn < zone_end_pfn(zone);
>> +               block_start_pfn = block_end_pfn,
>> +               block_end_pfn += pageblock_nr_pages) {
>> +
>> +               block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
>> +
>> +               if (!__pageblock_pfn_to_page(block_start_pfn,
>> +                                       block_end_pfn, zone))
>> +                       return;
>> +
>> +               /* Check validity of pfn within pageblock */
>> +               for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
>> +                       if (!pfn_valid_within(pfn))
>> +                               return;
>> +               }
>
>
> Hm this is suboptimal and misleading. The result of pfn_valid_within()
> doesn't affect whether we need to use __pageblock_pfn_to_page() or not, so
> zone->contiguous shouldn't depend on it.
>
> On the other hand, if we knew that pfn_valid_within() is true everywhere, we
> wouldn't need to check it inside isolate_*pages_block().
> So you could add another patch that adds another bool to struct zone and
> test for that (with #ifdef CONFIG_HOLES_IN_ZONE at appropriate places).
>
> Thanks,
> Vlastimil
>
>
>> +       }
>> +
>> +       /* We confirm that there is no hole */
>> +       zone->contiguous = true;
>> +}
>> +
>> +void clear_zone_contiguous(struct zone *zone)
>> +{
>> +       zone->contiguous = false;
>>   }
>> -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>>
>>   #ifdef CONFIG_CMA
>>   /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-21 12:18       ` Joonsoo Kim
@ 2015-12-21 12:38         ` Joonsoo Kim
  -1 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2015-12-21 12:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, LKML, Linux Memory Management List, Joonsoo Kim,
	Gu Zheng, Tang Chen, Naoya Horiguchi, Toshi Kani

2015-12-21 21:18 GMT+09:00 Joonsoo Kim <js1304@gmail.com>:
> 2015-12-21 19:46 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
>> On 12/21/2015 07:13 AM, Joonsoo Kim wrote:
>>>
>>> There is a performance drop report due to hugepage allocation and in there
>>> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
>>> In that workload, compaction is triggered to make hugepage but most of
>>> pageblocks are un-available for compaction due to pageblock type and
>>> skip bit so compaction usually fails. Most costly operations in this case
>>> is to find valid pageblock while scanning whole zone range. To check
>>> if pageblock is valid to compact, valid pfn within pageblock is required
>>> and we can obtain it by calling pageblock_pfn_to_page(). This function
>>> checks whether pageblock is in a single zone and return valid pfn
>>> if possible. Problem is that we need to check it every time before
>>> scanning pageblock even if we re-visit it and this turns out to
>>> be very expensive in this workload.
>>>
>>> Although we have no way to skip this pageblock check in the system
>>> where hole exists at arbitrary position, we can use cached value for
>>> zone continuity and just do pfn_to_page() in the system where hole doesn't
>>> exist. This optimization considerably speeds up in above workload.
>>>
>>> Before vs After
>>> Max: 1096 MB/s vs 1325 MB/s
>>> Min: 635 MB/s 1015 MB/s
>>> Avg: 899 MB/s 1194 MB/s
>>>
>>> Avg is improved by roughly 30% [2].
>>>
>>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
>>> [2]: https://lkml.org/lkml/2015/12/9/23
>>>
>>> v2
>>> o checking zone continuity after initialization
>>> o handle memory-hotplug case
>>>
>>> Reported and Tested-by: Aaron Lu <aaron.lu@intel.com>
>>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>
>> [...]
>>
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -505,6 +505,9 @@ int __ref __add_pages(int nid, struct zone *zone,
>>> unsigned long phys_start_pfn,
>>>         unsigned long i;
>>>         int err = 0;
>>>         int start_sec, end_sec;
>>> +
>>> +       clear_zone_contiguous(zone);
>>> +
>>>         /* during initialize mem_map, align hot-added range to section */
>>>         start_sec = pfn_to_section_nr(phys_start_pfn);
>>>         end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
>>> @@ -523,6 +526,8 @@ int __ref __add_pages(int nid, struct zone *zone,
>>> unsigned long phys_start_pfn,
>>>         }
>>>         vmemmap_populate_print_last();
>>>
>>> +       set_zone_contiguous(zone);
>>> +
>>>         return err;
>>>   }
>>>   EXPORT_SYMBOL_GPL(__add_pages);
>>> @@ -770,6 +775,8 @@ int __remove_pages(struct zone *zone, unsigned long
>>> phys_start_pfn,
>>>         resource_size_t start, size;
>>>         int ret = 0;
>>>
>>> +       clear_zone_contiguous(zone);
>>> +
>>>         /*
>>>          * We can only remove entire sections
>>>          */
>>> @@ -796,6 +803,9 @@ int __remove_pages(struct zone *zone, unsigned long
>>> phys_start_pfn,
>>>                 if (ret)
>>>                         break;
>>>         }
>>> +
>>> +       set_zone_contiguous(zone);
>>> +
>>>         return ret;
>>
>>
>> Hm I wonder how many __add_ or __remove_pages calls there might be per a
>> major hotplug event (e.g. whole node). IIRC there may be many subranges that
>> are onlined/offlined separately?
>
> __add_ or __removed_pages are called whenever memory device is added
> or removed, that is, 1 call per 1 device add/remove. If device is 1GB,
> it is called
> for 1GB range and if device is 2GB, it is called for 2GB range. I think they are
> reasonable place to check continuity.
>
>> Doing a full zone rescan on each
>> sub-operation could be quite costly, no?
>
> Doing full zone rescan makes code much simpler. It can be optimized further but
> at this point there is no reason to handle complexity.
>
>> You should have added
>> mm/hotplug_memory.c people to CC to comment, as you did in the [RFC]
>> theoretical race... mail. Doing that now.
>
> Okay. Thanks.
>
>> If the hotplug people confirm it might be an issue, I guess one solution is
>> to call set_zone_contiguous() lazily on-demand as you did in the v1 (but not
>> relying on cached pfn initialization to determine whether contiguous was
>> already evaluated). Add another variable like zone->contiguous_evaluated and
>> make hotplug code just set it to false.
>
> It could be, but, I guess that this is not too expensive for memory
> hotplug. They can
> tolerate 120 sec for offline memory and rescanning full zone to check
> continuity is
> very cheap than it.
>
> Anyway, it's better to confirm these things by memory hotplug people.
>
> Thanks.
>
>>
>>>   }
>>>   EXPORT_SYMBOL_GPL(__remove_pages);
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index bac8842..4f5ad2b 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1271,9 +1271,13 @@ free_range:
>>>         pgdat_init_report_one_done();
>>>         return 0;
>>>   }
>>> +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>>>
>>>   void __init page_alloc_init_late(void)
>>>   {
>>> +       struct zone *zone;
>>> +
>>> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>>>         int nid;
>>>
>>>         /* There will be num_node_state(N_MEMORY) threads */
>>> @@ -1287,8 +1291,87 @@ void __init page_alloc_init_late(void)
>>>
>>>         /* Reinit limits that are based on free pages after the kernel is
>>> up */
>>>         files_maxfiles_init();
>>> +#endif
>>> +
>>> +       for_each_populated_zone(zone)
>>> +               set_zone_contiguous(zone);
>>> +}
>>> +
>>> +/*
>>> + * Check that the whole (or subset of) a pageblock given by the interval
>>> of
>>> + * [start_pfn, end_pfn) is valid and within the same zone, before
>>> scanning it
>>> + * with the migration of free compaction scanner. The scanners then need
>>> to
>>> + * use only pfn_valid_within() check for arches that allow holes within
>>> + * pageblocks.
>>> + *
>>> + * Return struct page pointer of start_pfn, or NULL if checks were not
>>> passed.
>>> + *
>>> + * It's possible on some configurations to have a setup like node0 node1
>>> node0
>>> + * i.e. it's possible that all pages within a zones range of pages do not
>>> + * belong to a single zone. We assume that a border between node0 and
>>> node1
>>> + * can occur within a single pageblock, but not a node0 node1 node0
>>> + * interleaving within a single pageblock. It is therefore sufficient to
>>> check
>>> + * the first and last page of a pageblock and avoid checking each
>>> individual
>>> + * page in a pageblock.
>>> + */
>>> +struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>>> +                               unsigned long end_pfn, struct zone *zone)
>>> +{
>>> +       struct page *start_page;
>>> +       struct page *end_page;
>>> +
>>> +       /* end_pfn is one past the range we are checking */
>>> +       end_pfn--;
>>> +
>>> +       if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
>>> +               return NULL;
>>> +
>>> +       start_page = pfn_to_page(start_pfn);
>>> +
>>> +       if (page_zone(start_page) != zone)
>>> +               return NULL;
>>> +
>>> +       end_page = pfn_to_page(end_pfn);
>>> +
>>> +       /* This gives a shorter code than deriving page_zone(end_page) */
>>> +       if (page_zone_id(start_page) != page_zone_id(end_page))
>>> +               return NULL;
>>> +
>>> +       return start_page;
>>> +}
>>> +
>>> +void set_zone_contiguous(struct zone *zone)
>>> +{
>>> +       unsigned long block_start_pfn = zone->zone_start_pfn;
>>> +       unsigned long block_end_pfn;
>>> +       unsigned long pfn;
>>> +
>>> +       block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
>>> +       for (; block_start_pfn < zone_end_pfn(zone);
>>> +               block_start_pfn = block_end_pfn,
>>> +               block_end_pfn += pageblock_nr_pages) {
>>> +
>>> +               block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
>>> +
>>> +               if (!__pageblock_pfn_to_page(block_start_pfn,
>>> +                                       block_end_pfn, zone))
>>> +                       return;
>>> +
>>> +               /* Check validity of pfn within pageblock */
>>> +               for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
>>> +                       if (!pfn_valid_within(pfn))
>>> +                               return;
>>> +               }
>>
>>
>> Hm this is suboptimal and misleading. The result of pfn_valid_within()
>> doesn't affect whether we need to use __pageblock_pfn_to_page() or not, so
>> zone->contiguous shouldn't depend on it.

I forgot to answer these points.

pageblock_pfn_to_page() can be called for any pfn, not just a pageblock-boundary
pfn, so every pfn should be checked. With CONFIG_HOLES_IN_ZONE it is possible
that the boundary pfns are valid while some pfn inside the pageblock is not.
Therefore, we need to check pfn_valid_within().

The compiler may optimize the 'for' loop away when pfn_valid_within() is
compiled out, but I could also remove it explicitly; see the sketch below.
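For reference, pfn_valid_within() is defined roughly like this (reconstructed
from memory, so double-check include/linux/mmzone.h rather than trusting the
exact form here):

	#ifdef CONFIG_HOLES_IN_ZONE
	/* a pfn in an otherwise valid range may still lack a backing page */
	#define pfn_valid_within(pfn)	pfn_valid(pfn)
	#else
	/* no holes within zones: the per-pfn check is a constant 1 */
	#define pfn_valid_within(pfn)	(1)
	#endif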

>> On the other hand, if we knew that pfn_valid_within() is true everywhere, we
>> wouldn't need to check it inside isolate_*pages_block().
>> So you could add another patch that adds another bool to struct zone and
>> test for that (with #ifdef CONFIG_HOLES_IN_ZONE at appropriate places).

Yes, we can optimize that, too. We can find appropriate places to utilize this
zone continuity information.
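To make the intended fast path concrete, it boils down to something like the
sketch below. The wrapper itself is not in the hunks quoted above, so treat the
exact form as illustrative rather than the actual patch:

	/* use the cached zone continuity to skip the per-pageblock checks */
	static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
				unsigned long end_pfn, struct zone *zone)
	{
		if (zone->contiguous)
			return pfn_to_page(start_pfn);

		return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
	}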

Thanks.

>> Thanks,
>> Vlastimil
>>
>>
>>> +       }
>>> +
>>> +       /* We confirm that there is no hole */
>>> +       zone->contiguous = true;
>>> +}
>>> +
>>> +void clear_zone_contiguous(struct zone *zone)
>>> +{
>>> +       zone->contiguous = false;
>>>   }
>>> -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>>>
>>>   #ifdef CONFIG_CMA
>>>   /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>>>
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/2] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn
  2015-12-21  6:13 ` Joonsoo Kim
@ 2015-12-22 22:05   ` David Rientjes
  -1 siblings, 0 replies; 34+ messages in thread
From: David Rientjes @ 2015-12-22 22:05 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Vlastimil Babka, Aaron Lu, Mel Gorman,
	Rik van Riel, linux-kernel, linux-mm, Joonsoo Kim

On Mon, 21 Dec 2015, Joonsoo Kim wrote:

> free_pfn and compact_cached_free_pfn are the pointer that remember
> restart position of freepage scanner. When they are reset or invalid,
> we set them to zone_end_pfn because freepage scanner works in reverse
> direction. But, because zone range is defined as [zone_start_pfn,
> zone_end_pfn), zone_end_pfn is invalid to access. Therefore, we should
> not store it to free_pfn and compact_cached_free_pfn. Instead, we need
> to store zone_end_pfn - 1 to them. There is one more thing we should
> consider. Freepage scanner scan reversely by pageblock unit. If free_pfn
> and compact_cached_free_pfn are set to middle of pageblock, it regards
> that sitiation as that it already scans front part of pageblock so we
> lose opportunity to scan there. To fix-up, this patch do round_down()
> to guarantee that reset position will be pageblock aligned.
> 
> Note that thanks to the current pageblock_pfn_to_page() implementation,
> actual access to zone_end_pfn doesn't happen until now. But, following
> patch will change pageblock_pfn_to_page() so this patch is needed
> from now on.
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-21  6:13   ` Joonsoo Kim
@ 2015-12-22 22:17     ` David Rientjes
  -1 siblings, 0 replies; 34+ messages in thread
From: David Rientjes @ 2015-12-22 22:17 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Vlastimil Babka, Aaron Lu, Mel Gorman,
	Rik van Riel, linux-kernel, linux-mm, Joonsoo Kim

On Mon, 21 Dec 2015, Joonsoo Kim wrote:

> There is a performance drop report due to hugepage allocation and in there
> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
> In that workload, compaction is triggered to make hugepage but most of
> pageblocks are un-available for compaction due to pageblock type and
> skip bit so compaction usually fails. Most costly operations in this case
> is to find valid pageblock while scanning whole zone range. To check
> if pageblock is valid to compact, valid pfn within pageblock is required
> and we can obtain it by calling pageblock_pfn_to_page(). This function
> checks whether pageblock is in a single zone and return valid pfn
> if possible. Problem is that we need to check it every time before
> scanning pageblock even if we re-visit it and this turns out to
> be very expensive in this workload.
> 
> Although we have no way to skip this pageblock check in the system
> where hole exists at arbitrary position, we can use cached value for
> zone continuity and just do pfn_to_page() in the system where hole doesn't
> exist. This optimization considerably speeds up in above workload.
> 
> Before vs After
> Max: 1096 MB/s vs 1325 MB/s
> Min: 635 MB/s 1015 MB/s
> Avg: 899 MB/s 1194 MB/s
> 
> Avg is improved by roughly 30% [2].
> 

Wow, ok!

I'm wondering if it would be better to maintain this as a characteristic 
of each pageblock rather than each zone.  Have you tried to introduce a 
couple new bits to pageblock_bits that would track (1) if a cached value 
makes sense and (2) if the pageblock is contiguous?  On the first call to 
pageblock_pfn_to_page(), set the first bit, PB_cached, and set the second 
bit, PB_contiguous, iff it is contiguous.  On subsequent calls, if 
PB_cached is true, then return PB_contiguous.  On memory hot-add or 
remove (or init), clear PB_cached.

What are the cases where pageblock_pfn_to_page() is used for a subset of 
the pageblock and the result would be problematic for compaction?  I.e., 
do we actually care to use pageblocks that are not contiguous at all?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-22 22:17     ` David Rientjes
@ 2015-12-23  6:14       ` Vlastimil Babka
  -1 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2015-12-23  6:14 UTC (permalink / raw)
  To: David Rientjes, Joonsoo Kim
  Cc: Andrew Morton, Aaron Lu, Mel Gorman, Rik van Riel, linux-kernel,
	linux-mm, Joonsoo Kim

On 22.12.2015 23:17, David Rientjes wrote:
> On Mon, 21 Dec 2015, Joonsoo Kim wrote:
> 
>> Before vs After
>> Max: 1096 MB/s vs 1325 MB/s
>> Min: 635 MB/s 1015 MB/s
>> Avg: 899 MB/s 1194 MB/s
>>
>> Avg is improved by roughly 30% [2].
>>
> 
> Wow, ok!
> 
> I'm wondering if it would be better to maintain this as a characteristic 
> of each pageblock rather than each zone.  Have you tried to introduce a 
> couple new bits to pageblock_bits that would track (1) if a cached value 
> makes sense and (2) if the pageblock is contiguous?  On the first call to 
> pageblock_pfn_to_page(), set the first bit, PB_cached, and set the second 
> bit, PB_contiguous, iff it is contiguous.  On subsequent calls, if 
> PB_cached is true, then return PB_contiguous.  On memory hot-add or 
> remove (or init), clear PB_cached.

I can imagine these bitmap operations being as expensive as what
__pageblock_pfn_to_page() does (or close to it). But if not, we could also just
be a bit smarter about PG_skip and check that before doing pfn_to_page().

> What are the cases where pageblock_pfn_to_page() is used for a subset of 
> the pageblock and the result would be problematic for compaction?  I.e., 
> do we actually care to use pageblocks that are not contiguous at all?

The problematic pageblocks are those that have pages from more than one zone in
them, so we just skip them. Supposedly that can only happen by switching once
between two zones somewhere in the middle of the pageblock, so it's sufficient
to check first and last pfn and compare their zones. So using
pageblock_pfn_to_page() on a subset from compaction would be wrong. Holes (== no
pages) within a pageblock are a different thing, checked by pfn_valid_within()
(#defined out on archs where such holes cannot happen) when scanning the block.

That's why I'm not entirely happy with how the patch conflates both the
first/last pfn's zone checks and pfn_valid_within() checks. Yes, a fully
contiguous zone does *imply* that pageblock_pfn_to_page() doesn't have to check
first/last pfn for a matching zone. But it's not *equality*. And any (now just
*potential*) user of pageblock_pfn_to_page() with pfn's different than
first/last pfn of a pageblock is likely wrong.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-23  6:14       ` Vlastimil Babka
@ 2015-12-23  6:57         ` Joonsoo Kim
  -1 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2015-12-23  6:57 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Andrew Morton, Aaron Lu, Mel Gorman,
	Rik van Riel, linux-kernel, linux-mm

On Wed, Dec 23, 2015 at 07:14:21AM +0100, Vlastimil Babka wrote:
> On 22.12.2015 23:17, David Rientjes wrote:
> > On Mon, 21 Dec 2015, Joonsoo Kim wrote:
> > 
> >> Before vs After
> >> Max: 1096 MB/s vs 1325 MB/s
> >> Min: 635 MB/s 1015 MB/s
> >> Avg: 899 MB/s 1194 MB/s
> >>
> >> Avg is improved by roughly 30% [2].
> >>
> > 
> > Wow, ok!
> > 
> > I'm wondering if it would be better to maintain this as a characteristic 
> > of each pageblock rather than each zone.  Have you tried to introduce a 
> > couple new bits to pageblock_bits that would track (1) if a cached value 
> > makes sense and (2) if the pageblock is contiguous?  On the first call to 
> > pageblock_pfn_to_page(), set the first bit, PB_cached, and set the second 
> > bit, PB_contiguous, iff it is contiguous.  On subsequent calls, if 
> > PB_cached is true, then return PB_contiguous.  On memory hot-add or 
> > remove (or init), clear PB_cached.
> 
> I can imagine these bitmap operation to be as expensive as what
> __pageblock_pfn_to_page() does (or close)? But if not, we could also just be a
> bit smarter about PG_skip and check that before doing pfn_to_page.

Although I haven't thought it through carefully, to get PB_xxx we need to check
the pfn's zone, and that requires pfn_valid() and pfn_to_page(). So I guess the
cost would be the same as, or half of, the cost of __pageblock_pfn_to_page().

> 
> > What are the cases where pageblock_pfn_to_page() is used for a subset of 
> > the pageblock and the result would be problematic for compaction?  I.e., 
> > do we actually care to use pageblocks that are not contiguous at all?
> 
> The problematic pageblocks are those that have pages from more than one zone in
> them, so we just skip them. Supposedly that can only happen by switching once
> between two zones somewhere in the middle of the pageblock, so it's sufficient
> to check first and last pfn and compare their zones. So using
> pageblock_pfn_to_page() on a subset from compaction would be wrong. Holes (==no
> pages) within pageblock is a different thing checked by pfn_valid_within()
> (#defined out on archs where such holes cannot happen) when scanning the block.
> 
> That's why I'm not entirely happy with how the patch conflates both the
> first/last pfn's zone checks and pfn_valid_within() checks. Yes, a fully
> contiguous zone does *imply* that pageblock_pfn_to_page() doesn't have to check
> first/last pfn for a matching zone. But it's not *equality*. And any (now just
> *potential*) user of pageblock_pfn_to_page() with pfn's different than
> first/last pfn of a pageblock is likely wrong.

Now I understand your concern. What misled me is that 3 of the 4 callers of
pageblock_pfn_to_page() in compaction.c could call it with a non-pageblock-boundary
pfn. Maybe they should be fixed first. Then, yes, I can separate the first/last
pfn zone checks from the pfn_valid_within() checks.
Would that make you entirely happy? :)

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-23  6:57         ` Joonsoo Kim
@ 2016-01-04 12:38           ` Vlastimil Babka
  -1 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2016-01-04 12:38 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: David Rientjes, Andrew Morton, Aaron Lu, Mel Gorman,
	Rik van Riel, linux-kernel, linux-mm

On 12/23/2015 07:57 AM, Joonsoo Kim wrote:
>>> What are the cases where pageblock_pfn_to_page() is used for a subset of
>>> the pageblock and the result would be problematic for compaction?  I.e.,
>>> do we actually care to use pageblocks that are not contiguous at all?
>>
>> The problematic pageblocks are those that have pages from more than one zone in
>> them, so we just skip them. Supposedly that can only happen by switching once
>> between two zones somewhere in the middle of the pageblock, so it's sufficient
>> to check first and last pfn and compare their zones. So using
>> pageblock_pfn_to_page() on a subset from compaction would be wrong. Holes (==no
>> pages) within pageblock is a different thing checked by pfn_valid_within()
>> (#defined out on archs where such holes cannot happen) when scanning the block.
>>
>> That's why I'm not entirely happy with how the patch conflates both the
>> first/last pfn's zone checks and pfn_valid_within() checks. Yes, a fully
>> contiguous zone does *imply* that pageblock_pfn_to_page() doesn't have to check
>> first/last pfn for a matching zone. But it's not *equality*. And any (now just
>> *potential*) user of pageblock_pfn_to_page() with pfn's different than
>> first/last pfn of a pageblock is likely wrong.
>
> Now, I understand your concern. What makes me mislead is that
> 3 of 4 callers to pageblock_pfn_to_page() in compaction.c could call it with
> non-pageblock boundary pfn.

Oh, I thought you were talking about potential new callers, now that the 
function was exported. So let's see about the existing callers:

isolate_migratepages() - the first pfn can be non-boundary when restarting 
from the middle of a pageblock, that's true. But it means the pageblock has 
already passed the check in a previous call where the pfn was at the boundary, 
so it's safe. The worst that can happen is that the restarting pfn falls into 
an intra-pageblock hole, so the pageblock is falsely skipped over.

isolate_freepages() - always boundary AFAICS?

isolate_migratepages_range() and isolate_freepages_range() - yeah the 
CMA parts say it doesn't have to be aligned, I don't know about actual users

> Maybe, they should be fixed first.

It would probably be best, even for isolate_migratepages(), for 
consistency and less-surprisibility.

> Then, yes. I can
> separate first/last pfn's zone checks and pfn_valid_within() checks.
> If then, would you be entirely happy? :)

Maybe, if the patch also made me a coffee :P

> Thanks.

Thanks!


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-01-04 12:38           ` Vlastimil Babka
@ 2016-01-08  2:52             ` Joonsoo Kim
  -1 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2016-01-08  2:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Andrew Morton, Aaron Lu, Mel Gorman,
	Rik van Riel, linux-kernel, linux-mm

On Mon, Jan 04, 2016 at 01:38:02PM +0100, Vlastimil Babka wrote:
> On 12/23/2015 07:57 AM, Joonsoo Kim wrote:
> >>>What are the cases where pageblock_pfn_to_page() is used for a subset of
> >>>the pageblock and the result would be problematic for compaction?  I.e.,
> >>>do we actually care to use pageblocks that are not contiguous at all?
> >>
> >>The problematic pageblocks are those that have pages from more than one zone in
> >>them, so we just skip them. Supposedly that can only happen by switching once
> >>between two zones somewhere in the middle of the pageblock, so it's sufficient
> >>to check first and last pfn and compare their zones. So using
> >>pageblock_pfn_to_page() on a subset from compaction would be wrong. Holes (==no
> >>pages) within pageblock is a different thing checked by pfn_valid_within()
> >>(#defined out on archs where such holes cannot happen) when scanning the block.
> >>
> >>That's why I'm not entirely happy with how the patch conflates both the
> >>first/last pfn's zone checks and pfn_valid_within() checks. Yes, a fully
> >>contiguous zone does *imply* that pageblock_pfn_to_page() doesn't have to check
> >>first/last pfn for a matching zone. But it's not *equality*. And any (now just
> >>*potential*) user of pageblock_pfn_to_page() with pfn's different than
> >>first/last pfn of a pageblock is likely wrong.
> >
> >Now, I understand your concern. What makes me mislead is that
> >3 of 4 callers to pageblock_pfn_to_page() in compaction.c could call it with
> >non-pageblock boundary pfn.
> 
> Oh, I thought you were talking about potential new callers, now that
> the function was exported. So let's see about the existing callers:
> 
> isolate_migratepages() - first pfn can be non-boundary when
> restarting from a middle of pageblock, that's true. But it means the
> pageblock has already passed the check in previous call where it was
> boundary, so it's safe. Worst can happen that the restarting pfn
> will be in a intra-pageblock hole so pageblock will be falsely
> skipped over.

Yes, you are right.

> 
> isolate_freepages() - always boundary AFAICS?
> 
> isolate_migratepages_range() and isolate_freepages_range() - yeah
> the CMA parts say it doesn't have to be aligned, I don't know about
> actual users

CMA can call them with a non-pageblock-aligned pfn, but checking
pageblock_pfn_to_page() with a pageblock-aligned pfn will be safe, because
a CMA region is constrained to be pageblock aligned and to lie within a
single zone. Moreover, pfn_valid() has already been checked for every pfn
during the initialization step.
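
For illustration only, a minimal sketch of how such a caller could consult
the pageblock-aligned boundaries instead of its own unaligned pfn. The
wrapper name is made up; round_down(), min(), zone_end_pfn(),
pageblock_nr_pages and pageblock_pfn_to_page() are the existing symbols:

	/*
	 * Hypothetical helper, not in the tree: check the pageblock
	 * that contains @pfn, regardless of @pfn's alignment.
	 */
	static struct page *check_containing_pageblock(unsigned long pfn,
						       struct zone *zone)
	{
		unsigned long block_start = round_down(pfn, pageblock_nr_pages);
		unsigned long block_end = block_start + pageblock_nr_pages;

		/* clamp the last block to the zone end, as the scanners do */
		block_end = min(block_end, zone_end_pfn(zone));

		/* safe for CMA: the region is pageblock aligned, one zone */
		return pageblock_pfn_to_page(block_start, block_end, zone);
	}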

> >Maybe, they should be fixed first.
> 
> It would be probably best, even for isolate_migratepages() for
> consistency and less-surprisibility.

Yes. Without this fix, if only the pageblock-aligned pfn were checked for
the cached hole information, the optimized pageblock_pfn_to_page() would
cause an error when it met an intra-pageblock hole in
isolate_migratepages().
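
For context, a rough sketch (not the actual patch) of the kind of call-site
change being discussed, so that isolate_migratepages() only ever hands a
pageblock-aligned range to pageblock_pfn_to_page() while still scanning
from the unaligned restart pfn; variable declarations and the surrounding
loop are omitted:

	/*
	 * sketch only: body of the pageblock loop in isolate_migratepages();
	 * low_pfn may start in the middle of a pageblock when restarting
	 * from cc->migrate_pfn
	 */
	block_start_pfn = round_down(low_pfn, pageblock_nr_pages);
	block_end_pfn = block_start_pfn + pageblock_nr_pages;

	/* pass only the aligned range to the zone check ... */
	page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone);
	if (!page)
		continue;	/* mixed-zone pageblock, skip it */

	/* ... but keep isolating pages from the unaligned low_pfn */
	low_pfn = isolate_migratepages_block(cc, low_pfn, block_end_pfn,
					     isolate_mode);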

> >Then, yes. I can
> >separate first/last pfn's zone checks and pfn_valid_within() checks.
> >If then, would you be entirely happy? :)
> 
> Maybe, if the patch also made me a coffee :P

I hope so. :)

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-21  6:13   ` Joonsoo Kim
@ 2016-01-19  8:29     ` zhong jiang
  -1 siblings, 0 replies; 34+ messages in thread
From: zhong jiang @ 2016-01-19  8:29 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Vlastimil Babka, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, linux-kernel, linux-mm, Joonsoo Kim

On 2015/12/21 14:13, Joonsoo Kim wrote:
> There is a performance drop report due to hugepage allocation and in there
> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
> In that workload, compaction is triggered to make hugepage but most of
> pageblocks are un-available for compaction due to pageblock type and
> skip bit so compaction usually fails. Most costly operations in this case
> is to find valid pageblock while scanning whole zone range. To check
> if pageblock is valid to compact, valid pfn within pageblock is required
> and we can obtain it by calling pageblock_pfn_to_page(). This function
> checks whether pageblock is in a single zone and return valid pfn
> if possible. Problem is that we need to check it every time before
> scanning pageblock even if we re-visit it and this turns out to
> be very expensive in this workload.
> 
> Although we have no way to skip this pageblock check in the system
> where hole exists at arbitrary position, we can use cached value for
> zone continuity and just do pfn_to_page() in the system where hole doesn't
> exist. This optimization considerably speeds up in above workload.
> 
> Before vs After
> Max: 1096 MB/s vs 1325 MB/s
> Min: 635 MB/s 1015 MB/s
> Avg: 899 MB/s 1194 MB/s
> 
> Avg is improved by roughly 30% [2].
> 
> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
> [2]: https://lkml.org/lkml/2015/12/9/23
> 
> v2
> o checking zone continuity after initialization
> o handle memory-hotplug case
> 
> Reported and Tested-by: Aaron Lu <aaron.lu@intel.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  include/linux/gfp.h            |  6 ---
>  include/linux/memory_hotplug.h |  3 ++
>  include/linux/mmzone.h         |  2 +
>  mm/compaction.c                | 43 ---------------------
>  mm/internal.h                  | 12 ++++++
>  mm/memory_hotplug.c            | 10 +++++
>  mm/page_alloc.c                | 85 +++++++++++++++++++++++++++++++++++++++++-
>  7 files changed, 111 insertions(+), 50 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 91f74e7..6eb3eca 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -515,13 +515,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
>  void drain_all_pages(struct zone *zone);
>  void drain_local_pages(struct zone *zone);
>  
> -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>  void page_alloc_init_late(void);
> -#else
> -static inline void page_alloc_init_late(void)
> -{
> -}
> -#endif
>  
>  /*
>   * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 2ea574f..18c2676 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -196,6 +196,9 @@ void put_online_mems(void);
>  void mem_hotplug_begin(void);
>  void mem_hotplug_done(void);
>  
> +extern void set_zone_contiguous(struct zone *zone);
> +extern void clear_zone_contiguous(struct zone *zone);
> +
>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>  /*
>   * Stub functions for when hotplug is off
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 68cc063..eb5d88e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -523,6 +523,8 @@ struct zone {
>  	bool			compact_blockskip_flush;
>  #endif
>  
> +	bool			contiguous;
> +
>  	ZONE_PADDING(_pad3_)
>  	/* Zone statistics */
>  	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 56fa321..9c89d46 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -71,49 +71,6 @@ static inline bool migrate_async_suitable(int migratetype)
>  	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
>  }
>  
> -/*
> - * Check that the whole (or subset of) a pageblock given by the interval of
> - * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
> - * with the migration of free compaction scanner. The scanners then need to
> - * use only pfn_valid_within() check for arches that allow holes within
> - * pageblocks.
> - *
> - * Return struct page pointer of start_pfn, or NULL if checks were not passed.
> - *
> - * It's possible on some configurations to have a setup like node0 node1 node0
> - * i.e. it's possible that all pages within a zones range of pages do not
> - * belong to a single zone. We assume that a border between node0 and node1
> - * can occur within a single pageblock, but not a node0 node1 node0
> - * interleaving within a single pageblock. It is therefore sufficient to check
> - * the first and last page of a pageblock and avoid checking each individual
> - * page in a pageblock.
> - */
> -static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> -				unsigned long end_pfn, struct zone *zone)
> -{
> -	struct page *start_page;
> -	struct page *end_page;
> -
> -	/* end_pfn is one past the range we are checking */
> -	end_pfn--;
> -
> -	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
> -		return NULL;
> -
> -	start_page = pfn_to_page(start_pfn);
> -
> -	if (page_zone(start_page) != zone)
> -		return NULL;
> -
> -	end_page = pfn_to_page(end_pfn);
> -
> -	/* This gives a shorter code than deriving page_zone(end_page) */
> -	if (page_zone_id(start_page) != page_zone_id(end_page))
> -		return NULL;
> -
> -	return start_page;
> -}
> -
>  #ifdef CONFIG_COMPACTION
>  
>  /* Do not skip compaction more than 64 times */
> diff --git a/mm/internal.h b/mm/internal.h
> index d01a41c..bc9d337 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -137,6 +137,18 @@ __find_buddy_index(unsigned long page_idx, unsigned int order)
>  	return page_idx ^ (1 << order);
>  }
>  
> +extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> +				unsigned long end_pfn, struct zone *zone);
> +
> +static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> +				unsigned long end_pfn, struct zone *zone)
> +{
> +	if (zone->contiguous)
> +		return pfn_to_page(start_pfn);
> +
> +	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
> +}
> +
>  extern int __isolate_free_page(struct page *page, unsigned int order);
>  extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
>  					unsigned int order);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index d8016a2..f7b6e6b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -505,6 +505,9 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>  	unsigned long i;
>  	int err = 0;
>  	int start_sec, end_sec;
> +
> +	clear_zone_contiguous(zone);
> +
>  	/* during initialize mem_map, align hot-added range to section */
>  	start_sec = pfn_to_section_nr(phys_start_pfn);
>  	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> @@ -523,6 +526,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>  	}
>  	vmemmap_populate_print_last();
>  
> +	set_zone_contiguous(zone);
> +
>  	return err;
>  }
>  EXPORT_SYMBOL_GPL(__add_pages);
> @@ -770,6 +775,8 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>  	resource_size_t start, size;
>  	int ret = 0;
>  
> +	clear_zone_contiguous(zone);
> +
>  	/*
>  	 * We can only remove entire sections
>  	 */
> @@ -796,6 +803,9 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>  		if (ret)
>  			break;
>  	}
> +
> +	set_zone_contiguous(zone);
> +
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(__remove_pages);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bac8842..4f5ad2b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1271,9 +1271,13 @@ free_range:
>  	pgdat_init_report_one_done();
>  	return 0;
>  }
> +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>  
>  void __init page_alloc_init_late(void)
>  {
> +	struct zone *zone;
> +
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>  	int nid;
>  
>  	/* There will be num_node_state(N_MEMORY) threads */
> @@ -1287,8 +1291,87 @@ void __init page_alloc_init_late(void)
>  
>  	/* Reinit limits that are based on free pages after the kernel is up */
>  	files_maxfiles_init();
> +#endif
> +
> +	for_each_populated_zone(zone)
> +		set_zone_contiguous(zone);
> +}
> +
> +/*
> + * Check that the whole (or subset of) a pageblock given by the interval of
> + * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
> + * with the migration of free compaction scanner. The scanners then need to
> + * use only pfn_valid_within() check for arches that allow holes within
> + * pageblocks.
> + *
> + * Return struct page pointer of start_pfn, or NULL if checks were not passed.
> + *
> + * It's possible on some configurations to have a setup like node0 node1 node0
> + * i.e. it's possible that all pages within a zones range of pages do not
> + * belong to a single zone. We assume that a border between node0 and node1
> + * can occur within a single pageblock, but not a node0 node1 node0
> + * interleaving within a single pageblock. It is therefore sufficient to check
> + * the first and last page of a pageblock and avoid checking each individual
> + * page in a pageblock.
> + */
> +struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> +				unsigned long end_pfn, struct zone *zone)
> +{
> +	struct page *start_page;
> +	struct page *end_page;
> +
> +	/* end_pfn is one past the range we are checking */
> +	end_pfn--;
> +
> +	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
> +		return NULL;
> +
> +	start_page = pfn_to_page(start_pfn);
> +
> +	if (page_zone(start_page) != zone)
> +		return NULL;
> +
> +	end_page = pfn_to_page(end_pfn);
> +
> +	/* This gives a shorter code than deriving page_zone(end_page) */
> +	if (page_zone_id(start_page) != page_zone_id(end_page))
> +		return NULL;
> +
> +	return start_page;
> +}
> +
> +void set_zone_contiguous(struct zone *zone)
> +{
> +	unsigned long block_start_pfn = zone->zone_start_pfn;
> +	unsigned long block_end_pfn;
> +	unsigned long pfn;
> +
> +	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
> +	for (; block_start_pfn < zone_end_pfn(zone);
> +		block_start_pfn = block_end_pfn,
> +		block_end_pfn += pageblock_nr_pages) {
> +
> +		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> +
> +		if (!__pageblock_pfn_to_page(block_start_pfn,
> +					block_end_pfn, zone))
> +			return;
> +
> +		/* Check validity of pfn within pageblock */
> +		for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
> +			if (!pfn_valid_within(pfn))
> +				return;
> +		}
> +	}
> +
> +	/* We confirm that there is no hole */
> +	zone->contiguous = true;
> +}
pfn_valid_within() just checks whether the page frame has a valid
section, but if this section has a hole, it will not work.
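
(For reference, roughly how these checks are defined in
include/linux/mmzone.h of this era; paraphrased from memory, not an exact
copy:)

	#ifdef CONFIG_HOLES_IN_ZONE
	/* arch may have holes below MAX_ORDER/pageblock granularity */
	#define pfn_valid_within(pfn)	pfn_valid(pfn)
	#else
	#define pfn_valid_within(pfn)	(1)
	#endif

	/* with CONFIG_SPARSEMEM, pfn_valid() is section-granular */
	static inline int pfn_valid(unsigned long pfn)
	{
		if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
			return 0;
		return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
	}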

Thanks
zhongjiang


> +void clear_zone_contiguous(struct zone *zone)
> +{
> +	zone->contiguous = false;
>  }
> -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>  
>  #ifdef CONFIG_CMA
>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-15  1:06         ` Aaron Lu
@ 2015-12-15  8:24           ` Vlastimil Babka
  -1 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2015-12-15  8:24 UTC (permalink / raw)
  To: Aaron Lu, Joonsoo Kim
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, David Rientjes, LKML,
	Linux Memory Management List, Joonsoo Kim

On 12/15/2015 02:06 AM, Aaron Lu wrote:
> On 12/14/2015 11:25 PM, Joonsoo Kim wrote:
>> 2015-12-14 19:29 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
>>> Unless I'm mistaken, these results also include my RFC series (Aaron can you
>>> clarify?). These patches should better be tested standalone on top of base,
>>> as being simpler they will probably be included sooner (the RFC series needs
>>> reviews at the very least :) - although the memory hotplug concerns might
>>> make the "sooner" here relative too.
>>
>> AFAIK, these patches are tested standalone on top of base. When I sent it,
>> I asked to Aaron to test it on top of base.
>
> Right, it is tested standalone on top of base.

Thanks, sorry about the noise then.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-14 15:25       ` Joonsoo Kim
@ 2015-12-15  1:06         ` Aaron Lu
  -1 siblings, 0 replies; 34+ messages in thread
From: Aaron Lu @ 2015-12-15  1:06 UTC (permalink / raw)
  To: Joonsoo Kim, Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, David Rientjes, LKML,
	Linux Memory Management List, Joonsoo Kim

On 12/14/2015 11:25 PM, Joonsoo Kim wrote:
> 2015-12-14 19:29 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
>> On 12/14/2015 06:02 AM, Joonsoo Kim wrote:
>>> Before vs After
>>> Max: 1096 MB/s vs 1325 MB/s
>>> Min: 635 MB/s 1015 MB/s
>>> Avg: 899 MB/s 1194 MB/s
>>>
>>> Avg is improved by roughly 30% [2].
>>
>>
>> Unless I'm mistaken, these results also include my RFC series (Aaron can you
>> clarify?). These patches should better be tested standalone on top of base,
>> as being simpler they will probably be included sooner (the RFC series needs
>> reviews at the very least :) - although the memory hotplug concerns might
>> make the "sooner" here relative too.
> 
> AFAIK, these patches are tested standalone on top of base. When I sent it,
> I asked to Aaron to test it on top of base.

Right, it is tested standalone on top of base.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-14 10:29     ` Vlastimil Babka
@ 2015-12-14 15:25       ` Joonsoo Kim
  -1 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2015-12-14 15:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, LKML, Linux Memory Management List, Joonsoo Kim

2015-12-14 19:29 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 12/14/2015 06:02 AM, Joonsoo Kim wrote:
>>
>> There is a performance drop report due to hugepage allocation and in there
>> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
>> In that workload, compaction is triggered to make hugepage but most of
>> pageblocks are un-available for compaction due to pageblock type and
>> skip bit so compaction usually fails. Most costly operations in this case
>> is to find valid pageblock while scanning whole zone range. To check
>> if pageblock is valid to compact, valid pfn within pageblock is required
>> and we can obtain it by calling pageblock_pfn_to_page(). This function
>> checks whether pageblock is in a single zone and return valid pfn
>> if possible. Problem is that we need to check it every time before
>> scanning pageblock even if we re-visit it and this turns out to
>> be very expensive in this workload.
>
>
> Hm I wonder if this is safe wrt memory hotplug? Shouldn't there be a
> rechecking plugged into the appropriate hotplug add/remove callbacks? Which
> would make the whole thing generic too, zone->contiguous information doesn't
> have to be limited to compaction. And it would remove the rather ugly part
> where cached pfn info is used as an indication of zone->contiguous being
> already set...

Will check it.

>> Although we have no way to skip this pageblock check in the system
>> where hole exists at arbitrary position, we can use cached value for
>> zone continuity and just do pfn_to_page() in the system where hole doesn't
>> exist. This optimization considerably speeds up in above workload.
>>
>> Before vs After
>> Max: 1096 MB/s vs 1325 MB/s
>> Min: 635 MB/s 1015 MB/s
>> Avg: 899 MB/s 1194 MB/s
>>
>> Avg is improved by roughly 30% [2].
>
>
> Unless I'm mistaken, these results also include my RFC series (Aaron can you
> clarify?). These patches should better be tested standalone on top of base,
> as being simpler they will probably be included sooner (the RFC series needs
> reviews at the very least :) - although the memory hotplug concerns might
> make the "sooner" here relative too.

AFAIK, these patches were tested standalone on top of base. When I sent
them, I asked Aaron to test them on top of base.

Btw, I missed adding a Reported-and-Tested-by tag for Aaron. I will add it
in the next spin.

> Anyway it's interesting that this patch improved "Min", and variance in
> general (on top of my RFC) so much. I would expect the overhead of
> pageblock_pfn_to_page() to be quite stable, hmm.

Perhaps pageblock_pfn_to_page() itself would be stable. The combination of
slow scanning and kswapd's skip-bit flushing would give unstable results.

Thanks.

>
>> Not to disturb the system where compaction isn't triggered, checking will
>> be done at first compaction invocation.
>>
>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
>> [2]: https://lkml.org/lkml/2015/12/9/23
>>
>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> ---
>>   include/linux/mmzone.h |  1 +
>>   mm/compaction.c        | 49
>> ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   2 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 68cc063..cd3736e 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -521,6 +521,7 @@ struct zone {
>>   #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>         /* Set to true when the PG_migrate_skip bits should be cleared */
>>         bool                    compact_blockskip_flush;
>> +       bool                    contiguous;
>>   #endif
>>
>>         ZONE_PADDING(_pad3_)
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 56fa321..ce60b38 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -88,7 +88,7 @@ static inline bool migrate_async_suitable(int
>> migratetype)
>>    * the first and last page of a pageblock and avoid checking each
>> individual
>>    * page in a pageblock.
>>    */
>> -static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>> +static struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>>                                 unsigned long end_pfn, struct zone *zone)
>>   {
>>         struct page *start_page;
>> @@ -114,6 +114,51 @@ static struct page *pageblock_pfn_to_page(unsigned
>> long start_pfn,
>>         return start_page;
>>   }
>>
>> +static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>> +                               unsigned long end_pfn, struct zone *zone)
>> +{
>> +       if (zone->contiguous)
>> +               return pfn_to_page(start_pfn);
>> +
>> +       return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
>> +}
>> +
>> +static void check_zone_contiguous(struct zone *zone)
>> +{
>> +       unsigned long block_start_pfn = zone->zone_start_pfn;
>> +       unsigned long block_end_pfn;
>> +       unsigned long pfn;
>> +
>> +       /* Already initialized if cached pfn is non-zero */
>> +       if (zone->compact_cached_migrate_pfn[0] ||
>> +               zone->compact_cached_free_pfn)
>> +               return;
>> +
>> +       /* Mark that checking is in progress */
>> +       zone->compact_cached_free_pfn = ULONG_MAX;
>> +
>> +       block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
>> +       for (; block_start_pfn < zone_end_pfn(zone);
>> +               block_start_pfn = block_end_pfn,
>> +               block_end_pfn += pageblock_nr_pages) {
>> +
>> +               block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
>> +
>> +               if (!__pageblock_pfn_to_page(block_start_pfn,
>> +                                       block_end_pfn, zone))
>> +                       return;
>> +
>> +               /* Check validity of pfn within pageblock */
>> +               for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
>> +                       if (!pfn_valid_within(pfn))
>> +                               return;
>> +               }
>> +       }
>> +
>> +       /* We confirm that there is no hole */
>> +       zone->contiguous = true;
>> +}
>> +
>>   #ifdef CONFIG_COMPACTION
>>
>>   /* Do not skip compaction more than 64 times */
>> @@ -1357,6 +1402,8 @@ static int compact_zone(struct zone *zone, struct
>> compact_control *cc)
>>                 ;
>>         }
>>
>> +       check_zone_contiguous(zone);
>> +
>>         /*
>>          * Clear pageblock skip if there were failures recently and
>> compaction
>>          * is about to be retried after being deferred. kswapd does not do
>>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-14  5:02   ` Joonsoo Kim
@ 2015-12-14 10:29     ` Vlastimil Babka
  -1 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2015-12-14 10:29 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Aaron Lu, Mel Gorman, Rik van Riel, David Rientjes, linux-kernel,
	linux-mm, Joonsoo Kim

On 12/14/2015 06:02 AM, Joonsoo Kim wrote:
> There is a performance drop report due to hugepage allocation and in there
> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
> In that workload, compaction is triggered to make hugepage but most of
> pageblocks are un-available for compaction due to pageblock type and
> skip bit so compaction usually fails. Most costly operations in this case
> is to find valid pageblock while scanning whole zone range. To check
> if pageblock is valid to compact, valid pfn within pageblock is required
> and we can obtain it by calling pageblock_pfn_to_page(). This function
> checks whether pageblock is in a single zone and return valid pfn
> if possible. Problem is that we need to check it every time before
> scanning pageblock even if we re-visit it and this turns out to
> be very expensive in this workload.

Hm, I wonder if this is safe wrt memory hotplug? Shouldn't a re-check be 
plugged into the appropriate hotplug add/remove callbacks? That would also 
make the whole thing generic; the zone->contiguous information doesn't 
have to be limited to compaction. And it would remove the rather ugly part 
where the cached pfn info is used as an indication of zone->contiguous 
already being set...
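
(For context, a minimal sketch of the kind of hook being suggested here;
v2 of the patch, quoted earlier in this archive, ended up bracketing the
hotplug paths in roughly this way. The do_add_sections() helper below is
made up to stand in for the existing section-adding loop.)

	/* mm/memory_hotplug.c, sketch only */
	int __ref __add_pages(int nid, struct zone *zone,
			      unsigned long phys_start_pfn,
			      unsigned long nr_pages)
	{
		int err;

		/* zone layout is about to change: assume holes */
		clear_zone_contiguous(zone);

		err = do_add_sections(nid, zone, phys_start_pfn, nr_pages);

		/* re-walk the zone; set zone->contiguous if no hole found */
		set_zone_contiguous(zone);

		return err;
	}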

> Although we have no way to skip this pageblock check in the system
> where hole exists at arbitrary position, we can use cached value for
> zone continuity and just do pfn_to_page() in the system where hole doesn't
> exist. This optimization considerably speeds up in above workload.
>
> Before vs After
> Max: 1096 MB/s vs 1325 MB/s
> Min: 635 MB/s 1015 MB/s
> Avg: 899 MB/s 1194 MB/s
>
> Avg is improved by roughly 30% [2].

Unless I'm mistaken, these results also include my RFC series (Aaron, can 
you clarify?). These patches should rather be tested standalone on top 
of base, as, being simpler, they will probably be included sooner (the RFC 
series needs reviews at the very least :) - although the memory hotplug 
concerns might make the "sooner" here relative too.

Anyway, it's interesting that this patch improved "Min", and the variance 
in general (on top of my RFC), by so much. I would expect the overhead of 
pageblock_pfn_to_page() to be quite stable, hmm.

> Not to disturb the system where compaction isn't triggered, checking will
> be done at first compaction invocation.
>
> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
> [2]: https://lkml.org/lkml/2015/12/9/23
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>   include/linux/mmzone.h |  1 +
>   mm/compaction.c        | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
>   2 files changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 68cc063..cd3736e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -521,6 +521,7 @@ struct zone {
>   #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>   	/* Set to true when the PG_migrate_skip bits should be cleared */
>   	bool			compact_blockskip_flush;
> +	bool			contiguous;
>   #endif
>
>   	ZONE_PADDING(_pad3_)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 56fa321..ce60b38 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -88,7 +88,7 @@ static inline bool migrate_async_suitable(int migratetype)
>    * the first and last page of a pageblock and avoid checking each individual
>    * page in a pageblock.
>    */
> -static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> +static struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>   				unsigned long end_pfn, struct zone *zone)
>   {
>   	struct page *start_page;
> @@ -114,6 +114,51 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>   	return start_page;
>   }
>
> +static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> +				unsigned long end_pfn, struct zone *zone)
> +{
> +	if (zone->contiguous)
> +		return pfn_to_page(start_pfn);
> +
> +	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
> +}
> +
> +static void check_zone_contiguous(struct zone *zone)
> +{
> +	unsigned long block_start_pfn = zone->zone_start_pfn;
> +	unsigned long block_end_pfn;
> +	unsigned long pfn;
> +
> +	/* Already initialized if cached pfn is non-zero */
> +	if (zone->compact_cached_migrate_pfn[0] ||
> +		zone->compact_cached_free_pfn)
> +		return;
> +
> +	/* Mark that checking is in progress */
> +	zone->compact_cached_free_pfn = ULONG_MAX;
> +
> +	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
> +	for (; block_start_pfn < zone_end_pfn(zone);
> +		block_start_pfn = block_end_pfn,
> +		block_end_pfn += pageblock_nr_pages) {
> +
> +		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> +
> +		if (!__pageblock_pfn_to_page(block_start_pfn,
> +					block_end_pfn, zone))
> +			return;
> +
> +		/* Check validity of pfn within pageblock */
> +		for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
> +			if (!pfn_valid_within(pfn))
> +				return;
> +		}
> +	}
> +
> +	/* We confirm that there is no hole */
> +	zone->contiguous = true;
> +}
> +
>   #ifdef CONFIG_COMPACTION
>
>   /* Do not skip compaction more than 64 times */
> @@ -1357,6 +1402,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>   		;
>   	}
>
> +	check_zone_contiguous(zone);
> +
>   	/*
>   	 * Clear pageblock skip if there were failures recently and compaction
>   	 * is about to be retried after being deferred. kswapd does not do
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
@ 2015-12-14 10:29     ` Vlastimil Babka
  0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2015-12-14 10:29 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Aaron Lu, Mel Gorman, Rik van Riel, David Rientjes, linux-kernel,
	linux-mm, Joonsoo Kim

On 12/14/2015 06:02 AM, Joonsoo Kim wrote:
> There is a performance drop report due to hugepage allocation and in there
> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
> In that workload, compaction is triggered to make hugepage but most of
> pageblocks are un-available for compaction due to pageblock type and
> skip bit so compaction usually fails. Most costly operations in this case
> is to find valid pageblock while scanning whole zone range. To check
> if pageblock is valid to compact, valid pfn within pageblock is required
> and we can obtain it by calling pageblock_pfn_to_page(). This function
> checks whether pageblock is in a single zone and return valid pfn
> if possible. Problem is that we need to check it every time before
> scanning pageblock even if we re-visit it and this turns out to
> be very expensive in this workload.

Hm I wonder if this is safe wrt memory hotplug? Shouldn't there be a 
rechecking plugged into the appropriate hotplug add/remove callbacks? 
Which would make the whole thing generic too, zone->contiguous 
information doesn't have to be limited to compaction. And it would 
remove the rather ugly part where cached pfn info is used as an 
indication of zone->contiguous being already set...

> Although we have no way to skip this pageblock check in the system
> where hole exists at arbitrary position, we can use cached value for
> zone continuity and just do pfn_to_page() in the system where hole doesn't
> exist. This optimization considerably speeds up in above workload.
>
> Before vs After
> Max: 1096 MB/s vs 1325 MB/s
> Min: 635 MB/s 1015 MB/s
> Avg: 899 MB/s 1194 MB/s
>
> Avg is improved by roughly 30% [2].

Unless I'm mistaken, these results also include my RFC series (Aaron can 
you clarify?). These patches should better be tested standalone on top 
of base, as being simpler they will probably be included sooner (the RFC 
series needs reviews at the very least :) - although the memory hotplug 
concerns might make the "sooner" here relative too.

Anyway it's interesting that this patch improved "Min", and variance in 
general (on top of my RFC) so much. I would expect the overhead of 
pageblock_pfn_to_page() to be quite stable, hmm.

> Not to disturb the system where compaction isn't triggered, checking will
> be done at first compaction invocation.
>
> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
> [2]: https://lkml.org/lkml/2015/12/9/23
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>   include/linux/mmzone.h |  1 +
>   mm/compaction.c        | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
>   2 files changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 68cc063..cd3736e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -521,6 +521,7 @@ struct zone {
>   #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>   	/* Set to true when the PG_migrate_skip bits should be cleared */
>   	bool			compact_blockskip_flush;
> +	bool			contiguous;
>   #endif
>
>   	ZONE_PADDING(_pad3_)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 56fa321..ce60b38 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -88,7 +88,7 @@ static inline bool migrate_async_suitable(int migratetype)
>    * the first and last page of a pageblock and avoid checking each individual
>    * page in a pageblock.
>    */
> -static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> +static struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>   				unsigned long end_pfn, struct zone *zone)
>   {
>   	struct page *start_page;
> @@ -114,6 +114,51 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>   	return start_page;
>   }
>
> +static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> +				unsigned long end_pfn, struct zone *zone)
> +{
> +	if (zone->contiguous)
> +		return pfn_to_page(start_pfn);
> +
> +	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
> +}
> +
> +static void check_zone_contiguous(struct zone *zone)
> +{
> +	unsigned long block_start_pfn = zone->zone_start_pfn;
> +	unsigned long block_end_pfn;
> +	unsigned long pfn;
> +
> +	/* Already initialized if cached pfn is non-zero */
> +	if (zone->compact_cached_migrate_pfn[0] ||
> +		zone->compact_cached_free_pfn)
> +		return;
> +
> +	/* Mark that checking is in progress */
> +	zone->compact_cached_free_pfn = ULONG_MAX;
> +
> +	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
> +	for (; block_start_pfn < zone_end_pfn(zone);
> +		block_start_pfn = block_end_pfn,
> +		block_end_pfn += pageblock_nr_pages) {
> +
> +		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> +
> +		if (!__pageblock_pfn_to_page(block_start_pfn,
> +					block_end_pfn, zone))
> +			return;
> +
> +		/* Check validity of pfn within pageblock */
> +		for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
> +			if (!pfn_valid_within(pfn))
> +				return;
> +		}
> +	}
> +
> +	/* We confirm that there is no hole */
> +	zone->contiguous = true;
> +}
> +
>   #ifdef CONFIG_COMPACTION
>
>   /* Do not skip compaction more than 64 times */
> @@ -1357,6 +1402,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>   		;
>   	}
>
> +	check_zone_contiguous(zone);
> +
>   	/*
>   	 * Clear pageblock skip if there were failures recently and compaction
>   	 * is about to be retried after being deferred. kswapd does not do
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2015-12-14  5:02 Joonsoo Kim
@ 2015-12-14  5:02   ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2015-12-14  5:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, linux-kernel, linux-mm, Joonsoo Kim

There is a report of a performance drop during hugepage allocation,
where half of the cpu time is spent in pageblock_pfn_to_page() during
compaction [1]. In that workload, compaction is triggered to make
hugepages, but most pageblocks are unavailable for compaction due to
their pageblock type and skip bit, so compaction usually fails. The
most costly operation in this case is finding a valid pageblock while
scanning the whole zone range. To check whether a pageblock is valid
to compact, a valid pfn within the pageblock is required, and we can
obtain it by calling pageblock_pfn_to_page(). This function checks
whether the pageblock lies within a single zone and returns a valid
pfn if possible. The problem is that we need to perform this check
every time before scanning a pageblock, even when we re-visit it, and
this turns out to be very expensive in this workload.
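
For readers without the tree at hand, the check in question looks
roughly like this at the time of this series (paraphrased for
illustration; the diff below only renames it to
__pageblock_pfn_to_page()). The cost is two pfn_valid() lookups and
two page_zone() comparisons per pageblock, repeated on every scan:

static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
				unsigned long end_pfn, struct zone *zone)
{
	struct page *start_page;
	struct page *end_page;

	/* end_pfn is one past the range we are checking */
	end_pfn--;

	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
		return NULL;

	start_page = pfn_to_page(start_pfn);
	if (page_zone(start_page) != zone)
		return NULL;

	end_page = pfn_to_page(end_pfn);
	/* Both ends must be in the same zone for the block to be used */
	if (page_zone(start_page) != page_zone(end_page))
		return NULL;

	return start_page;
}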

Although we have no way to skip this pageblock check on a system where
holes exist at arbitrary positions, we can cache the zone's contiguity
and just do pfn_to_page() on a system where no hole exists. This
optimization considerably speeds up the above workload.

Before vs After
Max: 1096 MB/s vs 1325 MB/s
Min: 635 MB/s vs 1015 MB/s
Avg: 899 MB/s vs 1194 MB/s

Avg is improved by roughly 30% [2].

To avoid disturbing systems where compaction is never triggered, the
check is done at the first compaction invocation.

[1]: http://www.spinics.net/lists/linux-mm/msg97378.html
[2]: https://lkml.org/lkml/2015/12/9/23

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/mmzone.h |  1 +
 mm/compaction.c        | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 68cc063..cd3736e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -521,6 +521,7 @@ struct zone {
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 	/* Set to true when the PG_migrate_skip bits should be cleared */
 	bool			compact_blockskip_flush;
+	bool			contiguous;
 #endif
 
 	ZONE_PADDING(_pad3_)
diff --git a/mm/compaction.c b/mm/compaction.c
index 56fa321..ce60b38 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -88,7 +88,7 @@ static inline bool migrate_async_suitable(int migratetype)
  * the first and last page of a pageblock and avoid checking each individual
  * page in a pageblock.
  */
-static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
+static struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
 				unsigned long end_pfn, struct zone *zone)
 {
 	struct page *start_page;
@@ -114,6 +114,51 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 	return start_page;
 }
 
+static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
+				unsigned long end_pfn, struct zone *zone)
+{
+	if (zone->contiguous)
+		return pfn_to_page(start_pfn);
+
+	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
+}
+
+static void check_zone_contiguous(struct zone *zone)
+{
+	unsigned long block_start_pfn = zone->zone_start_pfn;
+	unsigned long block_end_pfn;
+	unsigned long pfn;
+
+	/* Already initialized if cached pfn is non-zero */
+	if (zone->compact_cached_migrate_pfn[0] ||
+		zone->compact_cached_free_pfn)
+		return;
+
+	/* Mark that checking is in progress */
+	zone->compact_cached_free_pfn = ULONG_MAX;
+
+	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
+	for (; block_start_pfn < zone_end_pfn(zone);
+		block_start_pfn = block_end_pfn,
+		block_end_pfn += pageblock_nr_pages) {
+
+		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
+
+		if (!__pageblock_pfn_to_page(block_start_pfn,
+					block_end_pfn, zone))
+			return;
+
+		/* Check validity of pfn within pageblock */
+		for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
+			if (!pfn_valid_within(pfn))
+				return;
+		}
+	}
+
+	/* We confirm that there is no hole */
+	zone->contiguous = true;
+}
+
 #ifdef CONFIG_COMPACTION
 
 /* Do not skip compaction more than 64 times */
@@ -1357,6 +1402,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		;
 	}
 
+	check_zone_contiguous(zone);
+
 	/*
 	 * Clear pageblock skip if there were failures recently and compaction
 	 * is about to be retried after being deferred. kswapd does not do
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-01-19  8:32 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-21  6:13 [PATCH 1/2] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn Joonsoo Kim
2015-12-21  6:13 ` Joonsoo Kim
2015-12-21  6:13 ` [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous Joonsoo Kim
2015-12-21  6:13   ` Joonsoo Kim
2015-12-21 10:46   ` Vlastimil Babka
2015-12-21 10:46     ` Vlastimil Babka
2015-12-21 12:18     ` Joonsoo Kim
2015-12-21 12:18       ` Joonsoo Kim
2015-12-21 12:38       ` Joonsoo Kim
2015-12-21 12:38         ` Joonsoo Kim
2015-12-22 22:17   ` David Rientjes
2015-12-22 22:17     ` David Rientjes
2015-12-23  6:14     ` Vlastimil Babka
2015-12-23  6:14       ` Vlastimil Babka
2015-12-23  6:57       ` Joonsoo Kim
2015-12-23  6:57         ` Joonsoo Kim
2016-01-04 12:38         ` Vlastimil Babka
2016-01-04 12:38           ` Vlastimil Babka
2016-01-08  2:52           ` Joonsoo Kim
2016-01-08  2:52             ` Joonsoo Kim
2016-01-19  8:29   ` zhong jiang
2016-01-19  8:29     ` zhong jiang
2015-12-22 22:05 ` [PATCH 1/2] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn David Rientjes
2015-12-22 22:05   ` David Rientjes
  -- strict thread matches above, loose matches on Subject: below --
2015-12-14  5:02 Joonsoo Kim
2015-12-14  5:02 ` [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous Joonsoo Kim
2015-12-14  5:02   ` Joonsoo Kim
2015-12-14 10:29   ` Vlastimil Babka
2015-12-14 10:29     ` Vlastimil Babka
2015-12-14 15:25     ` Joonsoo Kim
2015-12-14 15:25       ` Joonsoo Kim
2015-12-15  1:06       ` Aaron Lu
2015-12-15  1:06         ` Aaron Lu
2015-12-15  8:24         ` Vlastimil Babka
2015-12-15  8:24           ` Vlastimil Babka
