* [RFC PATCH 0/14] Parallel struct page initialisation v5r4
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

Changelog since v1
o Always initialise low zones
o Typo corrections
o Rename parallel mem init to parallel struct page init
o Rebase to 4.0

Struct page initialisation has been identified as one of the reasons why
large machines take a long time to boot. Patches were posted a long time
ago to defer initialisation until the memory was first used. That approach
was rejected on the grounds that it should not be necessary to hurt the
fast paths. This series reuses much of the work from that time but defers
the initialisation of memory to kswapd so that one thread per node
initialises the memory local to that node.
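
To make the shape of the series easier to follow, here is a rough sketch
of where the deferred work ends up running. It is illustrative only: the
helper name is hypothetical and the real hook is added by the later
patches in the series that touch mm/vmscan.c.

  /* Sketch only: each node's kswapd initialises its local deferred memmap */
  static int kswapd(void *p)
  {
  	pg_data_t *pgdat = (pg_data_t *)p;

  	/* Hypothetical helper: initialise struct pages deferred at boot */
  	deferred_init_memmap(pgdat->node_id);

  	/* ... the existing reclaim loop then continues unchanged ... */
  	return 0;
  }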

After applying the series and setting the appropriate Kconfig variable I
see this in the boot log on a 64G machine

[    7.383764] kswapd 0 initialised deferred memory in 188ms
[    7.404253] kswapd 1 initialised deferred memory in 208ms
[    7.411044] kswapd 3 initialised deferred memory in 216ms
[    7.411551] kswapd 2 initialised deferred memory in 216ms

On a 1TB machine, I see

[   11.913324] kswapd 0 initialised deferred memory in 1168ms
[   12.220011] kswapd 2 initialised deferred memory in 1476ms
[   12.245369] kswapd 3 initialised deferred memory in 1500ms
[   12.271680] kswapd 1 initialised deferred memory in 1528ms

Once booted, the machine appears to work as normal. Boot times were
measured from the time shutdown was called until ssh was available again.
In the 64G case, the boot time savings were negligible. On the 1TB
machine, the savings were 16 seconds.

It would be nice if people who have access to really large machines would
test this series and report back on whether the complexity is justified.

Patches are against 4.0.

 Documentation/kernel-parameters.txt |   6 +
 arch/ia64/mm/numa.c                 |  19 +-
 arch/x86/Kconfig                    |   1 +
 include/linux/memblock.h            |  18 ++
 include/linux/mm.h                  |   8 +-
 include/linux/mmzone.h              |  45 ++--
 init/main.c                         |   1 +
 mm/Kconfig                          |  28 +++
 mm/bootmem.c                        |   8 +-
 mm/internal.h                       |  23 +-
 mm/memblock.c                       |  34 ++-
 mm/mm_init.c                        |   9 +-
 mm/nobootmem.c                      |   7 +-
 mm/page_alloc.c                     | 408 +++++++++++++++++++++++++++++++-----
 mm/vmscan.c                         |   6 +-
 15 files changed, 514 insertions(+), 107 deletions(-)

-- 
2.1.2


* [PATCH 01/13] memblock: Introduce a for_each_reserved_mem_region iterator.
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

From: Robin Holt <holt@sgi.com>

As part of initializing struct pages in 2MiB chunks, we noticed that
at the end of free_all_bootmem() there was nothing that forced the
reserved/allocated 4KiB pages to be initialized.

This helper function will be used for that purpose.
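
As an illustration of the interface (not part of this patch), a caller
could walk the reserved regions like this; the loop variable and the
out-parameters match the declarations in the hunks below:

  phys_addr_t start, end;
  u64 i;

  /* Print the physical range of every reserved memblock region */
  for_each_reserved_mem_region(i, &start, &end)
  	pr_info("reserved: %pa..%pa\n", &start, &end);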

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nate Zimmer <nzimmer@sgi.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/memblock.h | 18 ++++++++++++++++++
 mm/memblock.c            | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e8cc45307f8f..3075e7673c54 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -93,6 +93,9 @@ void __next_mem_range_rev(u64 *idx, int nid, struct memblock_type *type_a,
 			  struct memblock_type *type_b, phys_addr_t *out_start,
 			  phys_addr_t *out_end, int *out_nid);
 
+void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
+			       phys_addr_t *out_end);
+
 /**
  * for_each_mem_range - iterate through memblock areas from type_a and not
  * included in type_b. Or just type_a if type_b is NULL.
@@ -132,6 +135,21 @@ void __next_mem_range_rev(u64 *idx, int nid, struct memblock_type *type_a,
 	     __next_mem_range_rev(&i, nid, type_a, type_b,		\
 				  p_start, p_end, p_nid))
 
+/**
+ * for_each_reserved_mem_region - iterate over all reserved memblock areas
+ * @i: u64 used as loop variable
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ *
+ * Walks over reserved areas of memblock. Available as soon as memblock
+ * is initialized.
+ */
+#define for_each_reserved_mem_region(i, p_start, p_end)			\
+	for (i = 0UL,							\
+	     __next_reserved_mem_region(&i, p_start, p_end);		\
+	     i != (u64)ULLONG_MAX;					\
+	     __next_reserved_mem_region(&i, p_start, p_end))
+
 #ifdef CONFIG_MOVABLE_NODE
 static inline bool memblock_is_hotpluggable(struct memblock_region *m)
 {
diff --git a/mm/memblock.c b/mm/memblock.c
index 252b77bdf65e..e0cc2d174f74 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -765,6 +765,38 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
 }
 
 /**
+ * __next_reserved_mem_region - next function for for_each_reserved_region()
+ * @idx: pointer to u64 loop variable
+ * @out_start: ptr to phys_addr_t for start address of the region, can be %NULL
+ * @out_end: ptr to phys_addr_t for end address of the region, can be %NULL
+ *
+ * Iterate over all reserved memory regions.
+ */
+void __init_memblock __next_reserved_mem_region(u64 *idx,
+					   phys_addr_t *out_start,
+					   phys_addr_t *out_end)
+{
+	struct memblock_type *rsv = &memblock.reserved;
+
+	if (*idx >= 0 && *idx < rsv->cnt) {
+		struct memblock_region *r = &rsv->regions[*idx];
+		phys_addr_t base = r->base;
+		phys_addr_t size = r->size;
+
+		if (out_start)
+			*out_start = base;
+		if (out_end)
+			*out_end = base + size - 1;
+
+		*idx += 1;
+		return;
+	}
+
+	/* signal end of iteration */
+	*idx = ULLONG_MAX;
+}
+
+/**
  * __next__mem_range - next function for for_each_free_mem_range() etc.
  * @idx: pointer to u64 loop variable
  * @nid: node selector, %NUMA_NO_NODE for all nodes
-- 
2.1.2


* [PATCH 02/13] mm: meminit: Move page initialization into a separate function.
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

From: Robin Holt <holt@sgi.com>

Currently, memmap_init_zone() has all the smarts for initializing a single
page. A subset of this is required for parallel page initialisation and so
this patch breaks up the monolithic function in preparation.
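
For illustration, once the per-page work is factored out it can be reused
outside memmap_init_zone(), for example by a deferred initialiser later in
boot. This is a sketch only; the function name is hypothetical and the
zone index and nid are assumed to be known by the caller:

  /* Sketch: initialise an arbitrary deferred PFN range for one node */
  static void __meminit init_deferred_range(unsigned long start_pfn,
  		unsigned long end_pfn, unsigned long zone, int nid)
  {
  	unsigned long pfn;

  	for (pfn = start_pfn; pfn < end_pfn; pfn++)
  		__init_single_pfn(pfn, zone, nid);
  }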

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 79 +++++++++++++++++++++++++++++++++------------------------
 1 file changed, 46 insertions(+), 33 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 40e29429e7b0..fd7a6d09062d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -778,6 +778,51 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	return 0;
 }
 
+static void __meminit __init_single_page(struct page *page, unsigned long pfn,
+				unsigned long zone, int nid)
+{
+	struct zone *z = &NODE_DATA(nid)->node_zones[zone];
+
+	set_page_links(page, zone, nid, pfn);
+	mminit_verify_page_links(page, zone, nid, pfn);
+	init_page_count(page);
+	page_mapcount_reset(page);
+	page_cpupid_reset_last(page);
+	SetPageReserved(page);
+
+	/*
+	 * Mark the block movable so that blocks are reserved for
+	 * movable at startup. This will force kernel allocations
+	 * to reserve their blocks rather than leaking throughout
+	 * the address space during boot when many long-lived
+	 * kernel allocations are made. Later some blocks near
+	 * the start are marked MIGRATE_RESERVE by
+	 * setup_zone_migrate_reserve()
+	 *
+	 * bitmap is created for zone's valid pfn range. but memmap
+	 * can be created for invalid pages (for alignment)
+	 * check here not to call set_pageblock_migratetype() against
+	 * pfn out of zone.
+	 */
+	if ((z->zone_start_pfn <= pfn)
+	    && (pfn < zone_end_pfn(z))
+	    && !(pfn & (pageblock_nr_pages - 1)))
+		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+
+	INIT_LIST_HEAD(&page->lru);
+#ifdef WANT_PAGE_VIRTUAL
+	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
+	if (!is_highmem_idx(zone))
+		set_page_address(page, __va(pfn << PAGE_SHIFT));
+#endif
+}
+
+static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone,
+					int nid)
+{
+	return __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
+}
+
 static bool free_pages_prepare(struct page *page, unsigned int order)
 {
 	bool compound = PageCompound(page);
@@ -4124,7 +4169,6 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		unsigned long start_pfn, enum memmap_context context)
 {
-	struct page *page;
 	unsigned long end_pfn = start_pfn + size;
 	unsigned long pfn;
 	struct zone *z;
@@ -4145,38 +4189,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			if (!early_pfn_in_nid(pfn, nid))
 				continue;
 		}
-		page = pfn_to_page(pfn);
-		set_page_links(page, zone, nid, pfn);
-		mminit_verify_page_links(page, zone, nid, pfn);
-		init_page_count(page);
-		page_mapcount_reset(page);
-		page_cpupid_reset_last(page);
-		SetPageReserved(page);
-		/*
-		 * Mark the block movable so that blocks are reserved for
-		 * movable at startup. This will force kernel allocations
-		 * to reserve their blocks rather than leaking throughout
-		 * the address space during boot when many long-lived
-		 * kernel allocations are made. Later some blocks near
-		 * the start are marked MIGRATE_RESERVE by
-		 * setup_zone_migrate_reserve()
-		 *
-		 * bitmap is created for zone's valid pfn range. but memmap
-		 * can be created for invalid pages (for alignment)
-		 * check here not to call set_pageblock_migratetype() against
-		 * pfn out of zone.
-		 */
-		if ((z->zone_start_pfn <= pfn)
-		    && (pfn < zone_end_pfn(z))
-		    && !(pfn & (pageblock_nr_pages - 1)))
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-
-		INIT_LIST_HEAD(&page->lru);
-#ifdef WANT_PAGE_VIRTUAL
-		/* The shift won't overflow because ZONE_NORMAL is below 4G. */
-		if (!is_highmem_idx(zone))
-			set_page_address(page, __va(pfn << PAGE_SHIFT));
-#endif
+		__init_single_pfn(pfn, zone, nid);
 	}
 }
 
-- 
2.1.2


* [PATCH 03/13] mm: meminit: Only set page reserved in the memblock region
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

From: Nathan Zimmer <nzimmer@sgi.com>

Currently each struct page is set as reserved when it is initialized.
This changes the flow to start with the reserved bit clear and to set
the bit only for pages that lie within a memblock reserved region.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
---
 include/linux/mm.h |  2 ++
 mm/nobootmem.c     |  3 +++
 mm/page_alloc.c    | 11 ++++++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47a93928b90f..b6f82a31028a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1711,6 +1711,8 @@ extern void free_highmem_page(struct page *page);
 extern void adjust_managed_page_count(struct page *page, long count);
 extern void mem_init_print_info(const char *str);
 
+extern void reserve_bootmem_region(unsigned long start, unsigned long end);
+
 /* Free the reserved page into the buddy system, so it gets managed. */
 static inline void __free_reserved_page(struct page *page)
 {
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 90b50468333e..396f9e450dc1 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -121,6 +121,9 @@ static unsigned long __init free_low_memory_core_early(void)
 
 	memblock_clear_hotplug(0, -1);
 
+	for_each_reserved_mem_region(i, &start, &end)
+		reserve_bootmem_region(start, end);
+
 	for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
 		count += __free_memory_core(start, end);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd7a6d09062d..2abb3b861e70 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -788,7 +788,6 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 	init_page_count(page);
 	page_mapcount_reset(page);
 	page_cpupid_reset_last(page);
-	SetPageReserved(page);
 
 	/*
 	 * Mark the block movable so that blocks are reserved for
@@ -823,6 +822,16 @@ static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone,
 	return __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
 }
 
+void reserve_bootmem_region(unsigned long start, unsigned long end)
+{
+	unsigned long start_pfn = PFN_DOWN(start);
+	unsigned long end_pfn = PFN_UP(end);
+
+	for (; start_pfn < end_pfn; start_pfn++)
+		if (pfn_valid(start_pfn))
+			SetPageReserved(pfn_to_page(start_pfn));
+}
+
 static bool free_pages_prepare(struct page *page, unsigned int order)
 {
 	bool compound = PageCompound(page);
-- 
2.1.2


* [PATCH 04/13] mm: page_alloc: Pass PFN to __free_pages_bootmem
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

__free_pages_bootmem prepares a page for release to the buddy allocator
and assumes that the struct page is initialised. Parallel initialisation
of struct pages defers initialisation, so __free_pages_bootmem can be
called for struct pages whose PFN cannot yet be derived from the struct
page itself. This patch passes the PFN to __free_pages_bootmem with no
other functional change.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/bootmem.c    | 8 ++++----
 mm/internal.h   | 3 ++-
 mm/memblock.c   | 2 +-
 mm/nobootmem.c  | 4 ++--
 mm/page_alloc.c | 3 ++-
 5 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/bootmem.c b/mm/bootmem.c
index 477be696511d..daf956bb4782 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -164,7 +164,7 @@ void __init free_bootmem_late(unsigned long physaddr, unsigned long size)
 	end = PFN_DOWN(physaddr + size);
 
 	for (; cursor < end; cursor++) {
-		__free_pages_bootmem(pfn_to_page(cursor), 0);
+		__free_pages_bootmem(pfn_to_page(cursor), cursor, 0);
 		totalram_pages++;
 	}
 }
@@ -210,7 +210,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 		if (IS_ALIGNED(start, BITS_PER_LONG) && vec == ~0UL) {
 			int order = ilog2(BITS_PER_LONG);
 
-			__free_pages_bootmem(pfn_to_page(start), order);
+			__free_pages_bootmem(pfn_to_page(start), start, order);
 			count += BITS_PER_LONG;
 			start += BITS_PER_LONG;
 		} else {
@@ -220,7 +220,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 			while (vec && cur != start) {
 				if (vec & 1) {
 					page = pfn_to_page(cur);
-					__free_pages_bootmem(page, 0);
+					__free_pages_bootmem(page, cur, 0);
 					count++;
 				}
 				vec >>= 1;
@@ -234,7 +234,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 	pages = bootmem_bootmap_pages(pages);
 	count += pages;
 	while (pages--)
-		__free_pages_bootmem(page++, 0);
+		__free_pages_bootmem(page++, cur++, 0);
 
 	bdebug("nid=%td released=%lx\n", bdata - bootmem_node_data, count);
 
diff --git a/mm/internal.h b/mm/internal.h
index a96da5b0029d..76b605139c7a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -155,7 +155,8 @@ __find_buddy_index(unsigned long page_idx, unsigned int order)
 }
 
 extern int __isolate_free_page(struct page *page, unsigned int order);
-extern void __free_pages_bootmem(struct page *page, unsigned int order);
+extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
+					unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
 extern bool is_free_buddy_page(struct page *page);
diff --git a/mm/memblock.c b/mm/memblock.c
index e0cc2d174f74..f3e97d8eeb5c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1334,7 +1334,7 @@ void __init __memblock_free_late(phys_addr_t base, phys_addr_t size)
 	end = PFN_DOWN(base + size);
 
 	for (; cursor < end; cursor++) {
-		__free_pages_bootmem(pfn_to_page(cursor), 0);
+		__free_pages_bootmem(pfn_to_page(cursor), cursor, 0);
 		totalram_pages++;
 	}
 }
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 396f9e450dc1..bae652713ee5 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -77,7 +77,7 @@ void __init free_bootmem_late(unsigned long addr, unsigned long size)
 	end = PFN_DOWN(addr + size);
 
 	for (; cursor < end; cursor++) {
-		__free_pages_bootmem(pfn_to_page(cursor), 0);
+		__free_pages_bootmem(pfn_to_page(cursor), cursor, 0);
 		totalram_pages++;
 	}
 }
@@ -92,7 +92,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
 		while (start + (1UL << order) > end)
 			order--;
 
-		__free_pages_bootmem(pfn_to_page(start), order);
+		__free_pages_bootmem(pfn_to_page(start), start, order);
 
 		start += (1UL << order);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2abb3b861e70..0a0e0f280d87 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -886,7 +886,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	local_irq_restore(flags);
 }
 
-void __init __free_pages_bootmem(struct page *page, unsigned int order)
+void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
+							unsigned int order)
 {
 	unsigned int nr_pages = 1 << order;
 	struct page *p = page;
-- 
2.1.2


* [PATCH 05/13] mm: meminit: Make __early_pfn_to_nid SMP-safe and introduce meminit_pfn_in_nid
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

The generic and arch-specific implementations of __early_pfn_to_nid()
use static variables to cache recent lookups. Without the cache, boot
times are much higher due to the excessive memblock lookups, but the
cache assumes that memory initialisation is single-threaded. Parallel
initialisation of struct pages will break that assumption, so this patch
makes __early_pfn_to_nid() SMP-safe by requiring the caller to supply a
cache of recent search information. early_pfn_to_nid() keeps the same
interface but is only safe to use early in boot because it relies on a
global static variable. meminit_pfn_in_nid() is an SMP-safe version for
which callers must maintain their own state.
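
A sketch of the intended SMP-safe calling pattern follows. The loop and
the pfn range are illustrative, but the cache structure and
meminit_pfn_in_nid() are as introduced by this patch; each initialising
thread keeps its own cache rather than sharing static state:

  struct mminit_pfnnid_cache nid_init_state = { };
  unsigned long pfn;

  for (pfn = start_pfn; pfn < end_pfn; pfn++) {
  	if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state))
  		continue;
  	/* pfn belongs to this node, initialise its struct page */
  }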

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 arch/ia64/mm/numa.c    | 19 +++++++------------
 include/linux/mm.h     |  8 ++++++--
 include/linux/mmzone.h | 16 +++++++++++++++-
 mm/page_alloc.c        | 40 +++++++++++++++++++++++++---------------
 4 files changed, 53 insertions(+), 30 deletions(-)

diff --git a/arch/ia64/mm/numa.c b/arch/ia64/mm/numa.c
index ea21d4cad540..aa19b7ac8222 100644
--- a/arch/ia64/mm/numa.c
+++ b/arch/ia64/mm/numa.c
@@ -58,27 +58,22 @@ paddr_to_nid(unsigned long paddr)
  * SPARSEMEM to allocate the SPARSEMEM sectionmap on the NUMA node where
  * the section resides.
  */
-int __meminit __early_pfn_to_nid(unsigned long pfn)
+int __meminit __early_pfn_to_nid(unsigned long pfn,
+					struct mminit_pfnnid_cache *state)
 {
 	int i, section = pfn >> PFN_SECTION_SHIFT, ssec, esec;
-	/*
-	 * NOTE: The following SMP-unsafe globals are only used early in boot
-	 * when the kernel is running single-threaded.
-	 */
-	static int __meminitdata last_ssec, last_esec;
-	static int __meminitdata last_nid;
 
-	if (section >= last_ssec && section < last_esec)
-		return last_nid;
+	if (section >= state->last_start && section < state->last_end)
+		return state->last_nid;
 
 	for (i = 0; i < num_node_memblks; i++) {
 		ssec = node_memblk[i].start_paddr >> PA_SECTION_SHIFT;
 		esec = (node_memblk[i].start_paddr + node_memblk[i].size +
 			((1L << PA_SECTION_SHIFT) - 1)) >> PA_SECTION_SHIFT;
 		if (section >= ssec && section < esec) {
-			last_ssec = ssec;
-			last_esec = esec;
-			last_nid = node_memblk[i].nid;
+			state->last_start = ssec;
+			state->last_end = esec;
+			state->last_nid = node_memblk[i].nid;
 			return node_memblk[i].nid;
 		}
 	}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b6f82a31028a..3a4c9f72c080 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1802,7 +1802,8 @@ extern void sparse_memory_present_with_active_regions(int nid);
 
 #if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \
     !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID)
-static inline int __early_pfn_to_nid(unsigned long pfn)
+static inline int __early_pfn_to_nid(unsigned long pfn,
+					struct mminit_pfnnid_cache *state)
 {
 	return 0;
 }
@@ -1810,7 +1811,10 @@ static inline int __early_pfn_to_nid(unsigned long pfn)
 /* please see mm/page_alloc.c */
 extern int __meminit early_pfn_to_nid(unsigned long pfn);
 /* there is a per-arch backend function. */
-extern int __meminit __early_pfn_to_nid(unsigned long pfn);
+extern int __meminit __early_pfn_to_nid(unsigned long pfn,
+					struct mminit_pfnnid_cache *state);
+bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
+					struct mminit_pfnnid_cache *state);
 #endif
 
 extern void set_dma_reserve(unsigned long new_dma_reserve);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2782df47101e..a67b33e52dfe 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1216,10 +1216,24 @@ void sparse_init(void);
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
+/*
+ * During memory init memblocks map pfns to nids. The search is expensive and
+ * this caches recent lookups. The implementation of __early_pfn_to_nid
+ * may treat start/end as pfns or sections.
+ */
+struct mminit_pfnnid_cache {
+	unsigned long last_start;
+	unsigned long last_end;
+	int last_nid;
+};
+
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
 bool early_pfn_in_nid(unsigned long pfn, int nid);
+bool meminit_pfn_in_nid(unsigned long pfn, int node,
+			struct mminit_pfnnid_cache *state);
 #else
-#define early_pfn_in_nid(pfn, nid)	(1)
+#define early_pfn_in_nid(pfn, nid)		(1)
+#define meminit_pfn_in_nid(pfn, nid, state)	(1)
 #endif
 
 #ifndef early_pfn_valid
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0a0e0f280d87..f556ed63b964 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4457,39 +4457,41 @@ int __meminit init_currently_empty_zone(struct zone *zone,
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 #ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
+
 /*
  * Required by SPARSEMEM. Given a PFN, return what node the PFN is on.
  */
-int __meminit __early_pfn_to_nid(unsigned long pfn)
+int __meminit __early_pfn_to_nid(unsigned long pfn,
+					struct mminit_pfnnid_cache *state)
 {
 	unsigned long start_pfn, end_pfn;
 	int nid;
-	/*
-	 * NOTE: The following SMP-unsafe globals are only used early in boot
-	 * when the kernel is running single-threaded.
-	 */
-	static unsigned long __meminitdata last_start_pfn, last_end_pfn;
-	static int __meminitdata last_nid;
 
-	if (last_start_pfn <= pfn && pfn < last_end_pfn)
-		return last_nid;
+	if (state->last_start <= pfn && pfn < state->last_end)
+		return state->last_nid;
 
 	nid = memblock_search_pfn_nid(pfn, &start_pfn, &end_pfn);
 	if (nid != -1) {
-		last_start_pfn = start_pfn;
-		last_end_pfn = end_pfn;
-		last_nid = nid;
+		state->last_start = start_pfn;
+		state->last_end = end_pfn;
+		state->last_nid = nid;
 	}
 
 	return nid;
 }
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 
+struct __meminitdata mminit_pfnnid_cache global_init_state;
+
+/* Only safe to use early in boot when initialisation is single-threaded */
 int __meminit early_pfn_to_nid(unsigned long pfn)
 {
 	int nid;
 
-	nid = __early_pfn_to_nid(pfn);
+	/* The system will behave unpredictably otherwise */
+	BUG_ON(system_state != SYSTEM_BOOTING);
+
+	nid = __early_pfn_to_nid(pfn, &global_init_state);
 	if (nid >= 0)
 		return nid;
 	/* just returns 0 */
@@ -4497,15 +4499,23 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
 }
 
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
-bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
+bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
+					struct mminit_pfnnid_cache *state)
 {
 	int nid;
 
-	nid = __early_pfn_to_nid(pfn);
+	nid = __early_pfn_to_nid(pfn, state);
 	if (nid >= 0 && nid != node)
 		return false;
 	return true;
 }
+
+/* Only safe to use early in boot when initialisation is single-threaded */
+bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
+{
+	return meminit_pfn_in_nid(pfn, node, &global_init_state);
+}
+
 #endif
 
 /**
-- 
2.1.2


* [PATCH 06/13] mm: meminit: Inline some helper functions
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

early_pfn_in_nid() and meminit_pfn_in_nid() are small functions that are
unnecessarily visible outside memory initialisation. As well as the
unnecessary visibility, calling them out of line adds needless function
call overhead when initialising pages. This patch moves the helpers inline.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/mm.h     |  2 --
 include/linux/mmzone.h | 17 -----------
 mm/page_alloc.c        | 80 +++++++++++++++++++++++++++-----------------------
 3 files changed, 43 insertions(+), 56 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3a4c9f72c080..a8a8b161fd65 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1813,8 +1813,6 @@ extern int __meminit early_pfn_to_nid(unsigned long pfn);
 /* there is a per-arch backend function. */
 extern int __meminit __early_pfn_to_nid(unsigned long pfn,
 					struct mminit_pfnnid_cache *state);
-bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
-					struct mminit_pfnnid_cache *state);
 #endif
 
 extern void set_dma_reserve(unsigned long new_dma_reserve);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a67b33e52dfe..f78ca65a9884 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1036,14 +1036,6 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
 #include <asm/sparsemem.h>
 #endif
 
-#if !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && \
-	!defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP)
-static inline unsigned long early_pfn_to_nid(unsigned long pfn)
-{
-	return 0;
-}
-#endif
-
 #ifdef CONFIG_FLATMEM
 #define pfn_to_nid(pfn)		(0)
 #endif
@@ -1227,15 +1219,6 @@ struct mminit_pfnnid_cache {
 	int last_nid;
 };
 
-#ifdef CONFIG_NODES_SPAN_OTHER_NODES
-bool early_pfn_in_nid(unsigned long pfn, int nid);
-bool meminit_pfn_in_nid(unsigned long pfn, int node,
-			struct mminit_pfnnid_cache *state);
-#else
-#define early_pfn_in_nid(pfn, nid)		(1)
-#define meminit_pfn_in_nid(pfn, nid, state)	(1)
-#endif
-
 #ifndef early_pfn_valid
 #define early_pfn_valid(pfn)	(1)
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f556ed63b964..b148c9921740 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -907,6 +907,49 @@ void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
 	__free_pages(page, order);
 }
 
+#if !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && \
+        !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP)
+static inline unsigned long early_pfn_to_nid(unsigned long pfn)
+{
+	return 0;
+}
+#else
+/* Only safe to use early in boot when initialisation is single-threaded */
+struct __meminitdata mminit_pfnnid_cache global_init_state;
+int __meminit early_pfn_to_nid(unsigned long pfn)
+{
+	int nid;
+
+	/* The system will behave unpredictably otherwise */
+	BUG_ON(system_state != SYSTEM_BOOTING);
+
+	nid = __early_pfn_to_nid(pfn, &global_init_state);
+	if (nid >= 0)
+		return nid;
+	/* just returns 0 */
+	return 0;
+}
+#endif
+
+#ifdef CONFIG_NODES_SPAN_OTHER_NODES
+static inline bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
+					struct mminit_pfnnid_cache *state)
+{
+	int nid;
+
+	nid = __early_pfn_to_nid(pfn, state);
+	if (nid >= 0 && nid != node)
+		return false;
+	return true;
+}
+
+/* Only safe to use early in boot when initialisation is single-threaded */
+static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
+{
+	return meminit_pfn_in_nid(pfn, node, &global_init_state);
+}
+#endif
+
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
@@ -4481,43 +4524,6 @@ int __meminit __early_pfn_to_nid(unsigned long pfn,
 }
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 
-struct __meminitdata mminit_pfnnid_cache global_init_state;
-
-/* Only safe to use early in boot when initialisation is single-threaded */
-int __meminit early_pfn_to_nid(unsigned long pfn)
-{
-	int nid;
-
-	/* The system will behave unpredictably otherwise */
-	BUG_ON(system_state != SYSTEM_BOOTING);
-
-	nid = __early_pfn_to_nid(pfn, &global_init_state);
-	if (nid >= 0)
-		return nid;
-	/* just returns 0 */
-	return 0;
-}
-
-#ifdef CONFIG_NODES_SPAN_OTHER_NODES
-bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
-					struct mminit_pfnnid_cache *state)
-{
-	int nid;
-
-	nid = __early_pfn_to_nid(pfn, state);
-	if (nid >= 0 && nid != node)
-		return false;
-	return true;
-}
-
-/* Only safe to use early in boot when initialisation is single-threaded */
-bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
-{
-	return meminit_pfn_in_nid(pfn, node, &global_init_state);
-}
-
-#endif
-
 /**
  * free_bootmem_with_active_regions - Call memblock_free_early_nid for each active range
  * @nid: The node to free memory on. If MAX_NUMNODES, all nodes are freed.
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 07/13] mm: meminit: Only initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
  2015-04-22 17:07 ` Mel Gorman
@ 2015-04-22 17:07   ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

This patch initialises all low memory struct pages and 2G of the highest zone
on each node during memory initialisation if CONFIG_DEFERRED_STRUCT_PAGE_INIT
is set. That config option cannot be set yet; it becomes selectable in a later
patch.  Parallel initialisation of struct pages depends on some features from
memory hotplug and it is necessary to alter section annotations.
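
For reference (an illustrative calculation, assuming 4K pages so that
PAGE_SHIFT == 12), the threshold used in update_defer_init() below is
2UL << (30 - PAGE_SHIFT) == 524288 pages, i.e. 2G of the highest zone is
initialised per node before the remainder is deferred.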

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/mmzone.h |  8 +++++
 mm/internal.h          |  8 +++++
 mm/page_alloc.c        | 81 ++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f78ca65a9884..821f5000dec9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -762,6 +762,14 @@ typedef struct pglist_data {
 	/* Number of pages migrated during the rate limiting time interval */
 	unsigned long numabalancing_migrate_nr_pages;
 #endif
+
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+	/*
+	 * If memory initialisation on large machines is deferred then this
+	 * is the first PFN that needs to be initialised.
+	 */
+	unsigned long first_deferred_pfn;
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
diff --git a/mm/internal.h b/mm/internal.h
index 76b605139c7a..4a73f74846bd 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -385,6 +385,14 @@ static inline void mminit_verify_zonelist(void)
 }
 #endif /* CONFIG_DEBUG_MEMORY_INIT */
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+#define __defermem_init __meminit
+#define __defer_init    __meminit
+#else
+#define __defermem_init
+#define __defer_init __init
+#endif
+
 /* mminit_validate_memmodel_limits is independent of CONFIG_DEBUG_MEMORY_INIT */
 #if defined(CONFIG_SPARSEMEM)
 extern void mminit_validate_memmodel_limits(unsigned long *start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b148c9921740..2d649e0a1f9e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -235,6 +235,67 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+static inline void reset_deferred_meminit(pg_data_t *pgdat)
+{
+	pgdat->first_deferred_pfn = ULONG_MAX;
+}
+
+/* Returns true if the struct page for the pfn is uninitialised */
+static inline bool __defermem_init early_page_uninitialised(unsigned long pfn)
+{
+	int nid = early_pfn_to_nid(pfn);
+
+	if (pfn >= NODE_DATA(nid)->first_deferred_pfn)
+		return true;
+
+	return false;
+}
+
+/*
+ * Returns false when the remaining initialisation should be deferred until
+ * later in the boot cycle when it can be parallelised.
+ */
+static inline bool update_defer_init(pg_data_t *pgdat,
+				unsigned long pfn, unsigned long zone_end,
+				unsigned long *nr_initialised)
+{
+	if (!deferred_mem_init_enabled)
+		return true;
+
+	/* Always populate low zones for address-constrained allocations */
+	if (zone_end < pgdat_end_pfn(pgdat))
+		return true;
+
+	/* Initialise at least 2G of the highest zone */
+	(*nr_initialised)++;
+	if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
+	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
+		pgdat->first_deferred_pfn = pfn;
+		return false;
+	}
+
+	return true;
+}
+#else
+static inline void reset_deferred_meminit(pg_data_t *pgdat)
+{
+}
+
+static inline bool early_page_uninitialised(unsigned long pfn)
+{
+	return false;
+}
+
+static inline bool update_defer_init(pg_data_t *pgdat,
+				unsigned long pfn, unsigned long zone_end,
+				unsigned long *nr_initialised)
+{
+	return true;
+}
+#endif
+
+
 void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 	if (unlikely(page_group_by_mobility_disabled &&
@@ -886,8 +947,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	local_irq_restore(flags);
 }
 
-void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
-							unsigned int order)
+static void __defer_init __free_pages_boot_core(struct page *page,
+					unsigned long pfn, unsigned int order)
 {
 	unsigned int nr_pages = 1 << order;
 	struct page *p = page;
@@ -950,6 +1011,14 @@ static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 }
 #endif
 
+void __defer_init __free_pages_bootmem(struct page *page, unsigned long pfn,
+							unsigned int order)
+{
+	if (early_page_uninitialised(pfn))
+		return;
+	return __free_pages_boot_core(page, pfn, order);
+}
+
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
@@ -4222,14 +4291,16 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		unsigned long start_pfn, enum memmap_context context)
 {
+	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long end_pfn = start_pfn + size;
 	unsigned long pfn;
 	struct zone *z;
+	unsigned long nr_initialised = 0;
 
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
 
-	z = &NODE_DATA(nid)->node_zones[zone];
+	z = &pgdat->node_zones[zone];
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		/*
 		 * There can be holes in boot-time mem_map[]s
@@ -4241,6 +4312,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 				continue;
 			if (!early_pfn_in_nid(pfn, nid))
 				continue;
+			if (!update_defer_init(pgdat, pfn, end_pfn,
+						&nr_initialised))
+				break;
 		}
 		__init_single_pfn(pfn, zone, nid);
 	}
@@ -5042,6 +5116,7 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 	/* pg_data_t should be reset to zero when it's allocated */
 	WARN_ON(pgdat->nr_zones || pgdat->classzone_idx);
 
+	reset_deferred_meminit(pgdat);
 	pgdat->node_id = nid;
 	pgdat->node_start_pfn = node_start_pfn;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 08/13] mm: meminit: Initialise remaining struct pages in parallel with kswapd
  2015-04-22 17:07 ` Mel Gorman
@ 2015-04-22 17:07   ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

Only a subset of struct pages are initialised at the moment. When this patch
is applied, kswapd initialises the remaining struct pages in parallel. This
should make boot faster by spreading the work to multiple CPUs and by
initialising data that is local to the CPU.  The user-visible effect on large
machines is that free memory will appear to increase rapidly early in the
lifetime of the system until kswapd reports in the kernel log that all memory
is initialised.  Once initialised, there should be no other user-visible
effects.
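
The parallelism comes from the fact that one kswapd thread runs per node with
memory and each initialises only its own node's pages. A minimal sketch of the
startup flow (illustrative, not taken verbatim from the kernel) is:

	int nid;

	/* Illustrative sketch only: one kswapd per node does local init */
	for_each_node_state(nid, N_MEMORY)
		kswapd_run(nid);	/* kswapd() then calls deferred_init_memmap(nid) */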

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/internal.h   |   6 +++
 mm/mm_init.c    |   1 +
 mm/page_alloc.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 mm/vmscan.c     |   6 ++-
 4 files changed, 123 insertions(+), 6 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 4a73f74846bd..2c4057140bec 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -388,9 +388,15 @@ static inline void mminit_verify_zonelist(void)
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 #define __defermem_init __meminit
 #define __defer_init    __meminit
+
+void deferred_init_memmap(int nid);
 #else
 #define __defermem_init
 #define __defer_init __init
+
+static inline void deferred_init_memmap(int nid)
+{
+}
 #endif
 
 /* mminit_validate_memmodel_limits is independent of CONFIG_DEBUG_MEMORY_INIT */
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 5f420f7fafa1..28fbf87b20aa 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -11,6 +11,7 @@
 #include <linux/export.h>
 #include <linux/memory.h>
 #include <linux/notifier.h>
+#include <linux/sched.h>
 #include "internal.h"
 
 #ifdef CONFIG_DEBUG_MEMORY_INIT
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d649e0a1f9e..dfe63a3c3816 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -252,6 +252,14 @@ static inline bool __defermem_init early_page_uninitialised(unsigned long pfn)
 	return false;
 }
 
+static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
+{
+	if (pfn >= NODE_DATA(nid)->first_deferred_pfn)
+		return true;
+
+	return false;
+}
+
 /*
  * Returns false when the remaining initialisation should be deferred until
  * later in the boot cycle when it can be parallelised.
@@ -287,6 +295,11 @@ static inline bool early_page_uninitialised(unsigned long pfn)
 	return false;
 }
 
+static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
+{
+	return false;
+}
+
 static inline bool update_defer_init(pg_data_t *pgdat,
 				unsigned long pfn, unsigned long zone_end,
 				unsigned long *nr_initialised)
@@ -883,14 +896,45 @@ static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone,
 	return __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
 }
 
-void reserve_bootmem_region(unsigned long start, unsigned long end)
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+static void init_reserved_page(unsigned long pfn)
+{
+	pg_data_t *pgdat;
+	int nid, zid;
+
+	if (!early_page_uninitialised(pfn))
+		return;
+
+	nid = early_pfn_to_nid(pfn);
+	pgdat = NODE_DATA(nid);
+
+	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+		struct zone *zone = &pgdat->node_zones[zid];
+
+		if (pfn >= zone->zone_start_pfn && pfn < zone_end_pfn(zone))
+			break;
+	}
+	__init_single_pfn(pfn, zid, nid);
+}
+#else
+static inline void init_reserved_page(unsigned long pfn)
+{
+}
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
+
+void __meminit reserve_bootmem_region(unsigned long start, unsigned long end)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long end_pfn = PFN_UP(end);
 
-	for (; start_pfn < end_pfn; start_pfn++)
-		if (pfn_valid(start_pfn))
-			SetPageReserved(pfn_to_page(start_pfn));
+	for (; start_pfn < end_pfn; start_pfn++) {
+		if (pfn_valid(start_pfn)) {
+			struct page *page = pfn_to_page(start_pfn);
+
+			init_reserved_page(start_pfn);
+			SetPageReserved(page);
+		}
+	}
 }
 
 static bool free_pages_prepare(struct page *page, unsigned int order)
@@ -1019,6 +1063,67 @@ void __defer_init __free_pages_bootmem(struct page *page, unsigned long pfn,
 	return __free_pages_boot_core(page, pfn, order);
 }
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+/* Initialise remaining memory on a node */
+void __defermem_init deferred_init_memmap(int nid)
+{
+	unsigned long start = jiffies;
+	struct mminit_pfnnid_cache nid_init_state = { };
+
+	pg_data_t *pgdat = NODE_DATA(nid);
+	int zid;
+	unsigned long first_init_pfn = pgdat->first_deferred_pfn;
+
+	if (first_init_pfn == ULONG_MAX)
+		return;
+
+	/* Sanity check boundaries */
+	BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
+	BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat));
+	pgdat->first_deferred_pfn = ULONG_MAX;
+
+	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+		struct zone *zone = pgdat->node_zones + zid;
+		unsigned long walk_start, walk_end;
+		int i;
+
+		for_each_mem_pfn_range(i, nid, &walk_start, &walk_end, NULL) {
+			unsigned long pfn, end_pfn;
+
+			end_pfn = min(walk_end, zone_end_pfn(zone));
+			pfn = first_init_pfn;
+			if (pfn < walk_start)
+				pfn = walk_start;
+			if (pfn < zone->zone_start_pfn)
+				pfn = zone->zone_start_pfn;
+
+			for (; pfn < end_pfn; pfn++) {
+				struct page *page;
+
+				if (!pfn_valid(pfn))
+					continue;
+
+				if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state))
+					continue;
+
+				if (page->flags) {
+					VM_BUG_ON(page_zone(page) != zone);
+					continue;
+				}
+
+				__init_single_page(page, pfn, zid, nid);
+				__free_pages_boot_core(page, pfn, 0);
+				cond_resched();
+			}
+			first_init_pfn = max(end_pfn, first_init_pfn);
+		}
+	}
+
+	pr_info("kswapd %d initialised deferred memory in %ums\n", nid,
+					jiffies_to_msecs(jiffies - start));
+}
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
+
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
@@ -4229,6 +4334,9 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 	zone->nr_migrate_reserve_block = reserve;
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		if (!early_page_nid_uninitialised(pfn, zone_to_nid(zone)))
+			return;
+
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e8eadd71bac..c4895d26d036 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3348,7 +3348,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
  * If there are applications that are active memory-allocators
  * (most normal use), this basically shouldn't matter.
  */
-static int kswapd(void *p)
+static int __defermem_init kswapd(void *p)
 {
 	unsigned long order, new_order;
 	unsigned balanced_order;
@@ -3383,6 +3383,8 @@ static int kswapd(void *p)
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 	set_freezable();
 
+	deferred_init_memmap(pgdat->node_id);
+
 	order = new_order = 0;
 	balanced_order = 0;
 	classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
@@ -3538,7 +3540,7 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
  * This kswapd start function will be called by init and node-hot-add.
  * On node-hot-add, kswapd will moved to proper cpus if cpus are hot-added.
  */
-int kswapd_run(int nid)
+int __defermem_init kswapd_run(int nid)
 {
 	pg_data_t *pgdat = NODE_DATA(nid);
 	int ret = 0;
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 09/13] mm: meminit: Minimise number of pfn->page lookups during initialisation
  2015-04-22 17:07 ` Mel Gorman
@ 2015-04-22 17:07   ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

Deferred struct page initialisation uses pfn_to_page() for every PFN, which
is unnecessary. This patch minimises the number of lookups and scheduler
checks.
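
A short note on why this is safe (illustrative, not from the patch): within a
MAX_ORDER-aligned block that passes pfn_valid(), the struct pages backing the
PFNs are contiguous in the memmap, so the pointer can simply be advanced:

	/* Illustrative sketch of the lookup-avoidance pattern used below */
	if (page && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
		page++;				/* same block: step the pointer */
	else
		page = pfn_to_page(pfn);	/* new block: one lookup */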

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dfe63a3c3816..839e4c73ce6d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1089,6 +1089,7 @@ void __defermem_init deferred_init_memmap(int nid)
 
 		for_each_mem_pfn_range(i, nid, &walk_start, &walk_end, NULL) {
 			unsigned long pfn, end_pfn;
+			struct page *page = NULL;
 
 			end_pfn = min(walk_end, zone_end_pfn(zone));
 			pfn = first_init_pfn;
@@ -1098,13 +1099,32 @@ void __defermem_init deferred_init_memmap(int nid)
 				pfn = zone->zone_start_pfn;
 
 			for (; pfn < end_pfn; pfn++) {
-				struct page *page;
-
-				if (!pfn_valid(pfn))
+				if (!pfn_valid_within(pfn))
 					continue;
 
-				if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state))
+				/*
+				 * Ensure pfn_valid is checked every
+				 * MAX_ORDER_NR_PAGES for memory holes
+				 */
+				if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
+					if (!pfn_valid(pfn)) {
+						page = NULL;
+						continue;
+					}
+				}
+
+				if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state)) {
+					page = NULL;
 					continue;
+				}
+
+				/* Minimise pfn page lookups and scheduler checks */
+				if (page && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0) {
+					page++;
+				} else {
+					page = pfn_to_page(pfn);
+					cond_resched();
+				}
 
 				if (page->flags) {
 					VM_BUG_ON(page_zone(page) != zone);
@@ -1113,7 +1133,6 @@ void __defermem_init deferred_init_memmap(int nid)
 
 				__init_single_page(page, pfn, zid, nid);
 				__free_pages_boot_core(page, pfn, 0);
-				cond_resched();
 			}
 			first_init_pfn = max(end_pfn, first_init_pfn);
 		}
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-22 17:07 ` Mel Gorman
@ 2015-04-22 17:07   ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

This patch adds the Kconfig logic to enable deferred struct page
initialisation on x86-64 when NUMA is enabled. Other architectures may enable
it on a case-by-case basis after auditing early_pfn_to_nid() and testing.
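
Based on the early_param() handler added below, the Kconfig default can also
be overridden from the boot command line, for example:

	defer_meminit=enable
	defer_meminit=disable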

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 Documentation/kernel-parameters.txt |  6 ++++++
 arch/x86/Kconfig                    |  1 +
 include/linux/mmzone.h              | 14 ++++++++++++++
 init/main.c                         |  1 +
 mm/Kconfig                          | 28 ++++++++++++++++++++++++++++
 mm/page_alloc.c                     | 21 +++++++++++++++++++++
 6 files changed, 71 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bfcb1a62a7b4..e7c6f7486214 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -807,6 +807,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 
 	debug_objects	[KNL] Enable object debugging
 
+	defer_meminit=	[KNL,X86] Enable or disable deferred struct page init.
+			Large machines may take a long time to initialise
+			memory management structures. If enabled then only a
+			subset of struct pages is initialised at boot and
+			kswapd initialises the rest in parallel.
+
 	no_debug_objects
 			[KNL] Disable object debugging
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca55187..d15d74a052d5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -32,6 +32,7 @@ config X86
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
 	select ARCH_SUPPORTS_INT128 if X86_64
+	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT if X86_64 && NUMA
 	select HAVE_IDE
 	select HAVE_OPROFILE
 	select HAVE_PCSPKR_PLATFORM
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 821f5000dec9..8ac074db364f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -822,6 +822,20 @@ static inline struct zone *lruvec_zone(struct lruvec *lruvec)
 #endif
 }
 
+
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+extern bool deferred_mem_init_enabled;
+static inline void setup_deferred_meminit(void)
+{
+	if (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT_DEFAULT_ENABLED))
+		deferred_mem_init_enabled = true;
+}
+#else
+static inline void setup_deferred_meminit(void)
+{
+}
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
+
 #ifdef CONFIG_HAVE_MEMORY_PRESENT
 void memory_present(int nid, unsigned long start, unsigned long end);
 #else
diff --git a/init/main.c b/init/main.c
index 6f0f1c5ff8cc..f339d37a43e8 100644
--- a/init/main.c
+++ b/init/main.c
@@ -506,6 +506,7 @@ asmlinkage __visible void __init start_kernel(void)
 	boot_init_stack_canary();
 
 	cgroup_init_early();
+	setup_deferred_meminit();
 
 	local_irq_disable();
 	early_boot_irqs_disabled = true;
diff --git a/mm/Kconfig b/mm/Kconfig
index a03131b6ba8e..87a4535e0df4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -629,3 +629,31 @@ config MAX_STACK_SIZE_MB
 	  changed to a smaller value in which case that is used.
 
 	  A sane initial value is 80 MB.
+
+# For architectures that support deferred memory initialisation
+config ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
+	bool
+
+config DEFERRED_STRUCT_PAGE_INIT
+	bool "Defer initialisation of struct pages to kswapd"
+	default n
+	depends on ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
+	depends on MEMORY_HOTPLUG
+	help
+	  Ordinarily all struct pages are initialised during early boot in a
+	  single thread. On very large machines this can take a considerable
+	  amount of time. If this option is set, large machines will bring up
+	  a subset of memmap at boot and then initialise the rest in parallel
+	  when kswapd starts. This has a potential performance impact on
+	  processes running early in the lifetime of the system until kswapd
+	  finishes the initialisation.
+
+config DEFERRED_STRUCT_PAGE_INIT_DEFAULT_ENABLED
+	bool "Automatically enable deferred struct page initialisation"
+	default y
+	depends on DEFERRED_STRUCT_PAGE_INIT
+	help
+	  If set, struct page initialisation will be deferred by default on
+	  large memory configurations. If DEFERRED_STRUCT_PAGE_INIT is set
+	  then it is a reasonable default to enable this too. Users may need
+	  to disable this when allocating huge pages from the command line.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 839e4c73ce6d..6b2f6c21b70f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -236,6 +236,8 @@ EXPORT_SYMBOL(nr_online_nodes);
 int page_group_by_mobility_disabled __read_mostly;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+bool __meminitdata deferred_mem_init_enabled;
+
 static inline void reset_deferred_meminit(pg_data_t *pgdat)
 {
 	pgdat->first_deferred_pfn = ULONG_MAX;
@@ -285,6 +287,25 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 
 	return true;
 }
+
+static int __init setup_deferred_mem_init(char *str)
+{
+	if (!str)
+		return -1;
+
+	if (!strcmp(str, "enable")) {
+		deferred_mem_init_enabled = true;
+	} else if (!strcmp(str, "disable")) {
+		deferred_mem_init_enabled = false;
+	} else {
+		pr_warn("Unable to parse defer_meminit=\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+early_param("defer_meminit", setup_deferred_mem_init);
 #else
 static inline void reset_deferred_meminit(pg_data_t *pgdat)
 {
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 11/13] mm: meminit: Free pages in large chunks where possible
  2015-04-22 17:07 ` Mel Gorman
@ 2015-04-22 17:07   ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

Parallel struct page initialisation frees pages one at a time. Try to free
pages as single large (MAX_ORDER) blocks where possible.
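
As a rough illustration of the saving (assuming the common x86-64 defaults of
the time, MAX_ORDER == 11 and 4K pages), MAX_ORDER_NR_PAGES is
1 << (MAX_ORDER - 1) == 1024, so deferred_free_range() below releases a fully
aligned run with a single order-(MAX_ORDER-1) free covering 4M instead of
1024 separate single-page frees.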

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6b2f6c21b70f..8d3fd13a09c9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1085,6 +1085,20 @@ void __defer_init __free_pages_bootmem(struct page *page, unsigned long pfn,
 }
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+void __defermem_init deferred_free_range(struct page *page, unsigned long pfn,
+					int nr_pages)
+{
+	int i;
+
+	if (nr_pages == MAX_ORDER_NR_PAGES && (pfn & (MAX_ORDER_NR_PAGES-1)) == 0) {
+		__free_pages_boot_core(page, pfn, MAX_ORDER-1);
+		return;
+	}
+
+	for (i = 0; i < nr_pages; i++, page++, pfn++)
+		__free_pages_boot_core(page, pfn, 0);
+}
+
 /* Initialise remaining memory on a node */
 void __defermem_init deferred_init_memmap(int nid)
 {
@@ -1111,6 +1125,9 @@ void __defermem_init deferred_init_memmap(int nid)
 		for_each_mem_pfn_range(i, nid, &walk_start, &walk_end, NULL) {
 			unsigned long pfn, end_pfn;
 			struct page *page = NULL;
+			struct page *free_base_page = NULL;
+			unsigned long free_base_pfn = 0;
+			int nr_to_free = 0;
 
 			end_pfn = min(walk_end, zone_end_pfn(zone));
 			pfn = first_init_pfn;
@@ -1121,7 +1138,7 @@ void __defermem_init deferred_init_memmap(int nid)
 
 			for (; pfn < end_pfn; pfn++) {
 				if (!pfn_valid_within(pfn))
-					continue;
+					goto free_range;
 
 				/*
 				 * Ensure pfn_valid is checked every
@@ -1130,30 +1147,49 @@ void __defermem_init deferred_init_memmap(int nid)
 				if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
 					if (!pfn_valid(pfn)) {
 						page = NULL;
-						continue;
+						goto free_range;
 					}
 				}
 
 				if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state)) {
 					page = NULL;
-					continue;
+					goto free_range;
 				}
 
 				/* Minimise pfn page lookups and scheduler checks */
 				if (page && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0) {
 					page++;
 				} else {
+					deferred_free_range(free_base_page,
+							free_base_pfn, nr_to_free);
+					free_base_page = NULL;
+					free_base_pfn = nr_to_free = 0;
+
 					page = pfn_to_page(pfn);
 					cond_resched();
 				}
 
 				if (page->flags) {
 					VM_BUG_ON(page_zone(page) != zone);
-					continue;
+					goto free_range;
 				}
 
 				__init_single_page(page, pfn, zid, nid);
-				__free_pages_boot_core(page, pfn, 0);
+				if (!free_base_page) {
+					free_base_page = page;
+					free_base_pfn = pfn;
+					nr_to_free = 0;
+				}
+				nr_to_free++;
+
+				/* Where possible, batch up pages for a single free */
+				continue;
+free_range:
+				/* Free the current block of pages to allocator */
+				if (free_base_page)
+					deferred_free_range(free_base_page, free_base_pfn, nr_to_free);
+				free_base_page = NULL;
+				free_base_pfn = nr_to_free = 0;
 			}
 			first_init_pfn = max(end_pfn, first_init_pfn);
 		}
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 12/13] mm: meminit: Reduce number of times pageblocks are set during struct page init
  2015-04-22 17:07 ` Mel Gorman
@ 2015-04-22 17:07   ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

During parallel struct page initialisation, ranges are unnecessarily
checked for every PFN, which increases boot times. This patch alters when
the ranges are checked.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8d3fd13a09c9..945d56667b61 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -876,33 +876,12 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
-	struct zone *z = &NODE_DATA(nid)->node_zones[zone];
-
 	set_page_links(page, zone, nid, pfn);
 	mminit_verify_page_links(page, zone, nid, pfn);
 	init_page_count(page);
 	page_mapcount_reset(page);
 	page_cpupid_reset_last(page);
 
-	/*
-	 * Mark the block movable so that blocks are reserved for
-	 * movable at startup. This will force kernel allocations
-	 * to reserve their blocks rather than leaking throughout
-	 * the address space during boot when many long-lived
-	 * kernel allocations are made. Later some blocks near
-	 * the start are marked MIGRATE_RESERVE by
-	 * setup_zone_migrate_reserve()
-	 *
-	 * bitmap is created for zone's valid pfn range. but memmap
-	 * can be created for invalid pages (for alignment)
-	 * check here not to call set_pageblock_migratetype() against
-	 * pfn out of zone.
-	 */
-	if ((z->zone_start_pfn <= pfn)
-	    && (pfn < zone_end_pfn(z))
-	    && !(pfn & (pageblock_nr_pages - 1)))
-		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-
 	INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
 	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
@@ -1091,6 +1070,7 @@ void __defermem_init deferred_free_range(struct page *page, unsigned long pfn,
 	int i;
 
 	if (nr_pages == MAX_ORDER_NR_PAGES && (pfn & (MAX_ORDER_NR_PAGES-1)) == 0) {
+		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 		__free_pages_boot_core(page, pfn, MAX_ORDER-1);
 		return;
 	}
@@ -4500,7 +4480,28 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 						&nr_initialised))
 				break;
 		}
-		__init_single_pfn(pfn, zone, nid);
+
+		/*
+		 * Mark the block movable so that blocks are reserved for
+		 * movable at startup. This will force kernel allocations
+		 * to reserve their blocks rather than leaking throughout
+		 * the address space during boot when many long-lived
+		 * kernel allocations are made. Later some blocks near
+		 * the start are marked MIGRATE_RESERVE by
+		 * setup_zone_migrate_reserve()
+		 *
+		 * bitmap is created for zone's valid pfn range. but memmap
+		 * can be created for invalid pages (for alignment)
+		 * check here not to call set_pageblock_migratetype() against
+		 * pfn out of zone.
+		 */
+		if (!(pfn & (pageblock_nr_pages - 1))) {
+			struct page *page = pfn_to_page(pfn);
+			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			__init_single_page(page, pfn, zone, nid);
+		} else {
+			__init_single_pfn(pfn, zone, nid);
+		}
 	}
 }
 
-- 
2.1.2
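
In effect, the once-per-pageblock work (set_pageblock_migratetype()) is now
keyed off a cheap alignment test in memmap_init_zone(), and done in
deferred_free_range() when a full aligned block of deferred pages is freed,
instead of a zone range check made for every single PFN inside
__init_single_page(). Assuming typical 4K pages and 512-page (2MB)
pageblocks, that is one pageblock update per 512 pages initialised. The
sketch below is a self-contained illustration of the alignment-test pattern
only; PAGEBLOCK_NR_PAGES and mark_block_movable() are illustrative
stand-ins, not kernel interfaces.

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL	/* 2MB pageblocks with 4K pages; illustrative */

/* Stand-in for the once-per-pageblock work (e.g. marking the block movable). */
static void mark_block_movable(unsigned long pfn)
{
	printf("pageblock starting at pfn %lu marked movable\n", pfn);
}

int main(void)
{
	unsigned long start_pfn = 4096;
	unsigned long end_pfn = start_pfn + 4 * PAGEBLOCK_NR_PAGES;
	unsigned long block_updates = 0;

	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
		/* A cheap alignment test replaces a per-pfn zone range check. */
		if (!(pfn & (PAGEBLOCK_NR_PAGES - 1))) {
			mark_block_movable(pfn);
			block_updates++;
		}
		/* ...per-page struct page initialisation happens here... */
	}
	printf("%lu pfns walked, %lu pageblock updates\n",
	       end_pfn - start_pfn, block_updates);
	return 0;
}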


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 13/13] mm: meminit: Remove mminit_verify_page_links
  2015-04-22 17:07 ` Mel Gorman
@ 2015-04-22 17:07   ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML, Mel Gorman

mminit_verify_page_links() is an extremely paranoid check that was introduced
when memory initialisation was being heavily reworked. Profiles indicated
that up to 10% of parallel memory initialisation was spent on checking
this for every page. The cost could be reduced but in practice this check
only found problems very early during the initialisation rewrite and has
found nothing since. This patch removes an expensive unnecessary check.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/internal.h   | 8 --------
 mm/mm_init.c    | 8 --------
 mm/page_alloc.c | 1 -
 3 files changed, 17 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 2c4057140bec..c73ad248f8f4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -360,10 +360,7 @@ do { \
 } while (0)
 
 extern void mminit_verify_pageflags_layout(void);
-extern void mminit_verify_page_links(struct page *page,
-		enum zone_type zone, unsigned long nid, unsigned long pfn);
 extern void mminit_verify_zonelist(void);
-
 #else
 
 static inline void mminit_dprintk(enum mminit_level level,
@@ -375,11 +372,6 @@ static inline void mminit_verify_pageflags_layout(void)
 {
 }
 
-static inline void mminit_verify_page_links(struct page *page,
-		enum zone_type zone, unsigned long nid, unsigned long pfn)
-{
-}
-
 static inline void mminit_verify_zonelist(void)
 {
 }
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 28fbf87b20aa..fdadf918de76 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -131,14 +131,6 @@ void __init mminit_verify_pageflags_layout(void)
 	BUG_ON(or_mask != add_mask);
 }
 
-void __meminit mminit_verify_page_links(struct page *page, enum zone_type zone,
-			unsigned long nid, unsigned long pfn)
-{
-	BUG_ON(page_to_nid(page) != nid);
-	BUG_ON(page_zonenum(page) != zone);
-	BUG_ON(page_to_pfn(page) != pfn);
-}
-
 static __init int set_mminit_loglevel(char *str)
 {
 	get_option(&str, &mminit_loglevel);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 945d56667b61..cb84d67ae9ba 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -877,7 +877,6 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
 	set_page_links(page, zone, nid, pfn);
-	mminit_verify_page_links(page, zone, nid, pfn);
 	init_page_count(page);
 	page_mapcount_reset(page);
 	page_cpupid_reset_last(page);
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [RFC PATCH 0/14] Parallel struct page initialisation v5r4
  2015-04-22 17:07 ` Mel Gorman
@ 2015-04-22 17:11   ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-22 17:11 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML

The 0/14 is a mistake. There are only 13 patches.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-22 17:07   ` Mel Gorman
@ 2015-04-22 23:45     ` Andrew Morton
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2015-04-22 23:45 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML

On Wed, 22 Apr 2015 18:07:50 +0100 Mel Gorman <mgorman@suse.de> wrote:

> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -32,6 +32,7 @@ config X86
>  	select HAVE_UNSTABLE_SCHED_CLOCK
>  	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
>  	select ARCH_SUPPORTS_INT128 if X86_64
> +	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT if X86_64 && NUMA

Put this in the "config X86_64" section and skip the "X86_64 &&"?

Can we omit the whole defer_meminit= thing and permanently enable the
feature?  That's simpler, provides better test coverage and is, we
hope, faster.

And can this be used on non-NUMA?  Presumably that won't speed things
up any if we're bandwidth limited but again it's simpler and provides
better coverage.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-22 23:45     ` Andrew Morton
@ 2015-04-23  9:23       ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-23  9:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, LKML

On Wed, Apr 22, 2015 at 04:45:00PM -0700, Andrew Morton wrote:
> On Wed, 22 Apr 2015 18:07:50 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -32,6 +32,7 @@ config X86
> >  	select HAVE_UNSTABLE_SCHED_CLOCK
> >  	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> >  	select ARCH_SUPPORTS_INT128 if X86_64
> > +	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT if X86_64 && NUMA
> 
> Put this in the "config X86_64" section and skip the "X86_64 &&"?
> 

Done.

> Can we omit the whole defer_meminit= thing and permanently enable the
> feature?  That's simpler, provides better test coverage and is, we
> hope, faster.
> 

Yes. The intent was to have a workaround if there were any failures like
Waiman's vmalloc failures in an earlier version but they are bugs that
should be fixed.

> And can this be used on non-NUMA?  Presumably that won't speed things
> up any if we're bandwidth limited but again it's simpler and provides
> better coverage.

Nothing prevents it. There is less opportunity for parallelism but
improving coverage is desirable.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-23  9:23       ` Mel Gorman
@ 2015-04-24 14:35         ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2015-04-24 14:35 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Nathan Zimmer, Dave Hansen,
	Scott Norton, Daniel J Blueman, LKML

On 04/23/2015 05:23 AM, Mel Gorman wrote:
> On Wed, Apr 22, 2015 at 04:45:00PM -0700, Andrew Morton wrote:
>> On Wed, 22 Apr 2015 18:07:50 +0100 Mel Gorman<mgorman@suse.de>  wrote:
>>
>>> --- a/arch/x86/Kconfig
>>> +++ b/arch/x86/Kconfig
>>> @@ -32,6 +32,7 @@ config X86
>>>   	select HAVE_UNSTABLE_SCHED_CLOCK
>>>   	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
>>>   	select ARCH_SUPPORTS_INT128 if X86_64
>>> +	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT if X86_64&&  NUMA
>> Put this in the "config X86_64" section and skip the "X86_64&&"?
>>
> Done.
>
>> Can we omit the whole defer_meminit= thing and permanently enable the
>> feature?  That's simpler, provides better test coverage and is, we
>> hope, faster.
>>
> Yes. The intent was to have a workaround if there were any failures like
> Waiman's vmalloc failures in an earlier version but they are bugs that
> should be fixed.
>
>> And can this be used on non-NUMA?  Presumably that won't speed things
>> up any if we're bandwidth limited but again it's simpler and provides
>> better coverage.
> Nothing prevents it. There is less opportunity for parallelism but
> improving coverage is desirable.
>

Memory access latency can be more than double for local vs. remote node 
memory. Bandwidth can also be much lower depending on what kind of 
interconnect is between the 2 nodes. So it is better to do it in a 
NUMA-aware way. Within a NUMA node, however, we can split the memory 
initialization to 2 or more local CPUs if the memory size is big enough.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-24 14:35         ` Waiman Long
@ 2015-04-24 15:20           ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-24 15:20 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Linux-MM, Nathan Zimmer, Dave Hansen,
	Scott Norton, Daniel J Blueman, LKML

On Fri, Apr 24, 2015 at 10:35:49AM -0400, Waiman Long wrote:
> On 04/23/2015 05:23 AM, Mel Gorman wrote:
> >On Wed, Apr 22, 2015 at 04:45:00PM -0700, Andrew Morton wrote:
> >>On Wed, 22 Apr 2015 18:07:50 +0100 Mel Gorman<mgorman@suse.de>  wrote:
> >>
> >>>--- a/arch/x86/Kconfig
> >>>+++ b/arch/x86/Kconfig
> >>>@@ -32,6 +32,7 @@ config X86
> >>>  	select HAVE_UNSTABLE_SCHED_CLOCK
> >>>  	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> >>>  	select ARCH_SUPPORTS_INT128 if X86_64
> >>>+	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT if X86_64&&  NUMA
> >>Put this in the "config X86_64" section and skip the "X86_64&&"?
> >>
> >Done.
> >
> >>Can we omit the whole defer_meminit= thing and permanently enable the
> >>feature?  That's simpler, provides better test coverage and is, we
> >>hope, faster.
> >>
> >Yes. The intent was to have a workaround if there were any failures like
> >Waiman's vmalloc failures in an earlier version but they are bugs that
> >should be fixed.
> >
> >>And can this be used on non-NUMA?  Presumably that won't speed things
> >>up any if we're bandwidth limited but again it's simpler and provides
> >>better coverage.
> >Nothing prevents it. There is less opportunity for parallelism but
> >improving coverage is desirable.
> >
> 
> Memory access latency can be more than double for local vs. remote
> node memory. Bandwidth can also be much lower depending on what kind
> of interconnect is between the 2 nodes. So it is better to do it in
> a NUMA-aware way.

I do not believe that is what he was asking. He was asking if we could
defer memory initialisation even when there is only one node. It does not
gain much in terms of boot times but it improves testing coverage.

> Within a NUMA node, however, we can split the
> memory initialization to 2 or more local CPUs if the memory size is
> big enough.
> 

I considered it but discarded the idea. It'd be more complex to setup and
the two CPUs could simply end up contending on the same memory bus as
well as contending on zone->lock.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-24 15:20           ` Mel Gorman
@ 2015-04-24 19:04             ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2015-04-24 19:04 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Nathan Zimmer, Dave Hansen,
	Scott Norton, Daniel J Blueman, LKML

On 04/24/2015 11:20 AM, Mel Gorman wrote:
> On Fri, Apr 24, 2015 at 10:35:49AM -0400, Waiman Long wrote:
>> On 04/23/2015 05:23 AM, Mel Gorman wrote:
>>> On Wed, Apr 22, 2015 at 04:45:00PM -0700, Andrew Morton wrote:
>>>> On Wed, 22 Apr 2015 18:07:50 +0100 Mel Gorman<mgorman@suse.de>   wrote:
>>>>
>>>>> --- a/arch/x86/Kconfig
>>>>> +++ b/arch/x86/Kconfig
>>>>> @@ -32,6 +32,7 @@ config X86
>>>>>   	select HAVE_UNSTABLE_SCHED_CLOCK
>>>>>   	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
>>>>>   	select ARCH_SUPPORTS_INT128 if X86_64
>>>>> +	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT if X86_64&&   NUMA
>>>> Put this in the "config X86_64" section and skip the "X86_64&&"?
>>>>
>>> Done.
>>>
>>>> Can we omit the whole defer_meminit= thing and permanently enable the
>>>> feature?  That's simpler, provides better test coverage and is, we
>>>> hope, faster.
>>>>
>>> Yes. The intent was to have a workaround if there were any failures like
>>> Waiman's vmalloc failures in an earlier version but they are bugs that
>>> should be fixed.
>>>
>>>> And can this be used on non-NUMA?  Presumably that won't speed things
>>>> up any if we're bandwidth limited but again it's simpler and provides
>>>> better coverage.
>>> Nothing prevents it. There is less opportunity for parallelism but
>>> improving coverage is desirable.
>>>
>> Memory access latency can be more than double for local vs. remote
>> node memory. Bandwidth can also be much lower depending on what kind
>> of interconnect is between the 2 nodes. So it is better to do it in
>> a NUMA-aware way.
> I do not believe that is what he was asking. He was asking if we could
> defer memory initialisation even when there is only one node. It does not
> gain much in terms of boot times but it improves testing coverage.

Thanks for the clarification.

>> Within a NUMA node, however, we can split the
>> memory initialization to 2 or more local CPUs if the memory size is
>> big enough.
>>
> I considered it but discarded the idea. It'd be more complex to setup and
> the two CPUs could simply end up contending on the same memory bus as
> well as contending on zone->lock.
>

I don't think we need that now. However, we may have to consider this 
when one day even a single node can have TBs of memory unless we move to 
a page size larger than 4k.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-24 19:04             ` Waiman Long
@ 2015-04-25 17:28               ` Mel Gorman
  -1 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-25 17:28 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Linux-MM, Nathan Zimmer, Dave Hansen,
	Scott Norton, Daniel J Blueman, LKML

On Fri, Apr 24, 2015 at 03:04:27PM -0400, Waiman Long wrote:
> >>Within a NUMA node, however, we can split the
> >>memory initialization to 2 or more local CPUs if the memory size is
> >>big enough.
> >>
> >I considered it but discarded the idea. It'd be more complex to setup and
> >the two CPUs could simply end up contending on the same memory bus as
> >well as contending on zone->lock.
> >
> 
> I don't think we need that now. However, we may have to consider
> this when one day even a single node can have TBs of memory unless
> we move to a page size larger than 4k.
> 

We'll cross that bridge when we come to it. I suspect there is more room
for improvement in the initialisation that would be worth trying before
resorting to more threads. With more threads there is a risk that we hit
memory bus contention and a high risk that it actually is worse due to
contending on zone->lock when freeing the pages.

In the meantime, do you mind updating the before/after figures for your
test machine with this series please?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-25 17:28               ` Mel Gorman
@ 2015-04-27 20:07                 ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2015-04-27 20:07 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Nathan Zimmer, Dave Hansen,
	Scott Norton, Daniel J Blueman, LKML

On 04/25/2015 01:28 PM, Mel Gorman wrote:
> On Fri, Apr 24, 2015 at 03:04:27PM -0400, Waiman Long wrote:
>>>> Within a NUMA node, however, we can split the
>>>> memory initialization to 2 or more local CPUs if the memory size is
>>>> big enough.
>>>>
>>> I considered it but discarded the idea. It'd be more complex to setup and
>>> the two CPUs could simply end up contending on the same memory bus as
>>> well as contending on zone->lock.
>>>
>> I don't think we need that now. However, we may have to consider
>> this when one day even a single node can have TBs of memory unless
>> we move to a page size larger than 4k.
>>
> We'll cross that bridge when we come to it. I suspect there is more room
> for improvement in the initialisation that would be worth trying before
> resorting to more threads. With more threads there is a risk that we hit
> memory bus contention and a high risk that it actually is worse due to
> contending on zone->lock when freeing the pages.
>
> In the meantime, do you mind updating the before/after figures for your
> test machine with this series please?
>

I will test the latest patch once I get my hands on a 12TB machine.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-28 14:36 [PATCH 0/13] Parallel struct page initialisation v4 Mel Gorman
@ 2015-04-28 14:37   ` Mel Gorman
  0 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-28 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

Subject says it all. Other architectures may enable on a case-by-case
basis after auditing early_pfn_to_nid and testing.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca55187..1beff8a8fbc9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -18,6 +18,7 @@ config X86_64
 	select X86_DEV_DMA_OPS
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_LIVEPATCH
+	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
 
 ### Arch settings
 config X86
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-23 10:33 [PATCH 0/13] Parallel struct page initialisation v3 Mel Gorman
@ 2015-04-23 10:33   ` Mel Gorman
  0 siblings, 0 replies; 48+ messages in thread
From: Mel Gorman @ 2015-04-23 10:33 UTC (permalink / raw)
  To: Linux-MM
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Andrew Morton, LKML, Mel Gorman

Subject says it all. Other architectures may enable on a case-by-case
basis after auditing early_pfn_to_nid and testing.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca55187..1beff8a8fbc9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -18,6 +18,7 @@ config X86_64
 	select X86_DEV_DMA_OPS
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_LIVEPATCH
+	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
 
 ### Arch settings
 config X86
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 48+ messages in thread

Thread overview: 48+ messages
2015-04-22 17:07 [RFC PATCH 0/14] Parallel struct page initialisation v5r4 Mel Gorman
2015-04-22 17:07 ` Mel Gorman
2015-04-22 17:07 ` [PATCH 01/13] memblock: Introduce a for_each_reserved_mem_region iterator Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 02/13] mm: meminit: Move page initialization into a separate function Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 03/13] mm: meminit: Only set page reserved in the memblock region Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 04/13] mm: page_alloc: Pass PFN to __free_pages_bootmem Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 05/13] mm: meminit: Make __early_pfn_to_nid SMP-safe and introduce meminit_pfn_in_nid Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 06/13] mm: meminit: Inline some helper functions Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 07/13] mm: meminit: Only a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 08/13] mm: meminit: Initialise remaining struct pages in parallel with kswapd Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 09/13] mm: meminit: Minimise number of pfn->page lookups during initialisation Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64 Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 23:45   ` Andrew Morton
2015-04-22 23:45     ` Andrew Morton
2015-04-23  9:23     ` Mel Gorman
2015-04-23  9:23       ` Mel Gorman
2015-04-24 14:35       ` Waiman Long
2015-04-24 14:35         ` Waiman Long
2015-04-24 15:20         ` Mel Gorman
2015-04-24 15:20           ` Mel Gorman
2015-04-24 19:04           ` Waiman Long
2015-04-24 19:04             ` Waiman Long
2015-04-25 17:28             ` Mel Gorman
2015-04-25 17:28               ` Mel Gorman
2015-04-27 20:07               ` Waiman Long
2015-04-27 20:07                 ` Waiman Long
2015-04-22 17:07 ` [PATCH 11/13] mm: meminit: Free pages in large chunks where possible Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 12/13] mm: meminit: Reduce number of times pageblocks are set during struct page init Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:07 ` [PATCH 13/13] mm: meminit: Remove mminit_verify_page_links Mel Gorman
2015-04-22 17:07   ` Mel Gorman
2015-04-22 17:11 ` [RFC PATCH 0/14] Parallel struct page initialisation v5r4 Mel Gorman
2015-04-22 17:11   ` Mel Gorman
2015-04-23 10:33 [PATCH 0/13] Parallel struct page initialisation v3 Mel Gorman
2015-04-23 10:33 ` [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64 Mel Gorman
2015-04-23 10:33   ` Mel Gorman
2015-04-28 14:36 [PATCH 0/13] Parallel struct page initialisation v4 Mel Gorman
2015-04-28 14:37 ` [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64 Mel Gorman
2015-04-28 14:37   ` Mel Gorman
