* [PATCH 0/13] Parallel struct page initialisation v4
From: Mel Gorman @ 2015-04-28 14:36 UTC
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

The bulk of the changes here are related to Andrew's feedback. Functionally
there is almost no difference.

Changelog since v3
o Fix section-related warning
o Comments, clarifications, checkpatch
o Report the number of pages initialised

Changelog since v2
o Reduce overhead of topology_init
o Remove boot-time kernel parameter to enable/disable
o Enable on UMA

Changelog since v1
o Always initialise low zones
o Typo corrections
o Rename parallel mem init to parallel struct page init
o Rebase to 4.0

Struct page initialisation has been identified as one of the reasons why
large machines take a long time to boot. Patches were posted a long time ago
to defer initialisation of struct pages until they were first used. That
approach was rejected on the grounds that it should not be necessary to hurt
the fast paths. This series reuses much of the work from that time but defers
the initialisation of memory to kswapd so that one thread per node
initialises memory local to that node.

After applying the series and setting the appropriate Kconfig variable I
see this in the boot log on a 64G machine

[    7.383764] kswapd 0 initialised deferred memory in 188ms
[    7.404253] kswapd 1 initialised deferred memory in 208ms
[    7.411044] kswapd 3 initialised deferred memory in 216ms
[    7.411551] kswapd 2 initialised deferred memory in 216ms

On a 1TB machine, I see

[    8.406511] kswapd 3 initialised deferred memory in 1116ms
[    8.428518] kswapd 1 initialised deferred memory in 1140ms
[    8.435977] kswapd 0 initialised deferred memory in 1148ms
[    8.437416] kswapd 2 initialised deferred memory in 1148ms

Once booted the machine appears to work as normal. Boot times were measured
from the time shutdown was called until ssh was available again.  In the
64G case, the boot time savings are negligible. On the 1TB machine, the
savings were 16 seconds.

It would be nice if the people that have access to really large machines
would test this series and report how much boot time is reduced.

 arch/ia64/mm/numa.c      |  19 +--
 arch/x86/Kconfig         |   1 +
 drivers/base/node.c      |   6 +-
 include/linux/memblock.h |  18 +++
 include/linux/mm.h       |   8 +-
 include/linux/mmzone.h   |  23 ++-
 mm/Kconfig               |  18 +++
 mm/bootmem.c             |   8 +-
 mm/internal.h            |  29 +++-
 mm/memblock.c            |  34 +++-
 mm/mm_init.c             |   9 +-
 mm/nobootmem.c           |   7 +-
 mm/page_alloc.c          | 401 ++++++++++++++++++++++++++++++++++++++++-------
 mm/vmscan.c              |   6 +-
 14 files changed, 487 insertions(+), 100 deletions(-)

-- 
2.3.5


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

* [PATCH 01/13] memblock: Introduce a for_each_reserved_mem_region iterator.
From: Mel Gorman @ 2015-04-28 14:36 UTC
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

From: Robin Holt <holt@sgi.com>

As part of initializing struct pages in 2MiB chunks, we noticed that
at the end of free_all_bootmem(), there was nothing which had forced
the reserved/allocated 4KiB pages to be initialized.

This patch adds an iterator over the reserved memblock regions that
later patches use to walk those reserved ranges.
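
For readers following along, the iteration pattern this patch adds can be
modelled outside the kernel. The sketch below is a userspace toy, not the
kernel code: `toy_region`, `toy_reserved`, `toy_next_reserved()` and
`toy_for_each_reserved()` are hypothetical stand-ins for memblock's reserved
array, `__next_reserved_mem_region()` and `for_each_reserved_mem_region()`.
It assumes, as the patch does, that the reported end address is inclusive
and that an all-ones index marks the end of iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for phys_addr_t. */
typedef uint64_t toy_phys_addr_t;

struct toy_region {
	toy_phys_addr_t base;
	toy_phys_addr_t size;
};

/* Hypothetical reserved table; the kernel keeps this in memblock.reserved. */
static const struct toy_region toy_reserved[] = {
	{ 0x0000, 0x1000 },
	{ 0x4000, 0x2000 },
	{ 0x9000, 0x0800 },
};
#define TOY_RESERVED_CNT (sizeof(toy_reserved) / sizeof(toy_reserved[0]))

/*
 * Report the region at *idx and advance the cursor, or set the cursor
 * to UINT64_MAX once the table is exhausted. Output pointers may be NULL.
 */
static void toy_next_reserved(uint64_t *idx, toy_phys_addr_t *out_start,
			      toy_phys_addr_t *out_end)
{
	if (*idx < TOY_RESERVED_CNT) {
		const struct toy_region *r = &toy_reserved[*idx];

		if (out_start)
			*out_start = r->base;
		if (out_end)
			*out_end = r->base + r->size - 1;	/* inclusive end */

		*idx += 1;
		return;
	}

	/* signal end of iteration */
	*idx = UINT64_MAX;
}

#define toy_for_each_reserved(i, p_start, p_end)		\
	for (i = 0, toy_next_reserved(&i, p_start, p_end);	\
	     i != UINT64_MAX;					\
	     toy_next_reserved(&i, p_start, p_end))

/* Count the regions; shows that NULL output pointers are accepted. */
static unsigned int toy_reserved_count(void)
{
	uint64_t i;
	unsigned int n = 0;

	toy_for_each_reserved(i, NULL, NULL)
		n++;
	return n;
}

/* Sum the reserved bytes; the inclusive end needs the "+ 1". */
static toy_phys_addr_t toy_reserved_bytes(void)
{
	uint64_t i;
	toy_phys_addr_t start, end, total = 0;

	toy_for_each_reserved(i, &start, &end)
		total += end - start + 1;
	return total;
}
```

Because the helper advances the index before the loop body runs, the index
is always one ahead of the region being processed, so the body should rely
only on the start and end values.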

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nate Zimmer <nzimmer@sgi.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/memblock.h | 18 ++++++++++++++++++
 mm/memblock.c            | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e8cc45307f8f..3075e7673c54 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -93,6 +93,9 @@ void __next_mem_range_rev(u64 *idx, int nid, struct memblock_type *type_a,
 			  struct memblock_type *type_b, phys_addr_t *out_start,
 			  phys_addr_t *out_end, int *out_nid);
 
+void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
+			       phys_addr_t *out_end);
+
 /**
  * for_each_mem_range - iterate through memblock areas from type_a and not
  * included in type_b. Or just type_a if type_b is NULL.
@@ -132,6 +135,21 @@ void __next_mem_range_rev(u64 *idx, int nid, struct memblock_type *type_a,
 	     __next_mem_range_rev(&i, nid, type_a, type_b,		\
 				  p_start, p_end, p_nid))
 
+/**
+ * for_each_reserved_mem_region - iterate over all reserved memblock areas
+ * @i: u64 used as loop variable
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ *
+ * Walks over reserved areas of memblock. Available as soon as memblock
+ * is initialized.
+ */
+#define for_each_reserved_mem_region(i, p_start, p_end)			\
+	for (i = 0UL,							\
+	     __next_reserved_mem_region(&i, p_start, p_end);		\
+	     i != (u64)ULLONG_MAX;					\
+	     __next_reserved_mem_region(&i, p_start, p_end))
+
 #ifdef CONFIG_MOVABLE_NODE
 static inline bool memblock_is_hotpluggable(struct memblock_region *m)
 {
diff --git a/mm/memblock.c b/mm/memblock.c
index 252b77bdf65e..e0cc2d174f74 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -765,6 +765,38 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
 }
 
 /**
+ * __next_reserved_mem_region - next function for for_each_reserved_mem_region()
+ * @idx: pointer to u64 loop variable
+ * @out_start: ptr to phys_addr_t for start address of the region, can be %NULL
+ * @out_end: ptr to phys_addr_t for end address of the region, can be %NULL
+ *
+ * Iterate over all reserved memory regions.
+ */
+void __init_memblock __next_reserved_mem_region(u64 *idx,
+					   phys_addr_t *out_start,
+					   phys_addr_t *out_end)
+{
+	struct memblock_type *rsv = &memblock.reserved;
+
+	if (*idx < rsv->cnt) {
+		struct memblock_region *r = &rsv->regions[*idx];
+		phys_addr_t base = r->base;
+		phys_addr_t size = r->size;
+
+		if (out_start)
+			*out_start = base;
+		if (out_end)
+			*out_end = base + size - 1;
+
+		*idx += 1;
+		return;
+	}
+
+	/* signal end of iteration */
+	*idx = ULLONG_MAX;
+}
+
+/**
  * __next__mem_range - next function for for_each_free_mem_range() etc.
  * @idx: pointer to u64 loop variable
  * @nid: node selector, %NUMA_NO_NODE for all nodes
-- 
2.3.5



* [PATCH 02/13] mm: meminit: Move page initialization into a separate function.
From: Mel Gorman @ 2015-04-28 14:36 UTC
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

From: Robin Holt <holt@sgi.com>

Currently, memmap_init_zone() has all the smarts for initializing a single
page. A subset of this is required for parallel page initialisation and so
this patch breaks up the monolithic function in preparation.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 79 +++++++++++++++++++++++++++++++++------------------------
 1 file changed, 46 insertions(+), 33 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 40e29429e7b0..fd7a6d09062d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -778,6 +778,51 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	return 0;
 }
 
+static void __meminit __init_single_page(struct page *page, unsigned long pfn,
+				unsigned long zone, int nid)
+{
+	struct zone *z = &NODE_DATA(nid)->node_zones[zone];
+
+	set_page_links(page, zone, nid, pfn);
+	mminit_verify_page_links(page, zone, nid, pfn);
+	init_page_count(page);
+	page_mapcount_reset(page);
+	page_cpupid_reset_last(page);
+	SetPageReserved(page);
+
+	/*
+	 * Mark the block movable so that blocks are reserved for
+	 * movable at startup. This will force kernel allocations
+	 * to reserve their blocks rather than leaking throughout
+	 * the address space during boot when many long-lived
+	 * kernel allocations are made. Later some blocks near
+	 * the start are marked MIGRATE_RESERVE by
+	 * setup_zone_migrate_reserve()
+	 *
+	 * bitmap is created for zone's valid pfn range. but memmap
+	 * can be created for invalid pages (for alignment)
+	 * check here not to call set_pageblock_migratetype() against
+	 * pfn out of zone.
+	 */
+	if ((z->zone_start_pfn <= pfn)
+	    && (pfn < zone_end_pfn(z))
+	    && !(pfn & (pageblock_nr_pages - 1)))
+		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+
+	INIT_LIST_HEAD(&page->lru);
+#ifdef WANT_PAGE_VIRTUAL
+	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
+	if (!is_highmem_idx(zone))
+		set_page_address(page, __va(pfn << PAGE_SHIFT));
+#endif
+}
+
+static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone,
+					int nid)
+{
+	return __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
+}
+
 static bool free_pages_prepare(struct page *page, unsigned int order)
 {
 	bool compound = PageCompound(page);
@@ -4124,7 +4169,6 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		unsigned long start_pfn, enum memmap_context context)
 {
-	struct page *page;
 	unsigned long end_pfn = start_pfn + size;
 	unsigned long pfn;
 	struct zone *z;
@@ -4145,38 +4189,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			if (!early_pfn_in_nid(pfn, nid))
 				continue;
 		}
-		page = pfn_to_page(pfn);
-		set_page_links(page, zone, nid, pfn);
-		mminit_verify_page_links(page, zone, nid, pfn);
-		init_page_count(page);
-		page_mapcount_reset(page);
-		page_cpupid_reset_last(page);
-		SetPageReserved(page);
-		/*
-		 * Mark the block movable so that blocks are reserved for
-		 * movable at startup. This will force kernel allocations
-		 * to reserve their blocks rather than leaking throughout
-		 * the address space during boot when many long-lived
-		 * kernel allocations are made. Later some blocks near
-		 * the start are marked MIGRATE_RESERVE by
-		 * setup_zone_migrate_reserve()
-		 *
-		 * bitmap is created for zone's valid pfn range. but memmap
-		 * can be created for invalid pages (for alignment)
-		 * check here not to call set_pageblock_migratetype() against
-		 * pfn out of zone.
-		 */
-		if ((z->zone_start_pfn <= pfn)
-		    && (pfn < zone_end_pfn(z))
-		    && !(pfn & (pageblock_nr_pages - 1)))
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-
-		INIT_LIST_HEAD(&page->lru);
-#ifdef WANT_PAGE_VIRTUAL
-		/* The shift won't overflow because ZONE_NORMAL is below 4G. */
-		if (!is_highmem_idx(zone))
-			set_page_address(page, __va(pfn << PAGE_SHIFT));
-#endif
+		__init_single_pfn(pfn, zone, nid);
 	}
 }
 
-- 
2.3.5



* [PATCH 03/13] mm: meminit: Only set page reserved in the memblock region
From: Mel Gorman @ 2015-04-28 14:37 UTC
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

From: Nathan Zimmer <nzimmer@sgi.com>

Currently every struct page is marked reserved when it is initialised.
This patch leaves the reserved bit clear and sets it only when the
memory is known to have been allocated by the bootmem allocator. This
makes it easier to distinguish between uninitialised struct pages and
reserved struct pages in later patches.
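
The split between initialising a struct page and marking it reserved can be
sketched in userspace. This is a toy model under stated assumptions:
`toy_page`, `toy_memmap`, `toy_init_single_page()`, `toy_reserve_region()`
and `toy_boot()` are hypothetical names, the page size is assumed to be
4KiB, the bounds check stands in for pfn_valid(), and the end address is
treated as inclusive to match the iterator from patch 01.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	(1UL << TOY_PAGE_SHIFT)
#define TOY_PFN_DOWN(x)	((x) >> TOY_PAGE_SHIFT)
#define TOY_PFN_UP(x)	(((x) + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT)
#define TOY_NR_PAGES	16UL

struct toy_page {
	bool initialised;
	bool reserved;
};

static struct toy_page toy_memmap[TOY_NR_PAGES];

/* Initialisation no longer sets the reserved bit. */
static void toy_init_single_page(unsigned long pfn)
{
	assert(pfn < TOY_NR_PAGES);
	toy_memmap[pfn].initialised = true;
	toy_memmap[pfn].reserved = false;
}

/* Mark every page touched by the physical range [start, end] reserved. */
static void toy_reserve_region(unsigned long start, unsigned long end)
{
	unsigned long pfn = TOY_PFN_DOWN(start);
	unsigned long end_pfn = TOY_PFN_UP(end);

	for (; pfn < end_pfn; pfn++)
		if (pfn < TOY_NR_PAGES)	/* stands in for pfn_valid() */
			toy_memmap[pfn].reserved = true;
}

/*
 * Initialise all pages, reserve one hypothetical two-page range, then
 * count the pages that would be handed to the page allocator.
 */
static unsigned long toy_boot(void)
{
	unsigned long pfn, nr_free = 0;

	for (pfn = 0; pfn < TOY_NR_PAGES; pfn++)
		toy_init_single_page(pfn);

	toy_reserve_region(0x2000, 0x3fff);

	for (pfn = 0; pfn < TOY_NR_PAGES; pfn++)
		if (toy_memmap[pfn].initialised && !toy_memmap[pfn].reserved)
			nr_free++;
	return nr_free;
}
```

With the reserved bit set only for ranges that are known to be reserved, a
page with the bit clear is either free or not yet initialised, which is the
distinction the later patches rely on.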

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/mm.h |  2 ++
 mm/nobootmem.c     |  3 +++
 mm/page_alloc.c    | 17 ++++++++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47a93928b90f..b6f82a31028a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1711,6 +1711,8 @@ extern void free_highmem_page(struct page *page);
 extern void adjust_managed_page_count(struct page *page, long count);
 extern void mem_init_print_info(const char *str);
 
+extern void reserve_bootmem_region(unsigned long start, unsigned long end);
+
 /* Free the reserved page into the buddy system, so it gets managed. */
 static inline void __free_reserved_page(struct page *page)
 {
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 90b50468333e..396f9e450dc1 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -121,6 +121,9 @@ static unsigned long __init free_low_memory_core_early(void)
 
 	memblock_clear_hotplug(0, -1);
 
+	for_each_reserved_mem_region(i, &start, &end)
+		reserve_bootmem_region(start, end);
+
 	for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
 		count += __free_memory_core(start, end);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd7a6d09062d..13c88177d3c6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -788,7 +788,6 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 	init_page_count(page);
 	page_mapcount_reset(page);
 	page_cpupid_reset_last(page);
-	SetPageReserved(page);
 
 	/*
 	 * Mark the block movable so that blocks are reserved for
@@ -823,6 +822,22 @@ static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone,
 	return __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
 }
 
+/*
+ * Initialised pages do not have PageReserved set. This function is
+ * called for each range allocated by the bootmem allocator and
+ * marks the pages PageReserved. The remaining valid pages are later
+ * sent to the buddy page allocator.
+ */
+void reserve_bootmem_region(unsigned long start, unsigned long end)
+{
+	unsigned long start_pfn = PFN_DOWN(start);
+	unsigned long end_pfn = PFN_UP(end);
+
+	for (; start_pfn < end_pfn; start_pfn++)
+		if (pfn_valid(start_pfn))
+			SetPageReserved(pfn_to_page(start_pfn));
+}
+
 static bool free_pages_prepare(struct page *page, unsigned int order)
 {
 	bool compound = PageCompound(page);
-- 
2.3.5



* [PATCH 04/13] mm: page_alloc: Pass PFN to __free_pages_bootmem
From: Mel Gorman @ 2015-04-28 14:37 UTC
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

__free_pages_bootmem prepares a page for release to the buddy allocator
and assumes that the struct page is initialised. Parallel initialisation of
struct pages defers that initialisation, so __free_pages_bootmem can be
called for struct pages from which the PFN cannot yet be derived. This patch
passes the PFN to __free_pages_bootmem with no other functional change.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/bootmem.c    | 8 ++++----
 mm/internal.h   | 3 ++-
 mm/memblock.c   | 2 +-
 mm/nobootmem.c  | 4 ++--
 mm/page_alloc.c | 3 ++-
 5 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/bootmem.c b/mm/bootmem.c
index 477be696511d..daf956bb4782 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -164,7 +164,7 @@ void __init free_bootmem_late(unsigned long physaddr, unsigned long size)
 	end = PFN_DOWN(physaddr + size);
 
 	for (; cursor < end; cursor++) {
-		__free_pages_bootmem(pfn_to_page(cursor), 0);
+		__free_pages_bootmem(pfn_to_page(cursor), cursor, 0);
 		totalram_pages++;
 	}
 }
@@ -210,7 +210,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 		if (IS_ALIGNED(start, BITS_PER_LONG) && vec == ~0UL) {
 			int order = ilog2(BITS_PER_LONG);
 
-			__free_pages_bootmem(pfn_to_page(start), order);
+			__free_pages_bootmem(pfn_to_page(start), start, order);
 			count += BITS_PER_LONG;
 			start += BITS_PER_LONG;
 		} else {
@@ -220,7 +220,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 			while (vec && cur != start) {
 				if (vec & 1) {
 					page = pfn_to_page(cur);
-					__free_pages_bootmem(page, 0);
+					__free_pages_bootmem(page, cur, 0);
 					count++;
 				}
 				vec >>= 1;
@@ -234,7 +234,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 	pages = bootmem_bootmap_pages(pages);
 	count += pages;
 	while (pages--)
-		__free_pages_bootmem(page++, 0);
+		__free_pages_bootmem(page++, cur++, 0);
 
 	bdebug("nid=%td released=%lx\n", bdata - bootmem_node_data, count);
 
diff --git a/mm/internal.h b/mm/internal.h
index a96da5b0029d..76b605139c7a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -155,7 +155,8 @@ __find_buddy_index(unsigned long page_idx, unsigned int order)
 }
 
 extern int __isolate_free_page(struct page *page, unsigned int order);
-extern void __free_pages_bootmem(struct page *page, unsigned int order);
+extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
+					unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
 extern bool is_free_buddy_page(struct page *page);
diff --git a/mm/memblock.c b/mm/memblock.c
index e0cc2d174f74..f3e97d8eeb5c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1334,7 +1334,7 @@ void __init __memblock_free_late(phys_addr_t base, phys_addr_t size)
 	end = PFN_DOWN(base + size);
 
 	for (; cursor < end; cursor++) {
-		__free_pages_bootmem(pfn_to_page(cursor), 0);
+		__free_pages_bootmem(pfn_to_page(cursor), cursor, 0);
 		totalram_pages++;
 	}
 }
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 396f9e450dc1..bae652713ee5 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -77,7 +77,7 @@ void __init free_bootmem_late(unsigned long addr, unsigned long size)
 	end = PFN_DOWN(addr + size);
 
 	for (; cursor < end; cursor++) {
-		__free_pages_bootmem(pfn_to_page(cursor), 0);
+		__free_pages_bootmem(pfn_to_page(cursor), cursor, 0);
 		totalram_pages++;
 	}
 }
@@ -92,7 +92,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
 		while (start + (1UL << order) > end)
 			order--;
 
-		__free_pages_bootmem(pfn_to_page(start), order);
+		__free_pages_bootmem(pfn_to_page(start), start, order);
 
 		start += (1UL << order);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 13c88177d3c6..a59f75d02d11 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -892,7 +892,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	local_irq_restore(flags);
 }
 
-void __init __free_pages_bootmem(struct page *page, unsigned int order)
+void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
+							unsigned int order)
 {
 	unsigned int nr_pages = 1 << order;
 	struct page *p = page;
-- 
2.3.5
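
For readers following the series, the calling convention can be modelled
outside the kernel. This is only a rough sketch, not kernel code: toy_page,
toy_memmap, toy_free_pages_boot and toy_free_range are made-up names, and
the flat array standing in for the memmap is an assumption. It illustrates
that the caller is already iterating over PFNs, so it can hand the PFN to
the callee, which then never has to derive it from a struct page that may
not be initialised yet.

```c
#include <assert.h>

/* Toy stand-in for struct page; not the kernel's layout. */
struct toy_page {
	int initialised;	/* never read below: init may be deferred */
	int freed;
};

#define TOY_NR_PAGES 8UL

struct toy_page toy_memmap[TOY_NR_PAGES];
unsigned long toy_last_freed_pfn;
unsigned long toy_nr_freed;

/* The PFN arrives as an argument instead of being computed from page. */
void toy_free_pages_boot(struct toy_page *page, unsigned long pfn,
			 unsigned int order)
{
	unsigned long nr_pages = 1UL << order;
	unsigned long i;

	assert(pfn + nr_pages <= TOY_NR_PAGES);

	for (i = 0; i < nr_pages; i++)
		page[i].freed = 1;

	toy_last_freed_pfn = pfn;
	toy_nr_freed += nr_pages;
}

/* Caller side: the loop cursor is the PFN, so passing it costs nothing. */
unsigned long toy_free_range(unsigned long start, unsigned long end)
{
	unsigned long cursor;

	for (cursor = start; cursor < end; cursor++)
		toy_free_pages_boot(&toy_memmap[cursor], cursor, 0);

	return end - start;
}
```

In this model toy_free_range(2, 5) releases PFNs 2, 3 and 4 without looking
at the initialised field of any entry.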


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH 05/13] mm: meminit: Make __early_pfn_to_nid SMP-safe and introduce meminit_pfn_in_nid
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-28 14:37   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-28 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

__early_pfn_to_nid() uses static variables to cache recent lookups as memblock
lookups are very expensive but it assumes that memory initialisation is
single-threaded. Parallel initialisation of struct pages will break that
assumption so this patch makes __early_pfn_to_nid() SMP-safe by requiring
the caller to cache recent search information. early_pfn_to_nid() keeps
the same interface but is only safe to use early in boot due to the use
of a global static variable. meminit_pfn_in_nid() is an SMP-safe version
for which callers must maintain their own state.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 arch/ia64/mm/numa.c    | 19 +++++++------------
 include/linux/mm.h     |  6 ++++--
 include/linux/mmzone.h | 16 +++++++++++++++-
 mm/page_alloc.c        | 40 +++++++++++++++++++++++++---------------
 4 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/arch/ia64/mm/numa.c b/arch/ia64/mm/numa.c
index ea21d4cad540..aa19b7ac8222 100644
--- a/arch/ia64/mm/numa.c
+++ b/arch/ia64/mm/numa.c
@@ -58,27 +58,22 @@ paddr_to_nid(unsigned long paddr)
  * SPARSEMEM to allocate the SPARSEMEM sectionmap on the NUMA node where
  * the section resides.
  */
-int __meminit __early_pfn_to_nid(unsigned long pfn)
+int __meminit __early_pfn_to_nid(unsigned long pfn,
+					struct mminit_pfnnid_cache *state)
 {
 	int i, section = pfn >> PFN_SECTION_SHIFT, ssec, esec;
-	/*
-	 * NOTE: The following SMP-unsafe globals are only used early in boot
-	 * when the kernel is running single-threaded.
-	 */
-	static int __meminitdata last_ssec, last_esec;
-	static int __meminitdata last_nid;
 
-	if (section >= last_ssec && section < last_esec)
-		return last_nid;
+	if (section >= state->last_start && section < state->last_end)
+		return state->last_nid;
 
 	for (i = 0; i < num_node_memblks; i++) {
 		ssec = node_memblk[i].start_paddr >> PA_SECTION_SHIFT;
 		esec = (node_memblk[i].start_paddr + node_memblk[i].size +
 			((1L << PA_SECTION_SHIFT) - 1)) >> PA_SECTION_SHIFT;
 		if (section >= ssec && section < esec) {
-			last_ssec = ssec;
-			last_esec = esec;
-			last_nid = node_memblk[i].nid;
+			state->last_start = ssec;
+			state->last_end = esec;
+			state->last_nid = node_memblk[i].nid;
 			return node_memblk[i].nid;
 		}
 	}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b6f82a31028a..a8a8b161fd65 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1802,7 +1802,8 @@ extern void sparse_memory_present_with_active_regions(int nid);
 
 #if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \
     !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID)
-static inline int __early_pfn_to_nid(unsigned long pfn)
+static inline int __early_pfn_to_nid(unsigned long pfn,
+					struct mminit_pfnnid_cache *state)
 {
 	return 0;
 }
@@ -1810,7 +1811,8 @@ static inline int __early_pfn_to_nid(unsigned long pfn)
 /* please see mm/page_alloc.c */
 extern int __meminit early_pfn_to_nid(unsigned long pfn);
 /* there is a per-arch backend function. */
-extern int __meminit __early_pfn_to_nid(unsigned long pfn);
+extern int __meminit __early_pfn_to_nid(unsigned long pfn,
+					struct mminit_pfnnid_cache *state);
 #endif
 
 extern void set_dma_reserve(unsigned long new_dma_reserve);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2782df47101e..a67b33e52dfe 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1216,10 +1216,24 @@ void sparse_init(void);
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
+/*
+ * During memory init memblocks map pfns to nids. The search is expensive and
+ * this caches recent lookups. The implementation of __early_pfn_to_nid
+ * may treat start/end as pfns or sections.
+ */
+struct mminit_pfnnid_cache {
+	unsigned long last_start;
+	unsigned long last_end;
+	int last_nid;
+};
+
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
 bool early_pfn_in_nid(unsigned long pfn, int nid);
+bool meminit_pfn_in_nid(unsigned long pfn, int node,
+			struct mminit_pfnnid_cache *state);
 #else
-#define early_pfn_in_nid(pfn, nid)	(1)
+#define early_pfn_in_nid(pfn, nid)		(1)
+#define meminit_pfn_in_nid(pfn, nid, state)	(1)
 #endif
 
 #ifndef early_pfn_valid
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a59f75d02d11..6c5ed5804e82 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4463,39 +4463,41 @@ int __meminit init_currently_empty_zone(struct zone *zone,
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 #ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
+
 /*
  * Required by SPARSEMEM. Given a PFN, return what node the PFN is on.
  */
-int __meminit __early_pfn_to_nid(unsigned long pfn)
+int __meminit __early_pfn_to_nid(unsigned long pfn,
+					struct mminit_pfnnid_cache *state)
 {
 	unsigned long start_pfn, end_pfn;
 	int nid;
-	/*
-	 * NOTE: The following SMP-unsafe globals are only used early in boot
-	 * when the kernel is running single-threaded.
-	 */
-	static unsigned long __meminitdata last_start_pfn, last_end_pfn;
-	static int __meminitdata last_nid;
 
-	if (last_start_pfn <= pfn && pfn < last_end_pfn)
-		return last_nid;
+	if (state->last_start <= pfn && pfn < state->last_end)
+		return state->last_nid;
 
 	nid = memblock_search_pfn_nid(pfn, &start_pfn, &end_pfn);
 	if (nid != -1) {
-		last_start_pfn = start_pfn;
-		last_end_pfn = end_pfn;
-		last_nid = nid;
+		state->last_start = start_pfn;
+		state->last_end = end_pfn;
+		state->last_nid = nid;
 	}
 
 	return nid;
 }
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 
+static struct mminit_pfnnid_cache early_pfnnid_cache __meminitdata;
+
+/* Only safe to use early in boot when initialisation is single-threaded */
 int __meminit early_pfn_to_nid(unsigned long pfn)
 {
 	int nid;
 
-	nid = __early_pfn_to_nid(pfn);
+	/* The system will behave unpredictably otherwise */
+	BUG_ON(system_state != SYSTEM_BOOTING);
+
+	nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache);
 	if (nid >= 0)
 		return nid;
 	/* just returns 0 */
@@ -4503,15 +4505,23 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
 }
 
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
-bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
+bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
+					struct mminit_pfnnid_cache *state)
 {
 	int nid;
 
-	nid = __early_pfn_to_nid(pfn);
+	nid = __early_pfn_to_nid(pfn, state);
 	if (nid >= 0 && nid != node)
 		return false;
 	return true;
 }
+
+/* Only safe to use early in boot when initialisation is single-threaded */
+bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
+{
+	return meminit_pfn_in_nid(pfn, node, &early_pfnnid_cache);
+}
+
 #endif
 
 /**
-- 
2.3.5
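
The caller-owned cache can be tried out in user space. What follows is a
rough sketch under stated assumptions, not the kernel implementation:
pfnnid_cache copies the shape of struct mminit_pfnnid_cache from the patch,
while toy_range, toy_ranges and toy_pfn_to_nid are hypothetical stand-ins
for the memblock search. Because each caller passes its own state, two
threads can look up PFNs on different nodes without overwriting each
other's cached range.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as struct mminit_pfnnid_cache in the patch. */
struct pfnnid_cache {
	unsigned long last_start;
	unsigned long last_end;
	int last_nid;
};

/* Hypothetical memory map: [start, end) PFN ranges and their node. */
struct toy_range {
	unsigned long start;
	unsigned long end;
	int nid;
};

const struct toy_range toy_ranges[] = {
	{   0, 100, 0 },
	{ 100, 200, 1 },
	{ 300, 400, 0 },
};

int toy_pfn_to_nid(unsigned long pfn, struct pfnnid_cache *state)
{
	size_t i;

	assert(state != NULL);

	/* Fast path: same range as this caller's previous lookup. */
	if (state->last_start <= pfn && pfn < state->last_end)
		return state->last_nid;

	/* Slow path: walk the table and remember the hit in the caller's state. */
	for (i = 0; i < sizeof(toy_ranges) / sizeof(toy_ranges[0]); i++) {
		if (toy_ranges[i].start <= pfn && pfn < toy_ranges[i].end) {
			state->last_start = toy_ranges[i].start;
			state->last_end = toy_ranges[i].end;
			state->last_nid = toy_ranges[i].nid;
			return toy_ranges[i].nid;
		}
	}

	return -1;	/* hole: no node owns this PFN, cache left untouched */
}
```

A zero-initialised cache is safe to start with because the empty range
[0, 0) can never match, so the first lookup always takes the slow path.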


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH 06/13] mm: meminit: Inline some helper functions
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-28 14:37   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-28 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

early_pfn_in_nid() and meminit_pfn_in_nid() are small functions that are
unnecessarily visible outside memory initialisation. Besides the unnecessary
visibility, each call adds function call overhead when initialising pages.
This patch makes the helpers static inline in mm/page_alloc.c.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/mmzone.h |  9 ------
 mm/page_alloc.c        | 76 ++++++++++++++++++++++++++------------------------
 2 files changed, 39 insertions(+), 46 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a67b33e52dfe..e3d8a2bd8d78 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1227,15 +1227,6 @@ struct mminit_pfnnid_cache {
 	int last_nid;
 };
 
-#ifdef CONFIG_NODES_SPAN_OTHER_NODES
-bool early_pfn_in_nid(unsigned long pfn, int nid);
-bool meminit_pfn_in_nid(unsigned long pfn, int node,
-			struct mminit_pfnnid_cache *state);
-#else
-#define early_pfn_in_nid(pfn, nid)		(1)
-#define meminit_pfn_in_nid(pfn, nid, state)	(1)
-#endif
-
 #ifndef early_pfn_valid
 #define early_pfn_valid(pfn)	(1)
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6c5ed5804e82..bb99c7e66da5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -913,6 +913,45 @@ void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
 	__free_pages(page, order);
 }
 
+#if defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) || \
+	defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP)
+/* Only safe to use early in boot when initialisation is single-threaded */
+static struct mminit_pfnnid_cache early_pfnnid_cache __meminitdata;
+
+int __meminit early_pfn_to_nid(unsigned long pfn)
+{
+	int nid;
+
+	/* The system will behave unpredictably otherwise */
+	BUG_ON(system_state != SYSTEM_BOOTING);
+
+	nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache);
+	if (nid >= 0)
+		return nid;
+	/* just returns 0 */
+	return 0;
+}
+#endif
+
+#ifdef CONFIG_NODES_SPAN_OTHER_NODES
+static inline bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
+					struct mminit_pfnnid_cache *state)
+{
+	int nid;
+
+	nid = __early_pfn_to_nid(pfn, state);
+	if (nid >= 0 && nid != node)
+		return false;
+	return true;
+}
+
+/* Only safe to use early in boot when initialisation is single-threaded */
+static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
+{
+	return meminit_pfn_in_nid(pfn, node, &early_pfnnid_cache);
+}
+#endif
+
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
@@ -4487,43 +4526,6 @@ int __meminit __early_pfn_to_nid(unsigned long pfn,
 }
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 
-static struct mminit_pfnnid_cache early_pfnnid_cache __meminitdata;
-
-/* Only safe to use early in boot when initialisation is single-threaded */
-int __meminit early_pfn_to_nid(unsigned long pfn)
-{
-	int nid;
-
-	/* The system will behave unpredictably otherwise */
-	BUG_ON(system_state != SYSTEM_BOOTING);
-
-	nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache);
-	if (nid >= 0)
-		return nid;
-	/* just returns 0 */
-	return 0;
-}
-
-#ifdef CONFIG_NODES_SPAN_OTHER_NODES
-bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
-					struct mminit_pfnnid_cache *state)
-{
-	int nid;
-
-	nid = __early_pfn_to_nid(pfn, state);
-	if (nid >= 0 && nid != node)
-		return false;
-	return true;
-}
-
-/* Only safe to use early in boot when initialisation is single-threaded */
-bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
-{
-	return meminit_pfn_in_nid(pfn, node, &early_pfnnid_cache);
-}
-
-#endif
-
 /**
  * free_bootmem_with_active_regions - Call memblock_free_early_nid for each active range
  * @nid: The node to free memory on. If MAX_NUMNODES, all nodes are freed.
-- 
2.3.5
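
The decision these helpers make is small enough to model in a few lines,
which is part of why inlining them is attractive. A rough user-space
sketch follows; toy_early_pfn_to_nid and its node layout are assumptions
made for illustration, and only the final comparison mirrors the patch.
Note that a PFN whose node is unknown (negative nid) is treated as being
in the requested node.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lookup: PFNs 0-99 on node 0, 100-199 on node 1, else unknown. */
static inline int toy_early_pfn_to_nid(unsigned long pfn)
{
	if (pfn < 100)
		return 0;
	if (pfn < 200)
		return 1;
	return -1;
}

/* Same test as meminit_pfn_in_nid(): only a known, different node says no. */
static inline bool toy_pfn_in_nid(unsigned long pfn, int node)
{
	int nid;

	assert(node >= 0);

	nid = toy_early_pfn_to_nid(pfn);
	if (nid >= 0 && nid != node)
		return false;
	return true;
}
```

Since both functions are static inline and live in one file, the compiler
is free to fold the lookup and the comparison into the caller's loop.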


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH 07/13] mm: meminit: Initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-28 14:37   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-28 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

This patch initialises all low memory struct pages and 2G of the highest zone
on each node during memory initialisation if CONFIG_DEFERRED_STRUCT_PAGE_INIT
is set. That config option cannot yet be set but will be available in a later
patch.  Parallel initialisation of struct page depends on some features
from memory hotplug and it is necessary to alter section annotations.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 drivers/base/node.c    |  6 +++-
 include/linux/mmzone.h |  8 ++++++
 mm/Kconfig             | 18 ++++++++++++
 mm/internal.h          | 14 +++++++++
 mm/page_alloc.c        | 78 ++++++++++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 120 insertions(+), 4 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 36fabe43cd44..97ab2c4dd39e 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -361,12 +361,16 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 #define page_initialized(page)  (page->lru.next)
 
-static int get_nid_for_pfn(unsigned long pfn)
+static int __init_refok get_nid_for_pfn(unsigned long pfn)
 {
 	struct page *page;
 
 	if (!pfn_valid_within(pfn))
 		return -1;
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+	if (system_state == SYSTEM_BOOTING)
+		return early_pfn_to_nid(pfn);
+#endif
 	page = pfn_to_page(pfn);
 	if (!page_initialized(page))
 		return -1;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e3d8a2bd8d78..4882c53b70b5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -762,6 +762,14 @@ typedef struct pglist_data {
 	/* Number of pages migrated during the rate limiting time interval */
 	unsigned long numabalancing_migrate_nr_pages;
 #endif
+
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+	/*
+	 * If memory initialisation on large machines is deferred then this
+	 * is the first PFN that needs to be initialised.
+	 */
+	unsigned long first_deferred_pfn;
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
diff --git a/mm/Kconfig b/mm/Kconfig
index a03131b6ba8e..3e40cb64e226 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -629,3 +629,21 @@ config MAX_STACK_SIZE_MB
 	  changed to a smaller value in which case that is used.
 
 	  A sane initial value is 80 MB.
+
+# For architectures that support deferred memory initialisation
+config ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
+	bool
+
+config DEFERRED_STRUCT_PAGE_INIT
+	bool "Defer initialisation of struct pages to kswapd"
+	default n
+	depends on ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
+	depends on MEMORY_HOTPLUG
+	help
+	  Ordinarily all struct pages are initialised during early boot in a
+	  single thread. On very large machines this can take a considerable
+	  amount of time. If this option is set, large machines will bring up
+	  a subset of memmap at boot and then initialise the rest in parallel
+	  when kswapd starts. This has a potential performance impact on
+	  processes running early in the lifetime of the system until kswapd
+	  finishes the initialisation.
diff --git a/mm/internal.h b/mm/internal.h
index 76b605139c7a..24314b671db1 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -385,6 +385,20 @@ static inline void mminit_verify_zonelist(void)
 }
 #endif /* CONFIG_DEBUG_MEMORY_INIT */
 
+/*
+ * Deferred struct page initialisation requires some early init functions that
+ * are removed before kswapd is up and running. The feature depends on memory
+ * hotplug so put the data and code required by deferred initialisation into
+ * the __meminit section where they are preserved.
+ */
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+#define __defermem_init __meminit
+#define __defer_init    __meminit
+#else
+#define __defermem_init
+#define __defer_init __init
+#endif
+
 /* mminit_validate_memmodel_limits is independent of CONFIG_DEBUG_MEMORY_INIT */
 #if defined(CONFIG_SPARSEMEM)
 extern void mminit_validate_memmodel_limits(unsigned long *start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb99c7e66da5..8ec493a24b9c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -235,6 +235,64 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+static inline void reset_deferred_meminit(pg_data_t *pgdat)
+{
+	pgdat->first_deferred_pfn = ULONG_MAX;
+}
+
+/* Returns true if the struct page for the pfn is uninitialised */
+static inline bool __defermem_init early_page_uninitialised(unsigned long pfn)
+{
+	int nid = early_pfn_to_nid(pfn);
+
+	if (pfn >= NODE_DATA(nid)->first_deferred_pfn)
+		return true;
+
+	return false;
+}
+
+/*
+ * Returns false when the remaining initialisation should be deferred until
+ * later in the boot cycle when it can be parallelised.
+ */
+static inline bool update_defer_init(pg_data_t *pgdat,
+				unsigned long pfn, unsigned long zone_end,
+				unsigned long *nr_initialised)
+{
+	/* Always populate low zones for address-constrained allocations */
+	if (zone_end < pgdat_end_pfn(pgdat))
+		return true;
+
+	/* Initialise at least 2G of the highest zone */
+	(*nr_initialised)++;
+	if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
+	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
+		pgdat->first_deferred_pfn = pfn;
+		return false;
+	}
+
+	return true;
+}
+#else
+static inline void reset_deferred_meminit(pg_data_t *pgdat)
+{
+}
+
+static inline bool early_page_uninitialised(unsigned long pfn)
+{
+	return false;
+}
+
+static inline bool update_defer_init(pg_data_t *pgdat,
+				unsigned long pfn, unsigned long zone_end,
+				unsigned long *nr_initialised)
+{
+	return true;
+}
+#endif
+
+
 void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 	if (unlikely(page_group_by_mobility_disabled &&
@@ -892,8 +950,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	local_irq_restore(flags);
 }
 
-void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
-							unsigned int order)
+static void __defer_init __free_pages_boot_core(struct page *page,
+					unsigned long pfn, unsigned int order)
 {
 	unsigned int nr_pages = 1 << order;
 	struct page *p = page;
@@ -952,6 +1010,14 @@ static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 }
 #endif
 
+void __defer_init __free_pages_bootmem(struct page *page, unsigned long pfn,
+							unsigned int order)
+{
+	if (early_page_uninitialised(pfn))
+		return;
+	return __free_pages_boot_core(page, pfn, order);
+}
+
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
@@ -4224,14 +4290,16 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		unsigned long start_pfn, enum memmap_context context)
 {
+	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long end_pfn = start_pfn + size;
 	unsigned long pfn;
 	struct zone *z;
+	unsigned long nr_initialised = 0;
 
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
 
-	z = &NODE_DATA(nid)->node_zones[zone];
+	z = &pgdat->node_zones[zone];
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		/*
 		 * There can be holes in boot-time mem_map[]s
@@ -4243,6 +4311,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 				continue;
 			if (!early_pfn_in_nid(pfn, nid))
 				continue;
+			if (!update_defer_init(pgdat, pfn, end_pfn,
+						&nr_initialised))
+				break;
 		}
 		__init_single_pfn(pfn, zone, nid);
 	}
@@ -5044,6 +5115,7 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 	/* pg_data_t should be reset to zero when it's allocated */
 	WARN_ON(pgdat->nr_zones || pgdat->classzone_idx);
 
+	reset_deferred_meminit(pgdat);
 	pgdat->node_id = nid;
 	pgdat->node_start_pfn = node_start_pfn;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH 08/13] mm: meminit: Initialise remaining struct pages in parallel with kswapd
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-28 14:37   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-28 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

Only a subset of struct pages is initialised at the moment. When this patch
is applied kswapd initialises the remaining struct pages in parallel. This
should boot faster by spreading the work to multiple CPUs and initialising
data that is local to the CPU.  The user-visible effect on large machines
is that free memory will appear to rapidly increase early in the lifetime
of the system until kswapd reports that all memory is initialised in the
kernel log.  Once initialised there should be no other user-visible effects.
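
A simplified userspace model of how each node's remaining pages might be
walked is sketched below. It mirrors the clamping done per memory range in
deferred_init_memmap(): start from the first deferred pfn, never go below the
start of the range or the zone, and stop at the end of the range or the zone.
The struct sketch_range array stands in for for_each_mem_pfn_range() and all
sketch_* names are hypothetical. Validity and node checks are omitted.

```c
#include <assert.h>

/* Hypothetical stand-in for one memory range reported for a node. */
struct sketch_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

static unsigned long sketch_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long sketch_max(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Returns how many pfns the deferred pass would initialise for a zone
 * [zone_start, zone_end) when early boot stopped at first_init_pfn.
 */
static unsigned long sketch_deferred_walk(const struct sketch_range *ranges,
					  int nr_ranges,
					  unsigned long zone_start,
					  unsigned long zone_end,
					  unsigned long first_init_pfn)
{
	unsigned long nr_pages = 0;
	int i;

	assert(zone_start <= zone_end);
	for (i = 0; i < nr_ranges; i++) {
		unsigned long end_pfn = sketch_min(ranges[i].end_pfn, zone_end);
		unsigned long pfn = first_init_pfn;

		/* Clamp the start to the range and to the zone. */
		if (pfn < ranges[i].start_pfn)
			pfn = ranges[i].start_pfn;
		if (pfn < zone_start)
			pfn = zone_start;

		for (; pfn < end_pfn; pfn++)
			nr_pages++;

		/* Later ranges must not revisit pfns already covered. */
		first_init_pfn = sketch_max(end_pfn, first_init_pfn);
	}
	return nr_pages;
}
```

For ranges [0, 1000), [2000, 5000) and [6000, 9000) with a zone of
[500, 8000) and a first deferred pfn of 3000, the first range contributes
nothing, the second contributes 2000 pfns and the third 2000, with the hole
between 5000 and 6000 skipped.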

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/internal.h   |   6 +++
 mm/mm_init.c    |   1 +
 mm/page_alloc.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 mm/vmscan.c     |   6 ++-
 4 files changed, 130 insertions(+), 6 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 24314b671db1..bed751a7ac42 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -394,9 +394,15 @@ static inline void mminit_verify_zonelist(void)
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 #define __defermem_init __meminit
 #define __defer_init    __meminit
+
+void deferred_init_memmap(int nid);
 #else
 #define __defermem_init
 #define __defer_init __init
+
+static inline void deferred_init_memmap(int nid)
+{
+}
 #endif
 
 /* mminit_validate_memmodel_limits is independent of CONFIG_DEBUG_MEMORY_INIT */
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 5f420f7fafa1..28fbf87b20aa 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -11,6 +11,7 @@
 #include <linux/export.h>
 #include <linux/memory.h>
 #include <linux/notifier.h>
+#include <linux/sched.h>
 #include "internal.h"
 
 #ifdef CONFIG_DEBUG_MEMORY_INIT
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8ec493a24b9c..96f2c2dc8ca6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -252,6 +252,14 @@ static inline bool __defermem_init early_page_uninitialised(unsigned long pfn)
 	return false;
 }
 
+static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
+{
+	if (pfn >= NODE_DATA(nid)->first_deferred_pfn)
+		return true;
+
+	return false;
+}
+
 /*
  * Returns false when the remaining initialisation should be deferred until
  * later in the boot cycle when it can be parallelised.
@@ -284,6 +292,11 @@ static inline bool early_page_uninitialised(unsigned long pfn)
 	return false;
 }
 
+static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
+{
+	return false;
+}
+
 static inline bool update_defer_init(pg_data_t *pgdat,
 				unsigned long pfn, unsigned long zone_end,
 				unsigned long *nr_initialised)
@@ -880,20 +893,51 @@ static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone,
 	return __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
 }
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+static void init_reserved_page(unsigned long pfn)
+{
+	pg_data_t *pgdat;
+	int nid, zid;
+
+	if (!early_page_uninitialised(pfn))
+		return;
+
+	nid = early_pfn_to_nid(pfn);
+	pgdat = NODE_DATA(nid);
+
+	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+		struct zone *zone = &pgdat->node_zones[zid];
+
+		if (pfn >= zone->zone_start_pfn && pfn < zone_end_pfn(zone))
+			break;
+	}
+	__init_single_pfn(pfn, zid, nid);
+}
+#else
+static inline void init_reserved_page(unsigned long pfn)
+{
+}
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
+
 /*
  * Initialised pages do not have PageReserved set. This function is
  * called for each range allocated by the bootmem allocator and
  * marks the pages PageReserved. The remaining valid pages are later
  * sent to the buddy page allocator.
  */
-void reserve_bootmem_region(unsigned long start, unsigned long end)
+void __meminit reserve_bootmem_region(unsigned long start, unsigned long end)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long end_pfn = PFN_UP(end);
 
-	for (; start_pfn < end_pfn; start_pfn++)
-		if (pfn_valid(start_pfn))
-			SetPageReserved(pfn_to_page(start_pfn));
+	for (; start_pfn < end_pfn; start_pfn++) {
+		if (pfn_valid(start_pfn)) {
+			struct page *page = pfn_to_page(start_pfn);
+
+			init_reserved_page(start_pfn);
+			SetPageReserved(page);
+		}
+	}
 }
 
 static bool free_pages_prepare(struct page *page, unsigned int order)
@@ -1018,6 +1062,74 @@ void __defer_init __free_pages_bootmem(struct page *page, unsigned long pfn,
 	return __free_pages_boot_core(page, pfn, order);
 }
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+/* Initialise remaining memory on a node */
+void __defermem_init deferred_init_memmap(int nid)
+{
+	struct mminit_pfnnid_cache nid_init_state = { };
+	unsigned long start = jiffies;
+	unsigned long nr_pages = 0;
+	unsigned long walk_start, walk_end;
+	int i, zid;
+	struct zone *zone;
+	pg_data_t *pgdat = NODE_DATA(nid);
+	unsigned long first_init_pfn = pgdat->first_deferred_pfn;
+
+	if (first_init_pfn == ULONG_MAX)
+		return;
+
+	/* Sanity check boundaries */
+	BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
+	BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat));
+	pgdat->first_deferred_pfn = ULONG_MAX;
+
+	/* Only the highest zone is deferred so find it */
+	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+		zone = pgdat->node_zones + zid;
+		if (first_init_pfn < zone_end_pfn(zone))
+			break;
+	}
+
+	for_each_mem_pfn_range(i, nid, &walk_start, &walk_end, NULL) {
+		unsigned long pfn, end_pfn;
+
+		end_pfn = min(walk_end, zone_end_pfn(zone));
+		pfn = first_init_pfn;
+		if (pfn < walk_start)
+			pfn = walk_start;
+		if (pfn < zone->zone_start_pfn)
+			pfn = zone->zone_start_pfn;
+
+		for (; pfn < end_pfn; pfn++) {
+			struct page *page;
+
+			if (!pfn_valid(pfn))
+				continue;
+
+			if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state))
+				continue;
+			page = pfn_to_page(pfn);
+			if (page->flags) {
+				VM_BUG_ON(page_zone(page) != zone);
+				continue;
+			}
+
+			__init_single_page(page, pfn, zid, nid);
+			__free_pages_boot_core(page, pfn, 0);
+			nr_pages++;
+			cond_resched();
+		}
+		first_init_pfn = max(end_pfn, first_init_pfn);
+	}
+
+	/* Sanity check that the next zone really is unpopulated */
+	WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone));
+
+	pr_info("kswapd %d initialised %lu pages in %ums\n", nid, nr_pages,
+					jiffies_to_msecs(jiffies - start));
+}
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
+
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
@@ -4228,6 +4340,9 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 	zone->nr_migrate_reserve_block = reserve;
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		if (early_page_nid_uninitialised(pfn, zone_to_nid(zone)))
+			return;
+
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e8eadd71bac..c4895d26d036 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3348,7 +3348,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
  * If there are applications that are active memory-allocators
  * (most normal use), this basically shouldn't matter.
  */
-static int kswapd(void *p)
+static int __defermem_init kswapd(void *p)
 {
 	unsigned long order, new_order;
 	unsigned balanced_order;
@@ -3383,6 +3383,8 @@ static int kswapd(void *p)
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 	set_freezable();
 
+	deferred_init_memmap(pgdat->node_id);
+
 	order = new_order = 0;
 	balanced_order = 0;
 	classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
@@ -3538,7 +3540,7 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
  * This kswapd start function will be called by init and node-hot-add.
  * On node-hot-add, kswapd will moved to proper cpus if cpus are hot-added.
  */
-int kswapd_run(int nid)
+int __defermem_init kswapd_run(int nid)
 {
 	pg_data_t *pgdat = NODE_DATA(nid);
 	int ret = 0;
-- 
2.3.5
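
The zone lookup that init_reserved_page() performs in the patch above, before
initialising a reserved page in the deferred area, can be modelled in
userspace as below. This is a sketch under assumptions: struct sketch_zone,
SKETCH_MAX_NR_ZONES and the zone layout used in the example are hypothetical.
The sketch returns SKETCH_MAX_NR_ZONES when no zone spans the pfn so that the
fall-through case is visible to the caller.

```c
#include <assert.h>

#define SKETCH_MAX_NR_ZONES 4

/* Hypothetical stand-in for the pfn span of a struct zone. */
struct sketch_zone {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive, like zone_end_pfn() */
};

/*
 * Returns the index of the zone that spans pfn, or SKETCH_MAX_NR_ZONES
 * if the loop falls through without a match.
 */
static int sketch_zone_for_pfn(const struct sketch_zone *zones,
			       unsigned long pfn)
{
	int zid;

	assert(zones != 0);
	for (zid = 0; zid < SKETCH_MAX_NR_ZONES; zid++) {
		if (pfn >= zones[zid].start_pfn && pfn < zones[zid].end_pfn)
			break;
	}
	return zid;
}
```

An empty zone, where the start and end pfn are equal, never matches, so the
lookup skips unpopulated zones without a separate check.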


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH 09/13] mm: meminit: Minimise number of pfn->page lookups during initialisation
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-28 14:37   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-28 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

Deferred struct page initialisation is using pfn_to_page() on every PFN
unnecessarily. This patch minimises the number of lookups and scheduler
checks.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
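Note: the lookup-saving idea can be modelled in user space. The sketch
below is only an illustration under stated assumptions, not the kernel
code: MAX_ORDER_NR_PAGES is fixed at 1024, the memmap is a flat array,
and model_pfn_to_page() and walk_range() are hypothetical names. It
counts how often the full lookup runs when the walk only repeats it at
each MAX_ORDER_NR_PAGES boundary and increments the pointer otherwise.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER_NR_PAGES	1024UL	/* assumed block size for this model */
#define NR_MODEL_PAGES		(4 * MAX_ORDER_NR_PAGES)

struct page { unsigned long flags; };

static struct page memmap[NR_MODEL_PAGES];	/* flat, hole-free model memmap */
static unsigned long nr_lookups;

/* Hypothetical stand-in for pfn_to_page(); counts every call. */
static struct page *model_pfn_to_page(unsigned long pfn)
{
	nr_lookups++;
	return &memmap[pfn];
}

/* Walk [start, end) and return how many full lookups were needed. */
unsigned long walk_range(unsigned long start, unsigned long end)
{
	struct page *page = NULL;
	unsigned long pfn;

	assert(end <= NR_MODEL_PAGES);
	nr_lookups = 0;

	for (pfn = start; pfn < end; pfn++) {
		/* Inside a block the next struct page is simply adjacent. */
		if (page && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
			page++;
		else
			page = model_pfn_to_page(pfn);

		page->flags = 1;	/* pretend to initialise the page */
	}

	return nr_lookups;
}
```

In this model a walk over four aligned blocks needs four lookups rather
than 4096, and an unaligned start costs one extra lookup for the first
partial block.
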
 mm/page_alloc.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 96f2c2dc8ca6..6e366fd654e1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1092,6 +1092,7 @@ void __defermem_init deferred_init_memmap(int nid)
 
 	for_each_mem_pfn_range(i, nid, &walk_start, &walk_end, NULL) {
 		unsigned long pfn, end_pfn;
+		struct page *page = NULL;
 
 		end_pfn = min(walk_end, zone_end_pfn(zone));
 		pfn = first_init_pfn;
@@ -1101,13 +1102,32 @@ void __defermem_init deferred_init_memmap(int nid)
 			pfn = zone->zone_start_pfn;
 
 		for (; pfn < end_pfn; pfn++) {
-			struct page *page;
-
-			if (!pfn_valid(pfn))
+			if (!pfn_valid_within(pfn))
 				continue;
 
-			if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state))
+			/*
+			 * Ensure pfn_valid is checked every
+			 * MAX_ORDER_NR_PAGES for memory holes
+			 */
+			if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
+				if (!pfn_valid(pfn)) {
+					page = NULL;
+					continue;
+				}
+			}
+
+			if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state)) {
+				page = NULL;
 				continue;
+			}
+
+			/* Minimise pfn page lookups and scheduler checks */
+			if (page && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0) {
+				page++;
+			} else {
+				page = pfn_to_page(pfn);
+				cond_resched();
+			}
 
 			if (page->flags) {
 				VM_BUG_ON(page_zone(page) != zone);
@@ -1117,7 +1137,6 @@ void __defermem_init deferred_init_memmap(int nid)
 			__init_single_page(page, pfn, zid, nid);
 			__free_pages_boot_core(page, pfn, 0);
 			nr_pages++;
-			cond_resched();
 		}
 		first_init_pfn = max(end_pfn, first_init_pfn);
 	}
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-28 14:37   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-28 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

Subject says it all. Other architectures may enable on a case-by-case
basis after auditing early_pfn_to_nid and testing.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca55187..1beff8a8fbc9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -18,6 +18,7 @@ config X86_64
 	select X86_DEV_DMA_OPS
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_LIVEPATCH
+	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
 
 ### Arch settings
 config X86
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH 11/13] mm: meminit: Free pages in large chunks where possible
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-28 14:37   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-28 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

Parallel struct page initialisation frees pages one at a time. Try to free
pages as single large pages where possible.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
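Note: the batching decision can be sketched outside the kernel. The
model below is an illustration under assumptions, not the patch itself:
MAX_ORDER_NR_PAGES is fixed at 1024, the range has no holes, and
model_free_range() and model_walk() are hypothetical names that only
count frees. A batch is released as one large free when it is exactly
MAX_ORDER_NR_PAGES long and naturally aligned, and page by page
otherwise. The model flushes the final batch after the loop so that
every page is accounted for; that is a choice made for the sketch.

```c
#include <assert.h>

#define MAX_ORDER_NR_PAGES	1024UL	/* assumed block size for this model */

unsigned long nr_large_frees;	/* batches freed as one large chunk */
unsigned long nr_single_frees;	/* pages freed one at a time */

/* Hypothetical model of the free step: count instead of freeing. */
void model_free_range(unsigned long pfn, unsigned long nr_pages)
{
	if (nr_pages == 0)
		return;

	/* A full, naturally aligned batch goes back in one call. */
	if (nr_pages == MAX_ORDER_NR_PAGES &&
	    (pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
		nr_large_frees++;
		return;
	}

	nr_single_frees += nr_pages;
}

/* Walk [start, end), batching pages and flushing at aligned boundaries. */
void model_walk(unsigned long start, unsigned long end)
{
	unsigned long pfn, base_pfn = 0, nr_to_free = 0;

	assert(start <= end);
	nr_large_frees = nr_single_frees = 0;

	for (pfn = start; pfn < end; pfn++) {
		if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
			model_free_range(base_pfn, nr_to_free);
			nr_to_free = 0;
		}
		if (nr_to_free == 0)
			base_pfn = pfn;
		nr_to_free++;
	}

	/* Flush whatever is still batched (model choice, see note above). */
	model_free_range(base_pfn, nr_to_free);
}
```

With an aligned 2048-page range the model performs two large frees and
no single-page frees; shifting the same length by 100 pages leaves one
large free and 1024 single-page frees for the unaligned head and tail.
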
 mm/page_alloc.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e366fd654e1..2200b7473b5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1063,6 +1063,25 @@ void __defer_init __free_pages_bootmem(struct page *page, unsigned long pfn,
 }
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+static void __defermem_init deferred_free_range(struct page *page,
+					unsigned long pfn, int nr_pages)
+{
+	int i;
+
+	if (!page)
+		return;
+
+	/* Free a large naturally-aligned chunk if possible */
+	if (nr_pages == MAX_ORDER_NR_PAGES &&
+	    (pfn & (MAX_ORDER_NR_PAGES-1)) == 0) {
+		__free_pages_boot_core(page, pfn, MAX_ORDER-1);
+		return;
+	}
+
+	for (i = 0; i < nr_pages; i++, page++, pfn++)
+		__free_pages_boot_core(page, pfn, 0);
+}
+
 /* Initialise remaining memory on a node */
 void __defermem_init deferred_init_memmap(int nid)
 {
@@ -1093,6 +1112,9 @@ void __defermem_init deferred_init_memmap(int nid)
 	for_each_mem_pfn_range(i, nid, &walk_start, &walk_end, NULL) {
 		unsigned long pfn, end_pfn;
 		struct page *page = NULL;
+		struct page *free_base_page = NULL;
+		unsigned long free_base_pfn = 0;
+		int nr_to_free = 0;
 
 		end_pfn = min(walk_end, zone_end_pfn(zone));
 		pfn = first_init_pfn;
@@ -1103,7 +1125,7 @@ void __defermem_init deferred_init_memmap(int nid)
 
 		for (; pfn < end_pfn; pfn++) {
 			if (!pfn_valid_within(pfn))
-				continue;
+				goto free_range;
 
 			/*
 			 * Ensure pfn_valid is checked every
@@ -1112,32 +1134,53 @@ void __defermem_init deferred_init_memmap(int nid)
 			if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
 				if (!pfn_valid(pfn)) {
 					page = NULL;
-					continue;
+					goto free_range;
 				}
 			}
 
 			if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state)) {
 				page = NULL;
-				continue;
+				goto free_range;
 			}
 
 			/* Minimise pfn page lookups and scheduler checks */
 			if (page && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0) {
 				page++;
 			} else {
+				nr_pages += nr_to_free;
+				deferred_free_range(free_base_page,
+						free_base_pfn, nr_to_free);
+				free_base_page = NULL;
+				free_base_pfn = nr_to_free = 0;
+
 				page = pfn_to_page(pfn);
 				cond_resched();
 			}
 
 			if (page->flags) {
 				VM_BUG_ON(page_zone(page) != zone);
-				continue;
+				goto free_range;
 			}
 
 			__init_single_page(page, pfn, zid, nid);
-			__free_pages_boot_core(page, pfn, 0);
-			nr_pages++;
+			if (!free_base_page) {
+				free_base_page = page;
+				free_base_pfn = pfn;
+				nr_to_free = 0;
+			}
+			nr_to_free++;
+
+			/* Where possible, batch up pages for a single free */
+			continue;
+free_range:
+			/* Free the current block of pages to allocator */
+			nr_pages += nr_to_free;
+			deferred_free_range(free_base_page, free_base_pfn,
+								nr_to_free);
+			free_base_page = NULL;
+			free_base_pfn = nr_to_free = 0;
 		}
+
 		first_init_pfn = max(end_pfn, first_init_pfn);
 	}
 
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH 12/13] mm: meminit: Reduce number of times pageblocks are set during struct page init
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-28 14:37   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-28 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

During parallel struct page initialisation, ranges are checked for every
PFN unnecessarily, which increases boot times. This patch alters when the
ranges are checked.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
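Note: the pageblock test that moves into the caller is a plain alignment
check on the PFN. The sketch below is a user-space illustration under
assumptions: PAGEBLOCK_NR_PAGES is fixed at 512 (the kernel value of
pageblock_nr_pages depends on the configuration) and
count_pageblock_marks() is a hypothetical helper. It counts how many
PFNs in a range would trigger the migratetype marking.

```c
#include <assert.h>

#define PAGEBLOCK_NR_PAGES	512UL	/* assumed pageblock size for this model */

/*
 * Count the PFNs in [start, end) that begin a pageblock, i.e. the only
 * PFNs for which the migratetype would be set in this model.
 */
unsigned long count_pageblock_marks(unsigned long start, unsigned long end)
{
	unsigned long pfn, marks = 0;

	assert(start <= end);

	for (pfn = start; pfn < end; pfn++) {
		/* Same shape as the !(pfn & (pageblock_nr_pages - 1)) test. */
		if (!(pfn & (PAGEBLOCK_NR_PAGES - 1)))
			marks++;
	}

	return marks;
}
```

For 2048 pages starting at PFN 0 the model marks four pageblocks; a
range that never crosses a pageblock start marks none.
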
 mm/page_alloc.c | 46 ++++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2200b7473b5a..313f4a5a3907 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -852,33 +852,12 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
-	struct zone *z = &NODE_DATA(nid)->node_zones[zone];
-
 	set_page_links(page, zone, nid, pfn);
 	mminit_verify_page_links(page, zone, nid, pfn);
 	init_page_count(page);
 	page_mapcount_reset(page);
 	page_cpupid_reset_last(page);
 
-	/*
-	 * Mark the block movable so that blocks are reserved for
-	 * movable at startup. This will force kernel allocations
-	 * to reserve their blocks rather than leaking throughout
-	 * the address space during boot when many long-lived
-	 * kernel allocations are made. Later some blocks near
-	 * the start are marked MIGRATE_RESERVE by
-	 * setup_zone_migrate_reserve()
-	 *
-	 * bitmap is created for zone's valid pfn range. but memmap
-	 * can be created for invalid pages (for alignment)
-	 * check here not to call set_pageblock_migratetype() against
-	 * pfn out of zone.
-	 */
-	if ((z->zone_start_pfn <= pfn)
-	    && (pfn < zone_end_pfn(z))
-	    && !(pfn & (pageblock_nr_pages - 1)))
-		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-
 	INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
 	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
@@ -1074,6 +1053,7 @@ static void __defermem_init deferred_free_range(struct page *page,
 	/* Free a large naturally-aligned chunk if possible */
 	if (nr_pages == MAX_ORDER_NR_PAGES &&
 	    (pfn & (MAX_ORDER_NR_PAGES-1)) == 0) {
+		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 		__free_pages_boot_core(page, pfn, MAX_ORDER-1);
 		return;
 	}
@@ -4492,7 +4472,29 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 						&nr_initialised))
 				break;
 		}
-		__init_single_pfn(pfn, zone, nid);
+
+		/*
+		 * Mark the block movable so that blocks are reserved for
+		 * movable at startup. This will force kernel allocations
+		 * to reserve their blocks rather than leaking throughout
+		 * the address space during boot when many long-lived
+		 * kernel allocations are made. Later some blocks near
+		 * the start are marked MIGRATE_RESERVE by
+		 * setup_zone_migrate_reserve()
+		 *
+		 * bitmap is created for zone's valid pfn range. but memmap
+		 * can be created for invalid pages (for alignment)
+		 * check here not to call set_pageblock_migratetype() against
+		 * pfn out of zone.
+		 */
+		if (!(pfn & (pageblock_nr_pages - 1))) {
+			struct page *page = pfn_to_page(pfn);
+
+			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			__init_single_page(page, pfn, zone, nid);
+		} else {
+			__init_single_pfn(pfn, zone, nid);
+		}
 	}
 }
 
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH 13/13] mm: meminit: Remove mminit_verify_page_links
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-28 14:37   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-28 14:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Mel Gorman

mminit_verify_page_links() is an extremely paranoid check that was introduced
when memory initialisation was being heavily reworked. Profiles indicated
that up to 10% of parallel memory initialisation was spent on checking
this for every page. The cost could be reduced but in practice this check
only found problems very early during the initialisation rewrite and has
found nothing since. This patch removes an expensive unnecessary check.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/internal.h   | 8 --------
 mm/mm_init.c    | 8 --------
 mm/page_alloc.c | 1 -
 3 files changed, 17 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index bed751a7ac42..467a93e6a7b1 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -360,10 +360,7 @@ do { \
 } while (0)
 
 extern void mminit_verify_pageflags_layout(void);
-extern void mminit_verify_page_links(struct page *page,
-		enum zone_type zone, unsigned long nid, unsigned long pfn);
 extern void mminit_verify_zonelist(void);
-
 #else
 
 static inline void mminit_dprintk(enum mminit_level level,
@@ -375,11 +372,6 @@ static inline void mminit_verify_pageflags_layout(void)
 {
 }
 
-static inline void mminit_verify_page_links(struct page *page,
-		enum zone_type zone, unsigned long nid, unsigned long pfn)
-{
-}
-
 static inline void mminit_verify_zonelist(void)
 {
 }
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 28fbf87b20aa..fdadf918de76 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -131,14 +131,6 @@ void __init mminit_verify_pageflags_layout(void)
 	BUG_ON(or_mask != add_mask);
 }
 
-void __meminit mminit_verify_page_links(struct page *page, enum zone_type zone,
-			unsigned long nid, unsigned long pfn)
-{
-	BUG_ON(page_to_nid(page) != nid);
-	BUG_ON(page_zonenum(page) != zone);
-	BUG_ON(page_to_pfn(page) != pfn);
-}
-
 static __init int set_mminit_loglevel(char *str)
 {
 	get_option(&str, &mminit_loglevel);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 313f4a5a3907..9c8f2a72263d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -853,7 +853,6 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
 	set_page_links(page, zone, nid, pfn);
-	mminit_verify_page_links(page, zone, nid, pfn);
 	init_page_count(page);
 	page_mapcount_reset(page);
 	page_cpupid_reset_last(page);
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-28 16:06   ` Pekka Enberg
  -1 siblings, 0 replies; 168+ messages in thread
From: Pekka Enberg @ 2015-04-28 16:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Waiman Long,
	Scott Norton, Daniel J Blueman, Linux-MM, LKML

On Tue, Apr 28, 2015 at 5:36 PM, Mel Gorman <mgorman@suse.de> wrote:
> Struct page initialisation had been identified as one of the reasons why
> large machines take a long time to boot. Patches were posted a long time ago
> to defer initialisation until they were first used.  This was rejected on
> the grounds it should not be necessary to hurt the fast paths. This series
> reuses much of the work from that time but defers the initialisation of
> memory to kswapd so that one thread per node initialises memory local to
> that node.
>
> After applying the series and setting the appropriate Kconfig variable I
> see this in the boot log on a 64G machine
>
> [    7.383764] kswapd 0 initialised deferred memory in 188ms
> [    7.404253] kswapd 1 initialised deferred memory in 208ms
> [    7.411044] kswapd 3 initialised deferred memory in 216ms
> [    7.411551] kswapd 2 initialised deferred memory in 216ms
>
> On a 1TB machine, I see
>
> [    8.406511] kswapd 3 initialised deferred memory in 1116ms
> [    8.428518] kswapd 1 initialised deferred memory in 1140ms
> [    8.435977] kswapd 0 initialised deferred memory in 1148ms
> [    8.437416] kswapd 2 initialised deferred memory in 1148ms
>
> Once booted the machine appears to work as normal. Boot times were measured
> from the time shutdown was called until ssh was available again.  In the
> 64G case, the boot time savings are negligible. On the 1TB machine, the
> savings were 16 seconds.

FWIW,

Acked-by: Pekka Enberg <penberg@kernel.org>

for the whole series.

- Pekka

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-04-28 16:06   ` Pekka Enberg
@ 2015-04-28 18:38     ` nzimmer
  -1 siblings, 0 replies; 168+ messages in thread
From: nzimmer @ 2015-04-28 18:38 UTC (permalink / raw)
  To: Pekka Enberg, Mel Gorman
  Cc: Andrew Morton, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On an older 8 TB box with lots and lots of CPUs, the boot time, as
measured from grub to login prompt, improved from 1484 seconds to
exactly 1000 seconds.

I have time on a 16 TB box tonight and a 12 TB box on Thursday and will
hopefully have more numbers then.



On 04/28/2015 11:06 AM, Pekka Enberg wrote:
> On Tue, Apr 28, 2015 at 5:36 PM, Mel Gorman <mgorman@suse.de> wrote:
>> Struct page initialisation had been identified as one of the reasons why
>> large machines take a long time to boot. Patches were posted a long time ago
>> to defer initialisation until they were first used.  This was rejected on
>> the grounds it should not be necessary to hurt the fast paths. This series
>> reuses much of the work from that time but defers the initialisation of
>> memory to kswapd so that one thread per node initialises memory local to
>> that node.
>>
>> After applying the series and setting the appropriate Kconfig variable I
>> see this in the boot log on a 64G machine
>>
>> [    7.383764] kswapd 0 initialised deferred memory in 188ms
>> [    7.404253] kswapd 1 initialised deferred memory in 208ms
>> [    7.411044] kswapd 3 initialised deferred memory in 216ms
>> [    7.411551] kswapd 2 initialised deferred memory in 216ms
>>
>> On a 1TB machine, I see
>>
>> [    8.406511] kswapd 3 initialised deferred memory in 1116ms
>> [    8.428518] kswapd 1 initialised deferred memory in 1140ms
>> [    8.435977] kswapd 0 initialised deferred memory in 1148ms
>> [    8.437416] kswapd 2 initialised deferred memory in 1148ms
>>
>> Once booted the machine appears to work as normal. Boot times were measured
>> from the time shutdown was called until ssh was available again.  In the
>> 64G case, the boot time savings are negligible. On the 1TB machine, the
>> savings were 16 seconds.
> FWIW,
>
> Acked-by: Pekka Enberg <penberg@kernel.org>
>
> for the whole series.
>
> - Pekka


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-04-28 14:36 ` Mel Gorman
@ 2015-04-29  1:16   ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-04-29  1:16 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 04/28/2015 10:36 AM, Mel Gorman wrote:
> The bulk of the changes here are related to Andrew's feedback. Functionally
> there is almost no difference.
>
> Changelog since v3
> o Fix section-related warning
> o Comments, clarifications, checkpatch
> o Report the number of pages initialised
>
> Changelog since v2
> o Reduce overhead of topology_init
> o Remove boot-time kernel parameter to enable/disable
> o Enable on UMA
>
> Changelog since v1
> o Always initialise low zones
> o Typo corrections
> o Rename parallel mem init to parallel struct page init
> o Rebase to 4.0
>
> Struct page initialisation had been identified as one of the reasons why
> large machines take a long time to boot. Patches were posted a long time ago
> to defer initialisation until they were first used.  This was rejected on
> the grounds it should not be necessary to hurt the fast paths. This series
> reuses much of the work from that time but defers the initialisation of
> memory to kswapd so that one thread per node initialises memory local to
> that node.
>
> After applying the series and setting the appropriate Kconfig variable I
> see this in the boot log on a 64G machine
>
> [    7.383764] kswapd 0 initialised deferred memory in 188ms
> [    7.404253] kswapd 1 initialised deferred memory in 208ms
> [    7.411044] kswapd 3 initialised deferred memory in 216ms
> [    7.411551] kswapd 2 initialised deferred memory in 216ms
>
> On a 1TB machine, I see
>
> [    8.406511] kswapd 3 initialised deferred memory in 1116ms
> [    8.428518] kswapd 1 initialised deferred memory in 1140ms
> [    8.435977] kswapd 0 initialised deferred memory in 1148ms
> [    8.437416] kswapd 2 initialised deferred memory in 1148ms
>
> Once booted the machine appears to work as normal. Boot times were measured
> from the time shutdown was called until ssh was available again.  In the
> 64G case, the boot time savings are negligible. On the 1TB machine, the
> savings were 16 seconds.
>
> It would be nice if the people that have access to really large machines
> would test this series and report how much boot time is reduced.
>
>

I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system.
From grub menu to ssh login, the bootup time was 453s before the patch
and 265s after the patch - a saving of 188s (42%). I used a different OS
environment and config file with this test and so the timing data
aren't comparable with my previous testing data. The kswapd log entries
were

[   45.973967] kswapd 4 initialised 197655470 pages in 4390ms
[   45.974214] kswapd 7 initialised 197655470 pages in 4390ms
[   45.976692] kswapd 15 initialised 197654299 pages in 4390ms
[   45.993284] kswapd 0 initialised 197131131 pages in 4410ms
[   46.032735] kswapd 9 initialised 197655470 pages in 4447ms
[   46.065856] kswapd 8 initialised 197655470 pages in 4481ms
[   46.066615] kswapd 1 initialised 197622702 pages in 4483ms
[   46.077995] kswapd 2 initialised 197655470 pages in 4495ms
[   46.219508] kswapd 13 initialised 197655470 pages in 4633ms
[   46.224358] kswapd 3 initialised 197655470 pages in 4641ms
[   46.228441] kswapd 11 initialised 197655470 pages in 4643ms
[   46.232258] kswapd 12 initialised 197655470 pages in 4647ms
[   46.239659] kswapd 10 initialised 197655470 pages in 4654ms
[   46.243402] kswapd 14 initialised 197655470 pages in 4657ms
[   46.250368] kswapd 5 initialised 197655470 pages in 4666ms
[   46.254659] kswapd 6 initialised 197655470 pages in 4670ms

Cheers,
Longman
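
The per-node log lines above lend themselves to a quick throughput check.
What follows is a minimal sketch in userspace C, not part of the series: it
parses one line in the "kswapd N initialised P pages in Tms" format and
derives pages per second. The struct and function names are hypothetical,
and the GiB figure assumes 4 KiB base pages, which the thread does not state.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed "kswapd N initialised P pages in Tms" log entry. */
struct kswapd_init {
	int node;                 /* node id printed after "kswapd" */
	unsigned long long pages; /* struct pages initialised by that thread */
	unsigned long long ms;    /* elapsed time in milliseconds */
};

/*
 * Parse one log line in the format quoted above.
 * Returns 1 on success, 0 if the line does not match.
 */
static int parse_kswapd_line(const char *line, struct kswapd_init *out)
{
	double stamp; /* dmesg timestamp, parsed but otherwise unused */

	assert(line != NULL && out != NULL);
	return sscanf(line, " [ %lf ] kswapd %d initialised %llu pages in %llums",
		      &stamp, &out->node, &out->pages, &out->ms) == 4;
}

/* Initialisation rate in struct pages per second (0 if no time elapsed). */
static double pages_per_second(const struct kswapd_init *e)
{
	return e->ms ? (double)e->pages * 1000.0 / (double)e->ms : 0.0;
}

/* Memory covered in GiB, ASSUMING 4 KiB base pages. */
static double gib_covered(const struct kswapd_init *e)
{
	return (double)e->pages * 4096.0 / (1024.0 * 1024.0 * 1024.0);
}
```

For the node 4 entry this works out to roughly 45 million struct pages per
second and, under the 4 KiB assumption, about 754 GiB covered by that thread.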

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 07/13] mm: meminit: Initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
  2015-04-28 14:37   ` Mel Gorman
@ 2015-04-29 21:19     ` Andrew Morton
  -1 siblings, 0 replies; 168+ messages in thread
From: Andrew Morton @ 2015-04-29 21:19 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Tue, 28 Apr 2015 15:37:04 +0100 Mel Gorman <mgorman@suse.de> wrote:

> +/*
> + * Deferred struct page initialisation requires some early init functions that
> + * are removed before kswapd is up and running. The feature depends on memory
> + * hotplug so put the data and code required by deferred initialisation into
> + * the __meminit section where they are preserved.
> + */
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> +#define __defermem_init __meminit
> +#define __defer_init    __meminit
> +#else
> +#define __defermem_init
> +#define __defer_init __init
> +#endif

I still don't get it :(

__defermem_init:

	if (CONFIG_DEFERRED_STRUCT_PAGE_INIT) {
		if (CONFIG_MEMORY_HOTPLUG)
			retain
	} else {
		retain
	}

    but CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on
    CONFIG_MEMORY_HOTPLUG, so this becomes

	if (CONFIG_DEFERRED_STRUCT_PAGE_INIT) {
		retain
	} else {
		retain
	}

    which becomes

	retain

    so why does __defermem_init exist?



__defer_init:

	if (CONFIG_DEFERRED_STRUCT_PAGE_INIT) {
		if (CONFIG_MEMORY_HOTPLUG)
			retain
	} else {
		discard
	}

    becomes

	if (CONFIG_DEFERRED_STRUCT_PAGE_INIT) {
		retain
	} else {
		discard
	}

    this one makes sense, but could be documented much more clearly!


And why does the comment refer to "and data"?  There is no
__defer_initdata, etc.  Just not needed yet?
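
The two reductions above can be checked by enumerating the configurations.
This is a minimal sketch in plain userspace C, not kernel code: the struct
and function names are hypothetical, and it assumes, as the pseudo-code
does, that __meminit code is retained only with CONFIG_MEMORY_HOTPLUG and
that CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on CONFIG_MEMORY_HOTPLUG.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two Kconfig options; not kernel code. */
struct config {
	bool deferred_struct_page_init; /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
	bool memory_hotplug;            /* CONFIG_MEMORY_HOTPLUG */
};

/* Deferred init is assumed to depend on memory hotplug. */
static bool config_is_valid(struct config c)
{
	return !c.deferred_struct_page_init || c.memory_hotplug;
}

/* __meminit code survives boot only when hotplug is enabled. */
static bool meminit_retained(struct config c)
{
	return c.memory_hotplug;
}

/* __defermem_init: __meminit when deferring, otherwise unannotated. */
static bool defermem_init_retained(struct config c)
{
	assert(config_is_valid(c));
	return c.deferred_struct_page_init ? meminit_retained(c) : true;
}

/* __defer_init: __meminit when deferring, otherwise __init (discarded). */
static bool defer_init_retained(struct config c)
{
	assert(config_is_valid(c));
	return c.deferred_struct_page_init ? meminit_retained(c) : false;
}
```

Over the three valid configurations, defermem_init_retained() is always
true, while defer_init_retained() is true only when deferred initialisation
is enabled, which matches the reductions worked out by hand.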


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 07/13] mm: meminit: Initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
  2015-04-29 21:19     ` Andrew Morton
@ 2015-04-30  8:45       ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-30  8:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Wed, Apr 29, 2015 at 02:19:01PM -0700, Andrew Morton wrote:
> On Tue, 28 Apr 2015 15:37:04 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > +/*
> > + * Deferred struct page initialisation requires some early init functions that
> > + * are removed before kswapd is up and running. The feature depends on memory
> > + * hotplug so put the data and code required by deferred initialisation into
> > + * the __meminit section where they are preserved.
> > + */
> > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> > +#define __defermem_init __meminit
> > +#define __defer_init    __meminit
> > +#else
> > +#define __defermem_init
> > +#define __defer_init __init
> > +#endif
> 
> I still don't get it :(
> 

This version was sent out at roughly the same minute you asked last
time, so the comment was not updated. I suggested this as a possible
alternative.

/*
 * Deferred struct page initialisation requires init functions that are freed
 * before kswapd is available. Reuse the memory hotplug section annotation
 * to mark the required code.
 *
 * __defermem_init is code that always exists but is annotated __meminit to
 *      avoid section warnings.
 * __defer_init code gets marked __meminit when deferring struct page
 *      initialisation but is otherwise in the init section.
 */

Suggestions on better names are welcome.

> __defermem_init:
> 
> 	if (CONFIG_DEFERRED_STRUCT_PAGE_INIT) {
> 		if (CONFIG_MEMORY_HOTPLUG)
> 			retain
> 	} else {
> 		retain
> 	}
> 
>     but CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on
>     CONFIG_MEMORY_HOTPLUG, so this becomes
> 
> 	if (CONFIG_DEFERRED_STRUCT_PAGE_INIT) {
> 		retain
> 	} else {
> 		retain
> 	}
> 
>     which becomes
> 
> 	retain
> 
>     so why does __defermem_init exist?
> 

It suppresses section warnings. Another possibility is that I get rid of
it entirely and use __refok but I feared that it might hide a real problem
in the future.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-04-28 18:38     ` nzimmer
@ 2015-04-30 16:10       ` Daniel J Blueman
  -1 siblings, 0 replies; 168+ messages in thread
From: Daniel J Blueman @ 2015-04-30 16:10 UTC (permalink / raw)
  To: nzimmer, Mel Gorman
  Cc: Pekka Enberg, Andrew Morton, Dave Hansen, Waiman Long,
	Scott Norton, Linux-MM, LKML, 'Steffen Persvold'

On Wed, Apr 29, 2015 at 2:38 AM, nzimmer <nzimmer@sgi.com> wrote:
> On 04/28/2015 11:06 AM, Pekka Enberg wrote:
>> On Tue, Apr 28, 2015 at 5:36 PM, Mel Gorman <mgorman@suse.de> wrote:
>>> Struct page initialisation had been identified as one of the reasons why
>>> large machines take a long time to boot. Patches were posted a long time ago
>>> to defer initialisation until they were first used.  This was rejected on
>>> the grounds it should not be necessary to hurt the fast paths. This series
>>> reuses much of the work from that time but defers the initialisation of
>>> memory to kswapd so that one thread per node initialises memory local to
>>> that node.
>>>
>>> After applying the series and setting the appropriate Kconfig variable I
>>> see this in the boot log on a 64G machine
>>>
>>> [    7.383764] kswapd 0 initialised deferred memory in 188ms
>>> [    7.404253] kswapd 1 initialised deferred memory in 208ms
>>> [    7.411044] kswapd 3 initialised deferred memory in 216ms
>>> [    7.411551] kswapd 2 initialised deferred memory in 216ms
>>>
>>> On a 1TB machine, I see
>>>
>>> [    8.406511] kswapd 3 initialised deferred memory in 1116ms
>>> [    8.428518] kswapd 1 initialised deferred memory in 1140ms
>>> [    8.435977] kswapd 0 initialised deferred memory in 1148ms
>>> [    8.437416] kswapd 2 initialised deferred memory in 1148ms
>>>
>>> Once booted the machine appears to work as normal. Boot times were measured
>>> from the time shutdown was called until ssh was available again.  In the
>>> 64G case, the boot time savings are negligible. On the 1TB machine, the
>>> savings were 16 seconds.

> On an older 8 TB box with lots and lots of cpus the boot time, as 
> measure from grub to login prompt, the boot time improved from 1484 
> seconds to exactly 1000 seconds.
> 
> I have time on 16 TB box tonight and a 12 TB box thursday and will 
> hopefully have more numbers then.

Neat, and a roughly similar picture here.

On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're 
seeing stock 4.0 boot in 7136s. This drops to 2159s, or a 70% reduction 
with this patchset. Non-temporal PMD init [1] drops this to 1045s.

Nathan, what do you guys see with the non-temporal PMD patch [1]? Do 
add a sfence at the ende label if you manually patch.

Thanks!
  Daniel

[1] https://lkml.org/lkml/2015/4/23/350


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-04-30 16:10       ` Daniel J Blueman
@ 2015-04-30 17:12         ` nzimmer
  -1 siblings, 0 replies; 168+ messages in thread
From: nzimmer @ 2015-04-30 17:12 UTC (permalink / raw)
  To: Daniel J Blueman, Mel Gorman
  Cc: Pekka Enberg, Andrew Morton, Dave Hansen, Waiman Long,
	Scott Norton, Linux-MM, LKML, 'Steffen Persvold'

On 04/30/2015 11:10 AM, Daniel J Blueman wrote:
> On Wed, Apr 29, 2015 at 2:38 AM, nzimmer <nzimmer@sgi.com> wrote:
>> On 04/28/2015 11:06 AM, Pekka Enberg wrote:
>>> On Tue, Apr 28, 2015 at 5:36 PM, Mel Gorman <mgorman@suse.de> wrote:
>>>> Struct page initialisation had been identified as one of the reasons why
>>>> large machines take a long time to boot. Patches were posted a long time ago
>>>> to defer initialisation until they were first used.  This was rejected on
>>>> the grounds it should not be necessary to hurt the fast paths. This series
>>>> reuses much of the work from that time but defers the initialisation of
>>>> memory to kswapd so that one thread per node initialises memory local to
>>>> that node.
>>>>
>>>> After applying the series and setting the appropriate Kconfig variable I
>>>> see this in the boot log on a 64G machine
>>>>
>>>> [    7.383764] kswapd 0 initialised deferred memory in 188ms
>>>> [    7.404253] kswapd 1 initialised deferred memory in 208ms
>>>> [    7.411044] kswapd 3 initialised deferred memory in 216ms
>>>> [    7.411551] kswapd 2 initialised deferred memory in 216ms
>>>>
>>>> On a 1TB machine, I see
>>>>
>>>> [    8.406511] kswapd 3 initialised deferred memory in 1116ms
>>>> [    8.428518] kswapd 1 initialised deferred memory in 1140ms
>>>> [    8.435977] kswapd 0 initialised deferred memory in 1148ms
>>>> [    8.437416] kswapd 2 initialised deferred memory in 1148ms
>>>>
>>>> Once booted the machine appears to work as normal. Boot times were measured
>>>> from the time shutdown was called until ssh was available again.  In the
>>>> 64G case, the boot time savings are negligible. On the 1TB machine, the
>>>> savings were 16 seconds.
>
>> On an older 8 TB box with lots and lots of cpus the boot time, as 
>> measure from grub to login prompt, the boot time improved from 1484 
>> seconds to exactly 1000 seconds.
>>
>> I have time on 16 TB box tonight and a 12 TB box thursday and will 
>> hopefully have more numbers then.
>
> Neat, and a roughly similar picture here.
>
> On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're 
> seeing stock 4.0 boot in 7136s. This drops to 2159s, or a 70% 
> reduction with this patchset. Non-temporal PMD init [1] drops this to 
> 1045s.
>
> Nathan, what do you guys see with the non-temporal PMD patch [1]? Do 
> add a sfence at the ende label if you manually patch.
>

I have not tried the non-temporal patch yet, Daniel.
I will give that a go when I can grab more machine time but that 
probably won't be today.

> Thanks!
>  Daniel
>
> [1] https://lkml.org/lkml/2015/4/23/350
>

More numbers, including my first set.

My numbers are from grub prompt to login prompt.
All times are in seconds.
The configs are very much like the ones found in SLES but with
RCU_FANOUT_LEAF=64 instead of 16.
     Large core count boxes benefit from this quite a bit.

Older 8 TB box (128 nodes)
1484s -> 1000s (yes exactly)

32TB box (128 nodes)
4890s -> 1240s

Recent 12 TB box (32 nodes)
598s -> 450s

I am inferring from these numbers and others that memory locality is a 
big part of the win.
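
To put the three before/after pairs above on a common scale, the relative
saving can be computed directly. A minimal sketch in userspace C: the struct,
array and function names are hypothetical, and the only data used are the
boot times listed above.

```c
#include <assert.h>

/* One before/after boot time measurement, both in seconds. */
struct boot_sample {
	const char *box;
	double before_s;
	double after_s;
};

/* The three measurements reported above, copied verbatim. */
static const struct boot_sample samples[] = {
	{ "Older 8 TB box (128 nodes)",  1484.0, 1000.0 },
	{ "32TB box (128 nodes)",        4890.0, 1240.0 },
	{ "Recent 12 TB box (32 nodes)",  598.0,  450.0 },
};

/* Percentage by which the boot time dropped. */
static double percent_reduction(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return (before_s - after_s) * 100.0 / before_s;
}
```

Calling percent_reduction() on each entry gives reductions of roughly 33%,
75% and 25% respectively.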

Out of curiosity, has anyone run any tests post boot time?



^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-04-30 17:12         ` nzimmer
@ 2015-04-30 17:28           ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-04-30 17:28 UTC (permalink / raw)
  To: nzimmer
  Cc: Daniel J Blueman, Pekka Enberg, Andrew Morton, Dave Hansen,
	Waiman Long, Scott Norton, Linux-MM, LKML,
	'Steffen Persvold'

On Thu, Apr 30, 2015 at 12:12:50PM -0500, nzimmer wrote:
> 
> Out of curiosity, has anyone run any tests post boot time?
> 

Some functional tests only to exercise the machine and see if anything
blew up. It looked fine to me at least.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 06/13] mm: meminit: Inline some helper functions
  2015-04-28 14:37   ` Mel Gorman
@ 2015-04-30 21:53     ` Andrew Morton
  -1 siblings, 0 replies; 168+ messages in thread
From: Andrew Morton @ 2015-04-30 21:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Tue, 28 Apr 2015 15:37:03 +0100 Mel Gorman <mgorman@suse.de> wrote:

> early_pfn_in_nid() and meminit_pfn_in_nid() are small functions that are
> unnecessarily visible outside memory initialisation. As well as unnecessary
> visibility, it's unnecessary function call overhead when initialising pages.
> This patch moves the helpers inline.

mm/page_alloc.c: In function 'memmap_init_zone':
mm/page_alloc.c:4287: error: implicit declaration of function 'early_pfn_in_nid'

--- a/mm/page_alloc.c~mm-meminit-inline-some-helper-functions-fix
+++ a/mm/page_alloc.c
@@ -950,8 +950,16 @@ static inline bool __meminit early_pfn_i
 {
 	return meminit_pfn_in_nid(pfn, node, &early_pfnnid_cache);
 }
+
+#else
+
+static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
+{
+	return true;
+}
 #endif
 
+
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)


allmodconfig.  It's odd that nobody else hit this...

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 06/13] mm: meminit: Inline some helper functions
  2015-04-30 21:53     ` Andrew Morton
@ 2015-04-30 21:55       ` Andrew Morton
  -1 siblings, 0 replies; 168+ messages in thread
From: Andrew Morton @ 2015-04-30 21:55 UTC (permalink / raw)
  To: Mel Gorman, Nathan Zimmer, Dave Hansen, Waiman Long,
	Scott Norton, Daniel J Blueman, Linux-MM, LKML

On Thu, 30 Apr 2015 14:53:46 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> allmodconfig.  It's odd that nobody else hit this...

err, it's allnoconfig.  Not odd.

It would be tiresome to mention Documentation/SubmitChecklist.
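
For anyone following along, this looks like the usual config-dependent
helper problem: the helper was only defined under one preprocessor branch,
so a config that compiles that branch out leaves the caller with no
declaration. A minimal standalone sketch of the pattern the fix uses is
below. CONFIG_EXAMPLE_FEATURE, example_pfn_in_nid() and
count_pfns_in_node() are placeholders invented for the illustration, not
kernel symbols, and the node mapping in the enabled branch is made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch standing in for a real Kconfig option. */
/* #define CONFIG_EXAMPLE_FEATURE 1 */

#ifdef CONFIG_EXAMPLE_FEATURE
/* Feature enabled: do a real (here, invented) pfn to node check. */
static inline bool example_pfn_in_nid(unsigned long pfn, int node)
{
	return (int)(pfn >> 20) == node;
}
#else
/* Feature disabled: stub that accepts every pfn so callers still build. */
static inline bool example_pfn_in_nid(unsigned long pfn, int node)
{
	(void)pfn;
	(void)node;
	return true;
}
#endif

/* Caller that compiles identically whichever branch was selected. */
static unsigned long count_pfns_in_node(unsigned long start,
					unsigned long end, int node)
{
	unsigned long pfn, count = 0;

	assert(start <= end);
	for (pfn = start; pfn < end; pfn++)
		if (example_pfn_in_nid(pfn, node))
			count++;
	return count;
}
```

With the switch left undefined, as an allnoconfig-style build would have
it, the stub is selected and every pfn in the range is counted. Without the
#else branch the caller would fail to build in that configuration.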

^ permalink raw reply	[flat|nested] 168+ messages in thread

* [PATCH] mm: page_alloc: pass PFN to __free_pages_bootmem -fix
  2015-04-28 14:37   ` Mel Gorman
@ 2015-05-01  9:20     ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-01  9:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

Stephen Rothwell reported the following

	Today's linux-next build (sparc defconfig) failed like this:

	mm/bootmem.c: In function 'free_all_bootmem_core':
	mm/bootmem.c:237:32: error: 'cur' undeclared (first use in this function)
	   __free_pages_bootmem(page++, cur++, 0);
                                ^
	Caused by commit "mm: page_alloc: pass PFN to __free_pages_bootmem".

He also merged a fix. The only difference in this version is one line is
moved so the final diff context is clearer. This is a fix to the mmotm
patch mm-page_alloc-pass-pfn-to-__free_pages_bootmem.patch

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/bootmem.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/bootmem.c b/mm/bootmem.c
index daf956bb4782..a23dd1934654 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -172,7 +172,7 @@ void __init free_bootmem_late(unsigned long physaddr, unsigned long size)
 static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 {
 	struct page *page;
-	unsigned long *map, start, end, pages, count = 0;
+	unsigned long *map, start, end, pages, cur, count = 0;
 
 	if (!bdata->node_bootmem_map)
 		return 0;
@@ -214,7 +214,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 			count += BITS_PER_LONG;
 			start += BITS_PER_LONG;
 		} else {
-			unsigned long cur = start;
+			cur = start;
 
 			start = ALIGN(start + 1, BITS_PER_LONG);
 			while (vec && cur != start) {
@@ -229,6 +229,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
 		}
 	}
 
+	cur = bdata->node_min_pfn;
 	page = virt_to_page(bdata->node_bootmem_map);
 	pages = bdata->node_low_pfn - bdata->node_min_pfn;
 	pages = bootmem_bootmap_pages(pages);

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH] mm: meminit: Initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set -fix
  2015-04-28 14:37   ` Mel Gorman
@ 2015-05-01  9:21     ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-01  9:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

This is take 2 on describing why these section names exist. If accepted
then it should be considered a fix for the mmotm patch
mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set.patch

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/internal.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 24314b671db1..85189fce7f61 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -386,10 +386,14 @@ static inline void mminit_verify_zonelist(void)
 #endif /* CONFIG_DEBUG_MEMORY_INIT */
 
 /*
- * Deferred struct page initialisation requires some early init functions that
- * are removed before kswapd is up and running. The feature depends on memory
- * hotplug so put the data and code required by deferred initialisation into
- * the __meminit section where they are preserved.
+ * Deferred struct page initialisation requires init functions that are freed
+ * before kswapd is available. Reuse the memory hotplug section annotation
+ * to mark the required code.
+ *
+ * __defermem_init is code that always exists but is annotated __meminit to
+ * 	avoid section warnings.
+ * __defer_init code gets marked __meminit when deferring struct page
+ *	initialisation but is otherwise in the init section.
  */
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 #define __defermem_init __meminit

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* [PATCH] mm: meminit: Reduce number of times pageblocks are set during struct page init -fix
  2015-04-28 14:37   ` Mel Gorman
@ 2015-05-01  9:23     ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-01  9:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Dave Hansen, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML


The patch "mm: meminit: Reduce number of times pageblocks are
set during struct page init" is setting a pageblock before
the page is initialised. This is a fix for the mmotm patch
mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init.patch

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 19aac687963c..544edb3b8da2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4497,8 +4497,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (!(pfn & (pageblock_nr_pages - 1))) {
 			struct page *page = pfn_to_page(pfn);
 
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 			__init_single_page(page, pfn, zone, nid);
+			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 		} else {
 			__init_single_pfn(pfn, zone, nid);
 		}

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-04-29  1:16   ` Waiman Long
@ 2015-05-01 22:02     ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-01 22:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 04/28/2015 09:16 PM, Waiman Long wrote:
> On 04/28/2015 10:36 AM, Mel Gorman wrote:
>> The bulk of the changes here are related to Andrew's feedback. 
>> Functionally
>> there is almost no difference.
>>
>> Changelog since v3
>> o Fix section-related warning
>> o Comments, clarifications, checkpatch
>> o Report the number of pages initialised
>>
>> Changelog since v2
>> o Reduce overhead of topology_init
>> o Remove boot-time kernel parameter to enable/disable
>> o Enable on UMA
>>
>> Changelog since v1
>> o Always initialise low zones
>> o Typo corrections
>> o Rename parallel mem init to parallel struct page init
>> o Rebase to 4.0
>>
>> Struct page initialisation had been identified as one of the reasons why
>> large machines take a long time to boot. Patches were posted a long 
>> time ago
>> to defer initialisation until they were first used.  This was 
>> rejected on
>> the grounds it should not be necessary to hurt the fast paths. This 
>> series
>> reuses much of the work from that time but defers the initialisation of
>> memory to kswapd so that one thread per node initialises memory local to
>> that node.
>>
>> After applying the series and setting the appropriate Kconfig variable I
>> see this in the boot log on a 64G machine
>>
>> [    7.383764] kswapd 0 initialised deferred memory in 188ms
>> [    7.404253] kswapd 1 initialised deferred memory in 208ms
>> [    7.411044] kswapd 3 initialised deferred memory in 216ms
>> [    7.411551] kswapd 2 initialised deferred memory in 216ms
>>
>> On a 1TB machine, I see
>>
>> [    8.406511] kswapd 3 initialised deferred memory in 1116ms
>> [    8.428518] kswapd 1 initialised deferred memory in 1140ms
>> [    8.435977] kswapd 0 initialised deferred memory in 1148ms
>> [    8.437416] kswapd 2 initialised deferred memory in 1148ms
>>
>> Once booted the machine appears to work as normal. Boot times were 
>> measured
>> from the time shutdown was called until ssh was available again.  In the
>> 64G case, the boot time savings are negligible. On the 1TB machine, the
>> savings were 16 seconds.
>>
>> It would be nice if the people that have access to really large machines
>> would test this series and report how much boot time is reduced.
>>
>>
>
> I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system. 
> From grub menu to ssh login, the bootup time was 453s before the patch 
> and 265s after the patch - a saving of 188s (42%). I used a different 
> OS environment and config file with this test and so the timing data 
> weren't comparable with my previous testing data. The kswapd log 
> entries were
>
> [   45.973967] kswapd 4 initialised 197655470 pages in 4390ms
> [   45.974214] kswapd 7 initialised 197655470 pages in 4390ms
> [   45.976692] kswapd 15 initialised 197654299 pages in 4390ms
> [   45.993284] kswapd 0 initialised 197131131 pages in 4410ms
> [   46.032735] kswapd 9 initialised 197655470 pages in 4447ms
> [   46.065856] kswapd 8 initialised 197655470 pages in 4481ms
> [   46.066615] kswapd 1 initialised 197622702 pages in 4483ms
> [   46.077995] kswapd 2 initialised 197655470 pages in 4495ms
> [   46.219508] kswapd 13 initialised 197655470 pages in 4633ms
> [   46.224358] kswapd 3 initialised 197655470 pages in 4641ms
> [   46.228441] kswapd 11 initialised 197655470 pages in 4643ms
> [   46.232258] kswapd 12 initialised 197655470 pages in 4647ms
> [   46.239659] kswapd 10 initialised 197655470 pages in 4654ms
> [   46.243402] kswapd 14 initialised 197655470 pages in 4657ms
> [   46.250368] kswapd 5 initialised 197655470 pages in 4666ms
> [   46.254659] kswapd 6 initialised 197655470 pages in 4670ms
>
> Cheers,
> Longman

Bad news!

I tried your patch on a 24-TB DragonHawk and got an out of memory panic. 
The kernel log messages were:
   :
[   80.126186] CPU  474: hi:  186, btch:  31 usd:   0
[   80.131457] CPU  475: hi:  186, btch:  31 usd:   0
[   80.136726] CPU  476: hi:  186, btch:  31 usd:   0
[   80.141997] CPU  477: hi:  186, btch:  31 usd:   0
[   80.147267] CPU  478: hi:  186, btch:  31 usd:   0
[   80.152538] CPU  479: hi:  186, btch:  31 usd:   0
[   80.157813] active_anon:0 inactive_anon:0 isolated_anon:0
[   80.157813]  active_file:0 inactive_file:0 isolated_file:0
[   80.157813]  unevictable:0 dirty:0 writeback:0 unstable:0
[   80.157813]  free:209 slab_reclaimable:7 slab_unreclaimable:42986
[   80.157813]  mapped:0 shmem:0 pagetables:0 bounce:0
[   80.157813]  free_cma:0
[   80.190428] Node 0 DMA free:568kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB 
managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB 
slab_reclaimable:0kB slab_unreclaimable:14928kB kernel_stack:400kB 
pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:0 all_unreclaimable? yes
[   80.233475] lowmem_reserve[]: 0 0 0 0
[   80.237542] Node 0 DMA32 free:20kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1961924kB 
managed:1333604kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB 
shmem:0kB slab_reclaimable:12kB slab_unreclaimable:101664kB 
kernel_stack:50176kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[   80.281456] lowmem_reserve[]: 0 0 0 0
[   80.285527] Node 0 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1608515580kB managed:2097148kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB 
slab_unreclaimable:948kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.328958] lowmem_reserve[]: 0 0 0 0
[   80.333031] Node 1 Normal free:248kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612732kB managed:2228220kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
slab_unreclaimable:46240kB kernel_stack:3232kB pagetables:0kB 
unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.377256] lowmem_reserve[]: 0 0 0 0
[   80.381325] Node 2 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:612kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.424764] lowmem_reserve[]: 0 0 0 0
[   80.428842] Node 3 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.472293] lowmem_reserve[]: 0 0 0 0
[   80.476360] Node 4 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:620kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.519803] lowmem_reserve[]: 0 0 0 0
[   80.523875] Node 5 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:584kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.567312] lowmem_reserve[]: 0 0 0 0
[   80.571379] Node 6 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.614814] lowmem_reserve[]: 0 0 0 0
[   80.618881] Node 7 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.662316] lowmem_reserve[]: 0 0 0 0
[   80.666390] Node 8 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.709827] lowmem_reserve[]: 0 0 0 0
[   80.713898] Node 9 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.757336] lowmem_reserve[]: 0 0 0 0
[   80.761407] Node 10 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:564kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.804941] lowmem_reserve[]: 0 0 0 0
[   80.809015] Node 11 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.852548] lowmem_reserve[]: 0 0 0 0
[   80.856620] Node 12 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:616kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.900158] lowmem_reserve[]: 0 0 0 0
[   80.904236] Node 13 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:592kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.947765] lowmem_reserve[]: 0 0 0 0
[   80.951847] Node 14 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.995380] lowmem_reserve[]: 0 0 0 0
[   80.999448] Node 15 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:548kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   81.042974] lowmem_reserve[]: 0 0 0 0
[   81.047044] Node 0 DMA: 132*4kB (U) 5*8kB (U) 0*16kB 0*32kB 0*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 568kB
[   81.059632] Node 0 DMA32: 5*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB
[   81.071733] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.083443] Node 1 Normal: 52*4kB (U) 5*8kB (U) 0*16kB 0*32kB 0*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 248kB
[   81.096227] Node 2 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.107935] Node 3 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.119643] Node 4 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.131347] Node 5 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.143056] Node 6 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.154767] Node 7 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.166473] Node 8 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.178179] Node 9 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.189893] Node 10 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.201695] Node 11 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.213496] Node 12 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.225324] Node 13 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.237130] Node 14 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.248926] Node 15 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.260726] 0 total pagecache pages
[   81.264565] 0 pages in swap cache
[   81.268212] Swap cache stats: add 0, delete 0, find 0/0
[   81.273962] Free swap  = 0kB
[   81.277125] Total swap = 0kB
[   81.280341] 6442421132 pages RAM
[   81.283888] 0 pages HighMem/MovableOnly
[   81.288109] 6433662383 pages reserved
[   81.292135] 0 pages hwpoisoned
[   81.295491] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[   81.305245] Kernel panic - not syncing: Out of memory and no killable processes...
[   81.305245]
[   81.315200] CPU: 240 PID: 1 Comm: swapper/0 Not tainted 4.0.1-pmm-bigsmp #1
[   81.322856] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 006.000.042 SFW: 015.099.000 04/01/2015
[   81.333096]  0000000000000000 ffff8800044c79c8 ffffffff8151b0c9 ffff8800044c7a48
[   81.341262]  ffffffff8151ae1e 0000000000000008 ffff8800044c7a58 ffff8800044c79f8
[   81.349428]  ffffffff810785c3 ffffffff81a13480 0000000000000000 ffff8800001001d0
[   81.357595] Call Trace:
[   81.360287]  [<ffffffff8151b0c9>] dump_stack+0x68/0x77
[   81.365942]  [<ffffffff8151ae1e>] panic+0xb9/0x219
[   81.371213]  [<ffffffff810785c3>] ? __blocking_notifier_call_chain+0x63/0x80
[   81.378971]  [<ffffffff811384ce>] __out_of_memory+0x34e/0x350
[   81.385292]  [<ffffffff811385ee>] out_of_memory+0x5e/0x90
[   81.391230]  [<ffffffff8113ce9e>] __alloc_pages_slowpath+0x6be/0x740
[   81.398219]  [<ffffffff8113d15c>] __alloc_pages_nodemask+0x23c/0x250
[   81.405212]  [<ffffffff81186346>] kmem_getpages+0x56/0x110
[   81.411246]  [<ffffffff81187f44>] fallback_alloc+0x164/0x200
[   81.417474]  [<ffffffff81187cfd>] ____cache_alloc_node+0x8d/0x170
[   81.424179]  [<ffffffff811887bb>] kmem_cache_alloc_trace+0x17b/0x240
[   81.431169]  [<ffffffff813d5f3a>] init_memory_block+0x3a/0x110
[   81.437586]  [<ffffffff81b5f687>] memory_dev_init+0xd7/0x13d
[   81.443810]  [<ffffffff81b5f2af>] driver_init+0x2f/0x37
[   81.449556]  [<ffffffff81b1599b>] do_basic_setup+0x29/0xd5
[   81.455597]  [<ffffffff81b372c4>] ? sched_init_smp+0x140/0x147
[   81.462015]  [<ffffffff81b15c55>] kernel_init_freeable+0x20e/0x297
[   81.468815]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
[   81.474565]  [<ffffffff81512ea9>] kernel_init+0x9/0xf0
[   81.480216]  [<ffffffff8151f788>] ret_from_fork+0x58/0x90
[   81.486156]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
[   81.492350] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
[   81.492350]

-Longman

* Re: [PATCH 0/13] Parallel struct page initialisation v4
@ 2015-05-01 22:02     ` Waiman Long
  0 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-01 22:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 04/28/2015 09:16 PM, Waiman Long wrote:
> On 04/28/2015 10:36 AM, Mel Gorman wrote:
>> The bulk of the changes here are related to Andrew's feedback. Functionally
>> there is almost no difference.
>>
>> Changelog since v3
>> o Fix section-related warning
>> o Comments, clarifications, checkpatch
>> o Report the number of pages initialised
>>
>> Changelog since v2
>> o Reduce overhead of topology_init
>> o Remove boot-time kernel parameter to enable/disable
>> o Enable on UMA
>>
>> Changelog since v1
>> o Always initialise low zones
>> o Typo corrections
>> o Rename parallel mem init to parallel struct page init
>> o Rebase to 4.0
>>
>> Struct page initialisation had been identified as one of the reasons why
>> large machines take a long time to boot. Patches were posted a long time ago
>> to defer initialisation until they were first used.  This was rejected on
>> the grounds it should not be necessary to hurt the fast paths. This series
>> reuses much of the work from that time but defers the initialisation of
>> memory to kswapd so that one thread per node initialises memory local to
>> that node.
>>
>> After applying the series and setting the appropriate Kconfig variable I
>> see this in the boot log on a 64G machine
>>
>> [    7.383764] kswapd 0 initialised deferred memory in 188ms
>> [    7.404253] kswapd 1 initialised deferred memory in 208ms
>> [    7.411044] kswapd 3 initialised deferred memory in 216ms
>> [    7.411551] kswapd 2 initialised deferred memory in 216ms
>>
>> On a 1TB machine, I see
>>
>> [    8.406511] kswapd 3 initialised deferred memory in 1116ms
>> [    8.428518] kswapd 1 initialised deferred memory in 1140ms
>> [    8.435977] kswapd 0 initialised deferred memory in 1148ms
>> [    8.437416] kswapd 2 initialised deferred memory in 1148ms
>>
>> Once booted the machine appears to work as normal. Boot times were measured
>> from the time shutdown was called until ssh was available again.  In the
>> 64G case, the boot time savings are negligible. On the 1TB machine, the
>> savings were 16 seconds.
>>
>> It would be nice if the people that have access to really large machines
>> would test this series and report how much boot time is reduced.
>>
>>
>
> I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system. 
> From grub menu to ssh login, the bootup time was 453s before the patch 
> and 265s after the patch - a saving of 188s (42%). I used a different 
> OS environment and config file with this test and so the timing data 
> weren't comparable with my previous testing data. The kswapd log 
> entries were
>
> [   45.973967] kswapd 4 initialised 197655470 pages in 4390ms
> [   45.974214] kswapd 7 initialised 197655470 pages in 4390ms
> [   45.976692] kswapd 15 initialised 197654299 pages in 4390ms
> [   45.993284] kswapd 0 initialised 197131131 pages in 4410ms
> [   46.032735] kswapd 9 initialised 197655470 pages in 4447ms
> [   46.065856] kswapd 8 initialised 197655470 pages in 4481ms
> [   46.066615] kswapd 1 initialised 197622702 pages in 4483ms
> [   46.077995] kswapd 2 initialised 197655470 pages in 4495ms
> [   46.219508] kswapd 13 initialised 197655470 pages in 4633ms
> [   46.224358] kswapd 3 initialised 197655470 pages in 4641ms
> [   46.228441] kswapd 11 initialised 197655470 pages in 4643ms
> [   46.232258] kswapd 12 initialised 197655470 pages in 4647ms
> [   46.239659] kswapd 10 initialised 197655470 pages in 4654ms
> [   46.243402] kswapd 14 initialised 197655470 pages in 4657ms
> [   46.250368] kswapd 5 initialised 197655470 pages in 4666ms
> [   46.254659] kswapd 6 initialised 197655470 pages in 4670ms
>
> Cheers,
> Longman

Bad news!

I tried your patch on a 24-TB DragonHawk and got an out of memory panic. 
The kernel log messages were:
   :
[   80.126186] CPU  474: hi:  186, btch:  31 usd:   0
[   80.131457] CPU  475: hi:  186, btch:  31 usd:   0
[   80.136726] CPU  476: hi:  186, btch:  31 usd:   0
[   80.141997] CPU  477: hi:  186, btch:  31 usd:   0
[   80.147267] CPU  478: hi:  186, btch:  31 usd:   0
[   80.152538] CPU  479: hi:  186, btch:  31 usd:   0
[   80.157813] active_anon:0 inactive_anon:0 isolated_anon:0
[   80.157813]  active_file:0 inactive_file:0 isolated_file:0
[   80.157813]  unevictable:0 dirty:0 writeback:0 unstable:0
[   80.157813]  free:209 slab_reclaimable:7 slab_unreclaimable:42986
[   80.157813]  mapped:0 shmem:0 pagetables:0 bounce:0
[   80.157813]  free_cma:0
[   80.190428] Node 0 DMA free:568kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB 
managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB 
slab_reclaimable:0kB slab_unreclaimable:14928kB kernel_stack:400kB 
pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:0 all_unreclaimable? yes
[   80.233475] lowmem_reserve[]: 0 0 0 0
[   80.237542] Node 0 DMA32 free:20kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1961924kB 
managed:1333604kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB 
shmem:0kB slab_reclaimable:12kB slab_unreclaimable:101664kB 
kernel_stack:50176kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[   80.281456] lowmem_reserve[]: 0 0 0 0
[   80.285527] Node 0 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1608515580kB managed:2097148kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB 
slab_unreclaimable:948kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.328958] lowmem_reserve[]: 0 0 0 0
[   80.333031] Node 1 Normal free:248kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612732kB managed:2228220kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
slab_unreclaimable:46240kB kernel_stack:3232kB pagetables:0kB 
unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.377256] lowmem_reserve[]: 0 0 0 0
[   80.381325] Node 2 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:612kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.424764] lowmem_reserve[]: 0 0 0 0
[   80.428842] Node 3 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.472293] lowmem_reserve[]: 0 0 0 0
[   80.476360] Node 4 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:620kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.519803] lowmem_reserve[]: 0 0 0 0
[   80.523875] Node 5 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:584kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.567312] lowmem_reserve[]: 0 0 0 0
[   80.571379] Node 6 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.614814] lowmem_reserve[]: 0 0 0 0
[   80.618881] Node 7 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.662316] lowmem_reserve[]: 0 0 0 0
[   80.666390] Node 8 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.709827] lowmem_reserve[]: 0 0 0 0
[   80.713898] Node 9 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.757336] lowmem_reserve[]: 0 0 0 0
[   80.761407] Node 10 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:564kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.804941] lowmem_reserve[]: 0 0 0 0
[   80.809015] Node 11 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.852548] lowmem_reserve[]: 0 0 0 0
[   80.856620] Node 12 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:616kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.900158] lowmem_reserve[]: 0 0 0 0
[   80.904236] Node 13 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:592kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.947765] lowmem_reserve[]: 0 0 0 0
[   80.951847] Node 14 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   80.995380] lowmem_reserve[]: 0 0 0 0
[   80.999448] Node 15 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:548kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[   81.042974] lowmem_reserve[]: 0 0 0 0
[   81.047044] Node 0 DMA: 132*4kB (U) 5*8kB (U) 0*16kB 0*32kB 0*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 568kB
[   81.059632] Node 0 DMA32: 5*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB
[   81.071733] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.083443] Node 1 Normal: 52*4kB (U) 5*8kB (U) 0*16kB 0*32kB 0*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 248kB
[   81.096227] Node 2 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.107935] Node 3 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.119643] Node 4 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.131347] Node 5 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.143056] Node 6 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.154767] Node 7 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.166473] Node 8 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.178179] Node 9 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.189893] Node 10 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.201695] Node 11 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.213496] Node 12 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.225324] Node 13 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.237130] Node 14 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.248926] Node 15 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   81.260726] 0 total pagecache pages
[   81.264565] 0 pages in swap cache
[   81.268212] Swap cache stats: add 0, delete 0, find 0/0
[   81.273962] Free swap  = 0kB
[   81.277125] Total swap = 0kB
[   81.280341] 6442421132 pages RAM
[   81.283888] 0 pages HighMem/MovableOnly
[   81.288109] 6433662383 pages reserved
[   81.292135] 0 pages hwpoisoned
[   81.295491] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[   81.305245] Kernel panic - not syncing: Out of memory and no killable processes...
[   81.305245]
[   81.315200] CPU: 240 PID: 1 Comm: swapper/0 Not tainted 4.0.1-pmm-bigsmp #1
[   81.322856] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 006.000.042 SFW: 015.099.000 04/01/2015
[   81.333096]  0000000000000000 ffff8800044c79c8 ffffffff8151b0c9 ffff8800044c7a48
[   81.341262]  ffffffff8151ae1e 0000000000000008 ffff8800044c7a58 ffff8800044c79f8
[   81.349428]  ffffffff810785c3 ffffffff81a13480 0000000000000000 ffff8800001001d0
[   81.357595] Call Trace:
[   81.360287]  [<ffffffff8151b0c9>] dump_stack+0x68/0x77
[   81.365942]  [<ffffffff8151ae1e>] panic+0xb9/0x219
[   81.371213]  [<ffffffff810785c3>] ? __blocking_notifier_call_chain+0x63/0x80
[   81.378971]  [<ffffffff811384ce>] __out_of_memory+0x34e/0x350
[   81.385292]  [<ffffffff811385ee>] out_of_memory+0x5e/0x90
[   81.391230]  [<ffffffff8113ce9e>] __alloc_pages_slowpath+0x6be/0x740
[   81.398219]  [<ffffffff8113d15c>] __alloc_pages_nodemask+0x23c/0x250
[   81.405212]  [<ffffffff81186346>] kmem_getpages+0x56/0x110
[   81.411246]  [<ffffffff81187f44>] fallback_alloc+0x164/0x200
[   81.417474]  [<ffffffff81187cfd>] ____cache_alloc_node+0x8d/0x170
[   81.424179]  [<ffffffff811887bb>] kmem_cache_alloc_trace+0x17b/0x240
[   81.431169]  [<ffffffff813d5f3a>] init_memory_block+0x3a/0x110
[   81.437586]  [<ffffffff81b5f687>] memory_dev_init+0xd7/0x13d
[   81.443810]  [<ffffffff81b5f2af>] driver_init+0x2f/0x37
[   81.449556]  [<ffffffff81b1599b>] do_basic_setup+0x29/0xd5
[   81.455597]  [<ffffffff81b372c4>] ? sched_init_smp+0x140/0x147
[   81.462015]  [<ffffffff81b15c55>] kernel_init_freeable+0x20e/0x297
[   81.468815]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
[   81.474565]  [<ffffffff81512ea9>] kernel_init+0x9/0xf0
[   81.480216]  [<ffffffff8151f788>] ret_from_fork+0x58/0x90
[   81.486156]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
[   81.492350] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
[   81.492350]

-Longman
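
As a cross-check of the dump above, the per-zone "present" and "managed" figures can be summed and compared with the "pages RAM" and "pages reserved" totals. The sketch below is only an illustration: it copies the numbers from the log by hand and assumes 4 kB base pages (as the "132*4kB" buddy lists suggest). The variable names are made up for this example.

```python
PAGE_KB = 4  # assumption: 4 kB base pages, matching the "N*4kB" buddy lists

# (zone, present_kB, managed_kB), copied by hand from the zone dump above
zones = [
    ("Node 0 DMA", 15988, 15896),
    ("Node 0 DMA32", 1961924, 1333604),
    ("Node 0 Normal", 1608515580, 2097148),
    ("Node 1 Normal", 1610612732, 2228220),
]
# Nodes 2..15 all report the same present/managed values in the log
zones += [(f"Node {n} Normal", 1610612736, 2097152) for n in range(2, 16)]

present_pages = sum(present for _, present, _ in zones) // PAGE_KB
managed_pages = sum(managed for _, _, managed in zones) // PAGE_KB

# Totals printed near the end of the dump
pages_ram = 6442421132
pages_reserved = 6433662383

print("present matches 'pages RAM':", present_pages == pages_ram)
print("managed matches RAM - reserved:", managed_pages == pages_ram - pages_reserved)
print(f"managed: {managed_pages * PAGE_KB / 2**20:.1f} GiB "
      f"of {present_pages * PAGE_KB / 2**30:.1f} TiB present")
```

Under these assumptions only about 2 GB per node is counted as managed at the time of the panic, which appears consistent with most of each node's memory not yet having been handed to the page allocator.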

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-01 22:02     ` Waiman Long
@ 2015-05-02  0:09       ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-02  0:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 05/01/2015 06:02 PM, Waiman Long wrote:
>
> Bad news!
>
> I tried your patch on a 24-TB DragonHawk and got an out of memory 
> panic. The kernel log messages were:
>   :
> [   80.126186] CPU  474: hi:  186, btch:  31 usd:   0
> [   80.131457] CPU  475: hi:  186, btch:  31 usd:   0
> [   80.136726] CPU  476: hi:  186, btch:  31 usd:   0
> [   80.141997] CPU  477: hi:  186, btch:  31 usd:   0
> [   80.147267] CPU  478: hi:  186, btch:  31 usd:   0
> [   80.152538] CPU  479: hi:  186, btch:  31 usd:   0
> [   80.157813] active_anon:0 inactive_anon:0 isolated_anon:0
> [   80.157813]  active_file:0 inactive_file:0 isolated_file:0
> [   80.157813]  unevictable:0 dirty:0 writeback:0 unstable:0
> [   80.157813]  free:209 slab_reclaimable:7 slab_unreclaimable:42986
> [   80.157813]  mapped:0 shmem:0 pagetables:0 bounce:0
> [   80.157813]  free_cma:0
> [   80.190428] Node 0 DMA free:568kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB 
> managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB 
> shmem:0kB slab_reclaimable:0kB slab_unreclaimable:14928kB 
> kernel_stack:400kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB 
> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> [   80.233475] lowmem_reserve[]: 0 0 0 0
> [   80.237542] Node 0 DMA32 free:20kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1961924kB managed:1333604kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
> slab_unreclaimable:101664kB kernel_stack:50176kB pagetables:0kB 
> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.281456] lowmem_reserve[]: 0 0 0 0
> [   80.285527] Node 0 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1608515580kB managed:2097148kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB 
> slab_unreclaimable:948kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.328958] lowmem_reserve[]: 0 0 0 0
> [   80.333031] Node 1 Normal free:248kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612732kB managed:2228220kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
> slab_unreclaimable:46240kB kernel_stack:3232kB pagetables:0kB 
> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.377256] lowmem_reserve[]: 0 0 0 0
> [   80.381325] Node 2 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:612kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.424764] lowmem_reserve[]: 0 0 0 0
> [   80.428842] Node 3 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.472293] lowmem_reserve[]: 0 0 0 0
> [   80.476360] Node 4 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:620kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.519803] lowmem_reserve[]: 0 0 0 0
> [   80.523875] Node 5 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:584kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.567312] lowmem_reserve[]: 0 0 0 0
> [   80.571379] Node 6 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.614814] lowmem_reserve[]: 0 0 0 0
> [   80.618881] Node 7 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.662316] lowmem_reserve[]: 0 0 0 0
> [   80.666390] Node 8 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.709827] lowmem_reserve[]: 0 0 0 0
> [   80.713898] Node 9 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.757336] lowmem_reserve[]: 0 0 0 0
> [   80.761407] Node 10 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:564kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.804941] lowmem_reserve[]: 0 0 0 0
> [   80.809015] Node 11 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.852548] lowmem_reserve[]: 0 0 0 0
> [   80.856620] Node 12 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:616kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.900158] lowmem_reserve[]: 0 0 0 0
> [   80.904236] Node 13 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:592kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.947765] lowmem_reserve[]: 0 0 0 0
> [   80.951847] Node 14 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   80.995380] lowmem_reserve[]: 0 0 0 0
> [   80.999448] Node 15 Normal free:0kB min:0kB low:0kB high:0kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:548kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [   81.042974] lowmem_reserve[]: 0 0 0 0
> [   81.047044] Node 0 DMA: 132*4kB (U) 5*8kB (U) 0*16kB 0*32kB 0*64kB 
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 568kB
> [   81.059632] Node 0 DMA32: 5*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB
> [   81.071733] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.083443] Node 1 Normal: 52*4kB (U) 5*8kB (U) 0*16kB 0*32kB 
> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 248kB
> [   81.096227] Node 2 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.107935] Node 3 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.119643] Node 4 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.131347] Node 5 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.143056] Node 6 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.154767] Node 7 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.166473] Node 8 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.178179] Node 9 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.189893] Node 10 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.201695] Node 11 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.213496] Node 12 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.225324] Node 13 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.237130] Node 14 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.248926] Node 15 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> [   81.260726] 0 total pagecache pages
> [   81.264565] 0 pages in swap cache
> [   81.268212] Swap cache stats: add 0, delete 0, find 0/0
> [   81.273962] Free swap  = 0kB
> [   81.277125] Total swap = 0kB
> [   81.280341] 6442421132 pages RAM
> [   81.283888] 0 pages HighMem/MovableOnly
> [   81.288109] 6433662383 pages reserved
> [   81.292135] 0 pages hwpoisoned
> [   81.295491] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds 
> swapents oom_score_adj name
> [   81.305245] Kernel panic - not syncing: Out of memory and no 
> killable processes...
> [   81.305245]
> [   81.315200] CPU: 240 PID: 1 Comm: swapper/0 Not tainted 
> 4.0.1-pmm-bigsmp #1
> [   81.322856] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 
> 006.000.042 SFW: 015.099.000 04/01/2015
> [   81.333096]  0000000000000000 ffff8800044c79c8 ffffffff8151b0c9 
> ffff8800044c7a48
> [   81.341262]  ffffffff8151ae1e 0000000000000008 ffff8800044c7a58 
> ffff8800044c79f8
> [   81.349428]  ffffffff810785c3 ffffffff81a13480 0000000000000000 
> ffff8800001001d0
> [   81.357595] Call Trace:
> [   81.360287]  [<ffffffff8151b0c9>] dump_stack+0x68/0x77
> [   81.365942]  [<ffffffff8151ae1e>] panic+0xb9/0x219
> [   81.371213]  [<ffffffff810785c3>] ? 
> __blocking_notifier_call_chain+0x63/0x80
> [   81.378971]  [<ffffffff811384ce>] __out_of_memory+0x34e/0x350
> [   81.385292]  [<ffffffff811385ee>] out_of_memory+0x5e/0x90
> [   81.391230]  [<ffffffff8113ce9e>] __alloc_pages_slowpath+0x6be/0x740
> [   81.398219]  [<ffffffff8113d15c>] __alloc_pages_nodemask+0x23c/0x250
> [   81.405212]  [<ffffffff81186346>] kmem_getpages+0x56/0x110
> [   81.411246]  [<ffffffff81187f44>] fallback_alloc+0x164/0x200
> [   81.417474]  [<ffffffff81187cfd>] ____cache_alloc_node+0x8d/0x170
> [   81.424179]  [<ffffffff811887bb>] kmem_cache_alloc_trace+0x17b/0x240
> [   81.431169]  [<ffffffff813d5f3a>] init_memory_block+0x3a/0x110
> [   81.437586]  [<ffffffff81b5f687>] memory_dev_init+0xd7/0x13d
> [   81.443810]  [<ffffffff81b5f2af>] driver_init+0x2f/0x37
> [   81.449556]  [<ffffffff81b1599b>] do_basic_setup+0x29/0xd5
> [   81.455597]  [<ffffffff81b372c4>] ? sched_init_smp+0x140/0x147
> [   81.462015]  [<ffffffff81b15c55>] kernel_init_freeable+0x20e/0x297
> [   81.468815]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
> [   81.474565]  [<ffffffff81512ea9>] kernel_init+0x9/0xf0
> [   81.480216]  [<ffffffff8151f788>] ret_from_fork+0x58/0x90
> [   81.486156]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
> [   81.492350] ---[ end Kernel panic - not syncing: Out of memory and 
> no killable processes...
> [   81.492350]
>
> -Longman

I increased the pre-initialized memory per node in update_defer_init()
of mm/page_alloc.c from 2G to 4G. Now I am able to boot the 24-TB
machine without error. The 12-TB machine has 0.75TB/node, while the
24-TB machine has 1.5TB/node. I would suggest something like
pre-initializing 1G per 0.25TB of memory in each node. That way, it will
scale properly with the memory size.
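
The scaling rule suggested above can be sketched as follows. This is only an
illustration of the arithmetic, not the code in update_defer_init(): the
helper name preinit_bytes_for_node() is hypothetical, and rounding up to
whole 0.25TB steps and keeping the existing 2G as a floor are both
assumptions.

```c
#include <stdint.h>

#define GiB (1ULL << 30)
#define TiB (1ULL << 40)

/*
 * Hypothetical helper (not kernel code): how much memory to initialise
 * early on a node, using the "1G per 0.25TB of node memory" rule.
 */
uint64_t preinit_bytes_for_node(uint64_t node_bytes)
{
	const uint64_t step = TiB / 4;	/* 0.25TB of node memory */
	/* Assumption: a partial step counts as a full one (round up). */
	uint64_t steps = (node_bytes + step - 1) / step;
	uint64_t bytes = steps * GiB;

	/* Assumption: never go below the existing 2G per node. */
	if (bytes < 2 * GiB)
		bytes = 2 * GiB;

	return bytes;
}
```

Under these assumptions a node with 0.75TB would get 3G initialised early and
a node with 1.5TB would get 6G, which is above the 4G that was enough to boot
the 24-TB machine.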

Before the patch, the boot time from elilo prompt to ssh login was 694s. 
After the patch, the boot up time was 346s, a saving of 348s (about 50%).

Cheers,
Longman


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-02  0:09       ` Waiman Long
@ 2015-05-02  8:52         ` Daniel J Blueman
  -1 siblings, 0 replies; 168+ messages in thread
From: Daniel J Blueman @ 2015-05-02  8:52 UTC (permalink / raw)
  To: Waiman Long, Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton, Linux-MM, LKML

On Sat, May 2, 2015 at 8:09 AM, Waiman Long <waiman.long@hp.com> wrote:
> On 05/01/2015 06:02 PM, Waiman Long wrote:
>> 
>> Bad news!
>> 
>> I tried your patch on a 24-TB DragonHawk and got an out of memory 
>> panic. The kernel log messages were:
>>   :
>> [   80.126186] CPU  474: hi:  186, btch:  31 usd:   0
>> [   80.131457] CPU  475: hi:  186, btch:  31 usd:   0
>> [   80.136726] CPU  476: hi:  186, btch:  31 usd:   0
>> [   80.141997] CPU  477: hi:  186, btch:  31 usd:   0
>> [   80.147267] CPU  478: hi:  186, btch:  31 usd:   0
>> [   80.152538] CPU  479: hi:  186, btch:  31 usd:   0
>> [   80.157813] active_anon:0 inactive_anon:0 isolated_anon:0
>> [   80.157813]  active_file:0 inactive_file:0 isolated_file:0
>> [   80.157813]  unevictable:0 dirty:0 writeback:0 unstable:0
>> [   80.157813]  free:209 slab_reclaimable:7 slab_unreclaimable:42986
>> [   80.157813]  mapped:0 shmem:0 pagetables:0 bounce:0
>> [   80.157813]  free_cma:0
>> [   80.190428] Node 0 DMA free:568kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:15988kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB 
>> mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:14928kB 
>> kernel_stack:400kB pagetables:0kB unstable:0kB bounce:0kB 
>> free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>> [   80.233475] lowmem_reserve[]: 0 0 0 0
>> [   80.237542] Node 0 DMA32 free:20kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1961924kB managed:1333604kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
>> slab_unreclaimable:101664kB kernel_stack:50176kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.281456] lowmem_reserve[]: 0 0 0 0
>> [   80.285527] Node 0 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1608515580kB managed:2097148kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB 
>> slab_unreclaimable:948kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.328958] lowmem_reserve[]: 0 0 0 0
>> [   80.333031] Node 1 Normal free:248kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612732kB managed:2228220kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
>> slab_unreclaimable:46240kB kernel_stack:3232kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.377256] lowmem_reserve[]: 0 0 0 0
>> [   80.381325] Node 2 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:612kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.424764] lowmem_reserve[]: 0 0 0 0
>> [   80.428842] Node 3 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.472293] lowmem_reserve[]: 0 0 0 0
>> [   80.476360] Node 4 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:620kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.519803] lowmem_reserve[]: 0 0 0 0
>> [   80.523875] Node 5 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:584kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.567312] lowmem_reserve[]: 0 0 0 0
>> [   80.571379] Node 6 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.614814] lowmem_reserve[]: 0 0 0 0
>> [   80.618881] Node 7 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.662316] lowmem_reserve[]: 0 0 0 0
>> [   80.666390] Node 8 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.709827] lowmem_reserve[]: 0 0 0 0
>> [   80.713898] Node 9 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.757336] lowmem_reserve[]: 0 0 0 0
>> [   80.761407] Node 10 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:564kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.804941] lowmem_reserve[]: 0 0 0 0
>> [   80.809015] Node 11 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.852548] lowmem_reserve[]: 0 0 0 0
>> [   80.856620] Node 12 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:616kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.900158] lowmem_reserve[]: 0 0 0 0
>> [   80.904236] Node 13 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:592kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.947765] lowmem_reserve[]: 0 0 0 0
>> [   80.951847] Node 14 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   80.995380] lowmem_reserve[]: 0 0 0 0
>> [   80.999448] Node 15 Normal free:0kB min:0kB low:0kB high:0kB 
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>> slab_unreclaimable:548kB kernel_stack:0kB pagetables:0kB 
>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>> pages_scanned:0 all_unreclaimable? yes
>> [   81.042974] lowmem_reserve[]: 0 0 0 0
>> [   81.047044] Node 0 DMA: 132*4kB (U) 5*8kB (U) 0*16kB 0*32kB 
>> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 568kB
>> [   81.059632] Node 0 DMA32: 5*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB
>> [   81.071733] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.083443] Node 1 Normal: 52*4kB (U) 5*8kB (U) 0*16kB 0*32kB 
>> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 248kB
>> [   81.096227] Node 2 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.107935] Node 3 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.119643] Node 4 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.131347] Node 5 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.143056] Node 6 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.154767] Node 7 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.166473] Node 8 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.178179] Node 9 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.189893] Node 10 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.201695] Node 11 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.213496] Node 12 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.225324] Node 13 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.237130] Node 14 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.248926] Node 15 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>> [   81.260726] 0 total pagecache pages
>> [   81.264565] 0 pages in swap cache
>> [   81.268212] Swap cache stats: add 0, delete 0, find 0/0
>> [   81.273962] Free swap  = 0kB
>> [   81.277125] Total swap = 0kB
>> [   81.280341] 6442421132 pages RAM
>> [   81.283888] 0 pages HighMem/MovableOnly
>> [   81.288109] 6433662383 pages reserved
>> [   81.292135] 0 pages hwpoisoned
>> [   81.295491] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds 
>> swapents oom_score_adj name
>> [   81.305245] Kernel panic - not syncing: Out of memory and no 
>> killable processes...
>> [   81.305245]
>> [   81.315200] CPU: 240 PID: 1 Comm: swapper/0 Not tainted 
>> 4.0.1-pmm-bigsmp #1
>> [   81.322856] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 
>> 006.000.042 SFW: 015.099.000 04/01/2015
>> [   81.333096]  0000000000000000 ffff8800044c79c8 ffffffff8151b0c9 
>> ffff8800044c7a48
>> [   81.341262]  ffffffff8151ae1e 0000000000000008 ffff8800044c7a58 
>> ffff8800044c79f8
>> [   81.349428]  ffffffff810785c3 ffffffff81a13480 0000000000000000 
>> ffff8800001001d0
>> [   81.357595] Call Trace:
>> [   81.360287]  [<ffffffff8151b0c9>] dump_stack+0x68/0x77
>> [   81.365942]  [<ffffffff8151ae1e>] panic+0xb9/0x219
>> [   81.371213]  [<ffffffff810785c3>] ? 
>> __blocking_notifier_call_chain+0x63/0x80
>> [   81.378971]  [<ffffffff811384ce>] __out_of_memory+0x34e/0x350
>> [   81.385292]  [<ffffffff811385ee>] out_of_memory+0x5e/0x90
>> [   81.391230]  [<ffffffff8113ce9e>] 
>> __alloc_pages_slowpath+0x6be/0x740
>> [   81.398219]  [<ffffffff8113d15c>] 
>> __alloc_pages_nodemask+0x23c/0x250
>> [   81.405212]  [<ffffffff81186346>] kmem_getpages+0x56/0x110
>> [   81.411246]  [<ffffffff81187f44>] fallback_alloc+0x164/0x200
>> [   81.417474]  [<ffffffff81187cfd>] ____cache_alloc_node+0x8d/0x170
>> [   81.424179]  [<ffffffff811887bb>] 
>> kmem_cache_alloc_trace+0x17b/0x240
>> [   81.431169]  [<ffffffff813d5f3a>] init_memory_block+0x3a/0x110
>> [   81.437586]  [<ffffffff81b5f687>] memory_dev_init+0xd7/0x13d
>> [   81.443810]  [<ffffffff81b5f2af>] driver_init+0x2f/0x37
>> [   81.449556]  [<ffffffff81b1599b>] do_basic_setup+0x29/0xd5
>> [   81.455597]  [<ffffffff81b372c4>] ? sched_init_smp+0x140/0x147
>> [   81.462015]  [<ffffffff81b15c55>] kernel_init_freeable+0x20e/0x297
>> [   81.468815]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
>> [   81.474565]  [<ffffffff81512ea9>] kernel_init+0x9/0xf0
>> [   81.480216]  [<ffffffff8151f788>] ret_from_fork+0x58/0x90
>> [   81.486156]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
>> [   81.492350] ---[ end Kernel panic - not syncing: Out of memory 
>> and no killable processes...
>> [   81.492350]
>> 
>> -Longman
> 
> I increased the pre-initialized memory per node in 
> update_defer_init() of mm/page_alloc.c from 2G to 4G. Now I am able 
> to boot the 24-TB machine without error. The 12-TB has 0.75TB/node, 
> while the 24-TB machine has 1.5TB/node. I would suggest something 
> like pre-initializing 1G per 0.25TB/node. In this way, it will scale 
> properly with the memory size.
> 
> Before the patch, the boot time from elilo prompt to ssh login was 
> 694s. After the patch, the boot up time was 346s, a saving of 348s 
> (about 50%).

I second scaling the up-front init with the zone size. The 7TB system I 
was booting has only 32GB per NUMA node, which at 1GB per 0.25TB works 
out at 128MB of up-front init per NUMA node. That worked nicely and 
booted faster still.

Even booting with 64MB per NUMA node worked fine, so there is adequate 
margin for the 8 cores; I guess we'd just need to enforce a minimum of, 
e.g., 64MB.
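
The scaling rule discussed above (1GB of up-front init per 0.25TB of node
memory, with a floor of about 64MB) can be sketched in C as follows. This
is only an illustration of the arithmetic: upfront_init_bytes() and
check_thread_figures() are hypothetical names, not functions from the
patch series, and the sizes are treated as binary units.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/*
 * Hypothetical helper (not part of the series): how much of a node's
 * memory to initialise up front, given the node size in bytes.
 */
uint64_t upfront_init_bytes(uint64_t node_bytes)
{
	/* 1 GiB per 0.25 TiB (256 GiB) is a 1:256 ratio. */
	uint64_t bytes = node_bytes / 256;
	const uint64_t floor_bytes = 64 * MiB;	/* assumed minimum */

	if (bytes < floor_bytes)
		bytes = floor_bytes;
	/* Never ask for more than the node actually has. */
	if (bytes > node_bytes)
		bytes = node_bytes;
	return bytes;
}

/* Check the rule against the node sizes mentioned in this thread. */
void check_thread_figures(void)
{
	/* 32GB per node works out at 128MB. */
	assert(upfront_init_bytes(32 * GiB) == 128 * MiB);
	/* 1.5TB per node (the 24-TB machine) works out at 6GB. */
	assert(upfront_init_bytes(1536 * GiB) == 6 * GiB);
	/* A small 8GB node is clamped to the 64MB floor. */
	assert(upfront_init_bytes(8 * GiB) == 64 * MiB);
}
```

Under this rule the 24-TB machine would pre-initialise 6GB per node, which
is above the 4G that was reported to boot.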

Thanks,
  Daniel


^ permalink raw reply	[flat|nested] 168+ messages in thread

* RE: [PATCH 0/13] Parallel struct page initialisation v4
  2015-04-30 16:10       ` Daniel J Blueman
  (?)
@ 2015-05-02 11:52         ` Elliott, Robert (Server Storage)
  -1 siblings, 0 replies; 168+ messages in thread
From: Elliott, Robert (Server Storage) @ 2015-05-02 11:52 UTC (permalink / raw)
  To: Daniel J Blueman, nzimmer, Mel Gorman
  Cc: Pekka Enberg, Andrew Morton, Dave Hansen, Long, Wai Man,
	Norton, Scott J <scott.norton@hp.com>,
	Linux-MM, LKML, 'Steffen Persvold',
	Boaz Harrosh (boaz@plexistor.com),
	dan.j.williams, linux-nvdimm@lists.01.org


> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Daniel J Blueman
> Sent: Thursday, April 30, 2015 11:10 AM
> Subject: Re: [PATCH 0/13] Parallel struct page initialisation v4
...
> On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're
> seeing stock 4.0 boot in 7136s. This drops to 2159s, or a 70% reduction
> with this patchset. Non-temporal PMD init [1] drops this to 1045s.
> 
> Nathan, what do you guys see with the non-temporal PMD patch [1]? Do
> add a sfence at the ende label if you manually patch.
> 
...
> [1] https://lkml.org/lkml/2015/4/23/350

From that post:
> +loop_64:
> +	decq  %rcx
> +	movnti	%rax,(%rdi)
> +	movnti	%rax,8(%rdi)
> +	movnti	%rax,16(%rdi)
> +	movnti	%rax,24(%rdi)
> +	movnti	%rax,32(%rdi)
> +	movnti	%rax,40(%rdi)
> +	movnti	%rax,48(%rdi)
> +	movnti	%rax,56(%rdi)
> +	leaq  64(%rdi),%rdi
> +	jnz    loop_64

There are some even more efficient instructions available in x86,
depending on the CPU features:
* movnti		8 byte
* movntdq %xmm		16 byte, SSE
* vmovntdq %ymm	32 byte, AVX
* vmovntdq %zmm	64 byte, AVX-512 (forthcoming)

The last will transfer a full cache line at a time.
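
As a rough user-space illustration of the 16-byte variant in the list
above, the sketch below zeroes a buffer with movntdq through the SSE2
intrinsic _mm_stream_si128() and ends with an sfence, as suggested earlier
in the thread. zero_nontemporal() is a hypothetical helper, not code from
the referenced patch; it assumes a 16-byte-aligned buffer whose length is
a multiple of 64, and falls back to memset() on non-x86-64 builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>	/* SSE2: _mm_stream_si128(), _mm_sfence() */
#define HAVE_NT_STORES 1
#else
#define HAVE_NT_STORES 0
#endif

/*
 * Hypothetical helper: zero len bytes at dst using non-temporal stores.
 * dst must be 16-byte aligned and len a multiple of 64 (one cache line).
 */
void zero_nontemporal(void *dst, size_t len)
{
	assert(((uintptr_t)dst & 15) == 0);
	assert((len & 63) == 0);

#if HAVE_NT_STORES
	const __m128i zero = _mm_setzero_si128();
	char *p = dst;

	/* Four 16-byte movntdq stores cover one 64-byte cache line. */
	for (size_t off = 0; off < len; off += 64) {
		_mm_stream_si128((__m128i *)(p + off), zero);
		_mm_stream_si128((__m128i *)(p + off + 16), zero);
		_mm_stream_si128((__m128i *)(p + off + 32), zero);
		_mm_stream_si128((__m128i *)(p + off + 48), zero);
	}
	/* Order the weakly-ordered non-temporal stores before later stores. */
	_mm_sfence();
#else
	/* Portable fallback for builds without SSE2. */
	memset(dst, 0, len);
#endif
}
```

The 32-byte form would use _mm256_stream_si256() in the same way, but
that needs AVX support in both the compiler flags and the CPU.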

For NVDIMMs, the nd pmem driver also needs memcpy functions that use
these non-temporal instructions, both for performance and reliability.
We also need to speed up __clear_page and copy_user_enhanced_fast_string
so userspace accesses through the page cache can keep up.
https://lkml.org/lkml/2015/4/2/453 is one of the threads on that topic.

Some results I've gotten there under different cache attributes
(in terms of 4 KiB IOPS):

16-byte movntdq:
UC write iops=697872 (697.872 K)(0.697872 M)
WB write iops=9745800 (9745.8 K)(9.7458 M)
WC write iops=9801800 (9801.8 K)(9.8018 M)
WT write iops=9812400 (9812.4 K)(9.8124 M)

32-byte vmovntdq:
UC write iops=1274400 (1274.4 K)(1.2744 M)
WB write iops=10259000 (10259 K)(10.259 M)
WC write iops=10286000 (10286 K)(10.286 M)
WT write iops=10294000 (10294 K)(10.294 M)
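
To put the figures above in bandwidth terms, 4 KiB IOPS can be converted
to bytes per second by multiplying by 4096. The snippet below is just
that arithmetic; iops_to_bytes_per_sec() and check_uc_figures() are
hypothetical names, and the inputs are the UC numbers quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define IO_SIZE_BYTES UINT64_C(4096)	/* one 4 KiB I/O */

/* Convert a 4 KiB IOPS figure into bytes per second. */
uint64_t iops_to_bytes_per_sec(uint64_t iops)
{
	return iops * IO_SIZE_BYTES;
}

/* Apply the conversion to the two UC results listed above. */
void check_uc_figures(void)
{
	/* UC, 16-byte movntdq: 697872 IOPS */
	assert(iops_to_bytes_per_sec(697872) == UINT64_C(2858483712));
	/* UC, 32-byte vmovntdq: 1274400 IOPS */
	assert(iops_to_bytes_per_sec(1274400) == UINT64_C(5219942400));
}
```

That is roughly 2.9 GB/s versus 5.2 GB/s for UC, while the WB, WC and WT
cases all land at about 40 GB/s or slightly above.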

---
Robert Elliott, HP Server Storage

^ permalink raw reply	[flat|nested] 168+ messages in thread

* RE: [PATCH 0/13] Parallel struct page initialisation v4
@ 2015-05-02 11:52         ` Elliott, Robert (Server Storage)
  0 siblings, 0 replies; 168+ messages in thread
From: Elliott, Robert (Server Storage) @ 2015-05-02 11:52 UTC (permalink / raw)
  To: Daniel J Blueman, nzimmer, Mel Gorman
  Cc: Pekka Enberg, Andrew Morton, Dave Hansen, Long, Wai Man, Norton,
	Scott J, Linux-MM, LKML, 'Steffen Persvold',
	Boaz Harrosh (boaz@plexistor.com),
	dan.j.williams, linux-nvdimm@lists.01.org

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2233 bytes --]


> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Daniel J Blueman
> Sent: Thursday, April 30, 2015 11:10 AM
> Subject: Re: [PATCH 0/13] Parallel struct page initialisation v4
...
> On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're
> seeing stock 4.0 boot in 7136s. This drops to 2159s, or a 70% reduction
> with this patchset. Non-temporal PMD init [1] drops this to 1045s.
> 
> Nathan, what do you guys see with the non-temporal PMD patch [1]? Do
> add a sfence at the ende label if you manually patch.
> 
...
> [1] https://lkml.org/lkml/2015/4/23/350

From that post:
> +loop_64:
> +	decq  %rcx
> +	movnti	%rax,(%rdi)
> +	movnti	%rax,8(%rdi)
> +	movnti	%rax,16(%rdi)
> +	movnti	%rax,24(%rdi)
> +	movnti	%rax,32(%rdi)
> +	movnti	%rax,40(%rdi)
> +	movnti	%rax,48(%rdi)
> +	movnti	%rax,56(%rdi)
> +	leaq  64(%rdi),%rdi
> +	jnz    loop_64

There are some even more efficient instructions available in x86,
depending on the CPU features:
* movnti		8 byte
* movntdq %xmm		16 byte, SSE2
* vmovntdq %ymm		32 byte, AVX
* vmovntdq %zmm		64 byte, AVX-512 (forthcoming)

The last will transfer a full cache line at a time.
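
As a rough userspace illustration of the 16-byte case (this is not the
kernel code from the patch quoted above), a buffer can be zeroed with SSE2
non-temporal stores through the _mm_stream_si128 intrinsic, which compilers
normally emit as movntdq. The helper name zero_nt is made up for this
sketch, and it assumes an x86-64 compiler, a 16-byte-aligned buffer and a
length that is a multiple of 64:

```c
#include <assert.h>
#include <emmintrin.h>	/* SSE2: _mm_stream_si128, _mm_setzero_si128 */
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: zero len bytes at dst using non-temporal stores.
 * Assumes dst is 16-byte aligned and len is a multiple of 64.
 */
static void zero_nt(void *dst, size_t len)
{
	__m128i zero = _mm_setzero_si128();
	char *p = dst;
	size_t off;

	assert(((uintptr_t)dst & 15) == 0);
	assert(len % 64 == 0);

	for (off = 0; off < len; off += 64) {
		/* Four 16-byte streaming stores cover one 64-byte cache line. */
		_mm_stream_si128((__m128i *)(p + off), zero);
		_mm_stream_si128((__m128i *)(p + off + 16), zero);
		_mm_stream_si128((__m128i *)(p + off + 32), zero);
		_mm_stream_si128((__m128i *)(p + off + 48), zero);
	}

	/* Streaming stores are weakly ordered, so fence before anyone reads. */
	_mm_sfence();
}
```

The trailing _mm_sfence() plays the same role as the sfence Daniel asks for
at the end of the movnti loop.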

For NVDIMMs, the nd pmem driver also needs memcpy functions that
use these non-temporal instructions, both for performance and reliability.
We also need to speed up __clear_page and copy_user_enhanced_fast_string so
userspace accesses through the page cache can keep up.
https://lkml.org/lkml/2015/4/2/453 is one of the threads on that topic.

Some results I've gotten there under different cache attributes
(in terms of 4 KiB IOPS):

16-byte movntdq:
UC write iops=697872 (697.872 K)(0.697872 M)
WB write iops=9745800 (9745.8 K)(9.7458 M)
WC write iops=9801800 (9801.8 K)(9.8018 M)
WT write iops=9812400 (9812.4 K)(9.8124 M)

32-byte vmovntdq:
UC write iops=1274400 (1274.4 K)(1.2744 M)
WB write iops=10259000 (10259 K)(10.259 M)
WC write iops=10286000 (10286 K)(10.286 M)
WT write iops=10294000 (10294 K)(10.294 M)
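
Since every I/O here is 4 KiB, the IOPS figures convert directly to
bandwidth, and two rows can be compared as a simple ratio. A small sketch of
that arithmetic follows; the function names are invented for illustration
and are not part of any benchmark tool:

```c
#include <stdint.h>

/* Convert an IOPS figure at a fixed I/O size into bytes per second. */
static uint64_t iops_to_bytes_per_sec(uint64_t iops, uint64_t io_bytes)
{
	return iops * io_bytes;
}

/* Ratio of two IOPS figures, e.g. 32-byte versus 16-byte stores. */
static double iops_ratio(uint64_t a, uint64_t b)
{
	return (double)a / (double)b;
}
```

For example, iops_to_bytes_per_sec(697872, 4096) is 2858483712, so the
16-byte UC case is roughly 2.86 GB/s, and iops_ratio(1274400, 697872) puts
the 32-byte UC case at about 1.8 times that.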

---
Robert Elliott, HP Server Storage

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-02  8:52         ` Daniel J Blueman
@ 2015-05-02 16:05           ` Daniel J Blueman
  -1 siblings, 0 replies; 168+ messages in thread
From: Daniel J Blueman @ 2015-05-02 16:05 UTC (permalink / raw)
  To: Waiman Long, Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton, Linux-MM, LKML

On Sat, May 2, 2015 at 4:52 PM, Daniel J Blueman <daniel@numascale.com> 
wrote:
> On Sat, May 2, 2015 at 8:09 AM, Waiman Long <waiman.long@hp.com> 
> wrote:
>> On 05/01/2015 06:02 PM, Waiman Long wrote:
>>> 
>>> Bad news!
>>> 
>>> I tried your patch on a 24-TB DragonHawk and got an out of memory 
>>> panic. The kernel log messages were:
>>>   :
>>> [   80.126186] CPU  474: hi:  186, btch:  31 usd:   0
>>> [   80.131457] CPU  475: hi:  186, btch:  31 usd:   0
>>> [   80.136726] CPU  476: hi:  186, btch:  31 usd:   0
>>> [   80.141997] CPU  477: hi:  186, btch:  31 usd:   0
>>> [   80.147267] CPU  478: hi:  186, btch:  31 usd:   0
>>> [   80.152538] CPU  479: hi:  186, btch:  31 usd:   0
>>> [   80.157813] active_anon:0 inactive_anon:0 isolated_anon:0
>>> [   80.157813]  active_file:0 inactive_file:0 isolated_file:0
>>> [   80.157813]  unevictable:0 dirty:0 writeback:0 unstable:0
>>> [   80.157813]  free:209 slab_reclaimable:7 slab_unreclaimable:42986
>>> [   80.157813]  mapped:0 shmem:0 pagetables:0 bounce:0
>>> [   80.157813]  free_cma:0
>>> [   80.190428] Node 0 DMA free:568kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:15988kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB 
>>> mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:14928kB kernel_stack:400kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.233475] lowmem_reserve[]: 0 0 0 0
>>> [   80.237542] Node 0 DMA32 free:20kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1961924kB managed:1333604kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
>>> slab_unreclaimable:101664kB kernel_stack:50176kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.281456] lowmem_reserve[]: 0 0 0 0
>>> [   80.285527] Node 0 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1608515580kB managed:2097148kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB 
>>> slab_unreclaimable:948kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.328958] lowmem_reserve[]: 0 0 0 0
>>> [   80.333031] Node 1 Normal free:248kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612732kB managed:2228220kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
>>> slab_unreclaimable:46240kB kernel_stack:3232kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.377256] lowmem_reserve[]: 0 0 0 0
>>> [   80.381325] Node 2 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:612kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.424764] lowmem_reserve[]: 0 0 0 0
>>> [   80.428842] Node 3 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.472293] lowmem_reserve[]: 0 0 0 0
>>> [   80.476360] Node 4 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:620kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.519803] lowmem_reserve[]: 0 0 0 0
>>> [   80.523875] Node 5 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:584kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.567312] lowmem_reserve[]: 0 0 0 0
>>> [   80.571379] Node 6 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.614814] lowmem_reserve[]: 0 0 0 0
>>> [   80.618881] Node 7 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.662316] lowmem_reserve[]: 0 0 0 0
>>> [   80.666390] Node 8 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.709827] lowmem_reserve[]: 0 0 0 0
>>> [   80.713898] Node 9 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.757336] lowmem_reserve[]: 0 0 0 0
>>> [   80.761407] Node 10 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:564kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.804941] lowmem_reserve[]: 0 0 0 0
>>> [   80.809015] Node 11 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.852548] lowmem_reserve[]: 0 0 0 0
>>> [   80.856620] Node 12 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:616kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.900158] lowmem_reserve[]: 0 0 0 0
>>> [   80.904236] Node 13 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:592kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.947765] lowmem_reserve[]: 0 0 0 0
>>> [   80.951847] Node 14 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.995380] lowmem_reserve[]: 0 0 0 0
>>> [   80.999448] Node 15 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:548kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   81.042974] lowmem_reserve[]: 0 0 0 0
>>> [   81.047044] Node 0 DMA: 132*4kB (U) 5*8kB (U) 0*16kB 0*32kB 
>>> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 568kB
>>> [   81.059632] Node 0 DMA32: 5*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB
>>> [   81.071733] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.083443] Node 1 Normal: 52*4kB (U) 5*8kB (U) 0*16kB 0*32kB 
>>> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 248kB
>>> [   81.096227] Node 2 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.107935] Node 3 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.119643] Node 4 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.131347] Node 5 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.143056] Node 6 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.154767] Node 7 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.166473] Node 8 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.178179] Node 9 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.189893] Node 10 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.201695] Node 11 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.213496] Node 12 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.225324] Node 13 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.237130] Node 14 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.248926] Node 15 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.260726] 0 total pagecache pages
>>> [   81.264565] 0 pages in swap cache
>>> [   81.268212] Swap cache stats: add 0, delete 0, find 0/0
>>> [   81.273962] Free swap  = 0kB
>>> [   81.277125] Total swap = 0kB
>>> [   81.280341] 6442421132 pages RAM
>>> [   81.283888] 0 pages HighMem/MovableOnly
>>> [   81.288109] 6433662383 pages reserved
>>> [   81.292135] 0 pages hwpoisoned
>>> [   81.295491] [ pid ]   uid  tgid total_vm      rss nr_ptes 
>>> nr_pmds swapents oom_score_adj name
>>> [   81.305245] Kernel panic - not syncing: Out of memory and no 
>>> killable processes...
>>> [   81.305245]
>>> [   81.315200] CPU: 240 PID: 1 Comm: swapper/0 Not tainted 
>>> 4.0.1-pmm-bigsmp #1
>>> [   81.322856] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 
>>> 006.000.042 SFW: 015.099.000 04/01/2015
>>> [   81.333096]  0000000000000000 ffff8800044c79c8 ffffffff8151b0c9 
>>> ffff8800044c7a48
>>> [   81.341262]  ffffffff8151ae1e 0000000000000008 ffff8800044c7a58 
>>> ffff8800044c79f8
>>> [   81.349428]  ffffffff810785c3 ffffffff81a13480 0000000000000000 
>>> ffff8800001001d0
>>> [   81.357595] Call Trace:
>>> [   81.360287]  [<ffffffff8151b0c9>] dump_stack+0x68/0x77
>>> [   81.365942]  [<ffffffff8151ae1e>] panic+0xb9/0x219
>>> [   81.371213]  [<ffffffff810785c3>] ? 
>>> __blocking_notifier_call_chain+0x63/0x80
>>> [   81.378971]  [<ffffffff811384ce>] __out_of_memory+0x34e/0x350
>>> [   81.385292]  [<ffffffff811385ee>] out_of_memory+0x5e/0x90
>>> [   81.391230]  [<ffffffff8113ce9e>] 
>>> __alloc_pages_slowpath+0x6be/0x740
>>> [   81.398219]  [<ffffffff8113d15c>] 
>>> __alloc_pages_nodemask+0x23c/0x250
>>> [   81.405212]  [<ffffffff81186346>] kmem_getpages+0x56/0x110
>>> [   81.411246]  [<ffffffff81187f44>] fallback_alloc+0x164/0x200
>>> [   81.417474]  [<ffffffff81187cfd>] ____cache_alloc_node+0x8d/0x170
>>> [   81.424179]  [<ffffffff811887bb>] 
>>> kmem_cache_alloc_trace+0x17b/0x240
>>> [   81.431169]  [<ffffffff813d5f3a>] init_memory_block+0x3a/0x110
>>> [   81.437586]  [<ffffffff81b5f687>] memory_dev_init+0xd7/0x13d
>>> [   81.443810]  [<ffffffff81b5f2af>] driver_init+0x2f/0x37
>>> [   81.449556]  [<ffffffff81b1599b>] do_basic_setup+0x29/0xd5
>>> [   81.455597]  [<ffffffff81b372c4>] ? sched_init_smp+0x140/0x147
>>> [   81.462015]  [<ffffffff81b15c55>] 
>>> kernel_init_freeable+0x20e/0x297
>>> [   81.468815]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
>>> [   81.474565]  [<ffffffff81512ea9>] kernel_init+0x9/0xf0
>>> [   81.480216]  [<ffffffff8151f788>] ret_from_fork+0x58/0x90
>>> [   81.486156]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
>>> [   81.492350] ---[ end Kernel panic - not syncing: Out of memory 
>>> and no killable processes...
>>> [   81.492350]
>>> 
>>> -Longman
>> 
>> I increased the pre-initialized memory per node in 
>> update_defer_init() of mm/page_alloc.c from 2G to 4G. Now I am able 
>> to boot the 24-TB machine without error. The 12-TB has 0.75TB/node, 
>> while the 24-TB machine has 1.5TB/node. I would suggest something 
>> like pre-initializing 1G per 0.25TB/node. In this way, it will scale 
>> properly with the memory size.
>> 
>> Before the patch, the boot time from elilo prompt to ssh login was 
>> 694s. After the patch, the boot up time was 346s, a saving of 348s 
>> (about 50%).
> 
> I second scaling the up-front init with the zone size. The 7TB system 
> I was booting has only 32GB per NUMA node, which at 1GB per 0.25TB 
> would work out at 128MB up-front init per-NUMA-node, which worked 
> nice and booted faster yet.
> 
> Even booting with 64MB per NUMA node worked great, so there is 
> adequate margin for the 8 cores, just I guess we'd need to enforce a 
> minimum of eg 64MB or so.

Varying the amount of memory initialised synchronously per NUMA node (with
the non-temporal patch applied, though that only removes a constant from
PMD init), the time from kernel load to login prompt on this 7TB,
1728-core system is:
512MB 699.2s
256MB 680.3s
128MB 661.7s
 64MB 663.6s
 32MB 667.8s

So in this case 128MB per NUMA node gives the fastest boot and more
locality than 64MB, so it should be a good minimum; it also matches
Waiman's scaling suggestion.
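
The scaling rule discussed above (1GB of up-front init per 0.25TB of node
memory, with a 128MB floor) could be sketched as below. This is only an
illustration of the arithmetic, not the actual update_defer_init() code in
mm/page_alloc.c; the helper name and the macros are hypothetical:

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
#define GIB (1024ULL * MIB)

/*
 * Hypothetical helper, not the kernel's update_defer_init():
 * 1 GiB per 0.25 TiB of node memory is a 1/256 ratio, clamped to a
 * 128 MiB minimum.
 */
static uint64_t upfront_init_bytes(uint64_t node_bytes)
{
	uint64_t scaled = node_bytes / 256;
	uint64_t floor_bytes = 128 * MIB;

	return scaled > floor_bytes ? scaled : floor_bytes;
}
```

Under these assumptions a 32GB node gets 128MB up front, and a 1.5TB node
such as those on Waiman's 24-TB machine gets 6GB.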

Thanks,
  Daniel


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
@ 2015-05-02 16:05           ` Daniel J Blueman
  0 siblings, 0 replies; 168+ messages in thread
From: Daniel J Blueman @ 2015-05-02 16:05 UTC (permalink / raw)
  To: Waiman Long, Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton, Linux-MM, LKML

On Sat, May 2, 2015 at 4:52 PM, Daniel J Blueman <daniel@numascale.com> 
wrote:
> On Sat, May 2, 2015 at 8:09 AM, Waiman Long <waiman.long@hp.com> 
> wrote:
>> On 05/01/2015 06:02 PM, Waiman Long wrote:
>>> 
>>> Bad news!
>>> 
>>> I tried your patch on a 24-TB DragonHawk and got an out of memory 
>>> panic. The kernel log messages were:
>>>   :
>>> [   80.126186] CPU  474: hi:  186, btch:  31 usd:   0
>>> [   80.131457] CPU  475: hi:  186, btch:  31 usd:   0
>>> [   80.136726] CPU  476: hi:  186, btch:  31 usd:   0
>>> [   80.141997] CPU  477: hi:  186, btch:  31 usd:   0
>>> [   80.147267] CPU  478: hi:  186, btch:  31 usd:   0
>>> [   80.152538] CPU  479: hi:  186, btch:  31 usd:   0
>>> [   80.157813] active_anon:0 inactive_anon:0 isolated_anon:0
>>> [   80.157813]  active_file:0 inactive_file:0 isolated_file:0
>>> [   80.157813]  unevictable:0 dirty:0 writeback:0 unstable:0
>>> [   80.157813]  free:209 slab_reclaimable:7 slab_unreclaimable:42986
>>> [   80.157813]  mapped:0 shmem:0 pagetables:0 bounce:0
>>> [   80.157813]  free_cma:0
>>> [   80.190428] Node 0 DMA free:568kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:15988kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB 
>>> mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:14928kB kernel_stack:400kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.233475] lowmem_reserve[]: 0 0 0 0
>>> [   80.237542] Node 0 DMA32 free:20kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1961924kB managed:1333604kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
>>> slab_unreclaimable:101664kB kernel_stack:50176kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.281456] lowmem_reserve[]: 0 0 0 0
>>> [   80.285527] Node 0 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1608515580kB managed:2097148kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB 
>>> slab_unreclaimable:948kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.328958] lowmem_reserve[]: 0 0 0 0
>>> [   80.333031] Node 1 Normal free:248kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612732kB managed:2228220kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
>>> slab_unreclaimable:46240kB kernel_stack:3232kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.377256] lowmem_reserve[]: 0 0 0 0
>>> [   80.381325] Node 2 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:612kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.424764] lowmem_reserve[]: 0 0 0 0
>>> [   80.428842] Node 3 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.472293] lowmem_reserve[]: 0 0 0 0
>>> [   80.476360] Node 4 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:620kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.519803] lowmem_reserve[]: 0 0 0 0
>>> [   80.523875] Node 5 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:584kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.567312] lowmem_reserve[]: 0 0 0 0
>>> [   80.571379] Node 6 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.614814] lowmem_reserve[]: 0 0 0 0
>>> [   80.618881] Node 7 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.662316] lowmem_reserve[]: 0 0 0 0
>>> [   80.666390] Node 8 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.709827] lowmem_reserve[]: 0 0 0 0
>>> [   80.713898] Node 9 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.757336] lowmem_reserve[]: 0 0 0 0
>>> [   80.761407] Node 10 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:564kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.804941] lowmem_reserve[]: 0 0 0 0
>>> [   80.809015] Node 11 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:572kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.852548] lowmem_reserve[]: 0 0 0 0
>>> [   80.856620] Node 12 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:616kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.900158] lowmem_reserve[]: 0 0 0 0
>>> [   80.904236] Node 13 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:592kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.947765] lowmem_reserve[]: 0 0 0 0
>>> [   80.951847] Node 14 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   80.995380] lowmem_reserve[]: 0 0 0 0
>>> [   80.999448] Node 15 Normal free:0kB min:0kB low:0kB high:0kB 
>>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
>>> unevictable:0kB isolated(anon):0kB isolated(file):0kB 
>>> present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
>>> slab_unreclaimable:548kB kernel_stack:0kB pagetables:0kB 
>>> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
>>> pages_scanned:0 all_unreclaimable? yes
>>> [   81.042974] lowmem_reserve[]: 0 0 0 0
>>> [   81.047044] Node 0 DMA: 132*4kB (U) 5*8kB (U) 0*16kB 0*32kB 
>>> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 568kB
>>> [   81.059632] Node 0 DMA32: 5*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB
>>> [   81.071733] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.083443] Node 1 Normal: 52*4kB (U) 5*8kB (U) 0*16kB 0*32kB 
>>> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 248kB
>>> [   81.096227] Node 2 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.107935] Node 3 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.119643] Node 4 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.131347] Node 5 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.143056] Node 6 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.154767] Node 7 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.166473] Node 8 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.178179] Node 9 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.189893] Node 10 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.201695] Node 11 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.213496] Node 12 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.225324] Node 13 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.237130] Node 14 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.248926] Node 15 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
>>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>> [   81.260726] 0 total pagecache pages
>>> [   81.264565] 0 pages in swap cache
>>> [   81.268212] Swap cache stats: add 0, delete 0, find 0/0
>>> [   81.273962] Free swap  = 0kB
>>> [   81.277125] Total swap = 0kB
>>> [   81.280341] 6442421132 pages RAM
>>> [   81.283888] 0 pages HighMem/MovableOnly
>>> [   81.288109] 6433662383 pages reserved
>>> [   81.292135] 0 pages hwpoisoned
>>> [   81.295491] [ pid ]   uid  tgid total_vm      rss nr_ptes 
>>> nr_pmds swapents oom_score_adj name
>>> [   81.305245] Kernel panic - not syncing: Out of memory and no 
>>> killable processes...
>>> [   81.305245]
>>> [   81.315200] CPU: 240 PID: 1 Comm: swapper/0 Not tainted 
>>> 4.0.1-pmm-bigsmp #1
>>> [   81.322856] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 
>>> 006.000.042 SFW: 015.099.000 04/01/2015
>>> [   81.333096]  0000000000000000 ffff8800044c79c8 ffffffff8151b0c9 
>>> ffff8800044c7a48
>>> [   81.341262]  ffffffff8151ae1e 0000000000000008 ffff8800044c7a58 
>>> ffff8800044c79f8
>>> [   81.349428]  ffffffff810785c3 ffffffff81a13480 0000000000000000 
>>> ffff8800001001d0
>>> [   81.357595] Call Trace:
>>> [   81.360287]  [<ffffffff8151b0c9>] dump_stack+0x68/0x77
>>> [   81.365942]  [<ffffffff8151ae1e>] panic+0xb9/0x219
>>> [   81.371213]  [<ffffffff810785c3>] ? 
>>> __blocking_notifier_call_chain+0x63/0x80
>>> [   81.378971]  [<ffffffff811384ce>] __out_of_memory+0x34e/0x350
>>> [   81.385292]  [<ffffffff811385ee>] out_of_memory+0x5e/0x90
>>> [   81.391230]  [<ffffffff8113ce9e>] 
>>> __alloc_pages_slowpath+0x6be/0x740
>>> [   81.398219]  [<ffffffff8113d15c>] 
>>> __alloc_pages_nodemask+0x23c/0x250
>>> [   81.405212]  [<ffffffff81186346>] kmem_getpages+0x56/0x110
>>> [   81.411246]  [<ffffffff81187f44>] fallback_alloc+0x164/0x200
>>> [   81.417474]  [<ffffffff81187cfd>] ____cache_alloc_node+0x8d/0x170
>>> [   81.424179]  [<ffffffff811887bb>] 
>>> kmem_cache_alloc_trace+0x17b/0x240
>>> [   81.431169]  [<ffffffff813d5f3a>] init_memory_block+0x3a/0x110
>>> [   81.437586]  [<ffffffff81b5f687>] memory_dev_init+0xd7/0x13d
>>> [   81.443810]  [<ffffffff81b5f2af>] driver_init+0x2f/0x37
>>> [   81.449556]  [<ffffffff81b1599b>] do_basic_setup+0x29/0xd5
>>> [   81.455597]  [<ffffffff81b372c4>] ? sched_init_smp+0x140/0x147
>>> [   81.462015]  [<ffffffff81b15c55>] 
>>> kernel_init_freeable+0x20e/0x297
>>> [   81.468815]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
>>> [   81.474565]  [<ffffffff81512ea9>] kernel_init+0x9/0xf0
>>> [   81.480216]  [<ffffffff8151f788>] ret_from_fork+0x58/0x90
>>> [   81.486156]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
>>> [   81.492350] ---[ end Kernel panic - not syncing: Out of memory 
>>> and no killable processes...
>>> [   81.492350]
>>> 
>>> -Longman
>> 
>> I increased the pre-initialized memory per node in 
>> update_defer_init() of mm/page_alloc.c from 2G to 4G. Now I am able 
>> to boot the 24-TB machine without error. The 12-TB machine has
>> 0.75TB/node, while the 24-TB machine has 1.5TB/node. I would suggest
>> something like pre-initializing 1G per 0.25TB/node. In this way, it
>> will scale properly with the memory size.
>> 
>> Before the patch, the boot time from elilo prompt to ssh login was 
>> 694s. After the patch, the boot up time was 346s, a saving of 348s 
>> (about 50%).
> 
> I second scaling the up-front init with the zone size. The 7TB system
> I was booting has only 32GB per NUMA node, which at 1GB per 0.25TB
> would work out at 128MB of up-front init per NUMA node; that worked
> nicely and booted faster still.
>
> Even booting with 64MB per NUMA node worked well, so there is
> adequate margin for the 8 cores, though I guess we'd need to enforce
> a minimum of e.g. 64MB or so.

Varying the amount of memory initialised synchronously per NUMA node
(with the non-temporal patch applied, but that just removes a constant
from PMD init), the time from kernel load to login prompt on this 7TB,
1728-core system is:

Up-front init per node   Boot time
512MB                    699.2s
256MB                    680.3s
128MB                    661.7s
 64MB                    663.6s
 32MB                    667.8s

So in this case 128MB per NUMA node is the fastest setting and gives
more locality than 64MB, so it should be a good minimum; it also
matches Waiman's scaling suggestion.
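
To make the arithmetic concrete, here is a minimal userspace sketch of
the scaling rule discussed above: 1G of up-front init per 0.25TB of node
memory (that is, 1/256 of the node), clamped to a 128MB minimum. The
helper name upfront_init_bytes() and the MiB/GiB macros are made up for
illustration; this is not the code in update_defer_init().

```c
#include <stdint.h>

#define MiB ((uint64_t)1 << 20)
#define GiB ((uint64_t)1 << 30)

/*
 * Hypothetical helper, not kernel code: how much of a node to
 * initialise synchronously before deferring the rest.
 *
 * Assumption: 1G per 0.25TB of node memory, i.e. node_bytes / 256,
 * with a floor of 128MB so that small nodes still get usable memory.
 */
static uint64_t upfront_init_bytes(uint64_t node_bytes)
{
	uint64_t scaled = node_bytes / 256;	/* 1G per 256G of node memory */
	uint64_t floor = 128 * MiB;		/* assumed minimum */

	return scaled < floor ? floor : scaled;
}
```

With these assumptions a 32GB node gets 128MB up front, a 0.75TB node
gets 3G and a 1.5TB node gets 6G, which is above the 4G that booted the
24-TB machine.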

Thanks,
  Daniel


* Re: [PATCH 06/13] mm: meminit: Inline some helper functions
  2015-04-28 14:37   ` Mel Gorman
@ 2015-05-04  8:33     ` Michal Hocko
  -1 siblings, 0 replies; 168+ messages in thread
From: Michal Hocko @ 2015-05-04  8:33 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Waiman Long,
	Scott Norton, Daniel J Blueman, Linux-MM, LKML

On Tue 28-04-15 15:37:03, Mel Gorman wrote:
> early_pfn_in_nid() and meminit_pfn_in_nid() are small functions that are
> unnecessarily visible outside memory initialisation. As well as unnecessary
> visibility, it's unnecessary function call overhead when initialising pages.
> This patch moves the helpers inline.

This is causing a build failure:
  CC      mm/page_alloc.o
mm/page_alloc.c: In function ‘deferred_init_memmap’:
mm/page_alloc.c:1135:4: error: implicit declaration of function ‘meminit_pfn_in_nid’ [-Werror=implicit-function-declaration]
    if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state)) {
    ^

with a randconfig build in which CONFIG_NODES_SPAN_OTHER_NODES is not
defined. The full config is attached.
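
As a reduced illustration of the failure mode, the following standalone
sketch mirrors the shape of the problem: two helpers are defined under
a config option, and the #else branch has to provide a stub for each of
them. All demo_* names, the DEMO_NODES_SPAN_OTHER_NODES macro and the
struct are invented for this example; the real lookup in
mm/page_alloc.c is different.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct mminit_pfnnid_cache. */
struct demo_pfnnid_cache {
	unsigned long last_start;
	unsigned long last_end;
	int last_nid;
};

#ifdef DEMO_NODES_SPAN_OTHER_NODES
/* Made-up layout: nodes 0 and 1 own alternating 1024-pfn chunks. */
static inline int demo_pfn_to_nid(unsigned long pfn,
				  struct demo_pfnnid_cache *state)
{
	if (pfn >= state->last_start && pfn < state->last_end)
		return state->last_nid;

	state->last_start = pfn & ~1023UL;
	state->last_end = state->last_start + 1024;
	state->last_nid = (int)((pfn >> 10) & 1);
	return state->last_nid;
}

static inline bool demo_meminit_pfn_in_nid(unsigned long pfn, int node,
					   struct demo_pfnnid_cache *state)
{
	return demo_pfn_to_nid(pfn, state) == node;
}
#else
/*
 * Nodes never overlap in this configuration, so every pfn belongs to
 * the node being initialised. Deleting this stub reproduces an
 * "implicit declaration" error in demo_count_pfns() below.
 */
static inline bool demo_meminit_pfn_in_nid(unsigned long pfn, int node,
					   struct demo_pfnnid_cache *state)
{
	(void)pfn;
	(void)node;
	(void)state;
	return true;
}
#endif

/* Caller that is built in both configurations, like deferred_init_memmap(). */
static unsigned long demo_count_pfns(unsigned long start, unsigned long end,
				     int nid)
{
	struct demo_pfnnid_cache state = { 0, 0, 0 };
	unsigned long pfn, count = 0;

	for (pfn = start; pfn < end; pfn++)
		if (demo_meminit_pfn_in_nid(pfn, nid, &state))
			count++;
	return count;
}
```

Built without -DDEMO_NODES_SPAN_OTHER_NODES the stub accepts every pfn,
so demo_count_pfns() counts the whole range; built with it, only the
chunks assigned to the requested node are counted.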

I guess we need something like this:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e0257debce0..a48128d882d8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1044,6 +1044,11 @@ static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 {
 	return true;
 }
+static inline bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
+					struct mminit_pfnnid_cache *state)
+{
+	return true;
+}
 #endif
 
-- 
Michal Hocko
SUSE Labs

[-- Attachment #2: config-failed --]
[-- Type: text/plain, Size: 100881 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.0.0 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SWAP is not set
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
CONFIG_SRCU=y
CONFIG_TASKS_RCU=y
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_KTHREAD_PRIO=0
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_SUPPORTS_INT128=y
# CONFIG_CGROUPS is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_USER_NS=y
# CONFIG_PID_NS is not set
CONFIG_NET_NS=y
# CONFIG_SCHED_AUTOGROUP is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
# CONFIG_RD_LZ4 is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_UID16=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
# CONFIG_BPF_SYSCALL is not set
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_PCI_QUIRKS=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_OPROFILE=y
CONFIG_OPROFILE_EVENT_MULTIPLEX=y
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
# CONFIG_HAVE_64BIT_ALIGNED_ACCESS is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_HAVE_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR is not set
CONFIG_CC_STACKPROTECTOR_NONE=y
# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_HAVE_CONTEXT_TRACKING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_COMPAT_OLD_SIGACTION=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
# CONFIG_MODULE_SIG_ALL is not set
CONFIG_MODULE_SIG_SHA1=y
# CONFIG_MODULE_SIG_SHA224 is not set
# CONFIG_MODULE_SIG_SHA256 is not set
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha1"
# CONFIG_MODULE_COMPRESS is not set
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
# CONFIG_BLK_CMDLINE_PARSER is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
CONFIG_AIX_PARTITION=y
CONFIG_OSF_PARTITION=y
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
CONFIG_LDM_PARTITION=y
# CONFIG_LDM_DEBUG is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
# CONFIG_EFI_PARTITION is not set
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_ASN1=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_ARCH_USE_QUEUE_RWLOCK=y
CONFIG_FREEZER=y

#
# Processor type and features
#
CONFIG_ZONE_DMA=y
# CONFIG_SMP is not set
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_X2APIC=y
CONFIG_X86_MPPARSE=y
CONFIG_GOLDFISH=y
CONFIG_X86_EXTENDED_PLATFORM=y
CONFIG_X86_GOLDFISH=y
# CONFIG_X86_INTEL_LPSS is not set
# CONFIG_X86_AMD_PLATFORM_DEVICE is not set
CONFIG_IOSF_MBI=y
# CONFIG_IOSF_MBI_DEBUG is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_HYPERVISOR_GUEST is not set
CONFIG_NO_BOOTMEM=y
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_HPET_TIMER=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
# CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
CONFIG_NR_CPUS=1
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_UP_LATE_INIT=y
CONFIG_X86_UP_APIC_MSI=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_MCE_INJECT=y
CONFIG_X86_THERMAL_VECTOR=y
CONFIG_X86_16BIT=y
CONFIG_X86_ESPFIX64=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_I8K=m
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
# CONFIG_MICROCODE_INTEL_EARLY is not set
# CONFIG_MICROCODE_AMD_EARLY is not set
# CONFIG_MICROCODE_EARLY is not set
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
# CONFIG_ARCH_MEMORY_PROBE is not set
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_ARCH_DISCARD_MEMBLOCK=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_HAVE_BOOTMEM_INFO_NODE=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_NEED_BOUNCE_POOL=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
# CONFIG_HWPOISON_INJECT is not set
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_NEED_PER_CPU_KM=y
CONFIG_CLEANCACHE=y
CONFIG_CMA=y
CONFIG_CMA_DEBUG=y
# CONFIG_CMA_DEBUGFS is not set
CONFIG_CMA_AREAS=7
CONFIG_ZPOOL=y
# CONFIG_ZBUD is not set
# CONFIG_ZSMALLOC is not set
CONFIG_GENERIC_EARLY_IOREMAP=y
CONFIG_ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
# CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK is not set
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
# CONFIG_MTRR_SANITIZER is not set
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
# CONFIG_X86_INTEL_MPX is not set
# CONFIG_EFI is not set
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
# CONFIG_KEXEC_VERIFY_SIG is not set
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_RANDOMIZE_BASE_MAX_OFFSET=0x40000000
CONFIG_X86_NEED_RELOCS=y
CONFIG_PHYSICAL_ALIGN=0x1000000
CONFIG_COMPAT_VDSO=y
# CONFIG_CMDLINE_BOOL is not set
CONFIG_HAVE_LIVEPATCH=y
CONFIG_LIVEPATCH=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y

#
# Power management and ACPI options
#
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_PM_SLEEP=y
CONFIG_PM_AUTOSLEEP=y
CONFIG_PM_WAKELOCKS=y
CONFIG_PM_WAKELOCKS_LIMIT=100
CONFIG_PM_WAKELOCKS_GC=y
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_ADVANCED_DEBUG is not set
CONFIG_PM_SLEEP_DEBUG=y
CONFIG_DPM_WATCHDOG=y
CONFIG_DPM_WATCHDOG_TIMEOUT=12
CONFIG_PM_TRACE=y
CONFIG_PM_TRACE_RTC=y
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_EC_DEBUGFS=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_PROCESSOR_AGGREGATOR=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_INITRD_TABLE_OVERRIDE=y
CONFIG_ACPI_DEBUG=y
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
# CONFIG_ACPI_SBS is not set
CONFIG_ACPI_HED=m
CONFIG_ACPI_CUSTOM_METHOD=m
# CONFIG_ACPI_REDUCED_HARDWARE_ONLY is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
CONFIG_ACPI_APEI=y
# CONFIG_ACPI_APEI_GHES is not set
CONFIG_ACPI_APEI_MEMORY_FAILURE=y
CONFIG_ACPI_APEI_EINJ=m
# CONFIG_ACPI_APEI_ERST_DEBUG is not set
CONFIG_ACPI_EXTLOG=y
# CONFIG_PMIC_OPREGION is not set
CONFIG_SFI=y

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
CONFIG_INTEL_IDLE=y

#
# Memory power savings
#
CONFIG_I7300_IDLE_IOAT_CHANNEL=y
CONFIG_I7300_IDLE=m

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCIEPORTBUS is not set
CONFIG_PCI_MSI=y
# CONFIG_PCI_DEBUG is not set
CONFIG_PCI_REALLOC_ENABLE_AUTO=y
CONFIG_PCI_STUB=m
CONFIG_HT_IRQ=y
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
# CONFIG_PCI_PRI is not set
CONFIG_PCI_PASID=y
CONFIG_PCI_LABEL=y

#
# PCI host controller drivers
#
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
# CONFIG_PCCARD is not set
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=y
CONFIG_RAPIDIO=m
CONFIG_RAPIDIO_DISC_TIMEOUT=30
CONFIG_RAPIDIO_ENABLE_RX_TX_PORTS=y
# CONFIG_RAPIDIO_DMA_ENGINE is not set
CONFIG_RAPIDIO_DEBUG=y
CONFIG_RAPIDIO_ENUM_BASIC=m

#
# RapidIO Switch drivers
#
CONFIG_RAPIDIO_TSI57X=m
CONFIG_RAPIDIO_CPS_XX=m
CONFIG_RAPIDIO_TSI568=m
# CONFIG_RAPIDIO_CPS_GEN2 is not set
# CONFIG_X86_SYSFB is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
# CONFIG_BINFMT_SCRIPT is not set
# CONFIG_HAVE_AOUT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_COREDUMP=y
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_X86_X32=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_KEYS_COMPAT=y
CONFIG_X86_DEV_DMA_OPS=y
CONFIG_PMC_ATOM=y
CONFIG_NET=y
CONFIG_COMPAT_NETLINK_MESSAGES=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=y
CONFIG_UNIX=y
CONFIG_UNIX_DIAG=y
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=m
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_FIB_TRIE_STATS=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=y
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=y
CONFIG_NET_UDP_TUNNEL=y
CONFIG_NET_FOU=y
CONFIG_NET_FOU_IP_TUNNELS=y
CONFIG_GENEVE=m
# CONFIG_INET_AH is not set
CONFIG_INET_ESP=y
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
CONFIG_INET_TUNNEL=y
CONFIG_INET_XFRM_MODE_TRANSPORT=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_INET_UDP_DIAG=y
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=y
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=y
CONFIG_TCP_CONG_VEGAS=y
CONFIG_TCP_CONG_SCALABLE=y
CONFIG_TCP_CONG_LP=y
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=y
# CONFIG_TCP_CONG_ILLINOIS is not set
# CONFIG_TCP_CONG_DCTCP is not set
# CONFIG_DEFAULT_BIC is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_HYBLA is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
# CONFIG_IPV6_ROUTE_INFO is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=y
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=y
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=y
CONFIG_INET6_XFRM_MODE_TRANSPORT=y
CONFIG_INET6_XFRM_MODE_TUNNEL=y
CONFIG_INET6_XFRM_MODE_BEET=y
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=y
# CONFIG_IPV6_VTI is not set
CONFIG_IPV6_SIT=y
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=y
CONFIG_IPV6_GRE=y
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_IPV6_MROUTE is not set
# CONFIG_NETWORK_SECMARK is not set
CONFIG_NET_PTP_CLASSIFY=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
CONFIG_NETFILTER_DEBUG=y
CONFIG_NETFILTER_ADVANCED=y
# CONFIG_BRIDGE_NETFILTER is not set

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_NETLINK_ACCT=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_COMMON=y
CONFIG_NF_CONNTRACK_MARK=y
# CONFIG_NF_CONNTRACK_ZONES is not set
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
# CONFIG_NF_CONNTRACK_TIMESTAMP is not set
CONFIG_NF_CONNTRACK_LABELS=y
# CONFIG_NF_CT_PROTO_DCCP is not set
CONFIG_NF_CT_PROTO_GRE=m
CONFIG_NF_CT_PROTO_SCTP=m
CONFIG_NF_CT_PROTO_UDPLITE=m
# CONFIG_NF_CONNTRACK_AMANDA is not set
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
# CONFIG_NF_CONNTRACK_SIP is not set
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
# CONFIG_NF_CT_NETLINK_TIMEOUT is not set
# CONFIG_NF_CT_NETLINK_HELPER is not set
CONFIG_NETFILTER_NETLINK_QUEUE_CT=y
CONFIG_NF_NAT=m
CONFIG_NF_NAT_NEEDED=y
CONFIG_NF_NAT_PROTO_UDPLITE=m
CONFIG_NF_NAT_PROTO_SCTP=m
# CONFIG_NF_NAT_AMANDA is not set
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
# CONFIG_NF_NAT_SIP is not set
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_REDIRECT=m
CONFIG_NETFILTER_SYNPROXY=m
CONFIG_NF_TABLES=y
CONFIG_NF_TABLES_INET=y
# CONFIG_NFT_EXTHDR is not set
CONFIG_NFT_META=m
# CONFIG_NFT_CT is not set
CONFIG_NFT_RBTREE=y
CONFIG_NFT_HASH=y
CONFIG_NFT_COUNTER=y
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_NAT=m
# CONFIG_NFT_QUEUE is not set
# CONFIG_NFT_REJECT is not set
# CONFIG_NFT_REJECT_INET is not set
CONFIG_NFT_COMPAT=m
CONFIG_NETFILTER_XTABLES=m

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
# CONFIG_NETFILTER_XT_TARGET_HMARK is not set
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
# CONFIG_NETFILTER_XT_TARGET_LED is not set
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_NAT=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
# CONFIG_NETFILTER_XT_TARGET_NOTRACK is not set
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
# CONFIG_NETFILTER_XT_TARGET_TPROXY is not set
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
# CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP is not set

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
# CONFIG_NETFILTER_XT_MATCH_BPF is not set
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
# CONFIG_NETFILTER_XT_MATCH_CONNTRACK is not set
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
# CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
CONFIG_NETFILTER_XT_MATCH_IPCOMP=m
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
# CONFIG_NETFILTER_XT_MATCH_L2TP is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
# CONFIG_NETFILTER_XT_MATCH_LIMIT is not set
CONFIG_NETFILTER_XT_MATCH_MAC=m
# CONFIG_NETFILTER_XT_MATCH_MARK is not set
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_NFACCT=m
CONFIG_NETFILTER_XT_MATCH_OSF=m
# CONFIG_NETFILTER_XT_MATCH_OWNER is not set
# CONFIG_NETFILTER_XT_MATCH_POLICY is not set
# CONFIG_NETFILTER_XT_MATCH_PKTTYPE is not set
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
# CONFIG_NETFILTER_XT_MATCH_REALM is not set
# CONFIG_NETFILTER_XT_MATCH_RECENT is not set
CONFIG_NETFILTER_XT_MATCH_SCTP=m
# CONFIG_NETFILTER_XT_MATCH_SOCKET is not set
# CONFIG_NETFILTER_XT_MATCH_STATE is not set
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
# CONFIG_NETFILTER_XT_MATCH_TCPMSS is not set
CONFIG_NETFILTER_XT_MATCH_TIME=m
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
# CONFIG_IP_SET_BITMAP_IP is not set
CONFIG_IP_SET_BITMAP_IPMAC=m
# CONFIG_IP_SET_BITMAP_PORT is not set
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET_HASH_IPMARK=m
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
CONFIG_IP_SET_HASH_MAC=m
CONFIG_IP_SET_HASH_NETPORTNET=m
# CONFIG_IP_SET_HASH_NET is not set
CONFIG_IP_SET_HASH_NETNET=m
# CONFIG_IP_SET_HASH_NETPORT is not set
# CONFIG_IP_SET_HASH_NETIFACE is not set
# CONFIG_IP_SET_LIST_SET is not set
# CONFIG_IP_VS is not set

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_CONNTRACK_IPV4=m
CONFIG_NF_CONNTRACK_PROC_COMPAT=y
CONFIG_NF_LOG_ARP=m
CONFIG_NF_LOG_IPV4=y
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NF_REJECT_IPV4=m
# CONFIG_NFT_REJECT_IPV4 is not set
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_NAT_IPV4=m
CONFIG_NFT_CHAIN_NAT_IPV4=m
CONFIG_NF_NAT_MASQUERADE_IPV4=m
CONFIG_NFT_MASQ_IPV4=m
CONFIG_NFT_REDIR_IPV4=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PROTO_GRE=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_SYNPROXY=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
# CONFIG_IP_NF_TARGET_NETMAP is not set
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_CLUSTERIP=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m

#
# IPv6: Netfilter Configuration
#
# CONFIG_NF_DEFRAG_IPV6 is not set
# CONFIG_NF_CONNTRACK_IPV6 is not set
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_CHAIN_ROUTE_IPV6=y
CONFIG_NF_REJECT_IPV6=y
# CONFIG_NFT_REJECT_IPV6 is not set
CONFIG_NF_LOG_IPV6=m
# CONFIG_IP6_NF_IPTABLES is not set

#
# DECnet: Netfilter Configuration
#
CONFIG_DECNET_NF_GRABULATOR=m
# CONFIG_NF_TABLES_BRIDGE is not set
# CONFIG_BRIDGE_NF_EBTABLES is not set
CONFIG_IP_DCCP=m
CONFIG_INET_DCCP_DIAG=m

#
# DCCP CCIDs Configuration
#
# CONFIG_IP_DCCP_CCID2_DEBUG is not set
CONFIG_IP_DCCP_CCID3=y
# CONFIG_IP_DCCP_CCID3_DEBUG is not set
CONFIG_IP_DCCP_TFRC_LIB=y

#
# DCCP Kernel Hacking
#
CONFIG_IP_DCCP_DEBUG=y
CONFIG_NET_DCCPPROBE=m
CONFIG_IP_SCTP=m
# CONFIG_NET_SCTPPROBE is not set
# CONFIG_SCTP_DBG_OBJCNT is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1 is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_COOKIE_HMAC_SHA1 is not set
CONFIG_RDS=y
# CONFIG_RDS_TCP is not set
CONFIG_RDS_DEBUG=y
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_L2TP=m
# CONFIG_L2TP_DEBUGFS is not set
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=y
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_BRIDGE_VLAN_FILTERING is not set
CONFIG_HAVE_NET_DSA=y
CONFIG_NET_DSA=y
CONFIG_NET_DSA_TAG_DSA=y
CONFIG_NET_DSA_TAG_EDSA=y
CONFIG_NET_DSA_TAG_TRAILER=y
CONFIG_VLAN_8021Q=y
# CONFIG_VLAN_8021Q_GVRP is not set
# CONFIG_VLAN_8021Q_MVRP is not set
CONFIG_DECNET=m
CONFIG_DECNET_ROUTER=y
CONFIG_LLC=y
CONFIG_LLC2=y
# CONFIG_IPX is not set
CONFIG_ATALK=m
CONFIG_DEV_APPLETALK=m
# CONFIG_IPDDP is not set
CONFIG_X25=y
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
CONFIG_6LOWPAN=y
CONFIG_IEEE802154=m
# CONFIG_IEEE802154_SOCKET is not set
CONFIG_IEEE802154_6LOWPAN=m
# CONFIG_MAC802154 is not set
# CONFIG_NET_SCHED is not set
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
CONFIG_BATMAN_ADV_BLA=y
# CONFIG_BATMAN_ADV_DAT is not set
# CONFIG_BATMAN_ADV_NC is not set
CONFIG_BATMAN_ADV_MCAST=y
# CONFIG_BATMAN_ADV_DEBUG is not set
# CONFIG_OPENVSWITCH is not set
CONFIG_VSOCKETS=y
CONFIG_VMWARE_VMCI_VSOCKETS=m
# CONFIG_NETLINK_MMAP is not set
CONFIG_NETLINK_DIAG=m
CONFIG_NET_MPLS_GSO=y
# CONFIG_HSR is not set
# CONFIG_NET_SWITCHDEV is not set
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
# CONFIG_BPF_JIT is not set

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_NET_TCPPROBE=y
CONFIG_NET_DROP_MONITOR=y
CONFIG_HAMRADIO=y

#
# Packet Radio protocols
#
# CONFIG_AX25 is not set
CONFIG_CAN=y
# CONFIG_CAN_RAW is not set
CONFIG_CAN_BCM=m
# CONFIG_CAN_GW is not set

#
# CAN Device Drivers
#
CONFIG_CAN_VCAN=m
CONFIG_CAN_SLCAN=m
CONFIG_CAN_DEV=y
# CONFIG_CAN_CALC_BITTIMING is not set
CONFIG_CAN_LEDS=y
CONFIG_CAN_JANZ_ICAN3=m
CONFIG_CAN_SJA1000=y
CONFIG_CAN_SJA1000_ISA=m
CONFIG_CAN_SJA1000_PLATFORM=y
# CONFIG_CAN_EMS_PCI is not set
# CONFIG_CAN_PEAK_PCI is not set
CONFIG_CAN_KVASER_PCI=y
CONFIG_CAN_PLX_PCI=m
CONFIG_CAN_C_CAN=y
# CONFIG_CAN_C_CAN_PLATFORM is not set
# CONFIG_CAN_C_CAN_PCI is not set
CONFIG_CAN_M_CAN=m
CONFIG_CAN_CC770=m
# CONFIG_CAN_CC770_ISA is not set
CONFIG_CAN_CC770_PLATFORM=m

#
# CAN SPI interfaces
#
CONFIG_CAN_MCP251X=m
# CONFIG_CAN_SOFTING is not set
# CONFIG_CAN_DEBUG_DEVICES is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
CONFIG_AF_RXRPC=m
# CONFIG_AF_RXRPC_DEBUG is not set
CONFIG_RXKAD=m
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
CONFIG_WEXT_CORE=y
CONFIG_WEXT_PROC=y
CONFIG_CFG80211=m
CONFIG_NL80211_TESTMODE=y
CONFIG_CFG80211_DEVELOPER_WARNINGS=y
# CONFIG_CFG80211_REG_DEBUG is not set
CONFIG_CFG80211_DEFAULT_PS=y
CONFIG_CFG80211_DEBUGFS=y
# CONFIG_CFG80211_INTERNAL_REGDB is not set
CONFIG_CFG80211_WEXT=y
# CONFIG_LIB80211 is not set
CONFIG_MAC80211=m
CONFIG_MAC80211_HAS_RC=y
CONFIG_MAC80211_RC_MINSTREL=y
CONFIG_MAC80211_RC_MINSTREL_HT=y
# CONFIG_MAC80211_RC_MINSTREL_VHT is not set
CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
# CONFIG_MAC80211_MESH is not set
CONFIG_MAC80211_LEDS=y
# CONFIG_MAC80211_DEBUGFS is not set
CONFIG_MAC80211_MESSAGE_TRACING=y
# CONFIG_MAC80211_DEBUG_MENU is not set
CONFIG_WIMAX=m
CONFIG_WIMAX_DEBUG_LEVEL=8
CONFIG_RFKILL=y
CONFIG_RFKILL_LEDS=y
CONFIG_RFKILL_INPUT=y
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
CONFIG_CEPH_LIB=y
# CONFIG_CEPH_LIB_PRETTYDEBUG is not set
# CONFIG_CEPH_LIB_USE_DNS_RESOLVER is not set
CONFIG_NFC=m
CONFIG_NFC_DIGITAL=m
CONFIG_NFC_NCI=m
CONFIG_NFC_NCI_SPI=y
CONFIG_NFC_HCI=m
CONFIG_NFC_SHDLC=y

#
# Near Field Communication (NFC) devices
#
# CONFIG_NFC_TRF7970A is not set
CONFIG_NFC_SIM=m
CONFIG_NFC_PN544=m
CONFIG_NFC_PN544_I2C=m
CONFIG_NFC_MICROREAD=m
# CONFIG_NFC_MICROREAD_I2C is not set
CONFIG_NFC_MRVL=m
CONFIG_NFC_ST21NFCA=m
CONFIG_NFC_ST21NFCA_I2C=m
# CONFIG_NFC_ST21NFCB is not set
CONFIG_HAVE_BPF_JIT=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_DEBUG_DRIVER=y
CONFIG_DEBUG_DEVRES=y
# CONFIG_SYS_HYPERVISOR is not set
# CONFIG_GENERIC_CPU_DEVICES is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=y
CONFIG_REGMAP_SPI=y
CONFIG_REGMAP_MMIO=m
CONFIG_REGMAP_IRQ=y
CONFIG_DMA_SHARED_BUFFER=y
CONFIG_FENCE_TRACE=y
CONFIG_DMA_CMA=y

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=0
CONFIG_CMA_SIZE_SEL_MBYTES=y
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8

#
# Bus devices
#
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
CONFIG_MTD=y
# CONFIG_MTD_TESTS is not set
# CONFIG_MTD_REDBOOT_PARTS is not set
# CONFIG_MTD_CMDLINE_PARTS is not set
CONFIG_MTD_AR7_PARTS=y

#
# User Modules And Translation Layers
#
CONFIG_MTD_BLKDEVS=y
# CONFIG_MTD_BLOCK is not set
# CONFIG_MTD_BLOCK_RO is not set
CONFIG_FTL=y
CONFIG_NFTL=m
CONFIG_NFTL_RW=y
CONFIG_INFTL=y
CONFIG_RFD_FTL=m
# CONFIG_SSFDC is not set
CONFIG_SM_FTL=m
# CONFIG_MTD_OOPS is not set

#
# RAM/ROM/Flash chip drivers
#
# CONFIG_MTD_CFI is not set
CONFIG_MTD_JEDECPROBE=m
CONFIG_MTD_GEN_PROBE=m
# CONFIG_MTD_CFI_ADV_OPTIONS is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
# CONFIG_MTD_MAP_BANK_WIDTH_8 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_16 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_32 is not set
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_CFI_I4 is not set
# CONFIG_MTD_CFI_I8 is not set
CONFIG_MTD_CFI_INTELEXT=m
CONFIG_MTD_CFI_AMDSTD=m
# CONFIG_MTD_CFI_STAA is not set
CONFIG_MTD_CFI_UTIL=m
CONFIG_MTD_RAM=m
CONFIG_MTD_ROM=m
CONFIG_MTD_ABSENT=m

#
# Mapping drivers for chip access
#
# CONFIG_MTD_COMPLEX_MAPPINGS is not set
# CONFIG_MTD_PHYSMAP is not set
CONFIG_MTD_AMD76XROM=m
CONFIG_MTD_ICHXROM=m
CONFIG_MTD_ESB2ROM=m
# CONFIG_MTD_CK804XROM is not set
CONFIG_MTD_SCB2_FLASH=m
CONFIG_MTD_NETtel=m
# CONFIG_MTD_L440GX is not set
CONFIG_MTD_INTEL_VR_NOR=y
# CONFIG_MTD_PLATRAM is not set

#
# Self-contained MTD device drivers
#
CONFIG_MTD_PMC551=y
# CONFIG_MTD_PMC551_BUGFIX is not set
CONFIG_MTD_PMC551_DEBUG=y
# CONFIG_MTD_DATAFLASH is not set
# CONFIG_MTD_SST25L is not set
CONFIG_MTD_SLRAM=m
CONFIG_MTD_PHRAM=y
CONFIG_MTD_MTDRAM=y
CONFIG_MTDRAM_TOTAL_SIZE=4096
CONFIG_MTDRAM_ERASE_SIZE=128
CONFIG_MTDRAM_ABS_POS=0
CONFIG_MTD_BLOCK2MTD=m

#
# Disk-On-Chip Device Drivers
#
CONFIG_MTD_DOCG3=y
CONFIG_BCH_CONST_M=14
CONFIG_BCH_CONST_T=4
CONFIG_MTD_NAND_ECC=m
CONFIG_MTD_NAND_ECC_SMC=y
CONFIG_MTD_NAND=m
CONFIG_MTD_NAND_BCH=m
CONFIG_MTD_NAND_ECC_BCH=y
CONFIG_MTD_SM_COMMON=m
CONFIG_MTD_NAND_DENALI=m
# CONFIG_MTD_NAND_DENALI_PCI is not set
# CONFIG_MTD_NAND_OMAP_BCH_BUILD is not set
CONFIG_MTD_NAND_IDS=m
CONFIG_MTD_NAND_RICOH=m
# CONFIG_MTD_NAND_DISKONCHIP is not set
# CONFIG_MTD_NAND_DOCG4 is not set
CONFIG_MTD_NAND_CAFE=m
# CONFIG_MTD_NAND_NANDSIM is not set
CONFIG_MTD_NAND_PLATFORM=m
CONFIG_MTD_NAND_HISI504=m
CONFIG_MTD_ONENAND=m
# CONFIG_MTD_ONENAND_VERIFY_WRITE is not set
CONFIG_MTD_ONENAND_GENERIC=m
# CONFIG_MTD_ONENAND_OTP is not set
CONFIG_MTD_ONENAND_2X_PROGRAM=y

#
# LPDDR & LPDDR2 PCM memory drivers
#
CONFIG_MTD_LPDDR=m
CONFIG_MTD_QINFO_PROBE=m
# CONFIG_MTD_SPI_NOR is not set
# CONFIG_MTD_UBI is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
CONFIG_PARPORT_PC_FIFO=y
CONFIG_PARPORT_PC_SUPERIO=y
# CONFIG_PARPORT_GSC is not set
CONFIG_PARPORT_AX88796=m
# CONFIG_PARPORT_1284 is not set
CONFIG_PARPORT_NOT_PC=y
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_NULL_BLK=y
CONFIG_BLK_DEV_FD=y
# CONFIG_PARIDE is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
CONFIG_BLK_CPQ_CISS_DA=m
# CONFIG_CISS_SCSI_TAPE is not set
CONFIG_BLK_DEV_DAC960=y
CONFIG_BLK_DEV_UMEM=y
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
CONFIG_BLK_DEV_DRBD=m
CONFIG_DRBD_FAULT_INJECTION=y
CONFIG_BLK_DEV_NBD=y
CONFIG_BLK_DEV_NVME=m
# CONFIG_BLK_DEV_SKD is not set
CONFIG_BLK_DEV_OSD=m
CONFIG_BLK_DEV_SX8=m
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
CONFIG_ATA_OVER_ETH=m
# CONFIG_VIRTIO_BLK is not set
CONFIG_BLK_DEV_HD=y
# CONFIG_BLK_DEV_RBD is not set
# CONFIG_BLK_DEV_RSXX is not set

#
# Misc devices
#
CONFIG_SENSORS_LIS3LV02D=y
# CONFIG_AD525X_DPOT is not set
# CONFIG_DUMMY_IRQ is not set
CONFIG_IBM_ASM=y
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
CONFIG_TIFM_CORE=y
CONFIG_TIFM_7XX1=m
CONFIG_ICS932S401=m
CONFIG_ENCLOSURE_SERVICES=y
CONFIG_HP_ILO=y
CONFIG_APDS9802ALS=y
# CONFIG_ISL29003 is not set
CONFIG_ISL29020=y
CONFIG_SENSORS_TSL2550=m
# CONFIG_SENSORS_BH1780 is not set
CONFIG_SENSORS_BH1770=y
# CONFIG_SENSORS_APDS990X is not set
CONFIG_HMC6352=m
CONFIG_DS1682=m
CONFIG_TI_DAC7512=y
CONFIG_BMP085=y
CONFIG_BMP085_I2C=m
CONFIG_BMP085_SPI=m
CONFIG_USB_SWITCH_FSA9480=y
# CONFIG_LATTICE_ECP3_CONFIG is not set
# CONFIG_SRAM is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
CONFIG_EEPROM_AT25=y
# CONFIG_EEPROM_LEGACY is not set
CONFIG_EEPROM_MAX6875=m
CONFIG_EEPROM_93CX6=m
# CONFIG_EEPROM_93XX46 is not set
CONFIG_CB710_CORE=y
# CONFIG_CB710_DEBUG is not set
CONFIG_CB710_DEBUG_ASSUMPTIONS=y

#
# Texas Instruments shared transport line discipline
#
CONFIG_SENSORS_LIS3_I2C=y

#
# Altera FPGA firmware download module
#
CONFIG_ALTERA_STAPL=y
CONFIG_VMWARE_VMCI=m

#
# Intel MIC Bus Driver
#
CONFIG_INTEL_MIC_BUS=y

#
# Intel MIC Host Driver
#
CONFIG_INTEL_MIC_HOST=y

#
# Intel MIC Card Driver
#
CONFIG_INTEL_MIC_CARD=m
# CONFIG_GENWQE is not set
# CONFIG_ECHO is not set
# CONFIG_CXL_BASE is not set
CONFIG_HAVE_IDE=y
CONFIG_IDE=y

#
# Please see Documentation/ide/ide.txt for help/info on IDE drives
#
CONFIG_IDE_XFER_MODE=y
CONFIG_IDE_TIMINGS=y
CONFIG_IDE_ATAPI=y
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_IDE_GD=y
CONFIG_IDE_GD_ATA=y
# CONFIG_IDE_GD_ATAPI is not set
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y
# CONFIG_BLK_DEV_IDETAPE is not set
CONFIG_BLK_DEV_IDEACPI=y
CONFIG_IDE_TASK_IOCTL=y
CONFIG_IDE_PROC_FS=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
CONFIG_BLK_DEV_PLATFORM=y
CONFIG_BLK_DEV_CMD640=y
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
CONFIG_BLK_DEV_IDEPNP=m
CONFIG_BLK_DEV_IDEDMA_SFF=y

#
# PCI IDE chipsets support
#
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=m
CONFIG_BLK_DEV_OPTI621=y
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_BLK_DEV_ATIIXP=m
CONFIG_BLK_DEV_CMD64X=y
# CONFIG_BLK_DEV_TRIFLEX is not set
CONFIG_BLK_DEV_HPT366=y
CONFIG_BLK_DEV_JMICRON=m
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT8172 is not set
CONFIG_BLK_DEV_IT8213=m
CONFIG_BLK_DEV_IT821X=m
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
CONFIG_BLK_DEV_SIIMAGE=m
CONFIG_BLK_DEV_SIS5513=m
# CONFIG_BLK_DEV_SLC90E66 is not set
CONFIG_BLK_DEV_TRM290=y
# CONFIG_BLK_DEV_VIA82CXXX is not set
CONFIG_BLK_DEV_TC86C001=m
CONFIG_BLK_DEV_IDEDMA=y

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_MQ_DEFAULT=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_CHR_DEV_OSST=y
CONFIG_BLK_DEV_SR=y
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=y
# CONFIG_CHR_DEV_SCH is not set
# CONFIG_SCSI_ENCLOSURE is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_ISCSI_ATTRS=y
CONFIG_SCSI_SAS_ATTRS=y
CONFIG_SCSI_SAS_LIBSAS=y
# CONFIG_SCSI_SAS_ATA is not set
# CONFIG_SCSI_SAS_HOST_SMP is not set
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=y
CONFIG_ISCSI_BOOT_SYSFS=y
CONFIG_SCSI_CXGB3_ISCSI=y
CONFIG_SCSI_CXGB4_ISCSI=m
CONFIG_SCSI_BNX2_ISCSI=m
# CONFIG_BE2ISCSI is not set
CONFIG_BLK_DEV_3W_XXXX_RAID=m
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
CONFIG_SCSI_3W_SAS=y
CONFIG_SCSI_ACARD=m
CONFIG_SCSI_AACRAID=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=32
CONFIG_AIC7XXX_RESET_DELAY_MS=5000
# CONFIG_AIC7XXX_DEBUG_ENABLE is not set
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC7XXX_REG_PRETTY_PRINT=y
CONFIG_SCSI_AIC79XX=m
CONFIG_AIC79XX_CMDS_PER_DEVICE=32
CONFIG_AIC79XX_RESET_DELAY_MS=5000
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC94XX=m
CONFIG_AIC94XX_DEBUG=y
CONFIG_SCSI_MVSAS=y
# CONFIG_SCSI_MVSAS_DEBUG is not set
CONFIG_SCSI_MVSAS_TASKLET=y
CONFIG_SCSI_MVUMI=m
CONFIG_SCSI_DPT_I2O=y
CONFIG_SCSI_ADVANSYS=m
CONFIG_SCSI_ARCMSR=m
CONFIG_SCSI_ESAS2R=m
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=y
CONFIG_MEGARAID_MAILBOX=m
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS_LOGGING=y
CONFIG_SCSI_MPT3SAS=y
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
# CONFIG_SCSI_MPT3SAS_LOGGING is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
CONFIG_VMWARE_PVSCSI=m
CONFIG_LIBFC=y
# CONFIG_LIBFCOE is not set
CONFIG_SCSI_DMX3191D=y
CONFIG_SCSI_EATA=m
CONFIG_SCSI_EATA_TAGGED_QUEUE=y
CONFIG_SCSI_EATA_LINKED_COMMANDS=y
CONFIG_SCSI_EATA_MAX_TAGS=16
# CONFIG_SCSI_FUTURE_DOMAIN is not set
CONFIG_SCSI_GDTH=m
CONFIG_SCSI_ISCI=y
# CONFIG_SCSI_IPS is not set
CONFIG_SCSI_INITIO=m
CONFIG_SCSI_INIA100=m
# CONFIG_SCSI_PPA is not set
CONFIG_SCSI_IMM=m
CONFIG_SCSI_IZIP_EPP16=y
CONFIG_SCSI_IZIP_SLOW_CTR=y
# CONFIG_SCSI_STEX is not set
CONFIG_SCSI_SYM53C8XX_2=m
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_SCSI_SYM53C8XX_MMIO=y
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
CONFIG_SCSI_QLA_ISCSI=m
CONFIG_SCSI_LPFC=m
CONFIG_SCSI_LPFC_DEBUG_FS=y
CONFIG_SCSI_DC395x=m
CONFIG_SCSI_AM53C974=m
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
CONFIG_SCSI_BFA_FC=m
# CONFIG_SCSI_VIRTIO is not set
# CONFIG_SCSI_CHELSIO_FCOE is not set
# CONFIG_SCSI_DH is not set
CONFIG_SCSI_OSD_INITIATOR=m
CONFIG_SCSI_OSD_ULD=m
CONFIG_SCSI_OSD_DPRINT_SENSE=1
# CONFIG_SCSI_OSD_DEBUG is not set
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_ACPI=y
CONFIG_SATA_ZPODD=y
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=y
CONFIG_SATA_AHCI_PLATFORM=y
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
CONFIG_SATA_SIL24=y
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
CONFIG_PDC_ADMA=m
# CONFIG_SATA_QSTOR is not set
CONFIG_SATA_SX4=m
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=y
CONFIG_SATA_MV=m
CONFIG_SATA_NV=y
CONFIG_SATA_PROMISE=m
CONFIG_SATA_SIL=m
CONFIG_SATA_SIS=y
CONFIG_SATA_SVW=y
CONFIG_SATA_ULI=y
CONFIG_SATA_VIA=y
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
CONFIG_PATA_ALI=y
CONFIG_PATA_AMD=y
CONFIG_PATA_ARTOP=m
CONFIG_PATA_ATIIXP=m
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
CONFIG_PATA_HPT366=m
CONFIG_PATA_HPT37X=m
CONFIG_PATA_HPT3X2N=y
CONFIG_PATA_HPT3X3=m
CONFIG_PATA_HPT3X3_DMA=y
CONFIG_PATA_IT8213=y
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
CONFIG_PATA_MARVELL=y
CONFIG_PATA_NETCELL=y
CONFIG_PATA_NINJA32=y
CONFIG_PATA_NS87415=y
# CONFIG_PATA_OLDPIIX is not set
CONFIG_PATA_OPTIDMA=y
CONFIG_PATA_PDC2027X=y
# CONFIG_PATA_PDC_OLD is not set
CONFIG_PATA_RADISYS=m
CONFIG_PATA_RDC=y
CONFIG_PATA_SCH=y
CONFIG_PATA_SERVERWORKS=y
# CONFIG_PATA_SIL680 is not set
CONFIG_PATA_SIS=y
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
CONFIG_PATA_VIA=m
CONFIG_PATA_WINBOND=m

#
# PIO-only SFF controllers
#
CONFIG_PATA_CMD640_PCI=m
CONFIG_PATA_MPIIX=y
CONFIG_PATA_NS87410=m
CONFIG_PATA_OPTI=y
CONFIG_PATA_RZ1000=y

#
# Generic fallback / legacy drivers
#
# CONFIG_PATA_ACPI is not set
CONFIG_ATA_GENERIC=m
CONFIG_PATA_LEGACY=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
# CONFIG_MD_AUTODETECT is not set
# CONFIG_MD_LINEAR is not set
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=y
CONFIG_MD_RAID456=m
# CONFIG_MD_MULTIPATH is not set
CONFIG_MD_FAULTY=y
CONFIG_BCACHE=m
CONFIG_BCACHE_DEBUG=y
# CONFIG_BCACHE_CLOSURES_DEBUG is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=m
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
CONFIG_DM_DEBUG_BLOCK_STACK_TRACING=y
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
# CONFIG_DM_THIN_PROVISIONING is not set
CONFIG_DM_CACHE=m
CONFIG_DM_CACHE_MQ=m
CONFIG_DM_CACHE_CLEANER=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
# CONFIG_DM_LOG_USERSPACE is not set
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
CONFIG_DM_DELAY=m
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
# CONFIG_DM_VERITY is not set
CONFIG_DM_SWITCH=m
CONFIG_TARGET_CORE=m
CONFIG_TCM_IBLOCK=m
# CONFIG_TCM_FILEIO is not set
CONFIG_TCM_PSCSI=m
# CONFIG_TCM_USER is not set
CONFIG_LOOPBACK_TARGET=m
CONFIG_TCM_FC=m
CONFIG_ISCSI_TARGET=m
# CONFIG_SBP_TARGET is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=y
CONFIG_FUSION_SAS=m
CONFIG_FUSION_MAX_SGE=128
# CONFIG_FUSION_CTL is not set
# CONFIG_FUSION_LOGGING is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=y
CONFIG_FIREWIRE_OHCI=m
# CONFIG_FIREWIRE_SBP2 is not set
# CONFIG_FIREWIRE_NET is not set
CONFIG_FIREWIRE_NOSY=y
CONFIG_MACINTOSH_DRIVERS=y
# CONFIG_MAC_EMUMOUSEBTN is not set
CONFIG_NETDEVICES=y
CONFIG_MII=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
CONFIG_DUMMY=m
CONFIG_EQUALIZER=y
# CONFIG_NET_FC is not set
CONFIG_NET_TEAM=m
# CONFIG_NET_TEAM_MODE_BROADCAST is not set
# CONFIG_NET_TEAM_MODE_ROUNDROBIN is not set
# CONFIG_NET_TEAM_MODE_RANDOM is not set
CONFIG_NET_TEAM_MODE_ACTIVEBACKUP=m
CONFIG_NET_TEAM_MODE_LOADBALANCE=m
# CONFIG_MACVLAN is not set
# CONFIG_IPVLAN is not set
# CONFIG_VXLAN is not set
CONFIG_NETCONSOLE=y
CONFIG_NETPOLL=y
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_NTB_NETDEV is not set
CONFIG_RIONET=m
CONFIG_RIONET_TX_SIZE=128
CONFIG_RIONET_RX_SIZE=128
CONFIG_TUN=m
# CONFIG_VETH is not set
CONFIG_VIRTIO_NET=m
CONFIG_NLMON=m
CONFIG_ARCNET=m
# CONFIG_ARCNET_1201 is not set
# CONFIG_ARCNET_1051 is not set
# CONFIG_ARCNET_RAW is not set
# CONFIG_ARCNET_CAP is not set
# CONFIG_ARCNET_COM90xx is not set
CONFIG_ARCNET_COM90xxIO=m
# CONFIG_ARCNET_RIM_I is not set
# CONFIG_ARCNET_COM20020 is not set

#
# CAIF transport drivers
#
CONFIG_VHOST_NET=m
CONFIG_VHOST_SCSI=m
CONFIG_VHOST_RING=y
CONFIG_VHOST=m

#
# Distributed Switch Architecture drivers
#
CONFIG_NET_DSA_MV88E6XXX=y
CONFIG_NET_DSA_MV88E6060=m
CONFIG_NET_DSA_MV88E6XXX_NEED_PPU=y
CONFIG_NET_DSA_MV88E6131=y
CONFIG_NET_DSA_MV88E6123_61_65=m
CONFIG_NET_DSA_MV88E6171=y
CONFIG_NET_DSA_MV88E6352=y
# CONFIG_NET_DSA_BCM_SF2 is not set
CONFIG_ETHERNET=y
CONFIG_MDIO=y
# CONFIG_NET_VENDOR_3COM is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
# CONFIG_NET_VENDOR_AGERE is not set
# CONFIG_NET_VENDOR_ALTEON is not set
# CONFIG_ALTERA_TSE is not set
CONFIG_NET_VENDOR_AMD=y
CONFIG_AMD8111_ETH=y
# CONFIG_PCNET32 is not set
CONFIG_AMD_XGBE=y
CONFIG_NET_XGENE=m
CONFIG_NET_VENDOR_ARC=y
# CONFIG_NET_VENDOR_ATHEROS is not set
CONFIG_NET_VENDOR_BROADCOM=y
CONFIG_B44=y
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_BCMGENET=y
CONFIG_BNX2=y
CONFIG_CNIC=y
CONFIG_TIGON3=m
CONFIG_BNX2X=y
CONFIG_BNX2X_SRIOV=y
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
CONFIG_NET_VENDOR_CHELSIO=y
CONFIG_CHELSIO_T1=m
CONFIG_CHELSIO_T1_1G=y
CONFIG_CHELSIO_T3=y
CONFIG_CHELSIO_T4=y
CONFIG_CHELSIO_T4VF=y
# CONFIG_NET_VENDOR_CISCO is not set
# CONFIG_CX_ECAT is not set
CONFIG_DNET=m
CONFIG_NET_VENDOR_DEC=y
# CONFIG_NET_TULIP is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
CONFIG_SUNDANCE=m
CONFIG_SUNDANCE_MMIO=y
# CONFIG_NET_VENDOR_EMULEX is not set
# CONFIG_NET_VENDOR_EXAR is not set
# CONFIG_NET_VENDOR_HP is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_IP1000 is not set
# CONFIG_JME is not set
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_MVMDIO is not set
# CONFIG_SKGE is not set
CONFIG_SKY2=m
# CONFIG_SKY2_DEBUG is not set
# CONFIG_NET_VENDOR_MELLANOX is not set
CONFIG_NET_VENDOR_MICREL=y
CONFIG_KS8842=m
# CONFIG_KS8851 is not set
CONFIG_KS8851_MLL=m
CONFIG_KSZ884X_PCI=y
CONFIG_NET_VENDOR_MICROCHIP=y
CONFIG_ENC28J60=m
CONFIG_ENC28J60_WRITEVERIFY=y
CONFIG_NET_VENDOR_MYRI=y
CONFIG_MYRI10GE=y
# CONFIG_MYRI10GE_DCA is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
CONFIG_NS83820=y
CONFIG_NET_VENDOR_8390=y
CONFIG_NE2K_PCI=m
# CONFIG_NET_VENDOR_NVIDIA is not set
CONFIG_NET_VENDOR_OKI=y
CONFIG_ETHOC=y
# CONFIG_NET_PACKET_ENGINE is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
CONFIG_NET_VENDOR_REALTEK=y
CONFIG_ATP=m
CONFIG_8139CP=y
CONFIG_8139TOO=y
# CONFIG_8139TOO_PIO is not set
CONFIG_8139TOO_TUNE_TWISTER=y
# CONFIG_8139TOO_8129 is not set
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R8169 is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_ROCKER=y
CONFIG_NET_VENDOR_SAMSUNG=y
CONFIG_SXGBE_ETH=y
# CONFIG_NET_VENDOR_SEEQ is not set
CONFIG_NET_VENDOR_SILAN=y
CONFIG_SC92031=y
CONFIG_NET_VENDOR_SIS=y
CONFIG_SIS900=m
CONFIG_SIS190=y
CONFIG_SFC=y
# CONFIG_SFC_MTD is not set
# CONFIG_SFC_SRIOV is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_EPIC100 is not set
# CONFIG_SMSC911X is not set
CONFIG_SMSC9420=y
CONFIG_NET_VENDOR_STMICRO=y
CONFIG_STMMAC_ETH=m
CONFIG_STMMAC_PLATFORM=m
CONFIG_STMMAC_PCI=m
# CONFIG_NET_VENDOR_SUN is not set
# CONFIG_NET_VENDOR_TEHUTI is not set
# CONFIG_NET_VENDOR_TI is not set
# CONFIG_NET_VENDOR_VIA is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
CONFIG_WIZNET_W5300=m
# CONFIG_WIZNET_BUS_DIRECT is not set
# CONFIG_WIZNET_BUS_INDIRECT is not set
CONFIG_WIZNET_BUS_ANY=y
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
CONFIG_AT803X_PHY=m
CONFIG_AMD_PHY=y
CONFIG_AMD_XGBE_PHY=y
# CONFIG_MARVELL_PHY is not set
CONFIG_DAVICOM_PHY=y
# CONFIG_QSEMI_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_CICADA_PHY is not set
CONFIG_VITESSE_PHY=m
CONFIG_SMSC_PHY=y
CONFIG_BROADCOM_PHY=m
CONFIG_BCM7XXX_PHY=y
CONFIG_BCM87XX_PHY=m
CONFIG_ICPLUS_PHY=y
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
CONFIG_STE10XP=y
# CONFIG_LSI_ET1011C_PHY is not set
CONFIG_MICREL_PHY=m
CONFIG_FIXED_PHY=y
# CONFIG_MDIO_BITBANG is not set
CONFIG_MDIO_BCM_UNIMAC=m
# CONFIG_MICREL_KS8995MA is not set
# CONFIG_PLIP is not set
CONFIG_PPP=y
# CONFIG_PPP_BSDCOMP is not set
CONFIG_PPP_DEFLATE=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_MPPE=y
CONFIG_PPP_MULTILINK=y
# CONFIG_PPPOE is not set
# CONFIG_PPTP is not set
CONFIG_PPPOL2TP=m
# CONFIG_PPP_ASYNC is not set
# CONFIG_PPP_SYNC_TTY is not set
CONFIG_SLIP=y
CONFIG_SLHC=y
# CONFIG_SLIP_COMPRESSED is not set
# CONFIG_SLIP_SMART is not set
CONFIG_SLIP_MODE_SLIP6=y

#
# Host-side USB support is needed for USB Network Adapter support
#
# CONFIG_WLAN is not set

#
# WiMAX Wireless Broadband devices
#

#
# Enable USB support to see WiMAX USB drivers
#
# CONFIG_WAN is not set
CONFIG_IEEE802154_DRIVERS=m
# CONFIG_VMXNET3 is not set
CONFIG_ISDN=y
CONFIG_ISDN_I4L=m
CONFIG_ISDN_PPP=y
CONFIG_ISDN_PPP_VJ=y
CONFIG_ISDN_MPP=y
# CONFIG_IPPP_FILTER is not set
CONFIG_ISDN_PPP_BSDCOMP=m
CONFIG_ISDN_AUDIO=y
# CONFIG_ISDN_TTY_FAX is not set
CONFIG_ISDN_X25=y

#
# ISDN feature submodules
#
# CONFIG_ISDN_DRV_LOOP is not set
CONFIG_ISDN_DIVERSION=m

#
# ISDN4Linux hardware drivers
#

#
# Passive cards
#
# CONFIG_ISDN_DRV_HISAX is not set

#
# Active cards
#
CONFIG_ISDN_CAPI=m
# CONFIG_CAPI_TRACE is not set
CONFIG_ISDN_CAPI_CAPI20=m
# CONFIG_ISDN_CAPI_MIDDLEWARE is not set
# CONFIG_ISDN_CAPI_CAPIDRV is not set

#
# CAPI hardware drivers
#
CONFIG_CAPI_AVM=y
CONFIG_ISDN_DRV_AVMB1_B1PCI=m
CONFIG_ISDN_DRV_AVMB1_B1PCIV4=y
CONFIG_ISDN_DRV_AVMB1_T1PCI=m
# CONFIG_ISDN_DRV_AVMB1_C4 is not set
CONFIG_CAPI_EICON=y
CONFIG_ISDN_DIVAS=m
# CONFIG_ISDN_DIVAS_BRIPCI is not set
# CONFIG_ISDN_DIVAS_PRIPCI is not set
CONFIG_ISDN_DIVAS_DIVACAPI=m
CONFIG_ISDN_DIVAS_USERIDI=m
CONFIG_ISDN_DIVAS_MAINT=m
CONFIG_ISDN_DRV_GIGASET=y
CONFIG_GIGASET_DUMMYLL=y
CONFIG_GIGASET_M101=y
CONFIG_GIGASET_DEBUG=y
CONFIG_HYSDN=m
CONFIG_HYSDN_CAPI=y
CONFIG_MISDN=m
# CONFIG_MISDN_DSP is not set
CONFIG_MISDN_L1OIP=m

#
# mISDN hardware drivers
#
CONFIG_MISDN_HFCPCI=m
CONFIG_MISDN_HFCMULTI=m
# CONFIG_MISDN_AVMFRITZ is not set
CONFIG_MISDN_SPEEDFAX=m
CONFIG_MISDN_INFINEON=m
# CONFIG_MISDN_W6692 is not set
CONFIG_MISDN_NETJET=m
CONFIG_MISDN_IPAC=m
CONFIG_MISDN_ISAR=m
CONFIG_ISDN_HDLC=m

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=y
CONFIG_INPUT_SPARSEKMAP=y
CONFIG_INPUT_MATRIXKMAP=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=y
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
CONFIG_KEYBOARD_ADP5589=m
CONFIG_KEYBOARD_ATKBD=y
CONFIG_KEYBOARD_QT1070=y
# CONFIG_KEYBOARD_QT2160 is not set
CONFIG_KEYBOARD_LKKBD=y
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_LM8323 is not set
CONFIG_KEYBOARD_LM8333=y
CONFIG_KEYBOARD_MAX7359=y
CONFIG_KEYBOARD_MCS=y
CONFIG_KEYBOARD_MPR121=m
CONFIG_KEYBOARD_NEWTON=m
# CONFIG_KEYBOARD_OPENCORES is not set
CONFIG_KEYBOARD_GOLDFISH_EVENTS=y
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_KEYBOARD_SUNKBD=y
CONFIG_KEYBOARD_TWL4030=y
CONFIG_KEYBOARD_XTKBD=m
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_CYAPA is not set
# CONFIG_MOUSE_ELAN_I2C is not set
CONFIG_MOUSE_VSXXXAA=m
CONFIG_MOUSE_SYNAPTICS_I2C=y
# CONFIG_INPUT_JOYSTICK is not set
CONFIG_INPUT_TABLET=y
# CONFIG_TABLET_SERIAL_WACOM4 is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_AD714X=m
# CONFIG_INPUT_AD714X_I2C is not set
CONFIG_INPUT_AD714X_SPI=m
CONFIG_INPUT_BMA150=m
CONFIG_INPUT_E3X0_BUTTON=m
CONFIG_INPUT_PCSPKR=m
# CONFIG_INPUT_MAX77693_HAPTIC is not set
CONFIG_INPUT_MAX8997_HAPTIC=m
# CONFIG_INPUT_MC13783_PWRBUTTON is not set
CONFIG_INPUT_MMA8450=y
# CONFIG_INPUT_MPU3050 is not set
CONFIG_INPUT_APANEL=y
# CONFIG_INPUT_ATLAS_BTNS is not set
CONFIG_INPUT_KXTJ9=m
CONFIG_INPUT_KXTJ9_POLLED_MODE=y
CONFIG_INPUT_RETU_PWRBUTTON=y
CONFIG_INPUT_TPS65218_PWRBUTTON=m
CONFIG_INPUT_AXP20X_PEK=m
CONFIG_INPUT_TWL4030_PWRBUTTON=y
CONFIG_INPUT_TWL4030_VIBRA=m
CONFIG_INPUT_UINPUT=y
# CONFIG_INPUT_PALMAS_PWRBUTTON is not set
CONFIG_INPUT_PCF8574=m
CONFIG_INPUT_PWM_BEEPER=y
CONFIG_INPUT_DA9055_ONKEY=m
# CONFIG_INPUT_PCAP is not set
# CONFIG_INPUT_ADXL34X is not set
CONFIG_INPUT_CMA3000=y
# CONFIG_INPUT_CMA3000_I2C is not set
# CONFIG_INPUT_IDEAPAD_SLIDEBAR is not set
CONFIG_INPUT_DRV2667_HAPTICS=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
CONFIG_SERIO_CT82C710=y
# CONFIG_SERIO_PARKBD is not set
CONFIG_SERIO_PCIPS2=m
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=y
# CONFIG_SERIO_ALTERA_PS2 is not set
CONFIG_SERIO_PS2MULT=m
# CONFIG_SERIO_ARC_PS2 is not set
CONFIG_GAMEPORT=m
CONFIG_GAMEPORT_NS558=m
# CONFIG_GAMEPORT_L4 is not set
CONFIG_GAMEPORT_EMU10K1=m
# CONFIG_GAMEPORT_FM801 is not set

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
# CONFIG_SERIAL_NONSTANDARD is not set
CONFIG_NOZOMI=y
# CONFIG_N_GSM is not set
CONFIG_TRACE_ROUTER=m
CONFIG_TRACE_SINK=y
CONFIG_GOLDFISH_TTY=m
CONFIG_DEVMEM=y
CONFIG_DEVKMEM=y

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_DEPRECATED_OPTIONS=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_DMA=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set
# CONFIG_SERIAL_8250_DW is not set
CONFIG_SERIAL_8250_FINTEK=m

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MAX3100 is not set
CONFIG_SERIAL_MAX310X=y
CONFIG_SERIAL_MFD_HSU=y
CONFIG_SERIAL_MFD_HSU_CONSOLE=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
# CONFIG_SERIAL_SCCNXP is not set
CONFIG_SERIAL_SC16IS7XX=y
CONFIG_SERIAL_ALTERA_JTAGUART=y
# CONFIG_SERIAL_ALTERA_JTAGUART_CONSOLE is not set
# CONFIG_SERIAL_ALTERA_UART is not set
CONFIG_SERIAL_ARC=y
CONFIG_SERIAL_ARC_CONSOLE=y
CONFIG_SERIAL_ARC_NR_PORTS=1
CONFIG_SERIAL_RP2=m
CONFIG_SERIAL_RP2_NR_UARTS=32
CONFIG_SERIAL_FSL_LPUART=m
CONFIG_PRINTER=m
# CONFIG_LP_CONSOLE is not set
# CONFIG_PPDEV is not set
CONFIG_HVC_DRIVER=y
CONFIG_VIRTIO_CONSOLE=m
CONFIG_IPMI_HANDLER=y
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
# CONFIG_IPMI_SI is not set
CONFIG_IPMI_SSIF=y
# CONFIG_IPMI_WATCHDOG is not set
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_INTEL=y
CONFIG_HW_RANDOM_AMD=y
CONFIG_HW_RANDOM_VIA=y
CONFIG_HW_RANDOM_VIRTIO=m
# CONFIG_NVRAM is not set
CONFIG_R3964=y
CONFIG_APPLICOM=y
CONFIG_MWAVE=y
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=256
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
CONFIG_HPET_MMAP_DEFAULT=y
CONFIG_HANGCHECK_TIMER=y
# CONFIG_TCG_TPM is not set
CONFIG_TELCLOCK=m
CONFIG_DEVPORT=y
CONFIG_XILLYBUS=m
# CONFIG_XILLYBUS_PCIE is not set

#
# I2C support
#
CONFIG_I2C=y
CONFIG_ACPI_I2C_OPREGION=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
# CONFIG_I2C_CHARDEV is not set
CONFIG_I2C_MUX=y

#
# Multiplexer I2C Chip support
#
CONFIG_I2C_MUX_PCA9541=m
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=m
CONFIG_I2C_ALGOBIT=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
CONFIG_I2C_ALI1535=m
CONFIG_I2C_ALI1563=y
CONFIG_I2C_ALI15X3=y
# CONFIG_I2C_AMD756 is not set
CONFIG_I2C_AMD8111=m
CONFIG_I2C_I801=m
CONFIG_I2C_ISCH=y
CONFIG_I2C_ISMT=y
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
CONFIG_I2C_NFORCE2_S4985=m
# CONFIG_I2C_SIS5595 is not set
CONFIG_I2C_SIS630=m
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=y

#
# ACPI drivers
#
CONFIG_I2C_SCMI=y

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
CONFIG_I2C_DESIGNWARE_CORE=m
CONFIG_I2C_DESIGNWARE_PCI=m
CONFIG_I2C_OCORES=m
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_PXA_PCI is not set
CONFIG_I2C_SIMTEC=y
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=m
CONFIG_I2C_PARPORT_LIGHT=m
CONFIG_I2C_TAOS_EVM=m

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_STUB=m
# CONFIG_I2C_SLAVE is not set
# CONFIG_I2C_DEBUG_CORE is not set
CONFIG_I2C_DEBUG_ALGO=y
# CONFIG_I2C_DEBUG_BUS is not set
CONFIG_SPI=y
# CONFIG_SPI_DEBUG is not set
CONFIG_SPI_MASTER=y

#
# SPI Master Controller Drivers
#
# CONFIG_SPI_ALTERA is not set
CONFIG_SPI_BITBANG=y
CONFIG_SPI_BUTTERFLY=m
# CONFIG_SPI_LM70_LLP is not set
# CONFIG_SPI_PXA2XX is not set
# CONFIG_SPI_PXA2XX_PCI is not set
CONFIG_SPI_SC18IS602=m
CONFIG_SPI_XCOMM=m
# CONFIG_SPI_XILINX is not set
CONFIG_SPI_DESIGNWARE=y
CONFIG_SPI_DW_PCI=m
CONFIG_SPI_DW_MID_DMA=y
# CONFIG_SPI_DW_MMIO is not set

#
# SPI Protocol Masters
#
CONFIG_SPI_SPIDEV=y
CONFIG_SPI_TLE62X0=m
CONFIG_SPMI=m
# CONFIG_HSI is not set

#
# PPS support
#
CONFIG_PPS=y
CONFIG_PPS_DEBUG=y

#
# PPS clients support
#
CONFIG_PPS_CLIENT_KTIMER=m
CONFIG_PPS_CLIENT_LDISC=m
CONFIG_PPS_CLIENT_PARPORT=m
CONFIG_PPS_CLIENT_GPIO=m

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
CONFIG_W1=y
CONFIG_W1_CON=y

#
# 1-wire Bus Masters
#
CONFIG_W1_MASTER_MATROX=m
# CONFIG_W1_MASTER_DS2482 is not set
CONFIG_W1_MASTER_DS1WM=y

#
# 1-wire Slaves
#
# CONFIG_W1_SLAVE_THERM is not set
CONFIG_W1_SLAVE_SMEM=y
CONFIG_W1_SLAVE_DS2408=m
# CONFIG_W1_SLAVE_DS2408_READBACK is not set
# CONFIG_W1_SLAVE_DS2413 is not set
# CONFIG_W1_SLAVE_DS2406 is not set
# CONFIG_W1_SLAVE_DS2423 is not set
CONFIG_W1_SLAVE_DS2431=y
CONFIG_W1_SLAVE_DS2433=y
CONFIG_W1_SLAVE_DS2433_CRC=y
# CONFIG_W1_SLAVE_DS2760 is not set
CONFIG_W1_SLAVE_DS2780=y
# CONFIG_W1_SLAVE_DS2781 is not set
# CONFIG_W1_SLAVE_DS28E04 is not set
# CONFIG_W1_SLAVE_BQ27000 is not set
CONFIG_POWER_SUPPLY=y
CONFIG_POWER_SUPPLY_DEBUG=y
CONFIG_PDA_POWER=y
# CONFIG_GENERIC_ADC_BATTERY is not set
# CONFIG_TEST_POWER is not set
CONFIG_BATTERY_DS2780=y
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
CONFIG_BATTERY_SBS=y
CONFIG_BATTERY_BQ27x00=y
# CONFIG_BATTERY_BQ27X00_I2C is not set
CONFIG_BATTERY_BQ27X00_PLATFORM=y
# CONFIG_BATTERY_DA9030 is not set
CONFIG_BATTERY_MAX17040=y
# CONFIG_BATTERY_MAX17042 is not set
CONFIG_BATTERY_TWL4030_MADC=m
CONFIG_BATTERY_RX51=m
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_TWL4030 is not set
CONFIG_CHARGER_LP8727=y
CONFIG_CHARGER_MAX14577=m
# CONFIG_CHARGER_MAX77693 is not set
CONFIG_CHARGER_BQ2415X=y
# CONFIG_CHARGER_SMB347 is not set
CONFIG_BATTERY_GAUGE_LTC2941=y
CONFIG_BATTERY_GOLDFISH=m
# CONFIG_POWER_RESET is not set
CONFIG_POWER_AVS=y
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
CONFIG_HWMON_DEBUG_CHIP=y

#
# Native drivers
#
CONFIG_SENSORS_ABITUGURU=m
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7314 is not set
CONFIG_SENSORS_AD7414=m
CONFIG_SENSORS_AD7418=m
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
# CONFIG_SENSORS_ADM1029 is not set
CONFIG_SENSORS_ADM1031=m
# CONFIG_SENSORS_ADM9240 is not set
CONFIG_SENSORS_ADT7X10=m
CONFIG_SENSORS_ADT7310=m
# CONFIG_SENSORS_ADT7410 is not set
CONFIG_SENSORS_ADT7411=m
CONFIG_SENSORS_ADT7462=m
CONFIG_SENSORS_ADT7470=m
# CONFIG_SENSORS_ADT7475 is not set
CONFIG_SENSORS_ASC7621=m
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_K10TEMP=m
CONFIG_SENSORS_FAM15H_POWER=m
# CONFIG_SENSORS_APPLESMC is not set
CONFIG_SENSORS_ASB100=m
# CONFIG_SENSORS_ATXP1 is not set
CONFIG_SENSORS_DS620=m
# CONFIG_SENSORS_DS1621 is not set
CONFIG_SENSORS_DA9055=m
CONFIG_SENSORS_I5K_AMB=m
# CONFIG_SENSORS_F71805F is not set
CONFIG_SENSORS_F71882FG=m
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_MC13783_ADC is not set
CONFIG_SENSORS_FSCHMD=m
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
CONFIG_SENSORS_G760A=m
# CONFIG_SENSORS_G762 is not set
CONFIG_SENSORS_HIH6130=m
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
# CONFIG_SENSORS_IIO_HWMON is not set
CONFIG_SENSORS_I5500=m
# CONFIG_SENSORS_CORETEMP is not set
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_JC42=m
CONFIG_SENSORS_POWR1220=m
CONFIG_SENSORS_LINEAGE=m
CONFIG_SENSORS_LTC2945=m
# CONFIG_SENSORS_LTC4151 is not set
CONFIG_SENSORS_LTC4215=m
CONFIG_SENSORS_LTC4222=m
CONFIG_SENSORS_LTC4245=m
# CONFIG_SENSORS_LTC4260 is not set
CONFIG_SENSORS_LTC4261=m
CONFIG_SENSORS_MAX1111=m
CONFIG_SENSORS_MAX16065=m
# CONFIG_SENSORS_MAX1619 is not set
CONFIG_SENSORS_MAX1668=m
CONFIG_SENSORS_MAX197=m
CONFIG_SENSORS_MAX6639=m
CONFIG_SENSORS_MAX6642=m
CONFIG_SENSORS_MAX6650=m
CONFIG_SENSORS_MAX6697=m
# CONFIG_SENSORS_HTU21 is not set
CONFIG_SENSORS_MCP3021=m
CONFIG_SENSORS_MENF21BMC_HWMON=m
CONFIG_SENSORS_ADCXX=m
CONFIG_SENSORS_LM63=m
CONFIG_SENSORS_LM70=m
# CONFIG_SENSORS_LM73 is not set
CONFIG_SENSORS_LM75=m
# CONFIG_SENSORS_LM77 is not set
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
CONFIG_SENSORS_LM93=m
CONFIG_SENSORS_LM95234=m
# CONFIG_SENSORS_LM95241 is not set
CONFIG_SENSORS_LM95245=m
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
# CONFIG_SENSORS_NTC_THERMISTOR is not set
# CONFIG_SENSORS_NCT6683 is not set
CONFIG_SENSORS_NCT6775=m
# CONFIG_SENSORS_NCT7802 is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_PMBUS=m
# CONFIG_SENSORS_PMBUS is not set
CONFIG_SENSORS_ADM1275=m
# CONFIG_SENSORS_LM25066 is not set
# CONFIG_SENSORS_LTC2978 is not set
CONFIG_SENSORS_MAX16064=m
CONFIG_SENSORS_MAX34440=m
# CONFIG_SENSORS_MAX8688 is not set
CONFIG_SENSORS_TPS40422=m
CONFIG_SENSORS_UCD9000=m
CONFIG_SENSORS_UCD9200=m
# CONFIG_SENSORS_ZL6100 is not set
CONFIG_SENSORS_SHT21=m
# CONFIG_SENSORS_SHTC1 is not set
CONFIG_SENSORS_SIS5595=m
CONFIG_SENSORS_DME1737=m
CONFIG_SENSORS_EMC1403=m
# CONFIG_SENSORS_EMC2103 is not set
# CONFIG_SENSORS_EMC6W201 is not set
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
# CONFIG_SENSORS_SCH56XX_COMMON is not set
# CONFIG_SENSORS_SMM665 is not set
CONFIG_SENSORS_ADC128D818=m
# CONFIG_SENSORS_ADS1015 is not set
CONFIG_SENSORS_ADS7828=m
CONFIG_SENSORS_ADS7871=m
CONFIG_SENSORS_AMC6821=m
CONFIG_SENSORS_INA209=m
CONFIG_SENSORS_INA2XX=m
CONFIG_SENSORS_THMC50=m
CONFIG_SENSORS_TMP102=m
CONFIG_SENSORS_TMP103=m
# CONFIG_SENSORS_TMP401 is not set
CONFIG_SENSORS_TMP421=m
# CONFIG_SENSORS_TWL4030_MADC is not set
CONFIG_SENSORS_VIA_CPUTEMP=m
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
CONFIG_SENSORS_W83793=m
# CONFIG_SENSORS_W83795 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
CONFIG_SENSORS_W83627HF=m
# CONFIG_SENSORS_W83627EHF is not set

#
# ACPI drivers
#
# CONFIG_SENSORS_ACPI_POWER is not set
# CONFIG_SENSORS_ATK0110 is not set
CONFIG_THERMAL=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
# CONFIG_THERMAL_GOV_FAIR_SHARE is not set
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
# CONFIG_THERMAL_EMULATION is not set
# CONFIG_INTEL_POWERCLAMP is not set
CONFIG_X86_PKG_TEMP_THERMAL=y
CONFIG_INTEL_SOC_DTS_THERMAL=m
CONFIG_INT340X_THERMAL=m
CONFIG_ACPI_THERMAL_REL=m

#
# Texas Instruments thermal drivers
#
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=y
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
CONFIG_SSB_SDIOHOST_POSSIBLE=y
# CONFIG_SSB_SDIOHOST is not set
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y
CONFIG_BCMA_POSSIBLE=y

#
# Broadcom specific AMBA
#
CONFIG_BCMA=m
CONFIG_BCMA_HOST_PCI_POSSIBLE=y
CONFIG_BCMA_HOST_PCI=y
CONFIG_BCMA_HOST_SOC=y
CONFIG_BCMA_DRIVER_GMAC_CMN=y
CONFIG_BCMA_DEBUG=y

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
CONFIG_MFD_AS3711=y
# CONFIG_PMIC_ADP5520 is not set
CONFIG_MFD_BCM590XX=y
CONFIG_MFD_AXP20X=y
# CONFIG_MFD_CROS_EC is not set
CONFIG_PMIC_DA903X=y
# CONFIG_MFD_DA9052_SPI is not set
# CONFIG_MFD_DA9052_I2C is not set
CONFIG_MFD_DA9055=y
CONFIG_MFD_DA9063=y
CONFIG_MFD_DA9150=y
CONFIG_MFD_MC13XXX=m
CONFIG_MFD_MC13XXX_SPI=m
CONFIG_MFD_MC13XXX_I2C=m
# CONFIG_HTC_PASIC3 is not set
CONFIG_LPC_ICH=y
CONFIG_LPC_SCH=y
CONFIG_INTEL_SOC_PMIC=y
CONFIG_MFD_JANZ_CMODIO=m
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_88PM800 is not set
CONFIG_MFD_88PM805=m
# CONFIG_MFD_88PM860X is not set
CONFIG_MFD_MAX14577=y
CONFIG_MFD_MAX77693=y
CONFIG_MFD_MAX8907=m
# CONFIG_MFD_MAX8925 is not set
CONFIG_MFD_MAX8997=y
# CONFIG_MFD_MAX8998 is not set
CONFIG_MFD_MENF21BMC=y
CONFIG_EZX_PCAP=y
CONFIG_MFD_RETU=y
# CONFIG_MFD_PCF50633 is not set
CONFIG_MFD_RDC321X=m
CONFIG_MFD_RTSX_PCI=y
# CONFIG_MFD_RT5033 is not set
# CONFIG_MFD_RC5T583 is not set
CONFIG_MFD_RN5T618=m
# CONFIG_MFD_SEC_CORE is not set
# CONFIG_MFD_SI476X_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_MFD_SMSC is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SYSCON is not set
CONFIG_MFD_TI_AM335X_TSCADC=m
CONFIG_MFD_LP3943=m
CONFIG_MFD_LP8788=y
CONFIG_MFD_PALMAS=y
# CONFIG_TPS6105X is not set
CONFIG_TPS6507X=m
# CONFIG_MFD_TPS65090 is not set
CONFIG_MFD_TPS65217=m
CONFIG_MFD_TPS65218=y
CONFIG_MFD_TPS6586X=y
CONFIG_MFD_TPS80031=y
CONFIG_TWL4030_CORE=y
CONFIG_MFD_TWL4030_AUDIO=y
# CONFIG_TWL6040_CORE is not set
# CONFIG_MFD_WL1273_CORE is not set
CONFIG_MFD_LM3533=y
# CONFIG_MFD_TC3589X is not set
# CONFIG_MFD_TMIO is not set
CONFIG_MFD_VX855=m
CONFIG_MFD_ARIZONA=y
CONFIG_MFD_ARIZONA_I2C=m
CONFIG_MFD_ARIZONA_SPI=y
# CONFIG_MFD_WM5102 is not set
CONFIG_MFD_WM5110=y
CONFIG_MFD_WM8997=y
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM831X_SPI is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
# CONFIG_AGP_SIS is not set
CONFIG_AGP_VIA=y
CONFIG_INTEL_GTT=y
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
# CONFIG_VGA_SWITCHEROO is not set

#
# Direct Rendering Manager
#
CONFIG_DRM=m
CONFIG_DRM_MIPI_DSI=y
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_KMS_FB_HELPER=y
# CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
CONFIG_DRM_TTM=m

#
# I2C encoder or helper chips
#
# CONFIG_DRM_I2C_ADV7511 is not set
# CONFIG_DRM_I2C_CH7006 is not set
CONFIG_DRM_I2C_SIL164=m
# CONFIG_DRM_I2C_NXP_TDA998X is not set
# CONFIG_DRM_TDFX is not set
CONFIG_DRM_R128=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_RADEON_UMS=y
# CONFIG_DRM_NOUVEAU is not set
CONFIG_DRM_I810=m
CONFIG_DRM_I915=m
# CONFIG_DRM_I915_KMS is not set
CONFIG_DRM_I915_FBDEV=y
CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT=y
CONFIG_DRM_MGA=m
# CONFIG_DRM_SIS is not set
CONFIG_DRM_VIA=m
CONFIG_DRM_SAVAGE=m
CONFIG_DRM_VMWGFX=m
CONFIG_DRM_VMWGFX_FBCON=y
# CONFIG_DRM_GMA500 is not set
CONFIG_DRM_AST=m
CONFIG_DRM_MGAG200=m
# CONFIG_DRM_CIRRUS_QEMU is not set
# CONFIG_DRM_QXL is not set
CONFIG_DRM_BOCHS=m
CONFIG_DRM_PANEL=y

#
# Display Panels
#

#
# Frame buffer Devices
#
CONFIG_FB=m
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_CMDLINE=y
CONFIG_FB_DDC=m
# CONFIG_FB_BOOT_VESA_SUPPORT is not set
CONFIG_FB_CFB_FILLRECT=m
CONFIG_FB_CFB_COPYAREA=m
CONFIG_FB_CFB_IMAGEBLIT=m
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
CONFIG_FB_SYS_FILLRECT=m
CONFIG_FB_SYS_COPYAREA=m
CONFIG_FB_SYS_IMAGEBLIT=m
CONFIG_FB_FOREIGN_ENDIAN=y
CONFIG_FB_BOTH_ENDIAN=y
# CONFIG_FB_BIG_ENDIAN is not set
# CONFIG_FB_LITTLE_ENDIAN is not set
CONFIG_FB_SYS_FOPS=m
CONFIG_FB_DEFERRED_IO=y
CONFIG_FB_SVGALIB=m
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
CONFIG_FB_CIRRUS=m
CONFIG_FB_PM2=m
# CONFIG_FB_PM2_FIFO_DISCONNECT is not set
CONFIG_FB_CYBER2000=m
# CONFIG_FB_CYBER2000_DDC is not set
CONFIG_FB_ARC=m
CONFIG_FB_VGA16=m
# CONFIG_FB_UVESA is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
CONFIG_FB_NVIDIA=m
# CONFIG_FB_NVIDIA_I2C is not set
CONFIG_FB_NVIDIA_DEBUG=y
# CONFIG_FB_NVIDIA_BACKLIGHT is not set
# CONFIG_FB_RIVA is not set
CONFIG_FB_I740=m
CONFIG_FB_LE80578=m
CONFIG_FB_CARILLO_RANCH=m
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
CONFIG_FB_ATY128=m
CONFIG_FB_ATY128_BACKLIGHT=y
CONFIG_FB_ATY=m
CONFIG_FB_ATY_CT=y
# CONFIG_FB_ATY_GENERIC_LCD is not set
CONFIG_FB_ATY_GX=y
CONFIG_FB_ATY_BACKLIGHT=y
CONFIG_FB_S3=m
# CONFIG_FB_S3_DDC is not set
CONFIG_FB_SAVAGE=m
CONFIG_FB_SAVAGE_I2C=y
# CONFIG_FB_SAVAGE_ACCEL is not set
# CONFIG_FB_SIS is not set
CONFIG_FB_NEOMAGIC=m
CONFIG_FB_KYRO=m
CONFIG_FB_3DFX=m
CONFIG_FB_3DFX_ACCEL=y
CONFIG_FB_3DFX_I2C=y
CONFIG_FB_VOODOO1=m
CONFIG_FB_VT8623=m
CONFIG_FB_TRIDENT=m
CONFIG_FB_ARK=m
CONFIG_FB_PM3=m
CONFIG_FB_CARMINE=m
CONFIG_FB_CARMINE_DRAM_EVAL=y
# CONFIG_CARMINE_DRAM_CUSTOM is not set
# CONFIG_FB_GOLDFISH is not set
CONFIG_FB_VIRTUAL=m
CONFIG_FB_METRONOME=m
CONFIG_FB_MB862XX=m
CONFIG_FB_MB862XX_PCI_GDC=y
# CONFIG_FB_MB862XX_I2C is not set
CONFIG_FB_BROADSHEET=m
# CONFIG_FB_AUO_K190X is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
CONFIG_LCD_LTV350QV=m
CONFIG_LCD_ILI922X=m
CONFIG_LCD_ILI9320=m
# CONFIG_LCD_TDO24M is not set
CONFIG_LCD_VGG2432A4=m
CONFIG_LCD_PLATFORM=m
CONFIG_LCD_S6E63M0=m
# CONFIG_LCD_LD9040 is not set
CONFIG_LCD_AMS369FG06=m
CONFIG_LCD_LMS501KF03=m
CONFIG_LCD_HX8357=m
CONFIG_BACKLIGHT_CLASS_DEVICE=m
CONFIG_BACKLIGHT_GENERIC=m
# CONFIG_BACKLIGHT_LM3533 is not set
CONFIG_BACKLIGHT_CARILLO_RANCH=m
CONFIG_BACKLIGHT_PWM=m
# CONFIG_BACKLIGHT_DA903X is not set
# CONFIG_BACKLIGHT_APPLE is not set
CONFIG_BACKLIGHT_SAHARA=m
# CONFIG_BACKLIGHT_ADP8860 is not set
CONFIG_BACKLIGHT_ADP8870=m
CONFIG_BACKLIGHT_LM3630A=m
CONFIG_BACKLIGHT_LM3639=m
CONFIG_BACKLIGHT_LP855X=m
CONFIG_BACKLIGHT_LP8788=m
CONFIG_BACKLIGHT_PANDORA=m
CONFIG_BACKLIGHT_TPS65217=m
# CONFIG_BACKLIGHT_AS3711 is not set
# CONFIG_BACKLIGHT_LV5207LP is not set
CONFIG_BACKLIGHT_BD6107=m
CONFIG_VGASTATE=m
CONFIG_HDMI=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=128
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=m
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
CONFIG_LOGO=y
CONFIG_LOGO_LINUX_MONO=y
CONFIG_LOGO_LINUX_VGA16=y
# CONFIG_LOGO_LINUX_CLUT224 is not set
CONFIG_SOUND=y
CONFIG_SOUND_OSS_CORE=y
CONFIG_SOUND_OSS_CORE_PRECLAIM=y
# CONFIG_SND is not set
CONFIG_SOUND_PRIME=y
CONFIG_SOUND_OSS=y
# CONFIG_SOUND_TRACEINIT is not set
CONFIG_SOUND_DMAP=y
CONFIG_SOUND_VMIDI=m
# CONFIG_SOUND_TRIX is not set
# CONFIG_SOUND_MSS is not set
CONFIG_SOUND_MPU401=y
# CONFIG_SOUND_PAS is not set
CONFIG_SOUND_PSS=y
CONFIG_PSS_MIXER=y
CONFIG_SOUND_SB=y
CONFIG_SOUND_YM3812=m
CONFIG_SOUND_UART6850=y
CONFIG_SOUND_AEDSP16=m
CONFIG_SC6600=y
CONFIG_SC6600_JOY=y
CONFIG_SC6600_CDROM=4
CONFIG_SC6600_CDROMBASE=0
CONFIG_SOUND_KAHLUA=y

#
# HID support
#
# CONFIG_HID is not set

#
# I2C HID support
#
# CONFIG_I2C_HID is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
CONFIG_UWB=y
CONFIG_UWB_WHCI=m
CONFIG_MMC=y
# CONFIG_MMC_DEBUG is not set
CONFIG_MMC_CLKGATE=y

#
# MMC/SD/SDIO Card Drivers
#
# CONFIG_MMC_BLOCK is not set
CONFIG_SDIO_UART=y
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_SDHCI is not set
# CONFIG_MMC_WBSD is not set
CONFIG_MMC_TIFM_SD=y
CONFIG_MMC_GOLDFISH=m
CONFIG_MMC_SPI=m
# CONFIG_MMC_CB710 is not set
# CONFIG_MMC_VIA_SDMMC is not set
CONFIG_MMC_USDHI6ROL0=y
CONFIG_MMC_REALTEK_PCI=y
CONFIG_MMC_TOSHIBA_PCI=y
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
# CONFIG_LEDS_CLASS_FLASH is not set

#
# LED drivers
#
# CONFIG_LEDS_LM3530 is not set
CONFIG_LEDS_LM3533=m
# CONFIG_LEDS_LM3642 is not set
CONFIG_LEDS_PCA9532=y
CONFIG_LEDS_LP3944=y
CONFIG_LEDS_LP55XX_COMMON=m
CONFIG_LEDS_LP5521=m
# CONFIG_LEDS_LP5523 is not set
# CONFIG_LEDS_LP5562 is not set
# CONFIG_LEDS_LP8501 is not set
CONFIG_LEDS_LP8788=m
CONFIG_LEDS_LP8860=m
CONFIG_LEDS_CLEVO_MAIL=y
CONFIG_LEDS_PCA955X=m
CONFIG_LEDS_PCA963X=y
CONFIG_LEDS_DA903X=y
# CONFIG_LEDS_DAC124S085 is not set
CONFIG_LEDS_PWM=y
CONFIG_LEDS_BD2802=m
CONFIG_LEDS_INTEL_SS4200=m
CONFIG_LEDS_MC13783=m
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_MAX8997 is not set
# CONFIG_LEDS_LM355x is not set
CONFIG_LEDS_MENF21BMC=m

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
# CONFIG_LEDS_BLINKM is not set

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_ONESHOT=y
CONFIG_LEDS_TRIGGER_IDE_DISK=y
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_CPU is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=y
CONFIG_LEDS_TRIGGER_CAMERA=m
CONFIG_ACCESSIBILITY=y
CONFIG_A11Y_BRAILLE_CONSOLE=y
# CONFIG_INFINIBAND is not set
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
CONFIG_EDAC_DEBUG=y
CONFIG_EDAC_DECODE_MCE=y
# CONFIG_EDAC_MCE_INJ is not set
CONFIG_EDAC_MM_EDAC=y
CONFIG_EDAC_AMD64=m
# CONFIG_EDAC_AMD64_ERROR_INJECTION is not set
CONFIG_EDAC_E752X=m
# CONFIG_EDAC_I82975X is not set
# CONFIG_EDAC_I3000 is not set
CONFIG_EDAC_I3200=m
# CONFIG_EDAC_IE31200 is not set
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=y
# CONFIG_EDAC_I7CORE is not set
CONFIG_EDAC_I5000=m
# CONFIG_EDAC_I5100 is not set
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=y
CONFIG_RTC_LIB=y
# CONFIG_RTC_CLASS is not set
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_INTEL_MIC_X100_DMA=m
CONFIG_INTEL_MID_DMAC=y
CONFIG_INTEL_IOATDMA=y
CONFIG_DW_DMAC_CORE=y
CONFIG_DW_DMAC=m
CONFIG_DW_DMAC_PCI=y
CONFIG_DMA_ENGINE=y
CONFIG_DMA_ACPI=y

#
# DMA Clients
#
# CONFIG_ASYNC_TX_DMA is not set
# CONFIG_DMATEST is not set
CONFIG_DMA_ENGINE_RAID=y
CONFIG_DCA=y
# CONFIG_AUXDISPLAY is not set
CONFIG_UIO=y
# CONFIG_UIO_CIF is not set
CONFIG_UIO_PDRV_GENIRQ=m
# CONFIG_UIO_DMEM_GENIRQ is not set
CONFIG_UIO_AEC=y
CONFIG_UIO_SERCOS3=y
# CONFIG_UIO_PCI_GENERIC is not set
CONFIG_UIO_NETX=y
# CONFIG_UIO_MF624 is not set
CONFIG_VFIO_IOMMU_TYPE1=y
CONFIG_VFIO=y
CONFIG_VFIO_PCI=m
# CONFIG_VFIO_PCI_VGA is not set
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
CONFIG_VIRT_DRIVERS=y
CONFIG_VIRTIO=y

#
# Virtio drivers
#
# CONFIG_VIRTIO_PCI is not set
# CONFIG_VIRTIO_BALLOON is not set
CONFIG_VIRTIO_MMIO=y
# CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES is not set

#
# Microsoft Hyper-V guest support
#
# CONFIG_STAGING is not set
# CONFIG_X86_PLATFORM_DEVICES is not set
CONFIG_GOLDFISH_PIPE=y
CONFIG_CHROME_PLATFORMS=y
CONFIG_CHROMEOS_LAPTOP=m
CONFIG_CHROMEOS_PSTORE=y

#
# Hardware Spinlock drivers
#

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# CONFIG_ATMEL_PIT is not set
# CONFIG_SH_TIMER_CMT is not set
# CONFIG_SH_TIMER_MTU2 is not set
# CONFIG_SH_TIMER_TMU is not set
# CONFIG_EM_TIMER_STI is not set
CONFIG_MAILBOX=y
# CONFIG_PCC is not set
CONFIG_ALTERA_MBOX=m
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
CONFIG_IOMMU_IOVA=y
# CONFIG_AMD_IOMMU is not set
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
CONFIG_IRQ_REMAP=y

#
# Remoteproc drivers
#
# CONFIG_STE_MODEM_RPROC is not set

#
# Rpmsg drivers
#

#
# SOC (System On Chip) specific Drivers
#
# CONFIG_SOC_TI is not set
CONFIG_PM_DEVFREQ=y

#
# DEVFREQ Governors
#
# CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND is not set
CONFIG_DEVFREQ_GOV_PERFORMANCE=y
CONFIG_DEVFREQ_GOV_POWERSAVE=m
CONFIG_DEVFREQ_GOV_USERSPACE=m

#
# DEVFREQ Drivers
#
# CONFIG_PM_DEVFREQ_EVENT is not set
CONFIG_EXTCON=m

#
# Extcon Device Drivers
#
CONFIG_EXTCON_ADC_JACK=m
CONFIG_EXTCON_MAX14577=m
# CONFIG_EXTCON_MAX77693 is not set
CONFIG_EXTCON_MAX8997=m
CONFIG_EXTCON_PALMAS=m
CONFIG_EXTCON_RT8973A=m
CONFIG_EXTCON_SM5502=m
CONFIG_MEMORY=y
CONFIG_IIO=m
CONFIG_IIO_BUFFER=y
# CONFIG_IIO_BUFFER_CB is not set
CONFIG_IIO_KFIFO_BUF=m
CONFIG_IIO_TRIGGERED_BUFFER=m
CONFIG_IIO_TRIGGER=y
CONFIG_IIO_CONSUMERS_PER_TRIGGER=2

#
# Accelerometers
#
CONFIG_BMA180=m
# CONFIG_BMC150_ACCEL is not set
CONFIG_IIO_ST_ACCEL_3AXIS=m
CONFIG_IIO_ST_ACCEL_I2C_3AXIS=m
CONFIG_IIO_ST_ACCEL_SPI_3AXIS=m
# CONFIG_KXSD9 is not set
CONFIG_MMA8452=m
# CONFIG_KXCJK1013 is not set
# CONFIG_MMA9551 is not set
# CONFIG_MMA9553 is not set

#
# Analog to digital converters
#
CONFIG_AD_SIGMA_DELTA=m
CONFIG_AD7266=m
CONFIG_AD7291=m
# CONFIG_AD7298 is not set
CONFIG_AD7476=m
CONFIG_AD7791=m
# CONFIG_AD7793 is not set
CONFIG_AD7887=m
CONFIG_AD7923=m
# CONFIG_AD799X is not set
CONFIG_AXP288_ADC=m
# CONFIG_LP8788_ADC is not set
# CONFIG_MAX1027 is not set
CONFIG_MAX1363=m
CONFIG_MCP320X=m
CONFIG_MCP3422=m
# CONFIG_NAU7802 is not set
# CONFIG_QCOM_SPMI_IADC is not set
# CONFIG_QCOM_SPMI_VADC is not set
CONFIG_TI_ADC081C=m
CONFIG_TI_ADC128S052=m
# CONFIG_TI_AM335X_ADC is not set
CONFIG_TWL4030_MADC=m
CONFIG_TWL6030_GPADC=m

#
# Amplifiers
#
CONFIG_AD8366=m

#
# Hid Sensor IIO Common
#

#
# SSP Sensor Common
#
# CONFIG_IIO_SSP_SENSORHUB is not set
CONFIG_IIO_ST_SENSORS_I2C=m
CONFIG_IIO_ST_SENSORS_SPI=m
CONFIG_IIO_ST_SENSORS_CORE=m

#
# Digital to analog converters
#
CONFIG_AD5064=m
# CONFIG_AD5360 is not set
CONFIG_AD5380=m
CONFIG_AD5421=m
# CONFIG_AD5446 is not set
CONFIG_AD5449=m
CONFIG_AD5504=m
CONFIG_AD5624R_SPI=m
CONFIG_AD5686=m
CONFIG_AD5755=m
CONFIG_AD5764=m
CONFIG_AD5791=m
CONFIG_AD7303=m
CONFIG_MAX517=m
CONFIG_MCP4725=m
CONFIG_MCP4922=m

#
# Frequency Synthesizers DDS/PLL
#

#
# Clock Generator/Distribution
#
CONFIG_AD9523=m

#
# Phase-Locked Loop (PLL) frequency synthesizers
#
CONFIG_ADF4350=m

#
# Digital gyroscope sensors
#
CONFIG_ADIS16080=m
CONFIG_ADIS16130=m
CONFIG_ADIS16136=m
CONFIG_ADIS16260=m
# CONFIG_ADXRS450 is not set
CONFIG_BMG160=m
CONFIG_IIO_ST_GYRO_3AXIS=m
CONFIG_IIO_ST_GYRO_I2C_3AXIS=m
CONFIG_IIO_ST_GYRO_SPI_3AXIS=m
CONFIG_ITG3200=m

#
# Humidity sensors
#
CONFIG_SI7005=m
# CONFIG_SI7020 is not set

#
# Inertial measurement units
#
# CONFIG_ADIS16400 is not set
# CONFIG_ADIS16480 is not set
CONFIG_KMX61=m
CONFIG_INV_MPU6050_IIO=m
CONFIG_IIO_ADIS_LIB=m
CONFIG_IIO_ADIS_LIB_BUFFER=y

#
# Light sensors
#
# CONFIG_ADJD_S311 is not set
# CONFIG_AL3320A is not set
CONFIG_APDS9300=m
CONFIG_CM32181=m
CONFIG_CM3232=m
# CONFIG_CM36651 is not set
CONFIG_GP2AP020A00F=m
CONFIG_ISL29125=m
CONFIG_JSA1212=m
CONFIG_SENSORS_LM3533=m
# CONFIG_LTR501 is not set
CONFIG_TCS3414=m
# CONFIG_TCS3472 is not set
# CONFIG_SENSORS_TSL2563 is not set
CONFIG_TSL4531=m
# CONFIG_VCNL4000 is not set

#
# Magnetometer sensors
#
CONFIG_MAG3110=m
# CONFIG_IIO_ST_MAGN_3AXIS is not set

#
# Inclinometer sensors
#

#
# Triggers - standalone
#
CONFIG_IIO_INTERRUPT_TRIGGER=m
# CONFIG_IIO_SYSFS_TRIGGER is not set

#
# Pressure sensors
#
# CONFIG_BMP280 is not set
CONFIG_MPL115=m
# CONFIG_MPL3115 is not set
CONFIG_IIO_ST_PRESS=m
CONFIG_IIO_ST_PRESS_I2C=m
CONFIG_IIO_ST_PRESS_SPI=m
CONFIG_T5403=m

#
# Lightning sensors
#
CONFIG_AS3935=m

#
# Proximity sensors
#
CONFIG_SX9500=m

#
# Temperature sensors
#
# CONFIG_MLX90614 is not set
# CONFIG_TMP006 is not set
CONFIG_NTB=m
# CONFIG_VME_BUS is not set
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
CONFIG_PWM_LP3943=m
# CONFIG_PWM_LPSS is not set
CONFIG_PWM_TWL=m
CONFIG_PWM_TWL_LED=y
CONFIG_IPACK_BUS=y
CONFIG_BOARD_TPCI200=m
CONFIG_SERIAL_IPOCTAL=m
CONFIG_RESET_CONTROLLER=y
CONFIG_FMC=m
CONFIG_FMC_FAKEDEV=m
CONFIG_FMC_TRIVIAL=m
CONFIG_FMC_WRITE_EEPROM=m
# CONFIG_FMC_CHARDEV is not set

#
# PHY Subsystem
#
CONFIG_GENERIC_PHY=y
CONFIG_BCM_KONA_USB2_PHY=y
# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set
CONFIG_RAS=y
CONFIG_THUNDERBOLT=y

#
# Android
#
# CONFIG_ANDROID is not set

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
# CONFIG_DELL_RBU is not set
CONFIG_DCDBAS=m
CONFIG_DMIID=y
# CONFIG_DMI_SYSFS is not set
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
CONFIG_ISCSI_IBFT_FIND=y
CONFIG_ISCSI_IBFT=m
# CONFIG_GOOGLE_FIRMWARE is not set
CONFIG_UEFI_CPER=y

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
# CONFIG_EXT2_FS_SECURITY is not set
CONFIG_EXT3_FS=y
CONFIG_EXT3_DEFAULTS_TO_ORDERED=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_DEBUG is not set
CONFIG_JBD=y
CONFIG_JBD_DEBUG=y
CONFIG_JBD2=y
CONFIG_JBD2_DEBUG=y
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
CONFIG_JFS_FS=m
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
# CONFIG_JFS_DEBUG is not set
CONFIG_JFS_STATISTICS=y
CONFIG_XFS_FS=y
CONFIG_XFS_QUOTA=y
# CONFIG_XFS_POSIX_ACL is not set
# CONFIG_XFS_RT is not set
CONFIG_XFS_DEBUG=y
CONFIG_GFS2_FS=y
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
# CONFIG_OCFS2_FS_STATS is not set
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
CONFIG_OCFS2_DEBUG_FS=y
CONFIG_BTRFS_FS=y
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_ASSERT=y
CONFIG_NILFS2_FS=y
# CONFIG_FS_DAX is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_QUOTA=y
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
# CONFIG_PRINT_QUOTA_WARNING is not set
CONFIG_QUOTA_DEBUG=y
CONFIG_QUOTA_TREE=m
CONFIG_QFMT_V1=y
# CONFIG_QFMT_V2 is not set
CONFIG_QUOTACTL=y
CONFIG_QUOTACTL_COMPAT=y
CONFIG_AUTOFS4_FS=m
CONFIG_FUSE_FS=m
# CONFIG_CUSE is not set
CONFIG_OVERLAY_FS=y

#
# Caches
#
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
CONFIG_FSCACHE_OBJECT_LIST=y
CONFIG_CACHEFILES=m
CONFIG_CACHEFILES_DEBUG=y
# CONFIG_CACHEFILES_HISTOGRAM is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
CONFIG_NTFS_FS=m
# CONFIG_NTFS_DEBUG is not set
# CONFIG_NTFS_RW is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
# CONFIG_PROC_VMCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
CONFIG_TMPFS_XATTR=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m
CONFIG_MISC_FILESYSTEMS=y
CONFIG_ADFS_FS=y
CONFIG_ADFS_FS_RW=y
# CONFIG_AFFS_FS is not set
CONFIG_ECRYPT_FS=y
# CONFIG_ECRYPT_FS_MESSAGING is not set
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_HFSPLUS_FS_POSIX_ACL=y
# CONFIG_BEFS_FS is not set
CONFIG_BFS_FS=m
CONFIG_EFS_FS=y
# CONFIG_JFFS2_FS is not set
CONFIG_LOGFS=y
CONFIG_CRAMFS=m
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_FILE_CACHE=y
# CONFIG_SQUASHFS_FILE_DIRECT is not set
CONFIG_SQUASHFS_DECOMP_SINGLE=y
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
# CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU is not set
CONFIG_SQUASHFS_XATTR=y
# CONFIG_SQUASHFS_ZLIB is not set
# CONFIG_SQUASHFS_LZ4 is not set
# CONFIG_SQUASHFS_LZO is not set
# CONFIG_SQUASHFS_XZ is not set
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
CONFIG_SQUASHFS_EMBEDDED=y
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
CONFIG_VXFS_FS=y
CONFIG_MINIX_FS=y
CONFIG_OMFS_FS=y
# CONFIG_HPFS_FS is not set
CONFIG_QNX4FS_FS=y
CONFIG_QNX6FS_FS=y
CONFIG_QNX6FS_DEBUG=y
CONFIG_ROMFS_FS=y
CONFIG_ROMFS_BACKED_BY_BLOCK=y
# CONFIG_ROMFS_BACKED_BY_MTD is not set
# CONFIG_ROMFS_BACKED_BY_BOTH is not set
CONFIG_ROMFS_ON_BLOCK=y
CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_PMSG=y
CONFIG_PSTORE_FTRACE=y
# CONFIG_PSTORE_RAM is not set
CONFIG_SYSV_FS=m
# CONFIG_UFS_FS is not set
CONFIG_EXOFS_FS=m
CONFIG_EXOFS_DEBUG=y
# CONFIG_F2FS_FS is not set
CONFIG_ORE=m
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V2=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
CONFIG_NFS_V4=y
CONFIG_NFS_SWAP=y
# CONFIG_NFS_V4_1 is not set
# CONFIG_ROOT_NFS is not set
CONFIG_NFS_USE_LEGACY_DNS=y
CONFIG_NFS_DEBUG=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
CONFIG_NFSD_V4=y
# CONFIG_NFSD_PNFS is not set
# CONFIG_NFSD_FAULT_INJECTION is not set
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=y
CONFIG_SUNRPC_SWAP=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
CONFIG_SUNRPC_DEBUG=y
# CONFIG_CEPH_FS is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
CONFIG_CIFS_WEAK_PW_HASH=y
CONFIG_CIFS_UPCALL=y
# CONFIG_CIFS_XATTR is not set
# CONFIG_CIFS_DEBUG is not set
# CONFIG_CIFS_DFS_UPCALL is not set
CONFIG_CIFS_SMB2=y
# CONFIG_CIFS_FSCACHE is not set
CONFIG_NCP_FS=y
CONFIG_NCPFS_PACKET_SIGNING=y
# CONFIG_NCPFS_IOCTL_LOCKING is not set
# CONFIG_NCPFS_STRONG is not set
CONFIG_NCPFS_NFS_NS=y
CONFIG_NCPFS_OS2_NS=y
CONFIG_NCPFS_SMALLDOS=y
# CONFIG_NCPFS_NLS is not set
# CONFIG_NCPFS_EXTRAS is not set
# CONFIG_CODA_FS is not set
CONFIG_AFS_FS=m
CONFIG_AFS_DEBUG=y
# CONFIG_AFS_FSCACHE is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
# CONFIG_NLS_CODEPAGE_737 is not set
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=y
CONFIG_NLS_CODEPAGE_857=y
# CONFIG_NLS_CODEPAGE_860 is not set
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=y
# CONFIG_NLS_CODEPAGE_864 is not set
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=y
CONFIG_NLS_CODEPAGE_950=y
CONFIG_NLS_CODEPAGE_932=y
# CONFIG_NLS_CODEPAGE_949 is not set
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=y
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=m
# CONFIG_NLS_ISO8859_1 is not set
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=y
CONFIG_NLS_ISO8859_13=y
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=y
# CONFIG_NLS_KOI8_R is not set
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_MAC_ROMAN=m
CONFIG_NLS_MAC_CELTIC=m
CONFIG_NLS_MAC_CENTEURO=y
CONFIG_NLS_MAC_CROATIAN=y
CONFIG_NLS_MAC_CYRILLIC=y
# CONFIG_NLS_MAC_GAELIC is not set
CONFIG_NLS_MAC_GREEK=m
CONFIG_NLS_MAC_ICELAND=y
CONFIG_NLS_MAC_INUIT=y
# CONFIG_NLS_MAC_ROMANIAN is not set
CONFIG_NLS_MAC_TURKISH=m
CONFIG_NLS_UTF8=y
CONFIG_DLM=m
# CONFIG_DLM_DEBUG is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_BOOT_PRINTK_DELAY is not set
CONFIG_DYNAMIC_DEBUG=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_GDB_SCRIPTS=y
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=1024
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_READABLE_ASM is not set
CONFIG_UNUSED_SYMBOLS=y
CONFIG_PAGE_OWNER=y
CONFIG_DEBUG_FS=y
CONFIG_HEADERS_CHECK=y
# CONFIG_DEBUG_SECTION_MISMATCH is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_DEBUG_KERNEL=y

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_DEBUG_OBJECTS=y
# CONFIG_DEBUG_OBJECTS_SELFTEST is not set
CONFIG_DEBUG_OBJECTS_FREE=y
# CONFIG_DEBUG_OBJECTS_TIMERS is not set
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
# CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER is not set
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VIRTUAL is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACKOVERFLOW is not set
CONFIG_HAVE_ARCH_KMEMCHECK=y
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_KASAN=y
CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
CONFIG_TEST_KASAN=m
CONFIG_DEBUG_SHIRQ=y

#
# Debug Lockups and Hangs
#
CONFIG_LOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=0
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_SCHED_STACK_END_CHECK=y
# CONFIG_TIMER_STATS is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_LOCKDEP is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_STACKTRACE=y
CONFIG_DEBUG_KOBJECT=y
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_LIST is not set
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_SG=y
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
# CONFIG_PROVE_RCU_REPEATEDLY is not set
CONFIG_SPARSE_RCU_POINTER=y
CONFIG_TORTURE_TEST=m
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_TRACE is not set
CONFIG_DEBUG_BLOCK_EXT_DEVT=y
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
# CONFIG_FAULT_INJECTION is not set
CONFIG_LATENCYTOP=y
CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS=y
# CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP=y
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
CONFIG_STACK_TRACER=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENT=y
CONFIG_UPROBE_EVENT=y
CONFIG_PROBE_EVENTS=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
# CONFIG_FUNCTION_PROFILER is not set
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
CONFIG_MMIOTRACE=y
CONFIG_MMIOTRACE_TEST=m
# CONFIG_TRACEPOINT_BENCHMARK is not set
CONFIG_RING_BUFFER_BENCHMARK=m
CONFIG_RING_BUFFER_STARTUP_TEST=y

#
# Runtime Testing
#
CONFIG_LKDTM=y
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_RBTREE_TEST is not set
CONFIG_INTERVAL_TREE_TEST=m
# CONFIG_PERCPU_TEST is not set
# CONFIG_ATOMIC64_SELFTEST is not set
# CONFIG_ASYNC_RAID6_TEST is not set
CONFIG_TEST_HEXDUMP=m
CONFIG_TEST_STRING_HELPERS=m
# CONFIG_TEST_KSTRTOX is not set
CONFIG_TEST_RHASHTABLE=y
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
CONFIG_BUILD_DOCSRC=y
CONFIG_DMA_API_DEBUG=y
CONFIG_TEST_LKM=m
CONFIG_TEST_USER_COPY=m
CONFIG_TEST_BPF=m
CONFIG_TEST_FIRMWARE=y
CONFIG_TEST_UDELAY=m
CONFIG_MEMTEST=y
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_STRICT_DEVMEM=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
CONFIG_X86_PTDUMP=y
CONFIG_DEBUG_RODATA=y
CONFIG_DEBUG_RODATA_TEST=y
CONFIG_DEBUG_SET_MODULE_RONX=y
CONFIG_DEBUG_NX_TEST=m
CONFIG_DOUBLEFAULT=y
# CONFIG_DEBUG_TLBFLUSH is not set
# CONFIG_IOMMU_DEBUG is not set
# CONFIG_IOMMU_STRESS is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_X86_DECODER_SELFTEST=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_CPA_DEBUG is not set
CONFIG_OPTIMIZE_INLINING=y
CONFIG_DEBUG_NMI_SELFTEST=y
# CONFIG_X86_DEBUG_STATIC_CPU_HAS is not set

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_PERSISTENT_KEYRINGS is not set
# CONFIG_BIG_KEYS is not set
# CONFIG_ENCRYPTED_KEYS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
# CONFIG_SECURITY is not set
CONFIG_SECURITYFS=y
CONFIG_INTEL_TXT=y
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=m
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=y
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_AUTHENC=y
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_ABLK_HELPER=y
CONFIG_CRYPTO_GLUE_HELPER_X86=y

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_SEQIV=y

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=y
CONFIG_CRYPTO_PCBC=y
CONFIG_CRYPTO_XTS=y

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=y
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=y
# CONFIG_CRYPTO_CRC32 is not set
CONFIG_CRYPTO_CRC32_PCLMUL=y
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
# CONFIG_CRYPTO_RMD160 is not set
CONFIG_CRYPTO_RMD256=y
CONFIG_CRYPTO_RMD320=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA1_SSSE3=m
CONFIG_CRYPTO_SHA256_SSSE3=y
CONFIG_CRYPTO_SHA512_SSSE3=m
# CONFIG_CRYPTO_SHA1_MB is not set
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_TGR192=y
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_X86_64=m
# CONFIG_CRYPTO_AES_NI_INTEL is not set
# CONFIG_CRYPTO_ANUBIS is not set
CONFIG_CRYPTO_ARC4=y
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=y
CONFIG_CRYPTO_BLOWFISH_X86_64=y
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAMELLIA_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
# CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64 is not set
CONFIG_CRYPTO_CAST_COMMON=y
CONFIG_CRYPTO_CAST5=y
CONFIG_CRYPTO_CAST5_AVX_X86_64=m
CONFIG_CRYPTO_CAST6=y
CONFIG_CRYPTO_CAST6_AVX_X86_64=m
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_DES3_EDE_X86_64=m
CONFIG_CRYPTO_FCRYPT=y
# CONFIG_CRYPTO_KHAZAD is not set
CONFIG_CRYPTO_SALSA20=m
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
CONFIG_CRYPTO_SEED=y
CONFIG_CRYPTO_SERPENT=y
CONFIG_CRYPTO_SERPENT_SSE2_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX_X86_64=y
CONFIG_CRYPTO_SERPENT_AVX2_X86_64=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_TWOFISH_COMMON=y
CONFIG_CRYPTO_TWOFISH_X86_64=y
CONFIG_CRYPTO_TWOFISH_X86_64_3WAY=y
CONFIG_CRYPTO_TWOFISH_AVX_X86_64=y

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_ZLIB=m
CONFIG_CRYPTO_LZO=y
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
# CONFIG_CRYPTO_DRBG_MENU is not set
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_USER_API_SKCIPHER=m
CONFIG_CRYPTO_USER_API_RNG=y
CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_PADLOCK=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
CONFIG_CRYPTO_DEV_PADLOCK_SHA=m
# CONFIG_CRYPTO_DEV_CCP is not set
CONFIG_CRYPTO_DEV_QAT=y
CONFIG_CRYPTO_DEV_QAT_DH895xCC=y
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_PUBLIC_KEY_ALGO_RSA=y
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS7_MESSAGE_PARSER is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
CONFIG_KVM_COMPAT=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_MMU_AUDIT=y
# CONFIG_KVM_DEVICE_ASSIGNMENT is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
CONFIG_BITREVERSE=y
# CONFIG_HAVE_ARCH_BITREVERSE is not set
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_IO=y
CONFIG_PERCPU_RWSEM=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
CONFIG_CRC32_SELFTEST=y
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=y
# CONFIG_CRC8 is not set
# CONFIG_AUDIT_ARCH_COMPAT_GENERIC is not set
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=m
CONFIG_LZ4HC_COMPRESS=m
CONFIG_LZ4_DECOMPRESS=m
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
CONFIG_XZ_DEC_TEST=m
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_BCH=y
CONFIG_BCH_CONST_PARAMS=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_BTREE=y
CONFIG_INTERVAL_TREE=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
CONFIG_LRU_CACHE=m
CONFIG_AVERAGE=y
CONFIG_CLZ_TAB=y
CONFIG_CORDIC=y
# CONFIG_DDR is not set
CONFIG_MPILIB=y
CONFIG_OID_REGISTRY=y
CONFIG_FONT_SUPPORT=m
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_FONT_6x11=y
# CONFIG_FONT_7x14 is not set
# CONFIG_FONT_PEARL_8x8 is not set
# CONFIG_FONT_ACORN_8x8 is not set
# CONFIG_FONT_MINI_4x6 is not set
CONFIG_FONT_6x10=y
# CONFIG_FONT_SUN8x16 is not set
# CONFIG_FONT_SUN12x22 is not set
# CONFIG_FONT_10x18 is not set
CONFIG_ARCH_HAS_SG_CHAIN=y


* Re: [PATCH 06/13] mm: meminit: Inline some helper functions
@ 2015-05-04  8:33     ` Michal Hocko
From: Michal Hocko @ 2015-05-04  8:33 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Waiman Long,
	Scott Norton, Daniel J Blueman, Linux-MM, LKML

[-- Attachment #1: Type: text/plain, Size: 1173 bytes --]

On Tue 28-04-15 15:37:03, Mel Gorman wrote:
> early_pfn_in_nid() and meminit_pfn_in_nid() are small functions that are
> unnecessarily visible outside memory initialisation. As well as unnecessary
> visibility, it's unnecessary function call overhead when initialising pages.
> This patch moves the helpers inline.

This is causing:
  CC      mm/page_alloc.o
mm/page_alloc.c: In function ‘deferred_init_memmap’:
mm/page_alloc.c:1135:4: error: implicit declaration of function ‘meminit_pfn_in_nid’ [-Werror=implicit-function-declaration]
    if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state)) {
    ^

with a randconfig test, where CONFIG_NODES_SPAN_OTHER_NODES is not defined.
The full config is attached.

I guess we need something like this:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e0257debce0..a48128d882d8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1044,6 +1044,11 @@ static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 {
 	return true;
 }
+static inline bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
+					struct mminit_pfnnid_cache *state)
+{
+	return true;
+}
 #endif
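
For readers following the thread, the pattern behind the fix above can be
sketched outside the kernel. This is only an illustration, not the code in
mm/page_alloc.c: the names pfn_in_nid, struct pfnnid_cache,
count_pfns_in_nid and the NODES_SPAN_OTHER_NODES macro are hypothetical
stand-ins, and the "real" lookup is a toy. The point is that a helper
defined only under one config option needs a stub with the same signature
in the other branch, otherwise the caller hits an implicit declaration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct mminit_pfnnid_cache. */
struct pfnnid_cache {
	unsigned long last_start;
	unsigned long last_end;
	int last_nid;
};

#ifdef NODES_SPAN_OTHER_NODES	/* stand-in for CONFIG_NODES_SPAN_OTHER_NODES */
/* Toy lookup (assumption): each node owns a contiguous block of 1024 pfns. */
static inline bool pfn_in_nid(unsigned long pfn, int node,
			      struct pfnnid_cache *state)
{
	(void)state;
	return (int)(pfn / 1024) == node;
}
#else
/* Stub with the same signature: every pfn is treated as local to the node. */
static inline bool pfn_in_nid(unsigned long pfn, int node,
			      struct pfnnid_cache *state)
{
	(void)pfn;
	(void)node;
	(void)state;
	return true;
}
#endif

/* The caller compiles unchanged whichever branch above was selected. */
static unsigned long count_pfns_in_nid(unsigned long start, unsigned long end,
				       int nid)
{
	struct pfnnid_cache state = { 0, 0, -1 };
	unsigned long pfn, n = 0;

	assert(start <= end);
	for (pfn = start; pfn < end; pfn++)
		if (pfn_in_nid(pfn, nid, &state))
			n++;
	return n;
}
```

Built without -DNODES_SPAN_OTHER_NODES, the stub is selected and every pfn
in the range is counted; removing the #else branch reproduces the same
class of build error as reported above.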
 
-- 
Michal Hocko
SUSE Labs

[-- Attachment #2: config-failed --]
[-- Type: text/plain, Size: 100881 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.0.0 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SWAP is not set
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
CONFIG_SRCU=y
CONFIG_TASKS_RCU=y
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_KTHREAD_PRIO=0
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_SUPPORTS_INT128=y
# CONFIG_CGROUPS is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_USER_NS=y
# CONFIG_PID_NS is not set
CONFIG_NET_NS=y
# CONFIG_SCHED_AUTOGROUP is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
# CONFIG_RD_LZ4 is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_UID16=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
# CONFIG_BPF_SYSCALL is not set
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_PCI_QUIRKS=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_OPROFILE=y
CONFIG_OPROFILE_EVENT_MULTIPLEX=y
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
# CONFIG_HAVE_64BIT_ALIGNED_ACCESS is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_HAVE_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR is not set
CONFIG_CC_STACKPROTECTOR_NONE=y
# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_HAVE_CONTEXT_TRACKING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_COMPAT_OLD_SIGACTION=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
# CONFIG_MODULE_SIG_ALL is not set
CONFIG_MODULE_SIG_SHA1=y
# CONFIG_MODULE_SIG_SHA224 is not set
# CONFIG_MODULE_SIG_SHA256 is not set
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha1"
# CONFIG_MODULE_COMPRESS is not set
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
# CONFIG_BLK_CMDLINE_PARSER is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
CONFIG_AIX_PARTITION=y
CONFIG_OSF_PARTITION=y
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
CONFIG_LDM_PARTITION=y
# CONFIG_LDM_DEBUG is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
# CONFIG_EFI_PARTITION is not set
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_ASN1=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_ARCH_USE_QUEUE_RWLOCK=y
CONFIG_FREEZER=y

#
# Processor type and features
#
CONFIG_ZONE_DMA=y
# CONFIG_SMP is not set
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_X2APIC=y
CONFIG_X86_MPPARSE=y
CONFIG_GOLDFISH=y
CONFIG_X86_EXTENDED_PLATFORM=y
CONFIG_X86_GOLDFISH=y
# CONFIG_X86_INTEL_LPSS is not set
# CONFIG_X86_AMD_PLATFORM_DEVICE is not set
CONFIG_IOSF_MBI=y
# CONFIG_IOSF_MBI_DEBUG is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_HYPERVISOR_GUEST is not set
CONFIG_NO_BOOTMEM=y
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_HPET_TIMER=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
# CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
CONFIG_NR_CPUS=1
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_UP_LATE_INIT=y
CONFIG_X86_UP_APIC_MSI=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_MCE_INJECT=y
CONFIG_X86_THERMAL_VECTOR=y
CONFIG_X86_16BIT=y
CONFIG_X86_ESPFIX64=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_I8K=m
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
# CONFIG_MICROCODE_INTEL_EARLY is not set
# CONFIG_MICROCODE_AMD_EARLY is not set
# CONFIG_MICROCODE_EARLY is not set
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
# CONFIG_ARCH_MEMORY_PROBE is not set
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_ARCH_DISCARD_MEMBLOCK=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_HAVE_BOOTMEM_INFO_NODE=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_NEED_BOUNCE_POOL=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
# CONFIG_HWPOISON_INJECT is not set
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_NEED_PER_CPU_KM=y
CONFIG_CLEANCACHE=y
CONFIG_CMA=y
CONFIG_CMA_DEBUG=y
# CONFIG_CMA_DEBUGFS is not set
CONFIG_CMA_AREAS=7
CONFIG_ZPOOL=y
# CONFIG_ZBUD is not set
# CONFIG_ZSMALLOC is not set
CONFIG_GENERIC_EARLY_IOREMAP=y
CONFIG_ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
# CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK is not set
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
# CONFIG_MTRR_SANITIZER is not set
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
# CONFIG_X86_INTEL_MPX is not set
# CONFIG_EFI is not set
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
# CONFIG_KEXEC_VERIFY_SIG is not set
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_RANDOMIZE_BASE_MAX_OFFSET=0x40000000
CONFIG_X86_NEED_RELOCS=y
CONFIG_PHYSICAL_ALIGN=0x1000000
CONFIG_COMPAT_VDSO=y
# CONFIG_CMDLINE_BOOL is not set
CONFIG_HAVE_LIVEPATCH=y
CONFIG_LIVEPATCH=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y

#
# Power management and ACPI options
#
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_PM_SLEEP=y
CONFIG_PM_AUTOSLEEP=y
CONFIG_PM_WAKELOCKS=y
CONFIG_PM_WAKELOCKS_LIMIT=100
CONFIG_PM_WAKELOCKS_GC=y
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_ADVANCED_DEBUG is not set
CONFIG_PM_SLEEP_DEBUG=y
CONFIG_DPM_WATCHDOG=y
CONFIG_DPM_WATCHDOG_TIMEOUT=12
CONFIG_PM_TRACE=y
CONFIG_PM_TRACE_RTC=y
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_EC_DEBUGFS=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_PROCESSOR_AGGREGATOR=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_INITRD_TABLE_OVERRIDE=y
CONFIG_ACPI_DEBUG=y
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
# CONFIG_ACPI_SBS is not set
CONFIG_ACPI_HED=m
CONFIG_ACPI_CUSTOM_METHOD=m
# CONFIG_ACPI_REDUCED_HARDWARE_ONLY is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
CONFIG_ACPI_APEI=y
# CONFIG_ACPI_APEI_GHES is not set
CONFIG_ACPI_APEI_MEMORY_FAILURE=y
CONFIG_ACPI_APEI_EINJ=m
# CONFIG_ACPI_APEI_ERST_DEBUG is not set
CONFIG_ACPI_EXTLOG=y
# CONFIG_PMIC_OPREGION is not set
CONFIG_SFI=y

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
CONFIG_INTEL_IDLE=y

#
# Memory power savings
#
CONFIG_I7300_IDLE_IOAT_CHANNEL=y
CONFIG_I7300_IDLE=m

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCIEPORTBUS is not set
CONFIG_PCI_MSI=y
# CONFIG_PCI_DEBUG is not set
CONFIG_PCI_REALLOC_ENABLE_AUTO=y
CONFIG_PCI_STUB=m
CONFIG_HT_IRQ=y
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
# CONFIG_PCI_PRI is not set
CONFIG_PCI_PASID=y
CONFIG_PCI_LABEL=y

#
# PCI host controller drivers
#
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
# CONFIG_PCCARD is not set
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=y
CONFIG_RAPIDIO=m
CONFIG_RAPIDIO_DISC_TIMEOUT=30
CONFIG_RAPIDIO_ENABLE_RX_TX_PORTS=y
# CONFIG_RAPIDIO_DMA_ENGINE is not set
CONFIG_RAPIDIO_DEBUG=y
CONFIG_RAPIDIO_ENUM_BASIC=m

#
# RapidIO Switch drivers
#
CONFIG_RAPIDIO_TSI57X=m
CONFIG_RAPIDIO_CPS_XX=m
CONFIG_RAPIDIO_TSI568=m
# CONFIG_RAPIDIO_CPS_GEN2 is not set
# CONFIG_X86_SYSFB is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
# CONFIG_BINFMT_SCRIPT is not set
# CONFIG_HAVE_AOUT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_COREDUMP=y
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_X86_X32=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_KEYS_COMPAT=y
CONFIG_X86_DEV_DMA_OPS=y
CONFIG_PMC_ATOM=y
CONFIG_NET=y
CONFIG_COMPAT_NETLINK_MESSAGES=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=y
CONFIG_UNIX=y
CONFIG_UNIX_DIAG=y
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=m
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_FIB_TRIE_STATS=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=y
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=y
CONFIG_NET_UDP_TUNNEL=y
CONFIG_NET_FOU=y
CONFIG_NET_FOU_IP_TUNNELS=y
CONFIG_GENEVE=m
# CONFIG_INET_AH is not set
CONFIG_INET_ESP=y
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
CONFIG_INET_TUNNEL=y
CONFIG_INET_XFRM_MODE_TRANSPORT=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_INET_UDP_DIAG=y
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=y
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=y
CONFIG_TCP_CONG_VEGAS=y
CONFIG_TCP_CONG_SCALABLE=y
CONFIG_TCP_CONG_LP=y
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=y
# CONFIG_TCP_CONG_ILLINOIS is not set
# CONFIG_TCP_CONG_DCTCP is not set
# CONFIG_DEFAULT_BIC is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_HYBLA is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
# CONFIG_IPV6_ROUTE_INFO is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=y
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=y
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=y
CONFIG_INET6_XFRM_MODE_TRANSPORT=y
CONFIG_INET6_XFRM_MODE_TUNNEL=y
CONFIG_INET6_XFRM_MODE_BEET=y
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=y
# CONFIG_IPV6_VTI is not set
CONFIG_IPV6_SIT=y
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=y
CONFIG_IPV6_GRE=y
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_IPV6_MROUTE is not set
# CONFIG_NETWORK_SECMARK is not set
CONFIG_NET_PTP_CLASSIFY=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
CONFIG_NETFILTER_DEBUG=y
CONFIG_NETFILTER_ADVANCED=y
# CONFIG_BRIDGE_NETFILTER is not set

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_NETLINK_ACCT=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_COMMON=y
CONFIG_NF_CONNTRACK_MARK=y
# CONFIG_NF_CONNTRACK_ZONES is not set
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
# CONFIG_NF_CONNTRACK_TIMESTAMP is not set
CONFIG_NF_CONNTRACK_LABELS=y
# CONFIG_NF_CT_PROTO_DCCP is not set
CONFIG_NF_CT_PROTO_GRE=m
CONFIG_NF_CT_PROTO_SCTP=m
CONFIG_NF_CT_PROTO_UDPLITE=m
# CONFIG_NF_CONNTRACK_AMANDA is not set
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
# CONFIG_NF_CONNTRACK_SIP is not set
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
# CONFIG_NF_CT_NETLINK_TIMEOUT is not set
# CONFIG_NF_CT_NETLINK_HELPER is not set
CONFIG_NETFILTER_NETLINK_QUEUE_CT=y
CONFIG_NF_NAT=m
CONFIG_NF_NAT_NEEDED=y
CONFIG_NF_NAT_PROTO_UDPLITE=m
CONFIG_NF_NAT_PROTO_SCTP=m
# CONFIG_NF_NAT_AMANDA is not set
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
# CONFIG_NF_NAT_SIP is not set
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_REDIRECT=m
CONFIG_NETFILTER_SYNPROXY=m
CONFIG_NF_TABLES=y
CONFIG_NF_TABLES_INET=y
# CONFIG_NFT_EXTHDR is not set
CONFIG_NFT_META=m
# CONFIG_NFT_CT is not set
CONFIG_NFT_RBTREE=y
CONFIG_NFT_HASH=y
CONFIG_NFT_COUNTER=y
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_NAT=m
# CONFIG_NFT_QUEUE is not set
# CONFIG_NFT_REJECT is not set
# CONFIG_NFT_REJECT_INET is not set
CONFIG_NFT_COMPAT=m
CONFIG_NETFILTER_XTABLES=m

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
# CONFIG_NETFILTER_XT_TARGET_HMARK is not set
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
# CONFIG_NETFILTER_XT_TARGET_LED is not set
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_NAT=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
# CONFIG_NETFILTER_XT_TARGET_NOTRACK is not set
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
# CONFIG_NETFILTER_XT_TARGET_TPROXY is not set
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
# CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP is not set

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
# CONFIG_NETFILTER_XT_MATCH_BPF is not set
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
# CONFIG_NETFILTER_XT_MATCH_CONNTRACK is not set
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
# CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
CONFIG_NETFILTER_XT_MATCH_IPCOMP=m
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
# CONFIG_NETFILTER_XT_MATCH_L2TP is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
# CONFIG_NETFILTER_XT_MATCH_LIMIT is not set
CONFIG_NETFILTER_XT_MATCH_MAC=m
# CONFIG_NETFILTER_XT_MATCH_MARK is not set
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_NFACCT=m
CONFIG_NETFILTER_XT_MATCH_OSF=m
# CONFIG_NETFILTER_XT_MATCH_OWNER is not set
# CONFIG_NETFILTER_XT_MATCH_POLICY is not set
# CONFIG_NETFILTER_XT_MATCH_PKTTYPE is not set
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
# CONFIG_NETFILTER_XT_MATCH_REALM is not set
# CONFIG_NETFILTER_XT_MATCH_RECENT is not set
CONFIG_NETFILTER_XT_MATCH_SCTP=m
# CONFIG_NETFILTER_XT_MATCH_SOCKET is not set
# CONFIG_NETFILTER_XT_MATCH_STATE is not set
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
# CONFIG_NETFILTER_XT_MATCH_TCPMSS is not set
CONFIG_NETFILTER_XT_MATCH_TIME=m
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
# CONFIG_IP_SET_BITMAP_IP is not set
CONFIG_IP_SET_BITMAP_IPMAC=m
# CONFIG_IP_SET_BITMAP_PORT is not set
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET_HASH_IPMARK=m
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
CONFIG_IP_SET_HASH_MAC=m
CONFIG_IP_SET_HASH_NETPORTNET=m
# CONFIG_IP_SET_HASH_NET is not set
CONFIG_IP_SET_HASH_NETNET=m
# CONFIG_IP_SET_HASH_NETPORT is not set
# CONFIG_IP_SET_HASH_NETIFACE is not set
# CONFIG_IP_SET_LIST_SET is not set
# CONFIG_IP_VS is not set

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_CONNTRACK_IPV4=m
CONFIG_NF_CONNTRACK_PROC_COMPAT=y
CONFIG_NF_LOG_ARP=m
CONFIG_NF_LOG_IPV4=y
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NF_REJECT_IPV4=m
# CONFIG_NFT_REJECT_IPV4 is not set
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_NAT_IPV4=m
CONFIG_NFT_CHAIN_NAT_IPV4=m
CONFIG_NF_NAT_MASQUERADE_IPV4=m
CONFIG_NFT_MASQ_IPV4=m
CONFIG_NFT_REDIR_IPV4=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PROTO_GRE=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_SYNPROXY=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
# CONFIG_IP_NF_TARGET_NETMAP is not set
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_CLUSTERIP=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m

#
# IPv6: Netfilter Configuration
#
# CONFIG_NF_DEFRAG_IPV6 is not set
# CONFIG_NF_CONNTRACK_IPV6 is not set
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_CHAIN_ROUTE_IPV6=y
CONFIG_NF_REJECT_IPV6=y
# CONFIG_NFT_REJECT_IPV6 is not set
CONFIG_NF_LOG_IPV6=m
# CONFIG_IP6_NF_IPTABLES is not set

#
# DECnet: Netfilter Configuration
#
CONFIG_DECNET_NF_GRABULATOR=m
# CONFIG_NF_TABLES_BRIDGE is not set
# CONFIG_BRIDGE_NF_EBTABLES is not set
CONFIG_IP_DCCP=m
CONFIG_INET_DCCP_DIAG=m

#
# DCCP CCIDs Configuration
#
# CONFIG_IP_DCCP_CCID2_DEBUG is not set
CONFIG_IP_DCCP_CCID3=y
# CONFIG_IP_DCCP_CCID3_DEBUG is not set
CONFIG_IP_DCCP_TFRC_LIB=y

#
# DCCP Kernel Hacking
#
CONFIG_IP_DCCP_DEBUG=y
CONFIG_NET_DCCPPROBE=m
CONFIG_IP_SCTP=m
# CONFIG_NET_SCTPPROBE is not set
# CONFIG_SCTP_DBG_OBJCNT is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1 is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_COOKIE_HMAC_SHA1 is not set
CONFIG_RDS=y
# CONFIG_RDS_TCP is not set
CONFIG_RDS_DEBUG=y
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_L2TP=m
# CONFIG_L2TP_DEBUGFS is not set
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=y
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_BRIDGE_VLAN_FILTERING is not set
CONFIG_HAVE_NET_DSA=y
CONFIG_NET_DSA=y
CONFIG_NET_DSA_TAG_DSA=y
CONFIG_NET_DSA_TAG_EDSA=y
CONFIG_NET_DSA_TAG_TRAILER=y
CONFIG_VLAN_8021Q=y
# CONFIG_VLAN_8021Q_GVRP is not set
# CONFIG_VLAN_8021Q_MVRP is not set
CONFIG_DECNET=m
CONFIG_DECNET_ROUTER=y
CONFIG_LLC=y
CONFIG_LLC2=y
# CONFIG_IPX is not set
CONFIG_ATALK=m
CONFIG_DEV_APPLETALK=m
# CONFIG_IPDDP is not set
CONFIG_X25=y
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
CONFIG_6LOWPAN=y
CONFIG_IEEE802154=m
# CONFIG_IEEE802154_SOCKET is not set
CONFIG_IEEE802154_6LOWPAN=m
# CONFIG_MAC802154 is not set
# CONFIG_NET_SCHED is not set
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
CONFIG_BATMAN_ADV_BLA=y
# CONFIG_BATMAN_ADV_DAT is not set
# CONFIG_BATMAN_ADV_NC is not set
CONFIG_BATMAN_ADV_MCAST=y
# CONFIG_BATMAN_ADV_DEBUG is not set
# CONFIG_OPENVSWITCH is not set
CONFIG_VSOCKETS=y
CONFIG_VMWARE_VMCI_VSOCKETS=m
# CONFIG_NETLINK_MMAP is not set
CONFIG_NETLINK_DIAG=m
CONFIG_NET_MPLS_GSO=y
# CONFIG_HSR is not set
# CONFIG_NET_SWITCHDEV is not set
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
# CONFIG_BPF_JIT is not set

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_NET_TCPPROBE=y
CONFIG_NET_DROP_MONITOR=y
CONFIG_HAMRADIO=y

#
# Packet Radio protocols
#
# CONFIG_AX25 is not set
CONFIG_CAN=y
# CONFIG_CAN_RAW is not set
CONFIG_CAN_BCM=m
# CONFIG_CAN_GW is not set

#
# CAN Device Drivers
#
CONFIG_CAN_VCAN=m
CONFIG_CAN_SLCAN=m
CONFIG_CAN_DEV=y
# CONFIG_CAN_CALC_BITTIMING is not set
CONFIG_CAN_LEDS=y
CONFIG_CAN_JANZ_ICAN3=m
CONFIG_CAN_SJA1000=y
CONFIG_CAN_SJA1000_ISA=m
CONFIG_CAN_SJA1000_PLATFORM=y
# CONFIG_CAN_EMS_PCI is not set
# CONFIG_CAN_PEAK_PCI is not set
CONFIG_CAN_KVASER_PCI=y
CONFIG_CAN_PLX_PCI=m
CONFIG_CAN_C_CAN=y
# CONFIG_CAN_C_CAN_PLATFORM is not set
# CONFIG_CAN_C_CAN_PCI is not set
CONFIG_CAN_M_CAN=m
CONFIG_CAN_CC770=m
# CONFIG_CAN_CC770_ISA is not set
CONFIG_CAN_CC770_PLATFORM=m

#
# CAN SPI interfaces
#
CONFIG_CAN_MCP251X=m
# CONFIG_CAN_SOFTING is not set
# CONFIG_CAN_DEBUG_DEVICES is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
CONFIG_AF_RXRPC=m
# CONFIG_AF_RXRPC_DEBUG is not set
CONFIG_RXKAD=m
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
CONFIG_WEXT_CORE=y
CONFIG_WEXT_PROC=y
CONFIG_CFG80211=m
CONFIG_NL80211_TESTMODE=y
CONFIG_CFG80211_DEVELOPER_WARNINGS=y
# CONFIG_CFG80211_REG_DEBUG is not set
CONFIG_CFG80211_DEFAULT_PS=y
CONFIG_CFG80211_DEBUGFS=y
# CONFIG_CFG80211_INTERNAL_REGDB is not set
CONFIG_CFG80211_WEXT=y
# CONFIG_LIB80211 is not set
CONFIG_MAC80211=m
CONFIG_MAC80211_HAS_RC=y
CONFIG_MAC80211_RC_MINSTREL=y
CONFIG_MAC80211_RC_MINSTREL_HT=y
# CONFIG_MAC80211_RC_MINSTREL_VHT is not set
CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
# CONFIG_MAC80211_MESH is not set
CONFIG_MAC80211_LEDS=y
# CONFIG_MAC80211_DEBUGFS is not set
CONFIG_MAC80211_MESSAGE_TRACING=y
# CONFIG_MAC80211_DEBUG_MENU is not set
CONFIG_WIMAX=m
CONFIG_WIMAX_DEBUG_LEVEL=8
CONFIG_RFKILL=y
CONFIG_RFKILL_LEDS=y
CONFIG_RFKILL_INPUT=y
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
CONFIG_CEPH_LIB=y
# CONFIG_CEPH_LIB_PRETTYDEBUG is not set
# CONFIG_CEPH_LIB_USE_DNS_RESOLVER is not set
CONFIG_NFC=m
CONFIG_NFC_DIGITAL=m
CONFIG_NFC_NCI=m
CONFIG_NFC_NCI_SPI=y
CONFIG_NFC_HCI=m
CONFIG_NFC_SHDLC=y

#
# Near Field Communication (NFC) devices
#
# CONFIG_NFC_TRF7970A is not set
CONFIG_NFC_SIM=m
CONFIG_NFC_PN544=m
CONFIG_NFC_PN544_I2C=m
CONFIG_NFC_MICROREAD=m
# CONFIG_NFC_MICROREAD_I2C is not set
CONFIG_NFC_MRVL=m
CONFIG_NFC_ST21NFCA=m
CONFIG_NFC_ST21NFCA_I2C=m
# CONFIG_NFC_ST21NFCB is not set
CONFIG_HAVE_BPF_JIT=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_DEBUG_DRIVER=y
CONFIG_DEBUG_DEVRES=y
# CONFIG_SYS_HYPERVISOR is not set
# CONFIG_GENERIC_CPU_DEVICES is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=y
CONFIG_REGMAP_SPI=y
CONFIG_REGMAP_MMIO=m
CONFIG_REGMAP_IRQ=y
CONFIG_DMA_SHARED_BUFFER=y
CONFIG_FENCE_TRACE=y
CONFIG_DMA_CMA=y

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=0
CONFIG_CMA_SIZE_SEL_MBYTES=y
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8

#
# Bus devices
#
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
CONFIG_MTD=y
# CONFIG_MTD_TESTS is not set
# CONFIG_MTD_REDBOOT_PARTS is not set
# CONFIG_MTD_CMDLINE_PARTS is not set
CONFIG_MTD_AR7_PARTS=y

#
# User Modules And Translation Layers
#
CONFIG_MTD_BLKDEVS=y
# CONFIG_MTD_BLOCK is not set
# CONFIG_MTD_BLOCK_RO is not set
CONFIG_FTL=y
CONFIG_NFTL=m
CONFIG_NFTL_RW=y
CONFIG_INFTL=y
CONFIG_RFD_FTL=m
# CONFIG_SSFDC is not set
CONFIG_SM_FTL=m
# CONFIG_MTD_OOPS is not set

#
# RAM/ROM/Flash chip drivers
#
# CONFIG_MTD_CFI is not set
CONFIG_MTD_JEDECPROBE=m
CONFIG_MTD_GEN_PROBE=m
# CONFIG_MTD_CFI_ADV_OPTIONS is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
# CONFIG_MTD_MAP_BANK_WIDTH_8 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_16 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_32 is not set
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_CFI_I4 is not set
# CONFIG_MTD_CFI_I8 is not set
CONFIG_MTD_CFI_INTELEXT=m
CONFIG_MTD_CFI_AMDSTD=m
# CONFIG_MTD_CFI_STAA is not set
CONFIG_MTD_CFI_UTIL=m
CONFIG_MTD_RAM=m
CONFIG_MTD_ROM=m
CONFIG_MTD_ABSENT=m

#
# Mapping drivers for chip access
#
# CONFIG_MTD_COMPLEX_MAPPINGS is not set
# CONFIG_MTD_PHYSMAP is not set
CONFIG_MTD_AMD76XROM=m
CONFIG_MTD_ICHXROM=m
CONFIG_MTD_ESB2ROM=m
# CONFIG_MTD_CK804XROM is not set
CONFIG_MTD_SCB2_FLASH=m
CONFIG_MTD_NETtel=m
# CONFIG_MTD_L440GX is not set
CONFIG_MTD_INTEL_VR_NOR=y
# CONFIG_MTD_PLATRAM is not set

#
# Self-contained MTD device drivers
#
CONFIG_MTD_PMC551=y
# CONFIG_MTD_PMC551_BUGFIX is not set
CONFIG_MTD_PMC551_DEBUG=y
# CONFIG_MTD_DATAFLASH is not set
# CONFIG_MTD_SST25L is not set
CONFIG_MTD_SLRAM=m
CONFIG_MTD_PHRAM=y
CONFIG_MTD_MTDRAM=y
CONFIG_MTDRAM_TOTAL_SIZE=4096
CONFIG_MTDRAM_ERASE_SIZE=128
CONFIG_MTDRAM_ABS_POS=0
CONFIG_MTD_BLOCK2MTD=m

#
# Disk-On-Chip Device Drivers
#
CONFIG_MTD_DOCG3=y
CONFIG_BCH_CONST_M=14
CONFIG_BCH_CONST_T=4
CONFIG_MTD_NAND_ECC=m
CONFIG_MTD_NAND_ECC_SMC=y
CONFIG_MTD_NAND=m
CONFIG_MTD_NAND_BCH=m
CONFIG_MTD_NAND_ECC_BCH=y
CONFIG_MTD_SM_COMMON=m
CONFIG_MTD_NAND_DENALI=m
# CONFIG_MTD_NAND_DENALI_PCI is not set
# CONFIG_MTD_NAND_OMAP_BCH_BUILD is not set
CONFIG_MTD_NAND_IDS=m
CONFIG_MTD_NAND_RICOH=m
# CONFIG_MTD_NAND_DISKONCHIP is not set
# CONFIG_MTD_NAND_DOCG4 is not set
CONFIG_MTD_NAND_CAFE=m
# CONFIG_MTD_NAND_NANDSIM is not set
CONFIG_MTD_NAND_PLATFORM=m
CONFIG_MTD_NAND_HISI504=m
CONFIG_MTD_ONENAND=m
# CONFIG_MTD_ONENAND_VERIFY_WRITE is not set
CONFIG_MTD_ONENAND_GENERIC=m
# CONFIG_MTD_ONENAND_OTP is not set
CONFIG_MTD_ONENAND_2X_PROGRAM=y

#
# LPDDR & LPDDR2 PCM memory drivers
#
CONFIG_MTD_LPDDR=m
CONFIG_MTD_QINFO_PROBE=m
# CONFIG_MTD_SPI_NOR is not set
# CONFIG_MTD_UBI is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
CONFIG_PARPORT_PC_FIFO=y
CONFIG_PARPORT_PC_SUPERIO=y
# CONFIG_PARPORT_GSC is not set
CONFIG_PARPORT_AX88796=m
# CONFIG_PARPORT_1284 is not set
CONFIG_PARPORT_NOT_PC=y
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_NULL_BLK=y
CONFIG_BLK_DEV_FD=y
# CONFIG_PARIDE is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
CONFIG_BLK_CPQ_CISS_DA=m
# CONFIG_CISS_SCSI_TAPE is not set
CONFIG_BLK_DEV_DAC960=y
CONFIG_BLK_DEV_UMEM=y
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
CONFIG_BLK_DEV_DRBD=m
CONFIG_DRBD_FAULT_INJECTION=y
CONFIG_BLK_DEV_NBD=y
CONFIG_BLK_DEV_NVME=m
# CONFIG_BLK_DEV_SKD is not set
CONFIG_BLK_DEV_OSD=m
CONFIG_BLK_DEV_SX8=m
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
CONFIG_ATA_OVER_ETH=m
# CONFIG_VIRTIO_BLK is not set
CONFIG_BLK_DEV_HD=y
# CONFIG_BLK_DEV_RBD is not set
# CONFIG_BLK_DEV_RSXX is not set

#
# Misc devices
#
CONFIG_SENSORS_LIS3LV02D=y
# CONFIG_AD525X_DPOT is not set
# CONFIG_DUMMY_IRQ is not set
CONFIG_IBM_ASM=y
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
CONFIG_TIFM_CORE=y
CONFIG_TIFM_7XX1=m
CONFIG_ICS932S401=m
CONFIG_ENCLOSURE_SERVICES=y
CONFIG_HP_ILO=y
CONFIG_APDS9802ALS=y
# CONFIG_ISL29003 is not set
CONFIG_ISL29020=y
CONFIG_SENSORS_TSL2550=m
# CONFIG_SENSORS_BH1780 is not set
CONFIG_SENSORS_BH1770=y
# CONFIG_SENSORS_APDS990X is not set
CONFIG_HMC6352=m
CONFIG_DS1682=m
CONFIG_TI_DAC7512=y
CONFIG_BMP085=y
CONFIG_BMP085_I2C=m
CONFIG_BMP085_SPI=m
CONFIG_USB_SWITCH_FSA9480=y
# CONFIG_LATTICE_ECP3_CONFIG is not set
# CONFIG_SRAM is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
CONFIG_EEPROM_AT25=y
# CONFIG_EEPROM_LEGACY is not set
CONFIG_EEPROM_MAX6875=m
CONFIG_EEPROM_93CX6=m
# CONFIG_EEPROM_93XX46 is not set
CONFIG_CB710_CORE=y
# CONFIG_CB710_DEBUG is not set
CONFIG_CB710_DEBUG_ASSUMPTIONS=y

#
# Texas Instruments shared transport line discipline
#
CONFIG_SENSORS_LIS3_I2C=y

#
# Altera FPGA firmware download module
#
CONFIG_ALTERA_STAPL=y
CONFIG_VMWARE_VMCI=m

#
# Intel MIC Bus Driver
#
CONFIG_INTEL_MIC_BUS=y

#
# Intel MIC Host Driver
#
CONFIG_INTEL_MIC_HOST=y

#
# Intel MIC Card Driver
#
CONFIG_INTEL_MIC_CARD=m
# CONFIG_GENWQE is not set
# CONFIG_ECHO is not set
# CONFIG_CXL_BASE is not set
CONFIG_HAVE_IDE=y
CONFIG_IDE=y

#
# Please see Documentation/ide/ide.txt for help/info on IDE drives
#
CONFIG_IDE_XFER_MODE=y
CONFIG_IDE_TIMINGS=y
CONFIG_IDE_ATAPI=y
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_IDE_GD=y
CONFIG_IDE_GD_ATA=y
# CONFIG_IDE_GD_ATAPI is not set
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y
# CONFIG_BLK_DEV_IDETAPE is not set
CONFIG_BLK_DEV_IDEACPI=y
CONFIG_IDE_TASK_IOCTL=y
CONFIG_IDE_PROC_FS=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
CONFIG_BLK_DEV_PLATFORM=y
CONFIG_BLK_DEV_CMD640=y
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
CONFIG_BLK_DEV_IDEPNP=m
CONFIG_BLK_DEV_IDEDMA_SFF=y

#
# PCI IDE chipsets support
#
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=m
CONFIG_BLK_DEV_OPTI621=y
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_BLK_DEV_ATIIXP=m
CONFIG_BLK_DEV_CMD64X=y
# CONFIG_BLK_DEV_TRIFLEX is not set
CONFIG_BLK_DEV_HPT366=y
CONFIG_BLK_DEV_JMICRON=m
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT8172 is not set
CONFIG_BLK_DEV_IT8213=m
CONFIG_BLK_DEV_IT821X=m
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
CONFIG_BLK_DEV_SIIMAGE=m
CONFIG_BLK_DEV_SIS5513=m
# CONFIG_BLK_DEV_SLC90E66 is not set
CONFIG_BLK_DEV_TRM290=y
# CONFIG_BLK_DEV_VIA82CXXX is not set
CONFIG_BLK_DEV_TC86C001=m
CONFIG_BLK_DEV_IDEDMA=y

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_MQ_DEFAULT=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_CHR_DEV_OSST=y
CONFIG_BLK_DEV_SR=y
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=y
# CONFIG_CHR_DEV_SCH is not set
# CONFIG_SCSI_ENCLOSURE is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_ISCSI_ATTRS=y
CONFIG_SCSI_SAS_ATTRS=y
CONFIG_SCSI_SAS_LIBSAS=y
# CONFIG_SCSI_SAS_ATA is not set
# CONFIG_SCSI_SAS_HOST_SMP is not set
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=y
CONFIG_ISCSI_BOOT_SYSFS=y
CONFIG_SCSI_CXGB3_ISCSI=y
CONFIG_SCSI_CXGB4_ISCSI=m
CONFIG_SCSI_BNX2_ISCSI=m
# CONFIG_BE2ISCSI is not set
CONFIG_BLK_DEV_3W_XXXX_RAID=m
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
CONFIG_SCSI_3W_SAS=y
CONFIG_SCSI_ACARD=m
CONFIG_SCSI_AACRAID=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=32
CONFIG_AIC7XXX_RESET_DELAY_MS=5000
# CONFIG_AIC7XXX_DEBUG_ENABLE is not set
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC7XXX_REG_PRETTY_PRINT=y
CONFIG_SCSI_AIC79XX=m
CONFIG_AIC79XX_CMDS_PER_DEVICE=32
CONFIG_AIC79XX_RESET_DELAY_MS=5000
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC94XX=m
CONFIG_AIC94XX_DEBUG=y
CONFIG_SCSI_MVSAS=y
# CONFIG_SCSI_MVSAS_DEBUG is not set
CONFIG_SCSI_MVSAS_TASKLET=y
CONFIG_SCSI_MVUMI=m
CONFIG_SCSI_DPT_I2O=y
CONFIG_SCSI_ADVANSYS=m
CONFIG_SCSI_ARCMSR=m
CONFIG_SCSI_ESAS2R=m
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=y
CONFIG_MEGARAID_MAILBOX=m
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS_LOGGING=y
CONFIG_SCSI_MPT3SAS=y
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
# CONFIG_SCSI_MPT3SAS_LOGGING is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
CONFIG_VMWARE_PVSCSI=m
CONFIG_LIBFC=y
# CONFIG_LIBFCOE is not set
CONFIG_SCSI_DMX3191D=y
CONFIG_SCSI_EATA=m
CONFIG_SCSI_EATA_TAGGED_QUEUE=y
CONFIG_SCSI_EATA_LINKED_COMMANDS=y
CONFIG_SCSI_EATA_MAX_TAGS=16
# CONFIG_SCSI_FUTURE_DOMAIN is not set
CONFIG_SCSI_GDTH=m
CONFIG_SCSI_ISCI=y
# CONFIG_SCSI_IPS is not set
CONFIG_SCSI_INITIO=m
CONFIG_SCSI_INIA100=m
# CONFIG_SCSI_PPA is not set
CONFIG_SCSI_IMM=m
CONFIG_SCSI_IZIP_EPP16=y
CONFIG_SCSI_IZIP_SLOW_CTR=y
# CONFIG_SCSI_STEX is not set
CONFIG_SCSI_SYM53C8XX_2=m
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_SCSI_SYM53C8XX_MMIO=y
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
CONFIG_SCSI_QLA_ISCSI=m
CONFIG_SCSI_LPFC=m
CONFIG_SCSI_LPFC_DEBUG_FS=y
CONFIG_SCSI_DC395x=m
CONFIG_SCSI_AM53C974=m
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
CONFIG_SCSI_BFA_FC=m
# CONFIG_SCSI_VIRTIO is not set
# CONFIG_SCSI_CHELSIO_FCOE is not set
# CONFIG_SCSI_DH is not set
CONFIG_SCSI_OSD_INITIATOR=m
CONFIG_SCSI_OSD_ULD=m
CONFIG_SCSI_OSD_DPRINT_SENSE=1
# CONFIG_SCSI_OSD_DEBUG is not set
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_ACPI=y
CONFIG_SATA_ZPODD=y
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=y
CONFIG_SATA_AHCI_PLATFORM=y
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
CONFIG_SATA_SIL24=y
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
CONFIG_PDC_ADMA=m
# CONFIG_SATA_QSTOR is not set
CONFIG_SATA_SX4=m
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=y
CONFIG_SATA_MV=m
CONFIG_SATA_NV=y
CONFIG_SATA_PROMISE=m
CONFIG_SATA_SIL=m
CONFIG_SATA_SIS=y
CONFIG_SATA_SVW=y
CONFIG_SATA_ULI=y
CONFIG_SATA_VIA=y
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
CONFIG_PATA_ALI=y
CONFIG_PATA_AMD=y
CONFIG_PATA_ARTOP=m
CONFIG_PATA_ATIIXP=m
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
CONFIG_PATA_HPT366=m
CONFIG_PATA_HPT37X=m
CONFIG_PATA_HPT3X2N=y
CONFIG_PATA_HPT3X3=m
CONFIG_PATA_HPT3X3_DMA=y
CONFIG_PATA_IT8213=y
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
CONFIG_PATA_MARVELL=y
CONFIG_PATA_NETCELL=y
CONFIG_PATA_NINJA32=y
CONFIG_PATA_NS87415=y
# CONFIG_PATA_OLDPIIX is not set
CONFIG_PATA_OPTIDMA=y
CONFIG_PATA_PDC2027X=y
# CONFIG_PATA_PDC_OLD is not set
CONFIG_PATA_RADISYS=m
CONFIG_PATA_RDC=y
CONFIG_PATA_SCH=y
CONFIG_PATA_SERVERWORKS=y
# CONFIG_PATA_SIL680 is not set
CONFIG_PATA_SIS=y
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
CONFIG_PATA_VIA=m
CONFIG_PATA_WINBOND=m

#
# PIO-only SFF controllers
#
CONFIG_PATA_CMD640_PCI=m
CONFIG_PATA_MPIIX=y
CONFIG_PATA_NS87410=m
CONFIG_PATA_OPTI=y
CONFIG_PATA_RZ1000=y

#
# Generic fallback / legacy drivers
#
# CONFIG_PATA_ACPI is not set
CONFIG_ATA_GENERIC=m
CONFIG_PATA_LEGACY=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
# CONFIG_MD_AUTODETECT is not set
# CONFIG_MD_LINEAR is not set
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=y
CONFIG_MD_RAID456=m
# CONFIG_MD_MULTIPATH is not set
CONFIG_MD_FAULTY=y
CONFIG_BCACHE=m
CONFIG_BCACHE_DEBUG=y
# CONFIG_BCACHE_CLOSURES_DEBUG is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=m
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
CONFIG_DM_DEBUG_BLOCK_STACK_TRACING=y
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
# CONFIG_DM_THIN_PROVISIONING is not set
CONFIG_DM_CACHE=m
CONFIG_DM_CACHE_MQ=m
CONFIG_DM_CACHE_CLEANER=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
# CONFIG_DM_LOG_USERSPACE is not set
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
CONFIG_DM_DELAY=m
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
# CONFIG_DM_VERITY is not set
CONFIG_DM_SWITCH=m
CONFIG_TARGET_CORE=m
CONFIG_TCM_IBLOCK=m
# CONFIG_TCM_FILEIO is not set
CONFIG_TCM_PSCSI=m
# CONFIG_TCM_USER is not set
CONFIG_LOOPBACK_TARGET=m
CONFIG_TCM_FC=m
CONFIG_ISCSI_TARGET=m
# CONFIG_SBP_TARGET is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=y
CONFIG_FUSION_SAS=m
CONFIG_FUSION_MAX_SGE=128
# CONFIG_FUSION_CTL is not set
# CONFIG_FUSION_LOGGING is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=y
CONFIG_FIREWIRE_OHCI=m
# CONFIG_FIREWIRE_SBP2 is not set
# CONFIG_FIREWIRE_NET is not set
CONFIG_FIREWIRE_NOSY=y
CONFIG_MACINTOSH_DRIVERS=y
# CONFIG_MAC_EMUMOUSEBTN is not set
CONFIG_NETDEVICES=y
CONFIG_MII=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
CONFIG_DUMMY=m
CONFIG_EQUALIZER=y
# CONFIG_NET_FC is not set
CONFIG_NET_TEAM=m
# CONFIG_NET_TEAM_MODE_BROADCAST is not set
# CONFIG_NET_TEAM_MODE_ROUNDROBIN is not set
# CONFIG_NET_TEAM_MODE_RANDOM is not set
CONFIG_NET_TEAM_MODE_ACTIVEBACKUP=m
CONFIG_NET_TEAM_MODE_LOADBALANCE=m
# CONFIG_MACVLAN is not set
# CONFIG_IPVLAN is not set
# CONFIG_VXLAN is not set
CONFIG_NETCONSOLE=y
CONFIG_NETPOLL=y
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_NTB_NETDEV is not set
CONFIG_RIONET=m
CONFIG_RIONET_TX_SIZE=128
CONFIG_RIONET_RX_SIZE=128
CONFIG_TUN=m
# CONFIG_VETH is not set
CONFIG_VIRTIO_NET=m
CONFIG_NLMON=m
CONFIG_ARCNET=m
# CONFIG_ARCNET_1201 is not set
# CONFIG_ARCNET_1051 is not set
# CONFIG_ARCNET_RAW is not set
# CONFIG_ARCNET_CAP is not set
# CONFIG_ARCNET_COM90xx is not set
CONFIG_ARCNET_COM90xxIO=m
# CONFIG_ARCNET_RIM_I is not set
# CONFIG_ARCNET_COM20020 is not set

#
# CAIF transport drivers
#
CONFIG_VHOST_NET=m
CONFIG_VHOST_SCSI=m
CONFIG_VHOST_RING=y
CONFIG_VHOST=m

#
# Distributed Switch Architecture drivers
#
CONFIG_NET_DSA_MV88E6XXX=y
CONFIG_NET_DSA_MV88E6060=m
CONFIG_NET_DSA_MV88E6XXX_NEED_PPU=y
CONFIG_NET_DSA_MV88E6131=y
CONFIG_NET_DSA_MV88E6123_61_65=m
CONFIG_NET_DSA_MV88E6171=y
CONFIG_NET_DSA_MV88E6352=y
# CONFIG_NET_DSA_BCM_SF2 is not set
CONFIG_ETHERNET=y
CONFIG_MDIO=y
# CONFIG_NET_VENDOR_3COM is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
# CONFIG_NET_VENDOR_AGERE is not set
# CONFIG_NET_VENDOR_ALTEON is not set
# CONFIG_ALTERA_TSE is not set
CONFIG_NET_VENDOR_AMD=y
CONFIG_AMD8111_ETH=y
# CONFIG_PCNET32 is not set
CONFIG_AMD_XGBE=y
CONFIG_NET_XGENE=m
CONFIG_NET_VENDOR_ARC=y
# CONFIG_NET_VENDOR_ATHEROS is not set
CONFIG_NET_VENDOR_BROADCOM=y
CONFIG_B44=y
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_BCMGENET=y
CONFIG_BNX2=y
CONFIG_CNIC=y
CONFIG_TIGON3=m
CONFIG_BNX2X=y
CONFIG_BNX2X_SRIOV=y
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
CONFIG_NET_VENDOR_CHELSIO=y
CONFIG_CHELSIO_T1=m
CONFIG_CHELSIO_T1_1G=y
CONFIG_CHELSIO_T3=y
CONFIG_CHELSIO_T4=y
CONFIG_CHELSIO_T4VF=y
# CONFIG_NET_VENDOR_CISCO is not set
# CONFIG_CX_ECAT is not set
CONFIG_DNET=m
CONFIG_NET_VENDOR_DEC=y
# CONFIG_NET_TULIP is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
CONFIG_SUNDANCE=m
CONFIG_SUNDANCE_MMIO=y
# CONFIG_NET_VENDOR_EMULEX is not set
# CONFIG_NET_VENDOR_EXAR is not set
# CONFIG_NET_VENDOR_HP is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_IP1000 is not set
# CONFIG_JME is not set
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_MVMDIO is not set
# CONFIG_SKGE is not set
CONFIG_SKY2=m
# CONFIG_SKY2_DEBUG is not set
# CONFIG_NET_VENDOR_MELLANOX is not set
CONFIG_NET_VENDOR_MICREL=y
CONFIG_KS8842=m
# CONFIG_KS8851 is not set
CONFIG_KS8851_MLL=m
CONFIG_KSZ884X_PCI=y
CONFIG_NET_VENDOR_MICROCHIP=y
CONFIG_ENC28J60=m
CONFIG_ENC28J60_WRITEVERIFY=y
CONFIG_NET_VENDOR_MYRI=y
CONFIG_MYRI10GE=y
# CONFIG_MYRI10GE_DCA is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
CONFIG_NS83820=y
CONFIG_NET_VENDOR_8390=y
CONFIG_NE2K_PCI=m
# CONFIG_NET_VENDOR_NVIDIA is not set
CONFIG_NET_VENDOR_OKI=y
CONFIG_ETHOC=y
# CONFIG_NET_PACKET_ENGINE is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
CONFIG_NET_VENDOR_REALTEK=y
CONFIG_ATP=m
CONFIG_8139CP=y
CONFIG_8139TOO=y
# CONFIG_8139TOO_PIO is not set
CONFIG_8139TOO_TUNE_TWISTER=y
# CONFIG_8139TOO_8129 is not set
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R8169 is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_ROCKER=y
CONFIG_NET_VENDOR_SAMSUNG=y
CONFIG_SXGBE_ETH=y
# CONFIG_NET_VENDOR_SEEQ is not set
CONFIG_NET_VENDOR_SILAN=y
CONFIG_SC92031=y
CONFIG_NET_VENDOR_SIS=y
CONFIG_SIS900=m
CONFIG_SIS190=y
CONFIG_SFC=y
# CONFIG_SFC_MTD is not set
# CONFIG_SFC_SRIOV is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_EPIC100 is not set
# CONFIG_SMSC911X is not set
CONFIG_SMSC9420=y
CONFIG_NET_VENDOR_STMICRO=y
CONFIG_STMMAC_ETH=m
CONFIG_STMMAC_PLATFORM=m
CONFIG_STMMAC_PCI=m
# CONFIG_NET_VENDOR_SUN is not set
# CONFIG_NET_VENDOR_TEHUTI is not set
# CONFIG_NET_VENDOR_TI is not set
# CONFIG_NET_VENDOR_VIA is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
CONFIG_WIZNET_W5300=m
# CONFIG_WIZNET_BUS_DIRECT is not set
# CONFIG_WIZNET_BUS_INDIRECT is not set
CONFIG_WIZNET_BUS_ANY=y
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
CONFIG_AT803X_PHY=m
CONFIG_AMD_PHY=y
CONFIG_AMD_XGBE_PHY=y
# CONFIG_MARVELL_PHY is not set
CONFIG_DAVICOM_PHY=y
# CONFIG_QSEMI_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_CICADA_PHY is not set
CONFIG_VITESSE_PHY=m
CONFIG_SMSC_PHY=y
CONFIG_BROADCOM_PHY=m
CONFIG_BCM7XXX_PHY=y
CONFIG_BCM87XX_PHY=m
CONFIG_ICPLUS_PHY=y
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
CONFIG_STE10XP=y
# CONFIG_LSI_ET1011C_PHY is not set
CONFIG_MICREL_PHY=m
CONFIG_FIXED_PHY=y
# CONFIG_MDIO_BITBANG is not set
CONFIG_MDIO_BCM_UNIMAC=m
# CONFIG_MICREL_KS8995MA is not set
# CONFIG_PLIP is not set
CONFIG_PPP=y
# CONFIG_PPP_BSDCOMP is not set
CONFIG_PPP_DEFLATE=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_MPPE=y
CONFIG_PPP_MULTILINK=y
# CONFIG_PPPOE is not set
# CONFIG_PPTP is not set
CONFIG_PPPOL2TP=m
# CONFIG_PPP_ASYNC is not set
# CONFIG_PPP_SYNC_TTY is not set
CONFIG_SLIP=y
CONFIG_SLHC=y
# CONFIG_SLIP_COMPRESSED is not set
# CONFIG_SLIP_SMART is not set
CONFIG_SLIP_MODE_SLIP6=y

#
# Host-side USB support is needed for USB Network Adapter support
#
# CONFIG_WLAN is not set

#
# WiMAX Wireless Broadband devices
#

#
# Enable USB support to see WiMAX USB drivers
#
# CONFIG_WAN is not set
CONFIG_IEEE802154_DRIVERS=m
# CONFIG_VMXNET3 is not set
CONFIG_ISDN=y
CONFIG_ISDN_I4L=m
CONFIG_ISDN_PPP=y
CONFIG_ISDN_PPP_VJ=y
CONFIG_ISDN_MPP=y
# CONFIG_IPPP_FILTER is not set
CONFIG_ISDN_PPP_BSDCOMP=m
CONFIG_ISDN_AUDIO=y
# CONFIG_ISDN_TTY_FAX is not set
CONFIG_ISDN_X25=y

#
# ISDN feature submodules
#
# CONFIG_ISDN_DRV_LOOP is not set
CONFIG_ISDN_DIVERSION=m

#
# ISDN4Linux hardware drivers
#

#
# Passive cards
#
# CONFIG_ISDN_DRV_HISAX is not set

#
# Active cards
#
CONFIG_ISDN_CAPI=m
# CONFIG_CAPI_TRACE is not set
CONFIG_ISDN_CAPI_CAPI20=m
# CONFIG_ISDN_CAPI_MIDDLEWARE is not set
# CONFIG_ISDN_CAPI_CAPIDRV is not set

#
# CAPI hardware drivers
#
CONFIG_CAPI_AVM=y
CONFIG_ISDN_DRV_AVMB1_B1PCI=m
CONFIG_ISDN_DRV_AVMB1_B1PCIV4=y
CONFIG_ISDN_DRV_AVMB1_T1PCI=m
# CONFIG_ISDN_DRV_AVMB1_C4 is not set
CONFIG_CAPI_EICON=y
CONFIG_ISDN_DIVAS=m
# CONFIG_ISDN_DIVAS_BRIPCI is not set
# CONFIG_ISDN_DIVAS_PRIPCI is not set
CONFIG_ISDN_DIVAS_DIVACAPI=m
CONFIG_ISDN_DIVAS_USERIDI=m
CONFIG_ISDN_DIVAS_MAINT=m
CONFIG_ISDN_DRV_GIGASET=y
CONFIG_GIGASET_DUMMYLL=y
CONFIG_GIGASET_M101=y
CONFIG_GIGASET_DEBUG=y
CONFIG_HYSDN=m
CONFIG_HYSDN_CAPI=y
CONFIG_MISDN=m
# CONFIG_MISDN_DSP is not set
CONFIG_MISDN_L1OIP=m

#
# mISDN hardware drivers
#
CONFIG_MISDN_HFCPCI=m
CONFIG_MISDN_HFCMULTI=m
# CONFIG_MISDN_AVMFRITZ is not set
CONFIG_MISDN_SPEEDFAX=m
CONFIG_MISDN_INFINEON=m
# CONFIG_MISDN_W6692 is not set
CONFIG_MISDN_NETJET=m
CONFIG_MISDN_IPAC=m
CONFIG_MISDN_ISAR=m
CONFIG_ISDN_HDLC=m

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=y
CONFIG_INPUT_SPARSEKMAP=y
CONFIG_INPUT_MATRIXKMAP=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=y
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
CONFIG_KEYBOARD_ADP5589=m
CONFIG_KEYBOARD_ATKBD=y
CONFIG_KEYBOARD_QT1070=y
# CONFIG_KEYBOARD_QT2160 is not set
CONFIG_KEYBOARD_LKKBD=y
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_LM8323 is not set
CONFIG_KEYBOARD_LM8333=y
CONFIG_KEYBOARD_MAX7359=y
CONFIG_KEYBOARD_MCS=y
CONFIG_KEYBOARD_MPR121=m
CONFIG_KEYBOARD_NEWTON=m
# CONFIG_KEYBOARD_OPENCORES is not set
CONFIG_KEYBOARD_GOLDFISH_EVENTS=y
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_KEYBOARD_SUNKBD=y
CONFIG_KEYBOARD_TWL4030=y
CONFIG_KEYBOARD_XTKBD=m
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_CYAPA is not set
# CONFIG_MOUSE_ELAN_I2C is not set
CONFIG_MOUSE_VSXXXAA=m
CONFIG_MOUSE_SYNAPTICS_I2C=y
# CONFIG_INPUT_JOYSTICK is not set
CONFIG_INPUT_TABLET=y
# CONFIG_TABLET_SERIAL_WACOM4 is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_AD714X=m
# CONFIG_INPUT_AD714X_I2C is not set
CONFIG_INPUT_AD714X_SPI=m
CONFIG_INPUT_BMA150=m
CONFIG_INPUT_E3X0_BUTTON=m
CONFIG_INPUT_PCSPKR=m
# CONFIG_INPUT_MAX77693_HAPTIC is not set
CONFIG_INPUT_MAX8997_HAPTIC=m
# CONFIG_INPUT_MC13783_PWRBUTTON is not set
CONFIG_INPUT_MMA8450=y
# CONFIG_INPUT_MPU3050 is not set
CONFIG_INPUT_APANEL=y
# CONFIG_INPUT_ATLAS_BTNS is not set
CONFIG_INPUT_KXTJ9=m
CONFIG_INPUT_KXTJ9_POLLED_MODE=y
CONFIG_INPUT_RETU_PWRBUTTON=y
CONFIG_INPUT_TPS65218_PWRBUTTON=m
CONFIG_INPUT_AXP20X_PEK=m
CONFIG_INPUT_TWL4030_PWRBUTTON=y
CONFIG_INPUT_TWL4030_VIBRA=m
CONFIG_INPUT_UINPUT=y
# CONFIG_INPUT_PALMAS_PWRBUTTON is not set
CONFIG_INPUT_PCF8574=m
CONFIG_INPUT_PWM_BEEPER=y
CONFIG_INPUT_DA9055_ONKEY=m
# CONFIG_INPUT_PCAP is not set
# CONFIG_INPUT_ADXL34X is not set
CONFIG_INPUT_CMA3000=y
# CONFIG_INPUT_CMA3000_I2C is not set
# CONFIG_INPUT_IDEAPAD_SLIDEBAR is not set
CONFIG_INPUT_DRV2667_HAPTICS=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
CONFIG_SERIO_CT82C710=y
# CONFIG_SERIO_PARKBD is not set
CONFIG_SERIO_PCIPS2=m
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=y
# CONFIG_SERIO_ALTERA_PS2 is not set
CONFIG_SERIO_PS2MULT=m
# CONFIG_SERIO_ARC_PS2 is not set
CONFIG_GAMEPORT=m
CONFIG_GAMEPORT_NS558=m
# CONFIG_GAMEPORT_L4 is not set
CONFIG_GAMEPORT_EMU10K1=m
# CONFIG_GAMEPORT_FM801 is not set

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
# CONFIG_SERIAL_NONSTANDARD is not set
CONFIG_NOZOMI=y
# CONFIG_N_GSM is not set
CONFIG_TRACE_ROUTER=m
CONFIG_TRACE_SINK=y
CONFIG_GOLDFISH_TTY=m
CONFIG_DEVMEM=y
CONFIG_DEVKMEM=y

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_DEPRECATED_OPTIONS=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_DMA=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set
# CONFIG_SERIAL_8250_DW is not set
CONFIG_SERIAL_8250_FINTEK=m

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MAX3100 is not set
CONFIG_SERIAL_MAX310X=y
CONFIG_SERIAL_MFD_HSU=y
CONFIG_SERIAL_MFD_HSU_CONSOLE=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
# CONFIG_SERIAL_SCCNXP is not set
CONFIG_SERIAL_SC16IS7XX=y
CONFIG_SERIAL_ALTERA_JTAGUART=y
# CONFIG_SERIAL_ALTERA_JTAGUART_CONSOLE is not set
# CONFIG_SERIAL_ALTERA_UART is not set
CONFIG_SERIAL_ARC=y
CONFIG_SERIAL_ARC_CONSOLE=y
CONFIG_SERIAL_ARC_NR_PORTS=1
CONFIG_SERIAL_RP2=m
CONFIG_SERIAL_RP2_NR_UARTS=32
CONFIG_SERIAL_FSL_LPUART=m
CONFIG_PRINTER=m
# CONFIG_LP_CONSOLE is not set
# CONFIG_PPDEV is not set
CONFIG_HVC_DRIVER=y
CONFIG_VIRTIO_CONSOLE=m
CONFIG_IPMI_HANDLER=y
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
# CONFIG_IPMI_SI is not set
CONFIG_IPMI_SSIF=y
# CONFIG_IPMI_WATCHDOG is not set
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_INTEL=y
CONFIG_HW_RANDOM_AMD=y
CONFIG_HW_RANDOM_VIA=y
CONFIG_HW_RANDOM_VIRTIO=m
# CONFIG_NVRAM is not set
CONFIG_R3964=y
CONFIG_APPLICOM=y
CONFIG_MWAVE=y
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=256
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
CONFIG_HPET_MMAP_DEFAULT=y
CONFIG_HANGCHECK_TIMER=y
# CONFIG_TCG_TPM is not set
CONFIG_TELCLOCK=m
CONFIG_DEVPORT=y
CONFIG_XILLYBUS=m
# CONFIG_XILLYBUS_PCIE is not set

#
# I2C support
#
CONFIG_I2C=y
CONFIG_ACPI_I2C_OPREGION=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
# CONFIG_I2C_CHARDEV is not set
CONFIG_I2C_MUX=y

#
# Multiplexer I2C Chip support
#
CONFIG_I2C_MUX_PCA9541=m
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=m
CONFIG_I2C_ALGOBIT=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
CONFIG_I2C_ALI1535=m
CONFIG_I2C_ALI1563=y
CONFIG_I2C_ALI15X3=y
# CONFIG_I2C_AMD756 is not set
CONFIG_I2C_AMD8111=m
CONFIG_I2C_I801=m
CONFIG_I2C_ISCH=y
CONFIG_I2C_ISMT=y
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
CONFIG_I2C_NFORCE2_S4985=m
# CONFIG_I2C_SIS5595 is not set
CONFIG_I2C_SIS630=m
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=y

#
# ACPI drivers
#
CONFIG_I2C_SCMI=y

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
CONFIG_I2C_DESIGNWARE_CORE=m
CONFIG_I2C_DESIGNWARE_PCI=m
CONFIG_I2C_OCORES=m
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_PXA_PCI is not set
CONFIG_I2C_SIMTEC=y
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=m
CONFIG_I2C_PARPORT_LIGHT=m
CONFIG_I2C_TAOS_EVM=m

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_STUB=m
# CONFIG_I2C_SLAVE is not set
# CONFIG_I2C_DEBUG_CORE is not set
CONFIG_I2C_DEBUG_ALGO=y
# CONFIG_I2C_DEBUG_BUS is not set
CONFIG_SPI=y
# CONFIG_SPI_DEBUG is not set
CONFIG_SPI_MASTER=y

#
# SPI Master Controller Drivers
#
# CONFIG_SPI_ALTERA is not set
CONFIG_SPI_BITBANG=y
CONFIG_SPI_BUTTERFLY=m
# CONFIG_SPI_LM70_LLP is not set
# CONFIG_SPI_PXA2XX is not set
# CONFIG_SPI_PXA2XX_PCI is not set
CONFIG_SPI_SC18IS602=m
CONFIG_SPI_XCOMM=m
# CONFIG_SPI_XILINX is not set
CONFIG_SPI_DESIGNWARE=y
CONFIG_SPI_DW_PCI=m
CONFIG_SPI_DW_MID_DMA=y
# CONFIG_SPI_DW_MMIO is not set

#
# SPI Protocol Masters
#
CONFIG_SPI_SPIDEV=y
CONFIG_SPI_TLE62X0=m
CONFIG_SPMI=m
# CONFIG_HSI is not set

#
# PPS support
#
CONFIG_PPS=y
CONFIG_PPS_DEBUG=y

#
# PPS clients support
#
CONFIG_PPS_CLIENT_KTIMER=m
CONFIG_PPS_CLIENT_LDISC=m
CONFIG_PPS_CLIENT_PARPORT=m
CONFIG_PPS_CLIENT_GPIO=m

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
CONFIG_W1=y
CONFIG_W1_CON=y

#
# 1-wire Bus Masters
#
CONFIG_W1_MASTER_MATROX=m
# CONFIG_W1_MASTER_DS2482 is not set
CONFIG_W1_MASTER_DS1WM=y

#
# 1-wire Slaves
#
# CONFIG_W1_SLAVE_THERM is not set
CONFIG_W1_SLAVE_SMEM=y
CONFIG_W1_SLAVE_DS2408=m
# CONFIG_W1_SLAVE_DS2408_READBACK is not set
# CONFIG_W1_SLAVE_DS2413 is not set
# CONFIG_W1_SLAVE_DS2406 is not set
# CONFIG_W1_SLAVE_DS2423 is not set
CONFIG_W1_SLAVE_DS2431=y
CONFIG_W1_SLAVE_DS2433=y
CONFIG_W1_SLAVE_DS2433_CRC=y
# CONFIG_W1_SLAVE_DS2760 is not set
CONFIG_W1_SLAVE_DS2780=y
# CONFIG_W1_SLAVE_DS2781 is not set
# CONFIG_W1_SLAVE_DS28E04 is not set
# CONFIG_W1_SLAVE_BQ27000 is not set
CONFIG_POWER_SUPPLY=y
CONFIG_POWER_SUPPLY_DEBUG=y
CONFIG_PDA_POWER=y
# CONFIG_GENERIC_ADC_BATTERY is not set
# CONFIG_TEST_POWER is not set
CONFIG_BATTERY_DS2780=y
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
CONFIG_BATTERY_SBS=y
CONFIG_BATTERY_BQ27x00=y
# CONFIG_BATTERY_BQ27X00_I2C is not set
CONFIG_BATTERY_BQ27X00_PLATFORM=y
# CONFIG_BATTERY_DA9030 is not set
CONFIG_BATTERY_MAX17040=y
# CONFIG_BATTERY_MAX17042 is not set
CONFIG_BATTERY_TWL4030_MADC=m
CONFIG_BATTERY_RX51=m
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_TWL4030 is not set
CONFIG_CHARGER_LP8727=y
CONFIG_CHARGER_MAX14577=m
# CONFIG_CHARGER_MAX77693 is not set
CONFIG_CHARGER_BQ2415X=y
# CONFIG_CHARGER_SMB347 is not set
CONFIG_BATTERY_GAUGE_LTC2941=y
CONFIG_BATTERY_GOLDFISH=m
# CONFIG_POWER_RESET is not set
CONFIG_POWER_AVS=y
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
CONFIG_HWMON_DEBUG_CHIP=y

#
# Native drivers
#
CONFIG_SENSORS_ABITUGURU=m
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7314 is not set
CONFIG_SENSORS_AD7414=m
CONFIG_SENSORS_AD7418=m
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
# CONFIG_SENSORS_ADM1029 is not set
CONFIG_SENSORS_ADM1031=m
# CONFIG_SENSORS_ADM9240 is not set
CONFIG_SENSORS_ADT7X10=m
CONFIG_SENSORS_ADT7310=m
# CONFIG_SENSORS_ADT7410 is not set
CONFIG_SENSORS_ADT7411=m
CONFIG_SENSORS_ADT7462=m
CONFIG_SENSORS_ADT7470=m
# CONFIG_SENSORS_ADT7475 is not set
CONFIG_SENSORS_ASC7621=m
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_K10TEMP=m
CONFIG_SENSORS_FAM15H_POWER=m
# CONFIG_SENSORS_APPLESMC is not set
CONFIG_SENSORS_ASB100=m
# CONFIG_SENSORS_ATXP1 is not set
CONFIG_SENSORS_DS620=m
# CONFIG_SENSORS_DS1621 is not set
CONFIG_SENSORS_DA9055=m
CONFIG_SENSORS_I5K_AMB=m
# CONFIG_SENSORS_F71805F is not set
CONFIG_SENSORS_F71882FG=m
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_MC13783_ADC is not set
CONFIG_SENSORS_FSCHMD=m
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
CONFIG_SENSORS_G760A=m
# CONFIG_SENSORS_G762 is not set
CONFIG_SENSORS_HIH6130=m
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
# CONFIG_SENSORS_IIO_HWMON is not set
CONFIG_SENSORS_I5500=m
# CONFIG_SENSORS_CORETEMP is not set
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_JC42=m
CONFIG_SENSORS_POWR1220=m
CONFIG_SENSORS_LINEAGE=m
CONFIG_SENSORS_LTC2945=m
# CONFIG_SENSORS_LTC4151 is not set
CONFIG_SENSORS_LTC4215=m
CONFIG_SENSORS_LTC4222=m
CONFIG_SENSORS_LTC4245=m
# CONFIG_SENSORS_LTC4260 is not set
CONFIG_SENSORS_LTC4261=m
CONFIG_SENSORS_MAX1111=m
CONFIG_SENSORS_MAX16065=m
# CONFIG_SENSORS_MAX1619 is not set
CONFIG_SENSORS_MAX1668=m
CONFIG_SENSORS_MAX197=m
CONFIG_SENSORS_MAX6639=m
CONFIG_SENSORS_MAX6642=m
CONFIG_SENSORS_MAX6650=m
CONFIG_SENSORS_MAX6697=m
# CONFIG_SENSORS_HTU21 is not set
CONFIG_SENSORS_MCP3021=m
CONFIG_SENSORS_MENF21BMC_HWMON=m
CONFIG_SENSORS_ADCXX=m
CONFIG_SENSORS_LM63=m
CONFIG_SENSORS_LM70=m
# CONFIG_SENSORS_LM73 is not set
CONFIG_SENSORS_LM75=m
# CONFIG_SENSORS_LM77 is not set
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
CONFIG_SENSORS_LM93=m
CONFIG_SENSORS_LM95234=m
# CONFIG_SENSORS_LM95241 is not set
CONFIG_SENSORS_LM95245=m
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
# CONFIG_SENSORS_NTC_THERMISTOR is not set
# CONFIG_SENSORS_NCT6683 is not set
CONFIG_SENSORS_NCT6775=m
# CONFIG_SENSORS_NCT7802 is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_PMBUS=m
# CONFIG_SENSORS_PMBUS is not set
CONFIG_SENSORS_ADM1275=m
# CONFIG_SENSORS_LM25066 is not set
# CONFIG_SENSORS_LTC2978 is not set
CONFIG_SENSORS_MAX16064=m
CONFIG_SENSORS_MAX34440=m
# CONFIG_SENSORS_MAX8688 is not set
CONFIG_SENSORS_TPS40422=m
CONFIG_SENSORS_UCD9000=m
CONFIG_SENSORS_UCD9200=m
# CONFIG_SENSORS_ZL6100 is not set
CONFIG_SENSORS_SHT21=m
# CONFIG_SENSORS_SHTC1 is not set
CONFIG_SENSORS_SIS5595=m
CONFIG_SENSORS_DME1737=m
CONFIG_SENSORS_EMC1403=m
# CONFIG_SENSORS_EMC2103 is not set
# CONFIG_SENSORS_EMC6W201 is not set
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
# CONFIG_SENSORS_SCH56XX_COMMON is not set
# CONFIG_SENSORS_SMM665 is not set
CONFIG_SENSORS_ADC128D818=m
# CONFIG_SENSORS_ADS1015 is not set
CONFIG_SENSORS_ADS7828=m
CONFIG_SENSORS_ADS7871=m
CONFIG_SENSORS_AMC6821=m
CONFIG_SENSORS_INA209=m
CONFIG_SENSORS_INA2XX=m
CONFIG_SENSORS_THMC50=m
CONFIG_SENSORS_TMP102=m
CONFIG_SENSORS_TMP103=m
# CONFIG_SENSORS_TMP401 is not set
CONFIG_SENSORS_TMP421=m
# CONFIG_SENSORS_TWL4030_MADC is not set
CONFIG_SENSORS_VIA_CPUTEMP=m
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
CONFIG_SENSORS_W83793=m
# CONFIG_SENSORS_W83795 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
CONFIG_SENSORS_W83627HF=m
# CONFIG_SENSORS_W83627EHF is not set

#
# ACPI drivers
#
# CONFIG_SENSORS_ACPI_POWER is not set
# CONFIG_SENSORS_ATK0110 is not set
CONFIG_THERMAL=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
# CONFIG_THERMAL_GOV_FAIR_SHARE is not set
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
# CONFIG_THERMAL_EMULATION is not set
# CONFIG_INTEL_POWERCLAMP is not set
CONFIG_X86_PKG_TEMP_THERMAL=y
CONFIG_INTEL_SOC_DTS_THERMAL=m
CONFIG_INT340X_THERMAL=m
CONFIG_ACPI_THERMAL_REL=m

#
# Texas Instruments thermal drivers
#
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=y
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
CONFIG_SSB_SDIOHOST_POSSIBLE=y
# CONFIG_SSB_SDIOHOST is not set
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y
CONFIG_BCMA_POSSIBLE=y

#
# Broadcom specific AMBA
#
CONFIG_BCMA=m
CONFIG_BCMA_HOST_PCI_POSSIBLE=y
CONFIG_BCMA_HOST_PCI=y
CONFIG_BCMA_HOST_SOC=y
CONFIG_BCMA_DRIVER_GMAC_CMN=y
CONFIG_BCMA_DEBUG=y

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
CONFIG_MFD_AS3711=y
# CONFIG_PMIC_ADP5520 is not set
CONFIG_MFD_BCM590XX=y
CONFIG_MFD_AXP20X=y
# CONFIG_MFD_CROS_EC is not set
CONFIG_PMIC_DA903X=y
# CONFIG_MFD_DA9052_SPI is not set
# CONFIG_MFD_DA9052_I2C is not set
CONFIG_MFD_DA9055=y
CONFIG_MFD_DA9063=y
CONFIG_MFD_DA9150=y
CONFIG_MFD_MC13XXX=m
CONFIG_MFD_MC13XXX_SPI=m
CONFIG_MFD_MC13XXX_I2C=m
# CONFIG_HTC_PASIC3 is not set
CONFIG_LPC_ICH=y
CONFIG_LPC_SCH=y
CONFIG_INTEL_SOC_PMIC=y
CONFIG_MFD_JANZ_CMODIO=m
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_88PM800 is not set
CONFIG_MFD_88PM805=m
# CONFIG_MFD_88PM860X is not set
CONFIG_MFD_MAX14577=y
CONFIG_MFD_MAX77693=y
CONFIG_MFD_MAX8907=m
# CONFIG_MFD_MAX8925 is not set
CONFIG_MFD_MAX8997=y
# CONFIG_MFD_MAX8998 is not set
CONFIG_MFD_MENF21BMC=y
CONFIG_EZX_PCAP=y
CONFIG_MFD_RETU=y
# CONFIG_MFD_PCF50633 is not set
CONFIG_MFD_RDC321X=m
CONFIG_MFD_RTSX_PCI=y
# CONFIG_MFD_RT5033 is not set
# CONFIG_MFD_RC5T583 is not set
CONFIG_MFD_RN5T618=m
# CONFIG_MFD_SEC_CORE is not set
# CONFIG_MFD_SI476X_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_MFD_SMSC is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_SYSCON is not set
CONFIG_MFD_TI_AM335X_TSCADC=m
CONFIG_MFD_LP3943=m
CONFIG_MFD_LP8788=y
CONFIG_MFD_PALMAS=y
# CONFIG_TPS6105X is not set
CONFIG_TPS6507X=m
# CONFIG_MFD_TPS65090 is not set
CONFIG_MFD_TPS65217=m
CONFIG_MFD_TPS65218=y
CONFIG_MFD_TPS6586X=y
CONFIG_MFD_TPS80031=y
CONFIG_TWL4030_CORE=y
CONFIG_MFD_TWL4030_AUDIO=y
# CONFIG_TWL6040_CORE is not set
# CONFIG_MFD_WL1273_CORE is not set
CONFIG_MFD_LM3533=y
# CONFIG_MFD_TC3589X is not set
# CONFIG_MFD_TMIO is not set
CONFIG_MFD_VX855=m
CONFIG_MFD_ARIZONA=y
CONFIG_MFD_ARIZONA_I2C=m
CONFIG_MFD_ARIZONA_SPI=y
# CONFIG_MFD_WM5102 is not set
CONFIG_MFD_WM5110=y
CONFIG_MFD_WM8997=y
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM831X_SPI is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
# CONFIG_AGP_SIS is not set
CONFIG_AGP_VIA=y
CONFIG_INTEL_GTT=y
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
# CONFIG_VGA_SWITCHEROO is not set

#
# Direct Rendering Manager
#
CONFIG_DRM=m
CONFIG_DRM_MIPI_DSI=y
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_KMS_FB_HELPER=y
# CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
CONFIG_DRM_TTM=m

#
# I2C encoder or helper chips
#
# CONFIG_DRM_I2C_ADV7511 is not set
# CONFIG_DRM_I2C_CH7006 is not set
CONFIG_DRM_I2C_SIL164=m
# CONFIG_DRM_I2C_NXP_TDA998X is not set
# CONFIG_DRM_TDFX is not set
CONFIG_DRM_R128=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_RADEON_UMS=y
# CONFIG_DRM_NOUVEAU is not set
CONFIG_DRM_I810=m
CONFIG_DRM_I915=m
# CONFIG_DRM_I915_KMS is not set
CONFIG_DRM_I915_FBDEV=y
CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT=y
CONFIG_DRM_MGA=m
# CONFIG_DRM_SIS is not set
CONFIG_DRM_VIA=m
CONFIG_DRM_SAVAGE=m
CONFIG_DRM_VMWGFX=m
CONFIG_DRM_VMWGFX_FBCON=y
# CONFIG_DRM_GMA500 is not set
CONFIG_DRM_AST=m
CONFIG_DRM_MGAG200=m
# CONFIG_DRM_CIRRUS_QEMU is not set
# CONFIG_DRM_QXL is not set
CONFIG_DRM_BOCHS=m
CONFIG_DRM_PANEL=y

#
# Display Panels
#

#
# Frame buffer Devices
#
CONFIG_FB=m
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_CMDLINE=y
CONFIG_FB_DDC=m
# CONFIG_FB_BOOT_VESA_SUPPORT is not set
CONFIG_FB_CFB_FILLRECT=m
CONFIG_FB_CFB_COPYAREA=m
CONFIG_FB_CFB_IMAGEBLIT=m
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
CONFIG_FB_SYS_FILLRECT=m
CONFIG_FB_SYS_COPYAREA=m
CONFIG_FB_SYS_IMAGEBLIT=m
CONFIG_FB_FOREIGN_ENDIAN=y
CONFIG_FB_BOTH_ENDIAN=y
# CONFIG_FB_BIG_ENDIAN is not set
# CONFIG_FB_LITTLE_ENDIAN is not set
CONFIG_FB_SYS_FOPS=m
CONFIG_FB_DEFERRED_IO=y
CONFIG_FB_SVGALIB=m
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
CONFIG_FB_CIRRUS=m
CONFIG_FB_PM2=m
# CONFIG_FB_PM2_FIFO_DISCONNECT is not set
CONFIG_FB_CYBER2000=m
# CONFIG_FB_CYBER2000_DDC is not set
CONFIG_FB_ARC=m
CONFIG_FB_VGA16=m
# CONFIG_FB_UVESA is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
CONFIG_FB_NVIDIA=m
# CONFIG_FB_NVIDIA_I2C is not set
CONFIG_FB_NVIDIA_DEBUG=y
# CONFIG_FB_NVIDIA_BACKLIGHT is not set
# CONFIG_FB_RIVA is not set
CONFIG_FB_I740=m
CONFIG_FB_LE80578=m
CONFIG_FB_CARILLO_RANCH=m
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
CONFIG_FB_ATY128=m
CONFIG_FB_ATY128_BACKLIGHT=y
CONFIG_FB_ATY=m
CONFIG_FB_ATY_CT=y
# CONFIG_FB_ATY_GENERIC_LCD is not set
CONFIG_FB_ATY_GX=y
CONFIG_FB_ATY_BACKLIGHT=y
CONFIG_FB_S3=m
# CONFIG_FB_S3_DDC is not set
CONFIG_FB_SAVAGE=m
CONFIG_FB_SAVAGE_I2C=y
# CONFIG_FB_SAVAGE_ACCEL is not set
# CONFIG_FB_SIS is not set
CONFIG_FB_NEOMAGIC=m
CONFIG_FB_KYRO=m
CONFIG_FB_3DFX=m
CONFIG_FB_3DFX_ACCEL=y
CONFIG_FB_3DFX_I2C=y
CONFIG_FB_VOODOO1=m
CONFIG_FB_VT8623=m
CONFIG_FB_TRIDENT=m
CONFIG_FB_ARK=m
CONFIG_FB_PM3=m
CONFIG_FB_CARMINE=m
CONFIG_FB_CARMINE_DRAM_EVAL=y
# CONFIG_CARMINE_DRAM_CUSTOM is not set
# CONFIG_FB_GOLDFISH is not set
CONFIG_FB_VIRTUAL=m
CONFIG_FB_METRONOME=m
CONFIG_FB_MB862XX=m
CONFIG_FB_MB862XX_PCI_GDC=y
# CONFIG_FB_MB862XX_I2C is not set
CONFIG_FB_BROADSHEET=m
# CONFIG_FB_AUO_K190X is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
CONFIG_LCD_LTV350QV=m
CONFIG_LCD_ILI922X=m
CONFIG_LCD_ILI9320=m
# CONFIG_LCD_TDO24M is not set
CONFIG_LCD_VGG2432A4=m
CONFIG_LCD_PLATFORM=m
CONFIG_LCD_S6E63M0=m
# CONFIG_LCD_LD9040 is not set
CONFIG_LCD_AMS369FG06=m
CONFIG_LCD_LMS501KF03=m
CONFIG_LCD_HX8357=m
CONFIG_BACKLIGHT_CLASS_DEVICE=m
CONFIG_BACKLIGHT_GENERIC=m
# CONFIG_BACKLIGHT_LM3533 is not set
CONFIG_BACKLIGHT_CARILLO_RANCH=m
CONFIG_BACKLIGHT_PWM=m
# CONFIG_BACKLIGHT_DA903X is not set
# CONFIG_BACKLIGHT_APPLE is not set
CONFIG_BACKLIGHT_SAHARA=m
# CONFIG_BACKLIGHT_ADP8860 is not set
CONFIG_BACKLIGHT_ADP8870=m
CONFIG_BACKLIGHT_LM3630A=m
CONFIG_BACKLIGHT_LM3639=m
CONFIG_BACKLIGHT_LP855X=m
CONFIG_BACKLIGHT_LP8788=m
CONFIG_BACKLIGHT_PANDORA=m
CONFIG_BACKLIGHT_TPS65217=m
# CONFIG_BACKLIGHT_AS3711 is not set
# CONFIG_BACKLIGHT_LV5207LP is not set
CONFIG_BACKLIGHT_BD6107=m
CONFIG_VGASTATE=m
CONFIG_HDMI=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=128
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=m
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
CONFIG_LOGO=y
CONFIG_LOGO_LINUX_MONO=y
CONFIG_LOGO_LINUX_VGA16=y
# CONFIG_LOGO_LINUX_CLUT224 is not set
CONFIG_SOUND=y
CONFIG_SOUND_OSS_CORE=y
CONFIG_SOUND_OSS_CORE_PRECLAIM=y
# CONFIG_SND is not set
CONFIG_SOUND_PRIME=y
CONFIG_SOUND_OSS=y
# CONFIG_SOUND_TRACEINIT is not set
CONFIG_SOUND_DMAP=y
CONFIG_SOUND_VMIDI=m
# CONFIG_SOUND_TRIX is not set
# CONFIG_SOUND_MSS is not set
CONFIG_SOUND_MPU401=y
# CONFIG_SOUND_PAS is not set
CONFIG_SOUND_PSS=y
CONFIG_PSS_MIXER=y
CONFIG_SOUND_SB=y
CONFIG_SOUND_YM3812=m
CONFIG_SOUND_UART6850=y
CONFIG_SOUND_AEDSP16=m
CONFIG_SC6600=y
CONFIG_SC6600_JOY=y
CONFIG_SC6600_CDROM=4
CONFIG_SC6600_CDROMBASE=0
CONFIG_SOUND_KAHLUA=y

#
# HID support
#
# CONFIG_HID is not set

#
# I2C HID support
#
# CONFIG_I2C_HID is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
CONFIG_UWB=y
CONFIG_UWB_WHCI=m
CONFIG_MMC=y
# CONFIG_MMC_DEBUG is not set
CONFIG_MMC_CLKGATE=y

#
# MMC/SD/SDIO Card Drivers
#
# CONFIG_MMC_BLOCK is not set
CONFIG_SDIO_UART=y
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_SDHCI is not set
# CONFIG_MMC_WBSD is not set
CONFIG_MMC_TIFM_SD=y
CONFIG_MMC_GOLDFISH=m
CONFIG_MMC_SPI=m
# CONFIG_MMC_CB710 is not set
# CONFIG_MMC_VIA_SDMMC is not set
CONFIG_MMC_USDHI6ROL0=y
CONFIG_MMC_REALTEK_PCI=y
CONFIG_MMC_TOSHIBA_PCI=y
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
# CONFIG_LEDS_CLASS_FLASH is not set

#
# LED drivers
#
# CONFIG_LEDS_LM3530 is not set
CONFIG_LEDS_LM3533=m
# CONFIG_LEDS_LM3642 is not set
CONFIG_LEDS_PCA9532=y
CONFIG_LEDS_LP3944=y
CONFIG_LEDS_LP55XX_COMMON=m
CONFIG_LEDS_LP5521=m
# CONFIG_LEDS_LP5523 is not set
# CONFIG_LEDS_LP5562 is not set
# CONFIG_LEDS_LP8501 is not set
CONFIG_LEDS_LP8788=m
CONFIG_LEDS_LP8860=m
CONFIG_LEDS_CLEVO_MAIL=y
CONFIG_LEDS_PCA955X=m
CONFIG_LEDS_PCA963X=y
CONFIG_LEDS_DA903X=y
# CONFIG_LEDS_DAC124S085 is not set
CONFIG_LEDS_PWM=y
CONFIG_LEDS_BD2802=m
CONFIG_LEDS_INTEL_SS4200=m
CONFIG_LEDS_MC13783=m
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_MAX8997 is not set
# CONFIG_LEDS_LM355x is not set
CONFIG_LEDS_MENF21BMC=m

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
# CONFIG_LEDS_BLINKM is not set

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_ONESHOT=y
CONFIG_LEDS_TRIGGER_IDE_DISK=y
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_CPU is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=y
CONFIG_LEDS_TRIGGER_CAMERA=m
CONFIG_ACCESSIBILITY=y
CONFIG_A11Y_BRAILLE_CONSOLE=y
# CONFIG_INFINIBAND is not set
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
CONFIG_EDAC_DEBUG=y
CONFIG_EDAC_DECODE_MCE=y
# CONFIG_EDAC_MCE_INJ is not set
CONFIG_EDAC_MM_EDAC=y
CONFIG_EDAC_AMD64=m
# CONFIG_EDAC_AMD64_ERROR_INJECTION is not set
CONFIG_EDAC_E752X=m
# CONFIG_EDAC_I82975X is not set
# CONFIG_EDAC_I3000 is not set
CONFIG_EDAC_I3200=m
# CONFIG_EDAC_IE31200 is not set
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=y
# CONFIG_EDAC_I7CORE is not set
CONFIG_EDAC_I5000=m
# CONFIG_EDAC_I5100 is not set
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=y
CONFIG_RTC_LIB=y
# CONFIG_RTC_CLASS is not set
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_INTEL_MIC_X100_DMA=m
CONFIG_INTEL_MID_DMAC=y
CONFIG_INTEL_IOATDMA=y
CONFIG_DW_DMAC_CORE=y
CONFIG_DW_DMAC=m
CONFIG_DW_DMAC_PCI=y
CONFIG_DMA_ENGINE=y
CONFIG_DMA_ACPI=y

#
# DMA Clients
#
# CONFIG_ASYNC_TX_DMA is not set
# CONFIG_DMATEST is not set
CONFIG_DMA_ENGINE_RAID=y
CONFIG_DCA=y
# CONFIG_AUXDISPLAY is not set
CONFIG_UIO=y
# CONFIG_UIO_CIF is not set
CONFIG_UIO_PDRV_GENIRQ=m
# CONFIG_UIO_DMEM_GENIRQ is not set
CONFIG_UIO_AEC=y
CONFIG_UIO_SERCOS3=y
# CONFIG_UIO_PCI_GENERIC is not set
CONFIG_UIO_NETX=y
# CONFIG_UIO_MF624 is not set
CONFIG_VFIO_IOMMU_TYPE1=y
CONFIG_VFIO=y
CONFIG_VFIO_PCI=m
# CONFIG_VFIO_PCI_VGA is not set
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
CONFIG_VIRT_DRIVERS=y
CONFIG_VIRTIO=y

#
# Virtio drivers
#
# CONFIG_VIRTIO_PCI is not set
# CONFIG_VIRTIO_BALLOON is not set
CONFIG_VIRTIO_MMIO=y
# CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES is not set

#
# Microsoft Hyper-V guest support
#
# CONFIG_STAGING is not set
# CONFIG_X86_PLATFORM_DEVICES is not set
CONFIG_GOLDFISH_PIPE=y
CONFIG_CHROME_PLATFORMS=y
CONFIG_CHROMEOS_LAPTOP=m
CONFIG_CHROMEOS_PSTORE=y

#
# Hardware Spinlock drivers
#

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# CONFIG_ATMEL_PIT is not set
# CONFIG_SH_TIMER_CMT is not set
# CONFIG_SH_TIMER_MTU2 is not set
# CONFIG_SH_TIMER_TMU is not set
# CONFIG_EM_TIMER_STI is not set
CONFIG_MAILBOX=y
# CONFIG_PCC is not set
CONFIG_ALTERA_MBOX=m
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
CONFIG_IOMMU_IOVA=y
# CONFIG_AMD_IOMMU is not set
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
CONFIG_IRQ_REMAP=y

#
# Remoteproc drivers
#
# CONFIG_STE_MODEM_RPROC is not set

#
# Rpmsg drivers
#

#
# SOC (System On Chip) specific Drivers
#
# CONFIG_SOC_TI is not set
CONFIG_PM_DEVFREQ=y

#
# DEVFREQ Governors
#
# CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND is not set
CONFIG_DEVFREQ_GOV_PERFORMANCE=y
CONFIG_DEVFREQ_GOV_POWERSAVE=m
CONFIG_DEVFREQ_GOV_USERSPACE=m

#
# DEVFREQ Drivers
#
# CONFIG_PM_DEVFREQ_EVENT is not set
CONFIG_EXTCON=m

#
# Extcon Device Drivers
#
CONFIG_EXTCON_ADC_JACK=m
CONFIG_EXTCON_MAX14577=m
# CONFIG_EXTCON_MAX77693 is not set
CONFIG_EXTCON_MAX8997=m
CONFIG_EXTCON_PALMAS=m
CONFIG_EXTCON_RT8973A=m
CONFIG_EXTCON_SM5502=m
CONFIG_MEMORY=y
CONFIG_IIO=m
CONFIG_IIO_BUFFER=y
# CONFIG_IIO_BUFFER_CB is not set
CONFIG_IIO_KFIFO_BUF=m
CONFIG_IIO_TRIGGERED_BUFFER=m
CONFIG_IIO_TRIGGER=y
CONFIG_IIO_CONSUMERS_PER_TRIGGER=2

#
# Accelerometers
#
CONFIG_BMA180=m
# CONFIG_BMC150_ACCEL is not set
CONFIG_IIO_ST_ACCEL_3AXIS=m
CONFIG_IIO_ST_ACCEL_I2C_3AXIS=m
CONFIG_IIO_ST_ACCEL_SPI_3AXIS=m
# CONFIG_KXSD9 is not set
CONFIG_MMA8452=m
# CONFIG_KXCJK1013 is not set
# CONFIG_MMA9551 is not set
# CONFIG_MMA9553 is not set

#
# Analog to digital converters
#
CONFIG_AD_SIGMA_DELTA=m
CONFIG_AD7266=m
CONFIG_AD7291=m
# CONFIG_AD7298 is not set
CONFIG_AD7476=m
CONFIG_AD7791=m
# CONFIG_AD7793 is not set
CONFIG_AD7887=m
CONFIG_AD7923=m
# CONFIG_AD799X is not set
CONFIG_AXP288_ADC=m
# CONFIG_LP8788_ADC is not set
# CONFIG_MAX1027 is not set
CONFIG_MAX1363=m
CONFIG_MCP320X=m
CONFIG_MCP3422=m
# CONFIG_NAU7802 is not set
# CONFIG_QCOM_SPMI_IADC is not set
# CONFIG_QCOM_SPMI_VADC is not set
CONFIG_TI_ADC081C=m
CONFIG_TI_ADC128S052=m
# CONFIG_TI_AM335X_ADC is not set
CONFIG_TWL4030_MADC=m
CONFIG_TWL6030_GPADC=m

#
# Amplifiers
#
CONFIG_AD8366=m

#
# Hid Sensor IIO Common
#

#
# SSP Sensor Common
#
# CONFIG_IIO_SSP_SENSORHUB is not set
CONFIG_IIO_ST_SENSORS_I2C=m
CONFIG_IIO_ST_SENSORS_SPI=m
CONFIG_IIO_ST_SENSORS_CORE=m

#
# Digital to analog converters
#
CONFIG_AD5064=m
# CONFIG_AD5360 is not set
CONFIG_AD5380=m
CONFIG_AD5421=m
# CONFIG_AD5446 is not set
CONFIG_AD5449=m
CONFIG_AD5504=m
CONFIG_AD5624R_SPI=m
CONFIG_AD5686=m
CONFIG_AD5755=m
CONFIG_AD5764=m
CONFIG_AD5791=m
CONFIG_AD7303=m
CONFIG_MAX517=m
CONFIG_MCP4725=m
CONFIG_MCP4922=m

#
# Frequency Synthesizers DDS/PLL
#

#
# Clock Generator/Distribution
#
CONFIG_AD9523=m

#
# Phase-Locked Loop (PLL) frequency synthesizers
#
CONFIG_ADF4350=m

#
# Digital gyroscope sensors
#
CONFIG_ADIS16080=m
CONFIG_ADIS16130=m
CONFIG_ADIS16136=m
CONFIG_ADIS16260=m
# CONFIG_ADXRS450 is not set
CONFIG_BMG160=m
CONFIG_IIO_ST_GYRO_3AXIS=m
CONFIG_IIO_ST_GYRO_I2C_3AXIS=m
CONFIG_IIO_ST_GYRO_SPI_3AXIS=m
CONFIG_ITG3200=m

#
# Humidity sensors
#
CONFIG_SI7005=m
# CONFIG_SI7020 is not set

#
# Inertial measurement units
#
# CONFIG_ADIS16400 is not set
# CONFIG_ADIS16480 is not set
CONFIG_KMX61=m
CONFIG_INV_MPU6050_IIO=m
CONFIG_IIO_ADIS_LIB=m
CONFIG_IIO_ADIS_LIB_BUFFER=y

#
# Light sensors
#
# CONFIG_ADJD_S311 is not set
# CONFIG_AL3320A is not set
CONFIG_APDS9300=m
CONFIG_CM32181=m
CONFIG_CM3232=m
# CONFIG_CM36651 is not set
CONFIG_GP2AP020A00F=m
CONFIG_ISL29125=m
CONFIG_JSA1212=m
CONFIG_SENSORS_LM3533=m
# CONFIG_LTR501 is not set
CONFIG_TCS3414=m
# CONFIG_TCS3472 is not set
# CONFIG_SENSORS_TSL2563 is not set
CONFIG_TSL4531=m
# CONFIG_VCNL4000 is not set

#
# Magnetometer sensors
#
CONFIG_MAG3110=m
# CONFIG_IIO_ST_MAGN_3AXIS is not set

#
# Inclinometer sensors
#

#
# Triggers - standalone
#
CONFIG_IIO_INTERRUPT_TRIGGER=m
# CONFIG_IIO_SYSFS_TRIGGER is not set

#
# Pressure sensors
#
# CONFIG_BMP280 is not set
CONFIG_MPL115=m
# CONFIG_MPL3115 is not set
CONFIG_IIO_ST_PRESS=m
CONFIG_IIO_ST_PRESS_I2C=m
CONFIG_IIO_ST_PRESS_SPI=m
CONFIG_T5403=m

#
# Lightning sensors
#
CONFIG_AS3935=m

#
# Proximity sensors
#
CONFIG_SX9500=m

#
# Temperature sensors
#
# CONFIG_MLX90614 is not set
# CONFIG_TMP006 is not set
CONFIG_NTB=m
# CONFIG_VME_BUS is not set
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
CONFIG_PWM_LP3943=m
# CONFIG_PWM_LPSS is not set
CONFIG_PWM_TWL=m
CONFIG_PWM_TWL_LED=y
CONFIG_IPACK_BUS=y
CONFIG_BOARD_TPCI200=m
CONFIG_SERIAL_IPOCTAL=m
CONFIG_RESET_CONTROLLER=y
CONFIG_FMC=m
CONFIG_FMC_FAKEDEV=m
CONFIG_FMC_TRIVIAL=m
CONFIG_FMC_WRITE_EEPROM=m
# CONFIG_FMC_CHARDEV is not set

#
# PHY Subsystem
#
CONFIG_GENERIC_PHY=y
CONFIG_BCM_KONA_USB2_PHY=y
# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set
CONFIG_RAS=y
CONFIG_THUNDERBOLT=y

#
# Android
#
# CONFIG_ANDROID is not set

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
# CONFIG_DELL_RBU is not set
CONFIG_DCDBAS=m
CONFIG_DMIID=y
# CONFIG_DMI_SYSFS is not set
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
CONFIG_ISCSI_IBFT_FIND=y
CONFIG_ISCSI_IBFT=m
# CONFIG_GOOGLE_FIRMWARE is not set
CONFIG_UEFI_CPER=y

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
# CONFIG_EXT2_FS_SECURITY is not set
CONFIG_EXT3_FS=y
CONFIG_EXT3_DEFAULTS_TO_ORDERED=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_DEBUG is not set
CONFIG_JBD=y
CONFIG_JBD_DEBUG=y
CONFIG_JBD2=y
CONFIG_JBD2_DEBUG=y
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
CONFIG_JFS_FS=m
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
# CONFIG_JFS_DEBUG is not set
CONFIG_JFS_STATISTICS=y
CONFIG_XFS_FS=y
CONFIG_XFS_QUOTA=y
# CONFIG_XFS_POSIX_ACL is not set
# CONFIG_XFS_RT is not set
CONFIG_XFS_DEBUG=y
CONFIG_GFS2_FS=y
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
# CONFIG_OCFS2_FS_STATS is not set
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
CONFIG_OCFS2_DEBUG_FS=y
CONFIG_BTRFS_FS=y
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_ASSERT=y
CONFIG_NILFS2_FS=y
# CONFIG_FS_DAX is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_QUOTA=y
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
# CONFIG_PRINT_QUOTA_WARNING is not set
CONFIG_QUOTA_DEBUG=y
CONFIG_QUOTA_TREE=m
CONFIG_QFMT_V1=y
# CONFIG_QFMT_V2 is not set
CONFIG_QUOTACTL=y
CONFIG_QUOTACTL_COMPAT=y
CONFIG_AUTOFS4_FS=m
CONFIG_FUSE_FS=m
# CONFIG_CUSE is not set
CONFIG_OVERLAY_FS=y

#
# Caches
#
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
CONFIG_FSCACHE_OBJECT_LIST=y
CONFIG_CACHEFILES=m
CONFIG_CACHEFILES_DEBUG=y
# CONFIG_CACHEFILES_HISTOGRAM is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
CONFIG_NTFS_FS=m
# CONFIG_NTFS_DEBUG is not set
# CONFIG_NTFS_RW is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
# CONFIG_PROC_VMCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
CONFIG_TMPFS_XATTR=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m
CONFIG_MISC_FILESYSTEMS=y
CONFIG_ADFS_FS=y
CONFIG_ADFS_FS_RW=y
# CONFIG_AFFS_FS is not set
CONFIG_ECRYPT_FS=y
# CONFIG_ECRYPT_FS_MESSAGING is not set
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_HFSPLUS_FS_POSIX_ACL=y
# CONFIG_BEFS_FS is not set
CONFIG_BFS_FS=m
CONFIG_EFS_FS=y
# CONFIG_JFFS2_FS is not set
CONFIG_LOGFS=y
CONFIG_CRAMFS=m
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_FILE_CACHE=y
# CONFIG_SQUASHFS_FILE_DIRECT is not set
CONFIG_SQUASHFS_DECOMP_SINGLE=y
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
# CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU is not set
CONFIG_SQUASHFS_XATTR=y
# CONFIG_SQUASHFS_ZLIB is not set
# CONFIG_SQUASHFS_LZ4 is not set
# CONFIG_SQUASHFS_LZO is not set
# CONFIG_SQUASHFS_XZ is not set
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
CONFIG_SQUASHFS_EMBEDDED=y
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
CONFIG_VXFS_FS=y
CONFIG_MINIX_FS=y
CONFIG_OMFS_FS=y
# CONFIG_HPFS_FS is not set
CONFIG_QNX4FS_FS=y
CONFIG_QNX6FS_FS=y
CONFIG_QNX6FS_DEBUG=y
CONFIG_ROMFS_FS=y
CONFIG_ROMFS_BACKED_BY_BLOCK=y
# CONFIG_ROMFS_BACKED_BY_MTD is not set
# CONFIG_ROMFS_BACKED_BY_BOTH is not set
CONFIG_ROMFS_ON_BLOCK=y
CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_PMSG=y
CONFIG_PSTORE_FTRACE=y
# CONFIG_PSTORE_RAM is not set
CONFIG_SYSV_FS=m
# CONFIG_UFS_FS is not set
CONFIG_EXOFS_FS=m
CONFIG_EXOFS_DEBUG=y
# CONFIG_F2FS_FS is not set
CONFIG_ORE=m
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V2=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
CONFIG_NFS_V4=y
CONFIG_NFS_SWAP=y
# CONFIG_NFS_V4_1 is not set
# CONFIG_ROOT_NFS is not set
CONFIG_NFS_USE_LEGACY_DNS=y
CONFIG_NFS_DEBUG=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
CONFIG_NFSD_V4=y
# CONFIG_NFSD_PNFS is not set
# CONFIG_NFSD_FAULT_INJECTION is not set
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=y
CONFIG_SUNRPC_SWAP=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
CONFIG_SUNRPC_DEBUG=y
# CONFIG_CEPH_FS is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
CONFIG_CIFS_WEAK_PW_HASH=y
CONFIG_CIFS_UPCALL=y
# CONFIG_CIFS_XATTR is not set
# CONFIG_CIFS_DEBUG is not set
# CONFIG_CIFS_DFS_UPCALL is not set
CONFIG_CIFS_SMB2=y
# CONFIG_CIFS_FSCACHE is not set
CONFIG_NCP_FS=y
CONFIG_NCPFS_PACKET_SIGNING=y
# CONFIG_NCPFS_IOCTL_LOCKING is not set
# CONFIG_NCPFS_STRONG is not set
CONFIG_NCPFS_NFS_NS=y
CONFIG_NCPFS_OS2_NS=y
CONFIG_NCPFS_SMALLDOS=y
# CONFIG_NCPFS_NLS is not set
# CONFIG_NCPFS_EXTRAS is not set
# CONFIG_CODA_FS is not set
CONFIG_AFS_FS=m
CONFIG_AFS_DEBUG=y
# CONFIG_AFS_FSCACHE is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
# CONFIG_NLS_CODEPAGE_737 is not set
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=y
CONFIG_NLS_CODEPAGE_857=y
# CONFIG_NLS_CODEPAGE_860 is not set
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=y
# CONFIG_NLS_CODEPAGE_864 is not set
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=y
CONFIG_NLS_CODEPAGE_950=y
CONFIG_NLS_CODEPAGE_932=y
# CONFIG_NLS_CODEPAGE_949 is not set
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=y
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=m
# CONFIG_NLS_ISO8859_1 is not set
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=y
CONFIG_NLS_ISO8859_13=y
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=y
# CONFIG_NLS_KOI8_R is not set
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_MAC_ROMAN=m
CONFIG_NLS_MAC_CELTIC=m
CONFIG_NLS_MAC_CENTEURO=y
CONFIG_NLS_MAC_CROATIAN=y
CONFIG_NLS_MAC_CYRILLIC=y
# CONFIG_NLS_MAC_GAELIC is not set
CONFIG_NLS_MAC_GREEK=m
CONFIG_NLS_MAC_ICELAND=y
CONFIG_NLS_MAC_INUIT=y
# CONFIG_NLS_MAC_ROMANIAN is not set
CONFIG_NLS_MAC_TURKISH=m
CONFIG_NLS_UTF8=y
CONFIG_DLM=m
# CONFIG_DLM_DEBUG is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_BOOT_PRINTK_DELAY is not set
CONFIG_DYNAMIC_DEBUG=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_GDB_SCRIPTS=y
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=1024
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_READABLE_ASM is not set
CONFIG_UNUSED_SYMBOLS=y
CONFIG_PAGE_OWNER=y
CONFIG_DEBUG_FS=y
CONFIG_HEADERS_CHECK=y
# CONFIG_DEBUG_SECTION_MISMATCH is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_DEBUG_KERNEL=y

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_DEBUG_OBJECTS=y
# CONFIG_DEBUG_OBJECTS_SELFTEST is not set
CONFIG_DEBUG_OBJECTS_FREE=y
# CONFIG_DEBUG_OBJECTS_TIMERS is not set
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
# CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER is not set
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VIRTUAL is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACKOVERFLOW is not set
CONFIG_HAVE_ARCH_KMEMCHECK=y
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_KASAN=y
CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
CONFIG_TEST_KASAN=m
CONFIG_DEBUG_SHIRQ=y

#
# Debug Lockups and Hangs
#
CONFIG_LOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=0
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_SCHED_STACK_END_CHECK=y
# CONFIG_TIMER_STATS is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_LOCKDEP is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_STACKTRACE=y
CONFIG_DEBUG_KOBJECT=y
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_LIST is not set
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_SG=y
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
# CONFIG_PROVE_RCU_REPEATEDLY is not set
CONFIG_SPARSE_RCU_POINTER=y
CONFIG_TORTURE_TEST=m
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_TRACE is not set
CONFIG_DEBUG_BLOCK_EXT_DEVT=y
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
# CONFIG_FAULT_INJECTION is not set
CONFIG_LATENCYTOP=y
CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS=y
# CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP=y
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
CONFIG_STACK_TRACER=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENT=y
CONFIG_UPROBE_EVENT=y
CONFIG_PROBE_EVENTS=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
# CONFIG_FUNCTION_PROFILER is not set
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
CONFIG_MMIOTRACE=y
CONFIG_MMIOTRACE_TEST=m
# CONFIG_TRACEPOINT_BENCHMARK is not set
CONFIG_RING_BUFFER_BENCHMARK=m
CONFIG_RING_BUFFER_STARTUP_TEST=y

#
# Runtime Testing
#
CONFIG_LKDTM=y
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_RBTREE_TEST is not set
CONFIG_INTERVAL_TREE_TEST=m
# CONFIG_PERCPU_TEST is not set
# CONFIG_ATOMIC64_SELFTEST is not set
# CONFIG_ASYNC_RAID6_TEST is not set
CONFIG_TEST_HEXDUMP=m
CONFIG_TEST_STRING_HELPERS=m
# CONFIG_TEST_KSTRTOX is not set
CONFIG_TEST_RHASHTABLE=y
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
CONFIG_BUILD_DOCSRC=y
CONFIG_DMA_API_DEBUG=y
CONFIG_TEST_LKM=m
CONFIG_TEST_USER_COPY=m
CONFIG_TEST_BPF=m
CONFIG_TEST_FIRMWARE=y
CONFIG_TEST_UDELAY=m
CONFIG_MEMTEST=y
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_STRICT_DEVMEM=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
CONFIG_X86_PTDUMP=y
CONFIG_DEBUG_RODATA=y
CONFIG_DEBUG_RODATA_TEST=y
CONFIG_DEBUG_SET_MODULE_RONX=y
CONFIG_DEBUG_NX_TEST=m
CONFIG_DOUBLEFAULT=y
# CONFIG_DEBUG_TLBFLUSH is not set
# CONFIG_IOMMU_DEBUG is not set
# CONFIG_IOMMU_STRESS is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_X86_DECODER_SELFTEST=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_CPA_DEBUG is not set
CONFIG_OPTIMIZE_INLINING=y
CONFIG_DEBUG_NMI_SELFTEST=y
# CONFIG_X86_DEBUG_STATIC_CPU_HAS is not set

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_PERSISTENT_KEYRINGS is not set
# CONFIG_BIG_KEYS is not set
# CONFIG_ENCRYPTED_KEYS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
# CONFIG_SECURITY is not set
CONFIG_SECURITYFS=y
CONFIG_INTEL_TXT=y
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=m
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=y
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_AUTHENC=y
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_ABLK_HELPER=y
CONFIG_CRYPTO_GLUE_HELPER_X86=y

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_SEQIV=y

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=y
CONFIG_CRYPTO_PCBC=y
CONFIG_CRYPTO_XTS=y

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=y
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=y
# CONFIG_CRYPTO_CRC32 is not set
CONFIG_CRYPTO_CRC32_PCLMUL=y
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
# CONFIG_CRYPTO_RMD160 is not set
CONFIG_CRYPTO_RMD256=y
CONFIG_CRYPTO_RMD320=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA1_SSSE3=m
CONFIG_CRYPTO_SHA256_SSSE3=y
CONFIG_CRYPTO_SHA512_SSSE3=m
# CONFIG_CRYPTO_SHA1_MB is not set
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_TGR192=y
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_X86_64=m
# CONFIG_CRYPTO_AES_NI_INTEL is not set
# CONFIG_CRYPTO_ANUBIS is not set
CONFIG_CRYPTO_ARC4=y
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=y
CONFIG_CRYPTO_BLOWFISH_X86_64=y
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAMELLIA_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
# CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64 is not set
CONFIG_CRYPTO_CAST_COMMON=y
CONFIG_CRYPTO_CAST5=y
CONFIG_CRYPTO_CAST5_AVX_X86_64=m
CONFIG_CRYPTO_CAST6=y
CONFIG_CRYPTO_CAST6_AVX_X86_64=m
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_DES3_EDE_X86_64=m
CONFIG_CRYPTO_FCRYPT=y
# CONFIG_CRYPTO_KHAZAD is not set
CONFIG_CRYPTO_SALSA20=m
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
CONFIG_CRYPTO_SEED=y
CONFIG_CRYPTO_SERPENT=y
CONFIG_CRYPTO_SERPENT_SSE2_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX_X86_64=y
CONFIG_CRYPTO_SERPENT_AVX2_X86_64=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_TWOFISH_COMMON=y
CONFIG_CRYPTO_TWOFISH_X86_64=y
CONFIG_CRYPTO_TWOFISH_X86_64_3WAY=y
CONFIG_CRYPTO_TWOFISH_AVX_X86_64=y

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_ZLIB=m
CONFIG_CRYPTO_LZO=y
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
# CONFIG_CRYPTO_DRBG_MENU is not set
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_USER_API_SKCIPHER=m
CONFIG_CRYPTO_USER_API_RNG=y
CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_PADLOCK=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
CONFIG_CRYPTO_DEV_PADLOCK_SHA=m
# CONFIG_CRYPTO_DEV_CCP is not set
CONFIG_CRYPTO_DEV_QAT=y
CONFIG_CRYPTO_DEV_QAT_DH895xCC=y
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_PUBLIC_KEY_ALGO_RSA=y
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS7_MESSAGE_PARSER is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
CONFIG_KVM_COMPAT=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_MMU_AUDIT=y
# CONFIG_KVM_DEVICE_ASSIGNMENT is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
CONFIG_BITREVERSE=y
# CONFIG_HAVE_ARCH_BITREVERSE is not set
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_IO=y
CONFIG_PERCPU_RWSEM=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
CONFIG_CRC32_SELFTEST=y
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=y
# CONFIG_CRC8 is not set
# CONFIG_AUDIT_ARCH_COMPAT_GENERIC is not set
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=m
CONFIG_LZ4HC_COMPRESS=m
CONFIG_LZ4_DECOMPRESS=m
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
CONFIG_XZ_DEC_TEST=m
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_BCH=y
CONFIG_BCH_CONST_PARAMS=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_BTREE=y
CONFIG_INTERVAL_TREE=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
CONFIG_LRU_CACHE=m
CONFIG_AVERAGE=y
CONFIG_CLZ_TAB=y
CONFIG_CORDIC=y
# CONFIG_DDR is not set
CONFIG_MPILIB=y
CONFIG_OID_REGISTRY=y
CONFIG_FONT_SUPPORT=m
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_FONT_6x11=y
# CONFIG_FONT_7x14 is not set
# CONFIG_FONT_PEARL_8x8 is not set
# CONFIG_FONT_ACORN_8x8 is not set
# CONFIG_FONT_MINI_4x6 is not set
CONFIG_FONT_6x10=y
# CONFIG_FONT_SUN8x16 is not set
# CONFIG_FONT_SUN12x22 is not set
# CONFIG_FONT_10x18 is not set
CONFIG_ARCH_HAS_SG_CHAIN=y

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [PATCH 06/13] mm: meminit: Inline some helper functions
  2015-05-04  8:33     ` Michal Hocko
@ 2015-05-04  8:38       ` Michal Hocko
  -1 siblings, 0 replies; 168+ messages in thread
From: Michal Hocko @ 2015-05-04  8:38 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Waiman Long,
	Scott Norton, Daniel J Blueman, Linux-MM, LKML

I have taken this into my mm git tree for now. I guess Andrew will fold
it into the original patch later.

---
From 986279c465b2f513bcbb91ba7010cb2184d1bb7c Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Mon, 4 May 2015 10:35:36 +0200
Subject: [PATCH] mm-meminit-inline-some-helper-functions-fix2.patch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

mm/page_alloc.c: In function ‘deferred_init_memmap’:
mm/page_alloc.c:1135:4: error: implicit declaration of function ‘meminit_pfn_in_nid’ [-Werror=implicit-function-declaration]
    if (!meminit_pfn_in_nid(pfn, nid, &nid_init_state)) {
    ^

because randconfig decided to disable CONFIG_NODES_SPAN_OTHER_NODES.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/page_alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e0257debce0..a48128d882d8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1044,6 +1044,11 @@ static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 {
 	return true;
 }
+static inline bool __meminit meminit_pfn_in_nid(unsigned long pfn, int node,
+					struct mminit_pfnnid_cache *state)
+{
+	return true;
+}
 #endif
 
 
-- 
2.1.4

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-02  0:09       ` Waiman Long
@ 2015-05-04 21:30         ` Andrew Morton
  -1 siblings, 0 replies; 168+ messages in thread
From: Andrew Morton @ 2015-05-04 21:30 UTC (permalink / raw)
  To: Waiman Long
  Cc: Mel Gorman, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Fri, 01 May 2015 20:09:21 -0400 Waiman Long <waiman.long@hp.com> wrote:

> On 05/01/2015 06:02 PM, Waiman Long wrote:
> >
> > Bad news!
> >
> > I tried your patch on a 24-TB DragonHawk and got an out of memory 
> > panic. The kernel log messages were:
>
> ...
>
> > [   81.360287]  [<ffffffff8151b0c9>] dump_stack+0x68/0x77
> > [   81.365942]  [<ffffffff8151ae1e>] panic+0xb9/0x219
> > [   81.371213]  [<ffffffff810785c3>] ? __blocking_notifier_call_chain+0x63/0x80
> > [   81.378971]  [<ffffffff811384ce>] __out_of_memory+0x34e/0x350
> > [   81.385292]  [<ffffffff811385ee>] out_of_memory+0x5e/0x90
> > [   81.391230]  [<ffffffff8113ce9e>] __alloc_pages_slowpath+0x6be/0x740
> > [   81.398219]  [<ffffffff8113d15c>] __alloc_pages_nodemask+0x23c/0x250
> > [   81.405212]  [<ffffffff81186346>] kmem_getpages+0x56/0x110
> > [   81.411246]  [<ffffffff81187f44>] fallback_alloc+0x164/0x200
> > [   81.417474]  [<ffffffff81187cfd>] ____cache_alloc_node+0x8d/0x170
> > [   81.424179]  [<ffffffff811887bb>] kmem_cache_alloc_trace+0x17b/0x240
> > [   81.431169]  [<ffffffff813d5f3a>] init_memory_block+0x3a/0x110
> > [   81.437586]  [<ffffffff81b5f687>] memory_dev_init+0xd7/0x13d
> > [   81.443810]  [<ffffffff81b5f2af>] driver_init+0x2f/0x37
> > [   81.449556]  [<ffffffff81b1599b>] do_basic_setup+0x29/0xd5
> > [   81.455597]  [<ffffffff81b372c4>] ? sched_init_smp+0x140/0x147
> > [   81.462015]  [<ffffffff81b15c55>] kernel_init_freeable+0x20e/0x297
> > [   81.468815]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
> > [   81.474565]  [<ffffffff81512ea9>] kernel_init+0x9/0xf0
> > [   81.480216]  [<ffffffff8151f788>] ret_from_fork+0x58/0x90
> > [   81.486156]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
> > [   81.492350] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
> > [   81.492350]
> >
> > -Longman
> 
> I increased the pre-initialized memory per node in update_defer_init() 
> of mm/page_alloc.c from 2G to 4G. Now I am able to boot the 24-TB 
> machine without error. The 12-TB has 0.75TB/node, while the 24-TB 
> machine has 1.5TB/node. I would suggest something like pre-initializing 
> 1G per 0.25TB/node. In this way, it will scale properly with the memory 
> size.

We're using more than 2G before we've even completed do_basic_setup()? 
Where did it all go?

> Before the patch, the boot time from elilo prompt to ssh login was 694s. 
> After the patch, the boot up time was 346s, a saving of 348s (about 50%).

Having to guesstimate the amount of memory which is needed for a
successful boot will be painful.  Any number we choose will be wrong
99% of the time.

If the kswapd threads have started, all we need to do is to wait: take
a little nap in the allocator's page==NULL slowpath.

I'm not seeing any reason why we can't start kswapd much earlier -
right at the start of do_basic_setup()?

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-04 21:30         ` Andrew Morton
@ 2015-05-05  3:32           ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-05  3:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 05/04/2015 05:30 PM, Andrew Morton wrote:
> On Fri, 01 May 2015 20:09:21 -0400 Waiman Long <waiman.long@hp.com> wrote:
>
>> On 05/01/2015 06:02 PM, Waiman Long wrote:
>>> Bad news!
>>>
>>> I tried your patch on a 24-TB DragonHawk and got an out of memory
>>> panic. The kernel log messages were:
>> ...
>>
>>> [   81.360287]  [<ffffffff8151b0c9>] dump_stack+0x68/0x77
>>> [   81.365942]  [<ffffffff8151ae1e>] panic+0xb9/0x219
>>> [   81.371213]  [<ffffffff810785c3>] ? __blocking_notifier_call_chain+0x63/0x80
>>> [   81.378971]  [<ffffffff811384ce>] __out_of_memory+0x34e/0x350
>>> [   81.385292]  [<ffffffff811385ee>] out_of_memory+0x5e/0x90
>>> [   81.391230]  [<ffffffff8113ce9e>] __alloc_pages_slowpath+0x6be/0x740
>>> [   81.398219]  [<ffffffff8113d15c>] __alloc_pages_nodemask+0x23c/0x250
>>> [   81.405212]  [<ffffffff81186346>] kmem_getpages+0x56/0x110
>>> [   81.411246]  [<ffffffff81187f44>] fallback_alloc+0x164/0x200
>>> [   81.417474]  [<ffffffff81187cfd>] ____cache_alloc_node+0x8d/0x170
>>> [   81.424179]  [<ffffffff811887bb>] kmem_cache_alloc_trace+0x17b/0x240
>>> [   81.431169]  [<ffffffff813d5f3a>] init_memory_block+0x3a/0x110
>>> [   81.437586]  [<ffffffff81b5f687>] memory_dev_init+0xd7/0x13d
>>> [   81.443810]  [<ffffffff81b5f2af>] driver_init+0x2f/0x37
>>> [   81.449556]  [<ffffffff81b1599b>] do_basic_setup+0x29/0xd5
>>> [   81.455597]  [<ffffffff81b372c4>] ? sched_init_smp+0x140/0x147
>>> [   81.462015]  [<ffffffff81b15c55>] kernel_init_freeable+0x20e/0x297
>>> [   81.468815]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
>>> [   81.474565]  [<ffffffff81512ea9>] kernel_init+0x9/0xf0
>>> [   81.480216]  [<ffffffff8151f788>] ret_from_fork+0x58/0x90
>>> [   81.486156]  [<ffffffff81512ea0>] ? rest_init+0x80/0x80
>>> [   81.492350] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
>>> [   81.492350]
>>>
>>> -Longman
>> I increased the pre-initialized memory per node in update_defer_init()
>> of mm/page_alloc.c from 2G to 4G. Now I am able to boot the 24-TB
>> machine without error. The 12-TB has 0.75TB/node, while the 24-TB
>> machine has 1.5TB/node. I would suggest something like pre-initializing
>> 1G per 0.25TB/node. In this way, it will scale properly with the memory
>> size.
> We're using more than 2G before we've even completed do_basic_setup()?
> Where did it all go?

I think it may have gone to the hash table allocations, for example:

[    2.367440] Dentry cache hash table entries: 2147483648 (order: 22, 17179869184 bytes)
[   11.522768] Inode-cache hash table entries: 2147483648 (order: 22, 17179869184 bytes)
[   18.598513] Mount-cache hash table entries: 67108864 (order: 17, 536870912 bytes)
[   18.667485] Mountpoint-cache hash table entries: 67108864 (order: 17, 536870912 bytes)

The sizes of those hash tables scale roughly linearly with the total
amount of memory available.
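
As a rough cross-check of the figures in those log lines, the arithmetic
can be sketched in userspace C. This is only a sketch under stated
assumptions: 4 KiB pages are assumed, the helper names are made up for
illustration, and the count of 16 nodes is inferred from 24TB at
1.5TB/node. It says nothing about where the tables were actually placed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT of 12), as on x86-64. */
#define EX_PAGE_SHIFT 12

/* Bytes covered by a single allocation of the given page order. */
static uint64_t order_to_bytes(unsigned int order)
{
	assert(order + EX_PAGE_SHIFT < 64);
	return (uint64_t)1 << (order + EX_PAGE_SHIFT);
}

/* Bytes per hash bucket implied by one log line. */
static uint64_t bytes_per_entry(uint64_t table_bytes, uint64_t entries)
{
	assert(entries != 0);
	return table_bytes / entries;
}

/* Combined size of the four tables quoted above: two order-22, two order-17. */
static uint64_t total_hash_bytes(void)
{
	return 2 * order_to_bytes(22) + 2 * order_to_bytes(17);
}

/* Highest-zone memory initialised up front across all nodes, in bytes. */
static uint64_t preinit_bytes(unsigned int nr_nodes, uint64_t gib_per_node)
{
	return (uint64_t)nr_nodes * (gib_per_node << 30);
}
```

With these helpers the four tables add up to 33G, while
preinit_bytes(16, 2) is 32G and preinit_bytes(16, 4) is 64G. That would
be consistent with the 2G setting running out and the 4G setting
booting, but it is not proof that the hashes were the only consumer.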

>> Before the patch, the boot time from elilo prompt to ssh login was 694s.
>> After the patch, the boot up time was 346s, a saving of 348s (about 50%).
> Having to guesstimate the amount of memory which is needed for a
> successful boot will be painful.  Any number we choose will be wrong
> 99% of the time.
>
> If the kswapd threads have started, all we need to do is to wait: take
> a little nap in the allocator's page==NULL slowpath.
>
> I'm not seeing any reason why we can't start kswapd much earlier -
> right at the start of do_basic_setup()?

I think we can; we just have to change the hash table allocator to do that.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-04 21:30         ` Andrew Morton
@ 2015-05-05 10:45           ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-05 10:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
> > Before the patch, the boot time from elilo prompt to ssh login was 694s. 
> > After the patch, the boot up time was 346s, a saving of 348s (about 50%).
> 
> Having to guesstimate the amount of memory which is needed for a
> successful boot will be painful.  Any number we choose will be wrong
> 99% of the time.
> 
> If the kswapd threads have started, all we need to do is to wait: take
> a little nap in the allocator's page==NULL slowpath.
> 
> I'm not seeing any reason why we can't start kswapd much earlier -
> right at the start of do_basic_setup()?

It doesn't even have to be kswapd, it just should be a thread pinned to
a node. The difficulty is that dealing with the system hashes means the
initialisation has to happen before vfs_caches_init_early() when there is
no scheduler. Those allocations could be delayed further but then there is
the possibility that the allocations would not be contiguous and they'd
have to rely on CMA to make the attempt. That potentially alters the
performance of the large system hashes at run time.

We can scale the amount initialised with memory size relatively easily.
This boots on the same 1TB machine I was testing before but that is
hardly a surprise.

---8<---
mm: meminit: Take into account that large system caches scale linearly with memory

Waiman Long reported a 24TB machine triggered an OOM as parallel memory
initialisation deferred too much memory for initialisation. The likely
consumer of this memory was large system hashes that scale with memory
size. This patch initialises at least 2G per node but scales the amount
initialised for larger systems.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 598f78d6544c..f7cc6c9fb909 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -266,15 +266,16 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
  */
 static inline bool update_defer_init(pg_data_t *pgdat,
 				unsigned long pfn, unsigned long zone_end,
+				unsigned long max_initialise,
 				unsigned long *nr_initialised)
 {
 	/* Always populate low zones for address-contrained allocations */
 	if (zone_end < pgdat_end_pfn(pgdat))
 		return true;
 
-	/* Initialise at least 2G of the highest zone */
+	/* Initialise at least the requested amount in the highest zone */
 	(*nr_initialised)++;
-	if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
+	if ((*nr_initialised > max_initialise) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
 		pgdat->first_deferred_pfn = pfn;
 		return false;
@@ -299,6 +300,7 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
 
 static inline bool update_defer_init(pg_data_t *pgdat,
 				unsigned long pfn, unsigned long zone_end,
+				unsigned long max_initialise,
 				unsigned long *nr_initialised)
 {
 	return true;
@@ -4457,11 +4459,19 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	unsigned long end_pfn = start_pfn + size;
 	unsigned long pfn;
 	struct zone *z;
+	unsigned long max_initialise;
 	unsigned long nr_initialised = 0;
 
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
 
+	/*
+	 * Initialise at least 2G of a node but also take into account that
+	 * two large system hashes that can take up an 8th of memory.
+	 */
+	max_initialise = min(2UL << (30 - PAGE_SHIFT),
+			(pgdat->node_spanned_pages >> 3));
+
 	z = &pgdat->node_zones[zone];
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		/*
@@ -4475,6 +4485,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			if (!early_pfn_in_nid(pfn, nid))
 				continue;
 			if (!update_defer_init(pgdat, pfn, end_pfn,
+						max_initialise,
 						&nr_initialised))
 				break;
 		}

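To see what threshold the new calculation yields for a given node size,
the expression can be modelled in userspace. This is a sketch rather
than kernel code: 4 KiB pages are assumed and the function names are
hypothetical. As posted, min() can never return more than the 2G
constant, so the second function shows the max() variant that a literal
reading of "at least 2G" would correspond to.

```c
#include <assert.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT of 12), as on x86-64. */
#define EX_PAGE_SHIFT 12

/* Convert a size in GiB to a page count. */
static unsigned long gib_to_pages(unsigned long gib)
{
	/* Guard against overflow of the shift below. */
	assert(gib <= (~0UL >> (30 - EX_PAGE_SHIFT)));
	return gib << (30 - EX_PAGE_SHIFT);
}

/* Threshold as posted: min(2G in pages, node_spanned_pages >> 3). */
static unsigned long threshold_as_posted(unsigned long node_spanned_pages)
{
	unsigned long fixed = 2UL << (30 - EX_PAGE_SHIFT);
	unsigned long scaled = node_spanned_pages >> 3;

	return scaled < fixed ? scaled : fixed;
}

/* Hypothetical variant that takes "at least 2G" literally: max() instead. */
static unsigned long threshold_at_least(unsigned long node_spanned_pages)
{
	unsigned long fixed = 2UL << (30 - EX_PAGE_SHIFT);
	unsigned long scaled = node_spanned_pages >> 3;

	return scaled > fixed ? scaled : fixed;
}
```

For a 1.5TB node (1536G), threshold_as_posted() stays at the 2G
constant, whereas threshold_at_least() gives 192G. For an 8G node the
two give 1G and 2G respectively.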

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-05 10:45           ` Mel Gorman
@ 2015-05-05 13:55             ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-05 13:55 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 05/05/2015 06:45 AM, Mel Gorman wrote:
> On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
>>> Before the patch, the boot time from elilo prompt to ssh login was 694s.
>>> After the patch, the boot up time was 346s, a saving of 348s (about 50%).
>> Having to guesstimate the amount of memory which is needed for a
>> successful boot will be painful.  Any number we choose will be wrong
>> 99% of the time.
>>
>> If the kswapd threads have started, all we need to do is to wait: take
>> a little nap in the allocator's page==NULL slowpath.
>>
>> I'm not seeing any reason why we can't start kswapd much earlier -
>> right at the start of do_basic_setup()?
> It doesn't even have to be kswapd, it just should be a thread pinned to
> a node. The difficulty is that dealing with the system hashes means the
> initialisation has to happen before vfs_caches_init_early() when there is
> no scheduler. Those allocations could be delayed further but then there is
> the possibility that the allocations would not be contiguous and they'd
> have to rely on CMA to make the attempt. That potentially alters the
> performance of the large system hashes at run time.
>
> We can scale the amount initialised with memory size relatively easily.
> This boots on the same 1TB machine I was testing before but that is
> hardly a surprise.
>
> ---8<---
> mm: meminit: Take into account that large system caches scale linearly with memory
>
> Waiman Long reported a 24TB machine triggered an OOM as parallel memory
> initialisation deferred too much memory for initialisation. The likely
> consumer of this memory was large system hashes that scale with memory
> size. This patch initialises at least 2G per node but scales the amount
> initialised for larger systems.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>   mm/page_alloc.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 598f78d6544c..f7cc6c9fb909 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -266,15 +266,16 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
>    */
>   static inline bool update_defer_init(pg_data_t *pgdat,
>   				unsigned long pfn, unsigned long zone_end,
> +				unsigned long max_initialise,
>   				unsigned long *nr_initialised)
>   {
>   	/* Always populate low zones for address-contrained allocations */
>   	if (zone_end < pgdat_end_pfn(pgdat))
>   		return true;
>
> -	/* Initialise at least 2G of the highest zone */
> +	/* Initialise at least the requested amount in the highest zone */
>   	(*nr_initialised)++;
> -	if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
> +	if ((*nr_initialised > max_initialise) &&
>   	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
>   		pgdat->first_deferred_pfn = pfn;
>   		return false;
> @@ -299,6 +300,7 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
>
>   static inline bool update_defer_init(pg_data_t *pgdat,
>   				unsigned long pfn, unsigned long zone_end,
> +				unsigned long max_initialise,
>   				unsigned long *nr_initialised)
>   {
>   	return true;
> @@ -4457,11 +4459,19 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>   	unsigned long end_pfn = start_pfn + size;
>   	unsigned long pfn;
>   	struct zone *z;
> +	unsigned long max_initialise;
>   	unsigned long nr_initialised = 0;
>
>   	if (highest_memmap_pfn < end_pfn - 1)
>   		highest_memmap_pfn = end_pfn - 1;
>
> +	/*
> +	 * Initialise at least 2G of a node but also take into account that
> +	 * two large system hashes that can take up an 8th of memory.
> +	 */
> +	max_initialise = min(2UL << (30 - PAGE_SHIFT),
> +			(pgdat->node_spanned_pages >> 3));
> +

I think you may be pre-allocating too much memory here. On the 24-TB
machine, the dentry and inode hash tables were 16G each, so the ratio
is about 32G/24T = 0.13%. I think a shift factor of (>> 8), which is
about 0.39%, should be more than enough. For the 24TB machine, that
means 96+4G of preallocated memory, which is even more than the 64+4G
in the modified kernel that I used. At the same time, I think we can
also lower the minimum to 1G or even 0.5G for better performance on
systems that have many CPUs but not as much memory per node.
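
The percentages and totals above can be reproduced with a few lines of
integer arithmetic. This is a sketch with made-up helper names. The 16
nodes of 1536G each are inferred from the 24TB at 1.5TB/node figures
quoted earlier in the thread, and the "+4G" term is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Ratio part/whole in hundredths of a percent, rounded down. */
static uint64_t ratio_centipercent(uint64_t part, uint64_t whole)
{
	assert(whole != 0);
	return part * 10000 / whole;
}

/* Amount left after the right shift applied to a node's size, in GiB. */
static uint64_t scaled_gib(uint64_t node_gib, unsigned int shift)
{
	assert(shift < 64);
	return node_gib >> shift;
}
```

ratio_centipercent(32, 24 * 1024) gives 13 (0.13%) and
ratio_centipercent(1, 256) gives 39 (0.39%). Per node, scaled_gib(1536, 8)
is 6G, or 96G over 16 nodes, whereas scaled_gib(1536, 3) is 192G per
node, or 3T in total.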

Cheers,
Longman


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-05 13:55             ` Waiman Long
@ 2015-05-05 14:31               ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-05 14:31 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Tue, May 05, 2015 at 09:55:52AM -0400, Waiman Long wrote:
> On 05/05/2015 06:45 AM, Mel Gorman wrote:
> >On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
> >>>Before the patch, the boot time from elilo prompt to ssh login was 694s.
> >>>After the patch, the boot up time was 346s, a saving of 348s (about 50%).
> >>Having to guesstimate the amount of memory which is needed for a
> >>successful boot will be painful.  Any number we choose will be wrong
> >>99% of the time.
> >>
> >>If the kswapd threads have started, all we need to do is to wait: take
> >>a little nap in the allocator's page==NULL slowpath.
> >>
> >>I'm not seeing any reason why we can't start kswapd much earlier -
> >>right at the start of do_basic_setup()?
> >It doesn't even have to be kswapd, it just should be a thread pinned to
> >a node. The difficulty is that dealing with the system hashes means the
> >initialisation has to happen before vfs_caches_init_early() when there is
> >no scheduler. Those allocations could be delayed further but then there is
> >the possibility that the allocations would not be contiguous and they'd
> >have to rely on CMA to make the attempt. That potentially alters the
> >performance of the large system hashes at run time.
> >
> >We can scale the amount initialised with memory size relatively easily.
> >This boots on the same 1TB machine I was testing before but that is
> >hardly a surprise.
> >
> >---8<---
> >mm: meminit: Take into account that large system caches scale linearly with memory
> >
> >Waiman Long reported a 24TB machine triggered an OOM as parallel memory
> >initialisation deferred too much memory for initialisation. The likely
> >consumer of this memory was large system hashes that scale with memory
> >size. This patch initialises at least 2G per node but scales the amount
> >initialised for larger systems.
> >
> >Signed-off-by: Mel Gorman <mgorman@suse.de>
> >---
> >  mm/page_alloc.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> >
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 598f78d6544c..f7cc6c9fb909 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -266,15 +266,16 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
> >   */
> >  static inline bool update_defer_init(pg_data_t *pgdat,
> >  				unsigned long pfn, unsigned long zone_end,
> >+				unsigned long max_initialise,
> >  				unsigned long *nr_initialised)
> >  {
> >  	/* Always populate low zones for address-contrained allocations */
> >  	if (zone_end < pgdat_end_pfn(pgdat))
> >  		return true;
> >
> >-	/* Initialise at least 2G of the highest zone */
> >+	/* Initialise at least the requested amount in the highest zone */
> >  	(*nr_initialised)++;
> >-	if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
> >+	if ((*nr_initialised > max_initialise) &&
> >  	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
> >  		pgdat->first_deferred_pfn = pfn;
> >  		return false;
> >@@ -299,6 +300,7 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
> >
> >  static inline bool update_defer_init(pg_data_t *pgdat,
> >  				unsigned long pfn, unsigned long zone_end,
> >+				unsigned long max_initialise,
> >  				unsigned long *nr_initialised)
> >  {
> >  	return true;
> >@@ -4457,11 +4459,19 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >  	unsigned long end_pfn = start_pfn + size;
> >  	unsigned long pfn;
> >  	struct zone *z;
> >+	unsigned long max_initialise;
> >  	unsigned long nr_initialised = 0;
> >
> >  	if (highest_memmap_pfn < end_pfn - 1)
> >  		highest_memmap_pfn = end_pfn - 1;
> >
> >+	/*
> >+	 * Initialise at least 2G of a node but also take into account that
> >+	 * two large system hashes that can take up an 8th of memory.
> >+	 */
> >+	max_initialise = min(2UL << (30 - PAGE_SHIFT),
> >+			(pgdat->node_spanned_pages >> 3));
> >+
> 
> I think you may be pre-allocating too much memory here. On the 24-TB
> machine, the size of the dentry and inode hash tables were 16G each.
> So the ratio is about 32G/24T = 0.13%. I think a shift
> factor of (>> 8) which is about 0.39% should be more than enough.

I was taking the most pessimistic value possible to match where those
hashes currently get allocated from so that the locality does not change
after the series is applied. Can you try both (>> 3) and (>> 8) and see
whether both work and, if so, what the timings are?

> For the 24TB machine, that means a preallocated memory of 96+4G
> which should be even more than the 64+4G in the modified kernel that
> I used. At the same time, I think we can also set the minimum to 1G
> or even 0.5G for better performance for systems that have many CPUs,
> but not as much memory per node.
> 

I think the benefit there is going to be marginal except maybe on machines
where remote accesses are extremely costly.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
@ 2015-05-05 14:31               ` Mel Gorman
  0 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-05 14:31 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Tue, May 05, 2015 at 09:55:52AM -0400, Waiman Long wrote:
> On 05/05/2015 06:45 AM, Mel Gorman wrote:
> >On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
> >>>Before the patch, the boot time from elilo prompt to ssh login was 694s.
> >>>After the patch, the boot up time was 346s, a saving of 348s (about 50%).
> >>Having to guesstimate the amount of memory which is needed for a
> >>successful boot will be painful.  Any number we choose will be wrong
> >>99% of the time.
> >>
> >>If the kswapd threads have started, all we need to do is to wait: take
> >>a little nap in the allocator's page==NULL slowpath.
> >>
> >>I'm not seeing any reason why we can't start kswapd much earlier -
> >>right at the start of do_basic_setup()?
> >It doesn't even have to be kswapd, it just should be a thread pinned to
> >a node. The difficulty is that dealing with the system hashes means the
> >initialisation has to happen before vfs_caches_init_early() when there is
> >no scheduler. Those allocations could be delayed further but then there is
> >the possibility that the allocations would not be contiguous and they'd
> >have to rely on CMA to make the attempt. That potentially alters the
> >performance of the large system hashes at run time.
> >
> >We can scale the amount initialised with memory sizes relatively easy.
> >This boots on the same 1TB machine I was testing before but that is
> >hardly a surprise.
> >
> >---8<---
> >mm: meminit: Take into account that large system caches scale linearly with memory
> >
> >Waiman Long reported a 24TB machine triggered an OOM as parallel memory
> >initialisation deferred too much memory for initialisation. The likely
> >consumer of this memory was large system hashes that scale with memory
> >size. This patch initialises at least 2G per node but scales the amount
> >initialised for larger systems.
> >
> >Signed-off-by: Mel Gorman<mgorman@suse.de>
> >---
> >  mm/page_alloc.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> >
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 598f78d6544c..f7cc6c9fb909 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -266,15 +266,16 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
> >   */
> >  static inline bool update_defer_init(pg_data_t *pgdat,
> >  				unsigned long pfn, unsigned long zone_end,
> >+				unsigned long max_initialise,
> >  				unsigned long *nr_initialised)
> >  {
> >  	/* Always populate low zones for address-contrained allocations */
> >  	if (zone_end<  pgdat_end_pfn(pgdat))
> >  		return true;
> >
> >-	/* Initialise at least 2G of the highest zone */
> >+	/* Initialise at least the requested amount in the highest zone */
> >  	(*nr_initialised)++;
> >-	if (*nr_initialised>  (2UL<<  (30 - PAGE_SHIFT))&&
> >+	if ((*nr_initialised>  max_initialise)&&
> >  	(pfn&  (PAGES_PER_SECTION - 1)) == 0) {
> >  		pgdat->first_deferred_pfn = pfn;
> >  		return false;
> >@@ -299,6 +300,7 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
> >
> >  static inline bool update_defer_init(pg_data_t *pgdat,
> >  				unsigned long pfn, unsigned long zone_end,
> >+				unsigned long max_initialise,
> >  				unsigned long *nr_initialised)
> >  {
> >  	return true;
> >@@ -4457,11 +4459,19 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >  	unsigned long end_pfn = start_pfn + size;
> >  	unsigned long pfn;
> >  	struct zone *z;
> >+	unsigned long max_initialise;
> >  	unsigned long nr_initialised = 0;
> >
> >  	if (highest_memmap_pfn<  end_pfn - 1)
> >  		highest_memmap_pfn = end_pfn - 1;
> >
> >+	/*
> >+	 * Initialise at least 2G of a node but also take into account that
> >+	 * two large system hashes that can take up an 8th of memory.
> >+	 */
> >+	max_initialise = min(2UL<<  (30 - PAGE_SHIFT),
> >+			(pgdat->node_spanned_pages>>  3));
> >+
> 
> I think you may be pre-allocating too much memory here. On the 24-TB
> machine, the size of the dentry and inode hash tables were 16G each.
> So the ratio is about is about 32G/24T = 0.13%. I think a shift
> factor of (>> 8) which is about 0.39% should be more than enough.

I was taking the most pessimistic value possible to match where those
hashes currently get allocated from so that the locality does not change
after the series is applied. Can you try both (>> 3) and (>> 8) and see
do both work and if so, what the timing is?

> For the 24TB machine, that means a preallocated memory of 96+4G
> which should be even more than the 64+4G in the modified kernel that
> I used. At the same time, I think we can also set the minimum to 1G
> or even 0.5G for better performance for systems that have many CPUs,
> but not as much memory per node.
> 

I think the benefit there is going to be marginal except maybe on machines
where remote accesses are extremely costly.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-05 14:31               ` Mel Gorman
@ 2015-05-05 15:01                 ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-05 15:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 05/05/2015 10:31 AM, Mel Gorman wrote:
> On Tue, May 05, 2015 at 09:55:52AM -0400, Waiman Long wrote:
>> On 05/05/2015 06:45 AM, Mel Gorman wrote:
>>> On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
>>>>> Before the patch, the boot time from elilo prompt to ssh login was 694s.
>>>>> After the patch, the boot up time was 346s, a saving of 348s (about 50%).
>>>> Having to guesstimate the amount of memory which is needed for a
>>>> successful boot will be painful.  Any number we choose will be wrong
>>>> 99% of the time.
>>>>
>>>> If the kswapd threads have started, all we need to do is to wait: take
>>>> a little nap in the allocator's page==NULL slowpath.
>>>>
>>>> I'm not seeing any reason why we can't start kswapd much earlier -
>>>> right at the start of do_basic_setup()?
>>> It doesn't even have to be kswapd, it just should be a thread pinned to
>>> a node. The difficulty is that dealing with the system hashes means the
>>> initialisation has to happen before vfs_caches_init_early() when there is
>>> no scheduler. Those allocations could be delayed further but then there is
>>> the possibility that the allocations would not be contiguous and they'd
>>> have to rely on CMA to make the attempt. That potentially alters the
>>> performance of the large system hashes at run time.
>>>
>>> We can scale the amount initialised with memory sizes relatively easily.
>>> This boots on the same 1TB machine I was testing before but that is
>>> hardly a surprise.
>>>
>>> ---8<---
>>> mm: meminit: Take into account that large system caches scale linearly with memory
>>>
>>> Waiman Long reported a 24TB machine triggered an OOM as parallel memory
>>> initialisation deferred too much memory for initialisation. The likely
>>> consumer of this memory was large system hashes that scale with memory
>>> size. This patch initialises at least 2G per node but scales the amount
>>> initialised for larger systems.
>>>
>>> Signed-off-by: Mel Gorman <mgorman@suse.de>
>>> ---
>>>   mm/page_alloc.c | 15 +++++++++++++--
>>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 598f78d6544c..f7cc6c9fb909 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -266,15 +266,16 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
>>>    */
>>>   static inline bool update_defer_init(pg_data_t *pgdat,
>>>   				unsigned long pfn, unsigned long zone_end,
>>> +				unsigned long max_initialise,
>>>   				unsigned long *nr_initialised)
>>>   {
>>>   	/* Always populate low zones for address-contrained allocations */
>>>   	if (zone_end < pgdat_end_pfn(pgdat))
>>>   		return true;
>>>
>>> -	/* Initialise at least 2G of the highest zone */
>>> +	/* Initialise at least the requested amount in the highest zone */
>>>   	(*nr_initialised)++;
>>> -	if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
>>> +	if ((*nr_initialised > max_initialise) &&
>>>   	(pfn & (PAGES_PER_SECTION - 1)) == 0) {
>>>   		pgdat->first_deferred_pfn = pfn;
>>>   		return false;
>>> @@ -299,6 +300,7 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
>>>
>>>   static inline bool update_defer_init(pg_data_t *pgdat,
>>>   				unsigned long pfn, unsigned long zone_end,
>>> +				unsigned long max_initialise,
>>>   				unsigned long *nr_initialised)
>>>   {
>>>   	return true;
>>> @@ -4457,11 +4459,19 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>   	unsigned long end_pfn = start_pfn + size;
>>>   	unsigned long pfn;
>>>   	struct zone *z;
>>> +	unsigned long max_initialise;
>>>   	unsigned long nr_initialised = 0;
>>>
>>>   	if (highest_memmap_pfn < end_pfn - 1)
>>>   		highest_memmap_pfn = end_pfn - 1;
>>>
>>> +	/*
>>> +	 * Initialise at least 2G of a node but also take into account that
>>> +	 * two large system hashes that can take up an 8th of memory.
>>> +	 */
>>> +	max_initialise = min(2UL << (30 - PAGE_SHIFT),
>>> +			(pgdat->node_spanned_pages >> 3));
>>> +
>> I think you may be pre-allocating too much memory here. On the 24-TB
>> machine, the size of the dentry and inode hash tables were 16G each.
>> So the ratio is about 32G/24T = 0.13%. I think a shift
>> factor of (>> 8), which is about 0.39%, should be more than enough.
> I was taking the most pessimistic value possible to match where those
> hashes currently get allocated from so that the locality does not change
> after the series is applied. Can you try both (>>  3) and (>>  8) and see
> do both work and if so, what the timing is?

Sure. I will try both and get you the results, hopefully by tomorrow at 
the latest.
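
To put numbers on the two shift factors under discussion, here is a
minimal user-space sketch in C. It assumes 4 KiB pages
(SKETCH_PAGE_SHIFT stands in for the kernel's PAGE_SHIFT) and treats the
whole 24TB machine as a single span, whereas the patch computes the
value per node. The helper names are hypothetical and are not kernel
functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SHIFT is per-architecture. */
#define SKETCH_PAGE_SHIFT 12

/* Convert a size in GiB to a page count. */
static uint64_t gib_to_pages(uint64_t gib)
{
	return gib << (30 - SKETCH_PAGE_SHIFT);
}

/* Convert a page count back to whole GiB (rounds down). */
static uint64_t pages_to_gib(uint64_t pages)
{
	return pages >> (30 - SKETCH_PAGE_SHIFT);
}

/*
 * Number of pages initialised early when the threshold is chosen as
 * spanned_pages >> shift, i.e. a 1/2^shift fraction of the span.
 */
static uint64_t early_pages(uint64_t spanned_pages, unsigned int shift)
{
	assert(shift < 64);
	return spanned_pages >> shift;
}
```

Under these assumptions, a shift of 8 applied to 24 * 1024 GiB works out
to 96G, in line with the 96G figure mentioned earlier in the thread,
while a shift of 3 works out to 3072G.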

Cheers,
Longman


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-05 10:45           ` Mel Gorman
@ 2015-05-05 20:02             ` Andrew Morton
  -1 siblings, 0 replies; 168+ messages in thread
From: Andrew Morton @ 2015-05-05 20:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Tue, 5 May 2015 11:45:14 +0100 Mel Gorman <mgorman@suse.de> wrote:

> On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
> > > Before the patch, the boot time from elilo prompt to ssh login was 694s. 
> > > After the patch, the boot up time was 346s, a saving of 348s (about 50%).
> > 
> > Having to guesstimate the amount of memory which is needed for a
> > successful boot will be painful.  Any number we choose will be wrong
> > 99% of the time.
> > 
> > If the kswapd threads have started, all we need to do is to wait: take
> > a little nap in the allocator's page==NULL slowpath.
> > 
> > I'm not seeing any reason why we can't start kswapd much earlier -
> > right at the start of do_basic_setup()?
> 
> It doesn't even have to be kswapd, it just should be a thread pinned to
> a node. The difficulty is that dealing with the system hashes means the
> initialisation has to happen before vfs_caches_init_early() when there is
> no scheduler.

I bet we can run vfs_caches_init_early() after sched_init().  Might
need a few little fixups.

> Those allocations could be delayed further but then there is
> the possibility that the allocations would not be contiguous and they'd
> have to rely on CMA to make the attempt. That potentially alters the
> performance of the large system hashes at run time.

hm, why.  If the kswapd threads are running and busily creating free
pages then alloc_pages(order=10) can detect this situation and stall
for a while, waiting for kswapd to create an order-10 page.

Alternatively, the page allocator can go off and synchronously
initialize some pageframes itself.  Keep doing that until the
allocation attempt succeeds.

Such an approach is much more robust than trying to predict how much
memory will be needed.
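
The control flow proposed here can be sketched as a toy user-space
model in C. This is only an illustration under stated assumptions: it
counts pages and ignores contiguity (so it says nothing about order-10
requests), locking and NUMA placement, and every name in it
(toy_node, toy_alloc and so on) is hypothetical rather than a kernel
interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one node; none of these are kernel types. */
struct toy_node {
	size_t free_pages;     /* pages already initialised and free */
	size_t deferred_pages; /* pages whose struct page is not set up yet */
	size_t chunk_pages;    /* pages brought online by one init step */
};

/* Initialise one chunk of deferred pages; false if nothing was done. */
static bool toy_init_chunk(struct toy_node *n)
{
	size_t step;

	if (n->deferred_pages == 0 || n->chunk_pages == 0)
		return false;
	step = n->deferred_pages < n->chunk_pages ?
		n->deferred_pages : n->chunk_pages;
	n->deferred_pages -= step;
	n->free_pages += step;
	return true;
}

/* Fast path: only looks at memory that is already initialised. */
static bool toy_alloc_fast(struct toy_node *n, size_t pages)
{
	if (n->free_pages < pages)
		return false;
	n->free_pages -= pages;
	return true;
}

/*
 * Slow path is entered only after the fast path has failed: keep
 * initialising deferred chunks synchronously until the request fits or
 * nothing is left to initialise.
 */
static bool toy_alloc(struct toy_node *n, size_t pages)
{
	assert(n != NULL);
	if (toy_alloc_fast(n, pages))
		return true;
	while (toy_init_chunk(n)) {
		if (toy_alloc_fast(n, pages))
			return true;
	}
	return false; /* deferred pool exhausted as well */
}
```

In this model the extra work sits entirely behind a failed fast-path
attempt, and a request fails only once the deferred pool is empty, so no
up-front estimate of boot-time memory needs is required.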


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-05 20:02             ` Andrew Morton
@ 2015-05-05 22:13               ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-05 22:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Tue, May 05, 2015 at 01:02:55PM -0700, Andrew Morton wrote:
> On Tue, 5 May 2015 11:45:14 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
> > > > Before the patch, the boot time from elilo prompt to ssh login was 694s. 
> > > > After the patch, the boot up time was 346s, a saving of 348s (about 50%).
> > > 
> > > Having to guesstimate the amount of memory which is needed for a
> > > successful boot will be painful.  Any number we choose will be wrong
> > > 99% of the time.
> > > 
> > > If the kswapd threads have started, all we need to do is to wait: take
> > > a little nap in the allocator's page==NULL slowpath.
> > > 
> > > I'm not seeing any reason why we can't start kswapd much earlier -
> > > right at the start of do_basic_setup()?
> > 
> > It doesn't even have to be kswapd, it just should be a thread pinned to
> > a node. The difficulty is that dealing with the system hashes means the
> > initialisation has to happen before vfs_caches_init_early() when there is
> > no scheduler.
> 
> I bet we can run vfs_caches_init_early() after sched_init().  Might
> need a few little fixups.
> 

For the large hashes, that would leave the CMA requirement because
allocation sizes can be larger than order-10. Arguably on NUMA, that's
a bad idea anyway because it should have been interleaved but it's not
something this patchset should change.
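
As a rough illustration of the sizes involved, the following C sketch
computes the smallest allocation order that covers a given size. It
assumes 4 KiB pages (SKETCH_PAGE_SHIFT is a stand-in for PAGE_SHIFT) and
order_for_bytes() is a hypothetical helper, similar in spirit to the
kernel's get_order() but not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Smallest order such that (1 << order) pages cover `bytes`. */
static unsigned int order_for_bytes(uint64_t bytes)
{
	uint64_t pages;
	unsigned int order = 0;

	assert(bytes > 0);
	/* Round up to whole pages. */
	pages = (bytes + (1ULL << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
	while ((1ULL << order) < pages)
		order++;
	return order;
}
```

With 4 KiB pages an order-10 block is 4 MiB, whereas a 16G hash table of
the size reported for the 24TB machine would need order 22 if it were
allocated as one contiguous block.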

> > Those allocations could be delayed further but then there is
> > the possibility that the allocations would not be contiguous and they'd
> > have to rely on CMA to make the attempt. That potentially alters the
> > performance of the large system hashes at run time.
> 
> hm, why.  If the kswapd threads are running and busily creating free
> pages then alloc_pages(order=10) can detect this situation and stall
> for a while, waiting for kswapd to create an order-10 page.
> 

In Waiman's case, the OOM happened when kswapd was not necessarily available
but that's an implementation detail. I'll look tomorrow at what is required
to use dedicated threads to parallelise the allocation and synchronously
wait for those threads to complete. It should be possible to create those
earlier than kswapd currently is. It'll take longer to boot but hopefully
not so long that it makes the series pointless.

> Alternatively, the page allocator can go off and synchronously
> initialize some pageframes itself.  Keep doing that until the
> allocation attempt succeeds.
> 

That was rejected during review of earlier attempts at this feature on
the grounds that it impacted allocator fast paths. 

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-05 22:13               ` Mel Gorman
@ 2015-05-05 22:25                 ` Andrew Morton
  -1 siblings, 0 replies; 168+ messages in thread
From: Andrew Morton @ 2015-05-05 22:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Tue, 5 May 2015 23:13:29 +0100 Mel Gorman <mgorman@suse.de> wrote:

> > Alternatively, the page allocator can go off and synchronously
> > initialize some pageframes itself.  Keep doing that until the
> > allocation attempt succeeds.
> > 
> 
> That was rejected during review of earlier attempts at this feature on
> the grounds that it impacted allocator fast paths. 

eh?  Changes are only needed on the allocation-attempt-failed path,
which is slow-path.

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-05 14:31               ` Mel Gorman
@ 2015-05-06  0:55                 ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-06  0:55 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 05/05/2015 10:31 AM, Mel Gorman wrote:
> On Tue, May 05, 2015 at 09:55:52AM -0400, Waiman Long wrote:
>> On 05/05/2015 06:45 AM, Mel Gorman wrote:
>>> On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
>>>>> Before the patch, the boot time from elilo prompt to ssh login was 694s.
>>>>> After the patch, the boot up time was 346s, a saving of 348s (about 50%).
>>>> Having to guesstimate the amount of memory which is needed for a
>>>> successful boot will be painful.  Any number we choose will be wrong
>>>> 99% of the time.
>>>>
>>>> If the kswapd threads have started, all we need to do is to wait: take
>>>> a little nap in the allocator's page==NULL slowpath.
>>>>
>>>> I'm not seeing any reason why we can't start kswapd much earlier -
>>>> right at the start of do_basic_setup()?
>>> It doesn't even have to be kswapd, it just should be a thread pinned to
>>> a node. The difficulty is that dealing with the system hashes means the
>>> initialisation has to happen before vfs_caches_init_early() when there is
>>> no scheduler. Those allocations could be delayed further but then there is
>>> the possibility that the allocations would not be contiguous and they'd
>>> have to rely on CMA to make the attempt. That potentially alters the
>>> performance of the large system hashes at run time.
>>>
>>> We can scale the amount initialised with memory sizes relatively easily.
>>> This boots on the same 1TB machine I was testing before but that is
>>> hardly a surprise.
>>>
>>> ---8<---
>>> mm: meminit: Take into account that large system caches scale linearly with memory
>>>
>>> Waiman Long reported a 24TB machine triggered an OOM as parallel memory
>>> initialisation deferred too much memory for initialisation. The likely
>>> consumer of this memory was large system hashes that scale with memory
>>> size. This patch initialises at least 2G per node but scales the amount
>>> initialised for larger systems.
>>>
>>> Signed-off-by: Mel Gorman <mgorman@suse.de>
>>> ---
>>>   mm/page_alloc.c | 15 +++++++++++++--
>>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 598f78d6544c..f7cc6c9fb909 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -266,15 +266,16 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
>>>    */
>>>   static inline bool update_defer_init(pg_data_t *pgdat,
>>>   				unsigned long pfn, unsigned long zone_end,
>>> +				unsigned long max_initialise,
>>>   				unsigned long *nr_initialised)
>>>   {
>>>   	/* Always populate low zones for address-contrained allocations */
>>>   	if (zone_end < pgdat_end_pfn(pgdat))
>>>   		return true;
>>>
>>> -	/* Initialise at least 2G of the highest zone */
>>> +	/* Initialise at least the requested amount in the highest zone */
>>>   	(*nr_initialised)++;
>>> -	if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
>>> +	if ((*nr_initialised > max_initialise) &&
>>>   	(pfn & (PAGES_PER_SECTION - 1)) == 0) {
>>>   		pgdat->first_deferred_pfn = pfn;
>>>   		return false;
>>> @@ -299,6 +300,7 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
>>>
>>>   static inline bool update_defer_init(pg_data_t *pgdat,
>>>   				unsigned long pfn, unsigned long zone_end,
>>> +				unsigned long max_initialise,
>>>   				unsigned long *nr_initialised)
>>>   {
>>>   	return true;
>>> @@ -4457,11 +4459,19 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>   	unsigned long end_pfn = start_pfn + size;
>>>   	unsigned long pfn;
>>>   	struct zone *z;
>>> +	unsigned long max_initialise;
>>>   	unsigned long nr_initialised = 0;
>>>
>>>   	if (highest_memmap_pfn < end_pfn - 1)
>>>   		highest_memmap_pfn = end_pfn - 1;
>>>
>>> +	/*
>>> +	 * Initialise at least 2G of a node but also take into account that
>>> +	 * two large system hashes that can take up an 8th of memory.
>>> +	 */
>>> +	max_initialise = min(2UL << (30 - PAGE_SHIFT),
>>> +			(pgdat->node_spanned_pages >> 3));
>>> +

I found an error here. The correct code should be:

max_initialise = max(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));


The error made the 24-TB machine crash again.
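
The difference between the two expressions can be seen in a small
user-space C sketch. It assumes 4 KiB pages, and the node sizes used to
exercise it are hypothetical, not measurements from the 24-TB machine;
threshold_min() and threshold_max() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
/* 2 GiB expressed in pages, as in the patch. */
#define FLOOR_PAGES (2ULL << (30 - SKETCH_PAGE_SHIFT))

/* As posted: min() turns the 2G value into a ceiling. */
static uint64_t threshold_min(uint64_t spanned_pages)
{
	uint64_t scaled = spanned_pages >> 3;

	assert(spanned_pages > 0);
	return scaled < FLOOR_PAGES ? scaled : FLOOR_PAGES;
}

/* As corrected: max() makes 2G a floor and lets large nodes scale up. */
static uint64_t threshold_max(uint64_t spanned_pages)
{
	uint64_t scaled = spanned_pages >> 3;

	assert(spanned_pages > 0);
	return scaled > FLOOR_PAGES ? scaled : FLOOR_PAGES;
}
```

For a hypothetical 1536 GiB node, threshold_min() stays at 2G worth of
pages (524288), the same as before the patch, while threshold_max()
yields 192G worth (50331648). For a hypothetical 8 GiB node,
threshold_min() drops to 1G worth, below the intended minimum, and
threshold_max() keeps the 2G floor.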

Cheers,
Longman

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-05 20:02             ` Andrew Morton
@ 2015-05-06  1:21               ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-06  1:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 05/05/2015 04:02 PM, Andrew Morton wrote:
> On Tue, 5 May 2015 11:45:14 +0100 Mel Gorman <mgorman@suse.de> wrote:
>
>> On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
>>>> Before the patch, the boot time from elilo prompt to ssh login was 694s.
>>>> After the patch, the boot up time was 346s, a saving of 348s (about 50%).
>>> Having to guesstimate the amount of memory which is needed for a
>>> successful boot will be painful.  Any number we choose will be wrong
>>> 99% of the time.
>>>
>>> If the kswapd threads have started, all we need to do is to wait: take
>>> a little nap in the allocator's page==NULL slowpath.
>>>
>>> I'm not seeing any reason why we can't start kswapd much earlier -
>>> right at the start of do_basic_setup()?
>> It doesn't even have to be kswapd, it just should be a thread pinned to
>> a node. The difficulty is that dealing with the system hashes means the
>> initialisation has to happen before vfs_caches_init_early() when there is
>> no scheduler.
> I bet we can run vfs_caches_init_early() after sched_init().  Might
> need a few little fixups.
>
>> Those allocations could be delayed further but then there is
>> the possibility that the allocations would not be contiguous and they'd
>> have to rely on CMA to make the attempt. That potentially alters the
>> performance of the large system hashes at run time.
> hm, why.  If the kswapd threads are running and busily creating free
> pages then alloc_pages(order=10) can detect this situation and stall
> for a while, waiting for kswapd to create an order-10 page.
>
> Alternatively, the page allocator can go off and synchronously
> initialize some pageframes itself.  Keep doing that until the
> allocation attempt succeeds.
>
> Such an approach is much more robust than trying to predict how much
> memory will be needed.
>

Most of those hash tables are allocated before smp_boot. In UP mode, you 
can't have another thread initializing memory. So we really need to 
preallocate enough for those tables.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-06  1:21               ` Waiman Long
@ 2015-05-06  2:01                 ` Andrew Morton
  -1 siblings, 0 replies; 168+ messages in thread
From: Andrew Morton @ 2015-05-06  2:01 UTC (permalink / raw)
  To: Waiman Long
  Cc: Mel Gorman, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Tue, 05 May 2015 21:21:19 -0400 Waiman Long <waiman.long@hp.com> wrote:

> On 05/05/2015 04:02 PM, Andrew Morton wrote:
> > On Tue, 5 May 2015 11:45:14 +0100 Mel Gorman <mgorman@suse.de> wrote:
> >
> >> On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
> >>>> Before the patch, the boot time from elilo prompt to ssh login was 694s.
> >>>> After the patch, the boot up time was 346s, a saving of 348s (about 50%).
> >>> Having to guesstimate the amount of memory which is needed for a
> >>> successful boot will be painful.  Any number we choose will be wrong
> >>> 99% of the time.
> >>>
> >>> If the kswapd threads have started, all we need to do is to wait: take
> >>> a little nap in the allocator's page==NULL slowpath.
> >>>
> >>> I'm not seeing any reason why we can't start kswapd much earlier -
> >>> right at the start of do_basic_setup()?
> >> It doesn't even have to be kswapd, it just should be a thread pinned to
> >> a node. The difficulty is that dealing with the system hashes means the
> >> initialisation has to happen before vfs_caches_init_early() when there is
> >> no scheduler.
> > I bet we can run vfs_caches_init_early() after sched_init().  Might
> > need a few little fixups.
> >
> >> Those allocations could be delayed further but then there is
> >> the possibility that the allocations would not be contiguous and they'd
> >> have to rely on CMA to make the attempt. That potentially alters the
> >> performance of the large system hashes at run time.
> > hm, why.  If the kswapd threads are running and busily creating free
> > pages then alloc_pages(order=10) can detect this situation and stall
> > for a while, waiting for kswapd to create an order-10 page.
> >
> > Alternatively, the page allocator can go off and synchronously
> > initialize some pageframes itself.  Keep doing that until the
> > allocation attempt succeeds.
> >
> > Such an approach is much more robust than trying to predict how much
> > memory will be needed.
> >
> 
> Most of those hash tables are allocated before smp_boot. In UP mode, you 
> can't have another thread initializing memory. So we really need to 
> preallocate enough for those tables.

(copy-paste)

: Alternatively, the page allocator can go off and synchronously
: initialize some pageframes itself.  Keep doing that until the
: allocation attempt succeeds.

IOW, the caller of alloc_pages() goes off and does the work which
kswapd would have done later on.
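
As a rough illustration of that idea, the following user-space toy model shows an allocation path that initialises deferred pages itself when it runs short, then retries. It is only a sketch under assumptions: struct toy_node, TOY_INIT_BATCH, toy_init_batch() and toy_alloc_pages() are hypothetical names and do not correspond to anything in mm/page_alloc.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one node; not a kernel structure. */
struct toy_node {
	unsigned long free_pages;	/* initialised and available */
	unsigned long deferred_pages;	/* struct pages not yet initialised */
};

#define TOY_INIT_BATCH 512UL		/* hypothetical batch size */

/* Initialise up to one batch of deferred pages; return how many were done. */
static unsigned long toy_init_batch(struct toy_node *node)
{
	unsigned long n = node->deferred_pages < TOY_INIT_BATCH ?
			  node->deferred_pages : TOY_INIT_BATCH;

	node->deferred_pages -= n;
	node->free_pages += n;
	return n;
}

/* Try to take nr pages; on failure do the deferred work and retry. */
static bool toy_alloc_pages(struct toy_node *node, unsigned long nr)
{
	assert(node && nr > 0);

	for (;;) {
		if (node->free_pages >= nr) {
			node->free_pages -= nr;
			return true;
		}
		/* Slow path only: the caller does the work kswapd would do later. */
		if (toy_init_batch(node) == 0)
			return false;	/* nothing left to initialise */
	}
}
```

The extra work sits entirely behind the failed free_pages check in this model. Whether the real allocator can keep it off its fast paths is the open question in this thread.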


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-05 15:01                 ` Waiman Long
@ 2015-05-06  3:39                   ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-06  3:39 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 05/05/2015 11:01 AM, Waiman Long wrote:
> On 05/05/2015 10:31 AM, Mel Gorman wrote:
>> On Tue, May 05, 2015 at 09:55:52AM -0400, Waiman Long wrote:
>>> On 05/05/2015 06:45 AM, Mel Gorman wrote:
>>>> On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
>>>>>> Before the patch, the boot time from elilo prompt to ssh login was 694s.
>>>>>> After the patch, the boot up time was 346s, a saving of 348s (about 50%).
>>>>> Having to guesstimate the amount of memory which is needed for a
>>>>> successful boot will be painful.  Any number we choose will be wrong
>>>>> 99% of the time.
>>>>>
>>>>> If the kswapd threads have started, all we need to do is to wait: take
>>>>> a little nap in the allocator's page==NULL slowpath.
>>>>>
>>>>> I'm not seeing any reason why we can't start kswapd much earlier -
>>>>> right at the start of do_basic_setup()?
>>>> It doesn't even have to be kswapd, it just should be a thread pinned to
>>>> a node. The difficulty is that dealing with the system hashes means the
>>>> initialisation has to happen before vfs_caches_init_early() when there is
>>>> no scheduler. Those allocations could be delayed further but then there is
>>>> the possibility that the allocations would not be contiguous and they'd
>>>> have to rely on CMA to make the attempt. That potentially alters the
>>>> performance of the large system hashes at run time.
>>>>
>>>> We can scale the amount initialised with memory sizes relatively easily.
>>>> This boots on the same 1TB machine I was testing before but that is
>>>> hardly a surprise.
>>>>
>>>> ---8<---
>>>> mm: meminit: Take into account that large system caches scale linearly with memory
>>>>
>>>> Waiman Long reported a 24TB machine triggered an OOM as parallel memory
>>>> initialisation deferred too much memory for initialisation. The likely
>>>> consumer of this memory was large system hashes that scale with memory
>>>> size. This patch initialises at least 2G per node but scales the amount
>>>> initialised for larger systems.
>>>>
>>>> Signed-off-by: Mel Gorman <mgorman@suse.de>
>>>> ---
>>>>   mm/page_alloc.c | 15 +++++++++++++--
>>>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 598f78d6544c..f7cc6c9fb909 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -266,15 +266,16 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
>>>>    */
>>>>   static inline bool update_defer_init(pg_data_t *pgdat,
>>>>                   unsigned long pfn, unsigned long zone_end,
>>>> +                unsigned long max_initialise,
>>>>                   unsigned long *nr_initialised)
>>>>   {
>>>>       /* Always populate low zones for address-constrained allocations */
>>>>       if (zone_end < pgdat_end_pfn(pgdat))
>>>>           return true;
>>>>
>>>> -    /* Initialise at least 2G of the highest zone */
>>>> +    /* Initialise at least the requested amount in the highest zone */
>>>>       (*nr_initialised)++;
>>>> -    if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
>>>> +    if ((*nr_initialised > max_initialise) &&
>>>>           (pfn & (PAGES_PER_SECTION - 1)) == 0) {
>>>>           pgdat->first_deferred_pfn = pfn;
>>>>           return false;
>>>> @@ -299,6 +300,7 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
>>>>
>>>>   static inline bool update_defer_init(pg_data_t *pgdat,
>>>>                   unsigned long pfn, unsigned long zone_end,
>>>> +                unsigned long max_initialise,
>>>>                   unsigned long *nr_initialised)
>>>>   {
>>>>       return true;
>>>> @@ -4457,11 +4459,19 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>>       unsigned long end_pfn = start_pfn + size;
>>>>       unsigned long pfn;
>>>>       struct zone *z;
>>>> +    unsigned long max_initialise;
>>>>       unsigned long nr_initialised = 0;
>>>>
>>>>       if (highest_memmap_pfn < end_pfn - 1)
>>>>           highest_memmap_pfn = end_pfn - 1;
>>>>
>>>> +    /*
>>>> +     * Initialise at least 2G of a node but also take into account that
>>>> +     * two large system hashes can take up an 8th of memory.
>>>> +     */
>>>> +    max_initialise = min(2UL << (30 - PAGE_SHIFT),
>>>> +            (pgdat->node_spanned_pages >> 3));
>>>> +
>>> I think you may be pre-allocating too much memory here. On the 24-TB
>>> machine, the dentry and inode hash tables were 16G each.
>>> So the ratio is about 32G/24T = 0.13%. I think a shift
>>> factor of (>> 8) which is about 0.39% should be more than enough.
>> I was taking the most pessimistic value possible to match where those
>> hashes currently get allocated from so that the locality does not change
>> after the series is applied. Can you try both (>> 3) and (>> 8) and see
>> do both work and if so, what the timing is?
>
> Sure. I will try both and get you the results, hopefully by tomorrow 
> at the latest.
>

With the modified patch, both (>>3) and (>>8) worked without any 
problem. The bootup times are:

1. Unpatched 4.0 kernel - 694s
2. Patched kernel with 4G/node - 346s
3. Patched kernel with (>>3) - 389s
4. Patched kernel with (>>8) - 353s
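
For comparison, the relative savings against the 694s baseline can be worked out with a small helper. This is just arithmetic on the timings listed above; saving_permille() is a hypothetical name, and it uses integer maths in tenths of a percent, rounded down.

```c
#include <assert.h>

/*
 * Saving relative to a baseline boot time, in tenths of a percent.
 * Integer division rounds the result down.
 */
static unsigned long saving_permille(unsigned long baseline_s,
				     unsigned long patched_s)
{
	assert(baseline_s > 0 && patched_s <= baseline_s);
	return (baseline_s - patched_s) * 1000UL / baseline_s;
}
```

Applied to the numbers above, saving_permille(694, 346) is 501 (about 50.1%), saving_permille(694, 389) is 439 (about 43.9%) and saving_permille(694, 353) is 491 (about 49.1%). So (>>8) costs about 1% of the baseline compared with 4G/node, and (>>3) costs about 6%.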

Cheers,
Longman

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-05 22:25                 ` Andrew Morton
@ 2015-05-06  7:12                   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-06  7:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Tue, May 05, 2015 at 03:25:49PM -0700, Andrew Morton wrote:
> On Tue, 5 May 2015 23:13:29 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > > Alternatively, the page allocator can go off and synchronously
> > > initialize some pageframes itself.  Keep doing that until the
> > > allocation attempt succeeds.
> > > 
> > 
> > That was rejected during review of earlier attempts at this feature on
> > the grounds that it impacted allocator fast paths. 
> 
> eh?  Changes are only needed on the allocation-attempt-failed path,
> which is slow-path.

We'd have to distinguish between falling back to other zones because the
high zone is artificially exhausted and normal ALLOC_BATCH exhaustion. We'd
also have to avoid falling back to remote nodes prematurely. While I have
not tried an implementation, I expected they would need to be in the fast
paths unless I used jump labels to get around it. I'm going to try altering
when we initialise instead so that it happens earlier.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-06  7:12                   ` Mel Gorman
@ 2015-05-06 10:22                     ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-06 10:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Wed, May 06, 2015 at 08:12:46AM +0100, Mel Gorman wrote:
> On Tue, May 05, 2015 at 03:25:49PM -0700, Andrew Morton wrote:
> > On Tue, 5 May 2015 23:13:29 +0100 Mel Gorman <mgorman@suse.de> wrote:
> > 
> > > > Alternatively, the page allocator can go off and synchronously
> > > > initialize some pageframes itself.  Keep doing that until the
> > > > allocation attempt succeeds.
> > > > 
> > > 
> > > That was rejected during review of earlier attempts at this feature on
> > > the grounds that it impacted allocator fast paths. 
> > 
> > eh?  Changes are only needed on the allocation-attempt-failed path,
> > which is slow-path.
> 
> We'd have to distinguish between falling back to other zones because the
> high zone is artificially exhausted and normal ALLOC_BATCH exhaustion. We'd
> also have to avoid falling back to remote nodes prematurely. While I have
> not tried an implementation, I expected they would need to be in the fast
> paths unless I used jump labels to get around it. I'm going to try altering
> when we initialise instead so that it happens earlier.
> 

Which looks as follows. Waiman, a test on the 24TB machine would be
appreciated again. This patch should be applied instead of "mm: meminit:
Take into account that large system caches scale linearly with memory"

---8<---
mm: meminit: Finish initialisation of memory before basic setup

Waiman Long reported that 24TB machines hit OOM during basic setup when
struct page initialisation was deferred. One approach is to initialise memory
on demand but it interferes with page allocator paths. This patch creates
dedicated threads to initialise memory before basic setup. It then blocks
on a rw_semaphore until completion as a wait_queue and counter is overkill.
This may be slower to boot but it's simpler overall and also gets rid of a
lot of section mangling which existed so kswapd could do the initialisation.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/gfp.h |  8 ++++++++
 init/main.c         |  2 ++
 mm/internal.h       | 24 ------------------------
 mm/page_alloc.c     | 44 ++++++++++++++++++++++++++++++++++++--------
 mm/vmscan.c         |  6 ++----
 5 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 51bd1e72a917..28a3128d9e59 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -385,6 +385,14 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+void page_alloc_init_late(void);
+#else
+static inline void page_alloc_init_late(void)
+{
+}
+#endif
+
 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
  * GFP flags are used before interrupts are enabled. Once interrupts are
diff --git a/init/main.c b/init/main.c
index 6f0f1c5ff8cc..9bef5f0c9864 100644
--- a/init/main.c
+++ b/init/main.c
@@ -995,6 +995,8 @@ static noinline void __init kernel_init_freeable(void)
 	smp_init();
 	sched_init_smp();
 
+	page_alloc_init_late();
+
 	do_basic_setup();
 
 	/* Open the /dev/console on the rootfs, this should never fail */
diff --git a/mm/internal.h b/mm/internal.h
index 5c221ad41a29..5a7c7a531720 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -377,30 +377,6 @@ static inline void mminit_verify_zonelist(void)
 }
 #endif /* CONFIG_DEBUG_MEMORY_INIT */
 
-/*
- * Deferred struct page initialisation requires init functions that are freed
- * before kswapd is available. Reuse the memory hotplug section annotation
- * to mark the required code.
- *
- * __defermem_init is code that always exists but is annotated __meminit to
- * 	avoid section warnings.
- * __defer_init code gets marked __meminit when deferring struct page
- *	initialistion but is otherwise in the init section.
- */
-#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-#define __defermem_init __meminit
-#define __defer_init    __meminit
-
-void deferred_init_memmap(int nid);
-#else
-#define __defermem_init
-#define __defer_init __init
-
-static inline void deferred_init_memmap(int nid)
-{
-}
-#endif
-
 /* mminit_validate_memmodel_limits is independent of CONFIG_DEBUG_MEMORY_INIT */
 #if defined(CONFIG_SPARSEMEM)
 extern void mminit_validate_memmodel_limits(unsigned long *start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 598f78d6544c..1cef116727b6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -61,6 +61,7 @@
 #include <linux/hugetlb.h>
 #include <linux/sched/rt.h>
 #include <linux/page_owner.h>
+#include <linux/kthread.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
@@ -242,7 +243,7 @@ static inline void reset_deferred_meminit(pg_data_t *pgdat)
 }
 
 /* Returns true if the struct page for the pfn is uninitialised */
-static inline bool __defermem_init early_page_uninitialised(unsigned long pfn)
+static inline bool __init early_page_uninitialised(unsigned long pfn)
 {
 	int nid = early_pfn_to_nid(pfn);
 
@@ -972,7 +973,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	local_irq_restore(flags);
 }
 
-static void __defer_init __free_pages_boot_core(struct page *page,
+static void __init __free_pages_boot_core(struct page *page,
 					unsigned long pfn, unsigned int order)
 {
 	unsigned int nr_pages = 1 << order;
@@ -1039,7 +1040,7 @@ static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 }
 #endif
 
-void __defer_init __free_pages_bootmem(struct page *page, unsigned long pfn,
+void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
 							unsigned int order)
 {
 	if (early_page_uninitialised(pfn))
@@ -1048,7 +1049,7 @@ void __defer_init __free_pages_bootmem(struct page *page, unsigned long pfn,
 }
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-static void __defermem_init deferred_free_range(struct page *page,
+static void __init deferred_free_range(struct page *page,
 					unsigned long pfn, int nr_pages)
 {
 	int i;
@@ -1068,20 +1069,30 @@ static void __defermem_init deferred_free_range(struct page *page,
 		__free_pages_boot_core(page, pfn, 0);
 }
 
+static struct rw_semaphore __initdata pgdat_init_rwsem;
+
 /* Initialise remaining memory on a node */
-void __defermem_init deferred_init_memmap(int nid)
+static int __init deferred_init_memmap(void *data)
 {
+	pg_data_t *pgdat = (pg_data_t *)data;
+	int nid = pgdat->node_id;
 	struct mminit_pfnnid_cache nid_init_state = { };
 	unsigned long start = jiffies;
 	unsigned long nr_pages = 0;
 	unsigned long walk_start, walk_end;
 	int i, zid;
 	struct zone *zone;
-	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long first_init_pfn = pgdat->first_deferred_pfn;
+	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
 
-	if (first_init_pfn == ULONG_MAX)
-		return;
+	if (first_init_pfn == ULONG_MAX) {
+		up_read(&pgdat_init_rwsem);
+		return 0;
+	}
+
+	/* Bound memory initialisation to a local node if possible */
+	if (!cpumask_empty(cpumask))
+		set_cpus_allowed_ptr(current, cpumask);
 
 	/* Sanity check boundaries */
 	BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
@@ -1175,6 +1186,23 @@ free_range:
 
 	pr_info("kswapd %d initialised %lu pages in %ums\n", nid, nr_pages,
 					jiffies_to_msecs(jiffies - start));
+	up_read(&pgdat_init_rwsem);
+	return 0;
+}
+
+void __init page_alloc_init_late(void)
+{
+	int nid;
+
+	init_rwsem(&pgdat_init_rwsem);
+	for_each_node_state(nid, N_MEMORY) {
+		down_read(&pgdat_init_rwsem);
+		kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
+	}
+
+	/* Block until all are initialised */
+	down_write(&pgdat_init_rwsem);
+	up_write(&pgdat_init_rwsem);
 }
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c4895d26d036..5e8eadd71bac 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3348,7 +3348,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
  * If there are applications that are active memory-allocators
  * (most normal use), this basically shouldn't matter.
  */
-static int __defermem_init kswapd(void *p)
+static int kswapd(void *p)
 {
 	unsigned long order, new_order;
 	unsigned balanced_order;
@@ -3383,8 +3383,6 @@ static int __defermem_init kswapd(void *p)
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 	set_freezable();
 
-	deferred_init_memmap(pgdat->node_id);
-
 	order = new_order = 0;
 	balanced_order = 0;
 	classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
@@ -3540,7 +3538,7 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
  * This kswapd start function will be called by init and node-hot-add.
  * On node-hot-add, kswapd will moved to proper cpus if cpus are hot-added.
  */
-int __defermem_init kswapd_run(int nid)
+int kswapd_run(int nid)
 {
 	pg_data_t *pgdat = NODE_DATA(nid);
 	int ret = 0;
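
The completion barrier in page_alloc_init_late() can be pictured with a
userspace analogue. This is only a sketch under stated assumptions, not the
kernel code: unlocking a POSIX rwlock from a thread that does not hold it is
undefined, so the sketch uses a counter and condition variable (the
alternative the changelog calls overkill) in place of the rw_semaphore.
NR_NODES, node_init_worker() and init_all_nodes() are hypothetical names and
the per-node work is a placeholder.

```c
#include <assert.h>
#include <pthread.h>

#define NR_NODES 4	/* hypothetical node count, illustration only */

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t init_done = PTHREAD_COND_INITIALIZER;
static int pending;	/* workers started but not yet finished */
static unsigned long pages_initialised[NR_NODES];

/* Stand-in for deferred_init_memmap(): one worker per node. */
static void *node_init_worker(void *data)
{
	int nid = *(int *)data;

	/* Placeholder for initialising the struct pages local to the node. */
	pages_initialised[nid] = 1000UL * (nid + 1);

	/* Plays the role of up_read(): report completion, wake the waiter. */
	pthread_mutex_lock(&init_lock);
	if (--pending == 0)
		pthread_cond_signal(&init_done);
	pthread_mutex_unlock(&init_lock);
	return NULL;
}

/* Stand-in for page_alloc_init_late(): start workers, block until done. */
static unsigned long init_all_nodes(void)
{
	pthread_t threads[NR_NODES];
	int nids[NR_NODES];
	int started[NR_NODES];
	unsigned long total = 0;
	int nid;

	for (nid = 0; nid < NR_NODES; nid++) {
		nids[nid] = nid;

		/* Plays the role of down_read(): account for the worker first. */
		pthread_mutex_lock(&init_lock);
		pending++;
		pthread_mutex_unlock(&init_lock);

		started[nid] = !pthread_create(&threads[nid], NULL,
					       node_init_worker, &nids[nid]);
		if (!started[nid]) {
			/* Worker never ran, so drop its reservation. */
			pthread_mutex_lock(&init_lock);
			pending--;
			pthread_mutex_unlock(&init_lock);
		}
	}

	/* Plays the role of down_write(): block until every worker is done. */
	pthread_mutex_lock(&init_lock);
	while (pending > 0)
		pthread_cond_wait(&init_done, &init_lock);
	assert(pending == 0);
	pthread_mutex_unlock(&init_lock);

	/* Reap the workers and add up what each node reported. */
	for (nid = 0; nid < NR_NODES; nid++) {
		if (started[nid])
			pthread_join(threads[nid], NULL);
		total += pages_initialised[nid];
	}
	return total;
}
```

Built with -pthread and called from main(), init_all_nodes() returns 10000
for the placeholder workload. As in the patch, the reservation is taken
before each worker starts, so a worker that finishes early cannot make the
waiter return before the remaining workers have been accounted for.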

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-06 10:22                     ` Mel Gorman
@ 2015-05-06 12:05                       ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-06 12:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Wed, May 06, 2015 at 11:22:20AM +0100, Mel Gorman wrote:
> On Wed, May 06, 2015 at 08:12:46AM +0100, Mel Gorman wrote:
> > On Tue, May 05, 2015 at 03:25:49PM -0700, Andrew Morton wrote:
> > > On Tue, 5 May 2015 23:13:29 +0100 Mel Gorman <mgorman@suse.de> wrote:
> > > 
> > > > > Alternatively, the page allocator can go off and synchronously
> > > > > initialize some pageframes itself.  Keep doing that until the
> > > > > allocation attempt succeeds.
> > > > > 
> > > > 
> > > > That was rejected during review of earlier attempts at this feature on
> > > > the grounds that it impacted allocator fast paths. 
> > > 
> > > eh?  Changes are only needed on the allocation-attempt-failed path,
> > > which is slow-path.
> > 
> > We'd have to distinguish between falling back to other zones because the
> > high zone is artificially exhausted and normal ALLOC_BATCH exhaustion. We'd
> > also have to avoid falling back to remote nodes prematurely. While I have
> > not tried an implementation, I expected they would need to be in the fast
> > paths unless I used jump labels to get around it. I'm going to try altering
> > when we initialise instead so that it happens earlier.
> > 
> 
> Which looks as follows. Waiman, a test on the 24TB machine would be
> appreciated again. This patch should be applied instead of "mm: meminit:
> Take into account that large system caches scale linearly with memory"
> 
> ---8<---
> mm: meminit: Finish initialisation of memory before basic setup
> 

*sigh* Eventually build testing found the need for this

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1cef116727b6..052b9ba65b66 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -243,7 +243,7 @@ static inline void reset_deferred_meminit(pg_data_t *pgdat)
 }
 
 /* Returns true if the struct page for the pfn is uninitialised */
-static inline bool __init early_page_uninitialised(unsigned long pfn)
+static inline bool __meminit early_page_uninitialised(unsigned long pfn)
 {
 	int nid = early_pfn_to_nid(pfn);
 

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-06 10:22                     ` Mel Gorman
@ 2015-05-06 17:58                       ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-06 17:58 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 05/06/2015 06:22 AM, Mel Gorman wrote:
> On Wed, May 06, 2015 at 08:12:46AM +0100, Mel Gorman wrote:
>> On Tue, May 05, 2015 at 03:25:49PM -0700, Andrew Morton wrote:
>>> On Tue, 5 May 2015 23:13:29 +0100 Mel Gorman<mgorman@suse.de>  wrote:
>>>
>>>>> Alternatively, the page allocator can go off and synchronously
>>>>> initialize some pageframes itself.  Keep doing that until the
>>>>> allocation attempt succeeds.
>>>>>
>>>> That was rejected during review of earlier attempts at this feature on
>>>> the grounds that it impacted allocator fast paths.
>>> eh?  Changes are only needed on the allocation-attempt-failed path,
>>> which is slow-path.
>> We'd have to distinguish between falling back to other zones because the
>> high zone is artificially exhausted and normal ALLOC_BATCH exhaustion. We'd
>> also have to avoid falling back to remote nodes prematurely. While I have
>> not tried an implementation, I expected they would need to be in the fast
>> paths unless I used jump labels to get around it. I'm going to try altering
>> when we initialise instead so that it happens earlier.
>>
> Which looks as follows. Waiman, a test on the 24TB machine would be
> appreciated again. This patch should be applied instead of "mm: meminit:
> Take into account that large system caches scale linearly with memory"
>
> ---8<---
> mm: meminit: Finish initialisation of memory before basic setup
>
> Waiman Long reported that 24TB machines hit OOM during basic setup when
> struct page initialisation was deferred. One approach is to initialise memory
> on demand but it interferes with page allocator paths. This patch creates
> dedicated threads to initialise memory before basic setup. It then blocks
> on a rw_semaphore until completion as a wait_queue and counter is overkill.
> This may be slower to boot but it's simpler overall and also gets rid of a
> lot of section mangling which existed so kswapd could do the initialisation.
>
> Signed-off-by: Mel Gorman<mgorman@suse.de>
>

This patch moves the deferred meminit from kswapd to its own kernel 
threads started after smp_init(). However, the hash table allocation was 
done earlier than that. It seems like it will still run out of memory on 
the 24TB machine that I tested.

I will certainly try it out, but I doubt it will solve the problem on 
its own.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-06 17:58                       ` Waiman Long
@ 2015-05-07  2:37                         ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-07  2:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On 05/06/2015 01:58 PM, Waiman Long wrote:
> On 05/06/2015 06:22 AM, Mel Gorman wrote:
>> On Wed, May 06, 2015 at 08:12:46AM +0100, Mel Gorman wrote:
>>> On Tue, May 05, 2015 at 03:25:49PM -0700, Andrew Morton wrote:
>>>> On Tue, 5 May 2015 23:13:29 +0100 Mel Gorman<mgorman@suse.de>  wrote:
>>>>
>>>>>> Alternatively, the page allocator can go off and synchronously
>>>>>> initialize some pageframes itself.  Keep doing that until the
>>>>>> allocation attempt succeeds.
>>>>>>
>>>>> That was rejected during review of earlier attempts at this 
>>>>> feature on
>>>>> the grounds that it impacted allocator fast paths.
>>>> eh?  Changes are only needed on the allocation-attempt-failed path,
>>>> which is slow-path.
>>> We'd have to distinguish between falling back to other zones because 
>>> the
>>> high zone is artificially exhausted and normal ALLOC_BATCH 
>>> exhaustion. We'd
>>> also have to avoid falling back to remote nodes prematurely. While I 
>>> have
>>> not tried an implementation, I expected they would need to be in the 
>>> fast
>>> paths unless I used jump labels to get around it. I'm going to try 
>>> altering
>>> when we initialise instead so that it happens earlier.
>>>
>> Which looks as follows. Waiman, a test on the 24TB machine would be
>> appreciated again. This patch should be applied instead of "mm: meminit:
>> Take into account that large system caches scale linearly with memory"
>>
>> ---8<---
>> mm: meminit: Finish initialisation of memory before basic setup
>>
>> Waiman Long reported that 24TB machines hit OOM during basic setup when
>> struct page initialisation was deferred. One approach is to 
>> initialise memory
>> on demand but it interferes with page allocator paths. This patch 
>> creates
>> dedicated threads to initialise memory before basic setup. It then 
>> blocks
>> on a rw_semaphore until completion as a wait_queue and counter is 
>> overkill.
>> This may be slower to boot but it's simpler overall and also gets 
>> rid of a
>> lot of section mangling which existed so kswapd could do the 
>> initialisation.
>>
>> Signed-off-by: Mel Gorman<mgorman@suse.de>
>>
>
> This patch moves the deferred meminit from kswapd to its own kernel 
> threads started after smp_init(). However, the hash table allocation 
> was done earlier than that. It seems like it will still run out of 
> memory in the 24TB machine that I tested on.
>
> I will certainly try it out, but I doubt it will solve the problem on 
> its own.

It turns out that the two new patches did work on the 24-TB DragonHawk 
without the "mm: meminit: Take into account that large system caches 
scale linearly with memory" patch. The bootup time was 357s which was 
just a few seconds slower than the other bootup times that I sent you 
yesterday.

BTW, do you want to change the following log message as kswapd will no 
longer be the one doing deferred meminit?

     kswapd 0 initialised 396098436 pages in 6024ms

Cheers,
Longman
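
As a rough sanity check on the log line above, the figures can be turned
into a throughput and a memory size. The sketch assumes 4 KiB pages (the
x86-64 base page size, which the thread does not state); pages_per_ms() and
pages_to_gib() are hypothetical helper names.

```c
#include <stdint.h>

/* Pages initialised per millisecond, rounded down. */
static uint64_t pages_per_ms(uint64_t nr_pages, uint64_t msecs)
{
	return msecs ? nr_pages / msecs : 0;
}

/* Memory described by nr_pages, in whole GiB, for a given page size. */
static uint64_t pages_to_gib(uint64_t nr_pages, uint64_t page_size)
{
	return (nr_pages * page_size) >> 30;
}
```

For that line, pages_per_ms(396098436, 6024) gives 65753 and
pages_to_gib(396098436, 4096) gives 1510, i.e. about 1.47 TiB handled by
that one thread if pages are 4 KiB.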


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 0/13] Parallel struct page initialisation v4
  2015-05-07  2:37                         ` Waiman Long
@ 2015-05-07  7:21                           ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-07  7:21 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Wed, May 06, 2015 at 10:37:28PM -0400, Waiman Long wrote:
> On 05/06/2015 01:58 PM, Waiman Long wrote:
> >On 05/06/2015 06:22 AM, Mel Gorman wrote:
> >>On Wed, May 06, 2015 at 08:12:46AM +0100, Mel Gorman wrote:
> >>>On Tue, May 05, 2015 at 03:25:49PM -0700, Andrew Morton wrote:
> >>>>On Tue, 5 May 2015 23:13:29 +0100 Mel Gorman<mgorman@suse.de>  wrote:
> >>>>
> >>>>>>Alternatively, the page allocator can go off and synchronously
> >>>>>>initialize some pageframes itself.  Keep doing that until the
> >>>>>>allocation attempt succeeds.
> >>>>>>
> >>>>>That was rejected during review of earlier attempts at this feature on
> >>>>>the grounds that it impacted allocator fast paths.
> >>>>eh?  Changes are only needed on the allocation-attempt-failed path,
> >>>>which is slow-path.
> >>>We'd have to distinguish between falling back to other zones because
> >>>the high zone is artificially exhausted and normal ALLOC_BATCH
> >>>exhaustion. We'd also have to avoid falling back to remote nodes
> >>>prematurely. While I have not tried an implementation, I expected they
> >>>would need to be in the fast paths unless I used jump labels to get
> >>>around it. I'm going to try altering when we initialise instead so that
> >>>it happens earlier.
> >>>
> >>Which looks as follows. Waiman, a test on the 24TB machine would be
> >>appreciated again. This patch should be applied instead of "mm: meminit:
> >>Take into account that large system caches scale linearly with memory"
> >>
> >>---8<---
> >>mm: meminit: Finish initialisation of memory before basic setup
> >>
> >>Waiman Long reported that 24TB machines hit OOM during basic setup when
> >>struct page initialisation was deferred. One approach is to initialise
> >>memory on demand but it interferes with page allocator paths. This patch
> >>creates dedicated threads to initialise memory before basic setup. It then
> >>blocks on a rw_semaphore until completion as a wait_queue and counter is
> >>overkill. This may be slower to boot but it's simpler overall and also
> >>gets rid of a lot of section mangling which existed so kswapd could do
> >>the initialisation.
> >>
> >>Signed-off-by: Mel Gorman<mgorman@suse.de>
> >>
> >
> >This patch moves the deferred meminit from kswapd to its own
> >kernel threads started after smp_init(). However, the hash table
> >allocation was done earlier than that. It seems like it will still
> >run out of memory in the 24TB machine that I tested on.
> >
> >I will certainly try it out, but I doubt it will solve the problem
> >on its own.
> 
> It turns out that the two new patches did work on the 24-TB
> DragonHawk without the "mm: meminit: Take into account that large
> system caches scale linearly with memory" patch. The bootup time was
> 357s which was just a few seconds slower than the other bootup times
> that I sent you yesterday.
> 

Grand. This is what I expected because the previous failure was not the
hash tables, it was later allocations and the parallel initialisation
was early enough.

> BTW, do you want to change the following log message as kswapd will
> no longer be the one doing deferred meminit?
> 
>     kswapd 0 initialised 396098436 pages in 6024ms
> 

I will.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-05 20:02             ` Andrew Morton
@ 2015-05-07  7:25               ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-07  7:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

Waiman Long reported that 24TB machines hit OOM during basic setup when
struct page initialisation was deferred. One approach is to initialise memory
on demand but it interferes with page allocator paths. This patch creates
dedicated threads to initialise memory before basic setup. It then blocks
on a rw_semaphore until completion as a wait_queue and counter is overkill.
This may be slower to boot but it's simpler overall and also gets rid of a
lot of section mangling which existed so kswapd could do the initialisation.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/gfp.h |  8 ++++++++
 init/main.c         |  2 ++
 mm/internal.h       | 24 ------------------------
 mm/page_alloc.c     | 46 +++++++++++++++++++++++++++++++++++++---------
 mm/vmscan.c         |  6 ++----
 5 files changed, 49 insertions(+), 37 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 51bd1e72a917..28a3128d9e59 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -385,6 +385,14 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+void page_alloc_init_late(void);
+#else
+static inline void page_alloc_init_late(void)
+{
+}
+#endif
+
 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
  * GFP flags are used before interrupts are enabled. Once interrupts are
diff --git a/init/main.c b/init/main.c
index 6f0f1c5ff8cc..9bef5f0c9864 100644
--- a/init/main.c
+++ b/init/main.c
@@ -995,6 +995,8 @@ static noinline void __init kernel_init_freeable(void)
 	smp_init();
 	sched_init_smp();
 
+	page_alloc_init_late();
+
 	do_basic_setup();
 
 	/* Open the /dev/console on the rootfs, this should never fail */
diff --git a/mm/internal.h b/mm/internal.h
index 5c221ad41a29..5a7c7a531720 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -377,30 +377,6 @@ static inline void mminit_verify_zonelist(void)
 }
 #endif /* CONFIG_DEBUG_MEMORY_INIT */
 
-/*
- * Deferred struct page initialisation requires init functions that are freed
- * before kswapd is available. Reuse the memory hotplug section annotation
- * to mark the required code.
- *
- * __defermem_init is code that always exists but is annotated __meminit to
- * 	avoid section warnings.
- * __defer_init code gets marked __meminit when deferring struct page
- *	initialistion but is otherwise in the init section.
- */
-#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-#define __defermem_init __meminit
-#define __defer_init    __meminit
-
-void deferred_init_memmap(int nid);
-#else
-#define __defermem_init
-#define __defer_init __init
-
-static inline void deferred_init_memmap(int nid)
-{
-}
-#endif
-
 /* mminit_validate_memmodel_limits is independent of CONFIG_DEBUG_MEMORY_INIT */
 #if defined(CONFIG_SPARSEMEM)
 extern void mminit_validate_memmodel_limits(unsigned long *start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 598f78d6544c..7c257e37f2ce 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -61,6 +61,7 @@
 #include <linux/hugetlb.h>
 #include <linux/sched/rt.h>
 #include <linux/page_owner.h>
+#include <linux/kthread.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
@@ -242,7 +243,7 @@ static inline void reset_deferred_meminit(pg_data_t *pgdat)
 }
 
 /* Returns true if the struct page for the pfn is uninitialised */
-static inline bool __defermem_init early_page_uninitialised(unsigned long pfn)
+static inline bool __meminit early_page_uninitialised(unsigned long pfn)
 {
 	int nid = early_pfn_to_nid(pfn);
 
@@ -972,7 +973,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	local_irq_restore(flags);
 }
 
-static void __defer_init __free_pages_boot_core(struct page *page,
+static void __init __free_pages_boot_core(struct page *page,
 					unsigned long pfn, unsigned int order)
 {
 	unsigned int nr_pages = 1 << order;
@@ -1039,7 +1040,7 @@ static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 }
 #endif
 
-void __defer_init __free_pages_bootmem(struct page *page, unsigned long pfn,
+void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
 							unsigned int order)
 {
 	if (early_page_uninitialised(pfn))
@@ -1048,7 +1049,7 @@ void __defer_init __free_pages_bootmem(struct page *page, unsigned long pfn,
 }
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-static void __defermem_init deferred_free_range(struct page *page,
+static void __init deferred_free_range(struct page *page,
 					unsigned long pfn, int nr_pages)
 {
 	int i;
@@ -1068,20 +1069,30 @@ static void __defermem_init deferred_free_range(struct page *page,
 		__free_pages_boot_core(page, pfn, 0);
 }
 
+static struct rw_semaphore __initdata pgdat_init_rwsem;
+
 /* Initialise remaining memory on a node */
-void __defermem_init deferred_init_memmap(int nid)
+static int __init deferred_init_memmap(void *data)
 {
+	pg_data_t *pgdat = (pg_data_t *)data;
+	int nid = pgdat->node_id;
 	struct mminit_pfnnid_cache nid_init_state = { };
 	unsigned long start = jiffies;
 	unsigned long nr_pages = 0;
 	unsigned long walk_start, walk_end;
 	int i, zid;
 	struct zone *zone;
-	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long first_init_pfn = pgdat->first_deferred_pfn;
+	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
 
-	if (first_init_pfn == ULONG_MAX)
-		return;
+	if (first_init_pfn == ULONG_MAX) {
+		up_read(&pgdat_init_rwsem);
+		return 0;
+	}
+
+	/* Bound memory initialisation to a local node if possible */
+	if (!cpumask_empty(cpumask))
+		set_cpus_allowed_ptr(current, cpumask);
 
 	/* Sanity check boundaries */
 	BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
@@ -1173,8 +1184,25 @@ free_range:
 	/* Sanity check that the next zone really is unpopulated */
 	WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone));
 
-	pr_info("kswapd %d initialised %lu pages in %ums\n", nid, nr_pages,
+	pr_info("node %d initialised, %lu pages in %ums\n", nid, nr_pages,
 					jiffies_to_msecs(jiffies - start));
+	up_read(&pgdat_init_rwsem);
+	return 0;
+}
+
+void __init page_alloc_init_late(void)
+{
+	int nid;
+
+	init_rwsem(&pgdat_init_rwsem);
+	for_each_node_state(nid, N_MEMORY) {
+		down_read(&pgdat_init_rwsem);
+		kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
+	}
+
+	/* Block until all are initialised */
+	down_write(&pgdat_init_rwsem);
+	up_write(&pgdat_init_rwsem);
 }
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c4895d26d036..5e8eadd71bac 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3348,7 +3348,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
  * If there are applications that are active memory-allocators
  * (most normal use), this basically shouldn't matter.
  */
-static int __defermem_init kswapd(void *p)
+static int kswapd(void *p)
 {
 	unsigned long order, new_order;
 	unsigned balanced_order;
@@ -3383,8 +3383,6 @@ static int __defermem_init kswapd(void *p)
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 	set_freezable();
 
-	deferred_init_memmap(pgdat->node_id);
-
 	order = new_order = 0;
 	balanced_order = 0;
 	classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
@@ -3540,7 +3538,7 @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
  * This kswapd start function will be called by init and node-hot-add.
  * On node-hot-add, kswapd will moved to proper cpus if cpus are hot-added.
  */
-int __defermem_init kswapd_run(int nid)
+int kswapd_run(int nid)
 {
 	pg_data_t *pgdat = NODE_DATA(nid);
 	int ret = 0;
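
For readers following along outside the kernel, the wait-for-all-workers
pattern in page_alloc_init_late() can be sketched in userspace. This is only
an illustration under stated assumptions, not the patch's implementation:
POSIX rwlocks may only be unlocked by the thread that locked them, so the
sketch below uses a mutex, a counter and a condition variable instead (closer
to the "wait_queue and counter" alternative the changelog mentions). The names
init_gate, gate_down_read(), gate_up_read(), gate_wait_idle(), init_node() and
init_all_nodes() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 64

/* Hypothetical userspace stand-in for pgdat_init_rwsem. */
struct init_gate {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when readers drops to zero */
	int readers;		/* holds taken but not yet released */
	int done;		/* workers that have finished */
};

/* Plays the role of down_read(): taken by the parent for each worker. */
static void gate_down_read(struct init_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->readers++;
	pthread_mutex_unlock(&g->lock);
}

/* Plays the role of up_read(): dropped by the worker when it finishes. */
static void gate_up_read(struct init_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->done++;
	if (--g->readers == 0)
		pthread_cond_broadcast(&g->idle);
	pthread_mutex_unlock(&g->lock);
}

/* Plays the role of down_write() + up_write(): block until no holds remain. */
static int gate_wait_idle(struct init_gate *g)
{
	int done;

	pthread_mutex_lock(&g->lock);
	while (g->readers > 0)
		pthread_cond_wait(&g->idle, &g->lock);
	done = g->done;
	pthread_mutex_unlock(&g->lock);
	return done;
}

static void *init_node(void *data)
{
	struct init_gate *g = data;

	/* Per-node initialisation work would go here. */
	gate_up_read(g);
	return NULL;
}

/* Start one worker per node; return how many had finished at the barrier. */
static int init_all_nodes(int nr_nodes)
{
	struct init_gate g;
	pthread_t tid[MAX_WORKERS];
	int i, rc, done;

	assert(nr_nodes >= 0 && nr_nodes <= MAX_WORKERS);
	pthread_mutex_init(&g.lock, NULL);
	pthread_cond_init(&g.idle, NULL);
	g.readers = 0;
	g.done = 0;

	for (i = 0; i < nr_nodes; i++) {
		/* Take the hold before the worker exists, as the patch does. */
		gate_down_read(&g);
		rc = pthread_create(&tid[i], NULL, init_node, &g);
		assert(rc == 0);
		(void)rc;
	}

	/* Block until all workers have released their hold. */
	done = gate_wait_idle(&g);

	for (i = 0; i < nr_nodes; i++)
		pthread_join(tid[i], NULL);
	pthread_cond_destroy(&g.idle);
	pthread_mutex_destroy(&g.lock);
	return done;
}
```

Taking the hold in the parent before each worker is created means the final
wait cannot return early even if no worker has been scheduled yet, which is
the same ordering the patch relies on. Build with -pthread where the C library
requires it.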


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-07  7:25               ` Mel Gorman
@ 2015-05-07 22:09                 ` Andrew Morton
  -1 siblings, 0 replies; 168+ messages in thread
From: Andrew Morton @ 2015-05-07 22:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Thu, 7 May 2015 08:25:18 +0100 Mel Gorman <mgorman@suse.de> wrote:

> Waiman Long reported that 24TB machines hit OOM during basic setup when
> struct page initialisation was deferred. One approach is to initialise memory
> on demand but it interferes with page allocator paths. This patch creates
> dedicated threads to initialise memory before basic setup. It then blocks
> on a rw_semaphore until completion as a wait_queue and counter is overkill.
> This may be slower to boot but it's simpler overall and also gets rid of a
> lot of section mangling which existed so kswapd could do the initialisation.

Seems a reasonable compromise.  It makes a bit of a mess of the patch
sequencing.

Have some tweaklets:



From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix

include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast

Cc: Daniel J Blueman <daniel@numascale.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff -puN mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix
+++ a/mm/page_alloc.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/interrupt.h>
+#include <linux/rwsem.h>
 #include <linux/pagemap.h>
 #include <linux/jiffies.h>
 #include <linux/bootmem.h>
@@ -1075,12 +1076,12 @@ static void __init deferred_free_range(s
 		__free_pages_boot_core(page, pfn, 0);
 }
 
-static struct rw_semaphore __initdata pgdat_init_rwsem;
+static __initdata DECLARE_RWSEM(pgdat_init_rwsem);
 
 /* Initialise remaining memory on a node */
 static int __init deferred_init_memmap(void *data)
 {
-	pg_data_t *pgdat = (pg_data_t *)data;
+	pg_data_t *pgdat = data;
 	int nid = pgdat->node_id;
 	struct mminit_pfnnid_cache nid_init_state = { };
 	unsigned long start = jiffies;
@@ -1096,7 +1097,7 @@ static int __init deferred_init_memmap(v
 		return 0;
 	}
 
-	/* Bound memory initialisation to a local node if possible */
+	/* Bind memory initialisation thread to a local node if possible */
 	if (!cpumask_empty(cpumask))
 		set_cpus_allowed_ptr(current, cpumask);
 
@@ -1200,7 +1201,6 @@ void __init page_alloc_init_late(void)
 {
 	int nid;
 
-	init_rwsem(&pgdat_init_rwsem);
 	for_each_node_state(nid, N_MEMORY) {
 		down_read(&pgdat_init_rwsem);
 		kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
_
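
The "remove unneeded cast" tweak relies on a plain C rule: a void * converts
implicitly to any object pointer type, so a thread function can assign its
argument directly. A minimal standalone sketch, where struct node_data and
node_id_from_thread_arg() are hypothetical stand-ins for pg_data_t and
deferred_init_memmap():

```c
#include <assert.h>

/* Hypothetical stand-in for pg_data_t; only the field used here. */
struct node_data {
	int node_id;
};

/* Same shape as a kthread function argument: an opaque void pointer. */
static int node_id_from_thread_arg(void *data)
{
	struct node_data *pgdat = data;	/* implicit conversion, no cast */

	return pgdat->node_id;
}
```

This holds for C only; C++ would require an explicit cast at the same spot.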


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-07 22:09                 ` Andrew Morton
@ 2015-05-07 22:52                   ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-07 22:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Thu, May 07, 2015 at 03:09:32PM -0700, Andrew Morton wrote:
> On Thu, 7 May 2015 08:25:18 +0100 Mel Gorman <mgorman@suse.de> wrote:
> 
> > Waiman Long reported that 24TB machines hit OOM during basic setup when
> > struct page initialisation was deferred. One approach is to initialise memory
> > on demand but it interferes with page allocator paths. This patch creates
> > dedicated threads to initialise memory before basic setup. It then blocks
> > on a rw_semaphore until completion as a wait_queue and counter is overkill.
> > This may be slower to boot but it's simpler overall and also gets rid of a
> > section mangling which existed so kswapd could do the initialisation.
> 
> Seems a reasonable compromise.  It makes a bit of a mess of the patch
> sequencing.
> 
> Have some tweaklets:
> 

The tweaks are perfectly reasonable. As for the patch sequencing, I'm ok
with adding the patch on top if you are because that preserves the testing
history. If you're unhappy, I can shuffle it into a better place and resend
the full series that includes all the fixes so far.

Thanks.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-07 22:52                   ` Mel Gorman
@ 2015-05-07 23:02                     ` Andrew Morton
  -1 siblings, 0 replies; 168+ messages in thread
From: Andrew Morton @ 2015-05-07 23:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Waiman Long, Nathan Zimmer, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Thu, 7 May 2015 23:52:26 +0100 Mel Gorman <mgorman@suse.de> wrote:

>  As for the patch sequencing, I'm ok
> with adding the patch on top if you are because that preserves the testing
> history. If you're unhappy, I can shuffle it into a better place and resend
> the full series that includes all the fixes so far.

We'll survive.  Let's only do the reorganization if the patches need rework
for other reasons.

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-07 22:09                 ` Andrew Morton
@ 2015-05-13 15:53                   ` nzimmer
  -1 siblings, 0 replies; 168+ messages in thread
From: nzimmer @ 2015-05-13 15:53 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman
  Cc: Waiman Long, Dave Hansen, Scott Norton, Daniel J Blueman, Linux-MM, LKML

I just noticed a hang on my largest box.
I can only reproduce it with large core counts; if I turn down the number
of CPUs, it doesn't have an issue.

Also, as time goes on, the amount of time required to initialize pages
goes up.


log_uv48_05121052:[  177.250385] node 0 initialised, 14950072 pages in 544ms
log_uv48_05121052:[  177.269629] node 1 initialised, 15990505 pages in 564ms
log_uv48_05121052:[  177.436047] node 215 initialised, 3600110 pages in 724ms
log_uv48_05121052:[  177.464056] node 102 initialised, 3604205 pages in 756ms
log_uv48_05121052:[  178.073822] node 30 initialised, 7732972 pages in 1368ms
log_uv48_05121052:[  178.082888] node 31 initialised, 7728877 pages in 1372ms
log_uv48_05121052:[  178.080060] node 29 initialised, 7728877 pages in 1376ms
....
log_uv48_05121052:[  178.217980] node 197 initialised, 7728877 pages in 1504ms
log_uv48_05121052:[  178.217851] node 196 initialised, 7732972 pages in 1504ms
log_uv48_05121052:[  178.219992] node 247 initialised, 7726418 pages in 1504ms
log_uv48_05121052:[  178.325299] node 3 initialised, 15986409 pages in 1624ms
log_uv48_05121052:[  178.328455] node 2 initialised, 15990505 pages in 1624ms
log_uv48_05121052:[  178.383371] node 4 initialised, 15990505 pages in 1680ms
...
log_uv48_05121052:[  178.438401] node 19 initialised, 15986409 pages in 1728ms

I apologize for the tardiness of this report but I have not been able to 
get to the largest boxes reliably.
Hopefully I will have more access this week.


On 05/07/2015 05:09 PM, Andrew Morton wrote:
> On Thu, 7 May 2015 08:25:18 +0100 Mel Gorman <mgorman@suse.de> wrote:
>
>> Waiman Long reported that 24TB machines hit OOM during basic setup when
>> struct page initialisation was deferred. One approach is to initialise memory
>> on demand but it interferes with page allocator paths. This patch creates
>> dedicated threads to initialise memory before basic setup. It then blocks
>> on a rw_semaphore until completion as a wait_queue and counter is overkill.
>> This may be slower to boot but it's simpler overall and also gets rid of a
>> section mangling which existed so kswapd could do the initialisation.
> Seems a reasonable compromise.  It makes a bit of a mess of the patch
> sequencing.
>
> Have some tweaklets:
>
>
>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix
>
> include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast
>
> Cc: Daniel J Blueman <daniel@numascale.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Nathan Zimmer <nzimmer@sgi.com>
> Cc: Scott Norton <scott.norton@hp.com>
> Cc: Waiman Long <waiman.long@hp.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>   mm/page_alloc.c |    8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff -puN mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix mm/page_alloc.c
> --- a/mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix
> +++ a/mm/page_alloc.c
> @@ -18,6 +18,7 @@
>   #include <linux/mm.h>
>   #include <linux/swap.h>
>   #include <linux/interrupt.h>
> +#include <linux/rwsem.h>
>   #include <linux/pagemap.h>
>   #include <linux/jiffies.h>
>   #include <linux/bootmem.h>
> @@ -1075,12 +1076,12 @@ static void __init deferred_free_range(s
>   		__free_pages_boot_core(page, pfn, 0);
>   }
>   
> -static struct rw_semaphore __initdata pgdat_init_rwsem;
> +static __initdata DECLARE_RWSEM(pgdat_init_rwsem);
>   
>   /* Initialise remaining memory on a node */
>   static int __init deferred_init_memmap(void *data)
>   {
> -	pg_data_t *pgdat = (pg_data_t *)data;
> +	pg_data_t *pgdat = data;
>   	int nid = pgdat->node_id;
>   	struct mminit_pfnnid_cache nid_init_state = { };
>   	unsigned long start = jiffies;
> @@ -1096,7 +1097,7 @@ static int __init deferred_init_memmap(v
>   		return 0;
>   	}
>   
> -	/* Bound memory initialisation to a local node if possible */
> +	/* Bind memory initialisation thread to a local node if possible */
>   	if (!cpumask_empty(cpumask))
>   		set_cpus_allowed_ptr(current, cpumask);
>   
> @@ -1200,7 +1201,6 @@ void __init page_alloc_init_late(void)
>   {
>   	int nid;
>   
> -	init_rwsem(&pgdat_init_rwsem);
>   	for_each_node_state(nid, N_MEMORY) {
>   		down_read(&pgdat_init_rwsem);
>   		kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
> _
>


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-13 15:53                   ` nzimmer
@ 2015-05-13 16:31                     ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-13 16:31 UTC (permalink / raw)
  To: nzimmer
  Cc: Andrew Morton, Waiman Long, Dave Hansen, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML

On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
> I just noticed a hang on my largest box.
> I can only reproduce with large core counts, if I turn down the
> number of cpus it doesn't have an issue.
> 

Odd. The core count should make little difference as only
one CPU per node should be in use. Does sysrq+t give any indication how
or where it is hanging?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-13 16:31                     ` Mel Gorman
@ 2015-05-14 10:03                       ` Daniel J Blueman
  -1 siblings, 0 replies; 168+ messages in thread
From: Daniel J Blueman @ 2015-05-14 10:03 UTC (permalink / raw)
  To: Mel Gorman, nzimmer
  Cc: Andrew Morton, Waiman Long, Dave Hansen, Scott Norton, Linux-MM,
	LKML, Steffen Persvold

On Thu, May 14, 2015 at 12:31 AM, Mel Gorman <mgorman@suse.de> wrote:
> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>  I just noticed a hang on my largest box.
>>  I can only reproduce with large core counts, if I turn down the
>>  number of cpus it doesn't have an issue.
>> 
> 
> Odd. The core count should make little difference as only
> one CPU per node should be in use. Does sysrq+t give any indication how
> or where it is hanging?

I was seeing the same behaviour, with per-node initialisation time
increasing from about 1000ms to 5500ms [1]; this suggests either lock
contention or O(n) behaviour.

Nathan, can you check with this ordering of patches from Andrew's cache
[2]? I was getting hangs until I found them all.

I'll follow up with timing data.

Thanks,
  Daniel

-- [1]

[   73.076117] node 2 initialised, 7732961 pages in 1060ms
[   73.077184] node 38 initialised, 7732961 pages in 1060ms
[   73.079626] node 146 initialised, 7732961 pages in 1050ms
[   73.093488] node 62 initialised, 7732961 pages in 1080ms
[   73.091557] node 3 initialised, 7732962 pages in 1080ms
[   73.100000] node 186 initialised, 7732961 pages in 1040ms
[   73.095731] node 4 initialised, 7732961 pages in 1080ms
[   73.090289] node 50 initialised, 7732961 pages in 1080ms
[   73.094005] node 158 initialised, 7732961 pages in 1050ms
[   73.095421] node 159 initialised, 7732962 pages in 1050ms
[   73.090324] node 52 initialised, 7732961 pages in 1080ms
[   73.099056] node 5 initialised, 7732962 pages in 1080ms
[   73.090116] node 160 initialised, 7732961 pages in 1050ms
[   73.161051] node 157 initialised, 7732962 pages in 1120ms
[   73.193565] node 161 initialised, 7732962 pages in 1160ms
[   73.212456] node 26 initialised, 7732961 pages in 1200ms
[   73.222904] node 0 initialised, 6686488 pages in 1210ms
[   73.242165] node 140 initialised, 7732961 pages in 1210ms
[   73.254230] node 156 initialised, 7732961 pages in 1220ms
[   73.284634] node 1 initialised, 7732962 pages in 1270ms
[   73.305301] node 141 initialised, 7732962 pages in 1280ms
[   73.322845] node 28 initialised, 7732961 pages in 1310ms
[   73.321757] node 142 initialised, 7732961 pages in 1290ms
[   73.327677] node 138 initialised, 7732961 pages in 1300ms
[   73.413597] node 176 initialised, 7732961 pages in 1370ms
[   73.455552] node 139 initialised, 7732962 pages in 1420ms
[   73.475356] node 143 initialised, 7732962 pages in 1440ms
[   73.547202] node 32 initialised, 7732961 pages in 1530ms
[   73.579591] node 104 initialised, 7732961 pages in 1560ms
[   73.618065] node 174 initialised, 7732961 pages in 1570ms
[   73.624918] node 178 initialised, 7732961 pages in 1580ms
[   73.649024] node 175 initialised, 7732962 pages in 1610ms
[   73.654110] node 105 initialised, 7732962 pages in 1630ms
[   73.670589] node 106 initialised, 7732961 pages in 1650ms
[   73.739682] node 102 initialised, 7732961 pages in 1720ms
[   73.769639] node 86 initialised, 7732961 pages in 1750ms
[   73.775573] node 44 initialised, 7732961 pages in 1760ms
[   73.772955] node 177 initialised, 7732962 pages in 1740ms
[   73.804390] node 34 initialised, 7732961 pages in 1790ms
[   73.819370] node 30 initialised, 7732961 pages in 1810ms
[   73.847882] node 98 initialised, 7732961 pages in 1830ms
[   73.867545] node 33 initialised, 7732962 pages in 1860ms
[   73.877964] node 107 initialised, 7732962 pages in 1860ms
[   73.906256] node 103 initialised, 7732962 pages in 1880ms
[   73.945581] node 100 initialised, 7732961 pages in 1930ms
[   73.947024] node 96 initialised, 7732961 pages in 1930ms
[   74.186208] node 116 initialised, 7732961 pages in 2170ms
[   74.220838] node 68 initialised, 7732961 pages in 2210ms
[   74.252341] node 46 initialised, 7732961 pages in 2240ms
[   74.274795] node 118 initialised, 7732961 pages in 2260ms
[   74.337544] node 14 initialised, 7732961 pages in 2320ms
[   74.350819] node 22 initialised, 7732961 pages in 2340ms
[   74.350332] node 69 initialised, 7732962 pages in 2340ms
[   74.362683] node 211 initialised, 7732962 pages in 2310ms
[   74.360617] node 70 initialised, 7732961 pages in 2340ms
[   74.369137] node 66 initialised, 7732961 pages in 2360ms
[   74.378242] node 115 initialised, 7732962 pages in 2360ms
[   74.404221] node 213 initialised, 7732962 pages in 2350ms
[   74.420901] node 210 initialised, 7732961 pages in 2370ms
[   74.430049] node 35 initialised, 7732962 pages in 2420ms
[   74.436007] node 48 initialised, 7732961 pages in 2420ms
[   74.480595] node 71 initialised, 7732962 pages in 2460ms
[   74.485700] node 67 initialised, 7732962 pages in 2480ms
[   74.502627] node 31 initialised, 7732962 pages in 2490ms
[   74.542220] node 16 initialised, 7732961 pages in 2530ms
[   74.547936] node 128 initialised, 7732961 pages in 2520ms
[   74.634374] node 214 initialised, 7732961 pages in 2580ms
[   74.654389] node 88 initialised, 7732961 pages in 2630ms
[   74.722833] node 117 initialised, 7732962 pages in 2700ms
[   74.735002] node 148 initialised, 7732961 pages in 2700ms
[   74.742725] node 12 initialised, 7732961 pages in 2730ms
[   74.749319] node 194 initialised, 7732961 pages in 2700ms
[   74.767979] node 24 initialised, 7732961 pages in 2750ms
[   74.769465] node 114 initialised, 7732961 pages in 2750ms
[   74.796973] node 134 initialised, 7732961 pages in 2770ms
[   74.818164] node 15 initialised, 7732962 pages in 2810ms
[   74.844852] node 18 initialised, 7732961 pages in 2830ms
[   74.866123] node 110 initialised, 7732961 pages in 2850ms
[   74.898255] node 215 initialised, 7730688 pages in 2840ms
[   74.903623] node 136 initialised, 7732961 pages in 2880ms
[   74.911107] node 144 initialised, 7732961 pages in 2890ms
[   74.918757] node 212 initialised, 7732961 pages in 2870ms
[   74.935333] node 182 initialised, 7732961 pages in 2880ms
[   74.958147] node 42 initialised, 7732961 pages in 2950ms
[   74.964989] node 108 initialised, 7732961 pages in 2950ms
[   74.965482] node 112 initialised, 7732961 pages in 2950ms
[   75.034787] node 184 initialised, 7732961 pages in 2980ms
[   75.051242] node 45 initialised, 7732962 pages in 3040ms
[   75.047169] node 152 initialised, 7732961 pages in 3020ms
[   75.062834] node 179 initialised, 7732962 pages in 3010ms
[   75.076528] node 145 initialised, 7732962 pages in 3040ms
[   75.076613] node 25 initialised, 7732962 pages in 3070ms
[   75.073086] node 164 initialised, 7732961 pages in 3040ms
[   75.079674] node 149 initialised, 7732962 pages in 3050ms
[   75.092015] node 113 initialised, 7732962 pages in 3070ms
[   75.096325] node 80 initialised, 7732961 pages in 3080ms
[   75.131380] node 92 initialised, 7732961 pages in 3110ms
[   75.142147] node 10 initialised, 7732961 pages in 3130ms
[   75.151041] node 51 initialised, 7732962 pages in 3140ms
[   75.159074] node 130 initialised, 7732961 pages in 3130ms
[   75.162616] node 166 initialised, 7732961 pages in 3130ms
[   75.193557] node 82 initialised, 7732961 pages in 3170ms
[   75.254801] node 84 initialised, 7732961 pages in 3240ms
[   75.303028] node 64 initialised, 7732961 pages in 3290ms
[   75.299739] node 49 initialised, 7732962 pages in 3290ms
[   75.314231] node 21 initialised, 7732962 pages in 3300ms
[   75.371298] node 53 initialised, 7732962 pages in 3360ms
[   75.394569] node 95 initialised, 7732962 pages in 3380ms
[   75.441101] node 23 initialised, 7732962 pages in 3430ms
[   75.433080] node 19 initialised, 7732962 pages in 3430ms
[   75.446076] node 173 initialised, 7732962 pages in 3410ms
[   75.445816] node 99 initialised, 7732962 pages in 3430ms
[   75.470330] node 87 initialised, 7732962 pages in 3450ms
[   75.502334] node 8 initialised, 7732961 pages in 3490ms
[   75.508300] node 206 initialised, 7732961 pages in 3460ms
[   75.540253] node 132 initialised, 7732961 pages in 3510ms
[   75.615453] node 183 initialised, 7732962 pages in 3560ms
[   75.632576] node 78 initialised, 7732961 pages in 3610ms
[   75.647753] node 85 initialised, 7732962 pages in 3620ms
[   75.688955] node 90 initialised, 7732961 pages in 3670ms
[   75.694522] node 200 initialised, 7732961 pages in 3640ms
[   75.688790] node 43 initialised, 7732962 pages in 3680ms
[   75.694540] node 94 initialised, 7732961 pages in 3680ms
[   75.697149] node 29 initialised, 7732962 pages in 3690ms
[   75.693590] node 111 initialised, 7732962 pages in 3680ms
[   75.715829] node 56 initialised, 7732961 pages in 3700ms
[   75.718427] node 97 initialised, 7732962 pages in 3700ms
[   75.741643] node 147 initialised, 7732962 pages in 3710ms
[   75.773613] node 170 initialised, 7732961 pages in 3740ms
[   75.802874] node 208 initialised, 7732961 pages in 3750ms
[   75.804409] node 58 initialised, 7732961 pages in 3790ms
[   75.853438] node 126 initialised, 7732961 pages in 3830ms
[   75.888167] node 167 initialised, 7732962 pages in 3850ms
[   75.912656] node 172 initialised, 7732961 pages in 3870ms
[   75.956540] node 93 initialised, 7732962 pages in 3940ms
[   75.988819] node 127 initialised, 7732962 pages in 3960ms
[   76.062198] node 201 initialised, 7732962 pages in 4010ms
[   76.091769] node 47 initialised, 7732962 pages in 4080ms
[   76.119749] node 162 initialised, 7732961 pages in 4080ms
[   76.122797] node 6 initialised, 7732961 pages in 4110ms
[   76.225916] node 153 initialised, 7732962 pages in 4190ms
[   76.219855] node 81 initialised, 7732962 pages in 4200ms
[   76.236116] node 150 initialised, 7732961 pages in 4210ms
[   76.245349] node 180 initialised, 7732961 pages in 4190ms
[   76.248827] node 17 initialised, 7732962 pages in 4240ms
[   76.258801] node 13 initialised, 7732962 pages in 4250ms
[   76.259943] node 122 initialised, 7732961 pages in 4240ms
[   76.277480] node 196 initialised, 7732961 pages in 4230ms
[   76.320830] node 41 initialised, 7732962 pages in 4310ms
[   76.351667] node 129 initialised, 7732962 pages in 4320ms
[   76.353488] node 202 initialised, 7732961 pages in 4310ms
[   76.376753] node 165 initialised, 7732962 pages in 4340ms
[   76.381807] node 124 initialised, 7732961 pages in 4350ms
[   76.419952] node 171 initialised, 7732962 pages in 4380ms
[   76.431242] node 168 initialised, 7732961 pages in 4390ms
[   76.441324] node 89 initialised, 7732962 pages in 4420ms
[   76.440720] node 155 initialised, 7732962 pages in 4400ms
[   76.459715] node 120 initialised, 7732961 pages in 4440ms
[   76.483986] node 205 initialised, 7732962 pages in 4430ms
[   76.493284] node 151 initialised, 7732962 pages in 4460ms
[   76.491437] node 60 initialised, 7732961 pages in 4480ms
[   76.526620] node 74 initialised, 7732961 pages in 4510ms
[   76.543761] node 131 initialised, 7732962 pages in 4510ms
[   76.549562] node 39 initialised, 7732962 pages in 4540ms
[   76.563861] node 11 initialised, 7732962 pages in 4550ms
[   76.598775] node 54 initialised, 7732961 pages in 4590ms
[   76.602006] node 123 initialised, 7732962 pages in 4570ms
[   76.619856] node 76 initialised, 7732961 pages in 4600ms
[   76.631418] node 198 initialised, 7732961 pages in 4580ms
[   76.665415] node 188 initialised, 7732961 pages in 4610ms
[   76.669178] node 63 initialised, 7732962 pages in 4660ms
[   76.683646] node 101 initialised, 7732962 pages in 4670ms
[   76.710780] node 192 initialised, 7732961 pages in 4660ms
[   76.736743] node 121 initialised, 7732962 pages in 4720ms
[   76.743800] node 199 initialised, 7732962 pages in 4700ms
[   76.750663] node 20 initialised, 7732961 pages in 4740ms
[   76.763045] node 135 initialised, 7732962 pages in 4730ms
[   76.768216] node 137 initialised, 7732962 pages in 4740ms
[   76.800135] node 181 initialised, 7732962 pages in 4750ms
[   76.811215] node 27 initialised, 7732962 pages in 4800ms
[   76.857405] node 125 initialised, 7732962 pages in 4820ms
[   76.853750] node 163 initialised, 7732962 pages in 4820ms
[   76.882975] node 59 initialised, 7732962 pages in 4870ms
[   76.920121] node 9 initialised, 7732962 pages in 4910ms
[   76.934824] node 189 initialised, 7732962 pages in 4880ms
[   76.951223] node 154 initialised, 7732961 pages in 4920ms
[   76.953897] node 203 initialised, 7732962 pages in 4900ms
[   76.952558] node 75 initialised, 7732962 pages in 4930ms
[   76.985480] node 119 initialised, 7732962 pages in 4970ms
[   77.036089] node 195 initialised, 7732962 pages in 4980ms
[   77.039996] node 55 initialised, 7732962 pages in 5030ms
[   77.067989] node 109 initialised, 7732962 pages in 5040ms
[   77.066236] node 7 initialised, 7732962 pages in 5060ms
[   77.068709] node 65 initialised, 7732962 pages in 5060ms
[   77.097859] node 79 initialised, 7732962 pages in 5080ms
[   77.096219] node 169 initialised, 7732962 pages in 5060ms
[   77.125113] node 83 initialised, 7732962 pages in 5110ms
[   77.139507] node 37 initialised, 7732962 pages in 5130ms
[   77.143280] node 77 initialised, 7732962 pages in 5120ms
[   77.226494] node 73 initialised, 7732962 pages in 5200ms
[   77.281584] node 190 initialised, 7732961 pages in 5230ms
[   77.314794] node 204 initialised, 7732961 pages in 5260ms
[   77.328577] node 72 initialised, 7732961 pages in 5310ms
[   77.335743] node 36 initialised, 7732961 pages in 5320ms
[   77.360573] node 40 initialised, 7732961 pages in 5350ms
[   77.368712] node 207 initialised, 7732962 pages in 5320ms
[   77.387708] node 91 initialised, 7732962 pages in 5370ms
[   77.385143] node 57 initialised, 7732962 pages in 5380ms
[   77.391785] node 191 initialised, 7732962 pages in 5340ms
[   77.479970] node 185 initialised, 7732962 pages in 5430ms
[   77.491865] node 61 initialised, 7732962 pages in 5480ms
[   77.489255] node 133 initialised, 7732962 pages in 5460ms
[   77.502111] node 197 initialised, 7732962 pages in 5450ms
[   77.507136] node 193 initialised, 7732962 pages in 5460ms
[   77.523739] node 209 initialised, 7732962 pages in 5470ms
[   77.537131] node 187 initialised, 7732962 pages in 5490ms

-- [2]

http://ozlabs.org/~akpm/mmots/broken-out/memblock-introduce-a-for_each_reserved_mem_region-iterator.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-move-page-initialization-into-a-separate-function.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-only-set-page-reserved-in-the-memblock-region.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-pass-pfn-to-__free_pages_bootmem.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-make-__early_pfn_to_nid-smp-safe-and-introduce-meminit_pfn_in_nid.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-remaining-struct-pages-in-parallel-with-kswapd.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-minimise-number-of-pfn-page-lookups-during-initialisation.patch
http://ozlabs.org/~akpm/mmots/broken-out/x86-mm-enable-deferred-struct-page-initialisation-on-x86-64.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-free-pages-in-large-chunks-where-possible.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-remove-mminit_verify_page_links.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set-fix.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init-fix.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix2.patch


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
@ 2015-05-14 10:03                       ` Daniel J Blueman
  0 siblings, 0 replies; 168+ messages in thread
From: Daniel J Blueman @ 2015-05-14 10:03 UTC (permalink / raw)
  To: Mel Gorman, nzimmer
  Cc: Andrew Morton, Waiman Long, Dave Hansen, Scott Norton, Linux-MM,
	LKML, Steffen Persvold

On Thu, May 14, 2015 at 12:31 AM, Mel Gorman <mgorman@suse.de> wrote:
> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>  I just noticed a hang on my largest box.
>>  I can only reproduce it with large core counts; if I turn down the
>>  number of CPUs it doesn't have an issue.
>> 
> 
> Odd. The core count should make little difference as only
> one CPU per node should be in use. Does sysrq+t give any indication how
> or where it is hanging?

I was seeing the same behaviour: per-node initialisation time climbs from
roughly 1000ms to 5500ms [1], which suggests either lock contention or
O(n) behaviour.
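
For anyone who wants to check the spread in [1] without eyeballing it, here
is a minimal userspace sketch in C. It assumes the log lines have exactly
the "node N initialised, P pages in Tms" form shown in [1]; the struct and
function names (init_stats, parse_init_line, summarise) are made up for
illustration and are not part of the series.

```c
#include <stdio.h>

/* Summary of the per-node timings found in a set of boot-log lines. */
struct init_stats {
	int nodes;	/* number of lines that matched */
	long min_ms;	/* shortest reported duration */
	long max_ms;	/* longest reported duration */
};

/*
 * Parse one line of the assumed form
 *   "[   73.076117] node 2 initialised, 7732961 pages in 1060ms"
 * Returns 1 and fills the outputs on a match, 0 otherwise.
 */
static int parse_init_line(const char *line, int *node, long *pages, long *ms)
{
	double stamp;

	return sscanf(line, " [ %lf ] node %d initialised, %ld pages in %ldms",
		      &stamp, node, pages, ms) == 4;
}

/* Walk the lines, skip anything that does not match, track min and max. */
static struct init_stats summarise(const char *const *lines, int count)
{
	struct init_stats s = { 0, 0, 0 };
	int i;

	for (i = 0; i < count; i++) {
		int node;
		long pages, ms;

		if (!parse_init_line(lines[i], &node, &pages, &ms))
			continue;
		if (s.nodes == 0 || ms < s.min_ms)
			s.min_ms = ms;
		if (s.nodes == 0 || ms > s.max_ms)
			s.max_ms = ms;
		s.nodes++;
	}
	return s;
}
```

Feeding it the lines from [1] gives the node count plus the fastest and
slowest node. Quoted lines starting with ">" are skipped, since they do not
begin with the timestamp bracket.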

Nathan, can you check with this ordering of patches from Andrew's cache 
[2]? I was getting hangs until I found them all.

I'll follow up with timing data.

Thanks,
  Daniel

-- [1]

[   73.076117] node 2 initialised, 7732961 pages in 1060ms
[   73.077184] node 38 initialised, 7732961 pages in 1060ms
[   73.079626] node 146 initialised, 7732961 pages in 1050ms
[   73.093488] node 62 initialised, 7732961 pages in 1080ms
[   73.091557] node 3 initialised, 7732962 pages in 1080ms
[   73.100000] node 186 initialised, 7732961 pages in 1040ms
[   73.095731] node 4 initialised, 7732961 pages in 1080ms
[   73.090289] node 50 initialised, 7732961 pages in 1080ms
[   73.094005] node 158 initialised, 7732961 pages in 1050ms
[   73.095421] node 159 initialised, 7732962 pages in 1050ms
[   73.090324] node 52 initialised, 7732961 pages in 1080ms
[   73.099056] node 5 initialised, 7732962 pages in 1080ms
[   73.090116] node 160 initialised, 7732961 pages in 1050ms
[   73.161051] node 157 initialised, 7732962 pages in 1120ms
[   73.193565] node 161 initialised, 7732962 pages in 1160ms
[   73.212456] node 26 initialised, 7732961 pages in 1200ms
[   73.222904] node 0 initialised, 6686488 pages in 1210ms
[   73.242165] node 140 initialised, 7732961 pages in 1210ms
[   73.254230] node 156 initialised, 7732961 pages in 1220ms
[   73.284634] node 1 initialised, 7732962 pages in 1270ms
[   73.305301] node 141 initialised, 7732962 pages in 1280ms
[   73.322845] node 28 initialised, 7732961 pages in 1310ms
[   73.321757] node 142 initialised, 7732961 pages in 1290ms
[   73.327677] node 138 initialised, 7732961 pages in 1300ms
[   73.413597] node 176 initialised, 7732961 pages in 1370ms
[   73.455552] node 139 initialised, 7732962 pages in 1420ms
[   73.475356] node 143 initialised, 7732962 pages in 1440ms
[   73.547202] node 32 initialised, 7732961 pages in 1530ms
[   73.579591] node 104 initialised, 7732961 pages in 1560ms
[   73.618065] node 174 initialised, 7732961 pages in 1570ms
[   73.624918] node 178 initialised, 7732961 pages in 1580ms
[   73.649024] node 175 initialised, 7732962 pages in 1610ms
[   73.654110] node 105 initialised, 7732962 pages in 1630ms
[   73.670589] node 106 initialised, 7732961 pages in 1650ms
[   73.739682] node 102 initialised, 7732961 pages in 1720ms
[   73.769639] node 86 initialised, 7732961 pages in 1750ms
[   73.775573] node 44 initialised, 7732961 pages in 1760ms
[   73.772955] node 177 initialised, 7732962 pages in 1740ms
[   73.804390] node 34 initialised, 7732961 pages in 1790ms
[   73.819370] node 30 initialised, 7732961 pages in 1810ms
[   73.847882] node 98 initialised, 7732961 pages in 1830ms
[   73.867545] node 33 initialised, 7732962 pages in 1860ms
[   73.877964] node 107 initialised, 7732962 pages in 1860ms
[   73.906256] node 103 initialised, 7732962 pages in 1880ms
[   73.945581] node 100 initialised, 7732961 pages in 1930ms
[   73.947024] node 96 initialised, 7732961 pages in 1930ms
[   74.186208] node 116 initialised, 7732961 pages in 2170ms
[   74.220838] node 68 initialised, 7732961 pages in 2210ms
[   74.252341] node 46 initialised, 7732961 pages in 2240ms
[   74.274795] node 118 initialised, 7732961 pages in 2260ms
[   74.337544] node 14 initialised, 7732961 pages in 2320ms
[   74.350819] node 22 initialised, 7732961 pages in 2340ms
[   74.350332] node 69 initialised, 7732962 pages in 2340ms
[   74.362683] node 211 initialised, 7732962 pages in 2310ms
[   74.360617] node 70 initialised, 7732961 pages in 2340ms
[   74.369137] node 66 initialised, 7732961 pages in 2360ms
[   74.378242] node 115 initialised, 7732962 pages in 2360ms
[   74.404221] node 213 initialised, 7732962 pages in 2350ms
[   74.420901] node 210 initialised, 7732961 pages in 2370ms
[   74.430049] node 35 initialised, 7732962 pages in 2420ms
[   74.436007] node 48 initialised, 7732961 pages in 2420ms
[   74.480595] node 71 initialised, 7732962 pages in 2460ms
[   74.485700] node 67 initialised, 7732962 pages in 2480ms
[   74.502627] node 31 initialised, 7732962 pages in 2490ms
[   74.542220] node 16 initialised, 7732961 pages in 2530ms
[   74.547936] node 128 initialised, 7732961 pages in 2520ms
[   74.634374] node 214 initialised, 7732961 pages in 2580ms
[   74.654389] node 88 initialised, 7732961 pages in 2630ms
[   74.722833] node 117 initialised, 7732962 pages in 2700ms
[   74.735002] node 148 initialised, 7732961 pages in 2700ms
[   74.742725] node 12 initialised, 7732961 pages in 2730ms
[   74.749319] node 194 initialised, 7732961 pages in 2700ms
[   74.767979] node 24 initialised, 7732961 pages in 2750ms
[   74.769465] node 114 initialised, 7732961 pages in 2750ms
[   74.796973] node 134 initialised, 7732961 pages in 2770ms
[   74.818164] node 15 initialised, 7732962 pages in 2810ms
[   74.844852] node 18 initialised, 7732961 pages in 2830ms
[   74.866123] node 110 initialised, 7732961 pages in 2850ms
[   74.898255] node 215 initialised, 7730688 pages in 2840ms
[   74.903623] node 136 initialised, 7732961 pages in 2880ms
[   74.911107] node 144 initialised, 7732961 pages in 2890ms
[   74.918757] node 212 initialised, 7732961 pages in 2870ms
[   74.935333] node 182 initialised, 7732961 pages in 2880ms
[   74.958147] node 42 initialised, 7732961 pages in 2950ms
[   74.964989] node 108 initialised, 7732961 pages in 2950ms
[   74.965482] node 112 initialised, 7732961 pages in 2950ms
[   75.034787] node 184 initialised, 7732961 pages in 2980ms
[   75.051242] node 45 initialised, 7732962 pages in 3040ms
[   75.047169] node 152 initialised, 7732961 pages in 3020ms
[   75.062834] node 179 initialised, 7732962 pages in 3010ms
[   75.076528] node 145 initialised, 7732962 pages in 3040ms
[   75.076613] node 25 initialised, 7732962 pages in 3070ms
[   75.073086] node 164 initialised, 7732961 pages in 3040ms
[   75.079674] node 149 initialised, 7732962 pages in 3050ms
[   75.092015] node 113 initialised, 7732962 pages in 3070ms
[   75.096325] node 80 initialised, 7732961 pages in 3080ms
[   75.131380] node 92 initialised, 7732961 pages in 3110ms
[   75.142147] node 10 initialised, 7732961 pages in 3130ms
[   75.151041] node 51 initialised, 7732962 pages in 3140ms
[   75.159074] node 130 initialised, 7732961 pages in 3130ms
[   75.162616] node 166 initialised, 7732961 pages in 3130ms
[   75.193557] node 82 initialised, 7732961 pages in 3170ms
[   75.254801] node 84 initialised, 7732961 pages in 3240ms
[   75.303028] node 64 initialised, 7732961 pages in 3290ms
[   75.299739] node 49 initialised, 7732962 pages in 3290ms
[   75.314231] node 21 initialised, 7732962 pages in 3300ms
[   75.371298] node 53 initialised, 7732962 pages in 3360ms
[   75.394569] node 95 initialised, 7732962 pages in 3380ms
[   75.441101] node 23 initialised, 7732962 pages in 3430ms
[   75.433080] node 19 initialised, 7732962 pages in 3430ms
[   75.446076] node 173 initialised, 7732962 pages in 3410ms
[   75.445816] node 99 initialised, 7732962 pages in 3430ms
[   75.470330] node 87 initialised, 7732962 pages in 3450ms
[   75.502334] node 8 initialised, 7732961 pages in 3490ms
[   75.508300] node 206 initialised, 7732961 pages in 3460ms
[   75.540253] node 132 initialised, 7732961 pages in 3510ms
[   75.615453] node 183 initialised, 7732962 pages in 3560ms
[   75.632576] node 78 initialised, 7732961 pages in 3610ms
[   75.647753] node 85 initialised, 7732962 pages in 3620ms
[   75.688955] node 90 initialised, 7732961 pages in 3670ms
[   75.694522] node 200 initialised, 7732961 pages in 3640ms
[   75.688790] node 43 initialised, 7732962 pages in 3680ms
[   75.694540] node 94 initialised, 7732961 pages in 3680ms
[   75.697149] node 29 initialised, 7732962 pages in 3690ms
[   75.693590] node 111 initialised, 7732962 pages in 3680ms
[   75.715829] node 56 initialised, 7732961 pages in 3700ms
[   75.718427] node 97 initialised, 7732962 pages in 3700ms
[   75.741643] node 147 initialised, 7732962 pages in 3710ms
[   75.773613] node 170 initialised, 7732961 pages in 3740ms
[   75.802874] node 208 initialised, 7732961 pages in 3750ms
[   75.804409] node 58 initialised, 7732961 pages in 3790ms
[   75.853438] node 126 initialised, 7732961 pages in 3830ms
[   75.888167] node 167 initialised, 7732962 pages in 3850ms
[   75.912656] node 172 initialised, 7732961 pages in 3870ms
[   75.956540] node 93 initialised, 7732962 pages in 3940ms
[   75.988819] node 127 initialised, 7732962 pages in 3960ms
[   76.062198] node 201 initialised, 7732962 pages in 4010ms
[   76.091769] node 47 initialised, 7732962 pages in 4080ms
[   76.119749] node 162 initialised, 7732961 pages in 4080ms
[   76.122797] node 6 initialised, 7732961 pages in 4110ms
[   76.225916] node 153 initialised, 7732962 pages in 4190ms
[   76.219855] node 81 initialised, 7732962 pages in 4200ms
[   76.236116] node 150 initialised, 7732961 pages in 4210ms
[   76.245349] node 180 initialised, 7732961 pages in 4190ms
[   76.248827] node 17 initialised, 7732962 pages in 4240ms
[   76.258801] node 13 initialised, 7732962 pages in 4250ms
[   76.259943] node 122 initialised, 7732961 pages in 4240ms
[   76.277480] node 196 initialised, 7732961 pages in 4230ms
[   76.320830] node 41 initialised, 7732962 pages in 4310ms
[   76.351667] node 129 initialised, 7732962 pages in 4320ms
[   76.353488] node 202 initialised, 7732961 pages in 4310ms
[   76.376753] node 165 initialised, 7732962 pages in 4340ms
[   76.381807] node 124 initialised, 7732961 pages in 4350ms
[   76.419952] node 171 initialised, 7732962 pages in 4380ms
[   76.431242] node 168 initialised, 7732961 pages in 4390ms
[   76.441324] node 89 initialised, 7732962 pages in 4420ms
[   76.440720] node 155 initialised, 7732962 pages in 4400ms
[   76.459715] node 120 initialised, 7732961 pages in 4440ms
[   76.483986] node 205 initialised, 7732962 pages in 4430ms
[   76.493284] node 151 initialised, 7732962 pages in 4460ms
[   76.491437] node 60 initialised, 7732961 pages in 4480ms
[   76.526620] node 74 initialised, 7732961 pages in 4510ms
[   76.543761] node 131 initialised, 7732962 pages in 4510ms
[   76.549562] node 39 initialised, 7732962 pages in 4540ms
[   76.563861] node 11 initialised, 7732962 pages in 4550ms
[   76.598775] node 54 initialised, 7732961 pages in 4590ms
[   76.602006] node 123 initialised, 7732962 pages in 4570ms
[   76.619856] node 76 initialised, 7732961 pages in 4600ms
[   76.631418] node 198 initialised, 7732961 pages in 4580ms
[   76.665415] node 188 initialised, 7732961 pages in 4610ms
[   76.669178] node 63 initialised, 7732962 pages in 4660ms
[   76.683646] node 101 initialised, 7732962 pages in 4670ms
[   76.710780] node 192 initialised, 7732961 pages in 4660ms
[   76.736743] node 121 initialised, 7732962 pages in 4720ms
[   76.743800] node 199 initialised, 7732962 pages in 4700ms
[   76.750663] node 20 initialised, 7732961 pages in 4740ms
[   76.763045] node 135 initialised, 7732962 pages in 4730ms
[   76.768216] node 137 initialised, 7732962 pages in 4740ms
[   76.800135] node 181 initialised, 7732962 pages in 4750ms
[   76.811215] node 27 initialised, 7732962 pages in 4800ms
[   76.857405] node 125 initialised, 7732962 pages in 4820ms
[   76.853750] node 163 initialised, 7732962 pages in 4820ms
[   76.882975] node 59 initialised, 7732962 pages in 4870ms
[   76.920121] node 9 initialised, 7732962 pages in 4910ms
[   76.934824] node 189 initialised, 7732962 pages in 4880ms
[   76.951223] node 154 initialised, 7732961 pages in 4920ms
[   76.953897] node 203 initialised, 7732962 pages in 4900ms
[   76.952558] node 75 initialised, 7732962 pages in 4930ms
[   76.985480] node 119 initialised, 7732962 pages in 4970ms
[   77.036089] node 195 initialised, 7732962 pages in 4980ms
[   77.039996] node 55 initialised, 7732962 pages in 5030ms
[   77.067989] node 109 initialised, 7732962 pages in 5040ms
[   77.066236] node 7 initialised, 7732962 pages in 5060ms
[   77.068709] node 65 initialised, 7732962 pages in 5060ms
[   77.097859] node 79 initialised, 7732962 pages in 5080ms
[   77.096219] node 169 initialised, 7732962 pages in 5060ms
[   77.125113] node 83 initialised, 7732962 pages in 5110ms
[   77.139507] node 37 initialised, 7732962 pages in 5130ms
[   77.143280] node 77 initialised, 7732962 pages in 5120ms
[   77.226494] node 73 initialised, 7732962 pages in 5200ms
[   77.281584] node 190 initialised, 7732961 pages in 5230ms
[   77.314794] node 204 initialised, 7732961 pages in 5260ms
[   77.328577] node 72 initialised, 7732961 pages in 5310ms
[   77.335743] node 36 initialised, 7732961 pages in 5320ms
[   77.360573] node 40 initialised, 7732961 pages in 5350ms
[   77.368712] node 207 initialised, 7732962 pages in 5320ms
[   77.387708] node 91 initialised, 7732962 pages in 5370ms
[   77.385143] node 57 initialised, 7732962 pages in 5380ms
[   77.391785] node 191 initialised, 7732962 pages in 5340ms
[   77.479970] node 185 initialised, 7732962 pages in 5430ms
[   77.491865] node 61 initialised, 7732962 pages in 5480ms
[   77.489255] node 133 initialised, 7732962 pages in 5460ms
[   77.502111] node 197 initialised, 7732962 pages in 5450ms
[   77.507136] node 193 initialised, 7732962 pages in 5460ms
[   77.523739] node 209 initialised, 7732962 pages in 5470ms
[   77.537131] node 187 initialised, 7732962 pages in 5490ms

-- [2]

http://ozlabs.org/~akpm/mmots/broken-out/memblock-introduce-a-for_each_reserved_mem_region-iterator.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-move-page-initialization-into-a-separate-function.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-only-set-page-reserved-in-the-memblock-region.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-pass-pfn-to-__free_pages_bootmem.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-make-__early_pfn_to_nid-smp-safe-and-introduce-meminit_pfn_in_nid.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-remaining-struct-pages-in-parallel-with-kswapd.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-minimise-number-of-pfn-page-lookups-during-initialisation.patch
http://ozlabs.org/~akpm/mmots/broken-out/x86-mm-enable-deferred-struct-page-initialisation-on-x86-64.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-free-pages-in-large-chunks-where-possible.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-remove-mminit_verify_page_links.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set-fix.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init-fix.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix2.patch


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-14 10:03                       ` Daniel J Blueman
@ 2015-05-14 15:47                         ` nzimmer
  -1 siblings, 0 replies; 168+ messages in thread
From: nzimmer @ 2015-05-14 15:47 UTC (permalink / raw)
  To: Daniel J Blueman, Mel Gorman
  Cc: Andrew Morton, Waiman Long, Dave Hansen, Scott Norton, Linux-MM,
	LKML, Steffen Persvold

Well, I did get in some tests yesterday afternoon. With some simple 
timers I found that it occasionally took a huge amount of time in this 
snippet at the top of
static int __init deferred_init_memmap(void *data)

        /* Bind memory initialisation thread to a local node if possible */
        if (!cpumask_empty(cpumask))
                set_cpus_allowed_ptr(current, cpumask);

I am assuming that it is getting caught up in set_cpus_allowed_ptr(), 
not in cpumask_empty().
I have more machine time today and will make sure I have all those patches.

On 05/14/2015 05:03 AM, Daniel J Blueman wrote:
> On Thu, May 14, 2015 at 12:31 AM, Mel Gorman <mgorman@suse.de> wrote:
>> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>>  I just noticed a hang on my largest box.
>>>  I can only reproduce it with large core counts; if I turn down the
>>>  number of CPUs it doesn't have an issue.
>>>
>>
>> Odd. The core count should make little difference as only
>> one CPU per node should be in use. Does sysrq+t give any indication how
>> or where it is hanging?
>
> I was seeing the same behaviour: per-node initialisation time climbs from
> roughly 1000ms to 5500ms [1], which suggests either lock contention or
> O(n) behaviour.
>
> Nathan, can you check with this ordering of patches from Andrew's 
> cache [2]? I was getting hangs until I found them all.
>
> I'll follow up with timing data.
>
> Thanks,
>  Daniel
>
> -- [1]
>
> [   73.076117] node 2 initialised, 7732961 pages in 1060ms
> [   73.077184] node 38 initialised, 7732961 pages in 1060ms
> [   73.079626] node 146 initialised, 7732961 pages in 1050ms
> [   73.093488] node 62 initialised, 7732961 pages in 1080ms
> [   73.091557] node 3 initialised, 7732962 pages in 1080ms
> [   73.100000] node 186 initialised, 7732961 pages in 1040ms
> [   73.095731] node 4 initialised, 7732961 pages in 1080ms
> [   73.090289] node 50 initialised, 7732961 pages in 1080ms
> [   73.094005] node 158 initialised, 7732961 pages in 1050ms
> [   73.095421] node 159 initialised, 7732962 pages in 1050ms
> [   73.090324] node 52 initialised, 7732961 pages in 1080ms
> [   73.099056] node 5 initialised, 7732962 pages in 1080ms
> [   73.090116] node 160 initialised, 7732961 pages in 1050ms
> [   73.161051] node 157 initialised, 7732962 pages in 1120ms
> [   73.193565] node 161 initialised, 7732962 pages in 1160ms
> [   73.212456] node 26 initialised, 7732961 pages in 1200ms
> [   73.222904] node 0 initialised, 6686488 pages in 1210ms
> [   73.242165] node 140 initialised, 7732961 pages in 1210ms
> [   73.254230] node 156 initialised, 7732961 pages in 1220ms
> [   73.284634] node 1 initialised, 7732962 pages in 1270ms
> [   73.305301] node 141 initialised, 7732962 pages in 1280ms
> [   73.322845] node 28 initialised, 7732961 pages in 1310ms
> [   73.321757] node 142 initialised, 7732961 pages in 1290ms
> [   73.327677] node 138 initialised, 7732961 pages in 1300ms
> [   73.413597] node 176 initialised, 7732961 pages in 1370ms
> [   73.455552] node 139 initialised, 7732962 pages in 1420ms
> [   73.475356] node 143 initialised, 7732962 pages in 1440ms
> [   73.547202] node 32 initialised, 7732961 pages in 1530ms
> [   73.579591] node 104 initialised, 7732961 pages in 1560ms
> [   73.618065] node 174 initialised, 7732961 pages in 1570ms
> [   73.624918] node 178 initialised, 7732961 pages in 1580ms
> [   73.649024] node 175 initialised, 7732962 pages in 1610ms
> [   73.654110] node 105 initialised, 7732962 pages in 1630ms
> [   73.670589] node 106 initialised, 7732961 pages in 1650ms
> [   73.739682] node 102 initialised, 7732961 pages in 1720ms
> [   73.769639] node 86 initialised, 7732961 pages in 1750ms
> [   73.775573] node 44 initialised, 7732961 pages in 1760ms
> [   73.772955] node 177 initialised, 7732962 pages in 1740ms
> [   73.804390] node 34 initialised, 7732961 pages in 1790ms
> [   73.819370] node 30 initialised, 7732961 pages in 1810ms
> [   73.847882] node 98 initialised, 7732961 pages in 1830ms
> [   73.867545] node 33 initialised, 7732962 pages in 1860ms
> [   73.877964] node 107 initialised, 7732962 pages in 1860ms
> [   73.906256] node 103 initialised, 7732962 pages in 1880ms
> [   73.945581] node 100 initialised, 7732961 pages in 1930ms
> [   73.947024] node 96 initialised, 7732961 pages in 1930ms
> [   74.186208] node 116 initialised, 7732961 pages in 2170ms
> [   74.220838] node 68 initialised, 7732961 pages in 2210ms
> [   74.252341] node 46 initialised, 7732961 pages in 2240ms
> [   74.274795] node 118 initialised, 7732961 pages in 2260ms
> [   74.337544] node 14 initialised, 7732961 pages in 2320ms
> [   74.350819] node 22 initialised, 7732961 pages in 2340ms
> [   74.350332] node 69 initialised, 7732962 pages in 2340ms
> [   74.362683] node 211 initialised, 7732962 pages in 2310ms
> [   74.360617] node 70 initialised, 7732961 pages in 2340ms
> [   74.369137] node 66 initialised, 7732961 pages in 2360ms
> [   74.378242] node 115 initialised, 7732962 pages in 2360ms
> [   74.404221] node 213 initialised, 7732962 pages in 2350ms
> [   74.420901] node 210 initialised, 7732961 pages in 2370ms
> [   74.430049] node 35 initialised, 7732962 pages in 2420ms
> [   74.436007] node 48 initialised, 7732961 pages in 2420ms
> [   74.480595] node 71 initialised, 7732962 pages in 2460ms
> [   74.485700] node 67 initialised, 7732962 pages in 2480ms
> [   74.502627] node 31 initialised, 7732962 pages in 2490ms
> [   74.542220] node 16 initialised, 7732961 pages in 2530ms
> [   74.547936] node 128 initialised, 7732961 pages in 2520ms
> [   74.634374] node 214 initialised, 7732961 pages in 2580ms
> [   74.654389] node 88 initialised, 7732961 pages in 2630ms
> [   74.722833] node 117 initialised, 7732962 pages in 2700ms
> [   74.735002] node 148 initialised, 7732961 pages in 2700ms
> [   74.742725] node 12 initialised, 7732961 pages in 2730ms
> [   74.749319] node 194 initialised, 7732961 pages in 2700ms
> [   74.767979] node 24 initialised, 7732961 pages in 2750ms
> [   74.769465] node 114 initialised, 7732961 pages in 2750ms
> [   74.796973] node 134 initialised, 7732961 pages in 2770ms
> [   74.818164] node 15 initialised, 7732962 pages in 2810ms
> [   74.844852] node 18 initialised, 7732961 pages in 2830ms
> [   74.866123] node 110 initialised, 7732961 pages in 2850ms
> [   74.898255] node 215 initialised, 7730688 pages in 2840ms
> [   74.903623] node 136 initialised, 7732961 pages in 2880ms
> [   74.911107] node 144 initialised, 7732961 pages in 2890ms
> [   74.918757] node 212 initialised, 7732961 pages in 2870ms
> [   74.935333] node 182 initialised, 7732961 pages in 2880ms
> [   74.958147] node 42 initialised, 7732961 pages in 2950ms
> [   74.964989] node 108 initialised, 7732961 pages in 2950ms
> [   74.965482] node 112 initialised, 7732961 pages in 2950ms
> [   75.034787] node 184 initialised, 7732961 pages in 2980ms
> [   75.051242] node 45 initialised, 7732962 pages in 3040ms
> [   75.047169] node 152 initialised, 7732961 pages in 3020ms
> [   75.062834] node 179 initialised, 7732962 pages in 3010ms
> [   75.076528] node 145 initialised, 7732962 pages in 3040ms
> [   75.076613] node 25 initialised, 7732962 pages in 3070ms
> [   75.073086] node 164 initialised, 7732961 pages in 3040ms
> [   75.079674] node 149 initialised, 7732962 pages in 3050ms
> [   75.092015] node 113 initialised, 7732962 pages in 3070ms
> [   75.096325] node 80 initialised, 7732961 pages in 3080ms
> [   75.131380] node 92 initialised, 7732961 pages in 3110ms
> [   75.142147] node 10 initialised, 7732961 pages in 3130ms
> [   75.151041] node 51 initialised, 7732962 pages in 3140ms
> [   75.159074] node 130 initialised, 7732961 pages in 3130ms
> [   75.162616] node 166 initialised, 7732961 pages in 3130ms
> [   75.193557] node 82 initialised, 7732961 pages in 3170ms
> [   75.254801] node 84 initialised, 7732961 pages in 3240ms
> [   75.303028] node 64 initialised, 7732961 pages in 3290ms
> [   75.299739] node 49 initialised, 7732962 pages in 3290ms
> [   75.314231] node 21 initialised, 7732962 pages in 3300ms
> [   75.371298] node 53 initialised, 7732962 pages in 3360ms
> [   75.394569] node 95 initialised, 7732962 pages in 3380ms
> [   75.441101] node 23 initialised, 7732962 pages in 3430ms
> [   75.433080] node 19 initialised, 7732962 pages in 3430ms
> [   75.446076] node 173 initialised, 7732962 pages in 3410ms
> [   75.445816] node 99 initialised, 7732962 pages in 3430ms
> [   75.470330] node 87 initialised, 7732962 pages in 3450ms
> [   75.502334] node 8 initialised, 7732961 pages in 3490ms
> [   75.508300] node 206 initialised, 7732961 pages in 3460ms
> [   75.540253] node 132 initialised, 7732961 pages in 3510ms
> [   75.615453] node 183 initialised, 7732962 pages in 3560ms
> [   75.632576] node 78 initialised, 7732961 pages in 3610ms
> [   75.647753] node 85 initialised, 7732962 pages in 3620ms
> [   75.688955] node 90 initialised, 7732961 pages in 3670ms
> [   75.694522] node 200 initialised, 7732961 pages in 3640ms
> [   75.688790] node 43 initialised, 7732962 pages in 3680ms
> [   75.694540] node 94 initialised, 7732961 pages in 3680ms
> [   75.697149] node 29 initialised, 7732962 pages in 3690ms
> [   75.693590] node 111 initialised, 7732962 pages in 3680ms
> [   75.715829] node 56 initialised, 7732961 pages in 3700ms
> [   75.718427] node 97 initialised, 7732962 pages in 3700ms
> [   75.741643] node 147 initialised, 7732962 pages in 3710ms
> [   75.773613] node 170 initialised, 7732961 pages in 3740ms
> [   75.802874] node 208 initialised, 7732961 pages in 3750ms
> [   75.804409] node 58 initialised, 7732961 pages in 3790ms
> [   75.853438] node 126 initialised, 7732961 pages in 3830ms
> [   75.888167] node 167 initialised, 7732962 pages in 3850ms
> [   75.912656] node 172 initialised, 7732961 pages in 3870ms
> [   75.956540] node 93 initialised, 7732962 pages in 3940ms
> [   75.988819] node 127 initialised, 7732962 pages in 3960ms
> [   76.062198] node 201 initialised, 7732962 pages in 4010ms
> [   76.091769] node 47 initialised, 7732962 pages in 4080ms
> [   76.119749] node 162 initialised, 7732961 pages in 4080ms
> [   76.122797] node 6 initialised, 7732961 pages in 4110ms
> [   76.225916] node 153 initialised, 7732962 pages in 4190ms
> [   76.219855] node 81 initialised, 7732962 pages in 4200ms
> [   76.236116] node 150 initialised, 7732961 pages in 4210ms
> [   76.245349] node 180 initialised, 7732961 pages in 4190ms
> [   76.248827] node 17 initialised, 7732962 pages in 4240ms
> [   76.258801] node 13 initialised, 7732962 pages in 4250ms
> [   76.259943] node 122 initialised, 7732961 pages in 4240ms
> [   76.277480] node 196 initialised, 7732961 pages in 4230ms
> [   76.320830] node 41 initialised, 7732962 pages in 4310ms
> [   76.351667] node 129 initialised, 7732962 pages in 4320ms
> [   76.353488] node 202 initialised, 7732961 pages in 4310ms
> [   76.376753] node 165 initialised, 7732962 pages in 4340ms
> [   76.381807] node 124 initialised, 7732961 pages in 4350ms
> [   76.419952] node 171 initialised, 7732962 pages in 4380ms
> [   76.431242] node 168 initialised, 7732961 pages in 4390ms
> [   76.441324] node 89 initialised, 7732962 pages in 4420ms
> [   76.440720] node 155 initialised, 7732962 pages in 4400ms
> [   76.459715] node 120 initialised, 7732961 pages in 4440ms
> [   76.483986] node 205 initialised, 7732962 pages in 4430ms
> [   76.493284] node 151 initialised, 7732962 pages in 4460ms
> [   76.491437] node 60 initialised, 7732961 pages in 4480ms
> [   76.526620] node 74 initialised, 7732961 pages in 4510ms
> [   76.543761] node 131 initialised, 7732962 pages in 4510ms
> [   76.549562] node 39 initialised, 7732962 pages in 4540ms
> [   76.563861] node 11 initialised, 7732962 pages in 4550ms
> [   76.598775] node 54 initialised, 7732961 pages in 4590ms
> [   76.602006] node 123 initialised, 7732962 pages in 4570ms
> [   76.619856] node 76 initialised, 7732961 pages in 4600ms
> [   76.631418] node 198 initialised, 7732961 pages in 4580ms
> [   76.665415] node 188 initialised, 7732961 pages in 4610ms
> [   76.669178] node 63 initialised, 7732962 pages in 4660ms
> [   76.683646] node 101 initialised, 7732962 pages in 4670ms
> [   76.710780] node 192 initialised, 7732961 pages in 4660ms
> [   76.736743] node 121 initialised, 7732962 pages in 4720ms
> [   76.743800] node 199 initialised, 7732962 pages in 4700ms
> [   76.750663] node 20 initialised, 7732961 pages in 4740ms
> [   76.763045] node 135 initialised, 7732962 pages in 4730ms
> [   76.768216] node 137 initialised, 7732962 pages in 4740ms
> [   76.800135] node 181 initialised, 7732962 pages in 4750ms
> [   76.811215] node 27 initialised, 7732962 pages in 4800ms
> [   76.857405] node 125 initialised, 7732962 pages in 4820ms
> [   76.853750] node 163 initialised, 7732962 pages in 4820ms
> [   76.882975] node 59 initialised, 7732962 pages in 4870ms
> [   76.920121] node 9 initialised, 7732962 pages in 4910ms
> [   76.934824] node 189 initialised, 7732962 pages in 4880ms
> [   76.951223] node 154 initialised, 7732961 pages in 4920ms
> [   76.953897] node 203 initialised, 7732962 pages in 4900ms
> [   76.952558] node 75 initialised, 7732962 pages in 4930ms
> [   76.985480] node 119 initialised, 7732962 pages in 4970ms
> [   77.036089] node 195 initialised, 7732962 pages in 4980ms
> [   77.039996] node 55 initialised, 7732962 pages in 5030ms
> [   77.067989] node 109 initialised, 7732962 pages in 5040ms
> [   77.066236] node 7 initialised, 7732962 pages in 5060ms
> [   77.068709] node 65 initialised, 7732962 pages in 5060ms
> [   77.097859] node 79 initialised, 7732962 pages in 5080ms
> [   77.096219] node 169 initialised, 7732962 pages in 5060ms
> [   77.125113] node 83 initialised, 7732962 pages in 5110ms
> [   77.139507] node 37 initialised, 7732962 pages in 5130ms
> [   77.143280] node 77 initialised, 7732962 pages in 5120ms
> [   77.226494] node 73 initialised, 7732962 pages in 5200ms
> [   77.281584] node 190 initialised, 7732961 pages in 5230ms
> [   77.314794] node 204 initialised, 7732961 pages in 5260ms
> [   77.328577] node 72 initialised, 7732961 pages in 5310ms
> [   77.335743] node 36 initialised, 7732961 pages in 5320ms
> [   77.360573] node 40 initialised, 7732961 pages in 5350ms
> [   77.368712] node 207 initialised, 7732962 pages in 5320ms
> [   77.387708] node 91 initialised, 7732962 pages in 5370ms
> [   77.385143] node 57 initialised, 7732962 pages in 5380ms
> [   77.391785] node 191 initialised, 7732962 pages in 5340ms
> [   77.479970] node 185 initialised, 7732962 pages in 5430ms
> [   77.491865] node 61 initialised, 7732962 pages in 5480ms
> [   77.489255] node 133 initialised, 7732962 pages in 5460ms
> [   77.502111] node 197 initialised, 7732962 pages in 5450ms
> [   77.507136] node 193 initialised, 7732962 pages in 5460ms
> [   77.523739] node 209 initialised, 7732962 pages in 5470ms
> [   77.537131] node 187 initialised, 7732962 pages in 5490ms
>
> -- [2]
>
> http://ozlabs.org/~akpm/mmots/broken-out/memblock-introduce-a-for_each_reserved_mem_region-iterator.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-move-page-initialization-into-a-separate-function.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-only-set-page-reserved-in-the-memblock-region.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-pass-pfn-to-__free_pages_bootmem.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-make-__early_pfn_to_nid-smp-safe-and-introduce-meminit_pfn_in_nid.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-remaining-struct-pages-in-parallel-with-kswapd.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-minimise-number-of-pfn-page-lookups-during-initialisation.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/x86-mm-enable-deferred-struct-page-initialisation-on-x86-64.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-free-pages-in-large-chunks-where-possible.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-remove-mminit_verify_page_links.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix2.patch 
>
>


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
@ 2015-05-14 15:47                         ` nzimmer
  0 siblings, 0 replies; 168+ messages in thread
From: nzimmer @ 2015-05-14 15:47 UTC (permalink / raw)
  To: Daniel J Blueman, Mel Gorman
  Cc: Andrew Morton, Waiman Long, Dave Hansen, Scott Norton, Linux-MM,
	LKML, Steffen Persvold

Well, I did get in some tests yesterday afternoon.  With some simple
timers I found that it occasionally took a huge amount of time in this
snippet at the top of
static int __init deferred_init_memmap(void *data)

        /* Bind memory initialisation thread to a local node if possible */
        if (!cpumask_empty(cpumask))
                set_cpus_allowed_ptr(current, cpumask);

I am assuming that it is getting caught up in set_cpus_allowed_ptr,
not in cpumask_empty.
I have more machine time today and will make sure I have all those patches.
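
The timers themselves are not shown above. As a rough user-space
analogue of bracketing one call with a timer, assuming
clock_gettime(CLOCK_MONOTONIC) in place of whatever in-kernel clock was
used; elapsed_ns and time_call_ns are hypothetical names, not anything
from the patches:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC samples. */
static long long elapsed_ns(const struct timespec *start,
                            const struct timespec *end)
{
        return (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
                + (end->tv_nsec - start->tv_nsec);
}

/*
 * Run fn(arg) once and return how long the call took in nanoseconds.
 * Hypothetical helper: fn stands in for the suspect call, for example
 * the affinity change in the snippet above.
 */
static long long time_call_ns(void (*fn)(void *), void *arg)
{
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        fn(arg);
        clock_gettime(CLOCK_MONOTONIC, &end);
        return elapsed_ns(&start, &end);
}
```

Timing each call separately is what would distinguish the cpumask_empty
check from set_cpus_allowed_ptr, rather than assuming which one is slow.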

On 05/14/2015 05:03 AM, Daniel J Blueman wrote:
> On Thu, May 14, 2015 at 12:31 AM, Mel Gorman <mgorman@suse.de> wrote:
>> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>>  I just noticed a hang on my largest box.
>>>  I can only reproduce with large core counts, if I turn down the
>>>  number of cpus it doesn't have an issue.
>>>
>>
>> Odd. The core count should make little difference as only
>> one CPU per node should be in use. Does sysrq+t give any indication how
>> or where it is hanging?
>
> I was seeing the same behaviour of 1000ms increasing to 5500ms [1]; 
> this suggests either lock contention or O(n) behaviour.
>
> Nathan, can you check with this ordering of patches from Andrew's 
> cache [2]? I was getting hangs until I found them all.
>
> I'll follow up with timing data.
>
> Thanks,
>  Daniel
>
> -- [1]
>
> [   73.076117] node 2 initialised, 7732961 pages in 1060ms
> [   73.077184] node 38 initialised, 7732961 pages in 1060ms
> [   73.079626] node 146 initialised, 7732961 pages in 1050ms
> [   73.093488] node 62 initialised, 7732961 pages in 1080ms
> [   73.091557] node 3 initialised, 7732962 pages in 1080ms
> [   73.100000] node 186 initialised, 7732961 pages in 1040ms
> [   73.095731] node 4 initialised, 7732961 pages in 1080ms
> [   73.090289] node 50 initialised, 7732961 pages in 1080ms
> [   73.094005] node 158 initialised, 7732961 pages in 1050ms
> [   73.095421] node 159 initialised, 7732962 pages in 1050ms
> [   73.090324] node 52 initialised, 7732961 pages in 1080ms
> [   73.099056] node 5 initialised, 7732962 pages in 1080ms
> [   73.090116] node 160 initialised, 7732961 pages in 1050ms
> [   73.161051] node 157 initialised, 7732962 pages in 1120ms
> [   73.193565] node 161 initialised, 7732962 pages in 1160ms
> [   73.212456] node 26 initialised, 7732961 pages in 1200ms
> [   73.222904] node 0 initialised, 6686488 pages in 1210ms
> [   73.242165] node 140 initialised, 7732961 pages in 1210ms
> [   73.254230] node 156 initialised, 7732961 pages in 1220ms
> [   73.284634] node 1 initialised, 7732962 pages in 1270ms
> [   73.305301] node 141 initialised, 7732962 pages in 1280ms
> [   73.322845] node 28 initialised, 7732961 pages in 1310ms
> [   73.321757] node 142 initialised, 7732961 pages in 1290ms
> [   73.327677] node 138 initialised, 7732961 pages in 1300ms
> [   73.413597] node 176 initialised, 7732961 pages in 1370ms
> [   73.455552] node 139 initialised, 7732962 pages in 1420ms
> [   73.475356] node 143 initialised, 7732962 pages in 1440ms
> [   73.547202] node 32 initialised, 7732961 pages in 1530ms
> [   73.579591] node 104 initialised, 7732961 pages in 1560ms
> [   73.618065] node 174 initialised, 7732961 pages in 1570ms
> [   73.624918] node 178 initialised, 7732961 pages in 1580ms
> [   73.649024] node 175 initialised, 7732962 pages in 1610ms
> [   73.654110] node 105 initialised, 7732962 pages in 1630ms
> [   73.670589] node 106 initialised, 7732961 pages in 1650ms
> [   73.739682] node 102 initialised, 7732961 pages in 1720ms
> [   73.769639] node 86 initialised, 7732961 pages in 1750ms
> [   73.775573] node 44 initialised, 7732961 pages in 1760ms
> [   73.772955] node 177 initialised, 7732962 pages in 1740ms
> [   73.804390] node 34 initialised, 7732961 pages in 1790ms
> [   73.819370] node 30 initialised, 7732961 pages in 1810ms
> [   73.847882] node 98 initialised, 7732961 pages in 1830ms
> [   73.867545] node 33 initialised, 7732962 pages in 1860ms
> [   73.877964] node 107 initialised, 7732962 pages in 1860ms
> [   73.906256] node 103 initialised, 7732962 pages in 1880ms
> [   73.945581] node 100 initialised, 7732961 pages in 1930ms
> [   73.947024] node 96 initialised, 7732961 pages in 1930ms
> [   74.186208] node 116 initialised, 7732961 pages in 2170ms
> [   74.220838] node 68 initialised, 7732961 pages in 2210ms
> [   74.252341] node 46 initialised, 7732961 pages in 2240ms
> [   74.274795] node 118 initialised, 7732961 pages in 2260ms
> [   74.337544] node 14 initialised, 7732961 pages in 2320ms
> [   74.350819] node 22 initialised, 7732961 pages in 2340ms
> [   74.350332] node 69 initialised, 7732962 pages in 2340ms
> [   74.362683] node 211 initialised, 7732962 pages in 2310ms
> [   74.360617] node 70 initialised, 7732961 pages in 2340ms
> [   74.369137] node 66 initialised, 7732961 pages in 2360ms
> [   74.378242] node 115 initialised, 7732962 pages in 2360ms
> [   74.404221] node 213 initialised, 7732962 pages in 2350ms
> [   74.420901] node 210 initialised, 7732961 pages in 2370ms
> [   74.430049] node 35 initialised, 7732962 pages in 2420ms
> [   74.436007] node 48 initialised, 7732961 pages in 2420ms
> [   74.480595] node 71 initialised, 7732962 pages in 2460ms
> [   74.485700] node 67 initialised, 7732962 pages in 2480ms
> [   74.502627] node 31 initialised, 7732962 pages in 2490ms
> [   74.542220] node 16 initialised, 7732961 pages in 2530ms
> [   74.547936] node 128 initialised, 7732961 pages in 2520ms
> [   74.634374] node 214 initialised, 7732961 pages in 2580ms
> [   74.654389] node 88 initialised, 7732961 pages in 2630ms
> [   74.722833] node 117 initialised, 7732962 pages in 2700ms
> [   74.735002] node 148 initialised, 7732961 pages in 2700ms
> [   74.742725] node 12 initialised, 7732961 pages in 2730ms
> [   74.749319] node 194 initialised, 7732961 pages in 2700ms
> [   74.767979] node 24 initialised, 7732961 pages in 2750ms
> [   74.769465] node 114 initialised, 7732961 pages in 2750ms
> [   74.796973] node 134 initialised, 7732961 pages in 2770ms
> [   74.818164] node 15 initialised, 7732962 pages in 2810ms
> [   74.844852] node 18 initialised, 7732961 pages in 2830ms
> [   74.866123] node 110 initialised, 7732961 pages in 2850ms
> [   74.898255] node 215 initialised, 7730688 pages in 2840ms
> [   74.903623] node 136 initialised, 7732961 pages in 2880ms
> [   74.911107] node 144 initialised, 7732961 pages in 2890ms
> [   74.918757] node 212 initialised, 7732961 pages in 2870ms
> [   74.935333] node 182 initialised, 7732961 pages in 2880ms
> [   74.958147] node 42 initialised, 7732961 pages in 2950ms
> [   74.964989] node 108 initialised, 7732961 pages in 2950ms
> [   74.965482] node 112 initialised, 7732961 pages in 2950ms
> [   75.034787] node 184 initialised, 7732961 pages in 2980ms
> [   75.051242] node 45 initialised, 7732962 pages in 3040ms
> [   75.047169] node 152 initialised, 7732961 pages in 3020ms
> [   75.062834] node 179 initialised, 7732962 pages in 3010ms
> [   75.076528] node 145 initialised, 7732962 pages in 3040ms
> [   75.076613] node 25 initialised, 7732962 pages in 3070ms
> [   75.073086] node 164 initialised, 7732961 pages in 3040ms
> [   75.079674] node 149 initialised, 7732962 pages in 3050ms
> [   75.092015] node 113 initialised, 7732962 pages in 3070ms
> [   75.096325] node 80 initialised, 7732961 pages in 3080ms
> [   75.131380] node 92 initialised, 7732961 pages in 3110ms
> [   75.142147] node 10 initialised, 7732961 pages in 3130ms
> [   75.151041] node 51 initialised, 7732962 pages in 3140ms
> [   75.159074] node 130 initialised, 7732961 pages in 3130ms
> [   75.162616] node 166 initialised, 7732961 pages in 3130ms
> [   75.193557] node 82 initialised, 7732961 pages in 3170ms
> [   75.254801] node 84 initialised, 7732961 pages in 3240ms
> [   75.303028] node 64 initialised, 7732961 pages in 3290ms
> [   75.299739] node 49 initialised, 7732962 pages in 3290ms
> [   75.314231] node 21 initialised, 7732962 pages in 3300ms
> [   75.371298] node 53 initialised, 7732962 pages in 3360ms
> [   75.394569] node 95 initialised, 7732962 pages in 3380ms
> [   75.441101] node 23 initialised, 7732962 pages in 3430ms
> [   75.433080] node 19 initialised, 7732962 pages in 3430ms
> [   75.446076] node 173 initialised, 7732962 pages in 3410ms
> [   75.445816] node 99 initialised, 7732962 pages in 3430ms
> [   75.470330] node 87 initialised, 7732962 pages in 3450ms
> [   75.502334] node 8 initialised, 7732961 pages in 3490ms
> [   75.508300] node 206 initialised, 7732961 pages in 3460ms
> [   75.540253] node 132 initialised, 7732961 pages in 3510ms
> [   75.615453] node 183 initialised, 7732962 pages in 3560ms
> [   75.632576] node 78 initialised, 7732961 pages in 3610ms
> [   75.647753] node 85 initialised, 7732962 pages in 3620ms
> [   75.688955] node 90 initialised, 7732961 pages in 3670ms
> [   75.694522] node 200 initialised, 7732961 pages in 3640ms
> [   75.688790] node 43 initialised, 7732962 pages in 3680ms
> [   75.694540] node 94 initialised, 7732961 pages in 3680ms
> [   75.697149] node 29 initialised, 7732962 pages in 3690ms
> [   75.693590] node 111 initialised, 7732962 pages in 3680ms
> [   75.715829] node 56 initialised, 7732961 pages in 3700ms
> [   75.718427] node 97 initialised, 7732962 pages in 3700ms
> [   75.741643] node 147 initialised, 7732962 pages in 3710ms
> [   75.773613] node 170 initialised, 7732961 pages in 3740ms
> [   75.802874] node 208 initialised, 7732961 pages in 3750ms
> [   75.804409] node 58 initialised, 7732961 pages in 3790ms
> [   75.853438] node 126 initialised, 7732961 pages in 3830ms
> [   75.888167] node 167 initialised, 7732962 pages in 3850ms
> [   75.912656] node 172 initialised, 7732961 pages in 3870ms
> [   75.956540] node 93 initialised, 7732962 pages in 3940ms
> [   75.988819] node 127 initialised, 7732962 pages in 3960ms
> [   76.062198] node 201 initialised, 7732962 pages in 4010ms
> [   76.091769] node 47 initialised, 7732962 pages in 4080ms
> [   76.119749] node 162 initialised, 7732961 pages in 4080ms
> [   76.122797] node 6 initialised, 7732961 pages in 4110ms
> [   76.225916] node 153 initialised, 7732962 pages in 4190ms
> [   76.219855] node 81 initialised, 7732962 pages in 4200ms
> [   76.236116] node 150 initialised, 7732961 pages in 4210ms
> [   76.245349] node 180 initialised, 7732961 pages in 4190ms
> [   76.248827] node 17 initialised, 7732962 pages in 4240ms
> [   76.258801] node 13 initialised, 7732962 pages in 4250ms
> [   76.259943] node 122 initialised, 7732961 pages in 4240ms
> [   76.277480] node 196 initialised, 7732961 pages in 4230ms
> [   76.320830] node 41 initialised, 7732962 pages in 4310ms
> [   76.351667] node 129 initialised, 7732962 pages in 4320ms
> [   76.353488] node 202 initialised, 7732961 pages in 4310ms
> [   76.376753] node 165 initialised, 7732962 pages in 4340ms
> [   76.381807] node 124 initialised, 7732961 pages in 4350ms
> [   76.419952] node 171 initialised, 7732962 pages in 4380ms
> [   76.431242] node 168 initialised, 7732961 pages in 4390ms
> [   76.441324] node 89 initialised, 7732962 pages in 4420ms
> [   76.440720] node 155 initialised, 7732962 pages in 4400ms
> [   76.459715] node 120 initialised, 7732961 pages in 4440ms
> [   76.483986] node 205 initialised, 7732962 pages in 4430ms
> [   76.493284] node 151 initialised, 7732962 pages in 4460ms
> [   76.491437] node 60 initialised, 7732961 pages in 4480ms
> [   76.526620] node 74 initialised, 7732961 pages in 4510ms
> [   76.543761] node 131 initialised, 7732962 pages in 4510ms
> [   76.549562] node 39 initialised, 7732962 pages in 4540ms
> [   76.563861] node 11 initialised, 7732962 pages in 4550ms
> [   76.598775] node 54 initialised, 7732961 pages in 4590ms
> [   76.602006] node 123 initialised, 7732962 pages in 4570ms
> [   76.619856] node 76 initialised, 7732961 pages in 4600ms
> [   76.631418] node 198 initialised, 7732961 pages in 4580ms
> [   76.665415] node 188 initialised, 7732961 pages in 4610ms
> [   76.669178] node 63 initialised, 7732962 pages in 4660ms
> [   76.683646] node 101 initialised, 7732962 pages in 4670ms
> [   76.710780] node 192 initialised, 7732961 pages in 4660ms
> [   76.736743] node 121 initialised, 7732962 pages in 4720ms
> [   76.743800] node 199 initialised, 7732962 pages in 4700ms
> [   76.750663] node 20 initialised, 7732961 pages in 4740ms
> [   76.763045] node 135 initialised, 7732962 pages in 4730ms
> [   76.768216] node 137 initialised, 7732962 pages in 4740ms
> [   76.800135] node 181 initialised, 7732962 pages in 4750ms
> [   76.811215] node 27 initialised, 7732962 pages in 4800ms
> [   76.857405] node 125 initialised, 7732962 pages in 4820ms
> [   76.853750] node 163 initialised, 7732962 pages in 4820ms
> [   76.882975] node 59 initialised, 7732962 pages in 4870ms
> [   76.920121] node 9 initialised, 7732962 pages in 4910ms
> [   76.934824] node 189 initialised, 7732962 pages in 4880ms
> [   76.951223] node 154 initialised, 7732961 pages in 4920ms
> [   76.953897] node 203 initialised, 7732962 pages in 4900ms
> [   76.952558] node 75 initialised, 7732962 pages in 4930ms
> [   76.985480] node 119 initialised, 7732962 pages in 4970ms
> [   77.036089] node 195 initialised, 7732962 pages in 4980ms
> [   77.039996] node 55 initialised, 7732962 pages in 5030ms
> [   77.067989] node 109 initialised, 7732962 pages in 5040ms
> [   77.066236] node 7 initialised, 7732962 pages in 5060ms
> [   77.068709] node 65 initialised, 7732962 pages in 5060ms
> [   77.097859] node 79 initialised, 7732962 pages in 5080ms
> [   77.096219] node 169 initialised, 7732962 pages in 5060ms
> [   77.125113] node 83 initialised, 7732962 pages in 5110ms
> [   77.139507] node 37 initialised, 7732962 pages in 5130ms
> [   77.143280] node 77 initialised, 7732962 pages in 5120ms
> [   77.226494] node 73 initialised, 7732962 pages in 5200ms
> [   77.281584] node 190 initialised, 7732961 pages in 5230ms
> [   77.314794] node 204 initialised, 7732961 pages in 5260ms
> [   77.328577] node 72 initialised, 7732961 pages in 5310ms
> [   77.335743] node 36 initialised, 7732961 pages in 5320ms
> [   77.360573] node 40 initialised, 7732961 pages in 5350ms
> [   77.368712] node 207 initialised, 7732962 pages in 5320ms
> [   77.387708] node 91 initialised, 7732962 pages in 5370ms
> [   77.385143] node 57 initialised, 7732962 pages in 5380ms
> [   77.391785] node 191 initialised, 7732962 pages in 5340ms
> [   77.479970] node 185 initialised, 7732962 pages in 5430ms
> [   77.491865] node 61 initialised, 7732962 pages in 5480ms
> [   77.489255] node 133 initialised, 7732962 pages in 5460ms
> [   77.502111] node 197 initialised, 7732962 pages in 5450ms
> [   77.507136] node 193 initialised, 7732962 pages in 5460ms
> [   77.523739] node 209 initialised, 7732962 pages in 5470ms
> [   77.537131] node 187 initialised, 7732962 pages in 5490ms
>
> -- [2]
>
> http://ozlabs.org/~akpm/mmots/broken-out/memblock-introduce-a-for_each_reserved_mem_region-iterator.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-move-page-initialization-into-a-separate-function.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-only-set-page-reserved-in-the-memblock-region.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-pass-pfn-to-__free_pages_bootmem.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-make-__early_pfn_to_nid-smp-safe-and-introduce-meminit_pfn_in_nid.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-remaining-struct-pages-in-parallel-with-kswapd.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-minimise-number-of-pfn-page-lookups-during-initialisation.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/x86-mm-enable-deferred-struct-page-initialisation-on-x86-64.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-free-pages-in-large-chunks-where-possible.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-remove-mminit_verify_page_links.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix2.patch 
>
>
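
The spread in the quoted log [1] can be checked mechanically. A minimal
sketch, assuming the "node N initialised, P pages in Tms" line format
shown there; struct node_init, parse_node_init and pages_per_ms are
hypothetical names used only for illustration:

```c
#include <stdio.h>

/* One parsed "node N initialised, P pages in Tms" log line. */
struct node_init {
        double stamp;           /* printk timestamp in seconds */
        int node;               /* NUMA node id */
        unsigned long pages;    /* struct pages initialised */
        unsigned long ms;       /* reported duration in milliseconds */
};

/* Parse one log line, tolerating a leading "> " quote prefix.
 * Returns 1 on success, 0 if the line does not match. */
static int parse_node_init(const char *line, struct node_init *out)
{
        while (*line == '>' || *line == ' ')
                line++;
        return sscanf(line, " [ %lf ] node %d initialised, %lu pages in %lums",
                      &out->stamp, &out->node, &out->pages, &out->ms) == 4;
}

/* Initialisation rate in pages per millisecond (0 if no time recorded). */
static unsigned long pages_per_ms(const struct node_init *n)
{
        return n->ms ? n->pages / n->ms : 0;
}
```

Fed the first and last lines of [1], this gives about 7295 pages/ms for
node 2 (1060ms) against about 1408 pages/ms for node 187 (5490ms), for
nearly the same page count.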


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-14 10:03                       ` Daniel J Blueman
@ 2015-05-19 18:31                         ` nzimmer
  -1 siblings, 0 replies; 168+ messages in thread
From: nzimmer @ 2015-05-19 18:31 UTC (permalink / raw)
  To: Daniel J Blueman, Mel Gorman
  Cc: Andrew Morton, Waiman Long, Dave Hansen, Scott Norton, Linux-MM,
	LKML, Steffen Persvold

After double checking the patches it seems everything is ok.

I had to rerun quite a bit since the machine was reconfigured and I 
wanted to be thorough.
My latest timings are quite close to my previously reported numbers.

The hang issue I encountered turned out to be unrelated to these patches 
so that is a separate bundle of fun.



On 05/14/2015 05:03 AM, Daniel J Blueman wrote:
> On Thu, May 14, 2015 at 12:31 AM, Mel Gorman <mgorman@suse.de> wrote:
>> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>>  I just noticed a hang on my largest box.
>>>  I can only reproduce with large core counts, if I turn down the
>>>  number of cpus it doesn't have an issue.
>>>
>>
>> Odd. The core count should make little difference as only
>> one CPU per node should be in use. Does sysrq+t give any indication how
>> or where it is hanging?
>
> I was seeing the same behaviour of 1000ms increasing to 5500ms [1]; 
> this suggests either lock contention or O(n) behaviour.
>
> Nathan, can you check with this ordering of patches from Andrew's 
> cache [2]? I was getting hangs until I found them all.
>
> I'll follow up with timing data.
>
> Thanks,
>  Daniel
>
> -- [1]
>
> [   73.076117] node 2 initialised, 7732961 pages in 1060ms
> [   73.077184] node 38 initialised, 7732961 pages in 1060ms
> [   73.079626] node 146 initialised, 7732961 pages in 1050ms
> [   73.093488] node 62 initialised, 7732961 pages in 1080ms
> [   73.091557] node 3 initialised, 7732962 pages in 1080ms
> [   73.100000] node 186 initialised, 7732961 pages in 1040ms
> [   73.095731] node 4 initialised, 7732961 pages in 1080ms
> [   73.090289] node 50 initialised, 7732961 pages in 1080ms
> [   73.094005] node 158 initialised, 7732961 pages in 1050ms
> [   73.095421] node 159 initialised, 7732962 pages in 1050ms
> [   73.090324] node 52 initialised, 7732961 pages in 1080ms
> [   73.099056] node 5 initialised, 7732962 pages in 1080ms
> [   73.090116] node 160 initialised, 7732961 pages in 1050ms
> [   73.161051] node 157 initialised, 7732962 pages in 1120ms
> [   73.193565] node 161 initialised, 7732962 pages in 1160ms
> [   73.212456] node 26 initialised, 7732961 pages in 1200ms
> [   73.222904] node 0 initialised, 6686488 pages in 1210ms
> [   73.242165] node 140 initialised, 7732961 pages in 1210ms
> [   73.254230] node 156 initialised, 7732961 pages in 1220ms
> [   73.284634] node 1 initialised, 7732962 pages in 1270ms
> [   73.305301] node 141 initialised, 7732962 pages in 1280ms
> [   73.322845] node 28 initialised, 7732961 pages in 1310ms
> [   73.321757] node 142 initialised, 7732961 pages in 1290ms
> [   73.327677] node 138 initialised, 7732961 pages in 1300ms
> [   73.413597] node 176 initialised, 7732961 pages in 1370ms
> [   73.455552] node 139 initialised, 7732962 pages in 1420ms
> [   73.475356] node 143 initialised, 7732962 pages in 1440ms
> [   73.547202] node 32 initialised, 7732961 pages in 1530ms
> [   73.579591] node 104 initialised, 7732961 pages in 1560ms
> [   73.618065] node 174 initialised, 7732961 pages in 1570ms
> [   73.624918] node 178 initialised, 7732961 pages in 1580ms
> [   73.649024] node 175 initialised, 7732962 pages in 1610ms
> [   73.654110] node 105 initialised, 7732962 pages in 1630ms
> [   73.670589] node 106 initialised, 7732961 pages in 1650ms
> [   73.739682] node 102 initialised, 7732961 pages in 1720ms
> [   73.769639] node 86 initialised, 7732961 pages in 1750ms
> [   73.775573] node 44 initialised, 7732961 pages in 1760ms
> [   73.772955] node 177 initialised, 7732962 pages in 1740ms
> [   73.804390] node 34 initialised, 7732961 pages in 1790ms
> [   73.819370] node 30 initialised, 7732961 pages in 1810ms
> [   73.847882] node 98 initialised, 7732961 pages in 1830ms
> [   73.867545] node 33 initialised, 7732962 pages in 1860ms
> [   73.877964] node 107 initialised, 7732962 pages in 1860ms
> [   73.906256] node 103 initialised, 7732962 pages in 1880ms
> [   73.945581] node 100 initialised, 7732961 pages in 1930ms
> [   73.947024] node 96 initialised, 7732961 pages in 1930ms
> [   74.186208] node 116 initialised, 7732961 pages in 2170ms
> [   74.220838] node 68 initialised, 7732961 pages in 2210ms
> [   74.252341] node 46 initialised, 7732961 pages in 2240ms
> [   74.274795] node 118 initialised, 7732961 pages in 2260ms
> [   74.337544] node 14 initialised, 7732961 pages in 2320ms
> [   74.350819] node 22 initialised, 7732961 pages in 2340ms
> [   74.350332] node 69 initialised, 7732962 pages in 2340ms
> [   74.362683] node 211 initialised, 7732962 pages in 2310ms
> [   74.360617] node 70 initialised, 7732961 pages in 2340ms
> [   74.369137] node 66 initialised, 7732961 pages in 2360ms
> [   74.378242] node 115 initialised, 7732962 pages in 2360ms
> [   74.404221] node 213 initialised, 7732962 pages in 2350ms
> [   74.420901] node 210 initialised, 7732961 pages in 2370ms
> [   74.430049] node 35 initialised, 7732962 pages in 2420ms
> [   74.436007] node 48 initialised, 7732961 pages in 2420ms
> [   74.480595] node 71 initialised, 7732962 pages in 2460ms
> [   74.485700] node 67 initialised, 7732962 pages in 2480ms
> [   74.502627] node 31 initialised, 7732962 pages in 2490ms
> [   74.542220] node 16 initialised, 7732961 pages in 2530ms
> [   74.547936] node 128 initialised, 7732961 pages in 2520ms
> [   74.634374] node 214 initialised, 7732961 pages in 2580ms
> [   74.654389] node 88 initialised, 7732961 pages in 2630ms
> [   74.722833] node 117 initialised, 7732962 pages in 2700ms
> [   74.735002] node 148 initialised, 7732961 pages in 2700ms
> [   74.742725] node 12 initialised, 7732961 pages in 2730ms
> [   74.749319] node 194 initialised, 7732961 pages in 2700ms
> [   74.767979] node 24 initialised, 7732961 pages in 2750ms
> [   74.769465] node 114 initialised, 7732961 pages in 2750ms
> [   74.796973] node 134 initialised, 7732961 pages in 2770ms
> [   74.818164] node 15 initialised, 7732962 pages in 2810ms
> [   74.844852] node 18 initialised, 7732961 pages in 2830ms
> [   74.866123] node 110 initialised, 7732961 pages in 2850ms
> [   74.898255] node 215 initialised, 7730688 pages in 2840ms
> [   74.903623] node 136 initialised, 7732961 pages in 2880ms
> [   74.911107] node 144 initialised, 7732961 pages in 2890ms
> [   74.918757] node 212 initialised, 7732961 pages in 2870ms
> [   74.935333] node 182 initialised, 7732961 pages in 2880ms
> [   74.958147] node 42 initialised, 7732961 pages in 2950ms
> [   74.964989] node 108 initialised, 7732961 pages in 2950ms
> [   74.965482] node 112 initialised, 7732961 pages in 2950ms
> [   75.034787] node 184 initialised, 7732961 pages in 2980ms
> [   75.051242] node 45 initialised, 7732962 pages in 3040ms
> [   75.047169] node 152 initialised, 7732961 pages in 3020ms
> [   75.062834] node 179 initialised, 7732962 pages in 3010ms
> [   75.076528] node 145 initialised, 7732962 pages in 3040ms
> [   75.076613] node 25 initialised, 7732962 pages in 3070ms
> [   75.073086] node 164 initialised, 7732961 pages in 3040ms
> [   75.079674] node 149 initialised, 7732962 pages in 3050ms
> [   75.092015] node 113 initialised, 7732962 pages in 3070ms
> [   75.096325] node 80 initialised, 7732961 pages in 3080ms
> [   75.131380] node 92 initialised, 7732961 pages in 3110ms
> [   75.142147] node 10 initialised, 7732961 pages in 3130ms
> [   75.151041] node 51 initialised, 7732962 pages in 3140ms
> [   75.159074] node 130 initialised, 7732961 pages in 3130ms
> [   75.162616] node 166 initialised, 7732961 pages in 3130ms
> [   75.193557] node 82 initialised, 7732961 pages in 3170ms
> [   75.254801] node 84 initialised, 7732961 pages in 3240ms
> [   75.303028] node 64 initialised, 7732961 pages in 3290ms
> [   75.299739] node 49 initialised, 7732962 pages in 3290ms
> [   75.314231] node 21 initialised, 7732962 pages in 3300ms
> [   75.371298] node 53 initialised, 7732962 pages in 3360ms
> [   75.394569] node 95 initialised, 7732962 pages in 3380ms
> [   75.441101] node 23 initialised, 7732962 pages in 3430ms
> [   75.433080] node 19 initialised, 7732962 pages in 3430ms
> [   75.446076] node 173 initialised, 7732962 pages in 3410ms
> [   75.445816] node 99 initialised, 7732962 pages in 3430ms
> [   75.470330] node 87 initialised, 7732962 pages in 3450ms
> [   75.502334] node 8 initialised, 7732961 pages in 3490ms
> [   75.508300] node 206 initialised, 7732961 pages in 3460ms
> [   75.540253] node 132 initialised, 7732961 pages in 3510ms
> [   75.615453] node 183 initialised, 7732962 pages in 3560ms
> [   75.632576] node 78 initialised, 7732961 pages in 3610ms
> [   75.647753] node 85 initialised, 7732962 pages in 3620ms
> [   75.688955] node 90 initialised, 7732961 pages in 3670ms
> [   75.694522] node 200 initialised, 7732961 pages in 3640ms
> [   75.688790] node 43 initialised, 7732962 pages in 3680ms
> [   75.694540] node 94 initialised, 7732961 pages in 3680ms
> [   75.697149] node 29 initialised, 7732962 pages in 3690ms
> [   75.693590] node 111 initialised, 7732962 pages in 3680ms
> [   75.715829] node 56 initialised, 7732961 pages in 3700ms
> [   75.718427] node 97 initialised, 7732962 pages in 3700ms
> [   75.741643] node 147 initialised, 7732962 pages in 3710ms
> [   75.773613] node 170 initialised, 7732961 pages in 3740ms
> [   75.802874] node 208 initialised, 7732961 pages in 3750ms
> [   75.804409] node 58 initialised, 7732961 pages in 3790ms
> [   75.853438] node 126 initialised, 7732961 pages in 3830ms
> [   75.888167] node 167 initialised, 7732962 pages in 3850ms
> [   75.912656] node 172 initialised, 7732961 pages in 3870ms
> [   75.956540] node 93 initialised, 7732962 pages in 3940ms
> [   75.988819] node 127 initialised, 7732962 pages in 3960ms
> [   76.062198] node 201 initialised, 7732962 pages in 4010ms
> [   76.091769] node 47 initialised, 7732962 pages in 4080ms
> [   76.119749] node 162 initialised, 7732961 pages in 4080ms
> [   76.122797] node 6 initialised, 7732961 pages in 4110ms
> [   76.225916] node 153 initialised, 7732962 pages in 4190ms
> [   76.219855] node 81 initialised, 7732962 pages in 4200ms
> [   76.236116] node 150 initialised, 7732961 pages in 4210ms
> [   76.245349] node 180 initialised, 7732961 pages in 4190ms
> [   76.248827] node 17 initialised, 7732962 pages in 4240ms
> [   76.258801] node 13 initialised, 7732962 pages in 4250ms
> [   76.259943] node 122 initialised, 7732961 pages in 4240ms
> [   76.277480] node 196 initialised, 7732961 pages in 4230ms
> [   76.320830] node 41 initialised, 7732962 pages in 4310ms
> [   76.351667] node 129 initialised, 7732962 pages in 4320ms
> [   76.353488] node 202 initialised, 7732961 pages in 4310ms
> [   76.376753] node 165 initialised, 7732962 pages in 4340ms
> [   76.381807] node 124 initialised, 7732961 pages in 4350ms
> [   76.419952] node 171 initialised, 7732962 pages in 4380ms
> [   76.431242] node 168 initialised, 7732961 pages in 4390ms
> [   76.441324] node 89 initialised, 7732962 pages in 4420ms
> [   76.440720] node 155 initialised, 7732962 pages in 4400ms
> [   76.459715] node 120 initialised, 7732961 pages in 4440ms
> [   76.483986] node 205 initialised, 7732962 pages in 4430ms
> [   76.493284] node 151 initialised, 7732962 pages in 4460ms
> [   76.491437] node 60 initialised, 7732961 pages in 4480ms
> [   76.526620] node 74 initialised, 7732961 pages in 4510ms
> [   76.543761] node 131 initialised, 7732962 pages in 4510ms
> [   76.549562] node 39 initialised, 7732962 pages in 4540ms
> [   76.563861] node 11 initialised, 7732962 pages in 4550ms
> [   76.598775] node 54 initialised, 7732961 pages in 4590ms
> [   76.602006] node 123 initialised, 7732962 pages in 4570ms
> [   76.619856] node 76 initialised, 7732961 pages in 4600ms
> [   76.631418] node 198 initialised, 7732961 pages in 4580ms
> [   76.665415] node 188 initialised, 7732961 pages in 4610ms
> [   76.669178] node 63 initialised, 7732962 pages in 4660ms
> [   76.683646] node 101 initialised, 7732962 pages in 4670ms
> [   76.710780] node 192 initialised, 7732961 pages in 4660ms
> [   76.736743] node 121 initialised, 7732962 pages in 4720ms
> [   76.743800] node 199 initialised, 7732962 pages in 4700ms
> [   76.750663] node 20 initialised, 7732961 pages in 4740ms
> [   76.763045] node 135 initialised, 7732962 pages in 4730ms
> [   76.768216] node 137 initialised, 7732962 pages in 4740ms
> [   76.800135] node 181 initialised, 7732962 pages in 4750ms
> [   76.811215] node 27 initialised, 7732962 pages in 4800ms
> [   76.857405] node 125 initialised, 7732962 pages in 4820ms
> [   76.853750] node 163 initialised, 7732962 pages in 4820ms
> [   76.882975] node 59 initialised, 7732962 pages in 4870ms
> [   76.920121] node 9 initialised, 7732962 pages in 4910ms
> [   76.934824] node 189 initialised, 7732962 pages in 4880ms
> [   76.951223] node 154 initialised, 7732961 pages in 4920ms
> [   76.953897] node 203 initialised, 7732962 pages in 4900ms
> [   76.952558] node 75 initialised, 7732962 pages in 4930ms
> [   76.985480] node 119 initialised, 7732962 pages in 4970ms
> [   77.036089] node 195 initialised, 7732962 pages in 4980ms
> [   77.039996] node 55 initialised, 7732962 pages in 5030ms
> [   77.067989] node 109 initialised, 7732962 pages in 5040ms
> [   77.066236] node 7 initialised, 7732962 pages in 5060ms
> [   77.068709] node 65 initialised, 7732962 pages in 5060ms
> [   77.097859] node 79 initialised, 7732962 pages in 5080ms
> [   77.096219] node 169 initialised, 7732962 pages in 5060ms
> [   77.125113] node 83 initialised, 7732962 pages in 5110ms
> [   77.139507] node 37 initialised, 7732962 pages in 5130ms
> [   77.143280] node 77 initialised, 7732962 pages in 5120ms
> [   77.226494] node 73 initialised, 7732962 pages in 5200ms
> [   77.281584] node 190 initialised, 7732961 pages in 5230ms
> [   77.314794] node 204 initialised, 7732961 pages in 5260ms
> [   77.328577] node 72 initialised, 7732961 pages in 5310ms
> [   77.335743] node 36 initialised, 7732961 pages in 5320ms
> [   77.360573] node 40 initialised, 7732961 pages in 5350ms
> [   77.368712] node 207 initialised, 7732962 pages in 5320ms
> [   77.387708] node 91 initialised, 7732962 pages in 5370ms
> [   77.385143] node 57 initialised, 7732962 pages in 5380ms
> [   77.391785] node 191 initialised, 7732962 pages in 5340ms
> [   77.479970] node 185 initialised, 7732962 pages in 5430ms
> [   77.491865] node 61 initialised, 7732962 pages in 5480ms
> [   77.489255] node 133 initialised, 7732962 pages in 5460ms
> [   77.502111] node 197 initialised, 7732962 pages in 5450ms
> [   77.507136] node 193 initialised, 7732962 pages in 5460ms
> [   77.523739] node 209 initialised, 7732962 pages in 5470ms
> [   77.537131] node 187 initialised, 7732962 pages in 5490ms
>
> -- [2]
>
> http://ozlabs.org/~akpm/mmots/broken-out/memblock-introduce-a-for_each_reserved_mem_region-iterator.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-move-page-initialization-into-a-separate-function.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-only-set-page-reserved-in-the-memblock-region.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-pass-pfn-to-__free_pages_bootmem.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-make-__early_pfn_to_nid-smp-safe-and-introduce-meminit_pfn_in_nid.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-remaining-struct-pages-in-parallel-with-kswapd.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-minimise-number-of-pfn-page-lookups-during-initialisation.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/x86-mm-enable-deferred-struct-page-initialisation-on-x86-64.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-free-pages-in-large-chunks-where-possible.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-remove-mminit_verify_page_links.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init-fix.patch 
>
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix2.patch 
>
>


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-19 18:31                         ` nzimmer
@ 2015-05-19 19:06                           ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-19 19:06 UTC (permalink / raw)
  To: nzimmer
  Cc: Daniel J Blueman, Andrew Morton, Waiman Long, Dave Hansen,
	Scott Norton, Linux-MM, LKML, Steffen Persvold

On Tue, May 19, 2015 at 01:31:28PM -0500, nzimmer wrote:
> After double checking the patches it seems everything is ok.
> 
> I had to rerun quite a bit since the machine was reconfigured and I
> wanted to be thorough.
> My latest timings are quite close to my previous reported numbers.
> 
> The hang issue I encountered turned out to be unrelated to these
> patches so that is a separate bundle of fun.
> 

Ok, sorry to hear about the hanging but I'm glad to hear the patches are
not responsible. Thanks for testing and getting back.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-14 10:03                       ` Daniel J Blueman
@ 2015-05-22  6:30                         ` Daniel J Blueman
  -1 siblings, 0 replies; 168+ messages in thread
From: Daniel J Blueman @ 2015-05-22  6:30 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: nzimmer, Waiman Long, Dave Hansen, Scott Norton, Linux-MM, LKML,
	Steffen Persvold

On Thu, May 14, 2015 at 6:03 PM, Daniel J Blueman 
<daniel@numascale.com> wrote:
> On Thu, May 14, 2015 at 12:31 AM, Mel Gorman <mgorman@suse.de> wrote:
>> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>>  I just noticed a hang on my largest box.
>>>  I can only reproduce with large core counts, if I turn down the
>>>  number of cpus it doesn't have an issue.
>>> 
>> 
>> Odd. The core count should make little difference as only
>> one CPU per node should be in use. Does sysrq+t give any indication how
>> or where it is hanging?
> 
> I was seeing the same behaviour of 1000ms increasing to 5500ms [1]; 
> this suggests either lock contention or O(n) behaviour.
> 
> Nathan, can you check with this ordering of patches from Andrew's 
> cache [2]? I was getting hangs until I found them all.
> 
> I'll follow up with timing data.

7TB over 216 NUMA nodes, 1728 cores, from kernel 4.0.4 load to login:

1. 2086s with patches 01-19 [1]

2. 2026s adding "Take into account that large system caches scale 
linearly with memory", which has:
 min(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));

3. 2442s fixing to:
 max(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));

4. 2064s adjusting minimum and shift to:
 max(512UL << (20 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 8));

5. 1934s adjusting minimum and shift to:
 max(128UL << (20 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 8));

6. 930s #5 with the non-temporal PMD init patch I had earlier proposed 
(I'll pursue separately)
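
To make the difference between expressions #2 to #5 concrete, here is a
minimal userspace sketch that evaluates each one for a single node. It
assumes 4 KiB pages (PAGE_SHIFT of 12, as on x86-64); the function names
are made up for illustration, and `spanned` stands in for
pgdat->node_spanned_pages. My reading is that the result is the number of
pages per node that are initialised up front, with the remainder deferred.

```c
#include <stdio.h>

/* Assumption: 4 KiB pages, as on x86-64. */
#define PAGE_SHIFT 12

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* #2: 2 GiB worth of pages or 1/8 of the node, whichever is smaller. */
static unsigned long early_pages_v2(unsigned long spanned)
{
	return min_ul(2UL << (30 - PAGE_SHIFT), spanned >> 3);
}

/* #3: 2 GiB worth of pages or 1/8 of the node, whichever is larger. */
static unsigned long early_pages_v3(unsigned long spanned)
{
	return max_ul(2UL << (30 - PAGE_SHIFT), spanned >> 3);
}

/* #4: 512 MiB worth of pages or 1/256 of the node, whichever is larger. */
static unsigned long early_pages_v4(unsigned long spanned)
{
	return max_ul(512UL << (20 - PAGE_SHIFT), spanned >> 8);
}

/* #5: 128 MiB worth of pages or 1/256 of the node, whichever is larger. */
static unsigned long early_pages_v5(unsigned long spanned)
{
	return max_ul(128UL << (20 - PAGE_SHIFT), spanned >> 8);
}

/* Convert a page count to MiB for printing. */
static unsigned long pages_to_mib(unsigned long pages)
{
	return pages >> (20 - PAGE_SHIFT);
}

/* Print the result of each variant for a node spanning `spanned` pages. */
void print_thresholds(unsigned long spanned)
{
	printf("#2: %lu MiB\n", pages_to_mib(early_pages_v2(spanned)));
	printf("#3: %lu MiB\n", pages_to_mib(early_pages_v3(spanned)));
	printf("#4: %lu MiB\n", pages_to_mib(early_pages_v4(spanned)));
	printf("#5: %lu MiB\n", pages_to_mib(early_pages_v5(spanned)));
}
```

Calling print_thresholds(1UL << 23) from main() models a hypothetical
32 GiB node (8388608 pages), which is only roughly what 7TB over 216
nodes works out to. It prints 2048, 4096, 512 and 128 MiB for #2 to #5.
Under these assumptions the largest up-front amount (#3) coincides with
the slowest boot of the four and the smallest (#5) with the fastest.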

The scaling patch isn't in -mm. #5 tests out nice on a bunch of other 
AMD systems, 64GB and up, so: Tested-by: Daniel J Blueman 
<daniel@numascale.com>.

Fine work, Mel!

Daniel

-- [1]

> http://ozlabs.org/~akpm/mmots/broken-out/memblock-introduce-a-for_each_reserved_mem_region-iterator.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-move-page-initialization-into-a-separate-function.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-only-set-page-reserved-in-the-memblock-region.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-pass-pfn-to-__free_pages_bootmem.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-make-__early_pfn_to_nid-smp-safe-and-introduce-meminit_pfn_in_nid.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-remaining-struct-pages-in-parallel-with-kswapd.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-minimise-number-of-pfn-page-lookups-during-initialisation.patch
> http://ozlabs.org/~akpm/mmots/broken-out/x86-mm-enable-deferred-struct-page-initialisation-on-x86-64.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-free-pages-in-large-chunks-where-possible.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-remove-mminit_verify_page_links.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix2.patch


^ permalink raw reply	[flat|nested] 168+ messages in thread


* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-22  6:30                         ` Daniel J Blueman
@ 2015-05-22  9:33                           ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-22  9:33 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Andrew Morton, nzimmer, Waiman Long, Dave Hansen, Scott Norton,
	Linux-MM, LKML, Steffen Persvold

On Fri, May 22, 2015 at 02:30:01PM +0800, Daniel J Blueman wrote:
> On Thu, May 14, 2015 at 6:03 PM, Daniel J Blueman
> <daniel@numascale.com> wrote:
> >On Thu, May 14, 2015 at 12:31 AM, Mel Gorman <mgorman@suse.de> wrote:
> >>On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
> >>> I am just noticed a hang on my largest box.
> >>> I can only reproduce with large core counts, if I turn down the
> >>> number of cpus it doesn't have an issue.
> >>>
> >>
> >>Odd. The number of core counts should make little a difference
> >>as only
> >>one CPU per node should be in use. Does sysrq+t give any
> >>indication how
> >>or where it is hanging?
> >
> >I was seeing the same behaviour of 1000ms increasing to 5500ms
> >[1]; this suggests either lock contention or O(n) behaviour.
> >
> >Nathan, can you check with this ordering of patches from Andrew's
> >cache [2]? I was getting hanging until I a found them all.
> >
> >I'll follow up with timing data.
> 
> 7TB over 216 NUMA nodes, 1728 cores, from kernel 4.0.4 load to login:
> 
> 1. 2086s with patches 01-19 [1]
> 
> 2. 2026s adding "Take into account that large system caches scale
> linearly with memory", which has:
> min(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));
> 
> 3. 2442s fixing to:
> max(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));
> 
> 4. 2064s adjusting minimum and shift to:
> max(512UL << (20 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 8));
> 
> 5. 1934s adjusting minimum and shift to:
> max(128UL << (20 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 8));
> 
> 6. 930s #5 with the non-temporal PMD init patch I had earlier
> proposed (I'll pursue separately)
> 
> The scaling patch isn't in -mm.

That patch was superseded by "mm: meminit: finish
initialisation of struct pages before basic setup" and
"mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix"
so that's ok.

FWIW, I think you should still go ahead with the non-temporal patches because
there is potential benefit there other than the initialisation.  If there
was an arch-optional implementation of a non-temporal clear then it would
also be worth considering if __GFP_ZERO should use non-temporal stores.
At a greater stretch it would be worth considering if kswapd freeing should
zero pages to avoid a zero on the allocation side in the general case as
it would be more generally useful and a stepping stone towards what the
series "Sanitizing freed pages" attempts.
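
As a rough userspace illustration of what a non-temporal clear looks like (not the kernel patch under discussion, which is not shown in this thread), the sketch below zeroes a buffer with SSE2 streaming stores and falls back to memset() where SSE2 is unavailable. The function name clear_nontemporal() is hypothetical, and the 16-byte alignment requirement is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper: zero len bytes at dst while bypassing the cache
 * where possible. dst must be 16-byte aligned and len a multiple of 16.
 */
void clear_nontemporal(void *dst, size_t len)
{
	assert(((uintptr_t)dst & 15) == 0 && (len & 15) == 0);
#if defined(__SSE2__)
	const __m128i zero = _mm_setzero_si128();
	char *p = dst;

	/* Each streaming store writes 16 bytes without reading the line first. */
	for (size_t off = 0; off < len; off += 16)
		_mm_stream_si128((__m128i *)(p + off), zero);

	/* Order the weakly-ordered streaming stores before later accesses. */
	_mm_sfence();
#else
	/* Assumption: no SSE2, so fall back to an ordinary cached clear. */
	memset(dst, 0, len);
#endif
}
```

Whether this beats a cached clear depends on the hardware and on whether the zeroed data is touched again soon afterwards, so it would need measuring on each target.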

> #5 tests out nice on a bunch of
> other AMD systems, 64GB and up, so: Tested-by: Daniel J Blueman
> <daniel@numascale.com>.
> 

Thanks very much Daniel, much appreciated.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread


* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-22  9:33                           ` Mel Gorman
@ 2015-05-22 17:14                             ` Waiman Long
  -1 siblings, 0 replies; 168+ messages in thread
From: Waiman Long @ 2015-05-22 17:14 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Daniel J Blueman, Andrew Morton, nzimmer, Dave Hansen,
	Scott Norton, Linux-MM, LKML, Steffen Persvold

On 05/22/2015 05:33 AM, Mel Gorman wrote:
> On Fri, May 22, 2015 at 02:30:01PM +0800, Daniel J Blueman wrote:
>> On Thu, May 14, 2015 at 6:03 PM, Daniel J Blueman
>> <daniel@numascale.com>  wrote:
>>> On Thu, May 14, 2015 at 12:31 AM, Mel Gorman<mgorman@suse.de>  wrote:
>>>> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>>>> I am just noticed a hang on my largest box.
>>>>> I can only reproduce with large core counts, if I turn down the
>>>>> number of cpus it doesn't have an issue.
>>>>>
>>>> Odd. The number of core counts should make little a difference
>>>> as only
>>>> one CPU per node should be in use. Does sysrq+t give any
>>>> indication how
>>>> or where it is hanging?
>>> I was seeing the same behaviour of 1000ms increasing to 5500ms
>>> [1]; this suggests either lock contention or O(n) behaviour.
>>>
>>> Nathan, can you check with this ordering of patches from Andrew's
>>> cache [2]? I was getting hanging until I a found them all.
>>>
>>> I'll follow up with timing data.
>> 7TB over 216 NUMA nodes, 1728 cores, from kernel 4.0.4 load to login:
>>
>> 1. 2086s with patches 01-19 [1]
>>
>> 2. 2026s adding "Take into account that large system caches scale
>> linearly with memory", which has:
>> min(2UL<<  (30 - PAGE_SHIFT), (pgdat->node_spanned_pages>>  3));
>>
>> 3. 2442s fixing to:
>> max(2UL<<  (30 - PAGE_SHIFT), (pgdat->node_spanned_pages>>  3));
>>
>> 4. 2064s adjusting minimum and shift to:
>> max(512UL<<  (20 - PAGE_SHIFT), (pgdat->node_spanned_pages>>  8));
>>
>> 5. 1934s adjusting minimum and shift to:
>> max(128UL<<  (20 - PAGE_SHIFT), (pgdat->node_spanned_pages>>  8));
>>
>> 6. 930s #5 with the non-temporal PMD init patch I had earlier
>> proposed (I'll pursue separately)
>>
>> The scaling patch isn't in -mm.
> That patch was superceded by "mm: meminit: finish
> initialisation of struct pages before basic setup" and
> "mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix"
> so that's ok.
>
> FWIW, I think you should still go ahead with the non-temporal patches because
> there is potential benefit there other than the initialisation.  If there
> was an arch-optional implementation of a non-termporal clear then it would
> also be worth considering if __GFP_ZERO should use non-temporal stores.
> At a greater stretch it would be worth considering if kswapd freeing should
> zero pages to avoid a zero on the allocation side in the general case as
> it would be more generally useful and a stepping stone towards what the
> series "Sanitizing freed pages" attempts.

I think the non-temporal patch benefits mainly AMD systems. I have tried 
the patch on both DragonHawk and it actually made it boot up a little 
bit slower. I think the Intel-optimized "rep stosb" instruction (used in 
memset) is performing well. I did a similar test on the zero page code 
and the performance gain was inconclusive.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 168+ messages in thread


* Re: [PATCH 03/13] mm: meminit: Only set page reserved in the memblock region
  2015-04-28 14:37   ` Mel Gorman
@ 2015-05-22 20:31     ` Tony Luck
  -1 siblings, 0 replies; 168+ messages in thread
From: Tony Luck @ 2015-05-22 20:31 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Waiman Long,
	Scott Norton, Daniel J Blueman, Linux-MM, LKML

On Tue, Apr 28, 2015 at 7:37 AM, Mel Gorman <mgorman@suse.de> wrote:
> Currently each page struct is set as reserved upon initialization.
> This patch leaves the reserved bit clear and only sets the reserved bit
> when it is known the memory was allocated by the bootmem allocator. This
> makes it easier to distinguish between uninitialised struct pages and
> reserved struct pages in later patches.

On ia64 my linux-next builds now report a bunch of messages like this:

put_kernel_page: page at 0xe000000005588000 not in reserved memory
put_kernel_page: page at 0xe000000005588000 not in reserved memory
put_kernel_page: page at 0xe000000005580000 not in reserved memory
put_kernel_page: page at 0xe000000005580000 not in reserved memory
put_kernel_page: page at 0xe000000005580000 not in reserved memory
put_kernel_page: page at 0xe000000005580000 not in reserved memory

The two different pages match up with two objects from the loaded kernel
that get mapped by arch/ia64/mm/init.c:setup_gate():

a000000101588000 D __start_gate_section
a000000101580000 D empty_zero_page

Should I look for a place to set the reserved bit on page structures for these
addresses, or just remove the test and message in put_kernel_page()?
[I added a debug "else" clause here: every caller passes in a page that is
not reserved.]

        if (!PageReserved(page))
                printk(KERN_ERR "put_kernel_page: page at 0x%p not in reserved memory\n",
                       page_address(page));

-Tony

^ permalink raw reply	[flat|nested] 168+ messages in thread


* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-22 17:14                             ` Waiman Long
@ 2015-05-22 21:43                               ` Davidlohr Bueso
  -1 siblings, 0 replies; 168+ messages in thread
From: Davidlohr Bueso @ 2015-05-22 21:43 UTC (permalink / raw)
  To: Waiman Long
  Cc: Mel Gorman, Daniel J Blueman, Andrew Morton, nzimmer,
	Dave Hansen, Scott Norton, Linux-MM, LKML, Steffen Persvold

On Fri, 2015-05-22 at 13:14 -0400, Waiman Long wrote:
> I think the non-temporal patch benefits mainly AMD systems. I have tried 
> the patch on both DragonHawk and it actually made it boot up a little 
> bit slower. I think the Intel optimized "rep stosb" instruction (used in 
> memset) is performing well. I had done similar test on zero page code 
> and the performance gain was non-conclusive.

fwiw I did some experiments with similar conclusions a while ago
(inconclusive with intel hw, maybe it was even the same machine ;)
Now, this was for optimizing clear_hugepage by using movnti, but I never
got to run it on an AMD box.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 168+ messages in thread


* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-22 17:14                             ` Waiman Long
@ 2015-05-23  3:49                               ` Daniel J Blueman
  -1 siblings, 0 replies; 168+ messages in thread
From: Daniel J Blueman @ 2015-05-23  3:49 UTC (permalink / raw)
  To: Waiman Long, Mel Gorman
  Cc: Andrew Morton, nzimmer, Dave Hansen, Scott Norton, Linux-MM,
	LKML, Steffen Persvold



-- 
Daniel J Blueman
Principal Software Engineer, Numascale

On Sat, May 23, 2015 at 1:14 AM, Waiman Long <waiman.long@hp.com> wrote:
> On 05/22/2015 05:33 AM, Mel Gorman wrote:
>> On Fri, May 22, 2015 at 02:30:01PM +0800, Daniel J Blueman wrote:
>>> On Thu, May 14, 2015 at 6:03 PM, Daniel J Blueman
>>> <daniel@numascale.com>  wrote:
>>>> On Thu, May 14, 2015 at 12:31 AM, Mel Gorman<mgorman@suse.de>  
>>>> wrote:
>>>>> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>>>>> I am just noticed a hang on my largest box.
>>>>>> I can only reproduce with large core counts, if I turn down the
>>>>>> number of cpus it doesn't have an issue.
>>>>>> 
>>>>> Odd. The number of core counts should make little a difference
>>>>> as only
>>>>> one CPU per node should be in use. Does sysrq+t give any
>>>>> indication how
>>>>> or where it is hanging?
>>>> I was seeing the same behaviour of 1000ms increasing to 5500ms
>>>> [1]; this suggests either lock contention or O(n) behaviour.
>>>> 
>>>> Nathan, can you check with this ordering of patches from Andrew's
>>>> cache [2]? I was getting hanging until I a found them all.
>>>> 
>>>> I'll follow up with timing data.
>>> 7TB over 216 NUMA nodes, 1728 cores, from kernel 4.0.4 load to 
>>> login:
>>> 
>>> 1. 2086s with patches 01-19 [1]
>>> 
>>> 2. 2026s adding "Take into account that large system caches scale
>>> linearly with memory", which has:
>>> min(2UL<<  (30 - PAGE_SHIFT), (pgdat->node_spanned_pages>>  3));
>>> 
>>> 3. 2442s fixing to:
>>> max(2UL<<  (30 - PAGE_SHIFT), (pgdat->node_spanned_pages>>  3));
>>> 
>>> 4. 2064s adjusting minimum and shift to:
>>> max(512UL<<  (20 - PAGE_SHIFT), (pgdat->node_spanned_pages>>  8));
>>> 
>>> 5. 1934s adjusting minimum and shift to:
>>> max(128UL<<  (20 - PAGE_SHIFT), (pgdat->node_spanned_pages>>  8));
>>> 
>>> 6. 930s #5 with the non-temporal PMD init patch I had earlier
>>> proposed (I'll pursue separately)
>>> 
>>> The scaling patch isn't in -mm.
>> That patch was superceded by "mm: meminit: finish
>> initialisation of struct pages before basic setup" and
>> "mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix"
>> so that's ok.
>> 
>> FWIW, I think you should still go ahead with the non-temporal 
>> patches because
>> there is potential benefit there other than the initialisation.  If 
>> there
>> was an arch-optional implementation of a non-termporal clear then it 
>> would
>> also be worth considering if __GFP_ZERO should use non-temporal 
>> stores.
>> At a greater stretch it would be worth considering if kswapd freeing 
>> should
>> zero pages to avoid a zero on the allocation side in the general 
>> case as
>> it would be more generally useful and a stepping stone towards what 
>> the
>> series "Sanitizing freed pages" attempts.

Good tip Mel; I'll take a look when time allows and get some data, 
though I guess it'll only be a win where the clearing is on a different 
node than the allocation.

> I think the non-temporal patch benefits mainly AMD systems. I have 
> tried the patch on both DragonHawk and it actually made it boot up a 
> little bit slower. I think the Intel optimized "rep stosb" 
> instruction (used in memset) is performing well. I had done similar 
> test on zero page code and the performance gain was non-conclusive.

I suspect 'rep stosb' on modern Intel hardware can write whole 
cachelines atomically, avoiding the RMW, or that the read part of the 
RMW is optimally prefetched. Open-coding it just can't reach the same 
level of pipeline saturation that the microcode can.
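
For comparison with an open-coded loop, here is a minimal userspace sketch of a clear that issues a single 'rep stosb'. It assumes GCC or Clang inline assembly on x86-64 and falls back to memset() elsewhere. The name clear_rep_stosb() is hypothetical. The sketch only shows the instruction being used and says nothing about how the hardware handles it internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero len bytes at dst with one string instruction. */
void clear_rep_stosb(void *dst, size_t len)
{
	assert(dst != NULL || len == 0);
#if defined(__x86_64__) && defined(__GNUC__)
	/*
	 * rep stosb stores AL to [RDI], RCX times, advancing RDI.
	 * The ABI guarantees the direction flag is clear on entry.
	 */
	__asm__ volatile("rep stosb"
			 : "+D"(dst), "+c"(len)
			 : "a"(0)
			 : "memory");
#else
	/* Assumption: not x86-64, so use the C library instead. */
	memset(dst, 0, len);
#endif
}
```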

Daniel


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
@ 2015-05-23  3:49                               ` Daniel J Blueman
  0 siblings, 0 replies; 168+ messages in thread
From: Daniel J Blueman @ 2015-05-23  3:49 UTC (permalink / raw)
  To: Waiman Long, Mel Gorman
  Cc: Andrew Morton, nzimmer, Dave Hansen, Scott Norton, Linux-MM,
	LKML, Steffen Persvold



-- 
Daniel J Blueman
Principal Software Engineer, Numascale

On Sat, May 23, 2015 at 1:14 AM, Waiman Long <waiman.long@hp.com> wrote:
> On 05/22/2015 05:33 AM, Mel Gorman wrote:
>> On Fri, May 22, 2015 at 02:30:01PM +0800, Daniel J Blueman wrote:
>>> On Thu, May 14, 2015 at 6:03 PM, Daniel J Blueman
>>> <daniel@numascale.com>  wrote:
>>>> On Thu, May 14, 2015 at 12:31 AM, Mel Gorman<mgorman@suse.de>  
>>>> wrote:
>>>>> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>>>>> I am just noticed a hang on my largest box.
>>>>>> I can only reproduce with large core counts, if I turn down the
>>>>>> number of cpus it doesn't have an issue.
>>>>>> 
>>>>> Odd. The number of core counts should make little a difference
>>>>> as only
>>>>> one CPU per node should be in use. Does sysrq+t give any
>>>>> indication how
>>>>> or where it is hanging?
>>>> I was seeing the same behaviour of 1000ms increasing to 5500ms
>>>> [1]; this suggests either lock contention or O(n) behaviour.
>>>> 
>>>> Nathan, can you check with this ordering of patches from Andrew's
>>>> cache [2]? I was getting hanging until I a found them all.
>>>> 
>>>> I'll follow up with timing data.
>>> 7TB over 216 NUMA nodes, 1728 cores, from kernel 4.0.4 load to login:
>>> 
>>> 1. 2086s with patches 01-19 [1]
>>> 
>>> 2. 2026s adding "Take into account that large system caches scale
>>> linearly with memory", which has:
>>> min(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));
>>> 
>>> 3. 2442s fixing to:
>>> max(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));
>>> 
>>> 4. 2064s adjusting minimum and shift to:
>>> max(512UL << (20 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 8));
>>> 
>>> 5. 1934s adjusting minimum and shift to:
>>> max(128UL << (20 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 8));
>>> 
>>> 6. 930s #5 with the non-temporal PMD init patch I had earlier
>>> proposed (I'll pursue separately)
>>> 
>>> The scaling patch isn't in -mm.
>> That patch was superseded by "mm: meminit: finish
>> initialisation of struct pages before basic setup" and
>> "mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix"
>> so that's ok.
>> 
>> FWIW, I think you should still go ahead with the non-temporal patches
>> because there is potential benefit there other than the initialisation.
>> If there was an arch-optional implementation of a non-temporal clear then
>> it would also be worth considering if __GFP_ZERO should use non-temporal
>> stores. At a greater stretch it would be worth considering if kswapd
>> freeing should zero pages to avoid a zero on the allocation side in the
>> general case as it would be more generally useful and a stepping stone
>> towards what the series "Sanitizing freed pages" attempts.
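
For anyone following the numbers, here is a small user-space sketch of what
the four expressions quoted in items 2 to 5 evaluate to. It is only an
illustration: PAGE_SHIFT of 12 (4KiB pages) and the node size passed in are
my assumptions, not values taken from the runs above, and min_ul()/max_ul()
and the variantN() names are hypothetical stand-ins for the kernel's
min()/max() and the quoted one-liners.

```c
#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT 12	/* assumption: 4KiB pages */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Item 2: the smaller of 2GiB worth of pages and 1/8 of the node */
static unsigned long variant2(unsigned long spanned_pages)
{
	return min_ul(2UL << (30 - PAGE_SHIFT), spanned_pages >> 3);
}

/* Item 3: same operands, but taking the larger of the two */
static unsigned long variant3(unsigned long spanned_pages)
{
	return max_ul(2UL << (30 - PAGE_SHIFT), spanned_pages >> 3);
}

/* Item 4: the larger of 512MiB worth of pages and 1/256 of the node */
static unsigned long variant4(unsigned long spanned_pages)
{
	return max_ul(512UL << (20 - PAGE_SHIFT), spanned_pages >> 8);
}

/* Item 5: the larger of 128MiB worth of pages and 1/256 of the node */
static unsigned long variant5(unsigned long spanned_pages)
{
	return max_ul(128UL << (20 - PAGE_SHIFT), spanned_pages >> 8);
}

/* Print all four results, in pages, for one node size */
static void print_variants(unsigned long spanned_pages)
{
	/* min() and max() of the same operands can never cross over */
	assert(variant2(spanned_pages) <= variant3(spanned_pages));

	printf("spanned=%lu pages: #2=%lu #3=%lu #4=%lu #5=%lu\n",
	       spanned_pages,
	       variant2(spanned_pages), variant3(spanned_pages),
	       variant4(spanned_pages), variant5(spanned_pages));
}
```

With an assumed node of 8388608 pages (32GiB at 4KiB pages), the four
variants come out at 524288, 1048576, 131072 and 32768 pages respectively.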

Good tip Mel; I'll take a look when time allows and get some data, 
though I guess it'll only be a win where the clearing is on a different 
node than the allocation.

> I think the non-temporal patch benefits mainly AMD systems. I have 
> tried the patch on both DragonHawk and it actually made it boot up a 
> little bit slower. I think the Intel optimized "rep stosb" 
> instruction (used in memset) is performing well. I had done a similar 
> test on the zero page code and the performance gain was inconclusive.

I suspect 'rep stosb' on modern Intel hardware can write whole 
cachelines atomically, avoiding the RMW, or that the read part of the 
RMW is optimally prefetched. Open-coding it just can't reach the same 
level of pipeline saturation that the microcode can.
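
To make the two approaches concrete, a rough user-space sketch follows. It
is not the PMD init patch and not kernel code: clear_cached() and
clear_nontemporal() are hypothetical names, the first simply calls the C
library memset (which may or may not end up as "rep stosb"), and the second
open-codes SSE2 streaming stores, falling back to memset where SSE2 is not
available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Clear with ordinary cached stores, leaving the strategy to memset */
static void clear_cached(void *dst, size_t len)
{
	memset(dst, 0, len);
}

/*
 * Clear with non-temporal (streaming) stores that bypass the cache.
 * dst must be 16-byte aligned and len a multiple of 16.
 */
static void clear_nontemporal(void *dst, size_t len)
{
	assert(((uintptr_t)dst & 15) == 0 && (len & 15) == 0);
#if defined(__SSE2__)
	__m128i zero = _mm_setzero_si128();
	__m128i *p = dst;
	size_t i;

	for (i = 0; i < len / sizeof(*p); i++)
		_mm_stream_si128(&p[i], zero);
	_mm_sfence();	/* order the streaming stores before later stores */
#else
	memset(dst, 0, len);	/* no SSE2: fall back to ordinary stores */
#endif
}
```

Timing both over a buffer much larger than the last-level cache is the
interesting comparison; the sketch only shows the shape of the two loops and
says nothing about which one wins on a given CPU.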

Daniel

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH 03/13] mm: meminit: Only set page reserved in the memblock region
  2015-05-22 20:31     ` Tony Luck
@ 2015-05-26 10:22       ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-05-26 10:22 UTC (permalink / raw)
  To: Tony Luck
  Cc: Andrew Morton, Nathan Zimmer, Dave Hansen, Waiman Long,
	Scott Norton, Daniel J Blueman, Linux-MM, LKML

On Fri, May 22, 2015 at 01:31:55PM -0700, Tony Luck wrote:
> On Tue, Apr 28, 2015 at 7:37 AM, Mel Gorman <mgorman@suse.de> wrote:
> > Currently each page struct is set as reserved upon initialization.
> > This patch leaves the reserved bit clear and only sets the reserved bit
> > when it is known the memory was allocated by the bootmem allocator. This
> > makes it easier to distinguish between uninitialised struct pages and
> > reserved struct pages in later patches.
> 
> On ia64 my linux-next builds now report a bunch of messages like this:
> 
> put_kernel_page: page at 0xe000000005588000 not in reserved memory
> put_kernel_page: page at 0xe000000005588000 not in reserved memory
> put_kernel_page: page at 0xe000000005580000 not in reserved memory
> put_kernel_page: page at 0xe000000005580000 not in reserved memory
> put_kernel_page: page at 0xe000000005580000 not in reserved memory
> put_kernel_page: page at 0xe000000005580000 not in reserved memory
> 
> the two different pages match up with two objects from the loaded kernel
> that get mapped by arch/ia64/mm/init.c:setup_gate()
> 
> a000000101588000 D __start_gate_section
> a000000101580000 D empty_zero_page
> 
> Should I look for a place to set the reserved bit on page structures for these
> addresses?

That would be preferred.

> Or just remove the test and message in put_kernel_page()
> [I added a debug "else" clause here - every caller passes in a page that is
> not reserved]
> 
>         if (!PageReserved(page))
>                 printk(KERN_ERR "put_kernel_page: page at 0x%p not in reserved memory\n",
>                        page_address(page));
> 

But as it's a debugging check that is ia64-specific, I think either
should be fine.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-05-14 10:03                       ` Daniel J Blueman
@ 2015-06-24 22:50                         ` Nathan Zimmer
  -1 siblings, 0 replies; 168+ messages in thread
From: Nathan Zimmer @ 2015-06-24 22:50 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Mel Gorman, nzimmer, Andrew Morton, Waiman Long, Dave Hansen,
	Scott Norton, Linux-MM, LKML, Steffen Persvold

[-- Attachment #1: Type: text/plain, Size: 17004 bytes --]

My apologies for taking so long to get back to this.

I think I did locate two potential sources of slowdown.
One is set_cpus_allowed_ptr(), as I have noted previously.
However, I only notice that on the very largest boxes.
I did cobble together a patch that seems to help.

The other spot I suspect is the zone lock in free_one_page().
I haven't been able to give that much thought yet, though.

Daniel, do you mind seeing if the attached patch helps out?

Thanks,
Nate

On Thu, May 14, 2015 at 06:03:03PM +0800, Daniel J Blueman wrote:
> On Thu, May 14, 2015 at 12:31 AM, Mel Gorman <mgorman@suse.de> wrote:
>> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>>  I just noticed a hang on my largest box.
>>>  I can only reproduce it with large core counts; if I turn down the
>>>  number of cpus it doesn't have an issue.
>>>
>>
>> Odd. The core count should make little difference, as only
>> one CPU per node should be in use. Does sysrq+t give any indication how
>> or where it is hanging?
>
> I was seeing the same behaviour of 1000ms increasing to 5500ms [1]; this 
> suggests either lock contention or O(n) behaviour.
>
> Nathan, can you check with this ordering of patches from Andrew's cache 
> [2]? I was getting hangs until I found them all.
>
> I'll follow up with timing data.
>
> Thanks,
>  Daniel
>
> -- [1]
>
> [   73.076117] node 2 initialised, 7732961 pages in 1060ms
> [   73.077184] node 38 initialised, 7732961 pages in 1060ms
> [   73.079626] node 146 initialised, 7732961 pages in 1050ms
> [   73.093488] node 62 initialised, 7732961 pages in 1080ms
> [   73.091557] node 3 initialised, 7732962 pages in 1080ms
> [   73.100000] node 186 initialised, 7732961 pages in 1040ms
> [   73.095731] node 4 initialised, 7732961 pages in 1080ms
> [   73.090289] node 50 initialised, 7732961 pages in 1080ms
> [   73.094005] node 158 initialised, 7732961 pages in 1050ms
> [   73.095421] node 159 initialised, 7732962 pages in 1050ms
> [   73.090324] node 52 initialised, 7732961 pages in 1080ms
> [   73.099056] node 5 initialised, 7732962 pages in 1080ms
> [   73.090116] node 160 initialised, 7732961 pages in 1050ms
> [   73.161051] node 157 initialised, 7732962 pages in 1120ms
> [   73.193565] node 161 initialised, 7732962 pages in 1160ms
> [   73.212456] node 26 initialised, 7732961 pages in 1200ms
> [   73.222904] node 0 initialised, 6686488 pages in 1210ms
> [   73.242165] node 140 initialised, 7732961 pages in 1210ms
> [   73.254230] node 156 initialised, 7732961 pages in 1220ms
> [   73.284634] node 1 initialised, 7732962 pages in 1270ms
> [   73.305301] node 141 initialised, 7732962 pages in 1280ms
> [   73.322845] node 28 initialised, 7732961 pages in 1310ms
> [   73.321757] node 142 initialised, 7732961 pages in 1290ms
> [   73.327677] node 138 initialised, 7732961 pages in 1300ms
> [   73.413597] node 176 initialised, 7732961 pages in 1370ms
> [   73.455552] node 139 initialised, 7732962 pages in 1420ms
> [   73.475356] node 143 initialised, 7732962 pages in 1440ms
> [   73.547202] node 32 initialised, 7732961 pages in 1530ms
> [   73.579591] node 104 initialised, 7732961 pages in 1560ms
> [   73.618065] node 174 initialised, 7732961 pages in 1570ms
> [   73.624918] node 178 initialised, 7732961 pages in 1580ms
> [   73.649024] node 175 initialised, 7732962 pages in 1610ms
> [   73.654110] node 105 initialised, 7732962 pages in 1630ms
> [   73.670589] node 106 initialised, 7732961 pages in 1650ms
> [   73.739682] node 102 initialised, 7732961 pages in 1720ms
> [   73.769639] node 86 initialised, 7732961 pages in 1750ms
> [   73.775573] node 44 initialised, 7732961 pages in 1760ms
> [   73.772955] node 177 initialised, 7732962 pages in 1740ms
> [   73.804390] node 34 initialised, 7732961 pages in 1790ms
> [   73.819370] node 30 initialised, 7732961 pages in 1810ms
> [   73.847882] node 98 initialised, 7732961 pages in 1830ms
> [   73.867545] node 33 initialised, 7732962 pages in 1860ms
> [   73.877964] node 107 initialised, 7732962 pages in 1860ms
> [   73.906256] node 103 initialised, 7732962 pages in 1880ms
> [   73.945581] node 100 initialised, 7732961 pages in 1930ms
> [   73.947024] node 96 initialised, 7732961 pages in 1930ms
> [   74.186208] node 116 initialised, 7732961 pages in 2170ms
> [   74.220838] node 68 initialised, 7732961 pages in 2210ms
> [   74.252341] node 46 initialised, 7732961 pages in 2240ms
> [   74.274795] node 118 initialised, 7732961 pages in 2260ms
> [   74.337544] node 14 initialised, 7732961 pages in 2320ms
> [   74.350819] node 22 initialised, 7732961 pages in 2340ms
> [   74.350332] node 69 initialised, 7732962 pages in 2340ms
> [   74.362683] node 211 initialised, 7732962 pages in 2310ms
> [   74.360617] node 70 initialised, 7732961 pages in 2340ms
> [   74.369137] node 66 initialised, 7732961 pages in 2360ms
> [   74.378242] node 115 initialised, 7732962 pages in 2360ms
> [   74.404221] node 213 initialised, 7732962 pages in 2350ms
> [   74.420901] node 210 initialised, 7732961 pages in 2370ms
> [   74.430049] node 35 initialised, 7732962 pages in 2420ms
> [   74.436007] node 48 initialised, 7732961 pages in 2420ms
> [   74.480595] node 71 initialised, 7732962 pages in 2460ms
> [   74.485700] node 67 initialised, 7732962 pages in 2480ms
> [   74.502627] node 31 initialised, 7732962 pages in 2490ms
> [   74.542220] node 16 initialised, 7732961 pages in 2530ms
> [   74.547936] node 128 initialised, 7732961 pages in 2520ms
> [   74.634374] node 214 initialised, 7732961 pages in 2580ms
> [   74.654389] node 88 initialised, 7732961 pages in 2630ms
> [   74.722833] node 117 initialised, 7732962 pages in 2700ms
> [   74.735002] node 148 initialised, 7732961 pages in 2700ms
> [   74.742725] node 12 initialised, 7732961 pages in 2730ms
> [   74.749319] node 194 initialised, 7732961 pages in 2700ms
> [   74.767979] node 24 initialised, 7732961 pages in 2750ms
> [   74.769465] node 114 initialised, 7732961 pages in 2750ms
> [   74.796973] node 134 initialised, 7732961 pages in 2770ms
> [   74.818164] node 15 initialised, 7732962 pages in 2810ms
> [   74.844852] node 18 initialised, 7732961 pages in 2830ms
> [   74.866123] node 110 initialised, 7732961 pages in 2850ms
> [   74.898255] node 215 initialised, 7730688 pages in 2840ms
> [   74.903623] node 136 initialised, 7732961 pages in 2880ms
> [   74.911107] node 144 initialised, 7732961 pages in 2890ms
> [   74.918757] node 212 initialised, 7732961 pages in 2870ms
> [   74.935333] node 182 initialised, 7732961 pages in 2880ms
> [   74.958147] node 42 initialised, 7732961 pages in 2950ms
> [   74.964989] node 108 initialised, 7732961 pages in 2950ms
> [   74.965482] node 112 initialised, 7732961 pages in 2950ms
> [   75.034787] node 184 initialised, 7732961 pages in 2980ms
> [   75.051242] node 45 initialised, 7732962 pages in 3040ms
> [   75.047169] node 152 initialised, 7732961 pages in 3020ms
> [   75.062834] node 179 initialised, 7732962 pages in 3010ms
> [   75.076528] node 145 initialised, 7732962 pages in 3040ms
> [   75.076613] node 25 initialised, 7732962 pages in 3070ms
> [   75.073086] node 164 initialised, 7732961 pages in 3040ms
> [   75.079674] node 149 initialised, 7732962 pages in 3050ms
> [   75.092015] node 113 initialised, 7732962 pages in 3070ms
> [   75.096325] node 80 initialised, 7732961 pages in 3080ms
> [   75.131380] node 92 initialised, 7732961 pages in 3110ms
> [   75.142147] node 10 initialised, 7732961 pages in 3130ms
> [   75.151041] node 51 initialised, 7732962 pages in 3140ms
> [   75.159074] node 130 initialised, 7732961 pages in 3130ms
> [   75.162616] node 166 initialised, 7732961 pages in 3130ms
> [   75.193557] node 82 initialised, 7732961 pages in 3170ms
> [   75.254801] node 84 initialised, 7732961 pages in 3240ms
> [   75.303028] node 64 initialised, 7732961 pages in 3290ms
> [   75.299739] node 49 initialised, 7732962 pages in 3290ms
> [   75.314231] node 21 initialised, 7732962 pages in 3300ms
> [   75.371298] node 53 initialised, 7732962 pages in 3360ms
> [   75.394569] node 95 initialised, 7732962 pages in 3380ms
> [   75.441101] node 23 initialised, 7732962 pages in 3430ms
> [   75.433080] node 19 initialised, 7732962 pages in 3430ms
> [   75.446076] node 173 initialised, 7732962 pages in 3410ms
> [   75.445816] node 99 initialised, 7732962 pages in 3430ms
> [   75.470330] node 87 initialised, 7732962 pages in 3450ms
> [   75.502334] node 8 initialised, 7732961 pages in 3490ms
> [   75.508300] node 206 initialised, 7732961 pages in 3460ms
> [   75.540253] node 132 initialised, 7732961 pages in 3510ms
> [   75.615453] node 183 initialised, 7732962 pages in 3560ms
> [   75.632576] node 78 initialised, 7732961 pages in 3610ms
> [   75.647753] node 85 initialised, 7732962 pages in 3620ms
> [   75.688955] node 90 initialised, 7732961 pages in 3670ms
> [   75.694522] node 200 initialised, 7732961 pages in 3640ms
> [   75.688790] node 43 initialised, 7732962 pages in 3680ms
> [   75.694540] node 94 initialised, 7732961 pages in 3680ms
> [   75.697149] node 29 initialised, 7732962 pages in 3690ms
> [   75.693590] node 111 initialised, 7732962 pages in 3680ms
> [   75.715829] node 56 initialised, 7732961 pages in 3700ms
> [   75.718427] node 97 initialised, 7732962 pages in 3700ms
> [   75.741643] node 147 initialised, 7732962 pages in 3710ms
> [   75.773613] node 170 initialised, 7732961 pages in 3740ms
> [   75.802874] node 208 initialised, 7732961 pages in 3750ms
> [   75.804409] node 58 initialised, 7732961 pages in 3790ms
> [   75.853438] node 126 initialised, 7732961 pages in 3830ms
> [   75.888167] node 167 initialised, 7732962 pages in 3850ms
> [   75.912656] node 172 initialised, 7732961 pages in 3870ms
> [   75.956540] node 93 initialised, 7732962 pages in 3940ms
> [   75.988819] node 127 initialised, 7732962 pages in 3960ms
> [   76.062198] node 201 initialised, 7732962 pages in 4010ms
> [   76.091769] node 47 initialised, 7732962 pages in 4080ms
> [   76.119749] node 162 initialised, 7732961 pages in 4080ms
> [   76.122797] node 6 initialised, 7732961 pages in 4110ms
> [   76.225916] node 153 initialised, 7732962 pages in 4190ms
> [   76.219855] node 81 initialised, 7732962 pages in 4200ms
> [   76.236116] node 150 initialised, 7732961 pages in 4210ms
> [   76.245349] node 180 initialised, 7732961 pages in 4190ms
> [   76.248827] node 17 initialised, 7732962 pages in 4240ms
> [   76.258801] node 13 initialised, 7732962 pages in 4250ms
> [   76.259943] node 122 initialised, 7732961 pages in 4240ms
> [   76.277480] node 196 initialised, 7732961 pages in 4230ms
> [   76.320830] node 41 initialised, 7732962 pages in 4310ms
> [   76.351667] node 129 initialised, 7732962 pages in 4320ms
> [   76.353488] node 202 initialised, 7732961 pages in 4310ms
> [   76.376753] node 165 initialised, 7732962 pages in 4340ms
> [   76.381807] node 124 initialised, 7732961 pages in 4350ms
> [   76.419952] node 171 initialised, 7732962 pages in 4380ms
> [   76.431242] node 168 initialised, 7732961 pages in 4390ms
> [   76.441324] node 89 initialised, 7732962 pages in 4420ms
> [   76.440720] node 155 initialised, 7732962 pages in 4400ms
> [   76.459715] node 120 initialised, 7732961 pages in 4440ms
> [   76.483986] node 205 initialised, 7732962 pages in 4430ms
> [   76.493284] node 151 initialised, 7732962 pages in 4460ms
> [   76.491437] node 60 initialised, 7732961 pages in 4480ms
> [   76.526620] node 74 initialised, 7732961 pages in 4510ms
> [   76.543761] node 131 initialised, 7732962 pages in 4510ms
> [   76.549562] node 39 initialised, 7732962 pages in 4540ms
> [   76.563861] node 11 initialised, 7732962 pages in 4550ms
> [   76.598775] node 54 initialised, 7732961 pages in 4590ms
> [   76.602006] node 123 initialised, 7732962 pages in 4570ms
> [   76.619856] node 76 initialised, 7732961 pages in 4600ms
> [   76.631418] node 198 initialised, 7732961 pages in 4580ms
> [   76.665415] node 188 initialised, 7732961 pages in 4610ms
> [   76.669178] node 63 initialised, 7732962 pages in 4660ms
> [   76.683646] node 101 initialised, 7732962 pages in 4670ms
> [   76.710780] node 192 initialised, 7732961 pages in 4660ms
> [   76.736743] node 121 initialised, 7732962 pages in 4720ms
> [   76.743800] node 199 initialised, 7732962 pages in 4700ms
> [   76.750663] node 20 initialised, 7732961 pages in 4740ms
> [   76.763045] node 135 initialised, 7732962 pages in 4730ms
> [   76.768216] node 137 initialised, 7732962 pages in 4740ms
> [   76.800135] node 181 initialised, 7732962 pages in 4750ms
> [   76.811215] node 27 initialised, 7732962 pages in 4800ms
> [   76.857405] node 125 initialised, 7732962 pages in 4820ms
> [   76.853750] node 163 initialised, 7732962 pages in 4820ms
> [   76.882975] node 59 initialised, 7732962 pages in 4870ms
> [   76.920121] node 9 initialised, 7732962 pages in 4910ms
> [   76.934824] node 189 initialised, 7732962 pages in 4880ms
> [   76.951223] node 154 initialised, 7732961 pages in 4920ms
> [   76.953897] node 203 initialised, 7732962 pages in 4900ms
> [   76.952558] node 75 initialised, 7732962 pages in 4930ms
> [   76.985480] node 119 initialised, 7732962 pages in 4970ms
> [   77.036089] node 195 initialised, 7732962 pages in 4980ms
> [   77.039996] node 55 initialised, 7732962 pages in 5030ms
> [   77.067989] node 109 initialised, 7732962 pages in 5040ms
> [   77.066236] node 7 initialised, 7732962 pages in 5060ms
> [   77.068709] node 65 initialised, 7732962 pages in 5060ms
> [   77.097859] node 79 initialised, 7732962 pages in 5080ms
> [   77.096219] node 169 initialised, 7732962 pages in 5060ms
> [   77.125113] node 83 initialised, 7732962 pages in 5110ms
> [   77.139507] node 37 initialised, 7732962 pages in 5130ms
> [   77.143280] node 77 initialised, 7732962 pages in 5120ms
> [   77.226494] node 73 initialised, 7732962 pages in 5200ms
> [   77.281584] node 190 initialised, 7732961 pages in 5230ms
> [   77.314794] node 204 initialised, 7732961 pages in 5260ms
> [   77.328577] node 72 initialised, 7732961 pages in 5310ms
> [   77.335743] node 36 initialised, 7732961 pages in 5320ms
> [   77.360573] node 40 initialised, 7732961 pages in 5350ms
> [   77.368712] node 207 initialised, 7732962 pages in 5320ms
> [   77.387708] node 91 initialised, 7732962 pages in 5370ms
> [   77.385143] node 57 initialised, 7732962 pages in 5380ms
> [   77.391785] node 191 initialised, 7732962 pages in 5340ms
> [   77.479970] node 185 initialised, 7732962 pages in 5430ms
> [   77.491865] node 61 initialised, 7732962 pages in 5480ms
> [   77.489255] node 133 initialised, 7732962 pages in 5460ms
> [   77.502111] node 197 initialised, 7732962 pages in 5450ms
> [   77.507136] node 193 initialised, 7732962 pages in 5460ms
> [   77.523739] node 209 initialised, 7732962 pages in 5470ms
> [   77.537131] node 187 initialised, 7732962 pages in 5490ms
>
> -- [2]
>
> http://ozlabs.org/~akpm/mmots/broken-out/memblock-introduce-a-for_each_reserved_mem_region-iterator.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-move-page-initialization-into-a-separate-function.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-only-set-page-reserved-in-the-memblock-region.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-pass-pfn-to-__free_pages_bootmem.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-make-__early_pfn_to_nid-smp-safe-and-introduce-meminit_pfn_in_nid.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-remaining-struct-pages-in-parallel-with-kswapd.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-minimise-number-of-pfn-page-lookups-during-initialisation.patch
> http://ozlabs.org/~akpm/mmots/broken-out/x86-mm-enable-deferred-struct-page-initialisation-on-x86-64.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-free-pages-in-large-chunks-where-possible.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-remove-mminit_verify_page_links.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix2.patch
>
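
The spread in [1] is easier to see as a rate than as raw milliseconds. Below
is a minimal sketch that parses one "node N initialised, P pages in Tms"
line and derives pages per millisecond. The struct and function names are
made up for illustration, and the format string only assumes the layout
shown in the log above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct node_init {
	int node;		/* NUMA node id */
	unsigned long pages;	/* pages initialised */
	unsigned long ms;	/* time taken, in milliseconds */
};

/* Parse one boot-log line; returns 1 on success, 0 if it does not match */
static int parse_node_init(const char *line, struct node_init *out)
{
	const char *p;

	assert(line && out);

	/* Skip the quote prefix and the "[ timestamp]" part */
	p = strstr(line, "node ");
	if (!p)
		return 0;

	return sscanf(p, "node %d initialised, %lu pages in %lums",
		      &out->node, &out->pages, &out->ms) == 3;
}

/* Integer pages-per-millisecond rate; 0 if no time was recorded */
static unsigned long pages_per_ms(const struct node_init *n)
{
	return n->ms ? n->pages / n->ms : 0;
}
```

Applied to the first and last lines of [1], this gives about 7295 pages/ms
for node 2 (7732961 pages in 1060ms) against about 1408 pages/ms for node
187 (7732962 pages in 5490ms), i.e. roughly a fivefold drop for the same
amount of work.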

[-- Attachment #2: 0001-Avoid-the-contention-in-set_cpus_allowed.patch --]
[-- Type: text/x-patch, Size: 2280 bytes --]

From e18aa6158a60c2134b4eef93c856f3b5b250b122 Mon Sep 17 00:00:00 2001
From: Nathan Zimmer <nzimmer@sgi.com>
Date: Thu, 11 Jun 2015 10:47:39 -0500
Subject: [RFC] Avoid the contention in set_cpus_allowed

Noticing some scaling issues at larger box sizes (64 nodes+) I found that in some
cases we are spending significant amounts of time in set_cpus_allowed_ptr.

My assumption is that it is getting stuck on migration.
So if we create the thread on the target node and restrict cpus before we start
the thread then we don't have to suffer migration.

Cc: Mel Gorman <mgorman@suse.de>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>

---
 mm/page_alloc.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f88e8c4..531f7bc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1090,16 +1090,12 @@ static int __init deferred_init_memmap(void *data)
 	int i, zid;
 	struct zone *zone;
 	unsigned long first_init_pfn = pgdat->first_deferred_pfn;
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
 
 	if (first_init_pfn == ULONG_MAX) {
 		up_read(&pgdat_init_rwsem);
 		return 0;
 	}
 
-	/* Bind memory initialisation thread to a local node if possible */
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(current, cpumask);
 
 	/* Sanity check boundaries */
 	BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
@@ -1204,8 +1200,16 @@ void __init page_alloc_init_late(void)
 	unsigned long start = jiffies;
 
 	for_each_node_state(nid, N_MEMORY) {
+		struct task_struct *defer_task;
+		const struct cpumask *cpumask = cpumask_of_node(nid);
 		down_read(&pgdat_init_rwsem);
-		kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
+		defer_task = kthread_create_on_node(deferred_init_memmap,
+			NODE_DATA(nid), nid, "pgdatinit%d", nid);
+		/* Bind memory initialisation thread to a local node if possible */
+		if (!cpumask_empty(cpumask))
+			set_cpus_allowed_ptr(defer_task, cpumask);
+		if (!IS_ERR(defer_task))
+			wake_up_process(defer_task);
 	}
 
 	/* Block until all are initialised */
-- 
1.8.2.1


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
@ 2015-06-24 22:50                         ` Nathan Zimmer
  0 siblings, 0 replies; 168+ messages in thread
From: Nathan Zimmer @ 2015-06-24 22:50 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Mel Gorman, nzimmer, Andrew Morton, Waiman Long, Dave Hansen,
	Scott Norton, Linux-MM, LKML, Steffen Persvold

[-- Attachment #1: Type: text/plain, Size: 17004 bytes --]

My apologies for taking so long to get back to this.

I think I did locate two potential sources of slowdown.
One is set_cpus_allowed_ptr(), as I have noted previously.
However, I only notice that on the very largest boxes.
I did cobble together a patch that seems to help.

The other spot I suspect is the zone lock in free_one_page().
I haven't been able to give that much thought yet, though.

Daniel, do you mind seeing if the attached patch helps out?

Thanks,
Nate

On Thu, May 14, 2015 at 06:03:03PM +0800, Daniel J Blueman wrote:
> On Thu, May 14, 2015 at 12:31 AM, Mel Gorman <mgorman@suse.de> wrote:
>> On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
>>>  I just noticed a hang on my largest box.
>>>  I can only reproduce it with large core counts; if I turn down the
>>>  number of cpus it doesn't have an issue.
>>>
>>
>> Odd. The core count should make little difference, as only
>> one CPU per node should be in use. Does sysrq+t give any indication how
>> or where it is hanging?
>
> I was seeing the same behaviour of 1000ms increasing to 5500ms [1]; this 
> suggests either lock contention or O(n) behaviour.
>
> Nathan, can you check with this ordering of patches from Andrew's cache 
> [2]? I was getting hangs until I found them all.
>
> I'll follow up with timing data.
>
> Thanks,
>  Daniel
>
> -- [1]
>
> [   73.076117] node 2 initialised, 7732961 pages in 1060ms
> [   73.077184] node 38 initialised, 7732961 pages in 1060ms
> [   73.079626] node 146 initialised, 7732961 pages in 1050ms
> [   73.093488] node 62 initialised, 7732961 pages in 1080ms
> [   73.091557] node 3 initialised, 7732962 pages in 1080ms
> [   73.100000] node 186 initialised, 7732961 pages in 1040ms
> [   73.095731] node 4 initialised, 7732961 pages in 1080ms
> [   73.090289] node 50 initialised, 7732961 pages in 1080ms
> [   73.094005] node 158 initialised, 7732961 pages in 1050ms
> [   73.095421] node 159 initialised, 7732962 pages in 1050ms
> [   73.090324] node 52 initialised, 7732961 pages in 1080ms
> [   73.099056] node 5 initialised, 7732962 pages in 1080ms
> [   73.090116] node 160 initialised, 7732961 pages in 1050ms
> [   73.161051] node 157 initialised, 7732962 pages in 1120ms
> [   73.193565] node 161 initialised, 7732962 pages in 1160ms
> [   73.212456] node 26 initialised, 7732961 pages in 1200ms
> [   73.222904] node 0 initialised, 6686488 pages in 1210ms
> [   73.242165] node 140 initialised, 7732961 pages in 1210ms
> [   73.254230] node 156 initialised, 7732961 pages in 1220ms
> [   73.284634] node 1 initialised, 7732962 pages in 1270ms
> [   73.305301] node 141 initialised, 7732962 pages in 1280ms
> [   73.322845] node 28 initialised, 7732961 pages in 1310ms
> [   73.321757] node 142 initialised, 7732961 pages in 1290ms
> [   73.327677] node 138 initialised, 7732961 pages in 1300ms
> [   73.413597] node 176 initialised, 7732961 pages in 1370ms
> [   73.455552] node 139 initialised, 7732962 pages in 1420ms
> [   73.475356] node 143 initialised, 7732962 pages in 1440ms
> [   73.547202] node 32 initialised, 7732961 pages in 1530ms
> [   73.579591] node 104 initialised, 7732961 pages in 1560ms
> [   73.618065] node 174 initialised, 7732961 pages in 1570ms
> [   73.624918] node 178 initialised, 7732961 pages in 1580ms
> [   73.649024] node 175 initialised, 7732962 pages in 1610ms
> [   73.654110] node 105 initialised, 7732962 pages in 1630ms
> [   73.670589] node 106 initialised, 7732961 pages in 1650ms
> [   73.739682] node 102 initialised, 7732961 pages in 1720ms
> [   73.769639] node 86 initialised, 7732961 pages in 1750ms
> [   73.775573] node 44 initialised, 7732961 pages in 1760ms
> [   73.772955] node 177 initialised, 7732962 pages in 1740ms
> [   73.804390] node 34 initialised, 7732961 pages in 1790ms
> [   73.819370] node 30 initialised, 7732961 pages in 1810ms
> [   73.847882] node 98 initialised, 7732961 pages in 1830ms
> [   73.867545] node 33 initialised, 7732962 pages in 1860ms
> [   73.877964] node 107 initialised, 7732962 pages in 1860ms
> [   73.906256] node 103 initialised, 7732962 pages in 1880ms
> [   73.945581] node 100 initialised, 7732961 pages in 1930ms
> [   73.947024] node 96 initialised, 7732961 pages in 1930ms
> [   74.186208] node 116 initialised, 7732961 pages in 2170ms
> [   74.220838] node 68 initialised, 7732961 pages in 2210ms
> [   74.252341] node 46 initialised, 7732961 pages in 2240ms
> [   74.274795] node 118 initialised, 7732961 pages in 2260ms
> [   74.337544] node 14 initialised, 7732961 pages in 2320ms
> [   74.350819] node 22 initialised, 7732961 pages in 2340ms
> [   74.350332] node 69 initialised, 7732962 pages in 2340ms
> [   74.362683] node 211 initialised, 7732962 pages in 2310ms
> [   74.360617] node 70 initialised, 7732961 pages in 2340ms
> [   74.369137] node 66 initialised, 7732961 pages in 2360ms
> [   74.378242] node 115 initialised, 7732962 pages in 2360ms
> [   74.404221] node 213 initialised, 7732962 pages in 2350ms
> [   74.420901] node 210 initialised, 7732961 pages in 2370ms
> [   74.430049] node 35 initialised, 7732962 pages in 2420ms
> [   74.436007] node 48 initialised, 7732961 pages in 2420ms
> [   74.480595] node 71 initialised, 7732962 pages in 2460ms
> [   74.485700] node 67 initialised, 7732962 pages in 2480ms
> [   74.502627] node 31 initialised, 7732962 pages in 2490ms
> [   74.542220] node 16 initialised, 7732961 pages in 2530ms
> [   74.547936] node 128 initialised, 7732961 pages in 2520ms
> [   74.634374] node 214 initialised, 7732961 pages in 2580ms
> [   74.654389] node 88 initialised, 7732961 pages in 2630ms
> [   74.722833] node 117 initialised, 7732962 pages in 2700ms
> [   74.735002] node 148 initialised, 7732961 pages in 2700ms
> [   74.742725] node 12 initialised, 7732961 pages in 2730ms
> [   74.749319] node 194 initialised, 7732961 pages in 2700ms
> [   74.767979] node 24 initialised, 7732961 pages in 2750ms
> [   74.769465] node 114 initialised, 7732961 pages in 2750ms
> [   74.796973] node 134 initialised, 7732961 pages in 2770ms
> [   74.818164] node 15 initialised, 7732962 pages in 2810ms
> [   74.844852] node 18 initialised, 7732961 pages in 2830ms
> [   74.866123] node 110 initialised, 7732961 pages in 2850ms
> [   74.898255] node 215 initialised, 7730688 pages in 2840ms
> [   74.903623] node 136 initialised, 7732961 pages in 2880ms
> [   74.911107] node 144 initialised, 7732961 pages in 2890ms
> [   74.918757] node 212 initialised, 7732961 pages in 2870ms
> [   74.935333] node 182 initialised, 7732961 pages in 2880ms
> [   74.958147] node 42 initialised, 7732961 pages in 2950ms
> [   74.964989] node 108 initialised, 7732961 pages in 2950ms
> [   74.965482] node 112 initialised, 7732961 pages in 2950ms
> [   75.034787] node 184 initialised, 7732961 pages in 2980ms
> [   75.051242] node 45 initialised, 7732962 pages in 3040ms
> [   75.047169] node 152 initialised, 7732961 pages in 3020ms
> [   75.062834] node 179 initialised, 7732962 pages in 3010ms
> [   75.076528] node 145 initialised, 7732962 pages in 3040ms
> [   75.076613] node 25 initialised, 7732962 pages in 3070ms
> [   75.073086] node 164 initialised, 7732961 pages in 3040ms
> [   75.079674] node 149 initialised, 7732962 pages in 3050ms
> [   75.092015] node 113 initialised, 7732962 pages in 3070ms
> [   75.096325] node 80 initialised, 7732961 pages in 3080ms
> [   75.131380] node 92 initialised, 7732961 pages in 3110ms
> [   75.142147] node 10 initialised, 7732961 pages in 3130ms
> [   75.151041] node 51 initialised, 7732962 pages in 3140ms
> [   75.159074] node 130 initialised, 7732961 pages in 3130ms
> [   75.162616] node 166 initialised, 7732961 pages in 3130ms
> [   75.193557] node 82 initialised, 7732961 pages in 3170ms
> [   75.254801] node 84 initialised, 7732961 pages in 3240ms
> [   75.303028] node 64 initialised, 7732961 pages in 3290ms
> [   75.299739] node 49 initialised, 7732962 pages in 3290ms
> [   75.314231] node 21 initialised, 7732962 pages in 3300ms
> [   75.371298] node 53 initialised, 7732962 pages in 3360ms
> [   75.394569] node 95 initialised, 7732962 pages in 3380ms
> [   75.441101] node 23 initialised, 7732962 pages in 3430ms
> [   75.433080] node 19 initialised, 7732962 pages in 3430ms
> [   75.446076] node 173 initialised, 7732962 pages in 3410ms
> [   75.445816] node 99 initialised, 7732962 pages in 3430ms
> [   75.470330] node 87 initialised, 7732962 pages in 3450ms
> [   75.502334] node 8 initialised, 7732961 pages in 3490ms
> [   75.508300] node 206 initialised, 7732961 pages in 3460ms
> [   75.540253] node 132 initialised, 7732961 pages in 3510ms
> [   75.615453] node 183 initialised, 7732962 pages in 3560ms
> [   75.632576] node 78 initialised, 7732961 pages in 3610ms
> [   75.647753] node 85 initialised, 7732962 pages in 3620ms
> [   75.688955] node 90 initialised, 7732961 pages in 3670ms
> [   75.694522] node 200 initialised, 7732961 pages in 3640ms
> [   75.688790] node 43 initialised, 7732962 pages in 3680ms
> [   75.694540] node 94 initialised, 7732961 pages in 3680ms
> [   75.697149] node 29 initialised, 7732962 pages in 3690ms
> [   75.693590] node 111 initialised, 7732962 pages in 3680ms
> [   75.715829] node 56 initialised, 7732961 pages in 3700ms
> [   75.718427] node 97 initialised, 7732962 pages in 3700ms
> [   75.741643] node 147 initialised, 7732962 pages in 3710ms
> [   75.773613] node 170 initialised, 7732961 pages in 3740ms
> [   75.802874] node 208 initialised, 7732961 pages in 3750ms
> [   75.804409] node 58 initialised, 7732961 pages in 3790ms
> [   75.853438] node 126 initialised, 7732961 pages in 3830ms
> [   75.888167] node 167 initialised, 7732962 pages in 3850ms
> [   75.912656] node 172 initialised, 7732961 pages in 3870ms
> [   75.956540] node 93 initialised, 7732962 pages in 3940ms
> [   75.988819] node 127 initialised, 7732962 pages in 3960ms
> [   76.062198] node 201 initialised, 7732962 pages in 4010ms
> [   76.091769] node 47 initialised, 7732962 pages in 4080ms
> [   76.119749] node 162 initialised, 7732961 pages in 4080ms
> [   76.122797] node 6 initialised, 7732961 pages in 4110ms
> [   76.225916] node 153 initialised, 7732962 pages in 4190ms
> [   76.219855] node 81 initialised, 7732962 pages in 4200ms
> [   76.236116] node 150 initialised, 7732961 pages in 4210ms
> [   76.245349] node 180 initialised, 7732961 pages in 4190ms
> [   76.248827] node 17 initialised, 7732962 pages in 4240ms
> [   76.258801] node 13 initialised, 7732962 pages in 4250ms
> [   76.259943] node 122 initialised, 7732961 pages in 4240ms
> [   76.277480] node 196 initialised, 7732961 pages in 4230ms
> [   76.320830] node 41 initialised, 7732962 pages in 4310ms
> [   76.351667] node 129 initialised, 7732962 pages in 4320ms
> [   76.353488] node 202 initialised, 7732961 pages in 4310ms
> [   76.376753] node 165 initialised, 7732962 pages in 4340ms
> [   76.381807] node 124 initialised, 7732961 pages in 4350ms
> [   76.419952] node 171 initialised, 7732962 pages in 4380ms
> [   76.431242] node 168 initialised, 7732961 pages in 4390ms
> [   76.441324] node 89 initialised, 7732962 pages in 4420ms
> [   76.440720] node 155 initialised, 7732962 pages in 4400ms
> [   76.459715] node 120 initialised, 7732961 pages in 4440ms
> [   76.483986] node 205 initialised, 7732962 pages in 4430ms
> [   76.493284] node 151 initialised, 7732962 pages in 4460ms
> [   76.491437] node 60 initialised, 7732961 pages in 4480ms
> [   76.526620] node 74 initialised, 7732961 pages in 4510ms
> [   76.543761] node 131 initialised, 7732962 pages in 4510ms
> [   76.549562] node 39 initialised, 7732962 pages in 4540ms
> [   76.563861] node 11 initialised, 7732962 pages in 4550ms
> [   76.598775] node 54 initialised, 7732961 pages in 4590ms
> [   76.602006] node 123 initialised, 7732962 pages in 4570ms
> [   76.619856] node 76 initialised, 7732961 pages in 4600ms
> [   76.631418] node 198 initialised, 7732961 pages in 4580ms
> [   76.665415] node 188 initialised, 7732961 pages in 4610ms
> [   76.669178] node 63 initialised, 7732962 pages in 4660ms
> [   76.683646] node 101 initialised, 7732962 pages in 4670ms
> [   76.710780] node 192 initialised, 7732961 pages in 4660ms
> [   76.736743] node 121 initialised, 7732962 pages in 4720ms
> [   76.743800] node 199 initialised, 7732962 pages in 4700ms
> [   76.750663] node 20 initialised, 7732961 pages in 4740ms
> [   76.763045] node 135 initialised, 7732962 pages in 4730ms
> [   76.768216] node 137 initialised, 7732962 pages in 4740ms
> [   76.800135] node 181 initialised, 7732962 pages in 4750ms
> [   76.811215] node 27 initialised, 7732962 pages in 4800ms
> [   76.857405] node 125 initialised, 7732962 pages in 4820ms
> [   76.853750] node 163 initialised, 7732962 pages in 4820ms
> [   76.882975] node 59 initialised, 7732962 pages in 4870ms
> [   76.920121] node 9 initialised, 7732962 pages in 4910ms
> [   76.934824] node 189 initialised, 7732962 pages in 4880ms
> [   76.951223] node 154 initialised, 7732961 pages in 4920ms
> [   76.953897] node 203 initialised, 7732962 pages in 4900ms
> [   76.952558] node 75 initialised, 7732962 pages in 4930ms
> [   76.985480] node 119 initialised, 7732962 pages in 4970ms
> [   77.036089] node 195 initialised, 7732962 pages in 4980ms
> [   77.039996] node 55 initialised, 7732962 pages in 5030ms
> [   77.067989] node 109 initialised, 7732962 pages in 5040ms
> [   77.066236] node 7 initialised, 7732962 pages in 5060ms
> [   77.068709] node 65 initialised, 7732962 pages in 5060ms
> [   77.097859] node 79 initialised, 7732962 pages in 5080ms
> [   77.096219] node 169 initialised, 7732962 pages in 5060ms
> [   77.125113] node 83 initialised, 7732962 pages in 5110ms
> [   77.139507] node 37 initialised, 7732962 pages in 5130ms
> [   77.143280] node 77 initialised, 7732962 pages in 5120ms
> [   77.226494] node 73 initialised, 7732962 pages in 5200ms
> [   77.281584] node 190 initialised, 7732961 pages in 5230ms
> [   77.314794] node 204 initialised, 7732961 pages in 5260ms
> [   77.328577] node 72 initialised, 7732961 pages in 5310ms
> [   77.335743] node 36 initialised, 7732961 pages in 5320ms
> [   77.360573] node 40 initialised, 7732961 pages in 5350ms
> [   77.368712] node 207 initialised, 7732962 pages in 5320ms
> [   77.387708] node 91 initialised, 7732962 pages in 5370ms
> [   77.385143] node 57 initialised, 7732962 pages in 5380ms
> [   77.391785] node 191 initialised, 7732962 pages in 5340ms
> [   77.479970] node 185 initialised, 7732962 pages in 5430ms
> [   77.491865] node 61 initialised, 7732962 pages in 5480ms
> [   77.489255] node 133 initialised, 7732962 pages in 5460ms
> [   77.502111] node 197 initialised, 7732962 pages in 5450ms
> [   77.507136] node 193 initialised, 7732962 pages in 5460ms
> [   77.523739] node 209 initialised, 7732962 pages in 5470ms
> [   77.537131] node 187 initialised, 7732962 pages in 5490ms
>
> -- [2]
>
> http://ozlabs.org/~akpm/mmots/broken-out/memblock-introduce-a-for_each_reserved_mem_region-iterator.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-move-page-initialization-into-a-separate-function.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-only-set-page-reserved-in-the-memblock-region.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-pass-pfn-to-__free_pages_bootmem.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-make-__early_pfn_to_nid-smp-safe-and-introduce-meminit_pfn_in_nid.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-remaining-struct-pages-in-parallel-with-kswapd.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-minimise-number-of-pfn-page-lookups-during-initialisation.patch
> http://ozlabs.org/~akpm/mmots/broken-out/x86-mm-enable-deferred-struct-page-initialisation-on-x86-64.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-free-pages-in-large-chunks-where-possible.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-remove-mminit_verify_page_links.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init-fix.patch
> http://ozlabs.org/~akpm/mmots/broken-out/mm-meminit-inline-some-helper-functions-fix2.patch
>

[-- Attachment #2: 0001-Avoid-the-contention-in-set_cpus_allowed.patch --]
[-- Type: text/x-patch, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-06-24 22:50                         ` Nathan Zimmer
@ 2015-06-25 20:48                           ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-06-25 20:48 UTC (permalink / raw)
  To: Nathan Zimmer
  Cc: Daniel J Blueman, Andrew Morton, Waiman Long, Dave Hansen,
	Scott Norton, Linux-MM, LKML, Steffen Persvold

On Wed, Jun 24, 2015 at 05:50:28PM -0500, Nathan Zimmer wrote:
> My apologies for taking so long to get back to this.
> 
> I think I did locate two potential sources of slowdown.
> One is the set_cpus_allowed_ptr as I have noted previously.
> However I only notice that on the very largest boxes.
> I did cobble together a patch that seems to help.
> 

If you are using kthread_create_on_node(), is it even necessary to call
set_cpus_allowed_ptr() at all?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-06-25 20:48                           ` Mel Gorman
@ 2015-06-25 20:57                             ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-06-25 20:57 UTC (permalink / raw)
  To: Nathan Zimmer
  Cc: Daniel J Blueman, Andrew Morton, Waiman Long, Dave Hansen,
	Scott Norton, Linux-MM, LKML, Steffen Persvold

On Thu, Jun 25, 2015 at 09:48:55PM +0100, Mel Gorman wrote:
> On Wed, Jun 24, 2015 at 05:50:28PM -0500, Nathan Zimmer wrote:
> > My apologies for taking so long to get back to this.
> > 
> > I think I did locate two potential sources of slowdown.
> > One is the set_cpus_allowed_ptr as I have noted previously.
> > However I only notice that on the very largest boxes.
> > I did cobble together a patch that seems to help.
> > 
> 
> If you are using kthread_create_on_node(), is it even necessary to call
> set_cpus_allowed_ptr() at all?
> 

That aside, are you aware of any failure with this series as it currently
stands in Andrew's tree that this patch is meant to address?  It seems
like a nice follow-on that would boot faster on very large machines but
if it's addressing a regression then it's very important as the series
cannot be merged with known critical failures.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-06-25 20:48                           ` Mel Gorman
@ 2015-06-25 21:34                             ` Nathan Zimmer
  -1 siblings, 0 replies; 168+ messages in thread
From: Nathan Zimmer @ 2015-06-25 21:34 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nathan Zimmer, Daniel J Blueman, Andrew Morton, Waiman Long,
	Dave Hansen, Scott Norton, Linux-MM, LKML, Steffen Persvold

On Thu, Jun 25, 2015 at 09:48:55PM +0100, Mel Gorman wrote:
> On Wed, Jun 24, 2015 at 05:50:28PM -0500, Nathan Zimmer wrote:
> > My apologies for taking so long to get back to this.
> > 
> > I think I did locate two potential sources of slowdown.
> > One is the set_cpus_allowed_ptr as I have noted previously.
> > However I only notice that on the very largest boxes.
> > I did cobble together a patch that seems to help.
> > 
> 
> If you are using kthread_create_on_node(), is it even necessary to call
> set_cpus_allowed_ptr() at all?
> 

Yup, kthread_create_on_node() unconditionally calls
set_cpus_allowed_ptr(task, cpu_all_mask);
It does so to avoid inheriting kthreadd's properties.

Not being familiar with the scheduling code, I assumed I had missed something.
However, it sounds like it should respect the choice.



^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-06-25 20:57                             ` Mel Gorman
@ 2015-06-25 21:37                               ` Nathan Zimmer
  -1 siblings, 0 replies; 168+ messages in thread
From: Nathan Zimmer @ 2015-06-25 21:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nathan Zimmer, Daniel J Blueman, Andrew Morton, Waiman Long,
	Dave Hansen, Scott Norton, Linux-MM, LKML, Steffen Persvold

On Thu, Jun 25, 2015 at 09:57:44PM +0100, Mel Gorman wrote:
> On Thu, Jun 25, 2015 at 09:48:55PM +0100, Mel Gorman wrote:
> > On Wed, Jun 24, 2015 at 05:50:28PM -0500, Nathan Zimmer wrote:
> > > My apologies for taking so long to get back to this.
> > > 
> > > I think I did locate two potential sources of slowdown.
> > > One is the set_cpus_allowed_ptr as I have noted previously.
> > > However I only notice that on the very largest boxes.
> > > I did cobble together a patch that seems to help.
> > > 
> > 
> > If you are using kthread_create_on_node(), is it even necessary to call
> > set_cpus_allowed_ptr() at all?
> > 
> 
> That aside, are you aware of any failure with this series as it currently
> stands in Andrew's tree that this patch is meant to address?  It seems
> like a nice follow-on that would boot faster on very large machines but
> if it's addressing a regression then it's very important as the series
> cannot be merged with known critical failures.
> 

Nope, I haven't recorded any failures without it.
I just get concerned, when I see scaling issues, that something COULD go wrong.


Nate


^ permalink raw reply	[flat|nested] 168+ messages in thread

* [RFC] kthread_create_on_node is failing to honor the node choice
  2015-06-25 20:48                           ` Mel Gorman
                                             ` (2 preceding siblings ...)
  (?)
@ 2015-06-25 21:44                           ` Nathan Zimmer
  2015-06-26  1:08                             ` Lai Jiangshan
  2015-07-09 22:12                             ` Andrew Morton
  -1 siblings, 2 replies; 168+ messages in thread
From: Nathan Zimmer @ 2015-06-25 21:44 UTC (permalink / raw)
  Cc: Nathan Zimmer, Andrew Morton, Nishanth Aravamudan, Tejun Heo,
	Lai Jiangshan, Mel Gorman, linux-kernel

In kthread_create_on_node we set_cpus_allowed to cpu_all_mask
regardless of which node is requested.
This seems incorrect.

Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: linux-kernel@vger.kernel.org

---
 kernel/kthread.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 10e489c..d885d66 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -318,7 +318,10 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 		 * The kernel thread should not inherit these properties.
 		 */
 		sched_setscheduler_nocheck(task, SCHED_NORMAL, &param);
-		set_cpus_allowed_ptr(task, cpu_all_mask);
+		if (node == -1)
+			set_cpus_allowed_ptr(task, cpu_all_mask);
+		else
+			set_cpus_allowed_ptr(task, cpumask_of_node(node));
 	}
 	kfree(create);
 	return task;
-- 
1.8.2.1


^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: [RFC] kthread_create_on_node is failing to honor the node choice
  2015-06-25 21:44                           ` [RFC] kthread_create_on_node is failing to honor the node choice Nathan Zimmer
@ 2015-06-26  1:08                             ` Lai Jiangshan
  2015-07-09 22:12                             ` Andrew Morton
  1 sibling, 0 replies; 168+ messages in thread
From: Lai Jiangshan @ 2015-06-26  1:08 UTC (permalink / raw)
  To: Nathan Zimmer
  Cc: Andrew Morton, Nishanth Aravamudan, Tejun Heo, Mel Gorman, linux-kernel

On 06/26/2015 05:44 AM, Nathan Zimmer wrote:
> In kthread_create_on_node we set_cpus_allowed to cpu_all_mask
> regardless of which node is requested.
> This seems incorrect.
> 
> Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: linux-kernel@vger.kernel.org
> 
> ---
>  kernel/kthread.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 10e489c..d885d66 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -318,7 +318,10 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
>  		 * The kernel thread should not inherit these properties.
>  		 */
>  		sched_setscheduler_nocheck(task, SCHED_NORMAL, &param);
> -		set_cpus_allowed_ptr(task, cpu_all_mask);
> +		if (node == -1)
> +			set_cpus_allowed_ptr(task, cpu_all_mask);
> +		else
> +			set_cpus_allowed_ptr(task, cpumask_of_node(node));


cpumask_of_node(node) is bad here. It contains only online CPUs.

>  	}
>  	kfree(create);
>  	return task;
> 
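
The objection above can be illustrated with a small userspace model. This is
only a sketch under stated assumptions, not kernel code: the toy topology
(4 nodes, 2 CPUs per node), the 32-bit masks, and the helper names
possible_cpus_of_node(), online_cpus_of_node() and rfc_allowed_mask() are all
hypothetical. It mirrors the branch added by the RFC and assumes, as stated in
the reply, that the per-node mask holds only online CPUs.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES     4      /* hypothetical toy topology: 4 nodes */
#define NUMA_NO_NODE (-1)   /* same value the RFC compares against */

/* Toy topology: CPUs 2n and 2n+1 belong to node n (8 CPUs in total). */
static uint32_t possible_cpus_of_node(int node)
{
	assert(node >= 0 && node < NR_NODES);
	return (uint32_t)0x3 << (2 * node);
}

/*
 * Model of a per-node mask that contains only online CPUs, which is
 * the behaviour the reply attributes to cpumask_of_node().
 */
static uint32_t online_cpus_of_node(int node, uint32_t online)
{
	return possible_cpus_of_node(node) & online;
}

/* Model of the mask the RFC would hand to set_cpus_allowed_ptr(). */
static uint32_t rfc_allowed_mask(int node, uint32_t online)
{
	const uint32_t all = 0xff;	/* stands in for cpu_all_mask */

	if (node == NUMA_NO_NODE)
		return all;
	return online_cpus_of_node(node, online);
}
```

In this model, rfc_allowed_mask(1, 0xff) yields 0x0c, but if both CPUs of
node 1 are offline (online mask 0xf3) the result is 0, i.e. an empty allowed
set. That is one way to read why a mask limited to online CPUs is a poor
choice at thread creation time.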


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-06-24 22:50                         ` Nathan Zimmer
@ 2015-06-26 10:16                           ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-06-26 10:16 UTC (permalink / raw)
  To: Nathan Zimmer
  Cc: Daniel J Blueman, Andrew Morton, Waiman Long, Dave Hansen,
	Scott Norton, Linux-MM, LKML, Steffen Persvold

On Wed, Jun 24, 2015 at 05:50:28PM -0500, Nathan Zimmer wrote:
> From e18aa6158a60c2134b4eef93c856f3b5b250b122 Mon Sep 17 00:00:00 2001
> From: Nathan Zimmer <nzimmer@sgi.com>
> Date: Thu, 11 Jun 2015 10:47:39 -0500
> Subject: [RFC] Avoid the contention in set_cpus_allowed
> 
> Noticing some scaling issues at larger box sizes (64 nodes+) I found that in some
> cases we are spending significant amounts of time in set_cpus_allowed_ptr.
> 
> My assumption is that it is getting stuck on migration.
> So if we create the thread on the target node and restrict cpus before we start
> the thread then we don't have to suffer migration.
> 
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Waiman Long <waiman.long@hp.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Scott Norton <scott.norton@hp.com>
> Cc: Daniel J Blueman <daniel@numascale.com>
> Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
> 

I asked yesterday if set_cpus_allowed_ptr() was required and I made a
mistake because it is. The node parameter for kthread_create_on_node()
controls where it gets created but not how it is scheduled after that.
Sorry for the noise. The patch makes sense to me now, let's see if it
helps Daniel.


-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-06-24 22:50                         ` Nathan Zimmer
@ 2015-07-06 17:45                           ` Daniel J Blueman
  -1 siblings, 0 replies; 168+ messages in thread
From: Daniel J Blueman @ 2015-07-06 17:45 UTC (permalink / raw)
  To: Nathan Zimmer
  Cc: Mel Gorman, Andrew Morton, Waiman Long, Dave Hansen,
	Scott Norton, Linux-MM, LKML, Steffen Persvold

Hi Nate,

On Wed, Jun 24, 2015 at 11:50 PM, Nathan Zimmer <nzimmer@sgi.com> wrote:
> My apologies for taking so long to get back to this.
> 
> I think I did locate two potential sources of slowdown.
> One is the set_cpus_allowed_ptr as I have noted previously.
> However I only notice that on the very largest boxes.
> I did cobble together a patch that seems to help.
> 
> The other spot I suspect is the zone lock in free_one_page.
> I haven't been able to give that much thought as of yet though.
> 
> Daniel do you mind seeing if the attached patch helps out?

Just got back from travel, so apologies for the delays.

The patch doesn't mitigate the increasing initialisation time; summing 
the per-node times for an accurate measure, there was a total of 
171.48s before the patch and 175.23s after. I double-checked and got 
similar data.

Thanks,
  Daniel


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
  2015-07-06 17:45                           ` Daniel J Blueman
@ 2015-07-09 17:49                             ` Nathan Zimmer
  -1 siblings, 0 replies; 168+ messages in thread
From: Nathan Zimmer @ 2015-07-09 17:49 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Mel Gorman, Andrew Morton, Waiman Long, Dave Hansen,
	Scott Norton, Linux-MM, LKML, Steffen Persvold

Interesting, I found a small improvement in total clock time through the 
area.
I tweaked page_alloc_init_late to have a timer, like 
deferred_init_memmap, and this patch showed a small improvement.

Ok thanks for your help.


On 07/06/2015 12:45 PM, Daniel J Blueman wrote:
> Hi Nate,
>
> On Wed, Jun 24, 2015 at 11:50 PM, Nathan Zimmer <nzimmer@sgi.com> wrote:
>> My apologies for taking so long to get back to this.
>>
>> I think I did locate two potential sources of slowdown.
>> One is the set_cpus_allowed_ptr as I have noted previously.
>> However I only notice that on the very largest boxes.
>> I did cobble together a patch that seems to help.
>>
>> The other spot I suspect is the zone lock in free_one_page.
>> I haven't been able to give that much thought as of yet though.
>>
>> Daniel do you mind seeing if the attached patch helps out?
>
> Just got back from travel, so apologies for the delays.
>
> The patch doesn't mitigate the increasing initialisation time; summing 
> the per-node times for an accurate measure, there was a total of 
> 171.48s before the patch and 175.23s after. I double-checked and got 
> similar data.
>
> Thanks,
>  Daniel
>


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [RFC] kthread_create_on_node is failing to honor the node choice
  2015-06-25 21:44                           ` [RFC] kthread_create_on_node is failing to honor the node choice Nathan Zimmer
  2015-06-26  1:08                             ` Lai Jiangshan
@ 2015-07-09 22:12                             ` Andrew Morton
  2015-07-10 14:26                               ` Mel Gorman
  2015-07-10 17:34                               ` Nathan Zimmer
  1 sibling, 2 replies; 168+ messages in thread
From: Andrew Morton @ 2015-07-09 22:12 UTC (permalink / raw)
  To: Nathan Zimmer
  Cc: Nishanth Aravamudan, Tejun Heo, Lai Jiangshan, Mel Gorman,
	linux-kernel, Eric Dumazet

On Thu, 25 Jun 2015 16:44:13 -0500 Nathan Zimmer <nzimmer@sgi.com> wrote:

> In kthread_create_on_node we set_cpus_allowed to cpu_all_mask
> regardless of what the node is requested.
> This seems incorrect.

The `node' arg to kthread_create_on_node() refers to which node the
task_struct and thread_info are allocated from.  It doesn't refer to
the CPUs upon which the thread is executed.  See
kthread_create_info.node and that gruesome task_struct.pref_node_fork
thing.
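
One way to picture the distinction is a toy userspace model; this is not
kernel code. Every name in it (kthread_model, model_create_on_node,
model_bind_to_node) and the layout of two nodes with four CPUs each are
invented for illustration. The real calls being mimicked are
kthread_create_on_node() and set_cpus_allowed_ptr().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed toy topology: 2 nodes, 4 CPUs per node. */
#define MODEL_NR_NODES      2
#define MODEL_CPUS_PER_NODE 4
#define MODEL_NR_CPUS       (MODEL_NR_NODES * MODEL_CPUS_PER_NODE)
#define MODEL_ALL_CPUS      ((uint32_t)((1u << MODEL_NR_CPUS) - 1))

/* Hypothetical stand-in for the two properties under discussion. */
struct kthread_model {
	int      alloc_node;   /* node the task structures were allocated on */
	uint32_t cpus_allowed; /* bitmask of CPUs the thread may run on */
};

/* Mimics creation: @node only selects where the structures are allocated.
 * The CPU mask is set to "all CPUs" regardless of @node. */
static struct kthread_model model_create_on_node(int node)
{
	struct kthread_model t;

	assert(node >= 0 && node < MODEL_NR_NODES);
	t.alloc_node = node;
	t.cpus_allowed = MODEL_ALL_CPUS;
	return t;
}

/* Mimics the separate, explicit step that restricts the thread to the
 * CPUs of one node before it is woken. */
static void model_bind_to_node(struct kthread_model *t, int node)
{
	assert(node >= 0 && node < MODEL_NR_NODES);
	t->cpus_allowed = ((1u << MODEL_CPUS_PER_NODE) - 1)
			  << (node * MODEL_CPUS_PER_NODE);
}
```

In this model a thread created for node 1 still has all eight CPUs in its
mask; only the explicit bind step narrows it to that node's four.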

The kthread_create_on_node() kerneldoc explains this, in a confused
way.  It needs a s/-1/NUMA_NO_NODE/.

I'm a bit surprised that kthread_create_on_node() futzes with the new
thread's policy and cpumask after it has been created.  Wouldn't it be
simpler/faster to have the thread itself set these things while it's
starting up?


As to whether kthread_create_on_node() should bind the thread to that
node's CPUs: unclear. 
drivers/thermal/intel_powerclamp.c:start_power_clamp() understands how
kthread_create_on_node() works.  I guess the code is OK as-is, but the
documentation could be improved.  Perfunctory effort:

--- a/kernel/kthread.c~a
+++ a/kernel/kthread.c
@@ -246,7 +246,7 @@ static void create_kthread(struct kthrea
  * kthread_create_on_node - create a kthread.
  * @threadfn: the function to run until signal_pending(current).
  * @data: data ptr for @threadfn.
- * @node: memory node number.
+ * @node: task and thread structures for the thread are allocated on this node
  * @namefmt: printf-style name for the thread.
  *
  * Description: This helper function creates and names a kernel
@@ -254,7 +254,7 @@ static void create_kthread(struct kthrea
  * it.  See also kthread_run().
  *
  * If thread is going to be bound on a particular cpu, give its node
- * in @node, to get NUMA affinity for kthread stack, or else give -1.
+ * in @node, to get NUMA affinity for kthread stack, or else give NUMA_NO_NODE.
  * When woken, the thread will run @threadfn() with @data as its
  * argument. @threadfn() can either call do_exit() directly if it is a
  * standalone thread for which no one will call kthread_stop(), or
_




^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [RFC] kthread_create_on_node is failing to honor the node choice
  2015-07-09 22:12                             ` Andrew Morton
@ 2015-07-10 14:26                               ` Mel Gorman
  2015-07-10 17:34                               ` Nathan Zimmer
  1 sibling, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-07-10 14:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Nishanth Aravamudan, Tejun Heo, Lai Jiangshan,
	linux-kernel, Eric Dumazet

On Thu, Jul 09, 2015 at 03:12:59PM -0700, Andrew Morton wrote:
> On Thu, 25 Jun 2015 16:44:13 -0500 Nathan Zimmer <nzimmer@sgi.com> wrote:
> 
> > In kthread_create_on_node we set_cpus_allowed to cpu_all_mask
> > regardless of what the node is requested.
> > This seems incorrect.
> 
> The `node' arg to kthread_create_on_node() refers to which node the
> task_struct and thread_info are allocated from.  It doesn't refer to
> the CPUs upon which the thread is executed.  See
> kthread_create_info.node and that gruesome task_struct.pref_node_fork
> thing.
> 

That's the initial mistake I made when reviewing Nathan's first patch.

> The kthread_create_on_node() kerneldoc explains this, in a confused
> way.  It needs a s/-1/NUMA_NO_NODE/.
> 
> I'm a bit surprised that kthread_create_on_node() futzes with the new
> thread's policy and cpumask after it has been created.  Wouldn't it be
> simpler/faster to have the thread itself set these things while it's
> starting up?
> 

Yeah, which is what Nathan's first patch did that I made a mistake with
initially. It creates a thread, sets the mask then wakes it up which
looks correct.

Your documentation patch looks good to me, I would not have fallen into
the trap if it had been applied.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: [RFC] kthread_create_on_node is failing to honor the node choice
  2015-07-09 22:12                             ` Andrew Morton
  2015-07-10 14:26                               ` Mel Gorman
@ 2015-07-10 17:34                               ` Nathan Zimmer
  1 sibling, 0 replies; 168+ messages in thread
From: Nathan Zimmer @ 2015-07-10 17:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nathan Zimmer, Nishanth Aravamudan, Tejun Heo, Lai Jiangshan,
	Mel Gorman, linux-kernel, Eric Dumazet


On Thu, Jul 09, 2015 at 03:12:59PM -0700, Andrew Morton wrote:
> On Thu, 25 Jun 2015 16:44:13 -0500 Nathan Zimmer <nzimmer@sgi.com> wrote:
> 
> > In kthread_create_on_node we set_cpus_allowed to cpu_all_mask
> > regardless of what the node is requested.
> > This seems incorrect.
> 
> The `node' arg to kthread_create_on_node() refers to which node the
> task_struct and thread_info are allocated from.  It doesn't refer to
> the CPUs upon which the thread is executed.  See
> kthread_create_info.node and that gruesome task_struct.pref_node_fork
> thing.
> 
> The kthread_create_on_node() kerneldoc explains this, in a confused
> way.  It needs a s/-1/NUMA_NO_NODE/.


I suspect we should also update the kthread_create macro to use NUMA_NO_NODE.


diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 13d5520..3e6773e 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -11,7 +11,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 					   const char namefmt[], ...);
 
 #define kthread_create(threadfn, data, namefmt, arg...) \
-	kthread_create_on_node(threadfn, data, -1, namefmt, ##arg)
+	kthread_create_on_node(threadfn, data, NUMA_NO_NODE, namefmt, ##arg)
 
 
 struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* 4.2-rc2: hitting "file-max limit 8192 reached"
  2015-04-28 14:37   ` Mel Gorman
@ 2015-07-14 15:54     ` Dave Hansen
  -1 siblings, 0 replies; 168+ messages in thread
From: Dave Hansen @ 2015-07-14 15:54 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Nathan Zimmer, Waiman Long, Scott Norton, Daniel J Blueman,
	Linux-MM, LKML, Al Viro, Linus Torvalds

My laptop has been behaving strangely with 4.2-rc2.  Once I log in to my
X session, I start getting all kinds of strange errors from applications
and see this in my dmesg:

	VFS: file-max limit 8192 reached

Could this be from CONFIG_DEFERRED_STRUCT_PAGE_INIT=y?  files_init()
seems to be sizing files_stat.max_files from memory sizes.

vfs_caches_init() uses nr_free_pages() to figure out what the "current
kernel size" is in early boot.  *But* since we have not freed most of
our memory, nr_free_pages() is low and makes us calculate the reserve as
if the kernel were huge.

Adding some printk's confirms this.  Broken kernel:

	vfs_caches_init() mempages: 4026972
	vfs_caches_init() reserve: 4021629
	vfs_caches_init() mempages (after reserve minus): 5343
	files_init() n: 2137
	files_init() files_stat.max_files: 8192

Working kernel:

	vfs_caches_init() mempages: 4026972
	vfs_caches_init() reserve: 375
	vfs_caches_init() mempages2: 4026597
	files_init() n: 1610638
	files_init() files_stat.max_files: 1610638

Do we have an alternative to call instead of nr_free_pages() in
vfs_caches_init()?

I guess we could save off 'nr_initialized' in memmap_init_zone() and
then use "nr_initialized - nr_free_pages()", but that seems a bit hackish.
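
The arithmetic behind those printk values can be reproduced in userspace.
What follows is a minimal sketch, not the kernel code itself:
model_reserve() and model_max_files() are hypothetical helpers, the page
size is assumed to be 4096 bytes, NR_FILE is taken as 8192 (matching the
limit in the dmesg line), and the nr_free_pages() inputs mentioned below
are back-computed from the printed reserves rather than measured.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL /* assumption: 4K pages */
#define MODEL_NR_FILE   8192ULL /* assumption: floor for max_files */

/* Reserve 150% of the "current kernel size" (pages not free), capped at
 * mempages - 1, as in the vfs_caches_init() calculation quoted above. */
static uint64_t model_reserve(uint64_t mempages, uint64_t nr_free)
{
	uint64_t reserve, cap;

	assert(mempages > 0 && nr_free <= mempages);
	reserve = (mempages - nr_free) * 3 / 2;
	cap = mempages - 1;
	return reserve < cap ? reserve : cap;
}

/* Roughly 1K per file, and at most 10% of the memory left after the
 * reserve, but never below the NR_FILE floor. */
static uint64_t model_max_files(uint64_t mempages, uint64_t nr_free)
{
	uint64_t usable = mempages - model_reserve(mempages, nr_free);
	uint64_t n = (usable * (MODEL_PAGE_SIZE / 1024)) / 10;

	return n > MODEL_NR_FILE ? n : MODEL_NR_FILE;
}
```

With mempages = 4026972, a free count of 1345886 gives a reserve of 4021629
and a limit of 8192 (the broken case), while a free count of 4026722 gives a
reserve of 375 and a limit of 1610638 (the working case).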

^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: 4.2-rc2: hitting "file-max limit 8192 reached"
  2015-07-14 15:54     ` Dave Hansen
@ 2015-07-14 16:15       ` Andrew Morton
  -1 siblings, 0 replies; 168+ messages in thread
From: Andrew Morton @ 2015-07-14 16:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Mel Gorman, Nathan Zimmer, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Al Viro, Linus Torvalds

On Tue, 14 Jul 2015 08:54:11 -0700 Dave Hansen <dave.hansen@intel.com> wrote:

> My laptop has been behaving strangely with 4.2-rc2.  Once I log in to my
> X session, I start getting all kinds of strange errors from applications
> and see this in my dmesg:
> 
> 	VFS: file-max limit 8192 reached
> 
> Could this be from CONFIG_DEFERRED_STRUCT_PAGE_INIT=y?  files_init()
> seems to be sizing files_stat.max_files from memory sizes.

argh.

> vfs_caches_init() uses nr_free_pages() to figure out what the "current
> kernel size" is in early boot.  *But* since we have not freed most of
> our memory, nr_free_pages() is low and makes us calculate the reserve as
> if the kernel were huge.
> 
> Adding some printk's confirms this.  Broken kernel:
> 
> 	vfs_caches_init() mempages: 4026972
> 	vfs_caches_init() reserve: 4021629
> 	vfs_caches_init() mempages (after reserve minus): 5343
> 	files_init() n: 2137
> 	files_init() files_stat.max_files: 8192
> 
> Working kernel:
> 
> 	vfs_caches_init() mempages: 4026972
> 	vfs_caches_init() reserve: 375
> 	vfs_caches_init() mempages2: 4026597
> 	files_init() n: 1610638
> 	files_init() files_stat.max_files: 1610638
> 
> Do we have an alternative to call instead of nr_free_pages() in
> vfs_caches_init()?
> 
> I guess we could save off 'nr_initialized' in memmap_init_zone() and
> then use "nr_initialized - nr_free_pages()", but that seems a bit hackish.

There are a lot of things that might be affected this way.  Callers of
nr_free_buffer_pages(), nr_free_pagecache_pages(), etc.

If we'd fully used the memory hotplug infrastructure then everything
would work - all those knobs which are sized off free-memory would get
themselves resized as more memory comes on line.  But quite a few
things have been missed.


^ permalink raw reply	[flat|nested] 168+ messages in thread

* Re: 4.2-rc2: hitting "file-max limit 8192 reached"
  2015-07-14 15:54     ` Dave Hansen
@ 2015-07-15 10:45       ` Mel Gorman
  -1 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-07-15 10:45 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Nathan Zimmer, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Al Viro, Linus Torvalds

On Tue, Jul 14, 2015 at 08:54:11AM -0700, Dave Hansen wrote:
> My laptop has been behaving strangely with 4.2-rc2.  Once I log in to my
> X session, I start getting all kinds of strange errors from applications
> and see this in my dmesg:
> 
> 	VFS: file-max limit 8192 reached
> 
> Could this be from CONFIG_DEFERRED_STRUCT_PAGE_INIT=y?  files_init()
> seems to be sizing files_stat.max_files from memory sizes.
> 

Yep.

I'm very sick at the moment and running a temperature so this needs double
checking. Medication is helping but I'm nowhere near 100%.

Andrew mentioned nr_free_buffer_pages and nr_free_pagecache_pages.
They are both live calculations that walk through zonelists with return
values based on zone->managed_pages. They are not affected by deferred
memory initialisation which leaves managed_pages alone.
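
A rough userspace illustration of why those helpers are unaffected: they
sum managed_pages minus the high watermark for each zone and never read the
free-page counter. The struct and function names below are hypothetical
stand-ins, not the kernel's definitions, and the zone sizes used with them
are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the few per-zone values that matter here. */
struct zone_model {
	unsigned long managed_pages; /* pages managed by the page allocator */
	unsigned long free_pages;    /* pages currently free */
	unsigned long high_wmark;    /* high watermark, in pages */
};

/* Modelled on the managed_pages-based walk: free_pages is never read,
 * so the result does not depend on how much memory has been freed yet. */
static unsigned long model_nr_free_zone_pages(const struct zone_model *zones,
					      size_t nr)
{
	unsigned long sum = 0;
	size_t i;

	assert(zones != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (zones[i].managed_pages > zones[i].high_wmark)
			sum += zones[i].managed_pages - zones[i].high_wmark;
	}
	return sum;
}

/* What a sizing based on free memory sees instead: this value grows as
 * deferred initialisation frees more pages. */
static unsigned long model_nr_free_pages(const struct zone_model *zones,
					 size_t nr)
{
	unsigned long sum = 0;
	size_t i;

	assert(zones != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		sum += zones[i].free_pages;
	return sum;
}
```

Raising free_pages in this model changes only the second helper's result,
which is the property that makes free-memory-based limits the ones to watch.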

AFAICS, the key problem is to watch for initialisations that are based on
free memory. It appears that only file_table.c cares and the calculation
of limits can be done after deferred memory initialisation like this;

---8<---
fs, file table: Reinit files_stat.max_files after deferred memory initialisation

Dave Hansen reported the following;

	My laptop has been behaving strangely with 4.2-rc2.  Once I log
	in to my X session, I start getting all kinds of strange errors
	from applications and see this in my dmesg:

        	VFS: file-max limit 8192 reached

The problem is that the file-max is calculated before memory is fully
initialised and miscalculates how much memory the kernel is using. This
patch recalculates file-max after deferred memory initialisation. Note
that using memory hotplug infrastructure would not have avoided this
problem as the value is not recalculated after memory hot-add.

4.1:             files_stat.max_files = 6582781
4.2-rc2:         files_stat.max_files = 8192
4.2-rc2 patched: files_stat.max_files = 6562467

Small differences with the patch applied and 4.1 but not enough to matter.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/dcache.c        | 13 +++----------
 fs/file_table.c    | 24 +++++++++++++++---------
 include/linux/fs.h |  5 +++--
 init/main.c        |  2 +-
 mm/page_alloc.c    |  3 +++
 5 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 5c8ea15e73a5..9b5fe503f6cb 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3442,22 +3442,15 @@ void __init vfs_caches_init_early(void)
 	inode_init_early();
 }
 
-void __init vfs_caches_init(unsigned long mempages)
+void __init vfs_caches_init(void)
 {
-	unsigned long reserve;
-
-	/* Base hash sizes on available memory, with a reserve equal to
-           150% of current kernel size */
-
-	reserve = min((mempages - nr_free_pages()) * 3/2, mempages - 1);
-	mempages -= reserve;
-
 	names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
 
 	dcache_init();
 	inode_init();
-	files_init(mempages);
+	files_init();
+	files_maxfiles_init();
 	mnt_init();
 	bdev_cache_init();
 	chrdev_init();
diff --git a/fs/file_table.c b/fs/file_table.c
index 7f9d407c7595..ad17e05ebf95 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -25,6 +25,7 @@
 #include <linux/hardirq.h>
 #include <linux/task_work.h>
 #include <linux/ima.h>
+#include <linux/swap.h>
 
 #include <linux/atomic.h>
 
@@ -308,19 +309,24 @@ void put_filp(struct file *file)
 	}
 }
 
-void __init files_init(unsigned long mempages)
+void __init files_init(void)
 { 
-	unsigned long n;
-
 	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
 			SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
+	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
+}
 
-	/*
-	 * One file with associated inode and dcache is very roughly 1K.
-	 * Per default don't use more than 10% of our memory for files. 
-	 */ 
+/*
+ * One file with associated inode and dcache is very roughly 1K. Per default
+ * do not use more than 10% of our memory for files.
+ */
+void __init files_maxfiles_init(void)
+{
+	unsigned long n;
+	unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;
+
+	memreserve = min(memreserve, totalram_pages - 1);
+	n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
 
-	n = (mempages * (PAGE_SIZE / 1024)) / 10;
 	files_stat.max_files = max_t(unsigned long, n, NR_FILE);
-	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
 } 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a0653e560c26..e6ceaae3a50e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -55,7 +55,8 @@ struct vm_fault;
 
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
-extern void __init files_init(unsigned long);
+extern void __init files_init(void);
+extern void __init files_maxfiles_init(void);
 
 extern struct files_stat_struct files_stat;
 extern unsigned long get_max_files(void);
@@ -2235,7 +2236,7 @@ extern int ioctl_preallocate(struct file *filp, void __user *argp);
 
 /* fs/dcache.c */
 extern void __init vfs_caches_init_early(void);
-extern void __init vfs_caches_init(unsigned long);
+extern void __init vfs_caches_init(void);
 
 extern struct kmem_cache *names_cachep;
 
diff --git a/init/main.c b/init/main.c
index c5d5626289ce..56506553d4d8 100644
--- a/init/main.c
+++ b/init/main.c
@@ -656,7 +656,7 @@ asmlinkage __visible void __init start_kernel(void)
 	key_init();
 	security_init();
 	dbg_late_init();
-	vfs_caches_init(totalram_pages);
+	vfs_caches_init();
 	signals_init();
 	/* rootfs populating might need page-writeback */
 	page_writeback_init();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a69e78c396a0..94e2599830c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1203,6 +1203,9 @@ void __init page_alloc_init_late(void)
 
 	/* Block until all are initialised */
 	wait_for_completion(&pgdat_init_all_done_comp);
+
+	/* Reinit limits that are based on free pages after the kernel is up */
+	files_maxfiles_init();
 }
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 

^ permalink raw reply related	[flat|nested] 168+ messages in thread

* Re: 4.2-rc2: hitting "file-max limit 8192 reached"
@ 2015-07-15 10:45       ` Mel Gorman
  0 siblings, 0 replies; 168+ messages in thread
From: Mel Gorman @ 2015-07-15 10:45 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Nathan Zimmer, Waiman Long, Scott Norton,
	Daniel J Blueman, Linux-MM, LKML, Al Viro, Linus Torvalds

On Tue, Jul 14, 2015 at 08:54:11AM -0700, Dave Hansen wrote:
> My laptop has been behaving strangely with 4.2-rc2.  Once I log in to my
> X session, I start getting all kinds of strange errors from applications
> and see this in my dmesg:
> 
> 	VFS: file-max limit 8192 reached
> 
> Could this be from CONFIG_DEFERRED_STRUCT_PAGE_INIT=y?  files_init()
> seems top be sizing files_stat.max_files from memory sizes.
> 

Yep.

I'm very sick at the moment and running a temperature so this needs double
checking. Medication is helping but I'm nowhere near 100%.

Andrew mentioned nr_free_buffer_pages and nr_free_pagecache_pages.
They are both live calculation that walks through zonelists with return
values based on zone->managed_pages. They are not affected by deferred
memory initialisation which leaves managed_pages alone.

AFAICS, the key problem is to watch for initialisations that are based on
free memory. It appears that only file_table.c cares and the calculation
of limits can be done after deferred memory initialisation like this;

---8<---
fs, file table: Reinit files_stat.max_files after deferred memory initialisation

Dave Hansen reported the following;

	My laptop has been behaving strangely with 4.2-rc2.  Once I log
	in to my X session, I start getting all kinds of strange errors
	from applications and see this in my dmesg:

        	VFS: file-max limit 8192 reached

The problem is that the file-max is calculated before memory is fully
initialised and miscalculates how much memory the kernel is using. This
patch recalculates file-max after deferred memory initialisation. Note
that using memory hotplug infrastructure would not have avoided this
problem as the value is not recalculated after memory hot-add.

4.1:             files_stat.max_files = 6582781
4.2-rc2:         files_stat.max_files = 8192
4.2-rc2 patched: files_stat.max_files = 6562467

Small differences with the patch applied and 4.1 but not enough to matter.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/dcache.c        | 13 +++----------
 fs/file_table.c    | 24 +++++++++++++++---------
 include/linux/fs.h |  5 +++--
 init/main.c        |  2 +-
 mm/page_alloc.c    |  3 +++
 5 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 5c8ea15e73a5..9b5fe503f6cb 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3442,22 +3442,15 @@ void __init vfs_caches_init_early(void)
 	inode_init_early();
 }
 
-void __init vfs_caches_init(unsigned long mempages)
+void __init vfs_caches_init(void)
 {
-	unsigned long reserve;
-
-	/* Base hash sizes on available memory, with a reserve equal to
-           150% of current kernel size */
-
-	reserve = min((mempages - nr_free_pages()) * 3/2, mempages - 1);
-	mempages -= reserve;
-
 	names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
 
 	dcache_init();
 	inode_init();
-	files_init(mempages);
+	files_init();
+	files_maxfiles_init();
 	mnt_init();
 	bdev_cache_init();
 	chrdev_init();
diff --git a/fs/file_table.c b/fs/file_table.c
index 7f9d407c7595..ad17e05ebf95 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -25,6 +25,7 @@
 #include <linux/hardirq.h>
 #include <linux/task_work.h>
 #include <linux/ima.h>
+#include <linux/swap.h>
 
 #include <linux/atomic.h>
 
@@ -308,19 +309,24 @@ void put_filp(struct file *file)
 	}
 }
 
-void __init files_init(unsigned long mempages)
+void __init files_init(void)
 { 
-	unsigned long n;
-
 	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
 			SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
+	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
+}
 
-	/*
-	 * One file with associated inode and dcache is very roughly 1K.
-	 * Per default don't use more than 10% of our memory for files. 
-	 */ 
+/*
+ * One file with associated inode and dcache is very roughly 1K. Per default
+ * do not use more than 10% of our memory for files.
+ */
+void __init files_maxfiles_init(void)
+{
+	unsigned long n;
+	unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;
+
+	memreserve = min(memreserve, totalram_pages - 1);
+	n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
 
-	n = (mempages * (PAGE_SIZE / 1024)) / 10;
 	files_stat.max_files = max_t(unsigned long, n, NR_FILE);
-	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
 } 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a0653e560c26..e6ceaae3a50e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -55,7 +55,8 @@ struct vm_fault;
 
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
-extern void __init files_init(unsigned long);
+extern void __init files_init(void);
+extern void __init files_maxfiles_init(void);
 
 extern struct files_stat_struct files_stat;
 extern unsigned long get_max_files(void);
@@ -2235,7 +2236,7 @@ extern int ioctl_preallocate(struct file *filp, void __user *argp);
 
 /* fs/dcache.c */
 extern void __init vfs_caches_init_early(void);
-extern void __init vfs_caches_init(unsigned long);
+extern void __init vfs_caches_init(void);
 
 extern struct kmem_cache *names_cachep;
 
diff --git a/init/main.c b/init/main.c
index c5d5626289ce..56506553d4d8 100644
--- a/init/main.c
+++ b/init/main.c
@@ -656,7 +656,7 @@ asmlinkage __visible void __init start_kernel(void)
 	key_init();
 	security_init();
 	dbg_late_init();
-	vfs_caches_init(totalram_pages);
+	vfs_caches_init();
 	signals_init();
 	/* rootfs populating might need page-writeback */
 	page_writeback_init();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a69e78c396a0..94e2599830c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1203,6 +1203,9 @@ void __init page_alloc_init_late(void)
 
 	/* Block until all are initialised */
 	wait_for_completion(&pgdat_init_all_done_comp);
+
+	/* Reinit limits that are based on free pages after the kernel is up */
+	files_maxfiles_init();
 }
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
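
For readers following the arithmetic in files_maxfiles_init() above, here is a minimal userspace sketch of the same calculation. It is not kernel code: max_files_estimate() and its two parameters are hypothetical names standing in for totalram_pages and nr_free_pages(), and 4K pages are assumed. NR_FILE is taken as 8192, the floor that shows up in the "file-max limit 8192 reached" report later in this thread.

```c
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */
#define SKETCH_NR_FILE   8192UL	/* floor applied to max_files */

/*
 * Hypothetical helper mirroring the patch's arithmetic.
 * total_pages stands in for totalram_pages, free_pages for
 * nr_free_pages(); the caller must pass free_pages <= total_pages
 * and total_pages >= 1.
 */
unsigned long max_files_estimate(unsigned long total_pages,
				 unsigned long free_pages)
{
	/* Reserve 1.5x the memory already in use at call time. */
	unsigned long memreserve = (total_pages - free_pages) * 3 / 2;
	unsigned long n;

	/* Never reserve everything, so the subtraction cannot wrap. */
	if (memreserve > total_pages - 1)
		memreserve = total_pages - 1;

	/* Roughly 1K per file; use at most 10% of what remains. */
	n = ((total_pages - memreserve) * (SKETCH_PAGE_SIZE / 1024)) / 10;

	return n > SKETCH_NR_FILE ? n : SKETCH_NR_FILE;
}
```

With 16777216 total pages (64G at 4K) and 16000000 of them free, the sketch yields 6244556. With only 1000 pages accounted for and 500 free, as might be the case if the limit were computed before deferred struct pages are initialised, it falls back to the 8192 floor, which is why the patch recomputes the limit from page_alloc_init_late().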
 



end of thread, other threads:[~2015-07-15 10:45 UTC | newest]

Thread overview: 168+ messages
2015-04-28 14:36 [PATCH 0/13] Parallel struct page initialisation v4 Mel Gorman
2015-04-28 14:36 ` Mel Gorman
2015-04-28 14:36 ` [PATCH 01/13] memblock: Introduce a for_each_reserved_mem_region iterator Mel Gorman
2015-04-28 14:36   ` Mel Gorman
2015-04-28 14:36 ` [PATCH 02/13] mm: meminit: Move page initialization into a separate function Mel Gorman
2015-04-28 14:36   ` Mel Gorman
2015-04-28 14:37 ` [PATCH 03/13] mm: meminit: Only set page reserved in the memblock region Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-05-22 20:31   ` Tony Luck
2015-05-22 20:31     ` Tony Luck
2015-05-26 10:22     ` Mel Gorman
2015-05-26 10:22       ` Mel Gorman
2015-04-28 14:37 ` [PATCH 04/13] mm: page_alloc: Pass PFN to __free_pages_bootmem Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-05-01  9:20   ` [PATCH] mm: page_alloc: pass PFN to __free_pages_bootmem -fix Mel Gorman
2015-05-01  9:20     ` Mel Gorman
2015-04-28 14:37 ` [PATCH 05/13] mm: meminit: Make __early_pfn_to_nid SMP-safe and introduce meminit_pfn_in_nid Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-04-28 14:37 ` [PATCH 06/13] mm: meminit: Inline some helper functions Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-04-30 21:53   ` Andrew Morton
2015-04-30 21:53     ` Andrew Morton
2015-04-30 21:55     ` Andrew Morton
2015-04-30 21:55       ` Andrew Morton
2015-05-04  8:33   ` Michal Hocko
2015-05-04  8:33     ` Michal Hocko
2015-05-04  8:38     ` Michal Hocko
2015-05-04  8:38       ` Michal Hocko
2015-04-28 14:37 ` [PATCH 07/13] mm: meminit: Initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-04-29 21:19   ` Andrew Morton
2015-04-29 21:19     ` Andrew Morton
2015-04-30  8:45     ` Mel Gorman
2015-04-30  8:45       ` Mel Gorman
2015-05-01  9:21   ` [PATCH] mm: meminit: Initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set -fix Mel Gorman
2015-05-01  9:21     ` Mel Gorman
2015-07-14 15:54   ` 4.2-rc2: hitting "file-max limit 8192 reached" Dave Hansen
2015-07-14 15:54     ` Dave Hansen
2015-07-14 16:15     ` Andrew Morton
2015-07-14 16:15       ` Andrew Morton
2015-07-15 10:45     ` Mel Gorman
2015-07-15 10:45       ` Mel Gorman
2015-04-28 14:37 ` [PATCH 08/13] mm: meminit: Initialise remaining struct pages in parallel with kswapd Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-04-28 14:37 ` [PATCH 09/13] mm: meminit: Minimise number of pfn->page lookups during initialisation Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-04-28 14:37 ` [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64 Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-04-28 14:37 ` [PATCH 11/13] mm: meminit: Free pages in large chunks where possible Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-04-28 14:37 ` [PATCH 12/13] mm: meminit: Reduce number of times pageblocks are set during struct page init Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-05-01  9:23   ` [PATCH] mm: meminit: Reduce number of times pageblocks are set during struct page init -fix Mel Gorman
2015-05-01  9:23     ` Mel Gorman
2015-04-28 14:37 ` [PATCH 13/13] mm: meminit: Remove mminit_verify_page_links Mel Gorman
2015-04-28 14:37   ` Mel Gorman
2015-04-28 16:06 ` [PATCH 0/13] Parallel struct page initialisation v4 Pekka Enberg
2015-04-28 16:06   ` Pekka Enberg
2015-04-28 18:38   ` nzimmer
2015-04-28 18:38     ` nzimmer
2015-04-30 16:10     ` Daniel J Blueman
2015-04-30 16:10       ` Daniel J Blueman
2015-04-30 17:12       ` nzimmer
2015-04-30 17:12         ` nzimmer
2015-04-30 17:28         ` Mel Gorman
2015-04-30 17:28           ` Mel Gorman
2015-05-02 11:52       ` Elliott, Robert (Server Storage)
2015-05-02 11:52         ` Elliott, Robert (Server Storage)
2015-05-02 11:52         ` Elliott, Robert (Server Storage)
2015-04-29  1:16 ` Waiman Long
2015-04-29  1:16   ` Waiman Long
2015-05-01 22:02   ` Waiman Long
2015-05-01 22:02     ` Waiman Long
2015-05-02  0:09     ` Waiman Long
2015-05-02  0:09       ` Waiman Long
2015-05-02  8:52       ` Daniel J Blueman
2015-05-02  8:52         ` Daniel J Blueman
2015-05-02 16:05         ` Daniel J Blueman
2015-05-02 16:05           ` Daniel J Blueman
2015-05-04 21:30       ` Andrew Morton
2015-05-04 21:30         ` Andrew Morton
2015-05-05  3:32         ` Waiman Long
2015-05-05  3:32           ` Waiman Long
2015-05-05 10:45         ` Mel Gorman
2015-05-05 10:45           ` Mel Gorman
2015-05-05 13:55           ` Waiman Long
2015-05-05 13:55             ` Waiman Long
2015-05-05 14:31             ` Mel Gorman
2015-05-05 14:31               ` Mel Gorman
2015-05-05 15:01               ` Waiman Long
2015-05-05 15:01                 ` Waiman Long
2015-05-06  3:39                 ` Waiman Long
2015-05-06  3:39                   ` Waiman Long
2015-05-06  0:55               ` Waiman Long
2015-05-06  0:55                 ` Waiman Long
2015-05-05 20:02           ` Andrew Morton
2015-05-05 20:02             ` Andrew Morton
2015-05-05 22:13             ` Mel Gorman
2015-05-05 22:13               ` Mel Gorman
2015-05-05 22:25               ` Andrew Morton
2015-05-05 22:25                 ` Andrew Morton
2015-05-06  7:12                 ` Mel Gorman
2015-05-06  7:12                   ` Mel Gorman
2015-05-06 10:22                   ` Mel Gorman
2015-05-06 10:22                     ` Mel Gorman
2015-05-06 12:05                     ` Mel Gorman
2015-05-06 12:05                       ` Mel Gorman
2015-05-06 17:58                     ` Waiman Long
2015-05-06 17:58                       ` Waiman Long
2015-05-07  2:37                       ` Waiman Long
2015-05-07  2:37                         ` Waiman Long
2015-05-07  7:21                         ` Mel Gorman
2015-05-07  7:21                           ` Mel Gorman
2015-05-06  1:21             ` Waiman Long
2015-05-06  1:21               ` Waiman Long
2015-05-06  2:01               ` Andrew Morton
2015-05-06  2:01                 ` Andrew Morton
2015-05-07  7:25             ` [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup Mel Gorman
2015-05-07  7:25               ` Mel Gorman
2015-05-07 22:09               ` Andrew Morton
2015-05-07 22:09                 ` Andrew Morton
2015-05-07 22:52                 ` Mel Gorman
2015-05-07 22:52                   ` Mel Gorman
2015-05-07 23:02                   ` Andrew Morton
2015-05-07 23:02                     ` Andrew Morton
2015-05-13 15:53                 ` nzimmer
2015-05-13 15:53                   ` nzimmer
2015-05-13 16:31                   ` Mel Gorman
2015-05-13 16:31                     ` Mel Gorman
2015-05-14 10:03                     ` Daniel J Blueman
2015-05-14 10:03                       ` Daniel J Blueman
2015-05-14 15:47                       ` nzimmer
2015-05-14 15:47                         ` nzimmer
2015-05-19 18:31                       ` nzimmer
2015-05-19 18:31                         ` nzimmer
2015-05-19 19:06                         ` Mel Gorman
2015-05-19 19:06                           ` Mel Gorman
2015-05-22  6:30                       ` Daniel J Blueman
2015-05-22  6:30                         ` Daniel J Blueman
2015-05-22  9:33                         ` Mel Gorman
2015-05-22  9:33                           ` Mel Gorman
2015-05-22 17:14                           ` Waiman Long
2015-05-22 17:14                             ` Waiman Long
2015-05-22 21:43                             ` Davidlohr Bueso
2015-05-22 21:43                               ` Davidlohr Bueso
2015-05-23  3:49                             ` Daniel J Blueman
2015-05-23  3:49                               ` Daniel J Blueman
2015-06-24 22:50                       ` Nathan Zimmer
2015-06-24 22:50                         ` Nathan Zimmer
2015-06-25 20:48                         ` Mel Gorman
2015-06-25 20:48                           ` Mel Gorman
2015-06-25 20:57                           ` Mel Gorman
2015-06-25 20:57                             ` Mel Gorman
2015-06-25 21:37                             ` Nathan Zimmer
2015-06-25 21:37                               ` Nathan Zimmer
2015-06-25 21:34                           ` Nathan Zimmer
2015-06-25 21:34                             ` Nathan Zimmer
2015-06-25 21:44                           ` [RFC] kthread_create_on_node is failing to honor the node choice Nathan Zimmer
2015-06-26  1:08                             ` Lai Jiangshan
2015-07-09 22:12                             ` Andrew Morton
2015-07-10 14:26                               ` Mel Gorman
2015-07-10 17:34                               ` Nathan Zimmer
2015-06-26 10:16                         ` [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup Mel Gorman
2015-06-26 10:16                           ` Mel Gorman
2015-07-06 17:45                         ` Daniel J Blueman
2015-07-06 17:45                           ` Daniel J Blueman
2015-07-09 17:49                           ` Nathan Zimmer
2015-07-09 17:49                             ` Nathan Zimmer
