linux-mm.kvack.org archive mirror
* [PATCH v3 0/3] Account reserved memory when allocating system hash
@ 2016-08-29 13:06 Srikar Dronamraju
  2016-08-29 13:06 ` [PATCH v3 1/3] mm: Introduce arch_reserved_kernel_pages() Srikar Dronamraju
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Srikar Dronamraju @ 2016-08-29 13:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Vlastimil Babka,
	Michal Hocko, Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar,
	Hari Bathini, Dave Hansen, Balbir Singh

The fadump kernel reserves large chunks of memory even before the pages are
initialised. As a result, memory that corresponds to several nodes might
fall entirely within memblock reserved regions.

Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialise
only a certain amount of memory per node. That amount takes into account
the dentry and inode cache sizes. However, such a kernel, when booting a
secondary kernel, will not be able to allocate the memory required for
the dentry and inode caches. This results in crashes like the one below
on large systems, such as those with 32 TB of memory.

Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
Call Trace:
[c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable)
[c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160
[c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340
[c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90
[c00000000108fd40] [c000000000aecfb0] alloc_large_system_hash+0x1b8/0x2c0
[c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4
[c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c
[c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578
[c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8

This patchset solves this problem by accounting for the size of reserved
memory when calculating the size of large system hashes.
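
The effect of that accounting can be illustrated with a small userspace
sketch. It is a simplified model, not the kernel's alloc_large_system_hash():
the names roundup_pow2(), hash_entries() and scale_shift are hypothetical,
and the clamp to zero when the reservation exceeds the page count is an
assumption of this sketch rather than something the patches do.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next power of two (n must be in 1..2^63). */
static uint64_t roundup_pow2(uint64_t n)
{
	uint64_t p = 1;

	assert(n > 0 && n <= (UINT64_C(1) << 63));
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Simplified model of hash sizing: one entry per 2^scale_shift usable
 * pages, rounded up to a power of two.  Reserved pages are subtracted
 * first so that the table is sized from memory that can actually be
 * allocated.  The clamp to zero is an assumption of this sketch.
 */
static uint64_t hash_entries(uint64_t nr_pages, uint64_t reserved_pages,
			     unsigned int scale_shift)
{
	uint64_t usable = nr_pages > reserved_pages ?
			  nr_pages - reserved_pages : 0;
	uint64_t entries = usable >> scale_shift;

	return roundup_pow2(entries ? entries : 1);
}
```

In this model, reserving half of the pages halves the table:
hash_entries(1048576, 0, 0) is 1048576, while
hash_entries(1048576, 524288, 0) is 524288.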

While this patchset applies on v4.8-rc3, it cannot be tested on v4.8-rc3
because of the problem reported at
http://lkml.kernel.org/r/20160829093844.GA2592@linux.vnet.ibm.com
However, it has been tested on v4.7, v4.6 and v4.4.

v2: http://lkml.kernel.org/r/1470330729-6273-1-git-send-email-srikar@linux.vnet.ibm.com 

Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Srikar Dronamraju (3):
  mm: Introduce arch_reserved_kernel_pages()
  mm/memblock: Expose total reserved memory
  powerpc: Implement arch_reserved_kernel_pages

 arch/powerpc/include/asm/mmzone.h |  3 +++
 arch/powerpc/kernel/fadump.c      |  5 +++++
 include/linux/memblock.h          |  1 +
 include/linux/mm.h                |  3 +++
 mm/memblock.c                     |  5 +++++
 mm/page_alloc.c                   | 12 ++++++++++++
 6 files changed, 29 insertions(+)

-- 
1.8.5.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH v3 1/3] mm: Introduce arch_reserved_kernel_pages()
  2016-08-29 13:06 [PATCH v3 0/3] Account reserved memory when allocating system hash Srikar Dronamraju
@ 2016-08-29 13:06 ` Srikar Dronamraju
  2016-08-29 13:06 ` [PATCH v3 2/3] mm/memblock: Expose total reserved memory Srikar Dronamraju
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Srikar Dronamraju @ 2016-08-29 13:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Vlastimil Babka,
	Michal Hocko, Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar,
	Hari Bathini, Dave Hansen, Balbir Singh

Currently, arch-specific code can reserve memory blocks, but
alloc_large_system_hash() may not take the reservation into account when
sizing the hashes. This can produce hashes that are larger than required
and leave no memory available for other purposes. This is especially true
for systems with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled.

One approach to solving this problem would be to walk the memblock
regions, calculate the available memory, and base the size of the system
hashes on that.

The other approach is to depend on the architecture to provide the
number of pages that are reserved. This change provides a hook that allows
the architecture to provide the required information.
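
The hook pattern can be sketched in plain userspace C as follows. The
macro and function names match the patch; pages_for_hash_sizing() is a
hypothetical helper, and its clamp to zero is an assumption of this sketch
(the patch itself subtracts the value directly).

```c
/*
 * An architecture that wants to report reserved pages defines
 * __HAVE_ARCH_RESERVED_KERNEL_PAGES and supplies its own function.
 * Without the macro, this fallback reports that nothing is reserved.
 */
#ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES
static unsigned long arch_reserved_kernel_pages(void)
{
	return 0;
}
#endif

/*
 * Hypothetical helper: the page count a hash-sizing routine would start
 * from.  Clamping to zero (an assumption, not part of the patch) keeps
 * the unsigned subtraction from wrapping around.
 */
static unsigned long pages_for_hash_sizing(unsigned long nr_kernel_pages)
{
	unsigned long reserved = arch_reserved_kernel_pages();

	return reserved < nr_kernel_pages ? nr_kernel_pages - reserved : 0;
}
```

Because the fallback is selected at compile time, architectures that do not
define the macro see no change in behaviour.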

Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/mm.h |  3 +++
 mm/page_alloc.c    | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 08ed53e..7e91cd8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1924,6 +1924,9 @@ extern void show_mem(unsigned int flags);
 extern long si_mem_available(void);
 extern void si_meminfo(struct sysinfo * val);
 extern void si_meminfo_node(struct sysinfo *val, int nid);
+#ifdef __HAVE_ARCH_RESERVED_KERNEL_PAGES
+extern unsigned long arch_reserved_kernel_pages(void);
+#endif
 
 extern __printf(3, 4)
 void warn_alloc_failed(gfp_t gfp_mask, unsigned int order,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3fbe73a..9d91706 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6976,6 +6976,17 @@ static int __init set_hashdist(char *str)
 __setup("hashdist=", set_hashdist);
 #endif
 
+#ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES
+/*
+ * Returns the number of pages that arch has reserved but
+ * is not known to alloc_large_system_hash().
+ */
+static unsigned long __init arch_reserved_kernel_pages(void)
+{
+	return 0;
+}
+#endif
+
 /*
  * allocate a large system hash table from bootmem
  * - it is assumed that the hash table must contain an exact power-of-2
@@ -7000,6 +7011,7 @@ void *__init alloc_large_system_hash(const char *tablename,
 	if (!numentries) {
 		/* round applicable memory size up to nearest megabyte */
 		numentries = nr_kernel_pages;
+		numentries -= arch_reserved_kernel_pages();
 
 		/* It isn't necessary when PAGE_SIZE >= 1MB */
 		if (PAGE_SHIFT < 20)
-- 
1.8.5.6

* [PATCH v3 2/3] mm/memblock: Expose total reserved memory
  2016-08-29 13:06 [PATCH v3 0/3] Account reserved memory when allocating system hash Srikar Dronamraju
  2016-08-29 13:06 ` [PATCH v3 1/3] mm: Introduce arch_reserved_kernel_pages() Srikar Dronamraju
@ 2016-08-29 13:06 ` Srikar Dronamraju
  2016-08-29 13:06 ` [PATCH v3 3/3] powerpc: Implement arch_reserved_kernel_pages Srikar Dronamraju
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Srikar Dronamraju @ 2016-08-29 13:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Vlastimil Babka,
	Michal Hocko, Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar,
	Hari Bathini, Dave Hansen, Balbir Singh

The total reserved memory in a system is accounted for, but the total is
not available for use outside mm/memblock.c. By exposing the total reserved
memory, systems can better calculate the size of large hashes.
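
As a rough illustration of why the accessor can be cheap, the sketch below
models a region list that keeps a running total as reservations are
recorded. The names struct region_list, region_list_add() and
region_list_total() are hypothetical; the real memblock code also handles
merging and overlapping regions, which this model ignores.

```c
#include <stdint.h>

/* Hypothetical stand-in for a region list that caches its total size. */
struct region_list {
	uint64_t total_size;	/* sum of all region sizes, in bytes */
	unsigned int count;	/* number of regions recorded */
};

/* Record a reservation and keep the running total up to date. */
static void region_list_add(struct region_list *list, uint64_t size)
{
	list->total_size += size;
	list->count++;
}

/* Read-only accessor: a single field read, no walk over the regions. */
static uint64_t region_list_total(const struct region_list *list)
{
	return list->total_size;
}
```

Since the total is maintained on every update, exposing it costs only a
small accessor function.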

Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/memblock.h | 1 +
 mm/memblock.c            | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 2925da2..5b759c9 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -328,6 +328,7 @@ phys_addr_t memblock_alloc_base(phys_addr_t size, phys_addr_t align,
 phys_addr_t __memblock_alloc_base(phys_addr_t size, phys_addr_t align,
 				  phys_addr_t max_addr);
 phys_addr_t memblock_phys_mem_size(void);
+phys_addr_t memblock_reserved_size(void);
 phys_addr_t memblock_mem_size(unsigned long limit_pfn);
 phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
diff --git a/mm/memblock.c b/mm/memblock.c
index 483197e..c8dfa43 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1438,6 +1438,11 @@ phys_addr_t __init_memblock memblock_phys_mem_size(void)
 	return memblock.memory.total_size;
 }
 
+phys_addr_t __init_memblock memblock_reserved_size(void)
+{
+	return memblock.reserved.total_size;
+}
+
 phys_addr_t __init memblock_mem_size(unsigned long limit_pfn)
 {
 	unsigned long pages = 0;
-- 
1.8.5.6

* [PATCH v3 3/3] powerpc: Implement arch_reserved_kernel_pages
  2016-08-29 13:06 [PATCH v3 0/3] Account reserved memory when allocating system hash Srikar Dronamraju
  2016-08-29 13:06 ` [PATCH v3 1/3] mm: Introduce arch_reserved_kernel_pages() Srikar Dronamraju
  2016-08-29 13:06 ` [PATCH v3 2/3] mm/memblock: Expose total reserved memory Srikar Dronamraju
@ 2016-08-29 13:06 ` Srikar Dronamraju
  2016-08-29 23:07 ` [PATCH v3 0/3] Account reserved memory when allocating system hash Andrew Morton
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Srikar Dronamraju @ 2016-08-29 13:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Vlastimil Babka,
	Michal Hocko, Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar,
	Hari Bathini, Dave Hansen, Balbir Singh

Currently, a significant amount of memory is reserved only in a kernel
booted to capture a kernel dump using the fadump method.

Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialize
only a certain amount of memory per node. That amount takes into account
the dentry and inode cache sizes. Currently, the cache sizes are
calculated based on the total system memory, including the reserved
memory. However, such a kernel, when booting the same kernel as a fadump
kernel, will not be able to allocate the memory required for the dentry
and inode caches, which results in crashes.

Hence, implement arch_reserved_kernel_pages() only for CONFIG_FA_DUMP
configurations. The amount reserved is subtracted when calculating the
size of the large caches, which avoids crashes like the one below on large
systems, such as those with 32 TB of memory.

Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
Call Trace:
[c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable)
[c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160
[c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340
[c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90
[c00000000108fd40] [c000000000aecfb0] alloc_large_system_hash+0x1b8/0x2c0
[c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4
[c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c
[c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578
[c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8
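
The conversion from a reserved byte count to pages is a truncating
division. The sketch below uses a hypothetical helper, bytes_to_pages(),
and takes the page size as a parameter. As a sanity check against the log
above, the dentry table is 536870912 entries in 4294967296 bytes, that is
8 bytes per entry. Assuming 64 KiB pages (an assumption, chosen because it
is consistent with "order: 16" if the order is the base-2 logarithm of the
size in pages), the table spans 65536 pages.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte count to whole pages, discarding any partial page. */
static uint64_t bytes_to_pages(uint64_t bytes, uint64_t page_size)
{
	assert(page_size > 0);
	return bytes / page_size;
}
```

For example, bytes_to_pages(4294967296, 65536) is 65536, and a byte count
smaller than one page converts to 0.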

Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mmzone.h | 3 +++
 arch/powerpc/kernel/fadump.c      | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h
index 7b58917..4d52ccf 100644
--- a/arch/powerpc/include/asm/mmzone.h
+++ b/arch/powerpc/include/asm/mmzone.h
@@ -41,6 +41,9 @@ u64 memory_hotplug_max(void);
 #else
 #define memory_hotplug_max() memblock_end_of_DRAM()
 #endif /* CONFIG_NEED_MULTIPLE_NODES */
+#ifdef CONFIG_FA_DUMP
+#define __HAVE_ARCH_RESERVED_KERNEL_PAGES
+#endif
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_MMZONE_H_ */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b3a6633..eeb80de 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -333,6 +333,11 @@ int __init fadump_reserve_mem(void)
 	return 1;
 }
 
+unsigned long __init arch_reserved_kernel_pages(void)
+{
+	return memblock_reserved_size() / PAGE_SIZE;
+}
+
 /* Look for fadump= cmdline option. */
 static int __init early_fadump_param(char *p)
 {
-- 
1.8.5.6

* Re: [PATCH v3 0/3] Account reserved memory when allocating system hash
  2016-08-29 13:06 [PATCH v3 0/3] Account reserved memory when allocating system hash Srikar Dronamraju
                   ` (2 preceding siblings ...)
  2016-08-29 13:06 ` [PATCH v3 3/3] powerpc: Implement arch_reserved_kernel_pages Srikar Dronamraju
@ 2016-08-29 23:07 ` Andrew Morton
  2016-08-29 23:08 ` Andrew Morton
  2016-08-31  9:48 ` Michal Hocko
  5 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2016-08-29 23:07 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
	Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar, Hari Bathini,
	Dave Hansen, Balbir Singh

On Mon, 29 Aug 2016 18:36:47 +0530 Srikar Dronamraju <srikar@linux.vnet.ibm.com> wrote:

> Fadump kernel reserves large chunks of memory even before the pages are
> initialised. This could mean memory that corresponds to several nodes might
> fall in memblock reserved regions.
> 
> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialise
> only certain size memory per node. The certain size takes into account
> the dentry and inode cache sizes. However such a kernel when booting a
> secondary kernel will not be able to allocate the required amount of
> memory to suffice for the dentry and inode caches. This results in
> crashes like the below on large systems such as 32 TB systems.
> 
> Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
> vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
> swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
> Call Trace:
> [c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable)
> [c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160
> [c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340
> [c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90
> [c00000000108fd40] [c000000000aecfb0] alloc_large_system_hash+0x1b8/0x2c0
> [c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4
> [c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c
> [c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578
> [c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8
> 
> This patchset solves this problem by accounting the size of reserved memory
> when calculating the size of large system hashes.
> 
> While this patchset applies on v4.8-rc3, it cannot be tested on v4.8-rc3
> because of http://lkml.kernel.org/r/20160829093844.GA2592@linux.vnet.ibm.com
> However it has been tested on v4.7/v4.6 and v4.4

That looks like a pretty serious regression.

I'll grab the patchset anyway.  It will come good when we fix that kswapd
thing.

* Re: [PATCH v3 0/3] Account reserved memory when allocating system hash
  2016-08-29 13:06 [PATCH v3 0/3] Account reserved memory when allocating system hash Srikar Dronamraju
                   ` (3 preceding siblings ...)
  2016-08-29 23:07 ` [PATCH v3 0/3] Account reserved memory when allocating system hash Andrew Morton
@ 2016-08-29 23:08 ` Andrew Morton
  2016-08-31  9:48 ` Michal Hocko
  5 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2016-08-29 23:08 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
	Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar, Hari Bathini,
	Dave Hansen, Balbir Singh

On Mon, 29 Aug 2016 18:36:47 +0530 Srikar Dronamraju <srikar@linux.vnet.ibm.com> wrote:

> Fadump kernel reserves large chunks of memory even before the pages are
> initialised. This could mean memory that corresponds to several nodes might
> fall in memblock reserved regions.
> 
> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialise
> only certain size memory per node. The certain size takes into account
> the dentry and inode cache sizes. However such a kernel when booting a
> secondary kernel will not be able to allocate the required amount of
> memory to suffice for the dentry and inode caches. This results in
> crashes like the below on large systems such as 32 TB systems.
> 
> Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
> vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
> swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
> Call Trace:
> [c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable)
> [c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160
> [c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340
> [c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90
> [c00000000108fd40] [c000000000aecfb0] alloc_large_system_hash+0x1b8/0x2c0
> [c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4
> [c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c
> [c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578
> [c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8
> 
> This patchset solves this problem by accounting the size of reserved memory
> when calculating the size of large system hashes.

What's the priority on this, btw?  Not needed in earlier kernels?

* Re: [PATCH v3 0/3] Account reserved memory when allocating system hash
  2016-08-29 13:06 [PATCH v3 0/3] Account reserved memory when allocating system hash Srikar Dronamraju
                   ` (4 preceding siblings ...)
  2016-08-29 23:08 ` Andrew Morton
@ 2016-08-31  9:48 ` Michal Hocko
  5 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2016-08-31  9:48 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Vlastimil Babka,
	Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar, Hari Bathini,
	Dave Hansen, Balbir Singh

On Mon 29-08-16 18:36:47, Srikar Dronamraju wrote:
> Fadump kernel reserves large chunks of memory even before the pages are
> initialised. This could mean memory that corresponds to several nodes might
> fall in memblock reserved regions.
> 
> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialise
> only certain size memory per node. The certain size takes into account
> the dentry and inode cache sizes. However such a kernel when booting a
> secondary kernel will not be able to allocate the required amount of
> memory to suffice for the dentry and inode caches. This results in
> crashes like the below on large systems such as 32 TB systems.
> 
> Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
> vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
> swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
> Call Trace:
> [c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable)
> [c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160
> [c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340
> [c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90
> [c00000000108fd40] [c000000000aecfb0] alloc_large_system_hash+0x1b8/0x2c0
> [c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4
> [c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c
> [c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578
> [c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8
> 
> This patchset solves this problem by accounting the size of reserved memory
> when calculating the size of large system hashes.

So I think that this is just fallout from how hackish and tricky fadump
is. Reserving a large portion, or even the majority, of memory from the
kernel just sounds like a minefield. This patchset deals with one
particular problem. Fair enough, it seems like the easiest way to go and
something that would be safe for a stable backport as well, so
Acked-by: Michal Hocko <mhocko@suse.com> to the whole series

but I cannot say I would be happy about the whole fadump thing...

> While this patchset applies on v4.8-rc3, it cannot be tested on v4.8-rc3
> because of http://lkml.kernel.org/r/20160829093844.GA2592@linux.vnet.ibm.com
> However it has been tested on v4.7/v4.6 and v4.4

Another supporting argument for the above. 15 out of 16 nodes without
any memory... Sigh

> v2: http://lkml.kernel.org/r/1470330729-6273-1-git-send-email-srikar@linux.vnet.ibm.com 
> 
> Cc: linux-mm@kvack.org
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> 
> Srikar Dronamraju (3):
>   mm: Introduce arch_reserved_kernel_pages()
>   mm/memblock: Expose total reserved memory
>   powerpc: Implement arch_reserved_kernel_pages
> 
>  arch/powerpc/include/asm/mmzone.h |  3 +++
>  arch/powerpc/kernel/fadump.c      |  5 +++++
>  include/linux/memblock.h          |  1 +
>  include/linux/mm.h                |  3 +++
>  mm/memblock.c                     |  5 +++++
>  mm/page_alloc.c                   | 12 ++++++++++++
>  6 files changed, 29 insertions(+)
> 
> -- 
> 1.8.5.6

-- 
Michal Hocko
SUSE Labs
