linux-mm.kvack.org archive mirror
* [PATCH V2 1/2] mm/page_alloc: Replace set_dma_reserve to set_memory_reserve
@ 2016-08-04 17:12 Srikar Dronamraju
From: Srikar Dronamraju @ 2016-08-04 17:12 UTC (permalink / raw)
  To: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
	Andrew Morton, Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar,
	Hari Bathini
  Cc: Dave Hansen, Balbir Singh, Srikar Dronamraju

Expand the scope of the existing dma_reserve to accommodate other memory
reserves too. Accordingly rename variable dma_reserve to
nr_memory_reserve.

set_memory_reserve also takes a new bool parameter, inc, that selects
whether the given count is added to the current value or replaces it.

Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/x86/kernel/e820.c |  2 +-
 include/linux/mm.h     |  2 +-
 mm/page_alloc.c        | 20 ++++++++++++--------
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 621b501..d935983 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1188,6 +1188,6 @@ void __init memblock_find_dma_reserve(void)
 			nr_free_pages += end_pfn - start_pfn;
 	}
 
-	set_dma_reserve(nr_pages - nr_free_pages);
+	set_memory_reserve(nr_pages - nr_free_pages, false);
 #endif
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f468e0..c884ffb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1886,7 +1886,7 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
 					struct mminit_pfnnid_cache *state);
 #endif
 
-extern void set_dma_reserve(unsigned long new_dma_reserve);
+extern void set_memory_reserve(unsigned long nr_reserve, bool inc);
 extern void memmap_init_zone(unsigned long, int, unsigned long,
 				unsigned long, enum memmap_context);
 extern void setup_per_zone_wmarks(void);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c1069ef..a154c2f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -253,7 +253,7 @@ int watermark_scale_factor = 10;
 
 static unsigned long __meminitdata nr_kernel_pages;
 static unsigned long __meminitdata nr_all_pages;
-static unsigned long __meminitdata dma_reserve;
+static unsigned long __meminitdata nr_memory_reserve;
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
@@ -5493,10 +5493,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		}
 
 		/* Account for reserved pages */
-		if (j == 0 && freesize > dma_reserve) {
-			freesize -= dma_reserve;
+		if (j == 0 && freesize > nr_memory_reserve) {
+			freesize -= nr_memory_reserve;
 			printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
-					zone_names[0], dma_reserve);
+					zone_names[0], nr_memory_reserve);
 		}
 
 		if (!is_highmem_idx(j))
@@ -6186,8 +6186,9 @@ void __init mem_init_print_info(const char *str)
 }
 
 /**
- * set_dma_reserve - set the specified number of pages reserved in the first zone
- * @new_dma_reserve: The number of pages to mark reserved
+ * set_memory_reserve - set number of pages reserved in the first zone
+ * @nr_reserve: The number of pages to mark reserved
+ * @inc: true increment to existing value; false set new value.
  *
  * The per-cpu batchsize and zone watermarks are determined by managed_pages.
  * In the DMA zone, a significant percentage may be consumed by kernel image
@@ -6196,9 +6197,12 @@ void __init mem_init_print_info(const char *str)
  * first zone (e.g., ZONE_DMA). The effect will be lower watermarks and
  * smaller per-cpu batchsize.
  */
-void __init set_dma_reserve(unsigned long new_dma_reserve)
+void __init set_memory_reserve(unsigned long nr_reserve, bool inc)
 {
-	dma_reserve = new_dma_reserve;
+	if (inc)
+		nr_memory_reserve += nr_reserve;
+	else
+		nr_memory_reserve = nr_reserve;
 }
 
 void __init free_area_init(unsigned long *zones_size)
-- 
1.8.5.6
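The two modes selected by the new inc parameter can be seen in a small user-space model of the accounting. This is a sketch, not kernel code: the __init/__meminitdata annotations are dropped, and reserve_pages() is a hypothetical accessor added only so the value can be inspected.

```c
#include <stdbool.h>

/* User-space stand-in for the static nr_memory_reserve in mm/page_alloc.c. */
static unsigned long nr_memory_reserve;

/*
 * Same logic as the patched set_memory_reserve(): add nr_reserve to the
 * current value when inc is true, otherwise overwrite the current value.
 */
void set_memory_reserve(unsigned long nr_reserve, bool inc)
{
	if (inc)
		nr_memory_reserve += nr_reserve;
	else
		nr_memory_reserve = nr_reserve;
}

/* Hypothetical accessor, only for inspecting the model. */
unsigned long reserve_pages(void)
{
	return nr_memory_reserve;
}
```

An e820-style caller passes false to store an absolute page count, while a fadump-style caller passes true to add its pages on top of whatever has already been recorded.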

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* [PATCH V2 2/2] fadump: Register the memory reserved by fadump
@ 2016-08-04 17:12 ` Srikar Dronamraju
From: Srikar Dronamraju @ 2016-08-04 17:12 UTC (permalink / raw)
  To: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
	Andrew Morton, Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar,
	Hari Bathini
  Cc: Dave Hansen, Balbir Singh, Srikar Dronamraju

The fadump kernel reserves large chunks of memory even before the pages
are initialized. This can mean that memory belonging to several nodes
falls within memblock reserved regions.

Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT initialize only a
certain amount of memory per node at boot, and that amount takes the
dentry and inode cache sizes into account. Currently the cache sizes are
calculated from the total system memory, including the reserved memory.
However, when the same kernel boots as the fadump kernel, it cannot
allocate enough memory for the dentry and inode caches. This results in
crashes like the one below on large systems, such as 32 TB systems.

Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
Call Trace:
[c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable)
[c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160
[c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340
[c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90
[c00000000108fd40] [c000000000aecfb0] alloc_large_system_hash+0x1b8/0x2c0
[c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4
[c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c
[c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578
[c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8

Register the memory reserved by fadump, so that the cache sizes are
calculated based on the free memory (i.e. total memory minus reserved
memory).

Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 3cb3b02a..ca5ec88 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -330,6 +330,7 @@ int __init fadump_reserve_mem(void)
 	}
 	fw_dump.reserve_dump_area_start = base;
 	fw_dump.reserve_dump_area_size = size;
+	set_memory_reserve(size/PAGE_SIZE, true);
 	return 1;
 }
 
-- 
1.8.5.6
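Why registering the reserve matters for the caches: boot-time hash tables such as the dentry and inode caches are sized from a page count. The sketch below is a simplified model loosely based on the scaling step of alloc_large_system_hash() (one bucket per 2^scale bytes of memory, rounded up to a power of two); hash_entries() is a hypothetical helper, the real function's limits and flags are omitted, and the numbers in the test are made up.

```c
/* Round v up to the next power of two; returns 1 for v == 0. */
static unsigned long long roundup_pow2(unsigned long long v)
{
	unsigned long long p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Hypothetical, simplified hash sizing: one bucket per 2^scale bytes of
 * the memory considered usable (total minus reserved pages), rounded up
 * to a power of two, with at least one bucket.
 * Assumes scale <= page_shift (e.g. 64K pages with scale 13).
 */
unsigned long long hash_entries(unsigned long long total_pages,
				unsigned long long reserved_pages,
				unsigned int page_shift, unsigned int scale)
{
	unsigned long long pages = total_pages > reserved_pages ?
				   total_pages - reserved_pages : 0;

	return roundup_pow2(pages << (page_shift - scale));
}
```

With nothing registered every page counts toward the table; once the fadump reservation is subtracted, the table shrinks toward what the fadump kernel can actually allocate.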



* Re: [PATCH V2 2/2] fadump: Register the memory reserved by fadump
@ 2016-08-04 21:01   ` Andrew Morton
From: Andrew Morton @ 2016-08-04 21:01 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
	Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar, Hari Bathini,
	Dave Hansen, Balbir Singh

On Thu,  4 Aug 2016 22:42:09 +0530 Srikar Dronamraju <srikar@linux.vnet.ibm.com> wrote:

> Fadump kernel reserves large chunks of memory even before the pages are
> initialized. This could mean memory that corresponds to several nodes might
> fall in memblock reserved regions.
> 
> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialize
> only certain size memory per node. The certain size takes into account
> the dentry and inode cache sizes. Currently the cache sizes are
> calculated based on the total system memory including the reserved
> memory. However such a kernel when booting the same kernel as fadump
> kernel will not be able to allocate the required amount of memory to
> suffice for the dentry and inode caches. This results in crashes like
> the below on large systems such as 32 TB systems.
> 
> Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
> vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
> swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
> Call Trace:
> [c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable)
> [c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160
> [c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340
> [c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90
> [c00000000108fd40] [c000000000aecfb0] alloc_large_system_hash+0x1b8/0x2c0
> [c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4
> [c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c
> [c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578
> [c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8
> 
> Register the memory reserved by fadump, so that the cache sizes are
> calculated based on the free memory (i.e Total memory - reserved
> memory).

Looks harmless enough to me.  I'll schedule the patches for 4.8.  But
it sounds like they should be backported into older kernels?



* Re: [PATCH V2 1/2] mm/page_alloc: Replace set_dma_reserve to set_memory_reserve
@ 2016-08-05  6:45 ` Vlastimil Babka
From: Vlastimil Babka @ 2016-08-05  6:45 UTC (permalink / raw)
  To: Srikar Dronamraju, linux-mm, Mel Gorman, Michal Hocko,
	Andrew Morton, Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar,
	Hari Bathini
  Cc: Dave Hansen, Balbir Singh

On 08/04/2016 07:12 PM, Srikar Dronamraju wrote:
> Expand the scope of the existing dma_reserve to accommodate other memory
> reserves too. Accordingly rename variable dma_reserve to
> nr_memory_reserve.
>
> set_memory_reserve also takes a new parameter that helps to identify if
> the current value needs to be incremented.
>
> Suggested-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/x86/kernel/e820.c |  2 +-
>  include/linux/mm.h     |  2 +-
>  mm/page_alloc.c        | 20 ++++++++++++--------
>  3 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 621b501..d935983 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1188,6 +1188,6 @@ void __init memblock_find_dma_reserve(void)
>  			nr_free_pages += end_pfn - start_pfn;
>  	}
>
> -	set_dma_reserve(nr_pages - nr_free_pages);
> +	set_memory_reserve(nr_pages - nr_free_pages, false);
>  #endif
>  }
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8f468e0..c884ffb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1886,7 +1886,7 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
>  					struct mminit_pfnnid_cache *state);
>  #endif
>
> -extern void set_dma_reserve(unsigned long new_dma_reserve);
> +extern void set_memory_reserve(unsigned long nr_reserve, bool inc);
>  extern void memmap_init_zone(unsigned long, int, unsigned long,
>  				unsigned long, enum memmap_context);
>  extern void setup_per_zone_wmarks(void);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c1069ef..a154c2f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -253,7 +253,7 @@ int watermark_scale_factor = 10;
>
>  static unsigned long __meminitdata nr_kernel_pages;
>  static unsigned long __meminitdata nr_all_pages;
> -static unsigned long __meminitdata dma_reserve;
> +static unsigned long __meminitdata nr_memory_reserve;
>
>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
> @@ -5493,10 +5493,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  		}
>
>  		/* Account for reserved pages */
> -		if (j == 0 && freesize > dma_reserve) {
> -			freesize -= dma_reserve;
> +		if (j == 0 && freesize > nr_memory_reserve) {

Will this really work (together with patch 2) as intended?
This j == 0 check means that we do this only for the first zone, which
is ZONE_DMA (or ZONE_DMA32) on node 0 on many systems. I.e. I don't
think it's really true that "dma_reserve has nothing to do with DMA or
ZONE_DMA".

This zone will have a limited amount of memory, so "freesize >
nr_memory_reserve" will easily be false once you set this to many
gigabytes, and in fact nothing will get subtracted.

On the other hand, if the kernel has both CONFIG_ZONE_DMA and
CONFIG_ZONE_DMA32 disabled, then zone index j == 0 is ZONE_NORMAL.
This zone might be present on multiple nodes (unless they are configured
as movable), and then the value intended to be global will be subtracted
from several nodes.

I don't know what the exact ppc64 situation is here. Perhaps there are
indeed no DMA/DMA32 zones and the fadump kernel only uses one node, so
it works in the end, but it doesn't seem very robust to me.

> +			freesize -= nr_memory_reserve;
>  			printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
> -					zone_names[0], dma_reserve);
> +					zone_names[0], nr_memory_reserve);
>  		}
>
>  		if (!is_highmem_idx(j))
> @@ -6186,8 +6186,9 @@ void __init mem_init_print_info(const char *str)
>  }
>
>  /**
> - * set_dma_reserve - set the specified number of pages reserved in the first zone
> - * @new_dma_reserve: The number of pages to mark reserved
> + * set_memory_reserve - set number of pages reserved in the first zone
> + * @nr_reserve: The number of pages to mark reserved
> + * @inc: true increment to existing value; false set new value.
>   *
>   * The per-cpu batchsize and zone watermarks are determined by managed_pages.
>   * In the DMA zone, a significant percentage may be consumed by kernel image
> @@ -6196,9 +6197,12 @@ void __init mem_init_print_info(const char *str)
>   * first zone (e.g., ZONE_DMA). The effect will be lower watermarks and
>   * smaller per-cpu batchsize.
>   */
> -void __init set_dma_reserve(unsigned long new_dma_reserve)
> +void __init set_memory_reserve(unsigned long nr_reserve, bool inc)
>  {
> -	dma_reserve = new_dma_reserve;
> +	if (inc)
> +		nr_memory_reserve += nr_reserve;
> +	else
> +		nr_memory_reserve = nr_reserve;
>  }
>
>  void __init free_area_init(unsigned long *zones_size)
>
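The two cases described above can be illustrated with a toy model of the "Account for reserved pages" check. This is a hypothetical user-space sketch, not kernel code: total_reserved_subtracted() is an invented helper, and each array element stands for the free size, in pages, of one node's first zone (zone index j == 0). The page counts used with it are made up.

```c
/*
 * Hypothetical model of the "Account for reserved pages" step in
 * free_area_init_core(): it runs once per node, and only for zone index
 * j == 0. first_zone_pages[nid] is the free size of that node's first
 * zone. Returns how many pages end up subtracted across all nodes.
 */
unsigned long total_reserved_subtracted(const unsigned long *first_zone_pages,
					int nr_nodes, unsigned long reserve)
{
	unsigned long subtracted = 0;

	for (int nid = 0; nid < nr_nodes; nid++) {
		unsigned long freesize = first_zone_pages[nid];

		/* Same test as the patch: j == 0 && freesize > reserve. */
		if (freesize > reserve)
			subtracted += reserve;
	}
	return subtracted;
}
```

A small first zone on node 0 only (say 4096 pages, with empty first zones elsewhere) and a reserve of 1,000,000 pages subtracts nothing, while a 2,000,000-page first zone on each of four nodes subtracts the "global" reserve four times.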



* Re: [PATCH V2 1/2] mm/page_alloc: Replace set_dma_reserve to set_memory_reserve
@ 2016-08-05  6:47 ` Mel Gorman
From: Mel Gorman @ 2016-08-05  6:47 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: linux-mm, Vlastimil Babka, Michal Hocko, Andrew Morton,
	Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar, Hari Bathini,
	Dave Hansen, Balbir Singh

On Thu, Aug 04, 2016 at 10:42:08PM +0530, Srikar Dronamraju wrote:
> Expand the scope of the existing dma_reserve to accommodate other memory
> reserves too. Accordingly rename variable dma_reserve to
> nr_memory_reserve.
> 
> set_memory_reserve also takes a new parameter that helps to identify if
> the current value needs to be incremented.
> 

I think the parameter is ugly; it should have just been
inc_memory_reserve, but at least it works.

-- 
Mel Gorman
SUSE Labs
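The alternative suggested here can be sketched as two separate entry points instead of a bool flag. This is a hypothetical user-space model: inc_memory_reserve() is only the name proposed in the review, and memory_reserve() is an invented accessor for inspection.

```c
/* User-space stand-in for the static reserve counter. */
static unsigned long nr_memory_reserve;

/* Hypothetical: store an absolute number of reserved pages. */
void set_memory_reserve(unsigned long nr_reserve)
{
	nr_memory_reserve = nr_reserve;
}

/* Hypothetical: add pages to whatever has already been recorded. */
void inc_memory_reserve(unsigned long nr_reserve)
{
	nr_memory_reserve += nr_reserve;
}

/* Hypothetical accessor, only for inspecting the model. */
unsigned long memory_reserve(void)
{
	return nr_memory_reserve;
}
```

Each call site then states its intent in the function name, so a reader never has to look up what a bare true or false means.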



* Re: [PATCH V2 1/2] mm/page_alloc: Replace set_dma_reserve to set_memory_reserve
@ 2016-08-05  7:24   ` Srikar Dronamraju
From: Srikar Dronamraju @ 2016-08-05  7:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Mel Gorman, Michal Hocko, Andrew Morton,
	Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar, Hari Bathini,
	Dave Hansen, Balbir Singh

* Vlastimil Babka <vbabka@suse.cz> [2016-08-05 08:45:03]:

> >@@ -5493,10 +5493,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
> > 		}
> >
> > 		/* Account for reserved pages */
> >-		if (j == 0 && freesize > dma_reserve) {
> >-			freesize -= dma_reserve;
> >+		if (j == 0 && freesize > nr_memory_reserve) {
> 
> Will this really work (together with patch 2) as intended?
> This j == 0 means that we are doing this only for the first zone, which is
> ZONE_DMA (or ZONE_DMA32) on node 0 on many systems. I.e. I don't think it's
> really true that "dma_reserve has nothing to do with DMA or ZONE_DMA".
> 
> This zone will have limited amount of memory, so the "freesize >
> nr_memory_reserve" will easily be false once you set this to many gigabytes,
> so in fact nothing will get subtracted.
> 
> On the other hand if the kernel has both CONFIG_ZONE_DMA and
> CONFIG_ZONE_DMA32 disabled, then j == 0 will be true for ZONE_NORMAL. This
> zone might be present on multiple nodes (unless they are configured as
> movable) and then the value intended to be global will be subtracted from
> several nodes.
> 
> I don't know what's the exact ppc64 situation here, perhaps there are indeed
> no DMA/DMA32 zones, and the fadump kernel only uses one node, so it works in
> the end, but it doesn't seem much robust to me?
> 

At page initialization time, powerpc seems to have just one zone
(DMA) spread across all 16 nodes.

From the dmesg:

[    0.000000] Memory hole size: 0MB
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x00001f5c8fffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000001fb4fffffff]
[    0.000000]   node   1: [mem 0x000001fb50000000-0x000003fa8fffffff]
[    0.000000]   node   2: [mem 0x000003fa90000000-0x000005f9cfffffff]
[    0.000000]   node   3: [mem 0x000005f9d0000000-0x000007f8efffffff]
[    0.000000]   node   4: [mem 0x000007f8f0000000-0x000009f81fffffff]
[    0.000000]   node   5: [mem 0x000009f820000000-0x00000bf77fffffff]
[    0.000000]   node   6: [mem 0x00000bf780000000-0x00000df6dfffffff]
[    0.000000]   node   7: [mem 0x00000df6e0000000-0x00000ff63fffffff]
[    0.000000]   node   8: [mem 0x00000ff640000000-0x000011f58fffffff]
[    0.000000]   node   9: [mem 0x000011f590000000-0x000013644fffffff]
[    0.000000]   node  10: [mem 0x0000136450000000-0x00001563afffffff]
[    0.000000]   node  11: [mem 0x00001563b0000000-0x000017630fffffff]
[    0.000000]   node  12: [mem 0x0000176310000000-0x000019625fffffff]
[    0.000000]   node  13: [mem 0x0000196260000000-0x00001b5dcfffffff]
[    0.000000]   node  14: [mem 0x00001b5dd0000000-0x00001d5d2fffffff]
[    0.000000]   node  15: [mem 0x00001d5d30000000-0x00001f5c8fffffff]


The config has the following:

CONFIG_ZONE_DMA32=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_FORCE_MAX_ZONEORDER=9
CONFIG_ZONE_DMA=y

I tried forcing CONFIG_ZONE_DMA off, but make always re-enables it:
arch/powerpc/Kconfig marks CONFIG_ZONE_DMA as "default y".
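A quick check of the figures in the log: the per-node ranges are contiguous and together cover exactly the single DMA zone span, roughly 31 TiB, with node 0 alone about 2 TiB. The sketch below only does that arithmetic on the values copied from the dmesg; the helper names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* "Early memory node ranges" from the dmesg above; ends are inclusive. */
static const unsigned long long node_range[16][2] = {
	{0x0000000000000000ULL, 0x000001fb4fffffffULL},
	{0x000001fb50000000ULL, 0x000003fa8fffffffULL},
	{0x000003fa90000000ULL, 0x000005f9cfffffffULL},
	{0x000005f9d0000000ULL, 0x000007f8efffffffULL},
	{0x000007f8f0000000ULL, 0x000009f81fffffffULL},
	{0x000009f820000000ULL, 0x00000bf77fffffffULL},
	{0x00000bf780000000ULL, 0x00000df6dfffffffULL},
	{0x00000df6e0000000ULL, 0x00000ff63fffffffULL},
	{0x00000ff640000000ULL, 0x000011f58fffffffULL},
	{0x000011f590000000ULL, 0x000013644fffffffULL},
	{0x0000136450000000ULL, 0x00001563afffffffULL},
	{0x00001563b0000000ULL, 0x000017630fffffffULL},
	{0x0000176310000000ULL, 0x000019625fffffffULL},
	{0x0000196260000000ULL, 0x00001b5dcfffffffULL},
	{0x00001b5dd0000000ULL, 0x00001d5d2fffffffULL},
	{0x00001d5d30000000ULL, 0x00001f5c8fffffffULL},
};

/* Size of one node's range in bytes. */
unsigned long long node_bytes(int nid)
{
	return node_range[nid][1] - node_range[nid][0] + 1;
}

/* True if every node starts right after the previous node ends. */
bool ranges_contiguous(void)
{
	for (size_t i = 1; i < 16; i++)
		if (node_range[i][0] != node_range[i - 1][1] + 1)
			return false;
	return true;
}

/* Total bytes across all 16 nodes. */
unsigned long long total_bytes(void)
{
	unsigned long long sum = 0;

	for (int nid = 0; nid < 16; nid++)
		sum += node_bytes(nid);
	return sum;
}
```

Because the ranges are contiguous, total_bytes() equals the end of the DMA zone plus one (0x1f5c90000000 bytes), which is consistent with a single zone spanning all 16 nodes.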

-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH V2 1/2] mm/page_alloc: Replace set_dma_reserve to set_memory_reserve
@ 2016-08-05  7:36   ` Srikar Dronamraju
From: Srikar Dronamraju @ 2016-08-05  7:36 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Vlastimil Babka, Michal Hocko, Andrew Morton,
	Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar, Hari Bathini,
	Dave Hansen, Balbir Singh

* Mel Gorman <mgorman@techsingularity.net> [2016-08-05 07:47:47]:

> On Thu, Aug 04, 2016 at 10:42:08PM +0530, Srikar Dronamraju wrote:
> > Expand the scope of the existing dma_reserve to accommodate other memory
> > reserves too. Accordingly rename variable dma_reserve to
> > nr_memory_reserve.
> > 
> > set_memory_reserve also takes a new parameter that helps to identify if
> > the current value needs to be incremented.
> > 
> 
> I think the parameter is ugly and it should have been just
> inc_memory_reserve but at least it works.
> 

Yes, the parameter is definitely ugly, but the only other user,
arch/x86/kernel/e820.c, seems to be written with the intention of setting
an absolute value.

The call there was "set_dma_reserve(nr_pages - nr_free_pages)", and both
nr_pages and nr_free_pages are calculated by walking through the
memblocks. I didn't want to risk a situation where some other code path
also starts setting the reserve value and the code in e820.c then just
increments it.

However, if you still feel strongly about using inc_memory_reserve rather
than set_memory_reserve, I will respin.
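The concern above can be illustrated with a toy model. It rests on an assumption not stated in the thread: that the e820 value, computed as nr_pages - nr_free_pages over the memblocks, already includes pages that another path has registered. Under that assumption, switching e820 to an increment would count those pages twice. e820_result() is a hypothetical helper for this sketch only.

```c
#include <stdbool.h>

static unsigned long nr_memory_reserve;

/* Same set/inc semantics as the patch under discussion. */
static void set_memory_reserve(unsigned long nr_reserve, bool inc)
{
	if (inc)
		nr_memory_reserve += nr_reserve;
	else
		nr_memory_reserve = nr_reserve;
}

/*
 * Hypothetical scenario: another path first registers other_pages, then
 * an e820-style caller reports e820_total, which (by assumption) already
 * includes other_pages. Returns the resulting reserve.
 */
unsigned long e820_result(unsigned long other_pages,
			  unsigned long e820_total, bool e820_increments)
{
	nr_memory_reserve = 0;
	set_memory_reserve(other_pages, true);            /* earlier registrant */
	set_memory_reserve(e820_total, e820_increments);  /* e820-style caller */
	return nr_memory_reserve;
}
```

With 100 pages registered earlier and an e820 total of 120, setting yields 120, whereas incrementing yields 220, double-counting the earlier 100 pages.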


-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH V2 1/2] mm/page_alloc: Replace set_dma_reserve to set_memory_reserve
@ 2016-08-05  9:09     ` Vlastimil Babka
From: Vlastimil Babka @ 2016-08-05  9:09 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: linux-mm, Mel Gorman, Michal Hocko, Andrew Morton,
	Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar, Hari Bathini,
	Dave Hansen, Balbir Singh

On 08/05/2016 09:24 AM, Srikar Dronamraju wrote:
> * Vlastimil Babka <vbabka@suse.cz> [2016-08-05 08:45:03]:
>
>>> @@ -5493,10 +5493,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>>> 		}
>>>
>>> 		/* Account for reserved pages */
>>> -		if (j == 0 && freesize > dma_reserve) {
>>> -			freesize -= dma_reserve;
>>> +		if (j == 0 && freesize > nr_memory_reserve) {
>>
>> Will this really work (together with patch 2) as intended?
>> This j == 0 means that we are doing this only for the first zone, which is
>> ZONE_DMA (or ZONE_DMA32) on node 0 on many systems. I.e. I don't think it's
>> really true that "dma_reserve has nothing to do with DMA or ZONE_DMA".
>>
>> This zone will have limited amount of memory, so the "freesize >
>> nr_memory_reserve" will easily be false once you set this to many gigabytes,
>> so in fact nothing will get subtracted.
>>
>> On the other hand if the kernel has both CONFIG_ZONE_DMA and
>> CONFIG_ZONE_DMA32 disabled, then j == 0 will be true for ZONE_NORMAL. This
>> zone might be present on multiple nodes (unless they are configured as
>> movable) and then the value intended to be global will be subtracted from
>> several nodes.
>>
>> I don't know what's the exact ppc64 situation here, perhaps there are indeed
>> no DMA/DMA32 zones, and the fadump kernel only uses one node, so it works in
>> the end, but it doesn't seem much robust to me?
>>
>
> At the page initialization time, powerpc seems to have just one zone
> spread across the 16 nodes.
>
> From the dmesg.
>
> [    0.000000] Memory hole size: 0MB
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000000000-0x00001f5c8fffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000000000-0x000001fb4fffffff]
> [    0.000000]   node   1: [mem 0x000001fb50000000-0x000003fa8fffffff]
> [    0.000000]   node   2: [mem 0x000003fa90000000-0x000005f9cfffffff]
> [    0.000000]   node   3: [mem 0x000005f9d0000000-0x000007f8efffffff]
> [    0.000000]   node   4: [mem 0x000007f8f0000000-0x000009f81fffffff]
> [    0.000000]   node   5: [mem 0x000009f820000000-0x00000bf77fffffff]
> [    0.000000]   node   6: [mem 0x00000bf780000000-0x00000df6dfffffff]
> [    0.000000]   node   7: [mem 0x00000df6e0000000-0x00000ff63fffffff]
> [    0.000000]   node   8: [mem 0x00000ff640000000-0x000011f58fffffff]
> [    0.000000]   node   9: [mem 0x000011f590000000-0x000013644fffffff]
> [    0.000000]   node  10: [mem 0x0000136450000000-0x00001563afffffff]
> [    0.000000]   node  11: [mem 0x00001563b0000000-0x000017630fffffff]
> [    0.000000]   node  12: [mem 0x0000176310000000-0x000019625fffffff]
> [    0.000000]   node  13: [mem 0x0000196260000000-0x00001b5dcfffffff]
> [    0.000000]   node  14: [mem 0x00001b5dd0000000-0x00001d5d2fffffff]
> [    0.000000]   node  15: [mem 0x00001d5d30000000-0x00001f5c8fffffff]

Hmm, so it will work for ppc64 and its fadump, but I'm not happy that we
made the function name sound generic (unlike when the name contained
"dma") while it only works as intended in specific corner cases. The
next user might be surprised...



* Re: [PATCH V2 2/2] fadump: Register the memory reserved by fadump
@ 2016-08-29 13:12     ` Srikar Dronamraju
From: Srikar Dronamraju @ 2016-08-29 13:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
	Michael Ellerman, linuxppc-dev, Mahesh Salgaonkar, Hari Bathini,
	Dave Hansen, Balbir Singh

* Andrew Morton <akpm@linux-foundation.org> [2016-08-04 14:01:33]:

> > Register the memory reserved by fadump, so that the cache sizes are
> > calculated based on the free memory (i.e Total memory - reserved
> > memory).
> 
> Looks harmless enough to me.  I'll schedule the patches for 4.8.  But
> it sounds like they should be backported into older kernels?
> 

Based on the v2 feedback, I just posted a v3 at
http://lkml.kernel.org/r/1472476010-4709-1-git-send-email-srikar@linux.vnet.ibm.com
that tries to reduce the large system hash based on the reserved memory.
Hence please drop the v2 patches.

-- 
Thanks and Regards
Srikar Dronamraju




Thread overview: 9+ messages
2016-08-04 17:12 [PATCH V2 1/2] mm/page_alloc: Replace set_dma_reserve to set_memory_reserve Srikar Dronamraju
2016-08-04 17:12 ` [PATCH V2 2/2] fadump: Register the memory reserved by fadump Srikar Dronamraju
2016-08-04 21:01   ` Andrew Morton
2016-08-29 13:12     ` Srikar Dronamraju
2016-08-05  6:45 ` [PATCH V2 1/2] mm/page_alloc: Replace set_dma_reserve to set_memory_reserve Vlastimil Babka
2016-08-05  7:24   ` Srikar Dronamraju
2016-08-05  9:09     ` Vlastimil Babka
2016-08-05  6:47 ` Mel Gorman
2016-08-05  7:36   ` Srikar Dronamraju
