linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] mm: Enable page parallel initialisation for Power
@ 2016-03-08  3:55 Li Zhang
  2016-03-08  3:55 ` [PATCH 1/2] mm: meminit: initialise more memory for inode/dentry hash tables in early boot Li Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Li Zhang @ 2016-03-08  3:55 UTC (permalink / raw)
  To: akpm, vbabka, mgorman, mpe, khandual, aneesh.kumar
  Cc: linux-mm, linuxppc-dev, linux-kernel, Li Zhang

From: Li Zhang <zhlcindy@linux.vnet.ibm.com>

Upstream already supports parallel page initialisation on X86, where it
improves boot time greatly. I have run some tests on Power.

Here are the results of my tests with different memory sizes.

* 4GB memory:
    boot time with patch vs without patch: 10.4s vs 24.5s
    boot time is improved by 57%
* 200GB memory:
    boot time looks the same with and without the patches,
    at about 38s.
* 32TB memory:
    boot time looks the same with and without the patches,
    at about 160s.
    This is much shorter than on X86 with 24TB of memory:
    according to the community discussion, a 24TB X86 system takes about 694s.

From the code's point of view, parallel initialisation improves performance
by deferring memory initialisation to kswapd, using N kthreads, so in theory
it should improve boot time.

From the test results, on X86 the improvement is large on machines with huge
memory. On Power, the improvement is large with less than 100GB of memory but
small with huge memory. Even so, running the initialisation in several threads
still saves some time, as the following log from the 32TB system shows:

[   22.648169] node 9 initialised, 16607461 pages in 280ms
[   22.783772] node 3 initialised, 23937243 pages in 410ms
[   22.858877] node 6 initialised, 29179347 pages in 490ms
[   22.863252] node 2 initialised, 29179347 pages in 490ms
[   22.907545] node 0 initialised, 32049614 pages in 540ms
[   22.920891] node 15 initialised, 32212280 pages in 550ms
[   22.923236] node 4 initialised, 32306127 pages in 550ms
[   22.923384] node 12 initialised, 32314319 pages in 550ms
[   22.924754] node 8 initialised, 32314319 pages in 550ms
[   22.940780] node 13 initialised, 33353677 pages in 570ms
[   22.940796] node 11 initialised, 33353677 pages in 570ms
[   22.941700] node 5 initialised, 33353677 pages in 570ms
[   22.941721] node 10 initialised, 33353677 pages in 570ms
[   22.941876] node 7 initialised, 33353677 pages in 570ms
[   22.944946] node 14 initialised, 33353677 pages in 570ms
[   22.946063] node 1 initialised, 33345485 pages in 580ms

This saves roughly 550 * 16 ms, although that is negligible compared with
a total boot time of about 160 seconds. What's more, boot on Power is much
faster than on x86 for huge-memory machines, even without the patches.

So it is still worth enabling this patchset on Power.
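
A rough way to check the estimate above is to total the per-node times from the 32TB log. The sketch below is only an illustration, not kernel code: the helper names are hypothetical, and it assumes that without the patches the 16 nodes would be initialised one after another at the same per-node cost, and that with the patches the wall-clock cost is that of the slowest node.

```c
#include <stddef.h>

/* Per-node initialisation times in ms, copied from the 32TB log above. */
static const unsigned int node_init_ms[] = {
    280, 410, 490, 490, 540, 550, 550, 550,
    550, 570, 570, 570, 570, 570, 570, 580,
};

#define NR_NODES (sizeof(node_init_ms) / sizeof(node_init_ms[0]))

/* Hypothetical helper: cost if every node were initialised in turn. */
static unsigned int serial_total_ms(const unsigned int *ms, size_t n)
{
    unsigned int total = 0;

    for (size_t i = 0; i < n; i++)
        total += ms[i];
    return total;
}

/* Hypothetical helper: cost if all nodes run in parallel (slowest wins). */
static unsigned int parallel_wall_ms(const unsigned int *ms, size_t n)
{
    unsigned int worst = 0;

    for (size_t i = 0; i < n; i++)
        if (ms[i] > worst)
            worst = ms[i];
    return worst;
}
```

Calling serial_total_ms(node_init_ms, NR_NODES) and parallel_wall_ms(node_init_ms, NR_NODES) gives the two figures to compare; their difference is the time saved under the assumptions stated above.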

Li Zhang (2):
  mm: meminit: initialise more memory for inode/dentry hash tables in
    early boot
  powerpc/mm: Enable page parallel initialisation

 arch/powerpc/Kconfig |  1 +
 mm/page_alloc.c      | 11 +++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] mm: meminit: initialise more memory for inode/dentry hash tables in early boot
  2016-03-08  3:55 [PATCH 0/2] mm: Enable page parallel initialisation for Power Li Zhang
@ 2016-03-08  3:55 ` Li Zhang
  2016-03-08 13:25   ` Vlastimil Babka
  2016-03-08  3:55 ` [PATCH 2/2] powerpc/mm: Enable page parallel initialisation Li Zhang
  2016-03-08 14:45 ` [PATCH 0/2] mm: Enable page parallel initialisation for Power Balbir Singh
  2 siblings, 1 reply; 12+ messages in thread
From: Li Zhang @ 2016-03-08  3:55 UTC (permalink / raw)
  To: akpm, vbabka, mgorman, mpe, khandual, aneesh.kumar
  Cc: linux-mm, linuxppc-dev, linux-kernel, Li Zhang

From: Li Zhang <zhlcindy@linux.vnet.ibm.com>

This patch is based on Mel Gorman's old patch from the mailing list,
https://lkml.org/lkml/2015/5/5/280, which was discussed, but the problem
was instead fixed with a completion that waits for all memory to be
initialised in page_alloc_init_late(). That was to fix the OOM problem on
X86 with 24TB of memory, which allocates memory during late initialisation.
On a Power platform with 32TB of memory, however, it produces a call trace
in vfs_caches_init->inode_init(), because the inode hash table needs more
memory.
So this patch initialises 1GB per 0.25TB of node memory on large systems,
as mentioned in https://lkml.org/lkml/2015/5/1/627

The call trace was found on Power with 32TB of memory, 1024 CPUs and 16 nodes.
Currently only 2GB*16=32GB is initialised early, but the dentry cache hash
table needs 16GB and the inode cache hash table needs another 16GB, so the
system does not have enough memory for them.
The dmesg log is as follows:

Dentry cache hash table entries: 2147483648 (order: 18,17179869184 bytes)
vmalloc: allocation failure, allocated 16021913600 of 17179934720 bytes
swapper/0: page allocation failure: order:0,mode:0x2080020
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-0-ppc64
Call Trace:
[c0000000012bfa00] [c0000000007c4a50].dump_stack+0xb4/0xb664 (unreliable)
[c0000000012bfa80] [c0000000001f93d4].warn_alloc_failed+0x114/0x160
[c0000000012bfb30] [c00000000023c204].__vmalloc_area_node+0x1a4/0x2b0
[c0000000012bfbf0] [c00000000023c3f4].__vmalloc_node_range+0xe4/0x110
[c0000000012bfc90] [c00000000023c460].__vmalloc_node+0x40/0x50
[c0000000012bfd10] [c000000000b67d60].alloc_large_system_hash+0x134/0x2a4
[c0000000012bfdd0] [c000000000b70924].inode_init+0xa4/0xf0
[c0000000012bfe60] [c000000000b706a0].vfs_caches_init+0x80/0x144
[c0000000012bfef0] [c000000000b35208].start_kernel+0x40c/0x4e0
[c0000000012bff90] [c000000000008cfc]start_here_common+0x20/0x4a4
Mem-Info:
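
The 16GB figure in the first log line can be checked by hand: the byte count is the number of buckets times the size of one bucket. The sketch below is a user-space illustration, not the kernel's alloc_large_system_hash(); the helper names are hypothetical, and the 8-byte bucket size is an assumption (one pointer per bucket on a 64-bit kernel) that happens to match the numbers in the log.

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed for a hash table of `entries` buckets. */
static uint64_t hash_table_bytes(uint64_t entries, uint64_t bucket_size)
{
    return entries * bucket_size;
}

/* Convert a byte count to whole GiB (2^30 bytes), rounding down. */
static uint64_t bytes_to_gib(uint64_t bytes)
{
    return bytes >> 30;
}
```

Feeding in the entry count from the log (2147483648) with the assumed 8-byte bucket reproduces the byte count printed next to it.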

Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
---
 * Fix a typo and format the dmesg output in the change log.
 * Fix a coding style issue in this patch.

 mm/page_alloc.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 838ca8bb..6f77f64 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -293,13 +293,20 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 				unsigned long pfn, unsigned long zone_end,
 				unsigned long *nr_initialised)
 {
+	unsigned long max_initialise;
+
 	/* Always populate low zones for address-contrained allocations */
 	if (zone_end < pgdat_end_pfn(pgdat))
 		return true;
+	/*
+	 * Initialise at least 2G of a node but also take into account that
+	 * two large system hashes that can take up 1GB for 0.25TB/node.
+	 */
+	max_initialise = max(2UL << (30 - PAGE_SHIFT),
+		(pgdat->node_spanned_pages >> 8));
 
-	/* Initialise at least 2G of the highest zone */
 	(*nr_initialised)++;
-	if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
+	if ((*nr_initialised > max_initialise) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
 		pgdat->first_deferred_pfn = pfn;
 		return false;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] powerpc/mm: Enable page parallel initialisation
  2016-03-08  3:55 [PATCH 0/2] mm: Enable page parallel initialisation for Power Li Zhang
  2016-03-08  3:55 ` [PATCH 1/2] mm: meminit: initialise more memory for inode/dentry hash tables in early boot Li Zhang
@ 2016-03-08  3:55 ` Li Zhang
  2016-03-08  9:36   ` Michael Ellerman
  2016-03-08 14:45 ` [PATCH 0/2] mm: Enable page parallel initialisation for Power Balbir Singh
  2 siblings, 1 reply; 12+ messages in thread
From: Li Zhang @ 2016-03-08  3:55 UTC (permalink / raw)
  To: akpm, vbabka, mgorman, mpe, khandual, aneesh.kumar
  Cc: linux-mm, linuxppc-dev, linux-kernel, Li Zhang

From: Li Zhang <zhlcindy@linux.vnet.ibm.com>

Parallel initialisation has been enabled for X86, boot time is
improved greatly. On Power8, it is improved greatly for small
memory. Here is the result from my test on Power8 platform:

For 4GB memory: 57% is improved, boot time as the following:
with patch: 10s, without patch: 24.5s

For 50GB memory: 22% is improved, boot time as the following:
with patch: 43.8s, without patch: 56.8s

Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
---
 * Add boot time details in change log.
 * Please apply this patch after [PATCH 1/2] mm: meminit: initialise
    more memory for inode/dentry hash tables in early boot, because
   [PATCH 1/2] is to fix a bug which can be reproduced on Power.

 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9faa18c..97d41ad 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -158,6 +158,7 @@ config PPC
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select HAVE_ARCH_SECCOMP_FILTER
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
+	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
 
 config GENERIC_CSUM
 	def_bool CPU_LITTLE_ENDIAN
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] powerpc/mm: Enable page parallel initialisation
  2016-03-08  3:55 ` [PATCH 2/2] powerpc/mm: Enable page parallel initialisation Li Zhang
@ 2016-03-08  9:36   ` Michael Ellerman
  2016-03-09  2:06     ` Li Zhang
  2016-03-09 21:42     ` Andrew Morton
  0 siblings, 2 replies; 12+ messages in thread
From: Michael Ellerman @ 2016-03-08  9:36 UTC (permalink / raw)
  To: Li Zhang, akpm, vbabka, mgorman, khandual, aneesh.kumar
  Cc: linux-mm, linuxppc-dev, linux-kernel, Li Zhang

Hi Li,

On Tue, 2016-03-08 at 11:55 +0800, Li Zhang wrote:

> From: Li Zhang <zhlcindy@linux.vnet.ibm.com>
>
> Parallel initialisation has been enabled for X86, boot time is
> improved greatly. On Power8, it is improved greatly for small
> memory. Here is the result from my test on Power8 platform:
>
> For 4GB memory: 57% is improved, boot time as the following:
> with patch: 10s, without patch: 24.5s

This isn't worded quite right, and the numbers are a bit off.

old = 24.5
new = 10

So the improvement is 14.5 (seconds).

That means the improvement (14.5) as a percentage of the original boot time is:

 = 14.5 / 24.5 * 100
 = 59.183673469387756
 = 59%

So you would say:

  For 4GB of memory, boot time is improved by 59%, from 24.5s to 10s.

> For 50GB memory: 22% is improved, boot time as the following:
> with patch: 43.8s, without patch: 56.8s

  For 50GB memory, boot time is improved by 22%, from 56.8s to 43.8s.
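
The calculation above generalises to any pair of boot times. A minimal sketch follows; the function name is hypothetical and the formula is simply the one spelled out in this message: the time saved divided by the original boot time.

```c
/* Hypothetical helper: improvement as a percentage of the old boot time. */
static double improvement_percent(double old_s, double new_s)
{
    return (old_s - new_s) / old_s * 100.0;
}
```

With the cover letter's 10.4s figure instead of 10s, the same formula gives a little under 58%, which would explain the 57% quoted originally.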

> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
> ---
>  * Add boot time details in change log.
>  * Please apply this patch after [PATCH 1/2] mm: meminit: initialise
>     more memory for inode/dentry hash tables in early boot, because
>    [PATCH 1/2] is to fix a bug which can be reproduced on Power.

Given that, I think it would be best if Andrew merged both of these patches.
Because this patch is pretty trivial, whereas the patch to mm/ is less so.

Is that OK Andrew?

For this one:

Acked-by: Michael Ellerman <mpe@ellerman.id.au>

cheers

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm: meminit: initialise more memory for inode/dentry hash tables in early boot
  2016-03-08  3:55 ` [PATCH 1/2] mm: meminit: initialise more memory for inode/dentry hash tables in early boot Li Zhang
@ 2016-03-08 13:25   ` Vlastimil Babka
  0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2016-03-08 13:25 UTC (permalink / raw)
  To: Li Zhang, akpm, mgorman, mpe, khandual, aneesh.kumar
  Cc: linux-mm, linuxppc-dev, linux-kernel, Li Zhang

On 03/08/2016 04:55 AM, Li Zhang wrote:
> From: Li Zhang <zhlcindy@linux.vnet.ibm.com>
> 
> This patch is based on Mel Gorman's old patch in the mailing list,
> https://lkml.org/lkml/2015/5/5/280 which is discussed but it is
> fixed with a completion to wait for all memory initialised in
> page_alloc_init_late(). It is to fix the OOM problem on X86
> with 24TB memory which allocates memory in late initialisation.
> But for Power platform with 32TB memory, it causes a call trace
> in vfs_caches_init->inode_init() and inode hash table needs more
> memory.
> So this patch allocates 1GB for 0.25TB/node for large system
> as it is mentioned in https://lkml.org/lkml/2015/5/1/627
> 
> This call trace is found on Power with 32TB memory, 1024CPUs, 16nodes.
> Currently, it only allocates 2GB*16=32GB for early initialisation. But
> Dentry cache hash table needes 16GB and Inode cache hash table needs
> 16GB. So the system have no enough memory for it.
> The log from dmesg as the following:
> 
> Dentry cache hash table entries: 2147483648 (order: 18,17179869184 bytes)
> vmalloc: allocation failure, allocated 16021913600 of 17179934720 bytes
> swapper/0: page allocation failure: order:0,mode:0x2080020
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-0-ppc64
> Call Trace:
> [c0000000012bfa00] [c0000000007c4a50].dump_stack+0xb4/0xb664 (unreliable)
> [c0000000012bfa80] [c0000000001f93d4].warn_alloc_failed+0x114/0x160
> [c0000000012bfb30] [c00000000023c204].__vmalloc_area_node+0x1a4/0x2b0
> [c0000000012bfbf0] [c00000000023c3f4].__vmalloc_node_range+0xe4/0x110
> [c0000000012bfc90] [c00000000023c460].__vmalloc_node+0x40/0x50
> [c0000000012bfd10] [c000000000b67d60].alloc_large_system_hash+0x134/0x2a4
> [c0000000012bfdd0] [c000000000b70924].inode_init+0xa4/0xf0
> [c0000000012bfe60] [c000000000b706a0].vfs_caches_init+0x80/0x144
> [c0000000012bfef0] [c000000000b35208].start_kernel+0x40c/0x4e0
> [c0000000012bff90] [c000000000008cfc]start_here_common+0x20/0x4a4
> Mem-Info:
> 
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] mm: Enable page parallel initialisation for Power
  2016-03-08  3:55 [PATCH 0/2] mm: Enable page parallel initialisation for Power Li Zhang
  2016-03-08  3:55 ` [PATCH 1/2] mm: meminit: initialise more memory for inode/dentry hash tables in early boot Li Zhang
  2016-03-08  3:55 ` [PATCH 2/2] powerpc/mm: Enable page parallel initialisation Li Zhang
@ 2016-03-08 14:45 ` Balbir Singh
  2016-03-09  4:17   ` Li Zhang
  2 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2016-03-08 14:45 UTC (permalink / raw)
  To: Li Zhang, akpm, vbabka, mgorman, mpe, khandual, aneesh.kumar
  Cc: linux-mm, linuxppc-dev, linux-kernel, Li Zhang



On 08/03/16 14:55, Li Zhang wrote:
> [...]
>
> So this patchset is still necessary to be enabled for Power. 
>
>

The patchset looks good. Two questions:

1. Is the patchset still necessary
    a. for systems with smaller amounts of RAM?
    b. because it theoretically improves boot time?
2. The pgdat->node_spanned_pages >> 8 sounds arbitrary.
    On a system with 2TB*16 nodes, it would initialise about 8GB before calling deferred init?
    Don't we need at least 32GB plus space for other early hash allocations?
    BTW, my expectation was that 32TB would imply 32GB+32GB of large hash allocations early on.

Balbir Singh.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] powerpc/mm: Enable page parallel initialisation
  2016-03-08  9:36   ` Michael Ellerman
@ 2016-03-09  2:06     ` Li Zhang
  2016-03-09 21:42     ` Andrew Morton
  1 sibling, 0 replies; 12+ messages in thread
From: Li Zhang @ 2016-03-09  2:06 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: akpm, Vlastimil Babka, mgorman, Anshuman Khandual, aneesh.kumar,
	linux-mm, linuxppc-dev, linux-kernel, Li Zhang

On Tue, Mar 8, 2016 at 5:36 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> Hi Li,
>
> On Tue, 2016-03-08 at 11:55 +0800, Li Zhang wrote:
>
>> From: Li Zhang <zhlcindy@linux.vnet.ibm.com>
>>
>> Parallel initialisation has been enabled for X86, boot time is
>> improved greatly. On Power8, it is improved greatly for small
>> memory. Here is the result from my test on Power8 platform:
>>
>> For 4GB memory: 57% is improved, boot time as the following:
>> with patch: 10s, without patch: 24.5s
>
> This isn't worded quite right, and the numbers are a bit off.
>
> old = 24.5
> new = 10
>
> So the improvement is 14.5 (seconds).
>
> That means the improvement (14.5) as a percentage of the original boot time is:
>
>  = 14.5 / 24.5 * 100
>  = 59.183673469387756
>  = 59%

Oh, sorry. It seems that I made a mistake.
>
> So you would say:
>
>   For 4GB of memory, boot time is improved by 59%, from 24.5s to 10s.

Got it. :)

>
>> For 50GB memory: 22% is improved, boot time as the following:
>> with patch: 43.8s, without patch: 56.8s
>
>   For 50GB memory, boot time is improved by 22%, from 56.8s to 43.8s.
>
>> Acked-by: Mel Gorman <mgorman@techsingularity.net>
>> Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
>> ---
>>  * Add boot time details in change log.
>>  * Please apply this patch after [PATCH 1/2] mm: meminit: initialise
>>     more memory for inode/dentry hash tables in early boot, because
>>    [PATCH 1/2] is to fix a bug which can be reproduced on Power.
>
> Given that, I think it would be best if Andrew merged both of these patches.
> Because this patch is pretty trivial, whereas the patch to mm/ is less so.
>
> Is that OK Andrew?
>
> For this one:
>
> Acked-by: Michael Ellerman <mpe@ellerman.id.au>
>
> cheers
>



-- 

Best Regards
-Li

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] mm: Enable page parallel initialisation for Power
  2016-03-08 14:45 ` [PATCH 0/2] mm: Enable page parallel initialisation for Power Balbir Singh
@ 2016-03-09  4:17   ` Li Zhang
  2016-03-09  4:28     ` Balbir Singh
  0 siblings, 1 reply; 12+ messages in thread
From: Li Zhang @ 2016-03-09  4:17 UTC (permalink / raw)
  To: Balbir Singh
  Cc: akpm, Vlastimil Babka, mgorman, Michael Ellerman,
	Anshuman Khandual, aneesh.kumar, linux-mm, linuxppc-dev,
	linux-kernel, Li Zhang

On Tue, Mar 8, 2016 at 10:45 PM, Balbir Singh <bsingharora@gmail.com> wrote:
>
>
> [...]
>
Hi Balbir,

Thanks for your review.

> The patchset looks good, two questions
>
> 1. The patchset is still necessary for
>     a. systems with smaller amount of RAM?
       I think it is. So far I have tested systems with 4GB and 50GB, and
       boot time is improved on both.
       We may test more systems with different memory sizes in the future.
>     b. Theoretically it improves boot time?
       The boot time is improved only a little on huge-memory systems,
       and the gain can be ignored.
       But I think it's still necessary to enable this feature.

> 2. the pgdat->node_spanned_pages >> 8 sounds arbitrary
>     On a system with 2TB*16 nodes, it would initialize about 8GB before calling deferred init?
>     Don't we need at-least 32GB + space for other early hash allocations
>     BTW, My expectation was that 32TB would imply 32GB+32GB of large hash allocations early on

      pgdat->node_spanned_pages >> 8 is computed from the size of the
      memory on one node, so the amount is per node.
      On a system with 2TB * 16 nodes, it will initialise 16*8GB = 128GB.
      I am not sure whether it can be reduced to >> 16 and still work on
      all architectures with different memory sizes. And >> 8 was also
      mentioned in the earlier discussion for X86, so I chose >> 8.
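
The per-node arithmetic above can be reproduced in user space. The sketch below mirrors only the max() expression from the patch; it leaves out the section-alignment check in update_defer_init(). The function names are hypothetical, and the 64K page size (page_shift of 16) used in the example numbers is an assumption about this Power system, not something stated in the thread.

```c
#include <stdint.h>

/*
 * User-space mirror of the threshold computed in the patched
 * update_defer_init(): at least 2GB worth of pages, or 1/256 of the
 * pages the node spans, whichever is larger.
 */
static uint64_t early_init_pages(uint64_t node_spanned_pages,
                                 unsigned int page_shift)
{
    uint64_t floor_pages = UINT64_C(2) << (30 - page_shift);
    uint64_t scaled_pages = node_spanned_pages >> 8;

    return floor_pages > scaled_pages ? floor_pages : scaled_pages;
}

/* Convert a page count to bytes for a given page size. */
static uint64_t pages_to_bytes(uint64_t pages, unsigned int page_shift)
{
    return pages << page_shift;
}
```

For a 2TB node with the assumed 64K pages, early_init_pages() returns 8GB worth of pages, so 16 such nodes give the 128GB mentioned above. For a small node the 2GB floor applies instead.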

*    From the code, the call chain is as follows:

      free_area_init_core ->
                     memmap_init->
                              update_defer_init
     #define memmap_init(size, nid, zone, start_pfn) \
           memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)

     memmap_init_zone works on one zone, but free_area_init_core walks the
     zones and so finds the highest zone on the node. update_defer_init()
     then works out the maximum amount of memory in the highest zone of the
     node to initialise early.

     static void __paginginit free_area_init_core(struct pglist_data *pgdat)
     {
            ...
           for (j = 0; j < MAX_NR_ZONES; j++) {
                  ....
                 /* find the highest zone on a node */
                 memmap_init(size, nid, j, zone_start_pfn);
                 ...
           }
     }

*   From the dmesg log, after applying this patchset, the system has
    123013440K (about 117GB) available, which is enough for the dentry
    cache hash table and the inode cache hash table on this system.

    [    0.000000] Memory: 123013440K/31739871232K available (8000K kernel code, 1856K rwdata, 3384K rodata, 6208K init, 2544K bss, 28531136K reserved, 0K cma-reserved)

Thanks :)

>
> Balbir Singh.


-- 

Best Regards
-Li

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] mm: Enable page parallel initialisation for Power
  2016-03-09  4:17   ` Li Zhang
@ 2016-03-09  4:28     ` Balbir Singh
  2016-03-09  5:50       ` Li Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2016-03-09  4:28 UTC (permalink / raw)
  To: Li Zhang
  Cc: akpm, Vlastimil Babka, mgorman, Michael Ellerman,
	Anshuman Khandual, aneesh.kumar, linux-mm, linuxppc-dev,
	linux-kernel, Li Zhang



On 09/03/16 15:17, Li Zhang wrote:
> On Tue, Mar 8, 2016 at 10:45 PM, Balbir Singh <bsingharora@gmail.com> wrote:
>> [...]
> Hi Balbir,
>
> Thanks for your reviewing.
>
>> The patchset looks good, two questions
>>
>> 1. The patchset is still necessary for
>>     a. systems with smaller amount of RAM?
>        I think it is. Currently, I tested systems for 4GB, 50GB, and
> boot time is improved.
>        We may test more systems with different memory size in the future.
>>     b. Theoretically it improves boot time?
>        The boot time is improved a little bit for huge memory system
> and it can be ignored.
>        But I think it's still necessary to enable this feature.
>
>> 2. the pgdat->node_spanned_pages >> 8 sounds arbitrary
>>     On a system with 2TB*16 nodes, it would initialize about 8GB before calling deferred init?
>>     Don't we need at-least 32GB + space for other early hash allocations
>>     BTW, My expectation was that 32TB would imply 32GB+32GB of large hash allocations early on
>       pgdat->node_spanned_pages >> 8 means that it allocates the size
> of the memory on one node.
>       On a system with 2TB *16nodes, it will allocate 16*8GB = 128GB.
>       I am not sure if it can be minimised to >> 16 to make sure all
> the architectures with different
>       memory size work well.  And this is also mentioned in early
> discussion for X86, so I choose  >> 8.
>
> *    From the code as the following:
>
>       free_area_init_core ->
>                      memmap_init->
>                               update_defer_init
>      #define memmap_init(size, nid, zone, start_pfn) \
>            memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
>
>      memmap_init_zone is based on a zone, but free_area_init_core will
> help find the highest
>      zone on the node. And update_defer_init() get max initialised
> memory on highest zone for a node to
>      reserve for early initialisation.
>
>      static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>      {
>             ...
>            for (j = 0; j < MAX_NR_ZONES; j++) {
>                   ....
>                  memmap_init(size, nid, j, zone_start_fn);   //find
> the highest zone on a node.
>                  ...
>            }
>      }
>
> *   From the dmesg log, after applying this patchset, it has
> 123013440K(about 117GB),
>     which is enough for Dentry node hash table and Inode hash table in
> this system.
>
>     [    0.000000] Memory: 123013440K/31739871232K available (8000K
> kernel code, 1856K rwdata,
>     3384K rodata, 6208K init, 2544K bss, 28531136K reserved, 0K cma-reserved)
>
> Thanks :)
>
Looks good! It seems the real benefit is for smaller systems. Thanks for clarifying.
Please check whether CMA is affected in any way.

Acked-by: Balbir Singh <bsingharora@gmail.com>

Balbir Singh.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] mm: Enable page parallel initialisation for Power
  2016-03-09  4:28     ` Balbir Singh
@ 2016-03-09  5:50       ` Li Zhang
  0 siblings, 0 replies; 12+ messages in thread
From: Li Zhang @ 2016-03-09  5:50 UTC (permalink / raw)
  To: Balbir Singh
  Cc: akpm, Vlastimil Babka, mgorman, Michael Ellerman,
	Anshuman Khandual, aneesh.kumar, linux-mm, linuxppc-dev,
	linux-kernel, Li Zhang

On Wed, Mar 9, 2016 at 12:28 PM, Balbir Singh <bsingharora@gmail.com> wrote:
>
>
> On 09/03/16 15:17, Li Zhang wrote:
>> On Tue, Mar 8, 2016 at 10:45 PM, Balbir Singh <bsingharora@gmail.com> wrote:
>>>
>>> On 08/03/16 14:55, Li Zhang wrote:
>>>> From: Li Zhang <zhlcindy@linux.vnet.ibm.com>
>>>>
>>>> Upstream has supported page parallel initialisation for X86 and the
>>>> boot time is improved greatly. Some tests have been done for Power.
>>>>
>>>> Here is the result I have done with different memory size.
>>>>
>>>> * 4GB memory:
>>>>     boot time is as the following:
>>>>     with patch vs without patch: 10.4s vs 24.5s
>>>>     boot time is improved 57%
>>>> * 200GB memory:
>>>>     boot time looks the same with and without patches.
>>>>     boot time is about 38s
>>>> * 32TB memory:
>>>>     boot time looks the same with and without patches
>>>>     boot time is about 160s.
>>>>     The boot time is much shorter than X86 with 24TB memory.
>>>>     From community discussion, it costs about 694s for X86 24T system.
>>>>
>>>> From the code, parallel initialisation improves performance by
>>>> deferring memory initialisation to kswap with N kthreads, so it should
>>>> improve performance in theory.
>>>>
>>>> From the test result, On X86, performance is improved greatly with huge
>>>> memory. But on Power platform, it is improved greatly with less than
>>>> 100GB memory. For huge memory, it is not improved greatly. But it saves
>>>> the time with several threads at least, as the following information
>>>> shows(32TB system log):
>>>>
>>>> [   22.648169] node 9 initialised, 16607461 pages in 280ms
>>>> [   22.783772] node 3 initialised, 23937243 pages in 410ms
>>>> [   22.858877] node 6 initialised, 29179347 pages in 490ms
>>>> [   22.863252] node 2 initialised, 29179347 pages in 490ms
>>>> [   22.907545] node 0 initialised, 32049614 pages in 540ms
>>>> [   22.920891] node 15 initialised, 32212280 pages in 550ms
>>>> [   22.923236] node 4 initialised, 32306127 pages in 550ms
>>>> [   22.923384] node 12 initialised, 32314319 pages in 550ms
>>>> [   22.924754] node 8 initialised, 32314319 pages in 550ms
>>>> [   22.940780] node 13 initialised, 33353677 pages in 570ms
>>>> [   22.940796] node 11 initialised, 33353677 pages in 570ms
>>>> [   22.941700] node 5 initialised, 33353677 pages in 570ms
>>>> [   22.941721] node 10 initialised, 33353677 pages in 570ms
>>>> [   22.941876] node 7 initialised, 33353677 pages in 570ms
>>>> [   22.944946] node 14 initialised, 33353677 pages in 570ms
>>>> [   22.946063] node 1 initialised, 33345485 pages in 580ms
>>>>
>>>> It saves about 550*16 ms at least, although that is negligible compared with
>>>> the boot time of about 160 seconds. What's more, the boot time is much shorter
>>>> on Power, even without the patches, than on x86 for huge memory machines.
>>>>
>>>> So it is still worth enabling this patchset for Power.
>>>>
>>>>
>> Hi Balbir,
>>
>> Thanks for your review.
>>
>>> The patchset looks good, two questions
>>>
>>> 1. The patchset is still necessary for
>>>     a. systems with smaller amount of RAM?
>>        I think it is. So far I have tested systems with 4GB and 50GB, and
>>        boot time is improved.
>>        We may test more systems with different memory sizes in the future.
>>>     b. Theoretically it improves boot time?
>>        The boot time is improved a little bit for huge memory systems,
>>        and the gain can be ignored.
>>        But I think it's still necessary to enable this feature.
>>
>>> 2. the pgdat->node_spanned_pages >> 8 sounds arbitrary
>>>     On a system with 2TB*16 nodes, it would initialize about 8GB before calling deferred init?
>>>     Don't we need at-least 32GB + space for other early hash allocations
>>>     BTW, My expectation was that 32TB would imply 32GB+32GB of large hash allocations early on
>>       pgdat->node_spanned_pages >> 8 is the amount of memory initialised
>>       early on one node.
>>       On a system with 2TB * 16 nodes, that comes to 16 * 8GB = 128GB.
>>       I am not sure if it can be minimised to >> 16 to make sure all
>>       the architectures with different memory sizes work well. This was also
>>       mentioned in the early discussion for X86, so I chose >> 8.
>>
>> *    From the code, as follows:
>>
>>       free_area_init_core ->
>>                      memmap_init ->
>>                               update_defer_init
>>      #define memmap_init(size, nid, zone, start_pfn) \
>>            memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
>>
>>      memmap_init_zone works on a single zone, but free_area_init_core
>>      helps find the highest zone on the node, and update_defer_init() gets
>>      the maximum amount of memory on the highest zone of a node to reserve
>>      for early initialisation.
>>
>>      static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>>      {
>>            ...
>>            for (j = 0; j < MAX_NR_ZONES; j++) {
>>                  ...
>>                  /* find the highest zone on a node */
>>                  memmap_init(size, nid, j, zone_start_pfn);
>>                  ...
>>            }
>>      }
>>
>> *   From the dmesg log, after applying this patchset, 123013440K (about 117GB)
>>     is available, which is enough for the dentry and inode hash tables on
>>     this system.
>>
>>     [    0.000000] Memory: 123013440K/31739871232K available (8000K kernel code,
>>     1856K rwdata, 3384K rodata, 6208K init, 2544K bss, 28531136K reserved,
>>     0K cma-reserved)
>>
>> Thanks :)
>>
> Looks good! It seems the real benefit is for smaller systems; thanks for clarifying.
> Please check whether CMA is affected in any way.
>

Sure, thanks.

> Acked-by: Balbir Singh <bsingharora@gmail.com>
>
> Balbir Singh.



-- 

Best Regards
-Li
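
To put a number on the "about 550*16 ms" estimate quoted above, here is a
minimal user-space sketch that compares the sum of the per-node times with
the longest single node. The per-node values are copied from the 32TB log in
this thread; serial_ms() and parallel_ms() are hypothetical helpers, and
treating the sum as the serial cost assumes the same per-node work would
otherwise run back to back, which the thread does not measure.

```c
#include <assert.h>
#include <stddef.h>

/* Per-node init times in ms, in the order the nodes were reported in the log. */
static const unsigned int node_init_ms[] = {
	280, 410, 490, 490, 540, 550, 550, 550,
	550, 570, 570, 570, 570, 570, 570, 580,
};

#define NR_NODES (sizeof(node_init_ms) / sizeof(node_init_ms[0]))

/* Hypothetical helper: cost if the nodes were initialised one after another. */
static unsigned int serial_ms(const unsigned int *ms, size_t n)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += ms[i];
	return sum;
}

/* Hypothetical helper: cost if all nodes run in parallel (the slowest node). */
static unsigned int parallel_ms(const unsigned int *ms, size_t n)
{
	unsigned int max;

	assert(n > 0);
	max = ms[0];
	for (size_t i = 1; i < n; i++)
		if (ms[i] > max)
			max = ms[i];
	return max;
}
```

With these figures serial_ms(node_init_ms, NR_NODES) is 8410 ms and
parallel_ms(node_init_ms, NR_NODES) is 580 ms, so the difference is roughly
7.8 seconds, which is small next to a boot time of about 160 seconds.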

* Re: [PATCH 2/2] powerpc/mm: Enable page parallel initialisation
  2016-03-08  9:36   ` Michael Ellerman
  2016-03-09  2:06     ` Li Zhang
@ 2016-03-09 21:42     ` Andrew Morton
  2016-03-10  0:28       ` Michael Ellerman
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2016-03-09 21:42 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Li Zhang, vbabka, mgorman, khandual, aneesh.kumar, linux-mm,
	linuxppc-dev, linux-kernel, Li Zhang

On Tue, 08 Mar 2016 20:36:34 +1100 Michael Ellerman <mpe@ellerman.id.au> wrote:

> Given that, I think it would be best if Andrew merged both of these patches.
> Because this patch is pretty trivial, whereas the patch to mm/ is less so.
> 
> Is that OK Andrew?

Yep, no probs.

* Re: [PATCH 2/2] powerpc/mm: Enable page parallel initialisation
  2016-03-09 21:42     ` Andrew Morton
@ 2016-03-10  0:28       ` Michael Ellerman
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Ellerman @ 2016-03-10  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Li Zhang, vbabka, mgorman, khandual, aneesh.kumar, linux-mm,
	linuxppc-dev, linux-kernel, Li Zhang

On Wed, 2016-03-09 at 13:42 -0800, Andrew Morton wrote:
> On Tue, 08 Mar 2016 20:36:34 +1100 Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> > Given that, I think it would be best if Andrew merged both of these patches.
> > Because this patch is pretty trivial, whereas the patch to mm/ is less so.
> >
> > Is that OK Andrew?
>
> Yep, no probs.

Thanks.

cheers

Thread overview: 12+ messages
2016-03-08  3:55 [PATCH 0/2] mm: Enable page parallel initialisation for Power Li Zhang
2016-03-08  3:55 ` [PATCH 1/2] mm: meminit: initialise more memory for inode/dentry hash tables in early boot Li Zhang
2016-03-08 13:25   ` Vlastimil Babka
2016-03-08  3:55 ` [PATCH 2/2] powerpc/mm: Enable page parallel initialisation Li Zhang
2016-03-08  9:36   ` Michael Ellerman
2016-03-09  2:06     ` Li Zhang
2016-03-09 21:42     ` Andrew Morton
2016-03-10  0:28       ` Michael Ellerman
2016-03-08 14:45 ` [PATCH 0/2] mm: Enable page parallel initialisation for Power Balbir Singh
2016-03-09  4:17   ` Li Zhang
2016-03-09  4:28     ` Balbir Singh
2016-03-09  5:50       ` Li Zhang
