linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] arm64: mm: Fix NOMAP page initialization
@ 2016-12-16 16:54 Robert Richter
  2017-01-04 13:56 ` Ard Biesheuvel
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Richter @ 2016-12-16 16:54 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon
  Cc: Ard Biesheuvel, David Daney, Mark Rutland, Hanjun Guo,
	James Morse, Yisheng Xie, Robert Richter, linux-arm-kernel,
	linux-kernel, linux-mm

On ThunderX systems with certain memory configurations we see the
following BUG_ON():

 kernel BUG at mm/page_alloc.c:1848!

This happens for some configs with 64k page size enabled. The BUG_ON()
checks if start and end page of a memmap range belongs to the same
zone.

The BUG_ON() check fails if a memory zone contains NOMAP regions. In
this case the node information of those pages is not initialized. This
causes an inconsistency of the page links with wrong zone and node
information for that pages. NOMAP pages from node 1 still point to the
mem zone from node 0 and have the wrong nid assigned.

The reason for the mis-configuration is a change in pfn_valid() which
reports pages marked NOMAP as invalid:

 68709f45385a arm64: only consider memblocks with NOMAP cleared for linear mapping

This causes pages marked as nomap being no longer reassigned to the
new zone in memmap_init_zone() by calling __init_single_pfn().

Fixing this by implementing an arm64 specific early_pfn_valid(). This
causes all pages of sections with memory including NOMAP ranges to be
initialized by __init_single_page() and ensures consistency of page
links to zone, node and section.

The HAVE_ARCH_PFN_VALID config option now requires an explicit
definiton of early_pfn_valid() in the same way as pfn_valid(). This
allows a customized implementation of early_pfn_valid() which
redirects to valid_section() for arm64. This is the same as for the
generic pfn_valid() implementation.

v3:

 * Use valid_section() which is the same as the default pfn_valid()
   implementation to initialize
 * Added Ack for arm/ changes.

v2:

 * Use pfn_present() instead of memblock_is_memory() to support also
   non-memory NOMAP holes

Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Robert Richter <rrichter@cavium.com>
---
 arch/arm/include/asm/page.h   |  1 +
 arch/arm64/include/asm/page.h |  2 ++
 arch/arm64/mm/init.c          | 15 +++++++++++++++
 include/linux/mmzone.h        |  5 ++++-
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 4355f0ec44d6..79761bd55f94 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -158,6 +158,7 @@ typedef struct page *pgtable_t;
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
 extern int pfn_valid(unsigned long);
+#define early_pfn_valid(pfn)	pfn_valid(pfn)
 #endif
 
 #include <asm/memory.h>
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 8472c6def5ef..17ceb7435ded 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -49,6 +49,8 @@ typedef struct page *pgtable_t;
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
 extern int pfn_valid(unsigned long);
+extern int early_pfn_valid(unsigned long);
+#define early_pfn_valid early_pfn_valid
 #endif
 
 #include <asm/memory.h>
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 212c4d1e2f26..8ff62a7ff634 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -145,11 +145,26 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
 #endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
+
 int pfn_valid(unsigned long pfn)
 {
 	return memblock_is_map_memory(pfn << PAGE_SHIFT);
 }
 EXPORT_SYMBOL(pfn_valid);
+
+/*
+ * This is the same as the generic pfn_valid() implementation. We use
+ * valid_section() here to make sure all pages of a section including
+ * NOMAP pages are initialized with __init_single_page().
+ */
+int early_pfn_valid(unsigned long pfn)
+{
+	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
+		return 0;
+	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+}
+EXPORT_SYMBOL(early_pfn_valid);
+
 #endif
 
 #ifndef CONFIG_SPARSEMEM
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0f088f3a2fed..bedcf8a95881 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1170,12 +1170,16 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 }
 
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
+
 static inline int pfn_valid(unsigned long pfn)
 {
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
 	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
 }
+
+#define early_pfn_valid(pfn)	pfn_valid(pfn)
+
 #endif
 
 static inline int pfn_present(unsigned long pfn)
@@ -1200,7 +1204,6 @@ static inline int pfn_present(unsigned long pfn)
 #define pfn_to_nid(pfn)		(0)
 #endif
 
-#define early_pfn_valid(pfn)	pfn_valid(pfn)
 void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2016-12-16 16:54 [PATCH v3] arm64: mm: Fix NOMAP page initialization Robert Richter
@ 2017-01-04 13:56 ` Ard Biesheuvel
  2017-01-05  2:03   ` Hanjun Guo
  2017-01-05 11:03   ` Robert Richter
  0 siblings, 2 replies; 19+ messages in thread
From: Ard Biesheuvel @ 2017-01-04 13:56 UTC (permalink / raw)
  To: Robert Richter
  Cc: Russell King, Catalin Marinas, Will Deacon, David Daney,
	Mark Rutland, Hanjun Guo, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On 16 December 2016 at 16:54, Robert Richter <rrichter@cavium.com> wrote:
> On ThunderX systems with certain memory configurations we see the
> following BUG_ON():
>
>  kernel BUG at mm/page_alloc.c:1848!
>
> This happens for some configs with 64k page size enabled. The BUG_ON()
> checks if start and end page of a memmap range belongs to the same
> zone.
>
> The BUG_ON() check fails if a memory zone contains NOMAP regions. In
> this case the node information of those pages is not initialized. This
> causes an inconsistency of the page links with wrong zone and node
> information for that pages. NOMAP pages from node 1 still point to the
> mem zone from node 0 and have the wrong nid assigned.
>
> The reason for the mis-configuration is a change in pfn_valid() which
> reports pages marked NOMAP as invalid:
>
>  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear mapping
>
> This causes pages marked as nomap being no longer reassigned to the
> new zone in memmap_init_zone() by calling __init_single_pfn().
>
> Fixing this by implementing an arm64 specific early_pfn_valid(). This
> causes all pages of sections with memory including NOMAP ranges to be
> initialized by __init_single_page() and ensures consistency of page
> links to zone, node and section.
>

I like this solution a lot better than the first one, but I am still
somewhat uneasy about having the kernel reason about attributes of
pages it should not touch in the first place. But the fact that
early_pfn_valid() is only used a single time in the whole kernel does
give some confidence that we are not simply moving the problem
elsewhere.

Given that you are touching arch/arm/ as well as arch/arm64, could you
explain why only arm64 needs this treatment? Is it simply because we
don't have NUMA support there?

Considering that Hisilicon D05 suffered from the same issue, I would
like to get some coverage there as well. Hanjun, is this something you
can arrange? Thanks


> The HAVE_ARCH_PFN_VALID config option now requires an explicit
> definiton of early_pfn_valid() in the same way as pfn_valid(). This
> allows a customized implementation of early_pfn_valid() which
> redirects to valid_section() for arm64. This is the same as for the
> generic pfn_valid() implementation.
>
> v3:
>
>  * Use valid_section() which is the same as the default pfn_valid()
>    implementation to initialize
>  * Added Ack for arm/ changes.
>
> v2:
>
>  * Use pfn_present() instead of memblock_is_memory() to support also
>    non-memory NOMAP holes
>
> Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Robert Richter <rrichter@cavium.com>
> ---
>  arch/arm/include/asm/page.h   |  1 +
>  arch/arm64/include/asm/page.h |  2 ++
>  arch/arm64/mm/init.c          | 15 +++++++++++++++
>  include/linux/mmzone.h        |  5 ++++-
>  4 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
> index 4355f0ec44d6..79761bd55f94 100644
> --- a/arch/arm/include/asm/page.h
> +++ b/arch/arm/include/asm/page.h
> @@ -158,6 +158,7 @@ typedef struct page *pgtable_t;
>
>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>  extern int pfn_valid(unsigned long);
> +#define early_pfn_valid(pfn)   pfn_valid(pfn)
>  #endif
>
>  #include <asm/memory.h>
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 8472c6def5ef..17ceb7435ded 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -49,6 +49,8 @@ typedef struct page *pgtable_t;
>
>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>  extern int pfn_valid(unsigned long);
> +extern int early_pfn_valid(unsigned long);
> +#define early_pfn_valid early_pfn_valid
>  #endif
>
>  #include <asm/memory.h>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 212c4d1e2f26..8ff62a7ff634 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -145,11 +145,26 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  #endif /* CONFIG_NUMA */
>
>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
> +
>  int pfn_valid(unsigned long pfn)
>  {
>         return memblock_is_map_memory(pfn << PAGE_SHIFT);
>  }
>  EXPORT_SYMBOL(pfn_valid);
> +
> +/*
> + * This is the same as the generic pfn_valid() implementation. We use
> + * valid_section() here to make sure all pages of a section including
> + * NOMAP pages are initialized with __init_single_page().
> + */
> +int early_pfn_valid(unsigned long pfn)
> +{
> +       if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> +               return 0;
> +       return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> +}
> +EXPORT_SYMBOL(early_pfn_valid);
> +
>  #endif
>
>  #ifndef CONFIG_SPARSEMEM
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 0f088f3a2fed..bedcf8a95881 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1170,12 +1170,16 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
>  }
>
>  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
> +
>  static inline int pfn_valid(unsigned long pfn)
>  {
>         if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>                 return 0;
>         return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
>  }
> +
> +#define early_pfn_valid(pfn)   pfn_valid(pfn)
> +
>  #endif
>
>  static inline int pfn_present(unsigned long pfn)
> @@ -1200,7 +1204,6 @@ static inline int pfn_present(unsigned long pfn)
>  #define pfn_to_nid(pfn)                (0)
>  #endif
>
> -#define early_pfn_valid(pfn)   pfn_valid(pfn)
>  void sparse_init(void);
>  #else
>  #define sparse_init()  do {} while (0)
> --
> 2.11.0
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-04 13:56 ` Ard Biesheuvel
@ 2017-01-05  2:03   ` Hanjun Guo
  2017-01-06  1:07     ` Hanjun Guo
  2017-01-05 11:03   ` Robert Richter
  1 sibling, 1 reply; 19+ messages in thread
From: Hanjun Guo @ 2017-01-05  2:03 UTC (permalink / raw)
  To: Ard Biesheuvel, Robert Richter
  Cc: Russell King, Catalin Marinas, Will Deacon, David Daney,
	Mark Rutland, James Morse, Yisheng Xie, linux-arm-kernel,
	linux-kernel, linux-mm

On 2017/1/4 21:56, Ard Biesheuvel wrote:
> On 16 December 2016 at 16:54, Robert Richter <rrichter@cavium.com> wrote:
>> On ThunderX systems with certain memory configurations we see the
>> following BUG_ON():
>>
>>  kernel BUG at mm/page_alloc.c:1848!
>>
>> This happens for some configs with 64k page size enabled. The BUG_ON()
>> checks if start and end page of a memmap range belongs to the same
>> zone.
>>
>> The BUG_ON() check fails if a memory zone contains NOMAP regions. In
>> this case the node information of those pages is not initialized. This
>> causes an inconsistency of the page links with wrong zone and node
>> information for that pages. NOMAP pages from node 1 still point to the
>> mem zone from node 0 and have the wrong nid assigned.
>>
>> The reason for the mis-configuration is a change in pfn_valid() which
>> reports pages marked NOMAP as invalid:
>>
>>  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear mapping
>>
>> This causes pages marked as nomap being no longer reassigned to the
>> new zone in memmap_init_zone() by calling __init_single_pfn().
>>
>> Fixing this by implementing an arm64 specific early_pfn_valid(). This
>> causes all pages of sections with memory including NOMAP ranges to be
>> initialized by __init_single_page() and ensures consistency of page
>> links to zone, node and section.
>>
>
> I like this solution a lot better than the first one, but I am still
> somewhat uneasy about having the kernel reason about attributes of
> pages it should not touch in the first place. But the fact that
> early_pfn_valid() is only used a single time in the whole kernel does
> give some confidence that we are not simply moving the problem
> elsewhere.
>
> Given that you are touching arch/arm/ as well as arch/arm64, could you
> explain why only arm64 needs this treatment? Is it simply because we
> don't have NUMA support there?
>
> Considering that Hisilicon D05 suffered from the same issue, I would
> like to get some coverage there as well. Hanjun, is this something you
> can arrange? Thanks

Sure, we will test this patch with LTP MM stress test (which triggers
the bug on D05), and give the feedback.

Thanks
Hanjun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-04 13:56 ` Ard Biesheuvel
  2017-01-05  2:03   ` Hanjun Guo
@ 2017-01-05 11:03   ` Robert Richter
  1 sibling, 0 replies; 19+ messages in thread
From: Robert Richter @ 2017-01-05 11:03 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Russell King, Catalin Marinas, Will Deacon, David Daney,
	Mark Rutland, Hanjun Guo, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On 04.01.17 13:56:39, Ard Biesheuvel wrote:
> Given that you are touching arch/arm/ as well as arch/arm64, could you
> explain why only arm64 needs this treatment? Is it simply because we
> don't have NUMA support there?

I haven't considered a solution for arch/arm yet. The fixes are
independent. But if that fix would be an excepted solution, it could
be implemented for arm then too. But as you said, since probably only
NUMA is affected, we might not need it there.

-Robert

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-05  2:03   ` Hanjun Guo
@ 2017-01-06  1:07     ` Hanjun Guo
  2017-01-06  5:22       ` Prakash B
  2017-01-06  8:37       ` Ard Biesheuvel
  0 siblings, 2 replies; 19+ messages in thread
From: Hanjun Guo @ 2017-01-06  1:07 UTC (permalink / raw)
  To: Ard Biesheuvel, Robert Richter
  Cc: Russell King, Catalin Marinas, Will Deacon, David Daney,
	Mark Rutland, James Morse, Yisheng Xie, linux-arm-kernel,
	linux-kernel, linux-mm

On 2017/1/5 10:03, Hanjun Guo wrote:
> On 2017/1/4 21:56, Ard Biesheuvel wrote:
>> On 16 December 2016 at 16:54, Robert Richter <rrichter@cavium.com> wrote:
>>> On ThunderX systems with certain memory configurations we see the
>>> following BUG_ON():
>>>
>>>  kernel BUG at mm/page_alloc.c:1848!
>>>
>>> This happens for some configs with 64k page size enabled. The BUG_ON()
>>> checks if start and end page of a memmap range belongs to the same
>>> zone.
>>>
>>> The BUG_ON() check fails if a memory zone contains NOMAP regions. In
>>> this case the node information of those pages is not initialized. This
>>> causes an inconsistency of the page links with wrong zone and node
>>> information for that pages. NOMAP pages from node 1 still point to the
>>> mem zone from node 0 and have the wrong nid assigned.
>>>
>>> The reason for the mis-configuration is a change in pfn_valid() which
>>> reports pages marked NOMAP as invalid:
>>>
>>>  68709f45385a arm64: only consider memblocks with NOMAP cleared for
>>> linear mapping
>>>
>>> This causes pages marked as nomap being no longer reassigned to the
>>> new zone in memmap_init_zone() by calling __init_single_pfn().
>>>
>>> Fixing this by implementing an arm64 specific early_pfn_valid(). This
>>> causes all pages of sections with memory including NOMAP ranges to be
>>> initialized by __init_single_page() and ensures consistency of page
>>> links to zone, node and section.
>>>
>>
>> I like this solution a lot better than the first one, but I am still
>> somewhat uneasy about having the kernel reason about attributes of
>> pages it should not touch in the first place. But the fact that
>> early_pfn_valid() is only used a single time in the whole kernel does
>> give some confidence that we are not simply moving the problem
>> elsewhere.
>>
>> Given that you are touching arch/arm/ as well as arch/arm64, could you
>> explain why only arm64 needs this treatment? Is it simply because we
>> don't have NUMA support there?
>>
>> Considering that Hisilicon D05 suffered from the same issue, I would
>> like to get some coverage there as well. Hanjun, is this something you
>> can arrange? Thanks
>
> Sure, we will test this patch with LTP MM stress test (which triggers
> the bug on D05), and give the feedback.

a update here, tested on 4.9,

  - Applied Ard's two patches only
  - Applied Robert's patch only

Both of them can work fine on D05 with NUMA enabled, which means
boot ok and LTP MM stress test is passed.

I'm not familiar with memory management, it's up to you guys to make
a decision :)

Thanks
Hanjun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-06  1:07     ` Hanjun Guo
@ 2017-01-06  5:22       ` Prakash B
  2017-01-09  5:09         ` Hanjun Guo
  2017-01-06  8:37       ` Ard Biesheuvel
  1 sibling, 1 reply; 19+ messages in thread
From: Prakash B @ 2017-01-06  5:22 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Ard Biesheuvel, Robert Richter, Mark Rutland, Yisheng Xie,
	David Daney, Catalin Marinas, Will Deacon, Russell King,
	linux-kernel, linux-mm, James Morse, linux-arm-kernel

Hi Hanjun,


> a update here, tested on 4.9,
>
>  - Applied Ard's two patches only
>  - Applied Robert's patch only
>
> Both of them can work fine on D05 with NUMA enabled, which means
> boot ok and LTP MM stress test is passed.

It is  not related to this patch set.
LTP "cpuset01" test  crashes with latest 4.9,  4.10-rc1 and 4.10-rc2 kernels on
Thunderx 2S .  Do you see any such behaviour on D05.

Any idea what might be causing this issue.


  227.627546] cpuset01: page allocation stalls for 10096ms, order:0,
mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
[  227.627586] CPU: 53 PID: 11017 Comm: cpuset01 Not tainted 4.9.04kNUMA+ #2
[  227.627591] Hardware name: www.cavium.com ThunderX Unknown/ThunderX
Unknown, BIOS 0.3 Aug 24 2016
[  227.627599] Call trace:
[  227.627623] [<ffff000008089f10>] dump_backtrace+0x0/0x238
[  227.627640] [<ffff00000808a16c>] show_stack+0x24/0x30
[  227.627656] [<ffff00000846fb50>] dump_stack+0x94/0xb4
[  227.627679] [<ffff0000081eb4f8>] warn_alloc+0x138/0x150
[  227.627686] [<ffff0000081ec0a4>] __alloc_pages_nodemask+0xb04/0xcf0
[  227.627697] [<ffff000008245988>] alloc_pages_vma+0xc8/0x270
[  227.627715] [<ffff00000821f604>] handle_mm_fault+0xc8c/0xfd8
[  227.627732] [<ffff00000809a488>] do_page_fault+0x2c0/0x368
[  227.627744] [<ffff0000080812ec>] do_mem_abort+0x6c/0xe0
[  227.627752] Exception stack(0xffff801f55823e00 to 0xffff801f55823f30)
[  227.627763] 3e00: 0000000000000000 0000ffff92682000
ffffffffffffffff 0000ffff9252b3e8
[  227.627774] 3e20: 0000000020000000 0000000000000000
000000000000a000 0000000000000003
[  227.627785] 3e40: 0000000000000022 ffffffffffffffff
0000000000000123 00000000000000de
[  227.627793] 3e60: ffff000008972000 0000000000000015
ffff801f55823e90 0000000000040900
[  227.627800] 3e80: 0000000000000000 ffff0000080836f0
0000000000000000 0000ffff92682000
[  227.627809] 3ea0: ffffffffffffffff 0000ffff92575d8c
0000000000000000 0000000000040900
[  227.627819] 3ec0: 0000ffff92682000 00000000000000f7
0000000000004fc0 0000000000000022
[  227.627828] 3ee0: 0000000000000000 0000000000000000
0000ffff925f5508 f7f7f7f7f7f7f7f7
[  227.627838] 3f00: 0000ffff92686ff0 0000000000002ab8
0101010101010101 0000000000000020
[  227.627847] 3f20: 0000000000000000 0000000000000000
[  227.627858] [<ffff000008083324>] el0_da+0x18/0x1c
[  227.627865] Mem-Info:
[  227.627899] active_anon:38613 inactive_anon:8174 isolated_anon:0
                active_file:25148 inactive_file:64173 isolated_file:0
                unevictable:742 dirty:0 writeback:0 unstable:0
                slab_reclaimable:29066 slab_unreclaimable:67304
                mapped:22876 shmem:2597 pagetables:1240 bounce:0
                free:65582521 free_pcp:1834 free_cma:0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-06  1:07     ` Hanjun Guo
  2017-01-06  5:22       ` Prakash B
@ 2017-01-06  8:37       ` Ard Biesheuvel
  2017-01-09  5:14         ` Hanjun Guo
  2017-01-09 11:53         ` Robert Richter
  1 sibling, 2 replies; 19+ messages in thread
From: Ard Biesheuvel @ 2017-01-06  8:37 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Robert Richter, Russell King, Catalin Marinas, Will Deacon,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On 6 January 2017 at 01:07, Hanjun Guo <hanjun.guo@linaro.org> wrote:
> On 2017/1/5 10:03, Hanjun Guo wrote:
>>
>> On 2017/1/4 21:56, Ard Biesheuvel wrote:
>>>
>>> On 16 December 2016 at 16:54, Robert Richter <rrichter@cavium.com> wrote:
>>>>
>>>> On ThunderX systems with certain memory configurations we see the
>>>> following BUG_ON():
>>>>
>>>>  kernel BUG at mm/page_alloc.c:1848!
>>>>
>>>> This happens for some configs with 64k page size enabled. The BUG_ON()
>>>> checks if start and end page of a memmap range belongs to the same
>>>> zone.
>>>>
>>>> The BUG_ON() check fails if a memory zone contains NOMAP regions. In
>>>> this case the node information of those pages is not initialized. This
>>>> causes an inconsistency of the page links with wrong zone and node
>>>> information for that pages. NOMAP pages from node 1 still point to the
>>>> mem zone from node 0 and have the wrong nid assigned.
>>>>
>>>> The reason for the mis-configuration is a change in pfn_valid() which
>>>> reports pages marked NOMAP as invalid:
>>>>
>>>>  68709f45385a arm64: only consider memblocks with NOMAP cleared for
>>>> linear mapping
>>>>
>>>> This causes pages marked as nomap being no longer reassigned to the
>>>> new zone in memmap_init_zone() by calling __init_single_pfn().
>>>>
>>>> Fixing this by implementing an arm64 specific early_pfn_valid(). This
>>>> causes all pages of sections with memory including NOMAP ranges to be
>>>> initialized by __init_single_page() and ensures consistency of page
>>>> links to zone, node and section.
>>>>
>>>
>>> I like this solution a lot better than the first one, but I am still
>>> somewhat uneasy about having the kernel reason about attributes of
>>> pages it should not touch in the first place. But the fact that
>>> early_pfn_valid() is only used a single time in the whole kernel does
>>> give some confidence that we are not simply moving the problem
>>> elsewhere.
>>>
>>> Given that you are touching arch/arm/ as well as arch/arm64, could you
>>> explain why only arm64 needs this treatment? Is it simply because we
>>> don't have NUMA support there?
>>>
>>> Considering that Hisilicon D05 suffered from the same issue, I would
>>> like to get some coverage there as well. Hanjun, is this something you
>>> can arrange? Thanks
>>
>>
>> Sure, we will test this patch with LTP MM stress test (which triggers
>> the bug on D05), and give the feedback.
>
>
> a update here, tested on 4.9,
>
>  - Applied Ard's two patches only
>  - Applied Robert's patch only
>
> Both of them can work fine on D05 with NUMA enabled, which means
> boot ok and LTP MM stress test is passed.
>

Thanks a lot Hanjun.

Any comments on the performance impact (including boot time) ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-06  5:22       ` Prakash B
@ 2017-01-09  5:09         ` Hanjun Guo
  2017-01-09  6:15           ` Prakash B
  0 siblings, 1 reply; 19+ messages in thread
From: Hanjun Guo @ 2017-01-09  5:09 UTC (permalink / raw)
  To: Prakash B
  Cc: Ard Biesheuvel, Robert Richter, Mark Rutland, Yisheng Xie,
	David Daney, Catalin Marinas, Will Deacon, Russell King,
	linux-kernel, linux-mm, James Morse, linux-arm-kernel

Hi Prakash,

On 2017/1/6 13:22, Prakash B wrote:
> Hi Hanjun,
>
>
>> a update here, tested on 4.9,
>>
>>  - Applied Ard's two patches only
>>  - Applied Robert's patch only
>>
>> Both of them can work fine on D05 with NUMA enabled, which means
>> boot ok and LTP MM stress test is passed.
>
> It is  not related to this patch set.
> LTP "cpuset01" test  crashes with latest 4.9,  4.10-rc1 and 4.10-rc2 kernels on
> Thunderx 2S .  Do you see any such behaviour on D05.

I didn't test "cpuset01" on D05 but according to the test in
Linaro, LTP full test is passed on D05 with Ard's 2 patches.

>
> Any idea what might be causing this issue.

Since it's not happening on D05, maybe it's related to
the firmware? (just a wild guess...)

Thanks
Hanjun

>
>
>   227.627546] cpuset01: page allocation stalls for 10096ms, order:0,
> mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
> [  227.627586] CPU: 53 PID: 11017 Comm: cpuset01 Not tainted 4.9.04kNUMA+ #2
> [  227.627591] Hardware name: www.cavium.com ThunderX Unknown/ThunderX
> Unknown, BIOS 0.3 Aug 24 2016
> [  227.627599] Call trace:
> [  227.627623] [<ffff000008089f10>] dump_backtrace+0x0/0x238
> [  227.627640] [<ffff00000808a16c>] show_stack+0x24/0x30
> [  227.627656] [<ffff00000846fb50>] dump_stack+0x94/0xb4
> [  227.627679] [<ffff0000081eb4f8>] warn_alloc+0x138/0x150
> [  227.627686] [<ffff0000081ec0a4>] __alloc_pages_nodemask+0xb04/0xcf0
> [  227.627697] [<ffff000008245988>] alloc_pages_vma+0xc8/0x270
> [  227.627715] [<ffff00000821f604>] handle_mm_fault+0xc8c/0xfd8
> [  227.627732] [<ffff00000809a488>] do_page_fault+0x2c0/0x368
> [  227.627744] [<ffff0000080812ec>] do_mem_abort+0x6c/0xe0
> [  227.627752] Exception stack(0xffff801f55823e00 to 0xffff801f55823f30)
> [  227.627763] 3e00: 0000000000000000 0000ffff92682000
> ffffffffffffffff 0000ffff9252b3e8
> [  227.627774] 3e20: 0000000020000000 0000000000000000
> 000000000000a000 0000000000000003
> [  227.627785] 3e40: 0000000000000022 ffffffffffffffff
> 0000000000000123 00000000000000de
> [  227.627793] 3e60: ffff000008972000 0000000000000015
> ffff801f55823e90 0000000000040900
> [  227.627800] 3e80: 0000000000000000 ffff0000080836f0
> 0000000000000000 0000ffff92682000
> [  227.627809] 3ea0: ffffffffffffffff 0000ffff92575d8c
> 0000000000000000 0000000000040900
> [  227.627819] 3ec0: 0000ffff92682000 00000000000000f7
> 0000000000004fc0 0000000000000022
> [  227.627828] 3ee0: 0000000000000000 0000000000000000
> 0000ffff925f5508 f7f7f7f7f7f7f7f7
> [  227.627838] 3f00: 0000ffff92686ff0 0000000000002ab8
> 0101010101010101 0000000000000020
> [  227.627847] 3f20: 0000000000000000 0000000000000000
> [  227.627858] [<ffff000008083324>] el0_da+0x18/0x1c
> [  227.627865] Mem-Info:
> [  227.627899] active_anon:38613 inactive_anon:8174 isolated_anon:0
>                 active_file:25148 inactive_file:64173 isolated_file:0
>                 unevictable:742 dirty:0 writeback:0 unstable:0
>                 slab_reclaimable:29066 slab_unreclaimable:67304
>                 mapped:22876 shmem:2597 pagetables:1240 bounce:0
>                 free:65582521 free_pcp:1834 free_cma:0
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-06  8:37       ` Ard Biesheuvel
@ 2017-01-09  5:14         ` Hanjun Guo
  2017-01-09 11:53         ` Robert Richter
  1 sibling, 0 replies; 19+ messages in thread
From: Hanjun Guo @ 2017-01-09  5:14 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Robert Richter, Russell King, Catalin Marinas, Will Deacon,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On 2017/1/6 16:37, Ard Biesheuvel wrote:
> On 6 January 2017 at 01:07, Hanjun Guo <hanjun.guo@linaro.org> wrote:
>> On 2017/1/5 10:03, Hanjun Guo wrote:
>>>
>>> On 2017/1/4 21:56, Ard Biesheuvel wrote:
>>>>
>>>> On 16 December 2016 at 16:54, Robert Richter <rrichter@cavium.com> wrote:
>>>>>
>>>>> On ThunderX systems with certain memory configurations we see the
>>>>> following BUG_ON():
>>>>>
>>>>>  kernel BUG at mm/page_alloc.c:1848!
>>>>>
>>>>> This happens for some configs with 64k page size enabled. The BUG_ON()
>>>>> checks if start and end page of a memmap range belongs to the same
>>>>> zone.
>>>>>
>>>>> The BUG_ON() check fails if a memory zone contains NOMAP regions. In
>>>>> this case the node information of those pages is not initialized. This
>>>>> causes an inconsistency of the page links with wrong zone and node
>>>>> information for that pages. NOMAP pages from node 1 still point to the
>>>>> mem zone from node 0 and have the wrong nid assigned.
>>>>>
>>>>> The reason for the mis-configuration is a change in pfn_valid() which
>>>>> reports pages marked NOMAP as invalid:
>>>>>
>>>>>  68709f45385a arm64: only consider memblocks with NOMAP cleared for
>>>>> linear mapping
>>>>>
>>>>> This causes pages marked as nomap being no longer reassigned to the
>>>>> new zone in memmap_init_zone() by calling __init_single_pfn().
>>>>>
>>>>> Fixing this by implementing an arm64 specific early_pfn_valid(). This
>>>>> causes all pages of sections with memory including NOMAP ranges to be
>>>>> initialized by __init_single_page() and ensures consistency of page
>>>>> links to zone, node and section.
>>>>>
>>>>
>>>> I like this solution a lot better than the first one, but I am still
>>>> somewhat uneasy about having the kernel reason about attributes of
>>>> pages it should not touch in the first place. But the fact that
>>>> early_pfn_valid() is only used a single time in the whole kernel does
>>>> give some confidence that we are not simply moving the problem
>>>> elsewhere.
>>>>
>>>> Given that you are touching arch/arm/ as well as arch/arm64, could you
>>>> explain why only arm64 needs this treatment? Is it simply because we
>>>> don't have NUMA support there?
>>>>
>>>> Considering that Hisilicon D05 suffered from the same issue, I would
>>>> like to get some coverage there as well. Hanjun, is this something you
>>>> can arrange? Thanks
>>>
>>>
>>> Sure, we will test this patch with LTP MM stress test (which triggers
>>> the bug on D05), and give the feedback.
>>
>>
>> a update here, tested on 4.9,
>>
>>  - Applied Ard's two patches only
>>  - Applied Robert's patch only
>>
>> Both of them can work fine on D05 with NUMA enabled, which means
>> boot ok and LTP MM stress test is passed.
>>
>
> Thanks a lot Hanjun.
>
> Any comments on the performance impact (including boot time) ?

Didn't collect the performance data yet, any recommended test
suite?  Is it sysbench ok? we can test it and collect the data.

Thanks
Hanjun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-09  5:09         ` Hanjun Guo
@ 2017-01-09  6:15           ` Prakash B
  0 siblings, 0 replies; 19+ messages in thread
From: Prakash B @ 2017-01-09  6:15 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Ard Biesheuvel, Robert Richter, Mark Rutland, Yisheng Xie,
	David Daney, Catalin Marinas, Will Deacon, Russell King,
	linux-kernel, linux-mm, James Morse, linux-arm-kernel

Thanks Hanjun ,


On Mon, Jan 9, 2017 at 10:39 AM, Hanjun Guo <hanjun.guo@linaro.org> wrote:
> Hi Prakash,
> I didn't test "cpuset01" on D05 but according to the test in
> Linaro, LTP full test is passed on D05 with Ard's 2 patches.
>
>>
>> Any idea what might be causing this issue.
>
>
> Since it's not happening on D05, maybe it's related to
> the firmware? (just a wild guess...)
>

 Used same firmware b/w 4.4 kernel and 4.9 (and   above kernels) .
Test passed wtih 4.4 kernel and didn't generated  any crashes or
dumps.

If there is more observation I will  send a  mail  or I will start a
separate mail thread.

Thanks,
Prakash B

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-06  8:37       ` Ard Biesheuvel
  2017-01-09  5:14         ` Hanjun Guo
@ 2017-01-09 11:53         ` Robert Richter
  2017-01-12 16:05           ` Will Deacon
  1 sibling, 1 reply; 19+ messages in thread
From: Robert Richter @ 2017-01-09 11:53 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Hanjun Guo, Russell King, Catalin Marinas, Will Deacon,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On 06.01.17 08:37:25, Ard Biesheuvel wrote:
> Any comments on the performance impact (including boot time) ?

I did a kernel compile test and kernel mode time increases by about
2.2%. Though this is already significant, we need a more suitable mem
benchmark here for further testing.

For boot time I dont see significant changes.

-Robert


Boot times:

pfn_valid_within():
[   25.929134]
[   25.548830]
[   25.503225]

early_pfn_valid() v3:
[   25.773814]
[   25.548428]
[   25.765290]


Kernel compile times (3 runs each):

pfn_valid_within():

real    6m4.088s
user    372m57.607s
sys     16m55.158s

real    6m1.532s
user    372m48.453s
sys     16m50.370s

real    6m4.061s
user    373m18.753s
sys     16m57.027s


early_pfn_valid() v3:

real    6m3.261s
user    373m15.816s
sys     16m30.019s

real    6m2.980s
user    373m9.019s
sys     16m32.992s

real    6m2.574s
user    372m45.146s
sys     16m33.218s

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-09 11:53         ` Robert Richter
@ 2017-01-12 16:05           ` Will Deacon
  2017-01-12 18:58             ` Robert Richter
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2017-01-12 16:05 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ard Biesheuvel, Hanjun Guo, Russell King, Catalin Marinas,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

Hi Robert,

On Mon, Jan 09, 2017 at 12:53:20PM +0100, Robert Richter wrote:
> On 06.01.17 08:37:25, Ard Biesheuvel wrote:
> > Any comments on the performance impact (including boot time) ?
> 
> I did a kernel compile test and kernel mode time increases by about
> 2.2%. Though this is already significant, we need a more suitable mem
> benchmark here for further testing.

Thanks for doing this.

> For boot time I dont see significant changes.
> 
> -Robert
> 
> 
> Boot times:
> 
> pfn_valid_within():
> [   25.929134]
> [   25.548830]
> [   25.503225]
> 
> early_pfn_valid() v3:
> [   25.773814]
> [   25.548428]
> [   25.765290]
> 
> 
> Kernel compile times (3 runs each):
> 
> pfn_valid_within():
> 
> real    6m4.088s
> user    372m57.607s
> sys     16m55.158s
> 
> real    6m1.532s
> user    372m48.453s
> sys     16m50.370s
> 
> real    6m4.061s
> user    373m18.753s
> sys     16m57.027s

Did you reboot the machine between each build here, or only when changing
kernel? If the latter, do you see variations in kernel build time by simply
rebooting the same Image?

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-12 16:05           ` Will Deacon
@ 2017-01-12 18:58             ` Robert Richter
  2017-01-13  9:19               ` Will Deacon
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Richter @ 2017-01-12 18:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ard Biesheuvel, Hanjun Guo, Russell King, Catalin Marinas,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On 12.01.17 16:05:36, Will Deacon wrote:
> On Mon, Jan 09, 2017 at 12:53:20PM +0100, Robert Richter wrote:

> > Kernel compile times (3 runs each):
> > 
> > pfn_valid_within():
> > 
> > real    6m4.088s
> > user    372m57.607s
> > sys     16m55.158s
> > 
> > real    6m1.532s
> > user    372m48.453s
> > sys     16m50.370s
> > 
> > real    6m4.061s
> > user    373m18.753s
> > sys     16m57.027s
> 
> Did you reboot the machine between each build here, or only when changing
> kernel? If the latter, do you see variations in kernel build time by simply
> rebooting the same Image?

I built it in a loop on the shell, so no reboots between builds. Note
that I was building the kernel in /dev/shm to not access harddisks. I
think build times should be comparable then since there is no fs
caching.

-Robert

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-12 18:58             ` Robert Richter
@ 2017-01-13  9:19               ` Will Deacon
  2017-01-13 13:15                 ` Robert Richter
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2017-01-13  9:19 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ard Biesheuvel, Hanjun Guo, Russell King, Catalin Marinas,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On Thu, Jan 12, 2017 at 07:58:25PM +0100, Robert Richter wrote:
> On 12.01.17 16:05:36, Will Deacon wrote:
> > On Mon, Jan 09, 2017 at 12:53:20PM +0100, Robert Richter wrote:
> 
> > > Kernel compile times (3 runs each):
> > > 
> > > pfn_valid_within():
> > > 
> > > real    6m4.088s
> > > user    372m57.607s
> > > sys     16m55.158s
> > > 
> > > real    6m1.532s
> > > user    372m48.453s
> > > sys     16m50.370s
> > > 
> > > real    6m4.061s
> > > user    373m18.753s
> > > sys     16m57.027s
> > 
> > Did you reboot the machine between each build here, or only when changing
> > kernel? If the latter, do you see variations in kernel build time by simply
> > rebooting the same Image?
> 
> I built it in a loop on the shell, so no reboots between builds. Note
> that I was building the kernel in /dev/shm to not access harddisks. I
> think build times should be comparable then since there is no fs
> caching.

I guess I'm really asking what the standard deviation is if you *do* reboot
between builds, using the same kernel. It's hard to tell whether the numbers
are due to the patches, or just because of noise incurred by the way things
happen to initialise.

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-13  9:19               ` Will Deacon
@ 2017-01-13 13:15                 ` Robert Richter
  2017-01-17 10:00                   ` Robert Richter
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Richter @ 2017-01-13 13:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ard Biesheuvel, Hanjun Guo, Russell King, Catalin Marinas,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On 13.01.17 09:19:04, Will Deacon wrote:
> On Thu, Jan 12, 2017 at 07:58:25PM +0100, Robert Richter wrote:
> > On 12.01.17 16:05:36, Will Deacon wrote:
> > > On Mon, Jan 09, 2017 at 12:53:20PM +0100, Robert Richter wrote:
> > 
> > > > Kernel compile times (3 runs each):
> > > > 
> > > > pfn_valid_within():
> > > > 
> > > > real    6m4.088s
> > > > user    372m57.607s
> > > > sys     16m55.158s
> > > > 
> > > > real    6m1.532s
> > > > user    372m48.453s
> > > > sys     16m50.370s
> > > > 
> > > > real    6m4.061s
> > > > user    373m18.753s
> > > > sys     16m57.027s
> > > 
> > > Did you reboot the machine between each build here, or only when changing
> > > kernel? If the latter, do you see variations in kernel build time by simply
> > > rebooting the same Image?
> > 
> > I built it in a loop on the shell, so no reboots between builds. Note
> > that I was building the kernel in /dev/shm to not access harddisks. I
> > think build times should be comparable then since there is no fs
> > caching.
> 
> I guess I'm really asking what the standard deviation is if you *do* reboot
> between builds, using the same kernel. It's hard to tell whether the numbers
> are due to the patches, or just because of noise incurred by the way things
> happen to initialise.

Ok, I am going to test this.

-Robert

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-13 13:15                 ` Robert Richter
@ 2017-01-17 10:00                   ` Robert Richter
  2017-01-17 19:16                     ` Will Deacon
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Richter @ 2017-01-17 10:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ard Biesheuvel, Hanjun Guo, Russell King, Catalin Marinas,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On 13.01.17 14:15:00, Robert Richter wrote:
> On 13.01.17 09:19:04, Will Deacon wrote:
> > On Thu, Jan 12, 2017 at 07:58:25PM +0100, Robert Richter wrote:
> > > On 12.01.17 16:05:36, Will Deacon wrote:
> > > > On Mon, Jan 09, 2017 at 12:53:20PM +0100, Robert Richter wrote:
> > > 
> > > > > Kernel compile times (3 runs each):
> > > > > 
> > > > > pfn_valid_within():
> > > > > 
> > > > > real    6m4.088s
> > > > > user    372m57.607s
> > > > > sys     16m55.158s
> > > > > 
> > > > > real    6m1.532s
> > > > > user    372m48.453s
> > > > > sys     16m50.370s
> > > > > 
> > > > > real    6m4.061s
> > > > > user    373m18.753s
> > > > > sys     16m57.027s
> > > > 
> > > > Did you reboot the machine between each build here, or only when changing
> > > > kernel? If the latter, do you see variations in kernel build time by simply
> > > > rebooting the same Image?
> > > 
> > > I built it in a loop on the shell, so no reboots between builds. Note
> > > that I was building the kernel in /dev/shm to not access harddisks. I
> > > think build times should be comparable then since there is no fs
> > > caching.
> > 
> > I guess I'm really asking what the standard deviation is if you *do* reboot
> > between builds, using the same kernel. It's hard to tell whether the numbers
> > are due to the patches, or just because of noise incurred by the way things
> > happen to initialise.
> 
> Ok, I am going to test this.

See below the data for a test with reboots between every 3 builds (9
builds per kernel). Though some deviation can be seen between reboots
there is a trend.

-Robert



pfn_valid_within(), boot #1:

real	6m0.007s
user	372m55.709s
sys	16m45.962s

real	5m58.718s
user	372m58.852s
sys	16m47.675s

real	5m58.481s
user	372m56.172s
sys	16m46.953s

pfn_valid_within(), Boot #2:

real	6m1.163s
user	372m57.282s
sys	16m52.025s

real	6m0.562s
user	373m4.957s
sys	16m52.847s

real	6m0.030s
user	372m54.710s
sys	16m54.516s

pfn_valid_within(), Boot #3:

real	6m1.784s
user	373m13.379s
sys	16m48.388s

real	5m58.579s
user	373m10.403s
sys	16m47.628s

real	5m59.151s
user	373m0.084s
sys	16m50.634s

early_pfn_valid(), Boot #1:

real	5m59.902s
user	372m57.201s
sys	16m42.157s

real	5m59.510s
user	372m59.762s
sys	16m47.331s

real	5m58.559s
user	372m46.530s
sys	16m49.010s

early_pfn_valid(), Boot #2:

real	6m0.652s
user	373m10.785s
sys	16m25.138s

real	5m58.663s
user	373m4.498s
sys	16m28.262s

real	5m57.675s
user	373m6.174s
sys	16m28.653s

early_pfn_valid(), Boot #3:

real	5m59.680s
user	373m4.007s
sys	16m26.781s

real	5m58.234s
user	372m58.895s
sys	16m26.957s

real	5m58.707s
user	372m40.546s
sys	16m29.345s

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-17 10:00                   ` Robert Richter
@ 2017-01-17 19:16                     ` Will Deacon
  2017-02-03 15:14                       ` Robert Richter
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2017-01-17 19:16 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ard Biesheuvel, Hanjun Guo, Russell King, Catalin Marinas,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On Tue, Jan 17, 2017 at 11:00:15AM +0100, Robert Richter wrote:
> On 13.01.17 14:15:00, Robert Richter wrote:
> > On 13.01.17 09:19:04, Will Deacon wrote:
> > > On Thu, Jan 12, 2017 at 07:58:25PM +0100, Robert Richter wrote:
> > > > On 12.01.17 16:05:36, Will Deacon wrote:
> > > > > On Mon, Jan 09, 2017 at 12:53:20PM +0100, Robert Richter wrote:
> > > > 
> > > > > > Kernel compile times (3 runs each):
> > > > > > 
> > > > > > pfn_valid_within():
> > > > > > 
> > > > > > real    6m4.088s
> > > > > > user    372m57.607s
> > > > > > sys     16m55.158s
> > > > > > 
> > > > > > real    6m1.532s
> > > > > > user    372m48.453s
> > > > > > sys     16m50.370s
> > > > > > 
> > > > > > real    6m4.061s
> > > > > > user    373m18.753s
> > > > > > sys     16m57.027s
> > > > > 
> > > > > Did you reboot the machine between each build here, or only when changing
> > > > > kernel? If the latter, do you see variations in kernel build time by simply
> > > > > rebooting the same Image?
> > > > 
> > > > I built it in a loop on the shell, so no reboots between builds. Note
> > > > that I was building the kernel in /dev/shm to not access harddisks. I
> > > > think build times should be comparable then since there is no fs
> > > > caching.
> > > 
> > > I guess I'm really asking what the standard deviation is if you *do* reboot
> > > between builds, using the same kernel. It's hard to tell whether the numbers
> > > are due to the patches, or just because of noise incurred by the way things
> > > happen to initialise.
> > 
> > Ok, I am going to test this.
> 
> See below the data for a test with reboots between every 3 builds (9
> builds per kernel). Though some deviation can be seen between reboots
> there is a trend.

I can't really see the trend given that, for system time, your
pfn_valid_within results have a variance of ~9 and the early_pfn_valid
results have a variance of ~92. Given that the variance seems to come
about due to the reboots, I think we need more numbers to establish whether
the data sets end up largely overlapping or if they really are disjoint.

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-01-17 19:16                     ` Will Deacon
@ 2017-02-03 15:14                       ` Robert Richter
  2017-02-03 18:16                         ` Will Deacon
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Richter @ 2017-02-03 15:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ard Biesheuvel, Hanjun Guo, Russell King, Catalin Marinas,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On 17.01.17 19:16:56, Will Deacon wrote:
> I can't really see the trend given that, for system time, your
> pfn_valid_within results have a variance of ~9 and the early_pfn_valid
> results have a variance of ~92. Given that the variance seems to come
> about due to the reboots, I think we need more numbers to establish whether
> the data sets end up largely overlapping or if they really are disjoint.

Assuming the numbers of both versions are not significant, please
apply one or the other.

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] arm64: mm: Fix NOMAP page initialization
  2017-02-03 15:14                       ` Robert Richter
@ 2017-02-03 18:16                         ` Will Deacon
  0 siblings, 0 replies; 19+ messages in thread
From: Will Deacon @ 2017-02-03 18:16 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ard Biesheuvel, Hanjun Guo, Russell King, Catalin Marinas,
	David Daney, Mark Rutland, James Morse, Yisheng Xie,
	linux-arm-kernel, linux-kernel, linux-mm

On Fri, Feb 03, 2017 at 04:14:14PM +0100, Robert Richter wrote:
> On 17.01.17 19:16:56, Will Deacon wrote:
> > I can't really see the trend given that, for system time, your
> > pfn_valid_within results have a variance of ~9 and the early_pfn_valid
> > results have a variance of ~92. Given that the variance seems to come
> > about due to the reboots, I think we need more numbers to establish whether
> > the data sets end up largely overlapping or if they really are disjoint.
> 
> Assuming the numbers of both versions are not significant, please
> apply one or the other.

I'd rather apply the pfn_valid_within version, but please can you Ack the
patch first, since there was some confusion when it was posted about a
translation fault that was never reproduced.

Thanks,

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-02-03 18:16 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-16 16:54 [PATCH v3] arm64: mm: Fix NOMAP page initialization Robert Richter
2017-01-04 13:56 ` Ard Biesheuvel
2017-01-05  2:03   ` Hanjun Guo
2017-01-06  1:07     ` Hanjun Guo
2017-01-06  5:22       ` Prakash B
2017-01-09  5:09         ` Hanjun Guo
2017-01-09  6:15           ` Prakash B
2017-01-06  8:37       ` Ard Biesheuvel
2017-01-09  5:14         ` Hanjun Guo
2017-01-09 11:53         ` Robert Richter
2017-01-12 16:05           ` Will Deacon
2017-01-12 18:58             ` Robert Richter
2017-01-13  9:19               ` Will Deacon
2017-01-13 13:15                 ` Robert Richter
2017-01-17 10:00                   ` Robert Richter
2017-01-17 19:16                     ` Will Deacon
2017-02-03 15:14                       ` Robert Richter
2017-02-03 18:16                         ` Will Deacon
2017-01-05 11:03   ` Robert Richter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).