* [PATCH] arm64: fix MAX_ORDER for 64K pagesize
@ 2014-06-11 21:33 ` Mark Salter
  0 siblings, 0 replies; 28+ messages in thread
From: Mark Salter @ 2014-06-11 21:33 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-arm-kernel, linux-kernel, Mark Salter

With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE
I get this at early boot:

  SMP: Total of 8 processors activated.
  devtmpfs: initialized
  Unable to handle kernel NULL pointer dereference at virtual address 00000008
  pgd = fffffe0000050000
  [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
  Internal error: Oops: 96000006 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
  task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
  PC is at __list_add+0x10/0xd4
  LR is at free_one_page+0x270/0x638
  ...
  Call trace:
  [<fffffe00003ee970>] __list_add+0x10/0xd4
  [<fffffe000019c478>] free_one_page+0x26c/0x638
  [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
  [<fffffe000019d5e8>] __free_pages+0x74/0xbc
  [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
  [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
  [<fffffe0000090418>] do_one_initcall+0xc4/0x154
  [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
  [<fffffe00007520a0>] kernel_init+0xc/0xd4

This happens in this configuration because __free_one_page() is called
with an order greater than MAX_ORDER, accesses past zone->free_list[]
and passes a bogus list_head to list_add().

arch/arm64/Kconfig has:

  config FORCE_MAX_ZONEORDER
	int
	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
	default "11"

So with THP turned off MAX_ORDER == 11 but init_cma_reserved_pageblock()
passes __free_pages() an order of pageblock_order which is based on
(HPAGE_SHIFT - PAGE_SHIFT) which is 13 for 64K pages. I worked around
this by removing the THP test so FORCE_MAX_ZONEORDER is always 14 for
ARM64_64K_PAGES.

Signed-off-by: Mark Salter <msalter@redhat.com>
---
 arch/arm64/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7295419..42a334e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -269,7 +269,7 @@ config XEN
 
 config FORCE_MAX_ZONEORDER
 	int
-	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
+	default "14" if ARM64_64K_PAGES
 	default "11"
 
 endmenu
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread
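
The arithmetic behind the oops above can be modelled in a few lines of
user-space C. This is only a sketch, not kernel code: the values 11 and 13
come from the report, while the names PAGE_SHIFT_64K, HPAGE_SHIFT_64K,
struct free_area_model, order_fits() and account_free() are hypothetical,
and the huge-page shift of 29 is an assumption chosen so that the
difference matches the reported 13.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values taken from the report; all names here are illustrative only. */
#define PAGE_SHIFT_64K   16	/* 64K pages: 1 << 16 bytes */
#define HPAGE_SHIFT_64K  29	/* assumed, so that the difference is 13 */
#define MAX_ORDER_NO_THP 11	/* FORCE_MAX_ZONEORDER default with THP off */

/* Simplified stand-in for the per-order free lists of a zone. */
struct free_area_model {
	size_t nr_free;
};

/* One slot per order, 0 .. MAX_ORDER_NO_THP - 1. */
struct free_area_model free_area[MAX_ORDER_NO_THP];

/* pageblock_order as described in the report: HPAGE_SHIFT - PAGE_SHIFT. */
unsigned int pageblock_order_model(void)
{
	return HPAGE_SHIFT_64K - PAGE_SHIFT_64K;
}

/* True if free_area[order] is a valid slot of the array. */
bool order_fits(unsigned int order)
{
	return order < sizeof(free_area) / sizeof(free_area[0]);
}

/* Record one freed block; the guard exists only in this model. */
void account_free(unsigned int order)
{
	assert(order_fits(order));
	free_area[order].nr_free++;
}
```

Calling order_fits(pageblock_order_model()) returns false in this model,
which is the out-of-bounds index the report describes; with a maximum
order of 14 the same order would fit.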

* Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize
  2014-06-11 21:33 ` Mark Salter
@ 2014-06-11 23:03   ` David Rientjes
  -1 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2014-06-11 23:03 UTC (permalink / raw)
  To: Mark Salter, Michal Nazarewicz, Marek Szyprowski
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel

On Wed, 11 Jun 2014, Mark Salter wrote:

> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE
> I get this at early boot:
> 
>   SMP: Total of 8 processors activated.
>   devtmpfs: initialized
>   Unable to handle kernel NULL pointer dereference at virtual address 00000008
>   pgd = fffffe0000050000
>   [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
>   Internal error: Oops: 96000006 [#1] SMP
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
>   task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
>   PC is at __list_add+0x10/0xd4
>   LR is at free_one_page+0x270/0x638
>   ...
>   Call trace:
>   [<fffffe00003ee970>] __list_add+0x10/0xd4
>   [<fffffe000019c478>] free_one_page+0x26c/0x638
>   [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
>   [<fffffe000019d5e8>] __free_pages+0x74/0xbc
>   [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
>   [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
>   [<fffffe0000090418>] do_one_initcall+0xc4/0x154
>   [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
>   [<fffffe00007520a0>] kernel_init+0xc/0xd4
> 
> This happens in this configuration because __free_one_page() is called
> with an order greater than MAX_ORDER, accesses past zone->free_list[]
> and passes a bogus list_head to list_add().
> 
> arch/arm64/Kconfig has:
> 
>   config FORCE_MAX_ZONEORDER
> 	int
> 	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> 	default "11"
> 
> So with THP turned off MAX_ORDER == 11 but init_cma_reserved_pageblock()
> passes __free_pages() an order of pageblock_order which is based on
> (HPAGE_SHIFT - PAGE_SHIFT) which is 13 for 64K pages. I worked around
> this by removing the THP test so FORCE_MAX_ZONEORDER is always 14 for
> ARM64_64K_PAGES.
> 
> Signed-off-by: Mark Salter <msalter@redhat.com>
> ---
>  arch/arm64/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7295419..42a334e 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -269,7 +269,7 @@ config XEN
>  
>  config FORCE_MAX_ZONEORDER
>  	int
> -	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> +	default "14" if ARM64_64K_PAGES
>  	default "11"
>  
>  endmenu

Any reason to not switch this to

	ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA

instead?  If pageblock_order > MAX_ORDER because of 
HPAGE_SHIFT > PAGE_SHIFT, then cma is always going to be passing a 
too-large-order to free_pages_prepare() via this path.

Adding Michal and Marek to the cc.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize
  2014-06-11 23:03   ` David Rientjes
@ 2014-06-11 23:04     ` David Rientjes
  -1 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2014-06-11 23:04 UTC (permalink / raw)
  To: Mark Salter, Michal Nazarewicz, Marek Szyprowski
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel

On Wed, 11 Jun 2014, David Rientjes wrote:

> Any reason to not switch this to
> 
> 	ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA
> 

(ARM64_64K_PAGES) && (TRANSPARENT_HUGEPAGE || CMA)

> instead?  If pageblock_order > MAX_ORDER because of 
> HPAGE_SHIFT > PAGE_SHIFT, then cma is always going to be passing a 
> too-large-order to free_pages_prepare() via this path.
> 
> Adding Michal and Marek to the cc.
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread
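
The difference between the Kconfig conditions discussed so far can be
checked by evaluating them for the reported configuration (64K pages, THP
off, CMA on, since the trace goes through cma_init_reserved_areas()). The
sketch below is a user-space model only; the function names and the enum
are hypothetical, and the values 14 and 11 are the two defaults from
arch/arm64/Kconfig quoted in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* The two FORCE_MAX_ZONEORDER defaults quoted in the thread. */
enum { ZONEORDER_LARGE = 14, ZONEORDER_DEFAULT = 11 };

/* Existing condition: ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE. */
int zoneorder_original(bool pages_64k, bool thp, bool cma)
{
	(void)cma;
	return (pages_64k && thp) ? ZONEORDER_LARGE : ZONEORDER_DEFAULT;
}

/* First suggestion: ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA. */
int zoneorder_and(bool pages_64k, bool thp, bool cma)
{
	return (pages_64k && thp && cma) ? ZONEORDER_LARGE : ZONEORDER_DEFAULT;
}

/* Corrected suggestion: ARM64_64K_PAGES && (TRANSPARENT_HUGEPAGE || CMA). */
int zoneorder_or(bool pages_64k, bool thp, bool cma)
{
	return (pages_64k && (thp || cma)) ? ZONEORDER_LARGE : ZONEORDER_DEFAULT;
}

/* Posted patch: ARM64_64K_PAGES alone. */
int zoneorder_patch(bool pages_64k, bool thp, bool cma)
{
	(void)thp;
	(void)cma;
	return pages_64k ? ZONEORDER_LARGE : ZONEORDER_DEFAULT;
}

/* Reported configuration: 64K pages, THP off, CMA on. */
void check_reported_config(void)
{
	assert(zoneorder_original(true, false, true) == ZONEORDER_DEFAULT);
	assert(zoneorder_and(true, false, true) == ZONEORDER_DEFAULT);
	assert(zoneorder_or(true, false, true) == ZONEORDER_LARGE);
	assert(zoneorder_patch(true, false, true) == ZONEORDER_LARGE);
}
```

In this model the "&&" form still yields 11 for the failing configuration,
while the "||" form and the posted patch both yield 14.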

* Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize
  2014-06-11 23:04     ` David Rientjes
@ 2014-06-12 13:57       ` Mark Salter
  -1 siblings, 0 replies; 28+ messages in thread
From: Mark Salter @ 2014-06-12 13:57 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Nazarewicz, Marek Szyprowski, Catalin Marinas,
	linux-arm-kernel, linux-kernel

On Wed, 2014-06-11 at 16:04 -0700, David Rientjes wrote:
> On Wed, 11 Jun 2014, David Rientjes wrote:
> 
> > Any reason to not switch this to
> > 
> > 	ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA
> > 
> 
> (ARM64_64K_PAGES) && (TRANSPARENT_HUGEPAGE || CMA)

And add HUGETLB to the list also? I'm not sure of all the trade-offs
here, so I kept it simple. I don't have a strong opinion one way or
the other.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize
  2014-06-11 23:03   ` David Rientjes
@ 2014-06-17 18:32     ` Michal Nazarewicz
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Nazarewicz @ 2014-06-17 18:32 UTC (permalink / raw)
  To: David Rientjes, Mark Salter, Marek Szyprowski
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel

On Wed, Jun 11 2014, David Rientjes wrote:
> On Wed, 11 Jun 2014, Mark Salter wrote:
>
>> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE
>> I get this at early boot:
>> 
>>   SMP: Total of 8 processors activated.
>>   devtmpfs: initialized
>>   Unable to handle kernel NULL pointer dereference at virtual address 00000008
>>   pgd = fffffe0000050000
>>   [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
>>   Internal error: Oops: 96000006 [#1] SMP
>>   Modules linked in:
>>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
>>   task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
>>   PC is at __list_add+0x10/0xd4
>>   LR is at free_one_page+0x270/0x638
>>   ...
>>   Call trace:
>>   [<fffffe00003ee970>] __list_add+0x10/0xd4
>>   [<fffffe000019c478>] free_one_page+0x26c/0x638
>>   [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
>>   [<fffffe000019d5e8>] __free_pages+0x74/0xbc
>>   [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
>>   [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
>>   [<fffffe0000090418>] do_one_initcall+0xc4/0x154
>>   [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
>>   [<fffffe00007520a0>] kernel_init+0xc/0xd4
>> 
>> This happens in this configuration because __free_one_page() is called
>> with an order greater than MAX_ORDER, accesses past zone->free_list[]
>> and passes a bogus list_head to list_add().
>> 
>> arch/arm64/Kconfig has:
>> 
>>   config FORCE_MAX_ZONEORDER
>> 	int
>> 	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
>> 	default "11"
>> 
>> So with THP turned off MAX_ORDER == 11 but init_cma_reserved_pageblock()
>> passes __free_pages() an order of pageblock_order which is based on
>> (HPAGE_SHIFT - PAGE_SHIFT) which is 13 for 64K pages. I worked around
>> this by removing the THP test so FORCE_MAX_ZONEORDER is always 14 for
>> ARM64_64K_PAGES.
>> 
>> Signed-off-by: Mark Salter <msalter@redhat.com>
>> ---
>>  arch/arm64/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 7295419..42a334e 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -269,7 +269,7 @@ config XEN
>>  
>>  config FORCE_MAX_ZONEORDER
>>  	int
>> -	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
>> +	default "14" if ARM64_64K_PAGES
>>  	default "11"
>>  
>>  endmenu
>
> Any reason to not switch this to
>
> 	ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA
>
> instead?  If pageblock_order > MAX_ORDER because of 
> HPAGE_SHIFT > PAGE_SHIFT, then cma is always going to be passing a 
> too-large-order to free_pages_prepare() via this path.
>
> Adding Michal and Marek to the cc.

The correct fix would be to change init_cma_reserved_pageblock() so that
it checks whether pageblock_order > MAX_ORDER and, if so, frees each
max-order page of the pageblock individually:

--------- >8 ---------------------------------------------------------
From: Michal Nazarewicz <mina86@mina86.com>
Subject: [PATCH] mm: cma: fix cases where pageblock is bigger than MAX_ORDER

With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
the following is triggered at early boot:

  SMP: Total of 8 processors activated.
  devtmpfs: initialized
  Unable to handle kernel NULL pointer dereference at virtual address 00000008
  pgd = fffffe0000050000
  [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
  Internal error: Oops: 96000006 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
  task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
  PC is at __list_add+0x10/0xd4
  LR is at free_one_page+0x270/0x638
  ...
  Call trace:
  [<fffffe00003ee970>] __list_add+0x10/0xd4
  [<fffffe000019c478>] free_one_page+0x26c/0x638
  [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
  [<fffffe000019d5e8>] __free_pages+0x74/0xbc
  [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
  [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
  [<fffffe0000090418>] do_one_initcall+0xc4/0x154
  [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
  [<fffffe00007520a0>] kernel_init+0xc/0xd4

This happens in this configuration because __free_one_page() is called
with an order greater than MAX_ORDER, accesses past zone->free_list[]
and passes a bogus list_head to list_add().

arch/arm64/Kconfig has:

  config FORCE_MAX_ZONEORDER
	int
	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
	default "11"

So with THP turned off MAX_ORDER == 11 but init_cma_reserved_pageblock()
passes __free_pages() an order of pageblock_order which is based on
(HPAGE_SHIFT - PAGE_SHIFT) which is 13 for 64K pages.

Fix the problem by changing init_cma_reserved_pageblock() such that it
splits the pageblock into individual MAX_ORDER pages if the pageblock is
bigger than a MAX_ORDER page.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Reported-by: Mark Salter <msalter@redhat.com>
---
 mm/page_alloc.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..6e657ce 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -801,7 +801,15 @@ void __init init_cma_reserved_pageblock(struct page *page)
 
 	set_page_refcounted(page);
 	set_pageblock_migratetype(page, MIGRATE_CMA);
-	__free_pages(page, pageblock_order);
+	if (pageblock_order > MAX_ORDER) {
+		struct page *subpage = p;
+		unsigned count = 1 << (pageblock_order - MAX_ORDER);
+		do {
+			__free_pages(subpage, pageblock_order);
+		} while (subpage += MAX_ORDER_NR_PAGES, --count);
+	} else {
+		__free_pages(page, pageblock_order);
+	}
 	adjust_managed_page_count(page, pageblock_nr_pages);
 }
 #endif
--------- >8 ---------------------------------------------------------

Thoughts?  This has not been tested and I think it may cause performance
degradation in some cases since pageblock_order is not always
a constant, so the comparison may end up not being stripped away even on
systems where it's always false.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply related	[flat|nested] 28+ messages in thread
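
The splitting idea in the message above can be tried out in user space.
The following is a rough model, not the posted patch: struct free_stats,
free_pages_model() and free_pageblock_model() are hypothetical names, and
the sketch assumes that free lists exist only for orders 0 .. MAX_ORDER-1,
so the largest block handed back is of order MAX_ORDER-1 and
MAX_ORDER_NR_PAGES is 1 << (MAX_ORDER - 1). Under that assumption the
comparison is ">=" and the chunk count is derived from the number of pages
left, which differs from the untested patch above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER          11	/* THP off, from the thread */
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))	/* assumed largest block */
#define PAGEBLOCK_ORDER    13	/* 64K pages, from the thread */
#define PAGEBLOCK_NR_PAGES (1UL << PAGEBLOCK_ORDER)

/* Counters standing in for the allocator's free lists. */
struct free_stats {
	unsigned long calls;	/* number of blocks handed back */
	unsigned long pages;	/* total pages handed back */
};

/* Model of freeing one block: it only records what was freed. */
void free_pages_model(struct free_stats *st, unsigned long pfn,
		      unsigned int order)
{
	assert(order < MAX_ORDER);			/* valid free-list index */
	assert((pfn & ((1UL << order) - 1)) == 0);	/* naturally aligned */
	st->calls++;
	st->pages += 1UL << order;
}

/* Free one pageblock starting at pfn, splitting it if it is too large. */
void free_pageblock_model(struct free_stats *st, unsigned long pfn)
{
	if (PAGEBLOCK_ORDER >= MAX_ORDER) {
		unsigned long left = PAGEBLOCK_NR_PAGES;

		do {
			free_pages_model(st, pfn, MAX_ORDER - 1);
			pfn += MAX_ORDER_NR_PAGES;
			left -= MAX_ORDER_NR_PAGES;
		} while (left);
	} else {
		free_pages_model(st, pfn, PAGEBLOCK_ORDER);
	}
}
```

With these constants, one call to free_pageblock_model() on a zeroed
struct free_stats records 8 blocks of 1024 pages each, covering all 8192
pages of the pageblock.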

* Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize
  2014-06-17 18:32     ` Michal Nazarewicz
@ 2014-06-19 18:12       ` Mark Salter
  -1 siblings, 0 replies; 28+ messages in thread
From: Mark Salter @ 2014-06-19 18:12 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: David Rientjes, Marek Szyprowski, Catalin Marinas,
	linux-arm-kernel, linux-kernel

On Tue, 2014-06-17 at 20:32 +0200, Michal Nazarewicz wrote:
> On Wed, Jun 11 2014, David Rientjes wrote:
> > On Wed, 11 Jun 2014, Mark Salter wrote:
> >
> >> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE
> >> I get this at early boot:
> >> 
> >>   SMP: Total of 8 processors activated.
> >>   devtmpfs: initialized
> >>   Unable to handle kernel NULL pointer dereference at virtual address 00000008
> >>   pgd = fffffe0000050000
> >>   [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
> >>   Internal error: Oops: 96000006 [#1] SMP
> >>   Modules linked in:
> >>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
> >>   task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
> >>   PC is at __list_add+0x10/0xd4
> >>   LR is at free_one_page+0x270/0x638
> >>   ...
> >>   Call trace:
> >>   [<fffffe00003ee970>] __list_add+0x10/0xd4
> >>   [<fffffe000019c478>] free_one_page+0x26c/0x638
> >>   [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
> >>   [<fffffe000019d5e8>] __free_pages+0x74/0xbc
> >>   [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
> >>   [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
> >>   [<fffffe0000090418>] do_one_initcall+0xc4/0x154
> >>   [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
> >>   [<fffffe00007520a0>] kernel_init+0xc/0xd4
> >> 
> >> This happens in this configuration because __free_one_page() is called
> >> with an order greater than MAX_ORDER, accesses past zone->free_list[]
> >> and passes a bogus list_head to list_add().
> >> 
> >> arch/arm64/Kconfig has:
> >> 
> >>   config FORCE_MAX_ZONEORDER
> >> 	int
> >> 	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> >> 	default "11"
> >> 
> >> So with THP turned off MAX_ORDER == 11 but init_cma_reserved_pageblock()
> >> passes __free_pages() an order of pageblock_order which is based on
> >> (HPAGE_SHIFT - PAGE_SHIFT) which is 13 for 64K pages. I worked around
> >> this by removing the THP test so FORCE_MAX_ZONEORDER is always 14 for
> >> ARM64_64K_PAGES.
> >> 
> >> Signed-off-by: Mark Salter <msalter@redhat.com>
> >> ---
> >>  arch/arm64/Kconfig | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >> index 7295419..42a334e 100644
> >> --- a/arch/arm64/Kconfig
> >> +++ b/arch/arm64/Kconfig
> >> @@ -269,7 +269,7 @@ config XEN
> >>  
> >>  config FORCE_MAX_ZONEORDER
> >>  	int
> >> -	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> >> +	default "14" if ARM64_64K_PAGES
> >>  	default "11"
> >>  
> >>  endmenu
> >
> > Any reason to not switch this to
> >
> > 	ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA
> >
> > instead?  If pageblock_order > MAX_ORDER because of 
> > HPAGE_SHIFT > PAGE_SHIFT, then cma is always going to be passing a 
> > too-large-order to free_pages_prepare() via this path.
> >
> > Adding Michal and Marek to the cc.
> 
> The correct fix would be to change init_cma_reserved_pageblock such that
> it checks whether pageblock_order > MAX_ORDER and if so frees each max
> order page of the pageblock individually:
> 
> --------- >8 ---------------------------------------------------------
> From: Michal Nazarewicz <mina86@mina86.com>
> Subject: [PATCH] mm: cma: fix cases where pageblock is bigger than MAX_ORDER
> 
> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
> the following is triggered at early boot:
> 
>   SMP: Total of 8 processors activated.
>   devtmpfs: initialized
>   Unable to handle kernel NULL pointer dereference at virtual address 00000008
>   pgd = fffffe0000050000
>   [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
>   Internal error: Oops: 96000006 [#1] SMP
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
>   task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
>   PC is at __list_add+0x10/0xd4
>   LR is at free_one_page+0x270/0x638
>   ...
>   Call trace:
>   [<fffffe00003ee970>] __list_add+0x10/0xd4
>   [<fffffe000019c478>] free_one_page+0x26c/0x638
>   [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
>   [<fffffe000019d5e8>] __free_pages+0x74/0xbc
>   [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
>   [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
>   [<fffffe0000090418>] do_one_initcall+0xc4/0x154
>   [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
>   [<fffffe00007520a0>] kernel_init+0xc/0xd4
> 
> This happens in this configuration because __free_one_page() is called
> with an order greater than MAX_ORDER, accesses past zone->free_list[]
> and passes a bogus list_head to list_add().
> 
> arch/arm64/Kconfig has:
> 
>   config FORCE_MAX_ZONEORDER
> 	int
> 	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> 	default "11"
> 
> So with THP turned off MAX_ORDER == 11 but init_cma_reserved_pageblock()
> passes __free_pages() an order of pageblock_order which is based on
> (HPAGE_SHIFT - PAGE_SHIFT) which is 13 for 64K pages.
> 
> Fix the problem by changing init_cma_reserved_pageblock() such that it
> splits pageblock into individual MAX_ORDER pages if pageblock is
> bigger than a MAX_ORDER page.
> 
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Reported-by: Mark Salter <msalter@redhat.com>
> ---
>  mm/page_alloc.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..6e657ce 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -801,7 +801,15 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  
>  	set_page_refcounted(page);
>  	set_pageblock_migratetype(page, MIGRATE_CMA);
> -	__free_pages(page, pageblock_order);
> +	if (pageblock_order > MAX_ORDER) {
> +		struct page *subpage = p;
> +		unsigned count = 1 << (pageblock_order - MAX_ORDER);
> +		do {
> +			__free_pages(subpage, pageblock_order);
                                               ^^^^^^^
                                               MAX_ORDER

> +		} while (subpage += MAX_ORDER_NR_PAGES, --count);
> +	} else {
> +		__free_pages(page, pageblock_order);
> +	}
>  	adjust_managed_page_count(page, pageblock_nr_pages);
>  }
>  #endif
> --------- >8 ---------------------------------------------------------
> 
> Thoughts?  This has not been tested and I think it may cause performance
> degradation in some cases since pageblock_order is not always
> a constant, so the comparison may end up not being stripped away even on
> systems where it's always false.
> 

This works with the above tweak. So it fixes the problem here, but I was
not sure if we'd get bitten elsewhere by pageblock_order > MAX_ORDER.
It will be slower, but it only gets called a few times at most at
boot time, right?
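
To make the failure mode concrete, here is a minimal userspace sketch of the
bounds problem. It assumes the buddy allocator keeps one free-area slot per
order 0..MAX_ORDER-1; the SKETCH_* names and sketch_order_in_range() are
hypothetical and exist only for illustration. The values 11 and 13 are the
ones from the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values from the report: 64K pages with THP disabled. */
#define SKETCH_MAX_ORDER        11  /* FORCE_MAX_ZONEORDER default */
#define SKETCH_PAGEBLOCK_ORDER  13  /* HPAGE_SHIFT - PAGE_SHIFT per the report */

/* Hypothetical stand-in for the per-zone array: one slot per order. */
struct sketch_zone {
	int free_area[SKETCH_MAX_ORDER];
};

/* True if 'order' names a valid slot, i.e. 0 <= order <= MAX_ORDER - 1. */
static bool sketch_order_in_range(unsigned int order)
{
	return order < SKETCH_MAX_ORDER;
}
```

With these values sketch_order_in_range(SKETCH_PAGEBLOCK_ORDER) is false, so
using the order as an index would land two slots past the end of the array,
which is consistent with the bogus list_head seen in the oops.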



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize
  2014-06-19 18:12       ` Mark Salter
@ 2014-06-19 19:24         ` Michal Nazarewicz
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Nazarewicz @ 2014-06-19 19:24 UTC (permalink / raw)
  To: Mark Salter
  Cc: David Rientjes, Marek Szyprowski, Catalin Marinas,
	linux-arm-kernel, linux-kernel

On Thu, Jun 19 2014, Mark Salter <msalter@redhat.com> wrote:
> On Tue, 2014-06-17 at 20:32 +0200, Michal Nazarewicz wrote:
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 5dba293..6e657ce 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -801,7 +801,15 @@ void __init init_cma_reserved_pageblock(struct page *page)
>>  
>>  	set_page_refcounted(page);
>>  	set_pageblock_migratetype(page, MIGRATE_CMA);
>> -	__free_pages(page, pageblock_order);
>> +	if (pageblock_order > MAX_ORDER) {
>> +		struct page *subpage = p;
>> +		unsigned count = 1 << (pageblock_order - MAX_ORDER);
>> +		do {
>> +			__free_pages(subpage, pageblock_order);
>                                                ^^^^^^^
>                                                MAX_ORDER

D'oh!  I'll send a revised patch.

>> +		} while (subpage += MAX_ORDER_NR_PAGES, --count);
>> +	} else {
>> +		__free_pages(page, pageblock_order);
>> +	}
>>  	adjust_managed_page_count(page, pageblock_nr_pages);
>>  }
>>  #endif
>> --------- >8 ---------------------------------------------------------
>> 
>> Thoughts?  This has not been tested and I think it may cause performance
>> degradation in some cases since pageblock_order is not always
>> a constant, so the comparison may end up not being stripped away even on
>> systems where it's always false.

> This works with the above tweak. So it fixes the problem here, but I was
> not sure if we'd get bitten elsewhere by pageblock_order > MAX_ORDER.

This is always a possibility, but in such cases, it's a bug in CMA.
I've tried to keep in mind that pageblock_order may be greater than
MAX_ORDER when writing CMA, but I've never tested on such a system.

> It will be slower, but it only gets called a few times at most at
> boot time, right?

Yes.  The performance degradation should be negligible since
init_cma_reserved_pageblock() is hardly a critical path and is called at
most MAX_CMA_AREAS times, which by default is 8.  And I mean it will be
slower because it will have to perform a branch.


-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv2] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
  2014-06-19 18:12       ` Mark Salter
@ 2014-06-19 19:53         ` Michal Nazarewicz
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Nazarewicz @ 2014-06-19 19:53 UTC (permalink / raw)
  To: Mark Salter
  Cc: David Rientjes, Marek Szyprowski, Catalin Marinas,
	linux-arm-kernel, linux-kernel

With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
the following is triggered at early boot:

  SMP: Total of 8 processors activated.
  devtmpfs: initialized
  Unable to handle kernel NULL pointer dereference at virtual address 00000008
  pgd = fffffe0000050000
  [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
  Internal error: Oops: 96000006 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
  task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
  PC is at __list_add+0x10/0xd4
  LR is at free_one_page+0x270/0x638
  ...
  Call trace:
  [<fffffe00003ee970>] __list_add+0x10/0xd4
  [<fffffe000019c478>] free_one_page+0x26c/0x638
  [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
  [<fffffe000019d5e8>] __free_pages+0x74/0xbc
  [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
  [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
  [<fffffe0000090418>] do_one_initcall+0xc4/0x154
  [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
  [<fffffe00007520a0>] kernel_init+0xc/0xd4

This happens because init_cma_reserved_pageblock() calls
__free_one_page() with pageblock_order as page order but it is bigger
than MAX_ORDER.  This in turn causes accesses past zone->free_list[].

Fix the problem by changing init_cma_reserved_pageblock() such that it
splits pageblock into individual MAX_ORDER pages if pageblock is
bigger than a MAX_ORDER page.

In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
architectures except for ia64, powerpc and tile at the moment, the
“pageblock_order > MAX_ORDER” condition will be optimised out since
both sides of the operator are constants.  In cases where pageblock
size is variable, the performance degradation should not be
significant anyway since init_cma_reserved_pageblock() is called
only at boot time at most MAX_CMA_AREAS times which by default is
eight.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Reported-by: Mark Salter <msalter@redhat.com>
---
 mm/page_alloc.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7f97767..fe114db 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -817,7 +817,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
 
 	set_page_refcounted(page);
 	set_pageblock_migratetype(page, MIGRATE_CMA);
-	__free_pages(page, pageblock_order);
+
+	if (pageblock_order > MAX_ORDER) {
+		i = pageblock_order - MAX_ORDER;
+		i = 1 << i;
+		p = page;
+		do {
+			__free_pages(p, MAX_ORDER);
+		} while (p += MAX_ORDER_NR_PAGES, --i);
+	} else {
+		__free_pages(page, pageblock_order);
+	}
+
 	adjust_managed_page_count(page, pageblock_nr_pages);
 }
 #endif
-- 
2.0.0.526.g5318336
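
As a rough check of the loop arithmetic in the patch above, the following
userspace sketch plugs in the values from the report (MAX_ORDER == 11,
pageblock_order == 13). The SKETCH_* names and helper functions are
hypothetical; only the formulas mirror the patch, and MAX_ORDER_NR_PAGES is
taken to be 1 << (MAX_ORDER - 1).

```c
#include <assert.h>

/* Illustrative values from the report: 64K pages with THP disabled. */
#define SKETCH_MAX_ORDER          11
#define SKETCH_PAGEBLOCK_ORDER    13
#define SKETCH_MAX_ORDER_NR_PAGES (1UL << (SKETCH_MAX_ORDER - 1))

/* Loop iterations for one pageblock: i = 1 << (pageblock_order - MAX_ORDER). */
static unsigned long sketch_v2_iterations(void)
{
	return 1UL << (SKETCH_PAGEBLOCK_ORDER - SKETCH_MAX_ORDER);
}

/* Pages the pointer walks over in total: iterations times the stride. */
static unsigned long sketch_v2_pages_stepped(void)
{
	return sketch_v2_iterations() * SKETCH_MAX_ORDER_NR_PAGES;
}

/* Pages in one pageblock. */
static unsigned long sketch_pageblock_nr_pages(void)
{
	return 1UL << SKETCH_PAGEBLOCK_ORDER;
}
```

Under these assumptions the loop runs 4 times and steps over 4096 pages,
while one pageblock holds 8192. In other words, the stride corresponds to
order MAX_ORDER - 1, not to the order MAX_ORDER passed to __free_pages().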


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCHv2] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
  2014-06-19 19:53         ` Michal Nazarewicz
@ 2014-06-20 13:54           ` Christopher Covington
  -1 siblings, 0 replies; 28+ messages in thread
From: Christopher Covington @ 2014-06-20 13:54 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Mark Salter, Catalin Marinas, Marek Szyprowski, linux-kernel,
	linux-arm-kernel, David Rientjes

On 06/19/2014 03:53 PM, Michal Nazarewicz wrote:
> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
> the following is triggered at early boot:
> 
>   SMP: Total of 8 processors activated.
>   devtmpfs: initialized
>   Unable to handle kernel NULL pointer dereference at virtual address 00000008
>   pgd = fffffe0000050000
>   [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
>   Internal error: Oops: 96000006 [#1] SMP
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
>   task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
>   PC is at __list_add+0x10/0xd4
>   LR is at free_one_page+0x270/0x638
>   ...
>   Call trace:
>   [<fffffe00003ee970>] __list_add+0x10/0xd4
>   [<fffffe000019c478>] free_one_page+0x26c/0x638
>   [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
>   [<fffffe000019d5e8>] __free_pages+0x74/0xbc
>   [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
>   [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
>   [<fffffe0000090418>] do_one_initcall+0xc4/0x154
>   [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
>   [<fffffe00007520a0>] kernel_init+0xc/0xd4

I just ran into this. Thanks for the fix.

Tested-by: Christopher Covington <cov@codeaurora.org>

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHv2] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
  2014-06-19 19:53         ` Michal Nazarewicz
@ 2014-06-20 15:48           ` Mark Salter
  -1 siblings, 0 replies; 28+ messages in thread
From: Mark Salter @ 2014-06-20 15:48 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: David Rientjes, Marek Szyprowski, Catalin Marinas,
	linux-arm-kernel, linux-kernel

On Thu, 2014-06-19 at 21:53 +0200, Michal Nazarewicz wrote:
> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
> the following is triggered at early boot:
> 
>   SMP: Total of 8 processors activated.
>   devtmpfs: initialized
>   Unable to handle kernel NULL pointer dereference at virtual address 00000008
>   pgd = fffffe0000050000
>   [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
>   Internal error: Oops: 96000006 [#1] SMP
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
>   task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
>   PC is at __list_add+0x10/0xd4
>   LR is at free_one_page+0x270/0x638
>   ...
>   Call trace:
>   [<fffffe00003ee970>] __list_add+0x10/0xd4
>   [<fffffe000019c478>] free_one_page+0x26c/0x638
>   [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
>   [<fffffe000019d5e8>] __free_pages+0x74/0xbc
>   [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
>   [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
>   [<fffffe0000090418>] do_one_initcall+0xc4/0x154
>   [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
>   [<fffffe00007520a0>] kernel_init+0xc/0xd4
> 
> This happens because init_cma_reserved_pageblock() calls
> __free_one_page() with pageblock_order as page order but it is bigger
> than MAX_ORDER.  This in turn causes accesses past zone->free_list[].
> 
> Fix the problem by changing init_cma_reserved_pageblock() such that it
> splits pageblock into individual MAX_ORDER pages if pageblock is
> bigger than a MAX_ORDER page.
> 
> In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
> architectures except for ia64, powerpc and tile at the moment, the
> “pageblock_order > MAX_ORDER” condition will be optimised out since
> both sides of the operator are constants.  In cases where pageblock
> size is variable, the performance degradation should not be
> significant anyway since init_cma_reserved_pageblock() is called
> only at boot time at most MAX_CMA_AREAS times which by default is
> eight.
> 
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Reported-by: Mark Salter <msalter@redhat.com>
> ---
>  mm/page_alloc.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7f97767..fe114db 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -817,7 +817,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  
>  	set_page_refcounted(page);
>  	set_pageblock_migratetype(page, MIGRATE_CMA);
> -	__free_pages(page, pageblock_order);
> +
> +	if (pageblock_order > MAX_ORDER) {
> +		i = pageblock_order - MAX_ORDER;
> +		i = 1 << i;
> +		p = page;
> +		do {
> +			__free_pages(p, MAX_ORDER);
> +		} while (p += MAX_ORDER_NR_PAGES, --i);
> +	} else {
> +		__free_pages(page, pageblock_order);
> +	}
> +
>  	adjust_managed_page_count(page, pageblock_nr_pages);
>  }
>  #endif

This still isn't quite right. __free_pages can only take up to
MAX_ORDER-1 (MAX_ORDER_NR_PAGES is 1 << (MAX_ORDER - 1)). But
I'm hitting a slightly different issue even with that fixed up. 
Still looking...
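
For reference, one way the MAX_ORDER-1 limit could be respected is sketched
below as a userspace model. This is not the kernel code and not a revised
patch: sketch_free_pages() is a hypothetical stand-in for __free_pages()
that only counts what it is asked to free, and the comparison is changed to
>= on the assumption that order MAX_ORDER itself is also too large.

```c
#include <assert.h>

#define SKETCH_MAX_ORDER          11
#define SKETCH_MAX_ORDER_NR_PAGES (1UL << (SKETCH_MAX_ORDER - 1))

/* Bookkeeping for the hypothetical stand-in below. */
static unsigned long sketch_calls;
static unsigned long sketch_pages_freed;

/* Stand-in for __free_pages(): records the request instead of freeing. */
static void sketch_free_pages(unsigned long pfn, unsigned int order)
{
	assert(order < SKETCH_MAX_ORDER);  /* largest valid order is MAX_ORDER - 1 */
	(void)pfn;
	sketch_calls++;
	sketch_pages_freed += 1UL << order;
}

/* Free one pageblock at 'pfn' in chunks of at most order MAX_ORDER - 1. */
static void sketch_free_pageblock(unsigned long pfn, unsigned int pageblock_order)
{
	if (pageblock_order >= SKETCH_MAX_ORDER) {
		/* Number of (MAX_ORDER - 1)-sized chunks in one pageblock. */
		unsigned long count = 1UL << (pageblock_order - SKETCH_MAX_ORDER + 1);

		do {
			sketch_free_pages(pfn, SKETCH_MAX_ORDER - 1);
			pfn += SKETCH_MAX_ORDER_NR_PAGES;
		} while (--count);
	} else {
		sketch_free_pages(pfn, pageblock_order);
	}
}
```

Calling sketch_free_pageblock(0, 13) in this model issues 8 calls of order 10
and accounts for all 8192 pages of the pageblock, so the stride and the freed
size agree. It says nothing about the second issue mentioned above.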



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv2] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
@ 2014-06-20 15:48           ` Mark Salter
  0 siblings, 0 replies; 28+ messages in thread
From: Mark Salter @ 2014-06-20 15:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2014-06-19 at 21:53 +0200, Michal Nazarewicz wrote:
> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
> the following is triggered at early boot:
> 
>   SMP: Total of 8 processors activated.
>   devtmpfs: initialized
>   Unable to handle kernel NULL pointer dereference at virtual address 00000008
>   pgd = fffffe0000050000
>   [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
>   Internal error: Oops: 96000006 [#1] SMP
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
>   task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
>   PC is at __list_add+0x10/0xd4
>   LR is at free_one_page+0x270/0x638
>   ...
>   Call trace:
>   [<fffffe00003ee970>] __list_add+0x10/0xd4
>   [<fffffe000019c478>] free_one_page+0x26c/0x638
>   [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
>   [<fffffe000019d5e8>] __free_pages+0x74/0xbc
>   [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
>   [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
>   [<fffffe0000090418>] do_one_initcall+0xc4/0x154
>   [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
>   [<fffffe00007520a0>] kernel_init+0xc/0xd4
> 
> This happens because init_cma_reserved_pageblock() calls
> __free_one_page() with pageblock_order as page order but it is bigger
> han MAX_ORDER.  This in turn causes accesses past zone->free_list[].
> 
> Fix the problem by changing init_cma_reserved_pageblock() such that it
> splits pageblock into individual MAX_ORDER pages if pageblock is
> bigger than a MAX_ORDER page.
> 
> In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
> architectures expect for ia64, powerpc and tile at the moment, the
> ?pageblock_order > MAX_ORDER? condition will be optimised out since
> both sides of the operator are constants.  In cases where pageblock
> size is variable, the performance degradation should not be
> significant anyway since init_cma_reserved_pageblock() is called
> only at boot time at most MAX_CMA_AREAS times which by default is
> eight.
> 
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Reported-by: Mark Salter <msalter@redhat.com>
> ---
>  mm/page_alloc.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7f97767..fe114db 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -817,7 +817,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  
>  	set_page_refcounted(page);
>  	set_pageblock_migratetype(page, MIGRATE_CMA);
> -	__free_pages(page, pageblock_order);
> +
> +	if (pageblock_order > MAX_ORDER) {
> +		i = pageblock_order - MAX_ORDER;
> +		i = 1 << i;
> +		p = page;
> +		do {
> +			__free_pages(p, MAX_ORDER);
> +		} while (p += MAX_ORDER_NR_PAGES, --i);
> +	} else {
> +		__free_pages(page, pageblock_order);
> +	}
> +
>  	adjust_managed_page_count(page, pageblock_nr_pages);
>  }
>  #endif

This still isn't quite right. __free_pages can only take up to
MAX_ORDER-1 (MAX_ORDER_NR_PAGES is 1 << (MAX_ORDER - 1)). But
I'm hitting a slightly different issue even with that fixed up. 
Still looking...
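
For reference, the off-by-one described above can be sketched in user
space.  The macro values mirror the configuration discussed in this
thread (MAX_ORDER == 11, pageblock_order == 13); the helper name
chunks_per_pageblock() is hypothetical and does not exist in the kernel.

```c
#include <assert.h>

/* Values taken from the configuration discussed in this thread. */
#define MAX_ORDER          11
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))

/*
 * Hypothetical helper: number of order-(MAX_ORDER - 1) chunks needed to
 * cover one pageblock.  Valid buddy orders run from 0 to MAX_ORDER - 1,
 * so the shift must use MAX_ORDER - 1, not MAX_ORDER.
 */
static unsigned long chunks_per_pageblock(unsigned int pageblock_order)
{
	assert(pageblock_order >= MAX_ORDER - 1);
	return 1UL << (pageblock_order - (MAX_ORDER - 1));
}
```

With these values chunks_per_pageblock(13) gives 8 chunks of 1024 pages
each, whereas shifting by (pageblock_order - MAX_ORDER) would give only 4.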

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHv2] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
  2014-06-20 15:48           ` Mark Salter
@ 2014-06-20 16:36             ` Michal Nazarewicz
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Nazarewicz @ 2014-06-20 16:36 UTC (permalink / raw)
  To: Mark Salter
  Cc: David Rientjes, Marek Szyprowski, Catalin Marinas,
	linux-arm-kernel, linux-kernel

On Fri, Jun 20 2014, Mark Salter <msalter@redhat.com> wrote:
> This still isn't quite right. __free_pages can only take up to
> MAX_ORDER-1 (MAX_ORDER_NR_PAGES is 1 << (MAX_ORDER - 1)).

Good catch.  I'll send v3 in a few days then.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize
  2014-06-19 19:24         ` Michal Nazarewicz
@ 2014-06-20 17:37           ` Mark Salter
  -1 siblings, 0 replies; 28+ messages in thread
From: Mark Salter @ 2014-06-20 17:37 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: David Rientjes, Marek Szyprowski, Catalin Marinas,
	linux-arm-kernel, linux-kernel

On Thu, 2014-06-19 at 21:24 +0200, Michal Nazarewicz wrote:
> On Thu, Jun 19 2014, Mark Salter <msalter@redhat.com> wrote:
> > On Tue, 2014-06-17 at 20:32 +0200, Michal Nazarewicz wrote:
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 5dba293..6e657ce 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -801,7 +801,15 @@ void __init init_cma_reserved_pageblock(struct page *page)
> >>  
> >>  	set_page_refcounted(page);
> >>  	set_pageblock_migratetype(page, MIGRATE_CMA);
> >> -	__free_pages(page, pageblock_order);
> >> +	if (pageblock_order > MAX_ORDER) {
> >> +		struct page *subpage = p;
> >> +		unsigned count = 1 << (pageblock_order - MAX_ORDER);
> >> +		do {
> >> +			__free_pages(subpage, pageblock_order);
> >                                                ^^^^^^^
> >                                                MAX_ORDER
> 
> D'oh!  I'll send a revised patch.
> 
> >> +		} while (subpage += MAX_ORDER_NR_PAGES, --count);
> >> +	} else {
> >> +		__free_pages(page, pageblock_order);
> >> +	}
> >>  	adjust_managed_page_count(page, pageblock_nr_pages);
> >>  }
> >>  #endif
> >> --------- >8 ---------------------------------------------------------
> >> 
> >> Thoughts?  This has not been tested and I think it may cause performance
> >> degradation in some cases since pageblock_order is not always
> >> a constant, so the comparison may end up not being stripped away even on
> >> systems where it's always false.
> 
> > This works with the above tweak. So it fixes the problem here, but I was
> > not sure if we'd get bitten elsewhere by pageblock_order > MAX_ORDER.
> 
> This is always a possibility, but in such cases, it's a bug in CMA.
> I've tried to keep in mind that pageblock_order may be greater than
> MAX_ORDER when writing CMA, but I've never tested on such a system.
> 
> > It will be slower, but it only gets called a few times at most at
> > boot time, right?
> 
> Yes.  The performance degradation should be negligible since
> init_cma_reserved is hardly a critical path and is called at most
> MAX_CMA_AREAS times which by default is 8.  And I mean it will be slower
> because it will have to perform a branch.
> 

I ended up needing this (on top of your patch) to get the system to
boot. Each MAX_ORDER-1 group needs the refcount and migratetype set so
that __free_pages does the right thing.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 02fb1ed..a7ca6cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -799,17 +799,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
 		set_page_count(p, 0);
 	} while (++p, --i);
 
-	set_page_refcounted(page);
-	set_pageblock_migratetype(page, MIGRATE_CMA);
-
-	if (pageblock_order > MAX_ORDER) {
-		i = pageblock_order - MAX_ORDER;
+	if (pageblock_order >= MAX_ORDER) {
+		i = pageblock_order - MAX_ORDER + 1;
 		i = 1 << i;
 		p = page;
 		do {
-			__free_pages(p, MAX_ORDER);
+			set_page_refcounted(p);
+			set_pageblock_migratetype(p, MIGRATE_CMA);
+			__free_pages(p, MAX_ORDER - 1);
 		} while (p += MAX_ORDER_NR_PAGES, --i);
 	} else {
+		set_page_refcounted(page);
+		set_pageblock_migratetype(page, MIGRATE_CMA);
 		__free_pages(page, pageblock_order);
 	}
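
Why each chunk head needs its own reference can be illustrated with a
toy user-space model.  This is only a sketch: struct toy_page,
toy_free_pages() and toy_release_pageblock() are hypothetical stand-ins,
and the real put_page_testzero() path is not reproduced here.

```c
#include <assert.h>

#define MAX_ORDER          11
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	int count;
};

/*
 * Toy stand-in for __free_pages(): drop one reference and report whether
 * the chunk would actually be handed to the allocator (count reached 0).
 */
static int toy_free_pages(struct toy_page *p)
{
	return --p->count == 0;
}

/*
 * Walk a pageblock of nr_pages toy pages in MAX_ORDER_NR_PAGES steps.
 * If per_chunk_ref is non-zero, every chunk head gets a reference before
 * being freed; otherwise only the first page does.
 * Returns how many chunks were really freed.
 */
static unsigned long toy_release_pageblock(struct toy_page *page,
					   unsigned long nr_pages,
					   int per_chunk_ref)
{
	unsigned long freed = 0, off;

	assert(nr_pages % MAX_ORDER_NR_PAGES == 0);

	for (off = 0; off < nr_pages; off++)
		page[off].count = 0;	/* like set_page_count(p, 0) */

	if (!per_chunk_ref)
		page[0].count = 1;	/* like set_page_refcounted(page) */

	for (off = 0; off < nr_pages; off += MAX_ORDER_NR_PAGES) {
		if (per_chunk_ref)
			page[off].count = 1;
		freed += toy_free_pages(&page[off]);
	}
	return freed;
}
```

In this model, an 8192-page block releases all 8 chunks when
per_chunk_ref is set, but only the first chunk when the reference is
taken once on the head page.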




^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCHv3] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
  2014-06-20 17:37           ` Mark Salter
@ 2014-06-23 19:40             ` Michal Nazarewicz
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Nazarewicz @ 2014-06-23 19:40 UTC (permalink / raw)
  To: Mark Salter
  Cc: David Rientjes, Marek Szyprowski, Catalin Marinas,
	linux-arm-kernel, linux-kernel

With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
the following is triggered at early boot:

  SMP: Total of 8 processors activated.
  devtmpfs: initialized
  Unable to handle kernel NULL pointer dereference at virtual address 00000008
  pgd = fffffe0000050000
  [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
  Internal error: Oops: 96000006 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
  task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
  PC is at __list_add+0x10/0xd4
  LR is at free_one_page+0x270/0x638
  ...
  Call trace:
  [<fffffe00003ee970>] __list_add+0x10/0xd4
  [<fffffe000019c478>] free_one_page+0x26c/0x638
  [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
  [<fffffe000019d5e8>] __free_pages+0x74/0xbc
  [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
  [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
  [<fffffe0000090418>] do_one_initcall+0xc4/0x154
  [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
  [<fffffe00007520a0>] kernel_init+0xc/0xd4

This happens because init_cma_reserved_pageblock() calls
__free_one_page() with pageblock_order as page order but it is bigger
than MAX_ORDER.  This in turn causes accesses past zone->free_list[].

Fix the problem by changing init_cma_reserved_pageblock() such that it
splits pageblock into individual MAX_ORDER pages if pageblock is
bigger than a MAX_ORDER page.

In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
architectures except for ia64, powerpc and tile at the moment, the
“pageblock_order > MAX_ORDER” condition will be optimised out since
both sides of the operator are constants.  In cases where pageblock
size is variable, the performance degradation should not be
significant anyway since init_cma_reserved_pageblock() is called
only at boot time at most MAX_CMA_AREAS times which by default is
eight.

Cc: stable@vger.kernel.org
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Christopher Covington <cov@codeaurora.org>
---
 mm/page_alloc.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

 Mark Salter wrote:
 > I ended up needing this (on top of your patch) to get the system to
 > boot.  Each MAX_ORDER-1 group needs the refcount and migratetype set
 > so that __free_pages does the right thing.
 >
 > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 > index 02fb1ed..a7ca6cc 100644
 > --- a/mm/page_alloc.c
 > +++ b/mm/page_alloc.c
 > @@ -799,17 +799,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
 >  		set_page_count(p, 0);
 >  	} while (++p, --i);
 >  
 > -	set_page_refcounted(page);
 > -	set_pageblock_migratetype(page, MIGRATE_CMA);
 > -
 > -	if (pageblock_order > MAX_ORDER) {
 > -		i = pageblock_order - MAX_ORDER;
 > +	if (pageblock_order >= MAX_ORDER) {
 > +		i = pageblock_order - MAX_ORDER + 1;
 >  		i = 1 << i;
 >  		p = page;
 >  		do {
 > -			__free_pages(p, MAX_ORDER);
 > +			set_page_refcounted(p);
 > +			set_pageblock_migratetype(p, MIGRATE_CMA);
 > +			__free_pages(p, MAX_ORDER - 1);
 >  		} while (p += MAX_ORDER_NR_PAGES, --i);
 >  	} else {
 > +		set_page_refcounted(page);
 > +		set_pageblock_migratetype(page, MIGRATE_CMA);
 >  		__free_pages(page, pageblock_order);
 >  	}

 This is kinda embarrassing, dunno how I missed that.

 But each page actually does not need to have migratetype set, does it?
 All of those pages are in a single pageblock so a single call
 suffices.  If you track set_pageblock_migratetype down to pfn_to_bitidx
 there is:

	return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;

 so for pfns inside of a pageblock, they get truncated.  Or did I miss
 yet another thing?
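
 The truncation can be checked with a small user-space sketch.  It
 mirrors only the expression quoted above, not the full pfn_to_bitidx();
 the toy_* names are hypothetical and the NR_PAGEBLOCK_BITS value is an
 assumption made for illustration.

```c
#include <assert.h>

#define PAGEBLOCK_ORDER   13	/* pageblock_order with 64K pages, as in this thread */
#define NR_PAGEBLOCK_BITS 4	/* assumed for illustration; depends on kernel config */

/* Mirrors only the expression quoted above, not the full pfn_to_bitidx(). */
static unsigned long toy_pfn_to_bitidx(unsigned long pfn)
{
	return (pfn >> PAGEBLOCK_ORDER) * NR_PAGEBLOCK_BITS;
}

/*
 * Every pfn inside one pageblock selects the same group of bits.
 * first_pfn is expected to be pageblock-aligned.
 */
static void toy_check_pageblock(unsigned long first_pfn)
{
	unsigned long pfn;

	for (pfn = first_pfn; pfn < first_pfn + (1UL << PAGEBLOCK_ORDER); pfn++)
		assert(toy_pfn_to_bitidx(pfn) == toy_pfn_to_bitidx(first_pfn));
}
```

 Calling toy_check_pageblock() on any aligned pfn passes, which is the
 property that makes a single set_pageblock_migratetype() call enough.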

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee92384..fef9614 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -816,9 +816,21 @@ void __init init_cma_reserved_pageblock(struct page *page)
 		set_page_count(p, 0);
 	} while (++p, --i);
 
-	set_page_refcounted(page);
 	set_pageblock_migratetype(page, MIGRATE_CMA);
-	__free_pages(page, pageblock_order);
+
+	if (pageblock_order >= MAX_ORDER) {
+		i = pageblock_nr_pages;
+		p = page;
+		do {
+			set_page_refcounted(p);
+			__free_pages(p, MAX_ORDER - 1);
+			p += MAX_ORDER_NR_PAGES;
+		} while (i -= MAX_ORDER_NR_PAGES);
+	} else {
+		set_page_refcounted(page);
+		__free_pages(page, pageblock_order);
+	}
+
 	adjust_managed_page_count(page, pageblock_nr_pages);
 }
 #endif
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCHv3] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
  2014-06-23 19:40             ` Michal Nazarewicz
@ 2014-06-23 21:10               ` Mark Salter
  -1 siblings, 0 replies; 28+ messages in thread
From: Mark Salter @ 2014-06-23 21:10 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: David Rientjes, Marek Szyprowski, Catalin Marinas,
	linux-arm-kernel, linux-kernel

On Mon, 2014-06-23 at 21:40 +0200, Michal Nazarewicz wrote:
> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
> the following is triggered at early boot:
> 
>   SMP: Total of 8 processors activated.
>   devtmpfs: initialized
>   Unable to handle kernel NULL pointer dereference at virtual address 00000008
>   pgd = fffffe0000050000
>   [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
>   Internal error: Oops: 96000006 [#1] SMP
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
>   task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
>   PC is at __list_add+0x10/0xd4
>   LR is at free_one_page+0x270/0x638
>   ...
>   Call trace:
>   [<fffffe00003ee970>] __list_add+0x10/0xd4
>   [<fffffe000019c478>] free_one_page+0x26c/0x638
>   [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
>   [<fffffe000019d5e8>] __free_pages+0x74/0xbc
>   [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
>   [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
>   [<fffffe0000090418>] do_one_initcall+0xc4/0x154
>   [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
>   [<fffffe00007520a0>] kernel_init+0xc/0xd4
> 
> This happens because init_cma_reserved_pageblock() calls
> __free_one_page() with pageblock_order as page order but it is bigger
> than MAX_ORDER.  This in turn causes accesses past zone->free_list[].
> 
> Fix the problem by changing init_cma_reserved_pageblock() such that it
> splits pageblock into individual MAX_ORDER pages if pageblock is
> bigger than a MAX_ORDER page.
> 
> In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
> architectures except for ia64, powerpc and tile at the moment, the
> “pageblock_order > MAX_ORDER” condition will be optimised out since
> both sides of the operator are constants.  In cases where pageblock
> size is variable, the performance degradation should not be
> significant anyway since init_cma_reserved_pageblock() is called
> only at boot time at most MAX_CMA_AREAS times which by default is
> eight.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Reported-by: Mark Salter <msalter@redhat.com>
> Tested-by: Christopher Covington <cov@codeaurora.org>
> ---
>  mm/page_alloc.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
>  Mark Salter wrote:
>  > I ended up needing this (on top of your patch) to get the system to
>  > boot.  Each MAX_ORDER-1 group needs the refcount and migratetype set
>  > so that __free_pages does the right thing.
>  >
>  > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>  > index 02fb1ed..a7ca6cc 100644
>  > --- a/mm/page_alloc.c
>  > +++ b/mm/page_alloc.c
>  > @@ -799,17 +799,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  >  		set_page_count(p, 0);
>  >  	} while (++p, --i);
>  >  
>  > -	set_page_refcounted(page);
>  > -	set_pageblock_migratetype(page, MIGRATE_CMA);
>  > -
>  > -	if (pageblock_order > MAX_ORDER) {
>  > -		i = pageblock_order - MAX_ORDER;
>  > +	if (pageblock_order >= MAX_ORDER) {
>  > +		i = pageblock_order - MAX_ORDER + 1;
>  >  		i = 1 << i;
>  >  		p = page;
>  >  		do {
>  > -			__free_pages(p, MAX_ORDER);
>  > +			set_page_refcounted(p);
>  > +			set_pageblock_migratetype(p, MIGRATE_CMA);
>  > +			__free_pages(p, MAX_ORDER - 1);
>  >  		} while (p += MAX_ORDER_NR_PAGES, --i);
>  >  	} else {
>  > +		set_page_refcounted(page);
>  > +		set_pageblock_migratetype(page, MIGRATE_CMA);
>  >  		__free_pages(page, pageblock_order);
>  >  	}
> 
>  This is kinda embarrassing, dunno how I missed that.
> 
>  But each page actually does not need to have migratetype set, does it?
>  All of those pages are in a single pageblock so a single call
>  suffices.  If you track set_pageblock_migratetype down to pfn_to_bitidx
>  there is:
> 
> 	return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
> 
>  so for pfns inside of a pageblock, they get truncated.  Or did I miss
>  yet another thing?

Nope, my turn to miss something. You only need to set migrate type
once per pageblock.

> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ee92384..fef9614 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -816,9 +816,21 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  		set_page_count(p, 0);
>  	} while (++p, --i);
>  
> -	set_page_refcounted(page);
>  	set_pageblock_migratetype(page, MIGRATE_CMA);
> -	__free_pages(page, pageblock_order);
> +
> +	if (pageblock_order >= MAX_ORDER) {
> +		i = pageblock_nr_pages;
> +		p = page;
> +		do {
> +			set_page_refcounted(p);
> +			__free_pages(p, MAX_ORDER - 1);
> +			p += MAX_ORDER_NR_PAGES;
> +		} while (i -= MAX_ORDER_NR_PAGES);
> +	} else {
> +		set_page_refcounted(page);
> +		__free_pages(page, pageblock_order);
> +	}
> +
>  	adjust_managed_page_count(page, pageblock_nr_pages);
>  }
>  #endif

This version works for me. Thanks.
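
As a sanity check on the loop shape in the patch above, here is a
user-space sketch that counts its iterations.  The loop body is copied
from the diff; v3_free_calls() itself is a hypothetical helper, and
MAX_ORDER == 11 is the value from this thread.

```c
#include <assert.h>

#define MAX_ORDER          11
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))

/*
 * Count how many __free_pages(p, MAX_ORDER - 1) calls the v3 loop would
 * make for a pageblock of the given order.
 */
static unsigned long v3_free_calls(unsigned int pageblock_order)
{
	unsigned long i = 1UL << pageblock_order;	/* pageblock_nr_pages */
	unsigned long offset = 0;	/* stands in for the page pointer p */
	unsigned long calls = 0;

	/* The patch only takes this path when pageblock_order >= MAX_ORDER. */
	assert(pageblock_order >= MAX_ORDER);

	do {
		calls++;		/* set_page_refcounted() + __free_pages() */
		offset += MAX_ORDER_NR_PAGES;
	} while (i -= MAX_ORDER_NR_PAGES);

	/* The walk ends exactly at the end of the pageblock. */
	assert(offset == 1UL << pageblock_order);
	return calls;
}
```

The loop terminates because pageblock_nr_pages is a power of two no
smaller than MAX_ORDER_NR_PAGES, so i reaches exactly zero.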



^ permalink raw reply	[flat|nested] 28+ messages in thread

Thread overview: 28+ messages
2014-06-11 21:33 [PATCH] arm64: fix MAX_ORDER for 64K pagesize Mark Salter
2014-06-11 21:33 ` Mark Salter
2014-06-11 23:03 ` David Rientjes
2014-06-11 23:03   ` David Rientjes
2014-06-11 23:04   ` David Rientjes
2014-06-11 23:04     ` David Rientjes
2014-06-12 13:57     ` Mark Salter
2014-06-12 13:57       ` Mark Salter
2014-06-17 18:32   ` Michal Nazarewicz
2014-06-17 18:32     ` Michal Nazarewicz
2014-06-19 18:12     ` Mark Salter
2014-06-19 18:12       ` Mark Salter
2014-06-19 19:24       ` Michal Nazarewicz
2014-06-19 19:24         ` Michal Nazarewicz
2014-06-20 17:37         ` Mark Salter
2014-06-20 17:37           ` Mark Salter
2014-06-23 19:40           ` [PATCHv3] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER Michal Nazarewicz
2014-06-23 19:40             ` Michal Nazarewicz
2014-06-23 21:10             ` Mark Salter
2014-06-23 21:10               ` Mark Salter
2014-06-19 19:53       ` [PATCHv2] " Michal Nazarewicz
2014-06-19 19:53         ` Michal Nazarewicz
2014-06-20 13:54         ` Christopher Covington
2014-06-20 13:54           ` Christopher Covington
2014-06-20 15:48         ` Mark Salter
2014-06-20 15:48           ` Mark Salter
2014-06-20 16:36           ` Michal Nazarewicz
2014-06-20 16:36             ` Michal Nazarewicz
