* Issue on reserving memory with no-map flag  in  DT
From: Srinivas Kandagatla @ 2015-01-16 11:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hi All,

I am hitting boot failures when I try to reserve memory with the no-map
flag using DT. Basically, the kernel just hangs with no indication of
what's going on. I added some debug output to find the location; it was
somewhere in the DMA mapping path, at kmap_atomic() in __dma_clear_buffer().

The issue is essentially identical to
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html
but in my case the reserved memory is at the start of RAM. I tried the
same fixes from that thread, but they did not help.

Platform: IFC6410 with APQ8064, which is a v7 platform with 2GB of memory
starting at 0x80000000; the kernel is always loaded at 0x80200000.
I am using multi_v7_defconfig.

Meminfo without memory reserve:
80000000-88dfffff : System RAM
   80208000-80e5d307 : Kernel code
   80f64000-810be397 : Kernel data
8a000000-8d9fffff : System RAM
8ec00000-8effffff : System RAM
8f700000-8fdfffff : System RAM
90000000-af7fffff : System RAM

DT entry:
        reserved-memory {
                #address-cells = <1>;
                #size-cells = <1>;
                ranges;
                smem@80000000 {
                        reg = <0x80000000 0x200000>;
                        no-map;
                };
        };

If I remove the no-map flag, I can boot the board. But I don't want the
kernel to map this memory at all, as this is IPC memory.

I just want to understand what's going on here. I am guessing that the
kernel would never touch that 2MB of memory.

Does the ARM kernel have a limitation on unmapping/memblock_remove() of
such memory locations?
Or
Is this a known issue?

Any pointers to debug this issue?

Before the kernel hangs, it reports two errors like this:

BUG: Bad page state in process swapper  pfn:fffa8
page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x200041(locked|active|mlocked)
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
Hardware name: Qualcomm (Flattened Device Tree)
[<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
[<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
[<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
[<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
[<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
[<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
[<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
[<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
[<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
[<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
Disabling lock debugging due to kernel taint


Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/

Thanks,
srini

* Re: Issue on reserving memory with no-map flag  in  DT
From: Laura Abbott @ 2015-01-17  0:24 UTC (permalink / raw)
  To: Srinivas Kandagatla, linux-arm-kernel, linux, ssantosh,
	Andrew Morton, Mel Gorman
  Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm

(Adding linux-mm and relevant people because this looks like an issue there)

On 1/16/2015 3:30 AM, Srinivas Kandagatla wrote:
> Hi All,
>
> I am hitting boot failures when I try to reserve memory with the no-map flag using DT. Basically, the kernel just hangs with no indication of what's going on. I added some debug output to find the location; it was somewhere in the DMA mapping path, at kmap_atomic() in __dma_clear_buffer().
>
> The issue is essentially identical to http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html but in my case the reserved memory is at the start of RAM. I tried the same fixes from that thread, but they did not help.
>
> Platform: IFC6410 with APQ8064, which is a v7 platform with 2GB of memory starting at 0x80000000; the kernel is always loaded at 0x80200000.
> I am using multi_v7_defconfig.
>
> Meminfo without memory reserve:
> 80000000-88dfffff : System RAM
>    80208000-80e5d307 : Kernel code
>    80f64000-810be397 : Kernel data
> 8a000000-8d9fffff : System RAM
> 8ec00000-8effffff : System RAM
> 8f700000-8fdfffff : System RAM
> 90000000-af7fffff : System RAM
>
> DT entry:
>         reserved-memory {
>                 #address-cells = <1>;
>                 #size-cells = <1>;
>                 ranges;
>                 smem@80000000 {
>                         reg = <0x80000000 0x200000>;
>                         no-map;
>                 };
>         };
>
> If I remove the no-map flag, I can boot the board. But I don't want the kernel to map this memory at all, as this is IPC memory.
>
> I just want to understand what's going on here. I am guessing that the kernel would never touch that 2MB of memory.
>
> Does the ARM kernel have a limitation on unmapping/memblock_remove() of such memory locations?
> Or
> Is this a known issue?
>
> Any pointers to debug this issue?
>
> Before the kernel hangs, it reports two errors like this:
>
> BUG: Bad page state in process swapper  pfn:fffa8
> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> bad because of flags:
> flags: 0x200041(locked|active|mlocked)
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
> Hardware name: Qualcomm (Flattened Device Tree)
> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
> Disabling lock debugging due to kernel taint
>
>
> Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/
>

I don't have an IFC handy but I was able to reproduce the same issue on another board.
I think this is an underlying issue in mm code.

Removing the first 2MB changes the start address of the zone, which means
the start address is no longer pageblock aligned (pageblocks are 4MB on this
system). With a little digging, it looks like the issue is that we're running
off the end of the mem_map array because the array is too small. This is
similar to the issue fixed by commit 7c45512 ("mm: fix pageblock bitmap
allocation"), and the following fixes it for me:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..32d9436 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5012,7 +5012,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
  #ifdef CONFIG_FLAT_NODE_MEM_MAP
         /* ia64 gets its own node_mem_map, before this, without bootmem */
         if (!pgdat->node_mem_map) {
-               unsigned long size, start, end;
+               unsigned long size, start, end, offset;
                 struct page *map;
  
                 /*
@@ -5020,10 +5020,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
                  * aligned but the node_mem_map endpoints must be in order
                  * for the buddy allocator to function correctly.
                  */
+               offset = pgdat->node_start_pfn & (pageblock_nr_pages - 1);
                 start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
                 end = pgdat_end_pfn(pgdat);
                 end = ALIGN(end, MAX_ORDER_NR_PAGES);
-               size =  (end - start) * sizeof(struct page);
+               size =  ((end - start) + offset) * sizeof(struct page);
                 map = alloc_remap(pgdat->node_id, size);
                 if (!map)
                         map = memblock_virt_alloc_node_nopanic(size,

If there is agreement on this approach, I can turn this into a proper patch.

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: Issue on reserving memory with no-map flag  in  DT
From: Srinivas Kandagatla @ 2015-01-17  8:39 UTC (permalink / raw)
  To: Laura Abbott, linux-arm-kernel, linux, ssantosh, Andrew Morton,
	Mel Gorman
  Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm



On 17/01/15 00:24, Laura Abbott wrote:
> (Adding linux-mm and relevant people because this looks like an issue
> there)
>
> On 1/16/2015 3:30 AM, Srinivas Kandagatla wrote:
>> Hi All,
>>
>> I am hitting boot failures when I try to reserve memory with the
>> no-map flag using DT. Basically, the kernel just hangs with no
>> indication of what's going on. I added some debug output to find the
>> location; it was somewhere in the DMA mapping path, at kmap_atomic()
>> in __dma_clear_buffer().
>>
>> The issue is essentially identical to
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html
>> but in my case the reserved memory is at the start of RAM. I tried
>> the same fixes from that thread, but they did not help.
>>
>> Platform: IFC6410 with APQ8064, which is a v7 platform with 2GB of
>> memory starting at 0x80000000; the kernel is always loaded at 0x80200000.
>> I am using multi_v7_defconfig.
>>
>> Meminfo without memory reserve:
>> 80000000-88dfffff : System RAM
>>    80208000-80e5d307 : Kernel code
>>    80f64000-810be397 : Kernel data
>> 8a000000-8d9fffff : System RAM
>> 8ec00000-8effffff : System RAM
>> 8f700000-8fdfffff : System RAM
>> 90000000-af7fffff : System RAM
>>
>> DT entry:
>>         reserved-memory {
>>                 #address-cells = <1>;
>>                 #size-cells = <1>;
>>                 ranges;
>>                 smem@80000000 {
>>                         reg = <0x80000000 0x200000>;
>>                         no-map;
>>                 };
>>         };
>>
>> If I remove the no-map flag, I can boot the board. But I don't
>> want the kernel to map this memory at all, as this is IPC memory.
>>
>> I just want to understand what's going on here. I am guessing that
>> the kernel would never touch that 2MB of memory.
>>
>> Does the ARM kernel have a limitation on unmapping/memblock_remove()
>> of such memory locations?
>> Or
>> Is this a known issue?
>>
>> Any pointers to debug this issue?
>>
>> Before the kernel hangs, it reports two errors like this:
>>
>> BUG: Bad page state in process swapper  pfn:fffa8
>> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> bad because of flags:
>> flags: 0x200041(locked|active|mlocked)
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted
>> 3.19.0-rc3-00007-g412f9ba-dirty #816
>> Hardware name: Qualcomm (Flattened Device Tree)
>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
>> [<c0301570>] (bad_page) from [<c03018a8>]
>> (free_pages_prepare+0x168/0x1e0)
>> [<c03018a8>] (free_pages_prepare) from [<c030369c>]
>> (free_hot_cold_page+0x3c/0x174)
>> [<c030369c>] (free_hot_cold_page) from [<c0303828>]
>> (__free_pages+0x54/0x58)
>> [<c0303828>] (__free_pages) from [<c030395c>]
>> (free_highmem_page+0x38/0x88)
>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
>> Disabling lock debugging due to kernel taint
>>
>>
>> Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/
>>
>
> I don't have an IFC handy but I was able to reproduce the same issue on
> another board.
> I think this is an underlying issue in mm code.
>
> Removing the first 2MB changes the start address of the zone, which
> means the start address is no longer pageblock aligned (pageblocks are
> 4MB on this system). With a little digging, it looks like the issue is
> that we're running off the end of the mem_map array because the array
> is too small. This is similar to the issue fixed by commit 7c45512
> ("mm: fix pageblock bitmap allocation"), and the following fixes it
> for me:

Thanks Laura, this patch does fix the issue for me too.

Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>

>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..32d9436 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5012,7 +5012,7 @@ static void __init_refok alloc_node_mem_map(struct
> pglist_data *pgdat)
>   #ifdef CONFIG_FLAT_NODE_MEM_MAP
>          /* ia64 gets its own node_mem_map, before this, without bootmem */
>          if (!pgdat->node_mem_map) {
> -               unsigned long size, start, end;
> +               unsigned long size, start, end, offset;
>                  struct page *map;
>
>                  /*
> @@ -5020,10 +5020,11 @@ static void __init_refok
> alloc_node_mem_map(struct pglist_data *pgdat)
>                   * aligned but the node_mem_map endpoints must be in order
>                   * for the buddy allocator to function correctly.
>                   */
> +               offset = pgdat->node_start_pfn & (pageblock_nr_pages - 1);
>                  start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
>                  end = pgdat_end_pfn(pgdat);
>                  end = ALIGN(end, MAX_ORDER_NR_PAGES);
> -               size =  (end - start) * sizeof(struct page);
> +               size =  ((end - start) + offset) * sizeof(struct page);
>                  map = alloc_remap(pgdat->node_id, size);
>                  if (!map)
>                          map = memblock_virt_alloc_node_nopanic(size,
>
> If there is agreement on this approach, I can turn this into a proper
> patch.
>
> Thanks,
> Laura
>


* Re: Issue on reserving memory with no-map flag  in  DT
From: Vlastimil Babka @ 2015-01-19 15:49 UTC (permalink / raw)
  To: Laura Abbott, Srinivas Kandagatla, linux-arm-kernel, linux,
	ssantosh, Andrew Morton, Mel Gorman
  Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm

On 01/17/2015 01:24 AM, Laura Abbott wrote:
> (Adding linux-mm and relevant people because this looks like an issue there)
> 
> On 1/16/2015 3:30 AM, Srinivas Kandagatla wrote:
>> Hi All,
>>
>> I am hitting boot failures when I try to reserve memory with the no-map flag using DT. Basically, the kernel just hangs with no indication of what's going on. I added some debug output to find the location; it was somewhere in the DMA mapping path, at kmap_atomic() in __dma_clear_buffer().
>>
>> The issue is essentially identical to http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html but in my case the reserved memory is at the start of RAM. I tried the same fixes from that thread, but they did not help.
>>
>> Platform: IFC6410 with APQ8064, which is a v7 platform with 2GB of memory starting at 0x80000000; the kernel is always loaded at 0x80200000.
>> I am using multi_v7_defconfig.
>>
>> Meminfo without memory reserve:
>> 80000000-88dfffff : System RAM
>>    80208000-80e5d307 : Kernel code
>>    80f64000-810be397 : Kernel data
>> 8a000000-8d9fffff : System RAM
>> 8ec00000-8effffff : System RAM
>> 8f700000-8fdfffff : System RAM
>> 90000000-af7fffff : System RAM
>>
>> DT entry:
>>         reserved-memory {
>>                 #address-cells = <1>;
>>                 #size-cells = <1>;
>>                 ranges;
>>                 smem@80000000 {
>>                         reg = <0x80000000 0x200000>;
>>                         no-map;
>>                 };
>>         };
>>
>> If I remove the no-map flag, I can boot the board. But I don't want the kernel to map this memory at all, as this is IPC memory.
>>
>> I just want to understand what's going on here. I am guessing that the kernel would never touch that 2MB of memory.
>>
>> Does the ARM kernel have a limitation on unmapping/memblock_remove() of such memory locations?
>> Or
>> Is this a known issue?
>>
>> Any pointers to debug this issue?
>>
>> Before the kernel hangs, it reports two errors like this:
>>
>> BUG: Bad page state in process swapper  pfn:fffa8
>> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> bad because of flags:
>> flags: 0x200041(locked|active|mlocked)
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
>> Hardware name: Qualcomm (Flattened Device Tree)
>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
>> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
>> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
>> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
>> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
>> Disabling lock debugging due to kernel taint
>>
>>
>> Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/
>>
> 
> I don't have an IFC handy but I was able to reproduce the same issue on another board.
> I think this is an underlying issue in mm code.
> 
> Removing the first 2MB changes the start address of the zone, which means
> the start address is no longer pageblock aligned (pageblocks are 4MB on this
> system). With a little digging, it looks like the issue is that we're running
> off the end of the mem_map array because the array is too small. This is
> similar to the issue fixed by commit 7c45512 ("mm: fix pageblock bitmap
> allocation"), and the following fixes it for me:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..32d9436 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5012,7 +5012,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>   #ifdef CONFIG_FLAT_NODE_MEM_MAP
>          /* ia64 gets its own node_mem_map, before this, without bootmem */
>          if (!pgdat->node_mem_map) {
> -               unsigned long size, start, end;
> +               unsigned long size, start, end, offset;
>                  struct page *map;
>   
>                  /*
> @@ -5020,10 +5020,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>                   * aligned but the node_mem_map endpoints must be in order
>                   * for the buddy allocator to function correctly.
>                   */
> +               offset = pgdat->node_start_pfn & (pageblock_nr_pages - 1);
>                  start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
>                  end = pgdat_end_pfn(pgdat);
>                  end = ALIGN(end, MAX_ORDER_NR_PAGES);
> -               size =  (end - start) * sizeof(struct page);
> +               size =  ((end - start) + offset) * sizeof(struct page);
>                  map = alloc_remap(pgdat->node_id, size);
>                  if (!map)
>                          map = memblock_virt_alloc_node_nopanic(size,
> 
> If there is agreement on this approach, I can turn this into a proper patch.

I admit I may not see clearly through all the arch-specific layers and various
config option combinations that are possible here, so I might be misinterpreting
the code. But I think the problem here is not insufficient allocation size, but
something else.

The code above continues with this line:

		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);

So the size of the map allocation was already aligned to MAX_ORDER_NR_PAGES
before your patch, and node_mem_map points to the first page that is actually
present, which may be offset from that alignment. Your patch adds a further
offset to the already aligned size (but it uses pageblock_nr_pages, which might
be smaller than MAX_ORDER_NR_PAGES; isn't that a mistake in itself?). So with
your patch, the map has the full aligned size counted from node_mem_map rather
than from the start of the allocation. This means the last offset-worth of
struct pages lies beyond what's needed to reach the struct page of
pgdat_end_pfn(). If we need that extra padding to prevent crashing, then it
looks really suspicious...
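
To make the sizing argument concrete, here is a rough userspace model of the
arithmetic (not kernel code). It counts struct page entries rather than bytes,
assumes MAX_ORDER_NR_PAGES = 1024 (MAX_ORDER = 11), and uses as example values
node_start_pfn 0x200 and node_end_pfn 0xc0000, as in the debug output Laura
posted in this thread. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumption for this model: MAX_ORDER = 11, so 1024 pages. */
#define MAX_ORDER_NR_PAGES 1024UL

/* Hypothetical helper: struct page entries allocated by the unpatched
 * alloc_node_mem_map(), i.e. (end - start) with both ends aligned. */
static unsigned long entries_allocated(unsigned long node_start_pfn,
				       unsigned long node_end_pfn)
{
	unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
	unsigned long end = (node_end_pfn + MAX_ORDER_NR_PAGES - 1) &
			    ~(MAX_ORDER_NR_PAGES - 1);

	return end - start;
}

/* Hypothetical helper: entries from the start of the allocation up to and
 * including the struct page of the last pfn, when node_mem_map sits at
 * map + (node_start_pfn - start) and is indexed by (pfn - node_start_pfn). */
static unsigned long entries_needed(unsigned long node_start_pfn,
				    unsigned long node_end_pfn)
{
	unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);

	assert(node_end_pfn > node_start_pfn);
	return (node_start_pfn - start) + (node_end_pfn - node_start_pfn);
}

/* Hypothetical helper: extra entries added by the proposed patch. Note that
 * it masks with pageblock_nr_pages, which may differ from MAX_ORDER_NR_PAGES. */
static unsigned long patch_padding(unsigned long node_start_pfn,
				   unsigned long pageblock_nr_pages)
{
	return node_start_pfn & (pageblock_nr_pages - 1);
}
```

For the 0x200..0xc0000 range, both entries_allocated() and entries_needed()
give 0xc0000, so the unpatched allocation already covers the last pfn exactly
and the patch's 0x200 entries of padding sit beyond it. With a hypothetical
pageblock_nr_pages of 512 and node_start_pfn 0x300, the padding would be 0x100
while node_mem_map is actually 0x300 entries into the map, which illustrates
the mismatch raised above.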

And when I look at node_mem_map usage, I see include/asm-generic/memory_model.h
defines __pfn_to_page as (basically)

NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\

and further above is a generic definition of arch_local_page_offset:

#define arch_local_page_offset(pfn, nid)        \
        ((pfn) - NODE_DATA(nid)->node_start_pfn)

So it looks correct to me without your patch. The map is allocated aligned,
node_mem_map points to this map at the offset corresponding to node_start_pfn,
and pfn_to_page subtracts node_start_pfn to get the offset relative to
node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
unless something else is misbehaving here.
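
A userspace sketch of the lookup path just described, modelling the
DISCONTIGMEM-style __pfn_to_page() as an index into the allocated map. It
assumes MAX_ORDER_NR_PAGES = 1024, and the function names are illustrative,
not kernel API. Because node_mem_map is offset by (node_start_pfn - start) and
the lookup subtracts node_start_pfn, the two cancel and every pfn in the node
lands at index (pfn - start).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for this model: MAX_ORDER = 11. */
#define MAX_ORDER_NR_PAGES 1024UL

static unsigned long align_down(unsigned long pfn)
{
	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
}

/* Index into the allocated map reached by
 *   NODE_DATA(nid)->node_mem_map + (pfn - node_start_pfn)
 * where node_mem_map = map + (node_start_pfn - start). */
static unsigned long lookup_index(unsigned long pfn,
				  unsigned long node_start_pfn)
{
	unsigned long node_mem_map_idx =
		node_start_pfn - align_down(node_start_pfn);

	assert(pfn >= node_start_pfn);
	return node_mem_map_idx + (pfn - node_start_pfn);
}

/* True if the last pfn of the node still lands inside the allocation of
 * (end - start) entries; earlier pfns have smaller indices. */
static bool lookup_in_bounds(unsigned long node_start_pfn,
			     unsigned long node_end_pfn)
{
	unsigned long start = align_down(node_start_pfn);
	unsigned long end = align_down(node_end_pfn + MAX_ORDER_NR_PAGES - 1);

	return lookup_index(node_end_pfn - 1, node_start_pfn) < end - start;
}
```

Since end is rounded up to at least node_end_pfn, lookup_in_bounds() holds for
any non-empty node in this model, which matches the conclusion that the
unpatched allocation is large enough for this lookup scheme.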

In the issue fixed by 7c45512 that you refer to, the problem was basically that
the allocation size wasn't aligned, but this looks different to me?


> Thanks,
> Laura
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: Issue on reserving memory with no-map flag  in  DT
  2015-01-19 15:49     ` Vlastimil Babka
@ 2015-01-19 23:57       ` Laura Abbott
  -1 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-01-19 23:57 UTC (permalink / raw)
  To: Vlastimil Babka, Srinivas Kandagatla, linux-arm-kernel, linux,
	ssantosh, Andrew Morton, Mel Gorman
  Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm

On 1/19/2015 7:49 AM, Vlastimil Babka wrote:
> On 01/17/2015 01:24 AM, Laura Abbott wrote:
>> (Adding linux-mm and relevant people because this looks like an issue there)
>>
>> On 1/16/2015 3:30 AM, Srinivas Kandagatla wrote:
>>> Hi All,
>>>
>>> I am hitting boot failures when I did try to reserve memory with no-map flag using DT. Basically kernel just hangs with no indication of whats going on. Added some debug to find out the location, it was some where while dma mapping at kmap_atomic() in __dma_clear_buffer().
>>> reserving.
>>>
>>> The issue is very much identical to http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html but the memory reserve in my case is at start of the memory. I tried the same fixes on this thread but it did not help.
>>>
>>> Platform: IFC6410 with APQ8064 which is a v7 platform with 2GB of memory starting at 0x80000000 and kernel is always loaded at 0x80200000
>>> And am using multi_v7_defconfig.
>>>
>>> Meminfo without memory reserve:
>>> 80000000-88dfffff : System RAM
>>>     80208000-80e5d307 : Kernel code
>>>     80f64000-810be397 : Kernel data
>>> 8a000000-8d9fffff : System RAM
>>> 8ec00000-8effffff : System RAM
>>> 8f700000-8fdfffff : System RAM
>>> 90000000-af7fffff : System RAM
>>>
>>> DT entry:
>>>          reserved-memory {
>>>                  #address-cells = <1>;
>>>                  #size-cells = <1>;
>>>                  ranges;
>>>                  smem@80000000 {
>>>                          reg = <0x80000000 0x200000>;
>>>                          no-map;
>>>                  };
>>>          };
>>>
>>> If I remove the no-map flag, then I can boot the board. But I don't want kernel to map this memory at all, as this a IPC memory.
>>>
>>> I just wanted to understand whats going on here, Am guessing that kernel would never touch that 2MB memory.
>>>
>>> Does arm-kernel has limitation on unmapping/memblock_remove() such memory locations?
>>> Or
>>> Is this a known issue?
>>>
>>> Any pointers to debug this issue?
>>>
>>> Before the kernel hangs it reports 2 errors like:
>>>
>>> BUG: Bad page state in process swapper  pfn:fffa8
>>> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
>>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
>>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>>> bad because of flags:
>>> flags: 0x200041(locked|active|mlocked)
>>> Modules linked in:
>>> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
>>> Hardware name: Qualcomm (Flattened Device Tree)
>>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
>>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
>>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
>>> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
>>> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
>>> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
>>> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
>>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
>>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
>>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
>>> Disabling lock debugging due to kernel taint
>>>
>>>
>>> Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/
>>>
>>
>> I don't have an IFC handy but I was able to reproduce the same issue on another board.
>> I think this is an underlying issue in mm code.
>>
>> Removing the first 2MB changes the start address of the zone. This means the start
>> address is no longer pageblock aligned (4MB on this system). With a little
>> digging, it looks like the issue is we're running off the end of the end of the
>> mem_map array because the memmap array is too small. This is similar to
>> an issue fixed by 7c45512 mm: fix pageblock bitmap allocation and the following
>> fixes it for me:
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 7633c50..32d9436 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5012,7 +5012,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>>    #ifdef CONFIG_FLAT_NODE_MEM_MAP
>>           /* ia64 gets its own node_mem_map, before this, without bootmem */
>>           if (!pgdat->node_mem_map) {
>> -               unsigned long size, start, end;
>> +               unsigned long size, start, end, offset;
>>                   struct page *map;
>>
>>                   /*
>> @@ -5020,10 +5020,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>>                    * aligned but the node_mem_map endpoints must be in order
>>                    * for the buddy allocator to function correctly.
>>                    */
>> +               offset = pgdat->node_start_pfn & (pageblock_nr_pages - 1);
>>                   start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
>>                   end = pgdat_end_pfn(pgdat);
>>                   end = ALIGN(end, MAX_ORDER_NR_PAGES);
>> -               size =  (end - start) * sizeof(struct page);
>> +               size =  ((end - start) + offset) * sizeof(struct page);
>>                   map = alloc_remap(pgdat->node_id, size);
>>                   if (!map)
>>                           map = memblock_virt_alloc_node_nopanic(size,
>>
>> If there is agreement on this approach, I can turn this into a proper patch.
>
> I admit I may not see clearly through all the arch-specific layers and various
> config option combinations that are possible here, so I might be misinterpreting
> the code. But I think the problem here is not insufficient allocation size, but
> something else.
>
> The code above continues by this line:
>
> 		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
>
> So, size for the map allocation has already been calculated aligned to
> MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first
> actually present page, which might be offset from the perfect alignment. Your
> patch adds another offset to the already aligned size (but you use
> pageblock_nr_pages which might be lower than MAX_ORDER_NR_PAGES; this seems like
> a mistake in itself?). So with your patch we have map of aligned size starting
> from the node_mem_map. This means the last offset-worth of struct pages should
> be beyond what's needed to access struct page of pgdat_end_pfn(). If we need
> that extra padding to prevent crashing, then it looks really suspicious...
>
> And when I look at node_mem_map usage, I see include/asm/generic/memory_model.h
> defines __pfn_to_page as (basically)
>
> NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
>
> and further above is a generic definition of arch_local_page_offset:
>
> #define arch_local_page_offset(pfn, nid)        \
>          ((pfn) - NODE_DATA(nid)->node_start_pfn)
>
> So it looks correct to me without your patch. The map is allocated aligned,
> node_mem_map points to this map at the offset corresponding to node_start_pfn,
> and pfn_to_page subtracts node_start_pfn to get the offset relative to
> node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
> unless something else is misbehaving here.
>
> In the issue fixed by 7c45512 that you refer to, the problem was basically that
> the allocation didn't use aligned size, but this looks different to me?
>
>

With this hard-coded debugging added:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..241b870 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5029,6 +5029,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
                         map = memblock_virt_alloc_node_nopanic(size,
                                                                pgdat->node_id);
                 pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
+               pr_err(">>> node_start_pfn %lx node_end_pfn %lx\n",
+                       pgdat->node_start_pfn, pgdat_end_pfn(pgdat));
+               pr_err(">>> size calculated %lx\n", size);
+               pr_err(">>> allocated region %p-%lx\n", map, ((unsigned long)map)+size);
+
         }
  #ifndef CONFIG_NEED_MULTIPLE_NODES
         /*
@@ -5043,6 +5048,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
         }
  #endif
  #endif /* CONFIG_FLAT_NODE_MEM_MAP */
+       pr_err(">>> pfn %lx page %p\n", 0x200, pfn_to_page(0x200));
+       pr_err(">>> pfn %lx page %p\n", 0xbffff, pfn_to_page(0xbffff));
  }
  
  void __paginginit free_area_init_node(int nid, unsigned long *zones_size,

I get this output:
[    0.000000] >>> node_start_pfn 200 node_end_pfn c0000
[    0.000000] >>> size calculated 1800000
[    0.000000] >>> allocated region edffa000-ef7fa000
[    0.000000] >>> pfn 200 page ee002000
[    0.000000] >>> pfn bffff page ef7fdfe0

The start and end pfn values are correct, but the page returned for pfn bffff
(ef7fdfe0) lies outside the allocated region for the memory map
(edffa000-ef7fa000). This is a CONFIG_FLATMEM system, so we aren't actually
using arch_local_page_offset at all:

#define __pfn_to_page(pfn)      (mem_map + ((pfn) - ARCH_PFN_OFFSET))
#define __page_to_pfn(page)     ((unsigned long)((page) - mem_map) + \
                                  ARCH_PFN_OFFSET)

If you do the math, the array size is fine as long as we don't offset by the
start. alloc_node_mem_map() offsets node_mem_map by (node_start_pfn - start),
assuming pfn_to_page will subtract node_start_pfn as well, but under
CONFIG_FLATMEM it only subtracts ARCH_PFN_OFFSET.
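
The debug output above can be reproduced with a small userspace model of the
FLATMEM path (not kernel code). Assumptions: MAX_ORDER_NR_PAGES = 1024,
sizeof(struct page) = 32 (what size 0x1800000 for 0xc0000 pages implies), and
ARCH_PFN_OFFSET = 0, which is consistent with the printed page addresses.
Addresses are plain integers and the names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions for this model, see the lead-in. */
#define MAX_ORDER_NR_PAGES 1024UL
#define STRUCT_PAGE_SIZE   32UL
#define ARCH_PFN_OFFSET    0UL

struct flatmem_model {
	unsigned long map;     /* address returned by the allocator */
	unsigned long size;    /* bytes allocated for the memory map */
	unsigned long mem_map; /* FLATMEM mem_map, equal to node_mem_map here */
};

/* Mirrors the unpatched alloc_node_mem_map() for node 0. */
static struct flatmem_model flatmem_setup(unsigned long map,
					   unsigned long node_start_pfn,
					   unsigned long node_end_pfn)
{
	struct flatmem_model m;
	unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
	unsigned long end = (node_end_pfn + MAX_ORDER_NR_PAGES - 1) &
			    ~(MAX_ORDER_NR_PAGES - 1);

	assert(node_end_pfn > node_start_pfn);
	m.map = map;
	m.size = (end - start) * STRUCT_PAGE_SIZE;
	/* node_mem_map = map + (node_start_pfn - start) */
	m.mem_map = map + (node_start_pfn - start) * STRUCT_PAGE_SIZE;
	return m;
}

/* FLATMEM __pfn_to_page(): mem_map + (pfn - ARCH_PFN_OFFSET) */
static unsigned long flatmem_pfn_to_page(const struct flatmem_model *m,
					 unsigned long pfn)
{
	return m->mem_map + (pfn - ARCH_PFN_OFFSET) * STRUCT_PAGE_SIZE;
}

/* True if the whole struct page at 'page' lies inside the allocation. */
static bool page_in_map(const struct flatmem_model *m, unsigned long page)
{
	return page >= m->map && page + STRUCT_PAGE_SIZE <= m->map + m->size;
}
```

With map = 0xedffa000, this gives pfn 200 -> ee002000 and pfn bffff ->
ef7fdfe0, matching the output above. The struct page for pfn bffff ends 0x4000
bytes (0x200 entries, i.e. node_start_pfn) past map + size, because the start
offset is applied once in node_mem_map and again by the FLATMEM lookup.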

Either alloc_node_mem_map needs to drop the offset or the pfn_to_page
functions need to start adding the offset. It's worth noting that this gets
corrected properly when CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled, so perhaps
the fix is to undo the offset for FLATMEM as well:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..271c44b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5036,7 +5036,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
          */
         if (pgdat == NODE_DATA(0)) {
                 mem_map = NODE_DATA(0)->node_mem_map;
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
                 if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
                         mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
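
A userspace sketch of what the proposed FLATMEM adjustment does to mem_map
(not kernel code). It assumes MAX_ORDER_NR_PAGES = 1024 and
sizeof(struct page) = 32, and takes ARCH_PFN_OFFSET as a parameter; the
function names are hypothetical. Under FLATMEM, page_to_pfn(mem_map) is always
ARCH_PFN_OFFSET, so the check in the patch reduces to comparing ARCH_PFN_OFFSET
with node_start_pfn.

```c
#include <assert.h>

#define MAX_ORDER_NR_PAGES 1024UL /* assumption: MAX_ORDER = 11 */
#define STRUCT_PAGE_SIZE   32UL   /* assumption: sizeof(struct page) */

/* Hypothetical model: mem_map after alloc_node_mem_map() with the proposed
 * FLATMEM adjustment applied. map is the allocator's return value. */
static unsigned long adjusted_mem_map(unsigned long map,
				      unsigned long node_start_pfn,
				      unsigned long arch_pfn_offset)
{
	unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
	/* node_mem_map = map + (node_start_pfn - start); mem_map starts there */
	unsigned long mem_map = map + (node_start_pfn - start) * STRUCT_PAGE_SIZE;

	assert(node_start_pfn >= arch_pfn_offset);
	/* page_to_pfn(mem_map) == arch_pfn_offset under FLATMEM */
	if (arch_pfn_offset != node_start_pfn)
		mem_map -= (node_start_pfn - arch_pfn_offset) * STRUCT_PAGE_SIZE;
	return mem_map;
}

/* FLATMEM __pfn_to_page(): mem_map + (pfn - ARCH_PFN_OFFSET), in bytes. */
static unsigned long pfn_to_page_addr(unsigned long mem_map, unsigned long pfn,
				      unsigned long arch_pfn_offset)
{
	return mem_map + (pfn - arch_pfn_offset) * STRUCT_PAGE_SIZE;
}
```

For the values in the debug output (map edffa000, node_start_pfn 200,
ARCH_PFN_OFFSET 0), mem_map moves back to edffa000, pfn 200 resolves to
edffe000 (the node_mem_map value), and the struct page for pfn bffff ends
exactly at ef7fa000, the end of the allocation.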

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Issue on reserving memory with no-map flag  in  DT
@ 2015-01-19 23:57       ` Laura Abbott
  0 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-01-19 23:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 1/19/2015 7:49 AM, Vlastimil Babka wrote:
> On 01/17/2015 01:24 AM, Laura Abbott wrote:
>> (Adding linux-mm and relevant people because this looks like an issue there)
>>
>> On 1/16/2015 3:30 AM, Srinivas Kandagatla wrote:
>>> Hi All,
>>>
>>> I am hitting boot failures when I did try to reserve memory with no-map flag using DT. Basically kernel just hangs with no indication of whats going on. Added some debug to find out the location, it was some where while dma mapping at kmap_atomic() in __dma_clear_buffer().
>>> reserving.
>>>
>>> The issue is very much identical to http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html but the memory reserve in my case is at start of the memory. I tried the same fixes on this thread but it did not help.
>>>
>>> Platform: IFC6410 with APQ8064 which is a v7 platform with 2GB of memory starting at 0x80000000 and kernel is always loaded at 0x80200000
>>> And am using multi_v7_defconfig.
>>>
>>> Meminfo without memory reserve:
>>> 80000000-88dfffff : System RAM
>>>     80208000-80e5d307 : Kernel code
>>>     80f64000-810be397 : Kernel data
>>> 8a000000-8d9fffff : System RAM
>>> 8ec00000-8effffff : System RAM
>>> 8f700000-8fdfffff : System RAM
>>> 90000000-af7fffff : System RAM
>>>
>>> DT entry:
>>>          reserved-memory {
>>>                  #address-cells = <1>;
>>>                  #size-cells = <1>;
>>>                  ranges;
>>>                  smem at 80000000 {
>>>                          reg = <0x80000000 0x200000>;
>>>                          no-map;
>>>                  };
>>>          };
>>>
>>> If I remove the no-map flag, then I can boot the board. But I don?t want kernel to map this memory at all, as this a IPC memory.
>>>
>>> I just wanted to understand whats going on here, Am guessing that kernel would never touch that 2MB memory.
>>>
>>> Does arm-kernel has limitation on unmapping/memblock_remove() such memory locations?
>>> Or
>>> Is this a known issue?
>>>
>>> Any pointers to debug this issue?
>>>
>>> Before the kernel hangs it reports 2 errors like:
>>>
>>> BUG: Bad page state in process swapper  pfn:fffa8
>>> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
>>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
>>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>>> bad because of flags:
>>> flags: 0x200041(locked|active|mlocked)
>>> Modules linked in:
>>> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
>>> Hardware name: Qualcomm (Flattened Device Tree)
>>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
>>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
>>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
>>> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
>>> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
>>> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
>>> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
>>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
>>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
>>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
>>> Disabling lock debugging due to kernel taint
>>>
>>>
>>> Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/
>>>
>>
>> I don't have an IFC handy but I was able to reproduce the same issue on another board.
>> I think this is an underlying issue in mm code.
>>
>> Removing the first 2MB changes the start address of the zone. This means the start
>> address is no longer pageblock aligned (4MB on this system). With a little
>> digging, it looks like the issue is we're running off the end of the end of the
>> mem_map array because the memmap array is too small. This is similar to
>> an issue fixed by 7c45512 mm: fix pageblock bitmap allocation and the following
>> fixes it for me:
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 7633c50..32d9436 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5012,7 +5012,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>>    #ifdef CONFIG_FLAT_NODE_MEM_MAP
>>           /* ia64 gets its own node_mem_map, before this, without bootmem */
>>           if (!pgdat->node_mem_map) {
>> -               unsigned long size, start, end;
>> +               unsigned long size, start, end, offset;
>>                   struct page *map;
>>
>>                   /*
>> @@ -5020,10 +5020,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>>                    * aligned but the node_mem_map endpoints must be in order
>>                    * for the buddy allocator to function correctly.
>>                    */
>> +               offset = pgdat->node_start_pfn & (pageblock_nr_pages - 1);
>>                   start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
>>                   end = pgdat_end_pfn(pgdat);
>>                   end = ALIGN(end, MAX_ORDER_NR_PAGES);
>> -               size =  (end - start) * sizeof(struct page);
>> +               size =  ((end - start) + offset) * sizeof(struct page);
>>                   map = alloc_remap(pgdat->node_id, size);
>>                   if (!map)
>>                           map = memblock_virt_alloc_node_nopanic(size,
>>
>> If there is agreement on this approach, I can turn this into a proper patch.
>
> I admit I may not see clearly through all the arch-specific layers and various
> config option combinations that are possible here, so I might be misinterpreting
> the code. But I think the problem here is not insufficient allocation size, but
> something else.
>
> The code above continues by this line:
>
> 		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
>
> So, size for the map allocation has already been calculated aligned to
> MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first
> actually present page, which might be offset from the perfect alignment. Your
> patch adds another offset to the already aligned size (but you use
> pageblock_nr_pages which might be lower than MAX_ORDER_NR_PAGES; this seems like
> a mistake in itself?). So with your patch we have map of aligned size starting
> from the node_mem_map. This means the last offset-worth of struct pages should
> be beyond what's needed to access struct page of pgdat_end_pfn(). If we need
> that extra padding to prevent crashing, then it looks really suspicious...
>
> And when I look at node_mem_map usage, I see include/asm/generic/memory_model.h
> defines __pfn_to_page as (basically)
>
> NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
>
> and further above is a generic definition of arch_local_page_offset:
>
> #define arch_local_page_offset(pfn, nid)        \
>          ((pfn) - NODE_DATA(nid)->node_start_pfn)
>
> So it looks correct to me without your patch. The map is allocated aligned,
> node_mem_map points to this map at the offset corresponding to node_start_pfn,
> and pfn_to_page subtracts node_start_pfn to get the offset relative to
> node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
> unless something else is misbehaving here.
>
> In the issue fixed by 7c45512 that you refer to, the problem was basically that
> the allocation didn't use aligned size, but this looks different to me?
>
>

With this hard coded debugging:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..241b870 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5029,6 +5029,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
                         map = memblock_virt_alloc_node_nopanic(size,
                                                                pgdat->node_id);
                 pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
+               pr_err(">>> node_start_pfn %lx node_end_pfn %lx\n",
+                       pgdat->node_start_pfn, pgdat_end_pfn(pgdat));
+               pr_err(">>> size calculated %lx\n", size);
+               pr_err(">>> allocated region %p-%lx\n", map, ((unsigned long)map)+size);
+
         }
  #ifndef CONFIG_NEED_MULTIPLE_NODES
         /*
@@ -5043,6 +5048,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
         }
  #endif
  #endif /* CONFIG_FLAT_NODE_MEM_MAP */
+       pr_err(">>> pfn %lx page %p\n", 0x200, pfn_to_page(0x200));
+       pr_err(">>> pfn %lx page %p\n", 0xbffff, pfn_to_page(0xbffff));
  }
  
  void __paginginit free_area_init_node(int nid, unsigned long *zones_size,

I get this output:
[    0.000000] >>> node_start_pfn 200 node_end_pfn c0000
[    0.000000] >>> size calculated 1800000
[    0.000000] >>> allocated region edffa000-ef7fa000
[    0.000000] >>> pfn 200 page ee002000
[    0.000000] >>> pfn bffff page ef7fdfe0

The start and end pfn values are correct but that page value is outside of the
allocated region for the memory map. This is a CONFIG_FLATMEM system so we
aren't actually using arch_local_page_offset at all:


#define __pfn_to_page(pfn)      (mem_map + ((pfn) - ARCH_PFN_OFFSET))
#define __page_to_pfn(page)     ((unsigned long)((page) - mem_map) + \
                                  ARCH_PFN_OFFSET)

If you do the math, the array size is fine if we don't offset by the
start but alloc_node_mem_map offsets assuming pfn_to_page will offset
as well but this doesn't happen in CONFIG_FLATMEM.

Either alloc_node_mem_map needs to drop the offset or the pfn_to_page
functions need to start adding the offset. It's worth noting that
this gets corrected properly if we have CONFIG_HAVE_MEMBLOCK_NODE_MAP enabled
so perhaps the fix is to unoffset for flatmem as well:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..271c44b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5036,7 +5036,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
          */
         if (pgdat == NODE_DATA(0)) {
                 mem_map = NODE_DATA(0)->node_mem_map;
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
                 if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
                         mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
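A rough model of what the proposed hunk does to mem_map, under the same assumptions as the debug run (32-byte struct page, ARCH_PFN_OFFSET = 0). Addresses are plain integers rather than struct page pointers, the function names are hypothetical, and the input node_mem_map of edffe000 is edffa000 + 0x200 * 32.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants, matching the debug run. */
#define PAGE_STRUCT_SIZE 32UL
#define ARCH_PFN_OFFSET  0UL

/* FLATMEM __page_to_pfn(): (page - mem_map) + ARCH_PFN_OFFSET. */
static unsigned long flatmem_page_to_pfn(uintptr_t page, uintptr_t mem_map)
{
	assert(page >= mem_map);
	return (unsigned long)((page - mem_map) / PAGE_STRUCT_SIZE) + ARCH_PFN_OFFSET;
}

/*
 * Mirrors the proposed hunk: start from node 0's node_mem_map and, if
 * page_to_pfn(mem_map) != node_start_pfn (for FLATMEM this compares
 * ARCH_PFN_OFFSET with node_start_pfn), move mem_map back by
 * (node_start_pfn - ARCH_PFN_OFFSET) entries.
 */
static uintptr_t proposed_mem_map(uintptr_t node_mem_map, unsigned long node_start_pfn)
{
	uintptr_t mem_map = node_mem_map;

	if (flatmem_page_to_pfn(mem_map, mem_map) != node_start_pfn)
		mem_map -= (node_start_pfn - ARCH_PFN_OFFSET) * PAGE_STRUCT_SIZE;
	return mem_map;
}
```

Under these assumptions mem_map moves back to the start of the allocation (edffa000), and the 0xc0000 entries for pfns 0 to bffff end exactly at ef7fa000, the end of the allocated region.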

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: Issue on reserving memory with no-map flag  in  DT
  2015-01-19 23:57       ` Laura Abbott
@ 2015-01-20  9:54         ` Vlastimil Babka
  -1 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2015-01-20  9:54 UTC (permalink / raw)
  To: Laura Abbott, Srinivas Kandagatla, linux-arm-kernel, linux,
	ssantosh, Andrew Morton, Mel Gorman
  Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm

On 01/20/2015 12:57 AM, Laura Abbott wrote:
> On 1/19/2015 7:49 AM, Vlastimil Babka wrote:
>> On 01/17/2015 01:24 AM, Laura Abbott wrote:
>>
>> I admit I may not see clearly through all the arch-specific layers and various
>> config option combinations that are possible here, so I might be misinterpreting
>> the code. But I think the problem here is not insufficient allocation size, but
>> something else.
>>
>> The code above continues by this line:
>>
>> 		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
>>
>> So, size for the map allocation has already been calculated aligned to
>> MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first
>> actually present page, which might be offset from the perfect alignment. Your
>> patch adds another offset to the already aligned size (but you use
>> pageblock_nr_pages which might be lower than MAX_ORDER_NR_PAGES; this seems like
>> a mistake in itself?). So with your patch we have map of aligned size starting
>> from the node_mem_map. This means the last offset-worth of struct pages should
>> be beyond what's needed to access struct page of pgdat_end_pfn(). If we need
>> that extra padding to prevent crashing, then it looks really suspicious...
>>
>> And when I look at node_mem_map usage, I see include/asm-generic/memory_model.h
>> defines __pfn_to_page as (basically)
>>
>> NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
>>
>> and further above is a generic definition of arch_local_page_offset:
>>
>> #define arch_local_page_offset(pfn, nid)        \
>>          ((pfn) - NODE_DATA(nid)->node_start_pfn)
>>
>> So it looks correct to me without your patch. The map is allocated aligned,
>> node_mem_map points to this map at the offset corresponding to node_start_pfn,
>> and pfn_to_page subtracts node_start_pfn to get the offset relative to
>> node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
>> unless something else is misbehaving here.
>>
>> In the issue fixed by 7c45512 that you refer to, the problem was basically that
>> the allocation didn't use aligned size, but this looks different to me?
>>
>>
> 
> With this hard coded debugging:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..241b870 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5029,6 +5029,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>                          map = memblock_virt_alloc_node_nopanic(size,
>                                                                 pgdat->node_id);
>                  pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
> +               pr_err(">>> node_start_pfn %lx node_end_pfn %lx\n",
> +                       pgdat->node_start_pfn, pgdat_end_pfn(pgdat));
> +               pr_err(">>> size calculated %lx\n", size);
> +               pr_err(">>> allocated region %p-%lx\n", map, ((unsigned long)map)+size);
> +
>          }
>   #ifndef CONFIG_NEED_MULTIPLE_NODES
>          /*
> @@ -5043,6 +5048,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>          }
>   #endif
>   #endif /* CONFIG_FLAT_NODE_MEM_MAP */
> +       pr_err(">>> pfn %lx page %p\n", 0x200, pfn_to_page(0x200));
> +       pr_err(">>> pfn %lx page %p\n", 0xbffff, pfn_to_page(0xbffff));
>   }
>   
>   void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> 
> I get this output:
> [    0.000000] >>> node_start_pfn 200 node_end_pfn c0000
> [    0.000000] >>> size calculated 1800000
> [    0.000000] >>> allocated region edffa000-ef7fa000
> [    0.000000] >>> pfn 200 page ee002000
> [    0.000000] >>> pfn bffff page ef7fdfe0
> 
> The start and end pfn values are correct but that page value is outside of the
> allocated region for the memory map. This is a CONFIG_FLATMEM system so we
> aren't actually using arch_local_page_offset at all:
> 
> 
> #define __pfn_to_page(pfn)      (mem_map + ((pfn) - ARCH_PFN_OFFSET))
> #define __page_to_pfn(page)     ((unsigned long)((page) - mem_map) + \
>                                   ARCH_PFN_OFFSET)

Ah, OK. I searched just for node_mem_map and didn't notice it's also assigned to
mem_map.

> If you do the math, the array size is fine if we don't offset by the
> start but alloc_node_mem_map offsets assuming pfn_to_page will offset
> as well but this doesn't happen in CONFIG_FLATMEM.
> 
> Either alloc_node_mem_map needs to drop the offset or the pfn_to_page
> functions need to start adding the offset. It's worth noting that
> this gets corrected properly if we have CONFIG_HAVE_MEMBLOCK_NODE_MAP enabled
> so perhaps the fix is to unoffset for flatmem as well:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..271c44b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5036,7 +5036,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>           */
>          if (pgdat == NODE_DATA(0)) {
>                  mem_map = NODE_DATA(0)->node_mem_map;
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
>                  if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>                          mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);

But is this correcting the same thing? The offset that's added earlier is
(pgdat->node_start_pfn - start), where "start" is just node_start_pfn
aligned down to MAX_ORDER_NR_PAGES. But here we subtract the whole
pgdat->node_start_pfn, minus the constant ARCH_PFN_OFFSET. Is that constant
always equal to the earlier value of "start", which is calculated dynamically?

So I agree that the mem_map assignment should be fixed, but maybe not exactly like this?
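The two quantities being compared can be checked quickly. This is a minimal sketch assuming MAX_ORDER_NR_PAGES = 1024. The IFC6410 case assumes ARCH_PFN_OFFSET = 0x80000 (RAM at 0x80000000), and the last case uses a made-up ARCH_PFN_OFFSET that is not MAX_ORDER aligned; the function names are hypothetical.

```c
#include <assert.h>

#define MAX_ORDER_NR_PAGES 1024UL   /* assumed (MAX_ORDER = 11) */

/* Offset added to node_mem_map earlier: node_start_pfn - start. */
static unsigned long node_map_offset(unsigned long node_start_pfn)
{
	unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);

	return node_start_pfn - start;
}

/* Offset the proposed hunk subtracts: node_start_pfn - ARCH_PFN_OFFSET. */
static unsigned long proposed_offset(unsigned long node_start_pfn,
				     unsigned long arch_pfn_offset)
{
	assert(node_start_pfn >= arch_pfn_offset);
	return node_start_pfn - arch_pfn_offset;
}
```

For both configurations seen in this thread the offsets agree (0x200). With a hypothetical ARCH_PFN_OFFSET of 0x80100 and node_start_pfn of 0x80300 they come out as 0x300 and 0x200. This only shows when the two can differ, not which one is right.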

>   #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> 
> Thanks,
> Laura
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH] mm: Don't offset memmap for flatmem
  2015-01-20  9:54         ` Vlastimil Babka
@ 2015-01-21  1:37           ` Laura Abbott
  -1 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-01-21  1:37 UTC (permalink / raw)
  To: Srinivas Kandagatla, linux-arm-kernel, Russell King - ARM Linux,
	ssantosh, Andrew Morton, Vlastimil Babka
  Cc: Laura Abbott, Kevin Hilman, Stephen Boyd, Arnd Bergmann,
	Kumar Gala, linux-mm

Srinivas Kandagatla reported bad page messages when trying to
remove the bottom 2MB on an ARM based IFC6410 board:

BUG: Bad page state in process swapper  pfn:fffa8
page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x200041(locked|active|mlocked)
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
Hardware name: Qualcomm (Flattened Device Tree)
[<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
[<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
[<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
[<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
[<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
[<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
[<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
[<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
[<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
[<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
Disabling lock debugging due to kernel taint

Removing the lower 2MB left the start of the lowmem zone no longer
pageblock aligned. IFC6410 uses CONFIG_FLATMEM, where
alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
offsets node_mem_map for unaligned nodes on the assumption that the
pfn/page translation functions account for the offset. The CONFIG_FLATMEM
translation functions do not apply that offset, however, so lookups
overrun the memmap array. Just use the allocated memmap without any
offset when running with CONFIG_FLATMEM to avoid the overrun.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
---
Srinivas, can you test this version of the patch?
---
 mm/page_alloc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..33cef00 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5014,6 +5014,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 	if (!pgdat->node_mem_map) {
 		unsigned long size, start, end;
 		struct page *map;
+		unsigned long offset = 0;
 
 		/*
 		 * The zone's endpoints aren't required to be MAX_ORDER
@@ -5021,6 +5022,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 		 * for the buddy allocator to function correctly.
 		 */
 		start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
+		if (!IS_ENABLED(CONFIG_FLATMEM))
+			offset = pgdat->node_start_pfn - start;
 		end = pgdat_end_pfn(pgdat);
 		end = ALIGN(end, MAX_ORDER_NR_PAGES);
 		size =  (end - start) * sizeof(struct page);
@@ -5028,7 +5031,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 		if (!map)
 			map = memblock_virt_alloc_node_nopanic(size,
 							       pgdat->node_id);
-		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
+		pgdat->node_mem_map = map + offset;
 	}
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 	/*
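The v1 logic can be modelled in struct page index units counted from the start of the allocated map. In this sketch `flatmem` stands in for IS_ENABLED(CONFIG_FLATMEM); MAX_ORDER_NR_PAGES = 1024 and the helper names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER_NR_PAGES 1024UL   /* assumed */

/*
 * Entry index (from the start of the allocated map) that v1 stores in
 * node_mem_map. 'flatmem' stands in for IS_ENABLED(CONFIG_FLATMEM).
 */
static unsigned long v1_node_mem_map_index(unsigned long node_start_pfn,
					   bool flatmem)
{
	unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
	unsigned long offset = 0;

	if (!flatmem)
		offset = node_start_pfn - start;
	return offset;
}

/* FLATMEM pfn_to_page() as an index: mem_map (== node_mem_map) + pfn - ARCH_PFN_OFFSET. */
static unsigned long v1_flatmem_page_index(unsigned long pfn,
					   unsigned long node_start_pfn,
					   unsigned long arch_pfn_offset)
{
	assert(pfn >= arch_pfn_offset);
	return v1_node_mem_map_index(node_start_pfn, true) + (pfn - arch_pfn_offset);
}
```

With the debug-run values (node_start_pfn 0x200, ARCH_PFN_OFFSET 0), the FLATMEM index for pfn bffff becomes bffff, which is inside the 0xc0000 entries allocated.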

* Re: [PATCH] mm: Don't offset memmap for flatmem
  2015-01-21  1:37           ` Laura Abbott
@ 2015-01-21 10:15             ` Vlastimil Babka
  -1 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2015-01-21 10:15 UTC (permalink / raw)
  To: Laura Abbott, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Andrew Morton
  Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm

On 01/21/2015 02:37 AM, Laura Abbott wrote:
> Srinivas Kandagatla reported bad page messages when trying to
> remove the bottom 2MB on an ARM based IFC6410 board:
> 
> BUG: Bad page state in process swapper  pfn:fffa8
> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> bad because of flags:
> flags: 0x200041(locked|active|mlocked)
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
> Hardware name: Qualcomm (Flattened Device Tree)
> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
> Disabling lock debugging due to kernel taint
> 
> Removing the lower 2MB left the start of the lowmem zone no longer
> pageblock aligned. IFC6410 uses CONFIG_FLATMEM, where
> alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
> offsets node_mem_map for unaligned nodes on the assumption that the
> pfn/page translation functions account for the offset. The CONFIG_FLATMEM
> translation functions do not apply that offset, however, so lookups
> overrun the memmap array. Just use the allocated memmap without any
> offset when running with CONFIG_FLATMEM to avoid the overrun.
> 
> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
> Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> ---
> Srinivas, can you test this version of the patch?
> ---
>  mm/page_alloc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..33cef00 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5014,6 +5014,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  	if (!pgdat->node_mem_map) {
>  		unsigned long size, start, end;
>  		struct page *map;
> +		unsigned long offset = 0;
>  
>  		/*
>  		 * The zone's endpoints aren't required to be MAX_ORDER
> @@ -5021,6 +5022,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  		 * for the buddy allocator to function correctly.
>  		 */
>  		start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
> +		if (!IS_ENABLED(CONFIG_FLATMEM))
> +			offset = pgdat->node_start_pfn - start;
>  		end = pgdat_end_pfn(pgdat);
>  		end = ALIGN(end, MAX_ORDER_NR_PAGES);
>  		size =  (end - start) * sizeof(struct page);
> @@ -5028,7 +5031,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  		if (!map)
>  			map = memblock_virt_alloc_node_nopanic(size,
>  							       pgdat->node_id);
> -		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
> +		pgdat->node_mem_map = map + offset;

Hmm, with this patch you have changed not only mem_map, but also node_mem_map
itself. So won't the result of pgdat_page_nr() defined in mmzone.h now be
different in the CONFIG_FLAT_NODE_MEM_MAP case?

#ifdef CONFIG_FLAT_NODE_MEM_MAP
#define pgdat_page_nr(pgdat, pagenr)    ((pgdat)->node_mem_map + (pagenr))
#else
#define pgdat_page_nr(pgdat, pagenr)    pfn_to_page((pgdat)->node_start_pfn + (pagenr))
#endif
#define nid_page_nr(nid, pagenr)        pgdat_page_nr(NODE_DATA(nid),(pagenr))

It appears that nobody uses pgdat_page_nr except nid_page_nr, which nobody
uses. But it's better not to leave it broken, and there's also some
arch-specific code looking at node_mem_map directly (although I'm not sure
this particular combination of CONFIG_ options applies there). So it seems to
me we should rather apply the offset to node_mem_map in any case, but not
apply it (i.e. subtract it back) to mem_map for CONFIG_FLATMEM?
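The concern can be illustrated with a small index-based model. The helpers are hypothetical; indices count struct page entries from the start of the allocated map, and the debug-run values node_start_pfn = 0x200, aligned start = 0 and ARCH_PFN_OFFSET = 0 are assumed. The point is that pgdat_page_nr(pgdat, 0) should describe the same struct page as pfn_to_page(node_start_pfn):

```c
#include <assert.h>

/* Assumed values from the debug run. */
#define NODE_START_PFN  0x200UL
#define ALIGNED_START   0x0UL   /* node_start_pfn rounded down to MAX_ORDER */
#define ARCH_PFN_OFFSET 0x0UL

/* Entry indices, counted from the start of the allocated map. */
struct layout {
	unsigned long node_mem_map;
	unsigned long mem_map;
};

/* v1 under FLATMEM: offset is 0, and mem_map = node_mem_map. */
static struct layout v1_flatmem(void)
{
	struct layout l = { 0, 0 };

	return l;
}

/* Suggestion: always offset node_mem_map, subtract it back for mem_map. */
static struct layout suggested_flatmem(void)
{
	unsigned long offset = NODE_START_PFN - ALIGNED_START;
	struct layout l;

	assert(offset < 1024UL);   /* below MAX_ORDER_NR_PAGES (assumed 1024) */
	l.node_mem_map = offset;
	l.mem_map = l.node_mem_map - offset;
	return l;
}

/* pgdat_page_nr() with CONFIG_FLAT_NODE_MEM_MAP: node_mem_map + pagenr. */
static unsigned long pgdat_page_nr_idx(struct layout l, unsigned long pagenr)
{
	return l.node_mem_map + pagenr;
}

/* FLATMEM pfn_to_page(): mem_map + (pfn - ARCH_PFN_OFFSET). */
static unsigned long pfn_to_page_idx(struct layout l, unsigned long pfn)
{
	return l.mem_map + (pfn - ARCH_PFN_OFFSET);
}
```

With v1 under FLATMEM the two lookups disagree by the 0x200-entry offset. With the suggested layout they agree, and mem_map still starts at the beginning of the map.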

Thanks.

>  	}
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  	/*
> 


* [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-21  1:37           ` Laura Abbott
@ 2015-01-22  1:01             ` Laura Abbott
  -1 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-01-22  1:01 UTC (permalink / raw)
  To: Vlastimil Babka, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Andrew Morton
  Cc: Laura Abbott, Kevin Hilman, Arnd Bergmann, Stephen Boyd, linux-mm,
	Kumar Gala

Srinivas Kandagatla reported bad page messages when trying to
remove the bottom 2MB on an ARM based IFC6410 board:

BUG: Bad page state in process swapper  pfn:fffa8
page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x200041(locked|active|mlocked)
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
Hardware name: Qualcomm (Flattened Device Tree)
[<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
[<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
[<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
[<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
[<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
[<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
[<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
[<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
[<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
[<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
Disabling lock debugging due to kernel taint

Removing the lower 2MB caused the start of the lowmem zone to no longer
be page block aligned. IFC6410 uses CONFIG_FLATMEM, where
alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
offsets for unaligned nodes, on the assumption that the pfn/page
translation functions will account for the offset. The CONFIG_FLATMEM
functions do not apply the offset, however, resulting in an overrun of
the memmap array. Just use the allocated memmap without any offset
when running with CONFIG_FLATMEM to avoid the overrun.
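
To make the overrun concrete, here is a small user-space model of the arithmetic with the IFC6410 numbers from this thread. It assumes 4 KiB pages, MAX_ORDER_NR_PAGES = 1024 (the default MAX_ORDER of 11), ARCH_PFN_OFFSET staying at 0x80000 because PHYS_OFFSET is still 0x80000000, and a single node spanning pfns 0x80200 to 0xaf800. mem_map is modelled as an index into the allocated map[] rather than a real pointer, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed values for the IFC6410 case in this thread (4 KiB pages). */
#define MAX_ORDER_NR_PAGES 1024UL     /* assumes the default MAX_ORDER of 11 */
#define ARCH_PFN_OFFSET    0x80000UL  /* PHYS_OFFSET 0x80000000 >> 12 */
#define NODE_START_PFN     0x80200UL  /* first 2MB removed by no-map */
#define NODE_END_PFN       0xaf800UL  /* end of the last System RAM bank */

/* alloc_node_mem_map() allocates map[] from the MAX_ORDER-aligned start. */
static unsigned long map_start(void)
{
	return NODE_START_PFN & ~(MAX_ORDER_NR_PAGES - 1);
}

/* Number of struct pages in map[], with the end rounded up likewise. */
static long map_entries(void)
{
	unsigned long end = (NODE_END_PFN + MAX_ORDER_NR_PAGES - 1) &
			    ~(MAX_ORDER_NR_PAGES - 1);

	return (long)(end - map_start());
}

/* Before the patch: mem_map = node_mem_map = map + offset. */
static long old_mem_map(void)
{
	return (long)(NODE_START_PFN - map_start());
}

/*
 * After the patch (CONFIG_FLATMEM, !CONFIG_HAVE_MEMBLOCK_NODE_MAP):
 * under flatmem, page_to_pfn(mem_map) is ARCH_PFN_OFFSET, so the test
 * below mirrors "page_to_pfn(mem_map) != pgdat->node_start_pfn".
 */
static long new_mem_map(void)
{
	long mem_map = old_mem_map();

	if (ARCH_PFN_OFFSET != NODE_START_PFN)
		mem_map -= (long)(NODE_START_PFN - map_start());
	return mem_map;
}

/* Flatmem pfn_to_page(): mem_map + (pfn - ARCH_PFN_OFFSET), as an index. */
static long pfn_to_index(long mem_map, unsigned long pfn)
{
	assert(pfn >= NODE_START_PFN && pfn < NODE_END_PFN);
	return mem_map + (long)(pfn - ARCH_PFN_OFFSET);
}
```

Under these assumptions the old mem_map sits 0x200 entries too far along, so every flatmem lookup lands 0x200 struct pages past the right one, and the last 0x200 pfns of the node index past the end of map[] (0x2f800 entries).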

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
---
 mm/page_alloc.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..269fc93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5005,6 +5005,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 {
+	unsigned long __maybe_unused offset = 0;
+
 	/* Skip empty nodes */
 	if (!pgdat->node_spanned_pages)
 		return;
@@ -5021,6 +5023,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 		 * for the buddy allocator to function correctly.
 		 */
 		start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
+		offset = pgdat->node_start_pfn - start;
 		end = pgdat_end_pfn(pgdat);
 		end = ALIGN(end, MAX_ORDER_NR_PAGES);
 		size =  (end - start) * sizeof(struct page);
@@ -5028,7 +5031,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 		if (!map)
 			map = memblock_virt_alloc_node_nopanic(size,
 							       pgdat->node_id);
-		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
+		pgdat->node_mem_map = map + offset;
 	}
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 	/*
@@ -5036,10 +5039,13 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 	 */
 	if (pgdat == NODE_DATA(0)) {
 		mem_map = NODE_DATA(0)->node_mem_map;
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-		if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
-			mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
+		if (page_to_pfn(mem_map) != pgdat->node_start_pfn) {
+			if (IS_ENABLED(CONFIG_HAVE_MEMBLOCK_NODE_MAP))
+				offset = pgdat->node_start_pfn - ARCH_PFN_OFFSET;
+			mem_map -= offset;
+		}
+#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP || CONFIG_FLATMEM */
 	}
 #endif
 #endif /* CONFIG_FLAT_NODE_MEM_MAP */
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-22  1:01             ` Laura Abbott
@ 2015-01-23  0:20               ` Andrew Morton
  -1 siblings, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2015-01-23  0:20 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Vlastimil Babka, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman,
	Stephen Boyd, linux-mm, Kumar Gala

On Wed, 21 Jan 2015 17:01:40 -0800 Laura Abbott <lauraa@codeaurora.org> wrote:

> Srinivas Kandagatla reported bad page messages when trying to
> remove the bottom 2MB on an ARM based IFC6410 board
> 
> BUG: Bad page state in process swapper  pfn:fffa8
> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> bad because of flags:
> flags: 0x200041(locked|active|mlocked)
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
> Hardware name: Qualcomm (Flattened Device Tree)
> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
> Disabling lock debugging due to kernel taint
> 
> Removing the lower 2MB made the start of the lowmem zone to no longer
> be page block aligned. IFC6410 uses CONFIG_FLATMEM where
> alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
> will offset for unaligned nodes with the assumption the pfn/page
> translation functions will account for the offset. The functions for
> CONFIG_FLATMEM do not offset however, resulting in overrunning
> the memmap array. Just use the allocated memmap without any offset
> when running with CONFIG_FLATMEM to avoid the overrun.
> 

I don't think v2 addressed Vlastimil's review comment?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-23  0:20               ` Andrew Morton
@ 2015-01-23  0:33                 ` Laura Abbott
  -1 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-01-23  0:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman,
	Stephen Boyd, linux-mm, Kumar Gala

On 1/22/2015 4:20 PM, Andrew Morton wrote:
> On Wed, 21 Jan 2015 17:01:40 -0800 Laura Abbott <lauraa@codeaurora.org> wrote:
>
>> Srinivas Kandagatla reported bad page messages when trying to
>> remove the bottom 2MB on an ARM based IFC6410 board
>>
>> BUG: Bad page state in process swapper  pfn:fffa8
>> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> bad because of flags:
>> flags: 0x200041(locked|active|mlocked)
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
>> Hardware name: Qualcomm (Flattened Device Tree)
>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
>> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
>> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
>> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
>> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
>> Disabling lock debugging due to kernel taint
>>
>> Removing the lower 2MB made the start of the lowmem zone to no longer
>> be page block aligned. IFC6410 uses CONFIG_FLATMEM where
>> alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
>> will offset for unaligned nodes with the assumption the pfn/page
>> translation functions will account for the offset. The functions for
>> CONFIG_FLATMEM do not offset however, resulting in overrunning
>> the memmap array. Just use the allocated memmap without any offset
>> when running with CONFIG_FLATMEM to avoid the overrun.
>>
>
> I don't think v2 addressed Vlastimil's review comment?
>

We're still adding the offset to node_mem_map and then subtracting it from
just mem_map. Did I miss another comment somewhere?


-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-23  0:33                 ` Laura Abbott
@ 2015-01-23  9:05                   ` Vlastimil Babka
  -1 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2015-01-23  9:05 UTC (permalink / raw)
  To: Laura Abbott, Andrew Morton
  Cc: Srinivas Kandagatla, linux-arm-kernel, Russell King - ARM Linux,
	ssantosh, Kevin Hilman, Arnd Bergman, Stephen Boyd, linux-mm,
	Kumar Gala, Mel Gorman

On 01/23/2015 01:33 AM, Laura Abbott wrote:
> On 1/22/2015 4:20 PM, Andrew Morton wrote:
>> On Wed, 21 Jan 2015 17:01:40 -0800 Laura Abbott <lauraa@codeaurora.org> wrote:
>>
>>> Srinivas Kandagatla reported bad page messages when trying to
>>> remove the bottom 2MB on an ARM based IFC6410 board
>>>
>>> BUG: Bad page state in process swapper  pfn:fffa8
>>> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
>>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
>>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>>> bad because of flags:
>>> flags: 0x200041(locked|active|mlocked)
>>> Modules linked in:
>>> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
>>> Hardware name: Qualcomm (Flattened Device Tree)
>>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
>>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
>>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
>>> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
>>> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
>>> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
>>> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
>>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
>>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
>>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
>>> Disabling lock debugging due to kernel taint
>>>
>>> Removing the lower 2MB made the start of the lowmem zone to no longer
>>> be page block aligned. IFC6410 uses CONFIG_FLATMEM where
>>> alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
>>> will offset for unaligned nodes with the assumption the pfn/page
>>> translation functions will account for the offset. The functions for
>>> CONFIG_FLATMEM do not offset however, resulting in overrunning
>>> the memmap array. Just use the allocated memmap without any offset
>>> when running with CONFIG_FLATMEM to avoid the overrun.
>>>
>>
>> I don't think v2 addressed Vlastimil's review comment?
>>
>
> We're still adding the offset to node_mem_map and then subtracting it from
> just mem_map. Did I miss another comment somewhere?

Yes, that was addressed, thanks. But I don't feel comfortable acking it
yet, as I have no idea if we are doing the right thing for the
CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.

Also, putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP case under
the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" test will probably
do the right thing, but it looks like a weird test for this case.

I have no good suggestion though, so let's CC Mel who apparently wrote 
the ARCH_PFN_OFFSET correction?
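
The concern about the page_to_pfn() test can be made concrete with a small user-space model of v2's arithmetic for CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP. It treats mem_map as an index relative to the start of the allocated map[] and assumes MAX_ORDER_NR_PAGES is 1024; the helper names and the third set of values are hypothetical. Under this model the result is only correct when ARCH_PFN_OFFSET equals either node_start_pfn or the MAX_ORDER-aligned start, so the IFC6410 layout works, but a layout where it equals neither would not.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER_NR_PAGES 1024UL  /* assumes the default MAX_ORDER of 11 */

/* MAX_ORDER-aligned pfn that map[0] describes. */
static unsigned long aligned_start(unsigned long node_start_pfn)
{
	return node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
}

/*
 * Final mem_map computed by v2 for CONFIG_FLATMEM &&
 * !CONFIG_HAVE_MEMBLOCK_NODE_MAP, as an index relative to map[0].
 */
static long v2_mem_map(unsigned long arch_pfn_offset, unsigned long node_start_pfn)
{
	long offset = (long)(node_start_pfn - aligned_start(node_start_pfn));
	long mem_map = offset;	/* mem_map = node_mem_map = map + offset */

	/* ARCH_PFN_OFFSET is assumed not to lie above the node start. */
	assert(arch_pfn_offset <= node_start_pfn);

	/* Under flatmem, page_to_pfn(mem_map) is always ARCH_PFN_OFFSET. */
	if (arch_pfn_offset != node_start_pfn)
		mem_map -= offset;
	return mem_map;
}

/*
 * pfn_to_page(pfn) = mem_map + (pfn - ARCH_PFN_OFFSET) must equal
 * map + (pfn - start) for every pfn, i.e. mem_map == ARCH_PFN_OFFSET - start.
 */
static bool v2_is_correct(unsigned long arch_pfn_offset, unsigned long node_start_pfn)
{
	long want = (long)arch_pfn_offset - (long)aligned_start(node_start_pfn);

	return v2_mem_map(arch_pfn_offset, node_start_pfn) == want;
}
```

Calling v2_is_correct(0x80000, 0x80200) models the IFC6410 case and returns true; v2_is_correct(0x7fc00, 0x80200) models a hypothetical layout and returns false.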

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-23  9:05                   ` Vlastimil Babka
@ 2015-01-26 15:56                     ` Mel Gorman
  -1 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2015-01-26 15:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Laura Abbott, Andrew Morton, Srinivas Kandagatla,
	linux-arm-kernel, Russell King - ARM Linux, ssantosh,
	Kevin Hilman, Arnd Bergman, Stephen Boyd, linux-mm, Kumar Gala

On Fri, Jan 23, 2015 at 10:05:48AM +0100, Vlastimil Babka wrote:
> On 01/23/2015 01:33 AM, Laura Abbott wrote:
> >On 1/22/2015 4:20 PM, Andrew Morton wrote:
> >>On Wed, 21 Jan 2015 17:01:40 -0800 Laura Abbott <lauraa@codeaurora.org> wrote:
> >>
> >>>Srinivas Kandagatla reported bad page messages when trying to
> >>>remove the bottom 2MB on an ARM based IFC6410 board
> >>>
> >>>BUG: Bad page state in process swapper  pfn:fffa8
> >>>page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
> >>>flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
> >>>page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> >>>bad because of flags:
> >>>flags: 0x200041(locked|active|mlocked)
> >>>Modules linked in:
> >>>CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
> >>>Hardware name: Qualcomm (Flattened Device Tree)
> >>>[<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
> >>>[<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
> >>>[<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
> >>>[<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
> >>>[<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
> >>>[<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
> >>>[<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
> >>>[<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
> >>>[<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
> >>>[<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
> >>>Disabling lock debugging due to kernel taint
> >>>
> >>>Removing the lower 2MB made the start of the lowmem zone to no longer
> >>>be page block aligned. IFC6410 uses CONFIG_FLATMEM where
> >>>alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
> >>>will offset for unaligned nodes with the assumption the pfn/page
> >>>translation functions will account for the offset. The functions for
> >>>CONFIG_FLATMEM do not offset however, resulting in overrunning
> >>>the memmap array. Just use the allocated memmap without any offset
> >>>when running with CONFIG_FLATMEM to avoid the overrun.
> >>>
> >>
> >>I don't think v2 addressed Vlastimil's review comment?
> >>
> >
> >We're still adding the offset to node_mem_map and then subtracting it from
> >just mem_map. Did I miss another comment somewhere?
> 
> Yes that was addressed, thanks. But I don't feel comfortable acking
> it yet, as I have no idea if we are doing the right thing for
> CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.
> 
> Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP
> under the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will
> probably do the right thing, but looks like a weird test for this
> case here.
> 
> I have no good suggestion though, so let's CC Mel who apparently
> wrote the ARCH_PFN_OFFSET correction?
> 

I don't recall introducing ARCH_PFN_OFFSET; are you sure it was me? I'm just
back today after being offline for a week, so I didn't review the patch, but
IIRC ARCH_PFN_OFFSET deals with the case where physical memory does not start
at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0.
I don't recall it being related to the alignment of node 0, so if there
are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
related, then I'm surprised.
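
The role of ARCH_PFN_OFFSET described above can be illustrated with a toy user-space model of the CONFIG_FLATMEM translation, patterned on the __pfn_to_page()/__page_to_pfn() macros in include/asm-generic/memory_model.h. The 16-page RAM, the array-backed mem_map and the one-field struct page are assumptions for illustration only. Because mem_map[0] describes pfn ARCH_PFN_OFFSET, a system whose RAM starts at 0x80000000 does not need 0x80000 unused struct pages in front of the first real one; nothing here depends on how node 0 is aligned.

```c
#include <assert.h>

/* Minimal stand-in for the kernel's struct page (assumption). */
struct page {
	unsigned long flags;
};

/* Assumed: RAM starts at 0x80000000 with 4 KiB pages, so the first pfn is 0x80000. */
#define ARCH_PFN_OFFSET 0x80000UL
#define NR_PAGES        16UL	/* tiny toy memory size for illustration */

/* One struct page per page of RAM; mem_map[0] describes ARCH_PFN_OFFSET. */
static struct page mem_map[NR_PAGES];

/* Flatmem-style lookup: mem_map + (pfn - ARCH_PFN_OFFSET). */
static struct page *pfn_to_page(unsigned long pfn)
{
	assert(pfn >= ARCH_PFN_OFFSET && pfn < ARCH_PFN_OFFSET + NR_PAGES);
	return mem_map + (pfn - ARCH_PFN_OFFSET);
}

/* Inverse: (page - mem_map) + ARCH_PFN_OFFSET. */
static unsigned long page_to_pfn(const struct page *page)
{
	return (unsigned long)(page - mem_map) + ARCH_PFN_OFFSET;
}
```

The two helpers round-trip, e.g. page_to_pfn(pfn_to_page(0x8000a)) gives back 0x8000a.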

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv2] mm: Don't offset memmap for flatmem
@ 2015-01-26 15:56                     ` Mel Gorman
  0 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2015-01-26 15:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 23, 2015 at 10:05:48AM +0100, Vlastimil Babka wrote:
> On 01/23/2015 01:33 AM, Laura Abbott wrote:
> >On 1/22/2015 4:20 PM, Andrew Morton wrote:
> >>On Wed, 21 Jan 2015 17:01:40 -0800 Laura Abbott <lauraa@codeaurora.org> wrote:
> >>
> >>>Srinivas Kandagatla reported bad page messages when trying to
> >>>remove the bottom 2MB on an ARM based IFC6410 board
> >>>
> >>>BUG: Bad page state in process swapper  pfn:fffa8
> >>>page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
> >>>flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
> >>>page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> >>>bad because of flags:
> >>>flags: 0x200041(locked|active|mlocked)
> >>>Modules linked in:
> >>>CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
> >>>Hardware name: Qualcomm (Flattened Device Tree)
> >>>[<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
> >>>[<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
> >>>[<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
> >>>[<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
> >>>[<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
> >>>[<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
> >>>[<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
> >>>[<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
> >>>[<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
> >>>[<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
> >>>Disabling lock debugging due to kernel taint
> >>>
> >>>Removing the lower 2MB made the start of the lowmem zone to no longer
> >>>be page block aligned. IFC6410 uses CONFIG_FLATMEM where
> >>>alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
> >>>will offset for unaligned nodes with the assumption the pfn/page
> >>>translation functions will account for the offset. The functions for
> >>>CONFIG_FLATMEM do not offset however, resulting in overrunning
> >>>the memmap array. Just use the allocated memmap without any offset
> >>>when running with CONFIG_FLATMEM to avoid the overrun.
> >>>
> >>
> >>I don't think v2 addressed Vlastimil's review comment?
> >>
> >
> >We're still adding the offset to node_mem_map and then subtracting it from
> >just mem_map. Did I miss another comment somewhere?
> 
> Yes that was addressed, thanks. But I don't feel comfortable acking
> it yet, as I have no idea if we are doing the right thing for
> CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.
> 
> Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP
> under the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will
> probably do the right thing, but looks like a weird test for this
> case here.
> 
> I have no good suggestion though, so let's CC Mel who apparently
> wrote the ARCH_PFN_OFFSET correction?
> 
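A small userspace model in C makes the overrun described in the commit message concrete. It is a sketch under stated assumptions, not kernel code. MAX_ORDER_NR_PAGES is assumed to be 1024 (4 KiB pages, MAX_ORDER 11). ARCH_PFN_OFFSET is assumed to be 0x80000 (RAM at 0x80000000). The node is assumed to end at PFN 0xaf800, the top of System RAM in the original report's meminfo. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages with MAX_ORDER = 11 (illustrative only). */
#define MAX_ORDER_NR_PAGES 1024UL

/* Number of struct page entries allocated for the node: the range is
 * widened to MAX_ORDER_NR_PAGES alignment at both ends. */
static unsigned long alloc_entries(unsigned long node_start, unsigned long node_end)
{
	unsigned long start = node_start & ~(MAX_ORDER_NR_PAGES - 1);
	unsigned long end = (node_end + MAX_ORDER_NR_PAGES - 1) & ~(MAX_ORDER_NR_PAGES - 1);

	return end - start;
}

/* Index, relative to the start of the allocation, that an uncorrected
 * FLATMEM pfn_to_page() (mem_map + (pfn - ARCH_PFN_OFFSET)) would use
 * for the node's last PFN when mem_map is simply node_mem_map. */
static unsigned long last_pfn_index(unsigned long arch_pfn_offset,
				    unsigned long node_start,
				    unsigned long node_end)
{
	unsigned long start = node_start & ~(MAX_ORDER_NR_PAGES - 1);
	/* node_mem_map points past the alignment padding. */
	unsigned long node_mem_map_off = node_start - start;

	return node_mem_map_off + (node_end - 1 - arch_pfn_offset);
}

static bool memmap_overruns(unsigned long arch_pfn_offset,
			    unsigned long node_start, unsigned long node_end)
{
	return last_pfn_index(arch_pfn_offset, node_start, node_end) >=
	       alloc_entries(node_start, node_end);
}
```

With the bottom 2MB (0x200 pages) removed, the uncorrected lookup for the last PFN lands 0x200 entries past the last allocated entry, which is exactly the size of the removed region. With an aligned node start it stays in bounds.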

I don't recall introducing ARCH_PFN_OFFSET; are you sure it was me? I'm just
back today after being offline for a week, so I didn't review the patch, but IIRC,
ARCH_PFN_OFFSET deals with the case where physical memory does not start
at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0.
I don't recall it being related to the alignment of node 0, so if there
are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
related, then I'm surprised.
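The FLATMEM translation that ARCH_PFN_OFFSET feeds into can be sketched as below. The two helpers have the same shape as the FLATMEM __pfn_to_page()/__page_to_pfn() macros in asm-generic/memory_model.h. The struct page stand-in, the 16-entry array and the ARCH_PFN_OFFSET value of 0x80000 are illustrative assumptions.

```c
#include <assert.h>

/* Assumed: RAM starts at 0x80000000 with 4 KiB pages. */
#define ARCH_PFN_OFFSET 0x80000UL
#define NR_PAGES 16

/* Minimal stand-in for the kernel's struct page. */
struct page { unsigned long flags; };

static struct page memmap_storage[NR_PAGES];
static struct page *mem_map = memmap_storage;

/* mem_map[0] describes PFN ARCH_PFN_OFFSET, not PFN 0. */
static struct page *pfn_to_page(unsigned long pfn)
{
	return mem_map + (pfn - ARCH_PFN_OFFSET);
}

static unsigned long page_to_pfn(const struct page *page)
{
	return (unsigned long)(page - mem_map) + ARCH_PFN_OFFSET;
}
```

Without the offset, mem_map[0] would have to describe PFN 0, so a board whose RAM starts at 0x80000000 would need struct pages for memory that does not exist.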

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-26 15:56                     ` Mel Gorman
@ 2015-01-29 13:13                       ` Vlastimil Babka
  -1 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2015-01-29 13:13 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Laura Abbott, Andrew Morton, Srinivas Kandagatla,
	linux-arm-kernel, Russell King - ARM Linux, ssantosh,
	Kevin Hilman, Arnd Bergman, Stephen Boyd, linux-mm, Kumar Gala

On 01/26/2015 04:56 PM, Mel Gorman wrote:
> On Fri, Jan 23, 2015 at 10:05:48AM +0100, Vlastimil Babka wrote:
>> On 01/23/2015 01:33 AM, Laura Abbott wrote:
>>> On 1/22/2015 4:20 PM, Andrew Morton wrote:
>>>>
>>>> I don't think v2 addressed Vlastimil's review comment?
>>>>
>>>
>>> We're still adding the offset to node_mem_map and then subtracting it from
>>> just mem_map. Did I miss another comment somewhere?
>>
>> Yes that was addressed, thanks. But I don't feel comfortable acking
>> it yet, as I have no idea if we are doing the right thing for
>> CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.
>>
>> Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP
>> under the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will
>> probably do the right thing, but looks like a weird test for this
>> case here.
>>
>> I have no good suggestion though, so let's CC Mel who apparently
>> wrote the ARCH_PFN_OFFSET correction?
>>
>
> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me?  I'm just
> back today after been offline a week so didn't review the patch but IIRC,
> ARCH_PFN_OFFSET deals with the case where physical memory does not start
> at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0.
> I don't recall it being related to the alignment of node 0 so if there
> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
> related then I'm surprised.

You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit
467bc461d2, which was a bugfix to your commit c713216dee. Your commit did
introduce the mem_map correction code, after which the code looked like:

mem_map = NODE_DATA(0)->node_mem_map;
#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
        mem_map -= pgdat->node_start_pfn;
#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */


It's from 2006, so I can't expect you to remember the details, but I had
some trouble figuring out what this does. I assume it makes sure that
mem_map points to the struct page corresponding to pfn 0, because that's
what translations using mem_map expect.
But pgdat->node_mem_map points to the struct page corresponding to
pgdat->node_start_pfn, which might not be 0, so it subtracts
node_start_pfn to fix that. This is OK, as node_mem_map is allocated
(in this very function) with padding so that it covers a
MAX_ORDER_NR_PAGES-aligned area, and node_mem_map may point into the
middle of it.

Commit 467bc461d2 fixed this for the case where the first pfn is not 0
but ARCH_PFN_OFFSET. Now mem_map points to the struct page corresponding
to pfn=ARCH_PFN_OFFSET, which is OK. But I still have a few doubts:
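A minimal model of the correction described here makes the pointer adjustment easier to follow. It assumes MAX_ORDER_NR_PAGES = 1024 and uses the current-kernel form of the subtraction, (node_start_pfn - ARCH_PFN_OFFSET). To avoid undefined pointer arithmetic, positions are kept as signed indices into a hypothetical memmap allocation rather than as struct page pointers. struct memmap_model, setup() and pfn_to_index() are made-up names.

```c
#include <assert.h>

/* Assumption: 4 KiB pages with MAX_ORDER = 11. */
#define MAX_ORDER_NR_PAGES 1024L

/*
 * Hypothetical model. Every position is an index into the memmap
 * allocation, whose entry 0 describes node_start_pfn rounded down to
 * MAX_ORDER_NR_PAGES. Indices are signed so that an underflow would
 * show up as a negative number.
 */
struct memmap_model {
	long arch_pfn_offset;
	long alloc_start_pfn;	/* PFN described by allocation entry 0 */
	long node_mem_map;	/* entry describing node_start_pfn */
	long mem_map;		/* base used by FLATMEM pfn_to_page() */
};

static struct memmap_model setup(long arch_pfn_offset, long node_start_pfn)
{
	struct memmap_model m;

	m.arch_pfn_offset = arch_pfn_offset;
	m.alloc_start_pfn = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
	/* node_mem_map points past the alignment padding. */
	m.node_mem_map = node_start_pfn - m.alloc_start_pfn;

	/* mem_map = NODE_DATA(0)->node_mem_map; */
	m.mem_map = m.node_mem_map;
	/*
	 * page_to_pfn(mem_map) is (mem_map - mem_map) + ARCH_PFN_OFFSET at
	 * this point, so the test reduces to comparing the two PFNs.
	 */
	if (arch_pfn_offset != node_start_pfn)
		m.mem_map -= node_start_pfn - arch_pfn_offset;
	return m;
}

/* FLATMEM pfn_to_page(): mem_map + (pfn - ARCH_PFN_OFFSET), as an index. */
static long pfn_to_index(const struct memmap_model *m, long pfn)
{
	return m->mem_map + (pfn - m->arch_pfn_offset);
}
```

With the node starting 2MB (0x200 pages) above ARCH_PFN_OFFSET, the correction moves mem_map back to the first allocated entry. pfn_to_page() for node_start_pfn then lands on node_mem_map as intended.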

1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" test silently
assumes that mem_map is allocated at the beginning of the node, i.e. at
pgdat->node_start_pfn. And the only reason for this if-condition to be
true is that we haven't yet corrected the page_to_pfn translation, which
uses mem_map. Is this assumption always safe? Shouldn't the if-condition
instead test whether pgdat->node_start_pfn is aligned?

2) The #ifdef guard is on CONFIG_ARCH_POPULATES_NODE_MAP, which is
nowadays called CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be
#ifdef FLATMEM instead? After all, we are correcting the value of mem_map
based on the page_to_pfn variant used on FLATMEM. arm doesn't define
CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction.

3) The node_mem_map allocation code aligns the allocation to
MAX_ORDER_NR_PAGES, so the offset between the start of the allocated map
and where node_mem_map points will be less than MAX_ORDER_NR_PAGES.
However, here we subtract (in the current kernel) (pgdat->node_start_pfn -
ARCH_PFN_OFFSET). That looks like another silent assumption: that
pgdat->node_start_pfn always lies between ARCH_PFN_OFFSET and
ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. If it were larger, the mem_map
correction would subtract too much and end up below what was allocated
for node_mem_map, no? The bug report behind this patch said that the first
2MB of memory was reserved using the DT "no-map" flag. Unless this
somehow translates to ARCH_PFN_OFFSET at build time, we would underflow
mem_map, right? Maybe I'm just being overly paranoid here, and of course
ARCH_PFN_OFFSET is determined properly on arm...
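The conditions raised in points 1) and 3) can be checked side by side with a sketch like the following. It is not kernel code. MAX_ORDER_NR_PAGES = 1024 and all PFNs are assumptions: 0x80000 stands for a 0x80000000 RAM base, and 0x200/0x800 pages stand for 2MB/8MB removed from the bottom. After the correction, mem_map sits (ARCH_PFN_OFFSET - alloc_start) entries into the allocation, so in this model it underflows exactly when ARCH_PFN_OFFSET is below the aligned start of the node. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages with MAX_ORDER = 11. */
#define MAX_ORDER_NR_PAGES 1024UL

/* Point 3: after "mem_map -= node_start_pfn - ARCH_PFN_OFFSET", mem_map
 * is (ARCH_PFN_OFFSET - alloc_start) entries past the start of the
 * allocation, which is negative if ARCH_PFN_OFFSET < alloc_start. */
static bool mem_map_underflows(unsigned long arch_pfn_offset,
			       unsigned long node_start_pfn)
{
	unsigned long alloc_start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);

	return arch_pfn_offset < alloc_start;
}

/* Point 1: the alternative test, "is the node start unaligned?" */
static bool node_start_unaligned(unsigned long node_start_pfn)
{
	return (node_start_pfn & (MAX_ORDER_NR_PAGES - 1)) != 0;
}
```

In this model a 2MB hole stays inside the padding. An 8MB hole would underflow if ARCH_PFN_OFFSET stayed at the old RAM base, even though the node start is then aligned, so testing alignment alone would not flag that case. If ARCH_PFN_OFFSET is instead the lowest remaining PFN, there is no underflow.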

If anyone can confirm my doubts or point me to what I'm missing, thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-29 13:13                       ` Vlastimil Babka
@ 2015-02-04  2:25                         ` Laura Abbott
  -1 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-02-04  2:25 UTC (permalink / raw)
  To: Vlastimil Babka, Mel Gorman
  Cc: Andrew Morton, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman,
	Stephen Boyd, linux-mm, Kumar Gala

On 1/29/2015 5:13 AM, Vlastimil Babka wrote:
> On 01/26/2015 04:56 PM, Mel Gorman wrote:
>> On Fri, Jan 23, 2015 at 10:05:48AM +0100, Vlastimil Babka wrote:
>>> On 01/23/2015 01:33 AM, Laura Abbott wrote:
>>>> On 1/22/2015 4:20 PM, Andrew Morton wrote:
>>>>>
>>>>> I don't think v2 addressed Vlastimil's review comment?
>>>>>
>>>>
>>>> We're still adding the offset to node_mem_map and then subtracting it from
>>>> just mem_map. Did I miss another comment somewhere?
>>>
>>> Yes that was addressed, thanks. But I don't feel comfortable acking
>>> it yet, as I have no idea if we are doing the right thing for
>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.
>>>
>>> Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>> under the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will
>>> probably do the right thing, but looks like a weird test for this
>>> case here.
>>>
>>> I have no good suggestion though, so let's CC Mel who apparently
>>> wrote the ARCH_PFN_OFFSET correction?
>>>
>>
>> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me?  I'm just
>> back today after been offline a week so didn't review the patch but IIRC,
>> ARCH_PFN_OFFSET deals with the case where physical memory does not start
>> at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0.
>> I don't recall it being related to the alignment of node 0 so if there
>> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
>> related then I'm surprised.
>
> You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit
> 467bc461d2 which was a bugfix to your commit c713216dee, which did
>  introduce the mem_map correction code, and after which the code looked like:
>
> mem_map = NODE_DATA(0)->node_mem_map;
> #ifdef CONFIG_ARCH_POPULATES_NODE_MAP
>                 if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>                         mem_map -= pgdat->node_start_pfn;
> #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
>
>
> It's from 2006 so I can't expect you remember the details, but I had some
>  trouble finding out what this does. I assume it makes sure that mem_map points
>  to struct page corresponding to pfn 0, because that's what translations using
>  mem_map expect.
> But pgdat->node_mem_map points to struct page corresponding to
>  pgdat->node_start_pfn, which might not be 0. So it subtracts node_start_pfn
>  to fix that. This is OK, as the node_mem_map is allocated (in this very
>  function) with padding so that it covers a MAX_ORDER_NR_PAGES aligned area
>  where node_mem_map may point to the middle of it.
>
> Commit 467bc461d2 fixed this in case the first pfn is not 0, but ARCH_PFN_OFFSET.
>  So mem_map points to struct page corresponding to pfn=ARCH_PFN_OFFSET, which
>  is OK. But I still have few doubts:
>
> 1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" sort of silently
>  assumes that mem_map is allocated at the beginning of the node, i.e. at
>  pgdat->node_start_pfn. And the only reason for this if-condition to be true,
>  is that we haven't corrected the page_to_pfn translation, which uses mem_map.
>  Is this assumption always OK to do? Shouldn't the if-condition be instead about
>  pgdat->node_start_pfn not being aligned?
>
> 2) The #ifdef guard is about CONFIG_ARCH_POPULATES_NODE_MAP, which is nowadays
>  called CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be #ifdef FLATMEM instead?
>  After all, we are correcting value of mem_map based on page_to_pfn code
>  variant used on FLATMEM. arm doesn't define
>  CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction.
>

Just doing #ifdef FLATMEM doesn't work because ARCH_PFN_OFFSET doesn't
seem to be picked up properly on NOMMU arches. Probably just a missing
header somewhere.

> 3) The node_mem_map allocation code aligns the allocation to MAX_ORDER_NR_PAGES,
>  so the offset between the start of the allocated map and where node_mem_map
>  points to will be up to MAX_ORDER_NR_PAGES.
> However, here we subtract (in current kernel) (pgdat->node_start_pfn - ARCH_PFN_OFFSET).
>  That looks like another silent assumption, that pgdat->node_start_pfn is always
>  between ARCH_PFN_OFFSET and ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. If it were
>  larger, the mem_map correction would subtract too much and end up below what
>  was allocated for node_mem_map, no? The bug report behind this patch said that
>  first 2MB of memory was reserved using "no-map flag using DT". Unless this somehow
>  translates to ARCH_PFN_OFFSET at build time, we would underflow mem_map, right?
>  Maybe I'm just overly paranoid here and of course ARCH_PFN_OFFSET is determined
>  properly on arm...
>
> If anyone can confirm my doubts or point me to what I'm missing, thanks.

ARCH_PFN_OFFSET should always be the lowest PFN in the system; otherwise
I think plenty of other things would be broken, given how many
architectures make this assumption. That said, I don't think subtracting
ARCH_PFN_OFFSET makes it obvious why the adjustment is being made.
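One way to make the intent of the adjustment more obvious is to express it relative to the start of the allocation. Entry 0 of the allocation describes the aligned start PFN, and FLATMEM wants mem_map[0] to describe ARCH_PFN_OFFSET. The sketch below uses hypothetical function names and an assumed MAX_ORDER_NR_PAGES of 1024. It checks that this restatement matches the existing subtraction, with results given as signed indices into the allocation. It is an illustration only, not the patch under review.

```c
#include <assert.h>

/* Assumption: 4 KiB pages with MAX_ORDER = 11. */
#define MAX_ORDER_NR_PAGES 1024L

/* Existing form: start from node_mem_map and subtract
 * (node_start_pfn - ARCH_PFN_OFFSET). */
static long mem_map_from_node_map(long arch_pfn_offset, long node_start_pfn)
{
	long alloc_start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
	long node_mem_map = node_start_pfn - alloc_start;

	return node_mem_map - (node_start_pfn - arch_pfn_offset);
}

/* Hypothetical restatement: mem_map[0] must describe ARCH_PFN_OFFSET and
 * allocation entry 0 describes alloc_start, so the base is
 * (ARCH_PFN_OFFSET - alloc_start) entries into the allocation.
 * A negative result means mem_map would point below the allocation. */
static long mem_map_from_alloc(long arch_pfn_offset, long node_start_pfn)
{
	long alloc_start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);

	return arch_pfn_offset - alloc_start;
}
```

The two forms are algebraically identical. The second one states the invariant directly, and it makes the underflow condition (ARCH_PFN_OFFSET below the aligned start) visible at a glance.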

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-02-04  2:25                         ` Laura Abbott
@ 2015-02-24 19:54                           ` Laura Abbott
  -1 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-02-24 19:54 UTC (permalink / raw)
  To: Vlastimil Babka, Mel Gorman
  Cc: Andrew Morton, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman,
	Stephen Boyd, linux-mm, Kumar Gala

Reviving this thread because I don't think it ever got resolved.

On 2/3/2015 6:25 PM, Laura Abbott wrote:
> On 1/29/2015 5:13 AM, Vlastimil Babka wrote:
>> On 01/26/2015 04:56 PM, Mel Gorman wrote:
>>> On Fri, Jan 23, 2015 at 10:05:48AM +0100, Vlastimil Babka wrote:
>>>> On 01/23/2015 01:33 AM, Laura Abbott wrote:
>>>>> On 1/22/2015 4:20 PM, Andrew Morton wrote:
>>>>>>
>>>>>> I don't think v2 addressed Vlastimil's review comment?
>>>>>>
>>>>>
>>>>> We're still adding the offset to node_mem_map and then subtracting it from
>>>>> just mem_map. Did I miss another comment somewhere?
>>>>
>>>> Yes that was addressed, thanks. But I don't feel comfortable acking
>>>> it yet, as I have no idea if we are doing the right thing for
>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.
>>>>
>>>> Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>>> under the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will
>>>> probably do the right thing, but looks like a weird test for this
>>>> case here.
>>>>
>>>> I have no good suggestion though, so let's CC Mel who apparently
>>>> wrote the ARCH_PFN_OFFSET correction?
>>>>
>>>
>>> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me?  I'm just
>>> back today after been offline a week so didn't review the patch but IIRC,
>>> ARCH_PFN_OFFSET deals with the case where physical memory does not start
>>> at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0.
>>> I don't recall it being related to the alignment of node 0 so if there
>>> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
>>> related then I'm surprised.
>>
>> You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit
>> 467bc461d2 which was a bugfix to your commit c713216dee, which did
>>  introduce the mem_map correction code, and after which the code looked like:
>>
>> mem_map = NODE_DATA(0)->node_mem_map;
>> #ifdef CONFIG_ARCH_POPULATES_NODE_MAP
>>                 if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>>                         mem_map -= pgdat->node_start_pfn;
>> #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
>>
>>
>> It's from 2006 so I can't expect you remember the details, but I had some
>>  trouble finding out what this does. I assume it makes sure that mem_map points
>>  to struct page corresponding to pfn 0, because that's what translations using
>>  mem_map expect.
>> But pgdat->node_mem_map points to struct page corresponding to
>>  pgdat->node_start_pfn, which might not be 0. So it subtracts node_start_pfn
>>  to fix that. This is OK, as the node_mem_map is allocated (in this very
>>  function) with padding so that it covers a MAX_ORDER_NR_PAGES aligned area
>>  where node_mem_map may point to the middle of it.
>>
>> Commit 467bc461d2 fixed this in case the first pfn is not 0, but ARCH_PFN_OFFSET.
>>  So mem_map points to struct page corresponding to pfn=ARCH_PFN_OFFSET, which
>>  is OK. But I still have few doubts:
>>
>> 1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" sort of silently
>>  assumes that mem_map is allocated at the beginning of the node, i.e. at
>>  pgdat->node_start_pfn. And the only reason for this if-condition to be true,
>>  is that we haven't corrected the page_to_pfn translation, which uses mem_map.
>>  Is this assumption always OK to do? Shouldn't the if-condition be instead about
>>  pgdat->node_start_pfn not being aligned?
>>
>> 2) The #ifdef guard is about CONFIG_ARCH_POPULATES_NODE_MAP, which is nowadays
>>  called CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be #ifdef FLATMEM instead?
>>  After all, we are correcting value of mem_map based on page_to_pfn code
>>  variant used on FLATMEM. arm doesn't define
>>  CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction.
>>
>
> Just doing #ifdef FLATMEM doesn't work because ARCH_PFN_OFFSET doesn't
> seem to be picked up properly for NOMMU arches properly. Probably just
> missing a header somewhere.
>
>> 3) The node_mem_map allocation code aligns the allocation to MAX_ORDER_NR_PAGES,
>>  so the offset between the start of the allocated map and where node_mem_map
>>  points to will be up to MAX_ORDER_NR_PAGES.
>> However, here we subtract (in current kernel) (pgdat->node_start_pfn - ARCH_PFN_OFFSET).
>>  That looks like another silent assumption, that pgdat->node_start_pfn is always
>>  between ARCH_PFN_OFFSET and ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. If it were
>>  larger, the mem_map correction would subtract too much and end up below what
>>  was allocated for node_mem_map, no? The bug report behind this patch said that
>>  first 2MB of memory was reserved using "no-map flag using DT". Unless this somehow
>>  translates to ARCH_PFN_OFFSET at build time, we would underflow mem_map, right?
>>  Maybe I'm just overly paranoid here and of course ARCH_PFN_OFFSET is determined
>>  properly on arm...
>>
>> If anyone can confirm my doubts or point me to what I'm missing, thanks.
>
> ARCH_PFN_OFFSET should always be the lowest PFN in the system, otherwise
> I think plenty of other things are broken given how many architectures
> make this assumption. That said, I don't think subtracting ARCH_PFN_OFFSET
> makes it obvious why the adjustment is being made.
>
> Thanks,
> Laura
>

I was incorrect before: it isn't just NOMMU arches that fail to compile,
but any architecture that doesn't use asm-generic/memory_model.h. I could
respin with more ifdefery around ARCH_PFN_OFFSET if that sounds reasonable.
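The "more ifdefery" option could take roughly the following shape. This is a hypothetical sketch. CONFIG_FLATMEM is defined by hand, memmap_correction() is a made-up helper, and the 0UL fallback is modelled on the FLATMEM default in asm-generic/memory_model.h. Whether a zero offset is actually right for architectures that don't use that header would need checking per architecture.

```c
#include <assert.h>

/* Hypothetical build configuration for this sketch. */
#define CONFIG_FLATMEM 1
/* ARCH_PFN_OFFSET is deliberately left undefined here, as on an
 * architecture that doesn't include asm-generic/memory_model.h. */

#if defined(CONFIG_FLATMEM) && !defined(ARCH_PFN_OFFSET)
#define ARCH_PFN_OFFSET 0UL	/* fallback: RAM assumed to start at PFN 0 */
#endif

/* Amount that would be subtracted from mem_map; 0 means no adjustment. */
static unsigned long memmap_correction(unsigned long node_start_pfn)
{
#ifdef CONFIG_FLATMEM
	return node_start_pfn - ARCH_PFN_OFFSET;
#else
	(void)node_start_pfn;
	return 0;
#endif
}
```

The guard keeps the FLATMEM-only expression compilable everywhere, at the cost of silently assuming a zero offset where an architecture never defines one.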

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv2] mm: Don't offset memmap for flatmem
@ 2015-02-24 19:54                           ` Laura Abbott
  0 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-02-24 19:54 UTC (permalink / raw)
  To: linux-arm-kernel

Reviving this thread because I don't think it ever got resolved.

On 2/3/2015 6:25 PM, Laura Abbott wrote:
> On 1/29/2015 5:13 AM, Vlastimil Babka wrote:
>> On 01/26/2015 04:56 PM, Mel Gorman wrote:
>>> On Fri, Jan 23, 2015 at 10:05:48AM +0100, Vlastimil Babka wrote:
>>>> On 01/23/2015 01:33 AM, Laura Abbott wrote:
>>>>> On 1/22/2015 4:20 PM, Andrew Morton wrote:
>>>>>>
>>>>>> I don't think v2 addressed Vlastimil's review comment?
>>>>>>
>>>>>
>>>>> We're still adding the offset to node_mem_map and then subtracting it from
>>>>> just mem_map. Did I miss another comment somewhere?
>>>>
>>>> Yes that was addressed, thanks. But I don't feel comfortable acking
>>>> it yet, as I have no idea if we are doing the right thing for
>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.
>>>>
>>>> Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>>> under the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will
>>>> probably do the right thing, but looks like a weird test for this
>>>> case here.
>>>>
>>>> I have no good suggestion though, so let's CC Mel who apparently
>>>> wrote the ARCH_PFN_OFFSET correction?
>>>>
>>>
>>> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me?  I'm just
>>> back today after been offline a week so didn't review the patch but IIRC,
>>> ARCH_PFN_OFFSET deals with the case where physical memory does not start
>>> at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0.
>>> I don't recall it being related to the alignment of node 0 so if there
>>> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
>>> related then I'm surprised.
>>
>> You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit
>> 467bc461d2 which was a bugfix to your commit c713216dee, which did
>>  introduce the mem_map correction code, and after which the code looked like:
>>
>> mem_map = NODE_DATA(0)->node_mem_map;
>> #ifdef CONFIG_ARCH_POPULATES_NODE_MAP
>>                 if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>>                         mem_map -= pgdat->node_start_pfn;
>> #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
>>
>>
>> It's from 2006 so I can't expect you remember the details, but I had some
>>  trouble finding out what this does. I assume it makes sure that mem_map points
>>  to struct page corresponding to pfn 0, because that's what translations using
>>  mem_map expect.
>> But pgdat->node_mem_map points to struct page corresponding to
>>  pgdat->node_start_pfn, which might not be 0. So it subtracts node_start_pfn
>>  to fix that. This is OK, as the node_mem_map is allocated (in this very
>>  function) with padding so that it covers a MAX_ORDER_NR_PAGES aligned area
>>  where node_mem_map may point to the middle of it.
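
A minimal userspace sketch of the mem_map correction described above, in its ARCH_PFN_OFFSET form. It assumes the generic FLATMEM translations from asm-generic/memory_model.h, where pfn_to_page(pfn) is mem_map + (pfn - ARCH_PFN_OFFSET). struct flat_model and the model_* functions are hypothetical names for illustration. struct page pointers are modelled as indices into the padded node_mem_map allocation, so nothing relies on out-of-bounds pointer arithmetic. The layout in model_demo() (RAM and ARCH_PFN_OFFSET at pfn 0x80000, node 0 starting 2MB later at pfn 0x80200, 4K pages) is an assumption based on the original report.

```c
#include <assert.h>

/*
 * Userspace model of the FLATMEM pfn <-> struct page translation.
 * Hypothetical names throughout. struct page pointers are modelled as
 * signed indices into the padded node_mem_map allocation (index 0 is
 * the first struct page that was allocated).
 */
struct flat_model {
	unsigned long arch_pfn_offset; /* stands in for ARCH_PFN_OFFSET */
	unsigned long map_start_pfn;   /* pfn described by allocation index 0 */
	long mem_map;                  /* allocation index the global mem_map points to */
};

/* Same shape as the generic __pfn_to_page(): mem_map + (pfn - ARCH_PFN_OFFSET) */
static long model_pfn_to_page(const struct flat_model *m, unsigned long pfn)
{
	return m->mem_map + (long)(pfn - m->arch_pfn_offset);
}

/* Same shape as the generic __page_to_pfn(): (page - mem_map) + ARCH_PFN_OFFSET */
static unsigned long model_page_to_pfn(const struct flat_model *m, long page)
{
	return (unsigned long)(page - m->mem_map) + m->arch_pfn_offset;
}

/* The translation is correct when pfn_to_page(pfn) lands on the entry
 * that really describes pfn, i.e. index pfn - map_start_pfn. */
static int model_translation_ok(const struct flat_model *m, unsigned long pfn)
{
	return model_pfn_to_page(m, pfn) == (long)(pfn - m->map_start_pfn);
}

/* Usage: the assumed layout, before and after mem_map -= 0x200. */
static void model_demo(void)
{
	struct flat_model m = {
		.arch_pfn_offset = 0x80000UL,
		.map_start_pfn = 0x80000UL,
		.mem_map = 0x200, /* uncorrected: mem_map = node_mem_map */
	};

	/* page_to_pfn(mem_map) != node_start_pfn, so the check would fire */
	assert(model_page_to_pfn(&m, m.mem_map) != 0x80200UL);
	assert(!model_translation_ok(&m, 0x80200UL));
	m.mem_map -= 0x200; /* the correction */
	assert(model_translation_ok(&m, 0x80200UL));
}
```

Before the correction, page_to_pfn(mem_map) reports 0x80000 while node 0 starts at 0x80200, so every lookup lands 0x200 entries too far; moving mem_map back by 0x200 makes it describe pfn ARCH_PFN_OFFSET again.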
>>
>> Commit 467bc461d2 fixed this in case the first pfn is not 0, but ARCH_PFN_OFFSET.
>>  So mem_map points to struct page corresponding to pfn=ARCH_PFN_OFFSET, which
>>  is OK. But I still have a few doubts:
>>
>> 1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" sort of silently
>>  assumes that mem_map is allocated at the beginning of the node, i.e. at
>>  pgdat->node_start_pfn. And the only reason for this if-condition to be true,
>>  is that we haven't corrected the page_to_pfn translation, which uses mem_map.
>>  Is this assumption always OK to do? Shouldn't the if-condition be instead about
>>  pgdat->node_start_pfn not being aligned?
>>
>> 2) The #ifdef guard is about CONFIG_ARCH_POPULATES_NODE_MAP, which is nowadays called
>>  CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be #ifdef FLATMEM instead?
>>  After all, we are correcting value of mem_map based on page_to_pfn code
>> variant used on FLATMEM. arm doesn't define
>> CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction.
>>
>
> Just doing #ifdef FLATMEM doesn't work because ARCH_PFN_OFFSET doesn't
> seem to be picked up properly for NOMMU arches. Probably just
> missing a header somewhere.
>
>> 3) The node_mem_map allocation code aligns the allocation to MAX_ORDER_NR_PAGES,
>>  so the offset between the start of the allocated map and where node_mem_map
>>  points to will be up to MAX_ORDER_NR_PAGES.
>> However, here we subtract (in current kernel) (pgdat->node_start_pfn - ARCH_PFN_OFFSET).
>>  That looks like another silent assumption, that pgdat->node_start_pfn is always
>>  between ARCH_PFN_OFFSET and ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. If it were
>>  larger, the mem_map correction would subtract too much and end up below what
>>  was allocated for node_mem_map, no? The bug report behind this patch said that
>>  first 2MB of memory was reserved using "no-map flag using DT". Unless this somehow
>>  translates to ARCH_PFN_OFFSET at build time, we would underflow mem_map, right?
>>  Maybe I'm just overly paranoid here and of course ARCH_PFN_OFFSET is determined
>>  properly on arm...
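
The arithmetic behind doubt 3 can be checked with a small sketch. It assumes 4K pages and MAX_ORDER 11, so MAX_ORDER_NR_PAGES is 1024 pages (4MB); both depend on the architecture and config. memmap_alloc_start(), memmap_padding() and legacy_correction_in_bounds() are hypothetical helpers: they compute where the padded node_mem_map allocation starts and whether subtracting (pgdat->node_start_pfn - ARCH_PFN_OFFSET) from mem_map stays within that padding. ARCH_PFN_OFFSET = 0x80000 (RAM at 0x80000000) is likewise an assumption for the reported board.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4K pages and MAX_ORDER 11, i.e. 1024 pages (4MB) per
 * MAX_ORDER block. Both depend on the architecture and config. */
#define SKETCH_MAX_ORDER_NR_PAGES 1024UL

/* First pfn covered by the padded node_mem_map allocation. */
static unsigned long memmap_alloc_start(unsigned long node_start_pfn)
{
	return node_start_pfn & ~(SKETCH_MAX_ORDER_NR_PAGES - 1);
}

/* Number of struct pages allocated in front of node_mem_map. */
static unsigned long memmap_padding(unsigned long node_start_pfn)
{
	return node_start_pfn - memmap_alloc_start(node_start_pfn);
}

/* Would mem_map -= (node_start_pfn - arch_pfn_offset) stay inside the
 * allocation? Assumes arch_pfn_offset <= node_start_pfn. */
static bool legacy_correction_in_bounds(unsigned long node_start_pfn,
					unsigned long arch_pfn_offset)
{
	return node_start_pfn - arch_pfn_offset <= memmap_padding(node_start_pfn);
}

/* Usage: a 2MB and a 6MB no-map region at the start of RAM (pfn 0x80000). */
static void legacy_correction_demo(void)
{
	assert(memmap_padding(0x80200UL) == 0x200UL);
	assert(legacy_correction_in_bounds(0x80200UL, 0x80000UL));
	assert(memmap_alloc_start(0x80600UL) == 0x80400UL);
	assert(!legacy_correction_in_bounds(0x80600UL, 0x80000UL));
}
```

Under these assumptions a 2MB reservation (0x200 pages) leaves exactly 0x200 pages of padding, so the subtraction just fits. A 6MB reservation puts node_start_pfn at 0x80600 with the allocation starting at 0x80400, and subtracting 0x600 would land 0x400 entries below the allocation, which is the underflow the doubt describes.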
>>
>> If anyone can confirm my doubts or point me to what I'm missing, thanks.
>
> ARCH_PFN_OFFSET should always be the lowest PFN in the system, otherwise
> I think plenty of other things are broken given how many architectures
> make this assumption. That said, I don't think subtracting ARCH_PFN_OFFSET
> makes it obvious why the adjustment is being made.
>
> Thanks,
> Laura
>

I was incorrect before: it isn't just NOMMU but architectures that don't use
asm-generic/memory_model.h which failed to compile. I could respin with
more ifdefery around the ARCH_PFN_OFFSET if that sounds reasonable.

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-02-24 19:54                           ` Laura Abbott
@ 2015-02-27 15:24                             ` Vlastimil Babka
  -1 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2015-02-27 15:24 UTC (permalink / raw)
  To: Laura Abbott, Mel Gorman
  Cc: Andrew Morton, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman,
	Stephen Boyd, linux-mm, Kumar Gala

On 02/24/2015 08:54 PM, Laura Abbott wrote:
> Reviving this thread because I don't think it ever got resolved.
> 
> On 2/3/2015 6:25 PM, Laura Abbott wrote:
>> On 1/29/2015 5:13 AM, Vlastimil Babka wrote:
>>>>
>>>> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me?  I'm just
>>>> back today after being offline a week so didn't review the patch but IIRC,
>>>> ARCH_PFN_OFFSET deals with the case where physical memory does not start
>>>> at 0. Without the offset, virtual _PAGE_OFFSET would not map to physical page 0.
>>>> I don't recall it being related to the alignment of node 0 so if there
>>>> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
>>>> related then I'm surprised.
>>>
>>> You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit
>>> 467bc461d2 which was a bugfix to your commit c713216dee, which did
>>>  introduce the mem_map correction code, and after which the code looked like:
>>>
>>> mem_map = NODE_DATA(0)->node_mem_map;
>>> #ifdef CONFIG_ARCH_POPULATES_NODE_MAP
>>>                 if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>>>                         mem_map -= pgdat->node_start_pfn;
>>> #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
>>>
>>>
>>> It's from 2006 so I can't expect you to remember the details, but I had some
>>>  trouble finding out what this does. I assume it makes sure that mem_map points
>>>  to struct page corresponding to pfn 0, because that's what translations using
>>>  mem_map expect.
>>> But pgdat->node_mem_map points to struct page corresponding to
>>>  pgdat->node_start_pfn, which might not be 0. So it subtracts node_start_pfn
>>>  to fix that. This is OK, as the node_mem_map is allocated (in this very
>>>  function) with padding so that it covers a MAX_ORDER_NR_PAGES aligned area
>>>  where node_mem_map may point to the middle of it.
>>>
>>> Commit 467bc461d2 fixed this in case the first pfn is not 0, but ARCH_PFN_OFFSET.
>>>  So mem_map points to struct page corresponding to pfn=ARCH_PFN_OFFSET, which
>>>  is OK. But I still have a few doubts:
>>>
>>> 1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" sort of silently
>>>  assumes that mem_map is allocated at the beginning of the node, i.e. at
>>>  pgdat->node_start_pfn. And the only reason for this if-condition to be true,
>>>  is that we haven't corrected the page_to_pfn translation, which uses mem_map.
>>>  Is this assumption always OK to do? Shouldn't the if-condition be instead about
>>>  pgdat->node_start_pfn not being aligned?
>>>
>>> 2) The #ifdef guard is about CONFIG_ARCH_POPULATES_NODE_MAP, which is nowadays called
>>>  CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be #ifdef FLATMEM instead?
>>>  After all, we are correcting value of mem_map based on page_to_pfn code
>>> variant used on FLATMEM. arm doesn't define
>>> CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction.
>>>
>>
>> Just doing #ifdef FLATMEM doesn't work because ARCH_PFN_OFFSET doesn't
>> seem to be picked up properly for NOMMU arches. Probably just
>> missing a header somewhere.
>>
>>> 3) The node_mem_map allocation code aligns the allocation to MAX_ORDER_NR_PAGES,
>>>  so the offset between the start of the allocated map and where node_mem_map
>>>  points to will be up to MAX_ORDER_NR_PAGES.
>>> However, here we subtract (in current kernel) (pgdat->node_start_pfn - ARCH_PFN_OFFSET).
>>>  That looks like another silent assumption, that pgdat->node_start_pfn is always
>>>  between ARCH_PFN_OFFSET and ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. If it were
>>>  larger, the mem_map correction would subtract too much and end up below what
>>>  was allocated for node_mem_map, no? The bug report behind this patch said that
>>>  first 2MB of memory was reserved using "no-map flag using DT". Unless this somehow
>>>  translates to ARCH_PFN_OFFSET at build time, we would underflow mem_map, right?
>>>  Maybe I'm just overly paranoid here and of course ARCH_PFN_OFFSET is determined
>>>  properly on arm...
>>>
>>> If anyone can confirm my doubts or point me to what I'm missing, thanks.
>>
>> ARCH_PFN_OFFSET should always be the lowest PFN in the system, otherwise
>> I think plenty of other things are broken given how many architectures
>> make this assumption. That said, I don't think subtracting ARCH_PFN_OFFSET
>> makes it obvious why the adjustment is being made.
>>
>> Thanks,
>> Laura
>>
> 
> I was incorrect before: it isn't just NOMMU but architectures that don't use
> asm-generic/memory_model.h which failed to compile. I could respin with

Hm, I see. Some architectures use their own variant of page_to_pfn; that's why it's
being used in the if () check.

> more ifdefery around the ARCH_PFN_OFFSET if that sounds reasonable.

So I think your v2 might already be correct, unless there's an architecture that
defines CONFIG_FLATMEM but not CONFIG_HAVE_MEMBLOCK_NODE_MAP and places memmap
somewhere other than pgdat->node_start_pfn; that would trigger the check for the
wrong reason after the patch.

Looks like arm is an arch that doesn't define CONFIG_HAVE_MEMBLOCK_NODE_MAP, yet
it defines ARCH_PFN_OFFSET. With your patch it would correct memmap by the
calculated offset, not the ARCH_PFN_OFFSET constant. Are these two the same
then? Should there be something like a VM_BUG_ON that ARCH_PFN_OFFSET (if it
exists) is indeed equal to the calculated offset? Or maybe a more general
VM_BUG_ON checking that after any correction we make, the (page_to_pfn(mem_map)
== pgdat->node_start_pfn) condition holds?
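
One way to picture the more general VM_BUG_ON suggestion is the sketch below. It assumes the generic FLATMEM page_to_pfn(), 4K pages and a MAX_ORDER_NR_PAGES of 1024, and models struct page pointers as indices into the padded allocation. It applies the correction the way v2 is described in this thread (subtracting the calculated alignment offset rather than the ARCH_PFN_OFFSET-based difference) and returns the proposed invariant instead of calling VM_BUG_ON. sketch_page_to_pfn() and v2_correction_consistent() are hypothetical names, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4K pages and MAX_ORDER 11 -> 1024 pages per MAX_ORDER block. */
#define SKETCH_MAX_ORDER_NR_PAGES 1024UL

/* Generic FLATMEM page_to_pfn(); page and mem_map are indices into the
 * padded allocation rather than real struct page pointers. */
static unsigned long sketch_page_to_pfn(long page, long mem_map,
					unsigned long arch_pfn_offset)
{
	return (unsigned long)(page - mem_map) + arch_pfn_offset;
}

/* Applies the v2-style correction (subtract the calculated alignment
 * offset), then evaluates the proposed invariant
 * page_to_pfn(node_mem_map) == node_start_pfn. The suggestion in the
 * thread is a VM_BUG_ON; the sketch returns the result instead. */
static bool v2_correction_consistent(unsigned long node_start_pfn,
				     unsigned long arch_pfn_offset)
{
	unsigned long start = node_start_pfn & ~(SKETCH_MAX_ORDER_NR_PAGES - 1);
	long offset = (long)(node_start_pfn - start);
	long node_mem_map = offset;  /* pgdat->node_mem_map = map + offset */
	long mem_map = node_mem_map; /* mem_map = NODE_DATA(0)->node_mem_map */

	if (sketch_page_to_pfn(mem_map, mem_map, arch_pfn_offset) != node_start_pfn)
		mem_map -= offset;   /* v2: use the calculated offset */

	return sketch_page_to_pfn(node_mem_map, mem_map, arch_pfn_offset) ==
	       node_start_pfn;
}

/* Usage: the calculated offset only matches when the aligned start of
 * the allocation equals ARCH_PFN_OFFSET. */
static void v2_correction_demo(void)
{
	assert(v2_correction_consistent(0x80200UL, 0x80000UL));
	assert(!v2_correction_consistent(0x80600UL, 0x80000UL));
}
```

With node_start_pfn 0x80200 and ARCH_PFN_OFFSET 0x80000, the calculated offset (0x200) equals node_start_pfn - ARCH_PFN_OFFSET and the invariant holds. With node_start_pfn 0x80600 the calculated offset is still 0x200 while the difference is 0x600, so the check fails; that is the case where the two offsets are not the same.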

> Thanks,
> Laura
> 

end of thread

Thread overview: 33+ messages
2015-01-16 11:30 Issue on reserving memory with no-map flag in DT Srinivas Kandagatla
2015-01-17  0:24 ` Laura Abbott
2015-01-17  0:24   ` Laura Abbott
2015-01-17  8:39   ` Srinivas Kandagatla
2015-01-17  8:39     ` Srinivas Kandagatla
2015-01-19 15:49   ` Vlastimil Babka
2015-01-19 15:49     ` Vlastimil Babka
2015-01-19 23:57     ` Laura Abbott
2015-01-19 23:57       ` Laura Abbott
2015-01-20  9:54       ` Vlastimil Babka
2015-01-20  9:54         ` Vlastimil Babka
2015-01-21  1:37         ` [PATCH] mm: Don't offset memmap for flatmem Laura Abbott
2015-01-21  1:37           ` Laura Abbott
2015-01-21 10:15           ` Vlastimil Babka
2015-01-21 10:15             ` Vlastimil Babka
2015-01-22  1:01           ` [PATCHv2] " Laura Abbott
2015-01-22  1:01             ` Laura Abbott
2015-01-23  0:20             ` Andrew Morton
2015-01-23  0:20               ` Andrew Morton
2015-01-23  0:33               ` Laura Abbott
2015-01-23  0:33                 ` Laura Abbott
2015-01-23  9:05                 ` Vlastimil Babka
2015-01-23  9:05                   ` Vlastimil Babka
2015-01-26 15:56                   ` Mel Gorman
2015-01-26 15:56                     ` Mel Gorman
2015-01-29 13:13                     ` Vlastimil Babka
2015-01-29 13:13                       ` Vlastimil Babka
2015-02-04  2:25                       ` Laura Abbott
2015-02-04  2:25                         ` Laura Abbott
2015-02-24 19:54                         ` Laura Abbott
2015-02-24 19:54                           ` Laura Abbott
2015-02-27 15:24                           ` Vlastimil Babka
2015-02-27 15:24                             ` Vlastimil Babka
