* Issue on reserving memory with no-map flag in DT
@ 2015-01-16 11:30 Srinivas Kandagatla
  2015-01-17  0:24 ` Laura Abbott
  0 siblings, 1 reply; 33+ messages in thread
From: Srinivas Kandagatla @ 2015-01-16 11:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hi All,

I am hitting boot failures when I try to reserve memory with the no-map flag using DT. Basically the kernel just hangs with no indication of what's going on. I added some debug to find the location; it was somewhere in the DMA mapping code, at kmap_atomic() in __dma_clear_buffer().

The issue is very much identical to http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html but the memory reserve in my case is at the start of memory. I tried the same fixes from that thread but they did not help.

Platform: IFC6410 with APQ8064, which is a v7 platform with 2GB of memory starting at 0x80000000; the kernel is always loaded at 0x80200000.
I am using multi_v7_defconfig.

Meminfo without memory reserve:
80000000-88dfffff : System RAM
80208000-80e5d307 : Kernel code
80f64000-810be397 : Kernel data
8a000000-8d9fffff : System RAM
8ec00000-8effffff : System RAM
8f700000-8fdfffff : System RAM
90000000-af7fffff : System RAM

DT entry:
	reserved-memory {
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;
		smem@80000000 {
			reg = <0x80000000 0x200000>;
			no-map;
		};
	};

If I remove the no-map flag, then I can boot the board. But I don't want the kernel to map this memory at all, as this is IPC memory.

I just wanted to understand what's going on here. I am guessing that the kernel would never touch that 2MB of memory.

Does the ARM kernel have a limitation on unmapping/memblock_remove() of such memory locations?
Or
Is this a known issue?

Any pointers to debug this issue?

Before the kernel hangs it reports 2 errors like:

BUG: Bad page state in process swapper  pfn:fffa8
page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x200041(locked|active|mlocked)
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
Hardware name: Qualcomm (Flattened Device Tree)
[<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
[<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
[<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
[<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
[<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
[<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
[<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
[<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
[<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
[<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
Disabling lock debugging due to kernel taint

Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/

Thanks,
srini

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: Issue on reserving memory with no-map flag in DT
  2015-01-16 11:30 Issue on reserving memory with no-map flag in DT Srinivas Kandagatla
@ 2015-01-17  0:24 ` Laura Abbott
  0 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-01-17  0:24 UTC (permalink / raw)
  To: Srinivas Kandagatla, linux-arm-kernel, linux, ssantosh, Andrew Morton, Mel Gorman
  Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm

(Adding linux-mm and relevant people because this looks like an issue there)

On 1/16/2015 3:30 AM, Srinivas Kandagatla wrote:
> Hi All,
>
> I am hitting boot failures when I did try to reserve memory with no-map flag using DT. Basically kernel just hangs with no indication of whats going on. Added some debug to find out the location, it was some where while dma mapping at kmap_atomic() in __dma_clear_buffer().
> reserving.
>
> The issue is very much identical to http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html but the memory reserve in my case is at start of the memory. I tried the same fixes on this thread but it did not help.
>
> Platform: IFC6410 with APQ8064 which is a v7 platform with 2GB of memory starting at 0x80000000 and kernel is always loaded at 0x80200000
> And am using multi_v7_defconfig.
>
> Meminfo without memory reserve:
> 80000000-88dfffff : System RAM
> 80208000-80e5d307 : Kernel code
> 80f64000-810be397 : Kernel data
> 8a000000-8d9fffff : System RAM
> 8ec00000-8effffff : System RAM
> 8f700000-8fdfffff : System RAM
> 90000000-af7fffff : System RAM
>
> DT entry:
> 	reserved-memory {
> 		#address-cells = <1>;
> 		#size-cells = <1>;
> 		ranges;
> 		smem@80000000 {
> 			reg = <0x80000000 0x200000>;
> 			no-map;
> 		};
> 	};
>
> If I remove the no-map flag, then I can boot the board. But I don't want kernel to map this memory at all, as this a IPC memory.
>
> I just wanted to understand whats going on here, Am guessing that kernel would never touch that 2MB memory.
>
> Does arm-kernel has limitation on unmapping/memblock_remove() such memory locations?
> Or
> Is this a known issue?
>
> Any pointers to debug this issue?
>
> Before the kernel hangs it reports 2 errors like:
>
> BUG: Bad page state in process swapper  pfn:fffa8
> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> bad because of flags:
> flags: 0x200041(locked|active|mlocked)
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
> Hardware name: Qualcomm (Flattened Device Tree)
> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
> Disabling lock debugging due to kernel taint
>
>
> Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/
>

I don't have an IFC handy but I was able to reproduce the same issue on another board.
I think this is an underlying issue in mm code.

Removing the first 2MB changes the start address of the zone. This means the start
address is no longer pageblock aligned (4MB on this system). With a little
digging, it looks like the issue is that we're running off the end of the
mem_map array because the mem_map array is too small. This is similar to
an issue fixed by 7c45512 ("mm: fix pageblock bitmap allocation"), and the following
fixes it for me:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..32d9436 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5012,7 +5012,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
 	/* ia64 gets its own node_mem_map, before this, without bootmem */
 	if (!pgdat->node_mem_map) {
-		unsigned long size, start, end;
+		unsigned long size, start, end, offset;
 		struct page *map;
 
 		/*
@@ -5020,10 +5020,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 		 * aligned but the node_mem_map endpoints must be in order
 		 * for the buddy allocator to function correctly.
 		 */
+		offset = pgdat->node_start_pfn & (pageblock_nr_pages - 1);
 		start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
 		end = pgdat_end_pfn(pgdat);
 		end = ALIGN(end, MAX_ORDER_NR_PAGES);
-		size =  (end - start) * sizeof(struct page);
+		size =  ((end - start) + offset) * sizeof(struct page);
 		map = alloc_remap(pgdat->node_id, size);
 		if (!map)
 			map = memblock_virt_alloc_node_nopanic(size,

If there is agreement on this approach, I can turn this into a proper patch.

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 33+ messages in thread
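The size arithmetic in the proposed patch can be tried out in isolation. The sketch below is not kernel code: it assumes 4 KiB pages and MAX_ORDER = 11, so that both `MAX_ORDER_NR_PAGES` and `pageblock_nr_pages` come out at 1024 pages (4MB, the pageblock size quoted in the message above). The helper names `align_up`, `map_pages_before` and `map_pages_after` are hypothetical and only mirror the expressions in the diff.

```c
#include <assert.h>

/* Assumed values: 4 KiB pages and MAX_ORDER = 11, so both constants are
 * 1024 pages (4 MiB), matching the pageblock size quoted in the thread. */
#define PAGEBLOCK_NR_PAGES 1024UL
#define MAX_ORDER_NR_PAGES 1024UL

/* Round x up to a multiple of a; a must be a power of two. */
static inline unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Number of struct page entries sized for the map without the patch:
 * start rounded down and end rounded up to MAX_ORDER_NR_PAGES. */
static inline unsigned long map_pages_before(unsigned long node_start_pfn,
					     unsigned long node_end_pfn)
{
	unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
	unsigned long end = align_up(node_end_pfn, MAX_ORDER_NR_PAGES);

	return end - start;
}

/* Number of entries with the patch: the same span plus the offset of
 * node_start_pfn inside its pageblock. */
static inline unsigned long map_pages_after(unsigned long node_start_pfn,
					    unsigned long node_end_pfn)
{
	unsigned long offset = node_start_pfn & (PAGEBLOCK_NR_PAGES - 1);

	return map_pages_before(node_start_pfn, node_end_pfn) + offset;
}
```

With the 2MB reserve removed, the node would start at PFN 0x80200 (0x80200000 >> 12). Taking 0xaf800 as the end PFN is an assumption derived from the last "System RAM" range in the report. Under those inputs `map_pages_before(0x80200, 0xaf800)` gives 0x2f800 entries and `map_pages_after` gives 0x2fa00, that is, 512 extra entries; for a start PFN that is already 4MB aligned the two agree.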
* Re: Issue on reserving memory with no-map flag in DT
  2015-01-17  0:24 ` Laura Abbott
@ 2015-01-17  8:39   ` Srinivas Kandagatla
  -1 siblings, 0 replies; 33+ messages in thread
From: Srinivas Kandagatla @ 2015-01-17  8:39 UTC (permalink / raw)
  To: Laura Abbott, linux-arm-kernel, linux, ssantosh, Andrew Morton, Mel Gorman
  Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm

On 17/01/15 00:24, Laura Abbott wrote:
> (Adding linux-mm and relevant people because this looks like an issue
> there)
>
> On 1/16/2015 3:30 AM, Srinivas Kandagatla wrote:
>> Hi All,
>>
>> I am hitting boot failures when I did try to reserve memory with
>> no-map flag using DT. Basically kernel just hangs with no indication
>> of whats going on. Added some debug to find out the location, it was
>> some where while dma mapping at kmap_atomic() in __dma_clear_buffer().
>> reserving.
>>
>> The issue is very much identical to
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html
>> but the memory reserve in my case is at start of the memory. I tried
>> the same fixes on this thread but it did not help.
>>
>> Platform: IFC6410 with APQ8064 which is a v7 platform with 2GB of
>> memory starting at 0x80000000 and kernel is always loaded at 0x80200000
>> And am using multi_v7_defconfig.
>>
>> Meminfo without memory reserve:
>> 80000000-88dfffff : System RAM
>> 80208000-80e5d307 : Kernel code
>> 80f64000-810be397 : Kernel data
>> 8a000000-8d9fffff : System RAM
>> 8ec00000-8effffff : System RAM
>> 8f700000-8fdfffff : System RAM
>> 90000000-af7fffff : System RAM
>>
>> DT entry:
>> 	reserved-memory {
>> 		#address-cells = <1>;
>> 		#size-cells = <1>;
>> 		ranges;
>> 		smem@80000000 {
>> 			reg = <0x80000000 0x200000>;
>> 			no-map;
>> 		};
>> 	};
>>
>> If I remove the no-map flag, then I can boot the board. But I don't
>> want kernel to map this memory at all, as this a IPC memory.
>>
>> I just wanted to understand whats going on here, Am guessing that
>> kernel would never touch that 2MB memory.
>>
>> Does arm-kernel has limitation on unmapping/memblock_remove() such
>> memory locations?
>> Or
>> Is this a known issue?
>>
>> Any pointers to debug this issue?
>>
>> Before the kernel hangs it reports 2 errors like:
>>
>> BUG: Bad page state in process swapper  pfn:fffa8
>> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> bad because of flags:
>> flags: 0x200041(locked|active|mlocked)
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted
>> 3.19.0-rc3-00007-g412f9ba-dirty #816
>> Hardware name: Qualcomm (Flattened Device Tree)
>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
>> [<c0301570>] (bad_page) from [<c03018a8>]
>> (free_pages_prepare+0x168/0x1e0)
>> [<c03018a8>] (free_pages_prepare) from [<c030369c>]
>> (free_hot_cold_page+0x3c/0x174)
>> [<c030369c>] (free_hot_cold_page) from [<c0303828>]
>> (__free_pages+0x54/0x58)
>> [<c0303828>] (__free_pages) from [<c030395c>]
>> (free_highmem_page+0x38/0x88)
>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
>> Disabling lock debugging due to kernel taint
>>
>>
>> Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/
>>
>
> I don't have an IFC handy but I was able to reproduce the same issue on
> another board.
> I think this is an underlying issue in mm code.
>
> Removing the first 2MB changes the start address of the zone. This means
> the start
> address is no longer pageblock aligned (4MB on this system). With a little
> digging, it looks like the issue is we're running off the end of the end
> of the
> mem_map array because the memmap array is too small. This is similar to
> an issue fixed by 7c45512 mm: fix pageblock bitmap allocation and the
> following
> fixes it for me:

Thanks Laura,

This patch indeed fixes issue for me too.

Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>

>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..32d9436 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5012,7 +5012,7 @@ static void __init_refok alloc_node_mem_map(struct
> pglist_data *pgdat)
>  #ifdef CONFIG_FLAT_NODE_MEM_MAP
>  	/* ia64 gets its own node_mem_map, before this, without bootmem */
>  	if (!pgdat->node_mem_map) {
> -		unsigned long size, start, end;
> +		unsigned long size, start, end, offset;
>  		struct page *map;
>
>  		/*
> @@ -5020,10 +5020,11 @@ static void __init_refok
> alloc_node_mem_map(struct pglist_data *pgdat)
>  		 * aligned but the node_mem_map endpoints must be in order
>  		 * for the buddy allocator to function correctly.
>  		 */
> +		offset = pgdat->node_start_pfn & (pageblock_nr_pages - 1);
>  		start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
>  		end = pgdat_end_pfn(pgdat);
>  		end = ALIGN(end, MAX_ORDER_NR_PAGES);
> -		size =  (end - start) * sizeof(struct page);
> +		size =  ((end - start) + offset) * sizeof(struct page);
>  		map = alloc_remap(pgdat->node_id, size);
>  		if (!map)
>  			map = memblock_virt_alloc_node_nopanic(size,
>
> If there is agreement on this approach, I can turn this into a proper
> patch.
>
> Thanks,
> Laura
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: Issue on reserving memory with no-map flag in DT
  2015-01-17  0:24 ` Laura Abbott
@ 2015-01-19 15:49   ` Vlastimil Babka
  -1 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2015-01-19 15:49 UTC (permalink / raw)
  To: Laura Abbott, Srinivas Kandagatla, linux-arm-kernel, linux, ssantosh, Andrew Morton, Mel Gorman
  Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm

On 01/17/2015 01:24 AM, Laura Abbott wrote:
> (Adding linux-mm and relevant people because this looks like an issue there)
>
> On 1/16/2015 3:30 AM, Srinivas Kandagatla wrote:
>> Hi All,
>>
>> I am hitting boot failures when I did try to reserve memory with no-map flag using DT. Basically kernel just hangs with no indication of whats going on. Added some debug to find out the location, it was some where while dma mapping at kmap_atomic() in __dma_clear_buffer().
>> reserving.
>>
>> The issue is very much identical to http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html but the memory reserve in my case is at start of the memory. I tried the same fixes on this thread but it did not help.
>>
>> Platform: IFC6410 with APQ8064 which is a v7 platform with 2GB of memory starting at 0x80000000 and kernel is always loaded at 0x80200000
>> And am using multi_v7_defconfig.
>>
>> Meminfo without memory reserve:
>> 80000000-88dfffff : System RAM
>> 80208000-80e5d307 : Kernel code
>> 80f64000-810be397 : Kernel data
>> 8a000000-8d9fffff : System RAM
>> 8ec00000-8effffff : System RAM
>> 8f700000-8fdfffff : System RAM
>> 90000000-af7fffff : System RAM
>>
>> DT entry:
>> 	reserved-memory {
>> 		#address-cells = <1>;
>> 		#size-cells = <1>;
>> 		ranges;
>> 		smem@80000000 {
>> 			reg = <0x80000000 0x200000>;
>> 			no-map;
>> 		};
>> 	};
>>
>> If I remove the no-map flag, then I can boot the board. But I don't want kernel to map this memory at all, as this a IPC memory.
>>
>> I just wanted to understand whats going on here, Am guessing that kernel would never touch that 2MB memory.
>>
>> Does arm-kernel has limitation on unmapping/memblock_remove() such memory locations?
>> Or
>> Is this a known issue?
>>
>> Any pointers to debug this issue?
>>
>> Before the kernel hangs it reports 2 errors like:
>>
>> BUG: Bad page state in process swapper  pfn:fffa8
>> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> bad because of flags:
>> flags: 0x200041(locked|active|mlocked)
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
>> Hardware name: Qualcomm (Flattened Device Tree)
>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
>> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
>> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
>> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
>> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
>> Disabling lock debugging due to kernel taint
>>
>>
>> Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/
>>
>
> I don't have an IFC handy but I was able to reproduce the same issue on another board.
> I think this is an underlying issue in mm code.
>
> Removing the first 2MB changes the start address of the zone. This means the start
> address is no longer pageblock aligned (4MB on this system). With a little
> digging, it looks like the issue is we're running off the end of the end of the
> mem_map array because the memmap array is too small. This is similar to
> an issue fixed by 7c45512 mm: fix pageblock bitmap allocation and the following
> fixes it for me:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..32d9436 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5012,7 +5012,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  #ifdef CONFIG_FLAT_NODE_MEM_MAP
>  	/* ia64 gets its own node_mem_map, before this, without bootmem */
>  	if (!pgdat->node_mem_map) {
> -		unsigned long size, start, end;
> +		unsigned long size, start, end, offset;
>  		struct page *map;
>
>  		/*
> @@ -5020,10 +5020,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  		 * aligned but the node_mem_map endpoints must be in order
>  		 * for the buddy allocator to function correctly.
>  		 */
> +		offset = pgdat->node_start_pfn & (pageblock_nr_pages - 1);
>  		start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
>  		end = pgdat_end_pfn(pgdat);
>  		end = ALIGN(end, MAX_ORDER_NR_PAGES);
> -		size =  (end - start) * sizeof(struct page);
> +		size =  ((end - start) + offset) * sizeof(struct page);
>  		map = alloc_remap(pgdat->node_id, size);
>  		if (!map)
>  			map = memblock_virt_alloc_node_nopanic(size,
>
> If there is agreement on this approach, I can turn this into a proper patch.

I admit I may not see clearly through all the arch-specific layers and various
config option combinations that are possible here, so I might be misinterpreting
the code. But I think the problem here is not insufficient allocation size, but
something else.

The code above continues with this line:

	pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);

So, the size of the map allocation has already been calculated aligned to
MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first
actually present page, which might be offset from the perfect alignment.

Your patch adds another offset to the already aligned size (but you use
pageblock_nr_pages, which might be lower than MAX_ORDER_NR_PAGES; this seems like
a mistake in itself?). So with your patch we have a map of aligned size starting
from node_mem_map. This means the last offset-worth of struct pages should be
beyond what's needed to access the struct page of pgdat_end_pfn(). If we need that
extra padding to prevent crashing, then it looks really suspicious...

And when I look at node_mem_map usage, I see that include/asm-generic/memory_model.h
defines __pfn_to_page as (basically)

	NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\

and further above is a generic definition of arch_local_page_offset:

#define arch_local_page_offset(pfn, nid)	\
	((pfn) - NODE_DATA(nid)->node_start_pfn)

So it looks correct to me without your patch. The map is allocated aligned,
node_mem_map points into this map at the offset corresponding to node_start_pfn,
and pfn_to_page subtracts node_start_pfn to get the offset relative to
node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
unless something else is misbehaving here.

In the issue fixed by 7c45512 that you refer to, the problem was basically that
the allocation didn't use the aligned size, but this looks different to me?

> Thanks,
> Laura
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
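The indexing argument in the message above can be checked with plain integers. The sketch below is not kernel code: it models the map as an array index rather than a `struct page` pointer, assumes `MAX_ORDER_NR_PAGES` is 1024 (4 KiB pages, MAX_ORDER = 11), and uses a hypothetical helper name, `map_index`. It combines the `node_mem_map` assignment with the generic `arch_local_page_offset()` definition quoted above.

```c
#include <assert.h>

/* Assumed value: 4 KiB pages and MAX_ORDER = 11 give 1024 pages (4 MiB). */
#define MAX_ORDER_NR_PAGES 1024UL

/* Index into the allocated map that the flat-memory pfn_to_page() would
 * reach for a given pfn, following the two quoted expressions:
 *   node_mem_map = map + (node_start_pfn - start)
 *   page         = node_mem_map + (pfn - node_start_pfn)
 * The two node_start_pfn terms cancel, leaving pfn - start. */
static inline unsigned long map_index(unsigned long pfn,
				      unsigned long node_start_pfn)
{
	unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
	unsigned long node_mem_map_off = node_start_pfn - start;

	assert(pfn >= node_start_pfn);
	return node_mem_map_off + (pfn - node_start_pfn);
}
```

With a node start of PFN 0x80200 and an assumed end PFN of 0xaf800 (taken from the last "System RAM" range in the report), the first page lands at index 0x200 and the last valid PFN, 0xaf7ff, lands at index 0x2f7ff. That is still inside the 0x2f800 entries the unpatched size calculation would provide under the same assumptions, which is consistent with the suspicion that the overrun originates somewhere else.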
* Re: Issue on reserving memory with no-map flag in DT 2015-01-19 15:49 ` Vlastimil Babka @ 2015-01-19 23:57 ` Laura Abbott -1 siblings, 0 replies; 33+ messages in thread From: Laura Abbott @ 2015-01-19 23:57 UTC (permalink / raw) To: Vlastimil Babka, Srinivas Kandagatla, linux-arm-kernel, linux, ssantosh, Andrew Morton, Mel Gorman Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm On 1/19/2015 7:49 AM, Vlastimil Babka wrote: > On 01/17/2015 01:24 AM, Laura Abbott wrote: >> (Adding linux-mm and relevant people because this looks like an issue there) >> >> On 1/16/2015 3:30 AM, Srinivas Kandagatla wrote: >>> Hi All, >>> >>> I am hitting boot failures when I did try to reserve memory with no-map flag using DT. Basically kernel just hangs with no indication of whats going on. Added some debug to find out the location, it was some where while dma mapping at kmap_atomic() in __dma_clear_buffer(). >>> reserving. >>> >>> The issue is very much identical to http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/294773.html but the memory reserve in my case is at start of the memory. I tried the same fixes on this thread but it did not help. >>> >>> Platform: IFC6410 with APQ8064 which is a v7 platform with 2GB of memory starting at 0x80000000 and kernel is always loaded at 0x80200000 >>> And am using multi_v7_defconfig. >>> >>> Meminfo without memory reserve: >>> 80000000-88dfffff : System RAM >>> 80208000-80e5d307 : Kernel code >>> 80f64000-810be397 : Kernel data >>> 8a000000-8d9fffff : System RAM >>> 8ec00000-8effffff : System RAM >>> 8f700000-8fdfffff : System RAM >>> 90000000-af7fffff : System RAM >>> >>> DT entry: >>> reserved-memory { >>> #address-cells = <1>; >>> #size-cells = <1>; >>> ranges; >>> smem@80000000 { >>> reg = <0x80000000 0x200000>; >>> no-map; >>> }; >>> }; >>> >>> If I remove the no-map flag, then I can boot the board. But I don't want kernel to map this memory at all, as this a IPC memory. 
>>> >>> I just wanted to understand whats going on here, Am guessing that kernel would never touch that 2MB memory. >>> >>> Does arm-kernel has limitation on unmapping/memblock_remove() such memory locations? >>> Or >>> Is this a known issue? >>> >>> Any pointers to debug this issue? >>> >>> Before the kernel hangs it reports 2 errors like: >>> >>> BUG: Bad page state in process swapper pfn:fffa8 >>> page:ef7fb500 count:0 mapcount:0 mapping: (null) index:0x0 >>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked) >>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set >>> bad because of flags: >>> flags: 0x200041(locked|active|mlocked) >>> Modules linked in: >>> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816 >>> Hardware name: Qualcomm (Flattened Device Tree) >>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24) >>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c) >>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128) >>> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0) >>> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174) >>> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58) >>> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88) >>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430) >>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8) >>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074) >>> Disabling lock debugging due to kernel taint >>> >>> >>> Full kernel log with memblock debug at http://paste.ubuntu.com/9761000/ >>> >> >> I don't have an IFC handy but I was able to reproduce the same issue on another board. >> I think this is an underlying issue in mm code. >> >> Removing the first 2MB changes the start address of the zone. 
This means the start >> address is no longer pageblock aligned (4MB on this system). With a little >> digging, it looks like the issue is we're running off the end of the end of the >> mem_map array because the memmap array is too small. This is similar to >> an issue fixed by 7c45512 mm: fix pageblock bitmap allocation and the following >> fixes it for me: >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 7633c50..32d9436 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -5012,7 +5012,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat) >> #ifdef CONFIG_FLAT_NODE_MEM_MAP >> /* ia64 gets its own node_mem_map, before this, without bootmem */ >> if (!pgdat->node_mem_map) { >> - unsigned long size, start, end; >> + unsigned long size, start, end, offset; >> struct page *map; >> >> /* >> @@ -5020,10 +5020,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat) >> * aligned but the node_mem_map endpoints must be in order >> * for the buddy allocator to function correctly. >> */ >> + offset = pgdat->node_start_pfn & (pageblock_nr_pages - 1); >> start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1); >> end = pgdat_end_pfn(pgdat); >> end = ALIGN(end, MAX_ORDER_NR_PAGES); >> - size = (end - start) * sizeof(struct page); >> + size = ((end - start) + offset) * sizeof(struct page); >> map = alloc_remap(pgdat->node_id, size); >> if (!map) >> map = memblock_virt_alloc_node_nopanic(size, >> >> If there is agreement on this approach, I can turn this into a proper patch. > > I admit I may not see clearly through all the arch-specific layers and various > config option combinations that are possible here, so I might be misinterpreting > the code. But I think the problem here is not insufficient allocation size, but > something else. 
> > The code above continues by this line: > > pgdat->node_mem_map = map + (pgdat->node_start_pfn - start); > > So, size for the map allocation has already been calculated aligned to > MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first > actually present page, which might be offset from the perfect alignment. Your > patch adds another offset to the already aligned size (but you use > pageblock_nr_pages which might be lower than MAX_ORDER_NR_PAGES; this seems like > a mistake in itself?). So with your patch we have map of aligned size starting > from the node_mem_map. This means the last offset-worth of struct pages should > be beyond what's needed to access struct page of pgdat_end_pfn(). If we need > that extra padding to prevent crashing, then it looks really suspicious... > > And when I look at node_mem_map usage, I see include/asm/generic/memory_model.h > defines __pfn_to_page as (basically) > > NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\ > > and further above is a generic definition of arch_local_page_offset: > > #define arch_local_page_offset(pfn, nid) \ > ((pfn) - NODE_DATA(nid)->node_start_pfn) > > So it looks correct to me without your patch. The map is allocated aligned, > node_mem_map points to this map at the offset corresponding to node_start_pfn, > and pfn_to_page subtracts node_start_pfn to get the offset relative to > node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset, > unless something else is misbehaving here. > > In the issue fixed by 7c45512 that you refer to, the problem was basically that > the allocation didn't use aligned size, but this looks different to me? 
>
>

With this hard coded debugging:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..241b870 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5029,6 +5029,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
                        map = memblock_virt_alloc_node_nopanic(size,
                                                               pgdat->node_id);
                pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
+               pr_err(">>> node_start_pfn %lx node_end_pfn %lx\n",
+                       pgdat->node_start_pfn, pgdat_end_pfn(pgdat));
+               pr_err(">>> size calculated %lx\n", size);
+               pr_err(">>> allocated region %p-%lx\n", map, ((unsigned long)map)+size);
+
        }
 #ifndef CONFIG_NEED_MULTIPLE_NODES
        /*
@@ -5043,6 +5048,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
        }
 #endif
 #endif /* CONFIG_FLAT_NODE_MEM_MAP */
+       pr_err(">>> pfn %lx page %p\n", 0x200, pfn_to_page(0x200));
+       pr_err(">>> pfn %lx page %p\n", 0xbffff, pfn_to_page(0xbffff));
 }

 void __paginginit free_area_init_node(int nid, unsigned long *zones_size,

I get this output:

[    0.000000] >>> node_start_pfn 200 node_end_pfn c0000
[    0.000000] >>> size calculated 1800000
[    0.000000] >>> allocated region edffa000-ef7fa000
[    0.000000] >>> pfn 200 page ee002000
[    0.000000] >>> pfn bffff page ef7fdfe0

The start and end pfn values are correct, but the page value for pfn bffff is
outside of the allocated region for the memory map. This is a CONFIG_FLATMEM
system so we aren't actually using arch_local_page_offset at all:

#define __pfn_to_page(pfn)      (mem_map + ((pfn) - ARCH_PFN_OFFSET))
#define __page_to_pfn(page)     ((unsigned long)((page) - mem_map) + \
                                 ARCH_PFN_OFFSET)

If you do the math, the array size is fine if we don't offset by the
start, but alloc_node_mem_map offsets assuming pfn_to_page will offset
as well, and this doesn't happen in CONFIG_FLATMEM.

Either alloc_node_mem_map needs to drop the offset or the pfn_to_page
functions need to start adding the offset. It's worth noting that
this gets corrected properly if we have CONFIG_HAVE_MEMBLOCK_NODE_MAP enabled,
so perhaps the fix is to undo the offset for flatmem as well:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..271c44b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5036,7 +5036,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
         */
        if (pgdat == NODE_DATA(0)) {
                mem_map = NODE_DATA(0)->node_mem_map;
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
                if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
                        mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */

Thanks,
Laura

--
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related [flat|nested] 33+ messages in thread
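The "do the math" step in the message above can be replayed with the numbers from the debug output. This is a hedged sketch, not kernel code: the `sketch_` helpers and `addr_t` are hypothetical, and three values are inferred rather than stated in the thread: a struct page size of 32 bytes (0x1800000 / 0xc0000), an ARCH_PFN_OFFSET of 0 on the test board, and a rounded-down start pfn of 0. Addresses are handled as plain integers.

```c
#include <stdbool.h>

typedef unsigned long long addr_t;

/* Taken from the debug output in the message above. */
#define SKETCH_MAP_BASE        0xedffa000ULL /* start of allocated region */
#define SKETCH_MAP_SIZE        0x1800000ULL  /* "size calculated" */
#define SKETCH_NODE_START_PFN  0x200ULL

/* Inferred for illustration, not stated in the thread. */
#define SKETCH_PAGE_STRUCT_SIZE 32ULL /* 0x1800000 / 0xc0000 */
#define SKETCH_ARCH_PFN_OFFSET  0ULL
#define SKETCH_START            0ULL  /* node_start_pfn rounded down */

/* node_mem_map = map + (node_start_pfn - start), in bytes. */
static addr_t sketch_node_mem_map(void)
{
	return SKETCH_MAP_BASE +
	       (SKETCH_NODE_START_PFN - SKETCH_START) * SKETCH_PAGE_STRUCT_SIZE;
}

/* mem_map after the proposed "mem_map -= node_start_pfn - ARCH_PFN_OFFSET". */
static addr_t sketch_adjusted_mem_map(void)
{
	return sketch_node_mem_map() -
	       (SKETCH_NODE_START_PFN - SKETCH_ARCH_PFN_OFFSET) *
	       SKETCH_PAGE_STRUCT_SIZE;
}

/* FLATMEM lookup: mem_map + (pfn - ARCH_PFN_OFFSET). */
static addr_t sketch_pfn_to_page(addr_t mem_map, addr_t pfn)
{
	return mem_map + (pfn - SKETCH_ARCH_PFN_OFFSET) * SKETCH_PAGE_STRUCT_SIZE;
}

/* True if a whole struct page at this address fits in the allocation. */
static bool sketch_in_allocation(addr_t page)
{
	return page >= SKETCH_MAP_BASE &&
	       page + SKETCH_PAGE_STRUCT_SIZE <= SKETCH_MAP_BASE + SKETCH_MAP_SIZE;
}
```

With mem_map left equal to node_mem_map, the lookups for pfn 0x200 and 0xbffff reproduce the logged ee002000 and ef7fdfe0, and the latter fails `sketch_in_allocation()`. With the adjusted mem_map, the last pfn maps to ef7f9fe0, which is the final slot of the allocated region.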
* Re: Issue on reserving memory with no-map flag in DT 2015-01-19 23:57 ` Laura Abbott @ 2015-01-20 9:54 ` Vlastimil Babka -1 siblings, 0 replies; 33+ messages in thread From: Vlastimil Babka @ 2015-01-20 9:54 UTC (permalink / raw) To: Laura Abbott, Srinivas Kandagatla, linux-arm-kernel, linux, ssantosh, Andrew Morton, Mel Gorman Cc: Kevin Hilman, Stephen Boyd, Arnd Bergmann, Kumar Gala, linux-mm On 01/20/2015 12:57 AM, Laura Abbott wrote: > On 1/19/2015 7:49 AM, Vlastimil Babka wrote: >> On 01/17/2015 01:24 AM, Laura Abbott wrote: >> >> I admit I may not see clearly through all the arch-specific layers and various >> config option combinations that are possible here, so I might be misinterpreting >> the code. But I think the problem here is not insufficient allocation size, but >> something else. >> >> The code above continues by this line: >> >> pgdat->node_mem_map = map + (pgdat->node_start_pfn - start); >> >> So, size for the map allocation has already been calculated aligned to >> MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first >> actually present page, which might be offset from the perfect alignment. Your >> patch adds another offset to the already aligned size (but you use >> pageblock_nr_pages which might be lower than MAX_ORDER_NR_PAGES; this seems like >> a mistake in itself?). So with your patch we have map of aligned size starting >> from the node_mem_map. This means the last offset-worth of struct pages should >> be beyond what's needed to access struct page of pgdat_end_pfn(). If we need >> that extra padding to prevent crashing, then it looks really suspicious... 
>>
>> And when I look at node_mem_map usage, I see include/asm/generic/memory_model.h
>> defines __pfn_to_page as (basically)
>>
>> NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
>>
>> and further above is a generic definition of arch_local_page_offset:
>>
>> #define arch_local_page_offset(pfn, nid) \
>>         ((pfn) - NODE_DATA(nid)->node_start_pfn)
>>
>> So it looks correct to me without your patch. The map is allocated aligned,
>> node_mem_map points to this map at the offset corresponding to node_start_pfn,
>> and pfn_to_page subtracts node_start_pfn to get the offset relative to
>> node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
>> unless something else is misbehaving here.
>>
>> In the issue fixed by 7c45512 that you refer to, the problem was basically that
>> the allocation didn't use aligned size, but this looks different to me?
>>
>>
>
> With this hard coded debugging:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..241b870 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5029,6 +5029,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  			map = memblock_virt_alloc_node_nopanic(size,
>  							       pgdat->node_id);
>  		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
> +		pr_err(">>> node_start_pfn %lx node_end_pfn %lx\n",
> +			pgdat->node_start_pfn, pgdat_end_pfn(pgdat));
> +		pr_err(">>> size calculated %lx\n", size);
> +		pr_err(">>> allocated region %p-%lx\n", map, ((unsigned long)map)+size);
> +
>  	}
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  	/*
> @@ -5043,6 +5048,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  	}
>  #endif
>  #endif /* CONFIG_FLAT_NODE_MEM_MAP */
> +	pr_err(">>> pfn %lx page %p\n", 0x200, pfn_to_page(0x200));
> +	pr_err(">>> pfn %lx page %p\n", 0xbffff, pfn_to_page(0xbffff));
>  }
>
>  void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>
> I get this output:
> [    0.000000] >>> node_start_pfn 200 node_end_pfn c0000
> [    0.000000] >>> size calculated 1800000
> [    0.000000] >>> allocated region edffa000-ef7fa000
> [    0.000000] >>> pfn 200 page ee002000
> [    0.000000] >>> pfn bffff page ef7fdfe0
>
> The start and end pfn values are correct but that page value is outside of the
> allocated region for the memory map. This is a CONFIG_FLATMEM system so we
> aren't actually using arch_local_page_offset at all:
>
>
> #define __pfn_to_page(pfn)      (mem_map + ((pfn) - ARCH_PFN_OFFSET))
> #define __page_to_pfn(page)     ((unsigned long)((page) - mem_map) + \
>                                  ARCH_PFN_OFFSET)

Ah, OK. I searched just for node_mem_map and didn't notice it's also assigned
to mem_map.

> If you do the math, the array size is fine if we don't offset by the
> start but alloc_node_mem_map offsets assuming pfn_to_page will offset
> as well but this doesn't happen in CONFIG_FLATMEM.
>
> Either alloc_node_mem_map needs to drop the offset or the pfn_to_page
> functions need to start adding the offset. It's worth noting that
> this gets corrected properly if we have CONFIG_HAVE_MEMBLOCK_NODE_MAP enabled
> so perhaps the fix is to unoffset for flatmem as well:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..271c44b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5036,7 +5036,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  	 */
>  	if (pgdat == NODE_DATA(0)) {
>  		mem_map = NODE_DATA(0)->node_mem_map;
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
>  		if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>  			mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);

But is this correcting the same thing? The offset that's added earlier is
(pgdat->node_start_pfn - start), where "start" is just node_start_pfn aligned
down to MAX_ORDER_NR_PAGES. But here we subtract the whole
pgdat->node_start_pfn, minus the ARCH_PFN_OFFSET constant. Is the constant
always equal to the earlier value of "start", which is calculated dynamically?

So I agree that the mem_map assignment should be fixed, but maybe not exactly
like this?

>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>
> Thanks,
> Laura
>
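The arithmetic behind the quoted debug output can be replayed outside the kernel. The following is a minimal sketch, not kernel code: it assumes sizeof(struct page) is 32 bytes (0x1800000 bytes divided by 0xc0000 pfns), MAX_ORDER_NR_PAGES is 0x400, and ARCH_PFN_OFFSET is 0 on the system that produced the log. None of these three values is stated in the thread; they are inferred from the printed addresses. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions inferred from the debug output, not stated in the thread. */
#define STRUCT_PAGE_SIZE   32u     /* 0x1800000 bytes / 0xc0000 pfns */
#define MAX_ORDER_NR_PAGES 0x400u  /* assumed: 1024 pages */
#define ARCH_PFN_OFFSET    0x0u    /* assumed: consistent with the printed addresses */

/* node_mem_map as alloc_node_mem_map() computes it: map plus the unaligned part. */
static inline uint32_t node_mem_map_addr(uint32_t map, uint32_t node_start_pfn)
{
	uint32_t start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);

	return map + (node_start_pfn - start) * STRUCT_PAGE_SIZE;
}

/* FLATMEM __pfn_to_page(): indexes mem_map by (pfn - ARCH_PFN_OFFSET) only. */
static inline uint32_t flatmem_pfn_to_page(uint32_t mem_map, uint32_t pfn)
{
	return mem_map + (pfn - ARCH_PFN_OFFSET) * STRUCT_PAGE_SIZE;
}

/* Replays the log and returns how many bytes the last struct page overruns. */
static inline uint32_t overrun_bytes(void)
{
	const uint32_t map = 0xedffa000u, size = 0x1800000u;
	uint32_t mem_map = node_mem_map_addr(map, 0x200);
	uint32_t last = flatmem_pfn_to_page(mem_map, 0xbffff);

	/* These two values are the ones printed in the quoted log. */
	assert(flatmem_pfn_to_page(mem_map, 0x200) == 0xee002000u);
	assert(last == 0xef7fdfe0u);

	/* End of the last struct page minus the end of the allocation. */
	return last + STRUCT_PAGE_SIZE - (map + size);
}
```

Under these assumptions the overrun is 0x4000 bytes, which is 0x200 struct pages: the offset is counted once in node_mem_map and again in the pfn index.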
* [PATCH] mm: Don't offset memmap for flatmem
  2015-01-20  9:54 ` Vlastimil Babka
@ 2015-01-21  1:37 ` Laura Abbott
From: Laura Abbott @ 2015-01-21  1:37 UTC
To: Srinivas Kandagatla, linux-arm-kernel, Russell King - ARM Linux, ssantosh, Andrew Morton, Vlastimil Babka
Cc: Laura Abbott, Kevin Hilman, Stephen Boyd, Arnd Bergman, Kumar Gala, linux-mm

Srinivas Kandagatla reported bad page messages when trying to
remove the bottom 2MB on an ARM based IFC6410 board

BUG: Bad page state in process swapper  pfn:fffa8
page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x200041(locked|active|mlocked)
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
Hardware name: Qualcomm (Flattened Device Tree)
[<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
[<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
[<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
[<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
[<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
[<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
[<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
[<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
[<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
[<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
Disabling lock debugging due to kernel taint

Removing the lower 2MB made the start of the lowmem zone to no longer
be page block aligned. IFC6410 uses CONFIG_FLATMEM where
alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
will offset for unaligned nodes with the assumption the pfn/page
translation functions will account for the offset. The functions for
CONFIG_FLATMEM do not offset however, resulting in overrunning
the memmap array. Just use the allocated memmap without any offset
when running with CONFIG_FLATMEM to avoid the overrun.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
---
Srinivas, can you test this version of the patch?
---
 mm/page_alloc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..33cef00 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5014,6 +5014,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 	if (!pgdat->node_mem_map) {
 		unsigned long size, start, end;
 		struct page *map;
+		unsigned long offset = 0;
 
 		/*
 		 * The zone's endpoints aren't required to be MAX_ORDER
@@ -5021,6 +5022,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 		 * for the buddy allocator to function correctly.
 		 */
 		start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
+		if (!IS_ENABLED(CONFIG_FLATMEM))
+			offset = pgdat->node_start_pfn - start;
 		end = pgdat_end_pfn(pgdat);
 		end = ALIGN(end, MAX_ORDER_NR_PAGES);
 		size = (end - start) * sizeof(struct page);
@@ -5028,7 +5031,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 		if (!map)
 			map = memblock_virt_alloc_node_nopanic(size,
 							       pgdat->node_id);
-		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
+		pgdat->node_mem_map = map + offset;
 	}
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 	/*
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
* Re: [PATCH] mm: Don't offset memmap for flatmem
  2015-01-21  1:37 ` Laura Abbott
@ 2015-01-21 10:15 ` Vlastimil Babka
From: Vlastimil Babka @ 2015-01-21 10:15 UTC
To: Laura Abbott, Srinivas Kandagatla, linux-arm-kernel, Russell King - ARM Linux, ssantosh, Andrew Morton
Cc: Kevin Hilman, Stephen Boyd, Arnd Bergman, Kumar Gala, linux-mm

On 01/21/2015 02:37 AM, Laura Abbott wrote:
> Srinivas Kandagatla reported bad page messages when trying to
> remove the bottom 2MB on an ARM based IFC6410 board
>
> BUG: Bad page state in process swapper  pfn:fffa8
> page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> bad because of flags:
> flags: 0x200041(locked|active|mlocked)
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
> Hardware name: Qualcomm (Flattened Device Tree)
> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
> Disabling lock debugging due to kernel taint
>
> Removing the lower 2MB made the start of the lowmem zone to no longer
> be page block aligned. IFC6410 uses CONFIG_FLATMEM where
> alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
> will offset for unaligned nodes with the assumption the pfn/page
> translation functions will account for the offset. The functions for
> CONFIG_FLATMEM do not offset however, resulting in overrunning
> the memmap array. Just use the allocated memmap without any offset
> when running with CONFIG_FLATMEM to avoid the overrun.
>
> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
> Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> ---
> Srinivas, can you test this version of the patch?
> ---
>  mm/page_alloc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..33cef00 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5014,6 +5014,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  	if (!pgdat->node_mem_map) {
>  		unsigned long size, start, end;
>  		struct page *map;
> +		unsigned long offset = 0;
>
>  		/*
>  		 * The zone's endpoints aren't required to be MAX_ORDER
> @@ -5021,6 +5022,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  		 * for the buddy allocator to function correctly.
>  		 */
>  		start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
> +		if (!IS_ENABLED(CONFIG_FLATMEM))
> +			offset = pgdat->node_start_pfn - start;
>  		end = pgdat_end_pfn(pgdat);
>  		end = ALIGN(end, MAX_ORDER_NR_PAGES);
>  		size = (end - start) * sizeof(struct page);
> @@ -5028,7 +5031,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
>  		if (!map)
>  			map = memblock_virt_alloc_node_nopanic(size,
>  							       pgdat->node_id);
> -		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
> +		pgdat->node_mem_map = map + offset;

Hmm, with this patch you have changed not only mem_map but also node_mem_map
itself. So the result of pgdat_page_nr() defined in mmzone.h will now be
different in the CONFIG_FLAT_NODE_MEM_MAP case?

#ifdef CONFIG_FLAT_NODE_MEM_MAP
#define pgdat_page_nr(pgdat, pagenr)    ((pgdat)->node_mem_map + (pagenr))
#else
#define pgdat_page_nr(pgdat, pagenr)    pfn_to_page((pgdat)->node_start_pfn + (pagenr))
#endif
#define nid_page_nr(nid, pagenr)        pgdat_page_nr(NODE_DATA(nid),(pagenr))

It appears that nobody uses pgdat_page_nr, except nid_page_nr, which nobody
uses. But better not leave it broken, and there's also some arch-specific code
looking at node_mem_map directly (although I'm not sure whether this particular
combination of CONFIG_ parameters applies there).

So it seems to me we should rather apply the offset to node_mem_map in any
case, but not apply it (i.e. subtract it back) to mem_map for !CONFIG_FLATMEM?

Thanks.

>  	}
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  	/*
>
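The concern about pgdat_page_nr() can be illustrated with plain index arithmetic. This is a rough sketch, not kernel code: positions are counted in struct pages from the start of the allocated map, MAX_ORDER_NR_PAGES is assumed to be 0x400, and ARCH_PFN_OFFSET is assumed to equal the aligned "start" pfn (the thread itself questions whether that always holds). All helper names are hypothetical.

```c
#include <assert.h>

/* Assumed value (1024 pages); the thread does not state it. */
#define MAX_ORDER_NR_PAGES 0x400UL

/*
 * Index that the first patch leaves in node_mem_map under CONFIG_FLATMEM:
 * no offset is applied, so it points at map[0].
 */
static inline unsigned long v1_node_mem_map_idx(void)
{
	return 0;
}

/* pgdat_page_nr(pgdat, nr) in the CONFIG_FLAT_NODE_MEM_MAP case. */
static inline unsigned long pgdat_page_nr_idx(unsigned long node_mem_map_idx,
					      unsigned long nr)
{
	return node_mem_map_idx + nr;
}

/* FLATMEM pfn_to_page(pfn), with mem_map taken from node_mem_map. */
static inline unsigned long pfn_to_page_idx(unsigned long mem_map_idx,
					    unsigned long pfn,
					    unsigned long arch_pfn_offset)
{
	return mem_map_idx + (pfn - arch_pfn_offset);
}

/*
 * Gap, in struct pages, between the two ways of finding the node's first
 * page after the first patch. Assumes ARCH_PFN_OFFSET == aligned start.
 */
static inline unsigned long v1_first_page_gap(unsigned long node_start_pfn)
{
	unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
	unsigned long idx = v1_node_mem_map_idx();
	unsigned long via_pfn = pfn_to_page_idx(idx, node_start_pfn, start);
	unsigned long via_nr = pgdat_page_nr_idx(idx, 0);

	assert(via_pfn >= via_nr);
	return via_pfn - via_nr;
}
```

In this model a node starting 0x200 pages past an aligned boundary gets a gap of 0x200: pfn_to_page(node_start_pfn) and pgdat_page_nr(pgdat, 0) no longer name the same struct page, which is the breakage described in the reply above. An aligned node shows no gap.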
* [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-21  1:37 ` Laura Abbott
@ 2015-01-22  1:01 ` Laura Abbott
From: Laura Abbott @ 2015-01-22  1:01 UTC
To: Vlastimil Babka, Srinivas Kandagatla, linux-arm-kernel, Russell King - ARM Linux, ssantosh, Andrew Morton
Cc: Laura Abbott, Kevin Hilman, Arnd Bergman, Stephen Boyd, linux-mm, Kumar Gala

Srinivas Kandagatla reported bad page messages when trying to
remove the bottom 2MB on an ARM based IFC6410 board

BUG: Bad page state in process swapper  pfn:fffa8
page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x200041(locked|active|mlocked)
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
Hardware name: Qualcomm (Flattened Device Tree)
[<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
[<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
[<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
[<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
[<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
[<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
[<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
[<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
[<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
[<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
Disabling lock debugging due to kernel taint

Removing the lower 2MB made the start of the lowmem zone to no longer
be page block aligned. IFC6410 uses CONFIG_FLATMEM where
alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
will offset for unaligned nodes with the assumption the pfn/page
translation functions will account for the offset. The functions for
CONFIG_FLATMEM do not offset however, resulting in overrunning
the memmap array. Just use the allocated memmap without any offset
when running with CONFIG_FLATMEM to avoid the overrun.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
---
 mm/page_alloc.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..269fc93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5005,6 +5005,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 {
+	unsigned long __maybe_unused offset = 0;
+
 	/* Skip empty nodes */
 	if (!pgdat->node_spanned_pages)
 		return;
@@ -5021,6 +5023,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 		 * for the buddy allocator to function correctly.
 		 */
 		start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
+		offset = pgdat->node_start_pfn - start;
 		end = pgdat_end_pfn(pgdat);
 		end = ALIGN(end, MAX_ORDER_NR_PAGES);
 		size = (end - start) * sizeof(struct page);
@@ -5028,7 +5031,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 		if (!map)
 			map = memblock_virt_alloc_node_nopanic(size,
 							       pgdat->node_id);
-		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
+		pgdat->node_mem_map = map + offset;
 	}
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 	/*
@@ -5036,10 +5039,13 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
 	 */
 	if (pgdat == NODE_DATA(0)) {
 		mem_map = NODE_DATA(0)->node_mem_map;
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-		if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
-			mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
+		if (page_to_pfn(mem_map) != pgdat->node_start_pfn) {
+			if (IS_ENABLED(CONFIG_HAVE_MEMBLOCK_NODE_MAP))
+				offset = pgdat->node_start_pfn - ARCH_PFN_OFFSET;
+			mem_map -= offset;
+		}
+#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP || CONFIG_FLATMEM */
 	}
 #endif
 #endif /* CONFIG_FLAT_NODE_MEM_MAP */
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
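The index arithmetic of the v2 diff can be modelled in user space to see when pfn_to_page(node_start_pfn) lands on node_mem_map. This is only a sketch under stated assumptions: it models the FLATMEM translation, where page_to_pfn(mem_map) is always ARCH_PFN_OFFSET; MAX_ORDER_NR_PAGES is assumed to be 0x400; positions are counted in struct pages from the start of the allocated map. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER_NR_PAGES 0x400L  /* assumed value (1024 pages) */

/* Positions in the allocated array, counted in struct pages. */
struct memmap_layout {
	long node_mem_map;
	long mem_map;  /* may be negative: a base pointer before the array */
};

/* Index arithmetic of the v2 diff for node 0, FLATMEM translation assumed. */
static inline struct memmap_layout v2_layout(long node_start_pfn,
					     long arch_pfn_offset,
					     bool have_memblock_node_map)
{
	long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
	long offset = node_start_pfn - start;
	struct memmap_layout l;

	l.node_mem_map = offset;        /* map + offset */
	l.mem_map = l.node_mem_map;

	/* Under FLATMEM, page_to_pfn(mem_map) equals arch_pfn_offset. */
	if (arch_pfn_offset != node_start_pfn) {
		if (have_memblock_node_map)
			offset = node_start_pfn - arch_pfn_offset;
		l.mem_map -= offset;
	}
	return l;
}

/* FLATMEM pfn_to_page() expressed as an array index. */
static inline long pfn_to_idx(const struct memmap_layout *l, long pfn,
			      long arch_pfn_offset)
{
	return l->mem_map + (pfn - arch_pfn_offset);
}

/* True when pfn_to_page(node_start_pfn) lands on node_mem_map. */
static inline bool v2_first_page_matches(long node_start_pfn,
					 long arch_pfn_offset,
					 bool have_memblock_node_map)
{
	struct memmap_layout l = v2_layout(node_start_pfn, arch_pfn_offset,
					   have_memblock_node_map);

	assert(l.node_mem_map >= 0 && l.node_mem_map < MAX_ORDER_NR_PAGES);
	return pfn_to_idx(&l, node_start_pfn, arch_pfn_offset) == l.node_mem_map;
}
```

In this simplified model the FLATMEM-only branch agrees with node_mem_map when ARCH_PFN_OFFSET equals the aligned start (for example node_start_pfn 0x80200 with an offset of 0x80000), and disagrees when the node starts more than one MAX_ORDER block above ARCH_PFN_OFFSET. That is the same question raised earlier in the thread about whether the constant always matches the dynamically computed "start".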
* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-22 1:01 ` Laura Abbott
@ 2015-01-23 0:20 ` Andrew Morton
  -1 siblings, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2015-01-23 0:20 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Vlastimil Babka, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman,
	Stephen Boyd, linux-mm, Kumar Gala

On Wed, 21 Jan 2015 17:01:40 -0800 Laura Abbott <lauraa@codeaurora.org> wrote:

> Srinivas Kandagatla reported bad page messages when trying to
> remove the bottom 2MB on an ARM based IFC6410 board
>
> BUG: Bad page state in process swapper pfn:fffa8
> page:ef7fb500 count:0 mapcount:0 mapping: (null) index:0x0
> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> bad because of flags:
> flags: 0x200041(locked|active|mlocked)
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
> Hardware name: Qualcomm (Flattened Device Tree)
> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
> Disabling lock debugging due to kernel taint
>
> Removing the lower 2MB made the start of the lowmem zone to no longer
> be page block aligned. IFC6410 uses CONFIG_FLATMEM where
> alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
> will offset for unaligned nodes with the assumption the pfn/page
> translation functions will account for the offset. The functions for
> CONFIG_FLATMEM do not offset however, resulting in overrunning
> the memmap array. Just use the allocated memmap without any offset
> when running with CONFIG_FLATMEM to avoid the overrun.
>

I don't think v2 addressed Vlastimil's review comment?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-23 0:20 ` Andrew Morton
@ 2015-01-23 0:33 ` Laura Abbott
  -1 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-01-23 0:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman,
	Stephen Boyd, linux-mm, Kumar Gala

On 1/22/2015 4:20 PM, Andrew Morton wrote:
> On Wed, 21 Jan 2015 17:01:40 -0800 Laura Abbott <lauraa@codeaurora.org> wrote:
>
>> Srinivas Kandagatla reported bad page messages when trying to
>> remove the bottom 2MB on an ARM based IFC6410 board
>>
>> BUG: Bad page state in process swapper pfn:fffa8
>> page:ef7fb500 count:0 mapcount:0 mapping: (null) index:0x0
>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> bad because of flags:
>> flags: 0x200041(locked|active|mlocked)
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
>> Hardware name: Qualcomm (Flattened Device Tree)
>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
>> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
>> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
>> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
>> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
>> Disabling lock debugging due to kernel taint
>>
>> Removing the lower 2MB made the start of the lowmem zone to no longer
>> be page block aligned. IFC6410 uses CONFIG_FLATMEM where
>> alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
>> will offset for unaligned nodes with the assumption the pfn/page
>> translation functions will account for the offset. The functions for
>> CONFIG_FLATMEM do not offset however, resulting in overrunning
>> the memmap array. Just use the allocated memmap without any offset
>> when running with CONFIG_FLATMEM to avoid the overrun.
>>
>
> I don't think v2 addressed Vlastimil's review comment?
>

We're still adding the offset to node_mem_map and then subtracting it from
just mem_map. Did I miss another comment somewhere?

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-23 0:33 ` Laura Abbott
@ 2015-01-23 9:05 ` Vlastimil Babka
  -1 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2015-01-23 9:05 UTC (permalink / raw)
  To: Laura Abbott, Andrew Morton
  Cc: Srinivas Kandagatla, linux-arm-kernel, Russell King - ARM Linux,
	ssantosh, Kevin Hilman, Arnd Bergman, Stephen Boyd, linux-mm,
	Kumar Gala, Mel Gorman

On 01/23/2015 01:33 AM, Laura Abbott wrote:
> On 1/22/2015 4:20 PM, Andrew Morton wrote:
>> On Wed, 21 Jan 2015 17:01:40 -0800 Laura Abbott <lauraa@codeaurora.org> wrote:
>>
>>> Srinivas Kandagatla reported bad page messages when trying to
>>> remove the bottom 2MB on an ARM based IFC6410 board
>>>
>>> BUG: Bad page state in process swapper pfn:fffa8
>>> page:ef7fb500 count:0 mapcount:0 mapping: (null) index:0x0
>>> flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
>>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>>> bad because of flags:
>>> flags: 0x200041(locked|active|mlocked)
>>> Modules linked in:
>>> CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
>>> Hardware name: Qualcomm (Flattened Device Tree)
>>> [<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
>>> [<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
>>> [<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
>>> [<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
>>> [<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
>>> [<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
>>> [<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
>>> [<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
>>> [<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
>>> [<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
>>> Disabling lock debugging due to kernel taint
>>>
>>> Removing the lower 2MB made the start of the lowmem zone to no longer
>>> be page block aligned. IFC6410 uses CONFIG_FLATMEM where
>>> alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
>>> will offset for unaligned nodes with the assumption the pfn/page
>>> translation functions will account for the offset. The functions for
>>> CONFIG_FLATMEM do not offset however, resulting in overrunning
>>> the memmap array. Just use the allocated memmap without any offset
>>> when running with CONFIG_FLATMEM to avoid the overrun.
>>>
>>
>> I don't think v2 addressed Vlastimil's review comment?
>>
>
> We're still adding the offset to node_mem_map and then subtracting it from
> just mem_map. Did I miss another comment somewhere?

Yes that was addressed, thanks. But I don't feel comfortable acking it yet,
as I have no idea if we are doing the right thing for
CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.

Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP under the
"if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will probably do the
right thing, but looks like a weird test for this case here.

I have no good suggestion though, so let's CC Mel who apparently wrote the
ARCH_PFN_OFFSET correction?

^ permalink raw reply	[flat|nested] 33+ messages in thread
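The two configuration cases raised above can be compared with a small userspace sketch of the v2 correction logic. This is an illustration under stated assumptions, not the kernel code: `ex_v2_correction` and `EX_MAX_ORDER_NR_PAGES` are hypothetical names, MAX_ORDER_NR_PAGES is assumed to be 1024, and the memory model is assumed to be FLATMEM, where page_to_pfn(mem_map) evaluates to ARCH_PFN_OFFSET before any correction.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: MAX_ORDER is 11, so one MAX_ORDER block is 1024 pages. */
#define EX_MAX_ORDER_NR_PAGES 1024UL

/*
 * Number of struct page slots the v2 patch would subtract from mem_map,
 * modelled for FLATMEM with or without CONFIG_HAVE_MEMBLOCK_NODE_MAP.
 */
unsigned long ex_v2_correction(bool have_memblock_node_map,
			       unsigned long node_start_pfn,
			       unsigned long arch_pfn_offset)
{
	unsigned long start = node_start_pfn & ~(EX_MAX_ORDER_NR_PAGES - 1);
	unsigned long offset = node_start_pfn - start;

	assert(node_start_pfn >= arch_pfn_offset);

	/* Models "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)". */
	if (arch_pfn_offset == node_start_pfn)
		return 0;

	/* Models the IS_ENABLED(CONFIG_HAVE_MEMBLOCK_NODE_MAP) branch. */
	if (have_memblock_node_map)
		offset = node_start_pfn - arch_pfn_offset;

	return offset;
}
```

For the reported layout (node_start_pfn 0x80200, ARCH_PFN_OFFSET assumed to be 0x80000) both branches give 0x200, so the choice of branch makes no difference there. They diverge once the node starts more than one MAX_ORDER block above ARCH_PFN_OFFSET: with node_start_pfn 0x80600 the FLATMEM-only branch gives 0x200 while the HAVE_MEMBLOCK_NODE_MAP branch gives 0x600.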
* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-23 9:05 ` Vlastimil Babka
@ 2015-01-26 15:56 ` Mel Gorman
  -1 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2015-01-26 15:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Laura Abbott, Andrew Morton, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman,
	Stephen Boyd, linux-mm, Kumar Gala

On Fri, Jan 23, 2015 at 10:05:48AM +0100, Vlastimil Babka wrote:
> On 01/23/2015 01:33 AM, Laura Abbott wrote:
> >On 1/22/2015 4:20 PM, Andrew Morton wrote:
> >>On Wed, 21 Jan 2015 17:01:40 -0800 Laura Abbott <lauraa@codeaurora.org> wrote:
> >>
> >>>Srinivas Kandagatla reported bad page messages when trying to
> >>>remove the bottom 2MB on an ARM based IFC6410 board
> >>>
> >>>BUG: Bad page state in process swapper pfn:fffa8
> >>>page:ef7fb500 count:0 mapcount:0 mapping: (null) index:0x0
> >>>flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
> >>>page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> >>>bad because of flags:
> >>>flags: 0x200041(locked|active|mlocked)
> >>>Modules linked in:
> >>>CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
> >>>Hardware name: Qualcomm (Flattened Device Tree)
> >>>[<c0218280>] (unwind_backtrace) from [<c0212be8>] (show_stack+0x20/0x24)
> >>>[<c0212be8>] (show_stack) from [<c0af7124>] (dump_stack+0x80/0x9c)
> >>>[<c0af7124>] (dump_stack) from [<c0301570>] (bad_page+0xc8/0x128)
> >>>[<c0301570>] (bad_page) from [<c03018a8>] (free_pages_prepare+0x168/0x1e0)
> >>>[<c03018a8>] (free_pages_prepare) from [<c030369c>] (free_hot_cold_page+0x3c/0x174)
> >>>[<c030369c>] (free_hot_cold_page) from [<c0303828>] (__free_pages+0x54/0x58)
> >>>[<c0303828>] (__free_pages) from [<c030395c>] (free_highmem_page+0x38/0x88)
> >>>[<c030395c>] (free_highmem_page) from [<c0f62d5c>] (mem_init+0x240/0x430)
> >>>[<c0f62d5c>] (mem_init) from [<c0f5db3c>] (start_kernel+0x1e4/0x3c8)
> >>>[<c0f5db3c>] (start_kernel) from [<80208074>] (0x80208074)
> >>>Disabling lock debugging due to kernel taint
> >>>
> >>>Removing the lower 2MB made the start of the lowmem zone to no longer
> >>>be page block aligned. IFC6410 uses CONFIG_FLATMEM where
> >>>alloc_node_mem_map allocates memory for the mem_map. alloc_node_mem_map
> >>>will offset for unaligned nodes with the assumption the pfn/page
> >>>translation functions will account for the offset. The functions for
> >>>CONFIG_FLATMEM do not offset however, resulting in overrunning
> >>>the memmap array. Just use the allocated memmap without any offset
> >>>when running with CONFIG_FLATMEM to avoid the overrun.
> >>>
> >>
> >>I don't think v2 addressed Vlastimil's review comment?
> >>
> >
> >We're still adding the offset to node_mem_map and then subtracting it from
> >just mem_map. Did I miss another comment somewhere?
>
> Yes that was addressed, thanks. But I don't feel comfortable acking
> it yet, as I have no idea if we are doing the right thing for
> CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.
>
> Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP
> under the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will
> probably do the right thing, but looks like a weird test for this
> case here.
>
> I have no good suggestion though, so let's CC Mel who apparently
> wrote the ARCH_PFN_OFFSET correction?
>

I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me? I'm just
back today after being offline for a week so I didn't review the patch but IIRC,
ARCH_PFN_OFFSET deals with the case where physical memory does not start
at 0. Without the offset, virtual _PAGE_OFFSET would not map to physical page 0.
I don't recall it being related to the alignment of node 0 so if there
are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
related then I'm surprised.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-26 15:56 ` Mel Gorman
@ 2015-01-29 13:13 ` Vlastimil Babka
  -1 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2015-01-29 13:13 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Laura Abbott, Andrew Morton, Srinivas Kandagatla, linux-arm-kernel,
	Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman,
	Stephen Boyd, linux-mm, Kumar Gala

On 01/26/2015 04:56 PM, Mel Gorman wrote:
> On Fri, Jan 23, 2015 at 10:05:48AM +0100, Vlastimil Babka wrote:
>> On 01/23/2015 01:33 AM, Laura Abbott wrote:
>>> On 1/22/2015 4:20 PM, Andrew Morton wrote:
>>>>
>>>> I don't think v2 addressed Vlastimil's review comment?
>>>>
>>>
>>> We're still adding the offset to node_mem_map and then subtracting it from
>>> just mem_map. Did I miss another comment somewhere?
>>
>> Yes that was addressed, thanks. But I don't feel comfortable acking
>> it yet, as I have no idea if we are doing the right thing for
>> CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.
>>
>> Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP
>> under the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will
>> probably do the right thing, but looks like a weird test for this
>> case here.
>>
>> I have no good suggestion though, so let's CC Mel who apparently
>> wrote the ARCH_PFN_OFFSET correction?
>>
>
> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me? I'm just
> back today after being offline for a week so I didn't review the patch but IIRC,
> ARCH_PFN_OFFSET deals with the case where physical memory does not start
> at 0. Without the offset, virtual _PAGE_OFFSET would not map to physical page 0.
> I don't recall it being related to the alignment of node 0 so if there
> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
> related then I'm surprised.

You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit
467bc461d2 which was a bugfix to your commit c713216dee, which did introduce
the mem_map correction code, and after which the code looked like:

	mem_map = NODE_DATA(0)->node_mem_map;
#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
	if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
		mem_map -= pgdat->node_start_pfn;
#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */

It's from 2006 so I can't expect you to remember the details, but I had some
trouble finding out what this does. I assume it makes sure that mem_map
points to the struct page corresponding to pfn 0, because that's what
translations using mem_map expect. But pgdat->node_mem_map points to the
struct page corresponding to pgdat->node_start_pfn, which might not be 0. So
it subtracts node_start_pfn to fix that. This is OK, as the node_mem_map is
allocated (in this very function) with padding so that it covers a
MAX_ORDER_NR_PAGES aligned area where node_mem_map may point to the middle
of it.

Commit 467bc461d2 fixed this in case the first pfn is not 0, but
ARCH_PFN_OFFSET. So mem_map points to the struct page corresponding to
pfn=ARCH_PFN_OFFSET, which is OK. But I still have a few doubts:

1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" sort of silently
assumes that mem_map is allocated at the beginning of the node, i.e. at
pgdat->node_start_pfn. And the only reason for this if-condition to be true
is that we haven't corrected the page_to_pfn translation, which uses
mem_map. Is this assumption always OK to make? Shouldn't the if-condition
instead be about pgdat->node_start_pfn not being aligned?

2) The #ifdef guard is about CONFIG_ARCH_POPULATES_NODE_MAP, which is
nowadays called CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be
#ifdef FLATMEM instead? After all, we are correcting the value of mem_map
based on the page_to_pfn code variant used on FLATMEM. arm doesn't define
CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction.

3) The node_mem_map allocation code aligns the allocation to
MAX_ORDER_NR_PAGES, so the offset between the start of the allocated map
and where node_mem_map points to will be up to MAX_ORDER_NR_PAGES. However,
here we subtract (in current kernel)
(pgdat->node_start_pfn - ARCH_PFN_OFFSET). That looks like another silent
assumption, that pgdat->node_start_pfn is always between ARCH_PFN_OFFSET and
ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. If it were larger, the mem_map
correction would subtract too much and end up below what was allocated for
node_mem_map, no? The bug report behind this patch said that the first 2MB
of memory was reserved using "no-map flag using DT". Unless this somehow
translates to ARCH_PFN_OFFSET at build time, we would underflow mem_map,
right? Maybe I'm just overly paranoid here and of course ARCH_PFN_OFFSET is
determined properly on arm...

If anyone can confirm my doubts or point me to what I'm missing, thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread
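The arithmetic behind doubt 3) can be written out as a minimal userspace sketch. All `sketch_*` names are hypothetical, MAX_ORDER_NR_PAGES is assumed to be 1024, and positions are expressed as signed slot indices relative to the start of the allocated map so that "below the allocation" shows up as a negative number. It only restates the arithmetic of the existing correction and the flatmem translation; it does not settle whether the kernel may rely on it.

```c
#include <assert.h>

/* Assumption: MAX_ORDER is 11, so one MAX_ORDER block is 1024 pages. */
#define SKETCH_MAX_ORDER_NR_PAGES 1024L

/* Slot, relative to the allocated map, that node_mem_map points at. */
long sketch_node_mem_map_index(long node_start_pfn)
{
	long start = node_start_pfn & ~(SKETCH_MAX_ORDER_NR_PAGES - 1);

	return node_start_pfn - start;
}

/*
 * Slot that mem_map points at after
 * "mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET)".
 * A negative result means mem_map lies below the allocation.
 */
long sketch_mem_map_index(long node_start_pfn, long arch_pfn_offset)
{
	assert(node_start_pfn >= arch_pfn_offset);
	return sketch_node_mem_map_index(node_start_pfn) -
	       (node_start_pfn - arch_pfn_offset);
}

/* Slot touched by the flatmem pfn_to_page() for a given pfn. */
long sketch_pfn_to_index(long mem_map_index, long pfn, long arch_pfn_offset)
{
	return mem_map_index + (pfn - arch_pfn_offset);
}
```

With ARCH_PFN_OFFSET assumed to be 0x80000, a node starting at pfn 0x80200 leaves mem_map at slot 0, while a node starting at pfn 0x80600 leaves it at slot -0x400, which is the underflow described in 3). In this model, translating node_start_pfn through that mem_map still lands on the slot node_mem_map points at (0x200), so only pfns below the node would resolve to slots outside the allocation.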
* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-01-29 13:13 ` Vlastimil Babka
@ 2015-02-04 2:25 ` Laura Abbott
  -1 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-02-04 2:25 UTC
To: Vlastimil Babka, Mel Gorman
Cc: Andrew Morton, Srinivas Kandagatla, linux-arm-kernel, Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman, Stephen Boyd, linux-mm, Kumar Gala

On 1/29/2015 5:13 AM, Vlastimil Babka wrote:
> On 01/26/2015 04:56 PM, Mel Gorman wrote:
>> On Fri, Jan 23, 2015 at 10:05:48AM +0100, Vlastimil Babka wrote:
>>> On 01/23/2015 01:33 AM, Laura Abbott wrote:
>>>> On 1/22/2015 4:20 PM, Andrew Morton wrote:
>>>>>
>>>>> I don't think v2 addressed Vlastimil's review comment?
>>>>>
>>>>
>>>> We're still adding the offset to node_mem_map and then subtracting it from
>>>> just mem_map. Did I miss another comment somewhere?
>>>
>>> Yes that was addressed, thanks. But I don't feel comfortable acking
>>> it yet, as I have no idea if we are doing the right thing for
>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.
>>>
>>> Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>> under the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will
>>> probably do the right thing, but looks like a weird test for this
>>> case here.
>>>
>>> I have no good suggestion though, so let's CC Mel who apparently
>>> wrote the ARCH_PFN_OFFSET correction?
>>>
>>
>> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me? I'm just
>> back today after been offline a week so didn't review the patch but IIRC,
>> ARCH_PFN_OFFSET deals with the case where physical memory does not start
>> at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0.
>> I don't recall it being related to the alignment of node 0 so if there
>> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
>> related then I'm surprised.
>
> You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit
> 467bc461d2 which was a bugfix to your commit c713216dee, which did
> introduce the mem_map correction code, and after which the code looked like:
>
>         mem_map = NODE_DATA(0)->node_mem_map;
> #ifdef CONFIG_ARCH_POPULATES_NODE_MAP
>         if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>                 mem_map -= pgdat->node_start_pfn;
> #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
>
>
> It's from 2006 so I can't expect you remember the details, but I had some
> trouble finding out what this does. I assume it makes sure that mem_map points
> to struct page corresponding to pfn 0, because that's what translations using
> mem_map expect.
> But pgdat->node_mem_map points to struct page corresponding to
> pgdat->node_start_pfn, which might not be 0. So it subtracts node_start_pfn
> to fix that. This is OK, as the node_mem_map is allocated (in this very
> function) with padding so that it covers a MAX_ORDER_NR_PAGES aligned area
> where node_mem_map may point to the middle of it.
>
> Commit 467bc461d2 fixed this in case the first pfn is not 0, but ARCH_PFN_OFFSET.
> So mem_map points to struct page corresponding to pfn=ARCH_PFN_OFFSET, which
> is OK. But I still have few doubts:
>
> 1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" sort of silently
> assumes that mem_map is allocated at the beginning of the node, i.e. at
> pgdat->node_start_pfn. And the only reason for this if-condition to be true,
> is that we haven't corrected the page_to_pfn translation, which uses mem_map.
> Is this assumption always OK to do? Shouldn't the if-condition be instead about
> pgdat->node_start_pfn not being aligned?
>
> 2) The #ifdef guard is about CONFIG_ARCH_POPULATES_NODE_MAP, which is nowadays called
> CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be #ifdef FLATMEM instead?
> After all, we are correcting value of mem_map based on page_to_pfn code
> variant used on FLATMEM. arm doesn't define
> CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction.
>

Just doing #ifdef FLATMEM doesn't work because ARCH_PFN_OFFSET doesn't
seem to be picked up properly for NOMMU arches. Probably just
missing a header somewhere.

> 3) The node_mem_map allocation code aligns the allocation to MAX_ORDER_NR_PAGES,
> so the offset between the start of the allocated map and where node_mem_map
> points to will be up to MAX_ORDER_NR_PAGES.
> However, here we subtract (in current kernel) (pgdat->node_start_pfn - ARCH_PFN_OFFSET).
> That looks like another silent assumption, that pgdat->node_start_pfn is always
> between ARCH_PFN_OFFSET and ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. If it were
> larger, the mem_map correction would subtract too much and end up below what
> was allocated for node_mem_map, no? The bug report behind this patch said that
> first 2MB of memory was reserved using "no-map flag using DT". Unless this somehow
> translates to ARCH_PFN_OFFSET at build time, we would underflow mem_map, right?
> Maybe I'm just overly paranoid here and of course ARCH_PFN_OFFSET is determined
> properly on arm...
>
> If anyone can confirm my doubts or point me to what I'm missing, thanks.

ARCH_PFN_OFFSET should always be the lowest PFN in the system, otherwise
I think plenty of other things are broken given how many architectures
make this assumption. That said, I don't think subtracting ARCH_PFN_OFFSET
makes it obvious why the adjustment is being made.

Thanks,
Laura

--
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
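For readers following the quoted analysis, the FLATMEM translation and the mem_map correction can be modelled in user space. This is only a sketch, not the kernel code: it uses array indices instead of a biased struct page pointer (to stay within defined C behaviour), the helper names are hypothetical, and the values of ARCH_PFN_OFFSET, MAX_ORDER_NR_PAGES and the 4KB page size are assumptions chosen to resemble the IFC6410 report (RAM at 0x80000000, first 2MB removed).

```c
#include <assert.h>

/* Illustrative assumptions, not values read from a real kernel build. */
#define PAGE_SHIFT          12
#define ARCH_PFN_OFFSET     (0x80000000UL >> PAGE_SHIFT)  /* 0x80000 */
#define MAX_ORDER_NR_PAGES  1024UL

/*
 * FLATMEM-style translation, modelled with indices. mem_map_idx stands
 * in for the kernel's mem_map pointer: it is the index, within the
 * allocated map, of the entry that represents pfn ARCH_PFN_OFFSET.
 */
static unsigned long index_to_pfn(long mem_map_idx, long idx)
{
	return (unsigned long)(idx - mem_map_idx) + ARCH_PFN_OFFSET;
}

/* Offset of node_start_pfn inside its MAX_ORDER_NR_PAGES-aligned block. */
static long map_offset(unsigned long node_start_pfn)
{
	unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);

	return (long)(node_start_pfn - start);
}

/*
 * Returns the pfn that the first entry of node_mem_map translates to.
 * node_mem_map sits at index map_offset() in the allocated map; with
 * "corrected" set, mem_map is backed up by that same offset.
 */
static unsigned long first_pfn_seen(unsigned long node_start_pfn, int corrected)
{
	long node_mem_map_idx = map_offset(node_start_pfn);
	long mem_map_idx = corrected ? 0 : node_mem_map_idx;

	return index_to_pfn(mem_map_idx, node_mem_map_idx);
}
```

Under these assumptions, a node starting at pfn 0x80200 has a 0x200-entry offset into the aligned map. Without the correction, the first struct page translates to pfn 0x80000, which is 2MB too low; with mem_map backed up by the offset it translates to 0x80200 as expected.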
* Re: [PATCHv2] mm: Don't offset memmap for flatmem
  2015-02-04 2:25 ` Laura Abbott
@ 2015-02-24 19:54 ` Laura Abbott
  -1 siblings, 0 replies; 33+ messages in thread
From: Laura Abbott @ 2015-02-24 19:54 UTC
To: Vlastimil Babka, Mel Gorman
Cc: Andrew Morton, Srinivas Kandagatla, linux-arm-kernel, Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman, Stephen Boyd, linux-mm, Kumar Gala

Reviving this thread because I don't think it ever got resolved.

On 2/3/2015 6:25 PM, Laura Abbott wrote:
> On 1/29/2015 5:13 AM, Vlastimil Babka wrote:
>> On 01/26/2015 04:56 PM, Mel Gorman wrote:
>>> On Fri, Jan 23, 2015 at 10:05:48AM +0100, Vlastimil Babka wrote:
>>>> On 01/23/2015 01:33 AM, Laura Abbott wrote:
>>>>> On 1/22/2015 4:20 PM, Andrew Morton wrote:
>>>>>>
>>>>>> I don't think v2 addressed Vlastimil's review comment?
>>>>>>
>>>>>
>>>>> We're still adding the offset to node_mem_map and then subtracting it from
>>>>> just mem_map. Did I miss another comment somewhere?
>>>>
>>>> Yes that was addressed, thanks. But I don't feel comfortable acking
>>>> it yet, as I have no idea if we are doing the right thing for
>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP && CONFIG_FLATMEM case here.
>>>>
>>>> Also putting the CONFIG_FLATMEM && !CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>>> under the "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" will
>>>> probably do the right thing, but looks like a weird test for this
>>>> case here.
>>>>
>>>> I have no good suggestion though, so let's CC Mel who apparently
>>>> wrote the ARCH_PFN_OFFSET correction?
>>>>
>>>
>>> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me? I'm just
>>> back today after been offline a week so didn't review the patch but IIRC,
>>> ARCH_PFN_OFFSET deals with the case where physical memory does not start
>>> at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0.
>>> I don't recall it being related to the alignment of node 0 so if there
>>> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
>>> related then I'm surprised.
>>
>> You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit
>> 467bc461d2 which was a bugfix to your commit c713216dee, which did
>> introduce the mem_map correction code, and after which the code looked like:
>>
>>         mem_map = NODE_DATA(0)->node_mem_map;
>> #ifdef CONFIG_ARCH_POPULATES_NODE_MAP
>>         if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>>                 mem_map -= pgdat->node_start_pfn;
>> #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
>>
>>
>> It's from 2006 so I can't expect you remember the details, but I had some
>> trouble finding out what this does. I assume it makes sure that mem_map points
>> to struct page corresponding to pfn 0, because that's what translations using
>> mem_map expect.
>> But pgdat->node_mem_map points to struct page corresponding to
>> pgdat->node_start_pfn, which might not be 0. So it subtracts node_start_pfn
>> to fix that. This is OK, as the node_mem_map is allocated (in this very
>> function) with padding so that it covers a MAX_ORDER_NR_PAGES aligned area
>> where node_mem_map may point to the middle of it.
>>
>> Commit 467bc461d2 fixed this in case the first pfn is not 0, but ARCH_PFN_OFFSET.
>> So mem_map points to struct page corresponding to pfn=ARCH_PFN_OFFSET, which
>> is OK. But I still have few doubts:
>>
>> 1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" sort of silently
>> assumes that mem_map is allocated at the beginning of the node, i.e. at
>> pgdat->node_start_pfn. And the only reason for this if-condition to be true,
>> is that we haven't corrected the page_to_pfn translation, which uses mem_map.
>> Is this assumption always OK to do? Shouldn't the if-condition be instead about
>> pgdat->node_start_pfn not being aligned?
>>
>> 2) The #ifdef guard is about CONFIG_ARCH_POPULATES_NODE_MAP, which is nowadays called
>> CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be #ifdef FLATMEM instead?
>> After all, we are correcting value of mem_map based on page_to_pfn code
>> variant used on FLATMEM. arm doesn't define
>> CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction.
>>
>
> Just doing #ifdef FLATMEM doesn't work because ARCH_PFN_OFFSET doesn't
> seem to be picked up properly for NOMMU arches. Probably just
> missing a header somewhere.
>
>> 3) The node_mem_map allocation code aligns the allocation to MAX_ORDER_NR_PAGES,
>> so the offset between the start of the allocated map and where node_mem_map
>> points to will be up to MAX_ORDER_NR_PAGES.
>> However, here we subtract (in current kernel) (pgdat->node_start_pfn - ARCH_PFN_OFFSET).
>> That looks like another silent assumption, that pgdat->node_start_pfn is always
>> between ARCH_PFN_OFFSET and ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. If it were
>> larger, the mem_map correction would subtract too much and end up below what
>> was allocated for node_mem_map, no? The bug report behind this patch said that
>> first 2MB of memory was reserved using "no-map flag using DT". Unless this somehow
>> translates to ARCH_PFN_OFFSET at build time, we would underflow mem_map, right?
>> Maybe I'm just overly paranoid here and of course ARCH_PFN_OFFSET is determined
>> properly on arm...
>>
>> If anyone can confirm my doubts or point me to what I'm missing, thanks.
>
> ARCH_PFN_OFFSET should always be the lowest PFN in the system, otherwise
> I think plenty of other things are broken given how many architectures
> make this assumption. That said, I don't think subtracting ARCH_PFN_OFFSET
> makes it obvious why the adjustment is being made.
>
> Thanks,
> Laura
>

I was incorrect before: it isn't just NOMMU but architectures that don't use
asm-generic/memory_model.h which failed to compile. I could respin with
more ifdefery around the ARCH_PFN_OFFSET if that sounds reasonable.

Thanks,
Laura

--
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
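As an illustration of what "more ifdefery" could look like, the sketch below guards the use of ARCH_PFN_OFFSET with a fallback definition, so that code referencing the name still compiles when no architecture header has provided it. This is only one possible shape and is not taken from any posted version of the patch; the helper name mem_map_correction is hypothetical.

```c
#include <assert.h>

/*
 * Fallback guard: if no header has defined ARCH_PFN_OFFSET, treat the
 * first pfn as 0 so that generic code can still reference the name.
 */
#ifndef ARCH_PFN_OFFSET
#define ARCH_PFN_OFFSET (0UL)
#endif

/* Hypothetical helper: number of entries mem_map is backed up by. */
static unsigned long mem_map_correction(unsigned long node_start_pfn)
{
	return node_start_pfn - ARCH_PFN_OFFSET;
}
```

With the fallback in effect the correction is simply node_start_pfn, which matches the original 2006 form of the code quoted above (mem_map -= pgdat->node_start_pfn).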
* Re: [PATCHv2] mm: Don't offset memmap for flatmem 2015-02-24 19:54 ` Laura Abbott @ 2015-02-27 15:24 ` Vlastimil Babka -1 siblings, 0 replies; 33+ messages in thread From: Vlastimil Babka @ 2015-02-27 15:24 UTC (permalink / raw) To: Laura Abbott, Mel Gorman Cc: Andrew Morton, Srinivas Kandagatla, linux-arm-kernel, Russell King - ARM Linux, ssantosh, Kevin Hilman, Arnd Bergman, Stephen Boyd, linux-mm, Kumar Gala On 02/24/2015 08:54 PM, Laura Abbott wrote: > Reviving this thread because I don't think it ever got resolved. > > On 2/3/2015 6:25 PM, Laura Abbott wrote: >> On 1/29/2015 5:13 AM, Vlastimil Babka wrote: >>>> >>>> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me? I'm just >>>> back today after been offline a week so didn't review the patch but IIRC, >>>> ARCH_PFN_OFFSET deals with the case where physical memory does not start >>>> at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0. >>>> I don't recall it being related to the alignment of node 0 so if there >>>> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET >>>> related then I'm surprised. >>> >>> You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit >>> 467bc461d2 which was a bugfix to your commit c713216dee, which did >>> introduce the mem_map correction code, and after which the code looked like: >>> >>> mem_map = NODE_DATA(0)->node_mem_map; >>> #ifdef CONFIG_ARCH_POPULATES_NODE_MAP >>> if (page_to_pfn(mem_map) != pgdat->node_start_pfn) >>> mem_map -= pgdat->node_start_pfn; >>> #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ >>> >>> >>> It's from 2006 so I can't expect you remember the details, but I had some >>> trouble finding out what this does. I assume it makes sure that mem_map points >>> to struct page corresponding to pfn 0, because that's what translations using >>> mem_map expect. >>> But pgdat->node_mem_map points to struct page corresponding to >>> pgdat->node_start_pfn, which might not be 0. 
So it subtracts node_start_pfn >>> to fix that. This is OK, as the node_mem_map is allocated (in this very >>> function) with padding so that it covers a MAX_ORDER_NR_PAGES aligned area >>> where node_mem_map may point to the middle of it. >>> >>> Commit 467bc461d2 fixed this in case the first pfn is not 0, but ARCH_PFN_OFFSET. >>> So mem_map points to struct page corresponding to pfn=ARCH_PFN_OFFSET, which >>> is OK. But I still have few doubts: >>> >>> 1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" sort of silently >>> assumes that mem_map is allocated at the beginning of the node, i.e. at >>> pgdat->node_start_pfn. And the only reason for this if-condition to be true, >>> is that we haven't corrected the page_to_pfn translation, which uses mem_map. >>> Is this assumption always OK to do? Shouldn't the if-condition be instead about >>> pgdat->node_start_pfn not being aligned? >>> >>> 2) The #ifdef guard is about CONFIG_ARCH_POPULATES_NODE_MAP, which is nowadays called > CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be #ifdef FLATMEM instead? >>> After all, we are correcting value of mem_map based on page_to_pfn code >>> variant used on FLATMEM. arm doesn't define >>> CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction. >>> >> >> Just doing #ifdef FLATMEM doesn't work because ARCH_PFN_OFFSET doesn't >> seem to be picked up properly for NOMMU arches properly. Probably just >> missing a header somewhere. >> >>> 3) The node_mem_map allocation code aligns the allocation to MAX_ORDER_NR_PAGES, >>> so the offset between the start of the allocated map and where node_mem_map >>> points to will be up to MAX_ORDER_NR_PAGES. >>> However, here we subtract (in current kernel) (pgdat->node_start_pfn - ARCH_PFN_OFFSET). >>> That looks like another silent assumption, that pgdat->node_start_pfn is always >>> between ARCH_PFN_OFFSET and ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. 
If it were >>> larger, the mem_map correction would subtract too much and end up below what >>> was allocated for node_mem_map, no? The bug report behind this patch said that >>> first 2MB of memory was reserved using "no-map flag using DT". Unless this somehow >>> translates to ARCH_PFN_OFFSET at build time, we would underflow mem_map, right? >>> Maybe I'm just overly paranoid here and of course ARCH_PFN_OFFSET is determined >>> properly on arm... >>> >>> If anyone can confirm my doubts or point me to what I'm missing, thanks. >> >> ARCH_PFN_OFFSET should always be the lowest PFN in the system, otherwise >> I think plenty of other things are broken given how many architectures >> make this assumption. That said, I don't think subtracting ARCH_PFN_OFFSET >> makes it obvious why the adjustment is being made. >> >> Thanks, >> Laura >> > > I was incorrect before: it isn't just NOMMU but architectures that don't use > asm-generic/memory_model.h which failed to compile. I could respin with Hm I see, some architectures use own variant of page_to_pfn, that's why it's being used in the if () check. > more ifdefery around the ARCH_PFN_OFFSET if that sounds reasonable. So I think your v2 might be correct already. Unless there's an architecture that defines CONFIG_FLATMEM and not CONFIG_HAVE_MEMBLOCK_NODE_MAP and places memmap somewhere else than pgdat->node_start_pfn, which would trigger the check for a wrong reason after the patch. Looks like arm is an arch that doesn't define CONFIG_HAVE_MEMBLOCK_NODE_MAP, yet it defines ARCH_PFN_OFFSET. With your patch it would correct memmap by the calculated offset, not the ARCH_PFN_OFFSET constant. Are these two the same then? Should there be something like a VM_BUG_ON that ARCH_PFN_OFFSET (if it exists) is indeed equal to the calculated offset? Or maybe a more general VM_BUG_ON checking that after any correction we make, the (page_to_pfn(mem_map) == pgdat->node_start_pfn) condition holds? 
> Thanks,
> Laura
>
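The question above, whether the calculated offset and the ARCH_PFN_OFFSET-based correction agree, can be tried out in a small user-space model. What follows is only a sketch under stated assumptions, not kernel code and not the patch under review: every name in it (struct node_model, init_node(), translation_ok(), within_allocation(), and so on) is hypothetical, the constants are illustrative (4 KiB pages, MAX_ORDER_NR_PAGES of 1024, RAM starting at 0x80000000), and "calculated offset" is taken to mean the distance from node_start_pfn down to the previous MAX_ORDER_NR_PAGES boundary, which is one reading of the allocation padding described in the quoted text. Positions in the memmap are modelled as plain indices into the allocation so that a value below the allocation can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; none of them are taken from the thread's patch. */
#define MAX_ORDER_NR_PAGES 1024UL    /* assumes 4 KiB pages, MAX_ORDER = 11 */
#define ARCH_PFN_OFFSET    0x80000UL /* assumes RAM starts at 0x80000000 */

/*
 * Hypothetical model of node 0. Index 0 is the first entry of the
 * MAX_ORDER_NR_PAGES-aligned allocation; a negative index means
 * "below what was allocated".
 */
struct node_model {
    unsigned long node_start_pfn; /* first pfn of the node */
    long node_mem_map;            /* index of the node's first struct page */
    long mem_map;                 /* index the translation is based on */
};

/* FLATMEM-style translation: pfn = (page - mem_map) + ARCH_PFN_OFFSET. */
static unsigned long model_page_to_pfn(const struct node_model *n, long page)
{
    return (unsigned long)(page - n->mem_map) + ARCH_PFN_OFFSET;
}

/* Distance from node_start_pfn down to the previous aligned boundary. */
static unsigned long calculated_offset(unsigned long node_start_pfn)
{
    return node_start_pfn & (MAX_ORDER_NR_PAGES - 1);
}

/* Correction based on the ARCH_PFN_OFFSET constant. */
static unsigned long constant_offset(unsigned long node_start_pfn)
{
    assert(node_start_pfn >= ARCH_PFN_OFFSET);
    return node_start_pfn - ARCH_PFN_OFFSET;
}

/* Build the node and apply a correction of 'offset' entries if needed. */
static struct node_model init_node(unsigned long node_start_pfn,
                                   unsigned long offset)
{
    struct node_model n;

    n.node_start_pfn = node_start_pfn;
    n.node_mem_map = (long)calculated_offset(node_start_pfn);
    n.mem_map = n.node_mem_map;
    if (model_page_to_pfn(&n, n.mem_map) != n.node_start_pfn)
        n.mem_map -= (long)offset;
    return n;
}

/* Does the node's first struct page translate to node_start_pfn? */
static bool translation_ok(const struct node_model *n)
{
    return model_page_to_pfn(n, n->node_mem_map) == n->node_start_pfn;
}

/* Did the corrected mem_map stay inside the allocation? */
static bool within_allocation(const struct node_model *n)
{
    return n->mem_map >= 0;
}
```

In this model, a node starting at pfn 0x80200 (the first 2 MiB removed) gets the same offset, 0x200, from both calculated_offset() and constant_offset(), and both checks pass after init_node(). With a hypothetical start of 0x80600, which is more than MAX_ORDER_NR_PAGES above ARCH_PFN_OFFSET, the two offsets differ: the constant-based correction keeps the translation right but leaves mem_map below the allocation, while the calculated one stays inside the allocation but fails translation_ok(). Whether such a layout can occur on a real arm configuration is exactly what the thread leaves open.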
Thread overview: 33+ messages
2015-01-16 11:30 Issue on reserving memory with no-map flag in DT Srinivas Kandagatla
2015-01-17  0:24 ` Laura Abbott
2015-01-17  8:39 ` Srinivas Kandagatla
2015-01-19 15:49 ` Vlastimil Babka
2015-01-19 23:57 ` Laura Abbott
2015-01-20  9:54 ` Vlastimil Babka
2015-01-21  1:37 ` [PATCH] mm: Don't offset memmap for flatmem Laura Abbott
2015-01-21 10:15 ` Vlastimil Babka
2015-01-22  1:01 ` [PATCHv2] " Laura Abbott
2015-01-23  0:20 ` Andrew Morton
2015-01-23  0:33 ` Laura Abbott
2015-01-23  9:05 ` Vlastimil Babka
2015-01-26 15:56 ` Mel Gorman
2015-01-29 13:13 ` Vlastimil Babka
2015-02-04  2:25 ` Laura Abbott
2015-02-24 19:54 ` Laura Abbott
2015-02-27 15:24 ` Vlastimil Babka