* [PATCH] sparsemem/bootmem: catch greater than section size allocations
@ 2012-02-24 19:33 Nishanth Aravamudan
  2012-02-28 13:53 ` Johannes Weiner
  2012-02-28 15:47 ` Mel Gorman
  0 siblings, 2 replies; 11+ messages in thread
From: Nishanth Aravamudan @ 2012-02-24 19:33 UTC (permalink / raw)
To: Andrew Morton
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras, Robert Jennings,
	linuxppc-dev

While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
Overcommit) on powerpc, we tripped the following:

kernel BUG at mm/bootmem.c:483!
cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
    pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
    lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
    sp: c000000000c03bc0
   msr: 8000000000021032
  current = 0xc000000000b0cce0
  paca    = 0xc000000001d80000
    pid   = 0, comm = swapper
kernel BUG at mm/bootmem.c:483!
enter ? for help
[c000000000c03c80] c000000000a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c
[c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
[c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
[c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
[c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c

This is

BUG_ON(limit && goal + size > limit);

and after some debugging, it seems that

goal = 0x7ffff000000
limit = 0x80000000000

and sparse_early_usemaps_alloc_node ->
sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls

return alloc_bootmem_section(usemap_size() * count, section_nr);

This is on a system with 8TB available via the AMS pool, and as a quirk
of AMS in firmware, all of that memory shows up in node 0. So, we end up
with an allocation that will fail the goal/limit constraints. In theory,
we could fall back to alloc_bootmem_node() in
sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
defined, we'll BUG_ON() instead. A simple solution appears to be to
disable the limit check if the size of the allocation in
alloc_bootmem_section exceeds the section size.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Cc: linuxppc-dev@lists.ozlabs.org
---
 include/linux/mmzone.h |    2 ++
 mm/bootmem.c           |    5 ++++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 650ba2f..4176834 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
  *  PA_SECTION_SHIFT		physical address to/from section number
  *  PFN_SECTION_SHIFT		pfn to/from section number
  */
+#define BYTES_PER_SECTION	(1UL << SECTION_SIZE_BITS)
+
 #define SECTIONS_SHIFT		(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
 
 #define PA_SECTION_SHIFT	(SECTION_SIZE_BITS)
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 668e94d..5cbbc76 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size,
 
 	pfn = section_nr_to_pfn(section_nr);
 	goal = pfn << PAGE_SHIFT;
-	limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
+	if (size > BYTES_PER_SECTION)
+		limit = 0;
+	else
+		limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
 	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
 
 	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations 2012-02-24 19:33 [PATCH] sparsemem/bootmem: catch greater than section size allocations Nishanth Aravamudan @ 2012-02-28 13:53 ` Johannes Weiner 2012-02-28 20:11 ` Nishanth Aravamudan 2012-02-28 15:47 ` Mel Gorman 1 sibling, 1 reply; 11+ messages in thread From: Johannes Weiner @ 2012-02-28 13:53 UTC (permalink / raw) To: Nishanth Aravamudan Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras, Andrew Morton, Robert Jennings, linuxppc-dev On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote: > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory > Overcommit) on powerpc, we tripped the following: > > kernel BUG at mm/bootmem.c:483! > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940] > pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c > lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c > sp: c000000000c03bc0 > msr: 8000000000021032 > current = 0xc000000000b0cce0 > paca = 0xc000000001d80000 > pid = 0, comm = swapper > kernel BUG at mm/bootmem.c:483! > enter ? for help > [c000000000c03c80] c000000000a64bcc > .sparse_early_usemaps_alloc_node+0x84/0x29c > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294 > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460 > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c > > This is > > BUG_ON(limit && goal + size > limit); > > and after some debugging, it seems that > > goal = 0x7ffff000000 > limit = 0x80000000000 > > and sparse_early_usemaps_alloc_node -> > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls > > return alloc_bootmem_section(usemap_size() * count, section_nr); > > This is on a system with 8TB available via the AMS pool, and as a quirk > of AMS in firmware, all of that memory shows up in node 0. 
> So, we end up
> with an allocation that will fail the goal/limit constraints. In theory,
> we could "fall-back" to alloc_bootmem_node() in
> sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> defined, we'll BUG_ON() instead. A simple solution appears to be to
> disable the limit check if the size of the allocation in
> alloc_bootmem_secition exceeds the section size.

It makes sense to allow the usemaps to spill over to subsequent
sections instead of panicking, so FWIW:

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

That being said, it would be good if check_usemap_section_nr() printed
the cross-dependencies between pgdats and sections when the usemaps of
a node spill over into sections other than the one holding the pgdat.

How about this?

---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: sparsemem/bootmem: catch greater than section size allocations fix

If alloc_bootmem_section() no longer guarantees section-locality, we
need check_usemap_section_nr() to print possible cross-dependencies
between node descriptors and the usemaps allocated through it.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/mm/sparse.c b/mm/sparse.c
index 61d7cde..9e032dc 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -359,6 +359,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map,
 				continue;
 			usemap_map[pnum] = usemap;
 			usemap += size;
+			check_usemap_section_nr(nodeid, usemap_map[pnum]);
 		}
 		return;
 	}
---

Furthermore, I wonder if we can remove the sparse-specific stuff from
bootmem.c as well, as now even more so than before, calculating the
desired area is really none of bootmem's business.

Would something like this be okay?

---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: [patch] mm: remove sparsemem allocation details from the bootmem allocator

alloc_bootmem_section() derives allocation area constraints from the
specified sparsemem section.
This is a bit specific for a generic
memory allocator like bootmem, though, so move it over to sparsemem.

Since __alloc_bootmem_node() already retries failed allocations with
relaxed area constraints, the fallback code in sparsemem.c can be
removed and the code becomes a bit more compact overall.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/bootmem.h |    3 ---
 mm/bootmem.c            |   26 --------------------------
 mm/sparse.c             |   29 +++++++++--------------------
 3 files changed, 9 insertions(+), 49 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index ab344a5..001c248 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -135,9 +135,6 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 extern int reserve_bootmem_generic(unsigned long addr, unsigned long size,
 				   int flags);
 
-extern void *alloc_bootmem_section(unsigned long size,
-				   unsigned long section_nr);
-
 #ifdef CONFIG_HAVE_ARCH_ALLOC_REMAP
 extern void *alloc_remap(int nid, unsigned long size);
 #else
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 7bc0557..d34026c 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -756,32 +756,6 @@ void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
 
 }
 
-#ifdef CONFIG_SPARSEMEM
-/**
- * alloc_bootmem_section - allocate boot memory from a specific section
- * @size: size of the request in bytes
- * @section_nr: sparse map section to allocate from
- *
- * Return NULL on failure.
- */
-void * __init alloc_bootmem_section(unsigned long size,
-				    unsigned long section_nr)
-{
-	bootmem_data_t *bdata;
-	unsigned long pfn, goal, limit;
-
-	pfn = section_nr_to_pfn(section_nr);
-	goal = pfn << PAGE_SHIFT;
-	if (size > BYTES_PER_SECTION)
-		limit = 0;
-	else
-		limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
-	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
-
-	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
-}
-#endif
-
 void * __init __alloc_bootmem_node_nopanic(pg_data_t *pgdat, unsigned long size,
 				   unsigned long align, unsigned long goal)
 {
diff --git a/mm/sparse.c b/mm/sparse.c
index 9e032dc..ac0d5a3 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -273,10 +273,10 @@ static unsigned long *__kmalloc_section_usemap(void)
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static unsigned long * __init
 sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
-					 unsigned long count)
+					 unsigned long size)
 {
-	unsigned long section_nr;
-
+	pg_data_t *host_pgdat;
+	unsigned long goal;
 	/*
 	 * A page may contain usemaps for other sections preventing the
 	 * page being freed and making a section unremovable while
@@ -287,8 +287,9 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
 	 * from the same section as the pgdat where possible to avoid
 	 * this problem.
 	 */
-	section_nr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT);
-	return alloc_bootmem_section(usemap_size() * count, section_nr);
+	goal = __pa(pgdat) & PAGE_SECTION_MASK;
+	host_pgdat = NODE_DATA(early_pfn_to_nid(goal));
+	return __alloc_bootmem_node(host_pgdat, size, SMP_CACHE_BYTES, goal);
 }
 
 static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
@@ -332,9 +333,9 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
 #else
 static unsigned long * __init
 sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
-					 unsigned long count)
+					 unsigned long size)
 {
-	return NULL;
+	return alloc_bootmem_node(pgdat, size);
 }
 
 static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
@@ -352,19 +353,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map,
 	int size = usemap_size();
 
 	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
-								 usemap_count);
-	if (usemap) {
-		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-			if (!present_section_nr(pnum))
-				continue;
-			usemap_map[pnum] = usemap;
-			usemap += size;
-			check_usemap_section_nr(nodeid, usemap_map[pnum]);
-		}
-		return;
-	}
-
-	usemap = alloc_bootmem_node(NODE_DATA(nodeid), size * usemap_count);
+								 size * usemap_count);
 	if (usemap) {
 		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 			if (!present_section_nr(pnum))
-- 
1.7.7.6
* Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations 2012-02-28 13:53 ` Johannes Weiner @ 2012-02-28 20:11 ` Nishanth Aravamudan 2012-02-29 9:17 ` Johannes Weiner 0 siblings, 1 reply; 11+ messages in thread From: Nishanth Aravamudan @ 2012-02-28 20:11 UTC (permalink / raw) To: Johannes Weiner Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras, Nishanth Aravamudan, Andrew Morton, Robert Jennings, linuxppc-dev On 28.02.2012 [14:53:26 +0100], Johannes Weiner wrote: > On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote: > > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory > > Overcommit) on powerpc, we tripped the following: > > > > kernel BUG at mm/bootmem.c:483! > > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940] > > pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c > > lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c > > sp: c000000000c03bc0 > > msr: 8000000000021032 > > current = 0xc000000000b0cce0 > > paca = 0xc000000001d80000 > > pid = 0, comm = swapper > > kernel BUG at mm/bootmem.c:483! > > enter ? for help > > [c000000000c03c80] c000000000a64bcc > > .sparse_early_usemaps_alloc_node+0x84/0x29c > > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c > > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294 > > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460 > > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c > > > > This is > > > > BUG_ON(limit && goal + size > limit); > > > > and after some debugging, it seems that > > > > goal = 0x7ffff000000 > > limit = 0x80000000000 > > > > and sparse_early_usemaps_alloc_node -> > > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls > > > > return alloc_bootmem_section(usemap_size() * count, section_nr); > > > > This is on a system with 8TB available via the AMS pool, and as a quirk > > of AMS in firmware, all of that memory shows up in node 0. 
So, we end up > > with an allocation that will fail the goal/limit constraints. In theory, > > we could "fall-back" to alloc_bootmem_node() in > > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE > > defined, we'll BUG_ON() instead. A simple solution appears to be to > > disable the limit check if the size of the allocation in > > alloc_bootmem_secition exceeds the section size. > > It makes sense to allow the usemaps to spill over to subsequent > sections instead of panicking, so FWIW: > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> > > That being said, it would be good if check_usemap_section_nr() printed > the cross-dependencies between pgdats and sections when the usemaps of > a node spilled over to other sections than the ones holding the pgdat. > > How about this? > > --- > From: Johannes Weiner <hannes@cmpxchg.org> > Subject: sparsemem/bootmem: catch greater than section size allocations fix > > If alloc_bootmem_section() no longer guarantees section-locality, we > need check_usemap_section_nr() to print possible cross-dependencies > between node descriptors and the usemaps allocated through it. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> > --- > > diff --git a/mm/sparse.c b/mm/sparse.c > index 61d7cde..9e032dc 100644 > --- a/mm/sparse.c > +++ b/mm/sparse.c > @@ -359,6 +359,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map, > continue; > usemap_map[pnum] = usemap; > usemap += size; > + check_usemap_section_nr(nodeid, usemap_map[pnum]); > } > return; > } This makes sense to me -- ok if I fold it into the re-worked patch (based upon Mel's comments)? > --- > > Furthermore, I wonder if we can remove the sparse-specific stuff from > bootmem.c as well, as now even more so than before, calculating the > desired area is really none of bootmem's business. > > Would something like this be okay? 
>
> ---
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: [patch] mm: remove sparsemem allocation details from the bootmem allocator
>
> alloc_bootmem_section() derives allocation area constraints from the
> specified sparsemem section. This is a bit specific for a generic
> memory allocator like bootmem, though, so move it over to sparsemem.
>
> Since __alloc_bootmem_node() already retries failed allocations with
> relaxed area constraints, the fallback code in sparsemem.c can be
> removed and the code becomes a bit more compact overall.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

I've not tested it, but the intention seems sensible. I think it should
remain a separate change.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
* Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations 2012-02-28 20:11 ` Nishanth Aravamudan @ 2012-02-29 9:17 ` Johannes Weiner 0 siblings, 0 replies; 11+ messages in thread From: Johannes Weiner @ 2012-02-29 9:17 UTC (permalink / raw) To: Nishanth Aravamudan Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras, Nishanth Aravamudan, Andrew Morton, Robert Jennings, linuxppc-dev On Tue, Feb 28, 2012 at 12:11:51PM -0800, Nishanth Aravamudan wrote: > On 28.02.2012 [14:53:26 +0100], Johannes Weiner wrote: > > On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote: > > > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory > > > Overcommit) on powerpc, we tripped the following: > > > > > > kernel BUG at mm/bootmem.c:483! > > > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940] > > > pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c > > > lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c > > > sp: c000000000c03bc0 > > > msr: 8000000000021032 > > > current = 0xc000000000b0cce0 > > > paca = 0xc000000001d80000 > > > pid = 0, comm = swapper > > > kernel BUG at mm/bootmem.c:483! > > > enter ? 
for help > > > [c000000000c03c80] c000000000a64bcc > > > .sparse_early_usemaps_alloc_node+0x84/0x29c > > > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c > > > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294 > > > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460 > > > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c > > > > > > This is > > > > > > BUG_ON(limit && goal + size > limit); > > > > > > and after some debugging, it seems that > > > > > > goal = 0x7ffff000000 > > > limit = 0x80000000000 > > > > > > and sparse_early_usemaps_alloc_node -> > > > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls > > > > > > return alloc_bootmem_section(usemap_size() * count, section_nr); > > > > > > This is on a system with 8TB available via the AMS pool, and as a quirk > > > of AMS in firmware, all of that memory shows up in node 0. So, we end up > > > with an allocation that will fail the goal/limit constraints. In theory, > > > we could "fall-back" to alloc_bootmem_node() in > > > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE > > > defined, we'll BUG_ON() instead. A simple solution appears to be to > > > disable the limit check if the size of the allocation in > > > alloc_bootmem_secition exceeds the section size. > > > > It makes sense to allow the usemaps to spill over to subsequent > > sections instead of panicking, so FWIW: > > > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> > > > > That being said, it would be good if check_usemap_section_nr() printed > > the cross-dependencies between pgdats and sections when the usemaps of > > a node spilled over to other sections than the ones holding the pgdat. > > > > How about this? 
> >
> > ---
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > Subject: sparsemem/bootmem: catch greater than section size allocations fix
> >
> > If alloc_bootmem_section() no longer guarantees section-locality, we
> > need check_usemap_section_nr() to print possible cross-dependencies
> > between node descriptors and the usemaps allocated through it.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 61d7cde..9e032dc 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -359,6 +359,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map,
> >  				continue;
> >  			usemap_map[pnum] = usemap;
> >  			usemap += size;
> > +			check_usemap_section_nr(nodeid, usemap_map[pnum]);
> >  		}
> >  		return;
> >  	}
>
> This makes sense to me -- ok if I fold it into the re-worked patch
> (based upon Mel's comments)?

Sure thing!

> > Furthermore, I wonder if we can remove the sparse-specific stuff from
> > bootmem.c as well, as now even more so than before, calculating the
> > desired area is really none of bootmem's business.
> >
> > Would something like this be okay?
> >
> > ---
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > Subject: [patch] mm: remove sparsemem allocation details from the bootmem allocator
> >
> > alloc_bootmem_section() derives allocation area constraints from the
> > specified sparsemem section. This is a bit specific for a generic
> > memory allocator like bootmem, though, so move it over to sparsemem.
> >
> > Since __alloc_bootmem_node() already retries failed allocations with
> > relaxed area constraints, the fallback code in sparsemem.c can be
> > removed and the code becomes a bit more compact overall.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> I've not tested it, but the intention seems sensible. I think it should
> remain a separate change.

Yes, I agree. I'll resend it in a bit as a stand-alone patch.
* Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations 2012-02-24 19:33 [PATCH] sparsemem/bootmem: catch greater than section size allocations Nishanth Aravamudan 2012-02-28 13:53 ` Johannes Weiner @ 2012-02-28 15:47 ` Mel Gorman 2012-02-29 18:12 ` [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section Nishanth Aravamudan 1 sibling, 1 reply; 11+ messages in thread From: Mel Gorman @ 2012-02-28 15:47 UTC (permalink / raw) To: Nishanth Aravamudan Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras, Andrew Morton, Robert Jennings, linuxppc-dev On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote: > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory > Overcommit) on powerpc, we tripped the following: > > kernel BUG at mm/bootmem.c:483! > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940] > pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c > lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c > sp: c000000000c03bc0 > msr: 8000000000021032 > current = 0xc000000000b0cce0 > paca = 0xc000000001d80000 > pid = 0, comm = swapper > kernel BUG at mm/bootmem.c:483! > enter ? 
for help > [c000000000c03c80] c000000000a64bcc > .sparse_early_usemaps_alloc_node+0x84/0x29c > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294 > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460 > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c > > This is > > BUG_ON(limit && goal + size > limit); > > and after some debugging, it seems that > > goal = 0x7ffff000000 > limit = 0x80000000000 > > and sparse_early_usemaps_alloc_node -> > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls > > return alloc_bootmem_section(usemap_size() * count, section_nr); > > This is on a system with 8TB available via the AMS pool, and as a quirk > of AMS in firmware, all of that memory shows up in node 0. So, we end up > with an allocation that will fail the goal/limit constraints. In theory, > we could "fall-back" to alloc_bootmem_node() in > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE > defined, we'll BUG_ON() instead. A simple solution appears to be to > disable the limit check if the size of the allocation in > alloc_bootmem_secition exceeds the section size. 
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> Cc: Dave Hansen <haveblue@us.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
> Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
> Cc: linux-mm@kvack.org
> Cc: linuxppc-dev@lists.ozlabs.org
> ---
>  include/linux/mmzone.h |    2 ++
>  mm/bootmem.c           |    5 ++++-
>  2 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 650ba2f..4176834 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
>   *  PA_SECTION_SHIFT		physical address to/from section number
>   *  PFN_SECTION_SHIFT		pfn to/from section number
>   */
> +#define BYTES_PER_SECTION	(1UL << SECTION_SIZE_BITS)
> +
>  #define SECTIONS_SHIFT		(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
>  
>  #define PA_SECTION_SHIFT	(SECTION_SIZE_BITS)
> diff --git a/mm/bootmem.c b/mm/bootmem.c
> index 668e94d..5cbbc76 100644
> --- a/mm/bootmem.c
> +++ b/mm/bootmem.c
> @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size,
>  
>  	pfn = section_nr_to_pfn(section_nr);
>  	goal = pfn << PAGE_SHIFT;
> -	limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> +	if (size > BYTES_PER_SECTION)
> +		limit = 0;
> +	else
> +		limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;

As it's ok to spill the allocation over to an adjacent section, why not
just make limit==0 unconditionally? That would avoid defining
BYTES_PER_SECTION.

-- 
Mel Gorman
SUSE Labs
* [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section 2012-02-28 15:47 ` Mel Gorman @ 2012-02-29 18:12 ` Nishanth Aravamudan 2012-02-29 18:45 ` Johannes Weiner ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Nishanth Aravamudan @ 2012-02-29 18:12 UTC (permalink / raw) To: Mel Gorman Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras, Johannes Weiner, Andrew Morton, Robert Jennings, linuxppc-dev On 28.02.2012 [15:47:32 +0000], Mel Gorman wrote: > On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote: > > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory > > Overcommit) on powerpc, we tripped the following: > > > > kernel BUG at mm/bootmem.c:483! > > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940] > > pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c > > lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c > > sp: c000000000c03bc0 > > msr: 8000000000021032 > > current = 0xc000000000b0cce0 > > paca = 0xc000000001d80000 > > pid = 0, comm = swapper > > kernel BUG at mm/bootmem.c:483! > > enter ? for help > > [c000000000c03c80] c000000000a64bcc > > .sparse_early_usemaps_alloc_node+0x84/0x29c > > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c > > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294 > > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460 > > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c > > > > This is > > > > BUG_ON(limit && goal + size > limit); > > > > and after some debugging, it seems that > > > > goal = 0x7ffff000000 > > limit = 0x80000000000 > > > > and sparse_early_usemaps_alloc_node -> > > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls > > > > return alloc_bootmem_section(usemap_size() * count, section_nr); > > > > This is on a system with 8TB available via the AMS pool, and as a quirk > > of AMS in firmware, all of that memory shows up in node 0. 
So, we end up > > with an allocation that will fail the goal/limit constraints. In theory, > > we could "fall-back" to alloc_bootmem_node() in > > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE > > defined, we'll BUG_ON() instead. A simple solution appears to be to > > disable the limit check if the size of the allocation in > > alloc_bootmem_secition exceeds the section size. > > > > Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> > > Cc: Dave Hansen <haveblue@us.ibm.com> > > Cc: Anton Blanchard <anton@au1.ibm.com> > > Cc: Paul Mackerras <paulus@samba.org> > > Cc: Ben Herrenschmidt <benh@kernel.crashing.org> > > Cc: Robert Jennings <rcj@linux.vnet.ibm.com> > > Cc: linux-mm@kvack.org > > Cc: linuxppc-dev@lists.ozlabs.org > > --- > > include/linux/mmzone.h | 2 ++ > > mm/bootmem.c | 5 ++++- > > 2 files changed, 6 insertions(+), 1 deletions(-) > > > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > index 650ba2f..4176834 100644 > > --- a/include/linux/mmzone.h > > +++ b/include/linux/mmzone.h > > @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn) > > * PA_SECTION_SHIFT physical address to/from section number > > * PFN_SECTION_SHIFT pfn to/from section number > > */ > > +#define BYTES_PER_SECTION (1UL << SECTION_SIZE_BITS) > > + > > #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) > > > > #define PA_SECTION_SHIFT (SECTION_SIZE_BITS) > > diff --git a/mm/bootmem.c b/mm/bootmem.c > > index 668e94d..5cbbc76 100644 > > --- a/mm/bootmem.c > > +++ b/mm/bootmem.c > > @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size, > > > > pfn = section_nr_to_pfn(section_nr); > > goal = pfn << PAGE_SHIFT; > > - limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT; > > + if (size > BYTES_PER_SECTION) > > + limit = 0; > > + else > > + limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT; > > As it's ok to spill the allocation over to an adjacent section, why not > just 
make limit==0 unconditionally. That would avoid defining > BYTES_PER_SECTION. Something like this? Andrew, presuming Mel & Johannes give their, ack this should presumably supersede the patch you pulled into -mm. Thanks, Nish ------- While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory Overcommit) on powerpc, we tripped the following: kernel BUG at mm/bootmem.c:483! cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940] pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c sp: c000000000c03bc0 msr: 8000000000021032 current = 0xc000000000b0cce0 paca = 0xc000000001d80000 pid = 0, comm = swapper kernel BUG at mm/bootmem.c:483! enter ? for help [c000000000c03c80] c000000000a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294 [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460 [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c This is BUG_ON(limit && goal + size > limit); and after some debugging, it seems that goal = 0x7ffff000000 limit = 0x80000000000 and sparse_early_usemaps_alloc_node -> sparse_early_usemaps_alloc_pgdat_section calls return alloc_bootmem_section(usemap_size() * count, section_nr); This is on a system with 8TB available via the AMS pool, and as a quirk of AMS in firmware, all of that memory shows up in node 0. So, we end up with an allocation that will fail the goal/limit constraints. In theory, we could "fall-back" to alloc_bootmem_node() in sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE defined, we'll BUG_ON() instead. A simple solution appears to be to unconditionally remove the limit condition in alloc_bootmem_section, meaning allocations are allowed to cross section boundaries (necessary for systems of this size). 
Johannes Weiner pointed out that if alloc_bootmem_section() no longer
guarantees section-locality, we need check_usemap_section_nr() to print
possible cross-dependencies between node descriptors and the usemaps
allocated through it. That makes the two loops in
sparse_early_usemaps_alloc_node() identical, so re-factor the code a
bit.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

---
v2: Unconditionally set limit to 0. Fold in Johannes' changes to
sparse_early_usemaps_alloc_node.

diff --git a/mm/bootmem.c b/mm/bootmem.c
index 668e94d..9c9ae09 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -770,7 +770,7 @@ void * __init alloc_bootmem_section(unsigned long size,
 
 	pfn = section_nr_to_pfn(section_nr);
 	goal = pfn << PAGE_SHIFT;
-	limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
+	limit = 0;
 	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
 
 	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
diff --git a/mm/sparse.c b/mm/sparse.c
index 61d7cde..a8bc7d3 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -353,29 +353,21 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map,
 
 	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
 								 usemap_count);
-	if (usemap) {
-		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-			if (!present_section_nr(pnum))
-				continue;
-			usemap_map[pnum] = usemap;
-			usemap += size;
+	if (!usemap) {
+		usemap = alloc_bootmem_node(NODE_DATA(nodeid), size * usemap_count);
+		if (!usemap) {
+			printk(KERN_WARNING "%s: allocation failed\n", __func__);
+			return;
 		}
-		return;
 	}
 
-	usemap = alloc_bootmem_node(NODE_DATA(nodeid), size * usemap_count);
-	if (usemap) {
-		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-			if (!present_section_nr(pnum))
-				continue;
-			usemap_map[pnum] = usemap;
-			usemap += size;
-			check_usemap_section_nr(nodeid, usemap_map[pnum]);
-		}
-		return;
+	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+		if (!present_section_nr(pnum))
+			continue;
+		usemap_map[pnum] = usemap;
+		usemap += size;
+		check_usemap_section_nr(nodeid, usemap_map[pnum]);
 	}
-
-	printk(KERN_WARNING "%s: allocation failed\n", __func__);
 }
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP

--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
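The refactored sparse_early_usemaps_alloc_node() reduces to three steps: try the section-local allocation, fall back to a node-wide one, and then run a single loop that hands one usemap-sized slice to each present section and calls check_usemap_section_nr() on it. The userspace sketch below mirrors only that control flow. `alloc_section_local()`, `alloc_anywhere()`, `check_section()`, the `present[]` table and `NR_SECTIONS` are hypothetical stand-ins (malloc() replaces the bootmem allocators), the slices are tracked through a `char *` so that `usemap += size` advances by `size` bytes in standard C, and unlike the kernel function the sketch returns the base pointer so the caller can free it.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_SECTIONS 8 /* hypothetical, kept tiny for illustration */

/* Hypothetical stand-in for present_section_nr(): 1 = section present. */
static const int present[NR_SECTIONS] = { 1, 1, 0, 1, 0, 1, 1, 1 };

static int section_local_fails; /* set to force the first allocation to fail */
static int checks;              /* number of calls to the check stand-in */

/* Stand-in for sparse_early_usemaps_alloc_pgdat_section(). */
static char *alloc_section_local(size_t bytes)
{
	return section_local_fails ? NULL : malloc(bytes);
}

/* Stand-in for alloc_bootmem_node(). */
static char *alloc_anywhere(size_t bytes)
{
	return malloc(bytes);
}

/* Stand-in for check_usemap_section_nr(); here it only counts calls. */
static void check_section(const char *usemap)
{
	(void)usemap;
	checks++;
}

/*
 * Mirrors the v2 control flow. Assumes pnum_end <= NR_SECTIONS.
 * Returns the base of the allocation, or NULL if both attempts failed.
 */
static char *usemaps_alloc_node(char **usemap_map, int pnum_begin,
				int pnum_end, size_t usemap_count, size_t size)
{
	char *base = alloc_section_local(size * usemap_count);
	char *usemap;
	int pnum;

	if (!base) {
		base = alloc_anywhere(size * usemap_count);
		if (!base) {
			fprintf(stderr, "%s: allocation failed\n", __func__);
			return NULL;
		}
	}

	/* One loop for both paths: one size-byte slice per present section. */
	usemap = base;
	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
		if (!present[pnum])
			continue;
		usemap_map[pnum] = usemap;
		usemap += size;
		check_section(usemap_map[pnum]);
	}
	return base;
}
```

Folding the two loops together is what makes the check run on the section-local path as well, which is the point Johannes raised once alloc_bootmem_section() may spill into neighbouring sections.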
* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
  2012-02-29 18:12 ` [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section Nishanth Aravamudan
@ 2012-02-29 18:45 ` Johannes Weiner
  2012-02-29 23:28 ` Andrew Morton
  2012-03-01 11:42 ` Mel Gorman
  2 siblings, 0 replies; 11+ messages in thread
From: Johannes Weiner @ 2012-02-29 18:45 UTC
To: Nishanth Aravamudan
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras, Mel Gorman, Andrew Morton, Robert Jennings, linuxppc-dev

On Wed, Feb 29, 2012 at 10:12:33AM -0800, Nishanth Aravamudan wrote:
> On 28.02.2012 [15:47:32 +0000], Mel Gorman wrote:
> > On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote:
> > > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > > Overcommit) on powerpc, we tripped the following:
> > >
> > > kernel BUG at mm/bootmem.c:483!
> > > cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
> > > pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
> > > lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> > > sp: c000000000c03bc0
> > > msr: 8000000000021032
> > > current = 0xc000000000b0cce0
> > > paca = 0xc000000001d80000
> > > pid = 0, comm = swapper
> > > kernel BUG at mm/bootmem.c:483!
> > > enter ? for help
> > > [c000000000c03c80] c000000000a64bcc
> > > .sparse_early_usemaps_alloc_node+0x84/0x29c
> > > [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> > > [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> > > [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> > > [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
> > >
> > > This is
> > >
> > > BUG_ON(limit && goal + size > limit);
> > >
> > > and after some debugging, it seems that
> > >
> > > goal = 0x7ffff000000
> > > limit = 0x80000000000
> > >
> > > and sparse_early_usemaps_alloc_node ->
> > > sparse_early_usemaps_alloc_pgdat_section -> alloc_bootmem_section calls
> > >
> > > return alloc_bootmem_section(usemap_size() * count, section_nr);
> > >
> > > This is on a system with 8TB available via the AMS pool, and as a quirk
> > > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > > with an allocation that will fail the goal/limit constraints. In theory,
> > > we could "fall-back" to alloc_bootmem_node() in
> > > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > > disable the limit check if the size of the allocation in
> > > alloc_bootmem_section exceeds the section size.
> > >
> > > Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > > Cc: Dave Hansen <haveblue@us.ibm.com>
> > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > Cc: Paul Mackerras <paulus@samba.org>
> > > Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
> > > Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > ---
> > >  include/linux/mmzone.h | 2 ++
> > >  mm/bootmem.c | 5 ++++-
> > >  2 files changed, 6 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > > index 650ba2f..4176834 100644
> > > --- a/include/linux/mmzone.h
> > > +++ b/include/linux/mmzone.h
> > > @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
> > >   * PA_SECTION_SHIFT physical address to/from section number
> > >   * PFN_SECTION_SHIFT pfn to/from section number
> > >   */
> > > +#define BYTES_PER_SECTION (1UL << SECTION_SIZE_BITS)
> > > +
> > >  #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
> > >
> > >  #define PA_SECTION_SHIFT (SECTION_SIZE_BITS)
> > > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > > index 668e94d..5cbbc76 100644
> > > --- a/mm/bootmem.c
> > > +++ b/mm/bootmem.c
> > > @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size,
> > >
> > >  	pfn = section_nr_to_pfn(section_nr);
> > >  	goal = pfn << PAGE_SHIFT;
> > > -	limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> > > +	if (size > BYTES_PER_SECTION)
> > > +		limit = 0;
> > > +	else
> > > +		limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
> >
> > As it's ok to spill the allocation over to an adjacent section, why not
> > just make limit==0 unconditionally. That would avoid defining
> > BYTES_PER_SECTION.
>
> Something like this?
>
> Andrew, presuming Mel & Johannes give their ack, this should presumably
> supersede the patch you pulled into -mm.
>
> Thanks,
> Nish
>
> -------
>
> While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> Overcommit) on powerpc, we tripped the following:
>
> kernel BUG at mm/bootmem.c:483!
> cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
> pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
> lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
> sp: c000000000c03bc0
> msr: 8000000000021032
> current = 0xc000000000b0cce0
> paca = 0xc000000001d80000
> pid = 0, comm = swapper
> kernel BUG at mm/bootmem.c:483!
> enter ? for help
> [c000000000c03c80] c000000000a64bcc
> .sparse_early_usemaps_alloc_node+0x84/0x29c
> [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
> [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
> [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
> [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
>
> This is
>
> BUG_ON(limit && goal + size > limit);
>
> and after some debugging, it seems that
>
> goal = 0x7ffff000000
> limit = 0x80000000000
>
> and sparse_early_usemaps_alloc_node ->
> sparse_early_usemaps_alloc_pgdat_section calls
>
> return alloc_bootmem_section(usemap_size() * count, section_nr);
>
> This is on a system with 8TB available via the AMS pool, and as a quirk
> of AMS in firmware, all of that memory shows up in node 0. So, we end up
> with an allocation that will fail the goal/limit constraints. In theory,
> we could "fall-back" to alloc_bootmem_node() in
> sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> defined, we'll BUG_ON() instead. A simple solution appears to be to
> unconditionally remove the limit condition in alloc_bootmem_section,
> meaning allocations are allowed to cross section boundaries (necessary
> for systems of this size).
>
> Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> guarantees section-locality, we need check_usemap_section_nr() to print
> possible cross-dependencies between node descriptors and the usemaps
> allocated through it. That makes the two loops in
> sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> bit.
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
  2012-02-29 18:12 ` [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section Nishanth Aravamudan
  2012-02-29 18:45 ` Johannes Weiner
@ 2012-02-29 23:28 ` Andrew Morton
  2012-03-01 0:03 ` Nishanth Aravamudan
  2012-03-01 23:12 ` Nishanth Aravamudan
  2012-03-01 11:42 ` Mel Gorman
  2 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2012-02-29 23:28 UTC
To: Nishanth Aravamudan
Cc: Anton Blanchard, Dave Hansen, stable, linux-mm, Paul Mackerras, Mel Gorman, Johannes Weiner, Robert Jennings, linuxppc-dev

On Wed, 29 Feb 2012 10:12:33 -0800
Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:

> While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> Overcommit) on powerpc, we tripped the following:
>
> kernel BUG at mm/bootmem.c:483!
>
> ...
>
> This is
>
> BUG_ON(limit && goal + size > limit);
>
> and after some debugging, it seems that
>
> goal = 0x7ffff000000
> limit = 0x80000000000
>
> and sparse_early_usemaps_alloc_node ->
> sparse_early_usemaps_alloc_pgdat_section calls
>
> return alloc_bootmem_section(usemap_size() * count, section_nr);
>
> This is on a system with 8TB available via the AMS pool, and as a quirk
> of AMS in firmware, all of that memory shows up in node 0. So, we end up
> with an allocation that will fail the goal/limit constraints. In theory,
> we could "fall-back" to alloc_bootmem_node() in
> sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> defined, we'll BUG_ON() instead. A simple solution appears to be to
> unconditionally remove the limit condition in alloc_bootmem_section,
> meaning allocations are allowed to cross section boundaries (necessary
> for systems of this size).
>
> Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> guarantees section-locality, we need check_usemap_section_nr() to print
> possible cross-dependencies between node descriptors and the usemaps
> allocated through it. That makes the two loops in
> sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> bit.

The patch is a bit scary now, so I think we should merge it into
3.4-rc1 and then backport it into 3.3.1 if nothing blows up.

Do you think it should be backported into 3.3.x? Earlier kernels?

Also, this?

--- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix
+++ a/mm/bootmem.c
@@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi
 				    unsigned long section_nr)
 {
 	bootmem_data_t *bdata;
-	unsigned long pfn, goal, limit;
+	unsigned long pfn, goal;
 
 	pfn = section_nr_to_pfn(section_nr);
 	goal = pfn << PAGE_SHIFT;
-	limit = 0;
 	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
 
-	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
+	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0);
 }
 #endif
 
_
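The v1 patch, the v2 patch and Andrew's follow-up differ only in the limit handed to alloc_bootmem_core(). The sketch below restates the two limit rules side by side. `limit_v1()`, `limit_v2()` and the local `section_nr_to_pfn()` are hypothetical re-implementations for illustration, and the 16 MiB section and 64 KiB page sizes are assumptions (the page size cancels out, since the limit works out to `(section_nr + 1) << SECTION_SIZE_BITS`). v1 still caps any request of at most one section at the next section boundary; v2, and Andrew's version that passes 0 directly, never imposes a limit, so the allocation may spill into adjacent sections as Mel suggested.

```c
#include <stdint.h>

/* Assumptions for illustration only: 16 MiB sections, 64 KiB pages. */
#define SECTION_SIZE_BITS 24
#define PAGE_SHIFT        16
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
#define BYTES_PER_SECTION (UINT64_C(1) << SECTION_SIZE_BITS)

/* Hypothetical local version of the kernel helper of the same name. */
static uint64_t section_nr_to_pfn(uint64_t section_nr)
{
	return section_nr << PFN_SECTION_SHIFT;
}

/* v1: drop the limit only when the request is larger than one section. */
static uint64_t limit_v1(uint64_t size, uint64_t section_nr)
{
	if (size > BYTES_PER_SECTION)
		return 0;
	return section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
}

/* v2 and the follow-up fix: never limit; 0 disables the BUG_ON check. */
static uint64_t limit_v2(uint64_t size, uint64_t section_nr)
{
	(void)size;
	(void)section_nr;
	return 0;
}
```

With section 524287 (the section that starts at the reported goal), `limit_v1(4096, 524287)` yields 0x80000000000, the same limit seen in the crash, while `limit_v2()` yields 0 for every request.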
* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
  2012-02-29 23:28 ` Andrew Morton
@ 2012-03-01 0:03 ` Nishanth Aravamudan
  2012-03-01 23:12 ` Nishanth Aravamudan
  1 sibling, 0 replies; 11+ messages in thread
From: Nishanth Aravamudan @ 2012-03-01 0:03 UTC
To: Andrew Morton
Cc: Anton Blanchard, Dave Hansen, stable, linux-mm, Paul Mackerras, Mel Gorman, Johannes Weiner, Robert Jennings, linuxppc-dev

On 29.02.2012 [15:28:30 -0800], Andrew Morton wrote:
> On Wed, 29 Feb 2012 10:12:33 -0800
> Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
>
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> >
> > kernel BUG at mm/bootmem.c:483!
> >
> > ...
> >
> > This is
> >
> > BUG_ON(limit && goal + size > limit);
> >
> > and after some debugging, it seems that
> >
> > goal = 0x7ffff000000
> > limit = 0x80000000000
> >
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section calls
> >
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> >
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > unconditionally remove the limit condition in alloc_bootmem_section,
> > meaning allocations are allowed to cross section boundaries (necessary
> > for systems of this size).
> >
> > Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> > guarantees section-locality, we need check_usemap_section_nr() to print
> > possible cross-dependencies between node descriptors and the usemaps
> > allocated through it. That makes the two loops in
> > sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> > bit.
>
> The patch is a bit scary now, so I think we should merge it into
> 3.4-rc1 and then backport it into 3.3.1 if nothing blows up.

I think that's fair.

> Do you think it should be backported into 3.3.x? Earlier kernels?

3.3.x seems reasonable. If I had to guess, I think this could be hit on
any kernels with this functionality -- that is, sparsemem in general?
Not sure how far back it's worth backporting.

> Also, this?

Urgh, yeah, that's way better.

Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>

> --- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix
> +++ a/mm/bootmem.c
> @@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi
>  				    unsigned long section_nr)
>  {
>  	bootmem_data_t *bdata;
> -	unsigned long pfn, goal, limit;
> +	unsigned long pfn, goal;
>
>  	pfn = section_nr_to_pfn(section_nr);
>  	goal = pfn << PAGE_SHIFT;
> -	limit = 0;
>  	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
>
> -	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
> +	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0);
>  }
>  #endif

Thanks for all the feedback!
-Nish

--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
  2012-02-29 23:28 ` Andrew Morton
  2012-03-01 0:03 ` Nishanth Aravamudan
@ 2012-03-01 23:12 ` Nishanth Aravamudan
  1 sibling, 0 replies; 11+ messages in thread
From: Nishanth Aravamudan @ 2012-03-01 23:12 UTC
To: Andrew Morton
Cc: Anton Blanchard, Dave Hansen, stable, linux-mm, Paul Mackerras, Mel Gorman, Johannes Weiner, Robert Jennings, linuxppc-dev

On 29.02.2012 [15:28:30 -0800], Andrew Morton wrote:
> On Wed, 29 Feb 2012 10:12:33 -0800
> Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
>
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> >
> > kernel BUG at mm/bootmem.c:483!
> >
> > ...
> >
> > This is
> >
> > BUG_ON(limit && goal + size > limit);
> >
> > and after some debugging, it seems that
> >
> > goal = 0x7ffff000000
> > limit = 0x80000000000
> >
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section calls
> >
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> >
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > unconditionally remove the limit condition in alloc_bootmem_section,
> > meaning allocations are allowed to cross section boundaries (necessary
> > for systems of this size).
> >
> > Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> > guarantees section-locality, we need check_usemap_section_nr() to print
> > possible cross-dependencies between node descriptors and the usemaps
> > allocated through it. That makes the two loops in
> > sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> > bit.
>
> The patch is a bit scary now, so I think we should merge it into
> 3.4-rc1 and then backport it into 3.3.1 if nothing blows up.
>
> Do you think it should be backported into 3.3.x? Earlier kernels?

Upon review, it would be good if we can get it pushed back to kernels
3.0.x, 3.1.x and 3.2.x.

Thanks,
Nish

--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
* Re: [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
  2012-02-29 18:12 ` [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section Nishanth Aravamudan
  2012-02-29 18:45 ` Johannes Weiner
  2012-02-29 23:28 ` Andrew Morton
@ 2012-03-01 11:42 ` Mel Gorman
  2 siblings, 0 replies; 11+ messages in thread
From: Mel Gorman @ 2012-03-01 11:42 UTC
To: Nishanth Aravamudan
Cc: Anton Blanchard, Dave Hansen, linux-mm, Paul Mackerras, Johannes Weiner, Andrew Morton, Robert Jennings, linuxppc-dev

On Wed, Feb 29, 2012 at 10:12:33AM -0800, Nishanth Aravamudan wrote:
> <SNIP>
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
>

Acked-by: Mel Gorman <mgorman@suse.de>

--
Mel Gorman
SUSE Labs
Thread overview: 11+ messages
2012-02-24 19:33 [PATCH] sparsemem/bootmem: catch greater than section size allocations Nishanth Aravamudan
2012-02-28 13:53 ` Johannes Weiner
2012-02-28 20:11 ` Nishanth Aravamudan
2012-02-29 9:17 ` Johannes Weiner
2012-02-28 15:47 ` Mel Gorman
2012-02-29 18:12 ` [PATCH v2] bootmem/sparsemem: remove limit constraint in alloc_bootmem_section Nishanth Aravamudan
2012-02-29 18:45 ` Johannes Weiner
2012-02-29 23:28 ` Andrew Morton
2012-03-01 0:03 ` Nishanth Aravamudan
2012-03-01 23:12 ` Nishanth Aravamudan
2012-03-01 11:42 ` Mel Gorman