From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751269AbdKTCED (ORCPT ); Sun, 19 Nov 2017 21:04:03 -0500 Received: from mail-pf0-f195.google.com ([209.85.192.195]:35444 "EHLO mail-pf0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751043AbdKTCEA (ORCPT ); Sun, 19 Nov 2017 21:04:00 -0500 X-Google-Smtp-Source: AGs4zMYNqyKgpGzHTk8/1joxaZeNPQ7kWgWWJ9Gx9hiAk290LEkcdFdVH8bh03UHN+z2K+j5dIZliw== Subject: Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang) To: Nicolas Pitre Cc: Tejun Heo , Christoph Lameter , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mikael Starvik , Jesper Nilsson , linux-cris-kernel@axis.com References: <20171118182542.GA23928@roeck-us.net> From: Guenter Roeck Message-ID: Date: Sun, 19 Nov 2017 18:03:57 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/19/2017 12:36 PM, Nicolas Pitre wrote: > On Sat, 18 Nov 2017, Guenter Roeck wrote: > >> Hi, >> >> On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote: >>> On Tue, 3 Oct 2017, Tejun Heo wrote: >>> >>>> On Tue, Oct 03, 2017 at 04:57:44PM -0400, Nicolas Pitre wrote: >>>>> This can be much smaller than a page on very small memory systems. >>>>> Always rounding up the size to a page is wasteful in that case, and >>>>> required alignment is smaller than the memblock default. Let's round >>>>> things up to a page size only when the actual size is >= page size, and >>>>> then it makes sense to page-align for a nicer allocation pattern. >>>> >>>> Isn't that a temporary area which gets freed later during boot? >>> >>> Hmmm... >>> >>> It may get freed through 3 different paths where 2 of them are error >>> paths. What looks like a non-error path is in pcpu_embed_first_chunk() >>> called from setup_per_cpu_areas(). But there are two versions of >>> setup_per_cpu_areas(): one for SMP and one for !SMP. And the !SMP case >>> never calls pcpu_free_alloc_info() currently. >>> >>> I'm not sure i understand that code fully, but maybe the following patch >>> could be a better fit: >>> >>> ----- >8 >>> Subject: [PATCH] percpu: don't forget to free the temporary struct pcpu_alloc_info >>> >>> Unlike the SMP case, the !SMP case does not free the memory for struct >>> pcpu_alloc_info allocated in setup_per_cpu_areas(). And to give it a >>> chance of being reused by the page allocator later, align it to a page >>> boundary just like its size. >>> >>> Signed-off-by: Nicolas Pitre >> >> This patch causes my crisv32 qemu emulation to hang with no console output. >> >>> >>> diff --git a/mm/percpu.c b/mm/percpu.c >>> index 434844415d..caab63375b 100644 >>> --- a/mm/percpu.c >>> +++ b/mm/percpu.c >>> @@ -1416,7 +1416,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups, >>> __alignof__(ai->groups[0].cpu_map[0])); >>> ai_size = base_size + nr_units * sizeof(ai->groups[0].cpu_map[0]); >>> >>> - ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), 0); >>> + ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), PAGE_SIZE); >>> if (!ptr) >>> return NULL; >>> ai = ptr; >>> @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void) >>> >>> if (pcpu_setup_first_chunk(ai, fc) < 0) >>> panic("Failed to initialize percpu areas."); >>> + pcpu_free_alloc_info(ai); >> >> This is the culprit. Everything works fine if I remove this line. > > Without this line, the memory at the ai pointer is leaked. Maybe this is > modifying the memory allocation pattern and that triggers a bug later on > in your case. > > At that point the console driver is not yet initialized and any error > message won't be printed. You should enable the early console mechanism > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what > that might tell you. > The problem is that BUG() on crisv32 does not yield useful output. Anyway, here is the culprit. diff --git a/mm/bootmem.c b/mm/bootmem.c index 6aef64254203..2bcc8901450c 100644 --- a/mm/bootmem.c +++ b/mm/bootmem.c @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start, unsigned long end, return 0; pos = bdata->node_low_pfn; } - BUG(); + WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n", start, end); + return -ENOMEM; } /** diff --git a/mm/percpu.c b/mm/percpu.c index 79e3549cab0f..c75622d844f1 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups, */ void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai) { + printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai)); memblock_free_early(__pa(ai), ai->__ai_size); } results in: pcpu_free_alloc_info(c0534000 (0x40534000)) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa mark_bootmem(): memory range 0x2029a-0x2029b not found and the system keeps booting. If I drop the __pa() from the memblock_free_early() parameter, everything works as expected. Off to the cris maintainers ... I have no idea how to fix the problem. Guenter From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f69.google.com (mail-pg0-f69.google.com [74.125.83.69]) by kanga.kvack.org (Postfix) with ESMTP id BE2BB6B0038 for ; Sun, 19 Nov 2017 21:04:01 -0500 (EST) Received: by mail-pg0-f69.google.com with SMTP id p9so8069085pgc.6 for ; Sun, 19 Nov 2017 18:04:01 -0800 (PST) Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id r129sor2799686pfc.147.2017.11.19.18.04.00 for (Google Transport Security); Sun, 19 Nov 2017 18:04:00 -0800 (PST) Subject: Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang) References: <20171118182542.GA23928@roeck-us.net> From: Guenter Roeck Message-ID: Date: Sun, 19 Nov 2017 18:03:57 -0800 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Nicolas Pitre Cc: Tejun Heo , Christoph Lameter , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mikael Starvik , Jesper Nilsson , linux-cris-kernel@axis.com On 11/19/2017 12:36 PM, Nicolas Pitre wrote: > On Sat, 18 Nov 2017, Guenter Roeck wrote: > >> Hi, >> >> On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote: >>> On Tue, 3 Oct 2017, Tejun Heo wrote: >>> >>>> On Tue, Oct 03, 2017 at 04:57:44PM -0400, Nicolas Pitre wrote: >>>>> This can be much smaller than a page on very small memory systems. >>>>> Always rounding up the size to a page is wasteful in that case, and >>>>> required alignment is smaller than the memblock default. Let's round >>>>> things up to a page size only when the actual size is >= page size, and >>>>> then it makes sense to page-align for a nicer allocation pattern. >>>> >>>> Isn't that a temporary area which gets freed later during boot? >>> >>> Hmmm... >>> >>> It may get freed through 3 different paths where 2 of them are error >>> paths. What looks like a non-error path is in pcpu_embed_first_chunk() >>> called from setup_per_cpu_areas(). But there are two versions of >>> setup_per_cpu_areas(): one for SMP and one for !SMP. And the !SMP case >>> never calls pcpu_free_alloc_info() currently. >>> >>> I'm not sure i understand that code fully, but maybe the following patch >>> could be a better fit: >>> >>> ----- >8 >>> Subject: [PATCH] percpu: don't forget to free the temporary struct pcpu_alloc_info >>> >>> Unlike the SMP case, the !SMP case does not free the memory for struct >>> pcpu_alloc_info allocated in setup_per_cpu_areas(). And to give it a >>> chance of being reused by the page allocator later, align it to a page >>> boundary just like its size. >>> >>> Signed-off-by: Nicolas Pitre >> >> This patch causes my crisv32 qemu emulation to hang with no console output. >> >>> >>> diff --git a/mm/percpu.c b/mm/percpu.c >>> index 434844415d..caab63375b 100644 >>> --- a/mm/percpu.c >>> +++ b/mm/percpu.c >>> @@ -1416,7 +1416,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups, >>> __alignof__(ai->groups[0].cpu_map[0])); >>> ai_size = base_size + nr_units * sizeof(ai->groups[0].cpu_map[0]); >>> >>> - ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), 0); >>> + ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), PAGE_SIZE); >>> if (!ptr) >>> return NULL; >>> ai = ptr; >>> @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void) >>> >>> if (pcpu_setup_first_chunk(ai, fc) < 0) >>> panic("Failed to initialize percpu areas."); >>> + pcpu_free_alloc_info(ai); >> >> This is the culprit. Everything works fine if I remove this line. > > Without this line, the memory at the ai pointer is leaked. Maybe this is > modifying the memory allocation pattern and that triggers a bug later on > in your case. > > At that point the console driver is not yet initialized and any error > message won't be printed. You should enable the early console mechanism > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what > that might tell you. > The problem is that BUG() on crisv32 does not yield useful output. Anyway, here is the culprit. diff --git a/mm/bootmem.c b/mm/bootmem.c index 6aef64254203..2bcc8901450c 100644 --- a/mm/bootmem.c +++ b/mm/bootmem.c @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start, unsigned long end, return 0; pos = bdata->node_low_pfn; } - BUG(); + WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n", start, end); + return -ENOMEM; } /** diff --git a/mm/percpu.c b/mm/percpu.c index 79e3549cab0f..c75622d844f1 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups, */ void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai) { + printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai)); memblock_free_early(__pa(ai), ai->__ai_size); } results in: pcpu_free_alloc_info(c0534000 (0x40534000)) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa mark_bootmem(): memory range 0x2029a-0x2029b not found and the system keeps booting. If I drop the __pa() from the memblock_free_early() parameter, everything works as expected. Off to the cris maintainers ... I have no idea how to fix the problem. Guenter -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org