From: "Zhouguanghui (OS Kernel)" <zhouguanghui1@huawei.com> To: Mike Rapoport <rppt@kernel.org>, Andrew Morton <akpm@linux-foundation.org> Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>, "xuqiang (M)" <xuqiang36@huawei.com> Subject: Re: [PATCH] memblock: config the number of init memblock regions Date: Thu, 12 May 2022 02:46:25 +0000 [thread overview] Message-ID: <73da782c847b413d9b81b0c2940ab13c@huawei.com> (raw) In-Reply-To: YntRlrwJeP40q6Hg@kernel.org 在 2022/5/11 14:03, Mike Rapoport 写道: > On Tue, May 10, 2022 at 06:55:23PM -0700, Andrew Morton wrote: >> On Wed, 11 May 2022 01:05:30 +0000 Zhou Guanghui <zhouguanghui1@huawei.com> wrote: >> >>> During early boot, the number of memblocks may exceed 128(some memory >>> areas are not reported to the kernel due to test failures. As a result, >>> contiguous memory is divided into multiple parts for reporting). If >>> the size of the init memblock regions is exceeded before the array size >>> can be resized, the excess memory will be lost. > > I'd like to see more details about how firmware creates that sparse memory > map in the changelog. > The scenario is as follows: In a system using HBM, a multi-bit ECC error occurs, and the BIOS saves the corresponding area (for example, 2 MB). When the system restarts next time, these areas are isolated and not reported or reported as EFI_UNUSABLE_MEMORY. Both of them lead to an increase in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a larger number of memblocks. For example, if the EFI_UNUSABLE_MEMORY type is reported: .. 
memory[0x92]	[0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93]	[0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94]	[0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95]	[0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96]	[0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97]	[0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98]	[0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99]	[0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a]	[0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b]	[0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c]	[0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d]	[0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e]	[0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f]	[0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
...

>>>
>>> ...
>>>
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -89,6 +89,14 @@ config SPARSEMEM_VMEMMAP
>>>  	  pfn_to_page and page_to_pfn operations.  This is the most
>>>  	  efficient option when sufficient kernel resources are available.
>>>
>>> +config MEMBLOCK_INIT_REGIONS
>>> +	int "Number of init memblock regions"
>>> +	range 128 1024
>>> +	default 128
>>> +	help
>>> +	  The number of init memblock regions which used to track "memory" and
>>> +	  "reserved" memblocks during early boot.
>>> +
>>> config HAVE_MEMBLOCK_PHYS_MAP
>>> 	bool
>>>
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index e4f03a6e8e56..6893d26b750e 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -22,7 +22,7 @@
>>>
>>>  #include "internal.h"
>>>
>>> -#define INIT_MEMBLOCK_REGIONS	128
>>> +#define INIT_MEMBLOCK_REGIONS	CONFIG_MEMBLOCK_INIT_REGIONS
>>
>> Consistent naming would be nice - MEMBLOCK_INIT versus INIT_MEMBLOCK.

I agree.

>>
>> Can we simply increase INIT_MEMBLOCK_REGIONS to 1024 and avoid the
>> config option?  It appears that the overhead from this would be 60kB or
>> so.
>
> 60k is not big, but using 1024 entries array for 2-4 memory banks on
> systems that don't report that fragmented memory map is really a waste.
>
> We can make this per platform opt-in, like INIT_MEMBLOCK_RESERVED_REGIONS ...
>

As I described above, is this a general scenario?

>> Or zero if CONFIG_ARCH_KEEP_MEMBLOCK and CONFIG_MEMORY_HOTPLUG
>> are cooperating.
>
> ... or add code that will discard unused parts of memblock arrays even if
> CONFIG_ARCH_KEEP_MEMBLOCK=y.
>

In memory-sensitive scenarios, should CONFIG_ARCH_KEEP_MEMBLOCK be set to n,
or should the number be made configurable via a config option?

Andrew, Mike, thank you.
From: "Zhouguanghui (OS Kernel)" <zhouguanghui1@huawei.com> To: Mike Rapoport <rppt@kernel.org>, Andrew Morton <akpm@linux-foundation.org> Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>, "xuqiang (M)" <xuqiang36@huawei.com> Subject: Re: [PATCH] memblock: config the number of init memblock regions Date: Thu, 12 May 2022 02:46:25 +0000 [thread overview] Message-ID: <73da782c847b413d9b81b0c2940ab13c@huawei.com> (raw) In-Reply-To: YntRlrwJeP40q6Hg@kernel.org 在 2022/5/11 14:03, Mike Rapoport 写道: > On Tue, May 10, 2022 at 06:55:23PM -0700, Andrew Morton wrote: >> On Wed, 11 May 2022 01:05:30 +0000 Zhou Guanghui <zhouguanghui1@huawei.com> wrote: >> >>> During early boot, the number of memblocks may exceed 128(some memory >>> areas are not reported to the kernel due to test failures. As a result, >>> contiguous memory is divided into multiple parts for reporting). If >>> the size of the init memblock regions is exceeded before the array size >>> can be resized, the excess memory will be lost. > > I'd like to see more details about how firmware creates that sparse memory > map in the changelog. > The scenario is as follows: In a system using HBM, a multi-bit ECC error occurs, and the BIOS saves the corresponding area (for example, 2 MB). When the system restarts next time, these areas are isolated and not reported or reported as EFI_UNUSABLE_MEMORY. Both of them lead to an increase in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a larger number of memblocks. For example, if the EFI_UNUSABLE_MEMORY type is reported: ... 
memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0 memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4 memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0 memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4 memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0 memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4 memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0 memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0 memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4 memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0 memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4 memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0 memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4 memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0 ... >>> >>> ... >>> >>> --- a/mm/Kconfig >>> +++ b/mm/Kconfig >>> @@ -89,6 +89,14 @@ config SPARSEMEM_VMEMMAP >>> pfn_to_page and page_to_pfn operations. This is the most >>> efficient option when sufficient kernel resources are available. >>> >>> +config MEMBLOCK_INIT_REGIONS >>> + int "Number of init memblock regions" >>> + range 128 1024 >>> + default 128 >>> + help >>> + The number of init memblock regions which used to track "memory" and >>> + "reserved" memblocks during early boot. 
>>> + >>> config HAVE_MEMBLOCK_PHYS_MAP >>> bool >>> >>> diff --git a/mm/memblock.c b/mm/memblock.c >>> index e4f03a6e8e56..6893d26b750e 100644 >>> --- a/mm/memblock.c >>> +++ b/mm/memblock.c >>> @@ -22,7 +22,7 @@ >>> >>> #include "internal.h" >>> >>> -#define INIT_MEMBLOCK_REGIONS 128 >>> +#define INIT_MEMBLOCK_REGIONS CONFIG_MEMBLOCK_INIT_REGIONS >> >> Consistent naming would be nice - MEMBLOCK_INIT versus INIT_MEMBLOCK. I agree. >> >> Can we simply increase INIT_MEMBLOCK_REGIONS to 1024 and avoid the >> config option? It appears that the overhead from this would be 60kB or >> so. > > 60k is not big, but using 1024 entries array for 2-4 memory banks on > systems that don't report that fragmented memory map is really a waste. > > We can make this per platform opt-in, like INIT_MEMBLOCK_RESERVED_REGIONS ... > As I described above, is this a general scenario? >> Or zero if CONFIG_ARCH_KEEP_MEMBLOCK and CONFIG_MEMORY_HOTPLUG >> are cooperating. > > ... or add code that will discard unused parts of memblock arrays even if > CONFIG_ARCH_KEEP_MEMBLOCK=y. > In scenarios where the memory usage is sensitive, should CONFIG_ARCH_KEEP_MEMBLOCK be set to n or set the number by adding config? Andrew, Mike, thank you.
Thread overview: 8+ messages

2022-05-11  1:05 [PATCH] memblock: config the number of init memblock regions Zhou Guanghui
2022-05-11  1:55 ` Andrew Morton
2022-05-11  6:03 ` Mike Rapoport
2022-05-12  2:46 ` Zhouguanghui (OS Kernel) [this message]
2022-05-12  6:28 ` Mike Rapoport
2022-05-25 16:44 ` Darren Hart
2022-05-25 17:12 ` Mike Rapoport