From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Waiman Long <waiman.long@hp.com>, Nathan Zimmer <nzimmer@sgi.com>,
    Dave Hansen <dave.hansen@intel.com>, Scott Norton <scott.norton@hp.com>,
    Daniel J Blueman <daniel@numascale.com>, Linux-MM <linux-mm@kvack.org>,
    LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/13] Parallel struct page initialisation v4
Date: Tue, 5 May 2015 11:45:14 +0100
Message-ID: <20150505104514.GC2462@suse.de>
In-Reply-To: <20150504143046.9404c572486caf71bdef0676@linux-foundation.org>

On Mon, May 04, 2015 at 02:30:46PM -0700, Andrew Morton wrote:
> > Before the patch, the boot time from elilo prompt to ssh login was 694s.
> > After the patch, the boot up time was 346s, a saving of 348s (about 50%).
>
> Having to guesstimate the amount of memory which is needed for a
> successful boot will be painful.  Any number we choose will be wrong
> 99% of the time.
>
> If the kswapd threads have started, all we need to do is to wait: take
> a little nap in the allocator's page==NULL slowpath.
>
> I'm not seeing any reason why we can't start kswapd much earlier -
> right at the start of do_basic_setup()?

It doesn't even have to be kswapd; it just needs to be a thread pinned to a
node. The difficulty is that dealing with the system hashes means the
initialisation has to happen before vfs_caches_init_early(), when there is
no scheduler. Those allocations could be delayed further, but then there is
the possibility that the allocations would not be contiguous and they would
have to rely on CMA to make the attempt. That potentially alters the
performance of the large system hashes at run time.

We can scale the amount initialised with memory size relatively easily.
This boots on the same 1TB machine I was testing before, but that is hardly
a surprise.
---8<---
mm: meminit: Take into account that large system caches scale linearly with memory

Waiman Long reported that a 24TB machine triggered an OOM because parallel
memory initialisation deferred too much memory. The likely consumers of this
memory were the large system hashes, which scale with memory size. This patch
initialises at least 2G per node but scales the amount initialised for larger
systems.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 598f78d6544c..f7cc6c9fb909 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -266,15 +266,16 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
  */
 static inline bool update_defer_init(pg_data_t *pgdat,
 				unsigned long pfn, unsigned long zone_end,
+				unsigned long max_initialise,
 				unsigned long *nr_initialised)
 {
 	/* Always populate low zones for address-contrained allocations */
 	if (zone_end < pgdat_end_pfn(pgdat))
 		return true;
 
-	/* Initialise at least 2G of the highest zone */
+	/* Initialise at least the requested amount in the highest zone */
 	(*nr_initialised)++;
-	if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
+	if ((*nr_initialised > max_initialise) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
 		pgdat->first_deferred_pfn = pfn;
 		return false;
@@ -299,6 +300,7 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
 
 static inline bool update_defer_init(pg_data_t *pgdat,
 				unsigned long pfn, unsigned long zone_end,
+				unsigned long max_initialise,
 				unsigned long *nr_initialised)
 {
 	return true;
@@ -4457,11 +4459,19 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	unsigned long end_pfn = start_pfn + size;
 	unsigned long pfn;
 	struct zone *z;
+	unsigned long max_initialise;
 	unsigned long nr_initialised = 0;
 
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
 
+	/*
+	 * Initialise at least 2G of a node but also take into account that
+	 * the two large system hashes can take up an 8th of memory.
+	 */
+	max_initialise = max(2UL << (30 - PAGE_SHIFT),
+			(pgdat->node_spanned_pages >> 3));
+
 	z = &pgdat->node_zones[zone];
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		/*
@@ -4475,6 +4485,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (!early_pfn_in_nid(pfn, nid))
 			continue;
 		if (!update_defer_init(pgdat, pfn, end_pfn,
+					max_initialise,
 					&nr_initialised))
 			break;
 	}