linux-mm.kvack.org archive mirror
From: Mike Rapoport <rppt@kernel.org>
To: Baoquan He <bhe@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, gopakumarr@vmware.com,
	david@redhat.com
Subject: Re: [PATCH v2 1/5] mm: memmap defer init dosn't work as expected
Date: Mon, 21 Dec 2020 08:32:52 +0200
Message-ID: <20201221063252.GC392325@kernel.org>
In-Reply-To: <20201220082754.6900-2-bhe@redhat.com>

On Sun, Dec 20, 2020 at 04:27:50PM +0800, Baoquan He wrote:
> VMware observed a performance regression during memmap init on their platform,
> and bisected it to commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock
> regions rather that check each PFN").
> 
> Before the commit:
> 
>   [0.033176] Normal zone: 1445888 pages used for memmap
>   [0.033176] Normal zone: 89391104 pages, LIFO batch:63
>   [0.035851] ACPI: PM-Timer IO Port: 0x448
> 
> With the commit:
> 
>   [0.026874] Normal zone: 1445888 pages used for memmap
>   [0.026875] Normal zone: 89391104 pages, LIFO batch:63
>   [2.028450] ACPI: PM-Timer IO Port: 0x448
> 
> The root cause is that the current memmap defer init doesn't work as expected.
> Before the commit, memmap_init_zone() did the memmap init of one whole zone: it
> initialized all low zones of a numa node, but deferred the memmap init of the
> last zone in that node. However, since commit 73a6e474cb376, memmap_init()
> iterates over the memblock regions inside one zone and calls
> memmap_init_zone() to do the memmap init for each region.
> 
> E.g., on VMware's system, the memory layout is as below: there are two memory
> regions in node 2. The current code mistakenly initializes the whole 1st
> region [mem 0xab00000000-0xfcffffffff], then defers the memmap init on the
> 2nd region [mem 0x10000000000-0x1033fffffff] after initializing only one
> memory section there. In fact, we expect only one memory section's memmap to
> be initialized. That's why the memmap init takes more time.
> 
> [    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
> [    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
> [    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
> [    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
> [    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
> [    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]
> 
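
To put numbers on the layout above, here is a small standalone sketch (not
kernel code). It assumes 4 KiB pages and 128 MiB memory sections, i.e. the
usual x86_64 values; struct mem_range, range_pages() and range_sections() are
names made up for this illustration.

```c
/* Assumptions for illustration: 4 KiB pages, 128 MiB sections (x86_64). */
#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	27
#define PAGES_PER_SECTION	(1ULL << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* Physical range with an inclusive end address, as printed in the SRAT log. */
struct mem_range {
	unsigned long long start;
	unsigned long long end;
};

/* The two node 2 ranges copied from the SRAT log above. */
static const struct mem_range node2_first  = { 0xab00000000ULL,  0xfcffffffffULL };
static const struct mem_range node2_second = { 0x10000000000ULL, 0x1033fffffffULL };

/* Number of pages covered by a range. */
static unsigned long long range_pages(struct mem_range r)
{
	return (r.end + 1 - r.start) >> PAGE_SHIFT;
}

/* Number of whole memory sections covered by a range. */
static unsigned long long range_sections(struct mem_range r)
{
	return range_pages(r) / PAGES_PER_SECTION;
}
```

Under these assumptions the first range holds 85983232 pages (2624 sections)
and the second 3407872 pages (104 sections). Their sum is 89391104 pages, the
same figure the boot log prints for the Normal zone. Initializing the whole
first range up front therefore touches 2624 sections where one was expected.
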
> Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
> down the real zone end pfn so that defer_init() can use it to judge whether
> the deferral needs to be done zone-wide.
> 
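
For readers who want to see the effect without booting a machine, below is a
simplified userspace model of the deferral decision. It is a sketch only, not
the kernel implementation: struct node_model and the model_*() helpers are
hypothetical names, PAGES_PER_SECTION assumes 128 MiB sections with 4 KiB
pages, and node 2 is assumed to consist of a single zone spanning both
regions. The 'patched' flag switches between passing the region end (old
behaviour) and the zone end plus the first_deferred_pfn check (this patch).

```c
#include <limits.h>
#include <stdbool.h>

#define PAGES_PER_SECTION 32768UL	/* assumption: 128 MiB / 4 KiB */

/* Hypothetical stand-in for the per-node and static state defer_init() uses. */
struct node_model {
	unsigned long node_end_pfn;
	unsigned long first_deferred_pfn;
	unsigned long prev_end_pfn;
	unsigned long nr_initialised;
	bool patched;
};

/* Returns true when init of the remaining pages should be deferred. */
static bool model_defer_init(struct node_model *n, unsigned long pfn,
			     unsigned long end_pfn)
{
	if (n->prev_end_pfn != end_pfn) {
		n->prev_end_pfn = end_pfn;
		n->nr_initialised = 0;
	}
	/* Ranges ending below the node end are always populated. */
	if (end_pfn < n->node_end_pfn)
		return false;
	/* Added by the patch: once deferral has started, keep deferring. */
	if (n->patched && n->first_deferred_pfn != ULONG_MAX)
		return true;
	/* Initialize one section, then defer at the next section boundary. */
	n->nr_initialised++;
	if (n->nr_initialised > PAGES_PER_SECTION &&
	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
		n->first_deferred_pfn = pfn;
		return true;
	}
	return false;
}

/* Walks one region and returns how many pages were initialized early. */
static unsigned long model_init_region(struct node_model *n,
				       unsigned long start, unsigned long end,
				       unsigned long zone_end)
{
	unsigned long end_for_defer = n->patched ? zone_end : end;
	unsigned long pfn, done = 0;

	for (pfn = start; pfn < end; pfn++) {
		if (model_defer_init(n, pfn, end_for_defer))
			break;
		done++;
	}
	return done;
}

/* Node 2 from the SRAT log: PFN ranges of its two memory regions. */
static unsigned long model_node2_early_pages(bool patched)
{
	struct node_model n = {
		.node_end_pfn = 0x10340000UL,
		.first_deferred_pfn = ULONG_MAX,
		.patched = patched,
	};
	unsigned long zone_end = n.node_end_pfn;

	return model_init_region(&n, 0xab00000UL, 0xfd00000UL, zone_end) +
	       model_init_region(&n, 0x10000000UL, 0x10340000UL, zone_end);
}
```

In this model, model_node2_early_pages(false) yields 86016000 pages (the whole
first region plus one section of the second), while
model_node2_early_pages(true) yields 32768 pages, i.e. a single section.
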
> Fixes: 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
> Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> Cc: stable@vger.kernel.org

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  arch/ia64/mm/init.c | 4 ++--
>  include/linux/mm.h  | 5 +++--
>  mm/memory_hotplug.c | 2 +-
>  mm/page_alloc.c     | 8 +++++---
>  4 files changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 9b5acf8fb092..e76386a3479e 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -536,7 +536,7 @@ virtual_memmap_init(u64 start, u64 end, void *arg)
>  
>  	if (map_start < map_end)
>  		memmap_init_zone((unsigned long)(map_end - map_start),
> -				 args->nid, args->zone, page_to_pfn(map_start),
> +				 args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
>  				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  	return 0;
>  }
> @@ -546,7 +546,7 @@ memmap_init (unsigned long size, int nid, unsigned long zone,
>  	     unsigned long start_pfn)
>  {
>  	if (!vmem_map) {
> -		memmap_init_zone(size, nid, zone, start_pfn,
> +		memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
>  				 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  	} else {
>  		struct page *start;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e4e5be20b0c2..92e06ea053f4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2432,8 +2432,9 @@ extern int __meminit early_pfn_to_nid(unsigned long pfn);
>  #endif
>  
>  extern void set_dma_reserve(unsigned long new_dma_reserve);
> -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
> -		enum meminit_context, struct vmem_altmap *, int migratetype);
> +extern void memmap_init_zone(unsigned long, int, unsigned long,
> +		unsigned long, unsigned long, enum meminit_context,
> +		struct vmem_altmap *, int migratetype);
>  extern void setup_per_zone_wmarks(void);
>  extern int __meminit init_per_zone_wmark_min(void);
>  extern void mem_init(void);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index af41fb990820..f9d57b9be8c7 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -713,7 +713,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  	 * expects the zone spans the pfn range. All the pages in the range
>  	 * are reserved so nobody should be touching them so we should be safe
>  	 */
> -	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
> +	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
>  			 MEMINIT_HOTPLUG, altmap, migratetype);
>  
>  	set_zone_contiguous(zone);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8cea0823b70e..32645f2e7b96 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -423,6 +423,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
>  	if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
>  		return false;
>  
> +	if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
> +		return true;
>  	/*
>  	 * We start only with one section of pages, more pages are added as
>  	 * needed until the rest of deferred pages are initialized.
> @@ -6116,7 +6118,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>   * zone stats (e.g., nr_isolate_pageblock) are touched.
>   */
>  void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> -		unsigned long start_pfn,
> +		unsigned long start_pfn, unsigned long zone_end_pfn,
>  		enum meminit_context context,
>  		struct vmem_altmap *altmap, int migratetype)
>  {
> @@ -6152,7 +6154,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  		if (context == MEMINIT_EARLY) {
>  			if (overlap_memmap_init(zone, &pfn))
>  				continue;
> -			if (defer_init(nid, pfn, end_pfn))
> +			if (defer_init(nid, pfn, zone_end_pfn))
>  				break;
>  		}
>  
> @@ -6307,7 +6309,7 @@ void __init __weak memmap_init(unsigned long size, int nid,
>  
>  		if (end_pfn > start_pfn) {
>  			size = end_pfn - start_pfn;
> -			memmap_init_zone(size, nid, zone, start_pfn,
> +			memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
>  					 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
>  		}
>  
> -- 
> 2.17.2
> 

-- 
Sincerely yours,
Mike.

