Re: [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug events

From: Dave Hansen <dave.hansen@intel.com>
To: Mel Gorman <mgorman@techsingularity.net>, Linux-MM <linux-mm@kvack.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>
Date: Fri, 21 May 2021 15:13:35 -0700
Message-ID: <add15859-31e2-1688-3d8c-26e2579e9a57@intel.com>
In-Reply-To: <20210521102826.28552-4-mgorman@techsingularity.net>

On 5/21/21 3:28 AM, Mel Gorman wrote:
> The PCP high watermark is based on the number of online CPUs so the
> watermarks must be adjusted during CPU hotplug. At the time of
> hot-remove, the number of online CPUs is already adjusted but during
> hot-add, a delta needs to be applied to update PCP to the correct
> value. After this patch is applied, the high watermarks are adjusted
> correctly.
> 
>   # grep high: /proc/zoneinfo  | tail -1
>               high:  649
>   # echo 0 > /sys/devices/system/cpu/cpu4/online
>   # grep high: /proc/zoneinfo  | tail -1
>               high:  664
>   # echo 1 > /sys/devices/system/cpu/cpu4/online
>   # grep high: /proc/zoneinfo  | tail -1
>               high:  649

This comment is really about the previous patch, but the issue doesn't
become apparent until the example above.

In your example, you mentioned increased exit() performance by using
"vm.percpu_pagelist_fraction to increase the pcp->high value".  That's
presumably because of the increased batching effects and fewer lock
acquisitions.

But, logically, doesn't that mean that the more CPUs you have in a
node, the *higher* you want pcp->high to be?  Since pcp->high is
low_wmark_pages(zone) divided by the number of local CPUs, an absurd
number of CPUs in a node could leave us with a too-small pcp->high
value.
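
To make the scaling concern concrete, here is a minimal userspace sketch
of the division quoted further down (high = low_wmark_pages(zone) /
nr_local_cpus).  It is not kernel code: pcp_high_sketch() is a
hypothetical helper, and the watermark used below is a made-up number.

```c
/*
 * Userspace illustration only, not kernel code.
 * Mirrors the quoted line: high = low_wmark_pages(zone) / nr_local_cpus.
 * pcp_high_sketch() is a hypothetical name used for this example.
 */
static unsigned long pcp_high_sketch(unsigned long low_wmark_pages,
				     unsigned int nr_local_cpus)
{
	/* Same intent as max(1U, ...) in the patch: never divide by zero. */
	if (nr_local_cpus < 1)
		nr_local_cpus = 1;

	/* Integer division: the per-CPU share shrinks as CPUs are added. */
	return low_wmark_pages / nr_local_cpus;
}
```

With an assumed low watermark of 16000 pages, this gives 4000 for 4
local CPUs, 250 for 64, and 15 for 1024, which is the "too-small" case
I'm worried about.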

Also, do you worry at all about a zone with a low min_free_kbytes seeing
increased zone lock contention?

...
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf5cdc466e6c..2761b03b3a44 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6628,7 +6628,7 @@ static int zone_batchsize(struct zone *zone)
>  #endif
>  }
>  
> -static int zone_highsize(struct zone *zone)
> +static int zone_highsize(struct zone *zone, int cpu_online)
>  {
>  #ifdef CONFIG_MMU
>  	int high;
> @@ -6640,7 +6640,7 @@ static int zone_highsize(struct zone *zone)
>  	 * CPUs local to a zone. Note that early in boot that CPUs may
>  	 * not be online yet.
>  	 */
> -	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone))));
> +	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
>  	high = low_wmark_pages(zone) / nr_local_cpus;

Is this "+ cpu_online" bias because the CPU isn't in cpumask_of_node()
when the CPU hotplug callback occurs?  If so, it would be worth
mentioning in a comment.
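
For reference, this is how I read the arithmetic, as a userspace sketch
rather than kernel code.  It assumes the incoming CPU is not yet counted
in the node mask when the callback runs, which is exactly what I'm
asking about; zone_highsize_sketch() and the page counts are
hypothetical.

```c
#include <assert.h>

/*
 * Userspace illustration only, not kernel code.
 * cpus_in_mask stands in for cpumask_weight(cpumask_of_node(nid));
 * cpu_online is the delta passed in by the hotplug path (0 or 1).
 */
static unsigned long zone_highsize_sketch(unsigned long low_wmark_pages,
					  unsigned int cpus_in_mask,
					  unsigned int cpu_online)
{
	unsigned int nr_local_cpus;

	assert(cpu_online == 0 || cpu_online == 1);

	/* As quoted: the clamp to 1 is applied before the delta is added. */
	nr_local_cpus = (cpus_in_mask > 1 ? cpus_in_mask : 1) + cpu_online;

	return low_wmark_pages / nr_local_cpus;
}
```

With an assumed low watermark of 16000 pages, 7 CPUs in the mask plus a
delta of 1 gives 2000, the same as 8 CPUs in the mask with no delta.
Without the delta, the hot-add path would compute 2285.  Note also that
an empty mask with a delta of 1 divides by 2, not 1, because the clamp
comes first.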