linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Hillf Danton <hdanton@sina.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Michal Hocko <mhocko@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 2/6] mm/page_alloc: Disassociate the pcp->high from pcp->batch
Date: Wed, 26 May 2021 20:14:13 +0200	[thread overview]
Message-ID: <10cb326c-b4ad-3a82-a38b-aba7d2192736@suse.cz> (raw)
In-Reply-To: <20210525080119.5455-3-mgorman@techsingularity.net>

On 5/25/21 10:01 AM, Mel Gorman wrote:
> The pcp high watermark is based on the batch size but there is no
> relationship between them other than it is convenient to use early in
> boot.
> 
> This patch takes the first step and bases pcp->high on the zone low
> watermark split across the number of CPUs local to a zone while the batch
> size remains the same to avoid increasing allocation latencies. The intent
> behind the default pcp->high is "set the number of PCP pages such that
> if they are all full that background reclaim is not started prematurely".
> 
> Note that in this patch the pcp->high values are adjusted after memory
> hotplug events, min_free_kbytes adjustments and watermark scale factor
> adjustments but not CPU hotplug events which is handled later in the
> series.
> 
> On a test KVM instance;
> 
> Before grep -E "high:|batch" /proc/zoneinfo | tail -2
>               high:  378
>               batch: 63
> 
> After grep -E "high:|batch" /proc/zoneinfo | tail -2
>               high:  649
>               batch: 63
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

...

> @@ -6637,6 +6628,34 @@ static int zone_batchsize(struct zone *zone)
>  #endif
>  }
>  
> +static int zone_highsize(struct zone *zone, int batch)
> +{
> +#ifdef CONFIG_MMU
> +	int high;
> +	int nr_local_cpus;
> +
> +	/*
> +	 * The high value of the pcp is based on the zone low watermark
> +	 * so that if they are full then background reclaim will not be
> +	 * started prematurely. The value is split across all online CPUs
> +	 * local to the zone. Note that early in boot that CPUs may not be
> +	 * online yet.
> +	 */
> +	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone))));
> +	high = low_wmark_pages(zone) / nr_local_cpus;
> +
> +	/*
> +	 * Ensure high is at least batch*4. The multiple is based on the
> +	 * historical relationship between high and batch.
> +	 */
> +	high = max(high, batch << 2);
> +
> +	return high;
> +#else
> +	return 0;
> +#endif
> +}
> +
>  /*
>   * pcp->high and pcp->batch values are related and generally batch is lower
>   * than high. They are also related to pcp->count such that count is lower
> @@ -6698,11 +6717,10 @@ static void __zone_set_pageset_high_and_batch(struct zone *zone, unsigned long h
>   */
>  static void zone_set_pageset_high_and_batch(struct zone *zone)
>  {
> -	unsigned long new_high, new_batch;
> +	int new_high, new_batch;
>  
> -	new_batch = zone_batchsize(zone);
> -	new_high = 6 * new_batch;
> -	new_batch = max(1UL, 1 * new_batch);
> +	new_batch = max(1, zone_batchsize(zone));
> +	new_high = zone_highsize(zone, new_batch);
>  
>  	if (zone->pageset_high == new_high &&
>  	    zone->pageset_batch == new_batch)
> @@ -8170,6 +8188,12 @@ static void __setup_per_zone_wmarks(void)
>  		zone->_watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
>  		zone->_watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
>  
> +		/*
> +		 * The watermark size have changed so update the pcpu batch
> +		 * and high limits or the limits may be inappropriate.
> +		 */
> +		zone_set_pageset_high_and_batch(zone);

Hm so this puts the call in the path of various watermark related sysctl
handlers, but it's not protected by pcp_batch_high_lock. The zone lock won't
help against zone_pcp_update() from a hotplug handler. On the other hand, since
hotplug handlers also call __setup_per_zone_wmarks(), the zone_pcp_update()
calls there are now redundant and could be removed, no?
But later there will be a new sysctl in patch 6/6 using pcp_batch_high_lock,
thus that one will not be protected against the watermark related sysctl
handlers that reach here.

To solve all this, seems like the static lock in setup_per_zone_wmarks() could
become a top-level visible lock and pcp high/batch updates could switch to that
one instead of own pcp_batch_high_lock. And zone_pcp_update() calls from hotplug
handlers could be removed.

> +
>  		spin_unlock_irqrestore(&zone->lock, flags);
>  	}
>  
> 


  reply	other threads:[~2021-05-26 18:14 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-25  8:01 [PATCH 0/6 v2] Calculate pcp->high based on zone sizes and active CPUs Mel Gorman
2021-05-25  8:01 ` [PATCH 1/6] mm/page_alloc: Delete vm.percpu_pagelist_fraction Mel Gorman
2021-05-26 17:41   ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 2/6] mm/page_alloc: Disassociate the pcp->high from pcp->batch Mel Gorman
2021-05-26 18:14   ` Vlastimil Babka [this message]
2021-05-27 10:52     ` Mel Gorman
2021-05-28 10:27       ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug events Mel Gorman
2021-05-28 11:08   ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 4/6] mm/page_alloc: Scale the number of pages that are batch freed Mel Gorman
2021-05-28 11:19   ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 5/6] mm/page_alloc: Limit the number of pages on PCP lists when reclaim is active Mel Gorman
2021-05-28 11:43   ` Vlastimil Babka
2021-05-25  8:01 ` [PATCH 6/6] mm/page_alloc: Introduce vm.percpu_pagelist_high_fraction Mel Gorman
2021-05-28 11:59   ` Vlastimil Babka
2021-05-28 12:53     ` Mel Gorman
2021-05-28 14:38       ` Vlastimil Babka
2021-05-27 19:36 ` [PATCH 0/6 v2] Calculate pcp->high based on zone sizes and active CPUs Dave Hansen
2021-05-28  8:55   ` Mel Gorman
2021-05-28  9:03     ` David Hildenbrand
2021-05-28  9:08       ` David Hildenbrand
2021-05-28  9:49         ` Mel Gorman
2021-05-28  9:52           ` David Hildenbrand
2021-05-28 10:09             ` Mel Gorman
2021-05-28 10:21               ` David Hildenbrand
2021-05-28 12:12     ` Vlastimil Babka
2021-05-28 12:37       ` Mel Gorman
2021-05-28 14:39     ` Dave Hansen
2021-05-28 15:18       ` Mel Gorman
2021-05-28 16:17         ` Dave Hansen
2021-05-31 12:00           ` Feng Tang
  -- strict thread matches above, loose matches on Subject: below --
2021-05-21 10:28 [RFC PATCH 0/6] " Mel Gorman
2021-05-21 10:28 ` [PATCH 2/6] mm/page_alloc: Disassociate the pcp->high from pcp->batch Mel Gorman
2021-05-21 21:52   ` Dave Hansen
2021-05-24  8:32     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=10cb326c-b4ad-3a82-a38b-aba7d2192736@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=hdanton@sina.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).