From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=fGrr=KT=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-10.3 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,
	SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0A085C47083
	for <linux-mm@archiver.kernel.org>; Mon, 24 May 2021 09:07:33 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 70ACF610A5
	for <linux-mm@archiver.kernel.org>; Mon, 24 May 2021 09:07:32 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 70ACF610A5
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=techsingularity.net
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 7025F94005E; Mon, 24 May 2021 05:07:31 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 6AB25940055; Mon, 24 May 2021 05:07:31 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 298D494005E; Mon, 24 May 2021 05:07:31 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0144.hostedemail.com [216.40.44.144])
	by kanga.kvack.org (Postfix) with ESMTP id D9854940055
	for <linux-mm@kvack.org>; Mon, 24 May 2021 05:07:30 -0400 (EDT)
Received: from smtpin34.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay01.hostedemail.com (Postfix) with ESMTP id 744FA180ACEF3
	for <linux-mm@kvack.org>; Mon, 24 May 2021 09:07:30 +0000 (UTC)
X-FDA: 78175546260.34.0346AA2
Received: from outbound-smtp21.blacknight.com (outbound-smtp21.blacknight.com [81.17.249.41])
	by imf26.hostedemail.com (Postfix) with ESMTP id 55C5C40B8CFF
	for <linux-mm@kvack.org>; Mon, 24 May 2021 09:07:27 +0000 (UTC)
Received: from mail.blacknight.com (pemlinmail05.blacknight.ie [81.17.254.26])
	by outbound-smtp21.blacknight.com (Postfix) with ESMTPS id 5E171CCB5F
	for <linux-mm@kvack.org>; Mon, 24 May 2021 10:07:28 +0100 (IST)
Received: (qmail 10052 invoked from network); 24 May 2021 09:07:28 -0000
Received: from unknown (HELO techsingularity.net) (mgorman@techsingularity.net@[84.203.23.168])
  by 81.17.254.9 with ESMTPSA (AES256-SHA encrypted, authenticated); 24 May 2021 09:07:28 -0000
Date: Mon, 24 May 2021 10:07:26 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>, Michal Hocko <mhocko@kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug
 events
Message-ID: <20210524090726.GB30378@techsingularity.net>
References: <20210521102826.28552-1-mgorman@techsingularity.net>
 <20210521102826.28552-4-mgorman@techsingularity.net>
 <add15859-31e2-1688-3d8c-26e2579e9a57@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <add15859-31e2-1688-3d8c-26e2579e9a57@intel.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Rspamd-Queue-Id: 55C5C40B8CFF
Authentication-Results: imf26.hostedemail.com;
	dkim=none;
	spf=pass (imf26.hostedemail.com: domain of mgorman@techsingularity.net designates 81.17.249.41 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net;
	dmarc=none
X-Rspamd-Server: rspam03
X-Stat-Signature: eakikwha1pwgrtm7qf5mee4y9be8opeb
X-HE-Tag: 1621847247-833937
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Fri, May 21, 2021 at 03:13:35PM -0700, Dave Hansen wrote:
> On 5/21/21 3:28 AM, Mel Gorman wrote:
> > The PCP high watermark is based on the number of online CPUs so the
> > watermarks must be adjusted during CPU hotplug. At the time of
> > hot-remove, the number of online CPUs is already adjusted but during
> > hot-add, a delta needs to be applied to update PCP to the correct
> > value. After this patch is applied, the high watermarks are adjusted
> > correctly.
> > 
> >   # grep high: /proc/zoneinfo  | tail -1
> >               high:  649
> >   # echo 0 > /sys/devices/system/cpu/cpu4/online
> >   # grep high: /proc/zoneinfo  | tail -1
> >               high:  664
> >   # echo 1 > /sys/devices/system/cpu/cpu4/online
> >   # grep high: /proc/zoneinfo  | tail -1
> >               high:  649
> 
> This is actually a comment more about the previous patch, but it doesn't
> really become apparent until the example above.
> 
> In your example, you mentioned increased exit() performance by using
> "vm.percpu_pagelist_fraction to increase the pcp->high value".  That's
> presumably because of the increased batching effects and fewer lock
> acquisitions.
> 

Yes

> But, logically, doesn't that mean that, the more CPUs you have in a
> node, the *higher* you want pcp->high to be?  If we took this to the
> extreme and had an absurd number of CPUs in a node, we could end up with
> a too-small pcp->high value.
> 

I see your point but I don't think increasing pcp->high for larger
numbers of CPUs is the right answer because then reclaim can be
triggered simply because too many PCPs have pages.
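
As a rough illustration of that concern, here is a minimal userspace
sketch (not kernel code). It assumes the relationship used by the patch
quoted further down, high = low_wmark_pages(zone) / nr_local_cpus. The
names model_pcp_high() and model_pcp_total() and the input values are
hypothetical and only serve the example.

```c
#include <assert.h>

/*
 * Hypothetical model of the per-CPU high mark: the zone's low watermark
 * divided evenly between the CPUs local to the zone.
 */
static unsigned int model_pcp_high(unsigned int low_wmark,
				   unsigned int nr_local_cpus)
{
	assert(nr_local_cpus > 0);	/* modelling assumption: at least one CPU */
	return low_wmark / nr_local_cpus;
}

/*
 * Worst case number of pages parked on PCP lists when every local CPU
 * has filled its list up to the high mark.
 */
static unsigned int model_pcp_total(unsigned int high,
				    unsigned int nr_local_cpus)
{
	return high * nr_local_cpus;
}
```

With a made-up low watermark of 1000 pages, 8 CPUs get a high mark of
125 each and 64 CPUs get 15 each. In both cases the worst-case total held
on PCP lists stays at or below 1000 pages. Raising the per-CPU value as
the CPU count grows would remove that bound.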

Addressing your point requires much deeper surgery. zone->lock would have
to be split into a metadata lock and a free page lock. Then the free
areas would have to be split based on some factor -- number of CPUs or
memory size. That gets complex because then the page allocator loop needs
to walk multiple arenas as well as multiple zones as well as consider which
arena should be examined first. Fragmentation should also be considered
because a decision would need to be made on whether a pageblock should
fragment or whether other local arenas should be examined. Anything that
walks PFNs such as compaction would also need to be aware of arenas and
their associated locks. Finally, every acquisition of zone->lock would
have to be audited to determine exactly what it is protecting. Even with
all that, it still makes sense to disassociate pcp->high from pcp->batch
as this series does.

There is value in doing something like this, but it's beyond the scope
of this series, and doing the work without introducing regressions would
be very difficult.

> Also, do you worry at all about a zone with a low min_free_kbytes seeing
> increased zone lock contention?
> 
> ...
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index bf5cdc466e6c..2761b03b3a44 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6628,7 +6628,7 @@ static int zone_batchsize(struct zone *zone)
> >  #endif
> >  }
> >  
> > -static int zone_highsize(struct zone *zone)
> > +static int zone_highsize(struct zone *zone, int cpu_online)
> >  {
> >  #ifdef CONFIG_MMU
> >  	int high;
> > @@ -6640,7 +6640,7 @@ static int zone_highsize(struct zone *zone)
> >  	 * CPUs local to a zone. Note that early in boot that CPUs may
> >  	 * not be online yet.
> >  	 */
> > -	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone))));
> > +	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
> >  	high = low_wmark_pages(zone) / nr_local_cpus;
> 
> Is this "+ cpu_online" bias because the CPU isn't in cpumask_of_node()
> when the CPU hotplug callback occurs?  If so, it might be nice to mention.

Fixed.
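
For reference, a minimal userspace sketch of how the delta behaves,
assuming Dave's reading is right and the incoming CPU is not yet counted
in the node's cpumask when the hot-add callback runs. The model_* names
and the numbers are hypothetical. Only the max(1U, weight) + cpu_online
expression and the division are taken from the hunk above.

```c
#include <assert.h>

/*
 * Mirrors the patched expression
 *   max(1U, cpumask_weight(...)) + cpu_online
 * with the cpumask weight passed in as a plain number.
 */
static unsigned int model_nr_local_cpus(unsigned int mask_weight,
					int cpu_online)
{
	unsigned int base = mask_weight > 1U ? mask_weight : 1U;

	assert(cpu_online >= 0);	/* modelling assumption: delta is 0 or positive */
	return base + (unsigned int)cpu_online;
}

/* high = low watermark / number of CPUs local to the zone */
static unsigned int model_zone_highsize(unsigned int low_wmark,
					unsigned int mask_weight,
					int cpu_online)
{
	return low_wmark / model_nr_local_cpus(mask_weight, cpu_online);
}
```

With a made-up low watermark of 1000 pages, hot-adding an eighth CPU
(weight 7, delta 1) gives the same result as the steady state with eight
CPUs (weight 8, delta 0), namely 125. After a hot-remove the weight is
already 7, so no delta is needed and the result is 142.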

-- 
Mel Gorman
SUSE Labs