linux-kernel.vger.kernel.org archive mirror
From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hillf Danton <hdanton@sina.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Michal Hocko <mhocko@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, "Tang, Feng" <feng.tang@intel.com>
Subject: Re: [PATCH 0/6 v2] Calculate pcp->high based on zone sizes and active CPUs
Date: Fri, 28 May 2021 13:37:22 +0100
Message-ID: <20210528123721.GO30378@techsingularity.net>
In-Reply-To: <416f39e7-704a-86d0-8261-dc27366336ab@suse.cz>

On Fri, May 28, 2021 at 02:12:09PM +0200, Vlastimil Babka wrote:
> > mm/page_alloc: Split pcp->high across all online CPUs for cpuless nodes
> > 
> > Dave Hansen reported the following about Feng Tang's tests on a machine
> > with persistent memory onlined as a DRAM-like device.
> > 
> >   Feng Tang tossed these on a "Cascade Lake" system with 96 threads and
> >   ~512G of persistent memory and 128G of DRAM.  The PMEM is in "volatile
> >   use" mode and being managed via the buddy just like the normal RAM.
> > 
> >   The PMEM zones are big ones:
> > 
> >         present  65011712 = 248 G
> >         high       134595 = 525 M
> > 
> >   The PMEM nodes, of course, don't have any CPUs in them.
> > 
> >   With your series, the pcp->high value per-cpu is 69584 pages or about
> >   270MB per CPU.  Scaled up by the 96 CPU threads, that's ~26GB of
> >   worst-case memory in the pcps per zone, or roughly 10% of the size of
> >   the zone.
> > 
> > This should not cause a problem as such although it could trigger reclaim
> > due to pages being stored on per-cpu lists for CPUs remote to a node. It
> > is not possible to treat cpuless nodes exactly the same as normal nodes
> > but the worst-case scenario can be mitigated by splitting pcp->high across
> > all online CPUs for cpuless memory nodes.
> > 
> > Suggested-by: Dave Hansen <dave.hansen@intel.com>
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 

Thanks.
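
For reference, the arithmetic in the quoted report can be checked with a
small userspace sketch. This is not kernel code: the function names and
the 4 KiB page size are assumptions made for illustration, and treating
69584 pages as one budget split evenly across 96 CPUs only shows the
scale of the change, not the exact value the patch computes.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL /* assumption: 4 KiB base pages */

/* Pages held on per-cpu lists for one zone if every CPU fills its list. */
uint64_t pcp_worst_case_pages(uint64_t high_per_cpu, unsigned int ncpus)
{
    return high_per_cpu * ncpus;
}

/* Per-CPU limit when one budget is split across all online CPUs. */
uint64_t pcp_high_split(uint64_t budget_pages, unsigned int online_cpus)
{
    assert(online_cpus > 0);
    return budget_pages / online_cpus;
}

/* Whole percent of the zone that the given number of pages represents. */
uint64_t pcp_percent_of_zone(uint64_t pages, uint64_t zone_pages)
{
    assert(zone_pages > 0);
    return (pages * 100) / zone_pages;
}

/* Convert a page count to whole MiB, rounding down. */
uint64_t pages_to_mib(uint64_t pages)
{
    return (pages * PAGE_SIZE_BYTES) >> 20;
}
```

With the reported numbers, pcp_worst_case_pages(69584, 96) is 6680064
pages, about 25.5 GiB, which is roughly 10% of the 65011712-page zone.
Splitting the same 69584 pages across 96 CPUs gives 724 pages, or under
3 MiB, per CPU.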

> Maybe we should even consider distinguishing high limits for local-to-cpu zones
> vs remote, for example for the local-to-cpu zones we would divide by the number
> of local cpus, for remote-to-cpu zones we would divide by all cpus.
> 
> Because we can expect cpus to allocate mostly from local zones, so leaving more
> pages on percpu for those zones can be beneficial.
> 

I did think about whether the ratios should be different but failed to
conclude that it was necessary or useful so I kept it simple.
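
To make the two options concrete, here is a rough userspace sketch. The
struct, the function names and the numbers in the note below are all
hypothetical. high_simple() assumes the series uses one divisor per
zone (the node's own CPUs, falling back to all online CPUs for a cpuless
node), while high_local_remote() models the suggestion of picking the
divisor based on whether the CPU is local to the zone.

```c
#include <assert.h>

/* Hypothetical summary of one zone; not a kernel structure. */
struct zone_view {
    unsigned long budget_pages; /* total pages allowed on pcp lists */
    unsigned int local_cpus;    /* CPUs on the zone's node, 0 if cpuless */
};

/*
 * One divisor per zone, applied to every CPU: the node's own CPUs,
 * or all online CPUs when the node has none (assumed behaviour).
 */
unsigned long high_simple(const struct zone_view *z, unsigned int online_cpus)
{
    unsigned int split = z->local_cpus ? z->local_cpus : online_cpus;

    assert(split > 0);
    return z->budget_pages / split;
}

/*
 * Suggested alternative: a CPU local to the zone divides by the number
 * of local CPUs, a remote CPU divides by all online CPUs.
 */
unsigned long high_local_remote(const struct zone_view *z,
                                unsigned int online_cpus, int cpu_is_local)
{
    assert(online_cpus > 0);
    if (cpu_is_local && z->local_cpus)
        return z->budget_pages / z->local_cpus;
    return z->budget_pages / online_cpus;
}
```

For a made-up zone with a 69584-page budget and 48 local CPUs out of 96
online, a local CPU would get 1449 pages under either scheme, while a
remote CPU would get 1449 under the simple scheme and 724 under the
local/remote one. For a cpuless zone both schemes give every CPU 724.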

> But as the motivation here was to reduce lock contention on freeing, that's less
> clear. We probably can't expect the cpu to be freeing mostly local pages (in
> case of e.g. a large process exiting), because no mechanism works towards that,
> or does it? In case of cpu freeing to remote zone, the lower high limit could hurt.
> 

This is the major issue. Even if an application is NUMA-aware and heavily
threaded, an exiting process is potentially freeing remote memory and
there is nothing wrong with that. The remote memory will be partially
drained when pcp->high is reached and the remaining memory will be
cleaned up by vmstat. It's a similar problem if a process is truncating
a large file with page cache allocated on a remote node.
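
As a toy model of that behaviour, assuming a list that hands back up to
"batch" pages whenever its count reaches "high" and a separate deferred
drain standing in for the vmstat cleanup, the following userspace sketch
shows what is left behind. The struct and function names are invented
for illustration and the real free path is more involved.

```c
#include <assert.h>

/* Simplified model of one per-cpu list; not the kernel structure. */
struct pcp_model {
    unsigned long count;   /* pages currently on the list */
    unsigned long high;    /* limit that triggers a partial drain */
    unsigned long batch;   /* pages returned per partial drain */
    unsigned long drained; /* pages returned to the allocator so far */
};

/* Free one page to the list; drain up to batch pages once high is hit. */
void pcp_model_free_page(struct pcp_model *p)
{
    assert(p->batch > 0);
    p->count++;
    if (p->count >= p->high) {
        unsigned long n = p->batch < p->count ? p->batch : p->count;

        p->count -= n;
        p->drained += n;
    }
}

/* Deferred cleanup: return whatever is still sitting on the list. */
void pcp_model_drain_all(struct pcp_model *p)
{
    p->drained += p->count;
    p->count = 0;
}
```

With high = 8 and batch = 4, freeing 20 pages returns 16 of them through
partial drains and leaves 4 on the list until pcp_model_drain_all() runs.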

Hence I decided to do nothing fancy with the ratios until a practical
problem was identified that could be alleviated by adjusting pcp->high
based on whether the CPU is remote or local to memory.

-- 
Mel Gorman
SUSE Labs
