From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/4] sched/numa: Adjust imb_numa_nr to a better approximation of memory channels
Date: Wed, 18 May 2022 12:15:39 +0100	[thread overview]
Message-ID: <20220518111539.GP3441@techsingularity.net> (raw)
In-Reply-To: <20220518094112.GE10117@worktop.programming.kicks-ass.net>

On Wed, May 18, 2022 at 11:41:12AM +0200, Peter Zijlstra wrote:
> On Wed, May 11, 2022 at 03:30:38PM +0100, Mel Gorman wrote:
> > For a single LLC per node, a NUMA imbalance is allowed up until 25%
> > of CPUs sharing a node could be active. One intent of the cut-off is
> > to avoid an imbalance of memory channels but there is no topological
> > information based on active memory channels. Furthermore, there can
> > be differences between nodes depending on the number of populated
> > DIMMs.
> > 
> > A cut-off of 25% was arbitrary but generally worked. It does have a severe
> > corner case though: a parallel workload using 25% of all available CPUs
> > can over-saturate memory channels. This can happen when the initial
> > forking of tasks pulls more of them to one node after early wakeups
> > (e.g. a barrier synchronisation) and the imbalance is not quickly
> > corrected by the load balancer. The load balancer may fail to act quickly
> > as the parallel tasks are considered poor migrate candidates due to
> > locality or cache hotness.
> > 
> > On a range of modern Intel CPUs, 12.5% appears to be a better cut-off
> > assuming all memory channels are populated, and is used as the new cut-off
> > point. A minimum of 1 is specified to allow a communicating pair to
> > remain local even for CPUs with low numbers of cores. Modern AMD machines
> > have multiple LLCs per node and are not affected.
> 
> Can the hardware tell us about memory channels?

It's in the SMBIOS table somewhere, as it's available via dmidecode. For
example, on a 2-socket machine:

$ dmidecode -t memory | grep -E "Size|Bank"
        Size: 8192 MB
        Bank Locator: P0_Node0_Channel0_Dimm0
        Size: No Module Installed
        Bank Locator: P0_Node0_Channel0_Dimm1
        Size: 8192 MB
        Bank Locator: P0_Node0_Channel1_Dimm0
        Size: No Module Installed
        Bank Locator: P0_Node0_Channel1_Dimm1
        Size: 8192 MB
        Bank Locator: P0_Node0_Channel2_Dimm0
        Size: No Module Installed
        Bank Locator: P0_Node0_Channel2_Dimm1
        Size: 8192 MB
        Bank Locator: P0_Node0_Channel3_Dimm0
        Size: No Module Installed
        Bank Locator: P0_Node0_Channel3_Dimm1
        Size: 8192 MB
        Bank Locator: P1_Node1_Channel0_Dimm0
        Size: No Module Installed
        Bank Locator: P1_Node1_Channel0_Dimm1
        Size: 8192 MB
        Bank Locator: P1_Node1_Channel1_Dimm0
        Size: No Module Installed
        Bank Locator: P1_Node1_Channel1_Dimm1
        Size: 8192 MB
        Bank Locator: P1_Node1_Channel2_Dimm0
        Size: No Module Installed
        Bank Locator: P1_Node1_Channel2_Dimm1
        Size: 8192 MB
        Bank Locator: P1_Node1_Channel3_Dimm0
        Size: No Module Installed
        Bank Locator: P1_Node1_Channel3_Dimm1

SMBIOS contains the information on the number of channels and whether they
are populated with at least one DIMM.
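
As an aside, the populated channel count per node can be derived from that
output with a quick hack. This is only a sketch and it assumes the
vendor-specific Bank Locator format shown above (Px_Nodex_Channelx_Dimmx),
so it's nowhere near general:

$ dmidecode -t memory | awk '
    /Size: [0-9]/ { populated = 1; next }   # a real module
    /Size: No/    { populated = 0; next }   # "No Module Installed"
    /Bank Locator:/ && populated {
        split($NF, f, "_")                  # e.g. P0_Node0_Channel3_Dimm0
        seen[f[2] "/" f[3]] = 1             # unique (node, channel) pairs
    }
    END {
        for (k in seen) { split(k, f, "/"); count[f[1]]++ }
        for (node in count)
            print node ": " count[node] " populated channel(s)"
    }'
Node0: 4 populated channel(s)
Node1: 4 populated channel(s)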

I'm not aware of a way to do it in-kernel on a cross-architecture basis.
The arch manual states how many channels are in a given processor family,
and the information is available during memory check errors (apparently via
the EDAC driver). It's sometimes available via PMUs, but I couldn't find a
place where it's generically available to topology.c in a way that would
work on all x86-64 machines, let alone every other architecture.

Even if SMBIOS were parsed in early boot, it's not clear that using it
would be a good idea. It could result in different imbalance thresholds for
each NUMA domain, or weird corner cases where asymmetric NUMA node
populations result in run-to-run variance that is difficult to analyse.

-- 
Mel Gorman
SUSE Labs


Thread overview: 26+ messages
2022-05-11 14:30 [PATCH 0/4] Mitigate inconsistent NUMA imbalance behaviour Mel Gorman
2022-05-11 14:30 ` [PATCH 1/4] sched/numa: Initialise numa_migrate_retry Mel Gorman
2022-05-11 14:30 ` [PATCH 2/4] sched/numa: Do not swap tasks between nodes when spare capacity is available Mel Gorman
2022-05-11 14:30 ` [PATCH 3/4] sched/numa: Apply imbalance limitations consistently Mel Gorman
2022-05-18  9:24   ` [sched/numa] bb2dee337b: unixbench.score -11.2% regression kernel test robot
2022-05-18 15:22     ` Mel Gorman
2022-05-19  7:54       ` ying.huang
2022-05-20  6:44         ` [LKP] " Ying Huang
2022-05-18  9:31   ` [PATCH 3/4] sched/numa: Apply imbalance limitations consistently Peter Zijlstra
2022-05-18 10:46     ` Mel Gorman
2022-05-18 13:59       ` Peter Zijlstra
2022-05-18 15:39         ` Mel Gorman
2022-05-11 14:30 ` [PATCH 4/4] sched/numa: Adjust imb_numa_nr to a better approximation of memory channels Mel Gorman
2022-05-18  9:41   ` Peter Zijlstra
2022-05-18 11:15     ` Mel Gorman [this message]
2022-05-18 14:05       ` Peter Zijlstra
2022-05-18 17:06         ` Mel Gorman
2022-05-19  9:29           ` Mel Gorman
2022-05-20  4:58 ` [PATCH 0/4] Mitigate inconsistent NUMA imbalance behaviour K Prateek Nayak
2022-05-20 10:18   ` Mel Gorman
2022-05-20 15:17     ` K Prateek Nayak
2022-05-20 10:35 [PATCH v2 " Mel Gorman
2022-05-20 10:35 ` [PATCH 4/4] sched/numa: Adjust imb_numa_nr to a better approximation of memory channels Mel Gorman
