From: Peter Zijlstra <peterz@infradead.org>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Feng Tang <feng.tang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@surriel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Yang Shi <shy828301@gmail.com>, Zi Yan <ziy@nvidia.com>,
	Wei Xu <weixugc@google.com>, osalvador <osalvador@suse.de>,
	Shakeel Butt <shakeelb@google.com>,
	Hasan Al Maruf <hasanalmaruf@fb.com>
Subject: Re: [PATCH -V10 RESEND 0/6] NUMA balancing: optimize memory placement for memory tiering system
Date: Thu, 13 Jan 2022 14:00:39 +0100	[thread overview]
Message-ID: <YeAid+EXvmH9WAbq@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <87o84fu9f3.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Jan 13, 2022 at 08:06:40PM +0800, Huang, Ying wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> > On Thu, Jan 13, 2022 at 03:19:06PM +0800, Huang, Ying wrote:
> >> Peter Zijlstra <peterz@infradead.org> writes:
> >> > On Tue, Dec 07, 2021 at 10:27:51AM +0800, Huang Ying wrote:

> >> >> After commit c221c0b0308f ("device-dax: "Hotplug" persistent memory
> >> >> for use like normal RAM"), the PMEM could be used as the
> >> >> cost-effective volatile memory in separate NUMA nodes.  In a typical
> >> >> memory tiering system, there are CPUs, DRAM and PMEM in each physical
> >> >> NUMA node.  The CPUs and the DRAM will be put in one logical node,
> >> >> while the PMEM will be put in another (faked) logical node.
> >> >
> >> > So what does a system like that actually look like, SLIT table wise, and
> >> > how does that affect init_numa_topology_type() ?
> >> 
> >> The SLIT table is as follows:

<snip>

> >> node distances:
> >> node   0   1   2   3 
> >>   0:  10  21  17  28 
> >>   1:  21  10  28  17 
> >>   2:  17  28  10  28 
> >>   3:  28  17  28  10 
> >> 
> >> init_numa_topology_type() set sched_numa_topology_type to NUMA_DIRECT.
> >> 
> >> Node 0 and node 1 are onlined during boot, while the PMEM nodes,
> >> that is, node 2 and node 3, are onlined later, as shown in the
> >> following dmesg snippet.
> >
> > But how? sched_init_numa() scans the *whole* SLIT table to determine
> > nr_levels / sched_domains_numa_levels, even offline nodes. Therefore it
> > should find 4 distinct distance values and end up not selecting
> > NUMA_DIRECT.
> >
> > Similarly for the other types it uses for_each_online_node(), which
> > would include the pmem nodes once they've been onlined, but I'm thinking
> > we explicitly want to skip CPU-less nodes in that iteration.
> 
> I used the debug patch below, and got the following log in dmesg:
> 
> [    5.394577][    T1] sched_numa_topology_type: 0, levels: 4, max_distance: 28
> 
> I found that I forgot about another caller of init_numa_topology_type()
> that runs during hotplug.  I will add another printk() to show it.  Sorry
> about that.

Can you try with this on?

I suspect there's a problem with init_numa_topology_type(): since you said
the pmem nodes are not online yet, the _online_ clause in its iteration
means it will never find the max distance.

---
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index d201a7052a29..53ab9c63c185 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1756,6 +1756,8 @@ static void init_numa_topology_type(void)
 			return;
 		}
 	}
+
+	WARN(1, "no NUMA type determined");
 }
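
For reference, here is a small stand-alone model of the scan in question (a
simplified paraphrase of the init_numa_topology_type() logic, not the verbatim
kernel source; the array and variable names below are only illustrative, and
the kernel's early bail-out for topologies with at most two distance levels is
omitted since this system has four).  It uses the SLIT table quoted earlier and
marks only the DRAM nodes 0 and 1 as online.  The maximum distance (28) is
derived from the whole table, as sched_init_numa() does, but the online-only
scan between nodes 0 and 1 never sees a distance larger than 21, so no topology
type gets assigned and the zero-initialized default (NUMA_DIRECT) is left in
place -- consistent with the "sched_numa_topology_type: 0, levels: 4,
max_distance: 28" output above, and exactly the case the WARN() would flag.

#include <stdbool.h>
#include <stdio.h>

#define NR_NODES 4

/* SLIT table from the node distances quoted above. */
static const int slit[NR_NODES][NR_NODES] = {
	{ 10, 21, 17, 28 },
	{ 21, 10, 28, 17 },
	{ 17, 28, 10, 28 },
	{ 28, 17, 28, 10 },
};

/* Only the DRAM nodes are online at boot; the PMEM nodes come later. */
static const bool node_online[NR_NODES] = { true, true, false, false };

enum numa_type { NUMA_DIRECT, NUMA_GLUELESS_MESH, NUMA_BACKPLANE };

int main(void)
{
	enum numa_type type = NUMA_DIRECT;	/* zero-initialized default */
	bool assigned = false;
	int max_distance = 0;

	/* Like sched_init_numa(): the max distance comes from the whole table. */
	for (int a = 0; a < NR_NODES; a++)
		for (int b = 0; b < NR_NODES; b++)
			if (slit[a][b] > max_distance)
				max_distance = slit[a][b];

	/* The type selection only considers *online* nodes. */
	for (int a = 0; a < NR_NODES; a++) {
		if (!node_online[a])
			continue;
		for (int b = 0; b < NR_NODES; b++) {
			if (!node_online[b])
				continue;
			/* Look for a pair of nodes at the maximum distance. */
			if (slit[a][b] < max_distance)
				continue;

			/* Is there an intermediary node between a and b? */
			bool mesh = false;
			for (int c = 0; c < NR_NODES; c++) {
				if (!node_online[c])
					continue;
				if (slit[a][c] < max_distance &&
				    slit[b][c] < max_distance)
					mesh = true;
			}
			type = mesh ? NUMA_GLUELESS_MESH : NUMA_BACKPLANE;
			assigned = true;
		}
	}

	/* With only nodes 0 and 1 online, slit[a][b] never reaches 28. */
	printf("max_distance=%d, type assigned: %s, type=%d\n",
	       max_distance, assigned ? "yes" : "no", type);
	return 0;
}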

Thread overview: 22+ messages
2021-12-07  2:27 [PATCH -V10 RESEND 0/6] NUMA balancing: optimize memory placement for memory tiering system Huang Ying
2021-12-07  2:27 ` [PATCH -V10 RESEND 1/6] NUMA Balancing: add page promotion counter Huang Ying
2021-12-07  6:05   ` Hasan Al Maruf
2021-12-08  2:16     ` Huang, Ying
2021-12-17  7:25   ` Baolin Wang
2021-12-07  2:27 ` [PATCH -V10 RESEND 2/6] NUMA balancing: optimize page placement for memory tiering system Huang Ying
2021-12-07  6:36   ` Hasan Al Maruf
2021-12-08  3:16     ` Huang, Ying
2021-12-17  7:35   ` Baolin Wang
2021-12-07  2:27 ` [PATCH -V10 RESEND 3/6] memory tiering: skip to scan fast memory Huang Ying
2021-12-17  7:41   ` Baolin Wang
2021-12-07  2:27 ` [PATCH -V10 RESEND 4/6] memory tiering: hot page selection with hint page fault latency Huang Ying
2021-12-07  2:27 ` [PATCH -V10 RESEND 5/6] memory tiering: rate limit NUMA migration throughput Huang Ying
2021-12-07  2:27 ` [PATCH -V10 RESEND 6/6] memory tiering: adjust hot threshold automatically Huang Ying
2022-01-12 16:10 ` [PATCH -V10 RESEND 0/6] NUMA balancing: optimize memory placement for memory tiering system Peter Zijlstra
2022-01-13  7:19   ` Huang, Ying
2022-01-13  9:49     ` Peter Zijlstra
2022-01-13 12:06       ` Huang, Ying
2022-01-13 13:00         ` Peter Zijlstra [this message]
2022-01-13 13:13           ` Huang, Ying
2022-01-13 14:24           ` Huang, Ying
2022-01-14  5:24             ` Huang, Ying
