From: Tim Chen <tim.c.chen@linux.intel.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Tim Chen <tim.c.chen@linux.intel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Ying Huang <ying.huang@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	David Rientjes <rientjes@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 01/11] mm: Define top tier memory node mask
Date: Mon,  5 Apr 2021 10:08:25 -0700
Message-ID: <57544494cb67299fabfa01dd17885f7b6a4266bb.1617642417.git.tim.c.chen@linux.intel.com>
In-Reply-To: <cover.1617642417.git.tim.c.chen@linux.intel.com>

Traditionally, all RAM is DRAM.  Some DRAM might be closer/faster
than others, but a byte of media has about the same cost whether it
is close or far.  But, with new memory tiers such as High-Bandwidth
Memory or Persistent Memory, there is a choice between fast/expensive
and slow/cheap.

The fast/expensive memory lives in the top tier of the memory
hierarchy and it is a precious resource that needs to be accounted
for and managed on a memory cgroup basis.

Define top tier memory as the set of memory nodes that are not the
demotion target of any higher tier node.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 drivers/base/node.c      | 2 ++
 include/linux/nodemask.h | 1 +
 mm/memory_hotplug.c      | 3 +++
 mm/migrate.c             | 1 +
 mm/page_alloc.c          | 5 ++++-
 5 files changed, 11 insertions(+), 1 deletion(-)
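
Note (illustration only, not part of the patch): the definition in the
commit message can be sketched in plain userspace C. The sketch assumes
at most 64 nodes and uses hypothetical toy_* names that do not exist in
the kernel. It mirrors the approach of the diff below: every node starts
out marked top tier (like the NODE_MASK_ALL initializer), and a node
loses that mark once it is recorded as another node's demotion target
(like the node_clear_state() call in establish_migrate_target()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_NODES 64	/* assumption: small fixed node count */
#define TOY_NO_NODE   (-1)	/* stand-in for "no demotion target" */

/* Hypothetical userspace model of the N_TOPTIER mask; not kernel API. */
struct toy_tiers {
	uint64_t toptier_mask;			/* bit n set: node n is top tier */
	int demotion_target[TOY_MAX_NODES];	/* per-node demotion target */
};

/* Start with every node marked top tier and no demotion paths. */
static void toy_tiers_init(struct toy_tiers *t)
{
	t->toptier_mask = UINT64_MAX;
	for (int i = 0; i < TOY_MAX_NODES; i++)
		t->demotion_target[i] = TOY_NO_NODE;
}

/* Record a demotion path node -> target; the target leaves the top tier. */
static void toy_set_demotion_target(struct toy_tiers *t, int node, int target)
{
	assert(node >= 0 && node < TOY_MAX_NODES);
	assert(target >= 0 && target < TOY_MAX_NODES);
	assert(node != target);

	t->demotion_target[node] = target;
	t->toptier_mask &= ~(UINT64_C(1) << target);
}

/* A node is top tier as long as no demotion path leads into it. */
static bool toy_node_is_toptier(const struct toy_tiers *t, int node)
{
	assert(node >= 0 && node < TOY_MAX_NODES);
	return (t->toptier_mask >> node) & 1;
}
```

For example, with two hypothetical DRAM nodes 0 and 1 demoting to slower
nodes 2 and 3, nodes 0 and 1 stay in the mask while 2 and 3 are cleared.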

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 04f71c7bc3f8..9eb214ac331f 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -1016,6 +1016,7 @@ static struct node_attr node_state_attr[] = {
 #endif
 	[N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
 	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
+	[N_TOPTIER] = _NODE_ATTR(is_toptier, N_TOPTIER),
 	[N_GENERIC_INITIATOR] = _NODE_ATTR(has_generic_initiator,
 					   N_GENERIC_INITIATOR),
 };
@@ -1029,6 +1030,7 @@ static struct attribute *node_state_attrs[] = {
 #endif
 	&node_state_attr[N_MEMORY].attr.attr,
 	&node_state_attr[N_CPU].attr.attr,
+	&node_state_attr[N_TOPTIER].attr.attr,
 	&node_state_attr[N_GENERIC_INITIATOR].attr.attr,
 	NULL
 };
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index ac398e143c9a..3003401ed7f0 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -399,6 +399,7 @@ enum node_states {
 #endif
 	N_MEMORY,		/* The node has memory(regular, high, movable) */
 	N_CPU,		/* The node has one or more cpus */
+	N_TOPTIER,		/* Top tier node, no demotion path into node */
 	N_GENERIC_INITIATOR,	/* The node has one or more Generic Initiators */
 	NR_NODE_STATES
 };
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7550b88e2432..7b21560d4c4d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -36,6 +36,7 @@
 #include <linux/memblock.h>
 #include <linux/compaction.h>
 #include <linux/rmap.h>
+#include <linux/node.h>
 
 #include <asm/tlbflush.h>
 
@@ -654,6 +655,8 @@ static void node_states_set_node(int node, struct memory_notify *arg)
 
 	if (arg->status_change_nid >= 0)
 		node_set_state(node, N_MEMORY);
+
+	node_set_state(node, N_TOPTIER);
 }
 
 static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/migrate.c b/mm/migrate.c
index 72223fd7e623..e84aedf611da 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -3439,6 +3439,7 @@ static int establish_migrate_target(int node, nodemask_t *used)
 		return NUMA_NO_NODE;
 
 	node_demotion[node] = migration_target;
+	node_clear_state(migration_target, N_TOPTIER);
 
 	return migration_target;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ff058941ccfa..471a2c342c4f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -157,6 +157,7 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 	[N_MEMORY] = { { [0] = 1UL } },
 	[N_CPU] = { { [0] = 1UL } },
 #endif	/* NUMA */
+	[N_TOPTIER] = NODE_MASK_ALL,
 };
 EXPORT_SYMBOL(node_states);
 
@@ -7590,8 +7591,10 @@ void __init free_area_init(unsigned long *max_zone_pfn)
 		free_area_init_node(nid);
 
 		/* Any memory on that node */
-		if (pgdat->node_present_pages)
+		if (pgdat->node_present_pages) {
 			node_set_state(nid, N_MEMORY);
+			node_set_state(nid, N_TOPTIER);
+		}
 		check_for_memory(pgdat, nid);
 	}
 }
-- 
2.20.1

