* [RFC PATCH 0/3] Cgroup accounting of memory tier usage
@ 2022-06-14 22:25 Tim Chen
  2022-06-14 22:25 ` [RFC PATCH 1/3] mm/memory-tiers Add functions for tier memory usage in a cgroup Tim Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Tim Chen @ 2022-06-14 22:25 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Tim Chen, Wei Xu, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Michal Hocko,
	Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
	Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
	Jagdish Gediya, Baolin Wang, David Rientjes, Aneesh Kumar K . V,
	Shakeel Butt

For controlling usage of top tier memory by a cgroup, accounting
of top tier memory usage is needed.  This patch set implements the
following:

Patch 1 introduces interface and simple implementation to retrieve
	cgroup tiered memory usage
Patch 2 introduces more efficient accounting with top tier memory page counter 
Patch 3 provides a sysfs interface to report the top tier memory
	usage.

The patchset works with Aneesh's v6 memory-tiering implementation [1].
It is a preparatory patch set before introducing features to
control top tiered memory in cgroups.

I'd like to first get feedback to see if
(1) Controlling the topmost memory tier is enough
or
(2) Multiple tiers at the top levels need to be grouped into "toptier"
or
(3) There are use cases not covered by (1) and (2). 

Thanks.

Tim

[1] https://lore.kernel.org/linux-mm/20220610135229.182859-1-aneesh.kumar@linux.ibm.com/ 

Tim Chen (3):
  mm/memory-tiers Add functions for tier memory usage in a cgroup
  mm/memory-tiers: Use page counter to track toptier memory usage
  mm/memory-tiers: Show toptier memory usage for cgroup

 include/linux/memcontrol.h   |  1 +
 include/linux/memory-tiers.h |  2 +
 mm/memcontrol.c              | 86 +++++++++++++++++++++++++++++++++++-
 mm/memory-tiers.c            |  3 +-
 4 files changed, 89 insertions(+), 3 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH 1/3] mm/memory-tiers Add functions for tier memory usage in a cgroup
  2022-06-14 22:25 [RFC PATCH 0/3] Cgroup accounting of memory tier usage Tim Chen
@ 2022-06-14 22:25 ` Tim Chen
  2022-06-21  4:18   ` Aneesh Kumar K.V
  2022-06-14 22:25 ` [RFC PATCH 2/3] mm/memory-tiers: Use page counter to track toptier memory usage Tim Chen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Tim Chen @ 2022-06-14 22:25 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Tim Chen, Wei Xu, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Michal Hocko,
	Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
	Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
	Jagdish Gediya, Baolin Wang, David Rientjes, Aneesh Kumar K . V,
	Shakeel Butt

Add functions to provide tier-based memory usage.  This is in preparation
for queries via sysfs and for controlling a cgroup's top tier memory usage.

This patch introduces the tiered memory usage query interface and a
simple implementation.  A more efficient implementation to get toptier
memory usage will be introduced in the next patch.
---
 include/linux/memory-tiers.h |  2 ++
 mm/memcontrol.c              | 35 +++++++++++++++++++++++++++++++++++
 mm/memory-tiers.c            |  3 ++-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index de4098f6d5d5..1177dcbbdeda 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -31,6 +31,8 @@ struct memory_tier {
 };
 
 extern bool numa_demotion_enabled;
+extern struct list_head memory_tiers;
+
 int node_create_and_set_memory_tier(int node, int tier);
 int next_demotion_node(int node);
 int node_set_memory_tier(int node, int tier);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe6..2f6e95e6d200 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include <linux/resume_user_mode.h>
 #include <linux/psi.h>
 #include <linux/seq_buf.h>
+#include <linux/memory-tiers.h>
 #include "internal.h"
 #include <net/sock.h>
 #include <net/ip.h>
@@ -3921,6 +3922,40 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
 
 	return 0;
 }
+
+unsigned long mem_cgroup_memtier_usage(struct mem_cgroup *memcg,
+					struct memory_tier *memtier)
+{
+	int node;
+	struct memory_tier *node_tier;
+	unsigned long usage = 0;
+
+	if (!memcg)
+		return 0;
+
+	rcu_read_lock();
+	for_each_online_node(node) {
+		node_tier = node_get_memory_tier(node);
+		if (node_tier == memtier)
+			usage += mem_cgroup_node_nr_lru_pages(memcg, node,
+					LRU_ALL, true);
+		node_put_memory_tier(node_tier);
+	}
+	rcu_read_unlock();
+	return usage;
+}
+
+unsigned long mem_cgroup_toptier_usage(struct mem_cgroup *memcg)
+{
+	struct memory_tier *top_tier;
+
+	top_tier = list_first_entry(&memory_tiers, struct memory_tier, list);
+	if (top_tier)
+		return mem_cgroup_memtier_usage(memcg, top_tier);
+	else
+		return 0;
+}
+
 #endif /* CONFIG_NUMA */
 
 static const unsigned int memcg1_stats[] = {
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0dae3114e22c..d552ac1e9d57 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -16,7 +16,8 @@ struct demotion_nodes {
 #define to_memory_tier(device) container_of(device, struct memory_tier, dev)
 static void establish_migration_targets(void);
 static DEFINE_MUTEX(memory_tier_lock);
-static LIST_HEAD(memory_tiers);
+LIST_HEAD(memory_tiers);
+EXPORT_SYMBOL(memory_tiers);
 static int top_tier_rank;
 /*
  * node_demotion[] examples:
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 2/3] mm/memory-tiers: Use page counter to track toptier memory usage
  2022-06-14 22:25 [RFC PATCH 0/3] Cgroup accounting of memory tier usage Tim Chen
  2022-06-14 22:25 ` [RFC PATCH 1/3] mm/memory-tiers Add functions for tier memory usage in a cgroup Tim Chen
@ 2022-06-14 22:25 ` Tim Chen
  2022-06-15  0:27   ` Wei Xu
  2022-06-15  0:30   ` Wei Xu
  2022-06-14 22:25 ` [RFC PATCH 3/3] mm/memory-tiers: Show toptier memory usage for cgroup Tim Chen
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Tim Chen @ 2022-06-14 22:25 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Tim Chen, Wei Xu, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Michal Hocko,
	Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
	Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
	Jagdish Gediya, Baolin Wang, David Rientjes, Aneesh Kumar K . V,
	Shakeel Butt

If we need to restrict toptier memory usage for a cgroup,
we need to retrieve usage of toptier memory efficiently.
Add a page counter to track toptier memory usage directly
so its value can be returned right away.
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 50 ++++++++++++++++++++++++++++++++------
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9ecead1042b9..b4f727cba1de 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -241,6 +241,7 @@ struct mem_cgroup {
 
 	/* Accounted resources */
 	struct page_counter memory;		/* Both v1 & v2 */
+	struct page_counter toptier;
 
 	union {
 		struct page_counter swap;	/* v2 only */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2f6e95e6d200..2f20ec2712b8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -848,6 +848,23 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 	__this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages);
 }
 
+static inline void mem_cgroup_charge_toptier(struct mem_cgroup *memcg,
+					     int nid,
+					     int nr_pages)
+{
+	if (!node_is_toptier(nid) || !memcg)
+		return;
+
+	if (nr_pages >= 0) {
+		page_counter_charge(&memcg->toptier,
+				(unsigned long) nr_pages);
+	} else {
+		nr_pages = -nr_pages;
+		page_counter_uncharge(&memcg->toptier,
+				(unsigned long) nr_pages);
+	}
+}
+
 static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 				       enum mem_cgroup_events_target target)
 {
@@ -3027,6 +3044,8 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 		if (!ret) {
 			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
+			mem_cgroup_charge_toptier(page_memcg(page),
+					page_to_nid(page), 1 << order);
 			return 0;
 		}
 		obj_cgroup_put(objcg);
@@ -3050,6 +3069,8 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 
 	objcg = __folio_objcg(folio);
 	obj_cgroup_uncharge_pages(objcg, nr_pages);
+	mem_cgroup_charge_toptier(page_memcg(page),
+			page_to_nid(page), -nr_pages);
 	folio->memcg_data = 0;
 	obj_cgroup_put(objcg);
 }
@@ -3947,13 +3968,10 @@ unsigned long mem_cgroup_memtier_usage(struct mem_cgroup *memcg,
 
 unsigned long mem_cgroup_toptier_usage(struct mem_cgroup *memcg)
 {
-	struct memory_tier *top_tier;
-
-	top_tier = list_first_entry(&memory_tiers, struct memory_tier, list);
-	if (top_tier)
-		return mem_cgroup_memtier_usage(memcg, top_tier);
-	else
+	if (!memcg)
 		return 0;
+
+	return page_counter_read(&memcg->toptier);
 }
 
 #endif /* CONFIG_NUMA */
@@ -5228,11 +5246,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		memcg->oom_kill_disable = parent->oom_kill_disable;
 
 		page_counter_init(&memcg->memory, &parent->memory);
+		page_counter_init(&memcg->toptier, &parent->toptier);
 		page_counter_init(&memcg->swap, &parent->swap);
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
 	} else {
 		page_counter_init(&memcg->memory, NULL);
+		page_counter_init(&memcg->toptier, NULL);
 		page_counter_init(&memcg->swap, NULL);
 		page_counter_init(&memcg->kmem, NULL);
 		page_counter_init(&memcg->tcpmem, NULL);
@@ -5678,6 +5698,8 @@ static int mem_cgroup_move_account(struct page *page,
 	memcg_check_events(to, nid);
 	mem_cgroup_charge_statistics(from, -nr_pages);
 	memcg_check_events(from, nid);
+	mem_cgroup_charge_toptier(to, nid, nr_pages);
+	mem_cgroup_charge_toptier(from, nid, -nr_pages);
 	local_irq_enable();
 out_unlock:
 	folio_unlock(folio);
@@ -6761,6 +6783,7 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
 
 	local_irq_disable();
 	mem_cgroup_charge_statistics(memcg, nr_pages);
+	mem_cgroup_charge_toptier(memcg, folio_nid(folio), nr_pages);
 	memcg_check_events(memcg, folio_nid(folio));
 	local_irq_enable();
 out:
@@ -6853,6 +6876,7 @@ struct uncharge_gather {
 	unsigned long nr_memory;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
+	unsigned long nr_toptier;
 	int nid;
 };
 
@@ -6867,6 +6891,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 
 	if (ug->nr_memory) {
 		page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
+		page_counter_uncharge(&ug->memcg->toptier, ug->nr_toptier);
 		if (do_memsw_account())
 			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
 		if (ug->nr_kmem)
@@ -6929,12 +6954,18 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 		ug->nr_memory += nr_pages;
 		ug->nr_kmem += nr_pages;
 
+		if (node_is_toptier(folio_nid(folio)))
+			ug->nr_toptier += nr_pages;
+
 		folio->memcg_data = 0;
 		obj_cgroup_put(objcg);
 	} else {
 		/* LRU pages aren't accounted at the root level */
-		if (!mem_cgroup_is_root(memcg))
+		if (!mem_cgroup_is_root(memcg)) {
 			ug->nr_memory += nr_pages;
+			if (node_is_toptier(folio_nid(folio)))
+				ug->nr_toptier += nr_pages;
+		}
 		ug->pgpgout++;
 
 		folio->memcg_data = 0;
@@ -7011,6 +7042,7 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
 	/* Force-charge the new page. The old one will be freed soon */
 	if (!mem_cgroup_is_root(memcg)) {
 		page_counter_charge(&memcg->memory, nr_pages);
+		mem_cgroup_charge_toptier(memcg, folio_nid(new), nr_pages);
 		if (do_memsw_account())
 			page_counter_charge(&memcg->memsw, nr_pages);
 	}
@@ -7231,8 +7263,10 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
 
 	folio->memcg_data = 0;
 
-	if (!mem_cgroup_is_root(memcg))
+	if (!mem_cgroup_is_root(memcg)) {
 		page_counter_uncharge(&memcg->memory, nr_entries);
+		mem_cgroup_charge_toptier(memcg, folio_nid(folio), -nr_entries);
+	}
 
 	if (!cgroup_memory_noswap && memcg != swap_memcg) {
 		if (!mem_cgroup_is_root(swap_memcg))
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 3/3] mm/memory-tiers: Show toptier memory usage for cgroup
  2022-06-14 22:25 [RFC PATCH 0/3] Cgroup accounting of memory tier usage Tim Chen
  2022-06-14 22:25 ` [RFC PATCH 1/3] mm/memory-tiers Add functions for tier memory usage in a cgroup Tim Chen
  2022-06-14 22:25 ` [RFC PATCH 2/3] mm/memory-tiers: Use page counter to track toptier memory usage Tim Chen
@ 2022-06-14 22:25 ` Tim Chen
  2022-06-15  4:58 ` [RFC PATCH 0/3] Cgroup accounting of memory tier usage Ying Huang
  2022-06-15 11:11 ` Michal Hocko
  4 siblings, 0 replies; 14+ messages in thread
From: Tim Chen @ 2022-06-14 22:25 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Tim Chen, Wei Xu, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Michal Hocko,
	Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
	Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
	Jagdish Gediya, Baolin Wang, David Rientjes, Aneesh Kumar K . V,
	Shakeel Butt

Show toptier memory usage for a cgroup via sysfs:

/sys/fs/cgroup/<group>/memory.toptier
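
For illustration, a minimal userspace sketch that reads this file and prints
the usage in bytes (it assumes cgroup v2 mounted at /sys/fs/cgroup; the cgroup
name below is a placeholder):

#include <stdio.h>

int main(void)
{
	unsigned long long bytes;
	FILE *f = fopen("/sys/fs/cgroup/mygroup/memory.toptier", "r");

	/* The file holds a single u64: toptier usage in bytes. */
	if (!f || fscanf(f, "%llu", &bytes) != 1) {
		perror("memory.toptier");
		return 1;
	}
	fclose(f);
	printf("toptier usage: %llu bytes\n", bytes);
	return 0;
}
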
---
 mm/memcontrol.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2f20ec2712b8..5fd1e3b686cd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6205,6 +6205,14 @@ static u64 memory_current_read(struct cgroup_subsys_state *css,
 	return (u64)page_counter_read(&memcg->memory) * PAGE_SIZE;
 }
 
+static u64 memory_toptier_read(struct cgroup_subsys_state *css,
+			       struct cftype *cft)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	return (u64)mem_cgroup_toptier_usage(memcg) * PAGE_SIZE;
+}
+
 static u64 memory_peak_read(struct cgroup_subsys_state *css,
 			    struct cftype *cft)
 {
@@ -6516,6 +6524,11 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = memory_current_read,
 	},
+	{
+		.name = "toptier",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = memory_toptier_read,
+	},
 	{
 		.name = "peak",
 		.flags = CFTYPE_NOT_ON_ROOT,
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 2/3] mm/memory-tiers: Use page counter to track toptier memory usage
  2022-06-14 22:25 ` [RFC PATCH 2/3] mm/memory-tiers: Use page counter to track toptier memory usage Tim Chen
@ 2022-06-15  0:27   ` Wei Xu
  2022-06-15  0:30   ` Wei Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Wei Xu @ 2022-06-15  0:27 UTC (permalink / raw)
  To: Tim Chen
  Cc: Linux MM, Andrew Morton, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Michal Hocko,
	Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
	Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
	Jagdish Gediya, Baolin Wang, David Rientjes, Aneesh Kumar K . V,
	Shakeel Butt

On Tue, Jun 14, 2022 at 3:26 PM Tim Chen <tim.c.chen@linux.intel.com> wrote:

> If we need to restrict toptier memory usage for a cgroup,
> we need to retrieve usage of toptier memory efficiently.
> Add a page counter to track toptier memory usage directly
> so its value can be returned right away.
> ---
>  include/linux/memcontrol.h |  1 +
>  mm/memcontrol.c            | 50 ++++++++++++++++++++++++++++++++------
>  2 files changed, 43 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 9ecead1042b9..b4f727cba1de 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -241,6 +241,7 @@ struct mem_cgroup {
>
>         /* Accounted resources */
>         struct page_counter memory;             /* Both v1 & v2 */
> +       struct page_counter toptier;
>
>         union {
>                 struct page_counter swap;       /* v2 only */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2f6e95e6d200..2f20ec2712b8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -848,6 +848,23 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
>         __this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages);
>  }
>
> +static inline void mem_cgroup_charge_toptier(struct mem_cgroup *memcg,
> +                                            int nid,
> +                                            int nr_pages)
> +{
> +       if (!node_is_toptier(nid) || !memcg)
> +               return;
> +
> +       if (nr_pages >= 0) {
> +               page_counter_charge(&memcg->toptier,
> +                               (unsigned long) nr_pages);
> +       } else {
> +               nr_pages = -nr_pages;
> +               page_counter_uncharge(&memcg->toptier,
> +                               (unsigned long) nr_pages);
> +       }
> +}
> +
>

When we don't know which pages are being charged, we should still charge
the usage to toptier (assuming that toptier always includes the default
tier), e.g. from try_charge_memcg().

The idea is that when lower tier memory is not used, memcg->toptier and
memcg->memory should have the same value. Otherwise, it can cause
confusion about where the pages of (memcg->memory - memcg->toptier) go.

>  static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
>                                        enum mem_cgroup_events_target target)
>  {
> @@ -3027,6 +3044,8 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
>                 if (!ret) {
>                         page->memcg_data = (unsigned long)objcg |
>                                 MEMCG_DATA_KMEM;
> +                       mem_cgroup_charge_toptier(page_memcg(page),
> +                                       page_to_nid(page), 1 << order);
>                         return 0;
>                 }
>                 obj_cgroup_put(objcg);
> @@ -3050,6 +3069,8 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
>
>         objcg = __folio_objcg(folio);
>         obj_cgroup_uncharge_pages(objcg, nr_pages);
> +       mem_cgroup_charge_toptier(page_memcg(page),
> +                       page_to_nid(page), -nr_pages);
>         folio->memcg_data = 0;
>         obj_cgroup_put(objcg);
>  }
> @@ -3947,13 +3968,10 @@ unsigned long mem_cgroup_memtier_usage(struct mem_cgroup *memcg,
>
>  unsigned long mem_cgroup_toptier_usage(struct mem_cgroup *memcg)
>  {
> -       struct memory_tier *top_tier;
> -
> -       top_tier = list_first_entry(&memory_tiers, struct memory_tier, list);
> -       if (top_tier)
> -               return mem_cgroup_memtier_usage(memcg, top_tier);
> -       else
> +       if (!memcg)
>                 return 0;
> +
> +       return page_counter_read(&memcg->toptier);
>  }
>
>  #endif /* CONFIG_NUMA */
> @@ -5228,11 +5246,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>                 memcg->oom_kill_disable = parent->oom_kill_disable;
>
>                 page_counter_init(&memcg->memory, &parent->memory);
> +               page_counter_init(&memcg->toptier, &parent->toptier);
>                 page_counter_init(&memcg->swap, &parent->swap);
>                 page_counter_init(&memcg->kmem, &parent->kmem);
>                 page_counter_init(&memcg->tcpmem, &parent->tcpmem);
>         } else {
>                 page_counter_init(&memcg->memory, NULL);
> +               page_counter_init(&memcg->toptier, NULL);
>                 page_counter_init(&memcg->swap, NULL);
>                 page_counter_init(&memcg->kmem, NULL);
>                 page_counter_init(&memcg->tcpmem, NULL);
> @@ -5678,6 +5698,8 @@ static int mem_cgroup_move_account(struct page *page,
>         memcg_check_events(to, nid);
>         mem_cgroup_charge_statistics(from, -nr_pages);
>         memcg_check_events(from, nid);
> +       mem_cgroup_charge_toptier(to, nid, nr_pages);
> +       mem_cgroup_charge_toptier(from, nid, -nr_pages);
>         local_irq_enable();
>  out_unlock:
>         folio_unlock(folio);
> @@ -6761,6 +6783,7 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
>
>         local_irq_disable();
>         mem_cgroup_charge_statistics(memcg, nr_pages);
> +       mem_cgroup_charge_toptier(memcg, folio_nid(folio), nr_pages);
>         memcg_check_events(memcg, folio_nid(folio));
>         local_irq_enable();
>  out:
> @@ -6853,6 +6876,7 @@ struct uncharge_gather {
>         unsigned long nr_memory;
>         unsigned long pgpgout;
>         unsigned long nr_kmem;
> +       unsigned long nr_toptier;
>         int nid;
>  };
>
> @@ -6867,6 +6891,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>
>         if (ug->nr_memory) {
>                 page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
> +               page_counter_uncharge(&ug->memcg->toptier, ug->nr_toptier);
>                 if (do_memsw_account())
>                         page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
>                 if (ug->nr_kmem)
> @@ -6929,12 +6954,18 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
>                 ug->nr_memory += nr_pages;
>                 ug->nr_kmem += nr_pages;
>
> +               if (node_is_toptier(folio_nid(folio)))
> +                       ug->nr_toptier += nr_pages;
> +
>                 folio->memcg_data = 0;
>                 obj_cgroup_put(objcg);
>         } else {
>                 /* LRU pages aren't accounted at the root level */
> -               if (!mem_cgroup_is_root(memcg))
> +               if (!mem_cgroup_is_root(memcg)) {
>                         ug->nr_memory += nr_pages;
> +                       if (node_is_toptier(folio_nid(folio)))
> +                               ug->nr_toptier += nr_pages;
> +               }
>                 ug->pgpgout++;
>
>                 folio->memcg_data = 0;
> @@ -7011,6 +7042,7 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
>         /* Force-charge the new page. The old one will be freed soon */
>         if (!mem_cgroup_is_root(memcg)) {
>                 page_counter_charge(&memcg->memory, nr_pages);
> +               mem_cgroup_charge_toptier(memcg, folio_nid(new), nr_pages);
>                 if (do_memsw_account())
>                         page_counter_charge(&memcg->memsw, nr_pages);
>         }
> @@ -7231,8 +7263,10 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
>
>         folio->memcg_data = 0;
>
> -       if (!mem_cgroup_is_root(memcg))
> +       if (!mem_cgroup_is_root(memcg)) {
>                 page_counter_uncharge(&memcg->memory, nr_entries);
> +               mem_cgroup_charge_toptier(memcg, folio_nid(folio), -nr_entries);
> +       }
>
>         if (!cgroup_memory_noswap && memcg != swap_memcg) {
>                 if (!mem_cgroup_is_root(swap_memcg))
> --
> 2.35.1
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 2/3] mm/memory-tiers: Use page counter to track toptier memory usage
  2022-06-14 22:25 ` [RFC PATCH 2/3] mm/memory-tiers: Use page counter to track toptier memory usage Tim Chen
  2022-06-15  0:27   ` Wei Xu
@ 2022-06-15  0:30   ` Wei Xu
  2022-06-16  4:12     ` Tim Chen
  1 sibling, 1 reply; 14+ messages in thread
From: Wei Xu @ 2022-06-15  0:30 UTC (permalink / raw)
  To: Tim Chen
  Cc: Linux MM, Andrew Morton, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Michal Hocko,
	Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
	Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
	Jagdish Gediya, Baolin Wang, David Rientjes, Aneesh Kumar K . V,
	Shakeel Butt

(Resend in plain text. Sorry.)

On Tue, Jun 14, 2022 at 3:26 PM Tim Chen <tim.c.chen@linux.intel.com> wrote:
>
> If we need to restrict toptier memory usage for a cgroup,
> we need to retrieve usage of toptier memory efficiently.
> Add a page counter to track toptier memory usage directly
> so its value can be returned right away.
> ---
>  include/linux/memcontrol.h |  1 +
>  mm/memcontrol.c            | 50 ++++++++++++++++++++++++++++++++------
>  2 files changed, 43 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 9ecead1042b9..b4f727cba1de 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -241,6 +241,7 @@ struct mem_cgroup {
>
>         /* Accounted resources */
>         struct page_counter memory;             /* Both v1 & v2 */
> +       struct page_counter toptier;
>
>         union {
>                 struct page_counter swap;       /* v2 only */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2f6e95e6d200..2f20ec2712b8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -848,6 +848,23 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
>         __this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages);
>  }
>
> +static inline void mem_cgroup_charge_toptier(struct mem_cgroup *memcg,
> +                                            int nid,
> +                                            int nr_pages)
> +{
> +       if (!node_is_toptier(nid) || !memcg)
> +               return;
> +
> +       if (nr_pages >= 0) {
> +               page_counter_charge(&memcg->toptier,
> +                               (unsigned long) nr_pages);
> +       } else {
> +               nr_pages = -nr_pages;
> +               page_counter_uncharge(&memcg->toptier,
> +                               (unsigned long) nr_pages);
> +       }
> +}

When we don't know which pages are being charged, we should still
charge the usage to toptier (assuming that toptier always includes the
default tier), e.g. from try_charge_memcg().

The idea is that when lower tier memory is not used, memcg->toptier
and memcg->memory should have the same value. Otherwise, it can cause
confusion about where the pages of (memcg->memory - memcg->toptier)
go.
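
A minimal sketch of that suggestion (illustrative only, not code from this
series): charge the toptier counter together with memcg->memory before the
backing node is known, so that memcg->toptier == memcg->memory whenever no
lower tier memory is in use.  The helper name below is hypothetical.

/*
 * Hypothetical helper, called from try_charge_memcg() right after
 * memcg->memory is charged: account the pages to toptier provisionally.
 * A later placement/demotion path would then move the charge out of
 * toptier once a page actually lands on a non-toptier node.
 */
static void charge_toptier_provisional(struct mem_cgroup *memcg,
				       unsigned int nr_pages)
{
	if (!memcg)
		return;
	page_counter_charge(&memcg->toptier, nr_pages);
}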

>  static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
>                                        enum mem_cgroup_events_target target)
>  {
> @@ -3027,6 +3044,8 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
>                 if (!ret) {
>                         page->memcg_data = (unsigned long)objcg |
>                                 MEMCG_DATA_KMEM;
> +                       mem_cgroup_charge_toptier(page_memcg(page),
> +                                       page_to_nid(page), 1 << order);
>                         return 0;
>                 }
>                 obj_cgroup_put(objcg);
> @@ -3050,6 +3069,8 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
>
>         objcg = __folio_objcg(folio);
>         obj_cgroup_uncharge_pages(objcg, nr_pages);
> +       mem_cgroup_charge_toptier(page_memcg(page),
> +                       page_to_nid(page), -nr_pages);
>         folio->memcg_data = 0;
>         obj_cgroup_put(objcg);
>  }
> @@ -3947,13 +3968,10 @@ unsigned long mem_cgroup_memtier_usage(struct mem_cgroup *memcg,
>
>  unsigned long mem_cgroup_toptier_usage(struct mem_cgroup *memcg)
>  {
> -       struct memory_tier *top_tier;
> -
> -       top_tier = list_first_entry(&memory_tiers, struct memory_tier, list);
> -       if (top_tier)
> -               return mem_cgroup_memtier_usage(memcg, top_tier);
> -       else
> +       if (!memcg)
>                 return 0;
> +
> +       return page_counter_read(&memcg->toptier);
>  }
>
>  #endif /* CONFIG_NUMA */
> @@ -5228,11 +5246,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>                 memcg->oom_kill_disable = parent->oom_kill_disable;
>
>                 page_counter_init(&memcg->memory, &parent->memory);
> +               page_counter_init(&memcg->toptier, &parent->toptier);
>                 page_counter_init(&memcg->swap, &parent->swap);
>                 page_counter_init(&memcg->kmem, &parent->kmem);
>                 page_counter_init(&memcg->tcpmem, &parent->tcpmem);
>         } else {
>                 page_counter_init(&memcg->memory, NULL);
> +               page_counter_init(&memcg->toptier, NULL);
>                 page_counter_init(&memcg->swap, NULL);
>                 page_counter_init(&memcg->kmem, NULL);
>                 page_counter_init(&memcg->tcpmem, NULL);
> @@ -5678,6 +5698,8 @@ static int mem_cgroup_move_account(struct page *page,
>         memcg_check_events(to, nid);
>         mem_cgroup_charge_statistics(from, -nr_pages);
>         memcg_check_events(from, nid);
> +       mem_cgroup_charge_toptier(to, nid, nr_pages);
> +       mem_cgroup_charge_toptier(from, nid, -nr_pages);
>         local_irq_enable();
>  out_unlock:
>         folio_unlock(folio);
> @@ -6761,6 +6783,7 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
>
>         local_irq_disable();
>         mem_cgroup_charge_statistics(memcg, nr_pages);
> +       mem_cgroup_charge_toptier(memcg, folio_nid(folio), nr_pages);
>         memcg_check_events(memcg, folio_nid(folio));
>         local_irq_enable();
>  out:
> @@ -6853,6 +6876,7 @@ struct uncharge_gather {
>         unsigned long nr_memory;
>         unsigned long pgpgout;
>         unsigned long nr_kmem;
> +       unsigned long nr_toptier;
>         int nid;
>  };
>
> @@ -6867,6 +6891,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>
>         if (ug->nr_memory) {
>                 page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
> +               page_counter_uncharge(&ug->memcg->toptier, ug->nr_toptier);
>                 if (do_memsw_account())
>                         page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
>                 if (ug->nr_kmem)
> @@ -6929,12 +6954,18 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
>                 ug->nr_memory += nr_pages;
>                 ug->nr_kmem += nr_pages;
>
> +               if (node_is_toptier(folio_nid(folio)))
> +                       ug->nr_toptier += nr_pages;
> +
>                 folio->memcg_data = 0;
>                 obj_cgroup_put(objcg);
>         } else {
>                 /* LRU pages aren't accounted at the root level */
> -               if (!mem_cgroup_is_root(memcg))
> +               if (!mem_cgroup_is_root(memcg)) {
>                         ug->nr_memory += nr_pages;
> +                       if (node_is_toptier(folio_nid(folio)))
> +                               ug->nr_toptier += nr_pages;
> +               }
>                 ug->pgpgout++;
>
>                 folio->memcg_data = 0;
> @@ -7011,6 +7042,7 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
>         /* Force-charge the new page. The old one will be freed soon */
>         if (!mem_cgroup_is_root(memcg)) {
>                 page_counter_charge(&memcg->memory, nr_pages);
> +               mem_cgroup_charge_toptier(memcg, folio_nid(new), nr_pages);
>                 if (do_memsw_account())
>                         page_counter_charge(&memcg->memsw, nr_pages);
>         }
> @@ -7231,8 +7263,10 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
>
>         folio->memcg_data = 0;
>
> -       if (!mem_cgroup_is_root(memcg))
> +       if (!mem_cgroup_is_root(memcg)) {
>                 page_counter_uncharge(&memcg->memory, nr_entries);
> +               mem_cgroup_charge_toptier(memcg, folio_nid(folio), -nr_entries);
> +       }
>
>         if (!cgroup_memory_noswap && memcg != swap_memcg) {
>                 if (!mem_cgroup_is_root(swap_memcg))
> --
> 2.35.1
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/3] Cgroup accounting of memory tier usage
  2022-06-14 22:25 [RFC PATCH 0/3] Cgroup accounting of memory tier usage Tim Chen
                   ` (2 preceding siblings ...)
  2022-06-14 22:25 ` [RFC PATCH 3/3] mm/memory-tiers: Show toptier memory usage for cgroup Tim Chen
@ 2022-06-15  4:58 ` Ying Huang
  2022-06-15 17:47   ` Tim Chen
  2022-06-15 11:11 ` Michal Hocko
  4 siblings, 1 reply; 14+ messages in thread
From: Ying Huang @ 2022-06-15  4:58 UTC (permalink / raw)
  To: Tim Chen, linux-mm, akpm
  Cc: Wei Xu, Greg Thelen, Yang Shi, Davidlohr Bueso, Brice Goglin,
	Michal Hocko, Linux Kernel Mailing List, Hesham Almatary,
	Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams,
	Feng Tang, Jagdish Gediya, Baolin Wang, David Rientjes,
	Aneesh Kumar K . V, Shakeel Butt

On Tue, 2022-06-14 at 15:25 -0700, Tim Chen wrote:
> For controlling usage of top tier memory by a cgroup, accounting
> of top tier memory usage is needed.  This patch set implements the
> following:
> 
> Patch 1 introduces interface and simple implementation to retrieve
> 	cgroup tiered memory usage
> Patch 2 introduces more efficient accounting with top tier memory page counter 
> Patch 3 provides a sysfs interface to report the top tier memory
> 	usage.
> 
> The patchset works with Aneesh's v6 memory-tiering implementation [1].
> It is a preparatory patch set before introducing features to
> control top tiered memory in cgroups.
> 
> I'd like to first get feedback to see if
> (1) Controlling the topmost memory tier is enough
> or
> (2) Multiple tiers at the top levels need to be grouped into "toptier"
> or

If we combine the top N tiers, I think a better name could be "fast-tier",
in contrast to "slow-tier".

> (3) There are use cases not covered by (1) and (2). 

Is it necessary to control memory usage of each tier (except the
lowest/slowest)?  I am not the right person to answer the question, but
I want to ask it.

Best Regards,
Huang, Ying

> Thanks.
> 
> Tim
> 
> [1] https://lore.kernel.org/linux-mm/20220610135229.182859-1-aneesh.kumar@linux.ibm.com/ 
> 
> Tim Chen (3):
>   mm/memory-tiers Add functions for tier memory usage in a cgroup
>   mm/memory-tiers: Use page counter to track toptier memory usage
>   mm/memory-tiers: Show toptier memory usage for cgroup
> 
>  include/linux/memcontrol.h   |  1 +
>  include/linux/memory-tiers.h |  2 +
>  mm/memcontrol.c              | 86 +++++++++++++++++++++++++++++++++++-
>  mm/memory-tiers.c            |  3 +-
>  4 files changed, 89 insertions(+), 3 deletions(-)
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/3] Cgroup accounting of memory tier usage
  2022-06-14 22:25 [RFC PATCH 0/3] Cgroup accounting of memory tier usage Tim Chen
                   ` (3 preceding siblings ...)
  2022-06-15  4:58 ` [RFC PATCH 0/3] Cgroup accounting of memory tier usage Ying Huang
@ 2022-06-15 11:11 ` Michal Hocko
  2022-06-15 15:23   ` Tim Chen
  4 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2022-06-15 11:11 UTC (permalink / raw)
  To: Tim Chen
  Cc: linux-mm, akpm, Wei Xu, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Linux Kernel Mailing List,
	Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple,
	Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang,
	David Rientjes, Aneesh Kumar K . V, Shakeel Butt

On Tue 14-06-22 15:25:32, Tim Chen wrote:
> For controlling usage of top tier memory by a cgroup, accounting
> of top tier memory usage is needed.  This patch set implements the
> following:
> 
> Patch 1 introduces interface and simple implementation to retrieve
> 	cgroup tiered memory usage
> Patch 2 introduces more efficient accounting with top tier memory page counter 
> Patch 3 provides a sysfs interface to report the top tier memory
> 	usage.

I guess you meant cgroupfs here, right?

> The patchset works with Aneesh's v6 memory-tiering implementation [1].
> It is a preparatory patch set before introducing features to
> control top tiered memory in cgroups.
> 
> I'd like to first get feedback to see if
> (1) Controlling the topmost memory tier is enough
> or
> (2) Multiple tiers at the top levels need to be grouped into "toptier"
> or
> (3) There are use cases not covered by (1) and (2). 

I would start by asking why we need a dedicated interface in the
first place. Why is the existing numa_stat not a proper interface? Right
now we only report LRU per-node stats. Is this insufficient?
What is userspace expected to do based on the reported data?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/3] Cgroup accounting of memory tier usage
  2022-06-15 11:11 ` Michal Hocko
@ 2022-06-15 15:23   ` Tim Chen
  2022-06-15 15:59     ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Tim Chen @ 2022-06-15 15:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, akpm, Wei Xu, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Linux Kernel Mailing List,
	Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple,
	Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang,
	David Rientjes, Aneesh Kumar K . V, Shakeel Butt

On Wed, 2022-06-15 at 13:11 +0200, Michal Hocko wrote:
> On Tue 14-06-22 15:25:32, Tim Chen wrote:
> > For controlling usage of top tier memory by a cgroup, accounting
> > of top tier memory usage is needed.  This patch set implements the
> > following:
> > 
> > Patch 1 introduces interface and simple implementation to retrieve
> > 	cgroup tiered memory usage
> > Patch 2 introduces more efficient accounting with top tier memory page counter 
> > Patch 3 provides a sysfs interface to report the top tier memory
> > 	usage.
> 
> I guess you meant cgroupfs here, right?

Yes.

> 
> > The patchset works with Aneesh's v6 memory-tiering implementation [1].
> > It is a preparatory patch set before introducing features to
> > control top tiered memory in cgroups.
> > 
> > I'd like to first get feedback to see if
> > (1) Controlling the topmost memory tier is enough
> > or
> > (2) Multiple tiers at the top levels need to be grouped into "toptier"
> > or
> > (3) There are use cases not covered by (1) and (2). 
> 
> I would start by asking why we need a dedicated interface in the
> first place. Why is the existing numa_stat not a proper interface? Right
> now we only report LRU per-node stats. Is this insufficient?
> What is userspace expected to do based on the reported data?

Exporting the toptier information here is convenient for me for debugging
purposes, to see whether a cgroup's toptier usage is under control.
Otherwise, writing a script to parse numa_stat and the memtier hierarchy will
work too.  Exporting toptier usage directly is optional and we don't have to do it.
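
A rough sketch of that script-style approach, written here in C for
concreteness (illustrative only: it assumes the toptier node IDs have already
been derived from the memory tier hierarchy, and the node IDs and cgroup path
below are placeholders).  It sums the per-node "anon" and "file" counters from
memory.numa_stat, which approximates the LRU-based accounting used by
mem_cgroup_memtier_usage():

#include <stdio.h>
#include <string.h>

static int is_toptier_node(int nid)
{
	/* Placeholder assumption: nodes 0 and 1 are in the top tier. */
	return nid == 0 || nid == 1;
}

int main(void)
{
	FILE *f = fopen("/sys/fs/cgroup/mygroup/memory.numa_stat", "r");
	char line[4096];
	unsigned long long total = 0;

	if (!f) {
		perror("memory.numa_stat");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		char *save;
		char *tok = strtok_r(line, " \n", &save);

		/* Only the LRU-backed counters are of interest here. */
		if (!tok || (strcmp(tok, "anon") && strcmp(tok, "file")))
			continue;
		while ((tok = strtok_r(NULL, " \n", &save))) {
			unsigned long long bytes;
			int nid;

			/* Each token looks like "N<nid>=<bytes>". */
			if (sscanf(tok, "N%d=%llu", &nid, &bytes) == 2 &&
			    is_toptier_node(nid))
				total += bytes;
		}
	}
	fclose(f);
	printf("toptier usage: %llu bytes\n", total);
	return 0;
}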

Tim


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/3] Cgroup accounting of memory tier usage
  2022-06-15 15:23   ` Tim Chen
@ 2022-06-15 15:59     ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2022-06-15 15:59 UTC (permalink / raw)
  To: Tim Chen
  Cc: linux-mm, akpm, Wei Xu, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Linux Kernel Mailing List,
	Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple,
	Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang,
	David Rientjes, Aneesh Kumar K . V, Shakeel Butt

On Wed 15-06-22 08:23:56, Tim Chen wrote:
> On Wed, 2022-06-15 at 13:11 +0200, Michal Hocko wrote:
[...]
> > > The patchset works with Aneesh's v6 memory-tiering implementation [1].
> > > It is a preparatory patch set before introducing features to
> > > control top tiered memory in cgroups.
> > > 
> > > I'd like to first get feedback to see if
> > > (1) Controlling the topmost memory tier is enough
> > > or
> > > (2) Multiple tiers at the top levels need to be grouped into "toptier"
> > > or
> > > (3) There are use cases not covered by (1) and (2). 
> > 
> > I would start by asking why we need a dedicated interface in the
> > first place. Why is the existing numa_stat not a proper interface? Right
> > now we only report LRU per-node stats. Is this insufficient?
> > What is userspace expected to do based on the reported data?
> 
> Exporting the toptier information here is convenient for me for debugging
> purposes, to see whether a cgroup's toptier usage is under control.
> Otherwise, writing a script to parse numa_stat and the memtier hierarchy will
> work too.  Exporting toptier usage directly is optional and we don't have to do it.

Please keep in mind this is a userspace API which has to be maintained
forever. We do not add those just to make debugging more convenient.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/3] Cgroup accounting of memory tier usage
  2022-06-15  4:58 ` [RFC PATCH 0/3] Cgroup accounting of memory tier usage Ying Huang
@ 2022-06-15 17:47   ` Tim Chen
  0 siblings, 0 replies; 14+ messages in thread
From: Tim Chen @ 2022-06-15 17:47 UTC (permalink / raw)
  To: Ying Huang, linux-mm, akpm
  Cc: Wei Xu, Greg Thelen, Yang Shi, Davidlohr Bueso, Brice Goglin,
	Michal Hocko, Linux Kernel Mailing List, Hesham Almatary,
	Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams,
	Feng Tang, Jagdish Gediya, Baolin Wang, David Rientjes,
	Aneesh Kumar K . V, Shakeel Butt

On Wed, 2022-06-15 at 12:58 +0800, Ying Huang wrote:
> On Tue, 2022-06-14 at 15:25 -0700, Tim Chen wrote:
> > For controlling usage of top tier memory by a cgroup, accounting
> > of top tier memory usage is needed.  This patch set implements the
> > following:
> > 
> > Patch 1 introduces interface and simple implementation to retrieve
> > 	cgroup tiered memory usage
> > Patch 2 introduces more efficient accounting with top tier memory page counter 
> > Patch 3 provides a sysfs interface to report the top tier memory
> > 	usage.
> > 
> > The patchset works with Aneesh's v6 memory-tiering implementation [1].
> > It is a preparatory patch set before introducing features to
> > control top tiered memory in cgroups.
> > 
> > I'd like to first get feedback to see if
> > (1) Controlling the topmost memory tier is enough
> > or
> > (2) Multiple tiers at the top levels need to be grouped into "toptier"
> > or
> 
> If we combine the top N tiers, I think a better name could be "fast-tier",
> in contrast to "slow-tier".
> 

I can see use cases for grouping tiers. For example, it makes sense for HBM and
DRAM tiers to be grouped together into a "fast-tier-group".

To make things simple, we can define that any tier whose rank is above or equal
to that of DRAM belongs to this fast-tier-group.

An implication for page promotion/demotion is that it needs
to take tier grouping into consideration.  You want to demote
pages away from the current tier group.  For example,
you want to demote from HBM (fast-tier-group) into PMEM (slow-tier-group)
instead of into DRAM (fast-tier-group).

The question is whether fast/slow tier groups are sufficient,
or whether you need fast/slow/slower groups.

> > (3) There are use cases not covered by (1) and (2). 
> 
> Is it necessary to control memory usage of each tier (except the
> lowest/slowest)?  I am not the right person to answer the question, but
> I want to ask it.
> 

I have the same question.

Tim



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 2/3] mm/memory-tiers: Use page counter to track toptier memory usage
  2022-06-15  0:30   ` Wei Xu
@ 2022-06-16  4:12     ` Tim Chen
  0 siblings, 0 replies; 14+ messages in thread
From: Tim Chen @ 2022-06-16  4:12 UTC (permalink / raw)
  To: Wei Xu
  Cc: Linux MM, Andrew Morton, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Michal Hocko,
	Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
	Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
	Jagdish Gediya, Baolin Wang, David Rientjes, Aneesh Kumar K . V,
	Shakeel Butt

On Tue, 2022-06-14 at 17:30 -0700, Wei Xu wrote:


Thanks for your comments.

> When we don't know which pages are being charged, we should still
> charge the usage to toptier (assuming that toptier always includes the
> default tier), e.g. from try_charge_memcg().
> 

I delayed the charging of toptier a bit, until we know which page
is being used and the memcg has been assigned to the page.
That's when mem_cgroup_charge_toptier() is invoked.

Otherwise, if we charge toptier first, we will have additional
work to deduct the count when the pages used are not toptier.

> The idea is that when lower tier memory is not used, memcg->toptier
> and memcg->memory should have the same value. Otherwise, it can cause
> confusion about where the pages of (memcg->memory - memcg->toptier)
> go.

Any difference should be very small, as the charge goes into toptier
quickly too.  The values will be different anyway if
memcg->memory is read at a slightly different time.

Tim



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 1/3] mm/memory-tiers Add functions for tier memory usage in a cgroup
  2022-06-14 22:25 ` [RFC PATCH 1/3] mm/memory-tiers Add functions for tier memory usage in a cgroup Tim Chen
@ 2022-06-21  4:18   ` Aneesh Kumar K.V
  2022-06-23 23:07     ` Tim Chen
  0 siblings, 1 reply; 14+ messages in thread
From: Aneesh Kumar K.V @ 2022-06-21  4:18 UTC (permalink / raw)
  To: Tim Chen, linux-mm, akpm
  Cc: Tim Chen, Wei Xu, Huang Ying, Greg Thelen, Yang Shi,
	Davidlohr Bueso, Brice Goglin, Michal Hocko,
	Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
	Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
	Jagdish Gediya, Baolin Wang, David Rientjes, Shakeel Butt

Tim Chen <tim.c.chen@linux.intel.com> writes:

> +unsigned long mem_cgroup_toptier_usage(struct mem_cgroup *memcg)
> +{
> +	struct memory_tier *top_tier;
> +
> +	top_tier = list_first_entry(&memory_tiers, struct memory_tier, list);
> +	if (top_tier)
> +		return mem_cgroup_memtier_usage(memcg, top_tier);
> +	else
> +		return 0;
> +}

As discussed here, we would want to consider all memory tiers that have
compute as top tier.

https://lore.kernel.org/linux-mm/11f94e0c50f17f4a6a2f974cb69a1ae72853e2be.camel@intel.com

The v6 patchset actually walks the full memory tier hierarchy in reverse and considers any
memory tier with a rank value higher than or equal to that of the first memory tier with CPU as top tier.

https://lore.kernel.org/linux-mm/20220610135229.182859-12-aneesh.kumar@linux.ibm.com
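
A rough sketch of that check (illustrative only; the ->rank and ->nodelist
member names are assumptions about the v6 struct memory_tier layout, not code
taken from that series):

/*
 * Treat every tier whose rank is >= the lowest-ranked tier that
 * contains CPUs as "top tier".  Locking against concurrent tier
 * changes is elided for brevity.
 */
static bool memtier_is_toptier(struct memory_tier *memtier)
{
	struct memory_tier *tier;
	int lowest_cpu_rank = INT_MAX;

	list_for_each_entry(tier, &memory_tiers, list) {
		if (nodes_intersects(tier->nodelist, node_states[N_CPU]) &&
		    tier->rank < lowest_cpu_rank)
			lowest_cpu_rank = tier->rank;
	}

	return lowest_cpu_rank != INT_MAX && memtier->rank >= lowest_cpu_rank;
}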

-aneesh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 1/3] mm/memory-tiers Add functions for tier memory usage in a cgroup
  2022-06-21  4:18   ` Aneesh Kumar K.V
@ 2022-06-23 23:07     ` Tim Chen
  0 siblings, 0 replies; 14+ messages in thread
From: Tim Chen @ 2022-06-23 23:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: Wei Xu, Huang Ying, Greg Thelen, Yang Shi, Davidlohr Bueso,
	Brice Goglin, Michal Hocko, Linux Kernel Mailing List,
	Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple,
	Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang,
	David Rientjes, Shakeel Butt

On Tue, 2022-06-21 at 09:48 +0530, Aneesh Kumar K.V wrote:
> Tim Chen <tim.c.chen@linux.intel.com> writes:
> 
> > +unsigned long mem_cgroup_toptier_usage(struct mem_cgroup *memcg)
> > +{
> > +	struct memory_tier *top_tier;
> > +
> > +	top_tier = list_first_entry(&memory_tiers, struct memory_tier, list);
> > +	if (top_tier)
> > +		return mem_cgroup_memtier_usage(memcg, top_tier);
> > +	else
> > +		return 0;
> > +}
> 
> As discussed here, we would want to consider all memory tiers that have
> compute as top tier.
> 
> https://lore.kernel.org/linux-mm/11f94e0c50f17f4a6a2f974cb69a1ae72853e2be.camel@intel.com
> 
> The v6 patchset actually walks the full memory tier hierarchy in reverse and considers any
> memory tier with a rank value higher than or equal to that of the first memory tier with CPU as top tier.
> 
> https://lore.kernel.org/linux-mm/20220610135229.182859-12-aneesh.kumar@linux.ibm.com
> 

Thanks.  Will take that into consideration for future patches.

Tim


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-06-23 23:07 UTC | newest]

Thread overview: 14+ messages
2022-06-14 22:25 [RFC PATCH 0/3] Cgroup accounting of memory tier usage Tim Chen
2022-06-14 22:25 ` [RFC PATCH 1/3] mm/memory-tiers Add functions for tier memory usage in a cgroup Tim Chen
2022-06-21  4:18   ` Aneesh Kumar K.V
2022-06-23 23:07     ` Tim Chen
2022-06-14 22:25 ` [RFC PATCH 2/3] mm/memory-tiers: Use page counter to track toptier memory usage Tim Chen
2022-06-15  0:27   ` Wei Xu
2022-06-15  0:30   ` Wei Xu
2022-06-16  4:12     ` Tim Chen
2022-06-14 22:25 ` [RFC PATCH 3/3] mm/memory-tiers: Show toptier memory usage for cgroup Tim Chen
2022-06-15  4:58 ` [RFC PATCH 0/3] Cgroup accounting of memory tier usage Ying Huang
2022-06-15 17:47   ` Tim Chen
2022-06-15 11:11 ` Michal Hocko
2022-06-15 15:23   ` Tim Chen
2022-06-15 15:59     ` Michal Hocko
