All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] Fix vmstat_percpu incorrect subtraction after reparent
@ 2023-03-20  3:06 ` Cai Xinchen
  0 siblings, 0 replies; 10+ messages in thread
From: Cai Xinchen @ 2023-03-20  3:06 UTC (permalink / raw)
  To: songmuchun, akpm, hannes, longman, mhocko, roman.gushchin, shakeelb
  Cc: cgroups, duanxiongchun, linux-kernel, linux-mm, yosryahmed, mpenttil

Hello, I see the patch-series (Use obj_cgroup APIs to charge the LRU
pages).
Link: https://lore.kernel.org/all/20220621125658.64935-1-songmuchun@bytedance.com/

There are two problems left:

     root
     /  \
    A    B
   / \    \
  C   E    D

1. In some case of reparent, some page cache may be used by other memcg
D but it charges to the parent memcg A of dying memcg E. D is getting
away with using the page for free while A is taxed.

For this problem, the page may be shared by many memcgs. Which memcg
should be recharged to? It is hard to select. And for recharge method,
for example, the user rmdir E. If we recharge the page to D, some pages
of process attached to D may be reclaimed. The user may feel confused
about the phenomenon that I rmdir E but the processes attached to D are
reclaiming their pages and running slower.

And for cgroup v2, the page is charged to the memcg when it alloc and the
stats is counted to its parent. The method of reparent seems to follow
the rule.

2. The stats problem of vmstats_percpu. When memcg C is offllined, its 
pages are reparented to memcg P, so far P->vmstats (hierarchical) have
those pages, and P->vmstats_percpu (non-hierarchical) don't. When those
 pages get uncharged, P->vmstats (hierachical) decreases, which is correct,
but P->vmstats_percpu (non-hierarchical) also decreases, which is wrong, 
as those stats were never added to P->vmstats_percpu to begin with. If the
reparented memory exceeds the original non-hierarchical memory in P, some
arg such as cache which is show in memory.stat will be zero (if x < 0, it
shows 0)

I think propagate vmstats_percpu stats of dying memcg to its parent can 
solve this problem. If we do not propagate, the reparented memory exceeds
the original non-hierarchical memory in P, (hierarchical_usage -
non-hierarchical_usage(shows 0, but exactly negative number) - 
children_hierarchical_usage) may be meaningless.

And I want to ask for your opinions about problem 1, how to define the 
actions of charging pages to memcg when the memcg is died.

Cai Xinchen (1):
  mm: memcontrol: fix vmstats_percpu state incorrect subtraction after
    reparent

 kernel/cgroup/cgroup.c |  5 +++++
 mm/memcontrol.c        | 43 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 0/1] Fix vmstat_percpu incorrect subtraction after reparent
@ 2023-03-20  3:06 ` Cai Xinchen
  0 siblings, 0 replies; 10+ messages in thread
From: Cai Xinchen @ 2023-03-20  3:06 UTC (permalink / raw)
  To: songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w, longman-H+wXaHxf7aLQT0dZR+AlfA,
	mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	roman.gushchin-fxUVXftIFDnyG1zEObXtfA,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	duanxiongchun-EC8Uxl6Npydl57MIdRCFDg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	yosryahmed-hpIqsD4AKlfQT0dZR+AlfA,
	mpenttil-H+wXaHxf7aLQT0dZR+AlfA

Hello, I see the patch-series (Use obj_cgroup APIs to charge the LRU
pages).
Link: https://lore.kernel.org/all/20220621125658.64935-1-songmuchun-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org/

There are two problems left:

     root
     /  \
    A    B
   / \    \
  C   E    D

1. In some case of reparent, some page cache may be used by other memcg
D but it charges to the parent memcg A of dying memcg E. D is getting
away with using the page for free while A is taxed.

For this problem, the page may be shared by many memcgs. Which memcg
should be recharged to? It is hard to select. And for recharge method,
for example, the user rmdir E. If we recharge the page to D, some pages
of process attached to D may be reclaimed. The user may feel confused
about the phenomenon that I rmdir E but the processes attached to D are
reclaiming their pages and running slower.

And for cgroup v2, the page is charged to the memcg when it alloc and the
stats is counted to its parent. The method of reparent seems to follow
the rule.

2. The stats problem of vmstats_percpu. When memcg C is offllined, its 
pages are reparented to memcg P, so far P->vmstats (hierarchical) have
those pages, and P->vmstats_percpu (non-hierarchical) don't. When those
 pages get uncharged, P->vmstats (hierachical) decreases, which is correct,
but P->vmstats_percpu (non-hierarchical) also decreases, which is wrong, 
as those stats were never added to P->vmstats_percpu to begin with. If the
reparented memory exceeds the original non-hierarchical memory in P, some
arg such as cache which is show in memory.stat will be zero (if x < 0, it
shows 0)

I think propagate vmstats_percpu stats of dying memcg to its parent can 
solve this problem. If we do not propagate, the reparented memory exceeds
the original non-hierarchical memory in P, (hierarchical_usage -
non-hierarchical_usage(shows 0, but exactly negative number) - 
children_hierarchical_usage) may be meaningless.

And I want to ask for your opinions about problem 1, how to define the 
actions of charging pages to memcg when the memcg is died.

Cai Xinchen (1):
  mm: memcontrol: fix vmstats_percpu state incorrect subtraction after
    reparent

 kernel/cgroup/cgroup.c |  5 +++++
 mm/memcontrol.c        | 43 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/1] mm: memcontrol: fix vmstats_percpu state incorrect subtraction after reparent
@ 2023-03-20  3:06   ` Cai Xinchen
  0 siblings, 0 replies; 10+ messages in thread
From: Cai Xinchen @ 2023-03-20  3:06 UTC (permalink / raw)
  To: songmuchun, akpm, hannes, longman, mhocko, roman.gushchin, shakeelb
  Cc: cgroups, duanxiongchun, linux-kernel, linux-mm, yosryahmed, mpenttil

When memcg C is offllined, its pages are reparented to memcg P,
so far P->vmstats (hierarchical) have those pages, and
P->vmstats_percpu (non-hierarchical) don't. When those pages get
uncharged, P->vmstats (hierachical) decreases, which is correct,
but P->vmstats_percpu (non-hierarchical) also decreases, which
is wrong, as those stats were never added to P->vmstats_percpu to
begin with. If the reparented memory exceeds the original
non-hierarchical memory in P, some arg such as cache which is shown
in memory.stat will be zero (if x < 0, it shows 0)

To solve this problem, after reparent memcg, we should use
cgroup_rstat_updated to add cgrp to rstat updated tree, and then after
css refcount == 0, cgroup_rstat_flush is called by css_release_work_fn
and css->flags is set as CSS_RELEASED. We propagate state in
vmstat_percpu of struct mem_cgroup to its parent if css->flags is
CSS_RELEASED. We use reparent_state to record those state and
reparent_tag to ensure propagate vmstats_percpu of dying memcg only
once. When mem_cgroup_css_rstat_flush is called, we
add reparent_state to parent_memcg->vmstats_percpu->state and
memcg->vmstgats_percpu->state_prev in locked context. The delta
between state and state_prev would not change. So some accumulative
 value of vmstats_percpu (such as cache in memory.stat) will be
shown correctly. And this patch does not produce delta between state
and state_prev in vmstats_percpu which will be added to vmstats.

Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
---
 kernel/cgroup/cgroup.c |  5 +++++
 mm/memcontrol.c        | 43 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 935e8121b21e..11138ee09a61 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5384,12 +5384,17 @@ static void css_release_work_fn(struct work_struct *work)
 		container_of(work, struct cgroup_subsys_state, destroy_work);
 	struct cgroup_subsys *ss = css->ss;
 	struct cgroup *cgrp = css->cgroup;
+	int cpu;
 
 	mutex_lock(&cgroup_mutex);
 
 	css->flags |= CSS_RELEASED;
 	list_del_rcu(&css->sibling);
 
+	/* update all cgrp rstat because reparent rstat flush */
+	for_each_possible_cpu(cpu)
+		cgroup_rstat_updated(cgrp, cpu);
+
 	if (ss) {
 		/* css release path */
 		if (!list_empty(&css->rstat_css_node)) {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 77929d08c8c9..658d42dc9820 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -842,6 +842,9 @@ struct memcg_vmstats_percpu {
 	/* Cgroup1: threshold notifications & softlimit tree updates */
 	unsigned long		nr_page_events;
 	unsigned long		targets[MEM_CGROUP_NTARGETS];
+
+	/* Tag for propagate vmstat_percpu, 1 for start, 0 for stop */
+	long			reparent_tag;
 };
 
 struct memcg_vmstats {
@@ -852,6 +855,9 @@ struct memcg_vmstats {
 	/* Pending child counts during tree propagation */
 	long			state_pending[MEMCG_NR_STAT];
 	unsigned long		events_pending[NR_MEMCG_EVENTS];
+
+	/* Propagate vmstat_percpu after reparent */
+	long			reparent_state[MEMCG_NR_STAT];
 };
 
 unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
@@ -5558,6 +5564,17 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	return -ENOMEM;
 }
 
+static void set_reparent_tag_for_vmstats(struct mem_cgroup *memcg)
+{
+	int cpu;
+	struct memcg_vmstats_percpu *statc;
+
+	for_each_possible_cpu(cpu) {
+		statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
+		WRITE_ONCE(statc->reparent_tag, 1);
+	}
+}
+
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
@@ -5579,6 +5596,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	page_counter_set_low(&memcg->memory, 0);
 
 	memcg_reparent_objcgs(memcg);
+	set_reparent_tag_for_vmstats(memcg);
 	memcg_offline_kmem(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
@@ -5653,8 +5671,9 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
 	struct memcg_vmstats_percpu *statc;
-	long delta, v;
+	long delta, v, reparent_v;
 	int i, nid;
+	bool reparent = false;
 
 	statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
 
@@ -5668,6 +5687,10 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 		if (delta)
 			memcg->vmstats->state_pending[i] = 0;
 
+		reparent_v = memcg->vmstats->reparent_state[i];
+		if (reparent_v)
+			memcg->vmstats->reparent_state[i] = 0;
+
 		/* Add CPU changes on this level since the last flush */
 		v = READ_ONCE(statc->state[i]);
 		if (v != statc->state_prev[i]) {
@@ -5675,6 +5698,21 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 			statc->state_prev[i] = v;
 		}
 
+		/* if reparent tag is set, this cgroup is offline and reparent.
+		 * if css->flags & CSS_RELEASED, this cgroup will be released
+		 * soon. We should propagate memcg's vmstats_percpu to its parent.
+		 */
+		if (READ_ONCE(statc->reparent_tag) &&
+			css->flags & CSS_RELEASED && parent) {
+			parent->vmstats->reparent_state[i] += v + reparent_v;
+			reparent = true;
+		} else if (reparent_v) {
+			__this_cpu_add(memcg->vmstats_percpu->state[i],
+				reparent_v);
+			__this_cpu_add(memcg->vmstats_percpu->state_prev[i],
+				reparent_v);
+		}
+
 		if (!delta)
 			continue;
 
@@ -5684,6 +5722,9 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 			parent->vmstats->state_pending[i] += delta;
 	}
 
+	if (reparent)
+		WRITE_ONCE(statc->reparent_tag, 0);
+
 	for (i = 0; i < NR_MEMCG_EVENTS; i++) {
 		delta = memcg->vmstats->events_pending[i];
 		if (delta)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 1/1] mm: memcontrol: fix vmstats_percpu state incorrect subtraction after reparent
@ 2023-03-20  3:06   ` Cai Xinchen
  0 siblings, 0 replies; 10+ messages in thread
From: Cai Xinchen @ 2023-03-20  3:06 UTC (permalink / raw)
  To: songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w, longman-H+wXaHxf7aLQT0dZR+AlfA,
	mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	roman.gushchin-fxUVXftIFDnyG1zEObXtfA,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	duanxiongchun-EC8Uxl6Npydl57MIdRCFDg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	yosryahmed-hpIqsD4AKlfQT0dZR+AlfA,
	mpenttil-H+wXaHxf7aLQT0dZR+AlfA

When memcg C is offllined, its pages are reparented to memcg P,
so far P->vmstats (hierarchical) have those pages, and
P->vmstats_percpu (non-hierarchical) don't. When those pages get
uncharged, P->vmstats (hierachical) decreases, which is correct,
but P->vmstats_percpu (non-hierarchical) also decreases, which
is wrong, as those stats were never added to P->vmstats_percpu to
begin with. If the reparented memory exceeds the original
non-hierarchical memory in P, some arg such as cache which is shown
in memory.stat will be zero (if x < 0, it shows 0)

To solve this problem, after reparent memcg, we should use
cgroup_rstat_updated to add cgrp to rstat updated tree, and then after
css refcount == 0, cgroup_rstat_flush is called by css_release_work_fn
and css->flags is set as CSS_RELEASED. We propagate state in
vmstat_percpu of struct mem_cgroup to its parent if css->flags is
CSS_RELEASED. We use reparent_state to record those state and
reparent_tag to ensure propagate vmstats_percpu of dying memcg only
once. When mem_cgroup_css_rstat_flush is called, we
add reparent_state to parent_memcg->vmstats_percpu->state and
memcg->vmstgats_percpu->state_prev in locked context. The delta
between state and state_prev would not change. So some accumulative
 value of vmstats_percpu (such as cache in memory.stat) will be
shown correctly. And this patch does not produce delta between state
and state_prev in vmstats_percpu which will be added to vmstats.

Signed-off-by: Cai Xinchen <caixinchen1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup/cgroup.c |  5 +++++
 mm/memcontrol.c        | 43 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 935e8121b21e..11138ee09a61 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5384,12 +5384,17 @@ static void css_release_work_fn(struct work_struct *work)
 		container_of(work, struct cgroup_subsys_state, destroy_work);
 	struct cgroup_subsys *ss = css->ss;
 	struct cgroup *cgrp = css->cgroup;
+	int cpu;
 
 	mutex_lock(&cgroup_mutex);
 
 	css->flags |= CSS_RELEASED;
 	list_del_rcu(&css->sibling);
 
+	/* update all cgrp rstat because reparent rstat flush */
+	for_each_possible_cpu(cpu)
+		cgroup_rstat_updated(cgrp, cpu);
+
 	if (ss) {
 		/* css release path */
 		if (!list_empty(&css->rstat_css_node)) {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 77929d08c8c9..658d42dc9820 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -842,6 +842,9 @@ struct memcg_vmstats_percpu {
 	/* Cgroup1: threshold notifications & softlimit tree updates */
 	unsigned long		nr_page_events;
 	unsigned long		targets[MEM_CGROUP_NTARGETS];
+
+	/* Tag for propagate vmstat_percpu, 1 for start, 0 for stop */
+	long			reparent_tag;
 };
 
 struct memcg_vmstats {
@@ -852,6 +855,9 @@ struct memcg_vmstats {
 	/* Pending child counts during tree propagation */
 	long			state_pending[MEMCG_NR_STAT];
 	unsigned long		events_pending[NR_MEMCG_EVENTS];
+
+	/* Propagate vmstat_percpu after reparent */
+	long			reparent_state[MEMCG_NR_STAT];
 };
 
 unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
@@ -5558,6 +5564,17 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	return -ENOMEM;
 }
 
+static void set_reparent_tag_for_vmstats(struct mem_cgroup *memcg)
+{
+	int cpu;
+	struct memcg_vmstats_percpu *statc;
+
+	for_each_possible_cpu(cpu) {
+		statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
+		WRITE_ONCE(statc->reparent_tag, 1);
+	}
+}
+
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
@@ -5579,6 +5596,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	page_counter_set_low(&memcg->memory, 0);
 
 	memcg_reparent_objcgs(memcg);
+	set_reparent_tag_for_vmstats(memcg);
 	memcg_offline_kmem(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
@@ -5653,8 +5671,9 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
 	struct memcg_vmstats_percpu *statc;
-	long delta, v;
+	long delta, v, reparent_v;
 	int i, nid;
+	bool reparent = false;
 
 	statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
 
@@ -5668,6 +5687,10 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 		if (delta)
 			memcg->vmstats->state_pending[i] = 0;
 
+		reparent_v = memcg->vmstats->reparent_state[i];
+		if (reparent_v)
+			memcg->vmstats->reparent_state[i] = 0;
+
 		/* Add CPU changes on this level since the last flush */
 		v = READ_ONCE(statc->state[i]);
 		if (v != statc->state_prev[i]) {
@@ -5675,6 +5698,21 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 			statc->state_prev[i] = v;
 		}
 
+		/* if reparent tag is set, this cgroup is offline and reparent.
+		 * if css->flags & CSS_RELEASED, this cgroup will be released
+		 * soon. We should propagate memcg's vmstats_percpu to its parent.
+		 */
+		if (READ_ONCE(statc->reparent_tag) &&
+			css->flags & CSS_RELEASED && parent) {
+			parent->vmstats->reparent_state[i] += v + reparent_v;
+			reparent = true;
+		} else if (reparent_v) {
+			__this_cpu_add(memcg->vmstats_percpu->state[i],
+				reparent_v);
+			__this_cpu_add(memcg->vmstats_percpu->state_prev[i],
+				reparent_v);
+		}
+
 		if (!delta)
 			continue;
 
@@ -5684,6 +5722,9 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 			parent->vmstats->state_pending[i] += delta;
 	}
 
+	if (reparent)
+		WRITE_ONCE(statc->reparent_tag, 0);
+
 	for (i = 0; i < NR_MEMCG_EVENTS; i++) {
 		delta = memcg->vmstats->events_pending[i];
 		if (delta)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: memcontrol: fix vmstats_percpu state incorrect subtraction after reparent
@ 2023-03-24 17:11     ` Michal Koutný
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Koutný @ 2023-03-24 17:11 UTC (permalink / raw)
  To: Cai Xinchen
  Cc: songmuchun, akpm, hannes, longman, mhocko, roman.gushchin,
	shakeelb, cgroups, duanxiongchun, linux-kernel, linux-mm,
	yosryahmed, mpenttil

[-- Attachment #1: Type: text/plain, Size: 697 bytes --]

Hello.

On Mon, Mar 20, 2023 at 03:06:48AM +0000, Cai Xinchen <caixinchen1@huawei.com> wrote:
> When memcg C is offllined, its pages are reparented to memcg P,
> so far P->vmstats (hierarchical) have those pages, and
> P->vmstats_percpu (non-hierarchical) don't. When those pages get
> uncharged, P->vmstats (hierachical) decreases, which is correct,
> but P->vmstats_percpu (non-hierarchical) also decreases, which
> is wrong, as those stats were never added to P->vmstats_percpu to
> begin with.

I was wondering why ->vmstats_percpu matters (in the end all is summed
in ->vmstats) -- do you mean this is a cgroup v1 only issue? As only
that exposes the non-hieararchical stats.

Thanks,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: memcontrol: fix vmstats_percpu state incorrect subtraction after reparent
@ 2023-03-24 17:11     ` Michal Koutný
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Koutný @ 2023-03-24 17:11 UTC (permalink / raw)
  To: Cai Xinchen
  Cc: songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w, longman-H+wXaHxf7aLQT0dZR+AlfA,
	mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	roman.gushchin-fxUVXftIFDnyG1zEObXtfA,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	duanxiongchun-EC8Uxl6Npydl57MIdRCFDg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	yosryahmed-hpIqsD4AKlfQT0dZR+AlfA,
	mpenttil-H+wXaHxf7aLQT0dZR+AlfA

[-- Attachment #1: Type: text/plain, Size: 726 bytes --]

Hello.

On Mon, Mar 20, 2023 at 03:06:48AM +0000, Cai Xinchen <caixinchen1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
> When memcg C is offllined, its pages are reparented to memcg P,
> so far P->vmstats (hierarchical) have those pages, and
> P->vmstats_percpu (non-hierarchical) don't. When those pages get
> uncharged, P->vmstats (hierachical) decreases, which is correct,
> but P->vmstats_percpu (non-hierarchical) also decreases, which
> is wrong, as those stats were never added to P->vmstats_percpu to
> begin with.

I was wondering why ->vmstats_percpu matters (in the end all is summed
in ->vmstats) -- do you mean this is a cgroup v1 only issue? As only
that exposes the non-hieararchical stats.

Thanks,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/1] Fix vmstat_percpu incorrect subtraction after reparent
@ 2023-03-24 17:14   ` Michal Koutný
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Koutný @ 2023-03-24 17:14 UTC (permalink / raw)
  To: Cai Xinchen
  Cc: songmuchun, akpm, hannes, longman, mhocko, roman.gushchin,
	shakeelb, cgroups, duanxiongchun, linux-kernel, linux-mm,
	yosryahmed, mpenttil

[-- Attachment #1: Type: text/plain, Size: 638 bytes --]

On Mon, Mar 20, 2023 at 03:06:47AM +0000, Cai Xinchen <caixinchen1@huawei.com> wrote:
> There are two problems left:
> 
>      root
>      /  \
>     A    B
>    / \    \
>   C   E    D
> 
> 1. In some case of reparent, some page cache may be used by other memcg
> D but it charges to the parent memcg A of dying memcg E. D is getting
> away with using the page for free while A is taxed.

Note that A is (effectively) taxed even before E is removed due to
hierarchical nature of charging. Then what you describe transforms into
"well-known" problem of shared charging (with not well-known solution
:-/).

HTH,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/1] Fix vmstat_percpu incorrect subtraction after reparent
@ 2023-03-24 17:14   ` Michal Koutný
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Koutný @ 2023-03-24 17:14 UTC (permalink / raw)
  To: Cai Xinchen
  Cc: songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	hannes-druUgvl0LCNAfugRpC6u6w, longman-H+wXaHxf7aLQT0dZR+AlfA,
	mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	roman.gushchin-fxUVXftIFDnyG1zEObXtfA,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	duanxiongchun-EC8Uxl6Npydl57MIdRCFDg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	yosryahmed-hpIqsD4AKlfQT0dZR+AlfA,
	mpenttil-H+wXaHxf7aLQT0dZR+AlfA

[-- Attachment #1: Type: text/plain, Size: 638 bytes --]

On Mon, Mar 20, 2023 at 03:06:47AM +0000, Cai Xinchen <caixinchen1@huawei.com> wrote:
> There are two problems left:
> 
>      root
>      /  \
>     A    B
>    / \    \
>   C   E    D
> 
> 1. In some case of reparent, some page cache may be used by other memcg
> D but it charges to the parent memcg A of dying memcg E. D is getting
> away with using the page for free while A is taxed.

Note that A is (effectively) taxed even before E is removed due to
hierarchical nature of charging. Then what you describe transforms into
"well-known" problem of shared charging (with not well-known solution
:-/).

HTH,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: memcontrol: fix vmstats_percpu state incorrect subtraction after reparent
  2023-03-24 17:11     ` Michal Koutný
@ 2023-03-27  1:29       ` Cai Xinchen
  -1 siblings, 0 replies; 10+ messages in thread
From: Cai Xinchen @ 2023-03-27  1:29 UTC (permalink / raw)
  To: mkoutny
  Cc: songmuchun, akpm, hannes, longman, mhocko, roman.gushchin,
	shakeelb, cgroups, duanxiongchun, linux-kernel, linux-mm,
	yosryahmed, mpenttil

Yes, only cgroup v1.

On 2023/3/25 1:11, Michal Koutný wrote:
> Hello.
>
> On Mon, Mar 20, 2023 at 03:06:48AM +0000, Cai Xinchen <caixinchen1@huawei.com> wrote:
>> When memcg C is offllined, its pages are reparented to memcg P,
>> so far P->vmstats (hierarchical) have those pages, and
>> P->vmstats_percpu (non-hierarchical) don't. When those pages get
>> uncharged, P->vmstats (hierachical) decreases, which is correct,
>> but P->vmstats_percpu (non-hierarchical) also decreases, which
>> is wrong, as those stats were never added to P->vmstats_percpu to
>> begin with.
> I was wondering why ->vmstats_percpu matters (in the end all is summed
> in ->vmstats) -- do you mean this is a cgroup v1 only issue? As only
> that exposes the non-hieararchical stats.
>
> Thanks,
> Michal

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] mm: memcontrol: fix vmstats_percpu state incorrect subtraction after reparent
@ 2023-03-27  1:29       ` Cai Xinchen
  0 siblings, 0 replies; 10+ messages in thread
From: Cai Xinchen @ 2023-03-27  1:29 UTC (permalink / raw)
  To: mkoutny
  Cc: songmuchun, akpm, hannes, longman, mhocko, roman.gushchin,
	shakeelb, cgroups, duanxiongchun, linux-kernel, linux-mm,
	yosryahmed, mpenttil

Yes, only cgroup v1.

On 2023/3/25 1:11, Michal Koutný wrote:
> Hello.
>
> On Mon, Mar 20, 2023 at 03:06:48AM +0000, Cai Xinchen <caixinchen1@huawei.com> wrote:
>> When memcg C is offllined, its pages are reparented to memcg P,
>> so far P->vmstats (hierarchical) have those pages, and
>> P->vmstats_percpu (non-hierarchical) don't. When those pages get
>> uncharged, P->vmstats (hierachical) decreases, which is correct,
>> but P->vmstats_percpu (non-hierarchical) also decreases, which
>> is wrong, as those stats were never added to P->vmstats_percpu to
>> begin with.
> I was wondering why ->vmstats_percpu matters (in the end all is summed
> in ->vmstats) -- do you mean this is a cgroup v1 only issue? As only
> that exposes the non-hieararchical stats.
>
> Thanks,
> Michal

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-03-27  1:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-20  3:06 [PATCH 0/1] Fix vmstat_percpu incorrect subtraction after reparent Cai Xinchen
2023-03-20  3:06 ` Cai Xinchen
2023-03-20  3:06 ` [PATCH 1/1] mm: memcontrol: fix vmstats_percpu state " Cai Xinchen
2023-03-20  3:06   ` Cai Xinchen
2023-03-24 17:11   ` Michal Koutný
2023-03-24 17:11     ` Michal Koutný
2023-03-27  1:29     ` Cai Xinchen
2023-03-27  1:29       ` Cai Xinchen
2023-03-24 17:14 ` [PATCH 0/1] Fix vmstat_percpu " Michal Koutný
2023-03-24 17:14   ` Michal Koutný

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.