* [PATCH 0/5] Fix some bugs in memcg/slab
@ 2020-10-27  8:02 ` Muchun Song
  0 siblings, 0 replies; 41+ messages in thread
From: Muchun Song @ 2020-10-27  8:02 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	iamjoonsoo.kim, laoar.shao, chris, christian.brauner, peterz,
	mingo, keescook, tglx, esyr, surenb, areber, elver
  Cc: linux-kernel, cgroups, linux-mm, Muchun Song

This patch series fixes some bugs and simplifies the code in
memcontrol.c.

Muchun Song (5):
  mm: memcg/slab: Fix return child memcg objcg for root memcg
  mm: memcg/slab: Fix use after free in obj_cgroup_charge
  mm: memcg/slab: Rename *_lruvec_slab_state to *_lruvec_kmem_state
  mm: memcg/slab: Fix root memcg vmstats
  mm: memcontrol: Simplify the mem_cgroup_page_lruvec

 include/linux/memcontrol.h | 62 +++++++++++++++++++++++++-------------
 kernel/fork.c              |  2 +-
 mm/memcontrol.c            | 61 ++++++++++---------------------------
 mm/workingset.c            |  8 ++---
 4 files changed, 62 insertions(+), 71 deletions(-)

-- 
2.20.1


* [PATCH 1/5] mm: memcg/slab: Fix return child memcg objcg for root memcg
  2020-10-27  8:02 ` Muchun Song
@ 2020-10-27  8:02 ` Muchun Song
  2020-10-27 17:42     ` Roman Gushchin
  -1 siblings, 1 reply; 41+ messages in thread
From: Muchun Song @ 2020-10-27  8:02 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	iamjoonsoo.kim, laoar.shao, chris, christian.brauner, peterz,
	mingo, keescook, tglx, esyr, surenb, areber, elver
  Cc: linux-kernel, cgroups, linux-mm, Muchun Song

Consider the following memcg hierarchy.

                    root
                   /    \
                  A      B

If getting the objcg of memcg A fails, get_obj_cgroup_from_current()
can return the wrong objcg for the root memcg.

Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1337775b04f3..fcbd79c5023e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2945,7 +2945,7 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 
 __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 {
-	struct obj_cgroup *objcg = NULL;
+	struct obj_cgroup *objcg;
 	struct mem_cgroup *memcg;
 
 	if (memcg_kmem_bypass())
@@ -2964,6 +2964,9 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 	}
 	rcu_read_unlock();
 
+	if (memcg == root_mem_cgroup)
+		return NULL;
+
 	return objcg;
 }
 
-- 
2.20.1



* [PATCH 2/5] mm: memcg/slab: Fix use after free in obj_cgroup_charge
  2020-10-27  8:02 ` Muchun Song
@ 2020-10-27  8:02 ` Muchun Song
  2020-10-27 18:31     ` Roman Gushchin
  -1 siblings, 1 reply; 41+ messages in thread
From: Muchun Song @ 2020-10-27  8:02 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	iamjoonsoo.kim, laoar.shao, chris, christian.brauner, peterz,
	mingo, keescook, tglx, esyr, surenb, areber, elver
  Cc: linux-kernel, cgroups, linux-mm, Muchun Song

rcu_read_lock/unlock can only guarantee that the memcg will not be
freed, but it cannot guarantee that css_get() on the memcg succeeds.
This can happen when we reparent the memcg. So use css_tryget()
instead of css_get().

Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fcbd79c5023e..c0c27bee23ad 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3223,8 +3223,10 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 	 * independently later.
 	 */
 	rcu_read_lock();
+retry:
 	memcg = obj_cgroup_memcg(objcg);
-	css_get(&memcg->css);
+	if (!css_tryget(&memcg->css))
+		goto retry;
 	rcu_read_unlock();
 
 	nr_pages = size >> PAGE_SHIFT;
-- 
2.20.1



* [PATCH 3/5] mm: memcg/slab: Rename *_lruvec_slab_state to *_lruvec_kmem_state
@ 2020-10-27  8:02   ` Muchun Song
  0 siblings, 0 replies; 41+ messages in thread
From: Muchun Song @ 2020-10-27  8:02 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	iamjoonsoo.kim, laoar.shao, chris, christian.brauner, peterz,
	mingo, keescook, tglx, esyr, surenb, areber, elver
  Cc: linux-kernel, cgroups, linux-mm, Muchun Song

The *_lruvec_slab_state functions are also suitable for pages
allocated from the buddy allocator, not just for slab objects. But the
function names suggest that only slab objects are applicable. So
rename the slab keyword to kmem.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h | 18 +++++++++---------
 kernel/fork.c              |  2 +-
 mm/memcontrol.c            |  3 ++-
 mm/workingset.c            |  8 ++++----
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d7e339bf72dc..95807bf6be64 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -793,15 +793,15 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			      int val);
 void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			int val);
-void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val);
+void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val);
 
-static inline void mod_lruvec_slab_state(void *p, enum node_stat_item idx,
+static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
 					 int val)
 {
 	unsigned long flags;
 
 	local_irq_save(flags);
-	__mod_lruvec_slab_state(p, idx, val);
+	__mod_lruvec_kmem_state(p, idx, val);
 	local_irq_restore(flags);
 }
 
@@ -1227,7 +1227,7 @@ static inline void mod_lruvec_page_state(struct page *page,
 	mod_node_page_state(page_pgdat(page), idx, val);
 }
 
-static inline void __mod_lruvec_slab_state(void *p, enum node_stat_item idx,
+static inline void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
 					   int val)
 {
 	struct page *page = virt_to_head_page(p);
@@ -1235,7 +1235,7 @@ static inline void __mod_lruvec_slab_state(void *p, enum node_stat_item idx,
 	__mod_node_page_state(page_pgdat(page), idx, val);
 }
 
-static inline void mod_lruvec_slab_state(void *p, enum node_stat_item idx,
+static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
 					 int val)
 {
 	struct page *page = virt_to_head_page(p);
@@ -1330,14 +1330,14 @@ static inline void __dec_lruvec_page_state(struct page *page,
 	__mod_lruvec_page_state(page, idx, -1);
 }
 
-static inline void __inc_lruvec_slab_state(void *p, enum node_stat_item idx)
+static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
 {
-	__mod_lruvec_slab_state(p, idx, 1);
+	__mod_lruvec_kmem_state(p, idx, 1);
 }
 
-static inline void __dec_lruvec_slab_state(void *p, enum node_stat_item idx)
+static inline void __dec_lruvec_kmem_state(void *p, enum node_stat_item idx)
 {
-	__mod_lruvec_slab_state(p, idx, -1);
+	__mod_lruvec_kmem_state(p, idx, -1);
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item */
diff --git a/kernel/fork.c b/kernel/fork.c
index 4b328aecabb2..4fb0bbc3b041 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -384,7 +384,7 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
 		mod_lruvec_page_state(vm->pages[0], NR_KERNEL_STACK_KB,
 				      account * (THREAD_SIZE / 1024));
 	else
-		mod_lruvec_slab_state(stack, NR_KERNEL_STACK_KB,
+		mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB,
 				      account * (THREAD_SIZE / 1024));
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c0c27bee23ad..22b4fb941b54 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -866,7 +866,7 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 		__mod_memcg_lruvec_state(lruvec, idx, val);
 }
 
-void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val)
+void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 {
 	pg_data_t *pgdat = page_pgdat(virt_to_page(p));
 	struct mem_cgroup *memcg;
@@ -2920,6 +2920,7 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	if (mem_cgroup_disabled())
 		return NULL;
 
+	VM_BUG_ON(!virt_addr_valid(p));
 	page = virt_to_head_page(p);
 
 	/*
diff --git a/mm/workingset.c b/mm/workingset.c
index 50d53f3699e4..2c707c92dd89 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -445,12 +445,12 @@ void workingset_update_node(struct xa_node *node)
 	if (node->count && node->count == node->nr_values) {
 		if (list_empty(&node->private_list)) {
 			list_lru_add(&shadow_nodes, &node->private_list);
-			__inc_lruvec_slab_state(node, WORKINGSET_NODES);
+			__inc_lruvec_kmem_state(node, WORKINGSET_NODES);
 		}
 	} else {
 		if (!list_empty(&node->private_list)) {
 			list_lru_del(&shadow_nodes, &node->private_list);
-			__dec_lruvec_slab_state(node, WORKINGSET_NODES);
+			__dec_lruvec_kmem_state(node, WORKINGSET_NODES);
 		}
 	}
 }
@@ -541,7 +541,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 	}
 
 	list_lru_isolate(lru, item);
-	__dec_lruvec_slab_state(node, WORKINGSET_NODES);
+	__dec_lruvec_kmem_state(node, WORKINGSET_NODES);
 
 	spin_unlock(lru_lock);
 
@@ -564,7 +564,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 	 * shadow entries we were tracking ...
 	 */
 	xas_store(&xas, NULL);
-	__inc_lruvec_slab_state(node, WORKINGSET_NODERECLAIM);
+	__inc_lruvec_kmem_state(node, WORKINGSET_NODERECLAIM);
 
 out_invalid:
 	xa_unlock_irq(&mapping->i_pages);
-- 
2.20.1


* [PATCH 4/5] mm: memcg/slab: Fix root memcg vmstats
@ 2020-10-27  8:02   ` Muchun Song
  0 siblings, 0 replies; 41+ messages in thread
From: Muchun Song @ 2020-10-27  8:02 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	iamjoonsoo.kim, laoar.shao, chris, christian.brauner, peterz,
	mingo, keescook, tglx, esyr, surenb, areber, elver
  Cc: linux-kernel, cgroups, linux-mm, Muchun Song

If the slab objects are reparented to the root memcg, we still need to
update the per-memcg vmstats when we free them, so that the counters
stay correct for the root memcg. Currently this at least affects the
NR_KERNEL_STACK_KB vmstat for !CONFIG_VMAP_STACK when the thread stack
size is smaller than PAGE_SIZE.

Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 22b4fb941b54..70345b15b150 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -875,8 +875,13 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 	rcu_read_lock();
 	memcg = mem_cgroup_from_obj(p);
 
-	/* Untracked pages have no memcg, no lruvec. Update only the node */
-	if (!memcg || memcg == root_mem_cgroup) {
+	/*
+	 * Untracked pages have no memcg, no lruvec. Update only the
+	 * node. If we reparent the slab objects to the root memcg,
+	 * when we free the slab object, we need to update the per-memcg
+	 * vmstats to keep it correct for the root memcg.
+	 */
+	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
 	} else {
 		lruvec = mem_cgroup_lruvec(memcg, pgdat);
-- 
2.20.1


* [PATCH 5/5] mm: memcontrol: Simplify the mem_cgroup_page_lruvec
@ 2020-10-27  8:02   ` Muchun Song
  0 siblings, 0 replies; 41+ messages in thread
From: Muchun Song @ 2020-10-27  8:02 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro,
	iamjoonsoo.kim, laoar.shao, chris, christian.brauner, peterz,
	mingo, keescook, tglx, esyr, surenb, areber, elver
  Cc: linux-kernel, cgroups, linux-mm, Muchun Song

We can reuse the code of mem_cgroup_lruvec() to simplify
mem_cgroup_page_lruvec().

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h | 44 +++++++++++++++++++++++++++-----------
 mm/memcontrol.c            | 40 ----------------------------------
 2 files changed, 32 insertions(+), 52 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 95807bf6be64..5e8480e54cd8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -451,16 +451,9 @@ mem_cgroup_nodeinfo(struct mem_cgroup *memcg, int nid)
 	return memcg->nodeinfo[nid];
 }
 
-/**
- * mem_cgroup_lruvec - get the lru list vector for a memcg & node
- * @memcg: memcg of the wanted lruvec
- *
- * Returns the lru list vector holding pages for a given @memcg &
- * @node combination. This can be the node lruvec, if the memory
- * controller is disabled.
- */
-static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
-					       struct pglist_data *pgdat)
+static inline struct lruvec *mem_cgroup_node_lruvec(struct mem_cgroup *memcg,
+						    struct pglist_data *pgdat,
+						    int nid)
 {
 	struct mem_cgroup_per_node *mz;
 	struct lruvec *lruvec;
@@ -473,7 +466,7 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
 	if (!memcg)
 		memcg = root_mem_cgroup;
 
-	mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id);
+	mz = mem_cgroup_nodeinfo(memcg, nid);
 	lruvec = &mz->lruvec;
 out:
 	/*
@@ -486,7 +479,34 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
 	return lruvec;
 }
 
-struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *);
+/**
+ * mem_cgroup_lruvec - get the lru list vector for a memcg & node
+ * @memcg: memcg of the wanted lruvec
+ *
+ * Returns the lru list vector holding pages for a given @memcg &
+ * @node combination. This can be the node lruvec, if the memory
+ * controller is disabled.
+ */
+static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
+					       struct pglist_data *pgdat)
+{
+	return mem_cgroup_node_lruvec(memcg, pgdat, pgdat->node_id);
+}
+
+/**
+ * mem_cgroup_page_lruvec - return lruvec for isolating/putting an LRU page
+ * @page: the page
+ * @pgdat: pgdat of the page
+ *
+ * This function relies on page->mem_cgroup being stable - see the
+ * access rules in commit_charge().
+ */
+static inline struct lruvec *mem_cgroup_page_lruvec(struct page *page,
+						    struct pglist_data *pgdat)
+{
+	return mem_cgroup_node_lruvec(page->mem_cgroup, pgdat,
+				      page_to_nid(page));
+}
 
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 70345b15b150..7097f3fc4dee 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1332,46 +1332,6 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 	return ret;
 }
 
-/**
- * mem_cgroup_page_lruvec - return lruvec for isolating/putting an LRU page
- * @page: the page
- * @pgdat: pgdat of the page
- *
- * This function relies on page->mem_cgroup being stable - see the
- * access rules in commit_charge().
- */
-struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgdat)
-{
-	struct mem_cgroup_per_node *mz;
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-
-	if (mem_cgroup_disabled()) {
-		lruvec = &pgdat->__lruvec;
-		goto out;
-	}
-
-	memcg = page->mem_cgroup;
-	/*
-	 * Swapcache readahead pages are added to the LRU - and
-	 * possibly migrated - before they are charged.
-	 */
-	if (!memcg)
-		memcg = root_mem_cgroup;
-
-	mz = mem_cgroup_page_nodeinfo(memcg, page);
-	lruvec = &mz->lruvec;
-out:
-	/*
-	 * Since a node can be onlined after the mem_cgroup was created,
-	 * we have to be prepared to initialize lruvec->zone here;
-	 * and if offlined then reonlined, we need to reinitialize it.
-	 */
-	if (unlikely(lruvec->pgdat != pgdat))
-		lruvec->pgdat = pgdat;
-	return lruvec;
-}
-
 /**
  * mem_cgroup_update_lru_size - account for adding or removing an lru page
  * @lruvec: mem_cgroup per zone lru vector
-- 
2.20.1


* Re: [PATCH 5/5] mm: memcontrol: Simplify the mem_cgroup_page_lruvec
@ 2020-10-27 13:36     ` Michal Hocko
  0 siblings, 0 replies; 41+ messages in thread
From: Michal Hocko @ 2020-10-27 13:36 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, vdavydov.dev, akpm, shakeelb, guro, iamjoonsoo.kim,
	laoar.shao, chris, christian.brauner, peterz, mingo, keescook,
	tglx, esyr, surenb, areber, elver, linux-kernel, cgroups,
	linux-mm

On Tue 27-10-20 16:02:56, Muchun Song wrote:
> We can reuse the code of mem_cgroup_lruvec() to simplify the code
> of the mem_cgroup_page_lruvec().

yes, removing the code duplication is reasonable. But ...

> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/memcontrol.h | 44 +++++++++++++++++++++++++++-----------
>  mm/memcontrol.c            | 40 ----------------------------------
>  2 files changed, 32 insertions(+), 52 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 95807bf6be64..5e8480e54cd8 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -451,16 +451,9 @@ mem_cgroup_nodeinfo(struct mem_cgroup *memcg, int nid)
>  	return memcg->nodeinfo[nid];
>  }
>  
> -/**
> - * mem_cgroup_lruvec - get the lru list vector for a memcg & node
> - * @memcg: memcg of the wanted lruvec
> - *
> - * Returns the lru list vector holding pages for a given @memcg &
> - * @node combination. This can be the node lruvec, if the memory
> - * controller is disabled.
> - */
> -static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
> -					       struct pglist_data *pgdat)
> +static inline struct lruvec *mem_cgroup_node_lruvec(struct mem_cgroup *memcg,
> +						    struct pglist_data *pgdat,
> +						    int nid)

This is just the wrong interface. Take either nid or pgdat. You do not
want both, because that just begs for wrong usage.
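
(I.e., something like the below, deriving the pgdat from the nid
internally -- a rough sketch of the idea only, not a tested patch;
callers would pass page_to_nid(page) or pgdat->node_id:)

	static inline struct lruvec *mem_cgroup_node_lruvec(struct mem_cgroup *memcg,
							    int nid)
	{
		pg_data_t *pgdat = NODE_DATA(nid);
		struct mem_cgroup_per_node *mz;
		struct lruvec *lruvec;

		if (mem_cgroup_disabled()) {
			lruvec = &pgdat->__lruvec;
			goto out;
		}

		/* Uncharged (e.g. swapcache readahead) pages fall back to root. */
		if (!memcg)
			memcg = root_mem_cgroup;

		mz = mem_cgroup_nodeinfo(memcg, nid);
		lruvec = &mz->lruvec;
	out:
		/* A node can be onlined after the memcg was created. */
		if (unlikely(lruvec->pgdat != pgdat))
			lruvec->pgdat = pgdat;
		return lruvec;
	}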
-- 
Michal Hocko
SUSE Labs

* Re: [External] Re: [PATCH 5/5] mm: memcontrol: Simplify the mem_cgroup_page_lruvec
  2020-10-27 13:36     ` Michal Hocko
@ 2020-10-27 14:15       ` Muchun Song
  -1 siblings, 0 replies; 41+ messages in thread
From: Muchun Song @ 2020-10-27 14:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Roman Gushchin, Joonsoo Kim, laoar.shao, Chris Down,
	Christian Brauner, Peter Zijlstra, Ingo Molnar, Kees Cook,
	Thomas Gleixner, esyr, Suren Baghdasaryan, areber, Marco Elver,
	LKML, Cgroups, Linux Memory Management List

On Tue, Oct 27, 2020 at 9:36 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 27-10-20 16:02:56, Muchun Song wrote:
> > We can reuse the code of mem_cgroup_lruvec() to simplify the code
> > of the mem_cgroup_page_lruvec().
>
> yes, removing the code duplication is reasonable. But ...
>
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/memcontrol.h | 44 +++++++++++++++++++++++++++-----------
> >  mm/memcontrol.c            | 40 ----------------------------------
> >  2 files changed, 32 insertions(+), 52 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 95807bf6be64..5e8480e54cd8 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -451,16 +451,9 @@ mem_cgroup_nodeinfo(struct mem_cgroup *memcg, int nid)
> >       return memcg->nodeinfo[nid];
> >  }
> >
> > -/**
> > - * mem_cgroup_lruvec - get the lru list vector for a memcg & node
> > - * @memcg: memcg of the wanted lruvec
> > - *
> > - * Returns the lru list vector holding pages for a given @memcg &
> > - * @node combination. This can be the node lruvec, if the memory
> > - * controller is disabled.
> > - */
> > -static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
> > -                                            struct pglist_data *pgdat)
> > +static inline struct lruvec *mem_cgroup_node_lruvec(struct mem_cgroup *memcg,
> > +                                                 struct pglist_data *pgdat,
> > +                                                 int nid)
>
> This is just wrong interface. Either take nid or pgdat. You do not want
> both because that just begs for wrong usage.

If we want to avoid abuse of mem_cgroup_node_lruvec(), we can move
those functions to memcontrol.c and make mem_cgroup_node_lruvec()
static, exporting only mem_cgroup_lruvec() and
mem_cgroup_page_lruvec(). Is this OK?

Thanks.

> --
> Michal Hocko
> SUSE Labs

-- 
Yours,
Muchun

* Re: [External] Re: [PATCH 5/5] mm: memcontrol: Simplify the mem_cgroup_page_lruvec
@ 2020-10-27 14:31         ` Michal Hocko
  0 siblings, 0 replies; 41+ messages in thread
From: Michal Hocko @ 2020-10-27 14:31 UTC (permalink / raw)
  To: Muchun Song
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Roman Gushchin, Joonsoo Kim, laoar.shao, Chris Down,
	Christian Brauner, Peter Zijlstra, Ingo Molnar, Kees Cook,
	Thomas Gleixner, esyr, Suren Baghdasaryan, areber, Marco Elver,
	LKML, Cgroups, Linux Memory Management List

On Tue 27-10-20 22:15:16, Muchun Song wrote:
> On Tue, Oct 27, 2020 at 9:36 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 27-10-20 16:02:56, Muchun Song wrote:
> > > We can reuse the code of mem_cgroup_lruvec() to simplify the code
> > > of the mem_cgroup_page_lruvec().
> >
> > yes, removing the code duplication is reasonable. But ...
> >
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  include/linux/memcontrol.h | 44 +++++++++++++++++++++++++++-----------
> > >  mm/memcontrol.c            | 40 ----------------------------------
> > >  2 files changed, 32 insertions(+), 52 deletions(-)
> > >
> > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > index 95807bf6be64..5e8480e54cd8 100644
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -451,16 +451,9 @@ mem_cgroup_nodeinfo(struct mem_cgroup *memcg, int nid)
> > >       return memcg->nodeinfo[nid];
> > >  }
> > >
> > > -/**
> > > - * mem_cgroup_lruvec - get the lru list vector for a memcg & node
> > > - * @memcg: memcg of the wanted lruvec
> > > - *
> > > - * Returns the lru list vector holding pages for a given @memcg &
> > > - * @node combination. This can be the node lruvec, if the memory
> > > - * controller is disabled.
> > > - */
> > > -static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
> > > -                                            struct pglist_data *pgdat)
> > > +static inline struct lruvec *mem_cgroup_node_lruvec(struct mem_cgroup *memcg,
> > > +                                                 struct pglist_data *pgdat,
> > > +                                                 int nid)
> >
> > This is just wrong interface. Either take nid or pgdat. You do not want
> > both because that just begs for wrong usage.
> 
> If we want to avoid abuse of mem_cgroup_node_lruvec. We can move
> those functions to the memcontrol.c. And add the "static" attribute to the
> mem_cgroup_node_lruvec. Just export mem_cgroup_lruvec and
> mem_cgroup_page_lruvec. Is this OK?

Sorry, I was probably not clear enough. I am not against the function
per se. I just do not think we want to make it trickier to use than
necessary. That means use either a pgdat or a nid argument, not both,
because they should always be in sync and you can trivially get from
one to the other and vice versa.
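
(For reference, the conversion is trivial in either direction -- a
minimal sketch; the helper names are made up for illustration:)

	static inline pg_data_t *nid_to_pgdat(int nid)
	{
		return NODE_DATA(nid);
	}

	static inline int pgdat_to_nid(pg_data_t *pgdat)
	{
		return pgdat->node_id;
	}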
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 1/5] mm: memcg/slab: Fix return child memcg objcg for root memcg
  2020-10-27  8:02 ` [PATCH 1/5] mm: memcg/slab: Fix return child memcg objcg for root memcg Muchun Song
@ 2020-10-27 17:42     ` Roman Gushchin
  0 siblings, 0 replies; 41+ messages in thread
From: Roman Gushchin @ 2020-10-27 17:42 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, iamjoonsoo.kim,
	laoar.shao, chris, christian.brauner, peterz, mingo, keescook,
	tglx, esyr, surenb, areber, elver, linux-kernel, cgroups,
	linux-mm

On Tue, Oct 27, 2020 at 04:02:52PM +0800, Muchun Song wrote:
> Consider the following memcg hierarchy.
> 
>                     root
>                    /    \
>                   A      B
> 
> If we get the objcg of memcg A failed, the get_obj_cgroup_from_current
> can return the wrong objcg for the root memcg.
> 
> Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/memcontrol.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)

Hi Muchun,

thank you for the patchset!

A generic note: it's usually not a good idea to group fixes with non-fixes
in one patchset if fixes are planned to be backported to stable.

> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1337775b04f3..fcbd79c5023e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2945,7 +2945,7 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
>  
>  __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
>  {
> -	struct obj_cgroup *objcg = NULL;
> +	struct obj_cgroup *objcg;
>  	struct mem_cgroup *memcg;
>  
>  	if (memcg_kmem_bypass())
> @@ -2964,6 +2964,9 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
>  	}
>  	rcu_read_unlock();
>  
> +	if (memcg == root_mem_cgroup)
> +		return NULL;
> +
>  	return objcg;
>  }

I agree, it's a bug. Good catch, thanks!

I would prefer a slightly different fix though. Because we're going to change how
the root memory cgroup is handled (an rfc version posted here:
https://lkml.org/lkml/2020/10/21/612), it's better to not use a comparison
with the root_mem_cgroup and handle the obj_cgroup_tryget() return code instead.
Something like this:

@@ -2973,6 +2973,7 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 		objcg = rcu_dereference(memcg->objcg);
 		if (objcg && obj_cgroup_tryget(objcg))
 			break;
+		objcg = NULL;
 	}
 	rcu_read_unlock();

Or a more explicit:
   if (obj_cgroup_tryget(objcg)) {
     ...
   } else {
     ...
   }
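
(For context, with the first variant applied the lookup loop would read
roughly like this -- just a sketch, not a tested patch:)

	for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) {
		objcg = rcu_dereference(memcg->objcg);
		if (objcg && obj_cgroup_tryget(objcg))
			break;
		/* no reference taken; don't leak a stale pointer out of the loop */
		objcg = NULL;
	}
	rcu_read_unlock();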

Also, please, add:
Cc: stable@vger.kernel.org

Thanks!

* Re: [PATCH 2/5] mm: memcg/slab: Fix use after free in obj_cgroup_charge
  2020-10-27  8:02 ` [PATCH 2/5] mm: memcg/slab: Fix use after free in obj_cgroup_charge Muchun Song
@ 2020-10-27 18:31     ` Roman Gushchin
  0 siblings, 0 replies; 41+ messages in thread
From: Roman Gushchin @ 2020-10-27 18:31 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, iamjoonsoo.kim,
	laoar.shao, chris, christian.brauner, peterz, mingo, keescook,
	tglx, esyr, surenb, areber, elver, linux-kernel, cgroups,
	linux-mm

On Tue, Oct 27, 2020 at 04:02:53PM +0800, Muchun Song wrote:
> The rcu_read_lock/unlock only can guarantee that the memcg will
> not be freed, but it cannot guarantee the success of css_get to
> memcg. This can be happened when we reparent the memcg. So using
> css_tryget instead of css_get.

Hm, I really doubt that it can be reproduced in real life, but
I think you're formally correct.

If the whole process of a cgroup offlining is completed between reading
an objcg->memcg pointer and bumping the css reference on another CPU,
and there are exactly 0 external references to this memory cgroup
(how did we get to obj_cgroup_charge() then?), css_get() can change
the ref counter from 0 back to 1.
Is this a good description of the problem?

If so, I think it's worth a more detailed description and mentioning that
it's not a real-life bug. It also probably doesn't deserve a stable@ backport.

> 
> Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/memcontrol.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fcbd79c5023e..c0c27bee23ad 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3223,8 +3223,10 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
>  	 * independently later.
>  	 */
>  	rcu_read_lock();
> +retry:
>  	memcg = obj_cgroup_memcg(objcg);
> -	css_get(&memcg->css);
> +	if (!css_tryget(&memcg->css))

  	if (unlikely(!css_tryget(&memcg->css)))
            ^^^^^^^^
	    maybe?

> +		goto retry;
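
Just to illustrate, the retried section would then read something like
this (a sketch; the rest of the function stays as it is):

	rcu_read_lock();
retry:
	memcg = obj_cgroup_memcg(objcg);
	if (unlikely(!css_tryget(&memcg->css)))
		goto retry;	/* the objcg was reparented, re-read its memcg */
	rcu_read_unlock();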

Otherwise the fix looks good to me.

Please, feel free to add
Acked-by: Roman Gushchin <guro@fb.com>
after extending the commit description.

Thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] mm: memcg/slab: Rename *_lruvec_slab_state to *_lruvec_kmem_state
@ 2020-10-27 18:37     ` Roman Gushchin
  0 siblings, 0 replies; 41+ messages in thread
From: Roman Gushchin @ 2020-10-27 18:37 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, iamjoonsoo.kim,
	laoar.shao, chris, christian.brauner, peterz, mingo, keescook,
	tglx, esyr, surenb, areber, elver, linux-kernel, cgroups,
	linux-mm

On Tue, Oct 27, 2020 at 04:02:54PM +0800, Muchun Song wrote:
> The *_lruvec_slab_state helpers are also suitable for pages allocated
> from the buddy allocator, not just for slab objects. But the function
> names suggest that only slab objects are applicable. So we can rename
> the slab keyword to kmem.

I agree, I think we've even discussed such a renaming in the past.
Please, drop the addition of VM_BUG_ON() (it's not related to the renaming, right?).

Other than that:
Acked-by: Roman Gushchin <guro@fb.com>

Thanks!

> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/memcontrol.h | 18 +++++++++---------
>  kernel/fork.c              |  2 +-
>  mm/memcontrol.c            |  3 ++-
>  mm/workingset.c            |  8 ++++----
>  4 files changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d7e339bf72dc..95807bf6be64 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -793,15 +793,15 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  			      int val);
>  void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  			int val);
> -void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val);
> +void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val);
>  
> -static inline void mod_lruvec_slab_state(void *p, enum node_stat_item idx,
> +static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
>  					 int val)
>  {
>  	unsigned long flags;
>  
>  	local_irq_save(flags);
> -	__mod_lruvec_slab_state(p, idx, val);
> +	__mod_lruvec_kmem_state(p, idx, val);
>  	local_irq_restore(flags);
>  }
>  
> @@ -1227,7 +1227,7 @@ static inline void mod_lruvec_page_state(struct page *page,
>  	mod_node_page_state(page_pgdat(page), idx, val);
>  }
>  
> -static inline void __mod_lruvec_slab_state(void *p, enum node_stat_item idx,
> +static inline void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
>  					   int val)
>  {
>  	struct page *page = virt_to_head_page(p);
> @@ -1235,7 +1235,7 @@ static inline void __mod_lruvec_slab_state(void *p, enum node_stat_item idx,
>  	__mod_node_page_state(page_pgdat(page), idx, val);
>  }
>  
> -static inline void mod_lruvec_slab_state(void *p, enum node_stat_item idx,
> +static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
>  					 int val)
>  {
>  	struct page *page = virt_to_head_page(p);
> @@ -1330,14 +1330,14 @@ static inline void __dec_lruvec_page_state(struct page *page,
>  	__mod_lruvec_page_state(page, idx, -1);
>  }
>  
> -static inline void __inc_lruvec_slab_state(void *p, enum node_stat_item idx)
> +static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
>  {
> -	__mod_lruvec_slab_state(p, idx, 1);
> +	__mod_lruvec_kmem_state(p, idx, 1);
>  }
>  
> -static inline void __dec_lruvec_slab_state(void *p, enum node_stat_item idx)
> +static inline void __dec_lruvec_kmem_state(void *p, enum node_stat_item idx)
>  {
> -	__mod_lruvec_slab_state(p, idx, -1);
> +	__mod_lruvec_kmem_state(p, idx, -1);
>  }
>  
>  /* idx can be of type enum memcg_stat_item or node_stat_item */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 4b328aecabb2..4fb0bbc3b041 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -384,7 +384,7 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
>  		mod_lruvec_page_state(vm->pages[0], NR_KERNEL_STACK_KB,
>  				      account * (THREAD_SIZE / 1024));
>  	else
> -		mod_lruvec_slab_state(stack, NR_KERNEL_STACK_KB,
> +		mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB,
>  				      account * (THREAD_SIZE / 1024));
>  }
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c0c27bee23ad..22b4fb941b54 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -866,7 +866,7 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  		__mod_memcg_lruvec_state(lruvec, idx, val);
>  }
>  
> -void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val)
> +void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
>  {
>  	pg_data_t *pgdat = page_pgdat(virt_to_page(p));
>  	struct mem_cgroup *memcg;
> @@ -2920,6 +2920,7 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
>  	if (mem_cgroup_disabled())
>  		return NULL;
>  
> +	VM_BUG_ON(!virt_addr_valid(p));
>  	page = virt_to_head_page(p);
>  
>  	/*
> diff --git a/mm/workingset.c b/mm/workingset.c
> index 50d53f3699e4..2c707c92dd89 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -445,12 +445,12 @@ void workingset_update_node(struct xa_node *node)
>  	if (node->count && node->count == node->nr_values) {
>  		if (list_empty(&node->private_list)) {
>  			list_lru_add(&shadow_nodes, &node->private_list);
> -			__inc_lruvec_slab_state(node, WORKINGSET_NODES);
> +			__inc_lruvec_kmem_state(node, WORKINGSET_NODES);
>  		}
>  	} else {
>  		if (!list_empty(&node->private_list)) {
>  			list_lru_del(&shadow_nodes, &node->private_list);
> -			__dec_lruvec_slab_state(node, WORKINGSET_NODES);
> +			__dec_lruvec_kmem_state(node, WORKINGSET_NODES);
>  		}
>  	}
>  }
> @@ -541,7 +541,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
>  	}
>  
>  	list_lru_isolate(lru, item);
> -	__dec_lruvec_slab_state(node, WORKINGSET_NODES);
> +	__dec_lruvec_kmem_state(node, WORKINGSET_NODES);
>  
>  	spin_unlock(lru_lock);
>  
> @@ -564,7 +564,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
>  	 * shadow entries we were tracking ...
>  	 */
>  	xas_store(&xas, NULL);
> -	__inc_lruvec_slab_state(node, WORKINGSET_NODERECLAIM);
> +	__inc_lruvec_kmem_state(node, WORKINGSET_NODERECLAIM);
>  
>  out_invalid:
>  	xa_unlock_irq(&mapping->i_pages);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/5] mm: memcg/slab: Fix root memcg vmstats
@ 2020-10-27 18:48     ` Roman Gushchin
  0 siblings, 0 replies; 41+ messages in thread
From: Roman Gushchin @ 2020-10-27 18:48 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, iamjoonsoo.kim,
	laoar.shao, chris, christian.brauner, peterz, mingo, keescook,
	tglx, esyr, surenb, areber, elver, linux-kernel, cgroups,
	linux-mm

On Tue, Oct 27, 2020 at 04:02:55PM +0800, Muchun Song wrote:
> If we reparent the slab objects to the root memcg, when we free
> the slab object, we need to update the per-memcg vmstats to keep
> it correct for the root memcg. Now this at least affects the vmstat
> of NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack
> size is smaller than the PAGE_SIZE.
> 
> Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Can you, please, drop this patch for now?

I'm working on a bigger cleanup related to the handling of the root memory
cgroup (I sent a link earlier in this thread), which already does a similar change.
There are several issues like this one, so it will be nice to fix them all at once.

Thank you!

> ---
>  mm/memcontrol.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 22b4fb941b54..70345b15b150 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -875,8 +875,13 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
>  	rcu_read_lock();
>  	memcg = mem_cgroup_from_obj(p);
>  
> -	/* Untracked pages have no memcg, no lruvec. Update only the node */
> -	if (!memcg || memcg == root_mem_cgroup) {
> +	/*
> +	 * Untracked pages have no memcg, no lruvec. Update only the
> +	 * node. If we reparent the slab objects to the root memcg,
> +	 * when we free the slab object, we need to update the per-memcg
> +	 * vmstats to keep it correct for the root memcg.
> +	 */
> +	if (!memcg) {
>  		__mod_node_page_state(pgdat, idx, val);
>  	} else {
>  		lruvec = mem_cgroup_lruvec(memcg, pgdat);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/5] mm: memcg/slab: Fix return child memcg objcg for root memcg
@ 2020-10-27 19:05       ` Roman Gushchin
  0 siblings, 0 replies; 41+ messages in thread
From: Roman Gushchin @ 2020-10-27 19:05 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, iamjoonsoo.kim,
	laoar.shao, chris, christian.brauner, peterz, mingo, keescook,
	tglx, esyr, surenb, areber, elver, linux-kernel, cgroups,
	linux-mm

On Tue, Oct 27, 2020 at 10:42:36AM -0700, Roman Gushchin wrote:
> On Tue, Oct 27, 2020 at 04:02:52PM +0800, Muchun Song wrote:
> > Consider the following memcg hierarchy.
> > 
> >                     root
> >                    /    \
> >                   A      B
> > 
> > If we get the objcg of memcg A failed, the get_obj_cgroup_from_current
> > can return the wrong objcg for the root memcg.
> > 
> > Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/memcontrol.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> Hi Muchun,
> 
> thank you for the patchset!
> 
> A generic note: it's usually not a good idea to group fixes with non-fixes
> in one patchset if fixes are planned to be backported to stable.
> 
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 1337775b04f3..fcbd79c5023e 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2945,7 +2945,7 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
> >  
> >  __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
> >  {
> > -	struct obj_cgroup *objcg = NULL;
> > +	struct obj_cgroup *objcg;
> >  	struct mem_cgroup *memcg;
> >  
> >  	if (memcg_kmem_bypass())
> > @@ -2964,6 +2964,9 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
> >  	}
> >  	rcu_read_unlock();
> >  
> > +	if (memcg == root_mem_cgroup)
> > +		return NULL;
> > +
> >  	return objcg;
> >  }
> 
> I agree, it's a bug. Good catch, thanks!
> 
> I would prefer a slightly different fix though. Because we're going to change how
> the root memory cgroup is handled (an rfc version posted here:
> https://lkml.org/lkml/2020/10/21/612), it's better to not use a comparison
> with the root_mem_cgroup and handle the obj_cgroup_tryget() return code instead.
> Something like this:
> 
> @@ -2973,6 +2973,7 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
>  		objcg = rcu_dereference(memcg->objcg);
>  		if (objcg && obj_cgroup_tryget(objcg))
>  			break;
> +		objcg = NULL;
>  	}
>  	rcu_read_unlock();
> 
> Or a more explicit:
>    if (obj_cgroup_tryget(objcg)) {
>      ...
>    } else {
>      ...
>    }
> 
> Also, please, add:
> Cc: stable@vger.kernel.org

On second thought, it's also a highly theoretical race. In order to have
obj_cgroup_tryget() fail, we need to enter get_obj_cgroup_from_current()
racing with cgroup offlining (which is already tricky) and then have
memcg_reparent_objcgs() finish in between reading memcg->objcg and bumping
the reference counter.
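
Something along these lines would have to happen (again, purely
hypothetical, spelled out only to show how narrow the window is):

	CPU0: get_obj_cgroup_from_current()   CPU1: cgroup offlining
	objcg = rcu_dereference(memcg->objcg);
	                                      memcg_reparent_objcgs() completes
	                                      and the objcg refcounter drops
	                                      to zero
	obj_cgroup_tryget(objcg);             /* fails */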

So I'm not sure we need a stable@ backport, sorry for the confusion.

Thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [External] Re: [PATCH 4/5] mm: memcg/slab: Fix root memcg vmstats
  2020-10-27 18:48     ` Roman Gushchin
  (?)
@ 2020-10-28  2:56       ` Muchun Song
  -1 siblings, 0 replies; 41+ messages in thread
From: Muchun Song @ 2020-10-28  2:56 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Joonsoo Kim, laoar.shao, Chris Down,
	Christian Brauner, Peter Zijlstra, Ingo Molnar, Kees Cook,
	Thomas Gleixner, esyr, Suren Baghdasaryan, areber, Marco Elver,
	LKML, Cgroups, Linux Memory Management List

On Wed, Oct 28, 2020 at 2:48 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Tue, Oct 27, 2020 at 04:02:55PM +0800, Muchun Song wrote:
> > If we reparent the slab objects to the root memcg, when we free
> > the slab object, we need to update the per-memcg vmstats to keep
> > it correct for the root memcg. Now this at least affects the vmstat
> > of NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack
> > size is smaller than the PAGE_SIZE.
> >
> > Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>
> Can you, please, drop this patch for now?
>
> I'm working on a bigger cleanup related to the handling of the root memory
> cgroup (I sent a link earlier in this thread), which already does a similar change.
> There are several issues like this one, so it will be nice to fix them all at once.

I have read the patch of https://lkml.org/lkml/2020/10/14/869. You
mean this patch
fixes this issue? It chooses to uncharge the root memcg. But here we may need to
uncharge the root memcg to keep root vmstats correct. If we do not do
this, we can
see the wrong vmstats via root memory.stat(e.g. NR_KERNEL_STACK_KB).

Thanks.

>
> Thank you!
>
> > ---
> >  mm/memcontrol.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 22b4fb941b54..70345b15b150 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -875,8 +875,13 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
> >       rcu_read_lock();
> >       memcg = mem_cgroup_from_obj(p);
> >
> > -     /* Untracked pages have no memcg, no lruvec. Update only the node */
> > -     if (!memcg || memcg == root_mem_cgroup) {
> > +     /*
> > +      * Untracked pages have no memcg, no lruvec. Update only the
> > +      * node. If we reparent the slab objects to the root memcg,
> > +      * when we free the slab object, we need to update the per-memcg
> > +      * vmstats to keep it correct for the root memcg.
> > +      */
> > +     if (!memcg) {
> >               __mod_node_page_state(pgdat, idx, val);
> >       } else {
> >               lruvec = mem_cgroup_lruvec(memcg, pgdat);
> > --
> > 2.20.1
> >



-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [External] Re: [PATCH 4/5] mm: memcg/slab: Fix root memcg vmstats
  2020-10-28  2:56       ` Muchun Song
@ 2020-10-28  3:47         ` Muchun Song
  -1 siblings, 0 replies; 41+ messages in thread
From: Muchun Song @ 2020-10-28  3:47 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Joonsoo Kim, laoar.shao, Chris Down,
	Christian Brauner, Peter Zijlstra, Ingo Molnar, Kees Cook,
	Thomas Gleixner, esyr, Suren Baghdasaryan, areber, Marco Elver,
	LKML, Cgroups, Linux Memory Management List

On Wed, Oct 28, 2020 at 10:56 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Wed, Oct 28, 2020 at 2:48 AM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Tue, Oct 27, 2020 at 04:02:55PM +0800, Muchun Song wrote:
> > > If we reparent the slab objects to the root memcg, when we free
> > > the slab object, we need to update the per-memcg vmstats to keep
> > > it correct for the root memcg. Now this at least affects the vmstat
> > > of NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack
> > > size is smaller than the PAGE_SIZE.
> > >
> > > Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >
> > Can you, please, drop this patch for now?
> >
> > I'm working on a bigger cleanup related to the handling of the root memory
> > cgroup (I sent a link earlier in this thread), which already does a similar change.
> > There are several issues like this one, so it will be nice to fix them all at once.
>
> I have read the patch of https://lkml.org/lkml/2020/10/14/869. You
> mean this patch
> fixes this issue? It chooses to uncharge the root memcg. But here we may need to

Here I mean "It chooses to not uncharge the root memcg", sorry.

> uncharge the root memcg to keep root vmstats correct. If we do not do
> this, we can
> see the wrong vmstats via root memory.stat(e.g. NR_KERNEL_STACK_KB).
>
> Thanks.
>
> >
> > Thank you!
> >
> > > ---
> > >  mm/memcontrol.c | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 22b4fb941b54..70345b15b150 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -875,8 +875,13 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
> > >       rcu_read_lock();
> > >       memcg = mem_cgroup_from_obj(p);
> > >
> > > -     /* Untracked pages have no memcg, no lruvec. Update only the node */
> > > -     if (!memcg || memcg == root_mem_cgroup) {
> > > +     /*
> > > +      * Untracked pages have no memcg, no lruvec. Update only the
> > > +      * node. If we reparent the slab objects to the root memcg,
> > > +      * when we free the slab object, we need to update the per-memcg
> > > +      * vmstats to keep it correct for the root memcg.
> > > +      */
> > > +     if (!memcg) {
> > >               __mod_node_page_state(pgdat, idx, val);
> > >       } else {
> > >               lruvec = mem_cgroup_lruvec(memcg, pgdat);
> > > --
> > > 2.20.1
> > >
>
>
>
> --
> Yours,
> Muchun



-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [External] Re: [PATCH 4/5] mm: memcg/slab: Fix root memcg vmstats
@ 2020-10-29  0:14         ` Roman Gushchin
  0 siblings, 0 replies; 41+ messages in thread
From: Roman Gushchin @ 2020-10-29  0:14 UTC (permalink / raw)
  To: Muchun Song
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Joonsoo Kim, laoar.shao, Chris Down,
	Christian Brauner, Peter Zijlstra, Ingo Molnar, Kees Cook,
	Thomas Gleixner, esyr, Suren Baghdasaryan, areber, Marco Elver,
	LKML, Cgroups, Linux Memory Management List

On Wed, Oct 28, 2020 at 10:56:20AM +0800, Muchun Song wrote:
> On Wed, Oct 28, 2020 at 2:48 AM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Tue, Oct 27, 2020 at 04:02:55PM +0800, Muchun Song wrote:
> > > If we reparent the slab objects to the root memcg, when we free
> > > the slab object, we need to update the per-memcg vmstats to keep
> > > it correct for the root memcg. Now this at least affects the vmstat
> > > of NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack
> > > size is smaller than the PAGE_SIZE.
> > >
> > > Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >
> > Can you, please, drop this patch for now?
> >
> > I'm working on a bigger cleanup related to the handling of the root memory
> > cgroup (I sent a link earlier in this thread), which already does a similar change.
> > There are several issues like this one, so it will be nice to fix them all at once.
> 
> I have read the patch of https://lkml.org/lkml/2020/10/14/869. You
> mean this patch
> fixes this issue? It chooses to uncharge the root memcg. But here we may need to
> uncharge the root memcg to keep root vmstats correct. If we do not do
> this, we can
> see the wrong vmstats via root memory.stat(e.g. NR_KERNEL_STACK_KB).

I pointed at a different patch in the same thread (it looks like you read the first one):
https://lkml.org/lkml/2020/10/21/612

It contained the following part:

@@ -868,7 +860,7 @@ void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val)
 	memcg = mem_cgroup_from_obj(p);
 
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
-	if (!memcg || memcg == root_mem_cgroup) {
+	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
 	} else {
 		lruvec = mem_cgroup_lruvec(memcg, pgdat);

So it's exactly what your patch does.

Thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [External] Re: [PATCH 4/5] mm: memcg/slab: Fix root memcg vmstats
  2020-10-29  0:14         ` Roman Gushchin
@ 2020-10-29  6:15           ` Muchun Song
  -1 siblings, 0 replies; 41+ messages in thread
From: Muchun Song @ 2020-10-29  6:15 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Joonsoo Kim, laoar.shao, Chris Down,
	Christian Brauner, Peter Zijlstra, Ingo Molnar, Kees Cook,
	Thomas Gleixner, esyr, Suren Baghdasaryan, areber, Marco Elver,
	LKML, Cgroups, Linux Memory Management List

On Thu, Oct 29, 2020 at 8:14 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Wed, Oct 28, 2020 at 10:56:20AM +0800, Muchun Song wrote:
> > On Wed, Oct 28, 2020 at 2:48 AM Roman Gushchin <guro@fb.com> wrote:
> > >
> > > On Tue, Oct 27, 2020 at 04:02:55PM +0800, Muchun Song wrote:
> > > > If we reparent the slab objects to the root memcg, when we free
> > > > the slab object, we need to update the per-memcg vmstats to keep
> > > > it correct for the root memcg. Now this at least affects the vmstat
> > > > of NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack
> > > > size is smaller than the PAGE_SIZE.
> > > >
> > > > Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
> > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > >
> > > Can you, please, drop this patch for now?
> > >
> > > I'm working on a bigger cleanup related to the handling of the root memory
> > > cgroup (I sent a link earlier in this thread), which already does a similar change.
> > > There are several issues like this one, so it will be nice to fix them all at once.
> >
> > I have read the patch of https://lkml.org/lkml/2020/10/14/869. You
> > mean this patch
> > fixes this issue? It chooses to uncharge the root memcg. But here we may need to
> > uncharge the root memcg to keep root vmstats correct. If we do not do
> > this, we can
> > see the wrong vmstats via root memory.stat(e.g. NR_KERNEL_STACK_KB).
>
> I pointed at a different patch in the same thread (it looks like you read the first one):
> https://lkml.org/lkml/2020/10/21/612

Got it. Thanks. That is fine with me.

>
> It contained the following part:
>
> @@ -868,7 +860,7 @@ void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val)
>         memcg = mem_cgroup_from_obj(p);
>
>         /* Untracked pages have no memcg, no lruvec. Update only the node */
> -       if (!memcg || memcg == root_mem_cgroup) {
> +       if (!memcg) {
>                 __mod_node_page_state(pgdat, idx, val);
>         } else {
>                 lruvec = mem_cgroup_lruvec(memcg, pgdat);
>
> So it's exactly what your patch does.
>
> Thanks!



-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [External] Re: [PATCH 4/5] mm: memcg/slab: Fix root memcg vmstats
  2020-10-29  6:15           ` Muchun Song
@ 2020-11-10  1:32             ` Roman Gushchin
  -1 siblings, 0 replies; 41+ messages in thread
From: Roman Gushchin @ 2020-11-10  1:32 UTC (permalink / raw)
  To: Muchun Song
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Joonsoo Kim, laoar.shao, Chris Down,
	Christian Brauner, Peter Zijlstra, Ingo Molnar, Kees Cook,
	Thomas Gleixner, esyr, Suren Baghdasaryan, areber, Marco Elver,
	LKML, Cgroups, Linux Memory Management List

On Thu, Oct 29, 2020 at 02:15:43PM +0800, Muchun Song wrote:
> On Thu, Oct 29, 2020 at 8:14 AM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Wed, Oct 28, 2020 at 10:56:20AM +0800, Muchun Song wrote:
> > > On Wed, Oct 28, 2020 at 2:48 AM Roman Gushchin <guro@fb.com> wrote:
> > > >
> > > > On Tue, Oct 27, 2020 at 04:02:55PM +0800, Muchun Song wrote:
> > > > > If we reparent the slab objects to the root memcg, when we free
> > > > > the slab object, we need to update the per-memcg vmstats to keep
> > > > > it correct for the root memcg. Now this at least affects the vmstat
> > > > > of NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack
> > > > > size is smaller than the PAGE_SIZE.
> > > > >
> > > > > Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
> > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > >
> > > > Can you, please, drop this patch for now?
> > > >
> > > > I'm working on a bigger cleanup related to the handling of the root memory
> > > > cgroup (I sent a link earlier in this thread), which already does a similar change.
> > > > There are several issues like this one, so it will be nice to fix them all at once.
> > >
> > > I have read the patch of https://lkml.org/lkml/2020/10/14/869. You
> > > mean this patch
> > > fixes this issue? It chooses to uncharge the root memcg. But here we may need to
> > > uncharge the root memcg to keep root vmstats correct. If we do not do
> > > this, we can
> > > see the wrong vmstats via root memory.stat(e.g. NR_KERNEL_STACK_KB).
> >
> > I pointed at a different patch in the same thread (it looks like you read the first one):
> > https://lkml.org/lkml/2020/10/21/612

Hi Muchun!

Can you, please, resend your patch? The planned cleanup of the root memory cgroup
is more complex than expected, so I think it makes sense to merge your patch without
waiting for it. I'm sorry for delaying it initially.

Please, feel free to add
Acked-by: Roman Gushchin <guro@fb.com>

Thank you!

Roman

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [External] Re: [PATCH 4/5] mm: memcg/slab: Fix root memcg vmstats
  2020-11-10  1:32             ` Roman Gushchin
@ 2020-11-10  2:57               ` Muchun Song
  -1 siblings, 0 replies; 41+ messages in thread
From: Muchun Song @ 2020-11-10  2:57 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Joonsoo Kim, Yafang Shao, Chris Down,
	Christian Brauner, Peter Zijlstra, Ingo Molnar, Kees Cook,
	Thomas Gleixner, esyr, Suren Baghdasaryan, areber, Marco Elver,
	LKML, Cgroups, Linux Memory Management List

On Tue, Nov 10, 2020 at 9:33 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Thu, Oct 29, 2020 at 02:15:43PM +0800, Muchun Song wrote:
> > On Thu, Oct 29, 2020 at 8:14 AM Roman Gushchin <guro@fb.com> wrote:
> > >
> > > On Wed, Oct 28, 2020 at 10:56:20AM +0800, Muchun Song wrote:
> > > > On Wed, Oct 28, 2020 at 2:48 AM Roman Gushchin <guro@fb.com> wrote:
> > > > >
> > > > > On Tue, Oct 27, 2020 at 04:02:55PM +0800, Muchun Song wrote:
> > > > > > If we reparent the slab objects to the root memcg, when we free
> > > > > > the slab object, we need to update the per-memcg vmstats to keep
> > > > > > it correct for the root memcg. Now this at least affects the vmstat
> > > > > > of NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack
> > > > > > size is smaller than the PAGE_SIZE.
> > > > > >
> > > > > > Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
> > > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > >
> > > > > Can you, please, drop this patch for now?
> > > > >
> > > > > I'm working on a bigger cleanup related to the handling of the root memory
> > > > > cgroup (I sent a link earlier in this thread), which already does a similar change.
> > > > > There are several issues like this one, so it will be nice to fix them all at once.
> > > >
> > > > I have read the patch at https://lkml.org/lkml/2020/10/14/869. You
> > > > mean this patch fixes this issue? It chooses to uncharge the root
> > > > memcg. But here we may need to uncharge the root memcg to keep root
> > > > vmstats correct. If we do not do this, we can see the wrong vmstats
> > > > via root memory.stat (e.g. NR_KERNEL_STACK_KB).
> > >
> > > I pointed at a different patch in the same thread (it looks like you read the first one):
> > > https://lkml.org/lkml/2020/10/21/612
>
> Hi Muchun!
>
> Can you please resend your patch? The planned cleanup of the root memory cgroup
> is more complex than expected, so I think it makes sense to merge your patch without
> waiting for it. I'm sorry for delaying it initially.

OK, I will do that. Thanks.

>
> Please, feel free to add
> Acked-by: Roman Gushchin <guro@fb.com>
>
> Thank you!
>
> Roman



-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2020-11-10  2:57 UTC | newest]

Thread overview: 41+ messages
2020-10-27  8:02 [PATCH 0/5] Fix some bugs in memcg/slab Muchun Song
2020-10-27  8:02 ` [PATCH 1/5] mm: memcg/slab: Fix return child memcg objcg for root memcg Muchun Song
2020-10-27 17:42   ` Roman Gushchin
2020-10-27 19:05     ` Roman Gushchin
2020-10-27  8:02 ` [PATCH 2/5] mm: memcg/slab: Fix use after free in obj_cgroup_charge Muchun Song
2020-10-27 18:31   ` Roman Gushchin
2020-10-27  8:02 ` [PATCH 3/5] mm: memcg/slab: Rename *_lruvec_slab_state to *_lruvec_kmem_state Muchun Song
2020-10-27 18:37   ` Roman Gushchin
2020-10-27  8:02 ` [PATCH 4/5] mm: memcg/slab: Fix root memcg vmstats Muchun Song
2020-10-27 18:48   ` Roman Gushchin
2020-10-28  2:56     ` [External] " Muchun Song
2020-10-28  3:47       ` Muchun Song
2020-10-29  0:14       ` Roman Gushchin
2020-10-29  6:15         ` Muchun Song
2020-11-10  1:32           ` Roman Gushchin
2020-11-10  2:57             ` Muchun Song
2020-10-27  8:02 ` [PATCH 5/5] mm: memcontrol: Simplify the mem_cgroup_page_lruvec Muchun Song
2020-10-27 13:36   ` Michal Hocko
2020-10-27 14:15     ` [External] " Muchun Song
2020-10-27 14:31       ` Michal Hocko
