All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-11  1:02 ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-11  1:02 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb
  Cc: linux-kernel, cgroups, linux-mm, Wei Yang

Instead of use "-1", let's use NUMA_NO_NODE for consistency.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2ed5f2a0879d..11715f7323c0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5054,7 +5054,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	 *       function.
 	 */
 	if (!node_state(node, N_NORMAL_MEMORY))
-		tmp = -1;
+		tmp = NUMA_NO_NODE;
 	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
 	if (!pn)
 		return 1;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-11  1:02 ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-11  1:02 UTC (permalink / raw)
  To: hannes-druUgvl0LCNAfugRpC6u6w, mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	Wei Yang

Instead of use "-1", let's use NUMA_NO_NODE for consistency.

Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2ed5f2a0879d..11715f7323c0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5054,7 +5054,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	 *       function.
 	 */
 	if (!node_state(node, N_NORMAL_MEMORY))
-		tmp = -1;
+		tmp = NUMA_NO_NODE;
 	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
 	if (!pn)
 		return 1;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
  2022-01-11  1:02 ` Wei Yang
  (?)
@ 2022-01-11  1:03 ` Wei Yang
  2022-01-11  2:25     ` Muchun Song
                     ` (4 more replies)
  -1 siblings, 5 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-11  1:03 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb
  Cc: linux-kernel, cgroups, linux-mm, Wei Yang

kzalloc_node() would set data to 0, so it's not necessary to set it
again.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/memcontrol.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 11715f7323c0..a504616f904a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5067,8 +5067,6 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	}
 
 	lruvec_init(&pn->lruvec);
-	pn->usage_in_excess = 0;
-	pn->on_tree = false;
 	pn->memcg = memcg;
 
 	memcg->nodeinfo[node] = pn;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
@ 2022-01-11  1:03   ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-11  1:03 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb
  Cc: linux-kernel, cgroups, linux-mm, Wei Yang

The parent we get from page_counter is correct, while this is two
different hierarchy.

Let's retrieve the parent memcg from css.parent just like parent_cs(),
blkcg_parent(), etc.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 include/linux/memcontrol.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0c5c403f4be6..12bf443f7b14 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -886,9 +886,7 @@ static inline struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec)
  */
 static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
 {
-	if (!memcg->memory.parent)
-		return NULL;
-	return mem_cgroup_from_counter(memcg->memory.parent, memory);
+	return mem_cgroup_from_css(memcg->css.parent);
 }
 
 static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
@ 2022-01-11  1:03   ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-11  1:03 UTC (permalink / raw)
  To: hannes-druUgvl0LCNAfugRpC6u6w, mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	Wei Yang

The parent we get from page_counter is correct, while this is two
different hierarchy.

Let's retrieve the parent memcg from css.parent just like parent_cs(),
blkcg_parent(), etc.

Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/memcontrol.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0c5c403f4be6..12bf443f7b14 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -886,9 +886,7 @@ static inline struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec)
  */
 static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
 {
-	if (!memcg->memory.parent)
-		return NULL;
-	return mem_cgroup_from_counter(memcg->memory.parent, memory);
+	return mem_cgroup_from_css(memcg->css.parent);
 }
 
 static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation
@ 2022-01-11  1:03   ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-11  1:03 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb
  Cc: linux-kernel, cgroups, linux-mm, Wei Yang

mem_cgroup_threshold_ary->current_threshold points to the last entry
who's threshold is less or equal to usage.

Instead of iterating entries to get the correct index, we can leverage
primary->current_threshold to get it. If the threshold added is less or
equal to usage, current_threshold should increase by one. Otherwise, it
doesn't change.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/memcontrol.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a504616f904a..ce7060907df2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4161,7 +4161,7 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	struct mem_cgroup_threshold_ary *new;
 	unsigned long threshold;
 	unsigned long usage;
-	int i, size, ret;
+	int size, ret;
 
 	ret = page_counter_memparse(args, "-1", &threshold);
 	if (ret)
@@ -4193,9 +4193,13 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	new->size = size;
 
 	/* Copy thresholds (if any) to new array */
-	if (thresholds->primary)
+	if (thresholds->primary) {
 		memcpy(new->entries, thresholds->primary->entries,
 		       flex_array_size(new, entries, size - 1));
+		new->current_threshold = thresholds->primary->current_threshold;
+	} else {
+		new->current_threshold = -1;
+	}
 
 	/* Add new threshold */
 	new->entries[size - 1].eventfd = eventfd;
@@ -4205,18 +4209,17 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	sort(new->entries, size, sizeof(*new->entries),
 			compare_thresholds, NULL);
 
-	/* Find current threshold */
-	new->current_threshold = -1;
-	for (i = 0; i < size; i++) {
-		if (new->entries[i].threshold <= usage) {
-			/*
-			 * new->current_threshold will not be used until
-			 * rcu_assign_pointer(), so it's safe to increment
-			 * it here.
-			 */
-			++new->current_threshold;
-		} else
-			break;
+	/*
+	 * If the threshold added here is less or equal to usage, this means
+	 * current_threshold need to increase by one.
+	 */
+	if (threshold <= usage) {
+		/*
+		 * new->current_threshold will not be used until
+		 * rcu_assign_pointer(), so it's safe to increment
+		 * it here.
+		 */
+		new->current_threshold++;
 	}
 
 	/* Free old spare buffer and save old primary buffer as spare */
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation
@ 2022-01-11  1:03   ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-11  1:03 UTC (permalink / raw)
  To: hannes-druUgvl0LCNAfugRpC6u6w, mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	Wei Yang

mem_cgroup_threshold_ary->current_threshold points to the last entry
who's threshold is less or equal to usage.

Instead of iterating entries to get the correct index, we can leverage
primary->current_threshold to get it. If the threshold added is less or
equal to usage, current_threshold should increase by one. Otherwise, it
doesn't change.

Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 mm/memcontrol.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a504616f904a..ce7060907df2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4161,7 +4161,7 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	struct mem_cgroup_threshold_ary *new;
 	unsigned long threshold;
 	unsigned long usage;
-	int i, size, ret;
+	int size, ret;
 
 	ret = page_counter_memparse(args, "-1", &threshold);
 	if (ret)
@@ -4193,9 +4193,13 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	new->size = size;
 
 	/* Copy thresholds (if any) to new array */
-	if (thresholds->primary)
+	if (thresholds->primary) {
 		memcpy(new->entries, thresholds->primary->entries,
 		       flex_array_size(new, entries, size - 1));
+		new->current_threshold = thresholds->primary->current_threshold;
+	} else {
+		new->current_threshold = -1;
+	}
 
 	/* Add new threshold */
 	new->entries[size - 1].eventfd = eventfd;
@@ -4205,18 +4209,17 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	sort(new->entries, size, sizeof(*new->entries),
 			compare_thresholds, NULL);
 
-	/* Find current threshold */
-	new->current_threshold = -1;
-	for (i = 0; i < size; i++) {
-		if (new->entries[i].threshold <= usage) {
-			/*
-			 * new->current_threshold will not be used until
-			 * rcu_assign_pointer(), so it's safe to increment
-			 * it here.
-			 */
-			++new->current_threshold;
-		} else
-			break;
+	/*
+	 * If the threshold added here is less or equal to usage, this means
+	 * current_threshold need to increase by one.
+	 */
+	if (threshold <= usage) {
+		/*
+		 * new->current_threshold will not be used until
+		 * rcu_assign_pointer(), so it's safe to increment
+		 * it here.
+		 */
+		new->current_threshold++;
 	}
 
 	/* Free old spare buffer and save old primary buffer as spare */
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-11  2:23   ` Muchun Song
  0 siblings, 0 replies; 55+ messages in thread
From: Muchun Song @ 2022-01-11  2:23 UTC (permalink / raw)
  To: Wei Yang
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Roman Gushchin, Vlastimil Babka, Matthew Wilcox,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups,
	Linux Memory Management List

On Tue, Jan 11, 2022 at 9:03 AM Wei Yang <richard.weiyang@gmail.com> wrote:
>
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-11  2:23   ` Muchun Song
  0 siblings, 0 replies; 55+ messages in thread
From: Muchun Song @ 2022-01-11  2:23 UTC (permalink / raw)
  To: Wei Yang
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Roman Gushchin, Vlastimil Babka, Matthew Wilcox,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups,
	Linux Memory Management List

On Tue, Jan 11, 2022 at 9:03 AM Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Reviewed-by: Muchun Song <songmuchun-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
@ 2022-01-11  2:25     ` Muchun Song
  0 siblings, 0 replies; 55+ messages in thread
From: Muchun Song @ 2022-01-11  2:25 UTC (permalink / raw)
  To: Wei Yang
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Roman Gushchin, Vlastimil Babka, Matthew Wilcox,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups,
	Linux Memory Management List

On Tue, Jan 11, 2022 at 9:03 AM Wei Yang <richard.weiyang@gmail.com> wrote:
>
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
@ 2022-01-11  2:25     ` Muchun Song
  0 siblings, 0 replies; 55+ messages in thread
From: Muchun Song @ 2022-01-11  2:25 UTC (permalink / raw)
  To: Wei Yang
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Roman Gushchin, Vlastimil Babka, Matthew Wilcox,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups,
	Linux Memory Management List

On Tue, Jan 11, 2022 at 9:03 AM Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
>
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Reviewed-by: Muchun Song <songmuchun-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
@ 2022-01-11  3:12     ` Muchun Song
  0 siblings, 0 replies; 55+ messages in thread
From: Muchun Song @ 2022-01-11  3:12 UTC (permalink / raw)
  To: Wei Yang
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Roman Gushchin, Vlastimil Babka, Matthew Wilcox,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups,
	Linux Memory Management List

On Tue, Jan 11, 2022 at 9:03 AM Wei Yang <richard.weiyang@gmail.com> wrote:
>
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
>
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
@ 2022-01-11  3:12     ` Muchun Song
  0 siblings, 0 replies; 55+ messages in thread
From: Muchun Song @ 2022-01-11  3:12 UTC (permalink / raw)
  To: Wei Yang
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Roman Gushchin, Vlastimil Babka, Matthew Wilcox,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups,
	Linux Memory Management List

On Tue, Jan 11, 2022 at 9:03 AM Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
>
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.
>
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Reviewed-by: Muchun Song <songmuchun-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-11  8:40   ` Michal Hocko
  0 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2022-01-11  8:40 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, vdavydov.dev, akpm, shakeelb, guro, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Tue 11-01-22 01:02:59, Wei Yang wrote:
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

I am not really sure this is worth it. After the merge window I plan to
post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org.
With that in place we can drop the check and a node rewrite so the net
result will be a less and more straightforward code. If you agree I can
add this with your s-o-b into my series:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 781605e92015..ed19a21ee14e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 {
 	struct mem_cgroup_per_node *pn;
-	int tmp = node;
-	/*
-	 * This routine is called against possible nodes.
-	 * But it's BUG to call kmalloc() against offline node.
-	 *
-	 * TODO: this routine can waste much memory for nodes which will
-	 *       never be onlined. It's better to use memory hotplug callback
-	 *       function.
-	 */
-	if (!node_state(node, N_NORMAL_MEMORY))
-		tmp = -1;
-	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
+
+	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
 	if (!pn)
 		return 1;
 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-11  8:40   ` Michal Hocko
  0 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2022-01-11  8:40 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes-druUgvl0LCNAfugRpC6u6w,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Tue 11-01-22 01:02:59, Wei Yang wrote:
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> 
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

I am not really sure this is worth it. After the merge window I plan to
post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko-DgEjT+Ai2yi4UlQgPVntAg@public.gmane.org
With that in place we can drop the check and a node rewrite so the net
result will be a less and more straightforward code. If you agree I can
add this with your s-o-b into my series:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 781605e92015..ed19a21ee14e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 {
 	struct mem_cgroup_per_node *pn;
-	int tmp = node;
-	/*
-	 * This routine is called against possible nodes.
-	 * But it's BUG to call kmalloc() against offline node.
-	 *
-	 * TODO: this routine can waste much memory for nodes which will
-	 *       never be onlined. It's better to use memory hotplug callback
-	 *       function.
-	 */
-	if (!node_state(node, N_NORMAL_MEMORY))
-		tmp = -1;
-	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
+
+	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
 	if (!pn)
 		return 1;
 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
  2022-01-11  1:03 ` [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation Wei Yang
  2022-01-11  2:25     ` Muchun Song
@ 2022-01-11  8:41   ` Michal Hocko
  2022-01-11 18:06     ` Roman Gushchin
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2022-01-11  8:41 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, vdavydov.dev, akpm, shakeelb, guro, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Tue 11-01-22 01:03:00, Wei Yang wrote:
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memcontrol.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 11715f7323c0..a504616f904a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5067,8 +5067,6 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>  	}
>  
>  	lruvec_init(&pn->lruvec);
> -	pn->usage_in_excess = 0;
> -	pn->on_tree = false;
>  	pn->memcg = memcg;
>  
>  	memcg->nodeinfo[node] = pn;
> -- 
> 2.33.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
  2022-01-11  1:03   ` Wei Yang
  (?)
  (?)
@ 2022-01-11  8:43   ` Michal Hocko
  -1 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2022-01-11  8:43 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, vdavydov.dev, akpm, shakeelb, guro, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Tue 11-01-22 01:03:01, Wei Yang wrote:
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
> 
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/memcontrol.h | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0c5c403f4be6..12bf443f7b14 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -886,9 +886,7 @@ static inline struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec)
>   */
>  static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
>  {
> -	if (!memcg->memory.parent)
> -		return NULL;
> -	return mem_cgroup_from_counter(memcg->memory.parent, memory);
> +	return mem_cgroup_from_css(memcg->css.parent);
>  }
>  
>  static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
> -- 
> 2.33.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation
  2022-01-11  1:03   ` Wei Yang
  (?)
@ 2022-01-11  8:44   ` Michal Hocko
  -1 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2022-01-11  8:44 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, vdavydov.dev, akpm, shakeelb, guro, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Tue 11-01-22 01:03:02, Wei Yang wrote:
> mem_cgroup_threshold_ary->current_threshold points to the last entry
> who's threshold is less or equal to usage.
> 
> Instead of iterating entries to get the correct index, we can leverage
> primary->current_threshold to get it. If the threshold added is less or
> equal to usage, current_threshold should increase by one. Otherwise, it
> doesn't change.

Why do we want/need this change?

> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
>  mm/memcontrol.c | 31 +++++++++++++++++--------------
>  1 file changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a504616f904a..ce7060907df2 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4161,7 +4161,7 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
>  	struct mem_cgroup_threshold_ary *new;
>  	unsigned long threshold;
>  	unsigned long usage;
> -	int i, size, ret;
> +	int size, ret;
>  
>  	ret = page_counter_memparse(args, "-1", &threshold);
>  	if (ret)
> @@ -4193,9 +4193,13 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
>  	new->size = size;
>  
>  	/* Copy thresholds (if any) to new array */
> -	if (thresholds->primary)
> +	if (thresholds->primary) {
>  		memcpy(new->entries, thresholds->primary->entries,
>  		       flex_array_size(new, entries, size - 1));
> +		new->current_threshold = thresholds->primary->current_threshold;
> +	} else {
> +		new->current_threshold = -1;
> +	}
>  
>  	/* Add new threshold */
>  	new->entries[size - 1].eventfd = eventfd;
> @@ -4205,18 +4209,17 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
>  	sort(new->entries, size, sizeof(*new->entries),
>  			compare_thresholds, NULL);
>  
> -	/* Find current threshold */
> -	new->current_threshold = -1;
> -	for (i = 0; i < size; i++) {
> -		if (new->entries[i].threshold <= usage) {
> -			/*
> -			 * new->current_threshold will not be used until
> -			 * rcu_assign_pointer(), so it's safe to increment
> -			 * it here.
> -			 */
> -			++new->current_threshold;
> -		} else
> -			break;
> +	/*
> +	 * If the threshold added here is less or equal to usage, this means
> +	 * current_threshold need to increase by one.
> +	 */
> +	if (threshold <= usage) {
> +		/*
> +		 * new->current_threshold will not be used until
> +		 * rcu_assign_pointer(), so it's safe to increment
> +		 * it here.
> +		 */
> +		new->current_threshold++;
>  	}
>  
>  	/* Free old spare buffer and save old primary buffer as spare */
> -- 
> 2.33.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-11 18:05   ` Roman Gushchin
  0 siblings, 0 replies; 55+ messages in thread
From: Roman Gushchin @ 2022-01-11 18:05 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Tue, Jan 11, 2022 at 01:02:59AM +0000, Wei Yang wrote:
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

If the patch won't be dropped for Michal's work, please,
feel free to add
Acked-by: Roman Gushchin <guro@fb.com>
Thanks!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-11 18:05   ` Roman Gushchin
  0 siblings, 0 replies; 55+ messages in thread
From: Roman Gushchin @ 2022-01-11 18:05 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, vbabka-AlSwsSmVLrQ,
	willy-wEGCiKHe2LqWVfeAwA7xHQ, songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Tue, Jan 11, 2022 at 01:02:59AM +0000, Wei Yang wrote:
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> 
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

If the patch won't be dropped for Michal's work, please,
feel free to add
Acked-by: Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>
Thanks!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
@ 2022-01-11 18:06     ` Roman Gushchin
  0 siblings, 0 replies; 55+ messages in thread
From: Roman Gushchin @ 2022-01-11 18:06 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Tue, Jan 11, 2022 at 01:03:00AM +0000, Wei Yang wrote:
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Reviewed-by: Roman Gushchin <guro@fb.com>

Thanks!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
@ 2022-01-11 18:06     ` Roman Gushchin
  0 siblings, 0 replies; 55+ messages in thread
From: Roman Gushchin @ 2022-01-11 18:06 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, vbabka-AlSwsSmVLrQ,
	willy-wEGCiKHe2LqWVfeAwA7xHQ, songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Tue, Jan 11, 2022 at 01:03:00AM +0000, Wei Yang wrote:
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
> 
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Reviewed-by: Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>

Thanks!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
@ 2022-01-11 18:09     ` Roman Gushchin
  0 siblings, 0 replies; 55+ messages in thread
From: Roman Gushchin @ 2022-01-11 18:09 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Tue, Jan 11, 2022 at 01:03:01AM +0000, Wei Yang wrote:
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
> 
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.

Does it bring any benefits except consistency?

> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Reviewed-by: Roman Gushchin <guro@fb.com>

Thanks!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
@ 2022-01-11 18:09     ` Roman Gushchin
  0 siblings, 0 replies; 55+ messages in thread
From: Roman Gushchin @ 2022-01-11 18:09 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, vbabka-AlSwsSmVLrQ,
	willy-wEGCiKHe2LqWVfeAwA7xHQ, songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Tue, Jan 11, 2022 at 01:03:01AM +0000, Wei Yang wrote:
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
> 
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.

Does it bring any benefits except consistency?

> 
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Reviewed-by: Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>

Thanks!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation
@ 2022-01-11 18:23     ` Roman Gushchin
  0 siblings, 0 replies; 55+ messages in thread
From: Roman Gushchin @ 2022-01-11 18:23 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Tue, Jan 11, 2022 at 01:03:02AM +0000, Wei Yang wrote:
> mem_cgroup_threshold_ary->current_threshold points to the last entry
> who's threshold is less or equal to usage.
> 
> Instead of iterating entries to get the correct index, we can leverage
> primary->current_threshold to get it. If the threshold added is less or
> equal to usage, current_threshold should increase by one. Otherwise, it
> doesn't change.

How big is usually an array of thresholds? If it's not huge, likely
any savings won't be really noticeable (it's not a hot path and there
is an rc_synchronize() below).

So I agree with Michal that a better justification is really needed.

Thanks!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation
@ 2022-01-11 18:23     ` Roman Gushchin
  0 siblings, 0 replies; 55+ messages in thread
From: Roman Gushchin @ 2022-01-11 18:23 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, vbabka-AlSwsSmVLrQ,
	willy-wEGCiKHe2LqWVfeAwA7xHQ, songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Tue, Jan 11, 2022 at 01:03:02AM +0000, Wei Yang wrote:
> mem_cgroup_threshold_ary->current_threshold points to the last entry
> who's threshold is less or equal to usage.
> 
> Instead of iterating entries to get the correct index, we can leverage
> primary->current_threshold to get it. If the threshold added is less or
> equal to usage, current_threshold should increase by one. Otherwise, it
> doesn't change.

How big is usually an array of thresholds? If it's not huge, likely
any savings won't be really noticeable (it's not a hot path and there
is an rc_synchronize() below).

So I agree with Michal that a better justification is really needed.

Thanks!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
@ 2022-01-12  0:24       ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-12  0:24 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Wei Yang, hannes, mhocko, vdavydov.dev, akpm, shakeelb, vbabka,
	willy, songmuchun, shy828301, surenb, linux-kernel, cgroups,
	linux-mm

On Tue, Jan 11, 2022 at 10:09:52AM -0800, Roman Gushchin wrote:
>On Tue, Jan 11, 2022 at 01:03:01AM +0000, Wei Yang wrote:
>> The parent we get from page_counter is correct, while this is two
>> different hierarchy.
>> 
>> Let's retrieve the parent memcg from css.parent just like parent_cs(),
>> blkcg_parent(), etc.
>
>Does it bring any benefits except consistency?
>

I am afraid no.

>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>
>Reviewed-by: Roman Gushchin <guro@fb.com>
>
>Thanks!

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
@ 2022-01-12  0:24       ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-12  0:24 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Wei Yang, hannes-druUgvl0LCNAfugRpC6u6w,
	mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, vbabka-AlSwsSmVLrQ,
	willy-wEGCiKHe2LqWVfeAwA7xHQ, songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Tue, Jan 11, 2022 at 10:09:52AM -0800, Roman Gushchin wrote:
>On Tue, Jan 11, 2022 at 01:03:01AM +0000, Wei Yang wrote:
>> The parent we get from page_counter is correct, while this is two
>> different hierarchy.
>> 
>> Let's retrieve the parent memcg from css.parent just like parent_cs(),
>> blkcg_parent(), etc.
>
>Does it bring any benefits except consistency?
>

I am afraid no.

>> 
>> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>
>Reviewed-by: Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>
>
>Thanks!

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation
@ 2022-01-12  0:25       ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-12  0:25 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Wei Yang, hannes, mhocko, vdavydov.dev, akpm, shakeelb, vbabka,
	willy, songmuchun, shy828301, surenb, linux-kernel, cgroups,
	linux-mm

On Tue, Jan 11, 2022 at 10:23:41AM -0800, Roman Gushchin wrote:
>On Tue, Jan 11, 2022 at 01:03:02AM +0000, Wei Yang wrote:
>> mem_cgroup_threshold_ary->current_threshold points to the last entry
>> who's threshold is less or equal to usage.
>> 
>> Instead of iterating entries to get the correct index, we can leverage
>> primary->current_threshold to get it. If the threshold added is less or
>> equal to usage, current_threshold should increase by one. Otherwise, it
>> doesn't change.
>
>How big is usually an array of thresholds? If it's not huge, likely
>any savings won't be really noticeable (it's not a hot path and there
>is an rc_synchronize() below).

Usually the size is small I think.

>
>So I agree with Michal that a better justification is really needed.

Yep.

>
>Thanks!

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation
@ 2022-01-12  0:25       ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-12  0:25 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Wei Yang, hannes-druUgvl0LCNAfugRpC6u6w,
	mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, vbabka-AlSwsSmVLrQ,
	willy-wEGCiKHe2LqWVfeAwA7xHQ, songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Tue, Jan 11, 2022 at 10:23:41AM -0800, Roman Gushchin wrote:
>On Tue, Jan 11, 2022 at 01:03:02AM +0000, Wei Yang wrote:
>> mem_cgroup_threshold_ary->current_threshold points to the last entry
>> who's threshold is less or equal to usage.
>> 
>> Instead of iterating entries to get the correct index, we can leverage
>> primary->current_threshold to get it. If the threshold added is less or
>> equal to usage, current_threshold should increase by one. Otherwise, it
>> doesn't change.
>
>How big is usually an array of thresholds? If it's not huge, likely
>any savings won't be really noticeable (it's not a hot path and there
>is an rc_synchronize() below).

Usually the size is small I think.

>
>So I agree with Michal that a better justification is really needed.

Yep.

>
>Thanks!

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-12  0:46     ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-12  0:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, hannes, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb, linux-kernel, cgroups,
	linux-mm

On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
>On Tue 11-01-22 01:02:59, Wei Yang wrote:
>> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>
>I am not really sure this is worth it. After the merge window I plan to
>post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org.

Give me some time to understand it :-)


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-12  0:46     ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-12  0:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, hannes-druUgvl0LCNAfugRpC6u6w,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
>On Tue 11-01-22 01:02:59, Wei Yang wrote:
>> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>> 
>> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>
>I am not really sure this is worth it. After the merge window I plan to
>post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko-DgEjT+Ai2yi4UlQgPVntAg@public.gmane.org

Give me some time to understand it :-)


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
  2022-01-12  0:46     ` Wei Yang
@ 2022-01-12  8:56       ` Michal Hocko
  -1 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2022-01-12  8:56 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, vdavydov.dev, akpm, shakeelb, guro, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Wed 12-01-22 00:46:34, Wei Yang wrote:
> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> >> 
> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >
> >I am not really sure this is worth it. After the merge window I plan to
> >post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org.
> 
> Give me some time to understand it :-)

Just for the record, here is what I have put on top of that series:
--- 
From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
From: Wei Yang <richard.weiyang@gmail.com>
Date: Tue, 11 Jan 2022 09:45:25 +0100
Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info

alloc_mem_cgroup_per_node_info is allocated for each possible node and
this used to be a problem because not !node_online nodes didn't have
appropriate data structure allocated. This has changed by "mm: handle
uninitialized numa nodes gracefully" so we can drop the special casing
here.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 781605e92015..ed19a21ee14e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 {
 	struct mem_cgroup_per_node *pn;
-	int tmp = node;
-	/*
-	 * This routine is called against possible nodes.
-	 * But it's BUG to call kmalloc() against offline node.
-	 *
-	 * TODO: this routine can waste much memory for nodes which will
-	 *       never be onlined. It's better to use memory hotplug callback
-	 *       function.
-	 */
-	if (!node_state(node, N_NORMAL_MEMORY))
-		tmp = -1;
-	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
+
+	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
 	if (!pn)
 		return 1;
 
-- 
2.30.2


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-12  8:56       ` Michal Hocko
  0 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2022-01-12  8:56 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes-druUgvl0LCNAfugRpC6u6w,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Wed 12-01-22 00:46:34, Wei Yang wrote:
> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> >> 
> >> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >
> >I am not really sure this is worth it. After the merge window I plan to
> >post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko-DgEjT+Ai2yi4UlQgPVntAg@public.gmane.org
> 
> Give me some time to understand it :-)

Just for the record, here is what I have put on top of that series:
--- 
From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
From: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Tue, 11 Jan 2022 09:45:25 +0100
Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info

alloc_mem_cgroup_per_node_info is allocated for each possible node and
this used to be a problem because not !node_online nodes didn't have
appropriate data structure allocated. This has changed by "mm: handle
uninitialized numa nodes gracefully" so we can drop the special casing
here.

Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Signed-off-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
---
 mm/memcontrol.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 781605e92015..ed19a21ee14e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 {
 	struct mem_cgroup_per_node *pn;
-	int tmp = node;
-	/*
-	 * This routine is called against possible nodes.
-	 * But it's BUG to call kmalloc() against offline node.
-	 *
-	 * TODO: this routine can waste much memory for nodes which will
-	 *       never be onlined. It's better to use memory hotplug callback
-	 *       function.
-	 */
-	if (!node_state(node, N_NORMAL_MEMORY))
-		tmp = -1;
-	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
+
+	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
 	if (!pn)
 		return 1;
 
-- 
2.30.2


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-14  0:29         ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-14  0:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, hannes, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb, linux-kernel, cgroups,
	linux-mm

On Wed, Jan 12, 2022 at 09:56:15AM +0100, Michal Hocko wrote:
>On Wed 12-01-22 00:46:34, Wei Yang wrote:
>> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
>> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
>> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>> >> 
>> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> >
>> >I am not really sure this is worth it. After the merge window I plan to
>> >post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org.
>> 
>> Give me some time to understand it :-)
>
>Just for the record, here is what I have put on top of that series:

Ok, I got what you try to resolve. I am ok with the following change except
one point.

>--- 
>>From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
>From: Wei Yang <richard.weiyang@gmail.com>
>Date: Tue, 11 Jan 2022 09:45:25 +0100
>Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
>
>alloc_mem_cgroup_per_node_info is allocated for each possible node and
>this used to be a problem because not !node_online nodes didn't have
>appropriate data structure allocated. This has changed by "mm: handle
>uninitialized numa nodes gracefully" so we can drop the special casing
>here.
>
>Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>Signed-off-by: Michal Hocko <mhocko@suse.com>
>---
> mm/memcontrol.c | 14 ++------------
> 1 file changed, 2 insertions(+), 12 deletions(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 781605e92015..ed19a21ee14e 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> {
> 	struct mem_cgroup_per_node *pn;
>-	int tmp = node;
>-	/*
>-	 * This routine is called against possible nodes.
>-	 * But it's BUG to call kmalloc() against offline node.
>-	 *
>-	 * TODO: this routine can waste much memory for nodes which will
>-	 *       never be onlined. It's better to use memory hotplug callback
>-	 *       function.
>-	 */

Do you think this TODO is not related to this change?

>-	if (!node_state(node, N_NORMAL_MEMORY))
>-		tmp = -1;
>-	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
>+
>+	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
> 	if (!pn)
> 		return 1;
> 
>-- 
>2.30.2
>
>
>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-14  0:29         ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-14  0:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, hannes-druUgvl0LCNAfugRpC6u6w,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Wed, Jan 12, 2022 at 09:56:15AM +0100, Michal Hocko wrote:
>On Wed 12-01-22 00:46:34, Wei Yang wrote:
>> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
>> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
>> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>> >> 
>> >> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> >
>> >I am not really sure this is worth it. After the merge window I plan to
>> >post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko-DgEjT+Ai2yi4UlQgPVntAg@public.gmane.org
>> 
>> Give me some time to understand it :-)
>
>Just for the record, here is what I have put on top of that series:

Ok, I got what you try to resolve. I am ok with the following change except
one point.

>--- 
>>From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
>From: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>Date: Tue, 11 Jan 2022 09:45:25 +0100
>Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
>
>alloc_mem_cgroup_per_node_info is allocated for each possible node and
>this used to be a problem because not !node_online nodes didn't have
>appropriate data structure allocated. This has changed by "mm: handle
>uninitialized numa nodes gracefully" so we can drop the special casing
>here.
>
>Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>Signed-off-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
>---
> mm/memcontrol.c | 14 ++------------
> 1 file changed, 2 insertions(+), 12 deletions(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 781605e92015..ed19a21ee14e 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> {
> 	struct mem_cgroup_per_node *pn;
>-	int tmp = node;
>-	/*
>-	 * This routine is called against possible nodes.
>-	 * But it's BUG to call kmalloc() against offline node.
>-	 *
>-	 * TODO: this routine can waste much memory for nodes which will
>-	 *       never be onlined. It's better to use memory hotplug callback
>-	 *       function.
>-	 */

Do you think this TODO is not related to this change?

>-	if (!node_state(node, N_NORMAL_MEMORY))
>-		tmp = -1;
>-	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
>+
>+	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
> 	if (!pn)
> 		return 1;
> 
>-- 
>2.30.2
>
>
>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
  2022-01-14  0:29         ` Wei Yang
@ 2022-01-14  8:51           ` Michal Hocko
  -1 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2022-01-14  8:51 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, vdavydov.dev, akpm, shakeelb, guro, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Fri 14-01-22 00:29:37, Wei Yang wrote:
> On Wed, Jan 12, 2022 at 09:56:15AM +0100, Michal Hocko wrote:
> >On Wed 12-01-22 00:46:34, Wei Yang wrote:
> >> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
> >> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
> >> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> >> >> 
> >> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >> >
> >> >I am not really sure this is worth it. After the merge window I plan to
> >> >post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org.
> >> 
> >> Give me some time to understand it :-)
> >
> >Just for the record, here is what I have put on top of that series:
> 
> Ok, I got what you try to resolve. I am ok with the following change except
> one point.
> 
> >--- 
> >>From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
> >From: Wei Yang <richard.weiyang@gmail.com>
> >Date: Tue, 11 Jan 2022 09:45:25 +0100
> >Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
> >
> >alloc_mem_cgroup_per_node_info is allocated for each possible node and
> >this used to be a problem because not !node_online nodes didn't have
> >appropriate data structure allocated. This has changed by "mm: handle
> >uninitialized numa nodes gracefully" so we can drop the special casing
> >here.
> >
> >Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >Signed-off-by: Michal Hocko <mhocko@suse.com>
> >---
> > mm/memcontrol.c | 14 ++------------
> > 1 file changed, 2 insertions(+), 12 deletions(-)
> >
> >diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >index 781605e92015..ed19a21ee14e 100644
> >--- a/mm/memcontrol.c
> >+++ b/mm/memcontrol.c
> >@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> > static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> > {
> > 	struct mem_cgroup_per_node *pn;
> >-	int tmp = node;
> >-	/*
> >-	 * This routine is called against possible nodes.
> >-	 * But it's BUG to call kmalloc() against offline node.
> >-	 *
> >-	 * TODO: this routine can waste much memory for nodes which will
> >-	 *       never be onlined. It's better to use memory hotplug callback
> >-	 *       function.
> >-	 */
> 
> Do you think this TODO is not related to this change?

It is not really related but I am not sure how useful it is. Essentially
any allocation that is per-possible node is in the same situation and if
we really need to deal with large and sparse possible nodes masks.

If you want me to keep the TODO I will do it though.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-14  8:51           ` Michal Hocko
  0 siblings, 0 replies; 55+ messages in thread
From: Michal Hocko @ 2022-01-14  8:51 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes-druUgvl0LCNAfugRpC6u6w,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Fri 14-01-22 00:29:37, Wei Yang wrote:
> On Wed, Jan 12, 2022 at 09:56:15AM +0100, Michal Hocko wrote:
> >On Wed 12-01-22 00:46:34, Wei Yang wrote:
> >> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
> >> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
> >> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> >> >> 
> >> >> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >> >
> >> >I am not really sure this is worth it. After the merge window I plan to
> >> >post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko-DgEjT+Ai2yi4UlQgPVntAg@public.gmane.org
> >> 
> >> Give me some time to understand it :-)
> >
> >Just for the record, here is what I have put on top of that series:
> 
> Ok, I got what you try to resolve. I am ok with the following change except
> one point.
> 
> >--- 
> >>From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
> >From: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >Date: Tue, 11 Jan 2022 09:45:25 +0100
> >Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
> >
> >alloc_mem_cgroup_per_node_info is allocated for each possible node and
> >this used to be a problem because not !node_online nodes didn't have
> >appropriate data structure allocated. This has changed by "mm: handle
> >uninitialized numa nodes gracefully" so we can drop the special casing
> >here.
> >
> >Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >Signed-off-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
> >---
> > mm/memcontrol.c | 14 ++------------
> > 1 file changed, 2 insertions(+), 12 deletions(-)
> >
> >diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >index 781605e92015..ed19a21ee14e 100644
> >--- a/mm/memcontrol.c
> >+++ b/mm/memcontrol.c
> >@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> > static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> > {
> > 	struct mem_cgroup_per_node *pn;
> >-	int tmp = node;
> >-	/*
> >-	 * This routine is called against possible nodes.
> >-	 * But it's BUG to call kmalloc() against offline node.
> >-	 *
> >-	 * TODO: this routine can waste much memory for nodes which will
> >-	 *       never be onlined. It's better to use memory hotplug callback
> >-	 *       function.
> >-	 */
> 
> Do you think this TODO is not related to this change?

It is not really related but I am not sure how useful it is. Essentially
any allocation that is per-possible node is in the same situation and if
we really need to deal with large and sparse possible nodes masks.

If you want me to keep the TODO I will do it though.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-14 11:07   ` Mike Rapoport
  0 siblings, 0 replies; 55+ messages in thread
From: Mike Rapoport @ 2022-01-14 11:07 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb, linux-kernel, cgroups,
	linux-mm

On Tue, Jan 11, 2022 at 01:02:59AM +0000, Wei Yang wrote:
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2ed5f2a0879d..11715f7323c0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5054,7 +5054,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>  	 *       function.
>  	 */
>  	if (!node_state(node, N_NORMAL_MEMORY))
> -		tmp = -1;
> +		tmp = NUMA_NO_NODE;
>  	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
>  	if (!pn)
>  		return 1;
> -- 
> 2.33.1
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-14 11:07   ` Mike Rapoport
  0 siblings, 0 replies; 55+ messages in thread
From: Mike Rapoport @ 2022-01-14 11:07 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Tue, Jan 11, 2022 at 01:02:59AM +0000, Wei Yang wrote:
> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> 
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Reviewed-by: Mike Rapoport <rppt-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>

> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2ed5f2a0879d..11715f7323c0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5054,7 +5054,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>  	 *       function.
>  	 */
>  	if (!node_state(node, N_NORMAL_MEMORY))
> -		tmp = -1;
> +		tmp = NUMA_NO_NODE;
>  	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
>  	if (!pn)
>  		return 1;
> -- 
> 2.33.1
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
  2022-01-11  1:03 ` [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation Wei Yang
                     ` (2 preceding siblings ...)
  2022-01-11 18:06     ` Roman Gushchin
@ 2022-01-14 11:08   ` Mike Rapoport
  2022-01-16 19:24     ` Shakeel Butt
  4 siblings, 0 replies; 55+ messages in thread
From: Mike Rapoport @ 2022-01-14 11:08 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb, linux-kernel, cgroups,
	linux-mm

On Tue, Jan 11, 2022 at 01:03:00AM +0000, Wei Yang wrote:
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  mm/memcontrol.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 11715f7323c0..a504616f904a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5067,8 +5067,6 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>  	}
>  
>  	lruvec_init(&pn->lruvec);
> -	pn->usage_in_excess = 0;
> -	pn->on_tree = false;
>  	pn->memcg = memcg;
>  
>  	memcg->nodeinfo[node] = pn;
> -- 
> 2.33.1
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-15 22:10             ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-15 22:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, hannes, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb, linux-kernel, cgroups,
	linux-mm

On Fri, Jan 14, 2022 at 09:51:31AM +0100, Michal Hocko wrote:
>On Fri 14-01-22 00:29:37, Wei Yang wrote:
>> On Wed, Jan 12, 2022 at 09:56:15AM +0100, Michal Hocko wrote:
>> >On Wed 12-01-22 00:46:34, Wei Yang wrote:
>> >> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
>> >> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
>> >> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>> >> >> 
>> >> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> >> >
>> >> >I am not really sure this is worth it. After the merge window I plan to
>> >> >post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org.
>> >> 
>> >> Give me some time to understand it :-)
>> >
>> >Just for the record, here is what I have put on top of that series:
>> 
>> Ok, I got what you try to resolve. I am ok with the following change except
>> one point.
>> 
>> >--- 
>> >>From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
>> >From: Wei Yang <richard.weiyang@gmail.com>
>> >Date: Tue, 11 Jan 2022 09:45:25 +0100
>> >Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
>> >
>> >alloc_mem_cgroup_per_node_info is allocated for each possible node and
>> >this used to be a problem because not !node_online nodes didn't have
>> >appropriate data structure allocated. This has changed by "mm: handle
>> >uninitialized numa nodes gracefully" so we can drop the special casing
>> >here.
>> >
>> >Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> >Signed-off-by: Michal Hocko <mhocko@suse.com>
>> >---
>> > mm/memcontrol.c | 14 ++------------
>> > 1 file changed, 2 insertions(+), 12 deletions(-)
>> >
>> >diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> >index 781605e92015..ed19a21ee14e 100644
>> >--- a/mm/memcontrol.c
>> >+++ b/mm/memcontrol.c
>> >@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
>> > static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>> > {
>> > 	struct mem_cgroup_per_node *pn;
>> >-	int tmp = node;
>> >-	/*
>> >-	 * This routine is called against possible nodes.
>> >-	 * But it's BUG to call kmalloc() against offline node.
>> >-	 *
>> >-	 * TODO: this routine can waste much memory for nodes which will
>> >-	 *       never be onlined. It's better to use memory hotplug callback
>> >-	 *       function.
>> >-	 */
>> 
>> Do you think this TODO is not related to this change?
>
>It is not really related but I am not sure how useful it is. Essentially
>any allocation that is per-possible node is in the same situation and if
>we really need to deal with large and sparse possible nodes masks.
>

Sounds reasonable :-)

>If you want me to keep the TODO I will do it though.
>
>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-15 22:10             ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-15 22:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, hannes-druUgvl0LCNAfugRpC6u6w,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Fri, Jan 14, 2022 at 09:51:31AM +0100, Michal Hocko wrote:
>On Fri 14-01-22 00:29:37, Wei Yang wrote:
>> On Wed, Jan 12, 2022 at 09:56:15AM +0100, Michal Hocko wrote:
>> >On Wed 12-01-22 00:46:34, Wei Yang wrote:
>> >> On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
>> >> >On Tue 11-01-22 01:02:59, Wei Yang wrote:
>> >> >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>> >> >> 
>> >> >> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> >> >
>> >> >I am not really sure this is worth it. After the merge window I plan to
>> >> >post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko-DgEjT+Ai2yi4UlQgPVntAg@public.gmane.org
>> >> 
>> >> Give me some time to understand it :-)
>> >
>> >Just for the record, here is what I have put on top of that series:
>> 
>> Ok, I got what you try to resolve. I am ok with the following change except
>> one point.
>> 
>> >--- 
>> >>From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
>> >From: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> >Date: Tue, 11 Jan 2022 09:45:25 +0100
>> >Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
>> >
>> >alloc_mem_cgroup_per_node_info is allocated for each possible node and
>> >this used to be a problem because not !node_online nodes didn't have
>> >appropriate data structure allocated. This has changed by "mm: handle
>> >uninitialized numa nodes gracefully" so we can drop the special casing
>> >here.
>> >
>> >Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> >Signed-off-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
>> >---
>> > mm/memcontrol.c | 14 ++------------
>> > 1 file changed, 2 insertions(+), 12 deletions(-)
>> >
>> >diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> >index 781605e92015..ed19a21ee14e 100644
>> >--- a/mm/memcontrol.c
>> >+++ b/mm/memcontrol.c
>> >@@ -5044,18 +5044,8 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
>> > static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>> > {
>> > 	struct mem_cgroup_per_node *pn;
>> >-	int tmp = node;
>> >-	/*
>> >-	 * This routine is called against possible nodes.
>> >-	 * But it's BUG to call kmalloc() against offline node.
>> >-	 *
>> >-	 * TODO: this routine can waste much memory for nodes which will
>> >-	 *       never be onlined. It's better to use memory hotplug callback
>> >-	 *       function.
>> >-	 */
>> 
>> Do you think this TODO is not related to this change?
>
>It is not really related but I am not sure how useful it is. Essentially
>any allocation that is per-possible node is in the same situation and if
>we really need to deal with large and sparse possible nodes masks.
>

Sounds reasonable :-)

>If you want me to keep the TODO I will do it though.
>
>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-16 19:23         ` Shakeel Butt
  0 siblings, 0 replies; 55+ messages in thread
From: Shakeel Butt @ 2022-01-16 19:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Roman Gushchin, Vlastimil Babka, Matthew Wilcox, Muchun Song,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups, Linux MM

On Wed, Jan 12, 2022 at 12:56 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 12-01-22 00:46:34, Wei Yang wrote:
> > On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
> > >On Tue 11-01-22 01:02:59, Wei Yang wrote:
> > >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> > >>
> > >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> > >
> > >I am not really sure this is worth it. After the merge window I plan to
> > >post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@kernel.org.
> >
> > Give me some time to understand it :-)
>
> Just for the record, here is what I have put on top of that series:
> ---
> From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
> From: Wei Yang <richard.weiyang@gmail.com>
> Date: Tue, 11 Jan 2022 09:45:25 +0100
> Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
>
> alloc_mem_cgroup_per_node_info is allocated for each possible node and
> this used to be a problem because not !node_online nodes didn't have
> appropriate data structure allocated. This has changed by "mm: handle
> uninitialized numa nodes gracefully" so we can drop the special casing
> here.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-16 19:23         ` Shakeel Butt
  0 siblings, 0 replies; 55+ messages in thread
From: Shakeel Butt @ 2022-01-16 19:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Roman Gushchin, Vlastimil Babka, Matthew Wilcox, Muchun Song,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups, Linux MM

On Wed, Jan 12, 2022 at 12:56 AM Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org> wrote:
>
> On Wed 12-01-22 00:46:34, Wei Yang wrote:
> > On Tue, Jan 11, 2022 at 09:40:20AM +0100, Michal Hocko wrote:
> > >On Tue 11-01-22 01:02:59, Wei Yang wrote:
> > >> Instead of use "-1", let's use NUMA_NO_NODE for consistency.
> > >>
> > >> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > >
> > >I am not really sure this is worth it. After the merge window I plan to
> > >post http://lkml.kernel.org/r/20211214100732.26335-1-mhocko-DgEjT+Ai2yi4UlQgPVntAg@public.gmane.org
> >
> > Give me some time to understand it :-)
>
> Just for the record, here is what I have put on top of that series:
> ---
> From b7195eba02fe6308a6927450f4630057c05e808e Mon Sep 17 00:00:00 2001
> From: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Date: Tue, 11 Jan 2022 09:45:25 +0100
> Subject: [PATCH] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
>
> alloc_mem_cgroup_per_node_info is allocated for each possible node and
> this used to be a problem because not !node_online nodes didn't have
> appropriate data structure allocated. This has changed by "mm: handle
> uninitialized numa nodes gracefully" so we can drop the special casing
> here.
>
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>

Reviewed-by: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
@ 2022-01-16 19:24     ` Shakeel Butt
  0 siblings, 0 replies; 55+ messages in thread
From: Shakeel Butt @ 2022-01-16 19:24 UTC (permalink / raw)
  To: Wei Yang
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Roman Gushchin, Vlastimil Babka, Matthew Wilcox, Muchun Song,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups, Linux MM

On Mon, Jan 10, 2022 at 5:03 PM Wei Yang <richard.weiyang@gmail.com> wrote:
>
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
@ 2022-01-16 19:24     ` Shakeel Butt
  0 siblings, 0 replies; 55+ messages in thread
From: Shakeel Butt @ 2022-01-16 19:24 UTC (permalink / raw)
  To: Wei Yang
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Roman Gushchin, Vlastimil Babka, Matthew Wilcox, Muchun Song,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups, Linux MM

On Mon, Jan 10, 2022 at 5:03 PM Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> kzalloc_node() would set data to 0, so it's not necessary to set it
> again.
>
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Reviewed-by: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
@ 2022-01-16 19:54     ` Shakeel Butt
  0 siblings, 0 replies; 55+ messages in thread
From: Shakeel Butt @ 2022-01-16 19:54 UTC (permalink / raw)
  To: Wei Yang
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Roman Gushchin, Vlastimil Babka, Matthew Wilcox, Muchun Song,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups, Linux MM

On Mon, Jan 10, 2022 at 5:03 PM Wei Yang <richard.weiyang@gmail.com> wrote:
>
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
>
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent
@ 2022-01-16 19:54     ` Shakeel Butt
  0 siblings, 0 replies; 55+ messages in thread
From: Shakeel Butt @ 2022-01-16 19:54 UTC (permalink / raw)
  To: Wei Yang
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Roman Gushchin, Vlastimil Babka, Matthew Wilcox, Muchun Song,
	Yang Shi, Suren Baghdasaryan, LKML, Cgroups, Linux MM

On Mon, Jan 10, 2022 at 5:03 PM Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> The parent we get from page_counter is correct, while this is two
> different hierarchy.
>
> Let's retrieve the parent memcg from css.parent just like parent_cs(),
> blkcg_parent(), etc.
>
> Signed-off-by: Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Reviewed-by: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
  2022-01-11  1:02 ` Wei Yang
@ 2022-01-31  1:47   ` Wei Yang
  -1 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-31  1:47 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb, linux-kernel, cgroups,
	linux-mm

Hi, Andrew

Would you pick up this patch set, or prefer me to send a v2?

On Tue, Jan 11, 2022 at 01:02:59AM +0000, Wei Yang wrote:
>Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>
>Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>---
> mm/memcontrol.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 2ed5f2a0879d..11715f7323c0 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -5054,7 +5054,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> 	 *       function.
> 	 */
> 	if (!node_state(node, N_NORMAL_MEMORY))
>-		tmp = -1;
>+		tmp = NUMA_NO_NODE;
> 	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
> 	if (!pn)
> 		return 1;
>-- 
>2.33.1

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-31  1:47   ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-01-31  1:47 UTC (permalink / raw)
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb, linux-kernel, cgroups,
	linux-mm

Hi, Andrew

Would you pick up this patch set, or prefer me to send a v2?

On Tue, Jan 11, 2022 at 01:02:59AM +0000, Wei Yang wrote:
>Instead of use "-1", let's use NUMA_NO_NODE for consistency.
>
>Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>---
> mm/memcontrol.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 2ed5f2a0879d..11715f7323c0 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -5054,7 +5054,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> 	 *       function.
> 	 */
> 	if (!node_state(node, N_NORMAL_MEMORY))
>-		tmp = -1;
>+		tmp = NUMA_NO_NODE;
> 	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
> 	if (!pn)
> 		return 1;
>-- 
>2.33.1

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
  2022-01-31  1:47   ` Wei Yang
@ 2022-01-31 22:36     ` Andrew Morton
  -1 siblings, 0 replies; 55+ messages in thread
From: Andrew Morton @ 2022-01-31 22:36 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes, mhocko, vdavydov.dev, shakeelb, guro, vbabka, willy,
	songmuchun, shy828301, surenb, linux-kernel, cgroups, linux-mm

On Mon, 31 Jan 2022 01:47:42 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:

> Hi, Andrew
> 
> Would you pick up this patch set, or prefer me to send a v2?
> 

It's unclear to me what's happening with [4/4].  At least a new
changelog with more justification is expected?

So yes, please resend?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-01-31 22:36     ` Andrew Morton
  0 siblings, 0 replies; 55+ messages in thread
From: Andrew Morton @ 2022-01-31 22:36 UTC (permalink / raw)
  To: Wei Yang
  Cc: hannes-druUgvl0LCNAfugRpC6u6w, mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Mon, 31 Jan 2022 01:47:42 +0000 Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Hi, Andrew
> 
> Would you pick up this patch set, or prefer me to send a v2?
> 

It's unclear to me what's happening with [4/4].  At least a new
changelog with more justification is expected?

So yes, please resend?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-02-01  0:42       ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-02-01  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Wei Yang, hannes, mhocko, vdavydov.dev, shakeelb, guro, vbabka,
	willy, songmuchun, shy828301, surenb, linux-kernel, cgroups,
	linux-mm

On Mon, Jan 31, 2022 at 02:36:20PM -0800, Andrew Morton wrote:
>On Mon, 31 Jan 2022 01:47:42 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> Hi, Andrew
>> 
>> Would you pick up this patch set, or prefer me to send a v2?
>> 
>
>It's unclear to me what's happening with [4/4].  At least a new
>changelog with more justification is expected?
>

As Michal and Roman suggested, this is not a hot-path. I would drop this one.

>So yes, please resend?

Sure.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node
@ 2022-02-01  0:42       ` Wei Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Wei Yang @ 2022-02-01  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Wei Yang, hannes-druUgvl0LCNAfugRpC6u6w,
	mhocko-DgEjT+Ai2ygdnm+yROfE0A,
	vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA, guro-b10kYP2dOMg,
	vbabka-AlSwsSmVLrQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	songmuchun-EC8Uxl6Npydl57MIdRCFDg,
	shy828301-Re5JQEeQqe8AvxtiuMwx3w, surenb-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On Mon, Jan 31, 2022 at 02:36:20PM -0800, Andrew Morton wrote:
>On Mon, 31 Jan 2022 01:47:42 +0000 Wei Yang <richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>> Hi, Andrew
>> 
>> Would you pick up this patch set, or prefer me to send a v2?
>> 
>
>It's unclear to me what's happening with [4/4].  At least a new
>changelog with more justification is expected?
>

As Michal and Roman suggested, this is not a hot-path. I would drop this one.

>So yes, please resend?

Sure.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2022-02-01  0:42 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-11  1:02 [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node Wei Yang
2022-01-11  1:02 ` Wei Yang
2022-01-11  1:03 ` [PATCH 2/4] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation Wei Yang
2022-01-11  2:25   ` Muchun Song
2022-01-11  2:25     ` Muchun Song
2022-01-11  8:41   ` Michal Hocko
2022-01-11 18:06   ` Roman Gushchin
2022-01-11 18:06     ` Roman Gushchin
2022-01-14 11:08   ` Mike Rapoport
2022-01-16 19:24   ` Shakeel Butt
2022-01-16 19:24     ` Shakeel Butt
2022-01-11  1:03 ` [PATCH 3/4] mm/memcg: retrieve parent memcg from css.parent Wei Yang
2022-01-11  1:03   ` Wei Yang
2022-01-11  3:12   ` Muchun Song
2022-01-11  3:12     ` Muchun Song
2022-01-11  8:43   ` Michal Hocko
2022-01-11 18:09   ` Roman Gushchin
2022-01-11 18:09     ` Roman Gushchin
2022-01-12  0:24     ` Wei Yang
2022-01-12  0:24       ` Wei Yang
2022-01-16 19:54   ` Shakeel Butt
2022-01-16 19:54     ` Shakeel Butt
2022-01-11  1:03 ` [PATCH 4/4] mm/memcg: refine mem_cgroup_threshold_ary->current_threshold calculation Wei Yang
2022-01-11  1:03   ` Wei Yang
2022-01-11  8:44   ` Michal Hocko
2022-01-11 18:23   ` Roman Gushchin
2022-01-11 18:23     ` Roman Gushchin
2022-01-12  0:25     ` Wei Yang
2022-01-12  0:25       ` Wei Yang
2022-01-11  2:23 ` [PATCH 1/4] mm/memcg: use NUMA_NO_NODE to indicate allocation from unspecified node Muchun Song
2022-01-11  2:23   ` Muchun Song
2022-01-11  8:40 ` Michal Hocko
2022-01-11  8:40   ` Michal Hocko
2022-01-12  0:46   ` Wei Yang
2022-01-12  0:46     ` Wei Yang
2022-01-12  8:56     ` Michal Hocko
2022-01-12  8:56       ` Michal Hocko
2022-01-14  0:29       ` Wei Yang
2022-01-14  0:29         ` Wei Yang
2022-01-14  8:51         ` Michal Hocko
2022-01-14  8:51           ` Michal Hocko
2022-01-15 22:10           ` Wei Yang
2022-01-15 22:10             ` Wei Yang
2022-01-16 19:23       ` Shakeel Butt
2022-01-16 19:23         ` Shakeel Butt
2022-01-11 18:05 ` Roman Gushchin
2022-01-11 18:05   ` Roman Gushchin
2022-01-14 11:07 ` Mike Rapoport
2022-01-14 11:07   ` Mike Rapoport
2022-01-31  1:47 ` Wei Yang
2022-01-31  1:47   ` Wei Yang
2022-01-31 22:36   ` Andrew Morton
2022-01-31 22:36     ` Andrew Morton
2022-02-01  0:42     ` Wei Yang
2022-02-01  0:42       ` Wei Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.