* [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
From: Muchun Song @ 2020-09-13  7:00 UTC (permalink / raw)
  To: tj, lizefan, hannes, corbet, mhocko, vdavydov.dev, akpm, shakeelb, guro
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, Muchun Song,
	kernel test robot

In cgroup v1, we have the numa_stat interface. This is useful for
providing visibility into the numa locality information within a
memcg, since the pages are allowed to be allocated from any physical
node. One of the use cases is evaluating application performance by
combining this information with the application's CPU allocation.
But cgroup v2 does not have such an interface yet, so this patch adds
the missing interface.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Suggested-by: Shakeel Butt <shakeelb@google.com>
Reported-by: kernel test robot <lkp@intel.com>
---
 changelog in v3:
 1. Fix compiler error on powerpc architecture reported by kernel test robot.
 2. Fix a typo from "anno" to "anon".

 changelog in v2:
 1. Add memory.numa_stat interface in cgroup v2.

 Documentation/admin-guide/cgroup-v2.rst |  72 ++++++++++++++++
 mm/memcontrol.c                         | 107 ++++++++++++++++++++++++
 2 files changed, 179 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6be43781ec7f..92207f0012e4 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1368,6 +1368,78 @@ PAGE_SIZE multiple when read back.
 		collapsing an existing range of pages. This counter is not
 		present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
 
+  memory.numa_stat
+	A read-only flat-keyed file which exists on non-root cgroups.
+
+	This breaks down the cgroup's memory footprint into different
+	types of memory, type-specific details, and other information
+	per node on the state of the memory management system.
+
+	This is useful for providing visibility into the numa locality
+	information within a memcg since the pages are allowed to be
+	allocated from any physical node. One of the use cases is evaluating
+	application performance by combining this information with the
+	application's CPU allocation.
+
+	All memory amounts are in bytes.
+
+	The output format of memory.numa_stat is::
+
+	  type N0=<node 0 pages> N1=<node 1 pages> ...
+
+	The entries are ordered to be human readable, and new entries
+	can show up in the middle. Don't rely on items remaining in a
+	fixed position; use the keys to look up specific values!
+
+	  anon
+		Amount of memory per node used in anonymous mappings such
+		as brk(), sbrk(), and mmap(MAP_ANONYMOUS)
+
+	  file
+		Amount of memory per node used to cache filesystem data,
+		including tmpfs and shared memory.
+
+	  kernel_stack
+		Amount of memory per node allocated to kernel stacks.
+
+	  shmem
+		Amount of cached filesystem data per node that is swap-backed,
+		such as tmpfs, shm segments, shared anonymous mmap()s
+
+	  file_mapped
+		Amount of cached filesystem data per node mapped with mmap()
+
+	  file_dirty
+		Amount of cached filesystem data per node that was modified but
+		not yet written back to disk
+
+	  file_writeback
+		Amount of cached filesystem data per node that was modified and
+		is currently being written back to disk
+
+	  anon_thp
+		Amount of memory per node used in anonymous mappings backed by
+		transparent hugepages
+
+	  inactive_anon, active_anon, inactive_file, active_file, unevictable
+		Amount of memory, swap-backed and filesystem-backed,
+		per node on the internal memory management lists used
+		by the page reclaim algorithm.
+
+		As these represent internal list state (eg. shmem pages are on anon
+		memory management lists), inactive_foo + active_foo may not be equal to
+		the value for the foo counter, since the foo counter is type-based, not
+		list-based.
+
+	  slab_reclaimable
+		Amount of memory per node used for storing in-kernel data
+		structures which might be reclaimed, such as dentries and
+		inodes.
+
+	  slab_unreclaimable
+		Amount of memory per node used for storing in-kernel data
+		structures which cannot be reclaimed on memory pressure.
+
   memory.swap.current
 	A read-only single value file which exists on non-root
 	cgroups.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1b403d6f5da0..2dadb1aaedaa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6427,6 +6427,107 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+#ifdef CONFIG_NUMA
+struct numa_stat {
+	const char *name;
+	unsigned int ratio;
+	enum node_stat_item idx;
+};
+
+static struct numa_stat numa_stats[] = {
+	{ "anon", PAGE_SIZE, NR_ANON_MAPPED },
+	{ "file", PAGE_SIZE, NR_FILE_PAGES },
+	{ "kernel_stack", 1024, NR_KERNEL_STACK_KB },
+	{ "shmem", PAGE_SIZE, NR_SHMEM },
+	{ "file_mapped", PAGE_SIZE, NR_FILE_MAPPED },
+	{ "file_dirty", PAGE_SIZE, NR_FILE_DIRTY },
+	{ "file_writeback", PAGE_SIZE, NR_WRITEBACK },
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	/*
+	 * The ratio will be initialized in numa_stats_init(), because on
+	 * some architectures the macro HPAGE_PMD_SIZE is not a
+	 * compile-time constant (e.g. powerpc).
+	 */
+	{ "anon_thp", 0, NR_ANON_THPS },
+#endif
+	{ "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON },
+	{ "active_anon", PAGE_SIZE, NR_ACTIVE_ANON },
+	{ "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE },
+	{ "active_file", PAGE_SIZE, NR_ACTIVE_FILE },
+	{ "unevictable", PAGE_SIZE, NR_UNEVICTABLE },
+	{ "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B },
+	{ "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B },
+};
+
+static int __init numa_stats_init(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		if (numa_stats[i].idx == NR_ANON_THPS)
+			numa_stats[i].ratio = HPAGE_PMD_SIZE;
+#endif
+	}
+
+	return 0;
+}
+pure_initcall(numa_stats_init);
+
+static unsigned long memcg_node_page_state(struct mem_cgroup *memcg,
+					   unsigned int nid,
+					   enum node_stat_item idx)
+{
+	VM_BUG_ON(nid >= nr_node_ids);
+	return lruvec_page_state(mem_cgroup_lruvec(memcg, NODE_DATA(nid)), idx);
+}
+
+static const char *memory_numa_stat_format(struct mem_cgroup *memcg)
+{
+	int i;
+	struct seq_buf s;
+
+	/* Reserve a byte for the trailing null */
+	seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE - 1);
+	if (!s.buffer)
+		return NULL;
+
+	for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
+		int nid;
+
+		seq_buf_printf(&s, "%s", numa_stats[i].name);
+		for_each_node_state(nid, N_MEMORY) {
+			u64 size;
+
+			size = memcg_node_page_state(memcg, nid,
+						     numa_stats[i].idx);
+			size *= numa_stats[i].ratio;
+			seq_buf_printf(&s, " N%d=%llu", nid, size);
+		}
+		seq_buf_putc(&s, '\n');
+	}
+
+	/* The above should easily fit into one page */
+	if (WARN_ON_ONCE(seq_buf_putc(&s, '\0')))
+		s.buffer[PAGE_SIZE - 1] = '\0';
+
+	return s.buffer;
+}
+
+static int memory_numa_stat_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+	const char *buf;
+
+	buf = memory_numa_stat_format(memcg);
+	if (!buf)
+		return -ENOMEM;
+	seq_puts(m, buf);
+	kfree(buf);
+	return 0;
+}
+#endif
+
 static int memory_oom_group_show(struct seq_file *m, void *v)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
@@ -6504,6 +6605,12 @@ static struct cftype memory_files[] = {
 		.name = "stat",
 		.seq_show = memory_stat_show,
 	},
+#ifdef CONFIG_NUMA
+	{
+		.name = "numa_stat",
+		.seq_show = memory_numa_stat_show,
+	},
+#endif
 	{
 		.name = "oom.group",
 		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
-- 
2.20.1
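
The documentation added above tells consumers to look values up by key
rather than by position. A minimal userspace sketch of such a key-based
lookup is below; the cgroup path and the function name are illustrative
assumptions, not part of the patch:

#include <stdio.h>
#include <string.h>

/*
 * Find the "<key> N0=... N1=..." line in memory.numa_stat and copy the
 * per-node portion into @out. Returns 0 on success, -1 otherwise.
 */
static int read_numa_stat_key(const char *path, const char *key,
			      char *out, size_t outlen)
{
	char line[1024];
	size_t klen = strlen(key);
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		/* match the whole key, e.g. "anon" but not "anon_thp" */
		if (!strncmp(line, key, klen) && line[klen] == ' ') {
			snprintf(out, outlen, "%s", line + klen + 1);
			fclose(f);
			return 0;
		}
	}
	fclose(f);
	return -1;
}

int main(void)
{
	char nodes[1024];

	if (!read_numa_stat_key("/sys/fs/cgroup/test/memory.numa_stat",
				"anon", nodes, sizeof(nodes)))
		printf("anon: %s", nodes);	/* e.g. "N0=... N1=...\n" */
	return 0;
}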



* Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
From: Chris Down @ 2020-09-13 17:09 UTC (permalink / raw)
  To: Muchun Song
  Cc: tj, lizefan, hannes, corbet, mhocko, vdavydov.dev, akpm,
	shakeelb, guro, cgroups, linux-doc, linux-kernel, linux-mm,
	kernel test robot

Muchun Song writes:
>In the cgroup v1, we have a numa_stat interface. This is useful for
>providing visibility into the numa locality information within an
>memcg since the pages are allowed to be allocated from any physical
>node. One of the use cases is evaluating application performance by
>combining this information with the application's CPU allocation.
>But the cgroup v2 does not. So this patch adds the missing information.
>
>Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>Suggested-by: Shakeel Butt <shakeelb@google.com>
>Reported-by: kernel test robot <lkp@intel.com>

This is a feature patch; why does this have LKP's Reported-by?


* Re: [External] Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
From: Muchun Song @ 2020-09-14  3:10 UTC (permalink / raw)
  To: Chris Down
  Cc: tj, lizefan, Johannes Weiner, corbet, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Shakeel Butt, Roman Gushchin,
	Cgroups, linux-doc, LKML, Linux Memory Management List,
	kernel test robot

On Mon, Sep 14, 2020 at 1:09 AM Chris Down <chris@chrisdown.name> wrote:
>
> Muchun Song writes:
> >In the cgroup v1, we have a numa_stat interface. This is useful for
> >providing visibility into the numa locality information within an
> >memcg since the pages are allowed to be allocated from any physical
> >node. One of the use cases is evaluating application performance by
> >combining this information with the application's CPU allocation.
> >But the cgroup v2 does not. So this patch adds the missing information.
> >
> >Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >Suggested-by: Shakeel Butt <shakeelb@google.com>
> >Reported-by: kernel test robot <lkp@intel.com>
>
> This is a feature patch, why does this have LKP's Reported-by?

In the v2 version, the kernel test robot reported a compiler error
on the powerpc architecture. So I added that. Thanks.

-- 
Yours,
Muchun


* Re: [External] Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
From: Zefan Li @ 2020-09-14  3:18 UTC (permalink / raw)
  To: Muchun Song, Chris Down
  Cc: tj, Johannes Weiner, corbet, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Shakeel Butt, Roman Gushchin, Cgroups, linux-doc,
	LKML, Linux Memory Management List, kernel test robot

On 2020/9/14 11:10, Muchun Song wrote:
> On Mon, Sep 14, 2020 at 1:09 AM Chris Down <chris@chrisdown.name> wrote:
>>
>> Muchun Song writes:
>>> In the cgroup v1, we have a numa_stat interface. This is useful for
>>> providing visibility into the numa locality information within an
>>> memcg since the pages are allowed to be allocated from any physical
>>> node. One of the use cases is evaluating application performance by
>>> combining this information with the application's CPU allocation.
>>> But the cgroup v2 does not. So this patch adds the missing information.
>>>
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> Suggested-by: Shakeel Butt <shakeelb@google.com>
>>> Reported-by: kernel test robot <lkp@intel.com>
>>
>> This is a feature patch, why does this have LKP's Reported-by?
> 
> In the v2 version, the kernel test robot reported a compiler error
> on the powerpc architecture. So I added that. Thanks.
> 

You should remove this reported-by, and then add v2->v3 changelog:

...original changelog...

v3:
- fixed something reported by the test robot


* Re: [External] Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
From: Muchun Song @ 2020-09-14  3:28 UTC (permalink / raw)
  To: Zefan Li
  Cc: Chris Down, tj, Johannes Weiner, corbet, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Shakeel Butt, Roman Gushchin,
	Cgroups, linux-doc, LKML, Linux Memory Management List,
	kernel test robot

On Mon, Sep 14, 2020 at 11:19 AM Zefan Li <lizefan@huawei.com> wrote:
>
> On 2020/9/14 11:10, Muchun Song wrote:
> > On Mon, Sep 14, 2020 at 1:09 AM Chris Down <chris@chrisdown.name> wrote:
> >>
> >> Muchun Song writes:
> >>> In the cgroup v1, we have a numa_stat interface. This is useful for
> >>> providing visibility into the numa locality information within an
> >>> memcg since the pages are allowed to be allocated from any physical
> >>> node. One of the use cases is evaluating application performance by
> >>> combining this information with the application's CPU allocation.
> >>> But the cgroup v2 does not. So this patch adds the missing information.
> >>>
> >>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>> Suggested-by: Shakeel Butt <shakeelb@google.com>
> >>> Reported-by: kernel test robot <lkp@intel.com>
> >>
> >> This is a feature patch, why does this have LKP's Reported-by?
> >
> > In the v2 version, the kernel test robot reported a compiler error
> > on the powerpc architecture. So I added that. Thanks.
> >
>
> You should remove this reported-by, and then add v2->v3 changelog:

Got it. I see Andrew has already done it for me; many thanks to him for
his work. He also added this patch to the -mm tree.

>
> ...original changelog...
>
> v3:
> - fixed something reported by test rebot

I already added that in the changelog. Thanks.


--
Yours,
Muchun


* Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
From: Shakeel Butt @ 2020-09-14 16:07 UTC (permalink / raw)
  To: Muchun Song
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Jonathan Corbet,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Roman Gushchin,
	Cgroups, linux-doc, LKML, Linux MM, kernel test robot

On Sun, Sep 13, 2020 at 12:01 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> In the cgroup v1, we have a numa_stat interface. This is useful for
> providing visibility into the numa locality information within an
> memcg since the pages are allowed to be allocated from any physical
> node. One of the use cases is evaluating application performance by
> combining this information with the application's CPU allocation.
> But the cgroup v2 does not. So this patch adds the missing information.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Suggested-by: Shakeel Butt <shakeelb@google.com>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
[snip]
> +
> +static struct numa_stat numa_stats[] = {
> +       { "anon", PAGE_SIZE, NR_ANON_MAPPED },
> +       { "file", PAGE_SIZE, NR_FILE_PAGES },
> +       { "kernel_stack", 1024, NR_KERNEL_STACK_KB },
> +       { "shmem", PAGE_SIZE, NR_SHMEM },
> +       { "file_mapped", PAGE_SIZE, NR_FILE_MAPPED },
> +       { "file_dirty", PAGE_SIZE, NR_FILE_DIRTY },
> +       { "file_writeback", PAGE_SIZE, NR_WRITEBACK },
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +       /*
> +        * The ratio will be initialized in numa_stats_init(). Because
> +        * on some architectures, the macro of HPAGE_PMD_SIZE is not
> +        * constant(e.g. powerpc).
> +        */
> +       { "anon_thp", 0, NR_ANON_THPS },
> +#endif
> +       { "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON },
> +       { "active_anon", PAGE_SIZE, NR_ACTIVE_ANON },
> +       { "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE },
> +       { "active_file", PAGE_SIZE, NR_ACTIVE_FILE },
> +       { "unevictable", PAGE_SIZE, NR_UNEVICTABLE },
> +       { "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B },
> +       { "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B },
> +};
> +
> +static int __init numa_stats_init(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +               if (numa_stats[i].idx == NR_ANON_THPS)
> +                       numa_stats[i].ratio = HPAGE_PMD_SIZE;
> +#endif
> +       }

The for loop seems excessive but I don't really have a good alternative.
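
One possible way to tighten it, sketched here as an assumption rather than
something proposed in the thread: compile the fixup only when THP is
enabled and stop at the first match, since NR_ANON_THPS appears at most
once in numa_stats[]:

static int __init numa_stats_init(void)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	int i;

	/* Only the anon_thp entry needs its ratio fixed up at runtime. */
	for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
		if (numa_stats[i].idx == NR_ANON_THPS) {
			numa_stats[i].ratio = HPAGE_PMD_SIZE;
			break;
		}
	}
#endif
	return 0;
}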

> +
> +       return 0;
> +}
> +pure_initcall(numa_stats_init);
> +
> +static unsigned long memcg_node_page_state(struct mem_cgroup *memcg,
> +                                          unsigned int nid,
> +                                          enum node_stat_item idx)
> +{
> +       VM_BUG_ON(nid >= nr_node_ids);
> +       return lruvec_page_state(mem_cgroup_lruvec(memcg, NODE_DATA(nid)), idx);
> +}
> +
> +static const char *memory_numa_stat_format(struct mem_cgroup *memcg)
> +{
> +       int i;
> +       struct seq_buf s;
> +
> +       /* Reserve a byte for the trailing null */
> +       seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE - 1);
> +       if (!s.buffer)
> +               return NULL;
> +
> +       for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> +               int nid;
> +
> +               seq_buf_printf(&s, "%s", numa_stats[i].name);
> +               for_each_node_state(nid, N_MEMORY) {
> +                       u64 size;
> +
> +                       size = memcg_node_page_state(memcg, nid,
> +                                                    numa_stats[i].idx);
> +                       size *= numa_stats[i].ratio;
> +                       seq_buf_printf(&s, " N%d=%llu", nid, size);
> +               }
> +               seq_buf_putc(&s, '\n');
> +       }
> +
> +       /* The above should easily fit into one page */
> +       if (WARN_ON_ONCE(seq_buf_putc(&s, '\0')))
> +               s.buffer[PAGE_SIZE - 1] = '\0';

I think you should follow Michal's recommendation at
http://lkml.kernel.org/r/20200914115724.GO16999@dhcp22.suse.cz


* Re: [External] Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
From: Muchun Song @ 2020-09-14 16:54 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Jonathan Corbet,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Roman Gushchin,
	Cgroups, linux-doc, LKML, Linux MM, kernel test robot

On Tue, Sep 15, 2020 at 12:07 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Sun, Sep 13, 2020 at 12:01 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > In the cgroup v1, we have a numa_stat interface. This is useful for
> > providing visibility into the numa locality information within an
> > memcg since the pages are allowed to be allocated from any physical
> > node. One of the use cases is evaluating application performance by
> > combining this information with the application's CPU allocation.
> > But the cgroup v2 does not. So this patch adds the missing information.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Suggested-by: Shakeel Butt <shakeelb@google.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> > ---
> [snip]
> > +
> > +static struct numa_stat numa_stats[] = {
> > +       { "anon", PAGE_SIZE, NR_ANON_MAPPED },
> > +       { "file", PAGE_SIZE, NR_FILE_PAGES },
> > +       { "kernel_stack", 1024, NR_KERNEL_STACK_KB },
> > +       { "shmem", PAGE_SIZE, NR_SHMEM },
> > +       { "file_mapped", PAGE_SIZE, NR_FILE_MAPPED },
> > +       { "file_dirty", PAGE_SIZE, NR_FILE_DIRTY },
> > +       { "file_writeback", PAGE_SIZE, NR_WRITEBACK },
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +       /*
> > +        * The ratio will be initialized in numa_stats_init(). Because
> > +        * on some architectures, the macro of HPAGE_PMD_SIZE is not
> > +        * constant(e.g. powerpc).
> > +        */
> > +       { "anon_thp", 0, NR_ANON_THPS },
> > +#endif
> > +       { "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON },
> > +       { "active_anon", PAGE_SIZE, NR_ACTIVE_ANON },
> > +       { "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE },
> > +       { "active_file", PAGE_SIZE, NR_ACTIVE_FILE },
> > +       { "unevictable", PAGE_SIZE, NR_UNEVICTABLE },
> > +       { "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B },
> > +       { "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B },
> > +};
> > +
> > +static int __init numa_stats_init(void)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +               if (numa_stats[i].idx == NR_ANON_THPS)
> > +                       numa_stats[i].ratio = HPAGE_PMD_SIZE;
> > +#endif
> > +       }
>
> The for loop seems excessive but I don't really have a good alternative.

Yeah, I also have no good alternative. The numa_stats array is only
initialized once, so it should not be a problem :).

>
> > +
> > +       return 0;
> > +}
> > +pure_initcall(numa_stats_init);
> > +
> > +static unsigned long memcg_node_page_state(struct mem_cgroup *memcg,
> > +                                          unsigned int nid,
> > +                                          enum node_stat_item idx)
> > +{
> > +       VM_BUG_ON(nid >= nr_node_ids);
> > +       return lruvec_page_state(mem_cgroup_lruvec(memcg, NODE_DATA(nid)), idx);
> > +}
> > +
> > +static const char *memory_numa_stat_format(struct mem_cgroup *memcg)
> > +{
> > +       int i;
> > +       struct seq_buf s;
> > +
> > +       /* Reserve a byte for the trailing null */
> > +       seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE - 1);
> > +       if (!s.buffer)
> > +               return NULL;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> > +               int nid;
> > +
> > +               seq_buf_printf(&s, "%s", numa_stats[i].name);
> > +               for_each_node_state(nid, N_MEMORY) {
> > +                       u64 size;
> > +
> > +                       size = memcg_node_page_state(memcg, nid,
> > +                                                    numa_stats[i].idx);
> > +                       size *= numa_stats[i].ratio;
> > +                       seq_buf_printf(&s, " N%d=%llu", nid, size);
> > +               }
> > +               seq_buf_putc(&s, '\n');
> > +       }
> > +
> > +       /* The above should easily fit into one page */
> > +       if (WARN_ON_ONCE(seq_buf_putc(&s, '\0')))
> > +               s.buffer[PAGE_SIZE - 1] = '\0';
>
> I think you should follow Michal's recommendation at
> http://lkml.kernel.org/r/20200914115724.GO16999@dhcp22.suse.cz

This case is different: seq_buf_putc(&s, '\n') will not add a trailing \0;
we would have to use seq_buf_puts(&s, "\n") for that.
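
In other words, nothing in the format loop NUL-terminates the buffer,
while the show side treats it as a C string. The two relevant lines from
the patch, re-quoted with added comments:

	/* in memory_numa_stat_format(): */
	seq_buf_putc(&s, '\n');		/* appends one byte, no '\0' */

	/* in memory_numa_stat_show(): */
	seq_puts(m, buf);		/* expects a NUL-terminated string */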


-- 
Yours,
Muchun


* Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
From: Randy Dunlap @ 2020-09-14 19:06 UTC (permalink / raw)
  To: Muchun Song, tj, lizefan, hannes, corbet, mhocko, vdavydov.dev,
	akpm, shakeelb, guro
  Cc: cgroups, linux-doc, linux-kernel, linux-mm, kernel test robot

On 9/13/20 12:00 AM, Muchun Song wrote:
> In the cgroup v1, we have a numa_stat interface. This is useful for
> providing visibility into the numa locality information within an
> memcg since the pages are allowed to be allocated from any physical
> node. One of the use cases is evaluating application performance by
> combining this information with the application's CPU allocation.
> But the cgroup v2 does not. So this patch adds the missing information.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Suggested-by: Shakeel Butt <shakeelb@google.com>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
>  changelog in v3:
>  1. Fix compiler error on powerpc architecture reported by kernel test robot.
>  2. Fix a typo from "anno" to "anon".
> 
>  changelog in v2:
>  1. Add memory.numa_stat interface in cgroup v2.
> 
>  Documentation/admin-guide/cgroup-v2.rst |  72 ++++++++++++++++
>  mm/memcontrol.c                         | 107 ++++++++++++++++++++++++
>  2 files changed, 179 insertions(+)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 6be43781ec7f..92207f0012e4 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1368,6 +1368,78 @@ PAGE_SIZE multiple when read back.
>  		collapsing an existing range of pages. This counter is not
>  		present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
>  
> +  memory.numa_stat
> +	A read-only flat-keyed file which exists on non-root cgroups.
> +
> +	This breaks down the cgroup's memory footprint into different
> +	types of memory, type-specific details, and other information
> +	per node on the state of the memory management system.
> +
> +	This is useful for providing visibility into the numa locality

capitalize acronyms, please:                             NUMA


> +	information within an memcg since the pages are allowed to be
> +	allocated from any physical node. One of the use cases is evaluating
> +	application performance by combining this information with the
> +	application's CPU allocation.
> +
> +	All memory amounts are in bytes.
> +
> +	The output format of memory.numa_stat is::
> +
> +	  type N0=<node 0 pages> N1=<node 1 pages> ...

Now I'm confused.  5 lines above here it says "All memory amounts are in bytes"
but these appear to be in pages. Which is it? And what size pages, if that matters?

Is it like this?
	  type N0=<bytes in node 0 pages> N1=<bytes in node 1 pages> ...



> +	The entries are ordered to be human readable, and new entries
> +	can show up in the middle. Don't rely on items remaining in a
> +	fixed position; use the keys to look up specific values!
> +
> +	  anon
> +		Amount of memory per node used in anonymous mappings such
> +		as brk(), sbrk(), and mmap(MAP_ANONYMOUS)
> +
> +	  file
> +		Amount of memory per node used to cache filesystem data,
> +		including tmpfs and shared memory.
> +
> +	  kernel_stack
> +		Amount of memory per node allocated to kernel stacks.
> +
> +	  shmem
> +		Amount of cached filesystem data per node that is swap-backed,
> +		such as tmpfs, shm segments, shared anonymous mmap()s
> +
> +	  file_mapped
> +		Amount of cached filesystem data per node mapped with mmap()
> +
> +	  file_dirty
> +		Amount of cached filesystem data per node that was modified but
> +		not yet written back to disk
> +
> +	  file_writeback
> +		Amount of cached filesystem data per node that was modified and
> +		is currently being written back to disk
> +
> +	  anon_thp
> +		Amount of memory per node used in anonymous mappings backed by
> +		transparent hugepages
> +
> +	  inactive_anon, active_anon, inactive_file, active_file, unevictable
> +		Amount of memory, swap-backed and filesystem-backed,
> +		per node on the internal memory management lists used
> +		by the page reclaim algorithm.
> +
> +		As these represent internal list state (eg. shmem pages are on anon

		                                         e.g.

> +		memory management lists), inactive_foo + active_foo may not be equal to
> +		the value for the foo counter, since the foo counter is type-based, not
> +		list-based.
> +
> +	  slab_reclaimable
> +		Amount of memory per node used for storing in-kernel data
> +		structures which might be reclaimed, such as dentries and
> +		inodes.
> +
> +	  slab_unreclaimable
> +		Amount of memory per node used for storing in-kernel data
> +		structures which cannot be reclaimed on memory pressure.

Some of the descriptions above end with a '.' and some do not. Please be consistent.

> +
>    memory.swap.current
>  	A read-only single value file which exists on non-root
>  	cgroups.


thanks.
-- 
~Randy



* Re: [External] Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
From: Shakeel Butt @ 2020-09-14 22:57 UTC (permalink / raw)
  To: Muchun Song
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Jonathan Corbet,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Roman Gushchin,
	Cgroups, linux-doc, LKML, Linux MM, kernel test robot

On Mon, Sep 14, 2020 at 9:55 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Tue, Sep 15, 2020 at 12:07 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Sun, Sep 13, 2020 at 12:01 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > In the cgroup v1, we have a numa_stat interface. This is useful for
> > > providing visibility into the numa locality information within an
> > > memcg since the pages are allowed to be allocated from any physical
> > > node. One of the use cases is evaluating application performance by
> > > combining this information with the application's CPU allocation.
> > > But the cgroup v2 does not. So this patch adds the missing information.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > Suggested-by: Shakeel Butt <shakeelb@google.com>
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > ---
> > [snip]
> > > +
> > > +static struct numa_stat numa_stats[] = {
> > > +       { "anon", PAGE_SIZE, NR_ANON_MAPPED },
> > > +       { "file", PAGE_SIZE, NR_FILE_PAGES },
> > > +       { "kernel_stack", 1024, NR_KERNEL_STACK_KB },
> > > +       { "shmem", PAGE_SIZE, NR_SHMEM },
> > > +       { "file_mapped", PAGE_SIZE, NR_FILE_MAPPED },
> > > +       { "file_dirty", PAGE_SIZE, NR_FILE_DIRTY },
> > > +       { "file_writeback", PAGE_SIZE, NR_WRITEBACK },
> > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > +       /*
> > > +        * The ratio will be initialized in numa_stats_init(). Because
> > > +        * on some architectures, the macro of HPAGE_PMD_SIZE is not
> > > +        * constant(e.g. powerpc).
> > > +        */
> > > +       { "anon_thp", 0, NR_ANON_THPS },
> > > +#endif
> > > +       { "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON },
> > > +       { "active_anon", PAGE_SIZE, NR_ACTIVE_ANON },
> > > +       { "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE },
> > > +       { "active_file", PAGE_SIZE, NR_ACTIVE_FILE },
> > > +       { "unevictable", PAGE_SIZE, NR_UNEVICTABLE },
> > > +       { "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B },
> > > +       { "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B },
> > > +};
> > > +
> > > +static int __init numa_stats_init(void)
> > > +{
> > > +       int i;
> > > +
> > > +       for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > +               if (numa_stats[i].idx == NR_ANON_THPS)
> > > +                       numa_stats[i].ratio = HPAGE_PMD_SIZE;
> > > +#endif
> > > +       }
> >
> > The for loop seems excessive but I don't really have a good alternative.
>
> Yeah, I also have no good alternative. The numa_stats is only initialized
> once. So there may be no problem :).
>
> >
> > > +
> > > +       return 0;
> > > +}
> > > +pure_initcall(numa_stats_init);
> > > +
> > > +static unsigned long memcg_node_page_state(struct mem_cgroup *memcg,
> > > +                                          unsigned int nid,
> > > +                                          enum node_stat_item idx)
> > > +{
> > > +       VM_BUG_ON(nid >= nr_node_ids);
> > > +       return lruvec_page_state(mem_cgroup_lruvec(memcg, NODE_DATA(nid)), idx);
> > > +}
> > > +
> > > +static const char *memory_numa_stat_format(struct mem_cgroup *memcg)
> > > +{
> > > +       int i;
> > > +       struct seq_buf s;
> > > +
> > > +       /* Reserve a byte for the trailing null */
> > > +       seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE - 1);
> > > +       if (!s.buffer)
> > > +               return NULL;
> > > +
> > > +       for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> > > +               int nid;
> > > +
> > > +               seq_buf_printf(&s, "%s", numa_stats[i].name);
> > > +               for_each_node_state(nid, N_MEMORY) {
> > > +                       u64 size;
> > > +
> > > +                       size = memcg_node_page_state(memcg, nid,
> > > +                                                    numa_stats[i].idx);
> > > +                       size *= numa_stats[i].ratio;
> > > +                       seq_buf_printf(&s, " N%d=%llu", nid, size);
> > > +               }
> > > +               seq_buf_putc(&s, '\n');
> > > +       }
> > > +
> > > +       /* The above should easily fit into one page */
> > > +       if (WARN_ON_ONCE(seq_buf_putc(&s, '\0')))
> > > +               s.buffer[PAGE_SIZE - 1] = '\0';
> >
> > I think you should follow Michal's recommendation at
> > http://lkml.kernel.org/r/20200914115724.GO16999@dhcp22.suse.cz
>
> Here is different, because the seq_buf_putc(&s, '\n') will not add \0 unless
> we use seq_buf_puts(&s, "\n").
>

Why a separate memory_numa_stat_format()? memory_stat_format() exists
because it is called from two places, but there is no such need for a
separate memory_numa_stat_format(). Similarly, why not just call
seq_printf() instead of formatting into a seq_buf?
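
A minimal sketch of what that suggestion could look like, reusing the
numa_stats[] table and the memcg_node_page_state() helper from the patch
(an illustration only, not code from the thread):

static int memory_numa_stat_show(struct seq_file *m, void *v)
{
	int i;
	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);

	for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
		int nid;

		seq_printf(m, "%s", numa_stats[i].name);
		for_each_node_state(nid, N_MEMORY) {
			u64 size;

			/* the per-entry ratio converts pages/KB to bytes */
			size = memcg_node_page_state(memcg, nid,
						     numa_stats[i].idx);
			size *= numa_stats[i].ratio;
			seq_printf(m, " N%d=%llu", nid, size);
		}
		seq_putc(m, '\n');
	}

	return 0;
}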

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [External] Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
@ 2020-09-14 22:57       ` Shakeel Butt
  0 siblings, 0 replies; 23+ messages in thread
From: Shakeel Butt @ 2020-09-14 22:57 UTC (permalink / raw)
  To: Muchun Song
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Jonathan Corbet,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Roman Gushchin,
	Cgroups, linux-doc, LKML, Linux MM, kernel test robot

On Mon, Sep 14, 2020 at 9:55 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Tue, Sep 15, 2020 at 12:07 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Sun, Sep 13, 2020 at 12:01 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > In the cgroup v1, we have a numa_stat interface. This is useful for
> > > providing visibility into the numa locality information within an
> > > memcg since the pages are allowed to be allocated from any physical
> > > node. One of the use cases is evaluating application performance by
> > > combining this information with the application's CPU allocation.
> > > But the cgroup v2 does not. So this patch adds the missing information.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > Suggested-by: Shakeel Butt <shakeelb@google.com>
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > ---
> > [snip]
> > > +
> > > +static struct numa_stat numa_stats[] = {
> > > +       { "anon", PAGE_SIZE, NR_ANON_MAPPED },
> > > +       { "file", PAGE_SIZE, NR_FILE_PAGES },
> > > +       { "kernel_stack", 1024, NR_KERNEL_STACK_KB },
> > > +       { "shmem", PAGE_SIZE, NR_SHMEM },
> > > +       { "file_mapped", PAGE_SIZE, NR_FILE_MAPPED },
> > > +       { "file_dirty", PAGE_SIZE, NR_FILE_DIRTY },
> > > +       { "file_writeback", PAGE_SIZE, NR_WRITEBACK },
> > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > +       /*
> > > +        * The ratio will be initialized in numa_stats_init(). Because
> > > +        * on some architectures, the macro of HPAGE_PMD_SIZE is not
> > > +        * constant(e.g. powerpc).
> > > +        */
> > > +       { "anon_thp", 0, NR_ANON_THPS },
> > > +#endif
> > > +       { "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON },
> > > +       { "active_anon", PAGE_SIZE, NR_ACTIVE_ANON },
> > > +       { "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE },
> > > +       { "active_file", PAGE_SIZE, NR_ACTIVE_FILE },
> > > +       { "unevictable", PAGE_SIZE, NR_UNEVICTABLE },
> > > +       { "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B },
> > > +       { "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B },
> > > +};
> > > +
> > > +static int __init numa_stats_init(void)
> > > +{
> > > +       int i;
> > > +
> > > +       for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > +               if (numa_stats[i].idx == NR_ANON_THPS)
> > > +                       numa_stats[i].ratio = HPAGE_PMD_SIZE;
> > > +#endif
> > > +       }
> >
> > The for loop seems excessive but I don't really have a good alternative.
>
> Yeah, I also have no good alternative. The numa_stats is only initialized
> once. So there may be no problem :).
>
> >
> > > +
> > > +       return 0;
> > > +}
> > > +pure_initcall(numa_stats_init);
> > > +
> > > +static unsigned long memcg_node_page_state(struct mem_cgroup *memcg,
> > > +                                          unsigned int nid,
> > > +                                          enum node_stat_item idx)
> > > +{
> > > +       VM_BUG_ON(nid >= nr_node_ids);
> > > +       return lruvec_page_state(mem_cgroup_lruvec(memcg, NODE_DATA(nid)), idx);
> > > +}
> > > +
> > > +static const char *memory_numa_stat_format(struct mem_cgroup *memcg)
> > > +{
> > > +       int i;
> > > +       struct seq_buf s;
> > > +
> > > +       /* Reserve a byte for the trailing null */
> > > +       seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE - 1);
> > > +       if (!s.buffer)
> > > +               return NULL;
> > > +
> > > +       for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> > > +               int nid;
> > > +
> > > +               seq_buf_printf(&s, "%s", numa_stats[i].name);
> > > +               for_each_node_state(nid, N_MEMORY) {
> > > +                       u64 size;
> > > +
> > > +                       size = memcg_node_page_state(memcg, nid,
> > > +                                                    numa_stats[i].idx);
> > > +                       size *= numa_stats[i].ratio;
> > > +                       seq_buf_printf(&s, " N%d=%llu", nid, size);
> > > +               }
> > > +               seq_buf_putc(&s, '\n');
> > > +       }
> > > +
> > > +       /* The above should easily fit into one page */
> > > +       if (WARN_ON_ONCE(seq_buf_putc(&s, '\0')))
> > > +               s.buffer[PAGE_SIZE - 1] = '\0';
> >
> > I think you should follow Michal's recommendation at
> > http://lkml.kernel.org/r/20200914115724.GO16999@dhcp22.suse.cz
>
> Here is different, because the seq_buf_putc(&s, '\n') will not add \0 unless
> we use seq_buf_puts(&s, "\n").
>

Why a separate memory_numa_stat_format()? For memory_stat_format(), it
is called from two places. There is no need to have a separate
memory_numa_stat_format(). Similarly why not just call seq_printf()
instead of formatting into a seq_buf?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [External] Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
  2020-09-14 19:06 ` Randy Dunlap
  2020-09-15  2:44     ` Muchun Song
@ 2020-09-15  2:44     ` Muchun Song
  0 siblings, 0 replies; 23+ messages in thread
From: Muchun Song @ 2020-09-15  2:44 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: tj, Zefan Li, Johannes Weiner, corbet, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Shakeel Butt, Roman Gushchin,
	Cgroups, linux-doc, LKML, Linux Memory Management List,
	kernel test robot

On Tue, Sep 15, 2020 at 3:07 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> On 9/13/20 12:00 AM, Muchun Song wrote:
> > In the cgroup v1, we have a numa_stat interface. This is useful for
> > providing visibility into the numa locality information within an
> > memcg since the pages are allowed to be allocated from any physical
> > node. One of the use cases is evaluating application performance by
> > combining this information with the application's CPU allocation.
> > But the cgroup v2 does not. So this patch adds the missing information.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Suggested-by: Shakeel Butt <shakeelb@google.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> > ---
> >  changelog in v3:
> >  1. Fix compiler error on powerpc architecture reported by kernel test robot.
> >  2. Fix a typo from "anno" to "anon".
> >
> >  changelog in v2:
> >  1. Add memory.numa_stat interface in cgroup v2.
> >
> >  Documentation/admin-guide/cgroup-v2.rst |  72 ++++++++++++++++
> >  mm/memcontrol.c                         | 107 ++++++++++++++++++++++++
> >  2 files changed, 179 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 6be43781ec7f..92207f0012e4 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1368,6 +1368,78 @@ PAGE_SIZE multiple when read back.
> >               collapsing an existing range of pages. This counter is not
> >               present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
> >
> > +  memory.numa_stat
> > +     A read-only flat-keyed file which exists on non-root cgroups.
> > +
> > +     This breaks down the cgroup's memory footprint into different
> > +     types of memory, type-specific details, and other information
> > +     per node on the state of the memory management system.
> > +
> > +     This is useful for providing visibility into the numa locality
>
> capitalize acronyms, please:                             NUMA

OK, I will do that. Thanks.

>
>
> > +     information within an memcg since the pages are allowed to be
> > +     allocated from any physical node. One of the use cases is evaluating
> > +     application performance by combining this information with the
> > +     application's CPU allocation.
> > +
> > +     All memory amounts are in bytes.
> > +
> > +     The output format of memory.numa_stat is::
> > +
> > +       type N0=<node 0 pages> N1=<node 1 pages> ...
>
> Now I'm confused.  5 lines above here it says "All memory amounts are in bytes"
> but these appear to be in pages. Which is it?  and what size pages if that matters?

Sorry. It's my mistake. I will fix it.

>
> Is it like this?
>           type N0=<bytes in node 0 pages> N1=<bytes in node 1 pages> ...

Thanks.
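
For illustration (node count and values purely hypothetical), with all
amounts reported in bytes the output then looks like:

  anon N0=6848512 N1=2064384
  file N0=1273856 N1=0
  kernel_stack N0=245760 N1=16384
  ...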

>
>
>
> > +     The entries are ordered to be human readable, and new entries
> > +     can show up in the middle. Don't rely on items remaining in a
> > +     fixed position; use the keys to look up specific values!
> > +
> > +       anon
> > +             Amount of memory per node used in anonymous mappings such
> > +             as brk(), sbrk(), and mmap(MAP_ANONYMOUS)
> > +
> > +       file
> > +             Amount of memory per node used to cache filesystem data,
> > +             including tmpfs and shared memory.
> > +
> > +       kernel_stack
> > +             Amount of memory per node allocated to kernel stacks.
> > +
> > +       shmem
> > +             Amount of cached filesystem data per node that is swap-backed,
> > +             such as tmpfs, shm segments, shared anonymous mmap()s
> > +
> > +       file_mapped
> > +             Amount of cached filesystem data per node mapped with mmap()
> > +
> > +       file_dirty
> > +             Amount of cached filesystem data per node that was modified but
> > +             not yet written back to disk
> > +
> > +       file_writeback
> > +             Amount of cached filesystem data per node that was modified and
> > +             is currently being written back to disk
> > +
> > +       anon_thp
> > +             Amount of memory per node used in anonymous mappings backed by
> > +             transparent hugepages
> > +
> > +       inactive_anon, active_anon, inactive_file, active_file, unevictable
> > +             Amount of memory, swap-backed and filesystem-backed,
> > +             per node on the internal memory management lists used
> > +             by the page reclaim algorithm.
> > +
> > +             As these represent internal list state (eg. shmem pages are on anon
>
>                                                          e.g.

Thanks.

>
> > +             memory management lists), inactive_foo + active_foo may not be equal to
> > +             the value for the foo counter, since the foo counter is type-based, not
> > +             list-based.
> > +
> > +       slab_reclaimable
> > +             Amount of memory per node used for storing in-kernel data
> > +             structures which might be reclaimed, such as dentries and
> > +             inodes.
> > +
> > +       slab_unreclaimable
> > +             Amount of memory per node used for storing in-kernel data
> > +             structures which cannot be reclaimed on memory pressure.
>
> Some of the descriptions above end with a '.' and some do not. Please be consistent.

Will do that.

>
> > +
> >    memory.swap.current
> >       A read-only single value file which exists on non-root
> >       cgroups.
>
>
> thanks.
> --
> ~Randy
>


-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [External] Re: [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2
  2020-09-14 22:57       ` Shakeel Butt
  (?)
@ 2020-09-15  2:46         ` Muchun Song
  -1 siblings, 0 replies; 23+ messages in thread
From: Muchun Song @ 2020-09-15  2:46 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Jonathan Corbet,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Roman Gushchin,
	Cgroups, linux-doc, LKML, Linux MM, kernel test robot

On Tue, Sep 15, 2020 at 6:57 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Mon, Sep 14, 2020 at 9:55 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Tue, Sep 15, 2020 at 12:07 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Sun, Sep 13, 2020 at 12:01 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > > In the cgroup v1, we have a numa_stat interface. This is useful for
> > > > providing visibility into the numa locality information within an
> > > > memcg since the pages are allowed to be allocated from any physical
> > > > node. One of the use cases is evaluating application performance by
> > > > combining this information with the application's CPU allocation.
> > > > But the cgroup v2 does not. So this patch adds the missing information.
> > > >
> > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > Suggested-by: Shakeel Butt <shakeelb@google.com>
> > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > ---
> > > [snip]
> > > > +
> > > > +static struct numa_stat numa_stats[] = {
> > > > +       { "anon", PAGE_SIZE, NR_ANON_MAPPED },
> > > > +       { "file", PAGE_SIZE, NR_FILE_PAGES },
> > > > +       { "kernel_stack", 1024, NR_KERNEL_STACK_KB },
> > > > +       { "shmem", PAGE_SIZE, NR_SHMEM },
> > > > +       { "file_mapped", PAGE_SIZE, NR_FILE_MAPPED },
> > > > +       { "file_dirty", PAGE_SIZE, NR_FILE_DIRTY },
> > > > +       { "file_writeback", PAGE_SIZE, NR_WRITEBACK },
> > > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > > +       /*
> > > > +        * The ratio will be initialized in numa_stats_init(). Because
> > > > +        * on some architectures, the macro of HPAGE_PMD_SIZE is not
> > > > +        * constant(e.g. powerpc).
> > > > +        */
> > > > +       { "anon_thp", 0, NR_ANON_THPS },
> > > > +#endif
> > > > +       { "inactive_anon", PAGE_SIZE, NR_INACTIVE_ANON },
> > > > +       { "active_anon", PAGE_SIZE, NR_ACTIVE_ANON },
> > > > +       { "inactive_file", PAGE_SIZE, NR_INACTIVE_FILE },
> > > > +       { "active_file", PAGE_SIZE, NR_ACTIVE_FILE },
> > > > +       { "unevictable", PAGE_SIZE, NR_UNEVICTABLE },
> > > > +       { "slab_reclaimable", 1, NR_SLAB_RECLAIMABLE_B },
> > > > +       { "slab_unreclaimable", 1, NR_SLAB_UNRECLAIMABLE_B },
> > > > +};
> > > > +
> > > > +static int __init numa_stats_init(void)
> > > > +{
> > > > +       int i;
> > > > +
> > > > +       for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> > > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > > +               if (numa_stats[i].idx == NR_ANON_THPS)
> > > > +                       numa_stats[i].ratio = HPAGE_PMD_SIZE;
> > > > +#endif
> > > > +       }
> > >
> > > The for loop seems excessive but I don't really have a good alternative.
> >
> > Yeah, I also have no good alternative. The numa_stats is only initialized
> > once. So there may be no problem :).
> >
> > >
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +pure_initcall(numa_stats_init);
> > > > +
> > > > +static unsigned long memcg_node_page_state(struct mem_cgroup *memcg,
> > > > +                                          unsigned int nid,
> > > > +                                          enum node_stat_item idx)
> > > > +{
> > > > +       VM_BUG_ON(nid >= nr_node_ids);
> > > > +       return lruvec_page_state(mem_cgroup_lruvec(memcg, NODE_DATA(nid)), idx);
> > > > +}
> > > > +
> > > > +static const char *memory_numa_stat_format(struct mem_cgroup *memcg)
> > > > +{
> > > > +       int i;
> > > > +       struct seq_buf s;
> > > > +
> > > > +       /* Reserve a byte for the trailing null */
> > > > +       seq_buf_init(&s, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE - 1);
> > > > +       if (!s.buffer)
> > > > +               return NULL;
> > > > +
> > > > +       for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
> > > > +               int nid;
> > > > +
> > > > +               seq_buf_printf(&s, "%s", numa_stats[i].name);
> > > > +               for_each_node_state(nid, N_MEMORY) {
> > > > +                       u64 size;
> > > > +
> > > > +                       size = memcg_node_page_state(memcg, nid,
> > > > +                                                    numa_stats[i].idx);
> > > > +                       size *= numa_stats[i].ratio;
> > > > +                       seq_buf_printf(&s, " N%d=%llu", nid, size);
> > > > +               }
> > > > +               seq_buf_putc(&s, '\n');
> > > > +       }
> > > > +
> > > > +       /* The above should easily fit into one page */
> > > > +       if (WARN_ON_ONCE(seq_buf_putc(&s, '\0')))
> > > > +               s.buffer[PAGE_SIZE - 1] = '\0';
> > >
> > > I think you should follow Michal's recommendation at
> > > http://lkml.kernel.org/r/20200914115724.GO16999@dhcp22.suse.cz
> >
> > Here is different, because the seq_buf_putc(&s, '\n') will not add \0 unless
> > we use seq_buf_puts(&s, "\n").
> >
>
> Why a separate memory_numa_stat_format()? For memory_stat_format(), it
> is called from two places. There is no need to have a separate
> memory_numa_stat_format(). Similarly why not just call seq_printf()
> instead of formatting into a seq_buf?

I was indeed influenced by memory_stat_format(). Thank you for
pointing that out.
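
For illustration, a rough sketch of that direction (untested; the
handler name is hypothetical and assumes it is wired up as the cftype
.seq_show callback for memory.numa_stat, like the other memory.* files):

        /* Hypothetical .seq_show handler: print straight into the seq_file. */
        static int memory_numa_stat_show(struct seq_file *m, void *v)
        {
                struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
                int i;

                for (i = 0; i < ARRAY_SIZE(numa_stats); i++) {
                        int nid;

                        seq_printf(m, "%s", numa_stats[i].name);
                        for_each_node_state(nid, N_MEMORY) {
                                u64 size;

                                size = memcg_node_page_state(memcg, nid,
                                                             numa_stats[i].idx);
                                size *= numa_stats[i].ratio;
                                seq_printf(m, " N%d=%llu", nid, size);
                        }
                        seq_putc(m, '\n');
                }

                return 0;
        }

That way there is no kmalloc(), no seq_buf sizing and no trailing '\0'
handling at all.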




-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-09-15  2:47 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-13  7:00 [PATCH v3] mm: memcontrol: Add the missing numa_stat interface for cgroup v2 Muchun Song
2020-09-13 17:09 ` Chris Down
2020-09-14  3:10   ` [External] " Muchun Song
2020-09-14  3:10     ` Muchun Song
2020-09-14  3:18     ` Zefan Li
2020-09-14  3:18       ` Zefan Li
2020-09-14  3:28       ` Muchun Song
2020-09-14  3:28         ` Muchun Song
2020-09-14 16:07 ` Shakeel Butt
2020-09-14 16:07   ` Shakeel Butt
2020-09-14 16:54   ` [External] " Muchun Song
2020-09-14 16:54     ` Muchun Song
2020-09-14 16:54     ` Muchun Song
2020-09-14 22:57     ` Shakeel Butt
2020-09-14 22:57       ` Shakeel Butt
2020-09-14 22:57       ` Shakeel Butt
2020-09-15  2:46       ` Muchun Song
2020-09-15  2:46         ` Muchun Song
2020-09-15  2:46         ` Muchun Song
2020-09-14 19:06 ` Randy Dunlap
2020-09-15  2:44   ` [External] " Muchun Song
2020-09-15  2:44     ` Muchun Song
2020-09-15  2:44     ` Muchun Song

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.