All of lore.kernel.org
* [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
From: Johannes Weiner @ 2016-01-13 22:01 UTC
  To: Andrew Morton
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
	kernel-team

Hi Andrew,

These patches add basic memory statistics so that new users of cgroup2
have some inkling of what's going on, and are not just confronted with
a single number of bytes used.

This is very short notice, but also straightforward. It would be cool
to get this in along with the lifting of the cgroup2 devel flag.

Michal, Vladimir, what do you think? We'll also have to figure out how
we're going to represent and break down the "kmem" consumers.

Thanks,
Johannes

 include/linux/memcontrol.h |  5 +++-
 mm/memcontrol.c            | 63 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 1 deletion(-)
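For orientation, memory.stat is a flat-keyed file of "key value" pairs that breaks the single byte count down by consumer. The cgroup path and values below are illustrative only, simulated with a local file so the commands are self-contained:

```shell
# Simulated memory.stat contents (illustrative values, bytes):
cat > memory.stat <<'EOF'
anon 167936
file 87302144
EOF
# The consumer totals sum back up toward the single usage number:
awk '$1 == "anon" || $1 == "file" { sum += $2 } END { print sum }' memory.stat
```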

* [PATCH 1/2] mm: memcontrol: basic memory statistics in cgroup2 memory controller
From: Johannes Weiner @ 2016-01-13 22:01 UTC
  To: Andrew Morton
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
	kernel-team

Provide a cgroup2 memory.stat file that gives statistics on LRU memory
and fault event counters. More consumers and breakdowns will follow.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c26ffac..8645852 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2767,6 +2767,18 @@ static unsigned long tree_stat(struct mem_cgroup *memcg,
 	return val;
 }
 
+static unsigned long tree_events(struct mem_cgroup *memcg,
+				 enum mem_cgroup_events_index idx)
+{
+	struct mem_cgroup *iter;
+	unsigned long val = 0;
+
+	for_each_mem_cgroup_tree(iter, memcg)
+		val += mem_cgroup_read_events(iter, idx);
+
+	return val;
+}
+
 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
 {
 	unsigned long val;
@@ -5095,6 +5107,46 @@ static int memory_events_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int memory_stat_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+	int i;
+
+	/* Memory consumer totals */
+
+	seq_printf(m, "anon %lu\n",
+		   tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
+	seq_printf(m, "file %lu\n",
+		   tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+
+	/* Per-consumer breakdowns */
+
+	for (i = 0; i < NR_LRU_LISTS; i++) {
+		struct mem_cgroup *mi;
+		unsigned long val = 0;
+
+		for_each_mem_cgroup_tree(mi, memcg)
+			val += mem_cgroup_nr_lru_pages(mi, BIT(i)) * PAGE_SIZE;
+		seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i], val);
+	}
+
+	seq_printf(m, "file_mapped %lu\n",
+		   tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) * PAGE_SIZE);
+	seq_printf(m, "file_dirty %lu\n",
+		   tree_stat(memcg, MEM_CGROUP_STAT_DIRTY) * PAGE_SIZE);
+	seq_printf(m, "file_writeback %lu\n",
+		   tree_stat(memcg, MEM_CGROUP_STAT_WRITEBACK) * PAGE_SIZE);
+
+	/* Memory management events */
+
+	seq_printf(m, "pgfault %lu\n",
+		   tree_events(memcg, MEM_CGROUP_EVENTS_PGFAULT));
+	seq_printf(m, "pgmajfault %lu\n",
+		   tree_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT));
+
+	return 0;
+}
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -5125,6 +5177,11 @@ static struct cftype memory_files[] = {
 		.file_offset = offsetof(struct mem_cgroup, events_file),
 		.seq_show = memory_events_show,
 	},
+	{
+		.name = "stat",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = memory_stat_show,
+	},
 	{ }	/* terminate */
 };
 
-- 
2.7.0


* [PATCH 2/2] mm: memcontrol: add "sock" to cgroup2 memory.stat
From: Johannes Weiner @ 2016-01-13 22:01 UTC
  To: Andrew Morton
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
	kernel-team

Provide statistics on how much of a cgroup's memory footprint is made
up of socket buffers from network connections owned by the group.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 5 ++++-
 mm/memcontrol.c            | 6 ++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1666617..9ae48d4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -50,6 +50,9 @@ enum mem_cgroup_stat_index {
 	MEM_CGROUP_STAT_WRITEBACK,	/* # of pages under writeback */
 	MEM_CGROUP_STAT_SWAP,		/* # of pages, swapped out */
 	MEM_CGROUP_STAT_NSTATS,
+	/* default hierarchy stats */
+	MEMCG_SOCK,
+	MEMCG_NR_STAT,
 };
 
 struct mem_cgroup_reclaim_cookie {
@@ -87,7 +90,7 @@ enum mem_cgroup_events_target {
 
 #ifdef CONFIG_MEMCG
 struct mem_cgroup_stat_cpu {
-	long count[MEM_CGROUP_STAT_NSTATS];
+	long count[MEMCG_NR_STAT];
 	unsigned long events[MEMCG_NR_EVENTS];
 	unsigned long nr_page_events;
 	unsigned long targets[MEM_CGROUP_NTARGETS];
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8645852..6bb23a7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5118,6 +5118,8 @@ static int memory_stat_show(struct seq_file *m, void *v)
 		   tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
 	seq_printf(m, "file %lu\n",
 		   tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+	seq_printf(m, "sock %lu\n",
+		   tree_stat(memcg, MEMCG_SOCK) * PAGE_SIZE);
 
 	/* Per-consumer breakdowns */
 
@@ -5619,6 +5621,8 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 	if (in_softirq())
 		gfp_mask = GFP_NOWAIT;
 
+	this_cpu_add(memcg->stat->count[MEMCG_SOCK], nr_pages);
+
 	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
 		return true;
 
@@ -5638,6 +5642,8 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 		return;
 	}
 
+	this_cpu_sub(memcg->stat->count[MEMCG_SOCK], nr_pages);
+
 	page_counter_uncharge(&memcg->memory, nr_pages);
 	css_put_many(&memcg->css, nr_pages);
 }
-- 
2.7.0


* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
From: Andrew Morton @ 2016-01-13 22:49 UTC
  To: Johannes Weiner
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
	kernel-team

On Wed, 13 Jan 2016 17:01:07 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:

> Hi Andrew,
> 
> these patches add basic memory statistics so that new users of cgroup2
> have some inkling of what's going on, and are not just confronted with
> a single number of bytes used.
> 
> This is very short-notice, but also straight-forward. It would be cool
> to get this in along with the lifting of the cgroup2 devel flag.
> 
> Michal, Vladimir, what do you think? We'll also have to figure out how
> we're going to represent and break down the "kmem" consumers.
> 

It would be nice to see example output, and a description of why this
output was chosen: what was included, what was omitted, why it was
presented this way, what units were chosen for displaying the stats and
why.  Will the things which are being displayed still be relevant (or
even available) 10 years from now?  Etcetera.

And the interface should be documented at some point.  Doing it now
will help with the review of the proposed interface.

Because this stuff is forever and we have to get it right.


* Re: [PATCH 1/2] mm: memcontrol: basic memory statistics in cgroup2 memory controller
From: Andrew Morton @ 2016-01-13 22:49 UTC
  To: Johannes Weiner
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
	kernel-team

On Wed, 13 Jan 2016 17:01:08 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:

> Provide a cgroup2 memory.stat that provides statistics on LRU memory
> and fault event counters. More consumers and breakdowns will follow.
> 
> ...
>
> @@ -5095,6 +5107,46 @@ static int memory_events_show(struct seq_file *m, void *v)
>  	return 0;
>  }
>  
> +static int memory_stat_show(struct seq_file *m, void *v)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
> +	int i;
> +
> +	/* Memory consumer totals */
> +
> +	seq_printf(m, "anon %lu\n",
> +		   tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);

Is there any reason why this won't overflow a longword on 32-bit?

> +	seq_printf(m, "file %lu\n",
> +		   tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
> +
> +	/* Per-consumer breakdowns */
> +
> +	for (i = 0; i < NR_LRU_LISTS; i++) {
> +		struct mem_cgroup *mi;
> +		unsigned long val = 0;
> +
> +		for_each_mem_cgroup_tree(mi, memcg)
> +			val += mem_cgroup_nr_lru_pages(mi, BIT(i)) * PAGE_SIZE;
> +		seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i], val);
> +	}
> +
> +	seq_printf(m, "file_mapped %lu\n",
> +		   tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) * PAGE_SIZE);
> +	seq_printf(m, "file_dirty %lu\n",
> +		   tree_stat(memcg, MEM_CGROUP_STAT_DIRTY) * PAGE_SIZE);
> +	seq_printf(m, "file_writeback %lu\n",
> +		   tree_stat(memcg, MEM_CGROUP_STAT_WRITEBACK) * PAGE_SIZE);
> +
> +	/* Memory management events */
> +
> +	seq_printf(m, "pgfault %lu\n",
> +		   tree_events(memcg, MEM_CGROUP_EVENTS_PGFAULT));
> +	seq_printf(m, "pgmajfault %lu\n",
> +		   tree_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT));
> +
> +	return 0;
> +}


* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
From: Johannes Weiner @ 2016-01-14 20:24 UTC
  To: Andrew Morton
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
	kernel-team

On Wed, Jan 13, 2016 at 02:49:16PM -0800, Andrew Morton wrote:
> It would be nice to see example output, and a description of why this
> output was chosen: what was included, what was omitted, why it was
> presented this way, what units were chosen for displaying the stats and
> why.  Will the things which are being displayed still be relevant (or
> even available) 10 years from now.  etcetera.
> 
> And the interface should be documented at some point.  Doing it now
> will help with the review of the proposed interface.
> 
> Because this stuff is forever and we have to get it right.

Here is a follow-up to 1/2 that hopefully addresses all that, as well
as the 32-bit overflow problem. What do you think? I'm probably a bit
too optimistic about being able to maintain a meaningful sort order of
the file when adding new entries. It depends on whether people start
relying on items staying at fixed offsets and what we tell them in
response when that breaks. I hope that we can at least get the main
memory consumers in before this is released, just in case.

From 1be87db16a3895538ce65362b5234ef9c8af308d Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 14 Jan 2016 10:40:24 -0500
Subject: [PATCH] mm: memcontrol: basic memory statistics in cgroup2 memory
 controller fix

Fixlet addressing akpm's feedback:

- Fix overflowing byte counters on 32-bit. Just like in the existing
  interface files, bytes must be printed as u64 to work with highmem.

- Add documentation in cgroup.txt that explains the memory.stat file
  and its format.

- Rethink item ordering to accommodate potential future additions. The
  ordering now follows both 1) from big picture to detail and 2) from
  stats that reflect on userspace behavior towards stats that reflect
  on kernel heuristics. Both are gradients, and item-by-item ordering
  will still require judgement calls (and some bike shed painting).

Changelog addendum to the original patch:

The output of this file looks as follows:

$ cat memory.stat
anon 167936
file 87302144
file_mapped 0
file_dirty 0
file_writeback 0
inactive_anon 0
active_anon 155648
inactive_file 87298048
active_file 4096
unevictable 0
pgfault 636
pgmajfault 0

The list consists of two sections: statistics reflecting the current
state of the memory management subsystem, and statistics reflecting
past events. The items themselves are sorted such that generic big
picture items come before specific details, and items related to
userspace activity come before items related to kernel heuristics.

All memory counters are in bytes to eliminate all ambiguity with
variable page sizes.

There will be more items and statistics added in the future, but this
is a good initial set to get a minimum of insight into how a cgroup is
using memory, and the items chosen for now are likely to remain valid
even with significant changes to the memory management implementation.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 Documentation/cgroup.txt | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c          | 45 +++++++++++++++++++++++---------------
 2 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/Documentation/cgroup.txt b/Documentation/cgroup.txt
index f441564..65b3eac 100644
--- a/Documentation/cgroup.txt
+++ b/Documentation/cgroup.txt
@@ -819,6 +819,62 @@ PAGE_SIZE multiple when read back.
 		the cgroup.  This may not exactly match the number of
 		processes killed but should generally be close.
 
+  memory.stat
+
+	A read-only flat-keyed file which exists on non-root cgroups.
+
+	This breaks down the cgroup's memory footprint into different
+	types of memory, type-specific details, and other information
+	on the state and past events of the memory management system.
+
+	All memory amounts are in bytes.
+
+	The entries are ordered to be human readable, and new entries
+	can show up in the middle. Don't rely on items remaining in a
+	fixed position; use the keys to look up specific values!
+
+	  anon
+
+		Amount of memory used in anonymous mappings such as
+		brk(), sbrk(), and mmap(MAP_ANONYMOUS)
+
+	  file
+
+		Amount of memory used to cache filesystem data,
+		including tmpfs and shared memory.
+
+	  file_mapped
+
+		Amount of cached filesystem data mapped with mmap()
+
+	  file_dirty
+
+		Amount of cached filesystem data that was modified but
+		not yet written back to disk
+
+	  file_writeback
+
+		Amount of cached filesystem data that was modified and
+		is currently being written back to disk
+
+	  inactive_anon
+	  active_anon
+	  inactive_file
+	  active_file
+	  unevictable
+
+		Amount of memory, swap-backed and filesystem-backed,
+		on the internal memory management lists used by the
+		page reclaim algorithm
+
+	  pgfault
+
+		Total number of page faults incurred
+
+	  pgmajfault
+
+		Number of major page faults incurred
+
   memory.swap.current
 
 	A read-only single value file which exists on non-root
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8645852..cdb51a9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5112,32 +5112,43 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
 	int i;
 
-	/* Memory consumer totals */
-
-	seq_printf(m, "anon %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
-	seq_printf(m, "file %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+	/*
+	 * Provide statistics on the state of the memory subsystem as
+	 * well as cumulative event counters that show past behavior.
+	 *
+	 * This list is ordered following a combination of these gradients:
+	 * 1) generic big picture -> specifics and details
+	 * 2) reflecting userspace activity -> reflecting kernel heuristics
+	 *
+	 * Current memory state:
+	 */
 
-	/* Per-consumer breakdowns */
+	seq_printf(m, "anon %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
+	seq_printf(m, "file %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+
+	seq_printf(m, "file_mapped %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) *
+		   PAGE_SIZE);
+	seq_printf(m, "file_dirty %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_DIRTY) *
+		   PAGE_SIZE);
+	seq_printf(m, "file_writeback %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_WRITEBACK) *
+		   PAGE_SIZE);
 
 	for (i = 0; i < NR_LRU_LISTS; i++) {
 		struct mem_cgroup *mi;
 		unsigned long val = 0;
 
 		for_each_mem_cgroup_tree(mi, memcg)
-			val += mem_cgroup_nr_lru_pages(mi, BIT(i)) * PAGE_SIZE;
-		seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i], val);
+			val += mem_cgroup_nr_lru_pages(mi, BIT(i));
+		seq_printf(m, "%s %llu\n",
+			   mem_cgroup_lru_names[i], (u64)val * PAGE_SIZE);
 	}
 
-	seq_printf(m, "file_mapped %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) * PAGE_SIZE);
-	seq_printf(m, "file_dirty %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_DIRTY) * PAGE_SIZE);
-	seq_printf(m, "file_writeback %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_WRITEBACK) * PAGE_SIZE);
-
-	/* Memory management events */
+	/* Accumulated memory events */
 
 	seq_printf(m, "pgfault %lu\n",
 		   tree_events(memcg, MEM_CGROUP_EVENTS_PGFAULT));
-- 
2.7.0
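Since the documentation above warns against relying on item positions, consumers would look values up by key. A sketch, recreating the changelog's example output locally so the lookup is self-contained:

```shell
# Flat-keyed file: one "key value" pair per line; values in bytes or
# event counts. Sample contents taken from the changelog above.
cat > memory.stat <<'EOF'
anon 167936
file 87302144
pgfault 636
pgmajfault 0
EOF
# Look entries up by key, never by line number:
awk '$1 == "pgfault" { print $2 }' memory.stat
```

New entries can then appear in the middle of the file without breaking existing consumers.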

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
@ 2016-01-14 20:24     ` Johannes Weiner
  0 siblings, 0 replies; 29+ messages in thread
From: Johannes Weiner @ 2016-01-14 20:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
	kernel-team

On Wed, Jan 13, 2016 at 02:49:16PM -0800, Andrew Morton wrote:
> It would be nice to see example output, and a description of why this
> output was chosen: what was included, what was omitted, why it was
> presented this way, what units were chosen for displaying the stats and
> why.  Will the things which are being displayed still be relevant (or
> even available) 10 years from now.  etcetera.
> 
> And the interface should be documented at some point.  Doing it now
> will help with the review of the proposed interface.
> 
> Because this stuff is forever and we have to get it right.

Here is a follow-up to 1/2 that hopefully addresses all that, as well
as the 32-bit overflow problem. What do you think? I'm probably a bit
too optimistic about being able to maintain a meaningful sort order of
the file when adding new entries. It depends on whether people start
relying on items staying at fixed offsets and what we tell them in
response when that breaks. I hope that we can at least get the main
memory consumers in before this is released, just in case.

From 1be87db16a3895538ce65362b5234ef9c8af308d Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 14 Jan 2016 10:40:24 -0500
Subject: [PATCH] mm: memcontrol: basic memory statistics in cgroup2 memory
 controller fix

Fixlet addressing akpm's feedback:

- Fix overflowing byte counters on 32-bit. Just like in the existing
  interface files, bytes must be printed as u64 to work with highmem.

- Add documentation in cgroup.txt that explains the memory.stat file
  and its format.

- Rethink item ordering to accommodate potential future additions. The
  ordering now follows both 1) from big picture to detail and 2) from
  stats that reflect on userspace behavior towards stats that reflect
  on kernel heuristics. Both are gradients, and item-by-item ordering
  will still require judgement calls (and some bike shed painting).

Changelog addendum to the original patch:

The output of this file looks as follows:

$ cat memory.stat
anon 167936
file 87302144
file_mapped 0
file_dirty 0
file_writeback 0
inactive_anon 0
active_anon 155648
inactive_file 87298048
active_file 4096
unevictable 0
pgfault 636
pgmajfault 0

The list consists of two sections: statistics reflecting the current
state of the memory management subsystem, and statistics reflecting
past events. The items themselves are sorted such that generic big
picture items come before specific details, and items related to
userspace activity come before items related to kernel heuristics.

All memory counters are in bytes to eliminate all ambiguity with
variable page sizes.

There will be more items and statistics added in the future, but this
is a good initial set to get a minimum of insight into how a cgroup is
using memory, and the items chosen for now are likely to remain valid
even with significant changes to the memory management implementation.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 Documentation/cgroup.txt | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c          | 45 +++++++++++++++++++++++---------------
 2 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/Documentation/cgroup.txt b/Documentation/cgroup.txt
index f441564..65b3eac 100644
--- a/Documentation/cgroup.txt
+++ b/Documentation/cgroup.txt
@@ -819,6 +819,62 @@ PAGE_SIZE multiple when read back.
 		the cgroup.  This may not exactly match the number of
 		processes killed but should generally be close.
 
+  memory.stat
+
+	A read-only flat-keyed file which exists on non-root cgroups.
+
+	This breaks down the cgroup's memory footprint into different
+	types of memory, type-specific details, and other information
+	on the state and past events of the memory management system.
+
+	All memory amounts are in bytes.
+
+	The entries are ordered to be human readable, and new entries
+	can show up in the middle. Don't rely on items remaining in a
+	fixed position; use the keys to look up specific values!
+
+	  anon
+
+		Amount of memory used in anonymous mappings such as
+		brk(), sbrk(), and mmap(MAP_ANONYMOUS)
+
+	  file
+
+		Amount of memory used to cache filesystem data,
+		including tmpfs and shared memory.
+
+	  file_mapped
+
+		Amount of cached filesystem data mapped with mmap()
+
+	  file_dirty
+
+		Amount of cached filesystem data that was modified but
+		not yet written back to disk
+
+	  file_writeback
+
+		Amount of cached filesystem data that was modified and
+		is currently being written back to disk
+
+	  inactive_anon
+	  active_anon
+	  inactive_file
+	  active_file
+	  unevictable
+
+		Amount of memory, swap-backed and filesystem-backed,
+		on the internal memory management lists used by the
+		page reclaim algorithm
+
+	  pgfault
+
+		Total number of page faults incurred
+
+	  pgmajfault
+
+		Number of major page faults incurred
+
   memory.swap.current
 
 	A read-only single value file which exists on non-root
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8645852..cdb51a9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5112,32 +5112,43 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
 	int i;
 
-	/* Memory consumer totals */
-
-	seq_printf(m, "anon %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
-	seq_printf(m, "file %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+	/*
+	 * Provide statistics on the state of the memory subsystem as
+	 * well as cumulative event counters that show past behavior.
+	 *
+	 * This list is ordered following a combination of these gradients:
+	 * 1) generic big picture -> specifics and details
+	 * 2) reflecting userspace activity -> reflecting kernel heuristics
+	 *
+	 * Current memory state:
+	 */
 
-	/* Per-consumer breakdowns */
+	seq_printf(m, "anon %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
+	seq_printf(m, "file %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+
+	seq_printf(m, "file_mapped %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) *
+		   PAGE_SIZE);
+	seq_printf(m, "file_dirty %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_DIRTY) *
+		   PAGE_SIZE);
+	seq_printf(m, "file_writeback %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_WRITEBACK) *
+		   PAGE_SIZE);
 
 	for (i = 0; i < NR_LRU_LISTS; i++) {
 		struct mem_cgroup *mi;
 		unsigned long val = 0;
 
 		for_each_mem_cgroup_tree(mi, memcg)
-			val += mem_cgroup_nr_lru_pages(mi, BIT(i)) * PAGE_SIZE;
-		seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i], val);
+			val += mem_cgroup_nr_lru_pages(mi, BIT(i));
+		seq_printf(m, "%s %llu\n",
+			   mem_cgroup_lru_names[i], (u64)val * PAGE_SIZE);
 	}
 
-	seq_printf(m, "file_mapped %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) * PAGE_SIZE);
-	seq_printf(m, "file_dirty %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_DIRTY) * PAGE_SIZE);
-	seq_printf(m, "file_writeback %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_WRITEBACK) * PAGE_SIZE);
-
-	/* Memory management events */
+	/* Accumulated memory events */
 
 	seq_printf(m, "pgfault %lu\n",
 		   tree_events(memcg, MEM_CGROUP_EVENTS_PGFAULT));
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] mm: memcontrol: basic memory statistics in cgroup2 memory controller
  2016-01-13 22:49     ` Andrew Morton
  (?)
@ 2016-01-14 20:25       ` Johannes Weiner
  -1 siblings, 0 replies; 29+ messages in thread
From: Johannes Weiner @ 2016-01-14 20:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
	kernel-team

On Wed, Jan 13, 2016 at 02:49:30PM -0800, Andrew Morton wrote:
> > @@ -5095,6 +5107,46 @@ static int memory_events_show(struct seq_file *m, void *v)
> >  	return 0;
> >  }
> >  
> > +static int memory_stat_show(struct seq_file *m, void *v)
> > +{
> > +	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
> > +	int i;
> > +
> > +	/* Memory consumer totals */
> > +
> > +	seq_printf(m, "anon %lu\n",
> > +		   tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
> 
> Is there any reason why this won't overflow a longword on 32-bit?

It will, I don't know what I was thinking there. Fixed in the update.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 2/2 V2] mm: memcontrol: add "sock" to cgroup2 memory.stat
  2016-01-13 22:01   ` Johannes Weiner
@ 2016-01-14 20:26     ` Johannes Weiner
  -1 siblings, 0 replies; 29+ messages in thread
From: Johannes Weiner @ 2016-01-14 20:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
	kernel-team

Provide statistics on how much of a cgroup's memory footprint is made
up of socket buffers from network connections owned by the group.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

v2: add item description in cgroup.txt and rebase to updated 1/2

 Documentation/cgroup.txt   | 4 ++++
 include/linux/memcontrol.h | 5 ++++-
 mm/memcontrol.c            | 6 ++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/cgroup.txt b/Documentation/cgroup.txt
index 65b3eac..e8d25e7 100644
--- a/Documentation/cgroup.txt
+++ b/Documentation/cgroup.txt
@@ -843,6 +843,10 @@ PAGE_SIZE multiple when read back.
 		Amount of memory used to cache filesystem data,
 		including tmpfs and shared memory.
 
+	  sock
+
+		Amount of memory used in network transmission buffers
+
 	  file_mapped
 
 		Amount of cached filesystem data mapped with mmap()
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1666617..9ae48d4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -50,6 +50,9 @@ enum mem_cgroup_stat_index {
 	MEM_CGROUP_STAT_WRITEBACK,	/* # of pages under writeback */
 	MEM_CGROUP_STAT_SWAP,		/* # of pages, swapped out */
 	MEM_CGROUP_STAT_NSTATS,
+	/* default hierarchy stats */
+	MEMCG_SOCK,
+	MEMCG_NR_STAT,
 };
 
 struct mem_cgroup_reclaim_cookie {
@@ -87,7 +90,7 @@ enum mem_cgroup_events_target {
 
 #ifdef CONFIG_MEMCG
 struct mem_cgroup_stat_cpu {
-	long count[MEM_CGROUP_STAT_NSTATS];
+	long count[MEMCG_NR_STAT];
 	unsigned long events[MEMCG_NR_EVENTS];
 	unsigned long nr_page_events;
 	unsigned long targets[MEM_CGROUP_NTARGETS];
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cdb51a9..97e47de 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5127,6 +5127,8 @@ static int memory_stat_show(struct seq_file *m, void *v)
 		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
 	seq_printf(m, "file %llu\n",
 		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+	seq_printf(m, "sock %llu\n",
+		   (u64)tree_stat(memcg, MEMCG_SOCK) * PAGE_SIZE);
 
 	seq_printf(m, "file_mapped %llu\n",
 		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) *
@@ -5630,6 +5632,8 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 	if (in_softirq())
 		gfp_mask = GFP_NOWAIT;
 
+	this_cpu_add(memcg->stat->count[MEMCG_SOCK], nr_pages);
+
 	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
 		return true;
 
@@ -5649,6 +5653,8 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 		return;
 	}
 
+	this_cpu_sub(memcg->stat->count[MEMCG_SOCK], nr_pages);
+
 	page_counter_uncharge(&memcg->memory, nr_pages);
 	css_put_many(&memcg->css, nr_pages);
 }
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
  2016-01-14 20:24     ` Johannes Weiner
  (?)
@ 2016-01-15  9:58       ` Vladimir Davydov
  -1 siblings, 0 replies; 29+ messages in thread
From: Vladimir Davydov @ 2016-01-15  9:58 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel,
	kernel-team

Hi Johannes,

On Thu, Jan 14, 2016 at 03:24:08PM -0500, Johannes Weiner wrote:
...
> The output of this file looks as follows:
> 
> $ cat memory.stat
> anon 167936
> file 87302144
> file_mapped 0
> file_dirty 0
> file_writeback 0
> inactive_anon 0
> active_anon 155648
> inactive_file 87298048
> active_file 4096
> unevictable 0
> pgfault 636
> pgmajfault 0
> 
> The list consists of two sections: statistics reflecting the current
> state of the memory management subsystem, and statistics reflecting
> past events. The items themselves are sorted such that generic big
> picture items come before specific details, and items related to
> userspace activity come before items related to kernel heuristics.
> 
> All memory counters are in bytes to eliminate all ambiguity with
> variable page sizes.
> 
> There will be more items and statistics added in the future, but this
> is a good initial set to get a minimum of insight into how a cgroup is
> using memory, and the items chosen for now are likely to remain valid
> even with significant changes to the memory management implementation.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

With the follow-up it looks good to me. All exported counters look
justified enough and the format follows that of other cgroup2
controllers (cpu, blkio). Thanks!

Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>

One addition though. Maybe we could add a 'total' field which would show
memory.current? Yeah, this would result in a little redundancy, but I
think that from a userspace pov it's much more convenient to read a
single file and get all stat counters than having them spread across
several files.

Come to think of it, do we really need a separate memory.events file?
Can't these counters live in memory.stat as well? Yeah, this file
generates events, but IMHO it's not very useful the way it is currently
implemented:

Suppose, a user wants to receive notifications about OOM or LOW events,
which are rather rare normally and might require immediate action. The
only way to do that is to listen to memory.events, but this file can
generate tons of MAX/HIGH when the cgroup is performing normally. The
userspace app will have to wake up every time the cgroup performs
reclaim and check memory.events just to ensure no OOM happened and this
all will result in wasting cpu time.

Maybe we could generate LOW/HIGH/MAX events on memory.low/high/max?
This would look natural IMO. Don't know where OOM events should go in
this case though.

Thanks,
Vladimir

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
  2016-01-15  9:58       ` Vladimir Davydov
  (?)
@ 2016-01-15 20:30         ` Johannes Weiner
  -1 siblings, 0 replies; 29+ messages in thread
From: Johannes Weiner @ 2016-01-15 20:30 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel,
	kernel-team

On Fri, Jan 15, 2016 at 12:58:34PM +0300, Vladimir Davydov wrote:
> With the follow-up it looks good to me. All exported counters look
> justified enough and the format follows that of other cgroup2
> controllers (cpu, blkio). Thanks!
> 
> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Thanks Vladimir.

> One addition though. May be, we could add 'total' field which would show
> memory.current? Yeah, this would result in a little redundancy, but I
> think that from userspace pov it's much more convenient to read the
> only file and get all stat counters than having them spread throughout
> several files.

I am not fully convinced that a total value or even memory.current
will be looked at that often in practice, because in all but a few
corner cases that value will be pegged to the configured limit. In
those instances I think it should be okay to check another file.

> Come to think of it, do we really need separate memory.events file?
> Can't these counters live in memory.stat either?

I think it sits at a different level of the interface. The events file
indicates cgroup-specific dynamics between configuration and memory
footprint, and so it sits on the same level as low, high, max, and
current. These are the parts involved in the most basic control loop
between the kernel and the job scheduler--monitor and adjust or notify
the admin. It's for the entity that allocates and manages the system.

The memory.stat file on the other hand is geared toward analyzing and
understanding workload-specific performance (whether by humans or with
some automated heuristics) and if necessary correcting the config file
that describes the application's requirements to the job scheduler.

I think it makes sense to not conflate these two interfaces.

> Yeah, this file
> generates events, but IMHO it's not very useful the way it is currently
> implemented:
> 
> Suppose, a user wants to receive notifications about OOM or LOW events,
> which are rather rare normally and might require immediate action. The
> only way to do that is to listen to memory.events, but this file can
> generate tons of MAX/HIGH when the cgroup is performing normally. The
> userspace app will have to wake up every time the cgroup performs
> reclaim and check memory.events just to ensure no OOM happened and this
> all will result in wasting cpu time.

Under optimal system load there is no limit reclaim, and memory
pressure comes exclusively from a shortage of physical pages that
global reclaim balances based on memory.low. If groups run into their
own limits, it means that there are idle resources left on the table.

So events only happen when the machine is over or under utilized, and
as per above, the events file is mainly meant for something like a job
scheduler tasked with allocating the machine's resources. It's hard to
imagine a job scheduler scenario where the long-term goal is anything
other than optimal utilization.

There are reasonable cases in which memory could be temporarily left
idle, say to keep startup latency of new jobs low. In those it's true
that the max and high notifications might become annoying. But do you
really think that could become problematic in practice? In that case
it should be enough if we ratelimit the file-changed notifications.

> May be, we could generate LOW/HIGH/MAX events on memory.low/high/max?
> This would look natural IMO. Don't know where OOM events should go in
> this case though.

Without a natural place for OOM notifications, it probably makes sense
to stick with memory.events.

Thanks,
Johannes

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
@ 2016-01-15 20:30         ` Johannes Weiner
  0 siblings, 0 replies; 29+ messages in thread
From: Johannes Weiner @ 2016-01-15 20:30 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel,
	kernel-team

On Fri, Jan 15, 2016 at 12:58:34PM +0300, Vladimir Davydov wrote:
> With the follow-up it looks good to me. All exported counters look
> justified enough and the format follows that of other cgroup2
> controllers (cpu, blkio). Thanks!
> 
> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Thanks Vladimir.

> One addition though. Maybe we could add a 'total' field which would
> show memory.current? Yeah, this would result in a little redundancy,
> but I think that from a userspace pov it's much more convenient to read
> a single file and get all the stat counters than to have them spread
> across several files.

I am not fully convinced that a total value or even memory.current
will be looked at that often in practice, because in all but a few
corner cases that value will be pegged to the configured limit. In
those instances I think it should be okay to check another file.

> Come to think of it, do we really need a separate memory.events file?
> Can't these counters live in memory.stat instead?

I think it sits at a different level of the interface. The events file
indicates cgroup-specific dynamics between configuration and memory
footprint, and so it sits on the same level as low, high, max, and
current. These are the parts involved in the most basic control loop
between the kernel and the job scheduler--monitor and adjust or notify
the admin. It's for the entity that allocates and manages the system.

The memory.stat file on the other hand is geared toward analyzing and
understanding workload-specific performance (whether by humans or with
some automated heuristics) and if necessary correcting the config file
that describes the application's requirements to the job scheduler.

I think it makes sense to not conflate these two interfaces.
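
For what it's worth, both files use the same flat keyed "name value"
format, so a monitoring tool can share the parsing and differ only in
what it does with the result. A minimal sketch (the sample contents
below are made up; the real files live under the cgroup's directory,
e.g. /sys/fs/cgroup/<group>/memory.stat):

```python
# Sketch: parse the cgroup2 flat keyed format ("name value" per line).
# The sample contents here are hypothetical.

def parse_flat_keyed(text):
    """Parse flat keyed cgroup2 file contents into a dict of ints."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

sample_stat = """\
anon 167936
file 1867776
sock 0
"""

stats = parse_flat_keyed(sample_stat)
print(stats["file"])  # -> 1867776, bytes of page cache charged to the group
```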

> Yeah, this file
> generates events, but IMHO it's not very useful the way it is currently
> implemented:
> 
> Suppose a user wants to receive notifications about OOM or LOW events,
> which are rather rare normally and might require immediate action. The
> only way to do that is to listen to memory.events, but this file can
> generate tons of MAX/HIGH events when the cgroup is performing normally.
> The userspace app will have to wake up every time the cgroup performs
> reclaim and check memory.events just to ensure no OOM happened, and all
> of this wastes CPU time.

Under optimal system load there is no limit reclaim, and memory
pressure comes exclusively from a shortage of physical pages that
global reclaim balances based on memory.low. If groups run into their
own limits, it means that there are idle resources left on the table.

So events only happen when the machine is over- or under-utilized, and
as per above, the events file is mainly meant for something like a job
scheduler tasked with allocating the machine's resources. It's hard to
imagine a job scheduler scenario where the long-term goal is anything
other than optimal utilization.

There are reasonable cases in which memory could be temporarily left
idle, say to keep the startup latency of new jobs low. In those cases it's
true that the max and high notifications might become annoying. But do you
really think that could become problematic in practice? In that case
it should be enough if we ratelimit the file-changed notifications.
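
To illustrate that point: since the counters in memory.events are
monotonic, userspace can wake on every change notification but only act
when the rare counters actually advance. A hypothetical sketch of the
diff logic (the actual poll()/POLLPRI wakeup on the file is omitted):

```python
# Sketch: ignore high/max churn and react only when the rare, monotonic
# counters (low, oom) have advanced between two reads of memory.events.
# Field names follow memory.events; the sample contents are made up.

RARE = ("low", "oom")

def parse_events(text):
    """Parse memory.events-style flat keyed contents into a dict."""
    return {k: int(v) for k, v in
            (line.split() for line in text.strip().splitlines())}

def rare_deltas(prev, cur):
    """Return the rare counters that advanced since the previous read."""
    return {k: cur[k] - prev.get(k, 0) for k in RARE
            if cur[k] > prev.get(k, 0)}

before = parse_events("low 0\nhigh 12\nmax 3\noom 0\n")
after  = parse_events("low 0\nhigh 57\nmax 9\noom 1\n")
print(rare_deltas(before, after))  # -> {'oom': 1}: only the OOM transition
```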

> Maybe we could generate LOW/HIGH/MAX events on memory.low/high/max?
> This would look natural IMO. Don't know where OOM events should go in
> this case though.

Without a natural place for OOM notifications, it probably makes sense
to stick with memory.events.

Thanks,
Johannes


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
  2016-01-15 20:30         ` Johannes Weiner
  (?)
@ 2016-01-28 16:21           ` Michal Hocko
  -1 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2016-01-28 16:21 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Vladimir Davydov, Andrew Morton, linux-mm, cgroups, linux-kernel,
	kernel-team

[I am sorry I am coming here late but I didn't have time earlier]

On Fri 15-01-16 15:30:59, Johannes Weiner wrote:
> On Fri, Jan 15, 2016 at 12:58:34PM +0300, Vladimir Davydov wrote:
> > With the follow-up it looks good to me. All exported counters look
> > justified enough and the format follows that of other cgroup2
> > controllers (cpu, blkio). Thanks!
> > 
> > Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
> 
> Thanks Vladimir.

The patches got merged in the meantime, but anyway, just for the record:

Acked-by: Michal Hocko <mhocko@suse.com>

I would have preferred nr_pages over B, because B follows neither
/proc/vmstat (which counts pages) nor /proc/meminfo (which is in kB). It
is true that the other memcg knobs are primarily byte-oriented, so there
is some reason to use B here as well.
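
The conversion userspace would need to bridge the unit mismatch is
trivial either way; a hypothetical sketch (a real tool should query the
page size via sysconf rather than assume 4096):

```python
# Sketch: convert the byte values from memory.stat into the units used
# by /proc/meminfo (kB) and /proc/vmstat (pages). The 1867776-byte
# sample value is made up.
import os

def bytes_to_kb(n):
    """Match /proc/meminfo, which reports kB."""
    return n // 1024

def bytes_to_pages(n, page_size=None):
    """Match /proc/vmstat, which counts pages."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")  # typically 4096 on x86
    return n // page_size

print(bytes_to_kb(1867776))           # -> 1824
print(bytes_to_pages(1867776, 4096))  # -> 456
```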

> > One addition though. Maybe we could add a 'total' field which would
> > show memory.current? Yeah, this would result in a little redundancy,
> > but I think that from a userspace pov it's much more convenient to read
> > a single file and get all the stat counters than to have them spread
> > across several files.
> 
> I am not fully convinced that a total value or even memory.current
> will be looked at that often in practice, because in all but a few
> corner cases that value will be pegged to the configured limit. In
> those instances I think it should be okay to check another file.

Agreed

> > Come to think of it, do we really need a separate memory.events file?
> > Can't these counters live in memory.stat instead?
> 
> I think it sits at a different level of the interface. The events file
> indicates cgroup-specific dynamics between configuration and memory
> footprint, and so it sits on the same level as low, high, max, and
> current. These are the parts involved in the most basic control loop
> between the kernel and the job scheduler--monitor and adjust or notify
> the admin. It's for the entity that allocates and manages the system.
> 
> The memory.stat file on the other hand is geared toward analyzing and
> understanding workload-specific performance (whether by humans or with
> some automated heuristics) and if necessary correcting the config file
> that describes the application's requirements to the job scheduler.
> 
> I think it makes sense to not conflate these two interfaces.

Agreed here as well.
 
> > Yeah, this file
> > generates events, but IMHO it's not very useful the way it is currently
> > implemented:
> > 
> > Suppose a user wants to receive notifications about OOM or LOW events,
> > which are rather rare normally and might require immediate action. The
> > only way to do that is to listen to memory.events, but this file can
> > generate tons of MAX/HIGH events when the cgroup is performing normally.
> > The userspace app will have to wake up every time the cgroup performs
> > reclaim and check memory.events just to ensure no OOM happened, and all
> > of this wastes CPU time.
> 
> Under optimal system load there is no limit reclaim, and memory
> pressure comes exclusively from a shortage of physical pages that
> global reclaim balances based on memory.low. If groups run into their
> own limits, it means that there are idle resources left on the table.

I would expect that the high limit will be routinely hit due to page
cache consumption.
 
> So events only happen when the machine is over- or under-utilized, and
> as per above, the events file is mainly meant for something like a job
> scheduler tasked with allocating the machine's resources. It's hard to
> imagine a job scheduler scenario where the long-term goal is anything
> other than optimal utilization.
> 
> There are reasonable cases in which memory could be temporarily left
> idle, say to keep the startup latency of new jobs low. In those cases it's
> true that the max and high notifications might become annoying. But do you
> really think that could become problematic in practice? In that case
> it should be enough if we ratelimit the file-changed notifications.

I am not sure this would be a real problem either. Sure, you can see a
lot of events, but AFAIU no events will be lost, right?

> > Maybe we could generate LOW/HIGH/MAX events on memory.low/high/max?
> > This would look natural IMO. Don't know where OOM events should go in
> > this case though.
> 
> Without a natural place for OOM notifications, it probably makes sense
> to stick with memory.events.

yes
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-01-28 16:21 UTC | newest]

Thread overview: 29+ messages
2016-01-13 22:01 [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics Johannes Weiner
2016-01-13 22:01 ` [PATCH 1/2] mm: memcontrol: basic memory statistics in cgroup2 memory controller Johannes Weiner
2016-01-13 22:49   ` Andrew Morton
2016-01-14 20:25     ` Johannes Weiner
2016-01-13 22:01 ` [PATCH 2/2] mm: memcontrol: add "sock" to cgroup2 memory.stat Johannes Weiner
2016-01-14 20:26   ` [PATCH 2/2 V2] " Johannes Weiner
2016-01-13 22:49 ` [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics Andrew Morton
2016-01-14 20:24   ` Johannes Weiner
2016-01-15  9:58     ` Vladimir Davydov
2016-01-15 20:30       ` Johannes Weiner
2016-01-28 16:21         ` Michal Hocko
