[2/4] mm: memcontrol: re-use global VM event enum

Message ID 20170404220148.28338-2-hannes@cmpxchg.org
State New, archived
Series
  • [1/4] mm: memcontrol: clean up memory.events counting function

Commit Message

Johannes Weiner April 4, 2017, 10:01 p.m. UTC
The current duplication is a high-maintenance mess, and it's painful
to add new items.

This increases the size of the event array, but we'll eventually want
most of the VM events tracked on a per-cgroup basis anyway.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 45 ++++++++++++---------------------------
 mm/memcontrol.c            | 53 ++++++++++++++++++++++++----------------------
 2 files changed, 42 insertions(+), 56 deletions(-)
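
The gist of the change: the memcg-only events are now defined as an
extension of the global VM event enum, so a single per-cpu array is
indexed by both kinds of item. A minimal sketch of the resulting layout
(names as in the patch below):

   /* memcg-only items continue the global vm_event_item space */
   enum memcg_event_item {
           MEMCG_LOW = NR_VM_EVENT_ITEMS,  /* first index past the global events */
           MEMCG_HIGH,
           MEMCG_MAX,
           MEMCG_OOM,
           MEMCG_NR_EVENTS,                /* size of the per-cpu events array */
   };

   struct mem_cgroup_stat_cpu {
           long count[MEMCG_NR_STAT];
           unsigned long events[MEMCG_NR_EVENTS];  /* global + memcg items */
   };

   /* both kinds of item index the same array */
   this_cpu_inc(memcg->stat->events[PGFAULT]);    /* global vm_event_item */
   this_cpu_inc(memcg->stat->events[MEMCG_OOM]);  /* memcg-specific item */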

Comments

Vladimir Davydov April 6, 2017, 8:49 a.m. UTC | #1
On Tue, Apr 04, 2017 at 06:01:46PM -0400, Johannes Weiner wrote:
> The current duplication is a high-maintenance mess, and it's painful
> to add new items.
> 
> This increases the size of the event array, but we'll eventually want
> most of the VM events tracked on a per-cgroup basis anyway.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Although the increase in the mem_cgroup struct introduced by this patch
looks scary, I agree this is a reasonable step toward unification of
vmstat, as most vm_event_item entries do make sense to be accounted per
cgroup as well.
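
Back-of-the-envelope, assuming NR_VM_EVENT_ITEMS ends up somewhere
around 80 entries with a typical config (it is config-dependent):

   before: events[4 global + 4 memcg] =   8 unsigned longs ~  64 bytes per cpu per memcg
   after:  events[~80 global + 4]     = ~84 unsigned longs ~ 670 bytes per cpu per memcg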

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

> @@ -608,9 +601,9 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
>  
>  	/* pagein of a big page is an event. So, ignore page size */
>  	if (nr_pages > 0)
> -		__this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGIN]);
> +		__this_cpu_inc(memcg->stat->events[PGPGIN]);
>  	else {
> -		__this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGOUT]);
> +		__this_cpu_inc(memcg->stat->events[PGPGOUT]);
>  		nr_pages = -nr_pages; /* for event */
>  	}

AFAIR this doesn't exactly match the system-wide PGPGIN/PGPGOUT: those
are supposed to account only paging events that involve actual IO, while
the memcg counters currently also include faulting in zero pages and
zapping a process address space. Probably, this should be revised before
rolling out to cgroup v2.
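
To illustrate the mismatch (paraphrased, not verbatim kernel code): the
global counters are driven from the block layer, once per submitted IO,
while the memcg counters are bumped on every charge/uncharge:

   /* global: block layer, roughly what submit_bio() does */
   if (op_is_write(bio_op(bio)))
           count_vm_events(PGPGOUT, count);
   else
           count_vm_events(PGPGIN, count);

   /* memcg: mem_cgroup_charge_statistics(), on every charge/uncharge,
    * including zero-page faults and unmapping at process exit */
   if (nr_pages > 0)
           __this_cpu_inc(memcg->stat->events[PGPGIN]);
   else
           __this_cpu_inc(memcg->stat->events[PGPGOUT]);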
Michal Hocko April 7, 2017, 12:47 p.m. UTC | #2
I do agree that we should share the global and memcg-specific event
constants, but I am not sure we want to share all of them. Would it make
sense to reorganize the global enum and put the shared ones at the
beginning? We wouldn't need the memcg-specific translation then.
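
Purely illustrative, something like the following (the names are made
up):

   enum vm_event_item {
           /* events that are also tracked per memcg, kept at the front */
           PGPGIN, PGPGOUT, PGFAULT, PGMAJFAULT,
           /* ... more shared events ... */
           NR_MEMCG_VM_EVENT_ITEMS,
           /* global-only events would follow */
           NR_VM_EVENT_ITEMS,
   };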

Anyway, two comments on the current implementation.

On Tue 04-04-17 18:01:46, Johannes Weiner wrote:
[...]
> +/* Cgroup-specific events, on top of universal VM events */
> +enum memcg_event_item {
> +	MEMCG_LOW = NR_VM_EVENT_ITEMS,
> +	MEMCG_HIGH,
> +	MEMCG_MAX,
> +	MEMCG_OOM,
> +	MEMCG_NR_EVENTS,
> +};

The above should mention that each supported global VM event should
provide the corresponding translation

[...]

here...
> +/* Universal VM events cgroup1 shows, original sort order */
> +unsigned int memcg1_events[] = {
> +	PGPGIN,
> +	PGPGOUT,
> +	PGFAULT,
> +	PGMAJFAULT,
> +};
> +
> +static const char *const memcg1_event_names[] = {
> +	"pgpgin",
> +	"pgpgout",
> +	"pgfault",
> +	"pgmajfault",
> +};

the naming doesn't make it easier to understand why we need this.
global2memcg_event?
Johannes Weiner April 10, 2017, 2:13 p.m. UTC | #3
On Fri, Apr 07, 2017 at 02:47:02PM +0200, Michal Hocko wrote:
> I do agree that we should share the global and memcg-specific event
> constants, but I am not sure we want to share all of them. Would it make
> sense to reorganize the global enum and put the shared ones at the
> beginning? We wouldn't need the memcg-specific translation then.

I'm not sure I follow. Which translation?

> Anyway, two comments on the current implementation.
> 
> On Tue 04-04-17 18:01:46, Johannes Weiner wrote:
> [...]
> > +/* Cgroup-specific events, on top of universal VM events */
> > +enum memcg_event_item {
> > +	MEMCG_LOW = NR_VM_EVENT_ITEMS,
> > +	MEMCG_HIGH,
> > +	MEMCG_MAX,
> > +	MEMCG_OOM,
> > +	MEMCG_NR_EVENTS,
> > +};
> 
> The above should mention that each supported global VM event should
> provide the corresponding translation
> 
> [...]
> 
> here...
> > +/* Universal VM events cgroup1 shows, original sort order */
> > +unsigned int memcg1_events[] = {
> > +	PGPGIN,
> > +	PGPGOUT,
> > +	PGFAULT,
> > +	PGMAJFAULT,
> > +};
> > +
> > +static const char *const memcg1_event_names[] = {
> > +	"pgpgin",
> > +	"pgpgout",
> > +	"pgfault",
> > +	"pgmajfault",
> > +};
> 
> the naming doesn't make it easier to understand why we need this.
> global2memcg_event?

This is just to keep the file order consistent. It could have been
done like memory.stat in cgroup2, where we simply do

   seq_printf(s, "pgmajfault %lu\n", stat[PGMAJFAULT]);

but I didn't want to change the v1 code too much. So these two arrays
are just a sorted list of global VM events shown in v1's memory.stat.
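
For reference, the consumer in memcg_stat_show() (from the patch below)
then just walks the two arrays in lockstep:

   for (i = 0; i < ARRAY_SIZE(memcg1_events); i++)
           seq_printf(m, "%s %lu\n", memcg1_event_names[i],
                      mem_cgroup_read_events(memcg, memcg1_events[i]));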
Johannes Weiner April 10, 2017, 2:17 p.m. UTC | #4
On Thu, Apr 06, 2017 at 11:49:24AM +0300, Vladimir Davydov wrote:
> On Tue, Apr 04, 2017 at 06:01:46PM -0400, Johannes Weiner wrote:
> > The current duplication is a high-maintenance mess, and it's painful
> > to add new items.
> > 
> > This increases the size of the event array, but we'll eventually want
> > most of the VM events tracked on a per-cgroup basis anyway.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Although the increase in the mem_cgroup struct introduced by this patch
> looks scary, I agree this is a reasonable step toward unification of
> vmstat, as most vm_event_item entries do make sense to be accounted per
> cgroup as well.
> 
> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

Thanks!

> > @@ -608,9 +601,9 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
> >  
> >  	/* pagein of a big page is an event. So, ignore page size */
> >  	if (nr_pages > 0)
> > -		__this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGIN]);
> > +		__this_cpu_inc(memcg->stat->events[PGPGIN]);
> >  	else {
> > -		__this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGOUT]);
> > +		__this_cpu_inc(memcg->stat->events[PGPGOUT]);
> >  		nr_pages = -nr_pages; /* for event */
> >  	}
> 
> AFAIR this doesn't exactly match the system-wide PGPGIN/PGPGOUT: those
> are supposed to account only paging events that involve actual IO, while
> the memcg counters currently also include faulting in zero pages and
> zapping a process address space. Probably, this should be revised before
> rolling out to cgroup v2.

Yeah, that stat item doesn't make much sense in cgroup1, and cgroup2
doesn't export it at all right now. Should we export it in the future,
we can add special MEMCG1_PGPGIN_SILLY events and let cgroup2 use the
real thing. Or remove/fix the stats in cgroup1, because I can't see how
they are useful in their current form anyway.

But it's ooold behavior, so no urgency to tackle this right now.
Michal Hocko April 11, 2017, 12:30 p.m. UTC | #5
On Mon 10-04-17 10:13:34, Johannes Weiner wrote:
> On Fri, Apr 07, 2017 at 02:47:02PM +0200, Michal Hocko wrote:
> > I do agree that we should share the global and memcg-specific event
> > constants, but I am not sure we want to share all of them. Would it make
> > sense to reorganize the global enum and put the shared ones at the
> > beginning? We wouldn't need the memcg-specific translation then.
> 
> I'm not sure I follow. Which translation?

Sorry, I should have said s@translation@filtering@ by memcg1_events*

> > Anyway, two comments on the current implementation.
> > 
> > On Tue 04-04-17 18:01:46, Johannes Weiner wrote:
> > [...]
> > > +/* Cgroup-specific events, on top of universal VM events */
> > > +enum memcg_event_item {
> > > +	MEMCG_LOW = NR_VM_EVENT_ITEMS,
> > > +	MEMCG_HIGH,
> > > +	MEMCG_MAX,
> > > +	MEMCG_OOM,
> > > +	MEMCG_NR_EVENTS,
> > > +};
> > 
> > The above should mention that each supported global VM event should
> > provide the corresponding translation
> > 
> > [...]
> > 
> > here...
> > > +/* Universal VM events cgroup1 shows, original sort order */
> > > +unsigned int memcg1_events[] = {
> > > +	PGPGIN,
> > > +	PGPGOUT,
> > > +	PGFAULT,
> > > +	PGMAJFAULT,
> > > +};
> > > +
> > > +static const char *const memcg1_event_names[] = {
> > > +	"pgpgin",
> > > +	"pgpgout",
> > > +	"pgfault",
> > > +	"pgmajfault",
> > > +};
> > 
> > the naming doesn't make it easier to understand why we need this.
> > global2memcg_event?
> 
> This is just to keep the file order consistent. It could have been
> done like memory.stat in cgroup2, where we simply do
> 
>    seq_printf(s, "pgmajfault %lu\n", stat[PGMAJFAULT]);
> 
> but I didn't want to change the v1 code too much. So these two arrays
> are just a sorted list of global VM events shown in v1's memory.stat.

You would still have to know which parts of the global stats are the
ones we account for memcg.

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bc0c16e284c0..0bb5f055bd26 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -69,20 +69,6 @@  struct mem_cgroup_reclaim_cookie {
 	unsigned int generation;
 };
 
-enum mem_cgroup_events_index {
-	MEM_CGROUP_EVENTS_PGPGIN,	/* # of pages paged in */
-	MEM_CGROUP_EVENTS_PGPGOUT,	/* # of pages paged out */
-	MEM_CGROUP_EVENTS_PGFAULT,	/* # of page-faults */
-	MEM_CGROUP_EVENTS_PGMAJFAULT,	/* # of major page-faults */
-	MEM_CGROUP_EVENTS_NSTATS,
-	/* default hierarchy events */
-	MEMCG_LOW = MEM_CGROUP_EVENTS_NSTATS,
-	MEMCG_HIGH,
-	MEMCG_MAX,
-	MEMCG_OOM,
-	MEMCG_NR_EVENTS,
-};
-
 /*
  * Per memcg event counter is incremented at every pagein/pageout. With THP,
  * it will be incremated by the number of pages. This counter is used for
@@ -106,6 +92,15 @@  struct mem_cgroup_id {
 	atomic_t ref;
 };
 
+/* Cgroup-specific events, on top of universal VM events */
+enum memcg_event_item {
+	MEMCG_LOW = NR_VM_EVENT_ITEMS,
+	MEMCG_HIGH,
+	MEMCG_MAX,
+	MEMCG_OOM,
+	MEMCG_NR_EVENTS,
+};
+
 struct mem_cgroup_stat_cpu {
 	long count[MEMCG_NR_STAT];
 	unsigned long events[MEMCG_NR_EVENTS];
@@ -288,9 +283,9 @@  static inline bool mem_cgroup_disabled(void)
 }
 
 static inline void mem_cgroup_event(struct mem_cgroup *memcg,
-				    enum mem_cgroup_events_index idx)
+				    enum memcg_event_item event)
 {
-	this_cpu_inc(memcg->stat->events[idx]);
+	this_cpu_inc(memcg->stat->events[event]);
 	cgroup_file_notify(&memcg->events_file);
 }
 
@@ -575,20 +570,8 @@  static inline void mem_cgroup_count_vm_event(struct mm_struct *mm,
 
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
-	if (unlikely(!memcg))
-		goto out;
-
-	switch (idx) {
-	case PGFAULT:
-		this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGFAULT]);
-		break;
-	case PGMAJFAULT:
-		this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGMAJFAULT]);
-		break;
-	default:
-		BUG();
-	}
-out:
+	if (likely(memcg))
+		this_cpu_inc(memcg->stat->events[idx]);
 	rcu_read_unlock();
 }
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -608,7 +591,7 @@  static inline bool mem_cgroup_disabled(void)
 }
 
 static inline void mem_cgroup_event(struct mem_cgroup *memcg,
-				    enum mem_cgroup_events_index idx)
+				    enum memcg_event_item event)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1ffa3ad201ea..6b42887e5f14 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -111,13 +111,6 @@  static const char * const mem_cgroup_stat_names[] = {
 	"swap",
 };
 
-static const char * const mem_cgroup_events_names[] = {
-	"pgpgin",
-	"pgpgout",
-	"pgfault",
-	"pgmajfault",
-};
-
 static const char * const mem_cgroup_lru_names[] = {
 	"inactive_anon",
 	"active_anon",
@@ -571,13 +564,13 @@  mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
  */
 
 static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg,
-					    enum mem_cgroup_events_index idx)
+					    enum memcg_event_item event)
 {
 	unsigned long val = 0;
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		val += per_cpu(memcg->stat->events[idx], cpu);
+		val += per_cpu(memcg->stat->events[event], cpu);
 	return val;
 }
 
@@ -608,9 +601,9 @@  static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 
 	/* pagein of a big page is an event. So, ignore page size */
 	if (nr_pages > 0)
-		__this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGIN]);
+		__this_cpu_inc(memcg->stat->events[PGPGIN]);
 	else {
-		__this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGOUT]);
+		__this_cpu_inc(memcg->stat->events[PGPGOUT]);
 		nr_pages = -nr_pages; /* for event */
 	}
 
@@ -3119,6 +3112,21 @@  static int memcg_numa_stat_show(struct seq_file *m, void *v)
 }
 #endif /* CONFIG_NUMA */
 
+/* Universal VM events cgroup1 shows, original sort order */
+unsigned int memcg1_events[] = {
+	PGPGIN,
+	PGPGOUT,
+	PGFAULT,
+	PGMAJFAULT,
+};
+
+static const char *const memcg1_event_names[] = {
+	"pgpgin",
+	"pgpgout",
+	"pgfault",
+	"pgmajfault",
+};
+
 static int memcg_stat_show(struct seq_file *m, void *v)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
@@ -3128,8 +3136,6 @@  static int memcg_stat_show(struct seq_file *m, void *v)
 
 	BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_stat_names) !=
 		     MEM_CGROUP_STAT_NSTATS);
-	BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_events_names) !=
-		     MEM_CGROUP_EVENTS_NSTATS);
 	BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
 
 	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
@@ -3139,9 +3145,9 @@  static int memcg_stat_show(struct seq_file *m, void *v)
 			   mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
 	}
 
-	for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
-		seq_printf(m, "%s %lu\n", mem_cgroup_events_names[i],
-			   mem_cgroup_read_events(memcg, i));
+	for (i = 0; i < ARRAY_SIZE(memcg1_events); i++)
+		seq_printf(m, "%s %lu\n", memcg1_event_names[i],
+			   mem_cgroup_read_events(memcg, memcg1_events[i]));
 
 	for (i = 0; i < NR_LRU_LISTS; i++)
 		seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i],
@@ -3169,13 +3175,12 @@  static int memcg_stat_show(struct seq_file *m, void *v)
 		seq_printf(m, "total_%s %llu\n", mem_cgroup_stat_names[i], val);
 	}
 
-	for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
+	for (i = 0; i < ARRAY_SIZE(memcg1_events); i++) {
 		unsigned long long val = 0;
 
 		for_each_mem_cgroup_tree(mi, memcg)
-			val += mem_cgroup_read_events(mi, i);
-		seq_printf(m, "total_%s %llu\n",
-			   mem_cgroup_events_names[i], val);
+			val += mem_cgroup_read_events(mi, memcg1_events[i]);
+		seq_printf(m, "total_%s %llu\n", memcg1_event_names[i], val);
 	}
 
 	for (i = 0; i < NR_LRU_LISTS; i++) {
@@ -5222,10 +5227,8 @@  static int memory_stat_show(struct seq_file *m, void *v)
 
 	/* Accumulated memory events */
 
-	seq_printf(m, "pgfault %lu\n",
-		   events[MEM_CGROUP_EVENTS_PGFAULT]);
-	seq_printf(m, "pgmajfault %lu\n",
-		   events[MEM_CGROUP_EVENTS_PGMAJFAULT]);
+	seq_printf(m, "pgfault %lu\n", events[PGFAULT]);
+	seq_printf(m, "pgmajfault %lu\n", events[PGMAJFAULT]);
 
 	seq_printf(m, "workingset_refault %lu\n",
 		   stat[MEMCG_WORKINGSET_REFAULT]);
@@ -5493,7 +5496,7 @@  static void uncharge_batch(struct mem_cgroup *memcg, unsigned long pgpgout,
 	__this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_CACHE], nr_file);
 	__this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_RSS_HUGE], nr_huge);
 	__this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_SHMEM], nr_shmem);
-	__this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGOUT], pgpgout);
+	__this_cpu_add(memcg->stat->events[PGPGOUT], pgpgout);
 	__this_cpu_add(memcg->stat->nr_page_events, nr_pages);
 	memcg_check_events(memcg, dummy_page);
 	local_irq_restore(flags);