* [PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups
@ 2019-03-12 22:33 Roman Gushchin
  2019-03-12 22:33 ` [PATCH v2 1/6] mm: prepare to premature release of memcg->vmstats_percpu Roman Gushchin
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Roman Gushchin @ 2019-03-12 22:33 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

A cgroup can remain in the dying state for a long time, being pinned in
memory by any kernel object. It can be pinned by a page shared with another
cgroup (e.g. mlocked by a process in that cgroup), by a vfs cache object,
etc.

Mostly because of percpu data, the size of a memcg structure in kernel
memory is quite large. Depending on the machine size and the kernel config,
it can easily reach hundreds of kilobytes per cgroup.

Depending on the memory pressure and the reclaim approach (which is a
separate topic), several hundred (if not a few thousand) dying cgroups
appears to be a typical number. On a moderately sized machine the overall
memory footprint of dying cgroups is measured in hundreds of megabytes.

So if we can't completely get rid of dying cgroups, let's make them smaller.
This patchset aims to reduce the size of a dying memory cgroup by releasing
its percpu data early, during cgroup removal, and switching to the atomic
counterparts instead. Currently it covers the per-memcg vmstats_percpu and
the per-memcg per-node lruvec_stat_cpu. The same approach can be further
applied to other percpu data.
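
For a live cgroup the fast path stays percpu-based; once the percpu data is
released, updates fall back to the atomic counters. Slightly simplified (the
mem_cgroup_disabled() check is omitted), the statistics update from patch 1
looks like this after the change:

  static inline void __mod_memcg_state(struct mem_cgroup *memcg,
                                       int idx, int val)
  {
          struct memcg_vmstats_percpu __percpu *vmstats_percpu;
          long x;

          rcu_read_lock();
          vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
                  rcu_dereference(memcg->vmstats_percpu);
          if (likely(vmstats_percpu)) {
                  /* live cgroup: batch the update in the percpu counter */
                  x = val + __this_cpu_read(vmstats_percpu->stat[idx]);
                  if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
                          atomic_long_add(x, &memcg->vmstats[idx]);
                          x = 0;
                  }
                  __this_cpu_write(vmstats_percpu->stat[idx], x);
          } else {
                  /* dying cgroup: percpu data is gone, update the atomics */
                  atomic_long_add(val, &memcg->vmstats[idx]);
          }
          rcu_read_unlock();
  }

The same pattern is applied to events, page event counters and the per-node
lruvec statistics.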

Results on my test machine (32 CPUs, single node):

  With the patchset:              Originally:

  nr_dying_descendants 0
  Slab:              66640 kB	  Slab:              67644 kB
  Percpu:             6912 kB	  Percpu:             6912 kB

  nr_dying_descendants 1000
  Slab:              85912 kB	  Slab:              84704 kB
  Percpu:            26880 kB	  Percpu:            64128 kB

So one dying cgroup shrank from roughly 75 kB to about 39 kB, i.e. to
roughly half of its original size. The difference will be even bigger
on a larger machine (especially with NUMA).
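
For reference, the per-cgroup figures are simply the Slab and Percpu deltas
from the table above, divided by the 1000 dying cgroups:

  Originally:        (84704 - 67644) + (64128 - 6912) = 74276 kB, i.e. ~74-75 kB per cgroup
  With the patchset: (85912 - 66640) + (26880 - 6912) = 39240 kB, i.e. ~39 kB per cgroup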

To test the patchset, I used the following script:
  CG=/sys/fs/cgroup/percpu_test/

  mkdir ${CG}
  echo "+memory" > ${CG}/cgroup.subtree_control

  cat ${CG}/cgroup.stat | grep nr_dying_descendants
  cat /proc/meminfo | grep -e Percpu -e Slab

  # create a child cgroup, move the shell into it and charge a single
  # page cache page (/tmp/test-${i}) to it; after the shell moves back
  # and the cgroup is removed, that page keeps the cgroup in the dying state
  for i in `seq 1 1000`; do
      mkdir ${CG}/${i}
      echo $$ > ${CG}/${i}/cgroup.procs
      dd if=/dev/urandom of=/tmp/test-${i} count=1 2> /dev/null
      echo $$ > /sys/fs/cgroup/cgroup.procs
      rmdir ${CG}/${i}
  done

  cat /sys/fs/cgroup/cgroup.stat | grep nr_dying_descendants
  cat /proc/meminfo | grep -e Percpu -e Slab

  rmdir ${CG}


v2:
  - several renamings suggested by Johannes Weiner
  - added a patch that merges the cpu offlining and the percpu flush code


Roman Gushchin (6):
  mm: prepare to premature release of memcg->vmstats_percpu
  mm: prepare to premature release of per-node lruvec_stat_cpu
  mm: release memcg percpu data prematurely
  mm: release per-node memcg percpu data prematurely
  mm: flush memcg percpu stats and events before releasing
  mm: refactor memcg_hotplug_cpu_dead() to use
    memcg_flush_offline_percpu()

 include/linux/memcontrol.h |  66 ++++++++++----
 mm/memcontrol.c            | 179 ++++++++++++++++++++++++++++---------
 2 files changed, 186 insertions(+), 59 deletions(-)

-- 
2.20.1



* [PATCH v2 1/6] mm: prepare to premature release of memcg->vmstats_percpu
  2019-03-12 22:33 [PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
@ 2019-03-12 22:33 ` Roman Gushchin
  2019-03-12 22:33 ` [PATCH v2 2/6] mm: prepare to premature release of per-node lruvec_stat_cpu Roman Gushchin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Roman Gushchin @ 2019-03-12 22:33 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

Prepare to handle premature release of the memcg->vmstats_percpu data.
Currently it's a generic pointer which is expected to be non-NULL
during the whole lifetime of a memcg. Switch over to an
rcu-protected pointer, and carefully check it for being non-NULL.

This change is a required step towards dynamic premature release
of percpu memcg data.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 40 +++++++++++++++++-------
 mm/memcontrol.c            | 62 +++++++++++++++++++++++++++++---------
 2 files changed, 77 insertions(+), 25 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 534267947664..05ca77767c6a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -274,7 +274,7 @@ struct mem_cgroup {
 	struct task_struct	*move_lock_task;
 
 	/* memory.stat */
-	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
+	struct memcg_vmstats_percpu __rcu /* __percpu */ *vmstats_percpu;
 
 	MEMCG_PADDING(_pad2_);
 
@@ -597,17 +597,26 @@ static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
 static inline void __mod_memcg_state(struct mem_cgroup *memcg,
 				     int idx, int val)
 {
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 	long x;
 
 	if (mem_cgroup_disabled())
 		return;
 
-	x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]);
-	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
-		atomic_long_add(x, &memcg->vmstats[idx]);
-		x = 0;
+	rcu_read_lock();
+	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+		rcu_dereference(memcg->vmstats_percpu);
+	if (likely(vmstats_percpu)) {
+		x = val + __this_cpu_read(vmstats_percpu->stat[idx]);
+		if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
+			atomic_long_add(x, &memcg->vmstats[idx]);
+			x = 0;
+		}
+		__this_cpu_write(vmstats_percpu->stat[idx], x);
+	} else {
+		atomic_long_add(val, &memcg->vmstats[idx]);
 	}
-	__this_cpu_write(memcg->vmstats_percpu->stat[idx], x);
+	rcu_read_unlock();
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item */
@@ -740,17 +749,26 @@ static inline void __count_memcg_events(struct mem_cgroup *memcg,
 					enum vm_event_item idx,
 					unsigned long count)
 {
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 	unsigned long x;
 
 	if (mem_cgroup_disabled())
 		return;
 
-	x = count + __this_cpu_read(memcg->vmstats_percpu->events[idx]);
-	if (unlikely(x > MEMCG_CHARGE_BATCH)) {
-		atomic_long_add(x, &memcg->vmevents[idx]);
-		x = 0;
+	rcu_read_lock();
+	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+		rcu_dereference(memcg->vmstats_percpu);
+	if (likely(vmstats_percpu)) {
+		x = count + __this_cpu_read(vmstats_percpu->events[idx]);
+		if (unlikely(x > MEMCG_CHARGE_BATCH)) {
+			atomic_long_add(x, &memcg->vmevents[idx]);
+			x = 0;
+		}
+		__this_cpu_write(vmstats_percpu->events[idx], x);
+	} else {
+		atomic_long_add(count, &memcg->vmevents[idx]);
 	}
-	__this_cpu_write(memcg->vmstats_percpu->events[idx], x);
+	rcu_read_unlock();
 }
 
 static inline void count_memcg_events(struct mem_cgroup *memcg,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c532f8685aa3..803c772f354b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -697,6 +697,8 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 					 struct page *page,
 					 bool compound, int nr_pages)
 {
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
+
 	/*
 	 * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
 	 * counted as CACHE even if it's on ANON LRU.
@@ -722,7 +724,12 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 		nr_pages = -nr_pages; /* for event */
 	}
 
-	__this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages);
+	rcu_read_lock();
+	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+		rcu_dereference(memcg->vmstats_percpu);
+	if (likely(vmstats_percpu))
+		__this_cpu_add(vmstats_percpu->nr_page_events, nr_pages);
+	rcu_read_unlock();
 }
 
 unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
@@ -756,10 +763,18 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
 static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 				       enum mem_cgroup_events_target target)
 {
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 	unsigned long val, next;
+	bool ret = false;
 
-	val = __this_cpu_read(memcg->vmstats_percpu->nr_page_events);
-	next = __this_cpu_read(memcg->vmstats_percpu->targets[target]);
+	rcu_read_lock();
+	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+		rcu_dereference(memcg->vmstats_percpu);
+	if (!vmstats_percpu)
+		goto out;
+
+	val = __this_cpu_read(vmstats_percpu->nr_page_events);
+	next = __this_cpu_read(vmstats_percpu->targets[target]);
 	/* from time_after() in jiffies.h */
 	if ((long)(next - val) < 0) {
 		switch (target) {
@@ -775,10 +790,12 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 		default:
 			break;
 		}
-		__this_cpu_write(memcg->vmstats_percpu->targets[target], next);
-		return true;
+		__this_cpu_write(vmstats_percpu->targets[target], next);
+		ret = true;
 	}
-	return false;
+out:
+	rcu_read_unlock();
+	return ret;
 }
 
 /*
@@ -2104,22 +2121,29 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 	struct memcg_stock_pcp *stock;
 	struct mem_cgroup *memcg;
 
 	stock = &per_cpu(memcg_stock, cpu);
 	drain_stock(stock);
 
+	rcu_read_lock();
 	for_each_mem_cgroup(memcg) {
 		int i;
 
+		vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+			rcu_dereference(memcg->vmstats_percpu);
+
 		for (i = 0; i < MEMCG_NR_STAT; i++) {
 			int nid;
 			long x;
 
-			x = this_cpu_xchg(memcg->vmstats_percpu->stat[i], 0);
-			if (x)
-				atomic_long_add(x, &memcg->vmstats[i]);
+			if (vmstats_percpu) {
+				x = this_cpu_xchg(vmstats_percpu->stat[i], 0);
+				if (x)
+					atomic_long_add(x, &memcg->vmstats[i]);
+			}
 
 			if (i >= NR_VM_NODE_STAT_ITEMS)
 				continue;
@@ -2137,11 +2161,14 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
 		for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
 			long x;
 
-			x = this_cpu_xchg(memcg->vmstats_percpu->events[i], 0);
-			if (x)
-				atomic_long_add(x, &memcg->vmevents[i]);
+			if (vmstats_percpu) {
+				x = this_cpu_xchg(vmstats_percpu->events[i], 0);
+				if (x)
+					atomic_long_add(x, &memcg->vmevents[i]);
+			}
 		}
 	}
+	rcu_read_unlock();
 
 	return 0;
 }
@@ -4464,7 +4491,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (memcg->id.id < 0)
 		goto fail;
 
-	memcg->vmstats_percpu = alloc_percpu(struct memcg_vmstats_percpu);
+	rcu_assign_pointer(memcg->vmstats_percpu,
+			   alloc_percpu(struct memcg_vmstats_percpu));
 	if (!memcg->vmstats_percpu)
 		goto fail;
 
@@ -6054,6 +6082,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	unsigned long nr_pages = ug->nr_anon + ug->nr_file + ug->nr_kmem;
 	unsigned long flags;
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 
 	if (!mem_cgroup_is_root(ug->memcg)) {
 		page_counter_uncharge(&ug->memcg->memory, nr_pages);
@@ -6070,7 +6099,12 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 	__mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge);
 	__mod_memcg_state(ug->memcg, NR_SHMEM, -ug->nr_shmem);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
-	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, nr_pages);
+	rcu_read_lock();
+	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+		rcu_dereference(ug->memcg->vmstats_percpu);
+	if (likely(vmstats_percpu))
+		__this_cpu_add(vmstats_percpu->nr_page_events, nr_pages);
+	rcu_read_unlock();
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
-- 
2.20.1



* [PATCH v2 2/6] mm: prepare to premature release of per-node lruvec_stat_cpu
  2019-03-12 22:33 [PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
  2019-03-12 22:33 ` [PATCH v2 1/6] mm: prepare to premature release of memcg->vmstats_percpu Roman Gushchin
@ 2019-03-12 22:33 ` Roman Gushchin
  2019-03-12 22:34 ` [PATCH v2 3/6] mm: release memcg percpu data prematurely Roman Gushchin
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Roman Gushchin @ 2019-03-12 22:33 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

Similar to the memcg's vmstats_percpu, the per-memcg per-node stats
consist of percpu and atomic counterparts, and we do expect
that both coexist during the whole life-cycle of the memcg.

To prepare for a premature release of the percpu per-node data,
let's pretend that lruvec_stat_cpu is an rcu-protected pointer,
which can be NULL. This patch adds the corresponding checks wherever
required.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 21 +++++++++++++++------
 mm/memcontrol.c            | 14 +++++++++++---
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 05ca77767c6a..8ac04632002a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -126,7 +126,7 @@ struct memcg_shrinker_map {
 struct mem_cgroup_per_node {
 	struct lruvec		lruvec;
 
-	struct lruvec_stat __percpu *lruvec_stat_cpu;
+	struct lruvec_stat __rcu /* __percpu */ *lruvec_stat_cpu;
 	atomic_long_t		lruvec_stat[NR_VM_NODE_STAT_ITEMS];
 
 	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
@@ -682,6 +682,7 @@ static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 static inline void __mod_lruvec_state(struct lruvec *lruvec,
 				      enum node_stat_item idx, int val)
 {
+	struct lruvec_stat __percpu *lruvec_stat_cpu;
 	struct mem_cgroup_per_node *pn;
 	long x;
 
@@ -697,12 +698,20 @@ static inline void __mod_lruvec_state(struct lruvec *lruvec,
 	__mod_memcg_state(pn->memcg, idx, val);
 
 	/* Update lruvec */
-	x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
-	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
-		atomic_long_add(x, &pn->lruvec_stat[idx]);
-		x = 0;
+	rcu_read_lock();
+	lruvec_stat_cpu = (struct lruvec_stat __percpu *)
+		rcu_dereference(pn->lruvec_stat_cpu);
+	if (likely(lruvec_stat_cpu)) {
+		x = val + __this_cpu_read(lruvec_stat_cpu->count[idx]);
+		if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
+			atomic_long_add(x, &pn->lruvec_stat[idx]);
+			x = 0;
+		}
+		__this_cpu_write(lruvec_stat_cpu->count[idx], x);
+	} else {
+		atomic_long_add(val, &pn->lruvec_stat[idx]);
 	}
-	__this_cpu_write(pn->lruvec_stat_cpu->count[idx], x);
+	rcu_read_unlock();
 }
 
 static inline void mod_lruvec_state(struct lruvec *lruvec,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 803c772f354b..5ef4098f3f8d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2122,6 +2122,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
 	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
+	struct lruvec_stat __percpu *lruvec_stat_cpu;
 	struct memcg_stock_pcp *stock;
 	struct mem_cgroup *memcg;
 
@@ -2152,7 +2153,12 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
 				struct mem_cgroup_per_node *pn;
 
 				pn = mem_cgroup_nodeinfo(memcg, nid);
-				x = this_cpu_xchg(pn->lruvec_stat_cpu->count[i], 0);
+
+				lruvec_stat_cpu = (struct lruvec_stat __percpu*)
+					rcu_dereference(pn->lruvec_stat_cpu);
+				if (!lruvec_stat_cpu)
+					continue;
+				x = this_cpu_xchg(lruvec_stat_cpu->count[i], 0);
 				if (x)
 					atomic_long_add(x, &pn->lruvec_stat[i]);
 			}
@@ -4414,6 +4420,7 @@ struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 
 static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 {
+	struct lruvec_stat __percpu *lruvec_stat_cpu;
 	struct mem_cgroup_per_node *pn;
 	int tmp = node;
 	/*
@@ -4430,11 +4437,12 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return 1;
 
-	pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
-	if (!pn->lruvec_stat_cpu) {
+	lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
+	if (!lruvec_stat_cpu) {
 		kfree(pn);
 		return 1;
 	}
+	rcu_assign_pointer(pn->lruvec_stat_cpu, lruvec_stat_cpu);
 
 	lruvec_init(&pn->lruvec);
 	pn->usage_in_excess = 0;
-- 
2.20.1



* [PATCH v2 3/6] mm: release memcg percpu data prematurely
  2019-03-12 22:33 [PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
  2019-03-12 22:33 ` [PATCH v2 1/6] mm: prepare to premature release of memcg->vmstats_percpu Roman Gushchin
  2019-03-12 22:33 ` [PATCH v2 2/6] mm: prepare to premature release of per-node lruvec_stat_cpu Roman Gushchin
@ 2019-03-12 22:34 ` Roman Gushchin
  2019-03-12 22:34 ` [PATCH v2 4/6] mm: release per-node " Roman Gushchin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Roman Gushchin @ 2019-03-12 22:34 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

To reduce the memory footprint of a dying memory cgroup, let's
release massive percpu data (vmstats_percpu) as early as possible,
and use atomic counterparts instead.

A dying cgroup can remain in the dying state for quite a long
time, being pinned in memory by any reference. For example,
if a page mlocked by some other cgroup is charged to the dying
cgroup, it won't go away until the page is released.

A dying memory cgroup can have some memory activity (e.g. dirty
pages can be flushed after cgroup removal), but in general it's
not expected to be very active in comparison to living cgroups.

So reducing the memory footprint by releasing percpu data
and switching over to atomics seems to be a good trade off.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h |  4 ++++
 mm/memcontrol.c            | 24 +++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8ac04632002a..569337514230 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -275,6 +275,10 @@ struct mem_cgroup {
 
 	/* memory.stat */
 	struct memcg_vmstats_percpu __rcu /* __percpu */ *vmstats_percpu;
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu_offlined;
+
+	/* used to release non-used percpu memory */
+	struct rcu_head rcu;
 
 	MEMCG_PADDING(_pad2_);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5ef4098f3f8d..efd5bc131a38 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4470,7 +4470,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 
 	for_each_node(node)
 		free_mem_cgroup_per_node_info(memcg, node);
-	free_percpu(memcg->vmstats_percpu);
+	WARN_ON_ONCE(memcg->vmstats_percpu != NULL);
 	kfree(memcg);
 }
 
@@ -4613,6 +4613,26 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	return 0;
 }
 
+static void percpu_rcu_free(struct rcu_head *rcu)
+{
+	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);
+
+	free_percpu(memcg->vmstats_percpu_offlined);
+	WARN_ON_ONCE(memcg->vmstats_percpu);
+
+	css_put(&memcg->css);
+}
+
+static void mem_cgroup_offline_percpu(struct mem_cgroup *memcg)
+{
+	memcg->vmstats_percpu_offlined = (struct memcg_vmstats_percpu __percpu*)
+		rcu_dereference(memcg->vmstats_percpu);
+	rcu_assign_pointer(memcg->vmstats_percpu, NULL);
+
+	css_get(&memcg->css);
+	call_rcu(&memcg->rcu, percpu_rcu_free);
+}
+
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
@@ -4639,6 +4659,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	drain_all_stock(memcg);
 
 	mem_cgroup_id_put(memcg);
+
+	mem_cgroup_offline_percpu(memcg);
 }
 
 static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
-- 
2.20.1



* [PATCH v2 4/6] mm: release per-node memcg percpu data prematurely
  2019-03-12 22:33 [PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
                   ` (2 preceding siblings ...)
  2019-03-12 22:34 ` [PATCH v2 3/6] mm: release memcg percpu data prematurely Roman Gushchin
@ 2019-03-12 22:34 ` Roman Gushchin
  2019-03-12 22:34 ` [PATCH v2 5/6] mm: flush memcg percpu stats and events before releasing Roman Gushchin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Roman Gushchin @ 2019-03-12 22:34 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

Similar to memcg-level statistics, per-node data isn't expected
to be hot after cgroup removal. Switching over to atomics and
prematurely releasing percpu data helps to reduce the memory
footprint of dying cgroups.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 24 +++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 569337514230..f296693d102b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -127,6 +127,7 @@ struct mem_cgroup_per_node {
 	struct lruvec		lruvec;
 
 	struct lruvec_stat __rcu /* __percpu */ *lruvec_stat_cpu;
+	struct lruvec_stat __percpu *lruvec_stat_cpu_offlined;
 	atomic_long_t		lruvec_stat[NR_VM_NODE_STAT_ITEMS];
 
 	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index efd5bc131a38..1b5fe826d6d0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4460,7 +4460,7 @@ static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return;
 
-	free_percpu(pn->lruvec_stat_cpu);
+	WARN_ON_ONCE(pn->lruvec_stat_cpu != NULL);
 	kfree(pn);
 }
 
@@ -4616,7 +4616,17 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 static void percpu_rcu_free(struct rcu_head *rcu)
 {
 	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);
+	int node;
+
+	for_each_node(node) {
+		struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
 
+		if (!pn)
+			continue;
+
+		free_percpu(pn->lruvec_stat_cpu_offlined);
+		WARN_ON_ONCE(pn->lruvec_stat_cpu != NULL);
+	}
 	free_percpu(memcg->vmstats_percpu_offlined);
 	WARN_ON_ONCE(memcg->vmstats_percpu);
 
@@ -4625,6 +4635,18 @@ static void percpu_rcu_free(struct rcu_head *rcu)
 
 static void mem_cgroup_offline_percpu(struct mem_cgroup *memcg)
 {
+	int node;
+
+	for_each_node(node) {
+		struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
+
+		if (!pn)
+			continue;
+
+		pn->lruvec_stat_cpu_offlined = (struct lruvec_stat __percpu *)
+			rcu_dereference(pn->lruvec_stat_cpu);
+		rcu_assign_pointer(pn->lruvec_stat_cpu, NULL);
+	}
 	memcg->vmstats_percpu_offlined = (struct memcg_vmstats_percpu __percpu*)
 		rcu_dereference(memcg->vmstats_percpu);
 	rcu_assign_pointer(memcg->vmstats_percpu, NULL);
-- 
2.20.1



* [PATCH v2 5/6] mm: flush memcg percpu stats and events before releasing
  2019-03-12 22:33 [PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
                   ` (3 preceding siblings ...)
  2019-03-12 22:34 ` [PATCH v2 4/6] mm: release per-node " Roman Gushchin
@ 2019-03-12 22:34 ` Roman Gushchin
  2019-03-13 16:00   ` Johannes Weiner
  2019-03-12 22:34 ` [PATCH 5/5] mm: spill " Roman Gushchin
  2019-03-12 22:34 ` [PATCH v2 6/6] mm: refactor memcg_hotplug_cpu_dead() to use memcg_flush_offline_percpu() Roman Gushchin
  6 siblings, 1 reply; 12+ messages in thread
From: Roman Gushchin @ 2019-03-12 22:34 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

Flush percpu stats and events data to the corresponding atomic counters
before releasing the percpu memory.

Although per-cpu stats are never exactly precise, dropping them on the
floor regularly may lead to an accumulation of error. So it's
safer to flush them before releasing.

To minimize the number of atomic updates, let's sum all stats/events
on all cpus locally, and then make a single update per entry.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1b5fe826d6d0..0f18bf2afea8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2119,6 +2119,56 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 	mutex_unlock(&percpu_charge_mutex);
 }
 
+/*
+ * Flush all per-cpu stats and events into atomics.
+ * Try to minimize the number of atomic writes by gathering data from
+ * all cpus locally, and then make one atomic update.
+ * No locking is required, because no one has an access to
+ * the offlined percpu data.
+ */
+static void memcg_flush_offline_percpu(struct mem_cgroup *memcg)
+{
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
+	struct lruvec_stat __percpu *lruvec_stat_cpu;
+	struct mem_cgroup_per_node *pn;
+	int cpu, i;
+	long x;
+
+	vmstats_percpu = memcg->vmstats_percpu_offlined;
+
+	for (i = 0; i < MEMCG_NR_STAT; i++) {
+		int nid;
+
+		x = 0;
+		for_each_possible_cpu(cpu)
+			x += per_cpu(vmstats_percpu->stat[i], cpu);
+		if (x)
+			atomic_long_add(x, &memcg->vmstats[i]);
+
+		if (i >= NR_VM_NODE_STAT_ITEMS)
+			continue;
+
+		for_each_node(nid) {
+			pn = mem_cgroup_nodeinfo(memcg, nid);
+			lruvec_stat_cpu = pn->lruvec_stat_cpu_offlined;
+
+			x = 0;
+			for_each_possible_cpu(cpu)
+				x += per_cpu(lruvec_stat_cpu->count[i], cpu);
+			if (x)
+				atomic_long_add(x, &pn->lruvec_stat[i]);
+		}
+	}
+
+	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
+		x = 0;
+		for_each_possible_cpu(cpu)
+			x += per_cpu(vmstats_percpu->events[i], cpu);
+		if (x)
+			atomic_long_add(x, &memcg->vmevents[i]);
+	}
+}
+
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
 	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
@@ -4618,6 +4668,8 @@ static void percpu_rcu_free(struct rcu_head *rcu)
 	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);
 	int node;
 
+	memcg_flush_offline_percpu(memcg);
+
 	for_each_node(node) {
 		struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
 
-- 
2.20.1



* [PATCH 5/5] mm: spill memcg percpu stats and events before releasing
  2019-03-12 22:33 [PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
                   ` (4 preceding siblings ...)
  2019-03-12 22:34 ` [PATCH v2 5/6] mm: flush memcg percpu stats and events before releasing Roman Gushchin
@ 2019-03-12 22:34 ` Roman Gushchin
  2019-03-12 22:34 ` [PATCH v2 6/6] mm: refactor memcg_hotplug_cpu_dead() to use memcg_flush_offline_percpu() Roman Gushchin
  6 siblings, 0 replies; 12+ messages in thread
From: Roman Gushchin @ 2019-03-12 22:34 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

Spill percpu stats and events data to the corresponding atomic counters
before releasing the percpu memory.

Although per-cpu stats are never exactly precise, dropping them on the
floor regularly may lead to an accumulation of error. So it's
safer to sync them before releasing.

To minimize the number of atomic updates, let's sum all stats/events
on all cpus locally, and then make a single update per entry.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 18e863890392..b7eb6fac735e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4612,11 +4612,63 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	return 0;
 }
 
+/*
+ * Spill all per-cpu stats and events into atomics.
+ * Try to minimize the number of atomic writes by gathering data from
+ * all cpus locally, and then make one atomic update.
+ * No locking is required, because no one has an access to
+ * the offlined percpu data.
+ */
+static void mem_cgroup_spill_offlined_percpu(struct mem_cgroup *memcg)
+{
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
+	struct lruvec_stat __percpu *lruvec_stat_cpu;
+	struct mem_cgroup_per_node *pn;
+	int cpu, i;
+	long x;
+
+	vmstats_percpu = memcg->vmstats_percpu_offlined;
+
+	for (i = 0; i < MEMCG_NR_STAT; i++) {
+		int nid;
+
+		x = 0;
+		for_each_possible_cpu(cpu)
+			x += per_cpu(vmstats_percpu->stat[i], cpu);
+		if (x)
+			atomic_long_add(x, &memcg->vmstats[i]);
+
+		if (i >= NR_VM_NODE_STAT_ITEMS)
+			continue;
+
+		for_each_node(nid) {
+			pn = mem_cgroup_nodeinfo(memcg, nid);
+			lruvec_stat_cpu = pn->lruvec_stat_cpu_offlined;
+
+			x = 0;
+			for_each_possible_cpu(cpu)
+				x += per_cpu(lruvec_stat_cpu->count[i], cpu);
+			if (x)
+				atomic_long_add(x, &pn->lruvec_stat[i]);
+		}
+	}
+
+	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
+		x = 0;
+		for_each_possible_cpu(cpu)
+			x += per_cpu(vmstats_percpu->events[i], cpu);
+		if (x)
+			atomic_long_add(x, &memcg->vmevents[i]);
+	}
+}
+
 static void mem_cgroup_free_percpu(struct rcu_head *rcu)
 {
 	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);
 	int node;
 
+	mem_cgroup_spill_offlined_percpu(memcg);
+
 	for_each_node(node) {
 		struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
 
-- 
2.20.1



* [PATCH v2 6/6] mm: refactor memcg_hotplug_cpu_dead() to use memcg_flush_offline_percpu()
  2019-03-12 22:33 [PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
                   ` (5 preceding siblings ...)
  2019-03-12 22:34 ` [PATCH 5/5] mm: spill " Roman Gushchin
@ 2019-03-12 22:34 ` Roman Gushchin
  2019-03-13 16:07   ` Johannes Weiner
  6 siblings, 1 reply; 12+ messages in thread
From: Roman Gushchin @ 2019-03-12 22:34 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

It's possible to remove a big chunk of the redundant code by making
memcg_flush_offline_percpu() take a cpumask as an argument and flush
percpu data on all cpus belonging to the mask instead of all possible cpus.

Then memcg_hotplug_cpu_dead() can call it with a single CPU bit set.

This approach allows removing all the duplicated code while preserving the
performance optimization made in memcg_flush_offline_percpu():
only one atomic operation per data entry.

for_each_data_entry()
	for_each_cpu(cpu, cpumask)
		sum_events()
	flush()

Otherwise it would be one atomic operation per data entry per cpu:
for_each_cpu(cpu)
	for_each_data_entry()
		flush()

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/memcontrol.c | 61 ++++++++-----------------------------------------
 1 file changed, 9 insertions(+), 52 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0f18bf2afea8..92c80275d5eb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2122,11 +2122,12 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 /*
  * Flush all per-cpu stats and events into atomics.
  * Try to minimize the number of atomic writes by gathering data from
- * all cpus locally, and then make one atomic update.
+ * all cpus in cpumask locally, and then make one atomic update.
  * No locking is required, because no one has an access to
  * the offlined percpu data.
  */
-static void memcg_flush_offline_percpu(struct mem_cgroup *memcg)
+static void memcg_flush_offline_percpu(struct mem_cgroup *memcg,
+				       const struct cpumask *cpumask)
 {
 	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 	struct lruvec_stat __percpu *lruvec_stat_cpu;
@@ -2140,7 +2141,7 @@ static void memcg_flush_offline_percpu(struct mem_cgroup *memcg)
 		int nid;
 
 		x = 0;
-		for_each_possible_cpu(cpu)
+		for_each_cpu(cpu, cpumask)
 			x += per_cpu(vmstats_percpu->stat[i], cpu);
 		if (x)
 			atomic_long_add(x, &memcg->vmstats[i]);
@@ -2153,7 +2154,7 @@ static void memcg_flush_offline_percpu(struct mem_cgroup *memcg)
 			lruvec_stat_cpu = pn->lruvec_stat_cpu_offlined;
 
 			x = 0;
-			for_each_possible_cpu(cpu)
+			for_each_cpu(cpu, cpumask)
 				x += per_cpu(lruvec_stat_cpu->count[i], cpu);
 			if (x)
 				atomic_long_add(x, &pn->lruvec_stat[i]);
@@ -2162,7 +2163,7 @@ static void memcg_flush_offline_percpu(struct mem_cgroup *memcg)
 
 	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
 		x = 0;
-		for_each_possible_cpu(cpu)
+		for_each_cpu(cpu, cpumask)
 			x += per_cpu(vmstats_percpu->events[i], cpu);
 		if (x)
 			atomic_long_add(x, &memcg->vmevents[i]);
@@ -2171,8 +2172,6 @@ static void memcg_flush_offline_percpu(struct mem_cgroup *memcg)
 
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
-	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
-	struct lruvec_stat __percpu *lruvec_stat_cpu;
 	struct memcg_stock_pcp *stock;
 	struct mem_cgroup *memcg;
 
@@ -2180,50 +2179,8 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
 	drain_stock(stock);
 
 	rcu_read_lock();
-	for_each_mem_cgroup(memcg) {
-		int i;
-
-		vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
-			rcu_dereference(memcg->vmstats_percpu);
-
-		for (i = 0; i < MEMCG_NR_STAT; i++) {
-			int nid;
-			long x;
-
-			if (vmstats_percpu) {
-				x = this_cpu_xchg(vmstats_percpu->stat[i], 0);
-				if (x)
-					atomic_long_add(x, &memcg->vmstats[i]);
-			}
-
-			if (i >= NR_VM_NODE_STAT_ITEMS)
-				continue;
-
-			for_each_node(nid) {
-				struct mem_cgroup_per_node *pn;
-
-				pn = mem_cgroup_nodeinfo(memcg, nid);
-
-				lruvec_stat_cpu = (struct lruvec_stat __percpu*)
-					rcu_dereference(pn->lruvec_stat_cpu);
-				if (!lruvec_stat_cpu)
-					continue;
-				x = this_cpu_xchg(lruvec_stat_cpu->count[i], 0);
-				if (x)
-					atomic_long_add(x, &pn->lruvec_stat[i]);
-			}
-		}
-
-		for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
-			long x;
-
-			if (vmstats_percpu) {
-				x = this_cpu_xchg(vmstats_percpu->events[i], 0);
-				if (x)
-					atomic_long_add(x, &memcg->vmevents[i]);
-			}
-		}
-	}
+	for_each_mem_cgroup(memcg)
+		memcg_flush_offline_percpu(memcg, get_cpu_mask(cpu));
 	rcu_read_unlock();
 
 	return 0;
@@ -4668,7 +4625,7 @@ static void percpu_rcu_free(struct rcu_head *rcu)
 	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);
 	int node;
 
-	memcg_flush_offline_percpu(memcg);
+	memcg_flush_offline_percpu(memcg, cpu_possible_mask);
 
 	for_each_node(node) {
 		struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
-- 
2.20.1



* Re: [PATCH v2 5/6] mm: flush memcg percpu stats and events before releasing
  2019-03-12 22:34 ` [PATCH v2 5/6] mm: flush memcg percpu stats and events before releasing Roman Gushchin
@ 2019-03-13 16:00   ` Johannes Weiner
  2019-03-13 18:23     ` Roman Gushchin
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Weiner @ 2019-03-13 16:00 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, kernel-team, linux-kernel, Tejun Heo, Rik van Riel,
	Michal Hocko, Roman Gushchin

On Tue, Mar 12, 2019 at 03:34:02PM -0700, Roman Gushchin wrote:
> Flush percpu stats and events data to the corresponding atomic counters
> before releasing the percpu memory.
> 
> Although per-cpu stats are never exactly precise, dropping them on the
> floor regularly may lead to an accumulation of error. So it's
> safer to flush them before releasing.
> 
> To minimize the number of atomic updates, let's sum all stats/events
> on all cpus locally, and then make a single update per entry.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Do you mind merging 6/6 into this one? That would make it easier to
verify that the code added in this patch and the code removed in 6/6
are indeed functionally equivalent.


* Re: [PATCH v2 6/6] mm: refactor memcg_hotplug_cpu_dead() to use memcg_flush_offline_percpu()
  2019-03-12 22:34 ` [PATCH v2 6/6] mm: refactor memcg_hotplug_cpu_dead() to use memcg_flush_offline_percpu() Roman Gushchin
@ 2019-03-13 16:07   ` Johannes Weiner
  2019-03-13 18:23     ` Roman Gushchin
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Weiner @ 2019-03-13 16:07 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, kernel-team, linux-kernel, Tejun Heo, Rik van Riel,
	Michal Hocko, Roman Gushchin

On Tue, Mar 12, 2019 at 03:34:04PM -0700, Roman Gushchin wrote:
> @@ -2180,50 +2179,8 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
> +	for_each_mem_cgroup(memcg)
> +		memcg_flush_offline_percpu(memcg, get_cpu_mask(cpu));

cpumask_of(cpu) is the official API function, with kerneldoc and
everything. I think get_cpu_mask() is just an implementation helper.

[hannes@computer linux]$ git grep cpumask_of | wc -l
400
[hannes@computer linux]$ git grep get_cpu_mask | wc -l
20

Otherwise, looks good to me!


* Re: [PATCH v2 5/6] mm: flush memcg percpu stats and events before releasing
  2019-03-13 16:00   ` Johannes Weiner
@ 2019-03-13 18:23     ` Roman Gushchin
  0 siblings, 0 replies; 12+ messages in thread
From: Roman Gushchin @ 2019-03-13 18:23 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Roman Gushchin, linux-mm, Kernel Team, linux-kernel, Tejun Heo,
	Rik van Riel, Michal Hocko

On Wed, Mar 13, 2019 at 12:00:17PM -0400, Johannes Weiner wrote:
> On Tue, Mar 12, 2019 at 03:34:02PM -0700, Roman Gushchin wrote:
> > Flush percpu stats and events data to the corresponding atomic counters
> > before releasing the percpu memory.
> > 
> > Although per-cpu stats are never exactly precise, dropping them on the
> > floor regularly may lead to an accumulation of error. So it's
> > safer to flush them before releasing.
> > 
> > To minimize the number of atomic updates, let's sum all stats/events
> > on all cpus locally, and then make a single update per entry.
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Do you mind merging 6/6 into this one? That would make it easier to
> verify that the code added in this patch and the code removed in 6/6
> are indeed functionally equivalent.
> 

I did try, but the result is a mess of added and removed lines
which are *almost* the same, but slightly different (e.g. tabs).
So it's much easier to review it as two separate patches.

Thanks!


* Re: [PATCH v2 6/6] mm: refactor memcg_hotplug_cpu_dead() to use memcg_flush_offline_percpu()
  2019-03-13 16:07   ` Johannes Weiner
@ 2019-03-13 18:23     ` Roman Gushchin
  0 siblings, 0 replies; 12+ messages in thread
From: Roman Gushchin @ 2019-03-13 18:23 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Roman Gushchin, linux-mm, Kernel Team, linux-kernel, Tejun Heo,
	Rik van Riel, Michal Hocko

On Wed, Mar 13, 2019 at 12:07:49PM -0400, Johannes Weiner wrote:
> On Tue, Mar 12, 2019 at 03:34:04PM -0700, Roman Gushchin wrote:
> > @@ -2180,50 +2179,8 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
> > +	for_each_mem_cgroup(memcg)
> > +		memcg_flush_offline_percpu(memcg, get_cpu_mask(cpu));
> 
> cpumask_of(cpu) is the official API function, with kerneldoc and
> everything. I think get_cpu_mask() is just an implementation helper.
> 
> [hannes@computer linux]$ git grep cpumask_of | wc -l
> 400
> [hannes@computer linux]$ git grep get_cpu_mask | wc -l
> 20
> 
> Otherwise, looks good to me!

Fixed in v3.

Thank you!

