* [PATCH 0/5] mm: reduce the memory footprint of dying memory cgroups
@ 2019-03-07 23:00 Roman Gushchin
  2019-03-07 23:00 ` [PATCH 1/5] mm: prepare to premature release of memcg->vmstats_percpu Roman Gushchin
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Roman Gushchin @ 2019-03-07 23:00 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

A cgroup can remain in the dying state for a long time, pinned in memory
by any kernel object. It can be pinned by a page shared with another
cgroup (e.g. mlocked by a process in that other cgroup), by a vfs cache
object, etc.

Mostly because of percpu data, the size of a memcg structure in kernel
memory is quite large. Depending on the machine size and the kernel config,
it can easily reach hundreds of kilobytes per cgroup.

Depending on the memory pressure and the reclaim approach (which is a separate
topic), several hundred (if not a few thousand) dying cgroups appears to be
a typical number. On a moderately sized machine, the overall memory footprint
is measured in hundreds of megabytes.

So if we can't completely get rid of dying cgroups, let's make them smaller.
This patchset aims to reduce the size of a dying memory cgroup by releasing
its percpu data prematurely during cgroup removal and using the atomic
counterparts instead. Currently it covers the per-memcg vmstats_percpu and
the per-memcg per-node lruvec_stat_cpu. The same approach can be further
applied to other percpu data.
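
The core of the approach (condensed from patch 1 below; only the memcg-level
stats are shown, events and per-node counters are handled the same way) is to
look up the percpu pointer under RCU and fall back to the atomic counter once
the percpu data has been released:

	rcu_read_lock();
	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
		rcu_dereference(memcg->vmstats_percpu);
	if (likely(vmstats_percpu)) {
		/* percpu data is still around: batch updates as before */
		x = val + __this_cpu_read(vmstats_percpu->stat[idx]);
		if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
			atomic_long_add(x, &memcg->vmstats[idx]);
			x = 0;
		}
		__this_cpu_write(vmstats_percpu->stat[idx], x);
	} else {
		/* percpu data has been released: use the atomic counter */
		atomic_long_add(val, &memcg->vmstats[idx]);
	}
	rcu_read_unlock();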

Results on my test machine (32 CPUs, single node):

  With the patchset:              Originally:

  nr_dying_descendants 0
  Slab:              66640 kB	  Slab:              67644 kB
  Percpu:             6912 kB	  Percpu:             6912 kB

  nr_dying_descendants 1000
  Slab:              85912 kB	  Slab:              84704 kB
  Percpu:            26880 kB	  Percpu:            64128 kB

So one dying cgroup went from roughly 75 kB down to roughly 39 kB, which is
almost a factor of two. The difference will be even bigger on a larger machine
(especially with NUMA).
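
(For reference, these per-cgroup numbers follow from the deltas above: with
the patchset, (85912 - 66640) kB of slab plus (26880 - 6912) kB of percpu
memory for 1000 dying cgroups gives roughly 39 kB per cgroup; originally,
(84704 - 67644) kB plus (64128 - 6912) kB gives roughly 74 kB per cgroup.)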

To test the patchset, I used the following script:
  CG=/sys/fs/cgroup/percpu_test/

  mkdir ${CG}
  echo "+memory" > ${CG}/cgroup.subtree_control

  # baseline numbers
  cat ${CG}/cgroup.stat | grep nr_dying_descendants
  cat /proc/meminfo | grep -e Percpu -e Slab

  # create 1000 child cgroups, charge a page cache page to each of them,
  # and remove them, leaving 1000 dying cgroups behind
  for i in `seq 1 1000`; do
      mkdir ${CG}/${i}
      echo $$ > ${CG}/${i}/cgroup.procs
      dd if=/dev/urandom of=/tmp/test-${i} count=1 2> /dev/null
      echo $$ > /sys/fs/cgroup/cgroup.procs
      rmdir ${CG}/${i}
  done

  # numbers with 1000 dying descendants
  cat /sys/fs/cgroup/cgroup.stat | grep nr_dying_descendants
  cat /proc/meminfo | grep -e Percpu -e Slab

  rmdir ${CG}


Roman Gushchin (5):
  mm: prepare to premature release of memcg->vmstats_percpu
  mm: prepare to premature release of per-node lruvec_stat_cpu
  mm: release memcg percpu data prematurely
  mm: release per-node memcg percpu data prematurely
  mm: spill memcg percpu stats and events before releasing

 include/linux/memcontrol.h |  66 ++++++++++----
 mm/memcontrol.c            | 173 +++++++++++++++++++++++++++++++++----
 2 files changed, 204 insertions(+), 35 deletions(-)

-- 
2.20.1



* [PATCH 1/5] mm: prepare to premature release of memcg->vmstats_percpu
  2019-03-07 23:00 [PATCH 0/5] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
@ 2019-03-07 23:00 ` Roman Gushchin
  2019-03-11 17:14   ` Johannes Weiner
  2019-03-07 23:00 ` [PATCH 2/5] mm: prepare to premature release of per-node lruvec_stat_cpu Roman Gushchin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Roman Gushchin @ 2019-03-07 23:00 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

Prepare to handle premature release of memcg->vmstats_percpu data.
Currently it's a generic pointer which is expected to be non-NULL
during the whole lifetime of a memcg. Switch over to an
RCU-protected pointer, and carefully check it for being non-NULL.

This change is a required step towards dynamic premature release
of percpu memcg data.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h | 40 +++++++++++++++++-------
 mm/memcontrol.c            | 62 +++++++++++++++++++++++++++++---------
 2 files changed, 77 insertions(+), 25 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 534267947664..05ca77767c6a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -274,7 +274,7 @@ struct mem_cgroup {
 	struct task_struct	*move_lock_task;
 
 	/* memory.stat */
-	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
+	struct memcg_vmstats_percpu __rcu /* __percpu */ *vmstats_percpu;
 
 	MEMCG_PADDING(_pad2_);
 
@@ -597,17 +597,26 @@ static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
 static inline void __mod_memcg_state(struct mem_cgroup *memcg,
 				     int idx, int val)
 {
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 	long x;
 
 	if (mem_cgroup_disabled())
 		return;
 
-	x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]);
-	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
-		atomic_long_add(x, &memcg->vmstats[idx]);
-		x = 0;
+	rcu_read_lock();
+	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+		rcu_dereference(memcg->vmstats_percpu);
+	if (likely(vmstats_percpu)) {
+		x = val + __this_cpu_read(vmstats_percpu->stat[idx]);
+		if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
+			atomic_long_add(x, &memcg->vmstats[idx]);
+			x = 0;
+		}
+		__this_cpu_write(vmstats_percpu->stat[idx], x);
+	} else {
+		atomic_long_add(val, &memcg->vmstats[idx]);
 	}
-	__this_cpu_write(memcg->vmstats_percpu->stat[idx], x);
+	rcu_read_unlock();
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item */
@@ -740,17 +749,26 @@ static inline void __count_memcg_events(struct mem_cgroup *memcg,
 					enum vm_event_item idx,
 					unsigned long count)
 {
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 	unsigned long x;
 
 	if (mem_cgroup_disabled())
 		return;
 
-	x = count + __this_cpu_read(memcg->vmstats_percpu->events[idx]);
-	if (unlikely(x > MEMCG_CHARGE_BATCH)) {
-		atomic_long_add(x, &memcg->vmevents[idx]);
-		x = 0;
+	rcu_read_lock();
+	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+		rcu_dereference(memcg->vmstats_percpu);
+	if (likely(vmstats_percpu)) {
+		x = count + __this_cpu_read(vmstats_percpu->events[idx]);
+		if (unlikely(x > MEMCG_CHARGE_BATCH)) {
+			atomic_long_add(x, &memcg->vmevents[idx]);
+			x = 0;
+		}
+		__this_cpu_write(vmstats_percpu->events[idx], x);
+	} else {
+		atomic_long_add(count, &memcg->vmevents[idx]);
 	}
-	__this_cpu_write(memcg->vmstats_percpu->events[idx], x);
+	rcu_read_unlock();
 }
 
 static inline void count_memcg_events(struct mem_cgroup *memcg,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c532f8685aa3..803c772f354b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -697,6 +697,8 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 					 struct page *page,
 					 bool compound, int nr_pages)
 {
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
+
 	/*
 	 * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
 	 * counted as CACHE even if it's on ANON LRU.
@@ -722,7 +724,12 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 		nr_pages = -nr_pages; /* for event */
 	}
 
-	__this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages);
+	rcu_read_lock();
+	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+		rcu_dereference(memcg->vmstats_percpu);
+	if (likely(vmstats_percpu))
+		__this_cpu_add(vmstats_percpu->nr_page_events, nr_pages);
+	rcu_read_unlock();
 }
 
 unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
@@ -756,10 +763,18 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
 static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 				       enum mem_cgroup_events_target target)
 {
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 	unsigned long val, next;
+	bool ret = false;
 
-	val = __this_cpu_read(memcg->vmstats_percpu->nr_page_events);
-	next = __this_cpu_read(memcg->vmstats_percpu->targets[target]);
+	rcu_read_lock();
+	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+		rcu_dereference(memcg->vmstats_percpu);
+	if (!vmstats_percpu)
+		goto out;
+
+	val = __this_cpu_read(vmstats_percpu->nr_page_events);
+	next = __this_cpu_read(vmstats_percpu->targets[target]);
 	/* from time_after() in jiffies.h */
 	if ((long)(next - val) < 0) {
 		switch (target) {
@@ -775,10 +790,12 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 		default:
 			break;
 		}
-		__this_cpu_write(memcg->vmstats_percpu->targets[target], next);
-		return true;
+		__this_cpu_write(vmstats_percpu->targets[target], next);
+		ret = true;
 	}
-	return false;
+out:
+	rcu_read_unlock();
+	return ret;
 }
 
 /*
@@ -2104,22 +2121,29 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 	struct memcg_stock_pcp *stock;
 	struct mem_cgroup *memcg;
 
 	stock = &per_cpu(memcg_stock, cpu);
 	drain_stock(stock);
 
+	rcu_read_lock();
 	for_each_mem_cgroup(memcg) {
 		int i;
 
+		vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+			rcu_dereference(memcg->vmstats_percpu);
+
 		for (i = 0; i < MEMCG_NR_STAT; i++) {
 			int nid;
 			long x;
 
-			x = this_cpu_xchg(memcg->vmstats_percpu->stat[i], 0);
-			if (x)
-				atomic_long_add(x, &memcg->vmstats[i]);
+			if (vmstats_percpu) {
+				x = this_cpu_xchg(vmstats_percpu->stat[i], 0);
+				if (x)
+					atomic_long_add(x, &memcg->vmstats[i]);
+			}
 
 			if (i >= NR_VM_NODE_STAT_ITEMS)
 				continue;
@@ -2137,11 +2161,14 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
 		for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
 			long x;
 
-			x = this_cpu_xchg(memcg->vmstats_percpu->events[i], 0);
-			if (x)
-				atomic_long_add(x, &memcg->vmevents[i]);
+			if (vmstats_percpu) {
+				x = this_cpu_xchg(vmstats_percpu->events[i], 0);
+				if (x)
+					atomic_long_add(x, &memcg->vmevents[i]);
+			}
 		}
 	}
+	rcu_read_unlock();
 
 	return 0;
 }
@@ -4464,7 +4491,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (memcg->id.id < 0)
 		goto fail;
 
-	memcg->vmstats_percpu = alloc_percpu(struct memcg_vmstats_percpu);
+	rcu_assign_pointer(memcg->vmstats_percpu,
+			   alloc_percpu(struct memcg_vmstats_percpu));
 	if (!memcg->vmstats_percpu)
 		goto fail;
 
@@ -6054,6 +6082,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	unsigned long nr_pages = ug->nr_anon + ug->nr_file + ug->nr_kmem;
 	unsigned long flags;
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
 
 	if (!mem_cgroup_is_root(ug->memcg)) {
 		page_counter_uncharge(&ug->memcg->memory, nr_pages);
@@ -6070,7 +6099,12 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 	__mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge);
 	__mod_memcg_state(ug->memcg, NR_SHMEM, -ug->nr_shmem);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
-	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, nr_pages);
+	rcu_read_lock();
+	vmstats_percpu = (struct memcg_vmstats_percpu __percpu *)
+		rcu_dereference(ug->memcg->vmstats_percpu);
+	if (likely(vmstats_percpu))
+		__this_cpu_add(vmstats_percpu->nr_page_events, nr_pages);
+	rcu_read_unlock();
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
-- 
2.20.1



* [PATCH 2/5] mm: prepare to premature release of per-node lruvec_stat_cpu
  2019-03-07 23:00 [PATCH 0/5] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
  2019-03-07 23:00 ` [PATCH 1/5] mm: prepare to premature release of memcg->vmstats_percpu Roman Gushchin
@ 2019-03-07 23:00 ` Roman Gushchin
  2019-03-11 17:17   ` Johannes Weiner
  2019-03-07 23:00 ` [PATCH 3/5] mm: release memcg percpu data prematurely Roman Gushchin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Roman Gushchin @ 2019-03-07 23:00 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

Similar to the memcg's vmstats_percpu, the per-memcg per-node stats
consist of percpu and atomic counterparts, and we do expect
that both coexist during the whole life-cycle of the memcg.

To prepare for a premature release of the percpu per-node data,
let's treat lruvec_stat_cpu as an RCU-protected pointer,
which can be NULL. This patch adds the corresponding checks wherever
required.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h | 21 +++++++++++++++------
 mm/memcontrol.c            | 11 +++++++++--
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 05ca77767c6a..8ac04632002a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -126,7 +126,7 @@ struct memcg_shrinker_map {
 struct mem_cgroup_per_node {
 	struct lruvec		lruvec;
 
-	struct lruvec_stat __percpu *lruvec_stat_cpu;
+	struct lruvec_stat __rcu /* __percpu */ *lruvec_stat_cpu;
 	atomic_long_t		lruvec_stat[NR_VM_NODE_STAT_ITEMS];
 
 	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
@@ -682,6 +682,7 @@ static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 static inline void __mod_lruvec_state(struct lruvec *lruvec,
 				      enum node_stat_item idx, int val)
 {
+	struct lruvec_stat __percpu *lruvec_stat_cpu;
 	struct mem_cgroup_per_node *pn;
 	long x;
 
@@ -697,12 +698,20 @@ static inline void __mod_lruvec_state(struct lruvec *lruvec,
 	__mod_memcg_state(pn->memcg, idx, val);
 
 	/* Update lruvec */
-	x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
-	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
-		atomic_long_add(x, &pn->lruvec_stat[idx]);
-		x = 0;
+	rcu_read_lock();
+	lruvec_stat_cpu = (struct lruvec_stat __percpu *)
+		rcu_dereference(pn->lruvec_stat_cpu);
+	if (likely(lruvec_stat_cpu)) {
+		x = val + __this_cpu_read(lruvec_stat_cpu->count[idx]);
+		if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
+			atomic_long_add(x, &pn->lruvec_stat[idx]);
+			x = 0;
+		}
+		__this_cpu_write(lruvec_stat_cpu->count[idx], x);
+	} else {
+		atomic_long_add(val, &pn->lruvec_stat[idx]);
 	}
-	__this_cpu_write(pn->lruvec_stat_cpu->count[idx], x);
+	rcu_read_unlock();
 }
 
 static inline void mod_lruvec_state(struct lruvec *lruvec,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 803c772f354b..8f3cac02221a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2122,6 +2122,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
 	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
+	struct lruvec_stat __percpu *lruvec_stat_cpu;
 	struct memcg_stock_pcp *stock;
 	struct mem_cgroup *memcg;
 
@@ -2152,7 +2153,12 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
 				struct mem_cgroup_per_node *pn;
 
 				pn = mem_cgroup_nodeinfo(memcg, nid);
-				x = this_cpu_xchg(pn->lruvec_stat_cpu->count[i], 0);
+
+				lruvec_stat_cpu = (struct lruvec_stat __percpu*)
+					rcu_dereference(pn->lruvec_stat_cpu);
+				if (!lruvec_stat_cpu)
+					continue;
+				x = this_cpu_xchg(lruvec_stat_cpu->count[i], 0);
 				if (x)
 					atomic_long_add(x, &pn->lruvec_stat[i]);
 			}
@@ -4430,7 +4436,8 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return 1;
 
-	pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
+	rcu_assign_pointer(pn->lruvec_stat_cpu,
+			   alloc_percpu(struct lruvec_stat));
 	if (!pn->lruvec_stat_cpu) {
 		kfree(pn);
 		return 1;
-- 
2.20.1



* [PATCH 3/5] mm: release memcg percpu data prematurely
  2019-03-07 23:00 [PATCH 0/5] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
  2019-03-07 23:00 ` [PATCH 1/5] mm: prepare to premature release of memcg->vmstats_percpu Roman Gushchin
  2019-03-07 23:00 ` [PATCH 2/5] mm: prepare to premature release of per-node lruvec_stat_cpu Roman Gushchin
@ 2019-03-07 23:00 ` Roman Gushchin
  2019-03-11 17:25   ` Johannes Weiner
  2019-03-07 23:00 ` [PATCH 4/5] mm: release per-node " Roman Gushchin
  2019-03-07 23:00 ` [PATCH 5/5] mm: spill memcg percpu stats and events before releasing Roman Gushchin
  4 siblings, 1 reply; 12+ messages in thread
From: Roman Gushchin @ 2019-03-07 23:00 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

To reduce the memory footprint of a dying memory cgroup, let's
release massive percpu data (vmstats_percpu) as early as possible,
and use atomic counterparts instead.

A dying cgroup can remain in the dying state for quite a long
time, being pinned in memory by any reference. For example,
if a page mlocked by some other cgroup is charged to the dying
cgroup, it won't go away until the page is released.

A dying memory cgroup can have some memory activity (e.g. dirty
pages can be flushed after cgroup removal), but in general it's
not expected to be very active in comparison to living cgroups.

So reducing the memory footprint by releasing percpu data
and switching over to atomics seems to be a good trade off.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h |  4 ++++
 mm/memcontrol.c            | 24 +++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8ac04632002a..569337514230 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -275,6 +275,10 @@ struct mem_cgroup {
 
 	/* memory.stat */
 	struct memcg_vmstats_percpu __rcu /* __percpu */ *vmstats_percpu;
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu_offlined;
+
+	/* used to release non-used percpu memory */
+	struct rcu_head rcu;
 
 	MEMCG_PADDING(_pad2_);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8f3cac02221a..8c55954e6f23 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4469,7 +4469,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 
 	for_each_node(node)
 		free_mem_cgroup_per_node_info(memcg, node);
-	free_percpu(memcg->vmstats_percpu);
+	WARN_ON_ONCE(memcg->vmstats_percpu != NULL);
 	kfree(memcg);
 }
 
@@ -4612,6 +4612,26 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	return 0;
 }
 
+static void mem_cgroup_free_percpu(struct rcu_head *rcu)
+{
+	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);
+
+	free_percpu(memcg->vmstats_percpu_offlined);
+	WARN_ON_ONCE(memcg->vmstats_percpu);
+
+	css_put(&memcg->css);
+}
+
+static void mem_cgroup_offline_percpu(struct mem_cgroup *memcg)
+{
+	memcg->vmstats_percpu_offlined = (struct memcg_vmstats_percpu __percpu*)
+		rcu_dereference(memcg->vmstats_percpu);
+	rcu_assign_pointer(memcg->vmstats_percpu, NULL);
+
+	css_get(&memcg->css);
+	call_rcu(&memcg->rcu, mem_cgroup_free_percpu);
+}
+
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
@@ -4638,6 +4658,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	drain_all_stock(memcg);
 
 	mem_cgroup_id_put(memcg);
+
+	mem_cgroup_offline_percpu(memcg);
 }
 
 static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
-- 
2.20.1



* [PATCH 4/5] mm: release per-node memcg percpu data prematurely
  2019-03-07 23:00 [PATCH 0/5] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
                   ` (2 preceding siblings ...)
  2019-03-07 23:00 ` [PATCH 3/5] mm: release memcg percpu data prematurely Roman Gushchin
@ 2019-03-07 23:00 ` Roman Gushchin
  2019-03-11 17:27   ` Johannes Weiner
  2019-03-07 23:00 ` [PATCH 5/5] mm: spill memcg percpu stats and events before releasing Roman Gushchin
  4 siblings, 1 reply; 12+ messages in thread
From: Roman Gushchin @ 2019-03-07 23:00 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

Similar to memcg-level statistics, per-node data isn't expected
to be hot after cgroup removal. Switching over to atomics and
prematurely releasing percpu data helps to reduce the memory
footprint of dying cgroups.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 24 +++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 569337514230..f296693d102b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -127,6 +127,7 @@ struct mem_cgroup_per_node {
 	struct lruvec		lruvec;
 
 	struct lruvec_stat __rcu /* __percpu */ *lruvec_stat_cpu;
+	struct lruvec_stat __percpu *lruvec_stat_cpu_offlined;
 	atomic_long_t		lruvec_stat[NR_VM_NODE_STAT_ITEMS];
 
 	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8c55954e6f23..18e863890392 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4459,7 +4459,7 @@ static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return;
 
-	free_percpu(pn->lruvec_stat_cpu);
+	WARN_ON_ONCE(pn->lruvec_stat_cpu != NULL);
 	kfree(pn);
 }
 
@@ -4615,7 +4615,17 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 static void mem_cgroup_free_percpu(struct rcu_head *rcu)
 {
 	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);
+	int node;
+
+	for_each_node(node) {
+		struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
 
+		if (!pn)
+			continue;
+
+		free_percpu(pn->lruvec_stat_cpu_offlined);
+		WARN_ON_ONCE(pn->lruvec_stat_cpu != NULL);
+	}
 	free_percpu(memcg->vmstats_percpu_offlined);
 	WARN_ON_ONCE(memcg->vmstats_percpu);
 
@@ -4624,6 +4634,18 @@ static void mem_cgroup_free_percpu(struct rcu_head *rcu)
 
 static void mem_cgroup_offline_percpu(struct mem_cgroup *memcg)
 {
+	int node;
+
+	for_each_node(node) {
+		struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
+
+		if (!pn)
+			continue;
+
+		pn->lruvec_stat_cpu_offlined = (struct lruvec_stat __percpu *)
+			rcu_dereference(pn->lruvec_stat_cpu);
+		rcu_assign_pointer(pn->lruvec_stat_cpu, NULL);
+	}
 	memcg->vmstats_percpu_offlined = (struct memcg_vmstats_percpu __percpu*)
 		rcu_dereference(memcg->vmstats_percpu);
 	rcu_assign_pointer(memcg->vmstats_percpu, NULL);
-- 
2.20.1



* [PATCH 5/5] mm: spill memcg percpu stats and events before releasing
  2019-03-07 23:00 [PATCH 0/5] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
                   ` (3 preceding siblings ...)
  2019-03-07 23:00 ` [PATCH 4/5] mm: release per-node " Roman Gushchin
@ 2019-03-07 23:00 ` Roman Gushchin
  2019-03-11 17:38   ` Johannes Weiner
  4 siblings, 1 reply; 12+ messages in thread
From: Roman Gushchin @ 2019-03-07 23:00 UTC (permalink / raw)
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

Spill percpu stats and events data into the corresponding atomic
counterparts before releasing the percpu memory.

Although per-cpu stats are never exactly precise, dropping them on
the floor regularly may lead to an accumulation of error. So, it's
safer to sync them before releasing.

To minimize the number of atomic updates, let's sum all stats/events
on all cpus locally, and then make a single update per entry.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 18e863890392..b7eb6fac735e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4612,11 +4612,63 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	return 0;
 }
 
+/*
+ * Spill all per-cpu stats and events into atomics.
+ * Try to minimize the number of atomic writes by gathering data from
+ * all cpus locally, and then make one atomic update.
+ * No locking is required, because no one has an access to
+ * the offlined percpu data.
+ */
+static void mem_cgroup_spill_offlined_percpu(struct mem_cgroup *memcg)
+{
+	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
+	struct lruvec_stat __percpu *lruvec_stat_cpu;
+	struct mem_cgroup_per_node *pn;
+	int cpu, i;
+	long x;
+
+	vmstats_percpu = memcg->vmstats_percpu_offlined;
+
+	for (i = 0; i < MEMCG_NR_STAT; i++) {
+		int nid;
+
+		x = 0;
+		for_each_possible_cpu(cpu)
+			x += per_cpu(vmstats_percpu->stat[i], cpu);
+		if (x)
+			atomic_long_add(x, &memcg->vmstats[i]);
+
+		if (i >= NR_VM_NODE_STAT_ITEMS)
+			continue;
+
+		for_each_node(nid) {
+			pn = mem_cgroup_nodeinfo(memcg, nid);
+			lruvec_stat_cpu = pn->lruvec_stat_cpu_offlined;
+
+			x = 0;
+			for_each_possible_cpu(cpu)
+				x += per_cpu(lruvec_stat_cpu->count[i], cpu);
+			if (x)
+				atomic_long_add(x, &pn->lruvec_stat[i]);
+		}
+	}
+
+	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
+		x = 0;
+		for_each_possible_cpu(cpu)
+			x += per_cpu(vmstats_percpu->events[i], cpu);
+		if (x)
+			atomic_long_add(x, &memcg->vmevents[i]);
+	}
+}
+
 static void mem_cgroup_free_percpu(struct rcu_head *rcu)
 {
 	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);
 	int node;
 
+	mem_cgroup_spill_offlined_percpu(memcg);
+
 	for_each_node(node) {
 		struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
 
-- 
2.20.1



* Re: [PATCH 1/5] mm: prepare to premature release of memcg->vmstats_percpu
  2019-03-07 23:00 ` [PATCH 1/5] mm: prepare to premature release of memcg->vmstats_percpu Roman Gushchin
@ 2019-03-11 17:14   ` Johannes Weiner
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2019-03-11 17:14 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, kernel-team, linux-kernel, Tejun Heo, Rik van Riel,
	Michal Hocko, Roman Gushchin

On Thu, Mar 07, 2019 at 03:00:29PM -0800, Roman Gushchin wrote:
> Prepare to handle premature release of memcg->vmstats_percpu data.
> Currently it's a generic pointer which is expected to be non-NULL
> during the whole lifetime of a memcg. Switch over to an
> RCU-protected pointer, and carefully check it for being non-NULL.
> 
> This change is a required step towards dynamic premature release
> of percpu memcg data.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 2/5] mm: prepare to premature release of per-node lruvec_stat_cpu
  2019-03-07 23:00 ` [PATCH 2/5] mm: prepare to premature release of per-node lruvec_stat_cpu Roman Gushchin
@ 2019-03-11 17:17   ` Johannes Weiner
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2019-03-11 17:17 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, kernel-team, linux-kernel, Tejun Heo, Rik van Riel,
	Michal Hocko, Roman Gushchin

On Thu, Mar 07, 2019 at 03:00:30PM -0800, Roman Gushchin wrote:
> Similar to the memcg's vmstats_percpu, the per-memcg per-node stats
> consist of percpu and atomic counterparts, and we do expect
> that both coexist during the whole life-cycle of the memcg.
> 
> To prepare for a premature release of the percpu per-node data,
> let's treat lruvec_stat_cpu as an RCU-protected pointer,
> which can be NULL. This patch adds the corresponding checks wherever
> required.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

> @@ -4430,7 +4436,8 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>  	if (!pn)
>  		return 1;
>  
> -	pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
> +	rcu_assign_pointer(pn->lruvec_stat_cpu,
> +			   alloc_percpu(struct lruvec_stat));
>  	if (!pn->lruvec_stat_cpu) {

Nitpick: wouldn't this have to use rcu_dereference()? Might be cleaner
to use an intermediate variable and only assign after the NULL check.
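
A minimal sketch of that cleaner shape (not part of the posted series; the
local variable name is arbitrary):

	struct lruvec_stat __percpu *lruvec_stat_cpu;

	lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
	if (!lruvec_stat_cpu) {
		kfree(pn);
		return 1;
	}
	rcu_assign_pointer(pn->lruvec_stat_cpu, lruvec_stat_cpu);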


* Re: [PATCH 3/5] mm: release memcg percpu data prematurely
  2019-03-07 23:00 ` [PATCH 3/5] mm: release memcg percpu data prematurely Roman Gushchin
@ 2019-03-11 17:25   ` Johannes Weiner
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2019-03-11 17:25 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, kernel-team, linux-kernel, Tejun Heo, Rik van Riel,
	Michal Hocko, Roman Gushchin

On Thu, Mar 07, 2019 at 03:00:31PM -0800, Roman Gushchin wrote:
> To reduce the memory footprint of a dying memory cgroup, let's
> release massive percpu data (vmstats_percpu) as early as possible,
> and use atomic counterparts instead.
> 
> A dying cgroup can remain in the dying state for quite a long
> time, being pinned in memory by any reference. For example,
> if a page mlocked by some other cgroup is charged to the dying
> cgroup, it won't go away until the page is released.
> 
> A dying memory cgroup can have some memory activity (e.g. dirty
> pages can be flushed after cgroup removal), but in general it's
> not expected to be very active in comparison to living cgroups.
> 
> So reducing the memory footprint by releasing percpu data
> and switching over to atomics seems to be a good trade off.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

One nitpick below:

> @@ -4612,6 +4612,26 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
>  	return 0;
>  }
>  
> +static void mem_cgroup_free_percpu(struct rcu_head *rcu)
> +{
> +	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);
> +
> +	free_percpu(memcg->vmstats_percpu_offlined);
> +	WARN_ON_ONCE(memcg->vmstats_percpu);
> +
> +	css_put(&memcg->css);

Nitpick: I had to double take seeing a "mem_cgroup_free_*" function
that does css_put(). We use "free" terminology (mem_cgroup_css_free,
memcg_free_kmem, memcg_free_shrinker_maps, mem_cgroup_free) from the
.css_free callback, which only happens when the last reference is put.

Can we go with something less ambiguous? We can add "rcu" and drop the
mem_cgroup prefix since it's narrowly scoped. "percpu_rcu_free"?
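
For illustration, the suggested rename would leave the body untouched and only
change the function name, roughly:

static void percpu_rcu_free(struct rcu_head *rcu)
{
	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);

	free_percpu(memcg->vmstats_percpu_offlined);
	WARN_ON_ONCE(memcg->vmstats_percpu);

	css_put(&memcg->css);
}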

> +static void mem_cgroup_offline_percpu(struct mem_cgroup *memcg)
> +{
> +	memcg->vmstats_percpu_offlined = (struct memcg_vmstats_percpu __percpu*)
> +		rcu_dereference(memcg->vmstats_percpu);
> +	rcu_assign_pointer(memcg->vmstats_percpu, NULL);
> +
> +	css_get(&memcg->css);
> +	call_rcu(&memcg->rcu, mem_cgroup_free_percpu);
> +}
> +
>  static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);


* Re: [PATCH 4/5] mm: release per-node memcg percpu data prematurely
  2019-03-07 23:00 ` [PATCH 4/5] mm: release per-node " Roman Gushchin
@ 2019-03-11 17:27   ` Johannes Weiner
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2019-03-11 17:27 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, kernel-team, linux-kernel, Tejun Heo, Rik van Riel,
	Michal Hocko, Roman Gushchin

On Thu, Mar 07, 2019 at 03:00:32PM -0800, Roman Gushchin wrote:
> Similar to memcg-level statistics, per-node data isn't expected
> to be hot after cgroup removal. Switching over to atomics and
> prematurely releasing percpu data helps to reduce the memory
> footprint of dying cgroups.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 5/5] mm: spill memcg percpu stats and events before releasing
  2019-03-07 23:00 ` [PATCH 5/5] mm: spill memcg percpu stats and events before releasing Roman Gushchin
@ 2019-03-11 17:38   ` Johannes Weiner
  2019-03-11 19:27     ` Roman Gushchin
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Weiner @ 2019-03-11 17:38 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, kernel-team, linux-kernel, Tejun Heo, Rik van Riel,
	Michal Hocko, Roman Gushchin

On Thu, Mar 07, 2019 at 03:00:33PM -0800, Roman Gushchin wrote:
> Spill percpu stats and events data into the corresponding atomic
> counterparts before releasing the percpu memory.
> 
> Although per-cpu stats are never exactly precise, dropping them on
> the floor regularly may lead to an accumulation of error. So, it's
> safer to sync them before releasing.
> 
> To minimize the number of atomic updates, let's sum all stats/events
> on all cpus locally, and then make a single update per entry.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> ---
>  mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 18e863890392..b7eb6fac735e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4612,11 +4612,63 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
>  	return 0;
>  }
>  
> +/*
> + * Spill all per-cpu stats and events into atomics.
> + * Try to minimize the number of atomic writes by gathering data from
> + * all cpus locally, and then make one atomic update.
> + * No locking is required, because no one has an access to
> + * the offlined percpu data.
> + */
> +static void mem_cgroup_spill_offlined_percpu(struct mem_cgroup *memcg)
> +{
> +	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
> +	struct lruvec_stat __percpu *lruvec_stat_cpu;
> +	struct mem_cgroup_per_node *pn;
> +	int cpu, i;
> +	long x;
> +
> +	vmstats_percpu = memcg->vmstats_percpu_offlined;
> +
> +	for (i = 0; i < MEMCG_NR_STAT; i++) {
> +		int nid;
> +
> +		x = 0;
> +		for_each_possible_cpu(cpu)
> +			x += per_cpu(vmstats_percpu->stat[i], cpu);
> +		if (x)
> +			atomic_long_add(x, &memcg->vmstats[i]);
> +
> +		if (i >= NR_VM_NODE_STAT_ITEMS)
> +			continue;
> +
> +		for_each_node(nid) {
> +			pn = mem_cgroup_nodeinfo(memcg, nid);
> +			lruvec_stat_cpu = pn->lruvec_stat_cpu_offlined;
> +
> +			x = 0;
> +			for_each_possible_cpu(cpu)
> +				x += per_cpu(lruvec_stat_cpu->count[i], cpu);
> +			if (x)
> +				atomic_long_add(x, &pn->lruvec_stat[i]);
> +		}
> +	}
> +
> +	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
> +		x = 0;
> +		for_each_possible_cpu(cpu)
> +			x += per_cpu(vmstats_percpu->events[i], cpu);
> +		if (x)
> +			atomic_long_add(x, &memcg->vmevents[i]);
> +	}

This looks good, but couldn't this be merged with the cpu offlining?
It seems to be exactly the same code, except for the nesting of the
for_each_possible_cpu() iteration here.

This could be a function that takes a CPU argument and then iterates
the cgroups and stat items to collect and spill the counters of that
specified CPU; offlining would call it once, and this spill code here
would call it for_each_possible_cpu().

We shouldn't need the atomicity of this_cpu_xchg() during hotunplug,
the scheduler isn't even active on that CPU anymore when it's called.


* Re: [PATCH 5/5] mm: spill memcg percpu stats and events before releasing
  2019-03-11 17:38   ` Johannes Weiner
@ 2019-03-11 19:27     ` Roman Gushchin
  0 siblings, 0 replies; 12+ messages in thread
From: Roman Gushchin @ 2019-03-11 19:27 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Roman Gushchin, linux-mm, Kernel Team, linux-kernel, Tejun Heo,
	Rik van Riel, Michal Hocko

On Mon, Mar 11, 2019 at 01:38:25PM -0400, Johannes Weiner wrote:
> On Thu, Mar 07, 2019 at 03:00:33PM -0800, Roman Gushchin wrote:
> > Spill percpu stats and events data into the corresponding atomic
> > counterparts before releasing the percpu memory.
> > 
> > Although per-cpu stats are never exactly precise, dropping them on
> > the floor regularly may lead to an accumulation of error. So, it's
> > safer to sync them before releasing.
> > 
> > To minimize the number of atomic updates, let's sum all stats/events
> > on all cpus locally, and then make a single update per entry.
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > ---
> >  mm/memcontrol.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 52 insertions(+)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 18e863890392..b7eb6fac735e 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -4612,11 +4612,63 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
> >  	return 0;
> >  }
> >  
> > +/*
> > + * Spill all per-cpu stats and events into atomics.
> > + * Try to minimize the number of atomic writes by gathering data from
> > + * all cpus locally, and then make one atomic update.
> > + * No locking is required, because no one has an access to
> > + * the offlined percpu data.
> > + */
> > +static void mem_cgroup_spill_offlined_percpu(struct mem_cgroup *memcg)
> > +{
> > +	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
> > +	struct lruvec_stat __percpu *lruvec_stat_cpu;
> > +	struct mem_cgroup_per_node *pn;
> > +	int cpu, i;
> > +	long x;
> > +
> > +	vmstats_percpu = memcg->vmstats_percpu_offlined;
> > +
> > +	for (i = 0; i < MEMCG_NR_STAT; i++) {
> > +		int nid;
> > +
> > +		x = 0;
> > +		for_each_possible_cpu(cpu)
> > +			x += per_cpu(vmstats_percpu->stat[i], cpu);
> > +		if (x)
> > +			atomic_long_add(x, &memcg->vmstats[i]);
> > +
> > +		if (i >= NR_VM_NODE_STAT_ITEMS)
> > +			continue;
> > +
> > +		for_each_node(nid) {
> > +			pn = mem_cgroup_nodeinfo(memcg, nid);
> > +			lruvec_stat_cpu = pn->lruvec_stat_cpu_offlined;
> > +
> > +			x = 0;
> > +			for_each_possible_cpu(cpu)
> > +				x += per_cpu(lruvec_stat_cpu->count[i], cpu);
> > +			if (x)
> > +				atomic_long_add(x, &pn->lruvec_stat[i]);
> > +		}
> > +	}
> > +
> > +	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
> > +		x = 0;
> > +		for_each_possible_cpu(cpu)
> > +			x += per_cpu(vmstats_percpu->events[i], cpu);
> > +		if (x)
> > +			atomic_long_add(x, &memcg->vmevents[i]);
> > +	}
> 
> This looks good, but couldn't this be merged with the cpu offlining?
> It seems to be exactly the same code, except for the nesting of the
> for_each_possible_cpu() iteration here.
> 
> This could be a function that takes a CPU argument and then iterates
> the cgroups and stat items to collect and spill the counters of that
> specified CPU; offlining would call it once, and this spill code here
> would call it for_each_possible_cpu().
> 
> We shouldn't need the atomicity of this_cpu_xchg() during hotunplug,
> the scheduler isn't even active on that CPU anymore when it's called.

Good point!
I initially tried to adapt the cpu offlining code, but it didn't work
well: the code became too complex and ugly. But the opposite can be done
easily: mem_cgroup_spill_offlined_percpu() can take a cpumask,
and the cpu offlining code will look like:
	for_each_mem_cgroup(memcg)
		mem_cgroup_spill_offlined_percpu(memcg, cpumask);
I'll prepare a separate patch.

Thank you!
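
A rough sketch of the cpumask-based variant described above might look like
the following (this code was not posted in the thread; the function name
follows Roman's description, only the memcg-level counters are shown, and the
question of which percpu pointer the hotplug path would read from is left
open):

static void mem_cgroup_spill_offlined_percpu(struct mem_cgroup *memcg,
					     const struct cpumask *cpumask)
{
	struct memcg_vmstats_percpu __percpu *vmstats_percpu;
	int cpu, i;
	long x;

	vmstats_percpu = memcg->vmstats_percpu_offlined;

	/* fold the percpu stats of the given cpus into the atomic counters */
	for (i = 0; i < MEMCG_NR_STAT; i++) {
		x = 0;
		for_each_cpu(cpu, cpumask)
			x += per_cpu(vmstats_percpu->stat[i], cpu);
		if (x)
			atomic_long_add(x, &memcg->vmstats[i]);

		/*
		 * The per-node lruvec counters would be folded here as well,
		 * mirroring the nested loop in the posted patch.
		 */
	}

	/* same for the event counters */
	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
		x = 0;
		for_each_cpu(cpu, cpumask)
			x += per_cpu(vmstats_percpu->events[i], cpu);
		if (x)
			atomic_long_add(x, &memcg->vmevents[i]);
	}
}

The memcg offlining path would then call it with cpu_possible_mask, while the
cpu hotunplug path would call it for each memcg with cpumask_of(cpu), as in
the snippet above, once the function is taught to also operate on the
still-live percpu data.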


