linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups
From: Roman Gushchin @ 2019-03-12 22:33 UTC
  To: linux-mm, kernel-team
  Cc: linux-kernel, Tejun Heo, Rik van Riel, Johannes Weiner,
	Michal Hocko, Roman Gushchin

A cgroup can remain in the dying state for a long time because any kernel
object can pin it in memory. It can be pinned by a page shared with another
cgroup (e.g. mlocked by a process in the other cgroup), by a vfs cache
object, etc.

Mostly because of percpu data, a memcg structure takes up a lot of kernel
memory. Depending on the machine size and the kernel config, it can easily
reach hundreds of kilobytes per cgroup.

Depending on the memory pressure and the reclaim approach (which is a separate
topic), several hundred (if not a few thousand) dying cgroups appears to be a
typical number. On a moderately sized machine their overall memory footprint
is measured in hundreds of megabytes.

So if we can't completely get rid of dying cgroups, let's make them smaller.
This patchset aims to reduce the size of a dying memory cgroup by releasing
percpu data prematurely during cgroup removal and using atomic counterparts
instead. Currently it covers the per-memcg vmstats_percpu and the per-memcg
per-node lruvec_stat_cpu. The same approach can be applied to other percpu
data.

Results on my test machine (32 CPUs, single node):

  With the patchset:              Originally:

  nr_dying_descendants 0
  Slab:              66640 kB	  Slab:              67644 kB
  Percpu:             6912 kB	  Percpu:             6912 kB

  nr_dying_descendants 1000
  Slab:              85912 kB	  Slab:              84704 kB
  Percpu:            26880 kB	  Percpu:            64128 kB

So one dying cgroup went from roughly 75 kB to 39 kB, which is almost half
the size. The difference will be even bigger on a bigger machine
(especially with NUMA).
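
The per-cgroup figures can be rechecked from the table above. The snippet
below is only a sketch of that arithmetic: it assumes the footprint of one
dying cgroup is the growth in Slab plus Percpu between the two snapshots,
divided by the 1000 dying cgroups. The helper name per_cgroup_kb is made up
for this illustration, and all numbers are copied from the table, not
measured again.

```shell
# Number of dying cgroups in the second snapshot (from the table above).
cgroups=1000

# per_cgroup_kb BASE_SLAB BASE_PERCPU DYING_SLAB DYING_PERCPU
# Hypothetical helper: prints the average Slab + Percpu growth per dying
# cgroup in whole kB (shell integer division truncates).
per_cgroup_kb() {
    echo $(( ( ($3 - $1) + ($4 - $2) ) / cgroups ))
}

patched=$(per_cgroup_kb 66640 6912 85912 26880)
original=$(per_cgroup_kb 67644 6912 84704 64128)

echo "with the patchset: ${patched} kB per dying cgroup"
echo "originally:        ${original} kB per dying cgroup"
```

With integer division this gives 39 kB and 74 kB, in line with the rounded
figures quoted above. Most of the saving comes from the Percpu column; Slab
grows slightly with the patchset.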

To test the patchset, I used the following script:
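  CG=/sys/fs/cgroup/percpu_test

  # Create a parent cgroup and enable the memory controller for its children.
  mkdir ${CG}
  echo "+memory" > ${CG}/cgroup.subtree_control

  # Baseline, taken before any child cgroup is created and removed.
  grep nr_dying_descendants ${CG}/cgroup.stat
  grep -e Percpu -e Slab /proc/meminfo

  for i in $(seq 1 1000); do
      mkdir ${CG}/${i}
      # Move this shell into the new cgroup so the file written below is
      # charged to it.
      echo $$ > ${CG}/${i}/cgroup.procs
      dd if=/dev/urandom of=/tmp/test-${i} count=1 2> /dev/null
      # Move back to the root cgroup so the now empty cgroup can be removed.
      echo $$ > /sys/fs/cgroup/cgroup.procs
      rmdir ${CG}/${i}
  done

  # Same counters after 1000 cgroups have been removed.
  grep nr_dying_descendants /sys/fs/cgroup/cgroup.stat
  grep -e Percpu -e Slab /proc/meminfo

  rmdir ${CG}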


v2:
  - several renamings suggested by Johannes Weiner
  - added a patch that merges the cpu offlining and percpu flush code


Roman Gushchin (6):
  mm: prepare to premature release of memcg->vmstats_percpu
  mm: prepare to premature release of per-node lruvec_stat_cpu
  mm: release memcg percpu data prematurely
  mm: release per-node memcg percpu data prematurely
  mm: flush memcg percpu stats and events before releasing
  mm: refactor memcg_hotplug_cpu_dead() to use
    memcg_flush_offline_percpu()

 include/linux/memcontrol.h |  66 ++++++++++----
 mm/memcontrol.c            | 179 ++++++++++++++++++++++++++++---------
 2 files changed, 186 insertions(+), 59 deletions(-)

-- 
2.20.1



Thread overview: 12+ messages
2019-03-12 22:33 [PATCH v2 0/6] mm: reduce the memory footprint of dying memory cgroups Roman Gushchin
2019-03-12 22:33 ` [PATCH v2 1/6] mm: prepare to premature release of memcg->vmstats_percpu Roman Gushchin
2019-03-12 22:33 ` [PATCH v2 2/6] mm: prepare to premature release of per-node lruvec_stat_cpu Roman Gushchin
2019-03-12 22:34 ` [PATCH v2 3/6] mm: release memcg percpu data prematurely Roman Gushchin
2019-03-12 22:34 ` [PATCH v2 4/6] mm: release per-node " Roman Gushchin
2019-03-12 22:34 ` [PATCH v2 5/6] mm: flush memcg percpu stats and events before releasing Roman Gushchin
2019-03-13 16:00   ` Johannes Weiner
2019-03-13 18:23     ` Roman Gushchin
2019-03-12 22:34 ` [PATCH 5/5] mm: spill " Roman Gushchin
2019-03-12 22:34 ` [PATCH v2 6/6] mm: refactor memcg_hotplug_cpu_dead() to use memcg_flush_offline_percpu() Roman Gushchin
2019-03-13 16:07   ` Johannes Weiner
2019-03-13 18:23     ` Roman Gushchin
