linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>, <linux-mm@kvack.org>,
	<kernel-team@fb.com>, <linux-kernel@vger.kernel.org>,
	Roman Gushchin <guro@fb.com>
Subject: [PATCH v3 00/19] The new cgroup slab memory controller
Date: Wed, 22 Apr 2020 13:46:49 -0700	[thread overview]
Message-ID: <20200422204708.2176080-1-guro@fb.com> (raw)

This is a third version of the slab cgroup controller rework.

The patchset moves the accounting from the page level to the object
level. It allows to share slab pages between memory cgroups.
This leads to a significant win in the slab utilization (up to 45%)
and the corresponding drop in the total kernel memory footprint.
The reduced number of unmovable slab pages should also have a positive
effect on the memory fragmentation.

The patchset makes the slab accounting code simpler: there is no more
need in the complicated dynamic creation and destruction of per-cgroup
slab caches, all memory cgroups use a global set of shared slab caches.
The lifetime of slab caches is not more connected to the lifetime
of memory cgroups.

The more precise accounting does require more CPU, however in practice
the difference seems to be negligible. We've been using the new slab
controller in Facebook production for several months with different
workloads and haven't seen any noticeable regressions. What we've seen
were memory savings in order of 1 GB per host (it varied heavily depending
on the actual workload, size of RAM, number of CPUs, memory pressure, etc).

The third version of the patchset added yet another step towards
the simplification of the code: sharing of slab caches between
accounted and non-accounted allocations. It comes with significant
upsides (most noticeable, a complete elimination of dynamic slab caches
creation) but not without some regression risks, so this change sits
on top of the patchset and is not completely merged in. So in the unlikely
event of a noticeable performance regression it can be reverted separately.

v3:
  1) added a patch that switches to a global single set of kmem_caches
  2) kmem API clean up dropped, because if has been already merged
  3) byte-sized slab vmstat API over page-sized global counters and
     bytes-sized memcg/lruvec counters
  3) obj_cgroup refcounting simplifications and other minor fixes
  4) other minor changes

v2:
  1) implemented re-layering and renaming suggested by Johannes,
     added his patch to the set. Thanks!
  2) fixed the issue discovered by Bharata B Rao. Thanks!
  3) added kmem API clean up part
  4) added slab/memcg follow-up clean up part
  5) fixed a couple of issues discovered by internal testing on FB fleet.
  6) added kselftests
  7) included metadata into the charge calculation
  8) refreshed commit logs, regrouped patches, rebased onto mm tree, etc

v1:
  1) fixed a bug in zoneinfo_show_print()
  2) added some comments to the subpage charging API, a minor fix
  3) separated memory.kmem.slabinfo deprecation into a separate patch,
     provided a drgn-based replacement
  4) rebased on top of the current mm tree

RFC:
  https://lwn.net/Articles/798605/


Johannes Weiner (1):
  mm: memcontrol: decouple reference counting from page accounting

Roman Gushchin (18):
  mm: memcg: factor out memcg- and lruvec-level changes out of
    __mod_lruvec_state()
  mm: memcg: prepare for byte-sized vmstat items
  mm: memcg: convert vmstat slab counters to bytes
  mm: slub: implement SLUB version of obj_to_index()
  mm: memcg/slab: obj_cgroup API
  mm: memcg/slab: allocate obj_cgroups for non-root slab pages
  mm: memcg/slab: save obj_cgroup for non-root slab objects
  mm: memcg/slab: charge individual slab objects instead of pages
  mm: memcg/slab: deprecate memory.kmem.slabinfo
  mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h
  mm: memcg/slab: use a single set of kmem_caches for all accounted
    allocations
  mm: memcg/slab: simplify memcg cache creation
  mm: memcg/slab: deprecate memcg_kmem_get_cache()
  mm: memcg/slab: deprecate slab_root_caches
  mm: memcg/slab: remove redundant check in memcg_accumulate_slabinfo()
  mm: memcg/slab: use a single set of kmem_caches for all allocations
  kselftests: cgroup: add kernel memory accounting tests
  tools/cgroup: add memcg_slabinfo.py tool

 drivers/base/node.c                        |   6 +-
 fs/proc/meminfo.c                          |   4 +-
 include/linux/memcontrol.h                 |  80 ++-
 include/linux/mm_types.h                   |   5 +-
 include/linux/mmzone.h                     |  19 +-
 include/linux/slab.h                       |   5 -
 include/linux/slab_def.h                   |   8 +-
 include/linux/slub_def.h                   |  20 +-
 include/linux/vmstat.h                     |  16 +-
 kernel/power/snapshot.c                    |   2 +-
 mm/memcontrol.c                            | 569 ++++++++++--------
 mm/oom_kill.c                              |   2 +-
 mm/page_alloc.c                            |   8 +-
 mm/slab.c                                  |  39 +-
 mm/slab.h                                  | 365 +++++-------
 mm/slab_common.c                           | 643 +--------------------
 mm/slob.c                                  |  12 +-
 mm/slub.c                                  | 183 +-----
 mm/vmscan.c                                |   3 +-
 mm/vmstat.c                                |  33 +-
 mm/workingset.c                            |   6 +-
 tools/cgroup/memcg_slabinfo.py             | 226 ++++++++
 tools/testing/selftests/cgroup/.gitignore  |   1 +
 tools/testing/selftests/cgroup/Makefile    |   2 +
 tools/testing/selftests/cgroup/test_kmem.c | 382 ++++++++++++
 25 files changed, 1322 insertions(+), 1317 deletions(-)
 create mode 100755 tools/cgroup/memcg_slabinfo.py
 create mode 100644 tools/testing/selftests/cgroup/test_kmem.c

-- 
2.25.3


             reply	other threads:[~2020-04-22 20:48 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-22 20:46 Roman Gushchin [this message]
2020-04-22 20:46 ` [PATCH v3 01/19] mm: memcg: factor out memcg- and lruvec-level changes out of __mod_lruvec_state() Roman Gushchin
2020-05-07 20:33   ` Johannes Weiner
2020-05-20 10:49   ` Vlastimil Babka
2020-04-22 20:46 ` [PATCH v3 02/19] mm: memcg: prepare for byte-sized vmstat items Roman Gushchin
2020-05-07 20:34   ` Johannes Weiner
2020-05-20 11:31   ` Vlastimil Babka
2020-05-20 11:36     ` Vlastimil Babka
2020-04-22 20:46 ` [PATCH v3 03/19] mm: memcg: convert vmstat slab counters to bytes Roman Gushchin
2020-05-07 20:41   ` Johannes Weiner
2020-05-20 12:25   ` Vlastimil Babka
2020-05-20 19:26     ` Roman Gushchin
2020-05-21  9:57       ` Vlastimil Babka
2020-05-21 21:14         ` Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 04/19] mm: slub: implement SLUB version of obj_to_index() Roman Gushchin
2020-04-22 23:52   ` Christopher Lameter
2020-04-23  0:05     ` Roman Gushchin
2020-04-25  2:10       ` Christopher Lameter
2020-04-25  2:46         ` Roman Gushchin
2020-04-27 16:21           ` Christopher Lameter
2020-04-27 16:46             ` Roman Gushchin
2020-04-28 17:06               ` Roman Gushchin
2020-04-28 17:45               ` Johannes Weiner
2020-04-30 16:29               ` Christopher Lameter
2020-04-30 17:15                 ` Roman Gushchin
2020-05-02 23:54                   ` Christopher Lameter
2020-05-04 18:29                     ` Roman Gushchin
2020-05-08 21:35                       ` Christopher Lameter
2020-05-13  0:57                         ` Roman Gushchin
2020-05-15 21:45                           ` Christopher Lameter
2020-05-15 22:12                             ` Roman Gushchin
2020-05-20  9:51                           ` Vlastimil Babka
2020-05-20 20:57                             ` Roman Gushchin
2020-05-15 20:02                         ` Roman Gushchin
2020-04-23 21:01     ` Roman Gushchin
2020-04-25  2:10       ` Christopher Lameter
2020-05-20 13:51   ` Vlastimil Babka
2020-05-20 21:00     ` Roman Gushchin
2020-05-21 11:01       ` Vlastimil Babka
2020-05-21 21:06         ` Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 05/19] mm: memcontrol: decouple reference counting from page accounting Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 06/19] mm: memcg/slab: obj_cgroup API Roman Gushchin
2020-05-07 21:03   ` Johannes Weiner
2020-05-07 22:26     ` Roman Gushchin
2020-05-12 22:56       ` Johannes Weiner
2020-05-15 22:01         ` Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 07/19] mm: memcg/slab: allocate obj_cgroups for non-root slab pages Roman Gushchin
2020-04-23 20:20   ` Roman Gushchin
2020-05-22 18:27   ` Vlastimil Babka
2020-05-23  1:32     ` Roman Gushchin
2020-05-26 17:50     ` Roman Gushchin
2020-05-25 14:46   ` Vlastimil Babka
2020-04-22 20:46 ` [PATCH v3 08/19] mm: memcg/slab: save obj_cgroup for non-root slab objects Roman Gushchin
2020-05-25 15:07   ` Vlastimil Babka
2020-05-26 17:53     ` Roman Gushchin
2020-05-27 11:03       ` Vlastimil Babka
2020-04-22 20:46 ` [PATCH v3 09/19] mm: memcg/slab: charge individual slab objects instead of pages Roman Gushchin
2020-05-25 16:10   ` Vlastimil Babka
2020-05-26 18:04     ` Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 10/19] mm: memcg/slab: deprecate memory.kmem.slabinfo Roman Gushchin
2020-05-07 21:05   ` Johannes Weiner
2020-04-22 20:47 ` [PATCH v3 11/19] mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h Roman Gushchin
2020-05-25 17:03   ` Vlastimil Babka
2020-04-22 20:47 ` [PATCH v3 12/19] mm: memcg/slab: use a single set of kmem_caches for all accounted allocations Roman Gushchin
2020-05-26 10:12   ` Vlastimil Babka
2020-04-22 20:47 ` [PATCH v3 13/19] mm: memcg/slab: simplify memcg cache creation Roman Gushchin
2020-05-26 10:31   ` Vlastimil Babka
2020-04-22 20:47 ` [PATCH v3 14/19] mm: memcg/slab: deprecate memcg_kmem_get_cache() Roman Gushchin
2020-05-26 10:34   ` Vlastimil Babka
2020-04-22 20:47 ` [PATCH v3 15/19] mm: memcg/slab: deprecate slab_root_caches Roman Gushchin
2020-05-26 10:52   ` Vlastimil Babka
2020-05-26 18:50     ` Roman Gushchin
2020-04-22 20:47 ` [PATCH v3 16/19] mm: memcg/slab: remove redundant check in memcg_accumulate_slabinfo() Roman Gushchin
2020-05-26 11:31   ` Vlastimil Babka
2020-04-22 20:47 ` [PATCH v3 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations Roman Gushchin
2020-05-26 14:55   ` Vlastimil Babka
2020-05-27  8:35     ` Jesper Dangaard Brouer
2020-04-22 20:47 ` [PATCH v3 18/19] kselftests: cgroup: add kernel memory accounting tests Roman Gushchin
2020-05-26 15:24   ` Vlastimil Babka
2020-05-26 15:45     ` Roman Gushchin
2020-05-27 17:00       ` Vlastimil Babka
2020-05-27 20:45         ` Roman Gushchin
2020-04-22 20:47 ` [PATCH v3 19/19] tools/cgroup: add memcg_slabinfo.py tool Roman Gushchin
2020-05-05 15:59   ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200422204708.2176080-1-guro@fb.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --subject='Re: [PATCH v3 00/19] The new cgroup slab memory controller' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox