linux-mm.kvack.org archive mirror
From: Nhat Pham <nphamcs@gmail.com>
To: akpm@linux-foundation.org
Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com,
	shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com,
	fvdl@google.com, linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: [PATCH v4 0/4] hugetlb memcg accounting
Date: Fri,  6 Oct 2023 11:46:25 -0700
Message-ID: <20231006184629.155543-1-nphamcs@gmail.com>

Changelog:
v4:
	* Add another prep patch to clean up memory controller migration
	  logic.
	* Fix an issue in hugetlb folio migration where the new folio
	  is not properly charged (patch 3) (reported by Mike Kravetz)
	  (suggested by Johannes Weiner).
v3:
	* Add a prep patch at the start of the series to extend the memory
	  controller interface with new helper functions for hugetlb
	  accounting.
	* Do not account hugetlb memory in the memory controller in cgroup v1
	  (patch 2) (suggested by Johannes Weiner).
	* Change the gfp flag passed to mem cgroup charging (patch 2)
	  (suggested by Michal Hocko).
	* Add caveats to cgroup admin guide and commit changelog
	  (patch 2) (suggested by Michal Hocko).
v2:
	* Add a cgroup mount option to enable/disable the new hugetlb memcg
	  accounting behavior (patch 1) (suggested by Johannes Weiner).
	* Add a couple more ksft_print_msg() calls on error to aid debugging
	  when the selftest fails (patch 2).

Currently, hugetlb memory usage is not accounted for in the memory
controller, which could lead to memory overprotection for cgroups with
hugetlb-backed memory. This has been observed in our production system.

For instance, here is one of our use cases: suppose there are two 32G
containers. The machine is booted with hugetlb_cma=6G, and each
container may or may not use up to 3 gigantic pages, depending on the
workload within it. The rest is anon, cache, slab, etc. We can set the
hugetlb cgroup limit of each cgroup to 3G to enforce hugetlb fairness.
But it is very difficult to configure memory.max to keep overall
consumption, including anon, cache, slab, etc., fair.
 
What we have had to resort to is constantly polling hugetlb usage and
readjusting memory.max. A similar procedure is applied to other memory
limits (e.g. memory.low). However, this is rather cumbersome and buggy.
Furthermore, when the memory limit correction is delayed (e.g. when
hugetlb usage changes between consecutive runs of the userspace agent),
the system can be left in an over- or underprotected state.
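
The lag can be illustrated with a toy simulation. This is a minimal
sketch, assuming an agent that wakes every poll_interval ticks and sets
memory.max = budget - hugetlb usage; mislimited_ticks() and the tick
model are hypothetical, not the production agent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/*
 * Hypothetical model of the polling agent: every `poll_interval` ticks it
 * reads hugetlb usage and sets memory.max = budget - hugetlb. Returns the
 * number of ticks during which the effective total limit (memory.max plus
 * the hugetlb actually held) differs from `budget`.
 */
static size_t mislimited_ticks(const uint64_t *hugetlb, size_t nticks,
                               size_t poll_interval, uint64_t budget)
{
    uint64_t memory_max = budget;
    size_t bad = 0;

    assert(poll_interval > 0);
    for (size_t t = 0; t < nticks; t++) {
        assert(hugetlb[t] <= budget);
        if (t % poll_interval == 0)
            memory_max = budget - hugetlb[t]; /* agent runs and corrects */
        if (memory_max + hugetlb[t] != budget)
            bad++; /* stale limit: over- or underprotected this tick */
    }
    return bad;
}
```

For a usage trace of 0, 0, 3G, 3G, 3G, 0, 0, 0 with a 32G budget and a
poll every 4 ticks, the container is mis-limited for 5 of 8 ticks (35G
allowed while the new pages go unnoticed, then 29G after they are
released). In this model only polling on every tick removes the error;
on a real system usage can change between any two polls.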

This patch series rectifies this issue by charging the memcg when the
hugetlb folio is allocated, and uncharging when the folio is freed. In
addition, a new selftest is added to demonstrate and verify this new
behavior.
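
A userspace toy model of the charge-on-allocate / uncharge-on-free
lifecycle described above. struct toy_memcg, struct toy_folio and the
toy_* functions are hypothetical stand-ins, not the kernel's struct
mem_cgroup or the helpers this series adds to mm/memcontrol.c; limits
are counted in 4K base pages, so a 1G gigantic page is 262144 pages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup: usage and limit, in base pages. */
struct toy_memcg {
    unsigned long usage;
    unsigned long max;
};

/* Toy stand-in for a hugetlb folio that remembers who was charged. */
struct toy_folio {
    unsigned long nr_pages;
    struct toy_memcg *memcg; /* NULL once uncharged */
};

/* Charge nr_pages to the memcg, failing if that would exceed its limit. */
static int toy_charge(struct toy_memcg *memcg, unsigned long nr_pages)
{
    if (memcg->usage + nr_pages > memcg->max)
        return -ENOMEM;
    memcg->usage += nr_pages;
    return 0;
}

/* Allocation path: charge first, and only then hand out the folio. */
static int toy_hugetlb_alloc(struct toy_memcg *memcg, struct toy_folio *folio,
                             unsigned long nr_pages)
{
    int err = toy_charge(memcg, nr_pages);

    if (err)
        return err;
    folio->nr_pages = nr_pages;
    folio->memcg = memcg;
    return 0;
}

/* Free path: uncharge whichever memcg the folio was charged to. */
static void toy_hugetlb_free(struct toy_folio *folio)
{
    assert(folio->memcg && folio->memcg->usage >= folio->nr_pages);
    folio->memcg->usage -= folio->nr_pages;
    folio->memcg = NULL;
}
```

In this model, charging before handing out the folio means a failed
charge leaves nothing to roll back, and recording the charged memcg in
the folio lets the free path uncharge the right cgroup without
consulting the freeing task.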

Nhat Pham (4):
  memcontrol: add helpers for hugetlb memcg accounting
  memcontrol: only transfer the memcg data for migration
  hugetlb: memcg: account hugetlb-backed memory in memory controller
  selftests: add a selftest to verify hugetlb usage in memcg

 Documentation/admin-guide/cgroup-v2.rst       |  29 +++
 MAINTAINERS                                   |   2 +
 include/linux/cgroup-defs.h                   |   5 +
 include/linux/memcontrol.h                    |  37 +++
 kernel/cgroup/cgroup.c                        |  15 +-
 mm/filemap.c                                  |   2 +-
 mm/hugetlb.c                                  |  35 ++-
 mm/memcontrol.c                               | 139 +++++++++--
 mm/migrate.c                                  |   3 +-
 tools/testing/selftests/cgroup/.gitignore     |   1 +
 tools/testing/selftests/cgroup/Makefile       |   2 +
 .../selftests/cgroup/test_hugetlb_memcg.c     | 234 ++++++++++++++++++
 12 files changed, 478 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/cgroup/test_hugetlb_memcg.c

-- 
2.34.1
