From: Mina Almasry <almasrymina@google.com>
To: mike.kravetz@oracle.com
Cc: shuah@kernel.org, almasrymina@google.com, rientjes@google.com,
	shakeelb@google.com, gthelen@google.com,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: [PATCH v12 9/9] hugetlb_cgroup: Add hugetlb_cgroup reservation docs
Date: Tue, 11 Feb 2020 13:31:28 -0800
Message-ID: <20200211213128.73302-9-almasrymina@google.com>
In-Reply-To: <20200211213128.73302-1-almasrymina@google.com>

Add docs on how to use hugetlb_cgroup reservations and how they behave.

Signed-off-by: Mina Almasry <almasrymina@google.com>

---

Changes in v11:
- Changed resv.* to rsvd.*
Changes in v10:
- Clarify reparenting behavior.
- Reword benefits of reservation limits.
Changes in v6:
- Updated docs to reflect the new design based on a new counter that
tracks both reservations and faults.
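
For illustration only (not part of the patch): a minimal sketch of the
fallback pattern that reservation limits make possible, as described in
section 2 of the docs below. It assumes Linux with glibc; the names
struct buf and alloc_buf() are hypothetical. Because a mapping made without
MAP_NORESERVE is reserved at mmap() time, hitting the rsvd limit should
surface as a failed mmap() rather than a later SIGBUS, so the caller can
fall back to ordinary memory:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical result type for this sketch. */
struct buf {
	void *addr;	/* NULL if both mapping attempts failed */
	size_t len;	/* length passed to mmap() */
	bool huge;	/* true if backed by a HugeTLB mapping */
};

/*
 * Hypothetical helper: try a reserved HugeTLB mapping first and fall
 * back to ordinary anonymous memory if the reservation cannot be made.
 */
static struct buf alloc_buf(size_t len)
{
	struct buf b = { NULL, len, false };
	void *p;

	assert(len > 0);

	/*
	 * No MAP_NORESERVE here, so huge pages are reserved at mmap()
	 * time. A reservation limit (or an empty pool) is expected to
	 * show up as MAP_FAILED at this point.
	 */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p != MAP_FAILED) {
		b.addr = p;
		b.huge = true;
		return b;
	}

	/* Fallback: non-HugeTLB anonymous memory. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p != MAP_FAILED)
		b.addr = p;
	return b;
}
```

A caller would check b.addr for NULL and release the memory with
munmap(b.addr, b.len) when done; for the HugeTLB case, len should be a
multiple of the huge page size.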

---
 .../admin-guide/cgroup-v1/hugetlb.rst         | 103 ++++++++++++++++--
 1 file changed, 92 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/hugetlb.rst b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
index a3902aa253a96..338f2c7d7a1cd 100644
--- a/Documentation/admin-guide/cgroup-v1/hugetlb.rst
+++ b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
@@ -2,13 +2,6 @@
 HugeTLB Controller
 ==================

-The HugeTLB controller allows to limit the HugeTLB usage per control group and
-enforces the controller limit during page fault. Since HugeTLB doesn't
-support page reclaim, enforcing the limit at page fault time implies that,
-the application will get SIGBUS signal if it tries to access HugeTLB pages
-beyond its limit. This requires the application to know beforehand how much
-HugeTLB pages it would require for its use.
-
 HugeTLB controller can be created by first mounting the cgroup filesystem.

 # mount -t cgroup -o hugetlb none /sys/fs/cgroup
@@ -28,10 +21,14 @@ process (bash) into it.

 Brief summary of control files::

- hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
- hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
- hugetlb.<hugepagesize>.usage_in_bytes     # show current usage for "hugepagesize" hugetlb
- hugetlb.<hugepagesize>.failcnt		   # show the number of allocation failure due to HugeTLB limit
+ hugetlb.<hugepagesize>.rsvd.limit_in_bytes            # set/show limit of "hugepagesize" hugetlb reservations
+ hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes        # show max "hugepagesize" hugetlb reservations and no-reserve faults
+ hugetlb.<hugepagesize>.rsvd.usage_in_bytes            # show current reservations and no-reserve faults for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.rsvd.failcnt                   # show the number of allocation failures due to the HugeTLB reservation limit
+ hugetlb.<hugepagesize>.limit_in_bytes                 # set/show limit of "hugepagesize" hugetlb faults
+ hugetlb.<hugepagesize>.max_usage_in_bytes             # show max "hugepagesize" hugetlb usage recorded
+ hugetlb.<hugepagesize>.usage_in_bytes                 # show current usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.failcnt                        # show the number of allocation failures due to the HugeTLB usage limit

 For a system supporting three hugepage sizes (64k, 32M and 1G), the control
 files include::
@@ -40,11 +37,95 @@ files include::
   hugetlb.1GB.max_usage_in_bytes
   hugetlb.1GB.usage_in_bytes
   hugetlb.1GB.failcnt
+  hugetlb.1GB.rsvd.limit_in_bytes
+  hugetlb.1GB.rsvd.max_usage_in_bytes
+  hugetlb.1GB.rsvd.usage_in_bytes
+  hugetlb.1GB.rsvd.failcnt
   hugetlb.64KB.limit_in_bytes
   hugetlb.64KB.max_usage_in_bytes
   hugetlb.64KB.usage_in_bytes
   hugetlb.64KB.failcnt
+  hugetlb.64KB.rsvd.limit_in_bytes
+  hugetlb.64KB.rsvd.max_usage_in_bytes
+  hugetlb.64KB.rsvd.usage_in_bytes
+  hugetlb.64KB.rsvd.failcnt
   hugetlb.32MB.limit_in_bytes
   hugetlb.32MB.max_usage_in_bytes
   hugetlb.32MB.usage_in_bytes
   hugetlb.32MB.failcnt
+  hugetlb.32MB.rsvd.limit_in_bytes
+  hugetlb.32MB.rsvd.max_usage_in_bytes
+  hugetlb.32MB.rsvd.usage_in_bytes
+  hugetlb.32MB.rsvd.failcnt
+
+
+1. Page fault accounting
+
+hugetlb.<hugepagesize>.limit_in_bytes
+hugetlb.<hugepagesize>.max_usage_in_bytes
+hugetlb.<hugepagesize>.usage_in_bytes
+hugetlb.<hugepagesize>.failcnt
+
+The HugeTLB controller allows users to limit HugeTLB usage (page faults) per
+control group and enforces the limit at page fault time. Since HugeTLB
+doesn't support page reclaim, enforcing the limit at page fault time implies
+that the application will get a SIGBUS signal if it tries to fault in HugeTLB
+pages beyond its limit. Therefore the application needs to know beforehand
+exactly how many HugeTLB pages it uses, and the sysadmin needs to make sure
+that enough pages are available on the machine for all users, so that no
+process gets SIGBUS.
+
+
+2. Reservation accounting
+
+hugetlb.<hugepagesize>.rsvd.limit_in_bytes
+hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
+hugetlb.<hugepagesize>.rsvd.usage_in_bytes
+hugetlb.<hugepagesize>.rsvd.failcnt
+
+The HugeTLB controller allows users to limit HugeTLB reservations per control
+group and enforces the limit at reservation time and at the fault of HugeTLB
+memory for which no reservation exists. Since reservation limits are enforced
+at reservation time (on mmap or shmget), they never cause the application to
+get a SIGBUS signal if the memory was reserved beforehand. For MAP_NORESERVE
+allocations, the reservation limit behaves the same as the fault limit: it
+enforces memory usage at fault time and causes the application to receive a
+SIGBUS if it crosses its limit.
+
+Reservation limits are superior to the page fault limits described above, since
+reservation limits are enforced at reservation time (on mmap or shmget) and
+never cause the application to get a SIGBUS signal if the memory was reserved
+beforehand. This allows for easier fallback to alternatives such as
+non-HugeTLB memory. In the case of page fault accounting, it's very hard to
+avoid processes getting SIGBUS, since the sysadmin needs to know precisely the
+HugeTLB usage of all the tasks in the system and make sure there are enough
+pages to satisfy all requests. Avoiding tasks getting SIGBUS on overcommitted
+systems is practically impossible with page fault accounting.
+
+
+3. Caveats with shared memory
+
+For shared HugeTLB memory, both HugeTLB reservations and page faults are charged
+to the first task that causes the memory to be reserved or faulted, and all
+subsequent uses of this reserved or faulted memory are done without charging.
+
+Shared HugeTLB memory is only uncharged when it is unreserved or deallocated.
+This is usually when the HugeTLB file is deleted, and not when the task that
+caused the reservation or fault has exited.
+
+
+4. Caveats with HugeTLB cgroup offline
+
+When a HugeTLB cgroup goes offline with some reservations or faults still
+charged to it, the behavior is as follows:
+
+- The fault charges are moved to the parent HugeTLB cgroup (reparented).
+- The reservation charges remain on the offline HugeTLB cgroup.
+
+This means that if a HugeTLB cgroup gets offlined while there are still HugeTLB
+reservations charged to it, that cgroup persists as a zombie until all HugeTLB
+reservations are uncharged. HugeTLB reservations behave in this manner to match
+the memory controller, whose cgroups also persist as zombies until all charged
+memory is uncharged. Also, the tracking of HugeTLB reservations is a bit more
+complex compared to the tracking of HugeTLB faults, so it is significantly
+harder to reparent reservations at offline time.
--
2.25.0.225.g125e21ebc7-goog
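
As a further illustration, not part of the patch: a small sketch of how a
monitoring tool might read the new rsvd counters. The helper names
rsvd_path() and read_counter() and the cgroup directory in the usage note
are hypothetical; only the file naming scheme
hugetlb.<hugepagesize>.rsvd.<field> comes from the docs above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<cgroup_dir>/hugetlb.<size>.rsvd.<field>".
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int rsvd_path(char *out, size_t outlen, const char *cgroup_dir,
		     const char *size, const char *field)
{
	int n;

	assert(out && cgroup_dir && size && field);

	n = snprintf(out, outlen, "%s/hugetlb.%s.rsvd.%s",
		     cgroup_dir, size, field);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/*
 * Hypothetical helper: read one unsigned decimal counter from a file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_counter(const char *path, unsigned long long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

For example, rsvd_path(buf, sizeof(buf), "/sys/fs/cgroup/g1", "1GB",
"usage_in_bytes") followed by read_counter(buf, &val) would read the current
reservation usage for 1GB pages, assuming a cgroup named g1 exists under
that mount point.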
