All of lore.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org, Waiman Long <longman@redhat.com>
Subject: [RFC PATCH 8/8] memcg: Add over-high action prctl() documentation
Date: Mon, 17 Aug 2020 10:08:31 -0400	[thread overview]
Message-ID: <20200817140831.30260-9-longman@redhat.com> (raw)
In-Reply-To: <20200817140831.30260-1-longman@redhat.com>

A new memcontrol.rst documentation file is added to document the new
prctl(2) interface for setting the over-high mitigation action parameters
and retrieving them.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/userspace-api/index.rst      |   1 +
 Documentation/userspace-api/memcontrol.rst | 174 +++++++++++++++++++++
 2 files changed, 175 insertions(+)
 create mode 100644 Documentation/userspace-api/memcontrol.rst

diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 69fc5167e648..1c0fc7a7f4ec 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -23,6 +23,7 @@ place where this information is gathered.
    accelerators/ocxl
    ioctl/index
    media/index
+   memcontrol
 
 .. only::  subproject and html
 
diff --git a/Documentation/userspace-api/memcontrol.rst b/Documentation/userspace-api/memcontrol.rst
new file mode 100644
index 000000000000..0cfcc72ad5f0
--- /dev/null
+++ b/Documentation/userspace-api/memcontrol.rst
@@ -0,0 +1,174 @@
+==============
+Memory Control
+==============
+
+Memory controller can be used to control and limit the amount of
+physical memory used by a task. When a limit is set in "memory.high" in
+a v2 non-root memory cgroup, the memory controller will try to reclaim
+memory if the limit has been exceeded. Normally, that will be enough
+to keep the physical memory consumption of tasks in the memory cgroup
+to be around or below the "memory.high" limit.
+
+Sometimes, memory reclaim may not be able to recover memory in a rate
+that can catch up to the physical memory allocation rate. In this case,
+the physical memory consumption will keep on increasing.  For memory
+cgroup v2, when it is reaching "memory.max" or the system is running
+out of free memory, the OOM killer will be invoked to kill some tasks
+to free up additional memory. However, one has little control of which
+tasks are going to be killed by an OOM killer. Killing tasks that hold
+some important resources without freeing them first can create other
+system problems.
+
+Users who do not want the OOM killer to be invoked to kill random
+tasks in an out-of-memory situation can use the memory control facility
+provided by :manpage:`prctl(2)` to better manage the mitigation action
+that needs to be performed to an individual task when the specified
+memory limit is exceeded with memory cgroup v2 being used.
+
+The task to be controlled must be running in a non-root memory cgroup
+as no limit will be imposed on tasks running in the root memory cgroup.
+
+There are two prctl commands related to this:
+
+ * PR_SET_MEMCONTROL
+
+ * PR_GET_MEMCONTROL
+
+
+PR_SET_MEMCONTROL
+-----------------
+
+PR_SET_MEMCTROL controls what action should be taken when the memory
+limit is exceeded.
+
+The arg2 of :manpage:`prctl(2)` sets the desired mitigation action. The
+action code consists of three different parts:
+
+ * Bits 0-7: action command
+
+ * Bits 8-15: signal number
+
+ * Bits 16-31: flags
+
+The currently supported action commands are:
+
+====== ================== ================================================
+Value  Define             Description
+====== ================== ================================================
+0      PR_MEMACT_NONE     Use the default memory cgroup behavior
+1      PR_MEMACT_ENOMEM   Return ENOMEM for selected syscalls that try to
+                          allocate more memory when the preset memory limit
+                          is exceeded
+2      PR_MEMACT_SLOWDOWN Slow down the process for memory reclaim to
+                          catch up when memory limit is exceeded
+3      PR_MEMACT_SIGNAL   Send a signal to the task that has exceeded
+                          preset memory limit
+4      PR_MEMACT_KILL     Kill the task that has exceeded preset memory
+                          limit
+====== ================== ================================================
+
+The currently supports flags are:
+
+====== ==================== ================================================
+Value  Define               Description
+====== ==================== ================================================
+0x01   PR_MEMFLAG_SIGCONT   Send a signal on every allocation request instead
+                            of a one-shot signal
+0x02   PR_MEMFLAG_DIRECT    Check per-task memory limit irrespective of cgroup
+                            setting
+0x04   PR_MEMFLAG_LOG       Log any actions taken to the kernel ring buffer
+0x10   PR_MEMFLAG_RSS_ANON  Check process anonymous memory
+0x20   PR_MEMFLAG_RSS_FILE  Check process page caches
+0x40   PR_MEMFLAG_RSS_SHMEM Check process shared memory
+0x70   PR_MEMFLAG_RSS       Equivalent to (PR_MEMFLAG_RSS_ANON |
+                            PR_MEMFLAG_RSS_FILE | PR_MEMFLAG_RSS_SHMEM)
+====== ==================== ================================================
+
+If the action command is PR_MEMACT_SIGNAL, bits 16-23 of the action
+code contains the signal number to be used when the memory limit is
+exceeded. By default, the signal number is reset after delivery so
+that the signal will be delivered only once. Another PR_SET_MEMCONTROL
+command will have to be issued to set the signal again.  If the user
+want a non-fatal signal to be delivered every time when the memory
+limit is breached without doing another PR_SET_MEMCONTROL call, the
+PR_MEMFLAG_SIGCONT flag can be set.
+
+The arg3 of :manpage:`prctl(2)` sets the additional memory cgroup
+limit that will be added to the value specified in the "memory.high"
+control file to get the real limit over which action specified in the
+action command will be triggered. This is to make sure that mitigation
+action will only be taken when the kernel memory reclaim facility fails
+to limit the growth of physical memory usage.
+
+If any of the PR_MEMFLAG_RSS* flag is specified, arg4 contains the
+per-process memory limit that will be used to compare against the sum
+of the specified RSS memory consumption of the process to determine
+if action will be taken provided that overall memory consumption has
+exceeded the "memory.high" + arg3 limit when the PR_MEMFLAG_DIRECT flag
+isn't set.
+
+If the PR_MEMFLAG_DIRECT flag is set, however, the cgroup memory limit
+is ignored and a memory-over-limit check will be performed on each
+memory allocation request, if applicable. This is reserved for special
+use case and is not recommended for general use.
+
+
+PR_GET_MEMCONTROL
+-----------------
+
+PR_GET_MEMCONTROL returns the parameters set by a previous
+PR_SET_MEMCONTROL command.
+
+The arg2 of :manpage:`prctl(2)` sets type of parameter that is to be
+returned. The possible values are:
+
+====== =================== ================================================
+Value  Define              Description
+====== =================== ================================================
+0      PR_MEMGET_ACTION    Return the action code - command, flags & signal
+1      PR_MEMGET_CLIMIT    Return the additional cgroup memory limit (in bytes)
+2      PR_MEMGET_PLIMIT    Return the process memory limit for PR_MEMFLAG_RSS*
+====== =================== ================================================
+
+
+/proc/<pid>/memctl
+------------------
+
+PR_GET_MEMCONTROL only returns memory control setting about the
+task itself. To find those information about other tasks, the
+/proc/<pid>/memctl file can be read. This file reports three integer
+parameters:
+
+ * action code
+
+ * cgroup additional memory limit
+
+ * process memory limit for PR_MEMFLAG_RSS* flags
+
+These are the same values that will be returned if the task is
+calling :manpage:`prctl(2)` with PR_GET_MEMCONTROL command and the
+PR_MEMGET_ACTION, PR_MEMGET_CLIMIT and PR_MEMGET_PLIMIT arguments
+respectively.
+
+Privileged users can also write to the memctl file directly to modify
+those parameters for a given task.
+
+This procfs file is present for each of the running threads of a process.
+So multiple writes to each of them are needed to update the parameters
+for all the threads within a running process.
+
+Affected Syscalls
+-----------------
+
+The following system calls have additional check for the over-high
+memory usage flag that is set by the above memory control facility.
+
+ * :manpage:`brk(2)`
+
+ * :manpage:`mlock(2)`
+
+ * :manpage:`mlock2(2)`
+
+ * :manpage:`mlockall(2)`
+
+ * :manpage:`mmap(2)`
-- 
2.18.1


  parent reply	other threads:[~2020-08-17 14:10 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-17 14:08 [RFC PATCH 0/8] memcg: Enable fine-grained per process memory control Waiman Long
2020-08-17 14:08 ` [RFC PATCH 1/8] memcg: Enable fine-grained control of over memory.high action Waiman Long
2020-08-17 14:30   ` Chris Down
2020-08-17 15:38     ` Waiman Long
2020-08-17 16:11       ` Chris Down
2020-08-17 16:44   ` Shakeel Butt
2020-08-17 16:44     ` Shakeel Butt
2020-08-17 16:56     ` Chris Down
2020-08-18 19:12       ` Waiman Long
2020-08-18 19:14     ` Waiman Long
2020-08-17 14:08 ` [RFC PATCH 2/8] memcg, mm: Return ENOMEM or delay if memcg_over_limit Waiman Long
2020-08-17 14:08 ` [RFC PATCH 3/8] memcg: Allow the use of task RSS memory as over-high action trigger Waiman Long
2020-08-17 14:08   ` Waiman Long
2020-08-17 14:08 ` [RFC PATCH 4/8] fs/proc: Support a new procfs memctl file Waiman Long
2020-08-17 20:10   ` kernel test robot
2020-08-17 20:10   ` [RFC PATCH] fs/proc: proc_memctl_operations can be static kernel test robot
2020-08-17 14:08 ` [RFC PATCH 5/8] memcg: Allow direct per-task memory limit checking Waiman Long
2020-08-17 14:08 ` [RFC PATCH 6/8] memcg: Introduce additional memory control slowdown if needed Waiman Long
2020-08-17 14:08 ` [RFC PATCH 7/8] memcg: Enable logging of memory control mitigation action Waiman Long
2020-08-17 14:08   ` Waiman Long
2020-08-17 14:08 ` Waiman Long [this message]
2020-08-17 15:26 ` [RFC PATCH 0/8] memcg: Enable fine-grained per process memory control Michal Hocko
2020-08-17 15:55   ` Waiman Long
2020-08-17 15:55     ` Waiman Long
2020-08-17 19:26     ` Michal Hocko
2020-08-18 19:20       ` Waiman Long
2020-08-18 19:20         ` Waiman Long
2020-08-18  9:14 ` peterz
2020-08-18  9:14   ` peterz-wEGCiKHe2LqWVfeAwA7xHQ
2020-08-18  9:26   ` Michal Hocko
2020-08-18  9:26     ` Michal Hocko
2020-08-18  9:59     ` peterz
2020-08-18  9:59       ` peterz-wEGCiKHe2LqWVfeAwA7xHQ
2020-08-18 10:05       ` Michal Hocko
2020-08-18 10:18         ` peterz
2020-08-18 10:30           ` Michal Hocko
2020-08-18 10:36             ` peterz
2020-08-18 13:49           ` Johannes Weiner
2020-08-18 13:49             ` Johannes Weiner
2020-08-21 19:37             ` Peter Zijlstra
2020-08-21 19:37               ` Peter Zijlstra
2020-08-24 16:58               ` Johannes Weiner
2020-09-07 11:47                 ` Chris Down
2020-09-09 11:53                 ` Michal Hocko
2020-08-18 10:17       ` Chris Down
2020-08-18 10:17         ` Chris Down
2020-08-18 10:26         ` peterz
2020-08-18 10:35           ` Chris Down
2020-08-23  2:49         ` Waiman Long
2020-08-23  2:49           ` Waiman Long
2020-08-18  9:27   ` Chris Down
2020-08-18  9:27     ` Chris Down
2020-08-18 10:04     ` peterz
2020-08-18 12:55       ` Matthew Wilcox
2020-08-18 12:55         ` Matthew Wilcox
2020-08-20  6:11         ` Dave Chinner
2020-08-18 19:30     ` Waiman Long
2020-08-18 19:27   ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200817140831.30260-9-longman@redhat.com \
    --to=longman@redhat.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=hannes@cmpxchg.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=vdavydov.dev@gmail.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.