From: Glauber Costa <glommer@parallels.com>
To: linux-kernel@vger.kernel.org
Cc: paul@paulmenage.org, lizf@cn.fujitsu.com,
kamezawa.hiroyu@jp.fujitsu.com, ebiederm@xmission.com,
davem@davemloft.net, gthelen@google.com, netdev@vger.kernel.org,
linux-mm@kvack.org, Glauber Costa <glommer@parallels.com>
Subject: [PATCH v2 1/7] Basic kernel memory functionality for the Memory Controller
Date: Wed, 14 Sep 2011 22:46:09 -0300
Message-ID: <1316051175-17780-2-git-send-email-glommer@parallels.com>
In-Reply-To: <1316051175-17780-1-git-send-email-glommer@parallels.com>
This patch lays down the foundation for the kernel memory component
of the Memory Controller.

As of today, I am only laying down the following files:

 * memory.independent_kmem_limit
 * memory.kmem.limit_in_bytes (currently ignored)
 * memory.kmem.usage_in_bytes (always zero)

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Paul Menage <paul@paulmenage.org>
CC: Greg Thelen <gthelen@google.com>
---
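Note for reviewers: the change to mem_cgroup_usage() folds the kmem counter
into the reported usage unless memory.independent_kmem_limit is set. Below is
a minimal user-space sketch of that rule for the non-root case. The struct
memcg_model and model_usage() names are hypothetical stand-ins, not kernel
symbols, and plain integers replace the res_counter reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the counters kept in struct mem_cgroup. */
struct memcg_model {
	uint64_t res_usage;	/* user memory usage */
	uint64_t memsw_usage;	/* memory+swap usage */
	uint64_t kmem_usage;	/* kernel memory usage */
	bool kmem_independent;	/* value of memory.independent_kmem_limit */
};

/*
 * Models the non-root branch of mem_cgroup_usage() after this patch:
 * kernel memory is added to the reported usage only when the kmem
 * limit is not independent of the user limit.
 */
uint64_t model_usage(const struct memcg_model *m, bool swap)
{
	uint64_t val = 0;

	if (!m->kmem_independent)
		val = m->kmem_usage;

	val += swap ? m->memsw_usage : m->res_usage;
	return val;
}
```

Since memory.kmem.usage_in_bytes is always zero at this point in the series,
both modes report the same number in practice until later patches start
charging the kmem counter.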
Documentation/cgroups/memory.txt | 29 ++++++++++-
init/Kconfig | 11 ++++
mm/memcontrol.c | 105 ++++++++++++++++++++++++++++++++++++-
3 files changed, 140 insertions(+), 5 deletions(-)
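
The kmemaccount= boot parameter added at the end of mm/memcontrol.c only
recognizes "1" and "0". A small user-space sketch of that behaviour follows,
assuming the default of 1 that CONFIG_CGROUP_MEM_RES_CTLR_KMEM gives
do_kmem_account. The model_ names are hypothetical and exist only for
illustration.

```c
#include <string.h>

/* Hypothetical stand-in for do_kmem_account; accounting defaults to on. */
int model_do_kmem_account = 1;

/*
 * Models disable_kmem_account(): "1" enables accounting, "0" disables it,
 * and any other string leaves the current value untouched.
 * Returns 1, which is how a __setup() handler reports the option as handled.
 */
int model_parse_kmemaccount(const char *s)
{
	if (!strcmp(s, "1"))
		model_do_kmem_account = 1;
	else if (!strcmp(s, "0"))
		model_do_kmem_account = 0;
	return 1;
}
```

With accounting switched off this way, register_kmem_files() returns early and
none of the memory.kmem.* files are created.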
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 6f3c598..ca58eff 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -44,8 +44,9 @@ Features:
- oom-killer disable knob and oom-notifier
- Root cgroup has no limit controls.
- Kernel memory and Hugepages are not under control yet. We just manage
- pages on LRU. To add more controls, we have to take care of performance.
+ Hugepages are not under control yet. We just manage pages on LRU. To add more
+ controls, we have to take care of performance. Kernel memory support is work
+ in progress, and the current version provides only basic functionality.
Brief summary of control files.
@@ -56,8 +57,11 @@ Brief summary of control files.
(See 5.5 for details)
memory.memsw.usage_in_bytes # show current res_counter usage for memory+Swap
(See 5.5 for details)
+ memory.kmem.usage_in_bytes # show current res_counter usage for kmem only.
+ (See 2.7 for details)
memory.limit_in_bytes # set/show limit of memory usage
memory.memsw.limit_in_bytes # set/show limit of memory+Swap usage
+ memory.kmem.limit_in_bytes # if allowed, set/show limit of kernel memory
memory.failcnt # show the number of memory usage hits limits
memory.memsw.failcnt # show the number of memory+Swap hits limits
memory.max_usage_in_bytes # show max memory usage recorded
@@ -72,6 +76,9 @@ Brief summary of control files.
memory.oom_control # set/show oom controls.
memory.numa_stat # show the number of memory usage per numa node
+ memory.independent_kmem_limit # select whether or not kernel memory limits are
+ independent of user limits
+
1. History
The memory controller has a long history. A request for comments for the memory
@@ -255,6 +262,24 @@ When oom event notifier is registered, event will be delivered.
per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
zone->lru_lock, it has no lock of its own.
+2.7 Kernel Memory Extension (CONFIG_CGROUP_MEM_RES_CTLR_KMEM)
+
+ With the Kernel memory extension, the Memory Controller is able to limit
+the amount of kernel memory used by the system. Kernel memory is fundamentally
+different from user memory, since it can't be swapped out, which makes it
+possible to DoS the system by consuming too much of this precious resource.
+
+Memory limits as specified by the standard Memory Controller may or may not
+take kernel memory into consideration. This is achieved through the file
+memory.independent_kmem_limit. A value other than 0 allows kernel memory
+to be controlled separately.
+
+When kernel memory limits are not independent, the limit values set in
+memory.kmem files are ignored.
+
+Currently no soft limit is implemented for kernel memory. It is future work
+to trigger slab reclaim when those limits are reached.
+
3. User Interface
0. Configuration
diff --git a/init/Kconfig b/init/Kconfig
index d627783..49e5839 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -689,6 +689,17 @@ config CGROUP_MEM_RES_CTLR_SWAP_ENABLED
For those who want to have the feature enabled by default should
select this option (if, for some reason, they need to disable it
then swapaccount=0 does the trick).
+config CGROUP_MEM_RES_CTLR_KMEM
+ bool "Memory Resource Controller Kernel Memory accounting"
+ depends on CGROUP_MEM_RES_CTLR
+ default y
+ help
+ The Kernel Memory extension for Memory Resource Controller can limit
+ the amount of memory used by kernel objects in the system. Those are
+ fundamentally different from the entities handled by the standard
+ Memory Controller, which are page-based, and can be swapped. Users of
+ the kmem extension can use it to guarantee that no group of processes
+ will ever exhaust kernel resources alone.
config CGROUP_PERF
bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ebd1e86..1c5d01a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -73,7 +73,11 @@ static int really_do_swap_account __initdata = 0;
#define do_swap_account (0)
#endif
-
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
+int do_kmem_account __read_mostly = 1;
+#else
+#define do_kmem_account (0)
+#endif
/*
* Statistics for memory cgroup.
*/
@@ -270,6 +274,10 @@ struct mem_cgroup {
*/
struct res_counter memsw;
/*
+ * the counter to account for kmem usage.
+ */
+ struct res_counter kmem;
+ /*
* Per cgroup active and inactive list, similar to the
* per zone LRU lists.
*/
@@ -321,6 +329,11 @@ struct mem_cgroup {
*/
unsigned long move_charge_at_immigrate;
/*
+ * Should kernel memory limits be established independently
+ * from user memory?
+ */
+ int kmem_independent;
+ /*
* percpu counter.
*/
struct mem_cgroup_stat_cpu *stat;
@@ -391,6 +404,7 @@ enum charge_type {
#define _MEM (0)
#define _MEMSWAP (1)
#define _OOM_TYPE (2)
+#define _KMEM (3)
#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val))
#define MEMFILE_TYPE(val) (((val) >> 16) & 0xffff)
#define MEMFILE_ATTR(val) ((val) & 0xffff)
@@ -3941,12 +3955,18 @@ static unsigned long mem_cgroup_recursive_stat(struct mem_cgroup *mem,
static inline u64 mem_cgroup_usage(struct mem_cgroup *mem, bool swap)
{
u64 val;
+ u64 kmem = 0;
+
+ if (!mem->kmem_independent)
+ kmem = res_counter_read_u64(&mem->kmem, RES_USAGE);
if (!mem_cgroup_is_root(mem)) {
if (!swap)
- return res_counter_read_u64(&mem->res, RES_USAGE);
+ kmem += res_counter_read_u64(&mem->res, RES_USAGE);
else
- return res_counter_read_u64(&mem->memsw, RES_USAGE);
+ kmem += res_counter_read_u64(&mem->memsw, RES_USAGE);
+
+ return kmem;
}
val = mem_cgroup_recursive_stat(mem, MEM_CGROUP_STAT_CACHE);
@@ -3979,6 +3999,10 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
else
val = res_counter_read_u64(&mem->memsw, name);
break;
+ case _KMEM:
+ val = res_counter_read_u64(&mem->kmem, name);
+ break;
+
default:
BUG();
break;
@@ -4756,6 +4780,21 @@ static int mem_cgroup_reset_vmscan_stat(struct cgroup *cgrp,
return 0;
}
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
+static u64 kmem_limit_independent_read(struct cgroup *cont, struct cftype *cft)
+{
+ return mem_cgroup_from_cont(cont)->kmem_independent;
+}
+
+static int kmem_limit_independent_write(struct cgroup *cont, struct cftype *cft,
+ u64 val)
+{
+ cgroup_lock();
+ mem_cgroup_from_cont(cont)->kmem_independent = !!val;
+ cgroup_unlock();
+ return 0;
+}
+#endif
static struct cftype mem_cgroup_files[] = {
{
@@ -4877,6 +4916,46 @@ static int register_memsw_files(struct cgroup *cont, struct cgroup_subsys *ss)
}
#endif
+
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
+static struct cftype kmem_cgroup_files[] = {
+ {
+ .name = "independent_kmem_limit",
+ .read_u64 = kmem_limit_independent_read,
+ .write_u64 = kmem_limit_independent_write,
+ },
+ {
+ .name = "kmem.usage_in_bytes",
+ .private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
+ .read_u64 = mem_cgroup_read,
+ .register_event = mem_cgroup_usage_register_event,
+ .unregister_event = mem_cgroup_usage_unregister_event,
+ },
+ {
+ .name = "kmem.limit_in_bytes",
+ .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
+ .read_u64 = mem_cgroup_read,
+ .register_event = mem_cgroup_usage_register_event,
+ .unregister_event = mem_cgroup_usage_unregister_event,
+ },
+};
+
+static int register_kmem_files(struct cgroup *cont, struct cgroup_subsys *ss)
+{
+ if (!do_kmem_account)
+ return 0;
+
+ return cgroup_add_files(cont, ss, kmem_cgroup_files,
+ ARRAY_SIZE(kmem_cgroup_files));
+}
+
+#else
+static int register_kmem_files(struct cgroup *cont, struct cgroup_subsys *ss)
+{
+ return 0;
+}
+#endif
+
static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
{
struct mem_cgroup_per_node *pn;
@@ -5075,6 +5154,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
if (parent && parent->use_hierarchy) {
res_counter_init(&mem->res, &parent->res);
res_counter_init(&mem->memsw, &parent->memsw);
+ res_counter_init(&mem->kmem, &parent->kmem);
/*
* We increment refcnt of the parent to ensure that we can
* safely access it on res_counter_charge/uncharge.
@@ -5085,6 +5165,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
} else {
res_counter_init(&mem->res, NULL);
res_counter_init(&mem->memsw, NULL);
+ res_counter_init(&mem->kmem, NULL);
}
mem->last_scanned_child = 0;
mem->last_scanned_node = MAX_NUMNODES;
@@ -5129,6 +5210,10 @@ static int mem_cgroup_populate(struct cgroup_subsys *ss,
if (!ret)
ret = register_memsw_files(cont, ss);
+
+ if (!ret)
+ ret = register_kmem_files(cont, ss);
+
return ret;
}
@@ -5665,3 +5750,17 @@ static int __init enable_swap_account(char *s)
__setup("swapaccount=", enable_swap_account);
#endif
+
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
+static int __init disable_kmem_account(char *s)
+{
+ /* "1" enables, "0" disables; any other value keeps the default */
+ if (!strcmp(s, "1"))
+ do_kmem_account = 1;
+ else if (!strcmp(s, "0"))
+ do_kmem_account = 0;
+ return 1;
+}
+__setup("kmemaccount=", disable_kmem_account);
+
+#endif
--
1.7.6