From: Glauber Costa <glommer@parallels.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: <linux-kernel@vger.kernel.org>, <cgroups@vger.kernel.org>,
<kamezawa.hiroyu@jp.fujitsu.com>, <devel@openvz.org>,
Tejun Heo <tj@kernel.org>, <linux-mm@kvack.org>,
Suleiman Souhlal <suleiman@google.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
Mel Gorman <mgorman@suse.de>,
David Rientjes <rientjes@google.com>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v3 04/13] kmem accounting basic infrastructure
Date: Wed, 26 Sep 2012 18:33:10 +0400
Message-ID: <50631226.9050304@parallels.com>
In-Reply-To: <20120926140347.GD15801@dhcp22.suse.cz>
On 09/26/2012 06:03 PM, Michal Hocko wrote:
> On Tue 18-09-12 18:04:01, Glauber Costa wrote:
>> This patch adds the basic infrastructure for the accounting of the slab
>> caches. To control that, the following files are created:
>>
>> * memory.kmem.usage_in_bytes
>> * memory.kmem.limit_in_bytes
>> * memory.kmem.failcnt
>> * memory.kmem.max_usage_in_bytes
>>
>> They have the same meaning as their user memory counterparts. They
>> reflect the state of the "kmem" res_counter.
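The four files above map onto the fields of a res_counter. As a rough illustration only (a simplified userspace model, not the kernel's res_counter implementation; `toy_counter` and the `toy_*` helpers are hypothetical names), a charge either fits under the limit or fails and bumps the failure count, and the two resettable statistics can be cleared:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the values behind memory.kmem.* (not struct res_counter). */
struct toy_counter {
	uint64_t usage;     /* memory.kmem.usage_in_bytes */
	uint64_t max_usage; /* memory.kmem.max_usage_in_bytes */
	uint64_t limit;     /* memory.kmem.limit_in_bytes */
	uint64_t failcnt;   /* memory.kmem.failcnt */
};

/* Charge 'bytes'; refuse and count a failure if the limit would be exceeded. */
static bool toy_charge(struct toy_counter *c, uint64_t bytes)
{
	assert(c->usage <= c->limit); /* model invariant: usage never exceeds limit */
	if (bytes > c->limit - c->usage) {
		c->failcnt++;
		return false;
	}
	c->usage += bytes;
	if (c->usage > c->max_usage)
		c->max_usage = c->usage; /* high-water mark */
	return true;
}

static void toy_uncharge(struct toy_counter *c, uint64_t bytes)
{
	assert(c->usage >= bytes); /* never uncharge more than was charged */
	c->usage -= bytes;
}

/* Writes to max_usage_in_bytes / failcnt trigger a reset (cf. mem_cgroup_reset).
 * Assumption: max usage is reset to the current usage, not to zero. */
static void toy_reset_max(struct toy_counter *c) { c->max_usage = c->usage; }
static void toy_reset_failcnt(struct toy_counter *c) { c->failcnt = 0; }
```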
>
>> The code is not enabled until a limit is set.
>
> "Per cgroup slab memory accounting is not enabled until a limit is set
> for the group. Once the limit is set the accounting cannot be disabled
> for such a group."
>
> Better?
>
>> This can be tested by the flag "kmem_accounted".
>
> Sounds as if it could be done from userspace (because you were talking
> about a user interface) which it cannot and we do not see it in this
> patch because it is not used anywhere. So please be more specific.
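Since kmem_accounted is an internal flag that only changes as a side effect of writing memory.kmem.limit_in_bytes, a minimal sketch of that one-way switch may make the point concrete. It is modeled on the _KMEM branch of mem_cgroup_write() in the diff below; `toy_memcg`, `toy_write_kmem_limit` and `TOY_RESOURCE_MAX` are hypothetical stand-ins, and the failure path of res_counter_set_limit() is left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for RESOURCE_MAX ("no limit"); value chosen for this model only. */
#define TOY_RESOURCE_MAX UINT64_MAX

struct toy_memcg {
	uint64_t kmem_limit;
	bool kmem_accounted; /* internal flag, not visible from userspace */
};

/* Store the new limit, then switch accounting on the first time a real
 * (non-RESOURCE_MAX) limit is written. Nothing ever sets it back to false. */
static void toy_write_kmem_limit(struct toy_memcg *memcg, uint64_t val)
{
	memcg->kmem_limit = val;
	if (!memcg->kmem_accounted && val != TOY_RESOURCE_MAX)
		memcg->kmem_accounted = true;
}
```

Writing the "unlimited" value on a fresh group leaves accounting off; any real limit turns it on for good, even if the limit is later raised back to the maximum.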
>
>> This means that after the patch is applied, no behavioral changes
>> exist for whoever is still using memcg to control their memory usage.
>>
>> We always account to both user and kernel resource_counters.
>
> This is in contradiction with your claim that there is no behavioral
> change for memcg users. Please clarify when we use u and when u+k
> accounting.
> "
> There is no behavioral change for memcg users if kmem accounting is
> turned off, but once kmem.limit_in_bytes is set, memory.usage_in_bytes
> will include both user and kmem memory.
> "
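The u versus u+k distinction can be sketched as two charge paths. This is an assumption-laden model, not the code of this series (`toy_memcg`, `toy_charge_user` and `toy_charge_kmem` are hypothetical): user pages go to `res` only, while kernel pages are ignored until kmem accounting is on and are then charged to both `kmem` and `res`, so memory.usage_in_bytes includes them. Charging kmem before res, and rolling back on failure, are choices made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_counter { uint64_t usage, limit; };

struct toy_memcg {
	struct toy_counter res;  /* backs memory.usage_in_bytes */
	struct toy_counter kmem; /* backs memory.kmem.usage_in_bytes */
	bool kmem_accounted;
};

static bool toy_try_charge(struct toy_counter *c, uint64_t bytes)
{
	assert(c->usage <= c->limit);
	if (bytes > c->limit - c->usage)
		return false;
	c->usage += bytes;
	return true;
}

/* User pages: charged to res only, exactly as without kmem accounting. */
static bool toy_charge_user(struct toy_memcg *m, uint64_t bytes)
{
	return toy_try_charge(&m->res, bytes);
}

/* Kernel pages: not charged at all unless kmem accounting is on; then
 * charged to BOTH kmem and res, undoing the kmem charge if res refuses. */
static bool toy_charge_kmem(struct toy_memcg *m, uint64_t bytes)
{
	if (!m->kmem_accounted)
		return true;
	if (!toy_try_charge(&m->kmem, bytes))
		return false;
	if (!toy_try_charge(&m->res, bytes)) {
		m->kmem.usage -= bytes; /* roll back the kmem charge */
		return false;
	}
	return true;
}
```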
>
>> This
>> effectively means that an independent kernel limit is in place when the
>> limit is set to a lower value than the user memory. An equal or higher
>> value means that the user limit will always hit first, meaning that kmem
>> is effectively unlimited.
>>
>> People who want to track kernel memory but not limit it can set this
>> limit to a very high number that no one will ever hit (like
>> RESOURCE_MAX - 1 page; writing RESOURCE_MAX itself does not enable
>> accounting), or to the same value as the user memory limit.
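The interplay between the two limits reduces to a minimum. Because every accounted kernel byte is charged to both counters, the room left for kernel allocations is the smaller of the two remaining headrooms. The helper below is hypothetical (sizes in arbitrary units) and shows why a kmem limit at or above the user limit never binds: kmem usage can never exceed res usage, so the kmem headroom is then always at least the res headroom.

```c
#include <assert.h>
#include <stdint.h>

/* How much more kernel memory a group could still allocate when kernel
 * bytes are charged to both kmem and res. res_usage already includes
 * kmem_usage (u+k). Assumes neither usage exceeds its limit. */
static uint64_t toy_kmem_headroom(uint64_t user_limit, uint64_t res_usage,
				  uint64_t kmem_limit, uint64_t kmem_usage)
{
	assert(res_usage <= user_limit && kmem_usage <= kmem_limit);
	uint64_t via_res = user_limit - res_usage;   /* room under memory.limit_in_bytes */
	uint64_t via_kmem = kmem_limit - kmem_usage; /* room under memory.kmem.limit_in_bytes */
	return via_kmem < via_res ? via_kmem : via_res;
}
```

With a 1024-unit user limit and 100 units already used, a 256-unit kmem limit caps further kernel memory at 256, whereas a kmem limit of 1024 or 2048 leaves the user limit (924 units of headroom) as the only effective bound.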
>>
>> Signed-off-by: Glauber Costa <glommer@parallels.com>
>> CC: Michal Hocko <mhocko@suse.cz>
>> CC: Johannes Weiner <hannes@cmpxchg.org>
>> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> ---
>> mm/memcontrol.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 63 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index d6ad138..f3fd354 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -265,6 +265,10 @@ struct mem_cgroup {
>> };
>>
>> /*
>> + * the counter to account for kernel memory usage.
>> + */
>> + struct res_counter kmem;
>> + /*
>> * Per cgroup active and inactive list, similar to the
>> * per zone LRU lists.
>> */
>> @@ -279,6 +283,7 @@ struct mem_cgroup {
>> * Should the accounting and control be hierarchical, per subtree?
>> */
>> bool use_hierarchy;
>> + bool kmem_accounted;
>>
>> bool oom_lock;
>> atomic_t under_oom;
>> @@ -389,6 +394,7 @@ enum res_type {
>> _MEM,
>> _MEMSWAP,
>> _OOM_TYPE,
>> + _KMEM,
>> };
>>
>> #define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val))
>> @@ -1439,6 +1445,10 @@ done:
>> res_counter_read_u64(&memcg->memsw, RES_USAGE) >> 10,
>> res_counter_read_u64(&memcg->memsw, RES_LIMIT) >> 10,
>> res_counter_read_u64(&memcg->memsw, RES_FAILCNT));
>> + printk(KERN_INFO "kmem: usage %llukB, limit %llukB, failcnt %llu\n",
>> + res_counter_read_u64(&memcg->kmem, RES_USAGE) >> 10,
>> + res_counter_read_u64(&memcg->kmem, RES_LIMIT) >> 10,
>> + res_counter_read_u64(&memcg->kmem, RES_FAILCNT));
>> }
>>
>> /*
>> @@ -3946,6 +3956,9 @@ static ssize_t mem_cgroup_read(struct cgroup *cont, struct cftype *cft,
>> else
>> val = res_counter_read_u64(&memcg->memsw, name);
>> break;
>> + case _KMEM:
>> + val = res_counter_read_u64(&memcg->kmem, name);
>> + break;
>> default:
>> BUG();
>> }
>> @@ -3984,8 +3997,18 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>> break;
>> if (type == _MEM)
>> ret = mem_cgroup_resize_limit(memcg, val);
>> - else
>> + else if (type == _MEMSWAP)
>> ret = mem_cgroup_resize_memsw_limit(memcg, val);
>> + else if (type == _KMEM) {
>> + ret = res_counter_set_limit(&memcg->kmem, val);
>> + if (ret)
>> + break;
>> +
>> + /* For simplicity, we won't allow this to be disabled */
>> + if (!memcg->kmem_accounted && val != RESOURCE_MAX)
>> + memcg->kmem_accounted = true;
>> + } else
>> + return -EINVAL;
>> break;
>> case RES_SOFT_LIMIT:
>> ret = res_counter_memparse_write_strategy(buffer, &val);
>> @@ -4051,12 +4074,16 @@ static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
>> case RES_MAX_USAGE:
>> if (type == _MEM)
>> res_counter_reset_max(&memcg->res);
>> + else if (type == _KMEM)
>> + res_counter_reset_max(&memcg->kmem);
>> else
>> res_counter_reset_max(&memcg->memsw);
>> break;
>> case RES_FAILCNT:
>> if (type == _MEM)
>> res_counter_reset_failcnt(&memcg->res);
>> + else if (type == _KMEM)
>> + res_counter_reset_failcnt(&memcg->kmem);
>> else
>> res_counter_reset_failcnt(&memcg->memsw);
>> break;
>> @@ -4618,6 +4645,33 @@ static int mem_cgroup_oom_control_write(struct cgroup *cgrp,
>> }
>>
>> #ifdef CONFIG_MEMCG_KMEM
>
> Some things are guarded by CONFIG_MEMCG_KMEM but some are not (e.g. struct
> mem_cgroup.kmem). I do understand you want to keep ifdefs on a leash
> but we should clean this up one day.
>
>> +static struct cftype kmem_cgroup_files[] = {
>> + {
>> + .name = "kmem.limit_in_bytes",
>> + .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
>> + .write_string = mem_cgroup_write,
>> + .read = mem_cgroup_read,
>> + },
>> + {
>> + .name = "kmem.usage_in_bytes",
>> + .private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
>> + .read = mem_cgroup_read,
>> + },
>> + {
>> + .name = "kmem.failcnt",
>> + .private = MEMFILE_PRIVATE(_KMEM, RES_FAILCNT),
>> + .trigger = mem_cgroup_reset,
>> + .read = mem_cgroup_read,
>> + },
>> + {
>> + .name = "kmem.max_usage_in_bytes",
>> + .private = MEMFILE_PRIVATE(_KMEM, RES_MAX_USAGE),
>> + .trigger = mem_cgroup_reset,
>> + .read = mem_cgroup_read,
>> + },
>> + {},
>> +};
>> +
>> static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
>> {
>> return mem_cgroup_sockets_init(memcg, ss);
>> @@ -4961,6 +5015,12 @@ mem_cgroup_create(struct cgroup *cont)
>> int cpu;
>> enable_swap_cgroup();
>> parent = NULL;
>> +
>> +#ifdef CONFIG_MEMCG_KMEM
>> + WARN_ON(cgroup_add_cftypes(&mem_cgroup_subsys,
>> + kmem_cgroup_files));
>> +#endif
>> +
>> if (mem_cgroup_soft_limit_tree_init())
>> goto free_out;
>> root_mem_cgroup = memcg;
>> @@ -4979,6 +5039,7 @@ mem_cgroup_create(struct cgroup *cont)
>> if (parent && parent->use_hierarchy) {
>> res_counter_init(&memcg->res, &parent->res);
>> res_counter_init(&memcg->memsw, &parent->memsw);
>> + res_counter_init(&memcg->kmem, &parent->kmem);
>
> Haven't we already discussed that a new memcg should inherit kmem_accounted
> from its parent when use_hierarchy is set?
> Say we have
> root
> |
> A (kmem_accounted = 1, use_hierarchy = 1)
> \
> B (kmem_accounted = 0)
> \
> C (kmem_accounted = 1)
>
> B finds itself in an awkward situation because it doesn't want to
> account u+k, but it ends up doing so because of C.
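The scenario above follows from how hierarchical counters work: with use_hierarchy, a charge against a group is also applied to every ancestor. The toy model below (hypothetical types and helper; limits and user pages omitted) shows B's kmem usage growing because of C even though B never enabled kmem accounting:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_memcg {
	struct toy_memcg *parent; /* non-NULL only under use_hierarchy */
	uint64_t res_usage;       /* u (+k when accounted) */
	uint64_t kmem_usage;
	bool kmem_accounted;
};

/* A kernel allocation in 'm'. If m accounts kmem, the charge is added to m
 * and to every ancestor, regardless of the ancestors' own kmem_accounted. */
static void toy_charge_kmem(struct toy_memcg *m, uint64_t bytes)
{
	if (!m->kmem_accounted)
		return;
	for (struct toy_memcg *p = m; p != NULL; p = p->parent) {
		p->kmem_usage += bytes;
		p->res_usage += bytes;
	}
}
```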
>
OK, I haven't updated it here, but that should be taken care of in the
lifecycle patch (09/13).
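One possible shape for the inheritance discussed above is to copy the flag from the parent at creation time when the parent uses hierarchy. This is only a sketch with hypothetical names (`toy_memcg`, `toy_memcg_init`), not the code from the lifecycle patch:

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_memcg {
	struct toy_memcg *parent;
	bool use_hierarchy;
	bool kmem_accounted;
};

/* Initialize 'child' under 'parent' (NULL for the root). When the parent
 * uses hierarchy, the child's counters feed the parent's, so the child also
 * takes over the parent's kmem_accounted state instead of starting at 0. */
static void toy_memcg_init(struct toy_memcg *child, struct toy_memcg *parent)
{
	child->parent = NULL;
	child->use_hierarchy = false;
	child->kmem_accounted = false;
	if (parent && parent->use_hierarchy) {
		child->parent = parent;
		child->use_hierarchy = true;
		child->kmem_accounted = parent->kmem_accounted;
	}
}
```

This covers the A/B/C case, where the accounted group sits at the top of the hierarchy. On its own it does not stop a child from turning accounting on underneath an unaccounted hierarchical parent, so that case would still need a separate rule.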