All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Glauber Costa <glommer@parallels.com>
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	kamezawa.hiroyu@jp.fujitsu.com, devel@openvz.org,
	Tejun Heo <tj@kernel.org>,
	linux-mm@kvack.org, Suleiman Souhlal <suleiman@google.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Mel Gorman <mgorman@suse.de>,
	David Rientjes <rientjes@google.com>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v3 06/13] memcg: kmem controller infrastructure
Date: Wed, 26 Sep 2012 17:51:09 +0200	[thread overview]
Message-ID: <20120926155108.GE15801@dhcp22.suse.cz> (raw)
In-Reply-To: <1347977050-29476-7-git-send-email-glommer@parallels.com>

On Tue 18-09-12 18:04:03, Glauber Costa wrote:
> This patch introduces infrastructure for tracking kernel memory pages to
> a given memcg. This will happen whenever the caller includes the flag
> __GFP_KMEMCG flag, and the task belong to a memcg other than the root.
> 
> In memcontrol.h those functions are wrapped in inline acessors.  The
> idea is to later on, patch those with static branches, so we don't incur
> any overhead when no mem cgroups with limited kmem are being used.

Could you describe the API a bit here, please? I guess the
kernel user is supposed to call memcg_kmem_newpage_charge and
memcg_kmem_commit_charge resp. memcg_kmem_uncharge_page.
All other kmem functions here are just helpers, right?
> 
> [ v2: improved comments and standardized function names ]
> [ v3: handle no longer opaque, functions not exported,
>   even more comments ]
> [ v4: reworked Used bit handling and surroundings for more clarity ]
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Christoph Lameter <cl@linux.com>
> CC: Pekka Enberg <penberg@cs.helsinki.fi>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/memcontrol.h |  97 +++++++++++++++++++++++++
>  mm/memcontrol.c            | 177 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 274 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 8d9489f..82ede9a 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -21,6 +21,7 @@
>  #define _LINUX_MEMCONTROL_H
>  #include <linux/cgroup.h>
>  #include <linux/vm_event_item.h>
> +#include <linux/hardirq.h>
>
>  struct mem_cgroup;
>  struct page_cgroup;
> @@ -399,6 +400,17 @@ struct sock;
>  #ifdef CONFIG_MEMCG_KMEM
>  void sock_update_memcg(struct sock *sk);
>  void sock_release_memcg(struct sock *sk);
> +
> +static inline bool memcg_kmem_enabled(void)
> +{
> +	return true;
> +}
> +
> +extern bool __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg,
> +					int order);
> +extern void __memcg_kmem_commit_charge(struct page *page,
> +				       struct mem_cgroup *memcg, int order);
> +extern void __memcg_kmem_uncharge_page(struct page *page, int order);
>  #else
>  static inline void sock_update_memcg(struct sock *sk)
>  {
> @@ -406,6 +418,91 @@ static inline void sock_update_memcg(struct sock *sk)
>  static inline void sock_release_memcg(struct sock *sk)
>  {
>  }
> +
> +static inline bool memcg_kmem_enabled(void)
> +{
> +	return false;
> +}
> +
> +static inline bool
> +__memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order)
> +{
> +	return false;
> +}
> +
> +static inline void  __memcg_kmem_uncharge_page(struct page *page, int order)
> +{
> +}
> +
> +static inline void
> +__memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order)
> +{
> +}

I think we shouldn't care about these for !MEMCG_KMEM. It should be
sufficient to define the main three functions bellow as return true
resp. NOOP. This would reduce the code churn a bit and also make it
better maintainable.

>  #endif /* CONFIG_MEMCG_KMEM */
> +
> +/**
> + * memcg_kmem_newpage_charge: verify if a new kmem allocation is allowed.
> + * @gfp: the gfp allocation flags.
> + * @memcg: a pointer to the memcg this was charged against.
> + * @order: allocation order.
> + *
> + * returns true if the memcg where the current task belongs can hold this
> + * allocation.
> + *
> + * We return true automatically if this allocation is not to be accounted to
> + * any memcg.
> + */
> +static __always_inline bool
> +memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order)
> +{
> +	if (!memcg_kmem_enabled())
> +		return true;
> +
> +	/*
> +	 * __GFP_NOFAIL allocations will move on even if charging is not
> +	 * possible. Therefore we don't even try, and have this allocation
> +	 * unaccounted. We could in theory charge it with
> +	 * res_counter_charge_nofail, but we hope those allocations are rare,
> +	 * and won't be worth the trouble.
> +	 */
> +	if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL))
> +		return true;
> +	if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD))
> +		return true;
> +	return __memcg_kmem_newpage_charge(gfp, memcg, order);
> +}
> +
> +/**
> + * memcg_kmem_uncharge_page: uncharge pages from memcg
> + * @page: pointer to struct page being freed
> + * @order: allocation order.
> + *
> + * there is no need to specify memcg here, since it is embedded in page_cgroup
> + */
> +static __always_inline void
> +memcg_kmem_uncharge_page(struct page *page, int order)
> +{
> +	if (memcg_kmem_enabled())
> +		__memcg_kmem_uncharge_page(page, order);
> +}
> +
> +/**
> + * memcg_kmem_commit_charge: embeds correct memcg in a page
> + * @memcg: a pointer to the memcg this was charged against.
      ^^^^^^^
remove this one?

> + * @page: pointer to struct page recently allocated
> + * @memcg: the memcg structure we charged against
> + * @order: allocation order.
> + *
> + * Needs to be called after memcg_kmem_newpage_charge, regardless of success or
> + * failure of the allocation. if @page is NULL, this function will revert the
> + * charges. Otherwise, it will commit the memcg given by @memcg to the
> + * corresponding page_cgroup.
> + */
> +static __always_inline void
> +memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order)
> +{
> +	if (memcg_kmem_enabled() && memcg)
> +		__memcg_kmem_commit_charge(page, memcg, order);
> +}
>  #endif /* _LINUX_MEMCONTROL_H */
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f3fd354..0f36a01 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -10,6 +10,10 @@
>   * Copyright (C) 2009 Nokia Corporation
>   * Author: Kirill A. Shutemov
>   *
> + * Kernel Memory Controller
> + * Copyright (C) 2012 Parallels Inc. and Google Inc.
> + * Authors: Glauber Costa and Suleiman Souhlal
> + *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License as published by
>   * the Free Software Foundation; either version 2 of the License, or
> @@ -426,6 +430,9 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *s)
>  #include <net/ip.h>
>  
>  static bool mem_cgroup_is_root(struct mem_cgroup *memcg);
> +static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size);
> +static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size);
> +

Why the forward declarations here? We can simply move definitions up
before they are used for the first time, can't we? Besides that they are
never used/defined from outside of KMEM_MEMCG.

>  void sock_update_memcg(struct sock *sk)
>  {
>  	if (mem_cgroup_sockets_enabled) {
> @@ -480,6 +487,110 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
>  }
>  EXPORT_SYMBOL(tcp_proto_cgroup);
>  #endif /* CONFIG_INET */
> +
> +static inline bool memcg_can_account_kmem(struct mem_cgroup *memcg)
> +{
> +	return !mem_cgroup_disabled() && !mem_cgroup_is_root(memcg) &&
> +		memcg->kmem_accounted;
> +}
> +
> +/*
> + * We need to verify if the allocation against current->mm->owner's memcg is
> + * possible for the given order. But the page is not allocated yet, so we'll
> + * need a further commit step to do the final arrangements.
> + *
> + * It is possible for the task to switch cgroups in this mean time, so at
> + * commit time, we can't rely on task conversion any longer.  We'll then use
> + * the handle argument to return to the caller which cgroup we should commit
> + * against. We could also return the memcg directly and avoid the pointer
> + * passing, but a boolean return value gives better semantics considering
> + * the compiled-out case as well.
> + *
> + * Returning true means the allocation is possible.
> + */
> +bool
> +__memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **_memcg, int order)
> +{
> +	struct mem_cgroup *memcg;
> +	bool ret;
> +	struct task_struct *p;

Johannes likes christmas trees ;) and /me would like to remove `p' and
use mem_cgroup_from_task(rcu_dereference(current->mm->owner)) same as we
do at other places (I guess it will be checkpatch safe).

> +
> +	*_memcg = NULL;
> +	rcu_read_lock();
> +	p = rcu_dereference(current->mm->owner);
> +	memcg = mem_cgroup_from_task(p);

mem_cgroup_from_task says it can return NULL. Do we care here? If not
then please put VM_BUG_ON(!memcg) here.

> +	rcu_read_unlock();
> +
> +	if (!memcg_can_account_kmem(memcg))
> +		return true;
> +
> +	mem_cgroup_get(memcg);

I am confused. Why do we take a reference to memcg rather than css_get
here? Ahh it is because we keep the reference while the page is
allocated, right? Comment please.

I am still not sure whether we need css_get here as well. How do you
know that the current is not moved in parallel and it is a last task in
a group which then can go away?

> +
> +	ret = memcg_charge_kmem(memcg, gfp, PAGE_SIZE << order) == 0;
> +	if (ret)
> +		*_memcg = memcg;
> +	else
> +		mem_cgroup_put(memcg);
> +
> +	return ret;
> +}
> +
> +void __memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg,
> +			      int order)
> +{
> +	struct page_cgroup *pc;
> +
> +	WARN_ON(mem_cgroup_is_root(memcg));

Why the warn? Nobody should use this directly and
memcg_kmem_commit_charge takes care of the root same as
__memcg_kmem_newpage_charge does. If it is for correctness then it
should be VM_BUG_ON.

> +
> +	/* The page allocation failed. Revert */
> +	if (!page) {
> +		memcg_uncharge_kmem(memcg, PAGE_SIZE << order);
> +		return;
> +	}
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	pc->mem_cgroup = memcg;
> +	SetPageCgroupUsed(pc);
> +	unlock_page_cgroup(pc);
> +}
> +
> +void __memcg_kmem_uncharge_page(struct page *page, int order)
> +{
> +	struct mem_cgroup *memcg = NULL;
> +	struct page_cgroup *pc;
> +
> +
> +	pc = lookup_page_cgroup(page);
> +	/*
> +	 * Fast unlocked return. Theoretically might have changed, have to
> +	 * check again after locking.
> +	 */
> +	if (!PageCgroupUsed(pc))
> +		return;
> +
> +	lock_page_cgroup(pc);
> +	if (PageCgroupUsed(pc)) {
> +		memcg = pc->mem_cgroup;
> +		ClearPageCgroupUsed(pc);
> +	}
> +	unlock_page_cgroup(pc);
> +
> +	/*
> +	 * Checking if kmem accounted is enabled won't work for uncharge, since
> +	 * it is possible that the user enabled kmem tracking, allocated, and
> +	 * then disabled it again.

disabling cannot happen, right?

> +	 *
> +	 * We trust if there is a memcg associated with the page, it is a valid
> +	 * allocation
> +	 */
> +	if (!memcg)
> +		return;
> +
> +	WARN_ON(mem_cgroup_is_root(memcg));

Same as above I do not see a reason for warn here. It just adds a code
and if you want it for debugging then VM_BUG_ON sounds more appropriate.
/me thinks

> +	memcg_uncharge_kmem(memcg, PAGE_SIZE << order);
> +	mem_cgroup_put(memcg);
> +}
>  #endif /* CONFIG_MEMCG_KMEM */
>  
>  #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
> @@ -5700,3 +5811,69 @@ static int __init enable_swap_account(char *s)
>  __setup("swapaccount=", enable_swap_account);
>  
>  #endif
> +
> +#ifdef CONFIG_MEMCG_KMEM
> +int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
> +{
> +	struct res_counter *fail_res;
> +	struct mem_cgroup *_memcg;
> +	int ret;
> +	bool may_oom;
> +	bool nofail = false;
> +
> +	may_oom = (gfp & __GFP_WAIT) && (gfp & __GFP_FS) &&
> +	    !(gfp & __GFP_NORETRY);

A comment please? Why __GFP_IO is not considered for example?

> +
> +	ret = 0;
> +
> +	if (!memcg)
> +		return ret;

How can we get a NULL memcg here without blowing in
__memcg_kmem_newpage_charge?

> +
> +	_memcg = memcg;
> +	ret = __mem_cgroup_try_charge(NULL, gfp, size / PAGE_SIZE,

me likes >> PAGE_SHIFT more.

> +	    &_memcg, may_oom);
> +
> +	if (ret == -EINTR)  {
> +		nofail = true;
> +		/*
> +		 * __mem_cgroup_try_charge() chosed to bypass to root due to
> +		 * OOM kill or fatal signal.  Since our only options are to
> +		 * either fail the allocation or charge it to this cgroup, do
> +		 * it as a temporary condition. But we can't fail. From a
> +		 * kmem/slab perspective, the cache has already been selected,
> +		 * by mem_cgroup_get_kmem_cache(), so it is too late to change
> +		 * our minds
> +		 */
> +		res_counter_charge_nofail(&memcg->res, size, &fail_res);
> +		if (do_swap_account)
> +			res_counter_charge_nofail(&memcg->memsw, size,
> +						  &fail_res);
> +		ret = 0;
> +	} else if (ret == -ENOMEM)
> +		return ret;
> +
> +	if (nofail)
> +		res_counter_charge_nofail(&memcg->kmem, size, &fail_res);
> +	else
> +		ret = res_counter_charge(&memcg->kmem, size, &fail_res);
> +
> +	if (ret) {
> +		res_counter_uncharge(&memcg->res, size);
> +		if (do_swap_account)
> +			res_counter_uncharge(&memcg->memsw, size);
> +	}

You could save few lines and get rid of the strange nofail by:
[...]
+		res_counter_charge_nofail(&memcg->res, size, &fail_res);
+		if (do_swap_account)
+			res_counter_charge_nofail(&memcg->memsw, size,
+						  &fail_res);
+		res_counter_charge_nofail(&memcg->kmem, size, &fail_res);
+		return 0;
+	} else if (ret == -ENOMEM)
+		return ret;
+	else
+		ret = res_counter_charge(&memcg->kmem, size, &fail_res);
+
+	if (ret) {
+		res_counter_uncharge(&memcg->res, size);
+		if (do_swap_account)
+			res_counter_uncharge(&memcg->memsw, size);
+	}

> +
> +	return ret;
> +}
> +
> +void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size)
> +{
> +	if (!memcg)
> +		return;
> +
> +	res_counter_uncharge(&memcg->kmem, size);
> +	res_counter_uncharge(&memcg->res, size);
> +	if (do_swap_account)
> +		res_counter_uncharge(&memcg->memsw, size);
> +}
> +#endif /* CONFIG_MEMCG_KMEM */
> -- 
> 1.7.11.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@suse.cz>
To: Glauber Costa <glommer@parallels.com>
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	kamezawa.hiroyu@jp.fujitsu.com, devel@openvz.org,
	Tejun Heo <tj@kernel.org>,
	linux-mm@kvack.org, Suleiman Souhlal <suleiman@google.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Mel Gorman <mgorman@suse.de>,
	David Rientjes <rientjes@google.com>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v3 06/13] memcg: kmem controller infrastructure
Date: Wed, 26 Sep 2012 17:51:09 +0200	[thread overview]
Message-ID: <20120926155108.GE15801@dhcp22.suse.cz> (raw)
In-Reply-To: <1347977050-29476-7-git-send-email-glommer@parallels.com>

On Tue 18-09-12 18:04:03, Glauber Costa wrote:
> This patch introduces infrastructure for tracking kernel memory pages to
> a given memcg. This will happen whenever the caller includes the flag
> __GFP_KMEMCG flag, and the task belong to a memcg other than the root.
> 
> In memcontrol.h those functions are wrapped in inline acessors.  The
> idea is to later on, patch those with static branches, so we don't incur
> any overhead when no mem cgroups with limited kmem are being used.

Could you describe the API a bit here, please? I guess the
kernel user is supposed to call memcg_kmem_newpage_charge and
memcg_kmem_commit_charge resp. memcg_kmem_uncharge_page.
All other kmem functions here are just helpers, right?
> 
> [ v2: improved comments and standardized function names ]
> [ v3: handle no longer opaque, functions not exported,
>   even more comments ]
> [ v4: reworked Used bit handling and surroundings for more clarity ]
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Christoph Lameter <cl@linux.com>
> CC: Pekka Enberg <penberg@cs.helsinki.fi>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/memcontrol.h |  97 +++++++++++++++++++++++++
>  mm/memcontrol.c            | 177 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 274 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 8d9489f..82ede9a 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -21,6 +21,7 @@
>  #define _LINUX_MEMCONTROL_H
>  #include <linux/cgroup.h>
>  #include <linux/vm_event_item.h>
> +#include <linux/hardirq.h>
>
>  struct mem_cgroup;
>  struct page_cgroup;
> @@ -399,6 +400,17 @@ struct sock;
>  #ifdef CONFIG_MEMCG_KMEM
>  void sock_update_memcg(struct sock *sk);
>  void sock_release_memcg(struct sock *sk);
> +
> +static inline bool memcg_kmem_enabled(void)
> +{
> +	return true;
> +}
> +
> +extern bool __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg,
> +					int order);
> +extern void __memcg_kmem_commit_charge(struct page *page,
> +				       struct mem_cgroup *memcg, int order);
> +extern void __memcg_kmem_uncharge_page(struct page *page, int order);
>  #else
>  static inline void sock_update_memcg(struct sock *sk)
>  {
> @@ -406,6 +418,91 @@ static inline void sock_update_memcg(struct sock *sk)
>  static inline void sock_release_memcg(struct sock *sk)
>  {
>  }
> +
> +static inline bool memcg_kmem_enabled(void)
> +{
> +	return false;
> +}
> +
> +static inline bool
> +__memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order)
> +{
> +	return false;
> +}
> +
> +static inline void  __memcg_kmem_uncharge_page(struct page *page, int order)
> +{
> +}
> +
> +static inline void
> +__memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order)
> +{
> +}

I think we shouldn't care about these for !MEMCG_KMEM. It should be
sufficient to define the main three functions bellow as return true
resp. NOOP. This would reduce the code churn a bit and also make it
better maintainable.

>  #endif /* CONFIG_MEMCG_KMEM */
> +
> +/**
> + * memcg_kmem_newpage_charge: verify if a new kmem allocation is allowed.
> + * @gfp: the gfp allocation flags.
> + * @memcg: a pointer to the memcg this was charged against.
> + * @order: allocation order.
> + *
> + * returns true if the memcg where the current task belongs can hold this
> + * allocation.
> + *
> + * We return true automatically if this allocation is not to be accounted to
> + * any memcg.
> + */
> +static __always_inline bool
> +memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order)
> +{
> +	if (!memcg_kmem_enabled())
> +		return true;
> +
> +	/*
> +	 * __GFP_NOFAIL allocations will move on even if charging is not
> +	 * possible. Therefore we don't even try, and have this allocation
> +	 * unaccounted. We could in theory charge it with
> +	 * res_counter_charge_nofail, but we hope those allocations are rare,
> +	 * and won't be worth the trouble.
> +	 */
> +	if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL))
> +		return true;
> +	if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD))
> +		return true;
> +	return __memcg_kmem_newpage_charge(gfp, memcg, order);
> +}
> +
> +/**
> + * memcg_kmem_uncharge_page: uncharge pages from memcg
> + * @page: pointer to struct page being freed
> + * @order: allocation order.
> + *
> + * there is no need to specify memcg here, since it is embedded in page_cgroup
> + */
> +static __always_inline void
> +memcg_kmem_uncharge_page(struct page *page, int order)
> +{
> +	if (memcg_kmem_enabled())
> +		__memcg_kmem_uncharge_page(page, order);
> +}
> +
> +/**
> + * memcg_kmem_commit_charge: embeds correct memcg in a page
> + * @memcg: a pointer to the memcg this was charged against.
      ^^^^^^^
remove this one?

> + * @page: pointer to struct page recently allocated
> + * @memcg: the memcg structure we charged against
> + * @order: allocation order.
> + *
> + * Needs to be called after memcg_kmem_newpage_charge, regardless of success or
> + * failure of the allocation. if @page is NULL, this function will revert the
> + * charges. Otherwise, it will commit the memcg given by @memcg to the
> + * corresponding page_cgroup.
> + */
> +static __always_inline void
> +memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order)
> +{
> +	if (memcg_kmem_enabled() && memcg)
> +		__memcg_kmem_commit_charge(page, memcg, order);
> +}
>  #endif /* _LINUX_MEMCONTROL_H */
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f3fd354..0f36a01 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -10,6 +10,10 @@
>   * Copyright (C) 2009 Nokia Corporation
>   * Author: Kirill A. Shutemov
>   *
> + * Kernel Memory Controller
> + * Copyright (C) 2012 Parallels Inc. and Google Inc.
> + * Authors: Glauber Costa and Suleiman Souhlal
> + *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License as published by
>   * the Free Software Foundation; either version 2 of the License, or
> @@ -426,6 +430,9 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *s)
>  #include <net/ip.h>
>  
>  static bool mem_cgroup_is_root(struct mem_cgroup *memcg);
> +static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size);
> +static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size);
> +

Why the forward declarations here? We can simply move definitions up
before they are used for the first time, can't we? Besides that they are
never used/defined from outside of KMEM_MEMCG.

>  void sock_update_memcg(struct sock *sk)
>  {
>  	if (mem_cgroup_sockets_enabled) {
> @@ -480,6 +487,110 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
>  }
>  EXPORT_SYMBOL(tcp_proto_cgroup);
>  #endif /* CONFIG_INET */
> +
> +static inline bool memcg_can_account_kmem(struct mem_cgroup *memcg)
> +{
> +	return !mem_cgroup_disabled() && !mem_cgroup_is_root(memcg) &&
> +		memcg->kmem_accounted;
> +}
> +
> +/*
> + * We need to verify if the allocation against current->mm->owner's memcg is
> + * possible for the given order. But the page is not allocated yet, so we'll
> + * need a further commit step to do the final arrangements.
> + *
> + * It is possible for the task to switch cgroups in this mean time, so at
> + * commit time, we can't rely on task conversion any longer.  We'll then use
> + * the handle argument to return to the caller which cgroup we should commit
> + * against. We could also return the memcg directly and avoid the pointer
> + * passing, but a boolean return value gives better semantics considering
> + * the compiled-out case as well.
> + *
> + * Returning true means the allocation is possible.
> + */
> +bool
> +__memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **_memcg, int order)
> +{
> +	struct mem_cgroup *memcg;
> +	bool ret;
> +	struct task_struct *p;

Johannes likes christmas trees ;) and /me would like to remove `p' and
use mem_cgroup_from_task(rcu_dereference(current->mm->owner)) same as we
do at other places (I guess it will be checkpatch safe).

> +
> +	*_memcg = NULL;
> +	rcu_read_lock();
> +	p = rcu_dereference(current->mm->owner);
> +	memcg = mem_cgroup_from_task(p);

mem_cgroup_from_task says it can return NULL. Do we care here? If not
then please put VM_BUG_ON(!memcg) here.

> +	rcu_read_unlock();
> +
> +	if (!memcg_can_account_kmem(memcg))
> +		return true;
> +
> +	mem_cgroup_get(memcg);

I am confused. Why do we take a reference to memcg rather than css_get
here? Ahh it is because we keep the reference while the page is
allocated, right? Comment please.

I am still not sure whether we need css_get here as well. How do you
know that the current is not moved in parallel and it is a last task in
a group which then can go away?

> +
> +	ret = memcg_charge_kmem(memcg, gfp, PAGE_SIZE << order) == 0;
> +	if (ret)
> +		*_memcg = memcg;
> +	else
> +		mem_cgroup_put(memcg);
> +
> +	return ret;
> +}
> +
> +void __memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg,
> +			      int order)
> +{
> +	struct page_cgroup *pc;
> +
> +	WARN_ON(mem_cgroup_is_root(memcg));

Why the warn? Nobody should use this directly and
memcg_kmem_commit_charge takes care of the root same as
__memcg_kmem_newpage_charge does. If it is for correctness then it
should be VM_BUG_ON.

> +
> +	/* The page allocation failed. Revert */
> +	if (!page) {
> +		memcg_uncharge_kmem(memcg, PAGE_SIZE << order);
> +		return;
> +	}
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	pc->mem_cgroup = memcg;
> +	SetPageCgroupUsed(pc);
> +	unlock_page_cgroup(pc);
> +}
> +
> +void __memcg_kmem_uncharge_page(struct page *page, int order)
> +{
> +	struct mem_cgroup *memcg = NULL;
> +	struct page_cgroup *pc;
> +
> +
> +	pc = lookup_page_cgroup(page);
> +	/*
> +	 * Fast unlocked return. Theoretically might have changed, have to
> +	 * check again after locking.
> +	 */
> +	if (!PageCgroupUsed(pc))
> +		return;
> +
> +	lock_page_cgroup(pc);
> +	if (PageCgroupUsed(pc)) {
> +		memcg = pc->mem_cgroup;
> +		ClearPageCgroupUsed(pc);
> +	}
> +	unlock_page_cgroup(pc);
> +
> +	/*
> +	 * Checking if kmem accounted is enabled won't work for uncharge, since
> +	 * it is possible that the user enabled kmem tracking, allocated, and
> +	 * then disabled it again.

disabling cannot happen, right?

> +	 *
> +	 * We trust if there is a memcg associated with the page, it is a valid
> +	 * allocation
> +	 */
> +	if (!memcg)
> +		return;
> +
> +	WARN_ON(mem_cgroup_is_root(memcg));

Same as above I do not see a reason for warn here. It just adds a code
and if you want it for debugging then VM_BUG_ON sounds more appropriate.
/me thinks

> +	memcg_uncharge_kmem(memcg, PAGE_SIZE << order);
> +	mem_cgroup_put(memcg);
> +}
>  #endif /* CONFIG_MEMCG_KMEM */
>  
>  #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
> @@ -5700,3 +5811,69 @@ static int __init enable_swap_account(char *s)
>  __setup("swapaccount=", enable_swap_account);
>  
>  #endif
> +
> +#ifdef CONFIG_MEMCG_KMEM
> +int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
> +{
> +	struct res_counter *fail_res;
> +	struct mem_cgroup *_memcg;
> +	int ret;
> +	bool may_oom;
> +	bool nofail = false;
> +
> +	may_oom = (gfp & __GFP_WAIT) && (gfp & __GFP_FS) &&
> +	    !(gfp & __GFP_NORETRY);

A comment please? Why __GFP_IO is not considered for example?

> +
> +	ret = 0;
> +
> +	if (!memcg)
> +		return ret;

How can we get a NULL memcg here without blowing in
__memcg_kmem_newpage_charge?

> +
> +	_memcg = memcg;
> +	ret = __mem_cgroup_try_charge(NULL, gfp, size / PAGE_SIZE,

me likes >> PAGE_SHIFT more.

> +	    &_memcg, may_oom);
> +
> +	if (ret == -EINTR)  {
> +		nofail = true;
> +		/*
> +		 * __mem_cgroup_try_charge() chosed to bypass to root due to
> +		 * OOM kill or fatal signal.  Since our only options are to
> +		 * either fail the allocation or charge it to this cgroup, do
> +		 * it as a temporary condition. But we can't fail. From a
> +		 * kmem/slab perspective, the cache has already been selected,
> +		 * by mem_cgroup_get_kmem_cache(), so it is too late to change
> +		 * our minds
> +		 */
> +		res_counter_charge_nofail(&memcg->res, size, &fail_res);
> +		if (do_swap_account)
> +			res_counter_charge_nofail(&memcg->memsw, size,
> +						  &fail_res);
> +		ret = 0;
> +	} else if (ret == -ENOMEM)
> +		return ret;
> +
> +	if (nofail)
> +		res_counter_charge_nofail(&memcg->kmem, size, &fail_res);
> +	else
> +		ret = res_counter_charge(&memcg->kmem, size, &fail_res);
> +
> +	if (ret) {
> +		res_counter_uncharge(&memcg->res, size);
> +		if (do_swap_account)
> +			res_counter_uncharge(&memcg->memsw, size);
> +	}

You could save few lines and get rid of the strange nofail by:
[...]
+		res_counter_charge_nofail(&memcg->res, size, &fail_res);
+		if (do_swap_account)
+			res_counter_charge_nofail(&memcg->memsw, size,
+						  &fail_res);
+		res_counter_charge_nofail(&memcg->kmem, size, &fail_res);
+		return 0;
+	} else if (ret == -ENOMEM)
+		return ret;
+	else
+		ret = res_counter_charge(&memcg->kmem, size, &fail_res);
+
+	if (ret) {
+		res_counter_uncharge(&memcg->res, size);
+		if (do_swap_account)
+			res_counter_uncharge(&memcg->memsw, size);
+	}

> +
> +	return ret;
> +}
> +
> +void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size)
> +{
> +	if (!memcg)
> +		return;
> +
> +	res_counter_uncharge(&memcg->kmem, size);
> +	res_counter_uncharge(&memcg->res, size);
> +	if (do_swap_account)
> +		res_counter_uncharge(&memcg->memsw, size);
> +}
> +#endif /* CONFIG_MEMCG_KMEM */
> -- 
> 1.7.11.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
To: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org,
	devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	Suleiman Souhlal
	<suleiman-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Frederic Weisbecker
	<fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>,
	David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>,
	Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Subject: Re: [PATCH v3 06/13] memcg: kmem controller infrastructure
Date: Wed, 26 Sep 2012 17:51:09 +0200	[thread overview]
Message-ID: <20120926155108.GE15801@dhcp22.suse.cz> (raw)
In-Reply-To: <1347977050-29476-7-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>

On Tue 18-09-12 18:04:03, Glauber Costa wrote:
> This patch introduces infrastructure for tracking kernel memory pages to
> a given memcg. This will happen whenever the caller includes the flag
> __GFP_KMEMCG flag, and the task belong to a memcg other than the root.
> 
> In memcontrol.h those functions are wrapped in inline acessors.  The
> idea is to later on, patch those with static branches, so we don't incur
> any overhead when no mem cgroups with limited kmem are being used.

Could you describe the API a bit here, please? I guess the
kernel user is supposed to call memcg_kmem_newpage_charge and
memcg_kmem_commit_charge resp. memcg_kmem_uncharge_page.
All other kmem functions here are just helpers, right?
> 
> [ v2: improved comments and standardized function names ]
> [ v3: handle no longer opaque, functions not exported,
>   even more comments ]
> [ v4: reworked Used bit handling and surroundings for more clarity ]
> 
> Signed-off-by: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> CC: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
> CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
> CC: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> CC: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> ---
>  include/linux/memcontrol.h |  97 +++++++++++++++++++++++++
>  mm/memcontrol.c            | 177 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 274 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 8d9489f..82ede9a 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -21,6 +21,7 @@
>  #define _LINUX_MEMCONTROL_H
>  #include <linux/cgroup.h>
>  #include <linux/vm_event_item.h>
> +#include <linux/hardirq.h>
>
>  struct mem_cgroup;
>  struct page_cgroup;
> @@ -399,6 +400,17 @@ struct sock;
>  #ifdef CONFIG_MEMCG_KMEM
>  void sock_update_memcg(struct sock *sk);
>  void sock_release_memcg(struct sock *sk);
> +
> +static inline bool memcg_kmem_enabled(void)
> +{
> +	return true;
> +}
> +
> +extern bool __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg,
> +					int order);
> +extern void __memcg_kmem_commit_charge(struct page *page,
> +				       struct mem_cgroup *memcg, int order);
> +extern void __memcg_kmem_uncharge_page(struct page *page, int order);
>  #else
>  static inline void sock_update_memcg(struct sock *sk)
>  {
> @@ -406,6 +418,91 @@ static inline void sock_update_memcg(struct sock *sk)
>  static inline void sock_release_memcg(struct sock *sk)
>  {
>  }
> +
> +static inline bool memcg_kmem_enabled(void)
> +{
> +	return false;
> +}
> +
> +static inline bool
> +__memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order)
> +{
> +	return false;
> +}
> +
> +static inline void  __memcg_kmem_uncharge_page(struct page *page, int order)
> +{
> +}
> +
> +static inline void
> +__memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order)
> +{
> +}

I think we shouldn't care about these for !MEMCG_KMEM. It should be
sufficient to define the main three functions bellow as return true
resp. NOOP. This would reduce the code churn a bit and also make it
better maintainable.

>  #endif /* CONFIG_MEMCG_KMEM */
> +
> +/**
> + * memcg_kmem_newpage_charge: verify if a new kmem allocation is allowed.
> + * @gfp: the gfp allocation flags.
> + * @memcg: a pointer to the memcg this was charged against.
> + * @order: allocation order.
> + *
> + * returns true if the memcg where the current task belongs can hold this
> + * allocation.
> + *
> + * We return true automatically if this allocation is not to be accounted to
> + * any memcg.
> + */
> +static __always_inline bool
> +memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order)
> +{
> +	if (!memcg_kmem_enabled())
> +		return true;
> +
> +	/*
> +	 * __GFP_NOFAIL allocations will move on even if charging is not
> +	 * possible. Therefore we don't even try, and have this allocation
> +	 * unaccounted. We could in theory charge it with
> +	 * res_counter_charge_nofail, but we hope those allocations are rare,
> +	 * and won't be worth the trouble.
> +	 */
> +	if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL))
> +		return true;
> +	if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD))
> +		return true;
> +	return __memcg_kmem_newpage_charge(gfp, memcg, order);
> +}
> +
> +/**
> + * memcg_kmem_uncharge_page: uncharge pages from memcg
> + * @page: pointer to struct page being freed
> + * @order: allocation order.
> + *
> + * there is no need to specify memcg here, since it is embedded in page_cgroup
> + */
> +static __always_inline void
> +memcg_kmem_uncharge_page(struct page *page, int order)
> +{
> +	if (memcg_kmem_enabled())
> +		__memcg_kmem_uncharge_page(page, order);
> +}
> +
> +/**
> + * memcg_kmem_commit_charge: embeds correct memcg in a page
> + * @memcg: a pointer to the memcg this was charged against.
      ^^^^^^^
remove this one?

> + * @page: pointer to struct page recently allocated
> + * @memcg: the memcg structure we charged against
> + * @order: allocation order.
> + *
> + * Needs to be called after memcg_kmem_newpage_charge, regardless of success or
> + * failure of the allocation. if @page is NULL, this function will revert the
> + * charges. Otherwise, it will commit the memcg given by @memcg to the
> + * corresponding page_cgroup.
> + */
> +static __always_inline void
> +memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order)
> +{
> +	if (memcg_kmem_enabled() && memcg)
> +		__memcg_kmem_commit_charge(page, memcg, order);
> +}
>  #endif /* _LINUX_MEMCONTROL_H */
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f3fd354..0f36a01 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -10,6 +10,10 @@
>   * Copyright (C) 2009 Nokia Corporation
>   * Author: Kirill A. Shutemov
>   *
> + * Kernel Memory Controller
> + * Copyright (C) 2012 Parallels Inc. and Google Inc.
> + * Authors: Glauber Costa and Suleiman Souhlal
> + *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License as published by
>   * the Free Software Foundation; either version 2 of the License, or
> @@ -426,6 +430,9 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *s)
>  #include <net/ip.h>
>  
>  static bool mem_cgroup_is_root(struct mem_cgroup *memcg);
> +static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size);
> +static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size);
> +

Why the forward declarations here? We can simply move definitions up
before they are used for the first time, can't we? Besides that they are
never used/defined from outside of KMEM_MEMCG.

>  void sock_update_memcg(struct sock *sk)
>  {
>  	if (mem_cgroup_sockets_enabled) {
> @@ -480,6 +487,110 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
>  }
>  EXPORT_SYMBOL(tcp_proto_cgroup);
>  #endif /* CONFIG_INET */
> +
> +static inline bool memcg_can_account_kmem(struct mem_cgroup *memcg)
> +{
> +	return !mem_cgroup_disabled() && !mem_cgroup_is_root(memcg) &&
> +		memcg->kmem_accounted;
> +}
> +
> +/*
> + * We need to verify if the allocation against current->mm->owner's memcg is
> + * possible for the given order. But the page is not allocated yet, so we'll
> + * need a further commit step to do the final arrangements.
> + *
> + * It is possible for the task to switch cgroups in this mean time, so at
> + * commit time, we can't rely on task conversion any longer.  We'll then use
> + * the handle argument to return to the caller which cgroup we should commit
> + * against. We could also return the memcg directly and avoid the pointer
> + * passing, but a boolean return value gives better semantics considering
> + * the compiled-out case as well.
> + *
> + * Returning true means the allocation is possible.
> + */
> +bool
> +__memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **_memcg, int order)
> +{
> +	struct mem_cgroup *memcg;
> +	bool ret;
> +	struct task_struct *p;

Johannes likes christmas trees ;) and /me would like to remove `p' and
use mem_cgroup_from_task(rcu_dereference(current->mm->owner)) same as we
do at other places (I guess it will be checkpatch safe).

> +
> +	*_memcg = NULL;
> +	rcu_read_lock();
> +	p = rcu_dereference(current->mm->owner);
> +	memcg = mem_cgroup_from_task(p);

mem_cgroup_from_task says it can return NULL. Do we care here? If not
then please put VM_BUG_ON(!memcg) here.

> +	rcu_read_unlock();
> +
> +	if (!memcg_can_account_kmem(memcg))
> +		return true;
> +
> +	mem_cgroup_get(memcg);

I am confused. Why do we take a reference to memcg rather than css_get
here? Ahh it is because we keep the reference while the page is
allocated, right? Comment please.

I am still not sure whether we need css_get here as well. How do you
know that the current is not moved in parallel and it is a last task in
a group which then can go away?

> +
> +	ret = memcg_charge_kmem(memcg, gfp, PAGE_SIZE << order) == 0;
> +	if (ret)
> +		*_memcg = memcg;
> +	else
> +		mem_cgroup_put(memcg);
> +
> +	return ret;
> +}
> +
> +void __memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg,
> +			      int order)
> +{
> +	struct page_cgroup *pc;
> +
> +	WARN_ON(mem_cgroup_is_root(memcg));

Why the warn? Nobody should use this directly and
memcg_kmem_commit_charge takes care of the root same as
__memcg_kmem_newpage_charge does. If it is for correctness then it
should be VM_BUG_ON.

> +
> +	/* The page allocation failed. Revert */
> +	if (!page) {
> +		memcg_uncharge_kmem(memcg, PAGE_SIZE << order);
> +		return;
> +	}
> +
> +	pc = lookup_page_cgroup(page);
> +	lock_page_cgroup(pc);
> +	pc->mem_cgroup = memcg;
> +	SetPageCgroupUsed(pc);
> +	unlock_page_cgroup(pc);
> +}
> +
> +void __memcg_kmem_uncharge_page(struct page *page, int order)
> +{
> +	struct mem_cgroup *memcg = NULL;
> +	struct page_cgroup *pc;
> +
> +
> +	pc = lookup_page_cgroup(page);
> +	/*
> +	 * Fast unlocked return. Theoretically might have changed, have to
> +	 * check again after locking.
> +	 */
> +	if (!PageCgroupUsed(pc))
> +		return;
> +
> +	lock_page_cgroup(pc);
> +	if (PageCgroupUsed(pc)) {
> +		memcg = pc->mem_cgroup;
> +		ClearPageCgroupUsed(pc);
> +	}
> +	unlock_page_cgroup(pc);
> +
> +	/*
> +	 * Checking if kmem accounted is enabled won't work for uncharge, since
> +	 * it is possible that the user enabled kmem tracking, allocated, and
> +	 * then disabled it again.

disabling cannot happen, right?

> +	 *
> +	 * We trust if there is a memcg associated with the page, it is a valid
> +	 * allocation
> +	 */
> +	if (!memcg)
> +		return;
> +
> +	WARN_ON(mem_cgroup_is_root(memcg));

Same as above I do not see a reason for warn here. It just adds a code
and if you want it for debugging then VM_BUG_ON sounds more appropriate.
/me thinks

> +	memcg_uncharge_kmem(memcg, PAGE_SIZE << order);
> +	mem_cgroup_put(memcg);
> +}
>  #endif /* CONFIG_MEMCG_KMEM */
>  
>  #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
> @@ -5700,3 +5811,69 @@ static int __init enable_swap_account(char *s)
>  __setup("swapaccount=", enable_swap_account);
>  
>  #endif
> +
> +#ifdef CONFIG_MEMCG_KMEM
> +int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
> +{
> +	struct res_counter *fail_res;
> +	struct mem_cgroup *_memcg;
> +	int ret;
> +	bool may_oom;
> +	bool nofail = false;
> +
> +	may_oom = (gfp & __GFP_WAIT) && (gfp & __GFP_FS) &&
> +	    !(gfp & __GFP_NORETRY);

A comment please? Why __GFP_IO is not considered for example?

> +
> +	ret = 0;
> +
> +	if (!memcg)
> +		return ret;

How can we get a NULL memcg here without blowing in
__memcg_kmem_newpage_charge?

> +
> +	_memcg = memcg;
> +	ret = __mem_cgroup_try_charge(NULL, gfp, size / PAGE_SIZE,

me likes >> PAGE_SHIFT more.

> +	    &_memcg, may_oom);
> +
> +	if (ret == -EINTR)  {
> +		nofail = true;
> +		/*
> +		 * __mem_cgroup_try_charge() chosed to bypass to root due to
> +		 * OOM kill or fatal signal.  Since our only options are to
> +		 * either fail the allocation or charge it to this cgroup, do
> +		 * it as a temporary condition. But we can't fail. From a
> +		 * kmem/slab perspective, the cache has already been selected,
> +		 * by mem_cgroup_get_kmem_cache(), so it is too late to change
> +		 * our minds
> +		 */
> +		res_counter_charge_nofail(&memcg->res, size, &fail_res);
> +		if (do_swap_account)
> +			res_counter_charge_nofail(&memcg->memsw, size,
> +						  &fail_res);
> +		ret = 0;
> +	} else if (ret == -ENOMEM)
> +		return ret;
> +
> +	if (nofail)
> +		res_counter_charge_nofail(&memcg->kmem, size, &fail_res);
> +	else
> +		ret = res_counter_charge(&memcg->kmem, size, &fail_res);
> +
> +	if (ret) {
> +		res_counter_uncharge(&memcg->res, size);
> +		if (do_swap_account)
> +			res_counter_uncharge(&memcg->memsw, size);
> +	}

You could save few lines and get rid of the strange nofail by:
[...]
+		res_counter_charge_nofail(&memcg->res, size, &fail_res);
+		if (do_swap_account)
+			res_counter_charge_nofail(&memcg->memsw, size,
+						  &fail_res);
+		res_counter_charge_nofail(&memcg->kmem, size, &fail_res);
+		return 0;
+	} else if (ret == -ENOMEM)
+		return ret;
+	else
+		ret = res_counter_charge(&memcg->kmem, size, &fail_res);
+
+	if (ret) {
+		res_counter_uncharge(&memcg->res, size);
+		if (do_swap_account)
+			res_counter_uncharge(&memcg->memsw, size);
+	}

> +
> +	return ret;
> +}
> +
> +void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size)
> +{
> +	if (!memcg)
> +		return;
> +
> +	res_counter_uncharge(&memcg->kmem, size);
> +	res_counter_uncharge(&memcg->res, size);
> +	if (do_swap_account)
> +		res_counter_uncharge(&memcg->memsw, size);
> +}
> +#endif /* CONFIG_MEMCG_KMEM */
> -- 
> 1.7.11.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2012-09-26 15:51 UTC|newest]

Thread overview: 323+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-18 14:03 [PATCH v3 00/13] kmem controller for memcg Glauber Costa
2012-09-18 14:03 ` Glauber Costa
2012-09-18 14:03 ` Glauber Costa
2012-09-18 14:03 ` [PATCH v3 01/13] memcg: Make it possible to use the stock for more than one page Glauber Costa
2012-09-18 14:03   ` Glauber Costa
2012-09-18 14:03   ` Glauber Costa
2012-10-01 18:48   ` Johannes Weiner
2012-10-01 18:48     ` Johannes Weiner
2012-09-18 14:03 ` [PATCH v3 02/13] memcg: Reclaim when more than one page needed Glauber Costa
2012-09-18 14:03   ` Glauber Costa
2012-09-18 14:03   ` Glauber Costa
2012-10-01 19:00   ` Johannes Weiner
2012-10-01 19:00     ` Johannes Weiner
2012-09-18 14:04 ` [PATCH v3 03/13] memcg: change defines to an enum Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-10-01 19:06   ` Johannes Weiner
2012-10-01 19:06     ` Johannes Weiner
2012-10-01 19:06     ` Johannes Weiner
2012-10-02  9:10     ` Glauber Costa
2012-10-02  9:10       ` Glauber Costa
2012-09-18 14:04 ` [PATCH v3 04/13] kmem accounting basic infrastructure Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-21 16:34   ` Tejun Heo
2012-09-21 16:34     ` Tejun Heo
2012-09-24  8:09     ` Glauber Costa
2012-09-24  8:09       ` Glauber Costa
2012-09-24  8:09       ` Glauber Costa
2012-09-26 14:03   ` Michal Hocko
2012-09-26 14:03     ` Michal Hocko
2012-09-26 14:03     ` Michal Hocko
2012-09-26 14:33     ` Glauber Costa
2012-09-26 14:33       ` Glauber Costa
2012-09-26 16:01       ` Michal Hocko
2012-09-26 16:01         ` Michal Hocko
2012-09-26 16:01         ` Michal Hocko
2012-09-26 17:34         ` Glauber Costa
2012-09-26 17:34           ` Glauber Costa
2012-09-26 16:36     ` Tejun Heo
2012-09-26 16:36       ` Tejun Heo
2012-09-26 16:36       ` Tejun Heo
2012-09-26 17:36       ` Glauber Costa
2012-09-26 17:36         ` Glauber Costa
2012-09-26 17:44         ` Tejun Heo
2012-09-26 17:44           ` Tejun Heo
2012-09-26 17:44           ` Tejun Heo
2012-09-26 17:53           ` Glauber Costa
2012-09-26 17:53             ` Glauber Costa
2012-09-26 18:01             ` Tejun Heo
2012-09-26 18:01               ` Tejun Heo
2012-09-26 18:56               ` Glauber Costa
2012-09-26 18:56                 ` Glauber Costa
2012-09-26 18:56                 ` Glauber Costa
2012-09-26 19:34                 ` Tejun Heo
2012-09-26 19:34                   ` Tejun Heo
2012-09-26 19:34                   ` Tejun Heo
2012-09-26 19:46                   ` Glauber Costa
2012-09-26 19:46                     ` Glauber Costa
2012-09-26 19:56                     ` Tejun Heo
2012-09-26 19:56                       ` Tejun Heo
2012-09-26 19:56                       ` Tejun Heo
2012-09-26 20:02                       ` Glauber Costa
2012-09-26 20:02                         ` Glauber Costa
2012-09-26 20:16                         ` Tejun Heo
2012-09-26 20:16                           ` Tejun Heo
2012-09-26 20:16                           ` Tejun Heo
2012-09-26 21:24                           ` Glauber Costa
2012-09-26 21:24                             ` Glauber Costa
2012-09-26 22:10                             ` Tejun Heo
2012-09-26 22:10                               ` Tejun Heo
2012-09-26 22:10                               ` Tejun Heo
2012-09-26 22:29                               ` Glauber Costa
2012-09-26 22:29                                 ` Glauber Costa
2012-09-26 22:42                                 ` Tejun Heo
2012-09-26 22:42                                   ` Tejun Heo
2012-09-26 22:42                                   ` Tejun Heo
2012-09-26 22:54                                   ` Glauber Costa
2012-09-26 22:54                                     ` Glauber Costa
2012-09-26 23:08                                     ` Tejun Heo
2012-09-26 23:08                                       ` Tejun Heo
2012-09-26 23:08                                       ` Tejun Heo
2012-09-26 23:20                                       ` Glauber Costa
2012-09-26 23:20                                         ` Glauber Costa
2012-09-26 23:33                                         ` Tejun Heo
2012-09-26 23:33                                           ` Tejun Heo
2012-09-27 12:15                                           ` Michal Hocko
2012-09-27 12:15                                             ` Michal Hocko
2012-09-27 12:15                                             ` Michal Hocko
2012-09-27 12:20                                             ` Glauber Costa
2012-09-27 12:20                                               ` Glauber Costa
2012-09-27 12:40                                               ` Michal Hocko
2012-09-27 12:40                                                 ` Michal Hocko
2012-09-27 12:40                                                 ` Michal Hocko
2012-09-27 12:40                                                 ` Glauber Costa
2012-09-27 12:40                                                   ` Glauber Costa
2012-09-27 12:40                                                   ` Glauber Costa
2012-09-27 12:54                                                   ` Michal Hocko
2012-09-27 12:54                                                     ` Michal Hocko
2012-09-27 14:28                                       ` Mel Gorman
2012-09-27 14:28                                         ` Mel Gorman
2012-09-27 14:28                                         ` Mel Gorman
2012-09-27 14:49                                         ` Tejun Heo
2012-09-27 14:49                                           ` Tejun Heo
2012-09-27 14:57                                           ` Glauber Costa
2012-09-27 14:57                                             ` Glauber Costa
2012-09-27 17:46                                             ` Tejun Heo
2012-09-27 17:46                                               ` Tejun Heo
2012-09-27 17:46                                               ` Tejun Heo
2012-09-27 17:56                                               ` Michal Hocko
2012-09-27 17:56                                                 ` Michal Hocko
2012-09-27 18:45                                               ` Glauber Costa
2012-09-27 18:45                                                 ` Glauber Costa
2012-09-30  7:57                                                 ` Tejun Heo
2012-09-30  7:57                                                   ` Tejun Heo
2012-09-30  7:57                                                   ` Tejun Heo
2012-09-30  8:02                                                   ` Tejun Heo
2012-09-30  8:02                                                     ` Tejun Heo
2012-09-30  8:56                                                     ` James Bottomley
2012-09-30  8:56                                                       ` James Bottomley
2012-09-30  8:56                                                       ` James Bottomley
2012-09-30 10:37                                                       ` Tejun Heo
2012-09-30 10:37                                                         ` Tejun Heo
2012-09-30 10:37                                                         ` Tejun Heo
2012-09-30 11:25                                                         ` James Bottomley
2012-09-30 11:25                                                           ` James Bottomley
2012-10-01  0:57                                                           ` Tejun Heo
2012-10-01  0:57                                                             ` Tejun Heo
2012-10-01  0:57                                                             ` Tejun Heo
2012-10-01  8:43                                                             ` Glauber Costa
2012-10-01  8:43                                                               ` Glauber Costa
2012-10-01  8:46                                                         ` Glauber Costa
2012-10-01  8:46                                                           ` Glauber Costa
2012-10-03 22:59                                                           ` Tejun Heo
2012-10-03 22:59                                                             ` Tejun Heo
2012-10-01  8:36                                                   ` Glauber Costa
2012-10-01  8:36                                                     ` Glauber Costa
2012-09-27 12:08                             ` Michal Hocko
2012-09-27 12:08                               ` Michal Hocko
2012-09-27 12:08                               ` Michal Hocko
2012-09-27 12:11                               ` Glauber Costa
2012-09-27 12:11                                 ` Glauber Costa
2012-09-27 14:33                               ` Tejun Heo
2012-09-27 14:33                                 ` Tejun Heo
2012-09-27 14:43                                 ` Mel Gorman
2012-09-27 14:43                                   ` Mel Gorman
2012-09-27 14:43                                   ` Mel Gorman
2012-09-27 14:58                                   ` Tejun Heo
2012-09-27 14:58                                     ` Tejun Heo
2012-09-27 18:30                                     ` Glauber Costa
2012-09-27 18:30                                       ` Glauber Costa
2012-09-30  8:23                                       ` Tejun Heo
2012-09-30  8:23                                         ` Tejun Heo
2012-10-01  8:45                                         ` Glauber Costa
2012-10-01  8:45                                           ` Glauber Costa
2012-10-03 22:54                                           ` Tejun Heo
2012-10-03 22:54                                             ` Tejun Heo
2012-10-04 11:55                                             ` Glauber Costa
2012-10-04 11:55                                               ` Glauber Costa
2012-10-06  2:19                                               ` Tejun Heo
2012-10-06  2:19                                                 ` Tejun Heo
2012-10-06  2:19                                                 ` Tejun Heo
2012-09-27 15:09                                 ` Michal Hocko
2012-09-27 15:09                                   ` Michal Hocko
2012-09-30  8:47                                   ` Tejun Heo
2012-09-30  8:47                                     ` Tejun Heo
2012-10-01  9:27                                     ` Michal Hocko
2012-10-01  9:27                                       ` Michal Hocko
2012-10-03 22:43                                       ` Tejun Heo
2012-10-03 22:43                                         ` Tejun Heo
2012-10-03 22:43                                         ` Tejun Heo
2012-10-05 13:47                                         ` Michal Hocko
2012-10-05 13:47                                           ` Michal Hocko
2012-10-05 13:47                                           ` Michal Hocko
2012-09-26 22:11                         ` Johannes Weiner
2012-09-26 22:11                           ` Johannes Weiner
2012-09-26 22:11                           ` Johannes Weiner
2012-09-26 22:45                           ` Glauber Costa
2012-09-26 22:45                             ` Glauber Costa
2012-09-18 14:04 ` [PATCH v3 05/13] Add a __GFP_KMEMCG flag Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:15   ` Rik van Riel
2012-09-18 14:15     ` Rik van Riel
2012-09-18 15:06   ` Christoph Lameter
2012-09-18 15:06     ` Christoph Lameter
2012-09-18 15:06     ` Christoph Lameter
2012-09-19  7:39     ` Glauber Costa
2012-09-19  7:39       ` Glauber Costa
2012-09-19  7:39       ` Glauber Costa
2012-09-19 14:07       ` Christoph Lameter
2012-09-19 14:07         ` Christoph Lameter
2012-09-19 14:07         ` Christoph Lameter
2012-09-27 13:34   ` Mel Gorman
2012-09-27 13:34     ` Mel Gorman
2012-09-27 13:41     ` Glauber Costa
2012-09-27 13:41       ` Glauber Costa
2012-10-01 19:09   ` Johannes Weiner
2012-10-01 19:09     ` Johannes Weiner
2012-09-18 14:04 ` [PATCH v3 06/13] memcg: kmem controller infrastructure Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-20 16:05   ` JoonSoo Kim
2012-09-20 16:05     ` JoonSoo Kim
2012-09-20 16:05     ` JoonSoo Kim
2012-09-21  8:41     ` Glauber Costa
2012-09-21  8:41       ` Glauber Costa
2012-09-21  8:41       ` Glauber Costa
2012-09-21  9:14       ` JoonSoo Kim
2012-09-21  9:14         ` JoonSoo Kim
2012-09-26 15:51   ` Michal Hocko [this message]
2012-09-26 15:51     ` Michal Hocko
2012-09-26 15:51     ` Michal Hocko
2012-09-27 11:31     ` Glauber Costa
2012-09-27 11:31       ` Glauber Costa
2012-09-27 13:44       ` Michal Hocko
2012-09-27 13:44         ` Michal Hocko
2012-09-27 13:44         ` Michal Hocko
2012-09-28 11:34         ` Glauber Costa
2012-09-28 11:34           ` Glauber Costa
2012-09-28 11:34           ` Glauber Costa
2012-09-30  8:25           ` Tejun Heo
2012-09-30  8:25             ` Tejun Heo
2012-10-01  8:28             ` Glauber Costa
2012-10-01  8:28               ` Glauber Costa
2012-10-03 22:11               ` Tejun Heo
2012-10-03 22:11                 ` Tejun Heo
2012-10-03 22:11                 ` Tejun Heo
2012-10-01  9:44             ` Michal Hocko
2012-10-01  9:44               ` Michal Hocko
2012-10-01  9:44               ` Michal Hocko
2012-10-01  9:48           ` Michal Hocko
2012-10-01  9:48             ` Michal Hocko
2012-10-01  9:48             ` Michal Hocko
2012-10-01 10:09             ` Glauber Costa
2012-10-01 10:09               ` Glauber Costa
2012-10-01 11:51               ` Michal Hocko
2012-10-01 11:51                 ` Michal Hocko
2012-10-01 11:51                 ` Glauber Costa
2012-10-01 11:51                   ` Glauber Costa
2012-10-01 11:51                   ` Glauber Costa
2012-10-01 11:58                   ` Michal Hocko
2012-10-01 11:58                     ` Michal Hocko
2012-10-01 12:04                     ` Glauber Costa
2012-10-01 12:04                       ` Glauber Costa
2012-10-01 12:04                       ` Glauber Costa
2012-09-18 14:04 ` [PATCH v3 07/13] mm: Allocate kernel pages to the right memcg Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-27 13:50   ` Mel Gorman
2012-09-27 13:50     ` Mel Gorman
2012-09-28  9:43     ` Glauber Costa
2012-09-28  9:43       ` Glauber Costa
2012-09-28  9:43       ` Glauber Costa
2012-09-28 13:28       ` Mel Gorman
2012-09-28 13:28         ` Mel Gorman
2012-09-27 13:52   ` Michal Hocko
2012-09-27 13:52     ` Michal Hocko
2012-09-27 13:52     ` Michal Hocko
2012-09-18 14:04 ` [PATCH v3 08/13] res_counter: return amount of charges after res_counter_uncharge Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-10-01 10:00   ` Michal Hocko
2012-10-01 10:00     ` Michal Hocko
2012-10-01 10:01     ` Glauber Costa
2012-10-01 10:01       ` Glauber Costa
2012-09-18 14:04 ` [PATCH v3 09/13] memcg: kmem accounting lifecycle management Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-10-01 12:15   ` Michal Hocko
2012-10-01 12:15     ` Michal Hocko
2012-10-01 12:15     ` Michal Hocko
2012-10-01 12:29     ` Glauber Costa
2012-10-01 12:29       ` Glauber Costa
2012-10-01 12:29       ` Glauber Costa
2012-10-01 12:36       ` Michal Hocko
2012-10-01 12:36         ` Michal Hocko
2012-10-01 12:36         ` Michal Hocko
2012-10-01 12:43         ` Glauber Costa
2012-10-01 12:43           ` Glauber Costa
2012-10-01 12:43           ` Glauber Costa
2012-09-18 14:04 ` [PATCH v3 10/13] memcg: use static branches when code not in use Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-10-01 12:25   ` Michal Hocko
2012-10-01 12:25     ` Michal Hocko
2012-10-01 12:27     ` Glauber Costa
2012-10-01 12:27       ` Glauber Costa
2012-10-01 12:27       ` Glauber Costa
2012-09-18 14:04 ` [PATCH v3 11/13] memcg: allow a memcg with kmem charges to be destructed Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-10-01 12:30   ` Michal Hocko
2012-10-01 12:30     ` Michal Hocko
2012-10-01 12:30     ` Michal Hocko
2012-09-18 14:04 ` [PATCH v3 12/13] execute the whole memcg freeing in rcu callback Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-21 17:23   ` Tejun Heo
2012-09-21 17:23     ` Tejun Heo
2012-09-21 17:23     ` Tejun Heo
2012-09-24  8:48     ` Glauber Costa
2012-09-24  8:48       ` Glauber Costa
2012-09-24  8:48       ` Glauber Costa
2012-10-01 13:27   ` Michal Hocko
2012-10-01 13:27     ` Michal Hocko
2012-10-01 13:27     ` Michal Hocko
2012-10-04 10:53     ` Glauber Costa
2012-10-04 10:53       ` Glauber Costa
2012-10-04 10:53       ` Glauber Costa
2012-10-04 14:20       ` Glauber Costa
2012-10-04 14:20         ` Glauber Costa
2012-10-05 15:31       ` Johannes Weiner
2012-10-05 15:31         ` Johannes Weiner
2012-10-05 15:31         ` Johannes Weiner
2012-10-08  9:45         ` Glauber Costa
2012-10-08  9:45           ` Glauber Costa
2012-09-18 14:04 ` [PATCH v3 13/13] protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-09-18 14:04   ` Glauber Costa
2012-10-01 13:17   ` Michal Hocko
2012-10-01 13:17     ` Michal Hocko
2012-10-01 13:17     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120926155108.GE15801@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=cgroups@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=devel@openvz.org \
    --cc=fweisbec@gmail.com \
    --cc=glommer@parallels.com \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=penberg@cs.helsinki.fi \
    --cc=rientjes@google.com \
    --cc=suleiman@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.