Re: [PATCH v2 18/29] memcg: kmem controller charge/uncharge infrastructure

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Glauber Costa <glommer@parallels.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <linux-kernel@vger.kernel.org>, <cgroups@vger.kernel.org>,
	<linux-mm@kvack.org>, Tejun Heo <tj@kernel.org>,
	Li Zefan <lizefan@huawei.com>, Greg Thelen <gthelen@google.com>,
	Suleiman Souhlal <suleiman@google.com>,
	Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>, <devel@openvz.org>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [PATCH v2 18/29] memcg: kmem controller charge/uncharge infrastructure
Date: Wed, 16 May 2012 12:25:42 +0400	[thread overview]
Message-ID: <4FB36486.6060500@parallels.com> (raw)
In-Reply-To: <4FB362D4.8000800@jp.fujitsu.com>

On 05/16/2012 12:18 PM, KAMEZAWA Hiroyuki wrote:
>> If at this point the memcg hits a NOFAIL allocation worth 2 pages, by
>> >  the method I am using, the memcg will be at 4M + 4k after the
>> >  allocation. Charging it to the root memcg will leave it at 4M - 4k.
>> >  
>> >  This means that to be able to allocate a page again, you need to free
>> >  two other pages, be it the 2 pages used by the GFP allocation or any
>> >  other. In other words: the memcg that originated the charge is held
>> >  accountable for it. If he says it can't fail for whatever reason, fine,
>> >  we respect that,  but we punish it later for other allocations.
>> >  
> I personally think 'we punish it later' is bad thing at resource accounting.
> We have 'hard limit'. It's not soft limit.

That only makes sense if you will fail the allocation. If you won't, you
are over your hard limit anyway. You are just masquerading that.

>> >  Without that GFP_NOFAIL becomes just a nice way for people to bypass
>> >  those controls altogether, since after a ton of GFP_NOFAIL allocations,
>> >  normal allocations will still succeed.
>> >  
> Allowing people to bypass is not bad because they're kernel.

No, they are not. They are in process context, on behalf of a process
that belongs to a valid memcg. If they happen to be a kernel thread,
!current->mm test will send the allocation to the root memcg already.

> 
> But, IIUC, from gfp.h
> ==
>   * __GFP_NOFAIL: The VM implementation_must_  retry infinitely: the caller
>   * cannot handle allocation failures.  This modifier is deprecated and no new
>   * users should be added.
> ==
> 
> GFP_NOFAIL will go away and no new user is recommended.
> 
Yes, I am aware of that. That's actually why I don't plan to insist on
this too much - although your e-mail didn't really convince me.

It should not matter in practice.

> So, please skip GFP_NOFAIL accounting and avoid to write
> "usage may go over limit if you're unfortune, sorry" into memcg documentation.

I won't write that, because that's not true. Is more like: "Allocations
that can fail will fail if you go over limit".

> 
>> >  The change you propose is totally doable. I just don't believe it should
>> >  be done.
>> >  
>> >  But let me know where you stand.
>> >  
> My stand point is keeping "usage<= limit" is the spec. and
> important in enterprise system. So, please avoid usage>  limit.
> 
As I said, I won't make a case here because those allocations shouldn't
matter in real life anyway. I can change it.

WARNING: multiple messages have this Message-ID (diff)

From: Glauber Costa <glommer@parallels.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org, Tejun Heo <tj@kernel.org>,
	Li Zefan <lizefan@huawei.com>, Greg Thelen <gthelen@google.com>,
	Suleiman Souhlal <suleiman@google.com>,
	Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	devel@openvz.org, Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [PATCH v2 18/29] memcg: kmem controller charge/uncharge infrastructure
Date: Wed, 16 May 2012 12:25:42 +0400	[thread overview]
Message-ID: <4FB36486.6060500@parallels.com> (raw)
In-Reply-To: <4FB362D4.8000800@jp.fujitsu.com>

On 05/16/2012 12:18 PM, KAMEZAWA Hiroyuki wrote:
>> If at this point the memcg hits a NOFAIL allocation worth 2 pages, by
>> >  the method I am using, the memcg will be at 4M + 4k after the
>> >  allocation. Charging it to the root memcg will leave it at 4M - 4k.
>> >  
>> >  This means that to be able to allocate a page again, you need to free
>> >  two other pages, be it the 2 pages used by the GFP allocation or any
>> >  other. In other words: the memcg that originated the charge is held
>> >  accountable for it. If he says it can't fail for whatever reason, fine,
>> >  we respect that,  but we punish it later for other allocations.
>> >  
> I personally think 'we punish it later' is bad thing at resource accounting.
> We have 'hard limit'. It's not soft limit.

That only makes sense if you will fail the allocation. If you won't, you
are over your hard limit anyway. You are just masquerading that.

>> >  Without that GFP_NOFAIL becomes just a nice way for people to bypass
>> >  those controls altogether, since after a ton of GFP_NOFAIL allocations,
>> >  normal allocations will still succeed.
>> >  
> Allowing people to bypass is not bad because they're kernel.

No, they are not. They are in process context, on behalf of a process
that belongs to a valid memcg. If they happen to be a kernel thread,
!current->mm test will send the allocation to the root memcg already.

> 
> But, IIUC, from gfp.h
> ==
>   * __GFP_NOFAIL: The VM implementation_must_  retry infinitely: the caller
>   * cannot handle allocation failures.  This modifier is deprecated and no new
>   * users should be added.
> ==
> 
> GFP_NOFAIL will go away and no new user is recommended.
> 
Yes, I am aware of that. That's actually why I don't plan to insist on
this too much - although your e-mail didn't really convince me.

It should not matter in practice.

> So, please skip GFP_NOFAIL accounting and avoid to write
> "usage may go over limit if you're unfortune, sorry" into memcg documentation.

I won't write that, because that's not true. Is more like: "Allocations
that can fail will fail if you go over limit".

> 
>> >  The change you propose is totally doable. I just don't believe it should
>> >  be done.
>> >  
>> >  But let me know where you stand.
>> >  
> My stand point is keeping "usage<= limit" is the spec. and
> important in enterprise system. So, please avoid usage>  limit.
> 
As I said, I won't make a case here because those allocations shouldn't
matter in real life anyway. I can change it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
To: KAMEZAWA Hiroyuki
	<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>,
	Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Suleiman Souhlal
	<suleiman-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org,
	Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>,
	Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
Subject: Re: [PATCH v2 18/29] memcg: kmem controller charge/uncharge infrastructure
Date: Wed, 16 May 2012 12:25:42 +0400	[thread overview]
Message-ID: <4FB36486.6060500@parallels.com> (raw)
In-Reply-To: <4FB362D4.8000800-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

On 05/16/2012 12:18 PM, KAMEZAWA Hiroyuki wrote:
>> If at this point the memcg hits a NOFAIL allocation worth 2 pages, by
>> >  the method I am using, the memcg will be at 4M + 4k after the
>> >  allocation. Charging it to the root memcg will leave it at 4M - 4k.
>> >  
>> >  This means that to be able to allocate a page again, you need to free
>> >  two other pages, be it the 2 pages used by the GFP allocation or any
>> >  other. In other words: the memcg that originated the charge is held
>> >  accountable for it. If he says it can't fail for whatever reason, fine,
>> >  we respect that,  but we punish it later for other allocations.
>> >  
> I personally think 'we punish it later' is bad thing at resource accounting.
> We have 'hard limit'. It's not soft limit.

That only makes sense if you will fail the allocation. If you won't, you
are over your hard limit anyway. You are just masquerading that.

>> >  Without that GFP_NOFAIL becomes just a nice way for people to bypass
>> >  those controls altogether, since after a ton of GFP_NOFAIL allocations,
>> >  normal allocations will still succeed.
>> >  
> Allowing people to bypass is not bad because they're kernel.

No, they are not. They are in process context, on behalf of a process
that belongs to a valid memcg. If they happen to be a kernel thread,
!current->mm test will send the allocation to the root memcg already.

> 
> But, IIUC, from gfp.h
> ==
>   * __GFP_NOFAIL: The VM implementation_must_  retry infinitely: the caller
>   * cannot handle allocation failures.  This modifier is deprecated and no new
>   * users should be added.
> ==
> 
> GFP_NOFAIL will go away and no new user is recommended.
> 
Yes, I am aware of that. That's actually why I don't plan to insist on
this too much - although your e-mail didn't really convince me.

It should not matter in practice.

> So, please skip GFP_NOFAIL accounting and avoid to write
> "usage may go over limit if you're unfortune, sorry" into memcg documentation.

I won't write that, because that's not true. Is more like: "Allocations
that can fail will fail if you go over limit".

> 
>> >  The change you propose is totally doable. I just don't believe it should
>> >  be done.
>> >  
>> >  But let me know where you stand.
>> >  
> My stand point is keeping "usage<= limit" is the spec. and
> important in enterprise system. So, please avoid usage>  limit.
> 
As I said, I won't make a case here because those allocations shouldn't
matter in real life anyway. I can change it.

next prev parent reply	other threads:[~2012-05-16  8:27 UTC|newest]

Thread overview: 167+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-11 17:44 [PATCH v2 00/29] kmem limitation for memcg Glauber Costa
2012-05-11 17:44 ` Glauber Costa
2012-05-11 17:44 ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 01/29] slab: dup name string Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-15 22:04   ` David Rientjes
2012-05-15 22:04     ` David Rientjes
2012-05-16  6:12     ` Glauber Costa
2012-05-16  6:12       ` Glauber Costa
2012-05-16  6:12       ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 02/29] slub: fix slab_state for slub Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:51   ` Christoph Lameter
2012-05-11 17:51     ` Christoph Lameter
2012-05-15 21:55   ` David Rientjes
2012-05-15 21:55     ` David Rientjes
2012-05-15 21:55     ` David Rientjes
2012-05-16  6:10     ` Glauber Costa
2012-05-16  6:10       ` Glauber Costa
2012-05-16  6:10       ` Glauber Costa
2012-05-17 10:14     ` Glauber Costa
2012-05-17 10:14       ` Glauber Costa
2012-05-17 10:14       ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 03/29] memcg: Always free struct memcg through schedule_work() Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 04/29] slub: always get the cache from its page in kfree Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:53   ` Christoph Lameter
2012-05-11 17:53     ` Christoph Lameter
2012-05-11 17:57     ` Glauber Costa
2012-05-11 17:57       ` Glauber Costa
2012-05-11 17:57       ` Glauber Costa
2012-05-11 18:06       ` Christoph Lameter
2012-05-11 18:06         ` Christoph Lameter
2012-05-11 18:11         ` Glauber Costa
2012-05-11 18:11           ` Glauber Costa
2012-05-11 18:11           ` Glauber Costa
2012-05-11 18:17           ` Christoph Lameter
2012-05-11 18:17             ` Christoph Lameter
2012-05-11 18:20             ` Glauber Costa
2012-05-11 18:20               ` Glauber Costa
2012-05-11 18:20               ` Glauber Costa
2012-05-11 18:32               ` Christoph Lameter
2012-05-11 18:32                 ` Christoph Lameter
2012-05-11 18:32                 ` Christoph Lameter
2012-05-11 18:42                 ` Glauber Costa
2012-05-11 18:42                   ` Glauber Costa
2012-05-11 18:42                   ` Glauber Costa
2012-05-11 18:56                   ` Christoph Lameter
2012-05-11 18:56                     ` Christoph Lameter
2012-05-11 18:56                     ` Christoph Lameter
2012-05-11 18:58                     ` Glauber Costa
2012-05-11 18:58                       ` Glauber Costa
2012-05-11 18:58                       ` Glauber Costa
2012-05-11 19:09                       ` Christoph Lameter
2012-05-11 19:09                         ` Christoph Lameter
2012-05-11 19:11                         ` Glauber Costa
2012-05-11 19:11                           ` Glauber Costa
2012-05-11 19:11                           ` Glauber Costa
2012-05-11 19:20                           ` Christoph Lameter
2012-05-11 19:20                             ` Christoph Lameter
2012-05-11 19:24                             ` Glauber Costa
2012-05-11 19:24                               ` Glauber Costa
2012-05-11 19:24                               ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 05/29] slab: rename gfpflags to allocflags Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:54   ` Christoph Lameter
2012-05-11 17:54     ` Christoph Lameter
2012-05-15 21:57   ` David Rientjes
2012-05-15 21:57     ` David Rientjes
2012-05-11 17:44 ` [PATCH v2 06/29] memcg: Make it possible to use the stock for more than one page Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 07/29] memcg: Reclaim when more than one page needed Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 08/29] slab: use obj_size field of struct kmem_cache when not debugging Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 09/29] memcg: change defines to an enum Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 10/29] res_counter: don't force return value checking in res_counter_charge_nofail Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 11/29] cgroups: ability to stop res charge propagation on bounded ancestor Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-15  2:59   ` KAMEZAWA Hiroyuki
2012-05-15  2:59     ` KAMEZAWA Hiroyuki
2012-05-16  6:16     ` Glauber Costa
2012-05-16  6:16       ` Glauber Costa
2012-05-16  6:16       ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 12/29] kmem slab accounting basic infrastructure Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 13/29] slab/slub: struct memcg_params Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 14/29] slub: consider a memcg parameter in kmem_create_cache Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 15/29] slab: pass memcg parameter to kmem_cache_create Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 16/29] slub: create duplicate cache Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 17/29] slab: " Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 18/29] memcg: kmem controller charge/uncharge infrastructure Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-15  2:57   ` KAMEZAWA Hiroyuki
2012-05-15  2:57     ` KAMEZAWA Hiroyuki
2012-05-16  6:42     ` Glauber Costa
2012-05-16  6:42       ` Glauber Costa
2012-05-16  8:18       ` KAMEZAWA Hiroyuki
2012-05-16  8:18         ` KAMEZAWA Hiroyuki
2012-05-16  8:25         ` Glauber Costa [this message]
2012-05-16  8:25           ` Glauber Costa
2012-05-16  8:25           ` Glauber Costa
2012-05-16  9:15           ` KAMEZAWA Hiroyuki
2012-05-16  9:15             ` KAMEZAWA Hiroyuki
2012-05-16  9:15             ` KAMEZAWA Hiroyuki
2012-05-11 17:44 ` [PATCH v2 19/29] skip memcg kmem allocations in specified code regions Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-15  2:46   ` KAMEZAWA Hiroyuki
2012-05-15  2:46     ` KAMEZAWA Hiroyuki
2012-05-16  6:19     ` Glauber Costa
2012-05-16  6:19       ` Glauber Costa
2012-05-16  6:19       ` Glauber Costa
2012-05-16  7:55       ` KAMEZAWA Hiroyuki
2012-05-16  7:55         ` KAMEZAWA Hiroyuki
2012-05-16  7:55         ` KAMEZAWA Hiroyuki
2012-05-11 17:44 ` [PATCH v2 20/29] slub: charge allocation to a memcg Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 21/29] slab: per-memcg accounting of slab caches Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 22/29] memcg: disable kmem code when not in use Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 23/29] memcg: destroy memcg caches Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 24/29] memcg/slub: shrink dead caches Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 25/29] memcg: Track all the memcg children of a kmem_cache Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 26/29] memcg: Per-memcg memory.kmem.slabinfo file Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 27/29] slub: create slabinfo file for memcg Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 28/29] slub: track all children of a kmem cache Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 17:44 ` [PATCH v2 29/29] Documentation: add documentation for slab tracker for memcg Glauber Costa
2012-05-11 17:44   ` Glauber Costa
2012-05-11 18:05 ` [PATCH v2 00/29] kmem limitation " Glauber Costa
2012-05-11 18:05   ` Glauber Costa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FB36486.6060500@parallels.com \
    --to=glommer@parallels.com \
    --cc=cgroups@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=devel@openvz.org \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=mhocko@suse.cz \
    --cc=penberg@cs.helsinki.fi \
    --cc=suleiman@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.