linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Glauber Costa <glommer@parallels.com>
Cc: linux-mm@kvack.org, Pekka Enberg <penberg@kernel.org>,
	Cristoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	cgroups@vger.kernel.org, devel@openvz.org,
	kamezawa.hiroyu@jp.fujitsu.com, linux-kernel@vger.kernel.org,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Suleiman Souhlal <suleiman@google.com>
Subject: Re: [PATCH v4 07/25] memcg: Reclaim when more than one page needed.
Date: Thu, 21 Jun 2012 23:19:23 +0200	[thread overview]
Message-ID: <20120621211923.GC31759@tiehlicka.suse.cz> (raw)
In-Reply-To: <4FE227F8.3000504@parallels.com>

On Wed 20-06-12 23:43:52, Glauber Costa wrote:
> On 06/20/2012 05:47 PM, Michal Hocko wrote:
> >On Mon 18-06-12 14:28:00, Glauber Costa wrote:
> >>From: Suleiman Souhlal <ssouhlal@FreeBSD.org>
> >>
> >>mem_cgroup_do_charge() was written before slab accounting, and expects
> >>three cases: being called for 1 page, being called for a stock of 32 pages,
> >>or being called for a hugepage.  If we call for 2 or 3 pages (and several
> >>slabs used in process creation are such, at least with the debug options I
> >>had), it assumed it's being called for stock and just retried without reclaiming.
> >>
> >>Fix that by passing down a minsize argument in addition to the csize.
> >>
> >>And what to do about that (csize == PAGE_SIZE && ret) retry?  If it's
> >>needed at all (and presumably is since it's there, perhaps to handle
> >>races), then it should be extended to more than PAGE_SIZE, yet how far?
> >>And should there be a retry count limit, of what?  For now retry up to
> >>COSTLY_ORDER (as page_alloc.c does), stay safe with a cond_resched(),
> >>and make sure not to do it if __GFP_NORETRY.
> >>
> >>[v4: fixed nr pages calculation pointed out by Christoph Lameter ]
> >>
> >>Signed-off-by: Suleiman Souhlal <suleiman@google.com>
> >>Signed-off-by: Glauber Costa <glommer@parallels.com>
> >>Reviewed-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> >I think this is not ready to be merged yet.
> Fair Enough
> 
> >Two comments below.
> >
> >[...]
> >>@@ -2210,18 +2211,18 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >>  	} else
> >>  		mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
> >>  	/*
> >>-	 * nr_pages can be either a huge page (HPAGE_PMD_NR), a batch
> >>-	 * of regular pages (CHARGE_BATCH), or a single regular page (1).
> >>-	 *
> >>  	 * Never reclaim on behalf of optional batching, retry with a
> >>  	 * single page instead.
> >>  	 */
> >>-	if (nr_pages == CHARGE_BATCH)
> >>+	if (nr_pages > min_pages)
> >>  		return CHARGE_RETRY;
> >>
> >>  	if (!(gfp_mask & __GFP_WAIT))
> >>  		return CHARGE_WOULDBLOCK;
> >>
> >>+	if (gfp_mask & __GFP_NORETRY)
> >>+		return CHARGE_NOMEM;
> >
> >This is kmem specific and should be preparated out in case this should
> >be merged before the rest.
> ok.
> 
> >Btw. I assume that oom==false when called from kmem...
> 
> What prevents the oom killer to be called for a reclaimable kmem
> allocation that can be satisfied ?

Well, I am not familiar with the rest of the patch series yet (sorry
about that) but playing with oom can be really nasty if oom score
doesn't consider also kmem allocations. You can end up killing
unexpected processes just because of kmem hungry (and nasty) process.
Dunno, have to thing about that.

> >>+
> >>  	ret = mem_cgroup_reclaim(mem_over_limit, gfp_mask, flags);
> >>  	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> >>  		return CHARGE_RETRY;
> >>@@ -2234,8 +2235,10 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >>  	 * unlikely to succeed so close to the limit, and we fall back
> >>  	 * to regular pages anyway in case of failure.
> >>  	 */
> >>-	if (nr_pages == 1 && ret)
> >>+	if (nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER) && ret) {
> >>+		cond_resched();
> >>  		return CHARGE_RETRY;
> >>+	}
> >
> >What prevents us from looping for unbounded amount of time here?
> >Maybe you need to consider the number of reclaimed pages here.
> 
> Why would we even loop here? It will just return CHARGE_RETRY, it is
> up to the caller to decide whether or not it will retry.

Yes, but the test was original to prevent oom when we managed to reclaim
something. And something might be enough for a single page but now you
have high order allocations so we can retry without any success.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

  reply	other threads:[~2012-06-21 21:19 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-18 10:27 [PATCH v4 00/25] kmem limitation for memcg Glauber Costa
2012-06-18 10:27 ` [PATCH v4 01/25] slab: rename gfpflags to allocflags Glauber Costa
2012-06-18 10:27 ` [PATCH v4 02/25] provide a common place for initcall processing in kmem_cache Glauber Costa
2012-06-18 10:27 ` [PATCH v4 03/25] slab: move FULL state transition to an initcall Glauber Costa
2012-06-18 10:27 ` [PATCH v4 04/25] Wipe out CFLGS_OFF_SLAB from flags during initial slab creation Glauber Costa
2012-06-18 10:27 ` [PATCH v4 05/25] memcg: Always free struct memcg through schedule_work() Glauber Costa
2012-06-18 12:07   ` Kamezawa Hiroyuki
2012-06-18 12:10     ` Glauber Costa
2012-06-19  0:11       ` Kamezawa Hiroyuki
2012-06-20  7:32       ` Pekka Enberg
2012-06-20  8:40         ` Glauber Costa
2012-06-21 11:39           ` Kamezawa Hiroyuki
2012-06-20 13:20   ` Michal Hocko
2012-06-18 10:27 ` [PATCH v4 06/25] memcg: Make it possible to use the stock for more than one page Glauber Costa
2012-06-20 13:28   ` Michal Hocko
2012-06-20 19:36     ` Glauber Costa
2012-06-21 21:14       ` Michal Hocko
2012-06-25 13:03     ` Glauber Costa
2012-06-18 10:28 ` [PATCH v4 07/25] memcg: Reclaim when more than one page needed Glauber Costa
2012-06-20 13:47   ` Michal Hocko
2012-06-20 19:43     ` Glauber Costa
2012-06-21 21:19       ` Michal Hocko [this message]
2012-06-25 13:13         ` Glauber Costa
2012-06-25 14:04           ` Glauber Costa
2012-06-18 10:28 ` [PATCH v4 08/25] memcg: change defines to an enum Glauber Costa
2012-06-20 13:13   ` Michal Hocko
2012-06-18 10:28 ` [PATCH v4 09/25] kmem slab accounting basic infrastructure Glauber Costa
2012-06-18 10:28 ` [PATCH v4 10/25] slab/slub: struct memcg_params Glauber Costa
2012-06-18 10:28 ` [PATCH v4 11/25] consider a memcg parameter in kmem_create_cache Glauber Costa
2012-06-18 10:28 ` [PATCH v4 12/25] sl[au]b: always get the cache from its page in kfree Glauber Costa
2012-06-18 10:28 ` [PATCH v4 13/25] Add a __GFP_SLABMEMCG flag Glauber Costa
2012-06-18 10:28 ` [PATCH v4 14/25] memcg: kmem controller dispatch infrastructure Glauber Costa
2012-06-18 10:28 ` [PATCH v4 15/25] allow enable_cpu_cache to use preset values for its tunables Glauber Costa
2012-06-18 10:28 ` [PATCH v4 16/25] don't do __ClearPageSlab before freeing slab page Glauber Costa
2012-06-18 10:28 ` [PATCH v4 17/25] skip memcg kmem allocations in specified code regions Glauber Costa
2012-06-18 12:19   ` Kamezawa Hiroyuki
2012-06-18 10:28 ` [PATCH v4 18/25] mm: Allocate kernel pages to the right memcg Glauber Costa
2012-06-18 10:28 ` [PATCH v4 19/25] memcg: disable kmem code when not in use Glauber Costa
2012-06-18 12:22   ` Kamezawa Hiroyuki
2012-06-18 12:26     ` Glauber Costa
2012-06-18 10:28 ` [PATCH v4 20/25] memcg: destroy memcg caches Glauber Costa
2012-06-18 10:28 ` [PATCH v4 21/25] Track all the memcg children of a kmem_cache Glauber Costa
2012-06-18 10:28 ` [PATCH v4 22/25] slab: slab-specific propagation changes Glauber Costa
2012-06-18 10:28 ` [PATCH v4 23/25] memcg: propagate kmem limiting information to children Glauber Costa
2012-06-18 12:37   ` Kamezawa Hiroyuki
2012-06-18 12:43     ` Glauber Costa
2012-06-19  0:16       ` Kamezawa Hiroyuki
2012-06-19  8:35         ` Glauber Costa
2012-06-19  8:54           ` Glauber Costa
2012-06-20  8:59             ` Glauber Costa
2012-06-23  4:19               ` Kamezawa Hiroyuki
2012-06-18 10:28 ` [PATCH v4 24/25] memcg/slub: shrink dead caches Glauber Costa
2012-07-06 15:16   ` Christoph Lameter
2012-07-20 22:16     ` Glauber Costa
2012-07-25 15:23       ` Christoph Lameter
2012-07-25 18:15         ` Glauber Costa
2012-06-18 10:28 ` [PATCH v4 25/25] Documentation: add documentation for slab tracker for memcg Glauber Costa
2012-06-18 12:10 ` [PATCH v4 00/25] kmem limitation " Kamezawa Hiroyuki
2012-06-18 12:14   ` Glauber Costa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120621211923.GC31759@tiehlicka.suse.cz \
    --to=mhocko@suse.cz \
    --cc=cgroups@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=devel@openvz.org \
    --cc=fweisbec@gmail.com \
    --cc=glommer@parallels.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=suleiman@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).