From: Johannes Weiner <hannes@cmpxchg.org>
To: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, Michal Hocko <mhocko@suse.cz>,
	Li Zefan <lizefan@huawei.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm: memcontrol: fix cgroup creation failure after many small jobs
Date: Fri, 17 Jun 2016 12:40:43 -0400
Message-ID: <20160617164043.GA10485@cmpxchg.org>
In-Reply-To: <20160617090655.GE13143@esperanza>

On Fri, Jun 17, 2016 at 12:06:55PM +0300, Vladimir Davydov wrote:
> On Wed, Jun 15, 2016 at 11:42:44PM -0400, Johannes Weiner wrote:
> > The memory controller has quite a bit of state that usually outlives
> > the cgroup and pins its CSS until said state disappears. At the same
> > time it imposes a 16-bit limit on the CSS ID space to economically
> > store IDs in the wild. Consequently, when we use cgroups to contain
> > frequent but small and short-lived jobs that leave behind some page
> > cache, we quickly run into the 64k limitations of outstanding CSSs.
> > Creating a new cgroup fails with -ENOSPC while there are only a few,
> > or even no user-visible cgroups in existence.
> > 
> > Although pinning CSSs past cgroup removal is common, there are only
> > two instances that actually need a CSS ID after a cgroup is deleted:
> > cache shadow entries and swapout records.
> > 
> > Cache shadow entries reference the ID weakly and can deal with the CSS
> > having disappeared when it's looked up later. They pose no hurdle.
> > 
> > Swap-out records do need to pin the css to hierarchically attribute
> > swapins after the cgroup has been deleted; though the only pages that
> > remain swapped out after a process exits are tmpfs/shmem pages. Those
> > references are under the user's control and thus manageable.
> > 
> > This patch introduces a private 16-bit memcg ID and switches swap and
> > cache shadow entries over to using that. It then decouples the CSS
> > lifetime from the CSS ID lifetime, such that a CSS ID can be recycled
> > when the CSS is only pinned by common objects that don't need an ID.
> 
> There's already an ID that is only used for online memory cgroups:
> kmemcg_id. Maybe, instead of introducing one more idr, we could name
> it generically and reuse it for shadow entries?

Good point. But mem_cgroup_idr seems to be the more generic of the
two, so it makes sense to switch slab accounting over to that instead.
I'll look into it, but as a refactoring patch on top of this fix.
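
To illustrate the idea from the patch description quoted above (an ID
whose lifetime is decoupled from the object that owns it), here is a
minimal userspace sketch. It is not the kernel implementation: the
real code uses an idr, and every name below (toy_memcg, toy_id_alloc,
toy_id_get, toy_id_put, toy_from_id) is hypothetical. The assumption
is that only references which actually need the ID are counted in
id_refs, so the ID can be recycled while other references still pin
the object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ID_MAX 65535u /* 16-bit ID space; 0 is reserved as "no ID" */

struct toy_memcg {
	uint16_t id;       /* 0 once the ID has been released */
	unsigned id_refs;  /* holders that need the ID (online state, swap records) */
	unsigned obj_refs; /* holders that only pin the object (e.g. page cache) */
};

/* ID -> object table; a stand-in for the idr used by the real code. */
static struct toy_memcg *toy_id_table[TOY_ID_MAX + 1];

/* Take the lowest free ID. Returns false when the space is exhausted. */
static bool toy_id_alloc(struct toy_memcg *m)
{
	for (uint32_t id = 1; id <= TOY_ID_MAX; id++) {
		if (!toy_id_table[id]) {
			toy_id_table[id] = m;
			m->id = (uint16_t)id;
			m->id_refs = 1; /* initial reference held while online */
			return true;
		}
	}
	return false;
}

/* A swap record would take an ID reference like this. */
static void toy_id_get(struct toy_memcg *m)
{
	assert(m->id_refs > 0);
	m->id_refs++;
}

/* Drop an ID reference; the last put frees the ID even if obj_refs > 0. */
static void toy_id_put(struct toy_memcg *m)
{
	assert(m->id_refs > 0);
	if (--m->id_refs == 0) {
		toy_id_table[m->id] = NULL;
		m->id = 0;
	}
}

/* Weak lookup, as a shadow entry would do: NULL if the ID is gone. */
static struct toy_memcg *toy_from_id(uint16_t id)
{
	return toy_id_table[id];
}
```

In this sketch, an object that is still pinned through obj_refs gives
its ID back as soon as the last toy_id_put() runs, and a later weak
lookup of the old ID either returns NULL or whichever object has
reused the ID since.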

> Regarding swap entries, would it really make much difference if we used
> 4 bytes per swap page instead of 2? For a 100 GB swap it'd increase
> overhead from 50 MB up to 100 MB, which still doesn't seem too much IMO,
> so maybe just use plain unrestricted css->id for swap entries?

Yes and no. I agree that the increased consumption wouldn't be too
crazy, but if we have to maintain a 16-bit ID anyway, we might as well
use it for swap too to save that space. I don't think tmpfs and shmem
pins past offlining will be common enough to significantly eat into
the ID space of online cgroups.
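
For reference, the 50 MB and 100 MB figures quoted above can be
reproduced with a short calculation. This sketch assumes 4 KiB pages
and binary units (1 GB = 2^30 bytes, 1 MB = 2^20 bytes), neither of
which is stated in the thread; swap_id_overhead is a hypothetical
helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of per-page ID records needed to cover swap_bytes of swap,
 * given one record of id_bytes for every page_size bytes of swap.
 */
static uint64_t swap_id_overhead(uint64_t swap_bytes, uint64_t page_size,
				 uint64_t id_bytes)
{
	assert(page_size != 0);
	return (swap_bytes / page_size) * id_bytes;
}
```

Under those assumptions, 100 GB of swap is 26,214,400 pages, so a
2-byte ID costs 50 MB and a 4-byte ID costs 100 MB.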
