From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>, Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>,
	Michal Hocko <mhocko@suse.cz>, Li Zefan <lizefan@huawei.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm: memcontrol: fix cgroup creation failure after many small jobs
Date: Thu, 14 Jul 2016 11:37:23 -0400
Message-ID: <20160714153723.GA9840@cmpxchg.org>
In-Reply-To: <20160616034244.14839-1-hannes@cmpxchg.org>

Hi Andrew,

this issue dates back quite a bit and wasn't reported until now, so I
didn't tag it for stable. However, it seems that larger scale setups
are now running into this as they upgrade their kernels, and several
people have run into this independently now. Could you please add:

Reported-by: John Garcia <john.garcia@mesosphere.io>
Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups")
CC: stable@kernel.org # 3.19+

and send it linusward?

Thank you

On Wed, Jun 15, 2016 at 11:42:44PM -0400, Johannes Weiner wrote:
> The memory controller has quite a bit of state that usually outlives
> the cgroup and pins its CSS until said state disappears. At the same
> time it imposes a 16-bit limit on the CSS ID space to economically
> store IDs in the wild. Consequently, when we use cgroups to contain
> frequent but small and short-lived jobs that leave behind some page
> cache, we quickly run into the 64k limit on outstanding CSSs.
> Creating a new cgroup fails with -ENOSPC while there are only a few,
> or even no user-visible cgroups in existence.
> 
> Although pinning CSSs past cgroup removal is common, there are only
> two instances that actually need a CSS ID after a cgroup is deleted:
> cache shadow entries and swapout records.
> 
> Cache shadow entries reference the ID weakly and can deal with the CSS
> having disappeared when it's looked up later. They pose no hurdle.
> 
> Swap-out records do need to pin the css to hierarchically attribute
> swapins after the cgroup has been deleted; though the only pages that
> remain swapped out after a process exits are tmpfs/shmem pages. Those
> references are under the user's control and thus manageable.
> 
> This patch introduces a private 16-bit memcg ID and switches swap and
> cache shadow entries over to using that. It then decouples the CSS
> lifetime from the CSS ID lifetime, such that a CSS ID can be recycled
> when the CSS is only pinned by common objects that don't need an ID.
> 
> This script demonstrates the problem by faulting one cache page in a
> new cgroup and deleting it again:
> 
> set -e
> mkdir -p pages
> for x in `seq 128000`; do
>   [ $((x % 1000)) -eq 0 ] && echo $x
>   mkdir /cgroup/foo
>   echo $$ >/cgroup/foo/cgroup.procs
>   echo trex >pages/$x
>   echo $$ >/cgroup/cgroup.procs
>   rmdir /cgroup/foo
> done
> 
> When run on an unpatched kernel, we eventually run out of possible CSS
> IDs even though no visible cgroup exists anymore:
> 
> [root@ham ~]# ./cssidstress.sh
> [...]
> 65000
> mkdir: cannot create directory '/cgroup/foo': No space left on device
> 
> After this patch, the CSS IDs get released upon cgroup destruction and
> the cache and css objects get released once memory reclaim kicks in.
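
To watch the effect described above while the reproducer runs, a minimal
sketch follows. It assumes that the third column of /proc/cgroups
(num_cgroups) for the "memory" controller also counts groups that are
still pinned after rmdir; that is an assumption about the running kernel,
not something stated in the patch. The helper names memcg_count and
memcg_headroom are hypothetical, and the headroom figure is only a rough
upper bound derived from the 16-bit ID space mentioned in the quoted
description.

```shell
#!/bin/sh
# Hypothetical helper: print the num_cgroups column for the "memory"
# controller from a file in /proc/cgroups format (default: /proc/cgroups).
memcg_count() {
  awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Hypothetical helper: rough headroom against a 16-bit ID space (2^16).
# Assumption: every counted group holds one ID; the exact number of
# usable IDs on a given kernel may differ slightly.
memcg_headroom() {
  count=$(memcg_count "$1")
  # Bail out if no "memory" line was found in the input.
  [ -n "$count" ] || return 1
  echo $(( (1 << 16) - count ))
}
```

Calling memcg_count before and after the reproducer gives a rough view
of how many memory cgroups the kernel still tracks even though none are
visible in the filesystem.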

Thread overview (14+ messages):
2016-06-16  3:42 [PATCH] mm: memcontrol: fix cgroup creation failure after many small jobs Johannes Weiner
2016-06-16 20:06 ` Tejun Heo
2016-06-17 16:23   ` Johannes Weiner
2016-06-17 16:23     ` [PATCH 1/3] cgroup: fix idr leak for the first cgroup root Johannes Weiner
2016-06-17 16:24     ` [PATCH 2/3] cgroup: remove unnecessary 0 check from css_from_id() Johannes Weiner
2016-06-17 18:17       ` Tejun Heo
2016-06-17 16:25     ` [PATCH 3/3] mm: memcontrol: fix cgroup creation failure after many small jobs Johannes Weiner
2016-06-17 18:18       ` Tejun Heo
2016-06-20  6:14       ` Nikolay Borisov
2016-06-21 10:16       ` Vladimir Davydov
2016-06-21 15:46         ` Johannes Weiner
2016-06-17  9:06 ` [PATCH] " Vladimir Davydov
2016-06-17 16:40   ` Johannes Weiner
2016-07-14 15:37 ` Johannes Weiner [this message]
