linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Thelen <gthelen@google.com>
To: Vladimir Davydov <vdavydov@parallels.com>
Cc: Tim Hockin <thockin@hockin.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Cgroups <cgroups@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, Li Zefan <lizefan@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Mel Gorman <mgorman@suse.de>,
	Rik van Riel <riel@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Hugh Dickins <hughd@google.com>,
	David Rientjes <rientjes@google.com>,
	Pavel Emelyanov <xemul@parallels.com>,
	Balbir Singh <bsingharora@gmail.com>
Subject: Re: [PATCH RFC 0/5] Virtual Memory Resource Controller for cgroups
Date: Wed, 9 Jul 2014 10:04:21 -0700	[thread overview]
Message-ID: <CAHH2K0Y2OH9scJ8FGkL3M124RSfoUFiELNhGNTHJEsaCEm+hiQ@mail.gmail.com> (raw)
In-Reply-To: <20140709163631.GG6685@esperanza>

On Wed, Jul 9, 2014 at 9:36 AM, Vladimir Davydov <vdavydov@parallels.com> wrote:
> Hi Tim,
>
> On Wed, Jul 09, 2014 at 08:08:07AM -0700, Tim Hockin wrote:
>> How is this different from RLIMIT_AS?  You specifically mentioned it
>> earlier but you don't explain how this is different.
>
> The main difference is that RLIMIT_AS is per process while this
> controller is per cgroup. RLIMIT_AS doesn't allow us to limit VSIZE for
> a group of unrelated or cooperating through shmem processes.
>
> Also RLIMIT_AS accounts for total VM usage (including file mappings),
> while this only charges private writable and shared mappings, whose
> faulted-in pages always occupy mem+swap and therefore cannot be just
> synced and dropped like file pages. In other words, this controller
> works exactly as the global overcommit control.
>
>> From my perspective, this is pointless.  There's plenty of perfectly
>> correct software that mmaps files without concern for VSIZE, because
>> they never fault most of those pages in.
>
> But there's also software that correctly handles ENOMEM returned by
> mmap. For example, mongodb keeps growing its buffers until mmap fails.
> Therefore, if there's no overcommit control, it will be OOM-killed
> sooner or later, which may be pretty annoying. And we did have customers
> complaining about that.

Is mongodb's buffer growth causing the oom kills?

If yes, I wonder if apps, like mongodb, that want ENOMEM should (1)
use MAP_POPULATE and (2) we change vm_map_pgoff() to propagate
mm_populate() ENOMEM failures back to mmap()?

>> From my observations it is not generally possible to predict an
>> average VSIZE limit that would satisfy your concerns *and* not kill
>> lots of valid apps.
>
> Yes, it's difficult. Actually, we can only guess. Nevertheless, we
> predict and set the VSIZE limit system-wide by default.
>
>> It sounds like what you want is to limit or even disable swap usage.
>
> I want to avoid OOM kill if it's possible to return ENOMEM. OOM can be
> painful. It can kill lots of innocent processes. Of course, the user can
> protect some processes by setting oom_score_adj, but this is difficult
> and requires time and expertise, so an average user won't do that.
>
>> Given your example, your hypothetical user would probably be better of
>> getting an OOM kill early so she can fix her job spec to request more
>> memory.
>
> In my example the user won't get OOM kill *early*...

  reply	other threads:[~2014-07-09 17:04 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-03 12:48 [PATCH RFC 0/5] Virtual Memory Resource Controller for cgroups Vladimir Davydov
2014-07-03 12:48 ` [PATCH RFC 1/5] vm_cgroup: basic infrastructure Vladimir Davydov
2014-07-03 12:48 ` [PATCH RFC 2/5] vm_cgroup: private writable mappings accounting Vladimir Davydov
2014-07-03 12:48 ` [PATCH RFC 3/5] shmem: pass inode to shmem_acct_* methods Vladimir Davydov
2014-07-03 12:48 ` [PATCH RFC 4/5] vm_cgroup: shared memory accounting Vladimir Davydov
2014-07-03 12:48 ` [PATCH RFC 5/5] vm_cgroup: do not charge tasks in root cgroup Vladimir Davydov
2014-07-04 12:16 ` [PATCH RFC 0/5] Virtual Memory Resource Controller for cgroups Michal Hocko
2014-07-04 15:38   ` Vladimir Davydov
2014-07-16 12:01     ` Michal Hocko
2014-07-23 14:08       ` Vladimir Davydov
2014-07-09  7:52 ` Vladimir Davydov
2014-07-09 15:08   ` Tim Hockin
2014-07-09 16:36     ` Vladimir Davydov
2014-07-09 17:04       ` Greg Thelen [this message]
2014-07-10 16:35         ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHH2K0Y2OH9scJ8FGkL3M124RSfoUFiELNhGNTHJEsaCEm+hiQ@mail.gmail.com \
    --to=gthelen@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=thockin@hockin.org \
    --cc=tj@kernel.org \
    --cc=vdavydov@parallels.com \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).