From: Vladimir Davydov <vdavydov@parallels.com>
To: Greg Thelen <gthelen@google.com>
Cc: Tim Hockin <thockin@hockin.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Cgroups <cgroups@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, Li Zefan <lizefan@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Mel Gorman <mgorman@suse.de>,
	Rik van Riel <riel@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Hugh Dickins <hughd@google.com>,
	David Rientjes <rientjes@google.com>,
	Pavel Emelyanov <xemul@parallels.com>,
	Balbir Singh <bsingharora@gmail.com>
Subject: Re: [PATCH RFC 0/5] Virtual Memory Resource Controller for cgroups
Date: Thu, 10 Jul 2014 20:35:45 +0400
Message-ID: <20140710163545.GA835@esperanza>
In-Reply-To: <CAHH2K0Y2OH9scJ8FGkL3M124RSfoUFiELNhGNTHJEsaCEm+hiQ@mail.gmail.com>

Hi Greg,

On Wed, Jul 09, 2014 at 10:04:21AM -0700, Greg Thelen wrote:
> On Wed, Jul 9, 2014 at 9:36 AM, Vladimir Davydov <vdavydov@parallels.com> wrote:
> > Hi Tim,
> >
> > On Wed, Jul 09, 2014 at 08:08:07AM -0700, Tim Hockin wrote:
> >> How is this different from RLIMIT_AS?  You specifically mentioned it
> >> earlier but you don't explain how this is different.
> >
> > The main difference is that RLIMIT_AS is per process while this
> > controller is per cgroup. RLIMIT_AS doesn't allow us to limit VSIZE for
> > a group of unrelated or cooperating through shmem processes.
> >
> > Also RLIMIT_AS accounts for total VM usage (including file mappings),
> > while this only charges private writable and shared mappings, whose
> > faulted-in pages always occupy mem+swap and therefore cannot be just
> > synced and dropped like file pages. In other words, this controller
> > works exactly as the global overcommit control.
> >
> >> From my perspective, this is pointless.  There's plenty of perfectly
> >> correct software that mmaps files without concern for VSIZE, because
> >> they never fault most of those pages in.
> >
> > But there's also software that correctly handles ENOMEM returned by
> > mmap. For example, mongodb keeps growing its buffers until mmap fails.
> > Therefore, if there's no overcommit control, it will be OOM-killed
> > sooner or later, which may be pretty annoying. And we did have customers
> > complaining about that.
> 
> Is mongodb's buffer growth causing the oom kills?

We saw this happen on a customer's node some time ago. A container
running mongodb and several other services kept getting OOM-killed from
time to time, which made the customer unhappy. Limiting overcommit
helped then.
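
To illustrate the pattern in question (this is only a sketch, not
mongodb's actual code; map_chunk() and grow_buffers() are hypothetical
names), an application that grows its buffers until mmap() fails might
look roughly like this. With a limit on private writable mappings the
loop ends with ENOMEM and the process keeps running; without one,
mmap() keeps succeeding and the failure only shows up later, at fault
time, as an OOM kill.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map one private writable anonymous chunk.
 * Returns NULL on failure, leaving errno as set by mmap().
 */
static void *map_chunk(size_t len)
{
	assert(len > 0);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Try to map up to max_chunks chunks of chunk_len bytes each and stop
 * at the first mmap() failure. Returns the number of chunks mapped;
 * their addresses are stored in chunks[].
 */
static size_t grow_buffers(void **chunks, size_t max_chunks, size_t chunk_len)
{
	size_t n = 0;

	while (n < max_chunks) {
		void *p = map_chunk(chunk_len);

		if (!p)
			break;	/* typically ENOMEM: stop growing, keep running */
		chunks[n++] = p;
	}
	return n;
}
```

A caller that gets back fewer chunks than it asked for can shrink its
caches or simply stop growing instead of being killed.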

> If yes, I wonder if apps, like mongodb, that want ENOMEM should (1)
> use MAP_POPULATE and (2) we change vm_mmap_pgoff() to propagate
> mm_populate() ENOMEM failures back to mmap()?

This way we may fault in lots of pages, evicting someone's working set
along the way, only to get ENOMEM in the end. That doesn't look optimal.
Also, this requires modifying userspace apps, which isn't always
possible.
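
For reference, the userspace half of the MAP_POPULATE suggestion could
be sketched as below (map_populated() is a hypothetical helper, and
this is an illustration rather than a proposed implementation). Every
page of the mapping is faulted in at mmap() time, which is where the
eviction cost comes from. Per the mmap(2) man page, the call currently
does not fail if the mapping cannot be populated, so a NULL return here
only reports a failure to create the mapping itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map private anonymous memory and ask the kernel to prefault it.
 * Returns NULL if the mapping could not be created; a population
 * failure is not visible to the caller with current mmap() semantics.
 */
static void *map_populated(size_t len)
{
	assert(len > 0);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```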

Thanks.

Thread overview: 15+ messages
2014-07-03 12:48 [PATCH RFC 0/5] Virtual Memory Resource Controller for cgroups Vladimir Davydov
2014-07-03 12:48 ` [PATCH RFC 1/5] vm_cgroup: basic infrastructure Vladimir Davydov
2014-07-03 12:48 ` [PATCH RFC 2/5] vm_cgroup: private writable mappings accounting Vladimir Davydov
2014-07-03 12:48 ` [PATCH RFC 3/5] shmem: pass inode to shmem_acct_* methods Vladimir Davydov
2014-07-03 12:48 ` [PATCH RFC 4/5] vm_cgroup: shared memory accounting Vladimir Davydov
2014-07-03 12:48 ` [PATCH RFC 5/5] vm_cgroup: do not charge tasks in root cgroup Vladimir Davydov
2014-07-04 12:16 ` [PATCH RFC 0/5] Virtual Memory Resource Controller for cgroups Michal Hocko
2014-07-04 15:38   ` Vladimir Davydov
2014-07-16 12:01     ` Michal Hocko
2014-07-23 14:08       ` Vladimir Davydov
2014-07-09  7:52 ` Vladimir Davydov
2014-07-09 15:08   ` Tim Hockin
2014-07-09 16:36     ` Vladimir Davydov
2014-07-09 17:04       ` Greg Thelen
2014-07-10 16:35         ` Vladimir Davydov [this message]
