From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@suse.cz>, Greg Thelen <gthelen@google.com>,
Hugh Dickins <hughd@google.com>,
Motohiro Kosaki <Motohiro.Kosaki@us.fujitsu.com>,
Glauber Costa <glommer@gmail.com>, Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Pavel Emelianov <xemul@parallels.com>,
Konstantin Khorenko <khorenko@parallels.com>,
LKML-MM <linux-mm@kvack.org>,
LKML-cgroups <cgroups@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] memory cgroup: my thoughts on memsw
Date: Sat, 06 Sep 2014 08:15:44 +0900 [thread overview]
Message-ID: <540A4420.2030504@jp.fujitsu.com> (raw)
In-Reply-To: <20140905160029.GF25641@esperanza>
(2014/09/06 1:00), Vladimir Davydov wrote:
> On Fri, Sep 05, 2014 at 11:20:43PM +0900, Kamezawa Hiroyuki wrote:
>> Basically, I don't like OOM Kill. Anyone don't like it, I think.
>>
>> In recent container use, application may be build as "stateless" and
>> kill-and-respawn may not be problematic, but I think killing "a" process
>> by oom-kill is too naive.
>>
>> If your proposal is triggering notification to user space at hitting
>> anon+swap limit, it may be useful.
>> ...Some container-cluster management software can handle it.
>> For example, container may be restarted.
>>
>> Memcg has threshold notifier and vmpressure notifier.
>> I think you can enhance it.
> [...]
>> My point is that "killing a process" tend not to be able to fix the situation.
>> For example, fork-bomb by "make -j" cannot be handled by it.
>>
>> So, I don't want to think about enhancing OOM-Kill. Please think of better
>> way to survive. With the help of countainer-management-softwares, I think
>> we can have several choices.
>>
>> Restart contantainer (killall) may be the best if container app is stateless.
>> Or container-management can provide some failover.
>
> The problem I'm trying to set out is not about OOM actually (sorry if
> the way I explain is confusing). We could probably configure OOM to kill
> a whole cgroup (not just a process) and/or improve user-notification so
> that the userspace could react somehow. I'm sure it must and will be
> discussed one day.
>
> The problem is that *before* invoking OOM on *global* pressure we're
> trying to reclaim containers' memory and if there's progress we won't
> invoke OOM. This can result in a huge slow down of the whole system (due
> to swap out).
>
use SSD or zram for swap device.
>> The 1st reason we added memsw.limit was for avoiding that the whole swap
>> is used up by a cgroup where memory-leak of forkbomb running and not for
>> some intellegent controls.
>>
>> From your opinion, I feel what you want is avoiding charging against page-caches.
>> But thiking docker at el, page-cache is not shared between containers any more.
>> I think "including cache" makes sense.
>
> Not exactly. It's not about sharing caches among containers. The point
> is (1) it's difficult to estimate the size of file caches that will max
> out the performance of a container, and (2) a typical workload will
> perform better and put less pressure on disk if it has more caches.
>
> Now imagine a big host running a small number of containers and
> therefore having a lot of free memory most of time, but still
> experiencing load spikes once an hour/day/whatever when memory usage
> raises up drastically. It'd be unwise to set hard limits for those
> containers that are running regularly, because they'd probably perform
> much better if they had more file caches. So the admin decides to use
> soft limits instead. He is forced to use memsw.limit > the soft limit,
> but this is unsafe, because the container may eat anon memory up to
> memsw.limit then, and anon memory isn't easy to get rid of when it comes
> to the global pressure. If the admin had a mean to limit swappable
> memory, he could avoid it. This is what I was trying to illustrate by
> the example in the first e-mail of this thread.
>
> Note if there were no soft limits, the current setup would be just fine,
> otherwise it fails. And soft limits are proved to be useful AFAIK.
>
As you noticed, hitting anon+swap limit just means oom-kill.
My point is that using oom-killer for "server management" just seems crazy.
Let my clarify things. your proposal was.
1. soft-limit will be a main feature for server management.
2. Because of soft-limit, global memory reclaim runs.
3. Using swap at global memory reclaim can cause poor performance.
4. So, making use of OOM-Killer for avoiding swap.
I can't agree "4". I think
- don't configure swap.
- use zram
- use SSD for swap
Or
- provide a way to notify usage of "anon+swap" to container management software.
Now we have "vmpressure". Container management software can kill or respawn container
with using user-defined policy for avoidng swap.
If you don't want to run kswapd at all, threshold notifier enhancement may be required.
/proc/meminfo provides total number of ANON/CACHE pages.
Many things can be done in userland.
And your idea can't help swap-out caused by memory pressure comes from "zones".
I guess vmpressure will be a total win. The kernel may need some enhancement
but I don't like to make use of oom-killer as a part of feature for avoiding swap.
Thanks,
-Kame
next prev parent reply other threads:[~2014-09-05 23:33 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-04 14:30 [RFC] memory cgroup: my thoughts on memsw Vladimir Davydov
2014-09-04 22:03 ` Kamezawa Hiroyuki
2014-09-05 8:28 ` Vladimir Davydov
2014-09-05 14:20 ` Kamezawa Hiroyuki
2014-09-05 16:00 ` Vladimir Davydov
2014-09-05 23:15 ` Kamezawa Hiroyuki [this message]
2014-09-08 11:01 ` Vladimir Davydov
2014-09-08 13:53 ` Kamezawa Hiroyuki
2014-09-09 10:39 ` Vladimir Davydov
2014-09-11 2:04 ` Kamezawa Hiroyuki
2014-09-11 8:23 ` Vladimir Davydov
2014-09-11 8:53 ` Kamezawa Hiroyuki
2014-09-11 9:50 ` Vladimir Davydov
2014-09-10 12:01 ` Vladimir Davydov
2014-09-11 1:22 ` Kamezawa Hiroyuki
2014-09-11 7:03 ` Vladimir Davydov
2014-09-15 19:14 ` Johannes Weiner
2014-09-16 1:34 ` Kamezawa Hiroyuki
2014-09-17 15:59 ` Vladimir Davydov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=540A4420.2030504@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=Motohiro.Kosaki@us.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=glommer@gmail.com \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=khorenko@parallels.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=tj@kernel.org \
--cc=vdavydov@parallels.com \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).