From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Tim Hockin <thockin@hockin.org>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	<kernel-team@fb.com>, David Rientjes <rientjes@google.com>,
	<linux-mm@kvack.org>, Vladimir Davydov <vdavydov.dev@gmail.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Andrew Morton <akpm@linux-foundation.org>,
	Cgroups <cgroups@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Date: Wed, 27 Sep 2017 11:19:13 +0100
Message-ID: <20170927101913.GB4159@castle>
In-Reply-To: <20170927074319.o3k26kja43rfqmvb@dhcp22.suse.cz>

On Wed, Sep 27, 2017 at 09:43:19AM +0200, Michal Hocko wrote:
> On Tue 26-09-17 20:37:37, Tim Hockin wrote:
> [...]
> > I feel like David has offered examples here, and many of us at Google
> > have offered examples as long ago as 2013 (if I recall) of cases where
> > the proposed heuristic is EXACTLY WRONG.
> 
> I do not think we have discussed anything resembling the current
> approach. And I would really appreciate some more examples where
> decisions based on leaf nodes would be EXACTLY WRONG.
>

I would agree here.

The two-step approach under discussion (select the biggest leaf or oom_group
memcg, then select the largest process inside it) really does look like the
way to go.

It should work well in practice and it allows further development.
It will catch workloads which are leaking child processes by default,
which is an advantage in comparison to the existing algorithm.
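
As a rough illustration only (a userspace model, not kernel code), the
two-step selection could be sketched as below. The Task and Memcg classes,
their fields, and the candidates() and select_victim() helpers are
hypothetical names invented for this sketch. It also assumes that an
oom_group memcg is evaluated as one unit together with its descendants, and
that tasks live only in leaf or oom_group memcgs.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Task:
    name: str
    rss: int  # memory footprint in arbitrary units (hypothetical field)


@dataclass
class Memcg:
    name: str
    tasks: List[Task] = field(default_factory=list)
    children: List["Memcg"] = field(default_factory=list)
    oom_group: bool = False  # assumption: whole subtree is treated as one unit

    def all_tasks(self) -> List[Task]:
        """Tasks attached to this memcg and to every descendant."""
        found = list(self.tasks)
        for child in self.children:
            found.extend(child.all_tasks())
        return found

    def usage(self) -> int:
        return sum(t.rss for t in self.all_tasks())


def candidates(memcg: Memcg) -> Iterator[Memcg]:
    """Yield leaf memcgs and oom_group memcgs; never descend into the latter."""
    if memcg.oom_group or not memcg.children:
        yield memcg
        return
    for child in memcg.children:
        yield from candidates(child)


def select_victim(root: Memcg) -> Optional[Tuple[Memcg, Task]]:
    # Step 1: biggest leaf or oom_group memcg, regardless of tree depth.
    populated = [m for m in candidates(root) if m.all_tasks()]
    if not populated:
        return None
    memcg = max(populated, key=Memcg.usage)
    # Step 2: largest process inside the chosen memcg.
    task = max(memcg.all_tasks(), key=lambda t: t.rss)
    return memcg, task


# A deep hierarchy where one leaf leaks many small children.
leaky = Memcg("org/team/leaky", tasks=[Task(f"child-{i}", 5) for i in range(10)])
small = Memcg("org/team/small", tasks=[Task("helper", 8)])
db = Memcg("db", tasks=[Task("postgres", 40)])
root = Memcg("root", children=[
    Memcg("org", children=[Memcg("org/team", children=[leaky, small])]),
    db,
])

memcg, task = select_victim(root)
print(memcg.name, task.name)
```

In this toy tree the single biggest process is in "db" (40 units), but the
leaf that leaks children holds 50 units in total, so the memcg-first step
picks it even though it sits three levels deep.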

Both the strictly hierarchical approach (as in v8) and the pure flat one (by
Johannes) are more limiting. In the first case, deep hierarchies are affected
(as Michal mentioned) and we are stuck with a tree-traversal policy (Tejun's
point).

In the second case, further development is in question: any new idea
(say, oom_priorities, or, for example, a new useful memcg metric) would have
to be applied to processes and memcgs simultaneously.
Also, we drop any idea of memcg-level fairness and run into some implementation
issues (which I mentioned earlier). Mixing tasks and memcgs leads to much
hairier code, and the OOM code is already quite hairy.
The idea of comparing killable entities is a leaky abstraction,
as we can't predict how much memory killing a single process will release
(say, for example, the process is the init in a pid namespace).

> > We need OOM behavior to kill in a deterministic order configured by
> > policy.
> 
> And nobody is objecting to this usecase. I think we can build a priority
> policy on top of leaf-based decision as well. The main point we are
> trying to sort out here is a reasonable semantic that would work for
> most workloads. Sibling based selection will simply not work on those
> that have to use deeper hierarchies for organizational purposes. I
> haven't heard a counter argument for that example yet.

Yes, implementing oom_priorities is a ~15-line patch on top of
the approach under discussion. David can use this small out-of-tree patch
for now; in any case it's a step forward in comparison to the existing state.
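
To make the idea concrete, here is a hedged userspace sketch (Python, not
the kernel patch) of how a priority could be layered on top of leaf-based
selection. The Leaf class, the oom_priority field and select_leaf() are
hypothetical names for illustration only. The semantics shown (a lower
value is killed first, size breaks ties) are an assumption of this sketch,
not something settled in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Leaf:
    name: str
    task_sizes: List[int] = field(default_factory=list)
    # Hypothetical knob; assumption: lower value = killed first.
    oom_priority: int = 0

    def usage(self) -> int:
        return sum(self.task_sizes)


def select_leaf(leaves: List[Leaf]) -> Optional[Leaf]:
    """Pick the victim memcg: priority first, size as the tie-breaker."""
    populated = [leaf for leaf in leaves if leaf.task_sizes]
    if not populated:
        return None
    # Lowest priority wins; among equal priorities the biggest usage wins.
    return min(populated, key=lambda leaf: (leaf.oom_priority, -leaf.usage()))


leaves = [
    Leaf("frontend", [40, 10], oom_priority=10),
    Leaf("batch-a", [15], oom_priority=0),
    Leaf("batch-b", [20, 5], oom_priority=0),
]
victim = select_leaf(leaves)
print(victim.name)
```

With every priority left at its default, the key degenerates to plain
"biggest leaf", so the size-based behaviour is unchanged for users who never
touch the knob.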


Overall, do we have any open questions left? Does anyone have any strong
arguments against the design under discussion?

Thanks!


