From: Michal Hocko <mhocko@suse.com>
To: "程垲涛 Chengkaitao Cheng" <chengkaitao@didiglobal.com>
Cc: chengkaitao <pilgrimtao@gmail.com>,
"tj@kernel.org" <tj@kernel.org>,
"lizefan.x@bytedance.com" <lizefan.x@bytedance.com>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"corbet@lwn.net" <corbet@lwn.net>,
"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
"shakeelb@google.com" <shakeelb@google.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"songmuchun@bytedance.com" <songmuchun@bytedance.com>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"zhengqi.arch@bytedance.com" <zhengqi.arch@bytedance.com>,
"ebiederm@xmission.com" <ebiederm@xmission.com>,
"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
"chengzhihao1@huawei.com" <chengzhihao1@huawei.com>,
"haolee.swjtu@gmail.com" <haolee.swjtu@gmail.com>,
"yuzhao@google.com" <yuzhao@google.com>,
"willy@infradead.org" <willy@infradead.org>,
"vasily.averin@linux.dev" <vasily.averin@linux.dev>,
"vbabka@suse.cz" <vbabka@suse.cz>,
"surenb@google.com" <surenb@google.com>,
"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
"mcgrof@kernel.org" <mcgrof@kernel.org>,
"sujiaxun@uniontech.com" <sujiaxun@uniontech.com>,
"feng.tang@intel.com" <feng.tang@intel.com>,
"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH v2] mm: memcontrol: protect the memory in cgroup from being oom killed
Date: Fri, 9 Dec 2022 09:25:37 +0100
Message-ID: <Y5LxAbOB2AYp42hi@dhcp22.suse.cz>
In-Reply-To: <114DF8F0-3E68-4F2B-8E35-0943EC2F51AE@didiglobal.com>

On Fri 09-12-22 05:07:15, 程垲涛 Chengkaitao Cheng wrote:
> At 2022-12-08 22:23:56, "Michal Hocko" <mhocko@suse.com> wrote:
[...]
> >The oom killer is memory reclaim of last resort. So yes, there is some
> >difference, but fundamentally it is about releasing some memory. And over
> >the long term we have learned that the more clever it tries to be, the
> >more likely corner cases become. It is simply impossible to know the best
> >candidate, so this is just a best effort. We try to aim for
> >predictability at least.
>
> Is the current oom_score strategy predictable? I don't think so. oom_score_adj
> has broken the predictability of oom_score (the kernel no longer simply kills
> the process that uses the most memory).

oom_score as reported to userspace already takes oom_score_adj into account,
which means that you can compare processes and get a reasonable guess of
which one would be the current oom victim. There is a certain level of fuzz
because this is not atomic, and there is no clear candidate when
multiple processes have an equal score. So yes, it is not 100% predictable.
The memory.oom.protect interface you propose doesn't change that, though.
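A minimal sketch of that comparison, assuming a global OOM (for a memcg OOM only tasks inside that memcg would be candidates). It reads each /proc/<pid>/oom_score, which already includes the oom_score_adj bias, and picks the highest value. The helper names are hypothetical, the tie-break is arbitrary, and because every file is read separately the snapshot is not atomic.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* One sampled (pid, oom_score) pair. */
struct oom_sample {
	int pid;
	long score;
};

/*
 * Return the index of the sample with the highest oom_score, or -1 if
 * there are none. Ties go to the first sample seen; this is an arbitrary
 * choice of this sketch, not a kernel guarantee.
 */
static int pick_likely_victim(const struct oom_sample *s, int n)
{
	int best = -1;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (best < 0 || s[i].score > s[best].score)
			best = i;
	return best;
}

/*
 * Sample /proc/<pid>/oom_score for up to 'max' processes. Scores may
 * change between reads, so this is only a best-effort snapshot.
 * Returns the number of samples collected.
 */
static int sample_oom_scores(struct oom_sample *out, int max)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	int n = 0;

	if (!proc)
		return 0;
	while (n < max && (de = readdir(proc)) != NULL) {
		char path[64];
		char *end;
		long pid = strtol(de->d_name, &end, 10);
		FILE *f;

		if (*end != '\0' || pid <= 0)
			continue;	/* not a /proc/<pid> directory */
		snprintf(path, sizeof(path), "/proc/%ld/oom_score", pid);
		f = fopen(path, "r");
		if (!f)
			continue;	/* the process may have exited meanwhile */
		if (fscanf(f, "%ld", &out[n].score) == 1) {
			out[n].pid = (int)pid;
			n++;
		}
		fclose(f);
	}
	closedir(proc);
	return n;
}
```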

Is oom_score_adj a good interface? No, not really. If I could go back in
time I would nack it, but here we are. We have an interface that
promises quite a lot but essentially allows only two use cases
(OOM_SCORE_ADJ_MIN, OOM_SCORE_ADJ_MAX) reliably. Everything in between
is clumsy at best, because a real userspace oom policy would have to
re-evaluate the whole oom domain (be it global or memcg oom) as
memory consumption evolves over time. I am really worried that your
memory.oom.protect follows a very similar trajectory, because
protection really needs to consider other memcgs to balance properly.
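To illustrate why intermediate values are clumsy, here is a simplified model of the kernel's badness heuristic: the memory footprint in pages plus oom_score_adj scaled by totalpages / 1000, with OOM_SCORE_ADJ_MIN meaning "never kill". This is a sketch, not the kernel's oom_badness() itself; swap and page tables are folded into a single page count, and the function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/*
 * Simplified badness model: 'pages' is the task's footprint, 'adj' its
 * oom_score_adj, 'totalpages' the size of the OOM domain in pages.
 * Higher results are killed first; LONG_MIN means never selected.
 */
static long badness_model(long pages, int adj, long totalpages)
{
	assert(adj >= OOM_SCORE_ADJ_MIN && adj <= OOM_SCORE_ADJ_MAX);
	if (adj == OOM_SCORE_ADJ_MIN)
		return LONG_MIN;	/* exempt from the OOM killer */
	/* Each adj unit is worth 1/1000th of the domain's pages. */
	return pages + (long)adj * (totalpages / 1000);
}
```

With made-up numbers and totalpages = 1000000, a task using 500000 pages with oom_score_adj = -300 scores 200000. A neighbour with adj 0 using 250000 pages scores 250000 and is chosen first, but once that neighbour shrinks to 150000 pages the "protected" task becomes the victim. Its adj value never changed; only the other task's consumption did, which is why a policy expressed this way has to be re-evaluated continuously.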
[...]
> > But I am really open
> >to be convinced otherwise and this is in fact what I have been asking
> >for since the beginning. I would love to see some examples on the
> >reasonable configuration for a practical usecase.
>
> Here is a simple example. In a Docker container, users can divide all processes
> into two categories (important and normal) and put them in different cgroups.
> One cgroup's oom.protect is set to "max" and the other's to "0". In this way,
> the important processes in the container can be protected.

That is effectively oom_score_adj = OOM_SCORE_ADJ_MIN - 1 for all
processes in the important group. I would argue you can achieve a very
similar result by having the process launcher set oom_score_adj and
letting all processes in that important container inherit it. You do not
need any memcg tunable for that. I am much more interested in examples
where the protection needs to be fine-tuned.
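A minimal sketch of that alternative, assuming the launcher runs with CAP_SYS_RESOURCE, which is typically needed to lower oom_score_adj below the task's current minimum. oom_score_adj is inherited across fork() and kept across execve(), so everything the launcher starts gets the same value. Both function names are hypothetical. Passing OOM_SCORE_ADJ_MIN exempts the tasks entirely; a value just above it keeps them eligible as a last resort.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/*
 * Write 'adj' to an oom_score_adj file (normally /proc/self/oom_score_adj).
 * The path is a parameter only so the helper can be exercised against an
 * ordinary file. Returns 0 on success, -1 on failure.
 */
static int write_oom_score_adj(const char *path, int adj)
{
	FILE *f;
	int ret = 0;

	if (adj < OOM_SCORE_ADJ_MIN || adj > OOM_SCORE_ADJ_MAX)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%d\n", adj) < 0)
		ret = -1;
	/* stdio buffers the write, so a kernel rejection (e.g. EACCES) shows up here */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/*
 * Hypothetical launcher for the important group: set the value in the
 * launcher itself, then exec the workload, which inherits it.
 */
static int launch_with_oom_score_adj(int adj, char *const argv[])
{
	assert(argv != NULL && argv[0] != NULL);
	if (write_oom_score_adj("/proc/self/oom_score_adj", adj))
		return -1;
	execvp(argv[0], argv);
	return -1;	/* execvp only returns on error */
}
```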
--
Michal Hocko
SUSE Labs