From: Michal Hocko <mhocko@suse.com>
To: "程垲涛 Chengkaitao Cheng" <chengkaitao@didiglobal.com>
Cc: Tao pilgrim <pilgrimtao@gmail.com>,
	"tj@kernel.org" <tj@kernel.org>,
	"lizefan.x@bytedance.com" <lizefan.x@bytedance.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
	"shakeelb@google.com" <shakeelb@google.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"songmuchun@bytedance.com" <songmuchun@bytedance.com>,
	"cgel.zte@gmail.com" <cgel.zte@gmail.com>,
	"ran.xiaokai@zte.com.cn" <ran.xiaokai@zte.com.cn>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"zhengqi.arch@bytedance.com" <zhengqi.arch@bytedance.com>,
	"ebiederm@xmission.com" <ebiederm@xmission.com>,
	"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
	"chengzhihao1@huawei.com" <chengzhihao1@huawei.com>,
	"haolee.swjtu@gmail.com" <haolee.swjtu@gmail.com>,
	"yuzhao@google.com" <yuzhao@google.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"vasily.averin@linux.dev" <vasily.averin@linux.dev>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"surenb@google.com" <surenb@google.com>,
	"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	"sujiaxun@uniontech.com" <sujiaxun@uniontech.com>,
	"feng.tang@intel.com" <feng.tang@intel.com>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH] mm: memcontrol: protect the memory in cgroup from being oom killed
Date: Thu, 1 Dec 2022 13:44:57 +0100
Message-ID: <Y4ihyRqQzyFFLqh6@dhcp22.suse.cz>
In-Reply-To: <C2CC36C1-29AE-4B65-A18A-19A745652182@didiglobal.com>

On Thu 01-12-22 10:52:35, 程垲涛 Chengkaitao Cheng wrote:
> At 2022-12-01 16:49:27, "Michal Hocko" <mhocko@suse.com> wrote:
> >On Thu 01-12-22 04:52:27, 程垲涛 Chengkaitao Cheng wrote:
> >> At 2022-12-01 00:27:54, "Michal Hocko" <mhocko@suse.com> wrote:
> >> >On Wed 30-11-22 15:46:19, 程垲涛 Chengkaitao Cheng wrote:
> >> >> On 2022-11-30 21:15:06, "Michal Hocko" <mhocko@suse.com> wrote:
> >> >> > On Wed 30-11-22 15:01:58, chengkaitao wrote:
> >> >> > > From: chengkaitao <pilgrimtao@gmail.com>
> >> >> > >
> >> >> > > We created a new interface <memory.oom.protect> for memory. If the
> >> >> > > OOM killer runs under a parent memory cgroup and the memory usage of
> >> >> > > a child cgroup is within its effective oom.protect boundary, the
> >> >> > > cgroup's tasks won't be OOM killed unless there are no unprotected
> >> >> > > tasks in other children cgroups. It draws on the logic of
> >> >> > > <memory.min/low> in the inheritance relationship.
> >> >> >
> >> >> > Could you be more specific about usecases?
> >> >
> >> >This is a very important question to answer.
> >> 
> >> Use case 1: users say that they want to protect an important process
> >> with high memory consumption from being killed by the OOM killer when
> >> a docker container fails, so as to retain more critical on-site
> >> information or a self-recovery mechanism. For this, they suggest
> >> setting the score_adj of this process to -1000, but I don't agree with
> >> that, because the docker container is not important to the other docker
> >> containers on the same physical machine. If the score_adj of the process
> >> is set to -1000, the probability of OOM kills in other containers'
> >> processes will increase.
> >> 
> >> Use case 2: There are many business processes and agent processes
> >> mixed together on a physical machine, and they need to be classified
> >> and protected. However, some agents are the parents of business
> >> processes, and some business processes are the parents of agent
> >> processes. It will be troublesome to set different score_adj values for
> >> them. Business processes and agents cannot determine which level their
> >> score_adj should be at. If we create another agent to set the score_adj
> >> of all processes, we have to cycle through all the processes on the
> >> physical machine regularly, which looks stupid.
> >
> >I do agree that oom_score_adj is far from an ideal tool for these usecases.
> >But I also agree with Roman that these could be addressed by an oom
> >killer implementation in the userspace which can have much better
> >tailored policies. OOM protection limits would require tuning and also
> >regular revisions (e.g. memory consumption by any workload might change
> >with different kernel versions) to provide what you are looking for.
> 
> There is a misunderstanding: oom.protect does not replace the user's
> tailored policies. Its purpose is to make it easier and more efficient
> for users to customize policies, and to avoid users completely abandoning
> the oom score to formulate new policies.

Then you should focus on explaining how this makes those policies
easier and more efficient. I do not see it.
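
For illustration only, a userspace policy of the kind referred to above
might be sketched as follows. This is not code from the patch or from any
existing OOM daemon: CgroupInfo, its "protect" field and pick_victim() are
hypothetical names, and the usage values are assumed to have been read
elsewhere (for example from each cgroup's memory.current).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CgroupInfo:
    name: str
    usage: int    # bytes currently charged, e.g. taken from memory.current
    protect: int  # hypothetical per-cgroup policy knob, in bytes


def pick_victim(cgroups: List[CgroupInfo]) -> Optional[CgroupInfo]:
    """Pick the cgroup that exceeds its protection by the largest amount.

    Cgroups at or below their protection are only considered when no
    cgroup is above its protection.
    """
    if not cgroups:
        return None
    unprotected = [c for c in cgroups if c.usage > c.protect]
    candidates = unprotected or cgroups
    return max(candidates, key=lambda c: c.usage - c.protect)
```

The policy lives entirely in userspace here, so changing it (for instance
to weigh cgroups by priority rather than by excess usage) does not need a
new kernel interface.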

[...]

> >Why cannot you simply discount the protection from all processes
> >equally? I do not follow why the task_usage has to play any role in
> >that.
> 
> If all processes are protected equally, the oom protection of the cgroup
> is meaningless. For example, if there are more processes in the cgroup,
> the cgroup can protect more memory, which is unfair to cgroups with fewer
> processes. So we need to keep the total amount of memory that all
> processes in the cgroup need to protect consistent with the value of
> eoom.protect.

You are mixing two different concepts together I am afraid. The per
memcg protection should protect the cgroup (i.e. all processes in that
cgroup) while you want it to be also process aware. This results in a
very unclear runtime behavior when a process from a more protected memcg
is selected based on its individual memory usage.
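
As a rough illustration of the two approaches being contrasted (a sketch
under assumptions, not the formula used by the patch): flat_discount()
subtracts the full protection from every task, while
proportional_discount() splits the protection between tasks according to
their usage so that the discounts sum to the protection value. Both
function names and the scoring itself are hypothetical.

```python
from typing import List


def flat_discount(task_usages: List[int], protect: int) -> List[int]:
    """Subtract the same protection from every task (assumed reading of
    "discount the protection from all processes equally")."""
    return [max(0, usage - protect) for usage in task_usages]


def proportional_discount(task_usages: List[int], protect: int) -> List[float]:
    """Split the protection between tasks in proportion to their usage,
    so the total discount never exceeds the cgroup's protection."""
    total = sum(task_usages)
    if total == 0:
        return [0.0 for _ in task_usages]
    share = min(protect, total)
    return [usage - share * usage / total for usage in task_usages]


usages = [600, 300, 100]
print(flat_discount(usages, 500))          # [100, 0, 0]
print(proportional_discount(usages, 500))  # [300.0, 150.0, 50.0]
```

With the flat variant the total discount (900 here) grows with the number
of tasks. With the proportional variant the total stays at 500, but the
largest task in the protected cgroup still scores 300 and can outrank a
smaller task from a less protected cgroup, which is the unclear runtime
behavior described above.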
-- 
Michal Hocko
SUSE Labs

