From: "程垲涛 Chengkaitao Cheng" <chengkaitao@didiglobal.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Tao pilgrim <pilgrimtao@gmail.com>,
	"tj@kernel.org" <tj@kernel.org>,
	"lizefan.x@bytedance.com" <lizefan.x@bytedance.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
	"shakeelb@google.com" <shakeelb@google.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"songmuchun@bytedance.com" <songmuchun@bytedance.com>,
	"cgel.zte@gmail.com" <cgel.zte@gmail.com>,
	"ran.xiaokai@zte.com.cn" <ran.xiaokai@zte.com.cn>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"zhengqi.arch@bytedance.com" <zhengqi.arch@bytedance.com>,
	"ebiederm@xmission.com" <ebiederm@xmission.com>,
	"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
	"chengzhihao1@huawei.com" <chengzhihao1@huawei.com>,
	"haolee.swjtu@gmail.com" <haolee.swjtu@gmail.com>,
	"yuzhao@google.com" <yuzhao@google.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"vasily.averin@linux.dev" <vasily.averin@linux.dev>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"surenb@google.com" <surenb@google.com>,
	"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	"sujiaxun@uniontech.com" <sujiaxun@uniontech.com>,
	"feng.tang@intel.com" <feng.tang@intel.com>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"Bagas Sanjaya" <bagasdotme@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH] mm: memcontrol: protect the memory in cgroup from being oom killed
Date: Fri, 2 Dec 2022 08:37:52 +0000	[thread overview]
Message-ID: <771CC621-A19E-4174-B3D0-F451B1D7D69A@didiglobal.com> (raw)
In-Reply-To: <Y4jFnY7kMdB8ReSW@dhcp22.suse.cz>

At 2022-12-01 23:17:49, "Michal Hocko" <mhocko@suse.com> wrote:
>On Thu 01-12-22 14:30:11, 程垲涛 Chengkaitao Cheng wrote:
>> At 2022-12-01 21:08:26, "Michal Hocko" <mhocko@suse.com> wrote:
>> >On Thu 01-12-22 13:44:58, Michal Hocko wrote:
>> >> On Thu 01-12-22 10:52:35, 程垲涛 Chengkaitao Cheng wrote:
>> >> > At 2022-12-01 16:49:27, "Michal Hocko" <mhocko@suse.com> wrote:
>> >[...]
>> >> There is a misunderstanding: oom.protect does not replace the user's 
>> >> tailored policies. Its purpose is to make it easier and more efficient for 
>> >> users to customize policies, and to keep users from completely abandoning 
>> >> the oom score in order to formulate new policies.
>> >
>> > Then you should focus on explaining how this makes those policies easier 
>> > and more efficient. I do not see it.
>> 
>> In fact, there is some relevant content in the earlier messages of this thread. 
>> If oom.protect is applied, it has the following benefits:
>> 1. Users only need to focus on managing their local cgroup, not on the 
>> impact on other users' cgroups.
>
>Protection-based balancing cannot really work in isolation.

I think a cgroup only needs to be concerned with the protection values of its 
child cgroups, which are independent in a certain sense.

>> 2. Users and the system do not need to spend extra time on complicated and 
>> repeated scanning and configuration. They just need to configure the 
>> oom.protect of specific cgroups, which is a one-time task.
>
>This will not work, in the same way that the memory reclaim protection cannot 
>work in isolation at the memcg level.

The parent cgroup's oom.protect can change the actual protected memory size 
of the child cgroup, which is exactly what we need. Because of this, a child 
cgroup can set its own oom.protect freely.
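
To make the intended semantics more concrete, here is a minimal userspace 
sketch (not code from the patch) of one possible way a parent's effective 
protection could bound a child's claim, by analogy with the effective 
memory.low/min calculation; the identifiers and the exact distribution rule 
are assumptions for illustration only:

#include <stdio.h>

/*
 * Hypothetical rule: clamp the child's request to its usage, then scale
 * it down when the children's requests oversubscribe the parent's
 * effective protection.
 */
static unsigned long eprotect(unsigned long protect, unsigned long usage,
			      unsigned long parent_eprotect,
			      unsigned long children_protect_sum)
{
	unsigned long claim = protect < usage ? protect : usage;

	if (!children_protect_sum)
		return 0;
	if (children_protect_sum <= parent_eprotect)
		return claim;
	return claim * parent_eprotect / children_protect_sum;
}

int main(void)
{
	/* a child asks for 2G of protection while using 6G; the parent
	 * has 3G of effective protection and its children ask for 4G in
	 * total (all values in MiB) */
	printf("effective protection = %lu MiB\n",
	       eprotect(2048, 6144, 3072, 4096));
	return 0;
}

With such a rule the child can set oom.protect to any value, but what it 
actually receives is bounded by the parent, which is the behaviour described 
above.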

>> >> > >Why cannot you simply discount the protection from all processes
>> >> > >equally? I do not follow why the task_usage has to play any role in
>> >> > >that.
>> >> > 
>> >> > If all processes are protected equally, the oom protection of a cgroup is 
>> >> > meaningless. For example, a cgroup with more processes could protect more 
>> >> > memory, which is unfair to cgroups with fewer processes. So we need to keep 
>> >> > the total amount of memory that all processes in the cgroup protect 
>> >> > consistent with the value of eoom.protect.
>> >> 
>> >> You are mixing two different concepts together I am afraid. The per
>> >> memcg protection should protect the cgroup (i.e. all processes in that
>> >> cgroup) while you want it to be also process aware. This results in a
>> >> very unclear runtime behavior when a process from a more protected memcg
>> >> is selected based on its individual memory usage.
>> >
>> The correct statement here should be that each memcg's protection should 
>> protect the amount of memory specified by oom.protect. For example, if a 
>> cgroup's usage is 6G and its oom.protect is 2G, then when an oom kill occurs, 
>> in the worst case we will only reduce the memory used by this cgroup to 2G 
>> through the oom killer.
>
>I do not see how that could be guaranteed. Please keep in mind that a
>non-trivial amount of memory resources could be completely independent
>of any process life time (just consider tmpfs as a trivial example).
>
>> >Let me be more specific here. Although processes are the primary source of 
>> >memcg charges, the memory accounted for oom badness purposes is not really 
>> >comparable to the overall memcg-charged memory. Kernel memory and non-mapped 
>> >memory can all generate rather interesting corner cases.
>> 
>> Sorry, I did not think carefully enough about some of the special memory 
>> statistics. I will fix it in the next version.
>
>Let me just emphasise that we are talking about a fundamental disconnect.
>RSS-based accounting has been used for the OOM killer selection because
>the memory gets unmapped and _potentially_ freed when the process goes
>away. Memcg charges are bound to the object life time and, as said, in
>many cases there is no direct relation with any process life time.

Based on your comments, I would like to revise the formula as follows:

score = task_usage + score_adj * totalpage
        - eoom.protect * (task_usage - task_rssshare)
          / (local_memcg_usage + local_memcg_swapcache)

After a process is killed, the unmapped cache and shared memory will not be 
released immediately, so they should not consume the cgroup's protection quota. 
In extreme environments, the memory that cannot be released by the oom killer 
(i.e. memory that has not been charged to the process) may occupy a large 
share of the protection quota, but that is expected. Of course, the idea may 
have some problems that I haven't considered.
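
For illustration, here is a small userspace sketch of the revised score above. 
It is not the kernel implementation; task_usage, task_rssshare, eoom.protect, 
local_memcg_usage and local_memcg_swapcache are just the symbols from the 
formula (taken here to be page counts), and the sketch follows the formula 
exactly as written, whereas the in-kernel oom_badness() scales oom_score_adj 
by totalpages / 1000:

#include <stdio.h>

/* All values are page counts; the parameter names mirror the symbols in
 * the formula and are not an existing kernel API. */
static long long oom_score(long long task_usage, long long task_rssshare,
			   long long score_adj, long long totalpage,
			   long long eoom_protect,
			   long long local_memcg_usage,
			   long long local_memcg_swapcache)
{
	long long score = task_usage + score_adj * totalpage;

	/* Discount the cgroup's protection in proportion to the memory
	 * that killing this task could actually free (its usage minus
	 * the part shared with other tasks). */
	score -= eoom_protect * (task_usage - task_rssshare) /
		 (local_memcg_usage + local_memcg_swapcache);
	return score;
}

int main(void)
{
	/* a 1G task sharing 256M, in a memcg using 4G plus 64M of
	 * swapcache and protecting 2G, with score_adj == 0 and 16G of
	 * total pages (4K pages) */
	printf("score = %lld pages\n",
	       oom_score(262144, 65536, 0, 4194304, 524288,
			 1048576, 16384));
	return 0;
}

The intent is that a task whose memory is mostly shared (and so would not be 
freed by killing it) draws little from the protection quota, matching the 
reasoning above.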

>
>Hope that clarifies.

Thanks for your comment!
chengkaitao


