From: Yafang Shao
Date: Mon, 25 Nov 2019 22:11:15 +0800
Subject: Re: [PATCH] mm, memcg: clear page protection when memcg oom group happens
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Linux MM
In-Reply-To: <20191125124553.GM31714@dhcp22.suse.cz>

On Mon, Nov 25, 2019 at 8:45 PM Michal Hocko wrote:
>
> On Mon 25-11-19 20:37:52, Yafang Shao wrote:
> > On Mon, Nov 25, 2019 at 8:31 PM Michal Hocko wrote:
> > >
> > > On Mon 25-11-19 20:17:15, Yafang Shao wrote:
> > > > On Mon, Nov 25, 2019 at 7:54 PM Michal Hocko wrote:
> > > > >
> > > > > On Mon 25-11-19 19:37:59, Yafang Shao wrote:
> > > > > > On Mon, Nov 25, 2019 at 7:08 PM Michal Hocko wrote:
> > > > > > >
> > > > > > > On Mon 25-11-19 05:14:53, Yafang Shao wrote:
> > > > > > > > We set memory.oom.group so that all processes in this memcg are
> > > > > > > > killed by the OOM killer to free more pages. In this case, it
> > > > > > > > doesn't make sense to protect the pages with memory.{min, low}
> > > > > > > > again if they are set.
> > > > > > >
> > > > > > > I do not see why. What does group OOM killing have to do with
> > > > > > > the reclaim protection? What is the actual problem you are trying to
> > > > > > > solve?
> > > > > > >
> > > > > >
> > > > > > The cgroup is treated as an indivisible workload when memory.oom.group
> > > > > > is set and the OOM killer is trying to kill a process in this cgroup.
> > > > >
> > > > > Yes this is true.
> > > > >
> > > > > > We set memory.oom.group to guarantee the workload integrity. Now
> > > > > > that the processes are all killed, why keep the page cache here?
> > > > >
> > > > > Because an administrator has configured the reclaim protection in a
> > > > > certain way and hopefully had a good reason to do that. We are not going
> > > > > to override that configuration just because the OOM killer was invoked
> > > > > and killed tasks in that memcg. The workload might get restarted and it
> > > > > would run under different constraints all of a sudden, which is not
> > > > > expected.
> > > > >
> > > > > In short, the kernel should never silently change the configuration
> > > > > made by an administrator.
> > > >
> > > > Understood.
> > > >
> > > > So what about the changes below? We don't override the admin setting,
> > > > but we reclaim the page caches from it if this memcg is oom killed.
> > > > Something like,
> > > >
> > > > mem_cgroup_protected
> > > > {
> > > >     ...
> > > > +   if (!cgroup_is_populated(memcg->css.cgroup) &&
> > > > +       mem_cgroup_under_oom_group_kill(memcg))
> > > > +           return MEMCG_PROT_NONE;
> > > > +
> > > >     usage = page_counter_read(&memcg->memory);
> > > >     if (!usage)
> > > >             return MEMCG_PROT_NONE;
> > > > }
> > >
> > > I assume that mem_cgroup_under_oom_group_kill is essentially
> > > memcg->under_oom && memcg->oom_group
> > > But that doesn't really help much because all the reclaim attempts have
> > > already been made and failed. I do not remember the exact details about
> > > under_oom but I have a recollection that it wouldn't really work for
> > > cgroup v2 because the oom_control is not in place and so the state would
> > > be set for only a very short time period.
> > >
> > > Again, what is the problem that you are trying to fix?
> >
> > When there are no processes running in a memcg, for example if they were
> > killed by the OOM killer, we can't reclaim the file page cache protected
> > by memory.min of this memcg. This file page cache is useless in
> > this case.
> > That's what I'm trying to fix.
>
> Could you be more specific please?
> I would assume that the group oom
> configured memcg would either restart its workload when killed (that is
> why you want to kill the whole workload to restart it cleanly in many
> cases) or simply tear down the memcg altogether.
>

Yes, we always restart it automatically if these processes exit (no
matter whether because of OOM or some other reason).
It is safe to do that if OOM happens, because OOM is always caused by
leaked anon pages and the restart can free these anon pages.
But there may be cases where the restart fails, and if that happens the
protected pages will never be reclaimed until the admin resets the
protection or takes this memcg offline.

When there are no processes, we don't need to protect the pages. You can
consider it 'fault tolerance'.

> In other words why do you care about the oom killer case so much? It is
> not different than handling a lingering memcg with the workload already
> finished. You simply have no way to know whether the reclaim protection
> is still required. Admin is supposed to either offline the memcg that is
> no longer used or drop the reclaim protection once it is not needed
> because that has some visible consequences on the overall system
> operation.

Actually, what concerns me is the case where no process is running but
memory protection continues protecting the file pages.
OOM is just one such case.

Thanks
Yafang
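
For illustration only, the behaviour argued for above (no processes in the memcg, so no reclaim protection) can be modelled in a few lines of user-space C. This is a minimal sketch under stated assumptions, not kernel code: struct memcg_model, enum prot_level and model_protected() are hypothetical names, the min and low fields stand in for the effective memory.min and memory.low values, and the hierarchy handling done by the real mem_cgroup_protected() is left out.

```c
#include <stddef.h>

/* Hypothetical user-space model; NOT the kernel's struct mem_cgroup. */
enum prot_level { PROT_NONE, PROT_LOW, PROT_MIN };

struct memcg_model {
    size_t nr_procs; /* processes currently attached to the cgroup */
    size_t usage;    /* pages charged to the cgroup */
    size_t min;      /* stand-in for the effective memory.min, in pages */
    size_t low;      /* stand-in for the effective memory.low, in pages */
};

/*
 * Decide how strongly reclaim should protect this cgroup's pages.
 * The first check is the idea proposed in the thread (assumption:
 * an empty cgroup needs no protection); the rest is a simplified
 * min/low comparison without any hierarchy awareness.
 */
static enum prot_level model_protected(const struct memcg_model *cg)
{
    if (cg->nr_procs == 0)
        return PROT_NONE;   /* nothing is running: pages are reclaimable */
    if (cg->usage == 0)
        return PROT_NONE;   /* nothing charged, nothing to protect */
    if (cg->usage <= cg->min)
        return PROT_MIN;    /* hard protection applies */
    if (cg->usage <= cg->low)
        return PROT_LOW;    /* best-effort protection applies */
    return PROT_NONE;
}
```

In this model an empty cgroup whose usage is below its min threshold yields PROT_NONE, while the same cgroup with at least one process yields PROT_MIN. Whether the kernel should behave this way is exactly what the thread disputes, since the admin-configured protection would be bypassed without the admin changing it.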