From: Hillf Danton <hdanton@sina.com>
To: Michal Hocko
Cc: Hillf Danton, Johannes Weiner, Andrew Morton, linux, linux-mm, Shakeel Butt, Roman Gushchin, Matthew Wilcox
Subject: Re: [RFC] mm: memcg: add priority for soft limit reclaiming
Date: Tue, 24 Sep 2019 15:36:42 +0800
Message-Id: <20190924073642.3224-1-hdanton@sina.com>

On Mon, 23 Sep 2019 21:28:34 Michal Hocko wrote:
>
> On Mon 23-09-19 21:04:59, Hillf Danton wrote:
> >
> > On Thu, 19 Sep 2019 21:32:31 +0800 Michal Hocko wrote:
> > >
> > > On Thu 19-09-19 21:13:32, Hillf Danton wrote:
> > > >
> > > > Currently the memory controller plays an increasingly important
> > > > role in how memory is used and how pages are reclaimed under
> > > > memory pressure.
> > > >
> > > > In daily work, memcgs are often created for critical tasks, and
> > > > their preconfigured memory usage is supposed to be met even
> > > > under memory pressure. Administrators want it to be configurable
> > > > that the pages consumed by memcg-B can be reclaimed by page
> > > > allocations invoked not by memcg-A but by memcg-C.
> > >
> > > I am not really sure I understand the usecase well but this sounds
> > > like what memory reclaim protection in v2 is aiming at.
> >
> > Please describe the usecase.
>

Task-A has long been able to preempt task-B for CPU cycles; in other
words, the physical resource of CPU cycles is preemptible. Are physical
pages preemptible too, in the same manner?
Not at present, since no priority is defined for pages (say, a link
between page->nice and task->nice). The slrp is added to memcg instead
of reusing nice because 1) it is only used in the page reclaiming
context (which for memcg is soft limit reclaiming), and 2) it is
difficult to compare the reclaimer's and reclaimee's task->nice
directly in that context, as only information about the reclaimer and
the LRU page is available.

Here task->nice is replaced with memcg->slrp in order to do page
preemption, PP. There is no way for task-A to PP task-B, but the group
containing task-A can PP the group containing task-B. That preemption
needs less than 100 lines of code, as you can see, on top of the
current memory controller framework.

The user visible effects of PP include:

1) an increase in system-wide configurability: combined with and/or in
   parallel to memory.high, PP helps the admin configure and maintain
   100 mm groups on a system with 100GB of RAM. With every group's
   high boundary set to 10MB, he only needs to fiddle with the slrps
   of the handful of groups containing critical tasks;

2) an increase in system-wide predictability, because critical groups
   can be configured so that they are never page preempted;

3) a priority gradient field that grows in a long-running system, just
   like rivers flowing all the way from the mountains to the seas.

Adding PP to background reclaiming is on the way:

1> define page->nice and link it to task->nice
2> on isolating LRU pages, check reclaimer->nice against page->nice
   and skip the page if the reclaimer is lower in priority

> > A pointer to the v2 stuff please.
>
> Documentation/admin-guide/cgroup-v2.rst
>

Thanks Michal. To my surprise, slrp happens to go along the lines of
cgroup-v2.

--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1108,6 +1108,17 @@ PAGE_SIZE multiple when read back.
 	Going over the high limit never invokes the OOM killer and
 	under extreme conditions the limit may be breached.
 
+  memory.slrp
+	A read-write single value [0-32] file which exists on non-root
+	cgroups.  The default is "0".
+
+	Soft limit reclaiming priority.  This is the mechanism to
+	control how physical pages are reclaimed when a group's memory
+	usage goes over its high boundary.
+
+	It makes sure that no pages will be reclaimed from any group
+	with a higher slrp in favor of a group with a lower slrp.
+
   memory.max
 	A read-write single value file which exists on non-root
 	cgroups.  The default is "max".

--
Hillf