From: Wei Xu <weixugc@google.com>
Date: Thu, 7 Apr 2022 15:07:55 -0700
Subject: Re: [PATCH resend] memcg: introduce per-memcg reclaim interface
To: Tim Chen
Cc: "Huang, Ying", Michal Hocko, Yosry Ahmed, Johannes Weiner,
    Shakeel Butt, Andrew Morton, David Rientjes, Tejun Heo, Zefan Li,
    Roman Gushchin, Cgroups, "open list:DOCUMENTATION",
    Linux Kernel Mailing List, Linux MM, Jonathan Corbet, Yu Zhao,
    Dave Hansen, Greg Thelen
In-Reply-To: <215bd7332aee0ed1092bad4d826a42854ebfd04a.camel@linux.intel.com>

On Thu, Apr 7, 2022 at 2:26 PM Tim Chen <tim.c.chen@linux.intel.com> wrote:

> On Wed, 2022-04-06 at 10:49 +0800, Huang, Ying wrote:
> >
> > > > If so,
> > > >
> > > > # echo A > memory.reclaim
> > > >
> > > > means
> > > >
> > > > a) "A" bytes of memory are freed from the memcg, regardless of
> > > >    whether demotion is used.
> > > >
> > > > or
> > > >
> > > > b) "A" bytes of memory are reclaimed from the memcg; some of them
> > > >    may be freed, some of them may be just demoted from DRAM to
> > > >    PMEM.  The total number is "A".
> > > >
> > > > For me, a) looks more reasonable.
> > > >
> > >
> > > We can use a DEMOTE flag to control the demotion behavior for
> > > memory.reclaim.  If the flag is not set (the default), then
> > > no_demotion of scan_control can be set to 1, similar to
> > > reclaim_pages().
> >
> > If we have to use a flag to control the behavior, I think it's better
> > to have a separate interface (e.g. memory.demote).  But do we really
> > need b)?
> >
> > > The question is then whether we want to rename memory.reclaim to
> > > something more general.  I think this name is fine if reclaim-based
> > > demotion is an accepted concept.
> >
>
> memory.demote will work for 2 levels of memory tiers.  But when we have
> 3 levels of memory (e.g. high bandwidth memory, DRAM and PMEM), it gets
> ambiguous again whether we should demote from high bandwidth memory
> or DRAM.
>
> Will something like this be more general?
>
> echo X > memory_[dram,pmem,hbm].reclaim
>
> So echo X > memory_dram.reclaim
> means that we want to free up X bytes from DRAM for the mem cgroup.
>
> echo demote > memory_dram.reclaim_policy
>
> This means that we prefer demotion for reclaim instead
> of swapping to disk.
>



memory.demote can work with any number of memory tiers if it takes a
nodemask argument (or a tier argument, if there is a more explicitly
defined, userspace-visible tiering representation).  The semantics can
be to demote X bytes from these nodes to their next tier.
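
As a rough sketch of what such a request could look like: the "nodes="
argument syntax below is made up for illustration (nothing in this
thread defines one), memory.demote itself is only a proposal, and the
script writes to a scratch directory standing in for a cgroup directory
rather than to a real cgroup file.

```sh
#!/bin/sh
# Sketch only: memory.demote and the "nodes=" argument are hypothetical.
# CG is a scratch directory standing in for a cgroup directory.
CG=$(mktemp -d)

# demote_request SIZE NODELIST
# Compose one request line: demote SIZE bytes from the NUMA nodes in
# NODELIST to whatever tier sits below them.
demote_request() {
    printf '%s nodes=%s\n' "$1" "$2"
}

# Ask to demote 1G from NUMA nodes 0-1.
demote_request 1G 0-1 > "$CG/memory.demote"
cat "$CG/memory.demote"
```

The point of the shape is that the caller names source nodes only; which
nodes count as "the next tier" is left to the kernel.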

Names like memory_dram/memory_pmem assume particular hardware behind
each memory tier, which is undesirable.  For example, it is entirely
possible that a slow memory tier is implemented by a
lower-cost/lower-performance DDR device connected via CXL.mem, not by
PMEM.  It is better for this interface to speak in terms of either the
NUMA node abstraction or a new tier abstraction.
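
To illustrate the difference, here is a minimal sketch in which a tier
is nothing more than a list of NUMA nodes.  The tier numbers, the node
lists, and the "nodes=" request syntax are all assumptions invented for
this example; they are not read from any existing kernel interface.

```sh
#!/bin/sh
# Sketch only: the tier numbering and node lists are made-up examples.

# tier_nodes TIER
# Print the NUMA node list for a tier. The caller never needs to know
# which hardware backs those nodes.
tier_nodes() {
    case "$1" in
        0) echo "0-1" ;;   # fast tier, e.g. local DRAM
        1) echo "2-3" ;;   # slow tier: PMEM on one box, CXL.mem DDR on another
        *) echo "unknown tier: $1" >&2; return 1 ;;
    esac
}

# The request text is the same whatever hardware sits behind tier 0.
printf '1G nodes=%s\n' "$(tier_nodes 0)"
```

Swapping PMEM for a CXL-attached DDR device would change only the
comment on tier 1, not the request or the file name.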

It is also desirable to make this interface stateless, i.e. not to
require the setting of memory_dram.reclaim_policy.  Any policy can be
specified as an argument to the request itself and should only affect
that particular request.
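
A small sketch of the contrast, under loud assumptions: the "nodes=" and
"policy=" arguments are hypothetical syntax invented here, and every
write goes to a scratch directory standing in for a cgroup directory,
not to a real cgroup file.

```sh
#!/bin/sh
# Sketch only: the "nodes=" and "policy=" arguments are hypothetical.
# CG is a scratch directory standing in for a cgroup directory.
CG=$(mktemp -d)

# Stateful variant (the one argued against above): the policy is set by
# a separate write and then applies to every later request.
printf 'demote\n' > "$CG/memory_dram.reclaim_policy"
printf '1G\n' > "$CG/memory_dram.reclaim"

# reclaim_request SIZE NODELIST POLICY
# Stateless variant: a single line carries the size, the nodes, and the
# policy, so nothing persists after the request completes.
reclaim_request() {
    printf '%s nodes=%s policy=%s\n' "$1" "$2" "$3"
}

reclaim_request 1G 0-1 demote > "$CG/memory.reclaim"
cat "$CG/memory.reclaim"
```

With the stateless form, two callers issuing requests against the same
cgroup cannot change each other's policy by accident.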

Wei
