From: Wei Xu
Date: Thu, 7 Apr 2022 21:10:33 -0700
Subject: Re: [PATCH resend] memcg: introduce per-memcg reclaim interface
To: "Huang, Ying"
Cc: Tim Chen, Michal Hocko, Yosry Ahmed, Johannes Weiner, Shakeel Butt, Andrew Morton, David Rientjes, Tejun Heo, Zefan Li, Roman Gushchin, Cgroups, "open list:DOCUMENTATION", Linux Kernel Mailing List, Linux MM, Jonathan Corbet, Yu Zhao, Dave Hansen, Greg Thelen
References: <20220331084151.2600229-1-yosryahmed@google.com> <87y20nzyw4.fsf@yhuang6-desk2.ccr.corp.intel.com> <87o81fujdc.fsf@yhuang6-desk2.ccr.corp.intel.com> <87bkxfudrk.fsf@yhuang6-desk2.ccr.corp.intel.com> <215bd7332aee0ed1092bad4d826a42854ebfd04a.camel@linux.intel.com> <87y20gtgpf.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87y20gtgpf.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 7, 2022 at 8:08 PM Huang, Ying wrote:
>
> Wei Xu writes:
>
> > On Thu, Apr 7, 2022 at 4:11 PM Tim Chen wrote:
> >>
> >> On Thu, 2022-04-07 at 15:12 -0700, Wei Xu wrote:
> >>
> >> >
> >> > (resending in plain-text, sorry).
> >> >
> >> > memory.demote can work with any level of memory tiers if a nodemask
> >> > argument (or a tier argument if there is a more-explicitly defined,
> >> > userspace visible tiering representation) is provided.
> >> > The semantics can be to demote X bytes from these nodes to their
> >> > next tier.
> >> >
> >>
> >> We do need some kind of userspace visible tiering representation.
> >> Will be nice if I can tell the memory type, nodemask of nodes in tier Y with
> >>
> >> cat memory.tier_Y
> >>
> >>
> >> > memory_dram/memory_pmem assumes the hardware for a particular memory
> >> > tier, which is undesirable.  For example, it is entirely possible that
> >> > a slow memory tier is implemented by a lower-cost/lower-performance
> >> > DDR device connected via CXL.mem, not by PMEM.  It is better for this
> >> > interface to speak in either the NUMA node abstraction or a new tier
> >> > abstraction.
> >>
> >> Just from the perspective of memory.reclaim and memory.demote, I think
> >> they could work with nodemask.  For ease of management,
> >> some kind of abstraction of tier information like nodemask, memory type
> >> and expected performance should be readily accessible by user space.
> >>
> >
> > I agree.  The tier information should be provided at the system level.
> > One suggestion is to have a new directory "/sys/devices/system/tier/"
> > for tiers, e.g.:
> >
> > /sys/devices/system/tier/tier0/memlist: all memory nodes in tier 0.
> > /sys/devices/system/tier/tier1/memlist: all memory nodes in tier 1.
>
> I think that it may be sufficient to make tier an attribute of "node".
> Some thing like,
>
> /sys/devices/system/node/nodeX/memory_tier
>

This works.  If we want additional information about each tier, we can
then add a tier-specific subtree.

In addition, it would be good to also expose the demotion target nodes
(node_demotion[]) via sysfs, e.g.:

/sys/devices/system/node/nodeX/demotion_path

which returns node_demotion[X].

> Best Regards,
> Huang, Ying
>
> > We can discuss this tier representation in a new thread.
> >
> >> Tim
> >>
> >> >
> >> > It is also desirable to make this interface stateless, i.e. not to
> >> > require the setting of memory_dram.reclaim_policy.
> >> > Any policy can be specified as arguments to the request itself and
> >> > should only affect that particular request.
> >> >
> >> > Wei
> >> >
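
For illustration only, here is a minimal sketch of how userspace could consume the per-node attributes discussed above. It assumes that /sys/devices/system/node/nodeX/memory_tier holds a single integer and that /sys/devices/system/node/nodeX/demotion_path holds a comma-separated list of target node IDs (empty when there is no target). Both files are proposals from this thread, not an existing ABI, and their formats are guesses made for the example; the helper names are hypothetical as well.

```python
import os
import re

NODE_DIR = re.compile(r"^node(\d+)$")


def read_attr(node_path, name):
    """Return the stripped contents of a node attribute, or None if it is absent."""
    try:
        with open(os.path.join(node_path, name)) as f:
            return f.read().strip()
    except OSError:
        return None


def node_dirs(root):
    """Return (node_id, path) pairs for every nodeN directory under root, in numeric order."""
    found = []
    for entry in os.listdir(root):
        match = NODE_DIR.match(entry)
        if match:
            found.append((int(match.group(1)), os.path.join(root, entry)))
    return sorted(found)


def nodes_by_tier(root="/sys/devices/system/node"):
    """Map tier number -> node IDs, read from the hypothetical memory_tier file."""
    tiers = {}
    for node_id, path in node_dirs(root):
        tier = read_attr(path, "memory_tier")
        if tier is not None:
            tiers.setdefault(int(tier), []).append(node_id)
    return tiers


def demotion_targets(root="/sys/devices/system/node"):
    """Map node ID -> demotion target node IDs, read from the hypothetical demotion_path file."""
    targets = {}
    for node_id, path in node_dirs(root):
        raw = read_attr(path, "demotion_path")
        if raw is not None:
            # Assumed format: comma-separated node IDs; an empty file means no target.
            targets[node_id] = [int(n) for n in raw.split(",") if n.strip()]
    return targets
```

With tier as a per-node attribute, the per-tier node lists of the earlier /sys/devices/system/tier/tierN/memlist suggestion can be derived in userspace by a scan like nodes_by_tier(), which is one reason a separate tier directory may not be needed at first. On a kernel without these files both functions simply return an empty dict.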