From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wei Xu
Date: Thu, 7 Apr 2022 19:10:20 -0700
Subject: Re: [PATCH resend] memcg: introduce per-memcg reclaim interface
To: Tim Chen
Cc: "Huang, Ying", Michal Hocko, Yosry Ahmed, Johannes Weiner,
 Shakeel Butt, Andrew Morton, David Rientjes, Tejun Heo, Zefan Li,
 Roman Gushchin, Cgroups, "open list:DOCUMENTATION",
 Linux Kernel Mailing List, Linux MM, Jonathan Corbet, Yu Zhao,
 Dave Hansen, Greg Thelen
References: <20220331084151.2600229-1-yosryahmed@google.com>
 <87y20nzyw4.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87o81fujdc.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87bkxfudrk.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <215bd7332aee0ed1092bad4d826a42854ebfd04a.camel@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 7, 2022 at 4:11 PM Tim Chen wrote:
>
> On Thu, 2022-04-07 at 15:12 -0700, Wei Xu wrote:
>
> >
> > (resending in plain-text, sorry).
> >
> > memory.demote can work with any level of memory tiers if a nodemask
> > argument (or a tier argument if there is a more-explicitly defined,
> > userspace visible tiering representation) is provided. The semantics
> > can be to demote X bytes from these nodes to their next tier.
> >
>
> We do need some kind of userspace visible tiering representation.
> Will be nice if I can tell the memory type, nodemask of nodes in tier Y with
>
> cat memory.tier_Y
>
> > memory_dram/memory_pmem assumes the hardware for a particular memory
> > tier, which is undesirable. For example, it is entirely possible that
> > a slow memory tier is implemented by a lower-cost/lower-performance
> > DDR device connected via CXL.mem, not by PMEM. It is better for this
> > interface to speak in either the NUMA node abstraction or a new tier
> > abstraction.
>
> Just from the perspective of memory.reclaim and memory.demote, I think
> they could work with nodemask. For ease of management,
> some kind of abstraction of tier information like nodemask, memory type
> and expected performance should be readily accessible by user space.
>

I agree. The tier information should be provided at the system level.
One suggestion is to have a new directory "/sys/devices/system/tier/"
for tiers, e.g.:

/sys/devices/system/tier/tier0/memlist: all memory nodes in tier 0.
/sys/devices/system/tier/tier1/memlist: all memory nodes in tier 1.

We can discuss this tier representation in a new thread.

> Tim
>
> >
> > It is also desirable to make this interface stateless, i.e. not to
> > require the setting of memory_dram.reclaim_policy. Any policy can be
> > specified as arguments to the request itself and should only affect
> > that particular request.
> >
> > Wei
>
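
To make the "/sys/devices/system/tier/" suggestion above a bit more concrete, here is a rough userspace sketch of how such a layout might be read. It is only an illustration: the directory is a proposal in this thread rather than an existing interface, and the sketch assumes that each memlist file would hold a kernel-style node list such as "0-1,4" (the format used by existing files like /sys/devices/system/node/online). The names parse_node_list and read_tiers are made up for this example.

```python
from pathlib import Path


def parse_node_list(text):
    """Parse a kernel-style list such as "0-1,4" into sorted node IDs."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty file or a trailing comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)


def read_tiers(root="/sys/devices/system/tier"):
    """Map tier number -> node IDs.

    Assumption: tierN/memlist files exist under `root` as proposed in the
    thread; on current kernels the directory is absent and {} is returned.
    """
    tiers = {}
    base = Path(root)
    if not base.is_dir():
        return tiers
    for entry in base.glob("tier*"):
        suffix = entry.name[len("tier"):]
        memlist = entry / "memlist"
        if suffix.isdigit() and memlist.is_file():
            tiers[int(suffix)] = parse_node_list(memlist.read_text())
    return tiers


if __name__ == "__main__":
    for tier, nodes in sorted(read_tiers().items()):
        print(f"tier{tier}: nodes {nodes}")
```

A management tool could use a mapping like this to build the nodemask argument for a memory.reclaim or memory.demote request, which would keep the request itself stateless as suggested earlier in the thread.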