All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: Ying Huang <ying.huang@intel.com>
Cc: Greg Thelen <gthelen@google.com>, Yang Shi <shy828301@gmail.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Tim C Chen <tim.c.chen@intel.com>,
	Brice Goglin <brice.goglin@gmail.com>,
	Michal Hocko <mhocko@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Hesham Almatary <hesham.almatary@huawei.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Alistair Popple <apopple@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Feng Tang <feng.tang@intel.com>,
	Jagdish Gediya <jvgediya@linux.ibm.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [RFC PATCH v4 1/7] mm/demotion: Add support for explicit memory tiers
Date: Mon, 6 Jun 2022 09:26:32 +0530	[thread overview]
Message-ID: <d429a644-ef27-bcd8-52bd-c8cbe5fedc26@linux.ibm.com> (raw)
In-Reply-To: <aeced91ea9d9396e9842f5c0264391aabd291726.camel@intel.com>

On 6/6/22 8:19 AM, Ying Huang wrote:
> On Thu, 2022-06-02 at 14:07 +0800, Ying Huang wrote:
>> On Fri, 2022-05-27 at 17:55 +0530, Aneesh Kumar K.V wrote:
>>> From: Jagdish Gediya <jvgediya@linux.ibm.com>
>>>
>>> In the current kernel, memory tiers are defined implicitly via a
>>> demotion path relationship between NUMA nodes, which is created
>>> during the kernel initialization and updated when a NUMA node is
>>> hot-added or hot-removed.  The current implementation puts all
>>> nodes with CPU into the top tier, and builds the tier hierarchy
>>> tier-by-tier by establishing the per-node demotion targets based
>>> on the distances between nodes.
>>>
>>> This current memory tier kernel interface needs to be improved for
>>> several important use cases,
>>>
>>> The current tier initialization code always initializes
>>> each memory-only NUMA node into a lower tier.  But a memory-only
>>> NUMA node may have a high performance memory device (e.g. a DRAM
>>> device attached via CXL.mem or a DRAM-backed memory-only node on
>>> a virtual machine) and should be put into a higher tier.
>>>
>>> The current tier hierarchy always puts CPU nodes into the top
>>> tier. But on a system with HBM or GPU devices, the
>>> memory-only NUMA nodes mapping these devices should be in the
>>> top tier, and DRAM nodes with CPUs are better to be placed into the
>>> next lower tier.
>>>
>>> With current kernel higher tier node can only be demoted to selected nodes on the
>>> next lower tier as defined by the demotion path, not any other
>>> node from any lower tier.  This strict, hard-coded demotion order
>>> does not work in all use cases (e.g. some use cases may want to
>>> allow cross-socket demotion to another node in the same demotion
>>> tier as a fallback when the preferred demotion node is out of
>>> space), This demotion order is also inconsistent with the page
>>> allocation fallback order when all the nodes in a higher tier are
>>> out of space: The page allocation can fall back to any node from
>>> any lower tier, whereas the demotion order doesn't allow that.
>>>
>>> The current kernel also don't provide any interfaces for the
>>> userspace to learn about the memory tier hierarchy in order to
>>> optimize its memory allocations.
>>>
>>> This patch series address the above by defining memory tiers explicitly.
>>>
>>> This patch adds below sysfs interface which is read-only and
>>> can be used to read nodes available in specific tier.
>>>
>>> /sys/devices/system/memtier/memtierN/nodelist
>>>
>>> Tier 0 is the highest tier, while tier MAX_MEMORY_TIERS - 1 is the
>>> lowest tier. The absolute value of a tier id number has no specific
>>> meaning. what matters is the relative order of the tier id numbers.
>>>
>>> All the tiered memory code is guarded by CONFIG_TIERED_MEMORY.
>>> Default number of memory tiers are MAX_MEMORY_TIERS(3). All the
>>> nodes are by default assigned to DEFAULT_MEMORY_TIER(1).
>>>
>>> Default memory tier can be read from,
>>> /sys/devices/system/memtier/default_tier
>>>
>>> Max memory tier can be read from,
>>> /sys/devices/system/memtier/max_tiers
>>>
>>> This patch implements the RFC spec sent by Wei Xu <weixugc@google.com> at [1].
>>>
>>> [1] https://lore.kernel.org/linux-mm/CAAPL-u-DGLcKRVDnChN9ZhxPkfxQvz9Sb93kVoX_4J2oiJSkUw@mail.gmail.com/
>>>
>>> Signed-off-by: Jagdish Gediya <jvgediya@linux.ibm.com>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>
>> IMHO, we should change the kernel internal implementation firstly, then
>> implement the kerne/user space interface.  That is, make memory tier
>> explicit inside kernel, then expose it to user space.
> 
> Why ignore this comment for v5?  If you don't agree, please respond me.
> 

I am not sure what benefit such a rearrange would bring in? Right now I 
am writing the series from the point of view of introducing all the 
plumbing and them switching the existing demotion logic to use the new 
infrastructure. Redoing the code to hide all the userspace sysfs till we 
switch the demotion logic to use the new infrastructure doesn't really 
bring any additional clarity to patch review and would require me to 
redo the series with a lot of conflicts across the patches in the patchset.

-aneesh


  reply	other threads:[~2022-06-06  4:01 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-26 21:22 RFC: Memory Tiering Kernel Interfaces (v3) Wei Xu
2022-05-27  2:58 ` Ying Huang
2022-05-27 14:05   ` Hesham Almatary
2022-05-27 16:25     ` Wei Xu
2022-05-27 12:25 ` [RFC PATCH v4 0/7] mm/demotion: Memory tiers and demotion Aneesh Kumar K.V
2022-05-27 12:25   ` [RFC PATCH v4 1/7] mm/demotion: Add support for explicit memory tiers Aneesh Kumar K.V
2022-05-27 13:59     ` Jonathan Cameron
2022-06-02  6:07     ` Ying Huang
2022-06-06  2:49       ` Ying Huang
2022-06-06  3:56         ` Aneesh Kumar K V [this message]
2022-06-06  5:33           ` Ying Huang
2022-06-06  6:01             ` Aneesh Kumar K V
2022-06-06  6:27               ` Aneesh Kumar K.V
2022-06-06  7:53                 ` Ying Huang
2022-06-06  8:01                   ` Aneesh Kumar K V
2022-06-06  8:52                     ` Ying Huang
2022-06-06  9:02                       ` Aneesh Kumar K V
2022-06-08  1:24                         ` Ying Huang
2022-06-08  7:16     ` Ying Huang
2022-06-08  8:24       ` Aneesh Kumar K V
2022-06-08  8:27         ` Ying Huang
2022-05-27 12:25   ` [RFC PATCH v4 2/7] mm/demotion: Expose per node memory tier to sysfs Aneesh Kumar K.V
2022-05-27 14:15     ` Jonathan Cameron
2022-06-03  8:40       ` Aneesh Kumar K V
2022-06-06 14:59         ` Jonathan Cameron
2022-06-06 16:01           ` Aneesh Kumar K V
2022-06-06 16:16             ` Jonathan Cameron
2022-06-06 16:39               ` Aneesh Kumar K V
2022-06-06 17:46                 ` Aneesh Kumar K.V
2022-06-07 14:32                   ` Jonathan Cameron
2022-05-28  1:33     ` kernel test robot
2022-06-08  7:18     ` Ying Huang
2022-06-08  8:25       ` Aneesh Kumar K V
2022-06-08  8:29         ` Ying Huang
2022-05-27 12:25   ` [RFC PATCH v4 3/7] mm/demotion: Build demotion targets based on explicit memory tiers Aneesh Kumar K.V
2022-05-27 14:31     ` Jonathan Cameron
2022-05-30  3:35     ` [mm/demotion] 8ebccd60c2: BUG:sleeping_function_called_from_invalid_context_at_mm/compaction.c kernel test robot
2022-05-30  3:35       ` kernel test robot
2022-05-27 12:25   ` [RFC PATCH v4 4/7] mm/demotion/dax/kmem: Set node's memory tier to MEMORY_TIER_PMEM Aneesh Kumar K.V
2022-06-01  6:29     ` Bharata B Rao
2022-06-01 13:49       ` Aneesh Kumar K V
2022-06-02  6:36         ` Bharata B Rao
2022-06-03  9:04           ` Aneesh Kumar K V
2022-06-06 10:11             ` Bharata B Rao
2022-06-06 10:16               ` Aneesh Kumar K V
2022-06-06 11:54                 ` Aneesh Kumar K.V
2022-06-06 12:09                   ` Bharata B Rao
2022-06-06 13:00                     ` Aneesh Kumar K V
2022-05-27 12:25   ` [RFC PATCH v4 5/7] mm/demotion: Add support to associate rank with memory tier Aneesh Kumar K.V
2022-05-27 14:45     ` Jonathan Cameron
2022-05-27 15:45       ` Aneesh Kumar K V
2022-05-30 12:36         ` Jonathan Cameron
2022-06-02  6:41     ` Ying Huang
2022-05-27 12:25   ` [RFC PATCH v4 6/7] mm/demotion: Add support for removing node from demotion memory tiers Aneesh Kumar K.V
2022-06-02  6:43     ` Ying Huang
2022-05-27 12:25   ` [RFC PATCH v4 7/7] mm/demotion: Demote pages according to allocation fallback order Aneesh Kumar K.V
2022-05-27 15:03     ` Jonathan Cameron
2022-06-02  7:35     ` Ying Huang
2022-06-03 15:09       ` Aneesh Kumar K V
2022-06-06  0:43         ` Ying Huang
2022-06-06  4:07           ` Aneesh Kumar K V
2022-06-06  5:26             ` Ying Huang
2022-06-06  6:21               ` Aneesh Kumar K.V
2022-06-06  7:42                 ` Ying Huang
2022-06-06  8:02                   ` Aneesh Kumar K V
2022-06-06  8:06                     ` Ying Huang
2022-06-06 17:07               ` Yang Shi
2022-05-27 13:40 ` RFC: Memory Tiering Kernel Interfaces (v3) Aneesh Kumar K V
2022-05-27 16:30   ` Wei Xu
2022-05-29  4:31     ` Ying Huang
2022-05-30 12:50       ` Jonathan Cameron
2022-05-31  1:57         ` Ying Huang
2022-06-07 19:25         ` Tim Chen
2022-06-08  4:41           ` Aneesh Kumar K V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d429a644-ef27-bcd8-52bd-c8cbe5fedc26@linux.ibm.com \
    --to=aneesh.kumar@linux.ibm.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=brice.goglin@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dave@stgolabs.net \
    --cc=feng.tang@intel.com \
    --cc=gthelen@google.com \
    --cc=hesham.almatary@huawei.com \
    --cc=jvgediya@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=shy828301@gmail.com \
    --cc=tim.c.chen@intel.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.