From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: Wei Xu <weixugc@google.com>,
	Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>,
	Dave Hansen <dave.hansen@intel.com>,
	"Huang Ying" <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Greg Thelen" <gthelen@google.com>,
	Yang Shi <shy828301@gmail.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	Jagdish Gediya <jvgediya@linux.ibm.com>,
	Michal Hocko <mhocko@kernel.org>,
	Tim C Chen <tim.c.chen@intel.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Feng Tang" <feng.tang@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	"Dan Williams" <dan.j.williams@intel.com>,
	David Rientjes <rientjes@google.com>,
	Linux MM <linux-mm@kvack.org>,
	Brice Goglin <brice.goglin@gmail.com>,
	"Hesham Almatary" <hesham.almatary@huawei.com>
Subject: Re: RFC: Memory Tiering Kernel Interfaces (v2)
Date: Wed, 25 May 2022 12:48:47 +0100
Message-ID: <20220525124847.00007a16@Huawei.com>
In-Reply-To: <87h75ef3y5.fsf@nvdebian.thelocal>

On Wed, 25 May 2022 17:47:33 +1000
Alistair Popple <apopple@nvidia.com> wrote:

> Wei Xu <weixugc@google.com> writes:
> 
> > On Tue, May 24, 2022 at 6:27 AM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:  
> >>
> >> Wei Xu <weixugc@google.com> writes:
> >>  
> >> > On Wed, May 18, 2022 at 5:00 AM Jonathan Cameron
> >> > <Jonathan.Cameron@huawei.com> wrote:  
> >> >>
> >> >> On Wed, 18 May 2022 00:09:48 -0700
> >> >> Wei Xu <weixugc@google.com> wrote:  
> >>
> >> ...
> >>  
> >> > Nice :)  
> >> >>
> >> >> Initially I thought this was over complicated when compared to just leaving space, but
> >> >> after a chat with Hesham just now you have us both convinced that this is an elegant solution.
> >> >>
> >> >> Few corners probably need fleshing out:
> >> >> *  Use of an allocator for new tiers. Flat number at startup, or new one on write of unique
> >> >>    value to set_memtier perhaps?  Also whether to allow drivers to allocate (I think
> >> >>    we should).
> >> >> *  Multiple tiers with same rank.  My assumption is from demotion path point of view you
> >> >>    fuse them (treat them as if they were a single tier), but keep them expressed
> >> >>    separately in the sysfs interface so that the rank can be changed independently.
> >> >> *  Some guidance on what values make sense for a given default rank that might be set by
> >> >>    a driver. If we have multiple GPU vendors, and someone mixes them in a system, we
> >> >>    probably don't want the default values they use to result in demotion between them.
> >> >>    This might well be a guidance doc or an appropriate set of #defines.  
> >> >
> >> > All of these are good ideas, though I am afraid that these can make
> >> > tier management too complex for what it's worth.
> >> >
> >> > How about an alternative tier numbering scheme that uses major.minor
> >> > device IDs?  For simplicity, we can just start with 3 major tiers.
> >> > New tiers can be inserted in-between using minor tier IDs.  
> >>
> >>
> >> What drives the creation of a new memory tier here?  Jonathan was
> >> suggesting we could do something similar to writing to set_memtier for
> >> creating a new memory tier.
> >>
> >> $ echo "memtier128" > /sys/devices/system/node/node1/set_memtier
> >>
> >> But I am wondering whether we should implement that now. If we keep the
> >> "rank" concept and keep the tier index (memtier0 is the memory tier with
> >> index 0) separate from rank, I assume we have enough flexibility for a
> >> future extension that will allow us to create a memory tier from userspace
> >> and assign it a rank value that places the device before or
> >> after DRAM in demotion order.
> >>
> >> i.e., for now we will only have memtier0, memtier1, memtier2. We won't add
> >> dynamic creation of memory tiers, and the above memory tiers will have
> >> rank values 0, 1, 2, giving demotion order 0 -> 1 -> 2.  
> >
> > Great. So the consensus is to go with the "rank" approach.  The above
> > sounds good to me as a starting point.  
> 
> The rank approach seems good to me too.

Rank is good, but I do slightly worry that the particular numbers used for
the initial ranks will accidentally become ABI that people care about.

Maybe just multiply all of them by 100, so that new tiers can be placed in
between with no change to this initial set of 3?  So 0, 100, 200.
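
As a rough sketch of why the spacing helps (a hypothetical Python model, not
kernel code; the tier names, the memtier3 entry and the demotion_order()
helper are made up purely for illustration): with ranks 0, 1, 2 there is no
integer left between neighbouring tiers, whereas with 0, 100, 200 a later
tier can take rank 50 and the original three keep their values.

```python
# Hypothetical model only: these names are illustrative assumptions,
# not kernel interfaces or sysfs attributes.

def demotion_order(ranks):
    """Return tier names sorted by ascending rank (demotion runs left to right)."""
    return [name for name, _ in sorted(ranks.items(), key=lambda kv: kv[1])]

# Initial set of three tiers, with ranks scaled by 100.
ranks = {"memtier0": 0, "memtier1": 100, "memtier2": 200}
initial = demotion_order(ranks)

# A later tier slots in between memtier0 and memtier1; the existing
# ranks are left untouched.
ranks["memtier3"] = 50
extended = demotion_order(ranks)

print(" -> ".join(initial))
print(" -> ".join(extended))
```

The only point of the sketch is that the demotion order comes from sorting
by rank, so gaps in the initial values leave room for insertion later.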

Jonathan

> 
>  - Alistair
> 
> >> -aneesh  



