From: Alistair Popple <apopple@nvidia.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-acpi@vger.kernel.org,
	"Aneesh Kumar K  . V" <aneesh.kumar@linux.ibm.com>,
	Wei Xu <weixugc@google.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Michal Hocko <mhocko@kernel.org>, Yang Shi <shy828301@gmail.com>,
	Rafael J Wysocki <rafael.j.wysocki@intel.com>,
	Dave Jiang <dave.jiang@intel.com>
Subject: Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management
Date: Tue, 22 Aug 2023 09:52:43 +1000
Message-ID: <87edjwlzn7.fsf@nvdebian.thelocal>
In-Reply-To: <87il98c8ms.fsf@yhuang6-desk2.ccr.corp.intel.com>


"Huang, Ying" <ying.huang@intel.com> writes:

> Alistair Popple <apopple@nvidia.com> writes:
>
>> "Huang, Ying" <ying.huang@intel.com> writes:
>>
>>> Hi, Alistair,
>>>
>>> Sorry for the late response.  Just came back from vacation.
>>
>> Ditto for this response :-)
>>
>> I see Andrew has taken this into mm-unstable though, so my bad for not
>> getting around to following all this up sooner.
>>
>>> Alistair Popple <apopple@nvidia.com> writes:
>>>
>>>> "Huang, Ying" <ying.huang@intel.com> writes:
>>>>
>>>>> Alistair Popple <apopple@nvidia.com> writes:
>>>>>
>>>>>> "Huang, Ying" <ying.huang@intel.com> writes:
>>>>>>
>>>>>>> Alistair Popple <apopple@nvidia.com> writes:
>>>>>>>
>>>>>>>>>>> Meanwhile, other memory device drivers can use the general
>>>>>>>>>>> notifier chain interface at the same time.
>>>>>>>>
>>>>>>>> How would that work in practice though? The abstract distance as far as
>>>>>>>> I can tell doesn't have any meaning other than establishing preferences
>>>>>>>> for memory demotion order. Therefore all calculations are relative to
>>>>>>>> the rest of the calculations on the system. So if a driver does its own
>>>>>>>> thing how does it choose a sensible distance? IMHO the value here is in
>>>>>>>> coordinating all that through a standard interface, whether that is HMAT
>>>>>>>> or something else.
>>>>>>>
>>>>>>> Only if different algorithms follow the same basic principle.  For
>>>>>>> example, the abstract distance of the default DRAM nodes is fixed
>>>>>>> (MEMTIER_ADISTANCE_DRAM).  The abstract distance of a memory device is
>>>>>>> directly proportional to the memory latency and inversely proportional
>>>>>>> to the memory bandwidth, using the memory latency and bandwidth of the
>>>>>>> default DRAM nodes as the base.
>>>>>>>
>>>>>>> HMAT and CDAT report the raw memory latency and bandwidth.  If there are
>>>>>>> some other methods to report the raw memory latency and bandwidth, we
>>>>>>> can use them too.
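
(To make that principle concrete, I read it as something like:

	adist = MEMTIER_ADISTANCE_DRAM *
		(mem_latency / dram_latency + dram_bandwidth / mem_bandwidth) / 2

where dram_latency and dram_bandwidth come from the default DRAM nodes.
That's purely illustrative -- the exact normalisation isn't the point,
just that everyone derives adistance the same way.)
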
>>>>>>
>>>>>> Argh! So we could address my concerns by having drivers feed
>>>>>> latency/bandwidth numbers into a standard calculation algorithm right?
>>>>>> I.e., rather than having drivers calculate abstract distance themselves, we
>>>>>> have the notifier chains return the raw performance data from which the
>>>>>> abstract distance is derived.
>>>>>
>>>>> Now, memory device drivers only need a general interface to get the
>>>>> abstract distance from the NUMA node ID.  In the future, if they need
>>>>> more interfaces, we can add them.  For example, the interface you
>>>>> suggested above.
>>>>
>>>> Huh? Memory device drivers (i.e. dax/kmem.c) don't care about abstract
>>>> distance; to them it's a meaningless number. The only reason they care
>>>> about it is so they can pass it to alloc_memory_type():
>>>>
>>>> struct memory_dev_type *alloc_memory_type(int adistance)
>>>>
>>>> Instead, alloc_memory_type() should take bandwidth/latency numbers,
>>>> and the calculation of abstract distance should be done there. That
>>>> resolves the issue of how drivers are supposed to divine an adistance
>>>> and also means that when CDAT support is added we don't have to
>>>> duplicate the calculation code.
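
(Concretely, I had something like the sketch below in mind. The use of
struct node_hmem_attrs is purely illustrative -- any representation of
the raw read/write bandwidth/latency would do:

	struct memory_dev_type *alloc_memory_type(struct node_hmem_attrs *perf);

That way the mapping from raw performance data to adistance lives in
one place in mm/memory-tiers.c rather than in each driver.)
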
>>>
>>> In the current design, the abstract distance is the key concept of
>>> memory types and memory tiers, and it is used as the interface to
>>> allocate memory types.  This provides more flexibility than some other
>>> interfaces (e.g. read/write bandwidth/latency).  For example, in the
>>> current dax/kmem.c, if HMAT isn't available in the system, the default
>>> abstract distance MEMTIER_DEFAULT_DAX_ADISTANCE is used.  This is still
>>> useful for supporting some systems today.  On a system without
>>> HMAT/CDAT, it's possible to calculate abstract distance from ACPI SLIT,
>>> although this is quite limited.  I'm not sure whether all systems will
>>> provide read/write bandwidth/latency data for all memory devices.
>>>
>>> HMAT and CDAT or some other mechanism may provide the read/write
>>> bandwidth/latency data to be used to calculate abstract distance.  For
>>> them, we can provide a shared implementation in mm/memory-tiers.c to map
>>> from read/write bandwidth/latency to the abstract distance.  Can this
>>> solve your concerns about the consistency among algorithms?  If so, we
>>> can do that when we add a second algorithm that needs it.
>>
>> I guess it would address my concerns if we did that now. I don't see why
>> we need to wait for a second implementation for that though - the whole
>> series seems to be built around adding a framework for supporting
>> multiple algorithms even though only one exists. So I think we should
>> support that fully, or simplify the whole thing and just assume the only
>> thing that exists is HMAT and get rid of the general interface until a
>> second algorithm comes along.
>
> We will need a general interface even for one algorithm implementation.
> Because it's not good to make a dax subsystem driver (dax/kmem) depend
> on an ACPI subsystem driver (acpi/hmat).  We need some general
> interface at subsystem level (memory tier here) between them.

I don't understand this argument. For a single algorithm it would be
simpler to just define acpi_hmat_calculate_adistance() along with a
static inline version of it that returns -ENOENT when !CONFIG_ACPI than
to add a layer of indirection through notifier blocks. That breaks any
dependency on ACPI, and there's plenty of precedent for this approach in
the kernel already.
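
Something like the following rough sketch is what I have in mind (the
exact header it lives in is illustrative):

	#ifdef CONFIG_ACPI
	int acpi_hmat_calculate_adistance(int nid);
	#else
	static inline int acpi_hmat_calculate_adistance(int nid)
	{
		return -ENOENT;
	}
	#endif

dax/kmem could then call it unconditionally and fall back to
MEMTIER_DEFAULT_DAX_ADISTANCE on -ENOENT.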

Thanks,
Alistair

> Best Regards,
> Huang, Ying

