Re: [PATCH v10 1/8] mm/demotion: Add support for explicit memory tiers

From: "Huang, Ying" <ying.huang@intel.com>
To: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>,
	Wei Xu <weixugc@google.com>, Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	Yang Shi <shy828301@gmail.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Tim C Chen <tim.c.chen@intel.com>,
	Michal Hocko <mhocko@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Hesham Almatary <hesham.almatary@huawei.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Alistair Popple <apopple@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	jvgediya.oss@gmail.com, Jagdish Gediya <jvgediya@linux.ibm.com>
Subject: Re: [PATCH v10 1/8] mm/demotion: Add support for explicit memory tiers
Date: Wed, 27 Jul 2022 09:16:08 +0800
Message-ID: <87lesfuzhj.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <9e9ba2e4-3a87-3a79-e336-8849dad4856a@linux.ibm.com> (Aneesh Kumar K. V.'s message of "Tue, 26 Jul 2022 17:29:56 +0530")

Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:

>>> diff --git a/include/linux/node.h b/include/linux/node.h
>>> index 40d641a8bfb0..a2a16d4104fd 100644
>>> --- a/include/linux/node.h
>>> +++ b/include/linux/node.h
>>> @@ -92,6 +92,12 @@ struct node {
>>>  	struct list_head cache_attrs;
>>>  	struct device *cache_dev;
>>>  #endif
>>> +	/*
>>> +	 * For memory devices, perf_level describes
>>> +	 * the device performance and how it should be used
>>> +	 * while building a memory hierarchy.
>>> +	 */
>>> +	int perf_level;
>> 
>> Thinking about this again, I found that "perf_level" may not be the best
>> abstraction of the performance of memory devices.  Conceptually, it's an
>> abstraction of the memory bandwidth, but it does not reflect the memory
>> latency.
>> 
>> Instead, the previously proposed "abstract_distance" is an abstraction of
>> the memory latency.  Per my understanding, the memory latency has a more
>> direct influence on system performance.  And because the latency of a
>> memory device increases as the memory access throughput nears its max
>> bandwidth, the memory bandwidth can be reflected in the "abstract
>> distance" too.  That is, the "abstract distance" is an abstraction of
>> the memory latency under the expected memory access throughput.  The
>> "offset" to the default "abstract distance" reflects a different
>> expected memory access throughput.
>> 
>> So, I think we need some kind of abstraction of the memory latency
>> instead of memory bandwidth, e.g., "abstract distance".
>> 
>
> I am reworking other parts of the patch set based on your feedback.

Thanks!

> On this part, I guess we need to reach some consensus.

Yes.  Let's do that.

> IMHO perf_level (performance level) can indicate a combination of both latency
> and bandwidth.

"abstract distance" is based on latency, and bandwidth is reflected via
"latency under the expected memory access throughput".

How does perf_level indicate the combination?  Per my understanding,
it's bandwidth based.

> It is an abstract concept that indicates the performance of the
> device. As we learn more about which device attribute has more impact in
> defining the hierarchy, performance level will give more weight to that specific
> attribute. It could be write latency or bandwidth. For me, distance has a direct
> linkage to latency because that is how we define numa distance now. Adding
> "abstract" to the name does not make it more abstract than perf_level.
>
> I am open to suggestions from others.  Wei Xu has also suggested the perf_level name.
> I can rename this to abstract_distance if that indicates the goal better.

I'm open on naming.  But I think that it's good to define it to some
degree instead of leaving it completely opaque.  If it's latency based, then
a low value corresponds to high performance.  If it's bandwidth based,
then a low value corresponds to low performance.
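
To make the direction question concrete, here is a minimal sketch of why
the definition matters when nodes are ordered.  It is only an illustration:
the enum and both helper names are hypothetical and are not taken from the
patch set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: how a per-node performance value is defined. */
enum metric_kind {
	METRIC_LATENCY_BASED,	/* e.g. "abstract distance": lower is faster */
	METRIC_BANDWIDTH_BASED,	/* lower is slower */
};

/* Return true if a node with value a is faster than a node with value b. */
static bool node_is_faster(enum metric_kind kind, int a, int b)
{
	if (kind == METRIC_LATENCY_BASED)
		return a < b;
	return a > b;
}

/* Return the index of the fastest node in values[], or -1 if n is 0. */
static int fastest_node(enum metric_kind kind, const int *values, size_t n)
{
	size_t i, best = 0;

	if (n == 0)
		return -1;
	for (i = 1; i < n; i++)
		if (node_is_faster(kind, values[i], values[best]))
			best = i;
	return (int)best;
}
```

With the same per-node values, the two definitions pick opposite ends of
the list as the top tier, so the value cannot stay opaque to the code that
builds the hierarchy.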

Hi, Wei and Johannes,

What do you think about this?

Best Regards,
Huang, Ying