Re: [PATCH 0/5] Transparent Page Placement for Tiered-Memory

From: "Huang, Ying" <ying.huang@intel.com>
To: Hasan Al Maruf <hasan3050@gmail.com>
Cc: dave.hansen@linux.intel.com, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mgorman@techsingularity.net, riel@surriel.com,
	yang.shi@linux.alibaba.com
Subject: Re: [PATCH 0/5] Transparent Page Placement for Tiered-Memory
Date: Tue, 30 Nov 2021 09:29:51 +0800	[thread overview]
Message-ID: <87ee6yv3ao.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <874k812fl8.fsf@yhuang6-desk2.ccr.corp.intel.com> (Hasan Al Maruf's message of "Mon, 29 Nov 2021 19:28:16 -0500")

Hi, Hasan,

Hasan Al Maruf <hasan3050@gmail.com> writes:

> Hi Huang,
>
> We find the patches in the tiering series are well thought and helpful.
> For our workloads, we initially started with that series and we find the
> whole series is too complex and some features do not benefit as
> expected. Therefore, we have come up with the current basic patches which
> are essential and help achieve most of the intended behaviors while
> reducing complexity as much as possible.
> As we started with your tiering series (with 72 patches), there are
> overlaps between our patches and the tiering series. We adopt the
> functionalities from the tiering series, modify, and extend them to make
> page placement mechanism simpler but workable.

Thanks for the background!

> Here is the key points for
> each of the patches in our Transparent Page Placement series.
>
> Patch #1:
> We combine all the promotion and demotion related statistics in this patch
> Having statistics on both promotion, demotion, and failures help observe
> the systems behavior and reason about performance behavior. Besides, anon
> vs file breakdown in both promotion and demotion path help understand
> application behavior on a tiered memory systems. As applications may have
> different sensitivity toward the anon and file placements, this breakdown
> in the migration path is often helpful to assess the effectiveness of the
> page placement policy.
>
> Patch #2:
> This patch largely overlaps with your current series on NUMA Balancing.
> https://lore.kernel.org/lkml/20211116013522.140575-1-ying.huang@intel.com/
> This patch is a combination of your Patch #2 and Patch #3 except the
> static 10MB free space in the top-tier node to maintain a free headroom
> for new allocation and promotion. Rather, we find having a user defined
> demote watermark would make it more generic that we include in our patch#3
>
> Patch #3:
> This patch has the logic for having a separate demote watermark per node.
> In the tiering series, that demote watermark is somewhat bound to the
> cgroup and triggered on per-application basis. Besides, It only supports
> cgroup-v1. However, we think, instead of cgroup based soft reclamation,
> a global per-node demote watermark is more meaningful and should be the
> basic one to start with. In that case, the user does not have to think
> about per-application setup.
>
> Patch #4:
> This patch includes the code for kswapd based reclamation. As I mentioned
> earlier, instead of cgroup-based reclamation, here we look whether a node
> is balanced during each kswapd invocation. For top-tier node, we check
> whether kswapd reclaimed till DEMOTE_WMARK is satisfied, for other nodes
> the default mechanism continues. The differences between tiering series
> and this patch is the cgroup based reclamation vs per-node reclamation.
>
> Patch #5:
> In your patches for promotion, you consider re-fault time for promotion
> candidate selection. Although the hot-threshold is tunable, from our
> experiments, we find this not helpful to some extent. For example, if
> different subset of pages have different re-access time, time-based
> promotion should not be able to distinguish between them. If you make
> the time window long enough, then any infrequently accessed pages will
> also become the promotion candidate, and later be a candidate for the
> demotion.
>
> In this patch, we propose LRU based promotion, which would give anon and
> files different promotion paths. If pages are used sporadically at high
> frequency, irregular pages would be eventually moved from the active LRU
> list. We find that our LRU based approach can reduce up to 11x promotion
> traffic while retaining the same application throughput for multiple
> workloads.
>
> Besides, with promotion rate limit, if files largely get promoted to
> top-tier, anon promotion rate often gets hampered as files are taking the
> large portion of the total rate (which often happen for applications that
> generates huge caches). In our LRU-based approach, each type has their own
> separate LRU to check. So for workloads with smaller anons and large file
> usage, with LRU-based approach, we can see more anons are being promoted
> rather than the files.
>
> I don't mind this patchset being merged to your current patchset under
> discussion or any later ones. But, I think this series contains the very
> basic functionalities to have a workable page placement mechanism for
> tiered-memory. This can obviously be augmented by the other features in
> you future tiering series.

Thanks for detailed description!  After reading your patchset and the
description above, I found that the basic part ([1/6] - [3/6]) of the
promotion patchset as follows can be the base for your patchset too.

https://lore.kernel.org/lkml/20211116013522.140575-1-ying.huang@intel.com/

The main problem of that basic patchset is lacking review.  Can I ask
you to help to review that patchset, especially the common base [1/6] -
[3/6]?  If you think the rest of the patchset isn't good enough for you,
we can try to merge just [1/6] - [3/6] firstly.  Do you agree?

Best Regards,
Huang, Ying