All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Huang Ying <ying.huang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@surriel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Yang Shi <shy828301@gmail.com>, Zi Yan <ziy@nvidia.com>,
	Wei Xu <weixugc@google.com>, osalvador <osalvador@suse.de>,
	Shakeel Butt <shakeelb@google.com>,
	Zhong Jiang <zhongjiang-ali@linux.alibaba.com>
Subject: Re: [PATCH 0/3] memory tiering: hot page selection
Date: Fri, 8 Apr 2022 21:07:09 -0700	[thread overview]
Message-ID: <20220408210709.ce4ede3a9c778700dda0a292@linux-foundation.org> (raw)
In-Reply-To: <20220408071222.219689-1-ying.huang@intel.com>

On Fri,  8 Apr 2022 15:12:19 +0800 Huang Ying <ying.huang@intel.com> wrote:

> To optimize page placement in a memory tiering system with NUMA
> balancing, the hot pages in the slow memory node need to be
> identified.  Essentially, the original NUMA balancing implementation
> selects and promote the mostly recently accessed (MRU) pages.  But the
> recently accessed pages may be cold.

Wait.  What.  How on earth could the most recently accessed page be
considered "cold"?

Changelog needs work, please.

> So in this patchset, we
> implement a new hot page identification algorithm based on the latency
> between NUMA balancing page table scanning and hint page fault.

That sounds reasonable.

> And the hot page promotion can incur some overhead in the system.  To
> control the overhead a simple promotion rate limit mechanism is
> implemented.

That sounds like a hack to fix a problem you just added?  Maybe a
reasonable one..

> The hot threshold used to identify the hot pages is workload dependent
> usually.  So we also implemented a hot threshold automatic adjustment
> algorithm.  The basic idea is to increase/decrease the hot threshold
> to make the number of pages that pass the hot threshold (promote
> candidate) near the rate limit.

hot threshold is tweakable via debugfs.  So it's an unstable interface,
undocumented, may be withdrawn at any time, etc.

Please justify this.  Is it intended that this interface be removed? 
If yes, please describe the plan for this in the changelog.  If no then
this interface should be in sysfs, it should be fully documented in the
kernel tree and it should be fully changelogged (preferably with usage
examples) so that reviewers can best understand what is a permanent
extension to the kernel API which we must maintain for all time.



Does this code build correctly if !LAST_CPUPID_NOT_IN_PAGE_FLAGS?


> TODO: Add ABI document for new sysctl knob.

um, yes.


check_cpupid() is a poor function name.  Check for what?  Liver damage?
A better identifier would be cpupid_valid(), perhaps.  And perhaps it
should be implemented in mm.h.  And even documented.


Oddly, the proposed changes do a decent job of documenting
intra-function changes but a poor job of documenting newly added
functions.  Please review every new function and decide whether it is
so simple and obvious that it can be added to Linux without any inline
description at all.  

pgdat_free_space_enough() and  numa_hint_fault_latency().

numa_migration_check_rate_limit() particularly.  There's that "check"
word again.  Check for what?



The numa_balancing_rate_limit_mbps sysctl.  I assume "b" means "bytes".
That's better than "pages", but still.  Machines vary a lot in how
many bytes they have and in the speed at which they process those
bytes.  Is there some metric which we can use here which at least
partially insulates us from this variability?  So that sysadmins don't
need to set all their machines differently, based upon their size and
speed?


numa_threshold_ts is apparently in units of jiffies.  This was not
documented at the definition site, and it should be.  But jiffies can
vary between machines and configs.  Why add this inter-machine and
inter-config uncertainty/variability when we have include/linux/ktime.h?




  parent reply	other threads:[~2022-04-09  4:07 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-08  7:12 [PATCH 0/3] memory tiering: hot page selection Huang Ying
2022-04-08  7:12 ` [PATCH 1/3] memory tiering: hot page selection with hint page fault latency Huang Ying
2022-04-14 13:23   ` Jagdish Gediya
2022-04-15  2:42     ` ying.huang
2022-04-08  7:12 ` [PATCH 2/3] memory tiering: rate limit NUMA migration throughput Huang Ying
2022-04-08  7:12 ` [PATCH 3/3] memory tiering: adjust hot threshold automatically Huang Ying
2022-04-09  4:07 ` Andrew Morton [this message]
2022-04-11  8:16   ` [PATCH 0/3] memory tiering: hot page selection ying.huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220408210709.ce4ede3a9c778700dda0a292@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=weixugc@google.com \
    --cc=ying.huang@intel.com \
    --cc=zhongjiang-ali@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.