From: Huang Ying <ying.huang@intel.com>
To: linux-kernel@vger.kernel.org
Cc: Huang Ying <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@redhat.com>,
	Mel Gorman <mgorman@suse.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Yang Shi <shy828301@gmail.com>, Zi Yan <ziy@nvidia.com>,
	Wei Xu <weixugc@google.com>, osalvador <osalvador@suse.de>,
	Shakeel Butt <shakeelb@google.com>,
	linux-mm@kvack.org
Subject: [RFC -V7 0/6] NUMA balancing: optimize memory placement for memory tiering system
Date: Thu, 22 Jul 2021 11:18:13 +0800
Message-ID: <20210722031819.3446711-1-ying.huang@intel.com>

This patchset is based on the mmots tree of 2021-07-15.

In particular, it depends on the following patchset in the mmots tree:

[PATCH -V10 0/9] Migrate Pages in lieu of discard
https://lore.kernel.org/lkml/20210715055145.195411-1-ying.huang@intel.com/

This is part of a larger patch set.  If you want to apply these or
play with them, I'd suggest using the tree below:

https://github.com/hying-caritas/linux/commits/mt-20210722

The changes since the last post are as follows:

- Rebased on the mmots tree of 2021-07-15.

- Some minor fixes.

--

With the advent of various new memory types, some machines will have
multiple types of memory, e.g. DRAM and PMEM (persistent memory).  The
memory subsystem of such machines can be called a memory tiering
system, because the performance of the different types of memory
usually differs.

After commit c221c0b0308f ("device-dax: "Hotplug" persistent memory
for use like normal RAM"), PMEM can be used as cost-effective volatile
memory in separate NUMA nodes.  In a typical memory tiering system,
each physical NUMA node contains CPUs, DRAM, and PMEM.  The CPUs and
the DRAM are put in one logical node, while the PMEM is put in another
(fake) logical node.  For example, a 2-socket machine of this kind
appears as four logical NUMA nodes: two with CPUs and DRAM, and two
CPU-less nodes with only PMEM.

To optimize overall system performance, hot pages should be placed in
the DRAM node.  To do that, we need to identify the hot pages in the
PMEM node and migrate them to the DRAM node via NUMA migration.

The original NUMA balancing already has mechanisms to identify the
pages recently accessed by the CPUs of a node and migrate those pages
to that node.  We can reuse these mechanisms to optimize page
placement in a memory tiering system.  This patchset implements that.
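
As a hedged illustration of that reuse (a standalone C sketch with
invented names, not the actual kernel code), the placement decision in
the NUMA hint page fault path reduces to: a page on a slow (PMEM) node
that takes a hint fault was just accessed, so promote it to the
faulting CPU's DRAM node; pages already in fast memory keep the
original NUMA balancing policy.

#define NID_NONE        (-1)

struct page_info {
        int nid;                /* node the page currently lives on */
        int is_toptier;         /* 1: fast (DRAM) node, 0: slow (PMEM) node */
};

/* Return the node to migrate the faulting page to, or NID_NONE. */
static int numa_migrate_target(const struct page_info *page,
                               int faulting_cpu_nid)
{
        /* Tiering mode: a slow-memory page that takes a hint fault
         * was recently accessed, so promote it to the DRAM node of
         * the accessing CPU. */
        if (!page->is_toptier)
                return faulting_cpu_nid;

        /* Fast-memory page: original policy, move it only if it is
         * on a different node than the accessing CPU. */
        return page->nid == faulting_cpu_nid ? NID_NONE : faulting_cpu_nid;
}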

On the other hand, cold pages should be placed in the PMEM node.  So
we also need to identify the cold pages in the DRAM node and migrate
them to the PMEM node.

The following patchset (now in the latest mmots tree) implements a
mechanism to demote cold DRAM pages to the PMEM node under memory
pressure:

[PATCH -V10 0/9] Migrate Pages in lieu of discard
https://lore.kernel.org/lkml/20210715055145.195411-1-ying.huang@intel.com/

Based on that, cold DRAM pages can be demoted to the PMEM node
proactively, freeing memory space on the DRAM node to accommodate the
promoted hot PMEM pages.  This patchset implements that too.
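
To show how the two directions interlock, here is a minimal sketch
under the same invented names (illustrative only; the real series
drives demotion asynchronously from reclaim/kswapd, not inline in the
promotion path):

struct tier_node {
        unsigned long nr_free;          /* free pages on this node */
        unsigned long promo_headroom;   /* free pages kept for promotion */
};

/* Demote one cold DRAM page to PMEM when promotion headroom runs
 * out, then move the hot page from PMEM into the freed DRAM space. */
static void promote_hot_page(struct tier_node *dram, struct tier_node *pmem)
{
        if (dram->nr_free <= dram->promo_headroom) {
                dram->nr_free++;        /* cold DRAM page demoted... */
                pmem->nr_free--;        /* ...into a free PMEM page */
        }
        dram->nr_free--;                /* hot page lands in DRAM */
        pmem->nr_free++;                /* its PMEM page is freed */
}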

We have tested the solution with the pmbench memory accessing
benchmark, using an 80:20 read/write ratio and a normal access address
distribution, on a 2-socket Intel server with Optane DC Persistent
Memory modules.  The test results for the base kernel and the
step-by-step optimizations are as follows:

                Throughput      Promotion    DRAM bandwidth
                  access/s           MB/s              MB/s
               -----------     ----------    --------------
Base            74238178.0                            4291.7
Patch 2        146050652.3          359.4           11248.6
Patch 3        146300787.1          355.2           11237.2
Patch 4        162536383.0          211.7           11890.4
Patch 5        157187775.0          105.9           10412.3
Patch 6        164028415.2           73.3           10810.6

The whole patchset improves the benchmark score by up to 119.1%.  The
basic NUMA balancing based optimization (patch 1), the hot page
selection algorithm (patch 4), and the automatic threshold adjustment
algorithm (patch 6) contribute most of the performance improvement and
the reduction in overhead (promotion MB/s).
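
To make that summary concrete, here is a hedged sketch of the two
algorithms it refers to (illustrative C with invented names and an
invented control law): patch 4 treats a page as hot when the latency
between the page table scan that armed the hint fault and the fault
itself is below a threshold, and patch 6 adjusts that threshold to
keep the promotion throughput near the configured rate limit.

struct hinted_page {
        unsigned long scan_time_ms;     /* when the scanner armed the fault */
};

/* Patch 4's idea: a short scan-to-fault latency means the page is
 * accessed frequently, i.e. hot. */
static int page_is_hot(const struct hinted_page *page,
                       unsigned long now_ms, unsigned long threshold_ms)
{
        return now_ms - page->scan_time_ms < threshold_ms;
}

/* Patch 6's idea: promoting above the rate limit tightens the
 * threshold (fewer pages qualify as hot); promoting well below it
 * relaxes the threshold. */
static unsigned long adjust_threshold(unsigned long threshold_ms,
                                      unsigned long promoted_mbps,
                                      unsigned long ratelimit_mbps)
{
        if (promoted_mbps > ratelimit_mbps)
                return threshold_ms / 2;
        if (promoted_mbps < ratelimit_mbps / 2)
                return threshold_ms * 2;
        return threshold_ms;
}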

Changelog:

v7:

- Rebased on the mmots tree of 2021-07-15.

- Some minor fixes.

v6:

- Rebased on the latest page demotion patchset (which is based on v5.11).

v5:

- Rebased on the latest page demotion patchset (which is based on v5.10).

v4:

- Rebased on the latest page demotion patchset (which is based on v5.9-rc6).

- Add page promotion counter.

v3:

- Move the rate limit control as late as possible per Mel Gorman's
  comments.

- Revise the hot page selection implementation to store page scan time
  in struct page.

- Code cleanup.

- Rebased on the latest page demotion patchset.

v2:

- Addressed comments for V1.

- Rebased on v5.5.

Best Regards,
Huang, Ying

Thread overview:

[RFC -V7 0/6] NUMA balancing: optimize memory placement for memory tiering system (this message)
[PATCH -V7 1/6] NUMA balancing: optimize page placement for memory tiering system
[PATCH -V7 2/6] memory tiering: add page promotion counter
[PATCH -V7 3/6] memory tiering: skip to scan fast memory
[PATCH -V7 4/6] memory tiering: hot page selection with hint page fault latency
[PATCH -V7 5/6] memory tiering: rate limit NUMA migration throughput
[PATCH -V7 6/6] memory tiering: adjust hot threshold automatically
