linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com,
	hannes@cmpxchg.org, akpm@linux-foundation.org,
	dave.hansen@intel.com, keith.busch@intel.com,
	dan.j.williams@intel.com, fengguang.wu@intel.com,
	fan.du@intel.com, ying.huang@intel.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 04/10] mm: numa: promote pages to DRAM when it is accessed twice
Date: Sat, 23 Mar 2019 12:44:29 +0800	[thread overview]
Message-ID: <1553316275-21985-5-git-send-email-yang.shi@linux.alibaba.com> (raw)
In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>

NUMA balancing would promote the pages to DRAM once it is accessed, but
it might be just one off access.  To reduce migration thrashing and
memory bandwidth pressure, introduce PG_promote flag to mark promote
candidate.  The page will be promoted to DRAM when it is accessed twice.
This might be a good way to filter out those one-off access pages.

PG_promote flag will be inherited by tail pages when THP gets split.
But, it will not be copied to the new page once the migration is done.

This approach is not definitely the optimal one to distinguish the
hot or cold pages.  It may need much more sophisticated algorithm to
distinguish hot or cold pages accurately.  Kernel may be not the good
place to implement such algorithm considering the complexity and potential
overhead.  But, kernel may still need such capability.

With NUMA balancing the whole workingset of the process may end up being
promoted to DRAM finally.  It depends on the page reclaim to demote
inactive pages to PMEM implemented by the following patch.

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 include/linux/page-flags.h     |  4 ++++
 include/trace/events/mmflags.h |  3 ++-
 mm/huge_memory.c               | 10 ++++++++++
 mm/memory.c                    |  8 ++++++++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 9f8712a..2d53166 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -131,6 +131,7 @@ enum pageflags {
 	PG_young,
 	PG_idle,
 #endif
+	PG_promote,		/* Promote candidate for NUMA balancing */
 	__NR_PAGEFLAGS,
 
 	/* Filesystems */
@@ -348,6 +349,9 @@ static inline void page_init_poison(struct page *page, size_t size)
 PAGEFLAG(OwnerPriv1, owner_priv_1, PF_ANY)
 	TESTCLEARFLAG(OwnerPriv1, owner_priv_1, PF_ANY)
 
+PAGEFLAG(Promote, promote, PF_ANY) __SETPAGEFLAG(Promote, promote, PF_ANY)
+	__CLEARPAGEFLAG(Promote, promote, PF_ANY)
+
 /*
  * Only test-and-set exist for PG_writeback.  The unconditional operators are
  * risky: they bypass page accounting.
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a1675d4..f13c2a1 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -100,7 +100,8 @@
 	{1UL << PG_mappedtodisk,	"mappedtodisk"	},		\
 	{1UL << PG_reclaim,		"reclaim"	},		\
 	{1UL << PG_swapbacked,		"swapbacked"	},		\
-	{1UL << PG_unevictable,		"unevictable"	}		\
+	{1UL << PG_unevictable,		"unevictable"	},		\
+	{1UL << PG_promote,		"promote"	}		\
 IF_HAVE_PG_MLOCK(PG_mlocked,		"mlocked"	)		\
 IF_HAVE_PG_UNCACHED(PG_uncached,	"uncached"	)		\
 IF_HAVE_PG_HWPOISON(PG_hwpoison,	"hwpoison"	)		\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 404acdc..8268a3c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1589,6 +1589,15 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 					      haddr + HPAGE_PMD_SIZE);
 	}
 
+	/* Promote page to DRAM when referenced twice */
+	if (!(node_isset(page_nid, def_alloc_nodemask)) &&
+	    !PagePromote(page)) {
+		SetPagePromote(page);
+		put_page(page);
+		page_nid = -1;
+		goto clear_pmdnuma;
+	}
+
 	/*
 	 * Migrate the THP to the requested node, returns with page unlocked
 	 * and access rights restored.
@@ -2396,6 +2405,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_workingset) |
 			 (1L << PG_locked) |
 			 (1L << PG_unevictable) |
+			 (1L << PG_promote) |
 			 (1L << PG_dirty)));
 
 	/* ->mapping in first tail page is compound_mapcount */
diff --git a/mm/memory.c b/mm/memory.c
index 47fe250..2494c11 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3680,6 +3680,14 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 		goto out;
 	}
 
+	/* Promote the non-DRAM page when it is referenced twice */
+	if (!(node_isset(page_nid, def_alloc_nodemask)) &&
+	    !PagePromote(page)) {
+		SetPagePromote(page);
+		put_page(page);
+		goto out;
+	}
+
 	/* Migrate to the requested node */
 	migrated = migrate_misplaced_page(page, vma, target_nid);
 	if (migrated) {
-- 
1.8.3.1


  parent reply	other threads:[~2019-03-23  4:45 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-23  4:44 [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node Yang Shi
2019-03-23  4:44 ` [PATCH 01/10] mm: control memory placement by nodemask for two tier main memory Yang Shi
2019-03-23 17:21   ` Dan Williams
2019-03-25 19:28     ` Yang Shi
2019-03-25 23:18       ` Dan Williams
2019-03-25 23:36         ` Yang Shi
2019-03-25 23:42           ` Dan Williams
2019-03-23  4:44 ` [PATCH 02/10] mm: mempolicy: introduce MPOL_HYBRID policy Yang Shi
2019-03-23  4:44 ` [PATCH 03/10] mm: mempolicy: promote page to DRAM for MPOL_HYBRID Yang Shi
2019-03-23  4:44 ` Yang Shi [this message]
2019-03-29  0:31   ` [PATCH 04/10] mm: numa: promote pages to DRAM when it is accessed twice kbuild test robot
2019-03-23  4:44 ` [PATCH 05/10] mm: page_alloc: make find_next_best_node could skip DRAM node Yang Shi
2019-03-23  4:44 ` [PATCH 06/10] mm: vmscan: demote anon DRAM pages to PMEM node Yang Shi
2019-03-23  6:03   ` Zi Yan
2019-03-25 21:49     ` Yang Shi
2019-03-24 22:20   ` Keith Busch
2019-03-25 19:49     ` Yang Shi
2019-03-27  0:35       ` Keith Busch
2019-03-27  3:41         ` Yang Shi
2019-03-27 13:08           ` Keith Busch
2019-03-27 17:00             ` Zi Yan
2019-03-27 17:05               ` Dave Hansen
2019-03-27 17:48                 ` Zi Yan
2019-03-27 18:00                   ` Dave Hansen
2019-03-27 20:37                     ` Zi Yan
2019-03-27 20:42                       ` Dave Hansen
2019-03-28 21:59             ` Yang Shi
2019-03-28 22:45               ` Keith Busch
2019-03-23  4:44 ` [PATCH 07/10] mm: vmscan: add page demotion counter Yang Shi
2019-03-23  4:44 ` [PATCH 08/10] mm: numa: add page promotion counter Yang Shi
2019-03-23  4:44 ` [PATCH 09/10] doc: add description for MPOL_HYBRID mode Yang Shi
2019-03-23  4:44 ` [PATCH 10/10] doc: elaborate the PMEM allocation rule Yang Shi
2019-03-25 16:15 ` [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node Brice Goglin
2019-03-25 16:56   ` Dan Williams
2019-03-25 17:45     ` Brice Goglin
2019-03-25 19:29       ` Dan Williams
2019-03-25 23:09         ` Brice Goglin
2019-03-25 23:37           ` Dan Williams
2019-03-26 12:19             ` Jonathan Cameron
2019-03-25 20:04   ` Yang Shi
2019-03-26 13:58 ` Michal Hocko
2019-03-26 18:33   ` Yang Shi
2019-03-26 18:37     ` Michal Hocko
2019-03-27  2:58       ` Yang Shi
2019-03-27  9:01         ` Michal Hocko
2019-03-27 17:34           ` Dan Williams
2019-03-27 18:59             ` Yang Shi
2019-03-27 20:09               ` Michal Hocko
2019-03-28  2:09                 ` Yang Shi
2019-03-28  6:58                   ` Michal Hocko
2019-03-28 18:58                     ` Yang Shi
2019-03-28 19:12                       ` Michal Hocko
2019-03-28 19:40                         ` Yang Shi
2019-03-28 20:40                           ` Michal Hocko
2019-03-28  8:21                   ` Dan Williams
2019-03-27 20:14               ` Dave Hansen
2019-03-27 20:35             ` Matthew Wilcox
2019-03-27 20:40               ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1553316275-21985-5-git-send-email-yang.shi@linux.alibaba.com \
    --to=yang.shi@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=fan.du@intel.com \
    --cc=fengguang.wu@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=riel@surriel.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).