From: Fengguang Wu <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: kvm@vger.kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Fan Du <fan.du@intel.com>
Cc: Yao Yuan <yuan.yao@intel.com>
Cc: Peng Dong <dongx.peng@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Liu Jingqi <jingqi.liu@intel.com>
Cc: Dong Eddie <eddie.dong@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Zhang Yi <yi.z.zhang@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Subject: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration
Date: Wed, 26 Dec 2018 21:14:46 +0800
Message-ID: <20181226131446.330864849@intel.com>
This is an attempt to use NVDIMM/PMEM as volatile NUMA memory that's
transparent to normal applications and virtual machines.
The code is still in active development. It's provided for early design review.
Key functionalities:
1) create and describe PMEM NUMA node for NVDIMM memory
2) dumb /proc/PID/idle_pages interface, for user space driven hot page accounting
3) passive kernel cold page migration in page reclaim path
4) improved move_pages() for active user space hot/cold page migration
(1) is the foundation for transparent use of NVDIMM by normal apps and virtual
machines. (2-4) enable automatic placement of hot pages in DRAM for better
performance.
A user space migration daemon is being built on top of this kernel patchset to
complete the full vertical solution.
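To make the intended split between kernel and user space a bit more concrete,
here is a minimal sketch of one step such a daemon might take: turn per-page
idle information into a target node for each page, then hand the result to
move_pages(2). This is not code from the patchset or from the daemon. The node
numbers DRAM_NODE and PMEM_NODE, the helper names plan_placement() and
migrate_pages_of(), and the one-byte-per-page idle array are all assumptions
made for illustration; the real /proc/PID/idle_pages format is different and
not finalized. The sketch uses the existing MPOL_MF_MOVE flag only, not the
MPOL_MF_SW_YOUNG flag added in patch 19.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)	/* value from the uapi linux/mempolicy.h */
#endif

/* Hypothetical node IDs; a real daemon must discover them at run time. */
enum { DRAM_NODE = 0, PMEM_NODE = 1 };

/*
 * Fill nodes[] with a target node for each page.
 * idle[i] != 0 means page i was not referenced during the last scan (cold),
 * so it is sent to PMEM; referenced (hot) pages are sent to DRAM.
 * Returns the number of hot pages. Illustrative policy only.
 */
size_t plan_placement(const unsigned char *idle, size_t count, int *nodes)
{
	size_t hot = 0;

	assert(idle != NULL && nodes != NULL);
	for (size_t i = 0; i < count; i++) {
		if (idle[i]) {
			nodes[i] = PMEM_NODE;
		} else {
			nodes[i] = DRAM_NODE;
			hot++;
		}
	}
	return hot;
}

/*
 * Thin wrapper around the move_pages(2) system call, issued via syscall(2)
 * so that libnuma is not required. status[i] receives the resulting node
 * of page i or a negative errno value. Returns -1 with errno set on error.
 */
long migrate_pages_of(pid_t pid, unsigned long count, void **pages,
		      const int *nodes, int *status)
{
	return syscall(SYS_move_pages, pid, count, pages, nodes, status,
		       MPOL_MF_MOVE);
}
```

A caller would fill pages[] with the page-aligned virtual addresses it scanned,
run plan_placement() on the matching idle flags, and pass both arrays to
migrate_pages_of(). Keeping the policy in a pure function separate from the
system call means it can be tuned and tested without a NUMA machine.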
The base kernel is v4.20. The patches are not suitable for upstreaming in the
near future: some are quick hacks, others need more work. However, they are
complete enough to demonstrate the kernel changes needed for the proposed
app- and VM-transparent volatile NVDIMM use model.
The interfaces are far from finalized. They illustrate what would be necessary
for a user space driven solution; the exact forms need more thought and input.
We may adopt an HMAT based solution for the NUMA node related interfaces when
it is ready. The /proc/PID/idle_pages interface is standalone but non-trivial.
Before it can be upstreamed, we expect it will take a long time to collect
real use cases and feedback, so as to refine and stabilize the format.
Create PMEM numa node
[PATCH 01/21] e820: cheat PMEM as DRAM
Mark numa node as DRAM/PMEM
[PATCH 02/21] acpi/numa: memorize NUMA node type from SRAT table
[PATCH 03/21] x86/numa_emulation: fix fake NUMA in uniform case
[PATCH 04/21] x86/numa_emulation: pass numa node type to fake nodes
[PATCH 05/21] mmzone: new pgdat flags for DRAM and PMEM
[PATCH 06/21] x86,numa: update numa node type
[PATCH 07/21] mm: export node type {pmem|dram} under /sys/bus/node
Point neighbor DRAM/PMEM to each other
[PATCH 08/21] mm: introduce and export pgdat peer_node
[PATCH 09/21] mm: avoid duplicate peer target node
Standalone zonelist for DRAM and PMEM nodes
[PATCH 10/21] mm: build separate zonelist for PMEM and DRAM node
Keep page table pages in DRAM
[PATCH 11/21] kvm: allocate page table pages from DRAM
[PATCH 12/21] x86/pgtable: allocate page table pages from DRAM
/proc/PID/idle_pages interface for virtual machine and normal tasks
[PATCH 13/21] x86/pgtable: dont check PMD accessed bit
[PATCH 14/21] kvm: register in mm_struct
[PATCH 15/21] ept-idle: EPT walk for virtual machine
[PATCH 16/21] mm-idle: mm_walk for normal task
[PATCH 17/21] proc: introduce /proc/PID/idle_pages
[PATCH 18/21] kvm-ept-idle: enable module
Mark hot pages
[PATCH 19/21] mm/migrate.c: add move_pages(MPOL_MF_SW_YOUNG) flag
Kernel DRAM=>PMEM migration
[PATCH 20/21] mm/vmscan.c: migrate anon DRAM pages to PMEM node
[PATCH 21/21] mm/vmscan.c: shrink anon list if can migrate to PMEM
arch/x86/include/asm/numa.h | 2
arch/x86/include/asm/pgalloc.h | 10
arch/x86/include/asm/pgtable.h | 3
arch/x86/kernel/e820.c | 3
arch/x86/kvm/Kconfig | 11
arch/x86/kvm/Makefile | 4
arch/x86/kvm/ept_idle.c | 841 +++++++++++++++++++++++++++++++
arch/x86/kvm/ept_idle.h | 116 ++++
arch/x86/kvm/mmu.c | 12
arch/x86/mm/numa.c | 3
arch/x86/mm/numa_emulation.c | 30 +
arch/x86/mm/pgtable.c | 22
drivers/acpi/numa.c | 5
drivers/base/node.c | 21
fs/proc/base.c | 2
fs/proc/internal.h | 1
fs/proc/task_mmu.c | 54 +
include/linux/mm_types.h | 11
include/linux/mmzone.h | 38 +
mm/mempolicy.c | 14
mm/migrate.c | 13
mm/page_alloc.c | 77 ++
mm/pagewalk.c | 1
mm/vmscan.c | 38 +
virt/kvm/kvm_main.c | 3
25 files changed, 1306 insertions(+), 29 deletions(-)
V1 patches: https://lkml.org/lkml/2018/9/2/13
Regards,
Fengguang