From: Fengguang Wu <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	kvm@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Fan Du <fan.du@intel.com>, Yao Yuan <yuan.yao@intel.com>,
	Peng Dong <dongx.peng@intel.com>,
	Huang Ying <ying.huang@intel.com>,
	Liu Jingqi <jingqi.liu@intel.com>,
	Dong Eddie <eddie.dong@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Zhang Yi <yi.z.zhang@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Fengguang Wu <fengguang.wu@intel.com>
Subject: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration
Date: Wed, 26 Dec 2018 21:14:46 +0800
Message-ID: <20181226131446.330864849@intel.com>

This is an attempt to use NVDIMM/PMEM as volatile NUMA memory that's
transparent to normal applications and virtual machines.

The code is still in active development. It's provided for early design
review.

Key functionalities:

1) create and describe PMEM NUMA node for NVDIMM memory
2) dumb /proc/PID/idle_pages interface, for user space driven hot page
   accounting
3) passive kernel cold page migration in page reclaim path
4) improved move_pages() for active user space hot/cold page migration

(1) is the foundation for transparent usage of NVDIMM by normal apps and
virtual machines. (2-4) enable automatic placement of hot pages in DRAM for
better performance. A user space migration daemon is being built on top of
this kernel patchset to make the full vertical solution.

Base kernel is v4.20. The patches are not suitable for upstreaming in the
near future -- some are quick hacks, others need more work. However, they
are complete enough to demo the kernel changes necessary for the proposed
app&VM transparent NVDIMM volatile use model.

The interfaces are far from finalized. They roughly illustrate what would be
necessary for creating a user space driven solution. The exact forms will
need more thought and input. We may adopt an HMAT based solution for the
NUMA node related interfaces when it is ready. The /proc/PID/idle_pages
interface is standalone but non-trivial. Before it can be upstreamed, we
expect it to take a long time to collect various real use cases and
feedback, so as to refine and stabilize the format.

Create PMEM numa node

	[PATCH 01/21] e820: cheat PMEM as DRAM

Mark numa node as DRAM/PMEM

	[PATCH 02/21] acpi/numa: memorize NUMA node type from SRAT table
	[PATCH 03/21] x86/numa_emulation: fix fake NUMA in uniform case
	[PATCH 04/21] x86/numa_emulation: pass numa node type to fake nodes
	[PATCH 05/21] mmzone: new pgdat flags for DRAM and PMEM
	[PATCH 06/21] x86,numa: update numa node type
	[PATCH 07/21] mm: export node type {pmem|dram} under /sys/bus/node

Point neighbor DRAM/PMEM to each other

	[PATCH 08/21] mm: introduce and export pgdat peer_node
	[PATCH 09/21] mm: avoid duplicate peer target node

Standalone zonelist for DRAM and PMEM nodes

	[PATCH 10/21] mm: build separate zonelist for PMEM and DRAM node

Keep page table pages in DRAM

	[PATCH 11/21] kvm: allocate page table pages from DRAM
	[PATCH 12/21] x86/pgtable: allocate page table pages from DRAM

/proc/PID/idle_pages interface for virtual machine and normal tasks

	[PATCH 13/21] x86/pgtable: dont check PMD accessed bit
	[PATCH 14/21] kvm: register in mm_struct
	[PATCH 15/21] ept-idle: EPT walk for virtual machine
	[PATCH 16/21] mm-idle: mm_walk for normal task
	[PATCH 17/21] proc: introduce /proc/PID/idle_pages
	[PATCH 18/21] kvm-ept-idle: enable module

Mark hot pages

	[PATCH 19/21] mm/migrate.c: add move_pages(MPOL_MF_SW_YOUNG) flag

Kernel DRAM=>PMEM migration

	[PATCH 20/21] mm/vmscan.c: migrate anon DRAM pages to PMEM node
	[PATCH 21/21] mm/vmscan.c: shrink anon list if can migrate to PMEM

 arch/x86/include/asm/numa.h    |    2
 arch/x86/include/asm/pgalloc.h |   10
 arch/x86/include/asm/pgtable.h |    3
 arch/x86/kernel/e820.c         |    3
 arch/x86/kvm/Kconfig           |   11
 arch/x86/kvm/Makefile          |    4
 arch/x86/kvm/ept_idle.c        |  841 +++++++++++++++++++++++++++++++
 arch/x86/kvm/ept_idle.h        |  116 ++++
 arch/x86/kvm/mmu.c             |   12
 arch/x86/mm/numa.c             |    3
 arch/x86/mm/numa_emulation.c   |   30 +
 arch/x86/mm/pgtable.c          |   22
 drivers/acpi/numa.c            |    5
 drivers/base/node.c            |   21
 fs/proc/base.c                 |    2
 fs/proc/internal.h             |    1
 fs/proc/task_mmu.c             |   54 +
 include/linux/mm_types.h       |   11
 include/linux/mmzone.h         |   38 +
 mm/mempolicy.c                 |   14
 mm/migrate.c                   |   13
 mm/page_alloc.c                |   77 ++
 mm/pagewalk.c                  |    1
 mm/vmscan.c                    |   38 +
 virt/kvm/kvm_main.c            |    3
 25 files changed, 1306 insertions(+), 29 deletions(-)

V1 patches: https://lkml.org/lkml/2018/9/2/13

Regards,
Fengguang
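
Illustrative sketch (not part of the patchset): the cover letter says a user
space daemon would use hot page accounting (2) to drive hot/cold migration
through move_pages() (4). The Python below shows only the decision step such
a daemon might take. Everything named here is an assumption made for the
example: the PageSample record, the idle_scans counter, the cold_after
threshold, and the dict forms of node type and peer node. The cover letter
does not define the /proc/PID/idle_pages format or the daemon's policy, and
the sketch does not read that file or call move_pages().

```python
from typing import Dict, Iterable, NamedTuple


class PageSample(NamedTuple):
    """One observed page (hypothetical record, not the idle_pages format)."""
    addr: int        # virtual address of the page
    node: int        # NUMA node the page currently resides on
    idle_scans: int  # consecutive scans in which the page was found idle


def plan_migrations(samples: Iterable[PageSample],
                    node_type: Dict[int, str],
                    peer_node: Dict[int, int],
                    cold_after: int = 3) -> Dict[int, int]:
    """Return {page address: target node} for pages that should move.

    Assumed policy: a page on a PMEM node that was accessed in the latest
    scan (idle_scans == 0) is promoted to the peer DRAM node; a page on a
    DRAM node that stayed idle for at least `cold_after` scans is demoted
    to the peer PMEM node. All other pages are left where they are.
    """
    plan: Dict[int, int] = {}
    for s in samples:
        kind = node_type[s.node]
        if kind == "pmem" and s.idle_scans == 0:
            plan[s.addr] = peer_node[s.node]   # hot page: promote to DRAM
        elif kind == "dram" and s.idle_scans >= cold_after:
            plan[s.addr] = peer_node[s.node]   # cold page: demote to PMEM
    return plan


if __name__ == "__main__":
    # Node 0 is DRAM, node 1 is its PMEM peer (assumed topology).
    node_type = {0: "dram", 1: "pmem"}
    peer_node = {0: 1, 1: 0}
    samples = [
        PageSample(0x1000, 1, 0),  # hot page on PMEM
        PageSample(0x2000, 0, 5),  # cold page on DRAM
        PageSample(0x3000, 0, 1),  # recently used page on DRAM
        PageSample(0x4000, 1, 4),  # cold page already on PMEM
    ]
    for addr, target in plan_migrations(samples, node_type, peer_node).items():
        print(f"move page {addr:#x} to node {target}")
```

In a real daemon the resulting address-to-node map would be handed to
move_pages(), with the node type and peer information taken from whatever
form the node interfaces of patches 07 and 08 finally export.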