From: Fengguang Wu <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	Liu Jingqi <jingqi.liu@intel.com>,
	Fengguang Wu <fengguang.wu@intel.com>,
	kvm@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Fan Du <fan.du@intel.com>,
	Yao Yuan <yuan.yao@intel.com>,
	Peng Dong <dongx.peng@intel.com>,
	Huang Ying <ying.huang@intel.com>,
	Dong Eddie <eddie.dong@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Zhang Yi <yi.z.zhang@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>
Subject: [RFC][PATCH v2 19/21] mm/migrate.c: add move_pages(MPOL_MF_SW_YOUNG) flag
Date: Wed, 26 Dec 2018 21:15:05 +0800
Message-ID: <20181226133352.189896494@intel.com>
In-Reply-To: 20181226131446.330864849@intel.com

[-- Attachment #1: 0010-migrate-check-if-the-page-is-software-young-when-mov.patch --]
[-- Type: text/plain, Size: 3635 bytes --]

From: Liu Jingqi <jingqi.liu@intel.com>

Introduce the MPOL_MF_SW_YOUNG flag to move_pages(). When it is set, the
pages that are already in DRAM will have PG_referenced set.

Background:
The user space migration daemon frequently scans the page table and
read-clears accessed bits to detect hot/cold pages, then migrates hot
pages from PMEM to the DRAM node. In doing so, it also tells the kernel
that these pages are the hot page set. This maintains a persistent view
of hot/cold pages between the kernel and the user space daemon.

The more concrete steps are

1) do multiple scans of the page table, counting accessed bits
2) highest accessed count => hot pages
3) call move_pages(hot pages, DRAM nodes, MPOL_MF_SW_YOUNG)

(1) regularly clears PTE young, which makes the kernel lose access to
    PTE young information

(2) for anonymous pages, the user space daemon defines which is hot and
    which is cold

(3) conveys the user space view of hot/cold pages to the kernel through
    PG_referenced

In the long run, most hot pages could already be in DRAM.
move_pages(MPOL_MF_SW_YOUNG) sets PG_referenced for those hot pages that
are already in DRAM, but not for newly migrated hot pages: the latter are
expected to be put at the end of the LRU, and thus have long enough time
in the LRU to gather accessed/PG_referenced bits and prove to the kernel
that they are really hot.

The daemon may select only DRAM/2 pages as hot, for 2 purposes:
- avoid thrashing, e.g. some warm pages get promoted and then demoted
  soon after
- make sure enough DRAM LRU pages look "cold" to the kernel, so that
  vmscan won't run into trouble busy scanning LRU lists

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 mm/migrate.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--- linux.orig/mm/migrate.c	2018-12-23 20:37:12.604621319 +0800
+++ linux/mm/migrate.c	2018-12-23 20:37:12.604621319 +0800
@@ -55,6 +55,8 @@

 #include "internal.h"

+#define MPOL_MF_SW_YOUNG (1<<7)
+
 /*
  * migrate_prep() needs to be called before we start compiling a list of pages
  * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
@@ -1484,12 +1486,13 @@ static int do_move_pages_to_node(struct
  * the target node
  */
 static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
-		int node, struct list_head *pagelist, bool migrate_all)
+		int node, struct list_head *pagelist, int flags)
 {
 	struct vm_area_struct *vma;
 	struct page *page;
 	unsigned int follflags;
 	int err;
+	bool migrate_all = flags & MPOL_MF_MOVE_ALL;

 	down_read(&mm->mmap_sem);
 	err = -EFAULT;
@@ -1519,6 +1522,8 @@ static int add_page_for_migration(struct

 	if (PageHuge(page)) {
 		if (PageHead(page)) {
+			if (flags & MPOL_MF_SW_YOUNG)
+				SetPageReferenced(page);
 			isolate_huge_page(page, pagelist);
 			err = 0;
 		}
@@ -1531,6 +1536,8 @@ static int add_page_for_migration(struct
 			goto out_putpage;

 		err = 0;
+		if (flags & MPOL_MF_SW_YOUNG)
+			SetPageReferenced(head);
 		list_add_tail(&head->lru, pagelist);
 		mod_node_page_state(page_pgdat(head),
 			NR_ISOLATED_ANON + page_is_file_cache(head),
@@ -1606,7 +1613,7 @@ static int do_pages_move(struct mm_struc
 		 * report them via status
 		 */
 		err = add_page_for_migration(mm, addr, current_node,
-				&pagelist, flags & MPOL_MF_MOVE_ALL);
+				&pagelist, flags);
 		if (!err)
 			continue;

@@ -1725,7 +1732,7 @@ static int kernel_move_pages(pid_t pid,
 	nodemask_t task_nodes;

 	/* Check flags */
-	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL))
+	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL|MPOL_MF_SW_YOUNG))
 		return -EINVAL;

 	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
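
For illustration only, the daemon side of steps 2) and 3) in the changelog
might look roughly like the sketch below. It is not part of the patch and
not the actual daemon code. The names struct page_sample,
select_hot_pages() and move_pages_flags_valid() are hypothetical, as is
the rule that pages with a zero access count are never treated as hot.
The MPOL_MF_SW_YOUNG value is the one proposed by this patch and is not
defined in mainline headers. The sketch sorts the sampled pages by access
count, caps the hot set at DRAM/2 pages, and mirrors the flag check added
to kernel_move_pages().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The first two values match <linux/mempolicy.h>. The third is the value
 * proposed by this patch and is an assumption on any unpatched kernel. */
#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE		(1 << 1)
#endif
#ifndef MPOL_MF_MOVE_ALL
#define MPOL_MF_MOVE_ALL	(1 << 2)
#endif
#ifndef MPOL_MF_SW_YOUNG
#define MPOL_MF_SW_YOUNG	(1 << 7)
#endif

/* Hypothetical per-page record kept by the user space daemon. */
struct page_sample {
	void *addr;		/* page-aligned user address */
	unsigned int accessed;	/* scans in which the accessed bit was set */
};

/* qsort() comparator: higher access count sorts first. */
static int cmp_hotter_first(const void *a, const void *b)
{
	const struct page_sample *x = a;
	const struct page_sample *y = b;

	if (x->accessed != y->accessed)
		return x->accessed < y->accessed ? 1 : -1;
	return 0;
}

/*
 * Sort samples hottest first and return how many leading entries form the
 * hot set. The hot set is capped at dram_pages / 2 and excludes pages that
 * were never seen accessed.
 */
static size_t select_hot_pages(struct page_sample *samples, size_t n,
			       size_t dram_pages)
{
	size_t budget = dram_pages / 2;
	size_t hot = 0;

	assert(samples != NULL || n == 0);

	if (n > 1)
		qsort(samples, n, sizeof(*samples), cmp_hotter_first);

	while (hot < n && hot < budget && samples[hot].accessed > 0)
		hot++;

	return hot;
}

/* Same mask test as the patched "Check flags" in kernel_move_pages(). */
static int move_pages_flags_valid(int flags)
{
	return !(flags & ~(MPOL_MF_MOVE | MPOL_MF_MOVE_ALL | MPOL_MF_SW_YOUNG));
}
```

The daemon would then pass the addresses of the first `hot` entries to
move_pages(2) with MPOL_MF_MOVE | MPOL_MF_SW_YOUNG. On a kernel without
this patch the flag check rejects the unknown bit and the call fails with
EINVAL.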