From: Fengguang Wu <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: kvm@vger.kernel.org
Cc: Peng DongX <dongx.peng@intel.com>
Cc: Liu Jingqi <jingqi.liu@intel.com>
Cc: Dong Eddie <eddie.dong@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Brendan Gregg <bgregg@netflix.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [RFC][PATCH 0/5] introduce /proc/PID/idle_bitmap
Date: Sat, 01 Sep 2018 19:28:18 +0800
Message-ID: <20180901112818.126790961@intel.com>

This new /proc/PID/idle_bitmap interface aims to complement the current
global /sys/kernel/mm/page_idle/bitmap, enabling efficient user space
driven migrations. The pros and cons are discussed in the changelog of
"[PATCH] proc: introduce /proc/PID/idle_bitmap".

The driving force is to improve efficiency by 10+ times, so that
hot/cold page tracking can be done at regular intervals in user space
without too much overhead. That makes it possible for a user space
daemon to do regular page migration between NUMA nodes of different
speeds.

Note this is not about NUMA migration between local and remote nodes --
we already have NUMA balancing for that. This interface and the user
space migration daemon target NUMA nodes made of different mediums --
i.e. DIMM and NVDIMM(*) -- with larger performance gaps. The basic
policy will be "move hot pages to DIMM; cold pages to NVDIMM". Since
NVDIMM sizes can easily reach several terabytes, working set tracking
efficiency will matter and be challenging.

(*) Here we use persistent memory (PMEM) without using its persistence.
Persistence is good to have, but it requires modifying applications.
Upcoming NVDIMM products like Intel Apache Pass (AEP) will be more cost
and energy effective than DRAM, but slower. Merely using it in the form
of a NUMA memory node could immediately benefit many workloads: for
example, warm but not hot apps, or workloads with a sharp hot/cold page
distribution (good for migration), that rely more on memory size than on
latency and bandwidth, and that do more reads than writes.

This is an early RFC version to collect feedback. It is complete enough
to demo the basic ideas and performance, but not usable yet.

Regards,
Fengguang
Thread overview: 20+ messages

2018-09-01 11:28 [RFC][PATCH 0/5] introduce /proc/PID/idle_bitmap Fengguang Wu [this message]
2018-09-01 11:28 ` [RFC][PATCH 1/5] kvm: register in task_struct Fengguang Wu
2018-09-01 11:28 ` [RFC][PATCH 2/5] proc: introduce /proc/PID/idle_bitmap Fengguang Wu
2018-09-04 19:02   ` Sean Christopherson
2018-09-06 14:12   ` Dave Hansen
2018-09-01 11:28 ` [RFC][PATCH 3/5] kvm-ept-idle: HVA indexed EPT read Fengguang Wu
2018-09-04  7:57   ` Nikita Leshenko
2018-09-04  8:12     ` Peng, DongX
2018-09-04  8:15     ` Fengguang Wu
2018-09-01 11:28 ` [RFC][PATCH 4/5] kvm-ept-idle: EPT page table walk for A bits Fengguang Wu
2018-09-06 14:35   ` Dave Hansen
2018-09-01 11:28 ` [RFC][PATCH 5/5] kvm-ept-idle: enable module Fengguang Wu
2018-09-04 19:14   ` Sean Christopherson
2018-09-02  8:24 ` [RFC][PATCH 0/5] introduce /proc/PID/idle_bitmap Fengguang Wu