From: Brice Goglin <Brice.Goglin@inria.fr>
To: Yang Shi <yang.shi@linux.alibaba.com>,
mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com,
hannes@cmpxchg.org, akpm@linux-foundation.org,
dave.hansen@intel.com, keith.busch@intel.com,
dan.j.williams@intel.com, fengguang.wu@intel.com,
fan.du@intel.com, ying.huang@intel.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node
Date: Mon, 25 Mar 2019 17:15:11 +0100
Message-ID: <cc6f44e2-48b5-067f-9685-99d8ae470b50@inria.fr>
In-Reply-To: <1553316275-21985-1-git-send-email-yang.shi@linux.alibaba.com>
On 23/03/2019 at 05:44, Yang Shi wrote:
> With Dave Hansen's patches merged into Linus's tree
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308fd01d9fb33a16f64d2fd95f8830a4
>
> PMEM could be hot plugged as NUMA node now. But, how to use PMEM as NUMA node
> effectively and efficiently is still a question.
>
> There have been a couple of proposals posted on the mailing list [1] [2].
>
> The patchset is aimed to try a different approach from this proposal [1]
> to use PMEM as NUMA nodes.
>
> The approach is designed to follow the below principles:
>
> 1. Use PMEM as normal NUMA node, no special gfp flag, zone, zonelist, etc.
>
> 2. DRAM first/by default. No surprise to existing applications and default
> running. PMEM will not be allocated unless its node is specified explicitly
> by NUMA policy. Some applications may be not very sensitive to memory latency,
> so they could be placed on PMEM nodes then have hot pages promote to DRAM
> gradually.
I am not against the approach for some workloads. However, many HPC
people would rather do this manually. But there's currently no easy way
to find out from userspace whether a given NUMA node is DDR or PMEM*. We
have to assume HMAT is available (and correct) and look at performance
attributes. When talking to humans, it would be better to say "I
allocated on the local DDR NUMA node" rather than "I allocated on the
fastest node according to HMAT latency".
Also, once we have HBM+DDR, some applications may want to use DDR by
default, which means they want the *slowest* node according to HMAT (by
the way, will your hybrid policy work if we ever have HBM+DDR+PMEM?).
Performance attributes could help, but how does user-space know for sure
that X>Y will still mean HBM>DDR and not DDR>PMEM in 5 years?
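To make the ambiguity concrete, here is a small sketch. The scores are made-up relative numbers (higher means faster), not real HMAT values, and the helper name is hypothetical. It only shows that "slowest" and "fastest" are relative to whatever tiers happen to be present, so a rank alone does not say which kind of memory a node is.

```python
def slowest_and_fastest(score_by_node):
    """Return (slowest, fastest) node labels from a {label: score} mapping.

    Scores are illustrative relative values, higher meaning faster.
    """
    ranked = sorted(score_by_node, key=score_by_node.get)
    return ranked[0], ranked[-1]


# Hypothetical machines; the numbers only encode an ordering.
hbm_ddr = {"HBM": 3, "DDR": 2}
ddr_pmem = {"DDR": 2, "PMEM": 1}

# DDR is the "slowest" node on one machine and the "fastest" on the other.
print(slowest_and_fastest(hbm_ddr))
print(slowest_and_fastest(ddr_pmem))
```

An application that wants "DDR by default" would have to pick the slowest node on the first machine and the fastest on the second, which is why an explicit type would be easier to use.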
It seems to me that exporting a flag in sysfs saying whether a node is
PMEM could be convenient. Patch series [1] exported a "type" in sysfs
node directories ("pmem" or "dram"). I don't know if there's an easy
way to define what HBM is and expose that type too.
Brice
* As far as I know, the only way is to look at all DAX devices until you
find the given NUMA node in the "target_node" attribute. If none matches,
the node is likely not PMEM-backed.
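A minimal sketch of that lookup from userspace. It assumes DAX devices appear under /sys/bus/dax/devices and that each one exposes a "target_node" file holding a node number. The directory path and both function names are assumptions for illustration, so adjust them to what your kernel actually exposes.

```python
import glob
import os


def dax_devices_for_node(node, dax_root="/sys/bus/dax/devices"):
    """Return the names of DAX devices whose target_node equals `node`.

    `dax_root` is an assumed sysfs location; it is a parameter so the
    function can be pointed elsewhere (or at a test directory).
    """
    matches = []
    for dev in sorted(glob.glob(os.path.join(dax_root, "*"))):
        try:
            with open(os.path.join(dev, "target_node")) as f:
                target = int(f.read().strip())
        except (OSError, ValueError):
            # Missing or unreadable target_node: skip this device.
            continue
        if target == node:
            matches.append(os.path.basename(dev))
    return matches


def node_is_likely_pmem(node, dax_root="/sys/bus/dax/devices"):
    """Heuristic only: True if at least one DAX device targets `node`."""
    return bool(dax_devices_for_node(node, dax_root))
```

An empty result only means "likely not PMEM-backed", as noted above. It is a heuristic, not a type exported by the kernel.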
> [1]: https://lore.kernel.org/linux-mm/20181226131446.330864849@intel.com/