From: Dan Williams <dan.j.williams@intel.com>
To: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Michal Hocko <mhocko@suse.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@surriel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Keith Busch <keith.busch@intel.com>,
	Fengguang Wu <fengguang.wu@intel.com>,
	"Du, Fan" <fan.du@intel.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 01/10] mm: control memory placement by nodemask for two tier main memory
Date: Sat, 23 Mar 2019 10:21:30 -0700
Message-ID: <CAPcyv4g5RoHhXhkKQaYkqYLN1y3KavbGeM1zVus-3fY5Q+JdxA@mail.gmail.com>
In-Reply-To: <1553316275-21985-2-git-send-email-yang.shi@linux.alibaba.com>

On Fri, Mar 22, 2019 at 9:45 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
> When running applications on the machine with NVDIMM as NUMA node, the
> memory allocation may end up on NVDIMM node.  This may result in silent
> performance degradation and regression due to the difference of hardware
> property.
>
> DRAM first should be obeyed to prevent from surprising regression.  Any
> non-DRAM nodes should be excluded from default allocation.  Use nodemask
> to control the memory placement.  Introduce def_alloc_nodemask which has
> DRAM nodes set only.  Any non-DRAM allocation should be specified by
> NUMA policy explicitly.
>
> In the future we may be able to extract the memory characteristics from
> HMAT or other source to build up the default allocation nodemask.
> However, just distinguish DRAM and PMEM (non-DRAM) nodes by SRAT flag
> for the time being.
>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
>  arch/x86/mm/numa.c     |  1 +
>  drivers/acpi/numa.c    |  8 ++++++++
>  include/linux/mmzone.h |  3 +++
>  mm/page_alloc.c        | 18 ++++++++++++++++--
>  4 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index dfb6c4d..d9e0ca4 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -626,6 +626,7 @@ static int __init numa_init(int (*init_func)(void))
>         nodes_clear(numa_nodes_parsed);
>         nodes_clear(node_possible_map);
>         nodes_clear(node_online_map);
> +       nodes_clear(def_alloc_nodemask);
>         memset(&numa_meminfo, 0, sizeof(numa_meminfo));
>         WARN_ON(memblock_set_node(0, ULLONG_MAX, &memblock.memory,
>                                   MAX_NUMNODES));
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index 867f6e3..79dfedf 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -296,6 +296,14 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
>                 goto out_err_bad_srat;
>         }
>
> +       /*
> +        * Non volatile memory is excluded from zonelist by default.
> +        * Only regular DRAM nodes are set in default allocation node
> +        * mask.
> +        */
> +       if (!(ma->flags & ACPI_SRAT_MEM_NON_VOLATILE))
> +               node_set(node, def_alloc_nodemask);

Hmm, no, I don't think we should do this. Current-generation NVDIMMs
are energy-backed DRAM, so the non-volatile flag should not be taken
to imply any performance difference.

Why isn't the default SLIT distance sufficient to ensure a DRAM-first
default policy?
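
To make the question concrete, here is a rough userspace sketch (not
kernel code) of how ordering fallback nodes purely by SLIT distance
would already place DRAM ahead of a more distant PMEM node. The
distance table, the node numbering, and the helper name
build_fallback_order() are all made up for illustration; the kernel's
real ordering is done by find_next_best_node() in mm/page_alloc.c,
which weighs more than raw distance.

```c
#define NR_NODES 3

/*
 * Hypothetical SLIT: nodes 0 and 1 are DRAM, node 2 is PMEM.
 * The values are illustrative only, not taken from real firmware.
 */
static const int slit[NR_NODES][NR_NODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 30 },
	{ 30, 30, 10 },
};

/*
 * Fill order[] with node ids sorted by distance from 'local',
 * nearest first.  Ties go to the lower node id.
 */
static void build_fallback_order(int local, int order[NR_NODES])
{
	int used[NR_NODES] = { 0 };

	for (int i = 0; i < NR_NODES; i++) {
		int best = -1;

		for (int n = 0; n < NR_NODES; n++) {
			if (used[n])
				continue;
			if (best < 0 || slit[local][n] < slit[local][best])
				best = n;
		}
		used[best] = 1;
		order[i] = best;
	}
}
```

With this table, an allocation from node 0 would try node 0, then the
remote DRAM node 1, and only then the PMEM node 2. That holds only
when firmware reports the PMEM node as farther away than every DRAM
node, which is an assumption of the sketch.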
