From: Dan Williams <dan.j.williams@intel.com>
To: linux-nvdimm@lists.01.org
Cc: Ira Weiny <ira.weiny@intel.com>,
	David Hildenbrand <david@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Vishal Verma <vishal.l.verma@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	kbuild test robot <lkp@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Michal Hocko <mhocko@suse.com>, Paul Mackerras <paulus@samba.org>,
	Christoph Hellwig <hch@lst.de>, Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	x86@kernel.org, Oliver O'Halloran <oohall@gmail.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Andy Lutomirski <luto@kernel.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-acpi@vger.kernel.org
Subject: [PATCH v2 00/18] Memory Hierarchy: Enable target node lookups for reserved memory
Date: Sun, 17 Nov 2019 09:44:34 -0800
Message-ID: <157401267421.43284.2135775608523385279.stgit@dwillia2-desk3.amr.corp.intel.com>

Changes since v1 [1]:
- Rework numa_map_to_online_node() to be compatible with papr_scm_node()
  (Aneesh)
- Export the 'target_node' attribute for nvdimm regions and namespaces
  (Aneesh)
- Rename memory_add_physaddr_to_target_nid() to phys_to_target_node()
  and make it independent of CONFIG_MEMORY_HOTPLUG=y. Put a weak
  definition in mm/mempolicy.c that can be overridden by an arch
  implementation.
- Fix various build reports (kbuild-robot)
- Collect some reviewed-by's from Aneesh.

[1]: https://lore.kernel.org/r/157309899529.1582359.15358067933360719580.stgit@dwillia2-desk3.amr.corp.intel.com/
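
For readers unfamiliar with the weak-definition pattern mentioned in the
phys_to_target_node() item above, here is a minimal userspace sketch. It
is not the code from the patch: the GCC/Clang attribute spelling stands
in for the kernel's __weak, and the NUMA_NO_NODE fallback value is an
assumption made purely for illustration.

```c
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/*
 * Generic fallback, marked weak so that an architecture file providing a
 * non-weak phys_to_target_node() replaces it at link time. Returning
 * NUMA_NO_NODE is an assumption for this sketch, not the value chosen by
 * the patch.
 */
__attribute__((weak)) int phys_to_target_node(uint64_t addr)
{
	(void)addr;	/* the fallback knows nothing about address ranges */
	return NUMA_NO_NODE;
}
```

With only this file linked in, every address resolves to the fallback
value. Linking an object that defines a regular (strong)
phys_to_target_node() makes the linker pick that one instead, with no
#ifdef needed at the call sites.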

---

As mentioned in the v1 cover letter [1], the libnvdimm device-type cleanup
is intertwined with the new target_node infrastructure. The more
interesting patches for arch and mm folks start at patch 14.

This new infrastructure will prove more valuable over time for Memory
Tiers / Hierarchy management as more platforms (via the ACPI HMAT and
EFI Specific Purpose Memory) publish reserved or "soft-reserved" ranges
to Linux. Linux system administrators will expect to be able to interact
with those ranges with a unique numa node number when/if that memory is
onlined via the dax_kmem driver [2].

One configuration that currently fails to properly convey the target
node for the resulting memory hotplug operation is persistent memory
defined by the memmap=nn!ss parameter. For example, today if node1 is a
memory-only node, and all the memory from node1 is assigned via
memmap=nn!ss and subsequently onlined, it will end up being onlined as
node0 memory. As it stands, memory_add_physaddr_to_nid() can only
identify online nodes, and since node1 in this example has no online
cpus / memory, the target node is initialized to node0.

The fix is to preserve rather than discard the numa_meminfo entries that
are relevant for reserved memory ranges, and to uplevel the node
distance helper for determining the "local" (closest) node relative to
an initiator node.
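
As a rough illustration of the first half of that fix, the userspace
sketch below models a range-to-node lookup that falls back to a table of
preserved reserved ranges. All names (model_memblk, model_meminfo,
model_reserved_meminfo, model_phys_to_target_node) and the address
ranges are hypothetical; the real implementation lives in the x86 numa
code in patch 17 and may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)
#define MODEL_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical model of one meminfo block: [start, end) belongs to nid. */
struct model_memblk {
	uint64_t start;
	uint64_t end;
	int nid;
};

/* Ranges backing online memory: only node0 in this example. */
static const struct model_memblk model_meminfo[] = {
	{ 0x000000000ULL, 0x100000000ULL, 0 },
};

/* Ranges reserved at boot (e.g. via memmap=nn!ss), kept instead of dropped. */
static const struct model_memblk model_reserved_meminfo[] = {
	{ 0x100000000ULL, 0x200000000ULL, 1 },
};

/* Linear search for the block containing addr. */
static int model_lookup(const struct model_memblk *blk, size_t nr,
			uint64_t addr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (addr >= blk[i].start && addr < blk[i].end)
			return blk[i].nid;
	return NUMA_NO_NODE;
}

/* Try the online ranges first, then the preserved reserved ranges. */
static int model_phys_to_target_node(uint64_t addr)
{
	int nid = model_lookup(model_meminfo,
			       MODEL_ARRAY_SIZE(model_meminfo), addr);

	if (nid != NUMA_NO_NODE)
		return nid;
	return model_lookup(model_reserved_meminfo,
			    MODEL_ARRAY_SIZE(model_reserved_meminfo), addr);
}
```

In this model an address inside the reserved range resolves to node1
even though node1 has no online memory, which is the information that is
lost when the reserved entries are discarded.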

The first 13 patches are cleanups to make sure that all nvdimm devices
and their children properly export a numa_node attribute, and add a
'target_node' attribute by default to regions and namespaces. The switch
to a device-type results in less code and is less error prone.

Patches 14 - 17 are the core changes to allow numa node information for
offline memory to be tracked, and to provide a unified node-distance
mapping helper across architectures, numa_map_to_online_node().
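
To make the "map to online node" idea concrete, here is a small
userspace model of such a helper. The online mask and distance table are
made-up stand-ins for the kernel's node_online() and node_distance(),
and model_map_to_online_node() is a hypothetical name; treat it as a
sketch of the technique rather than the code in the patches.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)
#define MODEL_MAX_NODES 4

/* Made-up topology: nodes 0 and 2 are online, nodes 1 and 3 are not. */
static const bool model_online[MODEL_MAX_NODES] = {
	true, false, true, false
};

/* Made-up symmetric distance table, 10 meaning "local". */
static const int model_distance[MODEL_MAX_NODES][MODEL_MAX_NODES] = {
	{ 10, 21, 20, 17 },
	{ 21, 10, 17, 28 },
	{ 20, 17, 10, 28 },
	{ 17, 28, 28, 10 },
};

/*
 * Return node unchanged if it is NUMA_NO_NODE or already online,
 * otherwise return the online node with the smallest distance to it.
 * Ties go to the lowest-numbered online node.
 */
static int model_map_to_online_node(int node)
{
	int min_dist = INT_MAX, min_node = node, n;

	if (node == NUMA_NO_NODE)
		return node;

	/* This model only knows about MODEL_MAX_NODES node ids. */
	assert(node >= 0 && node < MODEL_MAX_NODES);

	if (model_online[node])
		return node;

	for (n = 0; n < MODEL_MAX_NODES; n++) {
		if (!model_online[n])
			continue;
		if (model_distance[node][n] < min_dist) {
			min_dist = model_distance[node][n];
			min_node = n;
		}
	}
	return min_node;
}
```

With this table, offline node1 maps to node2 and offline node3 maps to
node0, while online nodes and NUMA_NO_NODE pass through untouched.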

Patch 18 uses this new capability to fix the conveyance of target_node
information for memmap=nn!ss assignments. See patch 18 for more details
and the test case.

Given the timeframe to the v5.5 merge window, I expect patches 14 - 18
will likely miss it for lack of review time, but I am posting them for
feedback nonetheless.

[2]: https://pmem.io/ndctl/daxctl-reconfigure-device.html

---

Dan Williams (18):
      libnvdimm: Move attribute groups to device type
      libnvdimm: Move region attribute group definition
      libnvdimm: Move nd_device_attribute_group to device_type
      libnvdimm: Move nd_numa_attribute_group to device_type
      libnvdimm: Move nd_region_attribute_group to device_type
      libnvdimm: Move nd_mapping_attribute_group to device_type
      libnvdimm: Move nvdimm_attribute_group to device_type
      libnvdimm: Move nvdimm_bus_attribute_group to device_type
      dax: Create a dax device_type
      dax: Simplify root read-only definition for the 'resource' attribute
      libnvdimm: Simplify root read-only definition for the 'resource' attribute
      dax: Add numa_node to the default device-dax attributes
      libnvdimm: Export the target_node attribute for regions and namespaces
      acpi/numa: Up-level "map to online node" functionality
      mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node()
      powerpc/papr_scm: Switch to numa_map_to_online_node()
      x86/numa: Provide a range-to-target_node lookup facility
      libnvdimm/e820: Retrieve and populate correct 'target_node' info


 arch/powerpc/platforms/pseries/papr_scm.c |   46 ------
 arch/x86/mm/numa.c                        |   76 +++++++++
 drivers/acpi/nfit/core.c                  |    7 -
 drivers/acpi/numa.c                       |   41 -----
 drivers/dax/bus.c                         |   22 ++-
 drivers/nvdimm/btt_devs.c                 |   24 +--
 drivers/nvdimm/bus.c                      |   44 +++++
 drivers/nvdimm/core.c                     |    8 +
 drivers/nvdimm/dax_devs.c                 |   27 +--
 drivers/nvdimm/dimm_devs.c                |   30 ++--
 drivers/nvdimm/e820.c                     |   31 ----
 drivers/nvdimm/namespace_devs.c           |   77 +++++-----
 drivers/nvdimm/nd.h                       |    5 -
 drivers/nvdimm/of_pmem.c                  |   13 --
 drivers/nvdimm/pfn_devs.c                 |   38 ++---
 drivers/nvdimm/region_devs.c              |  235 +++++++++++++++--------------
 include/linux/acpi.h                      |   23 +++
 include/linux/libnvdimm.h                 |    7 -
 include/linux/numa.h                      |   17 ++
 mm/mempolicy.c                            |   35 ++++
 20 files changed, 430 insertions(+), 376 deletions(-)
