From: Dan Williams <dan.j.williams@intel.com>
To: linux-nvdimm@lists.01.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Michal Hocko <mhocko@suse.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 16/16] libnvdimm/e820: Retrieve and populate correct 'target_node' info
Date: Wed, 06 Nov 2019 19:58:03 -0800
Message-ID: <157309908326.1582359.13665017314935413372.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <157309899529.1582359.15358067933360719580.stgit@dwillia2-desk3.amr.corp.intel.com>

Use the new memory_add_physaddr_to_target_node() and
numa_map_to_online_node() helpers to retrieve the correct id for
the 'numa_node' (online initiator) and 'target_node' (offline target
memory node) sysfs attributes.
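
To make the distinction between the two ids concrete, here is a small
userspace sketch, not the kernel implementation. It assumes that mapping
to an online node means "pick the online node with the smallest distance,
lowest id on ties". All model_* names are hypothetical, and the online
mask and distance table are copied from the example system shown later in
this message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_NO_NODE  (-1)	/* stand-in for the kernel's NUMA_NO_NODE */
#define MODEL_NR_NODES 4

/* Online state before the pmem is hot-added: node2 has no online memory. */
static const bool model_online[MODEL_NR_NODES] = { true, true, false, true };

/* Distance table taken from the "After" numactl output in this message. */
static const int model_distance[MODEL_NR_NODES][MODEL_NR_NODES] = {
	{ 10, 21, 21, 21 },
	{ 21, 10, 21, 21 },
	{ 21, 21, 10, 21 },
	{ 21, 21, 21, 10 },
};

/*
 * Userspace model only: return @node if it is unset or already online,
 * otherwise the online node with the smallest distance to it (the lowest
 * node id wins a tie).
 */
static int model_map_to_online_node(int node)
{
	int min_dist = INT_MAX, min_node = node;

	if (node == MODEL_NO_NODE)
		return node;
	assert(node >= 0 && node < MODEL_NR_NODES);
	if (model_online[node])
		return node;

	for (int n = 0; n < MODEL_NR_NODES; n++) {
		if (!model_online[n])
			continue;
		if (model_distance[node][n] < min_dist) {
			min_dist = model_distance[node][n];
			min_node = n;
		}
	}
	return min_node;
}

/* Hypothetical miniature of the two ids a region descriptor carries. */
struct model_region {
	int numa_node;		/* online node usable as an initiator */
	int target_node;	/* node the memory itself belongs to */
};

static struct model_region model_register(int target_nid)
{
	struct model_region r = {
		.numa_node = model_map_to_online_node(target_nid),
		.target_node = target_nid,
	};
	return r;
}
```

In this model, model_register(2) yields numa_node 0 and target_node 2:
the region is reachable from an online node today, while its memory
still lands on node2 once it is onlined.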

Below is an example from a 4-node NUMA system where all the memory on
node2 is pmem / reserved. With the arrival of the ACPI HMAT table and
EFI Specific Purpose Memory, the kernel will start to see more platforms
with reserved / performance-differentiated memory in its own NUMA node,
hence all the stakeholders on the Cc for what is ostensibly a
libnvdimm-local patch.

=== Before ===

/* Notice no online memory on node2 at start */

# numactl --hardware
available: 3 nodes (0-1,3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
node 0 size: 3958 MB
node 0 free: 3708 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
node 1 size: 4027 MB
node 1 free: 3871 MB
node 3 cpus:
node 3 size: 3994 MB
node 3 free: 3971 MB
node distances:
node   0   1   3
  0:  10  21  21
  1:  21  10  21
  3:  21  21  10

/*
 * Put the pmem namespace into devdax mode so it can be assigned to the
 * kmem driver
 */

# ndctl create-namespace -e namespace0.0 -m devdax -f
{
  "dev":"namespace0.0",
  "mode":"devdax",
  "map":"dev",
  "size":"3.94 GiB (4.23 GB)",
  "uuid":"1650af9b-9ba3-4704-acd6-10178399d9a3",
  [..]
}

/* Online Persistent Memory as System RAM */

# daxctl reconfigure-device --mode=system-ram dax0.0
libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
[
  {
    "chardev":"dax0.0",
    "size":4225761280,
    "target_node":0,
    "mode":"system-ram"
  }
]
reconfigured 1 device

/* Note that the memory is onlined by default to the wrong node, node0 */

# numactl --hardware
available: 3 nodes (0-1,3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
node 0 size: 7926 MB
node 0 free: 7655 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
node 1 size: 4027 MB
node 1 free: 3871 MB
node 3 cpus:
node 3 size: 3994 MB
node 3 free: 3971 MB
node distances:
node   0   1   3
  0:  10  21  21
  1:  21  10  21
  3:  21  21  10


=== After ===

/* Notice that the "phys_index" error messages are gone */

# daxctl reconfigure-device --mode=system-ram dax0.0
[
  {
    "chardev":"dax0.0",
    "size":4225761280,
    "target_node":2,
    "mode":"system-ram"
  }
]
reconfigured 1 device

/* Notice that node2 is now correctly populated */

# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
node 0 size: 3958 MB
node 0 free: 3793 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
node 1 size: 4027 MB
node 1 free: 3851 MB
node 2 cpus:
node 2 size: 3968 MB
node 2 free: 3968 MB
node 3 cpus:
node 3 size: 3994 MB
node 3 free: 3908 MB
node distances:
node   0   1   2   3
  0:  10  21  21  21
  1:  21  10  21  21
  2:  21  21  10  21
  3:  21  21  21  10
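
As a cross-check of the two transcripts, the sketch below redoes the size
arithmetic with the values quoted above: the amount by which node0 grew
in the "Before" run equals the size node2 reports in the "After" run. The
128 MiB memory block size is an assumption (it is not stated anywhere in
this message); it is included only because rounding the dax device size
down to whole blocks of that size happens to give the same 3968 MB.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the numactl and daxctl output in this message. */
#define NODE0_START_MB	3958			/* node 0 size before onlining */
#define NODE0_WRONG_MB	7926			/* node 0 size, "Before" run */
#define NODE2_AFTER_MB	3968			/* node 2 size, "After" run */
#define DAX_SIZE_BYTES	UINT64_C(4225761280)	/* dax0.0 "size" */

/* Assumption: hot-added memory is onlined in 128 MiB blocks. */
#define ASSUMED_BLOCK_MIB 128

/* How much node0 grew when the memory went to the wrong node. */
static int node0_growth_mb(void)
{
	return NODE0_WRONG_MB - NODE0_START_MB;
}

/* The dax device size expressed in MiB. */
static uint64_t dax_size_mib(void)
{
	return DAX_SIZE_BYTES >> 20;
}

/* The dax device size rounded down to whole assumed-size blocks. */
static uint64_t dax_whole_blocks_mib(void)
{
	return dax_size_mib() / ASSUMED_BLOCK_MIB * ASSUMED_BLOCK_MIB;
}

/* Aborts if the quoted numbers stop agreeing with each other. */
static void check_example_sizes(void)
{
	assert(node0_growth_mb() == NODE2_AFTER_MB);
	assert(dax_whole_blocks_mib() == NODE2_AFTER_MB);
}
```

The first assertion depends only on numbers in the transcripts; the
second holds only under the assumed block size and should be read as a
plausible explanation for the gap between 4030 MiB and 3968 MB, not a
confirmed one.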

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/e820.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c
index b802291bcde1..23121dd6e494 100644
--- a/drivers/nvdimm/e820.c
+++ b/drivers/nvdimm/e820.c
@@ -20,11 +20,12 @@ static int e820_register_one(struct resource *res, void *data)
 {
 	struct nd_region_desc ndr_desc;
 	struct nvdimm_bus *nvdimm_bus = data;
+	int nid = memory_add_physaddr_to_target_node(res->start);
 
 	memset(&ndr_desc, 0, sizeof(ndr_desc));
 	ndr_desc.res = res;
-	ndr_desc.numa_node = memory_add_physaddr_to_nid(res->start);
-	ndr_desc.target_node = ndr_desc.numa_node;
+	ndr_desc.numa_node = numa_map_to_online_node(nid);
+	ndr_desc.target_node = nid;
 	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
 	if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
 		return -ENXIO;