* [PATCH v2 0/3] Add NUMA support for NVDIMM devices
@ 2015-06-09 23:10 ` Toshi Kani
  0 siblings, 0 replies; 27+ messages in thread
From: Toshi Kani @ 2015-06-09 23:10 UTC (permalink / raw)
  To: rjw, dan.j.williams; +Cc: linux-acpi, linux-nvdimm, linux-kernel

Since NVDIMMs are installed in memory slots, they expose the NUMA
topology of a platform.  This patchset adds a sysfs 'numa_node'
attribute to I/O-related NVDIMM devices under /sys/bus/nd/devices.
This enables numactl(8) to accept 'block:' and 'file:' paths for
pmem and btt devices, as shown in the examples below.
  numactl --preferred block:pmem0 --show
  numactl --preferred file:/dev/pmem0s --show

numactl can be used to bind an application to the locality of
a target NVDIMM for better performance.  Here are the results of
an fio benchmark run against ext4/DAX on a 2-socket HP DL380, in
local and remote settings.

  Local [1] :  4098.3MB/s
  Remote [2]:  3718.4MB/s

[1] numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio <fs-on-pmem0> 
[2] numactl --preferred block:pmem1 --cpunodebind block:pmem1 fio <fs-on-pmem0>

Patch 1/3 applies on top of the acpica branch of the pm tree.
Patches 2/3 and 3/3 apply on top of Dan Williams's v5 patch series,
"libnvdimm: non-volatile memory devices".

---
v2:
 - Add acpi_map_pxm_to_online_node(), which returns an online node.
 - Manage visibility of sysfs numa_node with is_visible. (Dan Williams)
 - Check ACPI_NFIT_PROXIMITY_VALID in spa->flags.

---
Toshi Kani (3):
  1/3 acpi: Add acpi_map_pxm_to_online_node()
  2/3 libnvdimm: Set numa_node to NVDIMM devices
  3/3 libnvdimm: Add sysfs numa_node to NVDIMM devices

---
 drivers/acpi/nfit.c             |  7 +++++++
 drivers/acpi/numa.c             | 40 +++++++++++++++++++++++++++++++++++++---
 drivers/nvdimm/btt.c            |  2 ++
 drivers/nvdimm/btt_devs.c       |  1 +
 drivers/nvdimm/bus.c            | 30 ++++++++++++++++++++++++++++++
 drivers/nvdimm/namespace_devs.c |  1 +
 drivers/nvdimm/nd.h             |  1 +
 drivers/nvdimm/region.c         |  1 +
 drivers/nvdimm/region_devs.c    |  1 +
 include/linux/acpi.h            |  5 +++++
 include/linux/libnvdimm.h       |  2 ++
 11 files changed, 88 insertions(+), 3 deletions(-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 1/3] acpi: Add acpi_map_pxm_to_online_node()
  2015-06-09 23:10 ` Toshi Kani
@ 2015-06-09 23:10   ` Toshi Kani
  -1 siblings, 0 replies; 27+ messages in thread
From: Toshi Kani @ 2015-06-09 23:10 UTC (permalink / raw)
  To: rjw, dan.j.williams; +Cc: linux-acpi, linux-nvdimm, linux-kernel, Toshi Kani

The kernel initializes the CPU and memory NUMA topology from the
ACPI SRAT table.  Some other ACPI tables, such as NFIT and DMAR,
also contain proximity IDs describing their devices' NUMA topology.
This information can be used to improve the performance of these
devices.

This patch introduces acpi_map_pxm_to_online_node(), which maps
a given pxm (proximity domain ID) to an online node.  This allows
ACPI device driver modules to obtain a node from a device proximity
ID.  Unlike acpi_map_pxm_to_node(), this interface is guaranteed to
return an online node, so the caller does not have to deal with node
status itself.  A node may be offline when its device proximity ID
is unique, when its SRAT memory entry does not exist, or when NUMA
is disabled (e.g. numa_off on x86).

This patch also moves the pxm range check from acpi_get_node()
to acpi_map_pxm_to_node().

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/acpi/numa.c  |   40 +++++++++++++++++++++++++++++++++++++---
 include/linux/acpi.h |    5 +++++
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 1333cbdc..a64947e 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -29,6 +29,8 @@
 #include <linux/errno.h>
 #include <linux/acpi.h>
 #include <linux/numa.h>
+#include <linux/nodemask.h>
+#include <linux/topology.h>
 
 #define PREFIX "ACPI: "
 
@@ -70,7 +72,12 @@ static void __acpi_map_pxm_to_node(int pxm, int node)
 
 int acpi_map_pxm_to_node(int pxm)
 {
-	int node = pxm_to_node_map[pxm];
+	int node;
+
+	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS)
+		return NUMA_NO_NODE;
+
+	node = pxm_to_node_map[pxm];
 
 	if (node == NUMA_NO_NODE) {
 		if (nodes_weight(nodes_found_map) >= MAX_NUMNODES)
@@ -83,6 +90,35 @@ int acpi_map_pxm_to_node(int pxm)
 	return node;
 }
 
+/*
+ * Return an online node from a pxm.  This interface is intended for ACPI
+ * device drivers that obtain device NUMA topology from ACPI table, but
+ * do not initialize the node status.
+ */
+int acpi_map_pxm_to_online_node(int pxm)
+{
+	int node, n, dist, min_dist;
+
+	node = acpi_map_pxm_to_node(pxm);
+
+	if (node == NUMA_NO_NODE)
+		node = 0;
+
+	if (!node_online(node)) {
+		min_dist = INT_MAX;
+		for_each_online_node(n) {
+			dist = node_distance(node, n);
+			if (dist < min_dist) {
+				min_dist = dist;
+				node = n;
+			}
+		}
+	}
+
+	return node;
+}
+EXPORT_SYMBOL(acpi_map_pxm_to_online_node);
+
 static void __init
 acpi_table_print_srat_entry(struct acpi_subtable_header *header)
 {
@@ -328,8 +364,6 @@ int acpi_get_node(acpi_handle handle)
 	int pxm;
 
 	pxm = acpi_get_pxm(handle);
-	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS)
-		return NUMA_NO_NODE;
 
 	return acpi_map_pxm_to_node(pxm);
 }
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index e4da5e3..1b3bbb1 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -289,8 +289,13 @@ extern void acpi_dmi_osi_linux(int enable, const struct dmi_system_id *d);
 extern void acpi_osi_setup(char *str);
 
 #ifdef CONFIG_ACPI_NUMA
+int acpi_map_pxm_to_online_node(int pxm);
 int acpi_get_node(acpi_handle handle);
 #else
+static inline int acpi_map_pxm_to_online_node(int pxm)
+{
+	return 0;
+}
 static inline int acpi_get_node(acpi_handle handle)
 {
 	return 0;

* [PATCH v2 2/3] libnvdimm: Set numa_node to NVDIMM devices
  2015-06-09 23:10 ` Toshi Kani
@ 2015-06-09 23:10   ` Toshi Kani
  -1 siblings, 0 replies; 27+ messages in thread
From: Toshi Kani @ 2015-06-09 23:10 UTC (permalink / raw)
  To: rjw, dan.j.williams; +Cc: linux-acpi, linux-nvdimm, linux-kernel, Toshi Kani

The ACPI NFIT table contains System Physical Address Range Structure
entries, each of which describes the proximity ID of its range when
ACPI_NFIT_PROXIMITY_VALID is set in its flags.

Change acpi_nfit_register_region() to map the proximity ID to its
node ID and store it in a new numa_node field of nd_region_desc,
which is then conveyed to nd_region.

nd_region_probe() and nd_btt_probe() set the numa_node of the
nd_region on the device object being probed.  A namespace device
inherits the numa_node from its parent region device.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/acpi/nfit.c          |    6 ++++++
 drivers/nvdimm/btt.c         |    2 ++
 drivers/nvdimm/nd.h          |    1 +
 drivers/nvdimm/region.c      |    1 +
 drivers/nvdimm/region_devs.c |    1 +
 include/linux/libnvdimm.h    |    1 +
 6 files changed, 12 insertions(+)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 5731e4a..69dc6e0 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1255,6 +1255,12 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
 	ndr_desc->res = &res;
 	ndr_desc->provider_data = nfit_spa;
 	ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
+	if (spa->flags & ACPI_NFIT_PROXIMITY_VALID)
+		ndr_desc->numa_node = acpi_map_pxm_to_online_node(
+						spa->proximity_domain);
+	else
+		ndr_desc->numa_node = NUMA_NO_NODE;
+
 	list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
 		struct acpi_nfit_memory_map *memdev = nfit_memdev->memdev;
 		struct nd_mapping *nd_mapping;
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 2d7ce9e..3b3e115 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1369,6 +1369,8 @@ static int nd_btt_probe(struct device *dev)
 		rc = -ENOMEM;
 		goto err_btt;
 	}
+
+	set_dev_node(dev, nd_region->numa_node);
 	dev_set_drvdata(dev, btt);
 
 	return 0;
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index c807379..fefd8f6 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -108,6 +108,7 @@ struct nd_region {
 	u64 ndr_size;
 	u64 ndr_start;
 	int id, num_lanes;
+	int numa_node;
 	void *provider_data;
 	struct nd_interleave_set *nd_set;
 	struct nd_mapping mapping[0];
diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c
index 373eab4..783220e 100644
--- a/drivers/nvdimm/region.c
+++ b/drivers/nvdimm/region.c
@@ -123,6 +123,7 @@ static int nd_region_probe(struct device *dev)
 
 	num_ns->active = rc;
 	num_ns->count = rc + err;
+	set_dev_node(dev, nd_region->numa_node);
 	dev_set_drvdata(dev, num_ns);
 
 	if (err == 0)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 86adbd8..352bc80 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -627,6 +627,7 @@ static noinline struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus
 	nd_region->provider_data = ndr_desc->provider_data;
 	nd_region->nd_set = ndr_desc->nd_set;
 	nd_region->num_lanes = ndr_desc->num_lanes;
+	nd_region->numa_node = ndr_desc->numa_node;
 	ida_init(&nd_region->ns_ida);
 	dev = &nd_region->dev;
 	dev_set_name(dev, "region%d", nd_region->id);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 96b9507..5d0c75a 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -78,6 +78,7 @@ struct nd_region_desc {
 	struct nd_interleave_set *nd_set;
 	void *provider_data;
 	int num_lanes;
+	int numa_node;
 };
 
 struct nvdimm_bus;

* [PATCH v2 3/3] libnvdimm: Add sysfs numa_node to NVDIMM devices
  2015-06-09 23:10 ` Toshi Kani
@ 2015-06-09 23:10   ` Toshi Kani
  -1 siblings, 0 replies; 27+ messages in thread
From: Toshi Kani @ 2015-06-09 23:10 UTC (permalink / raw)
  To: rjw, dan.j.williams; +Cc: linux-acpi, linux-nvdimm, linux-kernel, Toshi Kani

Add support for a sysfs 'numa_node' attribute on I/O-related NVDIMM
devices under /sys/bus/nd/devices: regionN, namespaceN.0, and bttN.
When bttN is not set up, its numa_node returns -1 (NUMA_NO_NODE).

Here is an example of numa_node values on a 2-socket system with
a single NVDIMM range on each socket.
  /sys/bus/nd/devices
  |-- btt0/numa_node:-1
  |-- btt1/numa_node:0
  |-- namespace0.0/numa_node:0
  |-- namespace1.0/numa_node:1
  |-- region0/numa_node:0
  |-- region1/numa_node:1

These numa_node files are then linked under the block class of
their device names.
  /sys/class/block/pmem0/device/numa_node:0
  /sys/class/block/pmem0s/device/numa_node:0
  /sys/class/block/pmem1/device/numa_node:1

This enables numactl(8) to accept 'block:' and 'file:' paths for
pmem and btt devices, as shown in the examples below.
  numactl --preferred block:pmem0 --show
  numactl --preferred file:/dev/pmem0s --show

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/acpi/nfit.c             |    1 +
 drivers/nvdimm/btt_devs.c       |    1 +
 drivers/nvdimm/bus.c            |   30 ++++++++++++++++++++++++++++++
 drivers/nvdimm/namespace_devs.c |    1 +
 include/linux/libnvdimm.h       |    1 +
 5 files changed, 34 insertions(+)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 69dc6e0..ebcaf2a 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -789,6 +789,7 @@ static const struct attribute_group *acpi_nfit_region_attribute_groups[] = {
 	&nd_region_attribute_group,
 	&nd_mapping_attribute_group,
 	&nd_device_attribute_group,
+	&nd_numa_attribute_group,
 	&acpi_nfit_region_attribute_group,
 	NULL,
 };
diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
index 740b560..4a053e9 100644
--- a/drivers/nvdimm/btt_devs.c
+++ b/drivers/nvdimm/btt_devs.c
@@ -295,6 +295,7 @@ static struct attribute_group nd_btt_attribute_group = {
 static const struct attribute_group *nd_btt_attribute_groups[] = {
 	&nd_btt_attribute_group,
 	&nd_device_attribute_group,
+	&nd_numa_attribute_group,
 	NULL,
 };
 
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index d8a1794..20ffacc 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -353,6 +353,36 @@ struct attribute_group nd_device_attribute_group = {
 };
 EXPORT_SYMBOL_GPL(nd_device_attribute_group);
 
+static ssize_t numa_node_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", dev_to_node(dev));
+}
+static DEVICE_ATTR_RO(numa_node);
+
+static struct attribute *nd_numa_attributes[] = {
+	&dev_attr_numa_node.attr,
+	NULL,
+};
+
+static umode_t nd_numa_attr_visible(struct kobject *kobj, struct attribute *a,
+		int n)
+{
+	if (!IS_ENABLED(CONFIG_NUMA))
+		return 0;
+
+	return a->mode;
+}
+
+/**
+ * nd_numa_attribute_group - NUMA attributes for all devices on an nd bus
+ */
+struct attribute_group nd_numa_attribute_group = {
+	.attrs = nd_numa_attributes,
+	.is_visible = nd_numa_attr_visible,
+};
+EXPORT_SYMBOL_GPL(nd_numa_attribute_group);
+
 int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus)
 {
 	dev_t devt = MKDEV(nvdimm_bus_major, nvdimm_bus->id);
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index e89b019..26f877f 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1123,6 +1123,7 @@ static struct attribute_group nd_namespace_attribute_group = {
 static const struct attribute_group *nd_namespace_attribute_groups[] = {
 	&nd_device_attribute_group,
 	&nd_namespace_attribute_group,
+	&nd_numa_attribute_group,
 	NULL,
 };
 
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 5d0c75a..a85566b 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -35,6 +35,7 @@ enum {
 extern struct attribute_group nvdimm_bus_attribute_group;
 extern struct attribute_group nvdimm_attribute_group;
 extern struct attribute_group nd_device_attribute_group;
+extern struct attribute_group nd_numa_attribute_group;
 extern struct attribute_group nd_region_attribute_group;
 extern struct attribute_group nd_mapping_attribute_group;
 

* Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
  2015-06-09 23:10 ` Toshi Kani
                   ` (3 preceding siblings ...)
@ 2015-06-10 15:54 ` Jeff Moyer
  2015-06-10 15:57   ` Dan Williams
  -1 siblings, 1 reply; 27+ messages in thread
From: Jeff Moyer @ 2015-06-10 15:54 UTC (permalink / raw)
  To: Toshi Kani; +Cc: rjw, dan.j.williams, linux-acpi, linux-kernel, linux-nvdimm

Toshi Kani <toshi.kani@hp.com> writes:

> Since NVDIMMs are installed on memory slots, they expose the NUMA
> topology of a platform.  This patchset adds support of sysfs
> 'numa_node' to I/O-related NVDIMM devices under /sys/bus/nd/devices.
> This enables numactl(8) to accept 'block:' and 'file:' paths of
> pmem and btt devices as shown in the examples below.
>   numactl --preferred block:pmem0 --show
>   numactl --preferred file:/dev/pmem0s --show
>
> numactl can be used to bind an application to the locality of
> a target NVDIMM for better performance.  Here is a result of fio
> benchmark to ext4/dax on an HP DL380 with 2 sockets for local and
> remote settings.
>
>   Local [1] :  4098.3MB/s
>   Remote [2]:  3718.4MB/s
>
> [1] numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio <fs-on-pmem0> 
> [2] numactl --preferred block:pmem1 --cpunodebind block:pmem1 fio <fs-on-pmem0>

Did you post the patches to numactl somewhere?

-Jeff


* Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
  2015-06-10 15:54 ` [PATCH v2 0/3] Add NUMA support for " Jeff Moyer
@ 2015-06-10 15:57   ` Dan Williams
  2015-06-10 16:11       ` Jeff Moyer
                       ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Dan Williams @ 2015-06-10 15:57 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Toshi Kani, Rafael J. Wysocki, Linux ACPI, linux-kernel, linux-nvdimm

On Wed, Jun 10, 2015 at 8:54 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Toshi Kani <toshi.kani@hp.com> writes:
>
>> Since NVDIMMs are installed on memory slots, they expose the NUMA
>> topology of a platform.  This patchset adds support of sysfs
>> 'numa_node' to I/O-related NVDIMM devices under /sys/bus/nd/devices.
>> This enables numactl(8) to accept 'block:' and 'file:' paths of
>> pmem and btt devices as shown in the examples below.
>>   numactl --preferred block:pmem0 --show
>>   numactl --preferred file:/dev/pmem0s --show
>>
>> numactl can be used to bind an application to the locality of
>> a target NVDIMM for better performance.  Here is a result of fio
>> benchmark to ext4/dax on an HP DL380 with 2 sockets for local and
>> remote settings.
>>
>>   Local [1] :  4098.3MB/s
>>   Remote [2]:  3718.4MB/s
>>
>> [1] numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio <fs-on-pmem0>
>> [2] numactl --preferred block:pmem1 --cpunodebind block:pmem1 fio <fs-on-pmem0>
>
> Did you post the patches to numactl somewhere?
>

numactl already supports this today.


* Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
  2015-06-10 15:57   ` Dan Williams
@ 2015-06-10 16:11       ` Jeff Moyer
  2015-06-10 16:20     ` Elliott, Robert (Server Storage)
  2015-06-10 16:20     ` Toshi Kani
  2 siblings, 0 replies; 27+ messages in thread
From: Jeff Moyer @ 2015-06-10 16:11 UTC (permalink / raw)
  To: Dan Williams
  Cc: Toshi Kani, Rafael J. Wysocki, Linux ACPI, linux-kernel, linux-nvdimm

Dan Williams <dan.j.williams@intel.com> writes:

> On Wed, Jun 10, 2015 at 8:54 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>> Toshi Kani <toshi.kani@hp.com> writes:
>>
>>> Since NVDIMMs are installed on memory slots, they expose the NUMA
>>> topology of a platform.  This patchset adds support of sysfs
>>> 'numa_node' to I/O-related NVDIMM devices under /sys/bus/nd/devices.
>>> This enables numactl(8) to accept 'block:' and 'file:' paths of
>>> pmem and btt devices as shown in the examples below.
>>>   numactl --preferred block:pmem0 --show
>>>   numactl --preferred file:/dev/pmem0s --show
>>>
>>> numactl can be used to bind an application to the locality of
>>> a target NVDIMM for better performance.  Here is a result of fio
>>> benchmark to ext4/dax on an HP DL380 with 2 sockets for local and
>>> remote settings.
>>>
>>>   Local [1] :  4098.3MB/s
>>>   Remote [2]:  3718.4MB/s
>>>
>>> [1] numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio <fs-on-pmem0>
>>> [2] numactl --preferred block:pmem1 --cpunodebind block:pmem1 fio <fs-on-pmem0>
>>
>> Did you post the patches to numactl somewhere?
>>
>
> numactl already supports this today.

Ah, I did not know that.  I guess I should have RTFM.  :)

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
  2015-06-10 15:57   ` Dan Williams
  2015-06-10 16:11       ` Jeff Moyer
@ 2015-06-10 16:20     ` Elliott, Robert (Server Storage)
  2015-06-10 16:37       ` Dan Williams
  2015-06-10 16:20     ` Toshi Kani
  2 siblings, 1 reply; 27+ messages in thread
From: Elliott, Robert (Server Storage) @ 2015-06-10 16:20 UTC (permalink / raw)
  To: Dan Williams, Jeff Moyer
  Cc: linux-nvdimm, Rafael J. Wysocki, linux-kernel, Linux ACPI

> -----Original Message-----
> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@lists.01.org] On Behalf Of
> Dan Williams
> Sent: Wednesday, June 10, 2015 9:58 AM
> To: Jeff Moyer
> Cc: linux-nvdimm; Rafael J. Wysocki; linux-kernel@vger.kernel.org; Linux
> ACPI
> Subject: Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
> 
> On Wed, Jun 10, 2015 at 8:54 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> > Toshi Kani <toshi.kani@hp.com> writes:
> >
> >> Since NVDIMMs are installed on memory slots, they expose the NUMA
> >> topology of a platform.  This patchset adds support of sysfs
> >> 'numa_node' to I/O-related NVDIMM devices under /sys/bus/nd/devices.
> >> This enables numactl(8) to accept 'block:' and 'file:' paths of
> >> pmem and btt devices as shown in the examples below.
> >>   numactl --preferred block:pmem0 --show
> >>   numactl --preferred file:/dev/pmem0s --show
> >>
> >> numactl can be used to bind an application to the locality of
> >> a target NVDIMM for better performance.  Here is a result of fio
> >> benchmark to ext4/dax on an HP DL380 with 2 sockets for local and
> >> remote settings.
> >>
> >>   Local [1] :  4098.3MB/s
> >>   Remote [2]:  3718.4MB/s
> >>
> >> [1] numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio <fs-
> on-pmem0>
> >> [2] numactl --preferred block:pmem1 --cpunodebind block:pmem1 fio <fs-
> on-pmem0>
> >
> > Did you post the patches to numactl somewhere?
> >
> 
> numactl already supports this today.

numactl does have a bug handling partitions under these devices,
because it assumes all storage devices have "/devices/pci"
in their path as it tries to find the parent device for the
partition.  I think we'll propose a numactl patch for that;
I don't think the drivers can fool it.

Details (from an earlier version of the patch series
in which btt devices were named /dev/nd1, etc.):

strace shows that numactl is trying to find numa_node in very
different locations for /dev/nd1p1 vs. /dev/sda1.

strace for /dev/nd1p1
=====================
open("/sys/class/block/nd1p1/dev", O_RDONLY) = 4
read(4, "259:1\n", 4095)                = 6
close(4)                                = 0
close(3)                                = 0
readlink("/sys/class/block/nd1p1", "../../devices/LNXSYSTM:00/LNXSYB"..., 1024) = 77
open("/sys/class/block/nd1p1/device/numa_node", O_RDONLY) = -1 ENOENT (No such file or directory)

strace for /dev/sda1
====================
open("/sys/class/block/sda1/dev", O_RDONLY) = 4
read(4, "8:1\n", 4095)                  = 4
close(4)                                = 0
close(3)                                = 0
readlink("/sys/class/block/sda1", "../../devices/pci0000:00/0000:00"..., 1024) = 91
open("/sys//devices/pci0000:00/0000:00:01.0//numa_node", O_RDONLY) = 3
read(3, "0\n", 4095)                    = 2
close(3)                                = 0

The "/sys/class/block/xxx" paths link to:
lrwxrwxrwx. 1 root root 0 May 20 20:42 /sys/class/block/nd1p1 -> ../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/btt1/block/nd1/nd1p1
lrwxrwxrwx. 1 root root 0 May 20 20:41 /sys/class/block/sda1 -> ../../devices/pci0000:00/0000:00:01.0/0000:03:00.0/host6/target6:0:0/6:0:0:0/block/sda/sda1


For /dev/sda1, numactl recognizes "/devices/pci" as
a special path, and strips off everything after the
numbers.  Faced with:
../../devices/pci0000:00/0000:00:01.0/0000:03:00.0/host6/target6:0:0/6:0:0:0/block/sda/sda1

it ends up with this (leaving a sloppy "//" in the path):
/sys/devices/pci0000:00/0000:00:01.0//numa_node

It would also succeed if it ended up with this:
/sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0/numa_node

For /dev/nd1p1 it does not see that string, so just
tries to open "/sys/class/block/nd1p1/device/numa_node"

There are no "device/" subdirectories in the tree for
partition devices (for either sda1 or nd1p1), so this 
fails.


From http://oss.sgi.com/projects/libnuma/
numactl affinity.c:
        /* Somewhat hackish: extract device from symlink path.
           Better would be a direct backlink. This knows slightly too
           much about the actual sysfs layout. */
        char path[1024];
        char *fn = NULL;
        if (asprintf(&fn, "/sys/class/%s/%s", cls, dev) > 0 &&
            readlink(fn, path, sizeof path) > 0) {
                regex_t re;
                regmatch_t match[2];
                char *p;

                regcomp(&re, "(/devices/pci[0-9a-fA-F:/]+\\.[0-9]+)/",
                        REG_EXTENDED);
                ret = regexec(&re, path, 2, match, 0);
                regfree(&re);
                if (ret == 0) {
                        free(fn);
                        assert(match[0].rm_so > 0);
                        assert(match[0].rm_eo > 0);
                        path[match[1].rm_eo + 1] = 0;
                        p = path + match[0].rm_so;
                        ret = sysfs_node_read(mask, "/sys/%s/numa_node", p);
                        if (ret < 0)
                                return node_parse_failure(ret, NULL, p);
                        return ret;
                }
        }
        free(fn);

        ret = sysfs_node_read(mask, "/sys/class/%s/%s/device/numa_node",
                              cls, dev);
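The failure mode described above can be reproduced without any hardware: apply numactl's PCI-path regex to the two symlink targets quoted earlier in this message. This is an illustrative sketch (the paths are copied from this thread, the script itself is not part of numactl):

```shell
#!/bin/sh
# numactl's regex (from affinity.c above): only matches paths that
# contain a "/devices/pci.../<dddd>:<bb>:<dd>.<f>" component.
re='(/devices/pci[0-9a-fA-F:/]+\.[0-9]+)/'

# Symlink targets quoted in this thread for sda1 and nd1p1.
sda='../../devices/pci0000:00/0000:00:01.0/0000:03:00.0/host6/target6:0:0/6:0:0:0/block/sda/sda1'
nd='../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/btt1/block/nd1/nd1p1'

for p in "$sda" "$nd"; do
    if printf '%s\n' "$p" | grep -qE "$re"; then
        echo "match:    $p"
    else
        echo "no match: $p"
    fi
done
```

The SCSI path matches (so numactl strips it back to the PCI function and finds numa_node there), while the ACPI-rooted nd path does not, which is why numactl falls through to the nonexistent `device/numa_node` lookup.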






^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
  2015-06-10 15:57   ` Dan Williams
  2015-06-10 16:11       ` Jeff Moyer
  2015-06-10 16:20     ` Elliott, Robert (Server Storage)
@ 2015-06-10 16:20     ` Toshi Kani
  2 siblings, 0 replies; 27+ messages in thread
From: Toshi Kani @ 2015-06-10 16:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jeff Moyer, Rafael J. Wysocki, Linux ACPI, linux-kernel, linux-nvdimm

On Wed, 2015-06-10 at 08:57 -0700, Dan Williams wrote:
> On Wed, Jun 10, 2015 at 8:54 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> > Toshi Kani <toshi.kani@hp.com> writes:
> >
> >> Since NVDIMMs are installed on memory slots, they expose the NUMA
> >> topology of a platform.  This patchset adds support of sysfs
> >> 'numa_node' to I/O-related NVDIMM devices under /sys/bus/nd/devices.
> >> This enables numactl(8) to accept 'block:' and 'file:' paths of
> >> pmem and btt devices as shown in the examples below.
> >>   numactl --preferred block:pmem0 --show
> >>   numactl --preferred file:/dev/pmem0s --show
> >>
> >> numactl can be used to bind an application to the locality of
> >> a target NVDIMM for better performance.  Here is a result of fio
> >> benchmark to ext4/dax on an HP DL380 with 2 sockets for local and
> >> remote settings.
> >>
> >>   Local [1] :  4098.3MB/s
> >>   Remote [2]:  3718.4MB/s
> >>
> >> [1] numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio <fs-on-pmem0>
> >> [2] numactl --preferred block:pmem1 --cpunodebind block:pmem1 fio <fs-on-pmem0>
> >
> > Did you post the patches to numactl somewhere?
> >
> 
> numactl already supports this today.

Yes, numactl supports the following sysfs class lookup for numa_node.
This patchset adds numa_node for NVDIMM devices in the same sysfs format
as described in patch 3/3.

   /* Generic sysfs class lookup */
   static int
   affinity_class(struct bitmask *mask, char *cls, const char *dev)
   {
		:
        ret = sysfs_node_read(mask, "/sys/class/%s/%s/device/numa_node",
                              cls, dev);

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
  2015-06-10 16:20     ` Elliott, Robert (Server Storage)
@ 2015-06-10 16:37       ` Dan Williams
  0 siblings, 0 replies; 27+ messages in thread
From: Dan Williams @ 2015-06-10 16:37 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage)
  Cc: Jeff Moyer, linux-nvdimm, Rafael J. Wysocki, linux-kernel, Linux ACPI

On Wed, Jun 10, 2015 at 9:20 AM, Elliott, Robert (Server Storage)
<Elliott@hp.com> wrote:
>> -----Original Message-----
>> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@lists.01.org] On Behalf Of
>> Dan Williams
>> Sent: Wednesday, June 10, 2015 9:58 AM
>> To: Jeff Moyer
>> Cc: linux-nvdimm; Rafael J. Wysocki; linux-kernel@vger.kernel.org; Linux
>> ACPI
>> Subject: Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
>>
>> On Wed, Jun 10, 2015 at 8:54 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>> > Toshi Kani <toshi.kani@hp.com> writes:
>> >
>> >> Since NVDIMMs are installed on memory slots, they expose the NUMA
>> >> topology of a platform.  This patchset adds support of sysfs
>> >> 'numa_node' to I/O-related NVDIMM devices under /sys/bus/nd/devices.
>> >> This enables numactl(8) to accept 'block:' and 'file:' paths of
>> >> pmem and btt devices as shown in the examples below.
>> >>   numactl --preferred block:pmem0 --show
>> >>   numactl --preferred file:/dev/pmem0s --show
>> >>
>> >> numactl can be used to bind an application to the locality of
>> >> a target NVDIMM for better performance.  Here is a result of fio
>> >> benchmark to ext4/dax on an HP DL380 with 2 sockets for local and
>> >> remote settings.
>> >>
>> >>   Local [1] :  4098.3MB/s
>> >>   Remote [2]:  3718.4MB/s
>> >>
>> >> [1] numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio <fs-
>> on-pmem0>
>> >> [2] numactl --preferred block:pmem1 --cpunodebind block:pmem1 fio <fs-
>> on-pmem0>
>> >
>> > Did you post the patches to numactl somewhere?
>> >
>>
>> numactl already supports this today.
>
> numactl does have a bug handling partitions under these devices,
> because it assumes all storage devices have "/devices/pci"
> in their path as it tries to find the parent device for the
> partition.  I think we'll propose a numactl patch for that;
> I don't think the drivers can fool it.
>
> Details (from an earlier version of the patch series
> in which btt devices were named /dev/nd1, etc.):
>
> strace shows that numactl is trying to find numa_node in very
> different locations for /dev/nd1p1 vs. /dev/sda1.
>
> strace for /dev/nd1p1
> =====================
> open("/sys/class/block/nd1p1/dev", O_RDONLY) = 4
> read(4, "259:1\n", 4095)                = 6
> close(4)                                = 0
> close(3)                                = 0
> readlink("/sys/class/block/nd1p1", "../../devices/LNXSYSTM:00/LNXSYB"..., 1024) = 77
> open("/sys/class/block/nd1p1/device/numa_node", O_RDONLY) = -1 ENOENT (No such file or directory)
>
> strace for /dev/sda1
> ====================
> open("/sys/class/block/sda1/dev", O_RDONLY) = 4
> read(4, "8:1\n", 4095)                  = 4
> close(4)                                = 0
> close(3)                                = 0
> readlink("/sys/class/block/sda1", "../../devices/pci0000:00/0000:00"..., 1024) = 91
> open("/sys//devices/pci0000:00/0000:00:01.0//numa_node", O_RDONLY) = 3
> read(3, "0\n", 4095)                    = 2
> close(3)                                = 0
>
> The "sys/class/block/xxx" paths link to:
> lrwxrwxrwx. 1 root root 0 May 20 20:42 /sys/class/block/nd1p1 -> ../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/btt1/block/nd1/nd1p1
> lrwxrwxrwx. 1 root root 0 May 20 20:41 /sys/class/block/sda1 -> ../../devices/pci0000:00/0000:00:01.0/0000:03:00.0/host6/target6:0:0/6:0:0:0/block/sda/sda1
>
>
> For /dev/sda1, numactl recognizes "/devices/pci" as
> a special path, and strips off everything after the
> numbers.  Faced with:
> ../../devices/pci0000:00/0000:00:01.0/0000:03:00.0/host6/target6:0:0/6:0:0:0/block/sda/sda1
>
> it ends up with this (leaving a sloppy "//" in the path):
> /sys/devices/pci0000:00/0000:00:01.0//numa_node
>
> It would also succeed if it ended up with this:
> /sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0/numa_node
>
> For /dev/nd1p1 it does not see that string, so just
> tries to open "/sys/class/block/nd1p1/device/numa_node"
>
> There are no "device/" subdirectories in the tree for
> partition devices (for either sda1 or nd1p1), so this
> fails.
>
>
> From http://oss.sgi.com/projects/libnuma/
> numactl affinity.c:
>         /* Somewhat hackish: extract device from symlink path.
>            Better would be a direct backlink. This knows slightly too
>            much about the actual sysfs layout. */
>         char path[1024];
>         char *fn = NULL;
>         if (asprintf(&fn, "/sys/class/%s/%s", cls, dev) > 0 &&
>             readlink(fn, path, sizeof path) > 0) {
>                 regex_t re;
>                 regmatch_t match[2];
>                 char *p;
>
>                 regcomp(&re, "(/devices/pci[0-9a-fA-F:/]+\\.[0-9]+)/",
>                         REG_EXTENDED);
>                 ret = regexec(&re, path, 2, match, 0);
>                 regfree(&re);
>                 if (ret == 0) {
>                         free(fn);
>                         assert(match[0].rm_so > 0);
>                         assert(match[0].rm_eo > 0);
>                         path[match[1].rm_eo + 1] = 0;
>                         p = path + match[0].rm_so;
>                         ret = sysfs_node_read(mask, "/sys/%s/numa_node", p);
>                         if (ret < 0)
>                                 return node_parse_failure(ret, NULL, p);
>                         return ret;
>                 }
>         }
>         free(fn);
>
>         ret = sysfs_node_read(mask, "/sys/class/%s/%s/device/numa_node",
>                               cls, dev);

I think it is broken to try to go from /sys/class down; it should go
from the device node up.  I.e. start from the resolved path of
/sys/dev/block/<major>:<minor>, and then walk up the directory tree to
the parent of block.

$ readlink -f /sys/dev/block/8\:1/
/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda1
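A minimal sketch of that bottom-up walk (the helper name is made up for illustration; only the walking logic is the point, not the exact sysfs layout):

```shell
#!/bin/sh
# walk_up DIR: print the contents of the nearest numa_node file at DIR
# or in any ancestor directory; fail if none is found.
walk_up() {
    dir=$1
    while [ -n "$dir" ] && [ "$dir" != "/" ]; do
        if [ -f "$dir/numa_node" ]; then
            cat "$dir/numa_node"
            return 0
        fi
        dir=$(dirname "$dir")
    done
    return 1
}

# On a live system one would start from the resolved device-node path,
# e.g.:  walk_up "$(readlink -f /sys/dev/block/8:1)"
```

Because the walk stops at the first numa_node it finds, it works for any bus type (PCI, ACPI, nd) without pattern-matching on the path, which is exactly what the regex approach cannot do.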

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
  2015-06-09 23:10 ` Toshi Kani
@ 2015-06-11 15:38   ` Dan Williams
  -1 siblings, 0 replies; 27+ messages in thread
From: Dan Williams @ 2015-06-11 15:38 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Rafael J. Wysocki, Linux ACPI, linux-nvdimm, linux-kernel,
	Rafael J Wysocki

On Tue, Jun 9, 2015 at 4:10 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> Since NVDIMMs are installed on memory slots, they expose the NUMA
> topology of a platform.  This patchset adds support of sysfs
> 'numa_node' to I/O-related NVDIMM devices under /sys/bus/nd/devices.
> This enables numactl(8) to accept 'block:' and 'file:' paths of
> pmem and btt devices as shown in the examples below.
>   numactl --preferred block:pmem0 --show
>   numactl --preferred file:/dev/pmem0s --show
>
> numactl can be used to bind an application to the locality of
> a target NVDIMM for better performance.  Here is a result of fio
> benchmark to ext4/dax on an HP DL380 with 2 sockets for local and
> remote settings.
>
>   Local [1] :  4098.3MB/s
>   Remote [2]:  3718.4MB/s
>
> [1] numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio <fs-on-pmem0>
> [2] numactl --preferred block:pmem1 --cpunodebind block:pmem1 fio <fs-on-pmem0>
>
> Patch 1/3 applies on top of the acpica branch of the pm tree.
> Patch 2/3-3/3 apply on top of Dan Williams's v5 patch series of
> "libnvdimm: non-volatile memory devices".
>
> ---
> v2:
>  - Add acpi_map_pxm_to_online_node(), which returns an online node.
>  - Manage visibility of sysfs numa_node with is_visible. (Dan Williams)
>  - Check ACPI_NFIT_PROXIMITY_VALID in spa->flags.
>
> ---
> Toshi Kani (3):
>   1/3 acpi: Add acpi_map_pxm_to_online_node()
>   2/3 libnvdimm: Set numa_node to NVDIMM devices
>   3/3 libnvdimm: Add sysfs numa_node to NVDIMM devices

Looks good to me.  Once Rafael acks the ACPI core changes I'll pull it
in to libnvdimm-for-next.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
  2015-06-11 15:38   ` Dan Williams
@ 2015-06-11 15:45     ` Toshi Kani
  -1 siblings, 0 replies; 27+ messages in thread
From: Toshi Kani @ 2015-06-11 15:45 UTC (permalink / raw)
  To: Dan Williams
  Cc: Rafael J. Wysocki, Linux ACPI, linux-nvdimm, linux-kernel,
	Rafael J Wysocki

On Thu, 2015-06-11 at 08:38 -0700, Dan Williams wrote:
> On Tue, Jun 9, 2015 at 4:10 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> > Since NVDIMMs are installed on memory slots, they expose the NUMA
> > topology of a platform.  This patchset adds support of sysfs
> > 'numa_node' to I/O-related NVDIMM devices under /sys/bus/nd/devices.
> > This enables numactl(8) to accept 'block:' and 'file:' paths of
> > pmem and btt devices as shown in the examples below.
> >   numactl --preferred block:pmem0 --show
> >   numactl --preferred file:/dev/pmem0s --show
> >
> > numactl can be used to bind an application to the locality of
> > a target NVDIMM for better performance.  Here is a result of fio
> > benchmark to ext4/dax on an HP DL380 with 2 sockets for local and
> > remote settings.
> >
> >   Local [1] :  4098.3MB/s
> >   Remote [2]:  3718.4MB/s
> >
> > [1] numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio <fs-on-pmem0>
> > [2] numactl --preferred block:pmem1 --cpunodebind block:pmem1 fio <fs-on-pmem0>
> >
> > Patch 1/3 applies on top of the acpica branch of the pm tree.
> > Patch 2/3-3/3 apply on top of Dan Williams's v5 patch series of
> > "libnvdimm: non-volatile memory devices".
> >
> > ---
> > v2:
> >  - Add acpi_map_pxm_to_online_node(), which returns an online node.
> >  - Manage visibility of sysfs numa_node with is_visible. (Dan Williams)
> >  - Check ACPI_NFIT_PROXIMITY_VALID in spa->flags.
> >
> > ---
> > Toshi Kani (3):
> >   1/3 acpi: Add acpi_map_pxm_to_online_node()
> >   2/3 libnvdimm: Set numa_node to NVDIMM devices
> >   3/3 libnvdimm: Add sysfs numa_node to NVDIMM devices
> 
> Looks good to me.  Once Rafael acks the ACPI core changes I'll pull it
> in to libnvdimm-for-next.

Great!  Thanks Dan,
-Toshi


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
  2015-06-09 23:10 ` Toshi Kani
@ 2015-06-18 20:24   ` Dan Williams
  -1 siblings, 0 replies; 27+ messages in thread
From: Dan Williams @ 2015-06-18 20:24 UTC (permalink / raw)
  To: Toshi Kani, Rafael J Wysocki
  Cc: Rafael J. Wysocki, Linux ACPI, linux-nvdimm, linux-kernel

Rafael, does patch1 look ok to you?

On Tue, Jun 9, 2015 at 4:10 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> Since NVDIMMs are installed on memory slots, they expose the NUMA
> topology of a platform.  This patchset adds support of sysfs
> 'numa_node' to I/O-related NVDIMM devices under /sys/bus/nd/devices.
> This enables numactl(8) to accept 'block:' and 'file:' paths of
> pmem and btt devices as shown in the examples below.
>   numactl --preferred block:pmem0 --show
>   numactl --preferred file:/dev/pmem0s --show
>
> numactl can be used to bind an application to the locality of
> a target NVDIMM for better performance.  Here is a result of fio
> benchmark to ext4/dax on an HP DL380 with 2 sockets for local and
> remote settings.
>
>   Local [1] :  4098.3MB/s
>   Remote [2]:  3718.4MB/s
>
> [1] numactl --preferred block:pmem0 --cpunodebind block:pmem0 fio <fs-on-pmem0>
> [2] numactl --preferred block:pmem1 --cpunodebind block:pmem1 fio <fs-on-pmem0>
>
> Patch 1/3 applies on top of the acpica branch of the pm tree.
> Patch 2/3-3/3 apply on top of Dan Williams's v5 patch series of
> "libnvdimm: non-volatile memory devices".
>
> ---
> v2:
>  - Add acpi_map_pxm_to_online_node(), which returns an online node.
>  - Manage visibility of sysfs numa_node with is_visible. (Dan Williams)
>  - Check ACPI_NFIT_PROXIMITY_VALID in spa->flags.
>
> ---
> Toshi Kani (3):
>   1/3 acpi: Add acpi_map_pxm_to_online_node()
>   2/3 libnvdimm: Set numa_node to NVDIMM devices
>   3/3 libnvdimm: Add sysfs numa_node to NVDIMM devices
>
> ---
>  drivers/acpi/nfit.c             |  7 +++++++
>  drivers/acpi/numa.c             | 40 +++++++++++++++++++++++++++++++++++++---
>  drivers/nvdimm/btt.c            |  2 ++
>  drivers/nvdimm/btt_devs.c       |  1 +
>  drivers/nvdimm/bus.c            | 30 ++++++++++++++++++++++++++++++
>  drivers/nvdimm/namespace_devs.c |  1 +
>  drivers/nvdimm/nd.h             |  1 +
>  drivers/nvdimm/region.c         |  1 +
>  drivers/nvdimm/region_devs.c    |  1 +
>  include/linux/acpi.h            |  5 +++++
>  include/linux/libnvdimm.h       |  2 ++
>  11 files changed, 88 insertions(+), 3 deletions(-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/3] acpi: Add acpi_map_pxm_to_online_node()
  2015-06-09 23:10   ` Toshi Kani
@ 2015-06-19  0:42     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2015-06-19  0:42 UTC (permalink / raw)
  To: Toshi Kani; +Cc: dan.j.williams, linux-acpi, linux-nvdimm, linux-kernel

On Tuesday, June 09, 2015 05:10:38 PM Toshi Kani wrote:
> The kernel initializes CPU & memory's NUMA topology from ACPI
> SRAT table.  Some other ACPI tables, such as NFIT and DMAR,
> also contain proximity IDs for their device's NUMA topology.
> This information can be used to improve performance of these
> devices.
> 
> This patch introduces acpi_map_pxm_to_online_node(), which maps
> a given pxm to an online node.  This allows ACPI device driver
> modules to obtain a node from a device proximity ID.  Unlike
> acpi_map_pxm_to_node(), this interface is guaranteed to return
> an online node so that the caller module can use the node without
> dealing with the node status.  A node may be offline when a device
> proximity ID is unique, SRAT memory entry does not exist, or
> NUMA is disabled (ex. numa_off on x86).
> 
> This patch also moves the pxm range check from acpi_get_node()
> to acpi_map_pxm_to_node().
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  drivers/acpi/numa.c  |   40 +++++++++++++++++++++++++++++++++++++---
>  include/linux/acpi.h |    5 +++++
>  2 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index 1333cbdc..a64947e 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -29,6 +29,8 @@
>  #include <linux/errno.h>
>  #include <linux/acpi.h>
>  #include <linux/numa.h>
> +#include <linux/nodemask.h>
> +#include <linux/topology.h>
>  
>  #define PREFIX "ACPI: "
>  
> @@ -70,7 +72,12 @@ static void __acpi_map_pxm_to_node(int pxm, int node)
>  
>  int acpi_map_pxm_to_node(int pxm)
>  {
> -	int node = pxm_to_node_map[pxm];
> +	int node;
> +
> +	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS)
> +		return NUMA_NO_NODE;
> +
> +	node = pxm_to_node_map[pxm];
>  
>  	if (node == NUMA_NO_NODE) {
>  		if (nodes_weight(nodes_found_map) >= MAX_NUMNODES)
> @@ -83,6 +90,35 @@ int acpi_map_pxm_to_node(int pxm)
>  	return node;
>  }
>  
> +/*
> + * Return an online node from a pxm.  This interface is intended for ACPI
> + * device drivers that obtain device NUMA topology from ACPI table, but
> + * do not initialize the node status.
> + */

Can you make this a proper kerneldoc, please?  *Especially* since it is an
exported function.

The description is a bit terse too in my view.

> +int acpi_map_pxm_to_online_node(int pxm)
> +{
> +	int node, n, dist, min_dist;
> +
> +	node = acpi_map_pxm_to_node(pxm);
> +
> +	if (node == NUMA_NO_NODE)
> +		node = 0;
> +
> +	if (!node_online(node)) {
> +		min_dist = INT_MAX;
> +		for_each_online_node(n) {
> +			dist = node_distance(node, n);
> +			if (dist < min_dist) {
> +				min_dist = dist;
> +				node = n;
> +			}
> +		}
> +	}
> +
> +	return node;
> +}
> +EXPORT_SYMBOL(acpi_map_pxm_to_online_node);
> +
>  static void __init
>  acpi_table_print_srat_entry(struct acpi_subtable_header *header)
>  {
> @@ -328,8 +364,6 @@ int acpi_get_node(acpi_handle handle)
>  	int pxm;
>  
>  	pxm = acpi_get_pxm(handle);
> -	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS)
> -		return NUMA_NO_NODE;
>  
>  	return acpi_map_pxm_to_node(pxm);
>  }
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index e4da5e3..1b3bbb1 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -289,8 +289,13 @@ extern void acpi_dmi_osi_linux(int enable, const struct dmi_system_id *d);
>  extern void acpi_osi_setup(char *str);
>  
>  #ifdef CONFIG_ACPI_NUMA
> +int acpi_map_pxm_to_online_node(int pxm);
>  int acpi_get_node(acpi_handle handle);
>  #else
> +static inline int acpi_map_pxm_to_online_node(int pxm)
> +{
> +	return 0;
> +}
>  static inline int acpi_get_node(acpi_handle handle)
>  {
>  	return 0;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/3] Add NUMA support for NVDIMM devices
  2015-06-18 20:24   ` Dan Williams
@ 2015-06-19  0:43     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2015-06-19  0:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: Toshi Kani, Rafael J Wysocki, Linux ACPI, linux-nvdimm, linux-kernel

On Thursday, June 18, 2015 01:24:01 PM Dan Williams wrote:
> Rafael, does patch1 look ok to you?

Mostly.  acpi_map_pxm_to_online_node() needs a proper kerneldoc comment
describing what it does.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/3] acpi: Add acpi_map_pxm_to_online_node()
  2015-06-19  0:42     ` Rafael J. Wysocki
@ 2015-06-19  1:16       ` Toshi Kani
  -1 siblings, 0 replies; 27+ messages in thread
From: Toshi Kani @ 2015-06-19  1:16 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: dan.j.williams, linux-acpi, linux-nvdimm, linux-kernel

On Fri, 2015-06-19 at 02:42 +0200, Rafael J. Wysocki wrote:
> On Tuesday, June 09, 2015 05:10:38 PM Toshi Kani wrote:
> > The kernel initializes CPU & memory's NUMA topology from ACPI
> > SRAT table.  Some other ACPI tables, such as NFIT and DMAR,
> > also contain proximity IDs for their device's NUMA topology.
> > This information can be used to improve performance of these
> > devices.
> > 
> > This patch introduces acpi_map_pxm_to_online_node(), which maps
> > a given pxm to an online node.  This allows ACPI device driver
> > modules to obtain a node from a device proximity ID.  Unlike
> > acpi_map_pxm_to_node(), this interface is guaranteed to return
> > an online node so that the caller module can use the node without
> > dealing with the node status.  A node may be offline when a device
> > proximity ID is unique, no SRAT memory entry exists for it, or
> > NUMA is disabled (e.g. numa_off on x86).
> > 
> > This patch also moves the pxm range check from acpi_get_node()
> > to acpi_map_pxm_to_node().
 :
> > +/*
> > + * Return an online node from a pxm.  This interface is intended for ACPI
> > + * device drivers that obtain device NUMA topology from ACPI table, but
> > + * do not initialize the node status.
> > + */
> 
> Can you make this a proper kerneldoc, please?  *Especially* since it is an
> exported function.
> 
> The description is a bit terse too in my view.

Agreed. I will update the comment as a proper kerneldoc.
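Something along the following lines, perhaps -- a rough sketch built from
the current comment, exact wording to be refined in the next version:

```c
/**
 * acpi_map_pxm_to_online_node - map an ACPI proximity ID to an online node
 * @pxm: ACPI proximity ID
 *
 * Similar to acpi_map_pxm_to_node(), but this interface always returns
 * an online node.  When the node mapped from @pxm is offline, it falls
 * back to the nearest online node by node distance.
 *
 * Intended for ACPI device drivers that obtain device NUMA topology from
 * an ACPI table (ex. NFIT, DMAR), but do not initialize the node status,
 * so that callers can use the returned node without checking its status.
 */
int acpi_map_pxm_to_online_node(int pxm);
```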

Thanks!
-Toshi


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2015-06-19  1:17 UTC | newest]

Thread overview: 27+ messages
2015-06-09 23:10 [PATCH v2 0/3] Add NUMA support for NVDIMM devices Toshi Kani
2015-06-09 23:10 ` Toshi Kani
2015-06-09 23:10 ` [PATCH v2 1/3] acpi: Add acpi_map_pxm_to_online_node() Toshi Kani
2015-06-09 23:10   ` Toshi Kani
2015-06-19  0:42   ` Rafael J. Wysocki
2015-06-19  0:42     ` Rafael J. Wysocki
2015-06-19  1:16     ` Toshi Kani
2015-06-19  1:16       ` Toshi Kani
2015-06-09 23:10 ` [PATCH v2 2/3] libnvdimm: Set numa_node to NVDIMM devices Toshi Kani
2015-06-09 23:10   ` Toshi Kani
2015-06-09 23:10 ` [PATCH v2 3/3] libnvdimm: Add sysfs " Toshi Kani
2015-06-09 23:10   ` Toshi Kani
2015-06-10 15:54 ` [PATCH v2 0/3] Add NUMA support for " Jeff Moyer
2015-06-10 15:57   ` Dan Williams
2015-06-10 16:11     ` Jeff Moyer
2015-06-10 16:11       ` Jeff Moyer
2015-06-10 16:20     ` Elliott, Robert (Server Storage)
2015-06-10 16:37       ` Dan Williams
2015-06-10 16:20     ` Toshi Kani
2015-06-11 15:38 ` Dan Williams
2015-06-11 15:38   ` Dan Williams
2015-06-11 15:45   ` Toshi Kani
2015-06-11 15:45     ` Toshi Kani
2015-06-18 20:24 ` Dan Williams
2015-06-18 20:24   ` Dan Williams
2015-06-19  0:43   ` Rafael J. Wysocki
2015-06-19  0:43     ` Rafael J. Wysocki
