From: "Williams, Dan J" <dan.j.williams@intel.com>
To: "toshi.kani@hp.com" <toshi.kani@hp.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@kernel.org" <mingo@kernel.org>, "hch@lst.de" <hch@lst.de>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"boaz@plexistor.com" <boaz@plexistor.com>
Subject: Re: [PATCH v2 15/17] libnvdimm: Set numa_node to NVDIMM devices
Date: Thu, 25 Jun 2015 18:34:47 +0000
Message-ID: <1435257283.13411.4.camel@intel.com>
In-Reply-To: <1435254317.11808.327.camel@misato.fc.hp.com>

On Thu, 2015-06-25 at 11:45 -0600, Toshi Kani wrote:
> On Thu, 2015-06-25 at 05:37 -0400, Dan Williams wrote:
> > From: Toshi Kani <toshi.kani@hp.com>
> > 
> > ACPI NFIT table has System Physical Address Range Structure entries that
> > describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
> > set in the flags.
> > 
> > Change acpi_nfit_register_region() to map a proximity ID to its node ID,
> > and set it to a new numa_node field of nd_region_desc, which is then
> > conveyed to the nd_region device.
> > 
> > The device core arranges for btt and namespace devices to inherit their
> > node from their parent region.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > [djbw: move set_dev_node() from region 'probe' to 'create']
> 
> Sorry, I failed to mention another issue, which led me to call
> set_dev_node() in probe.  nd_async_device_register() calls
> device_add(), which does:
> 
>         /* use parent numa_node */
>         if (parent)
>                 set_dev_node(dev, dev_to_node(parent));
> 
> and overwrites numa_node with -1.  Since the region's parent is ndbusN,
> we cannot take numa_node from the parent.  So, I had to set it in probe.

In general, I still don't like leaving it up to ->probe(), which is
within its rights to fail and not set the node.  How about the
following, which moves it to the bus uevent code?  It should get
triggered before probe, so the numa_node is valid before userspace is
ever notified about the device.

device_add() does:

        kobject_uevent(&dev->kobj, KOBJ_ADD);
        bus_probe_device(dev);

...so I think we're good, agree?  I also added a missing init of
ndr_desc.numa_node in arch/x86/kernel/pmem.c, see below.
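
To make the ordering argument concrete, here is a minimal userspace
sketch.  It is only a model: every mock_* name is hypothetical and not a
kernel API, and it assumes nothing beyond the steps quoted above (parent
node inheritance, then the uevent callback, then probe).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical, heavily simplified stand-in for struct device. */
struct mock_device {
	struct mock_device *parent;
	int numa_node;		/* what dev_to_node() would report */
	int region_numa_node;	/* node recorded when the region was created */
	bool probe_fails;	/* simulate a driver whose probe fails */
	bool bound;		/* set only when probe succeeds */
};

/* Models "use parent numa_node": overwrites any node set earlier. */
static void mock_inherit_parent_node(struct mock_device *dev)
{
	if (dev->parent)
		dev->numa_node = dev->parent->numa_node;
}

/* Models the bus uevent callback: re-apply the region's own node. */
static void mock_bus_uevent(struct mock_device *dev)
{
	dev->numa_node = dev->region_numa_node;
}

/* Models driver probe, which is allowed to fail. */
static void mock_probe(struct mock_device *dev)
{
	if (!dev->probe_fails)
		dev->bound = true;
}

/* Models only the order of the relevant steps in device_add(). */
static void mock_device_add(struct mock_device *dev)
{
	assert(dev != NULL);

	mock_inherit_parent_node(dev);	/* node becomes the parent's node */
	mock_bus_uevent(dev);		/* node restored before probe runs */
	mock_probe(dev);		/* failure no longer affects the node */
}
```

In this model a region whose parent bus has no node ends up with its
own node after mock_device_add(), whether or not the probe step
succeeds.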

8<-----
Subject: libnvdimm: Set numa_node to NVDIMM devices

From: Toshi Kani <toshi.kani@hp.com>

The ACPI NFIT table has System Physical Address Range Structure entries
that describe the proximity ID of each range when
ACPI_NFIT_PROXIMITY_VALID is set in the flags.

Change acpi_nfit_register_region() to map a proximity ID to its node ID,
and store it in a new numa_node field of nd_region_desc, which is then
conveyed to the nd_region device.

The device core arranges for btt and namespace devices to inherit their
node from their parent region.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[djbw: move set_dev_node() from region.c to bus.c]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/kernel/pmem.c       |    1 +
 drivers/acpi/nfit.c          |    6 ++++++
 drivers/nvdimm/bus.c         |    6 ++++++
 drivers/nvdimm/nd.h          |    2 +-
 drivers/nvdimm/region_devs.c |    1 +
 include/linux/libnvdimm.h    |    1 +
 6 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index 0f4ef472ab9e..64f90f53bb85 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -67,6 +67,7 @@ static __init int register_e820_pmem(void)
 		memset(&ndr_desc, 0, sizeof(ndr_desc));
 		ndr_desc.res = &res;
 		ndr_desc.attr_groups = e820_pmem_region_attribute_groups;
+		ndr_desc.numa_node = NUMA_NO_NODE;
 		if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
 			goto err;
 	}
diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 1f6f1b1a54f4..d96c8fe974dd 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -1392,6 +1392,12 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
 	ndr_desc->res = &res;
 	ndr_desc->provider_data = nfit_spa;
 	ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
+	if (spa->flags & ACPI_NFIT_PROXIMITY_VALID)
+		ndr_desc->numa_node = acpi_map_pxm_to_online_node(
+						spa->proximity_domain);
+	else
+		ndr_desc->numa_node = NUMA_NO_NODE;
+
 	list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
 		struct acpi_nfit_memory_map *memdev = nfit_memdev->memdev;
 		struct nd_mapping *nd_mapping;
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index ec59f1f26d95..205344643852 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -48,6 +48,12 @@ static int to_nd_device_type(struct device *dev)
 
 static int nvdimm_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
+	/*
+	 * Ensure that region devices always have their numa node set as
+	 * early as possible.
+	 */
+	if (is_nd_pmem(dev) || is_nd_blk(dev))
+		set_dev_node(dev, to_nd_region(dev)->numa_node);
 	return add_uevent_var(env, "MODALIAS=" ND_DEVICE_MODALIAS_FMT,
 			to_nd_device_type(dev));
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index b870de9add79..72c26461835d 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -96,7 +96,7 @@ struct nd_region {
 	u16 ndr_mappings;
 	u64 ndr_size;
 	u64 ndr_start;
-	int id, num_lanes, ro;
+	int id, num_lanes, ro, numa_node;
 	void *provider_data;
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 8f8c7ea485f1..55b424f6ba0d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -736,6 +736,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	nd_region->nd_set = ndr_desc->nd_set;
 	nd_region->num_lanes = ndr_desc->num_lanes;
 	nd_region->ro = ro;
+	nd_region->numa_node = ndr_desc->numa_node;
 	ida_init(&nd_region->ns_ida);
 	dev = &nd_region->dev;
 	dev_set_name(dev, "region%d", nd_region->id);
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index dc799a29ed1a..30b3deaafd51 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -89,6 +89,7 @@ struct nd_region_desc {
 	struct nd_interleave_set *nd_set;
 	void *provider_data;
 	int num_lanes;
+	int numa_node;
 };
 
 struct nvdimm_bus;
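
For readers following the data flow in the patch above, a rough
userspace sketch of how the node travels from the SPA entry to the
region.  All mock_* names, the flag value, and the lookup table are
assumptions made for illustration; in particular the table lookup is
not how acpi_map_pxm_to_online_node() works, it only stands in for
"some proximity-to-node mapping".

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)
/* Hypothetical stand-in for ACPI_NFIT_PROXIMITY_VALID. */
#define MOCK_PROXIMITY_VALID (1u << 1)

/* Hypothetical, reduced versions of the structures touched by the patch. */
struct mock_spa {
	unsigned int flags;
	unsigned int proximity_domain;
};

struct mock_region_desc {
	int numa_node;
};

struct mock_region {
	int numa_node;
};

/* Made-up mapping: proximity domains 0-1 -> node 0, 2-3 -> node 1. */
static const int mock_pxm_table[] = { 0, 0, 1, 1 };

static int mock_map_pxm_to_node(unsigned int pxm)
{
	size_t n = sizeof(mock_pxm_table) / sizeof(mock_pxm_table[0]);

	return pxm < n ? mock_pxm_table[pxm] : NUMA_NO_NODE;
}

/* Mirrors the new branch in acpi_nfit_register_region(). */
static void mock_fill_desc(const struct mock_spa *spa,
			   struct mock_region_desc *desc)
{
	assert(spa != NULL && desc != NULL);

	if (spa->flags & MOCK_PROXIMITY_VALID)
		desc->numa_node = mock_map_pxm_to_node(spa->proximity_domain);
	else
		desc->numa_node = NUMA_NO_NODE;
}

/* Mirrors the one-line copy added to nd_region_create(). */
static void mock_region_create(const struct mock_region_desc *desc,
			       struct mock_region *region)
{
	assert(desc != NULL && region != NULL);

	region->numa_node = desc->numa_node;
}
```

The point of the explicit else branch is that a descriptor without a
valid proximity ID carries NUMA_NO_NODE rather than a zero left over
from memset(), which would otherwise read as node 0.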




