[PATCH v3 0/3] ARS rescanning triggered by latent errors or userspace

* [PATCH v3 0/3] ARS rescanning triggered by latent errors or userspace
@ 2016-07-22 23:21 ` Vishal Verma
  0 siblings, 0 replies; 8+ messages in thread
From: Vishal Verma @ 2016-07-22 23:21 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Dan Williams, Rafael J. Wysocki, Tony Luck, linux-kernel,
	linux-acpi, Vishal Verma

Changes in v3:
- Add a missing sysfs_put (Dan)
- Improve readability in an expression, making it (!x || !y) instead
  of !(x && y) (Dan)
- Only show the 'scrub' attribute if ARS is supported (Linda)
- For scrub_show(), indicate if a scrub is in progress (Dan)
- Rebase to nvdimm-for-next + 2 patches from Dan to handle resource leaks
  with nfit_test
- Remove an unnecessary mutex lock/unlock in nfit_exit in Patch 3


Changes in v2:
- Rework the ars_done flag in nfit_spa to be ars_required, and reuse it for
  rescanning (Dan)
- Rename the ars_rescan attribute to simply 'scrub', and move into the nfit
  group since only nfit buses have this capability (Dan)
- Make the scrub attribute RW, and on reads return the number of times a
  scrub has happened since driver load. This prompted some additional
  refactoring, notably the new helpers acpi_nfit_desc_alloc_register, and
  to_nvdimm_bus_dev. These are all in patch 2. (Dan)
- Remove some redundant list_empty checks in patch 3 (Dan)
- If the acpi_descs list is not empty at driver unload time, WARN() (Dan)
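
As an illustration of the read side of the 'scrub' attribute described in
the changelogs above (a completed-scrub count, with a trailing '+' while a
scrub is in progress), here is a minimal userspace sketch. The helper name
parse_scrub_state is hypothetical and not part of this series; only the
"<count>\n" / "<count>+\n" format is taken from scrub_show() in patch 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse one read of the 'scrub' attribute, which scrub_show() formats as
 * "<count>\n" when idle or "<count>+\n" while an ARS is in progress.
 * Returns 0 on success, -1 if no count could be parsed.
 * (Hypothetical helper, for illustration only.)
 */
int parse_scrub_state(const char *buf, unsigned int *count, bool *in_progress)
{
	char suffix = '\0';
	int fields;

	assert(buf && count && in_progress);

	/* The count is mandatory; the character after it is '+' or '\n'. */
	fields = sscanf(buf, "%u%c", count, &suffix);
	if (fields < 1)
		return -1;

	*in_progress = (fields == 2 && suffix == '+');
	return 0;
}
```

A caller would pass in the buffer read from the attribute and check
in_progress before deciding whether to wait for the next completion.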

This series adds on-demand ARS scanning, triggered either by the
discovery of a latent media error or by a sysfs trigger from userspace.

The rescanning part is easy to test using the nfit_test framework:
create a namespace (by default this will have bad sectors in the
middle), clear the bad sectors by writing to them, and trigger the
rescan through sysfs. The bad sectors will then reappear in
/sys/block/<pmemX>/badblocks.
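
The sysfs step can be sketched from userspace roughly as follows. This is
only a sketch under assumptions: the helper names scrub_trigger and
scrub_wait are hypothetical, and the attribute path is supplied by the
caller (something like /sys/bus/nd/devices/ndbus0/nfit/scrub is assumed,
not confirmed by this series). What is taken from patch 2 is that any
write to 'scrub' requests a rescan, and that the worker calls
sysfs_notify_dirent() after bumping the count, which is what poll() waits
on here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Request a bus-wide rescan by writing to the 'scrub' attribute.
 * scrub_store() ignores the written data, so the payload is arbitrary.
 * Returns 0 on success, -1 with errno set on failure.
 */
int scrub_trigger(const char *path)
{
	ssize_t n;
	int saved, fd;

	assert(path);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, "1\n", 2);
	saved = errno;		/* close() may clobber errno */
	close(fd);
	errno = saved;
	return n == 2 ? 0 : -1;
}

/*
 * Wait up to timeout_ms for a notification on the attribute, then leave
 * its latest contents in buf as a NUL-terminated string.
 * Returns >0 if notified, 0 on timeout, -1 on error.
 */
int scrub_wait(const char *path, int timeout_ms, char *buf, size_t len)
{
	struct pollfd pfd = { .events = POLLPRI | POLLERR };
	ssize_t n;
	int rc;

	assert(path && buf && len > 1);
	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;

	/* sysfs only reports changes that happen after an initial read. */
	n = read(pfd.fd, buf, len - 1);
	rc = n < 0 ? -1 : poll(&pfd, 1, timeout_ms);
	if (rc > 0) {
		/* Rewind and re-read to fetch the updated count. */
		if (lseek(pfd.fd, 0, SEEK_SET) < 0 ||
		    (n = read(pfd.fd, buf, len - 1)) < 0)
			rc = -1;
	}
	buf[n > 0 ? n : 0] = '\0';
	close(pfd.fd);
	return rc;
}
```

If a scrub is already running, scrub_trigger() is expected to fail with
errno set to EBUSY, since scrub_store() passes through the -EBUSY from
acpi_nfit_ars_rescan(). Calling scrub_trigger() and then scrub_wait()
leaves a small window in which a very fast scrub could complete before the
initial read; a more careful caller would open and read the attribute
before triggering.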

For the mce handling, I've tested the notifier chain callback with
a mock struct mce (injected via another sysfs trigger, which
obviously isn't included in the patch). The mock has its address
field set to a known address in a SPA range, and its status field
has the MCACOD flag set.

What I haven't easily been able to test is the same callback
path with a 'real world' mce, being called as part of the
x86_mce_decoder_chain notifier. I'd therefore appreciate a
closer look at the initial filtering done in nfit_handle_mce
(patch 3/3) from Tony or anyone more familiar with mce handling.
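
For reviewers, the address-matching half of that filtering reduces to an
inclusive range check over the PM SPA ranges. A standalone sketch follows;
struct spa_range and both function names are hypothetical stand-ins, not
the kernel types, and the MCACOD status check and the SPA type check are
deliberately left out. The comparison is written on offsets so that
address + length cannot overflow, but it is meant to accept the same
addresses as the two 'continue' tests in nfit_handle_mce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the address/length fields of an SPA entry. */
struct spa_range {
	uint64_t address;
	uint64_t length;
};

/* True if addr lies within [address, address + length - 1]. */
bool spa_covers(const struct spa_range *spa, uint64_t addr)
{
	assert(spa);
	if (spa->length == 0 || spa->address > addr)
		return false;
	/* Compare offsets rather than end addresses to avoid overflow. */
	return addr - spa->address < spa->length;
}

/*
 * Return the index of the first range covering addr, or -1 if none does.
 * Stopping at the first match mirrors the patch, which rescans every
 * range on the bus once any one of them matches.
 */
int find_covering_spa(const struct spa_range *spas, size_t n, uint64_t addr)
{
	size_t i;

	assert(spas || n == 0);
	for (i = 0; i < n; i++)
		if (spa_covers(&spas[i], addr))
			return (int)i;
	return -1;
}
```

With a range starting at 0x1000 of length 0x1000, addresses 0x1000 through
0x1fff match, while 0xfff and 0x2000 do not.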

The series is based on v4.7-rc7, and a tree is available at
https://git.kernel.org/cgit/linux/kernel/git/vishal/nvdimm.git/log/?h=ars-ondemand


Vishal Verma (3):
  pmem: clarify a debug print in pmem_clear_poison
  nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  nfit: do an ARS scrub on hitting a latent media error

 drivers/acpi/nfit.c       | 221 ++++++++++++++++++++++++++++++++++++++++++++--
 drivers/acpi/nfit.h       |   5 +-
 drivers/nvdimm/core.c     |   7 ++
 drivers/nvdimm/pmem.c     |   2 +-
 include/linux/libnvdimm.h |   1 +
 5 files changed, 227 insertions(+), 9 deletions(-)

-- 
2.7.4

* [PATCH v3 1/3] pmem: clarify a debug print in pmem_clear_poison
  2016-07-22 23:21 ` Vishal Verma
@ 2016-07-22 23:21   ` Vishal Verma
  -1 siblings, 0 replies; 8+ messages in thread
From: Vishal Verma @ 2016-07-22 23:21 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Dan Williams, Rafael J. Wysocki, Tony Luck, linux-kernel,
	linux-acpi, Vishal Verma

Prefix the sector number being cleared with a '0x' to make it clear that
this is a hex value.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/pmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 7251b4b..9f75eb8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -58,7 +58,7 @@ static void pmem_clear_poison(struct pmem_device *pmem, phys_addr_t offset,
 	cleared = nvdimm_clear_poison(dev, pmem->phys_addr + offset, len);
 
 	if (cleared > 0 && cleared / 512) {
-		dev_dbg(dev, "%s: %llx clear %ld sector%s\n",
+		dev_dbg(dev, "%s: %#llx clear %ld sector%s\n",
 				__func__, (unsigned long long) sector,
 				cleared / 512, cleared / 512 > 1 ? "s" : "");
 		badblocks_clear(&pmem->bb, sector, cleared / 512);
-- 
2.7.4

* [PATCH v3 2/3] nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  2016-07-22 23:21 ` Vishal Verma
@ 2016-07-22 23:21   ` Vishal Verma
  -1 siblings, 0 replies; 8+ messages in thread
From: Vishal Verma @ 2016-07-22 23:21 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Dan Williams, Rafael J. Wysocki, Tony Luck, linux-kernel,
	linux-acpi, Vishal Verma

Normally, an ARS (Address Range Scrub) only happens at
boot/initialization time. There can however arise situations where a
bus-wide rescan is needed - notably, in the case of discovering a latent
media error, we should do a full rescan to figure out what other sectors
are bad, and thus potentially avoid triggering an mce on them in the
future. Also provide a sysfs trigger to start a bus-wide scrub.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/acpi/nfit.c       | 134 +++++++++++++++++++++++++++++++++++++++++++---
 drivers/acpi/nfit.h       |   4 +-
 drivers/nvdimm/core.c     |   7 +++
 include/linux/libnvdimm.h |   1 +
 4 files changed, 138 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index c0e1c3a..6e45183 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/ndctl.h>
+#include <linux/sysfs.h>
 #include <linux/delay.h>
 #include <linux/list.h>
 #include <linux/acpi.h>
@@ -874,14 +875,76 @@ static ssize_t revision_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(revision);
 
+/*
+ * This shows the number of full Address Range Scrubs that have been
+ * completed since driver load time. Userspace can wait on this using
+ * select/poll etc. A '+' at the end indicates an ARS is in progress
+ */
+static ssize_t scrub_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+	struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+
+	return sprintf(buf, "%d%s", acpi_desc->scrub_count,
+		(work_busy(&acpi_desc->work)) ? "+\n" : "\n");
+}
+
+static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc);
+
+static ssize_t scrub_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t size)
+{
+	struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+	struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+	int rc;
+
+	rc = acpi_nfit_ars_rescan(acpi_desc);
+	if (rc)
+		return rc;
+	return size;
+}
+static DEVICE_ATTR_RW(scrub);
+
+static bool acpi_nfit_ars_supported(struct nvdimm_bus *nvdimm_bus)
+{
+	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+
+	if (test_bit(ND_CMD_ARS_CAP, &nd_desc->cmd_mask))
+		return true;
+
+	return false;
+}
+
+static umode_t nfit_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+
+	if (a == &dev_attr_revision.attr)
+		return a->mode;
+
+	/* check if scrub is supported */
+	if (a == &dev_attr_scrub.attr) {
+		if (!acpi_nfit_ars_supported(nvdimm_bus))
+			return 0;
+	}
+
+	return a->mode;
+}
+
 static struct attribute *acpi_nfit_attributes[] = {
 	&dev_attr_revision.attr,
+	&dev_attr_scrub.attr,
 	NULL,
 };
 
 static struct attribute_group acpi_nfit_attribute_group = {
 	.name = "nfit",
 	.attrs = acpi_nfit_attributes,
+	.is_visible = nfit_visible,
 };
 
 static const struct attribute_group *acpi_nfit_attribute_groups[] = {
@@ -2055,7 +2118,7 @@ static void acpi_nfit_async_scrub(struct acpi_nfit_desc *acpi_desc,
 	unsigned int tmo = scrub_timeout;
 	int rc;
 
-	if (nfit_spa->ars_done || !nfit_spa->nd_region)
+	if (!nfit_spa->ars_required || !nfit_spa->nd_region)
 		return;
 
 	rc = ars_start(acpi_desc, nfit_spa);
@@ -2144,7 +2207,9 @@ static void acpi_nfit_scrub(struct work_struct *work)
 	 * firmware initiated scrubs to complete and then we go search for the
 	 * affected spa regions to mark them scanned.  In the second phase we
 	 * initiate a directed scrub for every range that was not scrubbed in
-	 * phase 1.
+	 * phase 1. If we're called for a 'rescan', we harmlessly pass through
+	 * the first phase, but really only care about running phase 2, where
+	 * regions can be notified of new poison.
 	 */
 
 	/* process platform firmware initiated scrubs */
@@ -2247,14 +2312,17 @@ static void acpi_nfit_scrub(struct work_struct *work)
 		 * Flag all the ranges that still need scrubbing, but
 		 * register them now to make data available.
 		 */
-		if (nfit_spa->nd_region)
-			nfit_spa->ars_done = 1;
-		else
+		if (!nfit_spa->nd_region) {
+			nfit_spa->ars_required = 1;
 			acpi_nfit_register_region(acpi_desc, nfit_spa);
+		}
 	}
 
 	list_for_each_entry(nfit_spa, &acpi_desc->spas, list)
 		acpi_nfit_async_scrub(acpi_desc, nfit_spa);
+	acpi_desc->scrub_count++;
+	if (acpi_desc->scrub_count_state)
+		sysfs_notify_dirent(acpi_desc->scrub_count_state);
 	mutex_unlock(&acpi_desc->init_mutex);
 }
 
@@ -2292,12 +2360,39 @@ static int acpi_nfit_check_deletions(struct acpi_nfit_desc *acpi_desc,
 	return 0;
 }
 
+static int acpi_nfit_desc_init_scrub_attr(struct acpi_nfit_desc *acpi_desc)
+{
+	struct device *dev = acpi_desc->dev;
+
+	if (acpi_nfit_ars_supported(acpi_desc->nvdimm_bus)) {
+		struct kernfs_node *nfit;
+		struct device *bus_dev;
+
+		bus_dev = to_nvdimm_bus_dev(acpi_desc->nvdimm_bus);
+		nfit = sysfs_get_dirent(bus_dev->kobj.sd, "nfit");
+		if (!nfit) {
+			dev_err(dev, "sysfs_get_dirent 'nfit' failed\n");
+			return -ENODEV;
+		}
+		acpi_desc->scrub_count_state = sysfs_get_dirent(nfit, "scrub");
+		sysfs_put(nfit);
+		if (!acpi_desc->scrub_count_state) {
+			dev_err(dev, "sysfs_get_dirent 'scrub' failed\n");
+			return -ENODEV;
+		}
+	}
+
+	return 0;
+}
+
 static void acpi_nfit_destruct(void *data)
 {
 	struct acpi_nfit_desc *acpi_desc = data;
 
 	acpi_desc->cancel = 1;
 	flush_workqueue(nfit_wq);
+	if (acpi_desc->scrub_count_state)
+		sysfs_put(acpi_desc->scrub_count_state);
 	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
 	acpi_desc->nvdimm_bus = NULL;
 }
@@ -2309,6 +2404,8 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *data, acpi_size sz)
 	const void *end;
 	int rc;
 
+	acpi_nfit_init_dsms(acpi_desc);
+
 	if (!acpi_desc->nvdimm_bus) {
 		acpi_desc->nvdimm_bus = nvdimm_bus_register(dev,
 				&acpi_desc->nd_desc);
@@ -2320,6 +2417,10 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *data, acpi_size sz)
 			return rc;
 	}
 
+	rc = acpi_nfit_desc_init_scrub_attr(acpi_desc);
+	if (rc)
+		return rc;
+
 	mutex_lock(&acpi_desc->init_mutex);
 
 	INIT_LIST_HEAD(&prev.spas);
@@ -2361,8 +2462,6 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *data, acpi_size sz)
 	if (rc)
 		goto out_unlock;
 
-	acpi_nfit_init_dsms(acpi_desc);
-
 	rc = acpi_nfit_register_dimms(acpi_desc);
 	if (rc)
 		goto out_unlock;
@@ -2430,6 +2529,27 @@ static int acpi_nfit_clear_to_send(struct nvdimm_bus_descriptor *nd_desc,
 	return 0;
 }
 
+static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
+{
+	struct device *dev = acpi_desc->dev;
+	struct nfit_spa *nfit_spa;
+
+	if (work_busy(&acpi_desc->work))
+		return -EBUSY;
+
+	list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+		struct acpi_nfit_system_address *spa = nfit_spa->spa;
+
+		if (nfit_spa_type(spa) != NFIT_SPA_PM)
+			continue;
+
+		nfit_spa->ars_required = 1;
+	}
+	queue_work(nfit_wq, &acpi_desc->work);
+	dev_info(dev, "%s: ars_scan triggered\n", __func__);
+	return 0;
+}
+
 void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
 {
 	struct nvdimm_bus_descriptor *nd_desc;
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 9696e7a..33fc2e9 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -80,7 +80,7 @@ enum {
 struct nfit_spa {
 	struct list_head list;
 	struct nd_region *nd_region;
-	unsigned int ars_done:1;
+	unsigned int ars_required:1;
 	u32 clear_err_unit;
 	u32 max_ars;
 	struct acpi_nfit_system_address spa[0];
@@ -148,6 +148,8 @@ struct acpi_nfit_desc {
 	struct nd_cmd_ars_status *ars_status;
 	size_t ars_status_size;
 	struct work_struct work;
+	struct kernfs_node *scrub_count_state;
+	unsigned int scrub_count;
 	unsigned int cancel:1;
 	unsigned long dimm_cmd_force_en;
 	unsigned long bus_cmd_force_en;
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index e852875..c128674 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -221,6 +221,13 @@ struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus)
 }
 EXPORT_SYMBOL_GPL(to_nd_desc);
 
+struct device *to_nvdimm_bus_dev(struct nvdimm_bus *nvdimm_bus)
+{
+	/* struct nvdimm_bus definition is private to libnvdimm */
+	return &nvdimm_bus->dev;
+}
+EXPORT_SYMBOL_GPL(to_nvdimm_bus_dev);
+
 struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev)
 {
 	struct device *dev;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 2ab869d..b519e13 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -137,6 +137,7 @@ struct nvdimm *to_nvdimm(struct device *dev);
 struct nd_region *to_nd_region(struct device *dev);
 struct nd_blk_region *to_nd_blk_region(struct device *dev);
 struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus);
+struct device *to_nvdimm_bus_dev(struct nvdimm_bus *nvdimm_bus);
 const char *nvdimm_name(struct nvdimm *nvdimm);
 unsigned long nvdimm_cmd_mask(struct nvdimm *nvdimm);
 void *nvdimm_provider_data(struct nvdimm *nvdimm);
-- 
2.7.4

* [PATCH v3 3/3] nfit: do an ARS scrub on hitting a latent media error
  2016-07-22 23:21 ` Vishal Verma
@ 2016-07-22 23:21   ` Vishal Verma
  -1 siblings, 0 replies; 8+ messages in thread
From: Vishal Verma @ 2016-07-22 23:21 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Dan Williams, Rafael J. Wysocki, Tony Luck, linux-kernel,
	linux-acpi, Vishal Verma

When a latent (unknown to 'badblocks') error is encountered, it will
trigger a machine check exception. On a system with machine check
recovery, this will only SIGBUS the process(es) which had the bad page
mapped (as opposed to a kernel panic on platforms without machine
check recovery features). In the former case, we want to trigger a full
rescan of that nvdimm bus. This will allow any additional, new errors
to be captured in the block devices' badblocks lists, and offending
operations on them can be trapped early, avoiding machine checks.

This is done by registering a callback function with the
x86_mce_decoder_chain and calling the new ars_rescan functionality with
the address in the mce notification.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/acpi/nfit.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/acpi/nfit.h |  1 +
 2 files changed, 88 insertions(+)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 6e45183..954b610 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -12,6 +12,7 @@
  */
 #include <linux/list_sort.h>
 #include <linux/libnvdimm.h>
+#include <linux/notifier.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/ndctl.h>
@@ -24,6 +25,7 @@
 #include <linux/io.h>
 #include <linux/nd.h>
 #include <asm/cacheflush.h>
+#include <asm/mce.h>
 #include "nfit.h"
 
 /*
@@ -51,6 +53,9 @@ module_param(disable_vendor_specific, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_vendor_specific,
 		"Limit commands to the publicly specified set\n");
 
+static LIST_HEAD(acpi_descs);
+static DEFINE_MUTEX(acpi_desc_lock);
+
 static struct workqueue_struct *nfit_wq;
 
 struct nfit_table_prev {
@@ -2395,13 +2400,18 @@ static void acpi_nfit_destruct(void *data)
 		sysfs_put(acpi_desc->scrub_count_state);
 	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
 	acpi_desc->nvdimm_bus = NULL;
+	mutex_lock(&acpi_desc_lock);
+	list_del(&acpi_desc->list);
+	mutex_unlock(&acpi_desc_lock);
 }
 
 int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *data, acpi_size sz)
 {
+	struct acpi_nfit_desc *acpi_desc_entry;
 	struct device *dev = acpi_desc->dev;
 	struct nfit_table_prev prev;
 	const void *end;
+	int found = 0;
 	int rc;
 
 	acpi_nfit_init_dsms(acpi_desc);
@@ -2468,6 +2478,19 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *data, acpi_size sz)
 
 	rc = acpi_nfit_register_regions(acpi_desc);
 
+	/*
+	 * We may get here due to an update of the nfit via _FIT.
+	 * Check if the acpi_desc we're (re)initializing is already
+	 * present in the list, and if so, don't re-add it
+	 */
+	mutex_lock(&acpi_desc_lock);
+	list_for_each_entry(acpi_desc_entry, &acpi_descs, list)
+		if (acpi_desc_entry == acpi_desc)
+			found = 1;
+	if (found == 0)
+		list_add_tail(&acpi_desc->list, &acpi_descs);
+	mutex_unlock(&acpi_desc_lock);
+
  out_unlock:
 	mutex_unlock(&acpi_desc->init_mutex);
 	return rc;
@@ -2550,6 +2573,65 @@ static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
 	return 0;
 }
 
+static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
+			void *data)
+{
+	struct mce *mce = (struct mce *)data;
+	struct acpi_nfit_desc *acpi_desc;
+	struct nfit_spa *nfit_spa;
+
+	/* We only care about memory errors */
+	if (!(mce->status & MCACOD))
+		return NOTIFY_DONE;
+
+	/*
+	 * mce->addr contains the physical addr accessed that caused the
+	 * machine check. We need to walk through the list of NFITs, and see
+	 * if any of them matches that address, and only then start a scrub.
+	 */
+	mutex_lock(&acpi_desc_lock);
+	list_for_each_entry(acpi_desc, &acpi_descs, list) {
+		struct device *dev = acpi_desc->dev;
+		int found_match = 0;
+
+		list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+			struct acpi_nfit_system_address *spa = nfit_spa->spa;
+
+			if (nfit_spa_type(spa) != NFIT_SPA_PM)
+				continue;
+			/* find the spa that covers the mce addr */
+			if (spa->address > mce->addr)
+				continue;
+			if ((spa->address + spa->length - 1) < mce->addr)
+				continue;
+			found_match = 1;
+			dev_dbg(dev, "%s: addr in SPA %d (0x%llx, 0x%llx)\n",
+				__func__, spa->range_index, spa->address,
+				spa->length);
+			/*
+			 * We can break at the first match because we're going
+			 * to rescan all the SPA ranges. There shouldn't be any
+			 * aliasing anyway.
+			 */
+			break;
+		}
+
+		/*
+		 * We can ignore an -EBUSY here because if an ARS is already
+		 * in progress, just let that be the last authoritative one
+		 */
+		if (found_match)
+			acpi_nfit_ars_rescan(acpi_desc);
+	}
+
+	mutex_unlock(&acpi_desc_lock);
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block nfit_mce_dec = {
+	.notifier_call	= nfit_handle_mce,
+};
+
 void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
 {
 	struct nvdimm_bus_descriptor *nd_desc;
@@ -2724,13 +2806,18 @@ static __init int nfit_init(void)
 	if (!nfit_wq)
 		return -ENOMEM;
 
+	INIT_LIST_HEAD(&acpi_descs);
+	mce_register_decode_chain(&nfit_mce_dec);
+
 	return acpi_bus_register_driver(&acpi_nfit_driver);
 }
 
 static __exit void nfit_exit(void)
 {
+	mce_unregister_decode_chain(&nfit_mce_dec);
 	acpi_bus_unregister_driver(&acpi_nfit_driver);
 	destroy_workqueue(nfit_wq);
+	WARN_ON(!list_empty(&acpi_descs));
 }
 
 module_init(nfit_init);
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 33fc2e9..a2d6c6b 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -148,6 +148,7 @@ struct acpi_nfit_desc {
 	struct nd_cmd_ars_status *ars_status;
 	size_t ars_status_size;
 	struct work_struct work;
+	struct list_head list;
 	struct kernfs_node *scrub_count_state;
 	unsigned int scrub_count;
 	unsigned int cancel:1;
-- 
2.7.4

* [PATCH v3 3/3] nfit: do an ARS scrub on hitting a latent media error
From: Vishal Verma @ 2016-07-22 23:21 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Dan Williams, Rafael J. Wysocki, Tony Luck, linux-kernel,
	linux-acpi, Vishal Verma

When a latent (unknown to 'badblocks') error is encountered, it will
trigger a machine check exception. On a system with machine check
recovery, this will only SIGBUS the process(es) which had the bad page
mapped (as opposed to a kernel panic on platforms without machine
check recovery features). In the former case, we want to trigger a full
rescan of that nvdimm bus. This will allow any additional, new errors
to be captured in the block devices' badblocks lists, and offending
operations on them can be trapped early, avoiding machine checks.

This is done by registering a callback function with the
x86_mce_decoder_chain and calling the new ars_rescan functionality with
the address in the mce notification.
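
For readers who want to try the address matching outside the kernel, the
check that nfit_handle_mce() applies to each persistent-memory SPA range
can be sketched in user space roughly as below. This is only an
illustration: struct spa_range, spa_covers() and find_covering_range()
are hypothetical names, not part of the patch, and the containment test
is written in offset form (addr - address < length) rather than the
patch's (address + length - 1 < addr) comparison, which gives the same
answer for ranges that do not wrap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the address/length fields of an NFIT SPA range. */
struct spa_range {
	uint64_t address;	/* first physical address of the range */
	uint64_t length;	/* size of the range in bytes */
};

/* True if addr lies within [address, address + length - 1]. */
static bool spa_covers(const struct spa_range *spa, uint64_t addr)
{
	if (spa->length == 0)
		return false;
	if (spa->address > addr)
		return false;
	/* Offset form: no overflow even if the range ends at UINT64_MAX. */
	return addr - spa->address < spa->length;
}

/* Index of the first range that covers addr, or -1 if none does. */
static int find_covering_range(const struct spa_range *spas, size_t n,
			       uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (spa_covers(&spas[i], addr))
			return (int)i;
	return -1;
}
```

As in the patch, the search stops at the first match, since a hit on any
one range is enough to decide that the whole bus should be rescanned.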

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/acpi/nfit.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/acpi/nfit.h |  1 +
 2 files changed, 88 insertions(+)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 6e45183..954b610 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -12,6 +12,7 @@
  */
 #include <linux/list_sort.h>
 #include <linux/libnvdimm.h>
+#include <linux/notifier.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/ndctl.h>
@@ -24,6 +25,7 @@
 #include <linux/io.h>
 #include <linux/nd.h>
 #include <asm/cacheflush.h>
+#include <asm/mce.h>
 #include "nfit.h"
 
 /*
@@ -51,6 +53,9 @@ module_param(disable_vendor_specific, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_vendor_specific,
 		"Limit commands to the publicly specified set\n");
 
+static LIST_HEAD(acpi_descs);
+static DEFINE_MUTEX(acpi_desc_lock);
+
 static struct workqueue_struct *nfit_wq;
 
 struct nfit_table_prev {
@@ -2395,13 +2400,18 @@ static void acpi_nfit_destruct(void *data)
 		sysfs_put(acpi_desc->scrub_count_state);
 	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
 	acpi_desc->nvdimm_bus = NULL;
+	mutex_lock(&acpi_desc_lock);
+	list_del(&acpi_desc->list);
+	mutex_unlock(&acpi_desc_lock);
 }
 
 int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *data, acpi_size sz)
 {
+	struct acpi_nfit_desc *acpi_desc_entry;
 	struct device *dev = acpi_desc->dev;
 	struct nfit_table_prev prev;
 	const void *end;
+	int found = 0;
 	int rc;
 
 	acpi_nfit_init_dsms(acpi_desc);
@@ -2468,6 +2478,19 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *data, acpi_size sz)
 
 	rc = acpi_nfit_register_regions(acpi_desc);
 
+	/*
+	 * We may get here due to an update of the nfit via _FIT.
+	 * Check if the acpi_desc we're (re)initializing is already
+	 * present in the list, and if so, don't re-add it
+	 */
+	mutex_lock(&acpi_desc_lock);
+	list_for_each_entry(acpi_desc_entry, &acpi_descs, list)
+		if (acpi_desc_entry == acpi_desc)
+			found = 1;
+	if (found == 0)
+		list_add_tail(&acpi_desc->list, &acpi_descs);
+	mutex_unlock(&acpi_desc_lock);
+
  out_unlock:
 	mutex_unlock(&acpi_desc->init_mutex);
 	return rc;
@@ -2550,6 +2573,65 @@ static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
 	return 0;
 }
 
+static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
+			void *data)
+{
+	struct mce *mce = (struct mce *)data;
+	struct acpi_nfit_desc *acpi_desc;
+	struct nfit_spa *nfit_spa;
+
+	/* We only care about memory errors */
+	if (!(mce->status & MCACOD))
+		return NOTIFY_DONE;
+
+	/*
+	 * mce->addr contains the physical addr accessed that caused the
+	 * machine check. We need to walk through the list of NFITs, and see
+	 * if any of them matches that address, and only then start a scrub.
+	 */
+	mutex_lock(&acpi_desc_lock);
+	list_for_each_entry(acpi_desc, &acpi_descs, list) {
+		struct device *dev = acpi_desc->dev;
+		int found_match = 0;
+
+		list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+			struct acpi_nfit_system_address *spa = nfit_spa->spa;
+
+			if (nfit_spa_type(spa) != NFIT_SPA_PM)
+				continue;
+			/* find the spa that covers the mce addr */
+			if (spa->address > mce->addr)
+				continue;
+			if ((spa->address + spa->length - 1) < mce->addr)
+				continue;
+			found_match = 1;
+			dev_dbg(dev, "%s: addr in SPA %d (0x%llx, 0x%llx)\n",
+				__func__, spa->range_index, spa->address,
+				spa->length);
+			/*
+			 * We can break at the first match because we're going
+			 * to rescan all the SPA ranges. There shouldn't be any
+			 * aliasing anyway.
+			 */
+			break;
+		}
+
+		/*
+		 * We can ignore an -EBUSY here because if an ARS is already
+		 * in progress, just let that be the last authoritative one
+		 */
+		if (found_match)
+			acpi_nfit_ars_rescan(acpi_desc);
+	}
+
+	mutex_unlock(&acpi_desc_lock);
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block nfit_mce_dec = {
+	.notifier_call	= nfit_handle_mce,
+};
+
 void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
 {
 	struct nvdimm_bus_descriptor *nd_desc;
@@ -2724,13 +2806,18 @@ static __init int nfit_init(void)
 	if (!nfit_wq)
 		return -ENOMEM;
 
+	INIT_LIST_HEAD(&acpi_descs);
+	mce_register_decode_chain(&nfit_mce_dec);
+
 	return acpi_bus_register_driver(&acpi_nfit_driver);
 }
 
 static __exit void nfit_exit(void)
 {
+	mce_unregister_decode_chain(&nfit_mce_dec);
 	acpi_bus_unregister_driver(&acpi_nfit_driver);
 	destroy_workqueue(nfit_wq);
+	WARN_ON(!list_empty(&acpi_descs));
 }
 
 module_init(nfit_init);
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 33fc2e9..a2d6c6b 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -148,6 +148,7 @@ struct acpi_nfit_desc {
 	struct nd_cmd_ars_status *ars_status;
 	size_t ars_status_size;
 	struct work_struct work;
+	struct list_head list;
 	struct kernfs_node *scrub_count_state;
 	unsigned int scrub_count;
 	unsigned int cancel:1;
-- 
2.7.4
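
The "don't re-add it" check in acpi_nfit_init() above is an add-once
registration: walk the list, and append the descriptor only if it is not
already there. A minimal user-space sketch of that idea follows. It
assumes a hand-rolled singly linked list instead of the kernel's
struct list_head, all names (struct desc, desc_list_add_once(),
desc_list_len()) are hypothetical, and it omits the locking that the
patch does with acpi_desc_lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor carrying an intrusive list link. */
struct desc {
	struct desc *next;
};

struct desc_list {
	struct desc *head;
	struct desc *tail;
};

/* Append d unless it is already on the list; true if it was added. */
static bool desc_list_add_once(struct desc_list *l, struct desc *d)
{
	for (struct desc *it = l->head; it; it = it->next)
		if (it == d)
			return false;	/* already registered, nothing to do */

	d->next = NULL;
	if (l->tail)
		l->tail->next = d;
	else
		l->head = d;
	l->tail = d;
	return true;
}

/* Number of descriptors currently on the list. */
static size_t desc_list_len(const struct desc_list *l)
{
	size_t n = 0;

	for (const struct desc *it = l->head; it; it = it->next)
		n++;
	return n;
}
```

Calling desc_list_add_once() a second time with the same descriptor
leaves the list unchanged, which is the property the _FIT update path
relies on.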
