* [PATCH v2 0/3] ARS rescanning triggered by latent errors or userspace
From: Vishal Verma @ 2016-07-21  1:50 UTC
  To: linux-nvdimm; +Cc: Tony Luck, Rafael J. Wysocki, linux-kernel, linux-acpi

Changes in v2:
- Rework the ars_done flag in nfit_spa to be ars_required, and reuse it for
  rescanning (Dan)
- Rename the ars_rescan attribute to simply 'scrub', and move into the nfit
  group since only nfit buses have this capability (Dan)
- Make the scrub attribute RW, and on reads return the number of times a
  scrub has happened since driver load. This prompted some additional
  refactoring, notably the new helpers acpi_nfit_desc_alloc_register and
  to_nvdimm_bus_dev. These are all in patch 2. (Dan)
- Remove some redundant list_empty checks in patch 3 (Dan)
- If the acpi_descs list is not empty at driver unload time, WARN() (Dan)

This series adds on-demand ARS scanning, triggered either by the
discovery of a latent media error or by a sysfs knob from userspace.

The rescanning part is easy to test using the nfit_test framework:
create a namespace (which by default has bad sectors in the middle),
clear the bad sectors by writing to them, trigger the rescan through
sysfs, and the bad sectors will reappear in
/sys/block/<pmemX>/badblocks.
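
As a rough illustration of that last step, the userspace side of the
sysfs trigger could look something like the sketch below. This is not
part of the series, and SCRUB_PATH is a placeholder for wherever the
bus's nfit/scrub attribute ends up on a given system:

/* Illustrative only: SCRUB_PATH is a placeholder */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define SCRUB_PATH "/sys/.../ndbus0/nfit/scrub"

int main(void)
{
	char count[16];
	ssize_t n;
	int fd;

	fd = open(SCRUB_PATH, O_RDWR);
	if (fd < 0)
		return 1;
	/* any write kicks off a rescan; scrub_store() ignores the buffer */
	write(fd, "1\n", 2);
	/* this count only increments once a scrub actually completes */
	n = pread(fd, count, sizeof(count) - 1, 0);
	if (n > 0) {
		count[n] = '\0';
		printf("scrubs since driver load: %s", count);
	}
	close(fd);
	return 0;
}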

For the mce handling, I've tested the notifier chain callback being
called with a mock struct mce (injected via another sysfs trigger that
is not included in this series), with the address field set to a known
address in a SPA range and the MCACOD bits set in the status field.
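
A mock along those lines might look like the sketch below; this is
illustrative only, the helper name is made up, and it assumes it lives
in nfit.c next to nfit_handle_mce():

/*
 * Rough sketch of injecting a fake mce into the nfit handler. 'spa_addr'
 * must be a physical address inside one of the NFIT SPA ranges.
 */
static void nfit_test_inject_mce(u64 spa_addr)
{
	struct mce m = {
		.addr = spa_addr,	/* lands inside an NFIT_SPA_PM range */
		.status = MCACOD,	/* passes the memory-error check */
	};

	/* same entry point the x86_mce_decoder_chain would call */
	nfit_handle_mce(NULL, 0, &m);
}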

What I haven't easily been able to test is the same callback
path with a 'real world' mce, being called as part of the
x86_mce_decoder_chain notifier. I'd therefore appreciate a
closer look at the initial filtering done in nfit_handle_mce
(patch 3/3) from Tony or anyone more familiar with mce handling.

The series is based on v4.7-rc7, and a tree is available at
https://git.kernel.org/cgit/linux/kernel/git/vishal/nvdimm.git/log/?h=ars-ondemand



Vishal Verma (3):
  pmem: clarify a debug print in pmem_clear_poison
  nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  nfit: do an ARS scrub on hitting a latent media error

 drivers/acpi/nfit.c              | 214 +++++++++++++++++++++++++++++++++++----
 drivers/acpi/nfit.h              |   5 +-
 drivers/nvdimm/core.c            |   7 ++
 drivers/nvdimm/pmem.c            |   2 +-
 include/linux/libnvdimm.h        |   1 +
 tools/testing/nvdimm/test/nfit.c |  16 +++
 6 files changed, 224 insertions(+), 21 deletions(-)

-- 
2.7.4

* [PATCH v2 1/3] pmem: clarify a debug print in pmem_clear_poison
From: Vishal Verma @ 2016-07-21  1:50 UTC
  To: linux-nvdimm; +Cc: Tony Luck, Rafael J. Wysocki, linux-kernel, linux-acpi

Prefix the sector number being cleared with a '0x' to make it clear that
this is a hex value.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/pmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 608fc44..08ac2cb 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -57,7 +57,7 @@ static void pmem_clear_poison(struct pmem_device *pmem, phys_addr_t offset,
 	cleared = nvdimm_clear_poison(dev, pmem->phys_addr + offset, len);
 
 	if (cleared > 0 && cleared / 512) {
-		dev_dbg(dev, "%s: %llx clear %ld sector%s\n",
+		dev_dbg(dev, "%s: %#llx clear %ld sector%s\n",
 				__func__, (unsigned long long) sector,
 				cleared / 512, cleared / 512 > 1 ? "s" : "");
 		badblocks_clear(&pmem->bb, sector, cleared / 512);
-- 
2.7.4

* [PATCH v2 2/3] nfit, libnvdimm: allow an ARS scrub to be triggered on demand
From: Vishal Verma @ 2016-07-21  1:50 UTC
  To: linux-nvdimm; +Cc: Tony Luck, Rafael J. Wysocki, linux-kernel, linux-acpi

Normally, an ARS (Address Range Scrub) only happens at
boot/initialization time. However, situations can arise where a
bus-wide rescan is needed - notably, when a latent media error is
discovered, we should do a full rescan to find out which other sectors
are bad, and thus potentially avoid triggering an mce on them in the
future. Also provide a sysfs trigger to start such a bus-wide scrub on
demand.
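
Since a completed scrub is signalled via sysfs_notify_dirent(),
userspace can also wait for completion by polling the attribute. A
minimal sketch (illustrative only, error handling trimmed, the
attribute path is up to the caller):

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

int wait_for_scrub_completion(const char *scrub_path)
{
	char buf[16];
	struct pollfd pfd = { .events = POLLERR | POLLPRI };

	pfd.fd = open(scrub_path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;
	/* read once to arm, then block until sysfs_notify_dirent() fires */
	read(pfd.fd, buf, sizeof(buf));
	poll(&pfd, 1, -1);
	/* re-read from offset 0 to pick up the updated scrub count */
	pread(pfd.fd, buf, sizeof(buf) - 1, 0);
	close(pfd.fd);
	return 0;
}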

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/acpi/nfit.c              | 123 +++++++++++++++++++++++++++++++++------
 drivers/acpi/nfit.h              |   4 +-
 drivers/nvdimm/core.c            |   7 +++
 include/linux/libnvdimm.h        |   1 +
 tools/testing/nvdimm/test/nfit.c |  16 +++++
 5 files changed, 131 insertions(+), 20 deletions(-)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index ac6ddcc0..4e65255 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/ndctl.h>
+#include <linux/sysfs.h>
 #include <linux/delay.h>
 #include <linux/list.h>
 #include <linux/acpi.h>
@@ -806,8 +807,41 @@ static ssize_t revision_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(revision);
 
+/*
+ * This shows the number of full Address Range Scrubs that have been
+ * completed since driver load time. Userspace can wait on this using
+ * select/poll etc.
+ */
+static ssize_t scrub_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+	struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+
+	return sprintf(buf, "%d\n", acpi_desc->scrub_count);
+}
+
+static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc);
+
+static ssize_t scrub_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t size)
+{
+	struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
+	struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
+	int rc;
+
+	rc = acpi_nfit_ars_rescan(acpi_desc);
+	if (rc)
+		return rc;
+	return size;
+}
+static DEVICE_ATTR_RW(scrub);
+
 static struct attribute *acpi_nfit_attributes[] = {
 	&dev_attr_revision.attr,
+	&dev_attr_scrub.attr,
 	NULL,
 };
 
@@ -2138,7 +2172,7 @@ static void acpi_nfit_async_scrub(struct acpi_nfit_desc *acpi_desc,
 	unsigned int tmo = scrub_timeout;
 	int rc;
 
-	if (nfit_spa->ars_done || !nfit_spa->nd_region)
+	if (!(nfit_spa->ars_required && nfit_spa->nd_region))
 		return;
 
 	rc = ars_start(acpi_desc, nfit_spa);
@@ -2227,7 +2261,9 @@ static void acpi_nfit_scrub(struct work_struct *work)
 	 * firmware initiated scrubs to complete and then we go search for the
 	 * affected spa regions to mark them scanned.  In the second phase we
 	 * initiate a directed scrub for every range that was not scrubbed in
-	 * phase 1.
+	 * phase 1. If we're called for a 'rescan', we harmlessly pass through
+	 * the first phase, but really only care about running phase 2, where
+	 * regions can be notified of new poison.
 	 */
 
 	/* process platform firmware initiated scrubs */
@@ -2330,14 +2366,17 @@ static void acpi_nfit_scrub(struct work_struct *work)
 		 * Flag all the ranges that still need scrubbing, but
 		 * register them now to make data available.
 		 */
-		if (nfit_spa->nd_region)
-			nfit_spa->ars_done = 1;
-		else
+		if (!nfit_spa->nd_region) {
+			nfit_spa->ars_required = 1;
 			acpi_nfit_register_region(acpi_desc, nfit_spa);
+		}
 	}
 
 	list_for_each_entry(nfit_spa, &acpi_desc->spas, list)
 		acpi_nfit_async_scrub(acpi_desc, nfit_spa);
+	acpi_desc->scrub_count++;
+	if (acpi_desc->scrub_count_state)
+		sysfs_notify_dirent(acpi_desc->scrub_count_state);
 	mutex_unlock(&acpi_desc->init_mutex);
 }
 
@@ -2495,6 +2534,27 @@ static int acpi_nfit_clear_to_send(struct nvdimm_bus_descriptor *nd_desc,
 	return 0;
 }
 
+static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
+{
+	struct device *dev = acpi_desc->dev;
+	struct nfit_spa *nfit_spa;
+
+	if (work_busy(&acpi_desc->work))
+		return -EBUSY;
+
+	list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+		struct acpi_nfit_system_address *spa = nfit_spa->spa;
+
+		if (nfit_spa_type(spa) != NFIT_SPA_PM)
+			continue;
+
+		nfit_spa->ars_required = 1;
+	}
+	queue_work(nfit_wq, &acpi_desc->work);
+	dev_info(dev, "%s: ars_scan triggered\n", __func__);
+	return 0;
+}
+
 void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
 {
 	struct nvdimm_bus_descriptor *nd_desc;
@@ -2523,6 +2583,37 @@ void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(acpi_nfit_desc_init);
 
+static struct acpi_nfit_desc *acpi_nfit_desc_alloc_register(struct device *dev)
+{
+	struct acpi_nfit_desc *acpi_desc;
+	struct kernfs_node *nfit;
+	struct device *bus_dev;
+
+	acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
+	if (!acpi_desc)
+		return ERR_PTR(-ENOMEM);
+
+	acpi_nfit_desc_init(acpi_desc, dev);
+
+	acpi_desc->nvdimm_bus = nvdimm_bus_register(dev, &acpi_desc->nd_desc);
+	if (!acpi_desc->nvdimm_bus)
+		return ERR_PTR(-ENOMEM);
+
+	bus_dev = to_nvdimm_bus_dev(acpi_desc->nvdimm_bus);
+	nfit = sysfs_get_dirent(bus_dev->kobj.sd, "nfit");
+	if (!nfit) {
+		dev_err(dev, "sysfs_get_dirent 'nfit' failed\n");
+		return ERR_PTR(-ENODEV);
+	}
+	acpi_desc->scrub_count_state = sysfs_get_dirent(nfit, "scrub");
+	if (!acpi_desc->scrub_count_state) {
+		dev_err(dev, "sysfs_get_dirent 'scrub' failed\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	return acpi_desc;
+}
+
 static int acpi_nfit_add(struct acpi_device *adev)
 {
 	struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
@@ -2540,13 +2631,9 @@ static int acpi_nfit_add(struct acpi_device *adev)
 		return 0;
 	}
 
-	acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
-	if (!acpi_desc)
-		return -ENOMEM;
-	acpi_nfit_desc_init(acpi_desc, &adev->dev);
-	acpi_desc->nvdimm_bus = nvdimm_bus_register(dev, &acpi_desc->nd_desc);
-	if (!acpi_desc->nvdimm_bus)
-		return -ENOMEM;
+	acpi_desc = acpi_nfit_desc_alloc_register(dev);
+	if (IS_ERR(acpi_desc))
+		return PTR_ERR(acpi_desc);
 
 	/*
 	 * Save the acpi header for later and then skip it,
@@ -2587,6 +2674,7 @@ static int acpi_nfit_remove(struct acpi_device *adev)
 
 	acpi_desc->cancel = 1;
 	flush_workqueue(nfit_wq);
+	sysfs_put(acpi_desc->scrub_count_state);
 	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
 	return 0;
 }
@@ -2611,13 +2699,10 @@ static void acpi_nfit_notify(struct acpi_device *adev, u32 event)
 	}
 
 	if (!acpi_desc) {
-		acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
-		if (!acpi_desc)
-			goto out_unlock;
-		acpi_nfit_desc_init(acpi_desc, &adev->dev);
-		acpi_desc->nvdimm_bus = nvdimm_bus_register(dev, &acpi_desc->nd_desc);
-		if (!acpi_desc->nvdimm_bus)
-			goto out_unlock;
+		acpi_desc = acpi_nfit_desc_alloc_register(dev);
+		if (IS_ERR(acpi_desc))
+			dev_err(dev, "%s: failed to alloc acpi_desc (%ld)\n",
+				__func__, PTR_ERR(acpi_desc));
 	} else {
 		/*
 		 * Finish previous registration before considering new
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 02b9ea1..954d2aa 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -77,7 +77,7 @@ struct nfit_spa {
 	struct acpi_nfit_system_address *spa;
 	struct list_head list;
 	struct nd_region *nd_region;
-	unsigned int ars_done:1;
+	unsigned int ars_required:1;
 	u32 clear_err_unit;
 	u32 max_ars;
 };
@@ -146,6 +146,8 @@ struct acpi_nfit_desc {
 	struct nd_cmd_ars_status *ars_status;
 	size_t ars_status_size;
 	struct work_struct work;
+	struct kernfs_node *scrub_count_state;
+	unsigned int scrub_count;
 	unsigned int cancel:1;
 	unsigned long dimm_cmd_force_en;
 	unsigned long bus_cmd_force_en;
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index be89764..d81db3ac 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -99,6 +99,13 @@ struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus)
 }
 EXPORT_SYMBOL_GPL(to_nd_desc);
 
+struct device *to_nvdimm_bus_dev(struct nvdimm_bus *nvdimm_bus)
+{
+	/* struct nvdimm_bus definition is private to libnvdimm */
+	return &nvdimm_bus->dev;
+}
+EXPORT_SYMBOL_GPL(to_nvdimm_bus_dev);
+
 struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev)
 {
 	struct device *dev;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 0c3c30c..27cecc2 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -129,6 +129,7 @@ struct nvdimm *to_nvdimm(struct device *dev);
 struct nd_region *to_nd_region(struct device *dev);
 struct nd_blk_region *to_nd_blk_region(struct device *dev);
 struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus);
+struct device *to_nvdimm_bus_dev(struct nvdimm_bus *nvdimm_bus);
 const char *nvdimm_name(struct nvdimm *nvdimm);
 unsigned long nvdimm_cmd_mask(struct nvdimm *nvdimm);
 void *nvdimm_provider_data(struct nvdimm *nvdimm);
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index c919866..74231de 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -20,6 +20,7 @@
 #include <linux/mutex.h>
 #include <linux/ndctl.h>
 #include <linux/sizes.h>
+#include <linux/sysfs.h>
 #include <linux/list.h>
 #include <linux/slab.h>
 #include <nfit.h>
@@ -1409,6 +1410,8 @@ static int nfit_test_probe(struct platform_device *pdev)
 	struct acpi_nfit_desc *acpi_desc;
 	struct device *dev = &pdev->dev;
 	struct nfit_test *nfit_test;
+	struct kernfs_node *nfit;
+	struct device *bus_dev;
 	int rc;
 
 	nfit_test = to_nfit_test(&pdev->dev);
@@ -1471,6 +1474,18 @@ static int nfit_test_probe(struct platform_device *pdev)
 	if (!acpi_desc->nvdimm_bus)
 		return -ENXIO;
 
+	bus_dev = to_nvdimm_bus_dev(acpi_desc->nvdimm_bus);
+	nfit = sysfs_get_dirent(bus_dev->kobj.sd, "nfit");
+	if (!nfit) {
+		dev_err(dev, "sysfs_get_dirent 'nfit' failed\n");
+		return -ENODEV;
+	}
+	acpi_desc->scrub_count_state = sysfs_get_dirent(nfit, "scrub");
+	if (!acpi_desc->scrub_count_state) {
+		dev_err(dev, "sysfs_get_dirent 'scrub' failed\n");
+		return -ENODEV;
+	}
+
 	rc = acpi_nfit_init(acpi_desc, nfit_test->nfit_size);
 	if (rc) {
 		nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
@@ -1497,6 +1512,7 @@ static int nfit_test_remove(struct platform_device *pdev)
 	struct nfit_test *nfit_test = to_nfit_test(&pdev->dev);
 	struct acpi_nfit_desc *acpi_desc = &nfit_test->acpi_desc;
 
+	sysfs_put(acpi_desc->scrub_count_state);
 	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
 
 	return 0;
-- 
2.7.4

* [PATCH v2 3/3] nfit: do an ARS scrub on hitting a latent media error
From: Vishal Verma @ 2016-07-21  1:50 UTC
  To: linux-nvdimm; +Cc: Tony Luck, Rafael J. Wysocki, linux-kernel, linux-acpi

When a latent (unknown to 'badblocks') error is encountered, it will
trigger a machine check exception. On a system with machine check
recovery, this will only SIGBUS the process(es) which had the bad page
mapped (as opposed to a kernel panic on platforms without machine
check recovery features). In the former case, we want to trigger a full
rescan of that nvdimm bus. This will allow any additional, new errors
to be captured in the block devices' badblocks lists, and offending
operations on them can be trapped early, avoiding machine checks.

This is done by registering a callback function with the
x86_mce_decoder_chain and calling the new ars_rescan functionality with
the address in the mce notification.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/acpi/nfit.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/acpi/nfit.h |  1 +
 2 files changed, 90 insertions(+)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 4e65255..c9f1ee4 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -12,6 +12,7 @@
  */
 #include <linux/list_sort.h>
 #include <linux/libnvdimm.h>
+#include <linux/notifier.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/ndctl.h>
@@ -24,6 +25,7 @@
 #include <linux/io.h>
 #include <linux/nd.h>
 #include <asm/cacheflush.h>
+#include <asm/mce.h>
 #include "nfit.h"
 
 /*
@@ -51,6 +53,9 @@ module_param(disable_vendor_specific, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_vendor_specific,
 		"Limit commands to the publicly specified set\n");
 
+static LIST_HEAD(acpi_descs);
+static DEFINE_MUTEX(acpi_desc_lock);
+
 static struct workqueue_struct *nfit_wq;
 
 struct nfit_table_prev {
@@ -2416,9 +2421,11 @@ static int acpi_nfit_check_deletions(struct acpi_nfit_desc *acpi_desc,
 
 int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
 {
+	struct acpi_nfit_desc *acpi_desc_entry;
 	struct device *dev = acpi_desc->dev;
 	struct nfit_table_prev prev;
 	const void *end;
+	int found = 0;
 	u8 *data;
 	int rc;
 
@@ -2473,6 +2480,19 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
 
 	rc = acpi_nfit_register_regions(acpi_desc);
 
+	/*
+	 * We may get here due to an update of the nfit via _FIT.
+	 * Check if the acpi_desc we're (re)initializing is already
+	 * present in the list, and if so, don't re-add it
+	 */
+	mutex_lock(&acpi_desc_lock);
+	list_for_each_entry(acpi_desc_entry, &acpi_descs, list)
+		if (acpi_desc_entry == acpi_desc)
+			found = 1;
+	if (found == 0)
+		list_add_tail(&acpi_desc->list, &acpi_descs);
+	mutex_unlock(&acpi_desc_lock);
+
  out_unlock:
 	mutex_unlock(&acpi_desc->init_mutex);
 	return rc;
@@ -2555,6 +2575,65 @@ static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
 	return 0;
 }
 
+static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
+			void *data)
+{
+	struct mce *mce = (struct mce *)data;
+	struct acpi_nfit_desc *acpi_desc;
+	struct nfit_spa *nfit_spa;
+
+	/* We only care about memory errors */
+	if (!(mce->status & MCACOD))
+		return NOTIFY_DONE;
+
+	/*
+	 * mce->addr contains the physical addr accessed that caused the
+	 * machine check. We need to walk through the list of NFITs, and see
+	 * if any of them matches that address, and only then start a scrub.
+	 */
+	mutex_lock(&acpi_desc_lock);
+	list_for_each_entry(acpi_desc, &acpi_descs, list) {
+		struct device *dev = acpi_desc->dev;
+		int found_match = 0;
+
+		list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+			struct acpi_nfit_system_address *spa = nfit_spa->spa;
+
+			if (nfit_spa_type(spa) != NFIT_SPA_PM)
+				continue;
+			/* find the spa that covers the mce addr */
+			if (spa->address > mce->addr)
+				continue;
+			if ((spa->address + spa->length - 1) < mce->addr)
+				continue;
+			found_match = 1;
+			dev_dbg(dev, "%s: addr in SPA %d (0x%llx, 0x%llx)\n",
+				__func__, spa->range_index, spa->address,
+				spa->length);
+			/*
+			 * We can break at the first match because we're going
+			 * to rescan all the SPA ranges. There shouldn't be any
+			 * aliasing anyway.
+			 */
+			break;
+		}
+
+		/*
+		 * We can ignore an -EBUSY here because if an ARS is already
+		 * in progress, just let that be the last authoritative one
+		 */
+		if (found_match)
+			acpi_nfit_ars_rescan(acpi_desc);
+	}
+
+	mutex_unlock(&acpi_desc_lock);
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block nfit_mce_dec = {
+	.notifier_call	= nfit_handle_mce,
+};
+
 void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
 {
 	struct nvdimm_bus_descriptor *nd_desc;
@@ -2676,6 +2755,9 @@ static int acpi_nfit_remove(struct acpi_device *adev)
 	flush_workqueue(nfit_wq);
 	sysfs_put(acpi_desc->scrub_count_state);
 	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
+	mutex_lock(&acpi_desc_lock);
+	list_del(&acpi_desc->list);
+	mutex_unlock(&acpi_desc_lock);
 	return 0;
 }
 
@@ -2782,13 +2864,20 @@ static __init int nfit_init(void)
 	if (!nfit_wq)
 		return -ENOMEM;
 
+	INIT_LIST_HEAD(&acpi_descs);
+	mce_register_decode_chain(&nfit_mce_dec);
+
 	return acpi_bus_register_driver(&acpi_nfit_driver);
 }
 
 static __exit void nfit_exit(void)
 {
+	mce_unregister_decode_chain(&nfit_mce_dec);
 	acpi_bus_unregister_driver(&acpi_nfit_driver);
 	destroy_workqueue(nfit_wq);
+	mutex_lock(&acpi_desc_lock);
+	WARN_ON(!list_empty(&acpi_descs));
+	mutex_unlock(&acpi_desc_lock);
 }
 
 module_init(nfit_init);
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 954d2aa..d2aec6f 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -146,6 +146,7 @@ struct acpi_nfit_desc {
 	struct nd_cmd_ars_status *ars_status;
 	size_t ars_status_size;
 	struct work_struct work;
+	struct list_head list;
 	struct kernfs_node *scrub_count_state;
 	unsigned int scrub_count;
 	unsigned int cancel:1;
-- 
2.7.4

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 3/3] nfit: do an ARS scrub on hitting a latent media error
@ 2016-07-21  1:50   ` Vishal Verma
  0 siblings, 0 replies; 39+ messages in thread
From: Vishal Verma @ 2016-07-21  1:50 UTC (permalink / raw)
  To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw
  Cc: Tony Luck, Rafael J. Wysocki,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA

When a latent (unknown to 'badblocks') error is encountered, it will
trigger a machine check exception. On a system with machine check
recovery, this will only SIGBUS the process(es) which had the bad page
mapped (as opposed to a kernel panic on platforms without machine
check recovery features). In the former case, we want to trigger a full
rescan of that nvdimm bus. This will allow any additional, new errors
to be captured in the block devices' badblocks lists, and offending
operations on them can be trapped early, avoiding machine checks.

This is done by registering a callback function with the
x86_mce_decoder_chain and calling the new ars_rescan functionality with
the address in the mce notificatiion.

Cc: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Tony Luck <tony.luck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: <linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Cc: <linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org>
Signed-off-by: Vishal Verma <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/acpi/nfit.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/acpi/nfit.h |  1 +
 2 files changed, 90 insertions(+)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 4e65255..c9f1ee4 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -12,6 +12,7 @@
  */
 #include <linux/list_sort.h>
 #include <linux/libnvdimm.h>
+#include <linux/notifier.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/ndctl.h>
@@ -24,6 +25,7 @@
 #include <linux/io.h>
 #include <linux/nd.h>
 #include <asm/cacheflush.h>
+#include <asm/mce.h>
 #include "nfit.h"
 
 /*
@@ -51,6 +53,9 @@ module_param(disable_vendor_specific, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_vendor_specific,
 		"Limit commands to the publicly specified set\n");
 
+static LIST_HEAD(acpi_descs);
+static DEFINE_MUTEX(acpi_desc_lock);
+
 static struct workqueue_struct *nfit_wq;
 
 struct nfit_table_prev {
@@ -2416,9 +2421,11 @@ static int acpi_nfit_check_deletions(struct acpi_nfit_desc *acpi_desc,
 
 int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
 {
+	struct acpi_nfit_desc *acpi_desc_entry;
 	struct device *dev = acpi_desc->dev;
 	struct nfit_table_prev prev;
 	const void *end;
+	int found = 0;
 	u8 *data;
 	int rc;
 
@@ -2473,6 +2480,19 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
 
 	rc = acpi_nfit_register_regions(acpi_desc);
 
+	/*
+	 * We may get here due to an update of the nfit via _FIT.
+	 * Check if the acpi_desc we're (re)initializing is already
+	 * present in the list, and if so, don't re-add it
+	 */
+	mutex_lock(&acpi_desc_lock);
+	list_for_each_entry(acpi_desc_entry, &acpi_descs, list)
+		if (acpi_desc_entry == acpi_desc)
+			found = 1;
+	if (found == 0)
+		list_add_tail(&acpi_desc->list, &acpi_descs);
+	mutex_unlock(&acpi_desc_lock);
+
  out_unlock:
 	mutex_unlock(&acpi_desc->init_mutex);
 	return rc;
@@ -2555,6 +2575,65 @@ static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
 	return 0;
 }
 
+static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
+			void *data)
+{
+	struct mce *mce = (struct mce *)data;
+	struct acpi_nfit_desc *acpi_desc;
+	struct nfit_spa *nfit_spa;
+
+	/* We only care about memory errors */
+	if (!(mce->status & MCACOD))
+		return NOTIFY_DONE;
+
+	/*
+	 * mce->addr contains the physical addr accessed that caused the
+	 * machine check. We need to walk through the list of NFITs, and see
+	 * if any of them matches that address, and only then start a scrub.
+	 */
+	mutex_lock(&acpi_desc_lock);
+	list_for_each_entry(acpi_desc, &acpi_descs, list) {
+		struct device *dev = acpi_desc->dev;
+		int found_match = 0;
+
+		list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
+			struct acpi_nfit_system_address *spa = nfit_spa->spa;
+
+			if (nfit_spa_type(spa) != NFIT_SPA_PM)
+				continue;
+			/* find the spa that covers the mce addr */
+			if (spa->address > mce->addr)
+				continue;
+			if ((spa->address + spa->length - 1) < mce->addr)
+				continue;
+			found_match = 1;
+			dev_dbg(dev, "%s: addr in SPA %d (0x%llx, 0x%llx)\n",
+				__func__, spa->range_index, spa->address,
+				spa->length);
+			/*
+			 * We can break at the first match because we're going
+			 * to rescan all the SPA ranges. There shouldn't be any
+			 * aliasing anyway.
+			 */
+			break;
+		}
+
+		/*
+		 * We can ignore an -EBUSY here because if an ARS is already
+		 * in progress, just let that be the last authoritative one
+		 */
+		if (found_match)
+			acpi_nfit_ars_rescan(acpi_desc);
+	}
+
+	mutex_unlock(&acpi_desc_lock);
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block nfit_mce_dec = {
+	.notifier_call	= nfit_handle_mce,
+};
+
 void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
 {
 	struct nvdimm_bus_descriptor *nd_desc;
@@ -2676,6 +2755,9 @@ static int acpi_nfit_remove(struct acpi_device *adev)
 	flush_workqueue(nfit_wq);
 	sysfs_put(acpi_desc->scrub_count_state);
 	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
+	mutex_lock(&acpi_desc_lock);
+	list_del(&acpi_desc->list);
+	mutex_unlock(&acpi_desc_lock);
 	return 0;
 }
 
@@ -2782,13 +2864,20 @@ static __init int nfit_init(void)
 	if (!nfit_wq)
 		return -ENOMEM;
 
+	INIT_LIST_HEAD(&acpi_descs);
+	mce_register_decode_chain(&nfit_mce_dec);
+
 	return acpi_bus_register_driver(&acpi_nfit_driver);
 }
 
 static __exit void nfit_exit(void)
 {
+	mce_unregister_decode_chain(&nfit_mce_dec);
 	acpi_bus_unregister_driver(&acpi_nfit_driver);
 	destroy_workqueue(nfit_wq);
+	mutex_lock(&acpi_desc_lock);
+	WARN_ON(!list_empty(&acpi_descs));
+	mutex_unlock(&acpi_desc_lock);
 }
 
 module_init(nfit_init);
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index 954d2aa..d2aec6f 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -146,6 +146,7 @@ struct acpi_nfit_desc {
 	struct nd_cmd_ars_status *ars_status;
 	size_t ars_status_size;
 	struct work_struct work;
+	struct list_head list;
 	struct kernfs_node *scrub_count_state;
 	unsigned int scrub_count;
 	unsigned int cancel:1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  2016-07-21  1:50   ` Vishal Verma
@ 2016-07-21 15:56     ` Dan Williams
  -1 siblings, 0 replies; 39+ messages in thread
From: Dan Williams @ 2016-07-21 15:56 UTC (permalink / raw)
  To: Vishal Verma
  Cc: linux-kernel, Rafael J. Wysocki, Linux ACPI, Tony Luck, linux-nvdimm

On Wed, Jul 20, 2016 at 6:50 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> Normally, an ARS (Address Range Scrub) only happens at
> boot/initialization time. There can however arise situations where a
> bus-wide rescan is needed - notably, in the case of discovering a latent
> media error, we should do a full rescan to figure out what other sectors
> are bad, and thus potentially avoid triggering an mce on them in the
> future. Also provide a sysfs trigger to start a bus-wide scrub.
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: <linux-acpi@vger.kernel.org>
> Cc: <linux-nvdimm@lists.01.org>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  drivers/acpi/nfit.c              | 123 +++++++++++++++++++++++++++++++++------
>  drivers/acpi/nfit.h              |   4 +-
>  drivers/nvdimm/core.c            |   7 +++
>  include/linux/libnvdimm.h        |   1 +
>  tools/testing/nvdimm/test/nfit.c |  16 +++++
>  5 files changed, 131 insertions(+), 20 deletions(-)
>

Looks good, just a couple nits:

[..]
> @@ -2138,7 +2172,7 @@ static void acpi_nfit_async_scrub(struct acpi_nfit_desc *acpi_desc,
>         unsigned int tmo = scrub_timeout;
>         int rc;
>
> -       if (nfit_spa->ars_done || !nfit_spa->nd_region)
> +       if (!(nfit_spa->ars_required && nfit_spa->nd_region))
>                 return;

Why is nd_region part of this check?  Can't this just be:

    if (!nfit_spa->ars_required)
        return;

[..]
>
> +static struct acpi_nfit_desc *acpi_nfit_desc_alloc_register(struct device *dev)
> +{
> +       struct acpi_nfit_desc *acpi_desc;
> +       struct kernfs_node *nfit;
> +       struct device *bus_dev;
> +
> +       acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
> +       if (!acpi_desc)
> +               return ERR_PTR(-ENOMEM);
> +
> +       acpi_nfit_desc_init(acpi_desc, dev);
> +
> +       acpi_desc->nvdimm_bus = nvdimm_bus_register(dev, &acpi_desc->nd_desc);
> +       if (!acpi_desc->nvdimm_bus)
> +               return ERR_PTR(-ENOMEM);
> +
> +       bus_dev = to_nvdimm_bus_dev(acpi_desc->nvdimm_bus);
> +       nfit = sysfs_get_dirent(bus_dev->kobj.sd, "nfit");
> +       if (!nfit) {
> +               dev_err(dev, "sysfs_get_dirent 'nfit' failed\n");
> +               return ERR_PTR(-ENODEV);
> +       }
> +       acpi_desc->scrub_count_state = sysfs_get_dirent(nfit, "scrub");

Missing sysfs_put(nfit) here?
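
Presumably something along these lines (just sketching the shape of the
fix, not an actual follow-up patch):

    acpi_desc->scrub_count_state = sysfs_get_dirent(nfit, "scrub");
    sysfs_put(nfit);        /* the 'scrub' node now holds its own reference */
    if (!acpi_desc->scrub_count_state) {
            dev_err(dev, "sysfs_get_dirent 'scrub' failed\n");
            return ERR_PTR(-ENODEV);
    }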

> +       if (!acpi_desc->scrub_count_state) {
> +               dev_err(dev, "sysfs_get_dirent 'scrub' failed\n");
> +               return ERR_PTR(-ENODEV);
> +       }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  2016-07-21 15:56     ` Dan Williams
@ 2016-07-21 18:07       ` Vishal Verma
  -1 siblings, 0 replies; 39+ messages in thread
From: Vishal Verma @ 2016-07-21 18:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-kernel, Rafael J. Wysocki, Linux ACPI, Tony Luck, linux-nvdimm

On 07/21, Dan Williams wrote:
> On Wed, Jul 20, 2016 at 6:50 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> > Normally, an ARS (Address Range Scrub) only happens at
> > boot/initialization time. There can however arise situations where a
> > bus-wide rescan is needed - notably, in the case of discovering a latent
> > media error, we should do a full rescan to figure out what other sectors
> > are bad, and thus potentially avoid triggering an mce on them in the
> > future. Also provide a sysfs trigger to start a bus-wide scrub.
> >
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: <linux-acpi@vger.kernel.org>
> > Cc: <linux-nvdimm@lists.01.org>
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> >  drivers/acpi/nfit.c              | 123 +++++++++++++++++++++++++++++++++------
> >  drivers/acpi/nfit.h              |   4 +-
> >  drivers/nvdimm/core.c            |   7 +++
> >  include/linux/libnvdimm.h        |   1 +
> >  tools/testing/nvdimm/test/nfit.c |  16 +++++
> >  5 files changed, 131 insertions(+), 20 deletions(-)
> >
> 
> Looks good, just a couple nits:
> 
> [..]
> > @@ -2138,7 +2172,7 @@ static void acpi_nfit_async_scrub(struct acpi_nfit_desc *acpi_desc,
> >         unsigned int tmo = scrub_timeout;
> >         int rc;
> >
> > -       if (nfit_spa->ars_done || !nfit_spa->nd_region)
> > +       if (!(nfit_spa->ars_required && nfit_spa->nd_region))
> >                 return;
> 
> Why is nd_region part of this check?  Can't this just be:
> 
>     if (!nfit_spa->ars_requested)
>         return;
> 
> [..]

This check was there previously too - I think we should always have an
nd_region by the time we get here, and if we don't, that's a kernel bug.
So we could just BUG_ON if that happens; without a valid nd_region we
would oops anyway when we go to call nvdimm_region_notify.

I'll change it to a BUG_ON.
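
Roughly (a sketch of the intended change, not the final patch):

    if (!nfit_spa->ars_required)
            return;

    /* a missing nd_region at this point would be a driver bug */
    BUG_ON(!nfit_spa->nd_region);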

> >
> > +static struct acpi_nfit_desc *acpi_nfit_desc_alloc_register(struct device *dev)
> > +{
> > +       struct acpi_nfit_desc *acpi_desc;
> > +       struct kernfs_node *nfit;
> > +       struct device *bus_dev;
> > +
> > +       acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
> > +       if (!acpi_desc)
> > +               return ERR_PTR(-ENOMEM);
> > +
> > +       acpi_nfit_desc_init(acpi_desc, dev);
> > +
> > +       acpi_desc->nvdimm_bus = nvdimm_bus_register(dev, &acpi_desc->nd_desc);
> > +       if (!acpi_desc->nvdimm_bus)
> > +               return ERR_PTR(-ENOMEM);
> > +
> > +       bus_dev = to_nvdimm_bus_dev(acpi_desc->nvdimm_bus);
> > +       nfit = sysfs_get_dirent(bus_dev->kobj.sd, "nfit");
> > +       if (!nfit) {
> > +               dev_err(dev, "sysfs_get_dirent 'nfit' failed\n");
> > +               return ERR_PTR(-ENODEV);
> > +       }
> > +       acpi_desc->scrub_count_state = sysfs_get_dirent(nfit, "scrub");
> 
> Missing sysfs_put(nfit) here?

Yes, good catch! I'll fixup.
> 
> > +       if (!acpi_desc->scrub_count_state) {
> > +               dev_err(dev, "sysfs_get_dirent 'scrub' failed\n");
> > +               return ERR_PTR(-ENODEV);
> > +       }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] nfit, libnvdimm: allow an ARS scrub to be triggered on demand
@ 2016-07-21 19:40     ` Linda Knippers
  0 siblings, 0 replies; 39+ messages in thread
From: Linda Knippers @ 2016-07-21 19:40 UTC (permalink / raw)
  To: Vishal Verma, linux-nvdimm
  Cc: Tony Luck, Rafael J. Wysocki, linux-kernel, linux-acpi

On 07/20/2016 09:50 PM, Vishal Verma wrote:
> Normally, an ARS (Address Range Scrub) only happens at
> boot/initialization time. There can however arise situations where a
> bus-wide rescan is needed - notably, in the case of discovering a latent
> media error, we should do a full rescan to figure out what other sectors
> are bad, and thus potentially avoid triggering an mce on them in the
> future. Also provide a sysfs trigger to start a bus-wide scrub.

I don't see anything in here that checks to see if the platform actually
supports ARS before setting all this stuff up.  Setting up an MCE handler
and exposing a sysfs trigger for something that is optional and perhaps
not implemented doesn't seem helpful.  Or is there a check that I missed?
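
For illustration only - one way such a gate might look, assuming the ARS
commands are reflected in the bus descriptor's command mask (the field
and flag names below are my assumptions, not something taken from this
patch):

    static bool ars_supported(struct nvdimm_bus_descriptor *nd_desc)
    {
            /* all three ARS commands need to be advertised by the platform */
            return test_bit(ND_CMD_ARS_CAP, &nd_desc->cmd_mask) &&
                    test_bit(ND_CMD_ARS_START, &nd_desc->cmd_mask) &&
                    test_bit(ND_CMD_ARS_STATUS, &nd_desc->cmd_mask);
    }

acpi_nfit_ars_rescan() (and the sysfs 'scrub' store) could then bail out
with something like -EOPNOTSUPP when that returns false.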

-- ljk

> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: <linux-acpi@vger.kernel.org>
> Cc: <linux-nvdimm@lists.01.org>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  drivers/acpi/nfit.c              | 123 +++++++++++++++++++++++++++++++++------
>  drivers/acpi/nfit.h              |   4 +-
>  drivers/nvdimm/core.c            |   7 +++
>  include/linux/libnvdimm.h        |   1 +
>  tools/testing/nvdimm/test/nfit.c |  16 +++++
>  5 files changed, 131 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
> index ac6ddcc0..4e65255 100644
> --- a/drivers/acpi/nfit.c
> +++ b/drivers/acpi/nfit.c
> @@ -15,6 +15,7 @@
>  #include <linux/module.h>
>  #include <linux/mutex.h>
>  #include <linux/ndctl.h>
> +#include <linux/sysfs.h>
>  #include <linux/delay.h>
>  #include <linux/list.h>
>  #include <linux/acpi.h>
> @@ -806,8 +807,41 @@ static ssize_t revision_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(revision);
>  
> +/*
> + * This shows the number of full Address Range Scrubs that have been
> + * completed since driver load time. Userspace can wait on this using
> + * select/poll etc.
> + */
> +static ssize_t scrub_show(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +	struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
> +	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
> +	struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
> +
> +	return sprintf(buf, "%d\n", acpi_desc->scrub_count);
> +}
> +
> +static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc);
> +
> +static ssize_t scrub_store(struct device *dev,
> +		struct device_attribute *attr, const char *buf, size_t size)
> +{
> +	struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
> +	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
> +	struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
> +	int rc;
> +
> +	rc = acpi_nfit_ars_rescan(acpi_desc);
> +	if (rc)
> +		return rc;
> +	return size;
> +}
> +static DEVICE_ATTR_RW(scrub);
> +
>  static struct attribute *acpi_nfit_attributes[] = {
>  	&dev_attr_revision.attr,
> +	&dev_attr_scrub.attr,
>  	NULL,
>  };
>  
> @@ -2138,7 +2172,7 @@ static void acpi_nfit_async_scrub(struct acpi_nfit_desc *acpi_desc,
>  	unsigned int tmo = scrub_timeout;
>  	int rc;
>  
> -	if (nfit_spa->ars_done || !nfit_spa->nd_region)
> +	if (!(nfit_spa->ars_required && nfit_spa->nd_region))
>  		return;
>  
>  	rc = ars_start(acpi_desc, nfit_spa);
> @@ -2227,7 +2261,9 @@ static void acpi_nfit_scrub(struct work_struct *work)
>  	 * firmware initiated scrubs to complete and then we go search for the
>  	 * affected spa regions to mark them scanned.  In the second phase we
>  	 * initiate a directed scrub for every range that was not scrubbed in
> -	 * phase 1.
> +	 * phase 1. If we're called for a 'rescan', we harmlessly pass through
> +	 * the first phase, but really only care about running phase 2, where
> +	 * regions can be notified of new poison.
>  	 */
>  
>  	/* process platform firmware initiated scrubs */
> @@ -2330,14 +2366,17 @@ static void acpi_nfit_scrub(struct work_struct *work)
>  		 * Flag all the ranges that still need scrubbing, but
>  		 * register them now to make data available.
>  		 */
> -		if (nfit_spa->nd_region)
> -			nfit_spa->ars_done = 1;
> -		else
> +		if (!nfit_spa->nd_region) {
> +			nfit_spa->ars_required = 1;
>  			acpi_nfit_register_region(acpi_desc, nfit_spa);
> +		}
>  	}
>  
>  	list_for_each_entry(nfit_spa, &acpi_desc->spas, list)
>  		acpi_nfit_async_scrub(acpi_desc, nfit_spa);
> +	acpi_desc->scrub_count++;
> +	if (acpi_desc->scrub_count_state)
> +		sysfs_notify_dirent(acpi_desc->scrub_count_state);
>  	mutex_unlock(&acpi_desc->init_mutex);
>  }
>  
> @@ -2495,6 +2534,27 @@ static int acpi_nfit_clear_to_send(struct nvdimm_bus_descriptor *nd_desc,
>  	return 0;
>  }
>  
> +static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
> +{
> +	struct device *dev = acpi_desc->dev;
> +	struct nfit_spa *nfit_spa;
> +
> +	if (work_busy(&acpi_desc->work))
> +		return -EBUSY;
> +
> +	list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
> +		struct acpi_nfit_system_address *spa = nfit_spa->spa;
> +
> +		if (nfit_spa_type(spa) != NFIT_SPA_PM)
> +			continue;
> +
> +		nfit_spa->ars_required = 1;
> +	}
> +	queue_work(nfit_wq, &acpi_desc->work);
> +	dev_info(dev, "%s: ars_scan triggered\n", __func__);
> +	return 0;
> +}
> +
>  void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
>  {
>  	struct nvdimm_bus_descriptor *nd_desc;
> @@ -2523,6 +2583,37 @@ void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(acpi_nfit_desc_init);
>  
> +static struct acpi_nfit_desc *acpi_nfit_desc_alloc_register(struct device *dev)
> +{
> +	struct acpi_nfit_desc *acpi_desc;
> +	struct kernfs_node *nfit;
> +	struct device *bus_dev;
> +
> +	acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
> +	if (!acpi_desc)
> +		return ERR_PTR(-ENOMEM);
> +
> +	acpi_nfit_desc_init(acpi_desc, dev);
> +
> +	acpi_desc->nvdimm_bus = nvdimm_bus_register(dev, &acpi_desc->nd_desc);
> +	if (!acpi_desc->nvdimm_bus)
> +		return ERR_PTR(-ENOMEM);
> +
> +	bus_dev = to_nvdimm_bus_dev(acpi_desc->nvdimm_bus);
> +	nfit = sysfs_get_dirent(bus_dev->kobj.sd, "nfit");
> +	if (!nfit) {
> +		dev_err(dev, "sysfs_get_dirent 'nfit' failed\n");
> +		return ERR_PTR(-ENODEV);
> +	}
> +	acpi_desc->scrub_count_state = sysfs_get_dirent(nfit, "scrub");
> +	if (!acpi_desc->scrub_count_state) {
> +		dev_err(dev, "sysfs_get_dirent 'scrub' failed\n");
> +		return ERR_PTR(-ENODEV);
> +	}
> +
> +	return acpi_desc;
> +}
> +
>  static int acpi_nfit_add(struct acpi_device *adev)
>  {
>  	struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
> @@ -2540,13 +2631,9 @@ static int acpi_nfit_add(struct acpi_device *adev)
>  		return 0;
>  	}
>  
> -	acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
> -	if (!acpi_desc)
> -		return -ENOMEM;
> -	acpi_nfit_desc_init(acpi_desc, &adev->dev);
> -	acpi_desc->nvdimm_bus = nvdimm_bus_register(dev, &acpi_desc->nd_desc);
> -	if (!acpi_desc->nvdimm_bus)
> -		return -ENOMEM;
> +	acpi_desc = acpi_nfit_desc_alloc_register(dev);
> +	if (IS_ERR(acpi_desc))
> +		return PTR_ERR(acpi_desc);
>  
>  	/*
>  	 * Save the acpi header for later and then skip it,
> @@ -2587,6 +2674,7 @@ static int acpi_nfit_remove(struct acpi_device *adev)
>  
>  	acpi_desc->cancel = 1;
>  	flush_workqueue(nfit_wq);
> +	sysfs_put(acpi_desc->scrub_count_state);
>  	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
>  	return 0;
>  }
> @@ -2611,13 +2699,10 @@ static void acpi_nfit_notify(struct acpi_device *adev, u32 event)
>  	}
>  
>  	if (!acpi_desc) {
> -		acpi_desc = devm_kzalloc(dev, sizeof(*acpi_desc), GFP_KERNEL);
> -		if (!acpi_desc)
> -			goto out_unlock;
> -		acpi_nfit_desc_init(acpi_desc, &adev->dev);
> -		acpi_desc->nvdimm_bus = nvdimm_bus_register(dev, &acpi_desc->nd_desc);
> -		if (!acpi_desc->nvdimm_bus)
> -			goto out_unlock;
> +		acpi_desc = acpi_nfit_desc_alloc_register(dev);
> +		if (IS_ERR(acpi_desc))
> +			dev_err(dev, "%s: failed to alloc acpi_desc (%ld)\n",
> +				__func__, PTR_ERR(acpi_desc));
>  	} else {
>  		/*
>  		 * Finish previous registration before considering new
> diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
> index 02b9ea1..954d2aa 100644
> --- a/drivers/acpi/nfit.h
> +++ b/drivers/acpi/nfit.h
> @@ -77,7 +77,7 @@ struct nfit_spa {
>  	struct acpi_nfit_system_address *spa;
>  	struct list_head list;
>  	struct nd_region *nd_region;
> -	unsigned int ars_done:1;
> +	unsigned int ars_required:1;
>  	u32 clear_err_unit;
>  	u32 max_ars;
>  };
> @@ -146,6 +146,8 @@ struct acpi_nfit_desc {
>  	struct nd_cmd_ars_status *ars_status;
>  	size_t ars_status_size;
>  	struct work_struct work;
> +	struct kernfs_node *scrub_count_state;
> +	unsigned int scrub_count;
>  	unsigned int cancel:1;
>  	unsigned long dimm_cmd_force_en;
>  	unsigned long bus_cmd_force_en;
> diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
> index be89764..d81db3ac 100644
> --- a/drivers/nvdimm/core.c
> +++ b/drivers/nvdimm/core.c
> @@ -99,6 +99,13 @@ struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus)
>  }
>  EXPORT_SYMBOL_GPL(to_nd_desc);
>  
> +struct device *to_nvdimm_bus_dev(struct nvdimm_bus *nvdimm_bus)
> +{
> +	/* struct nvdimm_bus definition is private to libnvdimm */
> +	return &nvdimm_bus->dev;
> +}
> +EXPORT_SYMBOL_GPL(to_nvdimm_bus_dev);
> +
>  struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev)
>  {
>  	struct device *dev;
> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> index 0c3c30c..27cecc2 100644
> --- a/include/linux/libnvdimm.h
> +++ b/include/linux/libnvdimm.h
> @@ -129,6 +129,7 @@ struct nvdimm *to_nvdimm(struct device *dev);
>  struct nd_region *to_nd_region(struct device *dev);
>  struct nd_blk_region *to_nd_blk_region(struct device *dev);
>  struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus);
> +struct device *to_nvdimm_bus_dev(struct nvdimm_bus *nvdimm_bus);
>  const char *nvdimm_name(struct nvdimm *nvdimm);
>  unsigned long nvdimm_cmd_mask(struct nvdimm *nvdimm);
>  void *nvdimm_provider_data(struct nvdimm *nvdimm);
> diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
> index c919866..74231de 100644
> --- a/tools/testing/nvdimm/test/nfit.c
> +++ b/tools/testing/nvdimm/test/nfit.c
> @@ -20,6 +20,7 @@
>  #include <linux/mutex.h>
>  #include <linux/ndctl.h>
>  #include <linux/sizes.h>
> +#include <linux/sysfs.h>
>  #include <linux/list.h>
>  #include <linux/slab.h>
>  #include <nfit.h>
> @@ -1409,6 +1410,8 @@ static int nfit_test_probe(struct platform_device *pdev)
>  	struct acpi_nfit_desc *acpi_desc;
>  	struct device *dev = &pdev->dev;
>  	struct nfit_test *nfit_test;
> +	struct kernfs_node *nfit;
> +	struct device *bus_dev;
>  	int rc;
>  
>  	nfit_test = to_nfit_test(&pdev->dev);
> @@ -1471,6 +1474,18 @@ static int nfit_test_probe(struct platform_device *pdev)
>  	if (!acpi_desc->nvdimm_bus)
>  		return -ENXIO;
>  
> +	bus_dev = to_nvdimm_bus_dev(acpi_desc->nvdimm_bus);
> +	nfit = sysfs_get_dirent(bus_dev->kobj.sd, "nfit");
> +	if (!nfit) {
> +		dev_err(dev, "sysfs_get_dirent 'nfit' failed\n");
> +		return -ENODEV;
> +	}
> +	acpi_desc->scrub_count_state = sysfs_get_dirent(nfit, "scrub");
> +	if (!acpi_desc->scrub_count_state) {
> +		dev_err(dev, "sysfs_get_dirent 'scrub' failed\n");
> +		return -ENODEV;
> +	}
> +
>  	rc = acpi_nfit_init(acpi_desc, nfit_test->nfit_size);
>  	if (rc) {
>  		nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
> @@ -1497,6 +1512,7 @@ static int nfit_test_remove(struct platform_device *pdev)
>  	struct nfit_test *nfit_test = to_nfit_test(&pdev->dev);
>  	struct acpi_nfit_desc *acpi_desc = &nfit_test->acpi_desc;
>  
> +	sysfs_put(acpi_desc->scrub_count_state);
>  	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
>  
>  	return 0;
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] nfit, libnvdimm: allow an ARS scrub to be triggered on demand
@ 2016-07-21 19:46       ` Dan Williams
  0 siblings, 0 replies; 39+ messages in thread
From: Dan Williams @ 2016-07-21 19:46 UTC (permalink / raw)
  To: Linda Knippers
  Cc: Tony Luck, linux-nvdimm, Rafael J. Wysocki, linux-kernel, Linux ACPI

On Thu, Jul 21, 2016 at 12:40 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
> On 07/20/2016 09:50 PM, Vishal Verma wrote:
>> Normally, an ARS (Address Range Scrub) only happens at
>> boot/initialization time. There can however arise situations where a
>> bus-wide rescan is needed - notably, in the case of discovering a latent
>> media error, we should do a full rescan to figure out what other sectors
>> are bad, and thus potentially avoid triggering an mce on them in the
>> future. Also provide a sysfs trigger to start a bus-wide scrub.
>
> I don't see anything in here that checks to see if the platform actually
> supports ARS before setting all this stuff up.  Setting up an MCE handler
> and exposing a sysfs trigger for something that is optional and perhaps
> not implemented doesn't seem helpful.  Or is there a check that I missed?

We'll get -ENOTTY back from ars_start(), but you're right, it's a good idea
to hide the scrub attribute if a platform does not have ARS support.

Vishal, can you add an is_visible() routine to
acpi_nfit_attribute_group() to hide 'scrub' on platforms that do not
implement the ARS commands?
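
A minimal sketch of such a hook (untested, and assuming the ARS command
bits are already set in the bus descriptor's cmd_mask by the time the
attribute group is registered) could look like the following, wiring it
into the existing "nfit" group:

static umode_t nfit_visible(struct kobject *kobj, struct attribute *a, int n)
{
	struct device *dev = container_of(kobj, struct device, kobj);
	struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
	struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);

	/* hide 'scrub' if the platform never advertised the ARS commands */
	if (a == &dev_attr_scrub.attr &&
			(!test_bit(ND_CMD_ARS_START, &nd_desc->cmd_mask) ||
			 !test_bit(ND_CMD_ARS_STATUS, &nd_desc->cmd_mask)))
		return 0;
	return a->mode;
}

static struct attribute_group acpi_nfit_attribute_group = {
	.name = "nfit",
	.attrs = acpi_nfit_attributes,
	.is_visible = nfit_visible,
};

The exact check is up to you - the point is just that the attribute
should not appear when the commands are absent.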

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] nfit, libnvdimm: allow an ARS scrub to be triggered on demand
@ 2016-07-21 19:55         ` Linda Knippers
  0 siblings, 0 replies; 39+ messages in thread
From: Linda Knippers @ 2016-07-21 19:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Rafael J. Wysocki, linux-kernel, Linux ACPI



On 7/21/2016 3:46 PM, Dan Williams wrote:
> On Thu, Jul 21, 2016 at 12:40 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>> On 07/20/2016 09:50 PM, Vishal Verma wrote:
>>> Normally, an ARS (Address Range Scrub) only happens at
>>> boot/initialization time. There can however arise situations where a
>>> bus-wide rescan is needed - notably, in the case of discovering a latent
>>> media error, we should do a full rescan to figure out what other sectors
>>> are bad, and thus potentially avoid triggering an mce on them in the
>>> future. Also provide a sysfs trigger to start a bus-wide scrub.
>>
>> I don't see anything in here that checks to see if the platform actually
>> supports ARS before setting all this stuff up.  Setting up an MCE handler
>> and exposing a sysfs trigger for something that is optional and perhaps
>> not implemented doesn't seem helpful.  Or is there a check that I missed?
> 
> We'll get -ENOTTY to ars_start(), but you're right it's a good idea to
> hide the scrub attribute if a platform does not have ars support.
> 
> Vishal, can you add an is_visible() routine to
> acpi_nfit_attribute_group() to hide 'scrub' on platforms that do not
> implement the ARS commands?

It's also possible that a platform might only support ARS at boot time
so subsequent scrubs would fail or not return any new information.
I don't think there's a way to know that in advance though.

-- ljk

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] nfit, libnvdimm: allow an ARS scrub to be triggered on demand
@ 2016-07-21 19:59           ` Dan Williams
  0 siblings, 0 replies; 39+ messages in thread
From: Dan Williams @ 2016-07-21 19:59 UTC (permalink / raw)
  To: Linda Knippers
  Cc: Tony Luck, linux-nvdimm, Rafael J. Wysocki, linux-kernel, Linux ACPI

On Thu, Jul 21, 2016 at 12:55 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>
>
> On 7/21/2016 3:46 PM, Dan Williams wrote:
>> On Thu, Jul 21, 2016 at 12:40 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>>> On 07/20/2016 09:50 PM, Vishal Verma wrote:
>>>> Normally, an ARS (Address Range Scrub) only happens at
>>>> boot/initialization time. There can however arise situations where a
>>>> bus-wide rescan is needed - notably, in the case of discovering a latent
>>>> media error, we should do a full rescan to figure out what other sectors
>>>> are bad, and thus potentially avoid triggering an mce on them in the
>>>> future. Also provide a sysfs trigger to start a bus-wide scrub.
>>>
>>> I don't see anything in here that checks to see if the platform actually
>>> supports ARS before setting all this stuff up.  Setting up an MCE handler
>>> and exposing a sysfs trigger for something that is optional and perhaps
>>> not implemented doesn't seem helpful.  Or is there a check that I missed?
>>
>> We'll get -ENOTTY to ars_start(), but you're right it's a good idea to
>> hide the scrub attribute if a platform does not have ars support.
>>
>> Vishal, can you add an is_visible() routine to
>> acpi_nfit_attribute_group() to hide 'scrub' on platforms that do not
>> implement the ARS commands?
>
> It's also possible that a platform might only support ARS at boot time
> so subsequent scrubs would fail or not return any new information.
> I don't think there's a way to know that in advance though.

I would hope a platform like that just marks the "ARS - Start" command
as not supported so that we don't even generate the failure.
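
If firmware does flag it that way, a rescan request could also bail out
early by consulting the advertised command set - a rough sketch only,
assuming nd_desc->cmd_mask has been populated from the bus _DSM during
init:

static bool acpi_nfit_ars_supported(struct acpi_nfit_desc *acpi_desc)
{
	struct nvdimm_bus_descriptor *nd_desc = &acpi_desc->nd_desc;

	/* all three ARS commands are needed for a useful rescan */
	return test_bit(ND_CMD_ARS_CAP, &nd_desc->cmd_mask) &&
		test_bit(ND_CMD_ARS_START, &nd_desc->cmd_mask) &&
		test_bit(ND_CMD_ARS_STATUS, &nd_desc->cmd_mask);
}

acpi_nfit_ars_rescan() could then return -EOPNOTSUPP up front instead of
queuing work that is guaranteed to fail.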

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/3] nfit: do an ARS scrub on hitting a latent media error
@ 2016-07-21 20:54     ` Linda Knippers
  0 siblings, 0 replies; 39+ messages in thread
From: Linda Knippers @ 2016-07-21 20:54 UTC (permalink / raw)
  To: Vishal Verma, linux-nvdimm
  Cc: linux-acpi, Tony Luck, Rafael J. Wysocki, linux-kernel



On 7/20/2016 9:50 PM, Vishal Verma wrote:
> When a latent (unknown to 'badblocks') error is encountered, it will
> trigger a machine check exception. On a system with machine check
> recovery, this will only SIGBUS the process(es) which had the bad page
> mapped (as opposed to a kernel panic on platforms without machine
> check recovery features). In the former case, we want to trigger a full
> rescan of that nvdimm bus. This will allow any additional, new errors
> to be captured in the block devices' badblocks lists, and offending
> operations on them can be trapped early, avoiding machine checks.

Do we really need to rescan all SPA ranges?  If the problem is with
an NVDIMM, wouldn't the blast radius be the device or its interleave set,
which would be part of the same SPA range?

I don't know what the overhead associated with a scan is, which is why I'm asking.

-- ljk

> 
> This is done by registering a callback function with the
> x86_mce_decoder_chain and calling the new ars_rescan functionality with
> the address in the mce notification.
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: <linux-acpi@vger.kernel.org>
> Cc: <linux-nvdimm@lists.01.org>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  drivers/acpi/nfit.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/acpi/nfit.h |  1 +
>  2 files changed, 90 insertions(+)
> 
> diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
> index 4e65255..c9f1ee4 100644
> --- a/drivers/acpi/nfit.c
> +++ b/drivers/acpi/nfit.c
> @@ -12,6 +12,7 @@
>   */
>  #include <linux/list_sort.h>
>  #include <linux/libnvdimm.h>
> +#include <linux/notifier.h>
>  #include <linux/module.h>
>  #include <linux/mutex.h>
>  #include <linux/ndctl.h>
> @@ -24,6 +25,7 @@
>  #include <linux/io.h>
>  #include <linux/nd.h>
>  #include <asm/cacheflush.h>
> +#include <asm/mce.h>
>  #include "nfit.h"
>  
>  /*
> @@ -51,6 +53,9 @@ module_param(disable_vendor_specific, bool, S_IRUGO);
>  MODULE_PARM_DESC(disable_vendor_specific,
>  		"Limit commands to the publicly specified set\n");
>  
> +static LIST_HEAD(acpi_descs);
> +static DEFINE_MUTEX(acpi_desc_lock);
> +
>  static struct workqueue_struct *nfit_wq;
>  
>  struct nfit_table_prev {
> @@ -2416,9 +2421,11 @@ static int acpi_nfit_check_deletions(struct acpi_nfit_desc *acpi_desc,
>  
>  int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
>  {
> +	struct acpi_nfit_desc *acpi_desc_entry;
>  	struct device *dev = acpi_desc->dev;
>  	struct nfit_table_prev prev;
>  	const void *end;
> +	int found = 0;
>  	u8 *data;
>  	int rc;
>  
> @@ -2473,6 +2480,19 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
>  
>  	rc = acpi_nfit_register_regions(acpi_desc);
>  
> +	/*
> +	 * We may get here due to an update of the nfit via _FIT.
> +	 * Check if the acpi_desc we're (re)initializing is already
> +	 * present in the list, and if so, don't re-add it
> +	 */
> +	mutex_lock(&acpi_desc_lock);
> +	list_for_each_entry(acpi_desc_entry, &acpi_descs, list)
> +		if (acpi_desc_entry == acpi_desc)
> +			found = 1;
> +	if (found == 0)
> +		list_add_tail(&acpi_desc->list, &acpi_descs);
> +	mutex_unlock(&acpi_desc_lock);
> +
>   out_unlock:
>  	mutex_unlock(&acpi_desc->init_mutex);
>  	return rc;
> @@ -2555,6 +2575,65 @@ static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
>  	return 0;
>  }
>  
> +static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
> +			void *data)
> +{
> +	struct mce *mce = (struct mce *)data;
> +	struct acpi_nfit_desc *acpi_desc;
> +	struct nfit_spa *nfit_spa;
> +
> +	/* We only care about memory errors */
> +	if (!(mce->status & MCACOD))
> +		return NOTIFY_DONE;
> +
> +	/*
> +	 * mce->addr contains the physical addr accessed that caused the
> +	 * machine check. We need to walk through the list of NFITs, and see
> +	 * if any of them matches that address, and only then start a scrub.
> +	 */
> +	mutex_lock(&acpi_desc_lock);
> +	list_for_each_entry(acpi_desc, &acpi_descs, list) {
> +		struct device *dev = acpi_desc->dev;
> +		int found_match = 0;
> +
> +		list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
> +			struct acpi_nfit_system_address *spa = nfit_spa->spa;
> +
> +			if (nfit_spa_type(spa) != NFIT_SPA_PM)
> +				continue;
> +			/* find the spa that covers the mce addr */
> +			if (spa->address > mce->addr)
> +				continue;
> +			if ((spa->address + spa->length - 1) < mce->addr)
> +				continue;
> +			found_match = 1;
> +			dev_dbg(dev, "%s: addr in SPA %d (0x%llx, 0x%llx)\n",
> +				__func__, spa->range_index, spa->address,
> +				spa->length);
> +			/*
> +			 * We can break at the first match because we're going
> +			 * to rescan all the SPA ranges. There shouldn't be any
> +			 * aliasing anyway.
> +			 */
> +			break;
> +		}
> +
> +		/*
> +		 * We can ignore an -EBUSY here because if an ARS is already
> +		 * in progress, just let that be the last authoritative one
> +		 */
> +		if (found_match)
> +			acpi_nfit_ars_rescan(acpi_desc);
> +	}
> +
> +	mutex_unlock(&acpi_desc_lock);
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block nfit_mce_dec = {
> +	.notifier_call	= nfit_handle_mce,
> +};
> +
>  void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
>  {
>  	struct nvdimm_bus_descriptor *nd_desc;
> @@ -2676,6 +2755,9 @@ static int acpi_nfit_remove(struct acpi_device *adev)
>  	flush_workqueue(nfit_wq);
>  	sysfs_put(acpi_desc->scrub_count_state);
>  	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
> +	mutex_lock(&acpi_desc_lock);
> +	list_del(&acpi_desc->list);
> +	mutex_unlock(&acpi_desc_lock);
>  	return 0;
>  }
>  
> @@ -2782,13 +2864,20 @@ static __init int nfit_init(void)
>  	if (!nfit_wq)
>  		return -ENOMEM;
>  
> +	INIT_LIST_HEAD(&acpi_descs);
> +	mce_register_decode_chain(&nfit_mce_dec);
> +
>  	return acpi_bus_register_driver(&acpi_nfit_driver);
>  }
>  
>  static __exit void nfit_exit(void)
>  {
> +	mce_unregister_decode_chain(&nfit_mce_dec);
>  	acpi_bus_unregister_driver(&acpi_nfit_driver);
>  	destroy_workqueue(nfit_wq);
> +	mutex_lock(&acpi_desc_lock);
> +	WARN_ON(!list_empty(&acpi_descs));
> +	mutex_unlock(&acpi_desc_lock);
>  }
>  
>  module_init(nfit_init);
> diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
> index 954d2aa..d2aec6f 100644
> --- a/drivers/acpi/nfit.h
> +++ b/drivers/acpi/nfit.h
> @@ -146,6 +146,7 @@ struct acpi_nfit_desc {
>  	struct nd_cmd_ars_status *ars_status;
>  	size_t ars_status_size;
>  	struct work_struct work;
> +	struct list_head list;
>  	struct kernfs_node *scrub_count_state;
>  	unsigned int scrub_count;
>  	unsigned int cancel:1;
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/3] nfit: do an ARS scrub on hitting a latent media error
@ 2016-07-21 21:10       ` Vishal Verma
  0 siblings, 0 replies; 39+ messages in thread
From: Vishal Verma @ 2016-07-21 21:10 UTC (permalink / raw)
  To: Linda Knippers
  Cc: linux-kernel, Tony Luck, linux-acpi, Rafael J. Wysocki, linux-nvdimm

On 07/21, Linda Knippers wrote:
> 
> 
> On 7/20/2016 9:50 PM, Vishal Verma wrote:
> > When a latent (unknown to 'badblocks') error is encountered, it will
> > trigger a machine check exception. On a system with machine check
> > recovery, this will only SIGBUS the process(es) which had the bad page
> > mapped (as opposed to a kernel panic on platforms without machine
> > check recovery features). In the former case, we want to trigger a full
> > rescan of that nvdimm bus. This will allow any additional, new errors
> > to be captured in the block devices' badblocks lists, and offending
> > operations on them can be trapped early, avoiding machine checks.
> 
> Do we really need to rescan all SPA ranges?  If the problem is with
> an NVDIMM, wouldn't the blast radius be the device or its interleave set,
> which would be part of the same SPA range?
> 
> I don't know what the overhead associated with a scan is, which is why I'm asking.

You're right that we don't _need_ to scan all ranges, and that the scrub
can be long-running, but we just take this 'event' as an opportunity to
basically refresh everything. Since it is asynchronous, we're not
holding anything up.

> 
> -- ljk
> 
> > 
> > This is done by registering a callback function with the
> > x86_mce_decoder_chain and calling the new ars_rescan functionality with
> > the address in the mce notification.
> > 
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: <linux-acpi@vger.kernel.org>
> > Cc: <linux-nvdimm@lists.01.org>
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> >  drivers/acpi/nfit.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/acpi/nfit.h |  1 +
> >  2 files changed, 90 insertions(+)
> > 
> > diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
> > index 4e65255..c9f1ee4 100644
> > --- a/drivers/acpi/nfit.c
> > +++ b/drivers/acpi/nfit.c
> > @@ -12,6 +12,7 @@
> >   */
> >  #include <linux/list_sort.h>
> >  #include <linux/libnvdimm.h>
> > +#include <linux/notifier.h>
> >  #include <linux/module.h>
> >  #include <linux/mutex.h>
> >  #include <linux/ndctl.h>
> > @@ -24,6 +25,7 @@
> >  #include <linux/io.h>
> >  #include <linux/nd.h>
> >  #include <asm/cacheflush.h>
> > +#include <asm/mce.h>
> >  #include "nfit.h"
> >  
> >  /*
> > @@ -51,6 +53,9 @@ module_param(disable_vendor_specific, bool, S_IRUGO);
> >  MODULE_PARM_DESC(disable_vendor_specific,
> >  		"Limit commands to the publicly specified set\n");
> >  
> > +static LIST_HEAD(acpi_descs);
> > +static DEFINE_MUTEX(acpi_desc_lock);
> > +
> >  static struct workqueue_struct *nfit_wq;
> >  
> >  struct nfit_table_prev {
> > @@ -2416,9 +2421,11 @@ static int acpi_nfit_check_deletions(struct acpi_nfit_desc *acpi_desc,
> >  
> >  int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
> >  {
> > +	struct acpi_nfit_desc *acpi_desc_entry;
> >  	struct device *dev = acpi_desc->dev;
> >  	struct nfit_table_prev prev;
> >  	const void *end;
> > +	int found = 0;
> >  	u8 *data;
> >  	int rc;
> >  
> > @@ -2473,6 +2480,19 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
> >  
> >  	rc = acpi_nfit_register_regions(acpi_desc);
> >  
> > +	/*
> > +	 * We may get here due to an update of the nfit via _FIT.
> > +	 * Check if the acpi_desc we're (re)initializing is already
> > +	 * present in the list, and if so, don't re-add it
> > +	 */
> > +	mutex_lock(&acpi_desc_lock);
> > +	list_for_each_entry(acpi_desc_entry, &acpi_descs, list)
> > +		if (acpi_desc_entry == acpi_desc)
> > +			found = 1;
> > +	if (found == 0)
> > +		list_add_tail(&acpi_desc->list, &acpi_descs);
> > +	mutex_unlock(&acpi_desc_lock);
> > +
> >   out_unlock:
> >  	mutex_unlock(&acpi_desc->init_mutex);
> >  	return rc;
> > @@ -2555,6 +2575,65 @@ static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
> >  	return 0;
> >  }
> >  
> > +static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
> > +			void *data)
> > +{
> > +	struct mce *mce = (struct mce *)data;
> > +	struct acpi_nfit_desc *acpi_desc;
> > +	struct nfit_spa *nfit_spa;
> > +
> > +	/* We only care about memory errors */
> > +	if (!(mce->status & MCACOD))
> > +		return NOTIFY_DONE;
> > +
> > +	/*
> > +	 * mce->addr contains the physical addr accessed that caused the
> > +	 * machine check. We need to walk through the list of NFITs, and see
> > +	 * if any of them matches that address, and only then start a scrub.
> > +	 */
> > +	mutex_lock(&acpi_desc_lock);
> > +	list_for_each_entry(acpi_desc, &acpi_descs, list) {
> > +		struct device *dev = acpi_desc->dev;
> > +		int found_match = 0;
> > +
> > +		list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
> > +			struct acpi_nfit_system_address *spa = nfit_spa->spa;
> > +
> > +			if (nfit_spa_type(spa) != NFIT_SPA_PM)
> > +				continue;
> > +			/* find the spa that covers the mce addr */
> > +			if (spa->address > mce->addr)
> > +				continue;
> > +			if ((spa->address + spa->length - 1) < mce->addr)
> > +				continue;
> > +			found_match = 1;
> > +			dev_dbg(dev, "%s: addr in SPA %d (0x%llx, 0x%llx)\n",
> > +				__func__, spa->range_index, spa->address,
> > +				spa->length);
> > +			/*
> > +			 * We can break at the first match because we're going
> > +			 * to rescan all the SPA ranges. There shouldn't be any
> > +			 * aliasing anyway.
> > +			 */
> > +			break;
> > +		}
> > +
> > +		/*
> > +		 * We can ignore an -EBUSY here because if an ARS is already
> > +		 * in progress, just let that be the last authoritative one
> > +		 */
> > +		if (found_match)
> > +			acpi_nfit_ars_rescan(acpi_desc);
> > +	}
> > +
> > +	mutex_unlock(&acpi_desc_lock);
> > +	return NOTIFY_DONE;
> > +}
> > +
> > +static struct notifier_block nfit_mce_dec = {
> > +	.notifier_call	= nfit_handle_mce,
> > +};
> > +
> >  void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
> >  {
> >  	struct nvdimm_bus_descriptor *nd_desc;
> > @@ -2676,6 +2755,9 @@ static int acpi_nfit_remove(struct acpi_device *adev)
> >  	flush_workqueue(nfit_wq);
> >  	sysfs_put(acpi_desc->scrub_count_state);
> >  	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
> > +	mutex_lock(&acpi_desc_lock);
> > +	list_del(&acpi_desc->list);
> > +	mutex_unlock(&acpi_desc_lock);
> >  	return 0;
> >  }
> >  
> > @@ -2782,13 +2864,20 @@ static __init int nfit_init(void)
> >  	if (!nfit_wq)
> >  		return -ENOMEM;
> >  
> > +	INIT_LIST_HEAD(&acpi_descs);
> > +	mce_register_decode_chain(&nfit_mce_dec);
> > +
> >  	return acpi_bus_register_driver(&acpi_nfit_driver);
> >  }
> >  
> >  static __exit void nfit_exit(void)
> >  {
> > +	mce_unregister_decode_chain(&nfit_mce_dec);
> >  	acpi_bus_unregister_driver(&acpi_nfit_driver);
> >  	destroy_workqueue(nfit_wq);
> > +	mutex_lock(&acpi_desc_lock);
> > +	WARN_ON(!list_empty(&acpi_descs));
> > +	mutex_unlock(&acpi_desc_lock);
> >  }
> >  
> >  module_init(nfit_init);
> > diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
> > index 954d2aa..d2aec6f 100644
> > --- a/drivers/acpi/nfit.h
> > +++ b/drivers/acpi/nfit.h
> > @@ -146,6 +146,7 @@ struct acpi_nfit_desc {
> >  	struct nd_cmd_ars_status *ars_status;
> >  	size_t ars_status_size;
> >  	struct work_struct work;
> > +	struct list_head list;
> >  	struct kernfs_node *scrub_count_state;
> >  	unsigned int scrub_count;
> >  	unsigned int cancel:1;
> > 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/3] nfit: do an ARS scrub on hitting a latent media error
@ 2016-07-21 21:10       ` Vishal Verma
  0 siblings, 0 replies; 39+ messages in thread
From: Vishal Verma @ 2016-07-21 21:10 UTC (permalink / raw)
  To: Linda Knippers
  Cc: linux-nvdimm, Tony Luck, Rafael J. Wysocki, linux-kernel, linux-acpi

On 07/21, Linda Knippers wrote:
> 
> 
> On 7/20/2016 9:50 PM, Vishal Verma wrote:
> > When a latent (unknown to 'badblocks') error is encountered, it will
> > trigger a machine check exception. On a system with machine check
> > recovery, this will only SIGBUS the process(es) which had the bad page
> > mapped (as opposed to a kernel panic on platforms without machine
> > check recovery features). In the former case, we want to trigger a full
> > rescan of that nvdimm bus. This will allow any additional, new errors
> > to be captured in the block devices' badblocks lists, and offending
> > operations on them can be trapped early, avoiding machine checks.
> 
> Do we really need to rescan all SPA ranges?  If the problem is with
> an NVDIMM, wouldn't the blast radius be the device or its interleave set,
> which would be part of the same SPA range?
> 
> I don't know what the overhead associated with a scan is, which is why I'm asking.

You're right that we don't _need_ to scan all ranges, and that the scrub
can be long-running, but we take this event as an opportunity to refresh
everything. Since the scrub is asynchronous, we're not holding anything
up.
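
To make the "asynchronous" part concrete: the notifier only requests a
scrub, and the ARS commands themselves run later out of the driver's
workqueue. A minimal sketch of that deferral pattern (illustrative names
like ars_ctx/ars_wq/ars_request_rescan, not the actual nfit identifiers)
might look like:

	#include <linux/kernel.h>
	#include <linux/errno.h>
	#include <linux/workqueue.h>

	/* Sketch only: stands in for the relevant bits of acpi_nfit_desc */
	struct ars_ctx {
		struct work_struct work;
	};

	static struct workqueue_struct *ars_wq;

	/* Runs in process context; free to issue long-running ARS commands */
	static void ars_scrub_worker(struct work_struct *work)
	{
		struct ars_ctx *ctx = container_of(work, struct ars_ctx, work);

		/* issue ARS Start / ARS Status for ctx's ranges here */
		(void)ctx;
	}

	static int ars_init(struct ars_ctx *ctx)
	{
		ars_wq = create_singlethread_workqueue("ars_sketch");
		if (!ars_wq)
			return -ENOMEM;
		INIT_WORK(&ctx->work, ars_scrub_worker);
		return 0;
	}

	/* Safe to call from a notifier: only schedules work, never scrubs */
	static int ars_request_rescan(struct ars_ctx *ctx)
	{
		queue_work(ars_wq, &ctx->work);
		return 0;
	}

The scrub cost itself is still paid, of course, just not in the caller's
context.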

> 
> -- ljk
> 
> > 
> > This is done by registering a callback function with the
> > x86_mce_decoder_chain and calling the new ars_rescan functionality with
> > the address in the mce notification.
> > 
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: <linux-acpi@vger.kernel.org>
> > Cc: <linux-nvdimm@lists.01.org>
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> >  drivers/acpi/nfit.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/acpi/nfit.h |  1 +
> >  2 files changed, 90 insertions(+)
> > 
> > diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
> > index 4e65255..c9f1ee4 100644
> > --- a/drivers/acpi/nfit.c
> > +++ b/drivers/acpi/nfit.c
> > @@ -12,6 +12,7 @@
> >   */
> >  #include <linux/list_sort.h>
> >  #include <linux/libnvdimm.h>
> > +#include <linux/notifier.h>
> >  #include <linux/module.h>
> >  #include <linux/mutex.h>
> >  #include <linux/ndctl.h>
> > @@ -24,6 +25,7 @@
> >  #include <linux/io.h>
> >  #include <linux/nd.h>
> >  #include <asm/cacheflush.h>
> > +#include <asm/mce.h>
> >  #include "nfit.h"
> >  
> >  /*
> > @@ -51,6 +53,9 @@ module_param(disable_vendor_specific, bool, S_IRUGO);
> >  MODULE_PARM_DESC(disable_vendor_specific,
> >  		"Limit commands to the publicly specified set\n");
> >  
> > +static LIST_HEAD(acpi_descs);
> > +static DEFINE_MUTEX(acpi_desc_lock);
> > +
> >  static struct workqueue_struct *nfit_wq;
> >  
> >  struct nfit_table_prev {
> > @@ -2416,9 +2421,11 @@ static int acpi_nfit_check_deletions(struct acpi_nfit_desc *acpi_desc,
> >  
> >  int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
> >  {
> > +	struct acpi_nfit_desc *acpi_desc_entry;
> >  	struct device *dev = acpi_desc->dev;
> >  	struct nfit_table_prev prev;
> >  	const void *end;
> > +	int found = 0;
> >  	u8 *data;
> >  	int rc;
> >  
> > @@ -2473,6 +2480,19 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
> >  
> >  	rc = acpi_nfit_register_regions(acpi_desc);
> >  
> > +	/*
> > +	 * We may get here due to an update of the nfit via _FIT.
> > +	 * Check if the acpi_desc we're (re)initializing is already
> > +	 * present in the list, and if so, don't re-add it
> > +	 */
> > +	mutex_lock(&acpi_desc_lock);
> > +	list_for_each_entry(acpi_desc_entry, &acpi_descs, list)
> > +		if (acpi_desc_entry == acpi_desc)
> > +			found = 1;
> > +	if (found == 0)
> > +		list_add_tail(&acpi_desc->list, &acpi_descs);
> > +	mutex_unlock(&acpi_desc_lock);
> > +
> >   out_unlock:
> >  	mutex_unlock(&acpi_desc->init_mutex);
> >  	return rc;
> > @@ -2555,6 +2575,65 @@ static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
> >  	return 0;
> >  }
> >  
> > +static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
> > +			void *data)
> > +{
> > +	struct mce *mce = (struct mce *)data;
> > +	struct acpi_nfit_desc *acpi_desc;
> > +	struct nfit_spa *nfit_spa;
> > +
> > +	/* We only care about memory errors */
> > +	if (!(mce->status & MCACOD))
> > +		return NOTIFY_DONE;
> > +
> > +	/*
> > +	 * mce->addr contains the physical addr accessed that caused the
> > +	 * machine check. We need to walk through the list of NFITs, and see
> > +	 * if any of them matches that address, and only then start a scrub.
> > +	 */
> > +	mutex_lock(&acpi_desc_lock);
> > +	list_for_each_entry(acpi_desc, &acpi_descs, list) {
> > +		struct device *dev = acpi_desc->dev;
> > +		int found_match = 0;
> > +
> > +		list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
> > +			struct acpi_nfit_system_address *spa = nfit_spa->spa;
> > +
> > +			if (nfit_spa_type(spa) != NFIT_SPA_PM)
> > +				continue;
> > +			/* find the spa that covers the mce addr */
> > +			if (spa->address > mce->addr)
> > +				continue;
> > +			if ((spa->address + spa->length - 1) < mce->addr)
> > +				continue;
> > +			found_match = 1;
> > +			dev_dbg(dev, "%s: addr in SPA %d (0x%llx, 0x%llx)\n",
> > +				__func__, spa->range_index, spa->address,
> > +				spa->length);
> > +			/*
> > +			 * We can break at the first match because we're going
> > +			 * to rescan all the SPA ranges. There shouldn't be any
> > +			 * aliasing anyway.
> > +			 */
> > +			break;
> > +		}
> > +
> > +		/*
> > +		 * We can ignore an -EBUSY here because if an ARS is already
> > +		 * in progress, just let that be the last authoritative one
> > +		 */
> > +		if (found_match)
> > +			acpi_nfit_ars_rescan(acpi_desc);
> > +	}
> > +
> > +	mutex_unlock(&acpi_desc_lock);
> > +	return NOTIFY_DONE;
> > +}
> > +
> > +static struct notifier_block nfit_mce_dec = {
> > +	.notifier_call	= nfit_handle_mce,
> > +};
> > +
> >  void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
> >  {
> >  	struct nvdimm_bus_descriptor *nd_desc;
> > @@ -2676,6 +2755,9 @@ static int acpi_nfit_remove(struct acpi_device *adev)
> >  	flush_workqueue(nfit_wq);
> >  	sysfs_put(acpi_desc->scrub_count_state);
> >  	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
> > +	mutex_lock(&acpi_desc_lock);
> > +	list_del(&acpi_desc->list);
> > +	mutex_unlock(&acpi_desc_lock);
> >  	return 0;
> >  }
> >  
> > @@ -2782,13 +2864,20 @@ static __init int nfit_init(void)
> >  	if (!nfit_wq)
> >  		return -ENOMEM;
> >  
> > +	INIT_LIST_HEAD(&acpi_descs);
> > +	mce_register_decode_chain(&nfit_mce_dec);
> > +
> >  	return acpi_bus_register_driver(&acpi_nfit_driver);
> >  }
> >  
> >  static __exit void nfit_exit(void)
> >  {
> > +	mce_unregister_decode_chain(&nfit_mce_dec);
> >  	acpi_bus_unregister_driver(&acpi_nfit_driver);
> >  	destroy_workqueue(nfit_wq);
> > +	mutex_lock(&acpi_desc_lock);
> > +	WARN_ON(!list_empty(&acpi_descs));
> > +	mutex_unlock(&acpi_desc_lock);
> >  }
> >  
> >  module_init(nfit_init);
> > diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
> > index 954d2aa..d2aec6f 100644
> > --- a/drivers/acpi/nfit.h
> > +++ b/drivers/acpi/nfit.h
> > @@ -146,6 +146,7 @@ struct acpi_nfit_desc {
> >  	struct nd_cmd_ars_status *ars_status;
> >  	size_t ars_status_size;
> >  	struct work_struct work;
> > +	struct list_head list;
> >  	struct kernfs_node *scrub_count_state;
> >  	unsigned int scrub_count;
> >  	unsigned int cancel:1;
> > 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/3] nfit: do an ARS scrub on hitting a latent media error
@ 2016-07-21 21:25         ` Linda Knippers
  0 siblings, 0 replies; 39+ messages in thread
From: Linda Knippers @ 2016-07-21 21:25 UTC (permalink / raw)
  To: Vishal Verma
  Cc: linux-nvdimm, Tony Luck, Rafael J. Wysocki, linux-kernel, linux-acpi



On 7/21/2016 5:10 PM, Vishal Verma wrote:
> On 07/21, Linda Knippers wrote:
>>
>>
>> On 7/20/2016 9:50 PM, Vishal Verma wrote:
>>> When a latent (unknown to 'badblocks') error is encountered, it will
>>> trigger a machine check exception. On a system with machine check
>>> recovery, this will only SIGBUS the process(es) which had the bad page
>>> mapped (as opposed to a kernel panic on platforms without machine
>>> check recovery features). In the former case, we want to trigger a full
>>> rescan of that nvdimm bus. This will allow any additional, new errors
>>> to be captured in the block devices' badblocks lists, and offending
>>> operations on them can be trapped early, avoiding machine checks.
>>
>> Do we really need to rescan all SPA ranges?  If the problem is with
>> an NVDIMM, wouldn't the blast radius be the device or its interleave set,
>> which would be part of the same SPA range?
>>
>> I don't know what the overhead associated with a scan is, which is why I'm asking.
> 
> You're right that we don't _need_ to scan all ranges, and that the scrub
> can be long-running, but we take this event as an opportunity to refresh
> everything. Since the scrub is asynchronous, we're not holding anything
> up.

We're not holding up anything in the kernel, but I assume it's not a
zero-overhead operation.  The memory controller or the platform firmware
may be doing work behind the scenes that introduces latency spikes.
It's the kind of thing that really annoys some customers, though maybe
following an MCE no one cares about that.

-- ljk
> 
>>
>> -- ljk
>>
>>>
>>> This is done by registering a callback function with the
>>> x86_mce_decoder_chain and calling the new ars_rescan functionality with
>>> the address in the mce notification.
>>>
>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> Cc: Tony Luck <tony.luck@intel.com>
>>> Cc: <linux-acpi@vger.kernel.org>
>>> Cc: <linux-nvdimm@lists.01.org>
>>> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>>> ---
>>>  drivers/acpi/nfit.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  drivers/acpi/nfit.h |  1 +
>>>  2 files changed, 90 insertions(+)
>>>
>>> diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
>>> index 4e65255..c9f1ee4 100644
>>> --- a/drivers/acpi/nfit.c
>>> +++ b/drivers/acpi/nfit.c
>>> @@ -12,6 +12,7 @@
>>>   */
>>>  #include <linux/list_sort.h>
>>>  #include <linux/libnvdimm.h>
>>> +#include <linux/notifier.h>
>>>  #include <linux/module.h>
>>>  #include <linux/mutex.h>
>>>  #include <linux/ndctl.h>
>>> @@ -24,6 +25,7 @@
>>>  #include <linux/io.h>
>>>  #include <linux/nd.h>
>>>  #include <asm/cacheflush.h>
>>> +#include <asm/mce.h>
>>>  #include "nfit.h"
>>>  
>>>  /*
>>> @@ -51,6 +53,9 @@ module_param(disable_vendor_specific, bool, S_IRUGO);
>>>  MODULE_PARM_DESC(disable_vendor_specific,
>>>  		"Limit commands to the publicly specified set\n");
>>>  
>>> +static LIST_HEAD(acpi_descs);
>>> +static DEFINE_MUTEX(acpi_desc_lock);
>>> +
>>>  static struct workqueue_struct *nfit_wq;
>>>  
>>>  struct nfit_table_prev {
>>> @@ -2416,9 +2421,11 @@ static int acpi_nfit_check_deletions(struct acpi_nfit_desc *acpi_desc,
>>>  
>>>  int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
>>>  {
>>> +	struct acpi_nfit_desc *acpi_desc_entry;
>>>  	struct device *dev = acpi_desc->dev;
>>>  	struct nfit_table_prev prev;
>>>  	const void *end;
>>> +	int found = 0;
>>>  	u8 *data;
>>>  	int rc;
>>>  
>>> @@ -2473,6 +2480,19 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, acpi_size sz)
>>>  
>>>  	rc = acpi_nfit_register_regions(acpi_desc);
>>>  
>>> +	/*
>>> +	 * We may get here due to an update of the nfit via _FIT.
>>> +	 * Check if the acpi_desc we're (re)initializing is already
>>> +	 * present in the list, and if so, don't re-add it
>>> +	 */
>>> +	mutex_lock(&acpi_desc_lock);
>>> +	list_for_each_entry(acpi_desc_entry, &acpi_descs, list)
>>> +		if (acpi_desc_entry == acpi_desc)
>>> +			found = 1;
>>> +	if (found == 0)
>>> +		list_add_tail(&acpi_desc->list, &acpi_descs);
>>> +	mutex_unlock(&acpi_desc_lock);
>>> +
>>>   out_unlock:
>>>  	mutex_unlock(&acpi_desc->init_mutex);
>>>  	return rc;
>>> @@ -2555,6 +2575,65 @@ static int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
>>>  	return 0;
>>>  }
>>>  
>>> +static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
>>> +			void *data)
>>> +{
>>> +	struct mce *mce = (struct mce *)data;
>>> +	struct acpi_nfit_desc *acpi_desc;
>>> +	struct nfit_spa *nfit_spa;
>>> +
>>> +	/* We only care about memory errors */
>>> +	if (!(mce->status & MCACOD))
>>> +		return NOTIFY_DONE;
>>> +
>>> +	/*
>>> +	 * mce->addr contains the physical addr accessed that caused the
>>> +	 * machine check. We need to walk through the list of NFITs, and see
>>> +	 * if any of them matches that address, and only then start a scrub.
>>> +	 */
>>> +	mutex_lock(&acpi_desc_lock);
>>> +	list_for_each_entry(acpi_desc, &acpi_descs, list) {
>>> +		struct device *dev = acpi_desc->dev;
>>> +		int found_match = 0;
>>> +
>>> +		list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
>>> +			struct acpi_nfit_system_address *spa = nfit_spa->spa;
>>> +
>>> +			if (nfit_spa_type(spa) != NFIT_SPA_PM)
>>> +				continue;
>>> +			/* find the spa that covers the mce addr */
>>> +			if (spa->address > mce->addr)
>>> +				continue;
>>> +			if ((spa->address + spa->length - 1) < mce->addr)
>>> +				continue;
>>> +			found_match = 1;
>>> +			dev_dbg(dev, "%s: addr in SPA %d (0x%llx, 0x%llx)\n",
>>> +				__func__, spa->range_index, spa->address,
>>> +				spa->length);
>>> +			/*
>>> +			 * We can break at the first match because we're going
>>> +			 * to rescan all the SPA ranges. There shouldn't be any
>>> +			 * aliasing anyway.
>>> +			 */
>>> +			break;
>>> +		}
>>> +
>>> +		/*
>>> +		 * We can ignore an -EBUSY here because if an ARS is already
>>> +		 * in progress, just let that be the last authoritative one
>>> +		 */
>>> +		if (found_match)
>>> +			acpi_nfit_ars_rescan(acpi_desc);
>>> +	}
>>> +
>>> +	mutex_unlock(&acpi_desc_lock);
>>> +	return NOTIFY_DONE;
>>> +}
>>> +
>>> +static struct notifier_block nfit_mce_dec = {
>>> +	.notifier_call	= nfit_handle_mce,
>>> +};
>>> +
>>>  void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
>>>  {
>>>  	struct nvdimm_bus_descriptor *nd_desc;
>>> @@ -2676,6 +2755,9 @@ static int acpi_nfit_remove(struct acpi_device *adev)
>>>  	flush_workqueue(nfit_wq);
>>>  	sysfs_put(acpi_desc->scrub_count_state);
>>>  	nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
>>> +	mutex_lock(&acpi_desc_lock);
>>> +	list_del(&acpi_desc->list);
>>> +	mutex_unlock(&acpi_desc_lock);
>>>  	return 0;
>>>  }
>>>  
>>> @@ -2782,13 +2864,20 @@ static __init int nfit_init(void)
>>>  	if (!nfit_wq)
>>>  		return -ENOMEM;
>>>  
>>> +	INIT_LIST_HEAD(&acpi_descs);
>>> +	mce_register_decode_chain(&nfit_mce_dec);
>>> +
>>>  	return acpi_bus_register_driver(&acpi_nfit_driver);
>>>  }
>>>  
>>>  static __exit void nfit_exit(void)
>>>  {
>>> +	mce_unregister_decode_chain(&nfit_mce_dec);
>>>  	acpi_bus_unregister_driver(&acpi_nfit_driver);
>>>  	destroy_workqueue(nfit_wq);
>>> +	mutex_lock(&acpi_desc_lock);
>>> +	WARN_ON(!list_empty(&acpi_descs));
>>> +	mutex_unlock(&acpi_desc_lock);
>>>  }
>>>  
>>>  module_init(nfit_init);
>>> diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
>>> index 954d2aa..d2aec6f 100644
>>> --- a/drivers/acpi/nfit.h
>>> +++ b/drivers/acpi/nfit.h
>>> @@ -146,6 +146,7 @@ struct acpi_nfit_desc {
>>>  	struct nd_cmd_ars_status *ars_status;
>>>  	size_t ars_status_size;
>>>  	struct work_struct work;
>>> +	struct list_head list;
>>>  	struct kernfs_node *scrub_count_state;
>>>  	unsigned int scrub_count;
>>>  	unsigned int cancel:1;
>>>

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2016-07-21 22:27 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-21  1:50 [PATCH v2 0/3] ARS rescanning triggered by latent errors or userspace Vishal Verma
2016-07-21  1:50 ` Vishal Verma
2016-07-21  1:50 ` Vishal Verma
2016-07-21  1:50 ` [PATCH v2 1/3] pmem: clarify a debug print in pmem_clear_poison Vishal Verma
2016-07-21  1:50   ` Vishal Verma
2016-07-21  1:50   ` Vishal Verma
2016-07-21  1:50 ` [PATCH v2 2/3] nfit, libnvdimm: allow an ARS scrub to be triggered on demand Vishal Verma
2016-07-21  1:50   ` Vishal Verma
2016-07-21  1:50   ` Vishal Verma
2016-07-21 15:56   ` Dan Williams
2016-07-21 15:56     ` Dan Williams
2016-07-21 15:56     ` Dan Williams
2016-07-21 18:07     ` Vishal Verma
2016-07-21 18:07       ` Vishal Verma
2016-07-21 18:07       ` Vishal Verma
2016-07-21 19:40   ` Linda Knippers
2016-07-21 19:40     ` Linda Knippers
2016-07-21 19:40     ` Linda Knippers
2016-07-21 19:46     ` Dan Williams
2016-07-21 19:46       ` Dan Williams
2016-07-21 19:46       ` Dan Williams
2016-07-21 19:55       ` Linda Knippers
2016-07-21 19:55         ` Linda Knippers
2016-07-21 19:55         ` Linda Knippers
2016-07-21 19:59         ` Dan Williams
2016-07-21 19:59           ` Dan Williams
2016-07-21 19:59           ` Dan Williams
2016-07-21  1:50 ` [PATCH v2 3/3] nfit: do an ARS scrub on hitting a latent media error Vishal Verma
2016-07-21  1:50   ` Vishal Verma
2016-07-21  1:50   ` Vishal Verma
2016-07-21 20:54   ` Linda Knippers
2016-07-21 20:54     ` Linda Knippers
2016-07-21 20:54     ` Linda Knippers
2016-07-21 21:10     ` Vishal Verma
2016-07-21 21:10       ` Vishal Verma
2016-07-21 21:10       ` Vishal Verma
2016-07-21 21:25       ` Linda Knippers
2016-07-21 21:25         ` Linda Knippers
2016-07-21 21:25         ` Linda Knippers
