nvdimm.lists.linux.dev archive mirror
* [PATCH v5 0/7] DAX poison recovery
@ 2022-01-28 21:31 Jane Chu
  2022-01-28 21:31 ` [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page Jane Chu
                   ` (6 more replies)
  0 siblings, 7 replies; 34+ messages in thread
From: Jane Chu @ 2022-01-28 21:31 UTC (permalink / raw)
  To: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

In this series, the dax recovery code path is independent of the
normal write path. Competing dax recovery threads are serialized,
and racing read threads are guaranteed not to overlap with the
recovery process.

In this phase, the recovery granularity is a page; future patches
will explore recovery at finer granularity.
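
As a condensed sketch (adapted from the dax_iomap_iter() changes in
patch 6 below, with error handling trimmed), a write that trips over
poison retries dax_direct_access() in recovery mode and then routes
the copy through dax_recovery_write():

	map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
			&kaddr, NULL);
	if (map_len == -EIO && iov_iter_rw(iter) == WRITE) {
		/* ask the device to map the range despite the poison */
		if (dax_prep_recovery(dax_dev, &kaddr) < 0)
			break;	/* propagate -EIO */
		map_len = dax_direct_access(dax_dev, pgoff,
				PHYS_PFN(size), &kaddr, NULL);
		if (map_len > 0)
			write_func = dax_recovery_write;
	}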

Change from v4:
  Fixed build errors reported by kernel test robot
Change from v3:
  Rebased to v5.17-rc1-81-g0280e3c58f92

v4:
https://lore.kernel.org/lkml/20220126211116.860012-1-jane.chu@oracle.com/T/
v3:
https://lkml.org/lkml/2022/1/11/900
v2:
https://lore.kernel.org/all/20211106011638.2613039-1-jane.chu@oracle.com/
Discussions about marking a poisoned page as 'np':
https://lore.kernel.org/all/CAPcyv4hrXPb1tASBZUg-GgdVs0OOFKXMXLiHmktg_kFi7YBMyQ@mail.gmail.com/


Jane Chu (7):
  mce: fix set_mce_nospec to always unmap the whole page
  dax: introduce dax device flag DAXDEV_RECOVERY
  dm: make dm aware of target's DAXDEV_RECOVERY capability
  dax: add dax_recovery_write to dax_op and dm target type
  pmem: add pmem_recovery_write() dax op
  dax: add recovery_write to dax_iomap_iter in failure path
  pmem: fix pmem_do_write() avoid writing to 'np' page

 arch/x86/include/asm/set_memory.h | 17 ++----
 arch/x86/kernel/cpu/mce/core.c    |  6 +-
 arch/x86/mm/pat/set_memory.c      |  8 ++-
 drivers/dax/super.c               | 41 +++++++++++++
 drivers/md/dm-linear.c            | 12 ++++
 drivers/md/dm-log-writes.c        | 12 ++++
 drivers/md/dm-stripe.c            | 13 ++++
 drivers/md/dm-table.c             | 33 +++++++++++
 drivers/md/dm.c                   | 27 +++++++++
 drivers/nvdimm/pmem.c             | 99 ++++++++++++++++++++++++++++---
 drivers/nvdimm/pmem.h             |  1 +
 fs/dax.c                          | 23 ++++++-
 include/linux/dax.h               | 30 ++++++++++
 include/linux/device-mapper.h     |  9 +++
 include/linux/set_memory.h        |  2 +-
 15 files changed, 306 insertions(+), 27 deletions(-)

-- 
2.18.4



* [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page
  2022-01-28 21:31 [PATCH v5 0/7] DAX poison recovery Jane Chu
@ 2022-01-28 21:31 ` Jane Chu
  2022-02-02 13:21   ` Christoph Hellwig
  2022-01-28 21:31 ` [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY Jane Chu
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Jane Chu @ 2022-01-28 21:31 UTC (permalink / raw)
  To: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

Mark a poisoned page as not present, and to reverse the 'np' effect,
set the _PAGE_PRESENT bit again. Please refer to the discussion here
for the reasoning behind the decision.
https://lore.kernel.org/all/CAPcyv4hrXPb1tASBZUg-GgdVs0OOFKXMXLiHmktg_kFi7YBMyQ@mail.gmail.com/

Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 arch/x86/include/asm/set_memory.h | 17 +++++------------
 arch/x86/kernel/cpu/mce/core.c    |  6 +++---
 arch/x86/mm/pat/set_memory.c      |  8 +++++++-
 include/linux/set_memory.h        |  2 +-
 4 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index ff0f2d90338a..aef6677da291 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -50,6 +50,7 @@ int set_memory_decrypted(unsigned long addr, int numpages);
 int set_memory_np_noalias(unsigned long addr, int numpages);
 int set_memory_nonglobal(unsigned long addr, int numpages);
 int set_memory_global(unsigned long addr, int numpages);
+int _set_memory_present(unsigned long addr, int numpages);
 
 int set_pages_array_uc(struct page **pages, int addrinarray);
 int set_pages_array_wc(struct page **pages, int addrinarray);
@@ -89,13 +90,8 @@ void notify_range_enc_status_changed(unsigned long vaddr, int npages, bool enc);
 extern int kernel_set_to_readonly;
 
 #ifdef CONFIG_X86_64
-/*
- * Prevent speculative access to the page by either unmapping
- * it (if we do not require access to any part of the page) or
- * marking it uncacheable (if we want to try to retrieve data
- * from non-poisoned lines in the page).
- */
-static inline int set_mce_nospec(unsigned long pfn, bool unmap)
+/* Prevent speculative access to a page by marking it not-present */
+static inline int set_mce_nospec(unsigned long pfn)
 {
 	unsigned long decoy_addr;
 	int rc;
@@ -117,10 +113,7 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	 */
 	decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
 
-	if (unmap)
-		rc = set_memory_np(decoy_addr, 1);
-	else
-		rc = set_memory_uc(decoy_addr, 1);
+	rc = set_memory_np(decoy_addr, 1);
 	if (rc)
 		pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
 	return rc;
@@ -130,7 +123,7 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 /* Restore full speculative operation to the pfn. */
 static inline int clear_mce_nospec(unsigned long pfn)
 {
-	return set_memory_wb((unsigned long) pfn_to_kaddr(pfn), 1);
+	return _set_memory_present((unsigned long) pfn_to_kaddr(pfn), 1);
 }
 #define clear_mce_nospec clear_mce_nospec
 #else
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5818b837fd4d..8d12739f283d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -613,7 +613,7 @@ static int uc_decode_notifier(struct notifier_block *nb, unsigned long val,
 
 	pfn = mce->addr >> PAGE_SHIFT;
 	if (!memory_failure(pfn, 0)) {
-		set_mce_nospec(pfn, whole_page(mce));
+		set_mce_nospec(pfn);
 		mce->kflags |= MCE_HANDLED_UC;
 	}
 
@@ -1297,7 +1297,7 @@ static void kill_me_maybe(struct callback_head *cb)
 
 	ret = memory_failure(p->mce_addr >> PAGE_SHIFT, flags);
 	if (!ret) {
-		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
+		set_mce_nospec(p->mce_addr >> PAGE_SHIFT);
 		sync_core();
 		return;
 	}
@@ -1321,7 +1321,7 @@ static void kill_me_never(struct callback_head *cb)
 	p->mce_count = 0;
 	pr_err("Kernel accessed poison in user space at %llx\n", p->mce_addr);
 	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, 0))
-		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
+		set_mce_nospec(p->mce_addr >> PAGE_SHIFT);
 }
 
 static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callback_head *))
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index b4072115c8ef..68d84c8bd977 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1816,7 +1816,7 @@ static inline int cpa_clear_pages_array(struct page **pages, int numpages,
 }
 
 /*
- * _set_memory_prot is an internal helper for callers that have been passed
+ * __set_memory_prot is an internal helper for callers that have been passed
  * a pgprot_t value from upper layers and a reservation has already been taken.
  * If you want to set the pgprot to a specific page protocol, use the
  * set_memory_xx() functions.
@@ -1983,6 +1983,12 @@ int set_memory_global(unsigned long addr, int numpages)
 				    __pgprot(_PAGE_GLOBAL), 0);
 }
 
+int _set_memory_present(unsigned long addr, int numpages)
+{
+	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
+}
+EXPORT_SYMBOL_GPL(_set_memory_present);
+
 /*
  * __set_memory_enc_pgtable() is used for the hypervisors that get
  * informed about "encryption" status via page tables.
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index f36be5166c19..9ad898d40e7e 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -43,7 +43,7 @@ static inline bool can_set_direct_map(void)
 #endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
 
 #ifndef set_mce_nospec
-static inline int set_mce_nospec(unsigned long pfn, bool unmap)
+static inline int set_mce_nospec(unsigned long pfn)
 {
 	return 0;
 }
-- 
2.18.4



* [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY
  2022-01-28 21:31 [PATCH v5 0/7] DAX poison recovery Jane Chu
  2022-01-28 21:31 ` [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page Jane Chu
@ 2022-01-28 21:31 ` Jane Chu
  2022-02-02 13:23   ` Christoph Hellwig
  2022-01-28 21:31 ` [PATCH v5 3/7] dm: make dm aware of target's DAXDEV_RECOVERY capability Jane Chu
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Jane Chu @ 2022-01-28 21:31 UTC (permalink / raw)
  To: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

Introduce the dax device flag DAXDEV_RECOVERY to indicate a device
that is capable of recovering from media poison.
For MD raid DAX devices, the capability may cover part of the
device rather than the entire device, and the final poison
detection and repair rely on the underlying base drivers.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 drivers/dax/super.c   | 24 ++++++++++++++++++++++++
 drivers/nvdimm/pmem.c |  1 +
 include/linux/dax.h   | 14 ++++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e3029389d809..f4f607d9698b 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -109,6 +109,8 @@ enum dax_device_flags {
 	DAXDEV_NOCACHE,
 	/* handle CPU fetch exceptions during reads */
 	DAXDEV_NOMC,
+	/* flag to indicate device capable of poison recovery */
+	DAXDEV_RECOVERY,
 };
 
 /**
@@ -311,6 +313,28 @@ static void dax_destroy_inode(struct inode *inode)
 			"kill_dax() must be called before final iput()\n");
 }
 
+void set_dax_recovery(struct dax_device *dax_dev)
+{
+	set_bit(DAXDEV_RECOVERY, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_recovery);
+
+bool dax_recovery_capable(struct dax_device *dax_dev)
+{
+	return test_bit(DAXDEV_RECOVERY, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_recovery_capable);
+
+int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
+{
+	if (dax_recovery_capable(dax_dev)) {
+		set_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
+		return 0;
+	}
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(dax_prep_recovery);
+
 static const struct super_operations dax_sops = {
 	.statfs = simple_statfs,
 	.alloc_inode = dax_alloc_inode,
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 58d95242a836..e8823813a8df 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -487,6 +487,7 @@ static int pmem_attach_disk(struct device *dev,
 	if (rc)
 		goto out_cleanup_dax;
 	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
+	set_dax_recovery(dax_dev);
 	pmem->dax_dev = dax_dev;
 
 	rc = device_add_disk(dev, disk, pmem_attribute_groups);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9fc5f99a0ae2..2fc776653c6e 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -40,6 +40,9 @@ void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
 void set_dax_synchronous(struct dax_device *dax_dev);
+void set_dax_recovery(struct dax_device *dax_dev);
+bool dax_recovery_capable(struct dax_device *dax_dev);
+int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr);
 /*
  * Check if given mapping is supported by the file / underlying device.
  */
@@ -87,6 +90,17 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
 {
 	return !(vma->vm_flags & VM_SYNC);
 }
+static inline void set_dax_recovery(struct dax_device *dax_dev)
+{
+}
+static inline bool dax_recovery_capable(struct dax_device *dax_dev)
+{
+	return false;
+}
+static inline int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
+{
+	return -EINVAL;
+}
 #endif
 
 void set_dax_nocache(struct dax_device *dax_dev);
-- 
2.18.4



* [PATCH v5 3/7] dm: make dm aware of target's DAXDEV_RECOVERY capability
  2022-01-28 21:31 [PATCH v5 0/7] DAX poison recovery Jane Chu
  2022-01-28 21:31 ` [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page Jane Chu
  2022-01-28 21:31 ` [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY Jane Chu
@ 2022-01-28 21:31 ` Jane Chu
  2022-02-04  5:34   ` Dan Williams
  2022-01-28 21:31 ` [PATCH v5 4/7] dax: add dax_recovery_write to dax_op and dm target type Jane Chu
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Jane Chu @ 2022-01-28 21:31 UTC (permalink / raw)
  To: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

If one of the target dax devices participating in an MD raid device
supports DAXDEV_RECOVERY, then the MD device as a whole is declared
capable of DAXDEV_RECOVERY.
Only when the recovery process reaches the target driver does it
become deterministic whether a given dax address range can be
recovered or not.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 drivers/md/dm-table.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index e43096cfe9e2..8af8a81b6172 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -844,6 +844,36 @@ static bool dm_table_supports_dax(struct dm_table *t,
 	return true;
 }
 
+/* Check whether device is capable of dax poison recovery */
+static int device_poison_recovery_capable(struct dm_target *ti,
+	struct dm_dev *dev, sector_t start, sector_t len, void *data)
+{
+	if (!dev->dax_dev)
+		return false;
+	return dax_recovery_capable(dev->dax_dev);
+}
+
+static bool dm_table_supports_poison_recovery(struct dm_table *t,
+	iterate_devices_callout_fn func)
+{
+	struct dm_target *ti;
+	unsigned int i;
+
+	/* Check if any DAX target supports poison recovery */
+	for (i = 0; i < dm_table_get_num_targets(t); i++) {
+		ti = dm_table_get_target(t, i);
+
+		if (!ti->type->direct_access)
+			return false;
+
+		if (ti->type->iterate_devices &&
+		    ti->type->iterate_devices(ti, func, NULL))
+			return true;
+	}
+
+	return false;
+}
+
 static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
 				  sector_t start, sector_t len, void *data)
 {
@@ -2014,6 +2044,9 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 		blk_queue_flag_set(QUEUE_FLAG_DAX, q);
 		if (dm_table_supports_dax(t, device_not_dax_synchronous_capable))
 			set_dax_synchronous(t->md->dax_dev);
+		if (dm_table_supports_poison_recovery(t,
+					device_poison_recovery_capable))
+			set_dax_recovery(t->md->dax_dev);
 	}
 	else
 		blk_queue_flag_clear(QUEUE_FLAG_DAX, q);
-- 
2.18.4



* [PATCH v5 4/7] dax: add dax_recovery_write to dax_op and dm target type
  2022-01-28 21:31 [PATCH v5 0/7] DAX poison recovery Jane Chu
                   ` (2 preceding siblings ...)
  2022-01-28 21:31 ` [PATCH v5 3/7] dm: make dm aware of target's DAXDEV_RECOVERY capability Jane Chu
@ 2022-01-28 21:31 ` Jane Chu
  2022-02-02 13:34   ` Christoph Hellwig
  2022-02-04  6:03   ` Dan Williams
  2022-01-28 21:31 ` [PATCH v5 5/7] pmem: add pmem_recovery_write() dax op Jane Chu
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 34+ messages in thread
From: Jane Chu @ 2022-01-28 21:31 UTC (permalink / raw)
  To: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

The dax_recovery_write() dax op is only required for DAX devices
that export DAXDEV_RECOVERY, indicating their capability to recover
from poison.

DM may be nested; if any of the base dax devices forming a DM
device supports dax recovery, the DM device is marked with that
capability.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 drivers/dax/super.c           | 17 +++++++++++++++++
 drivers/md/dm-linear.c        | 12 ++++++++++++
 drivers/md/dm-log-writes.c    | 12 ++++++++++++
 drivers/md/dm-stripe.c        | 13 +++++++++++++
 drivers/md/dm.c               | 27 +++++++++++++++++++++++++++
 drivers/nvdimm/pmem.c         |  7 +++++++
 include/linux/dax.h           | 16 ++++++++++++++++
 include/linux/device-mapper.h |  9 +++++++++
 8 files changed, 113 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index f4f607d9698b..a136fa6b3e36 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -195,6 +195,23 @@ int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
 }
 EXPORT_SYMBOL_GPL(dax_zero_page_range);
 
+size_t dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
+			void *addr, size_t bytes, struct iov_iter *i)
+{
+	if (!dax_recovery_capable(dax_dev) || !dax_dev->ops->recovery_write)
+		return (size_t)-EOPNOTSUPP;
+	return dax_dev->ops->recovery_write(dax_dev, pgoff, addr, bytes, i);
+}
+EXPORT_SYMBOL_GPL(dax_recovery_write);
+
+bool dax_recovery_started(struct dax_device *dax_dev, void **kaddr)
+{
+	if (!kaddr || !dax_recovery_capable(dax_dev))
+		return false;
+	return test_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
+}
+EXPORT_SYMBOL_GPL(dax_recovery_started);
+
 #ifdef CONFIG_ARCH_HAS_PMEM_API
 void arch_wb_cache_pmem(void *addr, size_t size);
 void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 1b97a11d7151..831c565bf024 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -188,9 +188,20 @@ static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 	return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
+static size_t linear_dax_recovery_write(struct dm_target *ti, pgoff_t pgoff,
+	void *addr, size_t bytes, struct iov_iter *i)
+{
+	struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
+	if (!dax_recovery_capable(dax_dev))
+		return (size_t) -EOPNOTSUPP;
+	return dax_recovery_write(dax_dev, pgoff, addr, bytes, i);
+};
+
 #else
 #define linear_dax_direct_access NULL
 #define linear_dax_zero_page_range NULL
+#define	linear_dax_recovery_write NULL
 #endif
 
 static struct target_type linear_target = {
@@ -208,6 +219,7 @@ static struct target_type linear_target = {
 	.iterate_devices = linear_iterate_devices,
 	.direct_access = linear_dax_direct_access,
 	.dax_zero_page_range = linear_dax_zero_page_range,
+	.dax_recovery_write = linear_dax_recovery_write,
 };
 
 int __init dm_linear_init(void)
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 139b09b06eda..22c2d64ef963 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -927,9 +927,20 @@ static int log_writes_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 	return dax_zero_page_range(dax_dev, pgoff, nr_pages << PAGE_SHIFT);
 }
 
+static size_t log_writes_dax_recovery_write(struct dm_target *ti,
+	pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
+{
+	struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
+
+	if (!dax_recovery_capable(dax_dev))
+		return (size_t) -EOPNOTSUPP;
+	return dax_recovery_write(dax_dev, pgoff, addr, bytes, i);
+}
+
 #else
 #define log_writes_dax_direct_access NULL
 #define log_writes_dax_zero_page_range NULL
+#define	log_writes_dax_recovery_write NULL
 #endif
 
 static struct target_type log_writes_target = {
@@ -947,6 +958,7 @@ static struct target_type log_writes_target = {
 	.io_hints = log_writes_io_hints,
 	.direct_access = log_writes_dax_direct_access,
 	.dax_zero_page_range = log_writes_dax_zero_page_range,
+	.dax_recovery_write = log_writes_dax_recovery_write,
 };
 
 static int __init dm_log_writes_init(void)
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index e566115ec0bb..78c52c8865ef 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -332,9 +332,21 @@ static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 	return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
+static size_t stripe_dax_recovery_write(struct dm_target *ti, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i)
+{
+	struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
+
+	if (!dax_recovery_capable(dax_dev))
+		return (size_t) -EOPNOTSUPP;
+
+	return dax_recovery_write(dax_dev, pgoff, addr, bytes, i);
+}
+
 #else
 #define stripe_dax_direct_access NULL
 #define stripe_dax_zero_page_range NULL
+#define	stripe_dax_recovery_write NULL
 #endif
 
 /*
@@ -471,6 +483,7 @@ static struct target_type stripe_target = {
 	.io_hints = stripe_io_hints,
 	.direct_access = stripe_dax_direct_access,
 	.dax_zero_page_range = stripe_dax_zero_page_range,
+	.dax_recovery_write = stripe_dax_recovery_write,
 };
 
 int __init dm_stripe_init(void)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index c0ae8087c602..bdc142258ace 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1054,6 +1054,32 @@ static int dm_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
 	return ret;
 }
 
+static size_t dm_dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i)
+{
+	struct mapped_device *md = dax_get_private(dax_dev);
+	sector_t sector = pgoff * PAGE_SECTORS;
+	struct dm_target *ti;
+	long ret = 0;
+	int srcu_idx;
+
+	ti = dm_dax_get_live_target(md, sector, &srcu_idx);
+
+	if (!ti)
+		goto out;
+
+	if (!ti->type->dax_recovery_write) {
+		ret = (size_t)-EOPNOTSUPP;
+		goto out;
+	}
+
+	ret = ti->type->dax_recovery_write(ti, pgoff, addr, bytes, i);
+out:
+	dm_put_live_table(md, srcu_idx);
+
+	return ret;
+}
+
 /*
  * A target may call dm_accept_partial_bio only from the map routine.  It is
  * allowed for all bio types except REQ_PREFLUSH, REQ_OP_ZONE_* zone management
@@ -2980,6 +3006,7 @@ static const struct block_device_operations dm_rq_blk_dops = {
 static const struct dax_operations dm_dax_ops = {
 	.direct_access = dm_dax_direct_access,
 	.zero_page_range = dm_dax_zero_page_range,
+	.recovery_write = dm_dax_recovery_write,
 };
 
 /*
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index e8823813a8df..638e64681db9 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -301,9 +301,16 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
 	return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
 }
 
+static size_t pmem_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i)
+{
+	return 0;
+}
+
 static const struct dax_operations pmem_dax_ops = {
 	.direct_access = pmem_dax_direct_access,
 	.zero_page_range = pmem_dax_zero_page_range,
+	.recovery_write = pmem_recovery_write,
 };
 
 static ssize_t write_cache_show(struct device *dev,
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 2fc776653c6e..1b3d6ebf3e49 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -30,6 +30,9 @@ struct dax_operations {
 			sector_t, sector_t);
 	/* zero_page_range: required operation. Zero page range   */
 	int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
+	/* recovery_write: optional operation. */
+	size_t (*recovery_write)(struct dax_device *, pgoff_t, void *, size_t,
+				struct iov_iter *);
 };
 
 #if IS_ENABLED(CONFIG_DAX)
@@ -43,6 +46,9 @@ void set_dax_synchronous(struct dax_device *dax_dev);
 void set_dax_recovery(struct dax_device *dax_dev);
 bool dax_recovery_capable(struct dax_device *dax_dev);
 int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr);
+bool dax_recovery_started(struct dax_device *dax_dev, void **kaddr);
+size_t dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
+		size_t bytes, struct iov_iter *i);
 /*
  * Check if given mapping is supported by the file / underlying device.
  */
@@ -101,6 +107,16 @@ static inline int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
 {
 	return -EINVAL;
 }
+static inline bool dax_recovery_started(struct dax_device *dax_dev,
+			void **kaddr)
+{
+	return false;
+}
+static inline size_t dax_recovery_write(struct dax_device *dax_dev,
+		pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
+{
+	return 0;
+}
 #endif
 
 void set_dax_nocache(struct dax_device *dax_dev);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index b26fecf6c8e8..4f134ba63d3c 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -150,6 +150,14 @@ typedef long (*dm_dax_direct_access_fn) (struct dm_target *ti, pgoff_t pgoff,
 typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, pgoff_t pgoff,
 		size_t nr_pages);
 
+/*
+ * Returns:
+ *   != 0 : number of bytes transferred, or a negative error code cast to size_t
+ *   0    : failure
+ */
+typedef size_t (*dm_dax_recovery_write_fn)(struct dm_target *ti, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i);
+
 void dm_error(const char *message);
 
 struct dm_dev {
@@ -199,6 +207,7 @@ struct target_type {
 	dm_io_hints_fn io_hints;
 	dm_dax_direct_access_fn direct_access;
 	dm_dax_zero_page_range_fn dax_zero_page_range;
+	dm_dax_recovery_write_fn dax_recovery_write;
 
 	/* For internal device-mapper use. */
 	struct list_head list;
-- 
2.18.4



* [PATCH v5 5/7] pmem: add pmem_recovery_write() dax op
  2022-01-28 21:31 [PATCH v5 0/7] DAX poison recovery Jane Chu
                   ` (3 preceding siblings ...)
  2022-01-28 21:31 ` [PATCH v5 4/7] dax: add dax_recovery_write to dax_op and dm target type Jane Chu
@ 2022-01-28 21:31 ` Jane Chu
  2022-02-02 13:43   ` Christoph Hellwig
  2022-02-04  6:21   ` Dan Williams
  2022-01-28 21:31 ` [PATCH v5 6/7] dax: add recovery_write to dax_iomap_iter in failure path Jane Chu
  2022-01-28 21:31 ` [PATCH v5 7/7] pmem: fix pmem_do_write() avoid writing to 'np' page Jane Chu
  6 siblings, 2 replies; 34+ messages in thread
From: Jane Chu @ 2022-01-28 21:31 UTC (permalink / raw)
  To: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

pmem_recovery_write() consists of clearing the poison via DSM,
clearing the page's HWPoison bit, restoring the _PAGE_PRESENT bit,
flushing the caches, writing, and finally clearing the bad-block
record.

A competing pread thread is held off during the recovery write by
the presence of the bad-block record. Competing recovery_write
threads are serialized by a lock.
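
Condensed from the patch below, the ordering that enforces this is
roughly (arguments taken from pmem_recovery_write()):

	mutex_lock(&pmem->recovery_lock);
	/* clear poison via DSM */
	cleared = nvdimm_clear_poison(dev, pmem->phys_addr + pmem_off, len);
	/* clear HWPoison bit, restore _PAGE_PRESENT */
	hwpoison_clear(pmem, pmem->phys_addr + pmem_off, cleared);
	/* invalidate stale cache lines */
	arch_invalidate_pmem(pmem->virt_addr + pmem_off, bytes);
	/* the actual write */
	rc = _copy_from_iter_flushcache(addr, bytes, i);
	/* readers may proceed only after this point */
	pmem_clear_badblocks(pmem, sector, cleared / 512);
	mutex_unlock(&pmem->recovery_lock);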

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 drivers/nvdimm/pmem.c | 82 +++++++++++++++++++++++++++++++++++++++----
 drivers/nvdimm/pmem.h |  1 +
 2 files changed, 77 insertions(+), 6 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 638e64681db9..f2d6b34d48de 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -69,6 +69,14 @@ static void hwpoison_clear(struct pmem_device *pmem,
 	}
 }
 
+static void pmem_clear_badblocks(struct pmem_device *pmem, sector_t sector,
+				long cleared_blks)
+{
+	badblocks_clear(&pmem->bb, sector, cleared_blks);
+	if (pmem->bb_state)
+		sysfs_notify_dirent(pmem->bb_state);
+}
+
 static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
 		phys_addr_t offset, unsigned int len)
 {
@@ -88,9 +96,7 @@ static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
 		dev_dbg(dev, "%#llx clear %ld sector%s\n",
 				(unsigned long long) sector, cleared,
 				cleared > 1 ? "s" : "");
-		badblocks_clear(&pmem->bb, sector, cleared);
-		if (pmem->bb_state)
-			sysfs_notify_dirent(pmem->bb_state);
+		pmem_clear_badblocks(pmem, sector, cleared);
 	}
 
 	arch_invalidate_pmem(pmem->virt_addr + offset, len);
@@ -257,10 +263,15 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
 		long nr_pages, void **kaddr, pfn_t *pfn)
 {
+	bool bad_pmem;
+	bool do_recovery = false;
 	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
 
-	if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
-					PFN_PHYS(nr_pages))))
+	bad_pmem = is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
+				PFN_PHYS(nr_pages));
+	if (bad_pmem && kaddr)
+		do_recovery = dax_recovery_started(pmem->dax_dev, kaddr);
+	if (bad_pmem && !do_recovery)
 		return -EIO;
 
 	if (kaddr)
@@ -301,10 +312,68 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
 	return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
 }
 
+/*
+ * The recovery write thread started out as a normal pwrite thread, and
+ * when the filesystem was told about a potential media error in the
+ * range, it turned the normal pwrite into a dax_recovery_write.
+ *
+ * The recovery write consists of clearing the poison via DSM, clearing
+ * the page's HWPoison bit, re-enabling page-wide read-write permission,
+ * flushing the caches, and finally the write.  A competing pread thread
+ * needs to be held off during the recovery process since data read back
+ * might not be valid; that is achieved by clearing the badblock records
+ * only after the recovery write completes.
+ *
+ * Any competing recovery write thread needs to be serialized, and this
+ * is done via the pmem device level lock .recovery_lock.
+ */
 static size_t pmem_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
 		void *addr, size_t bytes, struct iov_iter *i)
 {
-	return 0;
+	size_t rc, len, off;
+	phys_addr_t pmem_off;
+	struct pmem_device *pmem = dax_get_private(dax_dev);
+	struct device *dev = pmem->bb.dev;
+	sector_t sector;
+	long cleared, cleared_blk;
+
+	mutex_lock(&pmem->recovery_lock);
+
+	/* If no poison found in the range, go ahead with write */
+	off = (unsigned long)addr & ~PAGE_MASK;
+	len = PFN_PHYS(PFN_UP(off + bytes));
+	if (!is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512, len)) {
+		rc = _copy_from_iter_flushcache(addr, bytes, i);
+		goto write_done;
+	}
+
+	/* A range that is not page-aligned cannot be recovered */
+	if (off || !(PAGE_ALIGNED(bytes))) {
+		dev_warn(dev, "Found poison, but addr(%p) or bytes(%#lx) not page aligned\n",
+			addr, bytes);
+		rc = (size_t) -EIO;
+		goto write_done;
+	}
+
+	pmem_off = PFN_PHYS(pgoff) + pmem->data_offset;
+	sector = (pmem_off - pmem->data_offset) / 512;
+	cleared = nvdimm_clear_poison(dev, pmem->phys_addr + pmem_off, len);
+	cleared_blk = cleared / 512;
+	if (cleared_blk > 0) {
+		hwpoison_clear(pmem, pmem->phys_addr + pmem_off, cleared);
+	} else {
+		dev_warn(dev, "pmem_recovery_write: cleared_blk: %ld\n",
+			cleared_blk);
+		rc = (size_t) -EIO;
+		goto write_done;
+	}
+	arch_invalidate_pmem(pmem->virt_addr + pmem_off, bytes);
+	rc = _copy_from_iter_flushcache(addr, bytes, i);
+	pmem_clear_badblocks(pmem, sector, cleared_blk);
+
+write_done:
+	mutex_unlock(&pmem->recovery_lock);
+	return rc;
 }
 
 static const struct dax_operations pmem_dax_ops = {
@@ -495,6 +564,7 @@ static int pmem_attach_disk(struct device *dev,
 		goto out_cleanup_dax;
 	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
 	set_dax_recovery(dax_dev);
+	mutex_init(&pmem->recovery_lock);
 	pmem->dax_dev = dax_dev;
 
 	rc = device_add_disk(dev, disk, pmem_attribute_groups);
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index 59cfe13ea8a8..971bff7552d6 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -24,6 +24,7 @@ struct pmem_device {
 	struct dax_device	*dax_dev;
 	struct gendisk		*disk;
 	struct dev_pagemap	pgmap;
+	struct mutex		recovery_lock;
 };
 
 long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
-- 
2.18.4



* [PATCH v5 6/7] dax: add recovery_write to dax_iomap_iter in failure path
  2022-01-28 21:31 [PATCH v5 0/7] DAX poison recovery Jane Chu
                   ` (4 preceding siblings ...)
  2022-01-28 21:31 ` [PATCH v5 5/7] pmem: add pmem_recovery_write() dax op Jane Chu
@ 2022-01-28 21:31 ` Jane Chu
  2022-02-02 13:44   ` Christoph Hellwig
  2022-01-28 21:31 ` [PATCH v5 7/7] pmem: fix pmem_do_write() avoid writing to 'np' page Jane Chu
  6 siblings, 1 reply; 34+ messages in thread
From: Jane Chu @ 2022-01-28 21:31 UTC (permalink / raw)
  To: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

dax_iomap_iter() fails if the destination range contains poison.
Add a recovery_write fallback to the failure code path.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 fs/dax.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index cd03485867a7..236675bd5946 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1199,6 +1199,8 @@ int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 }
 EXPORT_SYMBOL_GPL(dax_truncate_page);
 
+typedef size_t (*iter_func_t)(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i);
 static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 		struct iov_iter *iter)
 {
@@ -1210,6 +1212,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 	ssize_t ret = 0;
 	size_t xfer;
 	int id;
+	iter_func_t write_func = dax_copy_from_iter;
 
 	if (iov_iter_rw(iter) == READ) {
 		end = min(end, i_size_read(iomi->inode));
@@ -1249,6 +1252,17 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 
 		map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
 				&kaddr, NULL);
+		if ((map_len == -EIO) && (iov_iter_rw(iter) == WRITE)) {
+			if (dax_prep_recovery(dax_dev, &kaddr) < 0) {
+				ret = map_len;
+				break;
+			}
+			map_len = dax_direct_access(dax_dev, pgoff,
+					PHYS_PFN(size), &kaddr, NULL);
+			if (map_len > 0)
+				write_func = dax_recovery_write;
+		}
+
 		if (map_len < 0) {
 			ret = map_len;
 			break;
@@ -1261,12 +1275,17 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 			map_len = end - pos;
 
 		if (iov_iter_rw(iter) == WRITE)
-			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
-					map_len, iter);
+			xfer = write_func(dax_dev, pgoff, kaddr, map_len, iter);
 		else
 			xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr,
 					map_len, iter);
 
+		if (xfer == (ssize_t) -EIO) {
+			pr_warn("dax_iomap_iter: write_func returns -EIO\n");
+			ret = -EIO;
+			break;
+		}
+
 		pos += xfer;
 		length -= xfer;
 		done += xfer;
-- 
2.18.4



* [PATCH v5 7/7] pmem: fix pmem_do_write() avoid writing to 'np' page
  2022-01-28 21:31 [PATCH v5 0/7] DAX poison recovery Jane Chu
                   ` (5 preceding siblings ...)
  2022-01-28 21:31 ` [PATCH v5 6/7] dax: add recovery_write to dax_iomap_iter in failure path Jane Chu
@ 2022-01-28 21:31 ` Jane Chu
  2022-02-02 13:28   ` Christoph Hellwig
  6 siblings, 1 reply; 34+ messages in thread
From: Jane Chu @ 2022-01-28 21:31 UTC (permalink / raw)
  To: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

Since a poisoned page is marked as not-present, the first of the
two back-to-back write_pmem() calls can only be made when there
is no poison in the range; otherwise the kernel will Oops.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 drivers/nvdimm/pmem.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f2d6b34d48de..283890084d58 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -187,10 +187,15 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
 	 * after clear poison.
 	 */
 	flush_dcache_page(page);
-	write_pmem(pmem_addr, page, page_off, len);
-	if (unlikely(bad_pmem)) {
-		rc = pmem_clear_poison(pmem, pmem_off, len);
+	if (!bad_pmem) {
 		write_pmem(pmem_addr, page, page_off, len);
+	} else {
+		rc = pmem_clear_poison(pmem, pmem_off, len);
+		if (rc == BLK_STS_OK)
+			write_pmem(pmem_addr, page, page_off, len);
+		else
+			pr_warn("%s: failed to clear poison\n",
+				__func__);
 	}
 
 	return rc;
-- 
2.18.4



* Re: [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page
  2022-01-28 21:31 ` [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page Jane Chu
@ 2022-02-02 13:21   ` Christoph Hellwig
  2022-02-02 21:20     ` Jane Chu
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-02-02 13:21 UTC (permalink / raw)
  To: Jane Chu
  Cc: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

> +static inline int set_mce_nospec(unsigned long pfn)
>  {
>  	unsigned long decoy_addr;
>  	int rc;
> @@ -117,10 +113,7 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
>  	 */
>  	decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
>  
> -	if (unmap)
> -		rc = set_memory_np(decoy_addr, 1);
> -	else
> -		rc = set_memory_uc(decoy_addr, 1);
> +	rc = set_memory_np(decoy_addr, 1);
>  	if (rc)
>  		pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
>  	return rc;
> @@ -130,7 +123,7 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
>  /* Restore full speculative operation to the pfn. */
>  static inline int clear_mce_nospec(unsigned long pfn)
>  {
> -	return set_memory_wb((unsigned long) pfn_to_kaddr(pfn), 1);
> +	return _set_memory_present((unsigned long) pfn_to_kaddr(pfn), 1);
>  }

Wouldn't it make more sense to move these helpers out of line rather
than exporting _set_memory_present?

>  /*
> - * _set_memory_prot is an internal helper for callers that have been passed
> + * __set_memory_prot is an internal helper for callers that have been passed

This looks unrelated to the patch.


* Re: [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY
  2022-01-28 21:31 ` [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY Jane Chu
@ 2022-02-02 13:23   ` Christoph Hellwig
  2022-02-02 21:27     ` Jane Chu
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-02-02 13:23 UTC (permalink / raw)
  To: Jane Chu
  Cc: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On Fri, Jan 28, 2022 at 02:31:45PM -0700, Jane Chu wrote:
> +int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
> +{
> +	if (dax_recovery_capable(dax_dev)) {
> +		set_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
> +		return 0;
> +	}
> +	return -EINVAL;

Setting a random bit on a passed-in memory address looks a little
dangerous to me.

Also I'd return early for the EINVAL case to make the flow a little
more clear.
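
(For illustration, the early-return form of the same helper, built
from the patch's own code, would look roughly like this sketch:)

	int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
	{
		if (!dax_recovery_capable(dax_dev))
			return -EINVAL;

		set_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
		return 0;
	}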


* Re: [PATCH v5 7/7] pmem: fix pmem_do_write() avoid writing to 'np' page
  2022-01-28 21:31 ` [PATCH v5 7/7] pmem: fix pmem_do_write() avoid writing to 'np' page Jane Chu
@ 2022-02-02 13:28   ` Christoph Hellwig
  2022-02-02 21:31     ` Jane Chu
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-02-02 13:28 UTC (permalink / raw)
  To: Jane Chu
  Cc: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On Fri, Jan 28, 2022 at 02:31:50PM -0700, Jane Chu wrote:
> +	if (!bad_pmem) {
>  		write_pmem(pmem_addr, page, page_off, len);
> +	} else {
> +		rc = pmem_clear_poison(pmem, pmem_off, len);
> +		if (rc == BLK_STS_OK)
> +			write_pmem(pmem_addr, page, page_off, len);
> +		else
> +			pr_warn("%s: failed to clear poison\n",
> +				__func__);

This warning probably needs ratelimiting.

Also this flow looks a little odd.  I'd redo the whole function with a
clear bad_pmem case:

	if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) {
		blk_status_t rc = pmem_clear_poison(pmem, pmem_off, len);

		if (rc != BLK_STS_OK) {
			pr_warn("%s: failed to clear poison\n", __func__);
			return rc;
		}
	}
	flush_dcache_page(page);
	write_pmem(pmem_addr, page, page_off, len);
	return BLK_STS_OK;



* Re: [PATCH v5 4/7] dax: add dax_recovery_write to dax_op and dm target type
  2022-01-28 21:31 ` [PATCH v5 4/7] dax: add dax_recovery_write to dax_op and dm target type Jane Chu
@ 2022-02-02 13:34   ` Christoph Hellwig
  2022-02-02 22:03     ` Jane Chu
  2022-02-04  6:03   ` Dan Williams
  1 sibling, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-02-02 13:34 UTC (permalink / raw)
  To: Jane Chu
  Cc: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On Fri, Jan 28, 2022 at 02:31:47PM -0700, Jane Chu wrote:
> dax_recovery_write() dax op is only required for DAX device that
> export DAXDEV_RECOVERY indicating its capability to recover from
> poisons.
> 
> DM may be nested, if part of the base dax devices forming a DM
> device support dax recovery, the DM device is marked with such
> capability.

I'd fold this into the previous 2 patches as the flag and method
are clearly very tightly coupled.

> +static size_t linear_dax_recovery_write(struct dm_target *ti, pgoff_t pgoff,
> +	void *addr, size_t bytes, struct iov_iter *i)

Function line continuations use two tab indentations or alignment after
the opening brace.

> +{
> +	struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
> +
> +	if (!dax_recovery_capable(dax_dev))
> +		return (size_t) -EOPNOTSUPP;

Returning a negative errno through an unsigned return type looks dangerous.

> +	/* recovery_write: optional operation. */

An explanation of what the method is used for might be more useful than
mentioning that it is optional.
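
(For example, a more descriptive comment might read something like the
sketch below; the exact wording is only a suggestion:)

	/*
	 * recovery_write: optional operation, used by the dax write path
	 * to clear poison in the target range and then write the data.
	 * Returns the number of bytes written, or a negative error code
	 * cast to size_t on failure.
	 */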

> +	size_t (*recovery_write)(struct dax_device *, pgoff_t, void *, size_t,
> +				struct iov_iter *);

Spelling out the arguments tends to help readability, but then again
none of the existing methods does it.


* Re: [PATCH v5 5/7] pmem: add pmem_recovery_write() dax op
  2022-01-28 21:31 ` [PATCH v5 5/7] pmem: add pmem_recovery_write() dax op Jane Chu
@ 2022-02-02 13:43   ` Christoph Hellwig
  2022-02-02 22:13     ` Jane Chu
  2022-02-04  6:21   ` Dan Williams
  1 sibling, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-02-02 13:43 UTC (permalink / raw)
  To: Jane Chu
  Cc: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

> @@ -257,10 +263,15 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
>  __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
>  		long nr_pages, void **kaddr, pfn_t *pfn)
>  {
> +	bool bad_pmem;
> +	bool do_recovery = false;
>  	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
>  
> -	if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
> -					PFN_PHYS(nr_pages))))
> +	bad_pmem = is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
> +				PFN_PHYS(nr_pages));
> +	if (bad_pmem && kaddr)
> +		do_recovery = dax_recovery_started(pmem->dax_dev, kaddr);
> +	if (bad_pmem && !do_recovery)
>  		return -EIO;

I find the passing of the recovery flag through the address very
cumbersome.  I remember there was some kind of discussion, but this looks
pretty ugly.

Also no need for the bad_pmem variable:

	if (is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512, PFN_PHYS(nr_pages)) &&
	    (!kaddr || !dax_recovery_started(pmem->dax_dev, kaddr)))
		return -EIO;

Also:  the !kaddr check could go into dax_recovery_started.  That way,
even if we stick with the overloading, kaddr could also be used just
for the flag if needed.


* Re: [PATCH v5 6/7] dax: add recovery_write to dax_iomap_iter in failure path
  2022-01-28 21:31 ` [PATCH v5 6/7] dax: add recovery_write to dax_iomap_iter in failure path Jane Chu
@ 2022-02-02 13:44   ` Christoph Hellwig
  2022-02-02 22:18     ` Jane Chu
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-02-02 13:44 UTC (permalink / raw)
  To: Jane Chu
  Cc: david, djwong, dan.j.williams, hch, vishal.l.verma, dave.jiang,
	agk, snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On Fri, Jan 28, 2022 at 02:31:49PM -0700, Jane Chu wrote:
> +typedef size_t (*iter_func_t)(struct dax_device *dax_dev, pgoff_t pgoff,
> +		void *addr, size_t bytes, struct iov_iter *i);
>  static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>  		struct iov_iter *iter)
>  {
> @@ -1210,6 +1212,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>  	ssize_t ret = 0;
>  	size_t xfer;
>  	int id;
> +	iter_func_t write_func = dax_copy_from_iter;

This use of a function pointer causes indirect call overhead.  A simple
"bool in_recovery" or "do_recovery" does the trick in a way that is
both more readable and faster.

> +		if ((map_len == -EIO) && (iov_iter_rw(iter) == WRITE)) {

No need for the braces.

>  		if (iov_iter_rw(iter) == WRITE)
> -			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
> -					map_len, iter);
> +			xfer = write_func(dax_dev, pgoff, kaddr, map_len, iter);
>  		else
>  			xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr,
>  					map_len, iter);

i.e.

		if (iov_iter_rw(iter) == READ)
			xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr,
					map_len, iter);
		else if (unlikely(do_recovery))
			xfer = dax_recovery_write(dax_dev, pgoff, kaddr,
					map_len, iter);
		else
			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
					map_len, iter);


* Re: [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page
  2022-02-02 13:21   ` Christoph Hellwig
@ 2022-02-02 21:20     ` Jane Chu
  2022-02-02 23:07       ` Jane Chu
  0 siblings, 1 reply; 34+ messages in thread
From: Jane Chu @ 2022-02-02 21:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: david, djwong, dan.j.williams, vishal.l.verma, dave.jiang, agk,
	snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On 2/2/2022 5:21 AM, Christoph Hellwig wrote:
>> +static inline int set_mce_nospec(unsigned long pfn)
>>   {
>>   	unsigned long decoy_addr;
>>   	int rc;
>> @@ -117,10 +113,7 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
>>   	 */
>>   	decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
>>   
>> -	if (unmap)
>> -		rc = set_memory_np(decoy_addr, 1);
>> -	else
>> -		rc = set_memory_uc(decoy_addr, 1);
>> +	rc = set_memory_np(decoy_addr, 1);
>>   	if (rc)
>>   		pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
>>   	return rc;
>> @@ -130,7 +123,7 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
>>   /* Restore full speculative operation to the pfn. */
>>   static inline int clear_mce_nospec(unsigned long pfn)
>>   {
>> -	return set_memory_wb((unsigned long) pfn_to_kaddr(pfn), 1);
>> +	return _set_memory_present((unsigned long) pfn_to_kaddr(pfn), 1);
>>   }
> 
> Wouldn't it make more sense to move these helpers out of line rather
> than exporting _set_memory_present?

Do you mean to move
   return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
into clear_mce_nospec() for the x86 arch and get rid of _set_memory_present?
If so, sure I'll do that.

> 
>>   /*
>> - * _set_memory_prot is an internal helper for callers that have been passed
>> + * __set_memory_prot is an internal helper for callers that have been passed
> 
> This looks unrelated to the patch.

My bad, will remove the remnant.

thanks!
-jane



* Re: [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY
  2022-02-02 13:23   ` Christoph Hellwig
@ 2022-02-02 21:27     ` Jane Chu
  2022-02-03 13:43       ` Christoph Hellwig
  0 siblings, 1 reply; 34+ messages in thread
From: Jane Chu @ 2022-02-02 21:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: david, djwong, dan.j.williams, vishal.l.verma, dave.jiang, agk,
	snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On 2/2/2022 5:23 AM, Christoph Hellwig wrote:
> On Fri, Jan 28, 2022 at 02:31:45PM -0700, Jane Chu wrote:
>> +int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
>> +{
>> +	if (dax_recovery_capable(dax_dev)) {
>> +		set_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
>> +		return 0;
>> +	}
>> +	return -EINVAL;
> 
> Setting a random bit on a passed in memory address looks a little
> dangerous to me.

Yeah, I see.  Would you suggest a way for dax_iomap_iter to indicate
to dax_direct_access that the caller intends the callee to ignore
poison in the range because the caller plans to do a recovery_write?
We tried adding a flag to dax_direct_access, and that wasn't liked,
if I recall.
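
(One hypothetical alternative, sketched purely for discussion -- the
DAX_RECOVERY flag and the extra parameter below are not part of this
series -- would be an explicit flags argument:)

	/* hypothetical: pass the intent explicitly instead of via *kaddr */
	long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
			long nr_pages, int flags, void **kaddr, pfn_t *pfn);

	/* caller in dax_iomap_iter() */
	map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
			DAX_RECOVERY, &kaddr, NULL);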

> 
> Also I'd return early for the EINVAL case to make the flow a little
> more clear.

Agreed, will do.

thanks!
-jane


* Re: [PATCH v5 7/7] pmem: fix pmem_do_write() avoid writing to 'np' page
  2022-02-02 13:28   ` Christoph Hellwig
@ 2022-02-02 21:31     ` Jane Chu
  0 siblings, 0 replies; 34+ messages in thread
From: Jane Chu @ 2022-02-02 21:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: david, djwong, dan.j.williams, vishal.l.verma, dave.jiang, agk,
	snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On 2/2/2022 5:28 AM, Christoph Hellwig wrote:
> On Fri, Jan 28, 2022 at 02:31:50PM -0700, Jane Chu wrote:
>> +	if (!bad_pmem) {
>>   		write_pmem(pmem_addr, page, page_off, len);
>> +	} else {
>> +		rc = pmem_clear_poison(pmem, pmem_off, len);
>> +		if (rc == BLK_STS_OK)
>> +			write_pmem(pmem_addr, page, page_off, len);
>> +		else
>> +			pr_warn("%s: failed to clear poison\n",
>> +				__func__);
> 
> This warning probably needs ratelimiting.

Will do; in case bad hardware is encountered, I can see lots of warnings.

> 
> Also this flow looks a little odd.  I'd redo the whole function with a
> clear bad_mem case:
> 
> 	if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) {
> 		blk_status_t rc = pmem_clear_poison(pmem, pmem_off, len);
> 
> 		if (rc != BLK_STS_OK) {
> 			pr_warn("%s: failed to clear poison\n", __func__);
> 			return rc;
> 		}
> 	}
> 	flush_dcache_page(page);
> 	write_pmem(pmem_addr, page, page_off, len);
> 	return BLK_STS_OK;
> 
> 

This is much better, thanks for the suggestion!

-jane



* Re: [PATCH v5 4/7] dax: add dax_recovery_write to dax_op and dm target type
  2022-02-02 13:34   ` Christoph Hellwig
@ 2022-02-02 22:03     ` Jane Chu
  0 siblings, 0 replies; 34+ messages in thread
From: Jane Chu @ 2022-02-02 22:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: david, djwong, dan.j.williams, vishal.l.verma, dave.jiang, agk,
	snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On 2/2/2022 5:34 AM, Christoph Hellwig wrote:
> On Fri, Jan 28, 2022 at 02:31:47PM -0700, Jane Chu wrote:
>> dax_recovery_write() dax op is only required for DAX device that
>> export DAXDEV_RECOVERY indicating its capability to recover from
>> poisons.
>>
>> DM may be nested, if part of the base dax devices forming a DM
>> device support dax recovery, the DM device is marked with such
>> capability.
> 
> I'd fold this into the previous 2 patches as the flag and method
> are clearly very tightly coupled.

Make sense, will do.

> 
>> +static size_t linear_dax_recovery_write(struct dm_target *ti, pgoff_t pgoff,
>> +	void *addr, size_t bytes, struct iov_iter *i)
> 
> Function line continuations use two tab indentations or alignment after
> the opening brace.

Okay.

> 
>> +{
>> +	struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
>> +
>> +	if (!dax_recovery_capable(dax_dev))
>> +		return (size_t) -EOPNOTSUPP;
> 
> Returning a negativ errno through an unsigned argument looks dangerous.

Sorry, should be (ssize_t) there.

> 
>> +	/* recovery_write: optional operation. */
> 
> And explanation of what the method is use for might be more useful than
> mentioning that is is optional.

Will add substance to comments.

> 
>> +	size_t (*recovery_write)(struct dax_device *, pgoff_t, void *, size_t,
>> +				struct iov_iter *);
> 
> Spelling out the arguments tends to help readability, but then again
> none of the existing methods does it.

Thanks!
-jane



* Re: [PATCH v5 5/7] pmem: add pmem_recovery_write() dax op
  2022-02-02 13:43   ` Christoph Hellwig
@ 2022-02-02 22:13     ` Jane Chu
  0 siblings, 0 replies; 34+ messages in thread
From: Jane Chu @ 2022-02-02 22:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: david, djwong, dan.j.williams, vishal.l.verma, dave.jiang, agk,
	snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On 2/2/2022 5:43 AM, Christoph Hellwig wrote:
>> @@ -257,10 +263,15 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
>>   __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
>>   		long nr_pages, void **kaddr, pfn_t *pfn)
>>   {
>> +	bool bad_pmem;
>> +	bool do_recovery = false;
>>   	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
>>   
>> -	if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
>> -					PFN_PHYS(nr_pages))))
>> +	bad_pmem = is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
>> +				PFN_PHYS(nr_pages));
>> +	if (bad_pmem && kaddr)
>> +		do_recovery = dax_recovery_started(pmem->dax_dev, kaddr);
>> +	if (bad_pmem && !do_recovery)
>>   		return -EIO;
> 
> I find the passing of the recovery flag through the address very
> cumbersome.  I remember there was some kind of discussion, but this looks
> pretty ugly.
> 
> Also no need for the bad_pmem variable:
> 
> 	if (is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512, PFN_PHYS(nr_pages)) &&
> 	    (!kaddr | !dax_recovery_started(pmem->dax_dev, kaddr)))
> 		return -EIO;

Okay.

>
> Also:  the !kaddr check could go into dax_recovery_started.  That way
> even if we stick with the overloading kaddr could also be used just for
> the flag if needed.

The !kaddr check is in dax_recovery_started(), and that reminded me
that the check should be in dax_prep_recovery() too.

Thanks!
-jane



* Re: [PATCH v5 6/7] dax: add recovery_write to dax_iomap_iter in failure path
  2022-02-02 13:44   ` Christoph Hellwig
@ 2022-02-02 22:18     ` Jane Chu
  0 siblings, 0 replies; 34+ messages in thread
From: Jane Chu @ 2022-02-02 22:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: david, djwong, dan.j.williams, vishal.l.verma, dave.jiang, agk,
	snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On 2/2/2022 5:44 AM, Christoph Hellwig wrote:
> On Fri, Jan 28, 2022 at 02:31:49PM -0700, Jane Chu wrote:
>> +typedef size_t (*iter_func_t)(struct dax_device *dax_dev, pgoff_t pgoff,
>> +		void *addr, size_t bytes, struct iov_iter *i);
>>   static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>>   		struct iov_iter *iter)
>>   {
>> @@ -1210,6 +1212,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
>>   	ssize_t ret = 0;
>>   	size_t xfer;
>>   	int id;
>> +	iter_func_t write_func = dax_copy_from_iter;
> 
> This use of a function pointer causes indirect call overhead.  A simple
> "bool in_recovery" or do_recovery does the trick in a way that is
> both more readable and generates faster code.

Good point, thanks!

> 
>> +		if ((map_len == -EIO) && (iov_iter_rw(iter) == WRITE)) {
> 
> No need for the braces.

Did you mean the outer "( )" ?

> 
>>   		if (iov_iter_rw(iter) == WRITE)
>> -			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
>> -					map_len, iter);
>> +			xfer = write_func(dax_dev, pgoff, kaddr, map_len, iter);
>>   		else
>>   			xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr,
>>   					map_len, iter);
> 
> i.e.
> 
> 		if (iov_iter_rw(iter) == READ)
> 			xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr,
> 					map_len, iter);
> 		else if (unlikely(do_recovery))
> 			xfer = dax_recovery_write(dax_dev, pgoff, kaddr,
> 					map_len, iter);
> 		else
> 			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
> 					map_len, iter);
> 

Will do.

Thanks a lot!

-jane

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page
  2022-02-02 21:20     ` Jane Chu
@ 2022-02-02 23:07       ` Jane Chu
  2022-02-03 13:42         ` Christoph Hellwig
  0 siblings, 1 reply; 34+ messages in thread
From: Jane Chu @ 2022-02-02 23:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: david, djwong, dan.j.williams, vishal.l.verma, dave.jiang, agk,
	snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On 2/2/2022 1:20 PM, Jane Chu wrote:
> On 2/2/2022 5:21 AM, Christoph Hellwig wrote:
>>> +static inline int set_mce_nospec(unsigned long pfn)
>>>    {
>>>    	unsigned long decoy_addr;
>>>    	int rc;
>>> @@ -117,10 +113,7 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
>>>    	 */
>>>    	decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
>>>    
>>> -	if (unmap)
>>> -		rc = set_memory_np(decoy_addr, 1);
>>> -	else
>>> -		rc = set_memory_uc(decoy_addr, 1);
>>> +	rc = set_memory_np(decoy_addr, 1);
>>>    	if (rc)
>>>    		pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
>>>    	return rc;
>>> @@ -130,7 +123,7 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
>>>    /* Restore full speculative operation to the pfn. */
>>>    static inline int clear_mce_nospec(unsigned long pfn)
>>>    {
>>> -	return set_memory_wb((unsigned long) pfn_to_kaddr(pfn), 1);
>>> +	return _set_memory_present((unsigned long) pfn_to_kaddr(pfn), 1);
>>>    }
>>
>> Wouldn't it make more sense to move these helpers out of line rather
>> than exporting _set_memory_present?
> 
> Do you mean to move
>     return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
> into clear_mce_nospec() for the x86 arch and get rid of _set_memory_present?
> If so, sure I'll do that.

Looks like I can't do that.  It's either exporting 
_set_memory_present(), or exporting change_page_attr_set().  Perhaps the 
former is more conventional?

-jane

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page
  2022-02-02 23:07       ` Jane Chu
@ 2022-02-03 13:42         ` Christoph Hellwig
  2022-02-04  5:23           ` Dan Williams
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-02-03 13:42 UTC (permalink / raw)
  To: Jane Chu
  Cc: Christoph Hellwig, david, djwong, dan.j.williams, vishal.l.verma,
	dave.jiang, agk, snitzer, dm-devel, ira.weiny, willy, vgoyal,
	linux-fsdevel, nvdimm, linux-kernel, linux-xfs

On Wed, Feb 02, 2022 at 11:07:37PM +0000, Jane Chu wrote:
> On 2/2/2022 1:20 PM, Jane Chu wrote:
> >> Wouldn't it make more sense to move these helpers out of line rather
> >> than exporting _set_memory_present?
> > 
> > Do you mean to move
> >     return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
> > into clear_mce_nospec() for the x86 arch and get rid of _set_memory_present?
> > If so, sure I'll do that.
> 
> Looks like I can't do that.  It's either exporting 
> _set_memory_present(), or exporting change_page_attr_set().  Perhaps the 
> former is more conventional?

By "these helpers" I mean set_mce_nospec and clear_mce_nospec.  If they
are moved to normal functions instead of inlines, there is no need to
export the internals at all.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY
  2022-02-02 21:27     ` Jane Chu
@ 2022-02-03 13:43       ` Christoph Hellwig
  2022-02-04  5:17         ` Dan Williams
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-02-03 13:43 UTC (permalink / raw)
  To: Jane Chu
  Cc: Christoph Hellwig, david, djwong, dan.j.williams, vishal.l.verma,
	dave.jiang, agk, snitzer, dm-devel, ira.weiny, willy, vgoyal,
	linux-fsdevel, nvdimm, linux-kernel, linux-xfs

On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
> Yeah, I see.  Would you suggest a way to pass the indication from
> dax_iomap_iter to dax_direct_access that the caller intends the
> callee to ignore poison in the range because the caller intends
> to do recovery_write? We tried adding a flag to dax_direct_access, and 
> that wasn't liked if I recall.

To me a flag seems cleaner than this magic, but let's wait for Dan to
chime in.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY
  2022-02-03 13:43       ` Christoph Hellwig
@ 2022-02-04  5:17         ` Dan Williams
  2022-02-04  5:32           ` Dan Williams
  0 siblings, 1 reply; 34+ messages in thread
From: Dan Williams @ 2022-02-04  5:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jane Chu, david, djwong, vishal.l.verma, dave.jiang, agk,
	snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On Thu, Feb 3, 2022 at 5:43 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
> > Yeah, I see.  Would you suggest a way to pass the indication from
> > dax_iomap_iter to dax_direct_access that the caller intends the
> > callee to ignore poison in the range because the caller intends
> > to do recovery_write? We tried adding a flag to dax_direct_access, and
> > that wasn't liked if I recall.
>
> To me a flag seems cleaner than this magic, but let's wait for Dan to
> chime in.

So back in November I suggested modifying the kaddr, mainly to avoid
touching all the dax_direct_access() call sites [1]. However, now
seeing the code and Christoph's comment I think this either wants type
safety (e.g. 'dax_addr_t *'), or just add a new flag. Given that both
of those options involve touching all dax_direct_access() call sites,
and a @flags argument is more extensible if any other scenarios arise,
let's go ahead and plumb a flag and skip the magic.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page
  2022-02-03 13:42         ` Christoph Hellwig
@ 2022-02-04  5:23           ` Dan Williams
  2022-02-06  8:30             ` Jane Chu
  0 siblings, 1 reply; 34+ messages in thread
From: Dan Williams @ 2022-02-04  5:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jane Chu, david, djwong, vishal.l.verma, dave.jiang, agk,
	snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On Thu, Feb 3, 2022 at 5:42 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Feb 02, 2022 at 11:07:37PM +0000, Jane Chu wrote:
> > On 2/2/2022 1:20 PM, Jane Chu wrote:
> > >> Wouldn't it make more sense to move these helpers out of line rather
> > >> than exporting _set_memory_present?
> > >
> > > Do you mean to move
> > >     return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
> > > into clear_mce_nospec() for the x86 arch and get rid of _set_memory_present?
> > > If so, sure I'll do that.
> >
> > Looks like I can't do that.  It's either exporting
> > _set_memory_present(), or exporting change_page_attr_set().  Perhaps the
> > former is more conventional?
>
> By "these helpers" I mean set_mce_nospec and clear_mce_nospec.  If they
> are moved to normal functions instead of inlines, there is no need to
> export the internals at all.

Agree, {set,clear}_mce_nospec() can just move to arch/x86/mm/pat/set_memory.c.
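
A rough sketch of the out-of-line versions in arch/x86/mm/pat/set_memory.c,
assembled from the inline bodies quoted above (the exports and exact
placement are assumptions, not the final patch):

	int set_mce_nospec(unsigned long pfn)
	{
		unsigned long decoy_addr;
		int rc;

		/* decoy address keeps speculation off the 1:1 map alias */
		decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
		rc = set_memory_np(decoy_addr, 1);
		if (rc)
			pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
		return rc;
	}
	EXPORT_SYMBOL_GPL(set_mce_nospec);

	/* Restore full speculative operation to the pfn. */
	int clear_mce_nospec(unsigned long pfn)
	{
		unsigned long addr = (unsigned long)pfn_to_kaddr(pfn);

		/* change_page_attr_set() stays file-local; nothing new exported */
		return change_page_attr_set(&addr, 1, __pgprot(_PAGE_PRESENT), 0);
	}
	EXPORT_SYMBOL_GPL(clear_mce_nospec);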

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY
  2022-02-04  5:17         ` Dan Williams
@ 2022-02-04  5:32           ` Dan Williams
  2022-02-06  8:29             ` Jane Chu
  0 siblings, 1 reply; 34+ messages in thread
From: Dan Williams @ 2022-02-04  5:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jane Chu, david, djwong, vishal.l.verma, dave.jiang, agk,
	snitzer, dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel,
	nvdimm, linux-kernel, linux-xfs

On Thu, Feb 3, 2022 at 9:17 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Thu, Feb 3, 2022 at 5:43 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
> > > Yeah, I see.  Would you suggest a way to pass the indication from
> > > dax_iomap_iter to dax_direct_access that the caller intends the
> > > callee to ignore poison in the range because the caller intends
> > > to do recovery_write? We tried adding a flag to dax_direct_access, and
> > > that wasn't liked if I recall.
> >
> > To me a flag seems cleaner than this magic, but let's wait for Dan to
> > chime in.
>
> So back in November I suggested modifying the kaddr, mainly to avoid
> touching all the dax_direct_access() call sites [1]. However, now
> seeing the code and Christoph's comment I think this either wants type
> safety (e.g. 'dax_addr_t *'), or just add a new flag. Given that both
> of those options involve touching all dax_direct_access() call sites,
> and a @flags argument is more extensible if any other scenarios arise,
> let's go ahead and plumb a flag and skip the magic.

Just to be clear we are talking about a flow like:

        flags = 0;
        rc = dax_direct_access(..., &kaddr, flags, ...);
        if (unlikely(rc < 0)) {
                flags |= DAX_RECOVERY;
                dax_direct_access(..., &kaddr, flags, ...);
                return dax_recovery_{read,write}(..., kaddr, ...);
        }
        return copy_{mc_to_iter,from_iter_flushcache}(...);

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 3/7] dm: make dm aware of target's DAXDEV_RECOVERY capability
  2022-01-28 21:31 ` [PATCH v5 3/7] dm: make dm aware of target's DAXDEV_RECOVERY capability Jane Chu
@ 2022-02-04  5:34   ` Dan Williams
  2022-02-06  8:27     ` Jane Chu
  0 siblings, 1 reply; 34+ messages in thread
From: Dan Williams @ 2022-02-04  5:34 UTC (permalink / raw)
  To: Jane Chu
  Cc: david, Darrick J. Wong, Christoph Hellwig, Vishal L Verma,
	Dave Jiang, Alasdair Kergon, Mike Snitzer,
	device-mapper development, Weiny, Ira, Matthew Wilcox,
	Vivek Goyal, linux-fsdevel, Linux NVDIMM,
	Linux Kernel Mailing List, linux-xfs

On Fri, Jan 28, 2022 at 1:32 PM Jane Chu <jane.chu@oracle.com> wrote:
>
> If any of the target dax devices participating in an MD raid device
> supports DAXDEV_RECOVERY, then the MD device as a whole is declared
> capable of DAXDEV_RECOVERY.
> Only once the recovery process reaches the target driver does it
> become deterministic whether a given dax address range may be
> recovered or not.
>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  drivers/md/dm-table.c | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index e43096cfe9e2..8af8a81b6172 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -844,6 +844,36 @@ static bool dm_table_supports_dax(struct dm_table *t,
>         return true;
>  }
>
> +/* Check whether device is capable of dax poison recovery */
> +static int device_poison_recovery_capable(struct dm_target *ti,
> +       struct dm_dev *dev, sector_t start, sector_t len, void *data)
> +{
> +       if (!dev->dax_dev)
> +               return false;
> +       return dax_recovery_capable(dev->dax_dev);

Hmm it's not clear to me that dax_recovery_capable is necessary. If a
dax_dev does not support recovery it can simply fail the
dax_direct_access() call with the DAX_RECOVERY flag set.

So all DM needs to worry about is passing the new @flags parameter
through the stack.
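
For example, dm-linear's passthrough would then only grow the new
parameter; a sketch (assuming the linear_dax_pgoff() helper from
v5.17's dm-linear.c and the proposed @flags argument):

	static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
			long nr_pages, int flags, void **kaddr, pfn_t *pfn)
	{
		struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);

		/* just forward @flags; no capability bookkeeping in DM */
		return dax_direct_access(dax_dev, pgoff, nr_pages, flags,
				kaddr, pfn);
	}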

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 4/7] dax: add dax_recovery_write to dax_op and dm target type
  2022-01-28 21:31 ` [PATCH v5 4/7] dax: add dax_recovery_write to dax_op and dm target type Jane Chu
  2022-02-02 13:34   ` Christoph Hellwig
@ 2022-02-04  6:03   ` Dan Williams
  2022-02-06  8:25     ` Jane Chu
  1 sibling, 1 reply; 34+ messages in thread
From: Dan Williams @ 2022-02-04  6:03 UTC (permalink / raw)
  To: Jane Chu
  Cc: david, Darrick J. Wong, Christoph Hellwig, Vishal L Verma,
	Dave Jiang, Alasdair Kergon, Mike Snitzer,
	device-mapper development, Weiny, Ira, Matthew Wilcox,
	Vivek Goyal, linux-fsdevel, Linux NVDIMM,
	Linux Kernel Mailing List, linux-xfs

On Fri, Jan 28, 2022 at 1:32 PM Jane Chu <jane.chu@oracle.com> wrote:
>
> The dax_recovery_write() dax op is only required for DAX devices
> that export DAXDEV_RECOVERY, indicating their capability to recover
> from poison.
>
> DM may be nested; if any of the base dax devices forming a DM
> device supports dax recovery, the DM device is marked with that
> capability.
>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
[..]
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 2fc776653c6e..1b3d6ebf3e49 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -30,6 +30,9 @@ struct dax_operations {
>                         sector_t, sector_t);
>         /* zero_page_range: required operation. Zero page range   */
>         int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
> +       /* recovery_write: optional operation. */
> +       size_t (*recovery_write)(struct dax_device *, pgoff_t, void *, size_t,
> +                               struct iov_iter *);

The removal of the ->copy_{to,from}_iter() operations set the
precedent that dax ops should not be needed when the operation can be
carried out generically. The only need to call back to the pmem driver
is so that it can call nvdimm_clear_poison(). nvdimm_clear_poison() in
turn only needs the 'struct device' hosting the pmem and the physical
address to be cleared. The physical address is already returned by
dax_direct_access(). The device is something that could be added to
dax_device, and the pgmap could host the callback that pmem fills in.
Something like:


diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 58eda16f5c53..36486ba4753a 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -694,6 +694,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
                .end = nsio->res.end - end_trunc,
        };
        pgmap->nr_range = 1;
+       pgmap->owner = &nd_pfn->dev;
        if (nd_pfn->mode == PFN_MODE_RAM) {
                if (offset < reserve)
                        return -EINVAL;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 58d95242a836..95e1b6326f88 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -481,6 +481,7 @@ static int pmem_attach_disk(struct device *dev,
        }
        set_dax_nocache(dax_dev);
        set_dax_nomc(dax_dev);
+       set_dax_pgmap(dax_dev, &pmem->pgmap);
        if (is_nvdimm_sync(nd_region))
                set_dax_synchronous(dax_dev);
        rc = dax_add_host(dax_dev, disk);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 1fafcc38acba..8cb59b5df38b 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -81,6 +81,11 @@ struct dev_pagemap_ops {

 #define PGMAP_ALTMAP_VALID     (1 << 0)

+struct dev_pagemap_operations {
+       size_t (*recovery_write)(struct dev_pagemap *pgmap, void *, size_t,
+                                struct iov_iter *);
+};
+
 /**
  * struct dev_pagemap - metadata for ZONE_DEVICE mappings
  * @altmap: pre-allocated/reserved memory for vmemmap allocations
@@ -111,12 +116,15 @@ struct dev_pagemap {
        const struct dev_pagemap_ops *ops;
        void *owner;
        int nr_range;
+       struct dev_pagemap_operations recovery_ops;
        union {
                struct range range;
                struct range ranges[0];
        };
 };

...then DM does not need to be involved in the recovery path; fs/dax.c
just does dax_direct_access(..., DAX_RECOVERY, ...) and then looks up
the pgmap to generically coordinate the recovery_write(). The pmem
driver would be responsible for setting the pgmap's recovery_write()
hook to a function that calls nvdimm_clear_poison().

This architecture works for anything that can be described by a pgmap
and supports error clearing; it need not be limited to the pmem block
driver.
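
Abridged sketch of the fs/dax.c side under those assumptions
(DAX_RECOVERY and the pgmap recovery hook are the proposed additions,
not existing API):

	map_len = dax_direct_access(dax_dev, pgoff, nr_pages, DAX_RECOVERY,
			&kaddr, &pfn);
	if (map_len > 0 && iov_iter_rw(iter) == WRITE) {
		struct dev_pagemap *pgmap;

		/* resolve the hosting pgmap from the returned pfn */
		pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
		if (pgmap && pgmap->recovery_ops.recovery_write)
			xfer = pgmap->recovery_ops.recovery_write(pgmap, kaddr,
					map_len, iter);
		put_dev_pagemap(pgmap);
	}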

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 5/7] pmem: add pmem_recovery_write() dax op
  2022-01-28 21:31 ` [PATCH v5 5/7] pmem: add pmem_recovery_write() dax op Jane Chu
  2022-02-02 13:43   ` Christoph Hellwig
@ 2022-02-04  6:21   ` Dan Williams
  2022-02-06  8:08     ` Jane Chu
  1 sibling, 1 reply; 34+ messages in thread
From: Dan Williams @ 2022-02-04  6:21 UTC (permalink / raw)
  To: Jane Chu
  Cc: david, Darrick J. Wong, Christoph Hellwig, Vishal L Verma,
	Dave Jiang, Alasdair Kergon, Mike Snitzer,
	device-mapper development, Weiny, Ira, Matthew Wilcox,
	Vivek Goyal, linux-fsdevel, Linux NVDIMM,
	Linux Kernel Mailing List, linux-xfs

On Fri, Jan 28, 2022 at 1:32 PM Jane Chu <jane.chu@oracle.com> wrote:
>
> pmem_recovery_write() consists of clearing the poison via DSM,
> clearing the page's HWPoison bit, restoring the _PAGE_PRESENT bit,
> flushing the cache, writing, and finally clearing the bad-block
> record.
>
> A competing pread thread is held off during the recovery write by
> the presence of the bad-block record. Competing recovery_write
> threads are serialized by a lock.
>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  drivers/nvdimm/pmem.c | 82 +++++++++++++++++++++++++++++++++++++++----
>  drivers/nvdimm/pmem.h |  1 +
>  2 files changed, 77 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 638e64681db9..f2d6b34d48de 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -69,6 +69,14 @@ static void hwpoison_clear(struct pmem_device *pmem,
>         }
>  }
>
> +static void pmem_clear_badblocks(struct pmem_device *pmem, sector_t sector,
> +                               long cleared_blks)
> +{
> +       badblocks_clear(&pmem->bb, sector, cleared_blks);
> +       if (pmem->bb_state)
> +               sysfs_notify_dirent(pmem->bb_state);
> +}
> +
>  static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
>                 phys_addr_t offset, unsigned int len)
>  {
> @@ -88,9 +96,7 @@ static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
>                 dev_dbg(dev, "%#llx clear %ld sector%s\n",
>                                 (unsigned long long) sector, cleared,
>                                 cleared > 1 ? "s" : "");
> -               badblocks_clear(&pmem->bb, sector, cleared);
> -               if (pmem->bb_state)
> -                       sysfs_notify_dirent(pmem->bb_state);
> +               pmem_clear_badblocks(pmem, sector, cleared);
>         }
>
>         arch_invalidate_pmem(pmem->virt_addr + offset, len);
> @@ -257,10 +263,15 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
>  __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
>                 long nr_pages, void **kaddr, pfn_t *pfn)
>  {
> +       bool bad_pmem;
> +       bool do_recovery = false;
>         resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
>
> -       if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
> -                                       PFN_PHYS(nr_pages))))
> +       bad_pmem = is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
> +                               PFN_PHYS(nr_pages));
> +       if (bad_pmem && kaddr)
> +               do_recovery = dax_recovery_started(pmem->dax_dev, kaddr);
> +       if (bad_pmem && !do_recovery)
>                 return -EIO;
>
>         if (kaddr)
> @@ -301,10 +312,68 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
>         return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
>  }
>
> +/*
> + * A recovery write thread starts out as a normal pwrite thread; when
> + * the filesystem is told about a potential media error in the range,
> + * it turns the normal pwrite into a dax_recovery_write.
> + *
> + * The recovery write consists of clearing the poison via DSM, clearing
> + * the page's HWPoison bit, re-enabling page-wide read-write permission,
> + * flushing the caches, and finally writing.  A competing pread thread
> + * needs to be held off during the recovery process since the data read
> + * back might not be valid; that is achieved by clearing the badblock
> + * records only after the recovery write completes.
> + *
> + * Competing recovery write threads are serialized via the pmem
> + * device-level lock .recovery_lock.
> + */
>  static size_t pmem_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
>                 void *addr, size_t bytes, struct iov_iter *i)
>  {
> -       return 0;
> +       size_t rc, len, off;
> +       phys_addr_t pmem_off;
> +       struct pmem_device *pmem = dax_get_private(dax_dev);
> +       struct device *dev = pmem->bb.dev;
> +       sector_t sector;
> +       long cleared, cleared_blk;
> +
> +       mutex_lock(&pmem->recovery_lock);
> +
> +       /* If no poison found in the range, go ahead with write */
> +       off = (unsigned long)addr & ~PAGE_MASK;
> +       len = PFN_PHYS(PFN_UP(off + bytes));
> +       if (!is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512, len)) {
> +               rc = _copy_from_iter_flushcache(addr, bytes, i);
> +               goto write_done;
> +       }

is_bad_pmem() takes a seqlock so it should be safe to move the
recovery_lock below this point.
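
i.e. roughly (rearranging the quoted code; the badblocks lookup itself
is seqlock-protected, so it can run before taking the lock):

	/* fast path: no poison in the range, plain flushcache write */
	off = (unsigned long)addr & ~PAGE_MASK;
	len = PFN_PHYS(PFN_UP(off + bytes));
	if (!is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512, len))
		return _copy_from_iter_flushcache(addr, bytes, i);

	mutex_lock(&pmem->recovery_lock);
	/* ... poison clearing and the recovery write as before ... */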

> +
> +       /* A range that is not page aligned cannot be recovered */
> +       if (off || !PAGE_ALIGNED(bytes)) {
> +               dev_warn(dev, "Found poison, but addr(%p) or bytes(%#zx) not page aligned\n",
> +                       addr, bytes);

fs/dax.c will prevent this from happening, right? It seems like an
upper layer bug if we get this far into the recovery process without
checking if a full page is being overwritten.

> +               rc = (size_t) -EIO;
> +               goto write_done;
> +       }
> +
> +       pmem_off = PFN_PHYS(pgoff) + pmem->data_offset;
> +       sector = (pmem_off - pmem->data_offset) / 512;
> +       cleared = nvdimm_clear_poison(dev, pmem->phys_addr + pmem_off, len);
> +       cleared_blk = cleared / 512;
> +       if (cleared_blk > 0) {
> +               hwpoison_clear(pmem, pmem->phys_addr + pmem_off, cleared);
> +       } else {
> +               dev_warn(dev, "pmem_recovery_write: cleared_blk: %ld\n",
> +                       cleared_blk);
> +               rc = (size_t) -EIO;
> +               goto write_done;
> +       }
> +       arch_invalidate_pmem(pmem->virt_addr + pmem_off, bytes);
> +       rc = _copy_from_iter_flushcache(addr, bytes, i);
> +       pmem_clear_badblocks(pmem, sector, cleared_blk);

This duplicates pmem_clear_poison(); can more code be shared between
the two functions?
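
One possible shape for the shared piece (hypothetical helper name;
both pmem_clear_poison() and the recovery write could funnel through
it):

	static long __pmem_clear_poison(struct pmem_device *pmem,
			phys_addr_t offset, unsigned int len)
	{
		long cleared = nvdimm_clear_poison(pmem->bb.dev,
				pmem->phys_addr + offset, len);

		if (cleared > 0) {
			hwpoison_clear(pmem, pmem->phys_addr + offset, cleared);
			arch_invalidate_pmem(pmem->virt_addr + offset, len);
		}
		return cleared;
	}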


> +
> +write_done:
> +       mutex_unlock(&pmem->recovery_lock);
> +       return rc;
>  }
>
>  static const struct dax_operations pmem_dax_ops = {
> @@ -495,6 +564,7 @@ static int pmem_attach_disk(struct device *dev,
>                 goto out_cleanup_dax;
>         dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
>         set_dax_recovery(dax_dev);
> +       mutex_init(&pmem->recovery_lock);
>         pmem->dax_dev = dax_dev;
>
>         rc = device_add_disk(dev, disk, pmem_attribute_groups);
> diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
> index 59cfe13ea8a8..971bff7552d6 100644
> --- a/drivers/nvdimm/pmem.h
> +++ b/drivers/nvdimm/pmem.h
> @@ -24,6 +24,7 @@ struct pmem_device {
>         struct dax_device       *dax_dev;
>         struct gendisk          *disk;
>         struct dev_pagemap      pgmap;
> +       struct mutex            recovery_lock;
>  };
>
>  long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
> --
> 2.18.4
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 5/7] pmem: add pmem_recovery_write() dax op
  2022-02-04  6:21   ` Dan Williams
@ 2022-02-06  8:08     ` Jane Chu
  0 siblings, 0 replies; 34+ messages in thread
From: Jane Chu @ 2022-02-06  8:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: david, Darrick J. Wong, Christoph Hellwig, Vishal L Verma,
	Dave Jiang, Alasdair Kergon, Mike Snitzer,
	device-mapper development, Weiny, Ira, Matthew Wilcox,
	Vivek Goyal, linux-fsdevel, Linux NVDIMM,
	Linux Kernel Mailing List, linux-xfs

On 2/3/2022 10:21 PM, Dan Williams wrote:
> On Fri, Jan 28, 2022 at 1:32 PM Jane Chu <jane.chu@oracle.com> wrote:
>>
>> pmem_recovery_write() consists of clearing the poison via DSM,
>> clearing the page's HWPoison bit, restoring the _PAGE_PRESENT bit,
>> flushing the cache, writing, and finally clearing the bad-block
>> record.
>>
>> A competing pread thread is held off during the recovery write by
>> the presence of the bad-block record. Competing recovery_write
>> threads are serialized by a lock.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   drivers/nvdimm/pmem.c | 82 +++++++++++++++++++++++++++++++++++++++----
>>   drivers/nvdimm/pmem.h |  1 +
>>   2 files changed, 77 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
>> index 638e64681db9..f2d6b34d48de 100644
>> --- a/drivers/nvdimm/pmem.c
>> +++ b/drivers/nvdimm/pmem.c
>> @@ -69,6 +69,14 @@ static void hwpoison_clear(struct pmem_device *pmem,
>>          }
>>   }
>>
>> +static void pmem_clear_badblocks(struct pmem_device *pmem, sector_t sector,
>> +                               long cleared_blks)
>> +{
>> +       badblocks_clear(&pmem->bb, sector, cleared_blks);
>> +       if (pmem->bb_state)
>> +               sysfs_notify_dirent(pmem->bb_state);
>> +}
>> +
>>   static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
>>                  phys_addr_t offset, unsigned int len)
>>   {
>> @@ -88,9 +96,7 @@ static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
>>                  dev_dbg(dev, "%#llx clear %ld sector%s\n",
>>                                  (unsigned long long) sector, cleared,
>>                                  cleared > 1 ? "s" : "");
>> -               badblocks_clear(&pmem->bb, sector, cleared);
>> -               if (pmem->bb_state)
>> -                       sysfs_notify_dirent(pmem->bb_state);
>> +               pmem_clear_badblocks(pmem, sector, cleared);
>>          }
>>
>>          arch_invalidate_pmem(pmem->virt_addr + offset, len);
>> @@ -257,10 +263,15 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
>>   __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
>>                  long nr_pages, void **kaddr, pfn_t *pfn)
>>   {
>> +       bool bad_pmem;
>> +       bool do_recovery = false;
>>          resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
>>
>> -       if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
>> -                                       PFN_PHYS(nr_pages))))
>> +       bad_pmem = is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
>> +                               PFN_PHYS(nr_pages));
>> +       if (bad_pmem && kaddr)
>> +               do_recovery = dax_recovery_started(pmem->dax_dev, kaddr);
>> +       if (bad_pmem && !do_recovery)
>>                  return -EIO;
>>
>>          if (kaddr)
>> @@ -301,10 +312,68 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
>>          return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
>>   }
>>
>> +/*
>> + * A recovery write thread starts out as a normal pwrite thread; when
>> + * the filesystem is told about a potential media error in the range,
>> + * it turns the normal pwrite into a dax_recovery_write.
>> + *
>> + * The recovery write consists of clearing the poison via DSM, clearing
>> + * the page's HWPoison bit, re-enabling page-wide read-write permission,
>> + * flushing the caches, and finally writing.  A competing pread thread
>> + * needs to be held off during the recovery process since the data read
>> + * back might not be valid; that is achieved by clearing the badblock
>> + * records only after the recovery write completes.
>> + *
>> + * Competing recovery write threads are serialized via the pmem
>> + * device-level lock .recovery_lock.
>> + */
>>   static size_t pmem_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
>>                  void *addr, size_t bytes, struct iov_iter *i)
>>   {
>> -       return 0;
>> +       size_t rc, len, off;
>> +       phys_addr_t pmem_off;
>> +       struct pmem_device *pmem = dax_get_private(dax_dev);
>> +       struct device *dev = pmem->bb.dev;
>> +       sector_t sector;
>> +       long cleared, cleared_blk;
>> +
>> +       mutex_lock(&pmem->recovery_lock);
>> +
>> +       /* If no poison found in the range, go ahead with write */
>> +       off = (unsigned long)addr & ~PAGE_MASK;
>> +       len = PFN_PHYS(PFN_UP(off + bytes));
>> +       if (!is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512, len)) {
>> +               rc = _copy_from_iter_flushcache(addr, bytes, i);
>> +               goto write_done;
>> +       }
> 
> is_bad_pmem() takes a seqlock so it should be safe to move the
> recovery_lock below this point.

Okay, thanks!

> 
>> +
>> +       /* A range that is not page aligned cannot be recovered */
>> +       if (off || !PAGE_ALIGNED(bytes)) {
>> +               dev_warn(dev, "Found poison, but addr(%p) or bytes(%#zx) not page aligned\n",
>> +                       addr, bytes);
> 
> fs/dax.c will prevent this from happening, right? It seems like an
> upper layer bug if we get this far into the recovery process without
> checking if a full page is being overwritten.

Yes, at the start of each dax_iomap_iter() the buffer is page aligned.
However, the underlying dax_copy_from_iter() is allowed to return a
partial result, which lets the subsequent 'while' loop within
dax_iomap_iter() continue at an address that is no longer page aligned.
I ran into this while experimenting with dax_copy_from_iter(); I am not
sure a partial result is legitimate in practice, so I thought to issue
a warning should the situation arise.

> 
>> +               rc = (size_t) -EIO;
>> +               goto write_done;
>> +       }
>> +
>> +       pmem_off = PFN_PHYS(pgoff) + pmem->data_offset;
>> +       sector = (pmem_off - pmem->data_offset) / 512;
>> +       cleared = nvdimm_clear_poison(dev, pmem->phys_addr + pmem_off, len);
>> +       cleared_blk = cleared / 512;
>> +       if (cleared_blk > 0) {
>> +               hwpoison_clear(pmem, pmem->phys_addr + pmem_off, cleared);
>> +       } else {
>> +               dev_warn(dev, "pmem_recovery_write: cleared_blk: %ld\n",
>> +                       cleared_blk);
>> +               rc = (size_t) -EIO;
>> +               goto write_done;
>> +       }
>> +       arch_invalidate_pmem(pmem->virt_addr + pmem_off, bytes);
>> +       rc = _copy_from_iter_flushcache(addr, bytes, i);
>> +       pmem_clear_badblocks(pmem, sector, cleared_blk);
> 
> This duplicates pmem_clear_poison(); can more code be shared between
> the two functions?

I'll look into refactoring pmem_clear_poison().
> 
> 
>> +
>> +write_done:
>> +       mutex_unlock(&pmem->recovery_lock);
>> +       return rc;
>>   }
>>
>>   static const struct dax_operations pmem_dax_ops = {
>> @@ -495,6 +564,7 @@ static int pmem_attach_disk(struct device *dev,
>>                  goto out_cleanup_dax;
>>          dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
>>          set_dax_recovery(dax_dev);
>> +       mutex_init(&pmem->recovery_lock);
>>          pmem->dax_dev = dax_dev;
>>
>>          rc = device_add_disk(dev, disk, pmem_attribute_groups);
>> diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
>> index 59cfe13ea8a8..971bff7552d6 100644
>> --- a/drivers/nvdimm/pmem.h
>> +++ b/drivers/nvdimm/pmem.h
>> @@ -24,6 +24,7 @@ struct pmem_device {
>>          struct dax_device       *dax_dev;
>>          struct gendisk          *disk;
>>          struct dev_pagemap      pgmap;
>> +       struct mutex            recovery_lock;
>>   };
>>
>>   long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
>> --
>> 2.18.4
>>

thanks!
-jane

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 4/7] dax: add dax_recovery_write to dax_op and dm target type
  2022-02-04  6:03   ` Dan Williams
@ 2022-02-06  8:25     ` Jane Chu
  0 siblings, 0 replies; 34+ messages in thread
From: Jane Chu @ 2022-02-06  8:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: david, Darrick J. Wong, Christoph Hellwig, Vishal L Verma,
	Dave Jiang, Alasdair Kergon, Mike Snitzer,
	device-mapper development, Weiny, Ira, Matthew Wilcox,
	Vivek Goyal, linux-fsdevel, Linux NVDIMM,
	Linux Kernel Mailing List, linux-xfs

On 2/3/2022 10:03 PM, Dan Williams wrote:
> On Fri, Jan 28, 2022 at 1:32 PM Jane Chu <jane.chu@oracle.com> wrote:
>>
>> The dax_recovery_write() dax op is only required for DAX devices
>> that export DAXDEV_RECOVERY, indicating their capability to recover
>> from poison.
>>
>> DM may be nested; if any of the base dax devices forming a DM
>> device supports dax recovery, the DM device is marked with that
>> capability.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> [..]
>> diff --git a/include/linux/dax.h b/include/linux/dax.h
>> index 2fc776653c6e..1b3d6ebf3e49 100644
>> --- a/include/linux/dax.h
>> +++ b/include/linux/dax.h
>> @@ -30,6 +30,9 @@ struct dax_operations {
>>                          sector_t, sector_t);
>>          /* zero_page_range: required operation. Zero page range   */
>>          int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
>> +       /* recovery_write: optional operation. */
>> +       size_t (*recovery_write)(struct dax_device *, pgoff_t, void *, size_t,
>> +                               struct iov_iter *);
> 
> The removal of the ->copy_{to,from}_iter() operations set the
> precedent that dax ops should not be needed when the operation can be
> carried out generically. The only need to call back to the pmem driver
> is so that it can call nvdimm_clear_poison(). nvdimm_clear_poison() in
> turn only needs the 'struct device' hosting the pmem and the physical
> address to be cleared. The physical address is already returned by
> dax_direct_access(). The device is something that could be added to
> dax_device, and the pgmap could host the callback that pmem fills in.
> Something like:
> 
> 
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 58eda16f5c53..36486ba4753a 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -694,6 +694,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
>                  .end = nsio->res.end - end_trunc,
>          };
>          pgmap->nr_range = 1;
> +       pgmap->owner = &nd_pfn->dev;
>          if (nd_pfn->mode == PFN_MODE_RAM) {
>                  if (offset < reserve)
>                          return -EINVAL;
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 58d95242a836..95e1b6326f88 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -481,6 +481,7 @@ static int pmem_attach_disk(struct device *dev,
>          }
>          set_dax_nocache(dax_dev);
>          set_dax_nomc(dax_dev);
> +       set_dax_pgmap(dax_dev, &pmem->pgmap);
>          if (is_nvdimm_sync(nd_region))
>                  set_dax_synchronous(dax_dev);
>          rc = dax_add_host(dax_dev, disk);
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 1fafcc38acba..8cb59b5df38b 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -81,6 +81,11 @@ struct dev_pagemap_ops {
> 
>   #define PGMAP_ALTMAP_VALID     (1 << 0)
> 
> +struct dev_pagemap_operations {
> +       size_t (*recovery_write)(struct dev_pagemap *pgmap, void *, size_t,
> +                                struct iov_iter *);
> +};
> +
>   /**
>    * struct dev_pagemap - metadata for ZONE_DEVICE mappings
>    * @altmap: pre-allocated/reserved memory for vmemmap allocations
> @@ -111,12 +116,15 @@ struct dev_pagemap {
>          const struct dev_pagemap_ops *ops;
>          void *owner;
>          int nr_range;
> +       struct dev_pagemap_operations recovery_ops;
>          union {
>                  struct range range;
>                  struct range ranges[0];
>          };
>   };
> 
> ...then DM does not need to be involved in the recovery path; fs/dax.c
> just does dax_direct_access(..., DAX_RECOVERY, ...) and then looks up
> the pgmap to generically coordinate the recovery_write(). The pmem
> driver would be responsible for setting the pgmap's recovery_write()
> hook to a function that calls nvdimm_clear_poison().
>
> This architecture works for anything that can be described by a pgmap
> and supports error clearing; it need not be limited to the pmem block
> driver.

This is an interesting idea; let me give it a try and get back to you.

Thanks!
-jane


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 3/7] dm: make dm aware of target's DAXDEV_RECOVERY capability
  2022-02-04  5:34   ` Dan Williams
@ 2022-02-06  8:27     ` Jane Chu
  0 siblings, 0 replies; 34+ messages in thread
From: Jane Chu @ 2022-02-06  8:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: david, Darrick J. Wong, Christoph Hellwig, Vishal L Verma,
	Dave Jiang, Alasdair Kergon, Mike Snitzer,
	device-mapper development, Weiny, Ira, Matthew Wilcox,
	Vivek Goyal, linux-fsdevel, Linux NVDIMM,
	Linux Kernel Mailing List, linux-xfs

On 2/3/2022 9:34 PM, Dan Williams wrote:
> On Fri, Jan 28, 2022 at 1:32 PM Jane Chu <jane.chu@oracle.com> wrote:
>>
>> If any of the target dax devices participating in an MD raid device
>> supports DAXDEV_RECOVERY, then the MD device as a whole is declared
>> capable of DAXDEV_RECOVERY.
>> Only once the recovery process reaches the target driver does it
>> become deterministic whether a given dax address range may be
>> recovered or not.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   drivers/md/dm-table.c | 33 +++++++++++++++++++++++++++++++++
>>   1 file changed, 33 insertions(+)
>>
>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>> index e43096cfe9e2..8af8a81b6172 100644
>> --- a/drivers/md/dm-table.c
>> +++ b/drivers/md/dm-table.c
>> @@ -844,6 +844,36 @@ static bool dm_table_supports_dax(struct dm_table *t,
>>          return true;
>>   }
>>
>> +/* Check whether device is capable of dax poison recovery */
>> +static int device_poison_recovery_capable(struct dm_target *ti,
>> +       struct dm_dev *dev, sector_t start, sector_t len, void *data)
>> +{
>> +       if (!dev->dax_dev)
>> +               return false;
>> +       return dax_recovery_capable(dev->dax_dev);
> 
> Hmm it's not clear to me that dax_recovery_capable is necessary. If a
> dax_dev does not support recovery it can simply fail the
> dax_direct_access() call with the DAX_RECOVERY flag set.
> 
> So all DM needs to worry about is passing the new @flags parameter
> through the stack.

Yeah, given your idea of adding .recovery_write to the pgmap ops, it
wouldn't be needed.

thanks,
-jane

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY
  2022-02-04  5:32           ` Dan Williams
@ 2022-02-06  8:29             ` Jane Chu
  0 siblings, 0 replies; 34+ messages in thread
From: Jane Chu @ 2022-02-06  8:29 UTC (permalink / raw)
  To: Dan Williams, Christoph Hellwig
  Cc: david, djwong, vishal.l.verma, dave.jiang, agk, snitzer,
	dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel, nvdimm,
	linux-kernel, linux-xfs

On 2/3/2022 9:32 PM, Dan Williams wrote:
> On Thu, Feb 3, 2022 at 9:17 PM Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> On Thu, Feb 3, 2022 at 5:43 AM Christoph Hellwig <hch@infradead.org> wrote:
>>>
>>> On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
>>>> Yeah, I see.  Would you suggest a way to pass the indication from
>>>> dax_iomap_iter to dax_direct_access that the caller intends the
>>>> callee to ignore poison in the range because the caller intends
>>>> to do recovery_write? We tried adding a flag to dax_direct_access, and
>>>> that wasn't liked if I recall.
>>>
>>> To me a flag seems cleaner than this magic, but let's wait for Dan to
>>> chime in.
>>
>> So back in November I suggested modifying the kaddr, mainly to avoid
>> touching all the dax_direct_access() call sites [1]. However, now
>> seeing the code and Christoph's comment I think this either wants type
>> safety (e.g. 'dax_addr_t *'), or just add a new flag. Given that both
>> of those options involve touching all dax_direct_access() call sites,
>> and a @flags argument is more extensible if any other scenarios arise,
>> let's go ahead and plumb a flag and skip the magic.
> 
> Just to be clear we are talking about a flow like:
> 
>          flags = 0;
>          rc = dax_direct_access(..., &kaddr, flags, ...);
>          if (unlikely(rc < 0)) {
>                  flags |= DAX_RECOVERY;
>                  dax_direct_access(..., &kaddr, flags, ...);
>                  return dax_recovery_{read,write}(..., kaddr, ...);
>          }
>          return copy_{mc_to_iter,from_iter_flushcache}(...);

Okay, will go with a flag.

thanks!
-jane

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page
  2022-02-04  5:23           ` Dan Williams
@ 2022-02-06  8:30             ` Jane Chu
  0 siblings, 0 replies; 34+ messages in thread
From: Jane Chu @ 2022-02-06  8:30 UTC (permalink / raw)
  To: Dan Williams, Christoph Hellwig
  Cc: david, djwong, vishal.l.verma, dave.jiang, agk, snitzer,
	dm-devel, ira.weiny, willy, vgoyal, linux-fsdevel, nvdimm,
	linux-kernel, linux-xfs

On 2/3/2022 9:23 PM, Dan Williams wrote:
> On Thu, Feb 3, 2022 at 5:42 AM Christoph Hellwig <hch@infradead.org> wrote:
>>
>> On Wed, Feb 02, 2022 at 11:07:37PM +0000, Jane Chu wrote:
>>> On 2/2/2022 1:20 PM, Jane Chu wrote:
>>>>> Wouldn't it make more sense to move these helpers out of line rather
>>>>> than exporting _set_memory_present?
>>>>
>>>> Do you mean to move
>>>>      return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
>>>> into clear_mce_nospec() for the x86 arch and get rid of _set_memory_present?
>>>> If so, sure I'll do that.
>>>
>>> Looks like I can't do that.  It's either exporting
>>> _set_memory_present(), or exporting change_page_attr_set().  Perhaps the
>>> former is more conventional?
>>
>> By "these helpers" I mean set_mce_nospec and clear_mce_nospec.  If they
>> are moved to normal functions instead of inlines, there is no need to
>> export the internals at all.
> 
> Agree, {set,clear}_mce_nospec() can just move to arch/x86/mm/pat/set_memory.c.

Got it, will do.

thanks!
-jane

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2022-02-06  8:30 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-28 21:31 [PATCH v5 0/7] DAX poison recovery Jane Chu
2022-01-28 21:31 ` [PATCH v5 1/7] mce: fix set_mce_nospec to always unmap the whole page Jane Chu
2022-02-02 13:21   ` Christoph Hellwig
2022-02-02 21:20     ` Jane Chu
2022-02-02 23:07       ` Jane Chu
2022-02-03 13:42         ` Christoph Hellwig
2022-02-04  5:23           ` Dan Williams
2022-02-06  8:30             ` Jane Chu
2022-01-28 21:31 ` [PATCH v5 2/7] dax: introduce dax device flag DAXDEV_RECOVERY Jane Chu
2022-02-02 13:23   ` Christoph Hellwig
2022-02-02 21:27     ` Jane Chu
2022-02-03 13:43       ` Christoph Hellwig
2022-02-04  5:17         ` Dan Williams
2022-02-04  5:32           ` Dan Williams
2022-02-06  8:29             ` Jane Chu
2022-01-28 21:31 ` [PATCH v5 3/7] dm: make dm aware of target's DAXDEV_RECOVERY capability Jane Chu
2022-02-04  5:34   ` Dan Williams
2022-02-06  8:27     ` Jane Chu
2022-01-28 21:31 ` [PATCH v5 4/7] dax: add dax_recovery_write to dax_op and dm target type Jane Chu
2022-02-02 13:34   ` Christoph Hellwig
2022-02-02 22:03     ` Jane Chu
2022-02-04  6:03   ` Dan Williams
2022-02-06  8:25     ` Jane Chu
2022-01-28 21:31 ` [PATCH v5 5/7] pmem: add pmem_recovery_write() dax op Jane Chu
2022-02-02 13:43   ` Christoph Hellwig
2022-02-02 22:13     ` Jane Chu
2022-02-04  6:21   ` Dan Williams
2022-02-06  8:08     ` Jane Chu
2022-01-28 21:31 ` [PATCH v5 6/7] dax: add recovery_write to dax_iomap_iter in failure path Jane Chu
2022-02-02 13:44   ` Christoph Hellwig
2022-02-02 22:18     ` Jane Chu
2022-01-28 21:31 ` [PATCH v5 7/7] pmem: fix pmem_do_write() avoid writing to 'np' page Jane Chu
2022-02-02 13:28   ` Christoph Hellwig
2022-02-02 21:31     ` Jane Chu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).