linux-s390.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* devirtualize kernel access to DAX
@ 2021-12-09  6:38 Christoph Hellwig
  2021-12-09  6:38 ` [PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter() Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Christoph Hellwig @ 2021-12-09  6:38 UTC (permalink / raw)
  To: Dan Williams, Vishal Verma, Dave Jiang
  Cc: Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Vivek Goyal,
	Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox, dm-devel,
	nvdimm, linux-s390, linux-fsdevel, virtualization

Hi Dan,

this series cleans up a few loose end ends and then removes the
copy_from_iter and copy_to_iter dax_operations methods in favor of
straight calls.

Diffstat:
 drivers/dax/bus.c             |    3 +
 drivers/dax/super.c           |   40 ++++++++++++++-------
 drivers/md/dm-linear.c        |   20 ----------
 drivers/md/dm-log-writes.c    |   80 ------------------------------------------
 drivers/md/dm-stripe.c        |   20 ----------
 drivers/md/dm.c               |   52 ---------------------------
 drivers/nvdimm/pmem.c         |   27 +-------------
 drivers/s390/block/dcssblk.c  |   18 +--------
 fs/dax.c                      |    5 --
 fs/fuse/virtio_fs.c           |   20 +---------
 include/linux/dax.h           |   28 +++-----------
 include/linux/device-mapper.h |    4 --
 include/linux/uio.h           |   20 ----------
 13 files changed, 44 insertions(+), 293 deletions(-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
  2021-12-09  6:38 devirtualize kernel access to DAX Christoph Hellwig
@ 2021-12-09  6:38 ` Christoph Hellwig
  2021-12-12 14:22   ` Dan Williams
  2021-12-09  6:38 ` [PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous Christoph Hellwig
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2021-12-09  6:38 UTC (permalink / raw)
  To: Dan Williams, Vishal Verma, Dave Jiang
  Cc: Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Vivek Goyal,
	Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox, dm-devel,
	nvdimm, linux-s390, linux-fsdevel, virtualization

These two wrappers are never used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvdimm/pmem.c |  4 ++--
 include/linux/uio.h   | 20 +-------------------
 2 files changed, 3 insertions(+), 21 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4190c8c46ca88..8294f1c701baa 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -302,8 +302,8 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
 }
 
 /*
- * Use the 'no check' versions of copy_from_iter_flushcache() and
- * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
+ * Use the 'no check' versions of _copy_from_iter_flushcache() and
+ * _copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
  * checking, both file offset and device offset, is handled by
  * dax_iomap_actor()
  */
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 6350354f97e90..494d552c1d663 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -196,7 +196,7 @@ bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
 /*
  * Note, users like pmem that depend on the stricter semantics of
- * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
+ * _copy_from_iter_flushcache() than _copy_from_iter_nocache() must check for
  * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
  * destination is flushed from the cache on return.
  */
@@ -211,24 +211,6 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 #define _copy_mc_to_iter _copy_to_iter
 #endif
 
-static __always_inline __must_check
-size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
-{
-	if (unlikely(!check_copy_size(addr, bytes, false)))
-		return 0;
-	else
-		return _copy_from_iter_flushcache(addr, bytes, i);
-}
-
-static __always_inline __must_check
-size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
-{
-	if (unlikely(!check_copy_size(addr, bytes, true)))
-		return 0;
-	else
-		return _copy_mc_to_iter(addr, bytes, i);
-}
-
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
 unsigned long iov_iter_alignment(const struct iov_iter *i);
 unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous
  2021-12-09  6:38 devirtualize kernel access to DAX Christoph Hellwig
  2021-12-09  6:38 ` [PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter() Christoph Hellwig
@ 2021-12-09  6:38 ` Christoph Hellwig
  2021-12-09 21:03   ` Pankaj Gupta
  2021-12-12 14:23   ` Dan Williams
  2021-12-09  6:38 ` [PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag Christoph Hellwig
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 33+ messages in thread
From: Christoph Hellwig @ 2021-12-09  6:38 UTC (permalink / raw)
  To: Dan Williams, Vishal Verma, Dave Jiang
  Cc: Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Vivek Goyal,
	Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox, dm-devel,
	nvdimm, linux-s390, linux-fsdevel, virtualization

Remove the pointless wrappers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/dax/super.c |  8 ++++----
 include/linux/dax.h | 12 ++----------
 2 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e7152a6c4cc40..e18155f43a635 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -208,17 +208,17 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
-bool __dax_synchronous(struct dax_device *dax_dev)
+bool dax_synchronous(struct dax_device *dax_dev)
 {
 	return test_bit(DAXDEV_SYNC, &dax_dev->flags);
 }
-EXPORT_SYMBOL_GPL(__dax_synchronous);
+EXPORT_SYMBOL_GPL(dax_synchronous);
 
-void __set_dax_synchronous(struct dax_device *dax_dev)
+void set_dax_synchronous(struct dax_device *dax_dev)
 {
 	set_bit(DAXDEV_SYNC, &dax_dev->flags);
 }
-EXPORT_SYMBOL_GPL(__set_dax_synchronous);
+EXPORT_SYMBOL_GPL(set_dax_synchronous);
 
 bool dax_alive(struct dax_device *dax_dev)
 {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 87ae4c9b1d65b..3bd1fdb5d5f4b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -48,16 +48,8 @@ void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
-bool __dax_synchronous(struct dax_device *dax_dev);
-static inline bool dax_synchronous(struct dax_device *dax_dev)
-{
-	return  __dax_synchronous(dax_dev);
-}
-void __set_dax_synchronous(struct dax_device *dax_dev);
-static inline void set_dax_synchronous(struct dax_device *dax_dev)
-{
-	__set_dax_synchronous(dax_dev);
-}
+bool dax_synchronous(struct dax_device *dax_dev);
+void set_dax_synchronous(struct dax_device *dax_dev);
 /*
  * Check if given mapping is supported by the file / underlying device.
  */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag
  2021-12-09  6:38 devirtualize kernel access to DAX Christoph Hellwig
  2021-12-09  6:38 ` [PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter() Christoph Hellwig
  2021-12-09  6:38 ` [PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous Christoph Hellwig
@ 2021-12-09  6:38 ` Christoph Hellwig
  2021-12-12 14:24   ` Dan Williams
  2021-12-13  8:40   ` Pankaj Gupta
  2021-12-09  6:38 ` [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods Christoph Hellwig
  2021-12-09  6:38 ` [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter Christoph Hellwig
  4 siblings, 2 replies; 33+ messages in thread
From: Christoph Hellwig @ 2021-12-09  6:38 UTC (permalink / raw)
  To: Dan Williams, Vishal Verma, Dave Jiang
  Cc: Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Vivek Goyal,
	Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox, dm-devel,
	nvdimm, linux-s390, linux-fsdevel, virtualization

Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and
just let the drivers call set_dax_synchronous directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/dax/bus.c            | 3 ++-
 drivers/dax/super.c          | 6 +-----
 drivers/md/dm.c              | 2 +-
 drivers/nvdimm/pmem.c        | 7 +++----
 drivers/s390/block/dcssblk.c | 4 ++--
 fs/fuse/virtio_fs.c          | 2 +-
 include/linux/dax.h          | 8 ++------
 7 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 6683d42c32c56..da2a14d096d29 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1324,11 +1324,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 	 * No dax_operations since there is no access to this device outside of
 	 * mmap of the resulting character device.
 	 */
-	dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
+	dax_dev = alloc_dax(dev_dax, NULL);
 	if (IS_ERR(dax_dev)) {
 		rc = PTR_ERR(dax_dev);
 		goto err_alloc_dax;
 	}
+	set_dax_synchronous(dax_dev);
 
 	/* a device_dax instance is dead while the driver is not attached */
 	kill_dax(dax_dev);
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e18155f43a635..e81d5ee57390f 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -345,8 +345,7 @@ static struct dax_device *dax_dev_get(dev_t devt)
 	return dax_dev;
 }
 
-struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
-		unsigned long flags)
+struct dax_device *alloc_dax(void *private, const struct dax_operations *ops)
 {
 	struct dax_device *dax_dev;
 	dev_t devt;
@@ -366,9 +365,6 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
 
 	dax_dev->ops = ops;
 	dax_dev->private = private;
-	if (flags & DAXDEV_F_SYNC)
-		set_dax_synchronous(dax_dev);
-
 	return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4e997c02bb0a0..f4b972af10928 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1765,7 +1765,7 @@ static struct mapped_device *alloc_dev(int minor)
 	sprintf(md->disk->disk_name, "dm-%d", minor);
 
 	if (IS_ENABLED(CONFIG_FS_DAX)) {
-		md->dax_dev = alloc_dax(md, &dm_dax_ops, 0);
+		md->dax_dev = alloc_dax(md, &dm_dax_ops);
 		if (IS_ERR(md->dax_dev)) {
 			md->dax_dev = NULL;
 			goto bad;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 8294f1c701baa..85b3339286bd8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -402,7 +402,6 @@ static int pmem_attach_disk(struct device *dev,
 	struct gendisk *disk;
 	void *addr;
 	int rc;
-	unsigned long flags = 0UL;
 
 	pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
 	if (!pmem)
@@ -495,13 +494,13 @@ static int pmem_attach_disk(struct device *dev,
 	nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range);
 	disk->bb = &pmem->bb;
 
-	if (is_nvdimm_sync(nd_region))
-		flags = DAXDEV_F_SYNC;
-	dax_dev = alloc_dax(pmem, &pmem_dax_ops, flags);
+	dax_dev = alloc_dax(pmem, &pmem_dax_ops);
 	if (IS_ERR(dax_dev)) {
 		rc = PTR_ERR(dax_dev);
 		goto out;
 	}
+	if (is_nvdimm_sync(nd_region))
+		set_dax_synchronous(dax_dev);
 	rc = dax_add_host(dax_dev, disk);
 	if (rc)
 		goto out_cleanup_dax;
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index e65e83764d1ce..10823debc09bd 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -686,13 +686,13 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
 	if (rc)
 		goto put_dev;
 
-	dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops,
-			DAXDEV_F_SYNC);
+	dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops);
 	if (IS_ERR(dev_info->dax_dev)) {
 		rc = PTR_ERR(dev_info->dax_dev);
 		dev_info->dax_dev = NULL;
 		goto put_dev;
 	}
+	set_dax_synchronous(dev_info->dax_dev);
 	rc = dax_add_host(dev_info->dax_dev, dev_info->gd);
 	if (rc)
 		goto out_dax;
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 242cc1c0d7ed7..5c03a0364a9bb 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -850,7 +850,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
 	dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx len 0x%llx\n",
 		__func__, fs->window_kaddr, cache_reg.addr, cache_reg.len);
 
-	fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops, 0);
+	fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
 	if (IS_ERR(fs->dax_dev))
 		return PTR_ERR(fs->dax_dev);
 
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 3bd1fdb5d5f4b..c04f46478e3b5 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -6,9 +6,6 @@
 #include <linux/mm.h>
 #include <linux/radix-tree.h>
 
-/* Flag for synchronous flush */
-#define DAXDEV_F_SYNC (1UL << 0)
-
 typedef unsigned long dax_entry_t;
 
 struct dax_device;
@@ -42,8 +39,7 @@ struct dax_operations {
 };
 
 #if IS_ENABLED(CONFIG_DAX)
-struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
-		unsigned long flags);
+struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
 void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
@@ -64,7 +60,7 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
 }
 #else
 static inline struct dax_device *alloc_dax(void *private,
-		const struct dax_operations *ops, unsigned long flags)
+		const struct dax_operations *ops)
 {
 	/*
 	 * Callers should check IS_ENABLED(CONFIG_DAX) to know if this
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-09  6:38 devirtualize kernel access to DAX Christoph Hellwig
                   ` (2 preceding siblings ...)
  2021-12-09  6:38 ` [PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag Christoph Hellwig
@ 2021-12-09  6:38 ` Christoph Hellwig
  2021-12-10 14:16   ` Vivek Goyal
  2021-12-12 14:39   ` Dan Williams
  2021-12-09  6:38 ` [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter Christoph Hellwig
  4 siblings, 2 replies; 33+ messages in thread
From: Christoph Hellwig @ 2021-12-09  6:38 UTC (permalink / raw)
  To: Dan Williams, Vishal Verma, Dave Jiang
  Cc: Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Vivek Goyal,
	Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox, dm-devel,
	nvdimm, linux-s390, linux-fsdevel, virtualization

These methods indirect the actual DAX read/write path.  In the end pmem
uses magic flush and mc safe variants and fuse and dcssblk use plain ones
while device mapper picks redirects to the underlying device.

Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
special variants, then use them everywhere as they fall back to the plain
ones on s390 anyway and remove an indirect call from the read/write path
as well as a lot of boilerplate code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/dax/super.c           | 36 ++++++++++++++--
 drivers/md/dm-linear.c        | 20 ---------
 drivers/md/dm-log-writes.c    | 80 -----------------------------------
 drivers/md/dm-stripe.c        | 20 ---------
 drivers/md/dm.c               | 50 ----------------------
 drivers/nvdimm/pmem.c         | 20 ---------
 drivers/s390/block/dcssblk.c  | 14 ------
 fs/dax.c                      |  5 ---
 fs/fuse/virtio_fs.c           | 19 +--------
 include/linux/dax.h           |  9 ++--
 include/linux/device-mapper.h |  4 --
 11 files changed, 37 insertions(+), 240 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e81d5ee57390f..ff676a07480c8 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -105,6 +105,10 @@ enum dax_device_flags {
 	DAXDEV_WRITE_CACHE,
 	/* flag to check if device supports synchronous flush */
 	DAXDEV_SYNC,
+	/* do not use uncached operations to write data */
+	DAXDEV_CACHED,
+	/* do not use mcsafe operations to read data */
+	DAXDEV_NOMCSAFE,
 };
 
 /**
@@ -146,9 +150,15 @@ size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 	if (!dax_alive(dax_dev))
 		return 0;
 
-	return dax_dev->ops->copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+	/*
+	 * The userspace address for the memory copy has already been validated
+	 * via access_ok() in vfs_write, so use the 'no check' version to bypass
+	 * the HARDENED_USERCOPY overhead.
+	 */
+	if (test_bit(DAXDEV_CACHED, &dax_dev->flags))
+		return _copy_from_iter(addr, bytes, i);
+	return _copy_from_iter_flushcache(addr, bytes, i);
 }
-EXPORT_SYMBOL_GPL(dax_copy_from_iter);
 
 size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 		size_t bytes, struct iov_iter *i)
@@ -156,9 +166,15 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 	if (!dax_alive(dax_dev))
 		return 0;
 
-	return dax_dev->ops->copy_to_iter(dax_dev, pgoff, addr, bytes, i);
+	/*
+	 * The userspace address for the memory copy has already been validated
+	 * via access_ok() in vfs_red, so use the 'no check' version to bypass
+	 * the HARDENED_USERCOPY overhead.
+	 */
+	if (test_bit(DAXDEV_NOMCSAFE, &dax_dev->flags))
+		return _copy_to_iter(addr, bytes, i);
+	return _copy_mc_to_iter(addr, bytes, i);
 }
-EXPORT_SYMBOL_GPL(dax_copy_to_iter);
 
 int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
 			size_t nr_pages)
@@ -220,6 +236,18 @@ void set_dax_synchronous(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(set_dax_synchronous);
 
+void set_dax_cached(struct dax_device *dax_dev)
+{
+	set_bit(DAXDEV_CACHED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_cached);
+
+void set_dax_nomcsafe(struct dax_device *dax_dev)
+{
+	set_bit(DAXDEV_NOMCSAFE, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_nomcsafe);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
 	lockdep_assert_held(&dax_srcu);
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 90de42f6743ac..1b97a11d71517 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -180,22 +180,6 @@ static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
 	return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
-static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
-{
-	struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
-
-	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
-}
-
-static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
-{
-	struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
-
-	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
-}
-
 static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 				      size_t nr_pages)
 {
@@ -206,8 +190,6 @@ static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 
 #else
 #define linear_dax_direct_access NULL
-#define linear_dax_copy_from_iter NULL
-#define linear_dax_copy_to_iter NULL
 #define linear_dax_zero_page_range NULL
 #endif
 
@@ -225,8 +207,6 @@ static struct target_type linear_target = {
 	.prepare_ioctl = linear_prepare_ioctl,
 	.iterate_devices = linear_iterate_devices,
 	.direct_access = linear_dax_direct_access,
-	.dax_copy_from_iter = linear_dax_copy_from_iter,
-	.dax_copy_to_iter = linear_dax_copy_to_iter,
 	.dax_zero_page_range = linear_dax_zero_page_range,
 };
 
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index cdb22e7a1d0da..139b09b06eda9 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -902,51 +902,6 @@ static void log_writes_io_hints(struct dm_target *ti, struct queue_limits *limit
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
-static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
-		   struct iov_iter *i)
-{
-	struct pending_block *block;
-
-	if (!bytes)
-		return 0;
-
-	block = kzalloc(sizeof(struct pending_block), GFP_KERNEL);
-	if (!block) {
-		DMERR("Error allocating dax pending block");
-		return -ENOMEM;
-	}
-
-	block->data = kzalloc(bytes, GFP_KERNEL);
-	if (!block->data) {
-		DMERR("Error allocating dax data space");
-		kfree(block);
-		return -ENOMEM;
-	}
-
-	/* write data provided via the iterator */
-	if (!copy_from_iter(block->data, bytes, i)) {
-		DMERR("Error copying dax data");
-		kfree(block->data);
-		kfree(block);
-		return -EIO;
-	}
-
-	/* rewind the iterator so that the block driver can use it */
-	iov_iter_revert(i, bytes);
-
-	block->datalen = bytes;
-	block->sector = bio_to_dev_sectors(lc, sector);
-	block->nr_sectors = ALIGN(bytes, lc->sectorsize) >> lc->sectorshift;
-
-	atomic_inc(&lc->pending_blocks);
-	spin_lock_irq(&lc->blocks_lock);
-	list_add_tail(&block->list, &lc->unflushed_blocks);
-	spin_unlock_irq(&lc->blocks_lock);
-	wake_up_process(lc->log_kthread);
-
-	return 0;
-}
-
 static struct dax_device *log_writes_dax_pgoff(struct dm_target *ti,
 		pgoff_t *pgoff)
 {
@@ -964,37 +919,6 @@ static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
 	return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
-static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
-					    pgoff_t pgoff, void *addr, size_t bytes,
-					    struct iov_iter *i)
-{
-	struct log_writes_c *lc = ti->private;
-	sector_t sector = pgoff * PAGE_SECTORS;
-	struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
-	int err;
-
-	/* Don't bother doing anything if logging has been disabled */
-	if (!lc->logging_enabled)
-		goto dax_copy;
-
-	err = log_dax(lc, sector, bytes, i);
-	if (err) {
-		DMWARN("Error %d logging DAX write", err);
-		return 0;
-	}
-dax_copy:
-	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
-}
-
-static size_t log_writes_dax_copy_to_iter(struct dm_target *ti,
-					  pgoff_t pgoff, void *addr, size_t bytes,
-					  struct iov_iter *i)
-{
-	struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
-
-	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
-}
-
 static int log_writes_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 					  size_t nr_pages)
 {
@@ -1005,8 +929,6 @@ static int log_writes_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 
 #else
 #define log_writes_dax_direct_access NULL
-#define log_writes_dax_copy_from_iter NULL
-#define log_writes_dax_copy_to_iter NULL
 #define log_writes_dax_zero_page_range NULL
 #endif
 
@@ -1024,8 +946,6 @@ static struct target_type log_writes_target = {
 	.iterate_devices = log_writes_iterate_devices,
 	.io_hints = log_writes_io_hints,
 	.direct_access = log_writes_dax_direct_access,
-	.dax_copy_from_iter = log_writes_dax_copy_from_iter,
-	.dax_copy_to_iter = log_writes_dax_copy_to_iter,
 	.dax_zero_page_range = log_writes_dax_zero_page_range,
 };
 
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 50dba3f39274c..e566115ec0bb8 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -324,22 +324,6 @@ static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
 	return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
-static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
-{
-	struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
-
-	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
-}
-
-static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
-{
-	struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
-
-	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
-}
-
 static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 				      size_t nr_pages)
 {
@@ -350,8 +334,6 @@ static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
 
 #else
 #define stripe_dax_direct_access NULL
-#define stripe_dax_copy_from_iter NULL
-#define stripe_dax_copy_to_iter NULL
 #define stripe_dax_zero_page_range NULL
 #endif
 
@@ -488,8 +470,6 @@ static struct target_type stripe_target = {
 	.iterate_devices = stripe_iterate_devices,
 	.io_hints = stripe_io_hints,
 	.direct_access = stripe_dax_direct_access,
-	.dax_copy_from_iter = stripe_dax_copy_from_iter,
-	.dax_copy_to_iter = stripe_dax_copy_to_iter,
 	.dax_zero_page_range = stripe_dax_zero_page_range,
 };
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f4b972af10928..1b7e4ec9e2f57 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1027,54 +1027,6 @@ static long dm_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 	return ret;
 }
 
-static size_t dm_dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
-				    void *addr, size_t bytes, struct iov_iter *i)
-{
-	struct mapped_device *md = dax_get_private(dax_dev);
-	sector_t sector = pgoff * PAGE_SECTORS;
-	struct dm_target *ti;
-	long ret = 0;
-	int srcu_idx;
-
-	ti = dm_dax_get_live_target(md, sector, &srcu_idx);
-
-	if (!ti)
-		goto out;
-	if (!ti->type->dax_copy_from_iter) {
-		ret = copy_from_iter(addr, bytes, i);
-		goto out;
-	}
-	ret = ti->type->dax_copy_from_iter(ti, pgoff, addr, bytes, i);
- out:
-	dm_put_live_table(md, srcu_idx);
-
-	return ret;
-}
-
-static size_t dm_dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
-{
-	struct mapped_device *md = dax_get_private(dax_dev);
-	sector_t sector = pgoff * PAGE_SECTORS;
-	struct dm_target *ti;
-	long ret = 0;
-	int srcu_idx;
-
-	ti = dm_dax_get_live_target(md, sector, &srcu_idx);
-
-	if (!ti)
-		goto out;
-	if (!ti->type->dax_copy_to_iter) {
-		ret = copy_to_iter(addr, bytes, i);
-		goto out;
-	}
-	ret = ti->type->dax_copy_to_iter(ti, pgoff, addr, bytes, i);
- out:
-	dm_put_live_table(md, srcu_idx);
-
-	return ret;
-}
-
 static int dm_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
 				  size_t nr_pages)
 {
@@ -3024,8 +2976,6 @@ static const struct block_device_operations dm_rq_blk_dops = {
 
 static const struct dax_operations dm_dax_ops = {
 	.direct_access = dm_dax_direct_access,
-	.copy_from_iter = dm_dax_copy_from_iter,
-	.copy_to_iter = dm_dax_copy_to_iter,
 	.zero_page_range = dm_dax_zero_page_range,
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 85b3339286bd8..b08f0642aa257 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -301,28 +301,8 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
 	return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
 }
 
-/*
- * Use the 'no check' versions of _copy_from_iter_flushcache() and
- * _copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
- * checking, both file offset and device offset, is handled by
- * dax_iomap_actor()
- */
-static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
-{
-	return _copy_from_iter_flushcache(addr, bytes, i);
-}
-
-static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
-{
-	return _copy_mc_to_iter(addr, bytes, i);
-}
-
 static const struct dax_operations pmem_dax_ops = {
 	.direct_access = pmem_dax_direct_access,
-	.copy_from_iter = pmem_copy_from_iter,
-	.copy_to_iter = pmem_copy_to_iter,
 	.zero_page_range = pmem_dax_zero_page_range,
 };
 
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 10823debc09bd..d614843caf6cc 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -44,18 +44,6 @@ static const struct block_device_operations dcssblk_devops = {
 	.release 	= dcssblk_release,
 };
 
-static size_t dcssblk_dax_copy_from_iter(struct dax_device *dax_dev,
-		pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
-{
-	return copy_from_iter(addr, bytes, i);
-}
-
-static size_t dcssblk_dax_copy_to_iter(struct dax_device *dax_dev,
-		pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
-{
-	return copy_to_iter(addr, bytes, i);
-}
-
 static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev,
 				       pgoff_t pgoff, size_t nr_pages)
 {
@@ -72,8 +60,6 @@ static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev,
 
 static const struct dax_operations dcssblk_dax_ops = {
 	.direct_access = dcssblk_dax_direct_access,
-	.copy_from_iter = dcssblk_dax_copy_from_iter,
-	.copy_to_iter = dcssblk_dax_copy_to_iter,
 	.zero_page_range = dcssblk_dax_zero_page_range,
 };
 
diff --git a/fs/dax.c b/fs/dax.c
index e0eecd8e3a8f8..cd03485867a74 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1260,11 +1260,6 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 		if (map_len > end - pos)
 			map_len = end - pos;
 
-		/*
-		 * The userspace address for the memory copy has already been
-		 * validated via access_ok() in either vfs_read() or
-		 * vfs_write(), depending on which operation we are doing.
-		 */
 		if (iov_iter_rw(iter) == WRITE)
 			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
 					map_len, iter);
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 5c03a0364a9bb..754319ce2a29b 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 	return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
 }
 
-static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
-				       pgoff_t pgoff, void *addr,
-				       size_t bytes, struct iov_iter *i)
-{
-	return copy_from_iter(addr, bytes, i);
-}
-
-static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
-				       pgoff_t pgoff, void *addr,
-				       size_t bytes, struct iov_iter *i)
-{
-	return copy_to_iter(addr, bytes, i);
-}
-
 static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
 				     pgoff_t pgoff, size_t nr_pages)
 {
@@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
 
 static const struct dax_operations virtio_fs_dax_ops = {
 	.direct_access = virtio_fs_direct_access,
-	.copy_from_iter = virtio_fs_copy_from_iter,
-	.copy_to_iter = virtio_fs_copy_to_iter,
 	.zero_page_range = virtio_fs_zero_page_range,
 };
 
@@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
 	fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
 	if (IS_ERR(fs->dax_dev))
 		return PTR_ERR(fs->dax_dev);
-
+	set_dax_cached(fs->dax_dev);
+	set_dax_nomcsafe(fs->dax_dev);
 	return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
 					fs->dax_dev);
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index c04f46478e3b5..d22cbf03d37d2 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -28,12 +28,6 @@ struct dax_operations {
 	 */
 	bool (*dax_supported)(struct dax_device *, struct block_device *, int,
 			sector_t, sector_t);
-	/* copy_from_iter: required operation for fs-dax direct-i/o */
-	size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t,
-			struct iov_iter *);
-	/* copy_to_iter: required operation for fs-dax direct-i/o */
-	size_t (*copy_to_iter)(struct dax_device *, pgoff_t, void *, size_t,
-			struct iov_iter *);
 	/* zero_page_range: required operation. Zero page range   */
 	int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
 };
@@ -95,6 +89,9 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
 }
 #endif
 
+void set_dax_cached(struct dax_device *dax_dev);
+void set_dax_nomcsafe(struct dax_device *dax_dev);
+
 struct writeback_control;
 #if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
 int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a7df155ea49b8..b26fecf6c8e87 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -147,8 +147,6 @@ typedef int (*dm_busy_fn) (struct dm_target *ti);
  */
 typedef long (*dm_dax_direct_access_fn) (struct dm_target *ti, pgoff_t pgoff,
 		long nr_pages, void **kaddr, pfn_t *pfn);
-typedef size_t (*dm_dax_copy_iter_fn)(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i);
 typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, pgoff_t pgoff,
 		size_t nr_pages);
 
@@ -200,8 +198,6 @@ struct target_type {
 	dm_iterate_devices_fn iterate_devices;
 	dm_io_hints_fn io_hints;
 	dm_dax_direct_access_fn direct_access;
-	dm_dax_copy_iter_fn dax_copy_from_iter;
-	dm_dax_copy_iter_fn dax_copy_to_iter;
 	dm_dax_zero_page_range_fn dax_zero_page_range;
 
 	/* For internal device-mapper use. */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter
  2021-12-09  6:38 devirtualize kernel access to DAX Christoph Hellwig
                   ` (3 preceding siblings ...)
  2021-12-09  6:38 ` [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods Christoph Hellwig
@ 2021-12-09  6:38 ` Christoph Hellwig
  2021-12-10 14:05   ` Vivek Goyal
  2021-12-12 15:03   ` Dan Williams
  4 siblings, 2 replies; 33+ messages in thread
From: Christoph Hellwig @ 2021-12-09  6:38 UTC (permalink / raw)
  To: Dan Williams, Vishal Verma, Dave Jiang
  Cc: Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Vivek Goyal,
	Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox, dm-devel,
	nvdimm, linux-s390, linux-fsdevel, virtualization

While using the MC-safe copy routines is rather pointless on a virtual device
like virtiofs, it also isn't harmful at all.  So just use _copy_mc_to_iter
unconditionally to simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/dax/super.c | 10 ----------
 fs/fuse/virtio_fs.c |  1 -
 include/linux/dax.h |  1 -
 3 files changed, 12 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index ff676a07480c8..fe783234ca669 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -107,8 +107,6 @@ enum dax_device_flags {
 	DAXDEV_SYNC,
 	/* do not use uncached operations to write data */
 	DAXDEV_CACHED,
-	/* do not use mcsafe operations to read data */
-	DAXDEV_NOMCSAFE,
 };
 
 /**
@@ -171,8 +169,6 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 	 * via access_ok() in vfs_red, so use the 'no check' version to bypass
 	 * the HARDENED_USERCOPY overhead.
 	 */
-	if (test_bit(DAXDEV_NOMCSAFE, &dax_dev->flags))
-		return _copy_to_iter(addr, bytes, i);
 	return _copy_mc_to_iter(addr, bytes, i);
 }
 
@@ -242,12 +238,6 @@ void set_dax_cached(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(set_dax_cached);
 
-void set_dax_nomcsafe(struct dax_device *dax_dev)
-{
-	set_bit(DAXDEV_NOMCSAFE, &dax_dev->flags);
-}
-EXPORT_SYMBOL_GPL(set_dax_nomcsafe);
-
 bool dax_alive(struct dax_device *dax_dev)
 {
 	lockdep_assert_held(&dax_srcu);
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 754319ce2a29b..d9c20b148ac19 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -838,7 +838,6 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
 	if (IS_ERR(fs->dax_dev))
 		return PTR_ERR(fs->dax_dev);
 	set_dax_cached(fs->dax_dev);
-	set_dax_nomcsafe(fs->dax_dev);
 	return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
 					fs->dax_dev);
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index d22cbf03d37d2..d267331bc37e7 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -90,7 +90,6 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
 #endif
 
 void set_dax_cached(struct dax_device *dax_dev);
-void set_dax_nomcsafe(struct dax_device *dax_dev);
 
 struct writeback_control;
 #if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous
  2021-12-09  6:38 ` [PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous Christoph Hellwig
@ 2021-12-09 21:03   ` Pankaj Gupta
  2021-12-12 14:23   ` Dan Williams
  1 sibling, 0 replies; 33+ messages in thread
From: Pankaj Gupta @ 2021-12-09 21:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Vivek Goyal, Stefan Hajnoczi,
	Miklos Szeredi, Matthew Wilcox, dm-devel, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

> Remove the pointless wrappers.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/dax/super.c |  8 ++++----
>  include/linux/dax.h | 12 ++----------
>  2 files changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e7152a6c4cc40..e18155f43a635 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -208,17 +208,17 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
>  }
>  EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
>
> -bool __dax_synchronous(struct dax_device *dax_dev)
> +bool dax_synchronous(struct dax_device *dax_dev)
>  {
>         return test_bit(DAXDEV_SYNC, &dax_dev->flags);
>  }
> -EXPORT_SYMBOL_GPL(__dax_synchronous);
> +EXPORT_SYMBOL_GPL(dax_synchronous);
>
> -void __set_dax_synchronous(struct dax_device *dax_dev)
> +void set_dax_synchronous(struct dax_device *dax_dev)
>  {
>         set_bit(DAXDEV_SYNC, &dax_dev->flags);
>  }
> -EXPORT_SYMBOL_GPL(__set_dax_synchronous);
> +EXPORT_SYMBOL_GPL(set_dax_synchronous);
>
>  bool dax_alive(struct dax_device *dax_dev)
>  {
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 87ae4c9b1d65b..3bd1fdb5d5f4b 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -48,16 +48,8 @@ void put_dax(struct dax_device *dax_dev);
>  void kill_dax(struct dax_device *dax_dev);
>  void dax_write_cache(struct dax_device *dax_dev, bool wc);
>  bool dax_write_cache_enabled(struct dax_device *dax_dev);
> -bool __dax_synchronous(struct dax_device *dax_dev);
> -static inline bool dax_synchronous(struct dax_device *dax_dev)
> -{
> -       return  __dax_synchronous(dax_dev);
> -}
> -void __set_dax_synchronous(struct dax_device *dax_dev);
> -static inline void set_dax_synchronous(struct dax_device *dax_dev)
> -{
> -       __set_dax_synchronous(dax_dev);
> -}
> +bool dax_synchronous(struct dax_device *dax_dev);
> +void set_dax_synchronous(struct dax_device *dax_dev);
>  /*
>   * Check if given mapping is supported by the file / underlying device.
>   */

Looks good to me.

Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter
  2021-12-09  6:38 ` [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter Christoph Hellwig
@ 2021-12-10 14:05   ` Vivek Goyal
  2021-12-12 14:48     ` Dan Williams
  2021-12-12 15:03   ` Dan Williams
  1 sibling, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2021-12-10 14:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, dm-devel, nvdimm, linux-s390, linux-fsdevel,
	virtualization

On Thu, Dec 09, 2021 at 07:38:28AM +0100, Christoph Hellwig wrote:
> While using the MC-safe copy routines is rather pointless on a virtual device
> like virtiofs,

I was wondering about that. Is it completely pointless.

Typically we are just mapping host page cache into qemu address space.
That shows as virtiofs device pfn in guest and that pfn is mapped into
guest application address space in mmap() call.

Given on host its DRAM, so I would not expect machine check on load side
so there was no need to use machine check safe variant. But what if host
filesystem is on persistent memory and using DAX. In that case load in
guest can trigger a machine check. Not sure if that machine check will
actually travel into the guest and unblock read() operation or not.

But this sounds like a good change from virtiofs point of view, anyway.

Thanks
Vivek


> it also isn't harmful at all.  So just use _copy_mc_to_iter
> unconditionally to simplify the code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/dax/super.c | 10 ----------
>  fs/fuse/virtio_fs.c |  1 -
>  include/linux/dax.h |  1 -
>  3 files changed, 12 deletions(-)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index ff676a07480c8..fe783234ca669 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -107,8 +107,6 @@ enum dax_device_flags {
>  	DAXDEV_SYNC,
>  	/* do not use uncached operations to write data */
>  	DAXDEV_CACHED,
> -	/* do not use mcsafe operations to read data */
> -	DAXDEV_NOMCSAFE,
>  };
>  
>  /**
> @@ -171,8 +169,6 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
>  	 * via access_ok() in vfs_red, so use the 'no check' version to bypass
>  	 * the HARDENED_USERCOPY overhead.
>  	 */
> -	if (test_bit(DAXDEV_NOMCSAFE, &dax_dev->flags))
> -		return _copy_to_iter(addr, bytes, i);
>  	return _copy_mc_to_iter(addr, bytes, i);
>  }
>  
> @@ -242,12 +238,6 @@ void set_dax_cached(struct dax_device *dax_dev)
>  }
>  EXPORT_SYMBOL_GPL(set_dax_cached);
>  
> -void set_dax_nomcsafe(struct dax_device *dax_dev)
> -{
> -	set_bit(DAXDEV_NOMCSAFE, &dax_dev->flags);
> -}
> -EXPORT_SYMBOL_GPL(set_dax_nomcsafe);
> -
>  bool dax_alive(struct dax_device *dax_dev)
>  {
>  	lockdep_assert_held(&dax_srcu);
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 754319ce2a29b..d9c20b148ac19 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -838,7 +838,6 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
>  	if (IS_ERR(fs->dax_dev))
>  		return PTR_ERR(fs->dax_dev);
>  	set_dax_cached(fs->dax_dev);
> -	set_dax_nomcsafe(fs->dax_dev);
>  	return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
>  					fs->dax_dev);
>  }
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index d22cbf03d37d2..d267331bc37e7 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -90,7 +90,6 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
>  #endif
>  
>  void set_dax_cached(struct dax_device *dax_dev);
> -void set_dax_nomcsafe(struct dax_device *dax_dev);
>  
>  struct writeback_control;
>  #if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
> -- 
> 2.30.2
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-09  6:38 ` [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods Christoph Hellwig
@ 2021-12-10 14:16   ` Vivek Goyal
  2021-12-12 14:44     ` Dan Williams
  2021-12-12 14:39   ` Dan Williams
  1 sibling, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2021-12-10 14:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, dm-devel, nvdimm, linux-s390, linux-fsdevel,
	virtualization

On Thu, Dec 09, 2021 at 07:38:27AM +0100, Christoph Hellwig wrote:
> These methods indirect the actual DAX read/write path.  In the end pmem
> uses magic flush and mc safe variants and fuse and dcssblk use plain ones
> while device mapper picks redirects to the underlying device.
> 
> Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> special variants, then use them everywhere as they fall back to the plain
> ones on s390 anyway and remove an indirect call from the read/write path
> as well as a lot of boilerplate code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/dax/super.c           | 36 ++++++++++++++--
>  drivers/md/dm-linear.c        | 20 ---------
>  drivers/md/dm-log-writes.c    | 80 -----------------------------------
>  drivers/md/dm-stripe.c        | 20 ---------
>  drivers/md/dm.c               | 50 ----------------------
>  drivers/nvdimm/pmem.c         | 20 ---------
>  drivers/s390/block/dcssblk.c  | 14 ------
>  fs/dax.c                      |  5 ---
>  fs/fuse/virtio_fs.c           | 19 +--------
>  include/linux/dax.h           |  9 ++--
>  include/linux/device-mapper.h |  4 --
>  11 files changed, 37 insertions(+), 240 deletions(-)
> 

[..]
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 5c03a0364a9bb..754319ce2a29b 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
>  	return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
>  }
>  
> -static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
> -				       pgoff_t pgoff, void *addr,
> -				       size_t bytes, struct iov_iter *i)
> -{
> -	return copy_from_iter(addr, bytes, i);
> -}
> -
> -static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
> -				       pgoff_t pgoff, void *addr,
> -				       size_t bytes, struct iov_iter *i)
> -{
> -	return copy_to_iter(addr, bytes, i);
> -}
> -
>  static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
>  				     pgoff_t pgoff, size_t nr_pages)
>  {
> @@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
>  
>  static const struct dax_operations virtio_fs_dax_ops = {
>  	.direct_access = virtio_fs_direct_access,
> -	.copy_from_iter = virtio_fs_copy_from_iter,
> -	.copy_to_iter = virtio_fs_copy_to_iter,
>  	.zero_page_range = virtio_fs_zero_page_range,
>  };
>  
> @@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
>  	fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
>  	if (IS_ERR(fs->dax_dev))
>  		return PTR_ERR(fs->dax_dev);
> -
> +	set_dax_cached(fs->dax_dev);

Looks good to me from virtiofs point of view.

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Going forward, I am wondering should virtiofs use flushcache version as
well. What if host filesystem is using DAX and mapping persistent memory
pfn directly into qemu address space. I have never tested that.

Right now we are relying on applications to do fsync/msync on virtiofs
for data persistence.

> +	set_dax_nomcsafe(fs->dax_dev);
>  	return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
>  					fs->dax_dev);
>  }

Thanks
Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
  2021-12-09  6:38 ` [PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter() Christoph Hellwig
@ 2021-12-12 14:22   ` Dan Williams
  2021-12-13  8:27     ` Christoph Hellwig
  0 siblings, 1 reply; 33+ messages in thread
From: Dan Williams @ 2021-12-12 14:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vishal Verma, Dave Jiang, Alasdair Kergon, Mike Snitzer,
	Ira Weiny, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Vivek Goyal, Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox,
	device-mapper development, Linux NVDIMM, linux-s390,
	linux-fsdevel, virtualization

On Wed, Dec 8, 2021 at 10:38 PM Christoph Hellwig <hch@lst.de> wrote:
>
> These two wrappers are never used.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/nvdimm/pmem.c |  4 ++--
>  include/linux/uio.h   | 20 +-------------------
>  2 files changed, 3 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 4190c8c46ca88..8294f1c701baa 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -302,8 +302,8 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
>  }
>
>  /*
> - * Use the 'no check' versions of copy_from_iter_flushcache() and
> - * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
> + * Use the 'no check' versions of _copy_from_iter_flushcache() and
> + * _copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
>   * checking, both file offset and device offset, is handled by
>   * dax_iomap_actor()
>   */

This comment change does not make sense since it is saying why pmem is
using the "_" versions. However, I assume this whole comment goes away
in a later patch.

> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 6350354f97e90..494d552c1d663 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -196,7 +196,7 @@ bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
>  #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
>  /*
>   * Note, users like pmem that depend on the stricter semantics of
> - * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
> + * _copy_from_iter_flushcache() than _copy_from_iter_nocache() must check for
>   * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
>   * destination is flushed from the cache on return.
>   */

Same here.

> @@ -211,24 +211,6 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
>  #define _copy_mc_to_iter _copy_to_iter
>  #endif
>
> -static __always_inline __must_check
> -size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
> -{
> -       if (unlikely(!check_copy_size(addr, bytes, false)))
> -               return 0;
> -       else
> -               return _copy_from_iter_flushcache(addr, bytes, i);
> -}
> -
> -static __always_inline __must_check
> -size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
> -{
> -       if (unlikely(!check_copy_size(addr, bytes, true)))
> -               return 0;
> -       else
> -               return _copy_mc_to_iter(addr, bytes, i);
> -}
> -
>  size_t iov_iter_zero(size_t bytes, struct iov_iter *);
>  unsigned long iov_iter_alignment(const struct iov_iter *i);
>  unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous
  2021-12-09  6:38 ` [PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous Christoph Hellwig
  2021-12-09 21:03   ` Pankaj Gupta
@ 2021-12-12 14:23   ` Dan Williams
  1 sibling, 0 replies; 33+ messages in thread
From: Dan Williams @ 2021-12-12 14:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vishal Verma, Dave Jiang, Alasdair Kergon, Mike Snitzer,
	Ira Weiny, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Vivek Goyal, Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox,
	device-mapper development, Linux NVDIMM, linux-s390,
	linux-fsdevel, virtualization

On Wed, Dec 8, 2021 at 10:38 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Remove the pointless wrappers.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good, not sure why those ever existed.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

> ---
>  drivers/dax/super.c |  8 ++++----
>  include/linux/dax.h | 12 ++----------
>  2 files changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e7152a6c4cc40..e18155f43a635 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -208,17 +208,17 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
>  }
>  EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
>
> -bool __dax_synchronous(struct dax_device *dax_dev)
> +bool dax_synchronous(struct dax_device *dax_dev)
>  {
>         return test_bit(DAXDEV_SYNC, &dax_dev->flags);
>  }
> -EXPORT_SYMBOL_GPL(__dax_synchronous);
> +EXPORT_SYMBOL_GPL(dax_synchronous);
>
> -void __set_dax_synchronous(struct dax_device *dax_dev)
> +void set_dax_synchronous(struct dax_device *dax_dev)
>  {
>         set_bit(DAXDEV_SYNC, &dax_dev->flags);
>  }
> -EXPORT_SYMBOL_GPL(__set_dax_synchronous);
> +EXPORT_SYMBOL_GPL(set_dax_synchronous);
>
>  bool dax_alive(struct dax_device *dax_dev)
>  {
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 87ae4c9b1d65b..3bd1fdb5d5f4b 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -48,16 +48,8 @@ void put_dax(struct dax_device *dax_dev);
>  void kill_dax(struct dax_device *dax_dev);
>  void dax_write_cache(struct dax_device *dax_dev, bool wc);
>  bool dax_write_cache_enabled(struct dax_device *dax_dev);
> -bool __dax_synchronous(struct dax_device *dax_dev);
> -static inline bool dax_synchronous(struct dax_device *dax_dev)
> -{
> -       return  __dax_synchronous(dax_dev);
> -}
> -void __set_dax_synchronous(struct dax_device *dax_dev);
> -static inline void set_dax_synchronous(struct dax_device *dax_dev)
> -{
> -       __set_dax_synchronous(dax_dev);
> -}
> +bool dax_synchronous(struct dax_device *dax_dev);
> +void set_dax_synchronous(struct dax_device *dax_dev);
>  /*
>   * Check if given mapping is supported by the file / underlying device.
>   */
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag
  2021-12-09  6:38 ` [PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag Christoph Hellwig
@ 2021-12-12 14:24   ` Dan Williams
  2021-12-13  8:40   ` Pankaj Gupta
  1 sibling, 0 replies; 33+ messages in thread
From: Dan Williams @ 2021-12-12 14:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vishal Verma, Dave Jiang, Alasdair Kergon, Mike Snitzer,
	Ira Weiny, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Vivek Goyal, Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox,
	device-mapper development, Linux NVDIMM, linux-s390,
	linux-fsdevel, virtualization

On Wed, Dec 8, 2021 at 10:38 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and
> just let the drivers call set_dax_synchronous directly.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Sure, looks good to me.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

> ---
>  drivers/dax/bus.c            | 3 ++-
>  drivers/dax/super.c          | 6 +-----
>  drivers/md/dm.c              | 2 +-
>  drivers/nvdimm/pmem.c        | 7 +++----
>  drivers/s390/block/dcssblk.c | 4 ++--
>  fs/fuse/virtio_fs.c          | 2 +-
>  include/linux/dax.h          | 8 ++------
>  7 files changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 6683d42c32c56..da2a14d096d29 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1324,11 +1324,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
>          * No dax_operations since there is no access to this device outside of
>          * mmap of the resulting character device.
>          */
> -       dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
> +       dax_dev = alloc_dax(dev_dax, NULL);
>         if (IS_ERR(dax_dev)) {
>                 rc = PTR_ERR(dax_dev);
>                 goto err_alloc_dax;
>         }
> +       set_dax_synchronous(dax_dev);
>
>         /* a device_dax instance is dead while the driver is not attached */
>         kill_dax(dax_dev);
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e18155f43a635..e81d5ee57390f 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -345,8 +345,7 @@ static struct dax_device *dax_dev_get(dev_t devt)
>         return dax_dev;
>  }
>
> -struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
> -               unsigned long flags)
> +struct dax_device *alloc_dax(void *private, const struct dax_operations *ops)
>  {
>         struct dax_device *dax_dev;
>         dev_t devt;
> @@ -366,9 +365,6 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
>
>         dax_dev->ops = ops;
>         dax_dev->private = private;
> -       if (flags & DAXDEV_F_SYNC)
> -               set_dax_synchronous(dax_dev);
> -
>         return dax_dev;
>
>   err_dev:
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 4e997c02bb0a0..f4b972af10928 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1765,7 +1765,7 @@ static struct mapped_device *alloc_dev(int minor)
>         sprintf(md->disk->disk_name, "dm-%d", minor);
>
>         if (IS_ENABLED(CONFIG_FS_DAX)) {
> -               md->dax_dev = alloc_dax(md, &dm_dax_ops, 0);
> +               md->dax_dev = alloc_dax(md, &dm_dax_ops);
>                 if (IS_ERR(md->dax_dev)) {
>                         md->dax_dev = NULL;
>                         goto bad;
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 8294f1c701baa..85b3339286bd8 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -402,7 +402,6 @@ static int pmem_attach_disk(struct device *dev,
>         struct gendisk *disk;
>         void *addr;
>         int rc;
> -       unsigned long flags = 0UL;
>
>         pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
>         if (!pmem)
> @@ -495,13 +494,13 @@ static int pmem_attach_disk(struct device *dev,
>         nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range);
>         disk->bb = &pmem->bb;
>
> -       if (is_nvdimm_sync(nd_region))
> -               flags = DAXDEV_F_SYNC;
> -       dax_dev = alloc_dax(pmem, &pmem_dax_ops, flags);
> +       dax_dev = alloc_dax(pmem, &pmem_dax_ops);
>         if (IS_ERR(dax_dev)) {
>                 rc = PTR_ERR(dax_dev);
>                 goto out;
>         }
> +       if (is_nvdimm_sync(nd_region))
> +               set_dax_synchronous(dax_dev);
>         rc = dax_add_host(dax_dev, disk);
>         if (rc)
>                 goto out_cleanup_dax;
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index e65e83764d1ce..10823debc09bd 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -686,13 +686,13 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
>         if (rc)
>                 goto put_dev;
>
> -       dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops,
> -                       DAXDEV_F_SYNC);
> +       dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops);
>         if (IS_ERR(dev_info->dax_dev)) {
>                 rc = PTR_ERR(dev_info->dax_dev);
>                 dev_info->dax_dev = NULL;
>                 goto put_dev;
>         }
> +       set_dax_synchronous(dev_info->dax_dev);
>         rc = dax_add_host(dev_info->dax_dev, dev_info->gd);
>         if (rc)
>                 goto out_dax;
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 242cc1c0d7ed7..5c03a0364a9bb 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -850,7 +850,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
>         dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx len 0x%llx\n",
>                 __func__, fs->window_kaddr, cache_reg.addr, cache_reg.len);
>
> -       fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops, 0);
> +       fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
>         if (IS_ERR(fs->dax_dev))
>                 return PTR_ERR(fs->dax_dev);
>
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 3bd1fdb5d5f4b..c04f46478e3b5 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -6,9 +6,6 @@
>  #include <linux/mm.h>
>  #include <linux/radix-tree.h>
>
> -/* Flag for synchronous flush */
> -#define DAXDEV_F_SYNC (1UL << 0)
> -
>  typedef unsigned long dax_entry_t;
>
>  struct dax_device;
> @@ -42,8 +39,7 @@ struct dax_operations {
>  };
>
>  #if IS_ENABLED(CONFIG_DAX)
> -struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
> -               unsigned long flags);
> +struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
>  void put_dax(struct dax_device *dax_dev);
>  void kill_dax(struct dax_device *dax_dev);
>  void dax_write_cache(struct dax_device *dax_dev, bool wc);
> @@ -64,7 +60,7 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
>  }
>  #else
>  static inline struct dax_device *alloc_dax(void *private,
> -               const struct dax_operations *ops, unsigned long flags)
> +               const struct dax_operations *ops)
>  {
>         /*
>          * Callers should check IS_ENABLED(CONFIG_DAX) to know if this
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-09  6:38 ` [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods Christoph Hellwig
  2021-12-10 14:16   ` Vivek Goyal
@ 2021-12-12 14:39   ` Dan Williams
  2021-12-13  8:24     ` Christoph Hellwig
  1 sibling, 1 reply; 33+ messages in thread
From: Dan Williams @ 2021-12-12 14:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vishal Verma, Dave Jiang, Alasdair Kergon, Mike Snitzer,
	Ira Weiny, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Vivek Goyal, Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox,
	device-mapper development, Linux NVDIMM, linux-s390,
	linux-fsdevel, virtualization

On Wed, Dec 8, 2021 at 10:38 PM Christoph Hellwig <hch@lst.de> wrote:
>
> These methods indirect the actual DAX read/write path.  In the end pmem
> uses magic flush and mc safe variants and fuse and dcssblk use plain ones
> while device mapper picks redirects to the underlying device.
>
> Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> special variants, then use them everywhere as they fall back to the plain
> ones on s390 anyway and remove an indirect call from the read/write path
> as well as a lot of boilerplate code.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/dax/super.c           | 36 ++++++++++++++--
>  drivers/md/dm-linear.c        | 20 ---------
>  drivers/md/dm-log-writes.c    | 80 -----------------------------------
>  drivers/md/dm-stripe.c        | 20 ---------
>  drivers/md/dm.c               | 50 ----------------------
>  drivers/nvdimm/pmem.c         | 20 ---------
>  drivers/s390/block/dcssblk.c  | 14 ------
>  fs/dax.c                      |  5 ---
>  fs/fuse/virtio_fs.c           | 19 +--------
>  include/linux/dax.h           |  9 ++--
>  include/linux/device-mapper.h |  4 --
>  11 files changed, 37 insertions(+), 240 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e81d5ee57390f..ff676a07480c8 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -105,6 +105,10 @@ enum dax_device_flags {
>         DAXDEV_WRITE_CACHE,
>         /* flag to check if device supports synchronous flush */
>         DAXDEV_SYNC,
> +       /* do not use uncached operations to write data */
> +       DAXDEV_CACHED,
> +       /* do not use mcsafe operations to read data */
> +       DAXDEV_NOMCSAFE,

Linus did not like the mcsafe name, and this brings it back. Let's
flip the polarity to positively indicate which routine to use, and to
match the 'nofault' style which says "copy and handle faults".

/* do not leave the caches dirty after writes */
DAXDEV_NOCACHE

/* handle CPU fetch exceptions during reads */
DAXDEV_NOMC

...and then flip the use cases around.

Otherwise, nice cleanup. In retrospect I took the feedback to push
this decision to a driver a bit too literally, this is much better.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-10 14:16   ` Vivek Goyal
@ 2021-12-12 14:44     ` Dan Williams
  2021-12-13  8:23       ` Christoph Hellwig
  2021-12-13 16:17       ` Vivek Goyal
  0 siblings, 2 replies; 33+ messages in thread
From: Dan Williams @ 2021-12-12 14:44 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Thu, Dec 09, 2021 at 07:38:27AM +0100, Christoph Hellwig wrote:
> > These methods indirect the actual DAX read/write path.  In the end pmem
> > uses magic flush and mc safe variants and fuse and dcssblk use plain ones
> > while device mapper picks redirects to the underlying device.
> >
> > Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> > special variants, then use them everywhere as they fall back to the plain
> > ones on s390 anyway and remove an indirect call from the read/write path
> > as well as a lot of boilerplate code.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  drivers/dax/super.c           | 36 ++++++++++++++--
> >  drivers/md/dm-linear.c        | 20 ---------
> >  drivers/md/dm-log-writes.c    | 80 -----------------------------------
> >  drivers/md/dm-stripe.c        | 20 ---------
> >  drivers/md/dm.c               | 50 ----------------------
> >  drivers/nvdimm/pmem.c         | 20 ---------
> >  drivers/s390/block/dcssblk.c  | 14 ------
> >  fs/dax.c                      |  5 ---
> >  fs/fuse/virtio_fs.c           | 19 +--------
> >  include/linux/dax.h           |  9 ++--
> >  include/linux/device-mapper.h |  4 --
> >  11 files changed, 37 insertions(+), 240 deletions(-)
> >
>
> [..]
> > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > index 5c03a0364a9bb..754319ce2a29b 100644
> > --- a/fs/fuse/virtio_fs.c
> > +++ b/fs/fuse/virtio_fs.c
> > @@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> >       return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
> >  }
> >
> > -static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
> > -                                    pgoff_t pgoff, void *addr,
> > -                                    size_t bytes, struct iov_iter *i)
> > -{
> > -     return copy_from_iter(addr, bytes, i);
> > -}
> > -
> > -static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
> > -                                    pgoff_t pgoff, void *addr,
> > -                                    size_t bytes, struct iov_iter *i)
> > -{
> > -     return copy_to_iter(addr, bytes, i);
> > -}
> > -
> >  static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> >                                    pgoff_t pgoff, size_t nr_pages)
> >  {
> > @@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> >
> >  static const struct dax_operations virtio_fs_dax_ops = {
> >       .direct_access = virtio_fs_direct_access,
> > -     .copy_from_iter = virtio_fs_copy_from_iter,
> > -     .copy_to_iter = virtio_fs_copy_to_iter,
> >       .zero_page_range = virtio_fs_zero_page_range,
> >  };
> >
> > @@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> >       fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
> >       if (IS_ERR(fs->dax_dev))
> >               return PTR_ERR(fs->dax_dev);
> > -
> > +     set_dax_cached(fs->dax_dev);
>
> Looks good to me from virtiofs point of view.
>
> Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
>
> Going forward, I am wondering should virtiofs use flushcache version as
> well. What if host filesystem is using DAX and mapping persistent memory
> pfn directly into qemu address space. I have never tested that.
>
> Right now we are relying on applications to do fsync/msync on virtiofs
> for data persistence.

This sounds like it would need coordination with a paravirtualized
driver that can indicate whether the host side is pmem or not, like
the virtio_pmem driver. However, if the guest sends any fsync/msync
you would still need to go explicitly cache flush any dirty page
because you can't necessarily trust that the guest did that already.

>
> > +     set_dax_nomcsafe(fs->dax_dev);
> >       return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
> >                                       fs->dax_dev);
> >  }
>
> Thanks
> Vivek
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter
  2021-12-10 14:05   ` Vivek Goyal
@ 2021-12-12 14:48     ` Dan Williams
  2021-12-13  8:20       ` Christoph Hellwig
  2021-12-14 13:59       ` Vivek Goyal
  0 siblings, 2 replies; 33+ messages in thread
From: Dan Williams @ 2021-12-12 14:48 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Fri, Dec 10, 2021 at 6:05 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Thu, Dec 09, 2021 at 07:38:28AM +0100, Christoph Hellwig wrote:
> > While using the MC-safe copy routines is rather pointless on a virtual device
> > like virtiofs,
>
> I was wondering about that. Is it completely pointless.
>
> Typically we are just mapping host page cache into qemu address space.
> That shows as virtiofs device pfn in guest and that pfn is mapped into
> guest application address space in mmap() call.
>
> Given on host its DRAM, so I would not expect machine check on load side
> so there was no need to use machine check safe variant.

That's a broken assumption, DRAM experiences multi-bit ECC errors.
Machine checks, data aborts, etc existed before PMEM.

>  But what if host
> filesystem is on persistent memory and using DAX. In that case load in
> guest can trigger a machine check. Not sure if that machine check will
> actually travel into the guest and unblock read() operation or not.
>
> But this sounds like a good change from virtiofs point of view, anyway.
>
> Thanks
> Vivek
>
>
> > it also isn't harmful at all.  So just use _copy_mc_to_iter
> > unconditionally to simplify the code.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  drivers/dax/super.c | 10 ----------
> >  fs/fuse/virtio_fs.c |  1 -
> >  include/linux/dax.h |  1 -
> >  3 files changed, 12 deletions(-)
> >
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index ff676a07480c8..fe783234ca669 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -107,8 +107,6 @@ enum dax_device_flags {
> >       DAXDEV_SYNC,
> >       /* do not use uncached operations to write data */
> >       DAXDEV_CACHED,
> > -     /* do not use mcsafe operations to read data */
> > -     DAXDEV_NOMCSAFE,
> >  };
> >
> >  /**
> > @@ -171,8 +169,6 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
> >        * via access_ok() in vfs_red, so use the 'no check' version to bypass
> >        * the HARDENED_USERCOPY overhead.
> >        */
> > -     if (test_bit(DAXDEV_NOMCSAFE, &dax_dev->flags))
> > -             return _copy_to_iter(addr, bytes, i);
> >       return _copy_mc_to_iter(addr, bytes, i);
> >  }
> >
> > @@ -242,12 +238,6 @@ void set_dax_cached(struct dax_device *dax_dev)
> >  }
> >  EXPORT_SYMBOL_GPL(set_dax_cached);
> >
> > -void set_dax_nomcsafe(struct dax_device *dax_dev)
> > -{
> > -     set_bit(DAXDEV_NOMCSAFE, &dax_dev->flags);
> > -}
> > -EXPORT_SYMBOL_GPL(set_dax_nomcsafe);
> > -
> >  bool dax_alive(struct dax_device *dax_dev)
> >  {
> >       lockdep_assert_held(&dax_srcu);
> > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > index 754319ce2a29b..d9c20b148ac19 100644
> > --- a/fs/fuse/virtio_fs.c
> > +++ b/fs/fuse/virtio_fs.c
> > @@ -838,7 +838,6 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> >       if (IS_ERR(fs->dax_dev))
> >               return PTR_ERR(fs->dax_dev);
> >       set_dax_cached(fs->dax_dev);
> > -     set_dax_nomcsafe(fs->dax_dev);
> >       return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
> >                                       fs->dax_dev);
> >  }
> > diff --git a/include/linux/dax.h b/include/linux/dax.h
> > index d22cbf03d37d2..d267331bc37e7 100644
> > --- a/include/linux/dax.h
> > +++ b/include/linux/dax.h
> > @@ -90,7 +90,6 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> >  #endif
> >
> >  void set_dax_cached(struct dax_device *dax_dev);
> > -void set_dax_nomcsafe(struct dax_device *dax_dev);
> >
> >  struct writeback_control;
> >  #if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
> > --
> > 2.30.2
> >
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter
  2021-12-09  6:38 ` [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter Christoph Hellwig
  2021-12-10 14:05   ` Vivek Goyal
@ 2021-12-12 15:03   ` Dan Williams
  1 sibling, 0 replies; 33+ messages in thread
From: Dan Williams @ 2021-12-12 15:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vishal Verma, Dave Jiang, Alasdair Kergon, Mike Snitzer,
	Ira Weiny, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Vivek Goyal, Stefan Hajnoczi, Miklos Szeredi, Matthew Wilcox,
	device-mapper development, Linux NVDIMM, linux-s390,
	linux-fsdevel, virtualization

On Wed, Dec 8, 2021 at 10:38 PM Christoph Hellwig <hch@lst.de> wrote:
>
> While using the MC-safe copy routines is rather pointless on a virtual device
> like virtiofs, it also isn't harmful at all.  So just use _copy_mc_to_iter
> unconditionally to simplify the code.

From a correctness perspective, yes, but from a performance perspective, see:

enable_copy_mc_fragile()

...on those platforms fast-string copy implementation is replaced with
a manual unrolled copy. So this will cause a performance regression on
those platforms.

How about let's keep this as is / still only use it for PMEM where end
users are already dealing with the performance difference across
platforms? I considered exporting an indicator of which backend
routine has been selected from arch/x86/lib/copy_mc.c, but it got
messy quickly so I fell back to just keeping the status quo.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter
  2021-12-12 14:48     ` Dan Williams
@ 2021-12-13  8:20       ` Christoph Hellwig
  2021-12-13 16:43         ` Dan Williams
  2021-12-14 13:59       ` Vivek Goyal
  1 sibling, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2021-12-13  8:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: Vivek Goyal, Christoph Hellwig, Vishal Verma, Dave Jiang,
	Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Stefan Hajnoczi,
	Miklos Szeredi, Matthew Wilcox, device-mapper development,
	Linux NVDIMM, linux-s390, linux-fsdevel, virtualization

On Sun, Dec 12, 2021 at 06:48:05AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:05 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Thu, Dec 09, 2021 at 07:38:28AM +0100, Christoph Hellwig wrote:
> > > While using the MC-safe copy routines is rather pointless on a virtual device
> > > like virtiofs,
> >
> > I was wondering about that. Is it completely pointless.
> >
> > Typically we are just mapping host page cache into qemu address space.
> > That shows as virtiofs device pfn in guest and that pfn is mapped into
> > guest application address space in mmap() call.
> >
> > Given on host its DRAM, so I would not expect machine check on load side
> > so there was no need to use machine check safe variant.
> 
> That's a broken assumption, DRAM experiences multi-bit ECC errors.
> Machine checks, data aborts, etc existed before PMEM.

So the conclusion here is that we should always use the mc safe variant?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-12 14:44     ` Dan Williams
@ 2021-12-13  8:23       ` Christoph Hellwig
  2021-12-14 14:22         ` Vivek Goyal
  2021-12-13 16:17       ` Vivek Goyal
  1 sibling, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2021-12-13  8:23 UTC (permalink / raw)
  To: Dan Williams
  Cc: Vivek Goyal, Christoph Hellwig, Vishal Verma, Dave Jiang,
	Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Stefan Hajnoczi,
	Miklos Szeredi, Matthew Wilcox, device-mapper development,
	Linux NVDIMM, linux-s390, linux-fsdevel, virtualization

On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > Going forward, I am wondering should virtiofs use flushcache version as
> > well. What if host filesystem is using DAX and mapping persistent memory
> > pfn directly into qemu address space. I have never tested that.
> >
> > Right now we are relying on applications to do fsync/msync on virtiofs
> > for data persistence.
> 
> This sounds like it would need coordination with a paravirtualized
> driver that can indicate whether the host side is pmem or not, like
> the virtio_pmem driver. However, if the guest sends any fsync/msync
> you would still need to go explicitly cache flush any dirty page
> because you can't necessarily trust that the guest did that already.

Do we?  The application can't really know what backend it is on, so
it sounds like the current virtiofs implementation doesn't really, does it?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-12 14:39   ` Dan Williams
@ 2021-12-13  8:24     ` Christoph Hellwig
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2021-12-13  8:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Vivek Goyal, Stefan Hajnoczi,
	Miklos Szeredi, Matthew Wilcox, device-mapper development,
	Linux NVDIMM, linux-s390, linux-fsdevel, virtualization

On Sun, Dec 12, 2021 at 06:39:16AM -0800, Dan Williams wrote:
> >         /* flag to check if device supports synchronous flush */
> >         DAXDEV_SYNC,
> > +       /* do not use uncached operations to write data */
> > +       DAXDEV_CACHED,
> > +       /* do not use mcsafe operations to read data */
> > +       DAXDEV_NOMCSAFE,
> 
> Linus did not like the mcsafe name, and this brings it back. Let's
> flip the polarity to positively indicate which routine to use, and to
> match the 'nofault' style which says "copy and handle faults".
> 
> /* do not leave the caches dirty after writes */
> DAXDEV_NOCACHE
> 
> /* handle CPU fetch exceptions during reads */
> DAXDEV_NOMC
> 
> ...and then flip the use cases around.

Sure we can do that.  But let's finish the discussion if we actually
need the virtiofs special casing, as it seems pretty fishy in many
aspects.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
  2021-12-12 14:22   ` Dan Williams
@ 2021-12-13  8:27     ` Christoph Hellwig
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2021-12-13  8:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Vivek Goyal, Stefan Hajnoczi,
	Miklos Szeredi, Matthew Wilcox, device-mapper development,
	Linux NVDIMM, linux-s390, linux-fsdevel, virtualization

On Sun, Dec 12, 2021 at 06:22:20AM -0800, Dan Williams wrote:
> > - * Use the 'no check' versions of copy_from_iter_flushcache() and
> > - * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
> > + * Use the 'no check' versions of _copy_from_iter_flushcache() and
> > + * _copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
> >   * checking, both file offset and device offset, is handled by
> >   * dax_iomap_actor()
> >   */
> 
> This comment change does not make sense since it is saying why pmem is
> using the "_" versions. However, I assume this whole comment goes away
> in a later patch.

It does not go away in this series.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag
  2021-12-09  6:38 ` [PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag Christoph Hellwig
  2021-12-12 14:24   ` Dan Williams
@ 2021-12-13  8:40   ` Pankaj Gupta
  1 sibling, 0 replies; 33+ messages in thread
From: Pankaj Gupta @ 2021-12-13  8:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Vivek Goyal, Stefan Hajnoczi,
	Miklos Szeredi, Matthew Wilcox, dm-devel, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

> Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and
> just let the drivers call set_dax_synchronous directly.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/dax/bus.c            | 3 ++-
>  drivers/dax/super.c          | 6 +-----
>  drivers/md/dm.c              | 2 +-
>  drivers/nvdimm/pmem.c        | 7 +++----
>  drivers/s390/block/dcssblk.c | 4 ++--
>  fs/fuse/virtio_fs.c          | 2 +-
>  include/linux/dax.h          | 8 ++------
>  7 files changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 6683d42c32c56..da2a14d096d29 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1324,11 +1324,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
>          * No dax_operations since there is no access to this device outside of
>          * mmap of the resulting character device.
>          */
> -       dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
> +       dax_dev = alloc_dax(dev_dax, NULL);
>         if (IS_ERR(dax_dev)) {
>                 rc = PTR_ERR(dax_dev);
>                 goto err_alloc_dax;
>         }
> +       set_dax_synchronous(dax_dev);
>
>         /* a device_dax instance is dead while the driver is not attached */
>         kill_dax(dax_dev);
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e18155f43a635..e81d5ee57390f 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -345,8 +345,7 @@ static struct dax_device *dax_dev_get(dev_t devt)
>         return dax_dev;
>  }
>
> -struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
> -               unsigned long flags)
> +struct dax_device *alloc_dax(void *private, const struct dax_operations *ops)
>  {
>         struct dax_device *dax_dev;
>         dev_t devt;
> @@ -366,9 +365,6 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
>
>         dax_dev->ops = ops;
>         dax_dev->private = private;
> -       if (flags & DAXDEV_F_SYNC)
> -               set_dax_synchronous(dax_dev);
> -
>         return dax_dev;
>
>   err_dev:
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 4e997c02bb0a0..f4b972af10928 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1765,7 +1765,7 @@ static struct mapped_device *alloc_dev(int minor)
>         sprintf(md->disk->disk_name, "dm-%d", minor);
>
>         if (IS_ENABLED(CONFIG_FS_DAX)) {
> -               md->dax_dev = alloc_dax(md, &dm_dax_ops, 0);
> +               md->dax_dev = alloc_dax(md, &dm_dax_ops);
>                 if (IS_ERR(md->dax_dev)) {
>                         md->dax_dev = NULL;
>                         goto bad;
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 8294f1c701baa..85b3339286bd8 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -402,7 +402,6 @@ static int pmem_attach_disk(struct device *dev,
>         struct gendisk *disk;
>         void *addr;
>         int rc;
> -       unsigned long flags = 0UL;
>
>         pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
>         if (!pmem)
> @@ -495,13 +494,13 @@ static int pmem_attach_disk(struct device *dev,
>         nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range);
>         disk->bb = &pmem->bb;
>
> -       if (is_nvdimm_sync(nd_region))
> -               flags = DAXDEV_F_SYNC;
> -       dax_dev = alloc_dax(pmem, &pmem_dax_ops, flags);
> +       dax_dev = alloc_dax(pmem, &pmem_dax_ops);
>         if (IS_ERR(dax_dev)) {
>                 rc = PTR_ERR(dax_dev);
>                 goto out;
>         }
> +       if (is_nvdimm_sync(nd_region))
> +               set_dax_synchronous(dax_dev);
>         rc = dax_add_host(dax_dev, disk);
>         if (rc)
>                 goto out_cleanup_dax;
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index e65e83764d1ce..10823debc09bd 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -686,13 +686,13 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
>         if (rc)
>                 goto put_dev;
>
> -       dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops,
> -                       DAXDEV_F_SYNC);
> +       dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops);
>         if (IS_ERR(dev_info->dax_dev)) {
>                 rc = PTR_ERR(dev_info->dax_dev);
>                 dev_info->dax_dev = NULL;
>                 goto put_dev;
>         }
> +       set_dax_synchronous(dev_info->dax_dev);
>         rc = dax_add_host(dev_info->dax_dev, dev_info->gd);
>         if (rc)
>                 goto out_dax;
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 242cc1c0d7ed7..5c03a0364a9bb 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -850,7 +850,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
>         dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx len 0x%llx\n",
>                 __func__, fs->window_kaddr, cache_reg.addr, cache_reg.len);
>
> -       fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops, 0);
> +       fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
>         if (IS_ERR(fs->dax_dev))
>                 return PTR_ERR(fs->dax_dev);
>
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 3bd1fdb5d5f4b..c04f46478e3b5 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -6,9 +6,6 @@
>  #include <linux/mm.h>
>  #include <linux/radix-tree.h>
>
> -/* Flag for synchronous flush */
> -#define DAXDEV_F_SYNC (1UL << 0)
> -
>  typedef unsigned long dax_entry_t;
>
>  struct dax_device;
> @@ -42,8 +39,7 @@ struct dax_operations {
>  };
>
>  #if IS_ENABLED(CONFIG_DAX)
> -struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
> -               unsigned long flags);
> +struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
>  void put_dax(struct dax_device *dax_dev);
>  void kill_dax(struct dax_device *dax_dev);
>  void dax_write_cache(struct dax_device *dax_dev, bool wc);
> @@ -64,7 +60,7 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
>  }
>  #else
>  static inline struct dax_device *alloc_dax(void *private,
> -               const struct dax_operations *ops, unsigned long flags)
> +               const struct dax_operations *ops)
>  {
>         /*
>          * Callers should check IS_ENABLED(CONFIG_DAX) to know if this

Looks good.
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-12 14:44     ` Dan Williams
  2021-12-13  8:23       ` Christoph Hellwig
@ 2021-12-13 16:17       ` Vivek Goyal
  1 sibling, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2021-12-13 16:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization,
	Dr. David Alan Gilbert

On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Thu, Dec 09, 2021 at 07:38:27AM +0100, Christoph Hellwig wrote:
> > > These methods indirect the actual DAX read/write path.  In the end pmem
> > > uses magic flush and mc safe variants and fuse and dcssblk use plain ones
> > > while device mapper picks redirects to the underlying device.
> > >
> > > Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> > > special variants, then use them everywhere as they fall back to the plain
> > > ones on s390 anyway and remove an indirect call from the read/write path
> > > as well as a lot of boilerplate code.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  drivers/dax/super.c           | 36 ++++++++++++++--
> > >  drivers/md/dm-linear.c        | 20 ---------
> > >  drivers/md/dm-log-writes.c    | 80 -----------------------------------
> > >  drivers/md/dm-stripe.c        | 20 ---------
> > >  drivers/md/dm.c               | 50 ----------------------
> > >  drivers/nvdimm/pmem.c         | 20 ---------
> > >  drivers/s390/block/dcssblk.c  | 14 ------
> > >  fs/dax.c                      |  5 ---
> > >  fs/fuse/virtio_fs.c           | 19 +--------
> > >  include/linux/dax.h           |  9 ++--
> > >  include/linux/device-mapper.h |  4 --
> > >  11 files changed, 37 insertions(+), 240 deletions(-)
> > >
> >
> > [..]
> > > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > > index 5c03a0364a9bb..754319ce2a29b 100644
> > > --- a/fs/fuse/virtio_fs.c
> > > +++ b/fs/fuse/virtio_fs.c
> > > @@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> > >       return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
> > >  }
> > >
> > > -static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
> > > -                                    pgoff_t pgoff, void *addr,
> > > -                                    size_t bytes, struct iov_iter *i)
> > > -{
> > > -     return copy_from_iter(addr, bytes, i);
> > > -}
> > > -
> > > -static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
> > > -                                    pgoff_t pgoff, void *addr,
> > > -                                    size_t bytes, struct iov_iter *i)
> > > -{
> > > -     return copy_to_iter(addr, bytes, i);
> > > -}
> > > -
> > >  static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> > >                                    pgoff_t pgoff, size_t nr_pages)
> > >  {
> > > @@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> > >
> > >  static const struct dax_operations virtio_fs_dax_ops = {
> > >       .direct_access = virtio_fs_direct_access,
> > > -     .copy_from_iter = virtio_fs_copy_from_iter,
> > > -     .copy_to_iter = virtio_fs_copy_to_iter,
> > >       .zero_page_range = virtio_fs_zero_page_range,
> > >  };
> > >
> > > @@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> > >       fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
> > >       if (IS_ERR(fs->dax_dev))
> > >               return PTR_ERR(fs->dax_dev);
> > > -
> > > +     set_dax_cached(fs->dax_dev);
> >
> > Looks good to me from virtiofs point of view.
> >
> > Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
> >
> > Going forward, I am wondering should virtiofs use flushcache version as
> > well. What if host filesystem is using DAX and mapping persistent memory
> > pfn directly into qemu address space. I have never tested that.
> >
> > Right now we are relying on applications to do fsync/msync on virtiofs
> > for data persistence.
> 
> This sounds like it would need coordination with a paravirtualized
> driver that can indicate whether the host side is pmem or not, like
> the virtio_pmem driver.

Agreed. Let me check the details of virtio_pmem driver.

> However, if the guest sends any fsync/msync
> you would still need to go explicitly cache flush any dirty page
> because you can't necessarily trust that the guest did that already.

So host dax functionality will already take care of that, IIUC, right?
I see a dax_flush() call in dax_writeback_one(). I am assuming that's
the will take care of flushing dirty pages when guest issues
fsync()/msync(). So probably don't have to do anything extra here.

I think qemu should map files using MAP_SYNC though in this case though.
Any read/writes to virtiofs files will turn into host file load/store
operations. So flushcache in guest makes more sense with MAP_SYNC which
should make sure any filesystem metadata will already persist after
fault completion. And later guest can do writes followed by flush and
ensure data persists too.

IOW, I probably only need to do following.

- In virtiofs virtual device, add a notion of kind of dax window or memory
  it supports. So may be some kind of "writethrough" property of virtiofs
  dax cache.

- Use this property in virtiofs driver to decide whether to use 
  plain copy_from_iter() or _copy_from_iter_flushcache().

- qemu should use mmap(MAP_SYNC) if host filesystem is on persistent
  memory.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter
  2021-12-13  8:20       ` Christoph Hellwig
@ 2021-12-13 16:43         ` Dan Williams
  0 siblings, 0 replies; 33+ messages in thread
From: Dan Williams @ 2021-12-13 16:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vivek Goyal, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Mon, Dec 13, 2021 at 12:20 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Sun, Dec 12, 2021 at 06:48:05AM -0800, Dan Williams wrote:
> > On Fri, Dec 10, 2021 at 6:05 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Thu, Dec 09, 2021 at 07:38:28AM +0100, Christoph Hellwig wrote:
> > > > While using the MC-safe copy routines is rather pointless on a virtual device
> > > > like virtiofs,
> > >
> > > I was wondering about that. Is it completely pointless.
> > >
> > > Typically we are just mapping host page cache into qemu address space.
> > > That shows as virtiofs device pfn in guest and that pfn is mapped into
> > > guest application address space in mmap() call.
> > >
> > > Given on host its DRAM, so I would not expect machine check on load side
> > > so there was no need to use machine check safe variant.
> >
> > That's a broken assumption, DRAM experiences multi-bit ECC errors.
> > Machine checks, data aborts, etc existed before PMEM.
>
> So the conclusion here is that we should always use the mc safe variant?

The proposal is one of the following:

1/ any paths not currently using the mc safe variant should continue
not using it to avoid the performance regression on older platforms,
i.e. drop this patch.

2/ add plumbing to switch to mcsafe variant, but only on newer
platforms, incremental new patch

3/ always use the mc safe variant, keep this patch

We could go with 3/ and see who screams, because 3/ is smallest
ongoing maintenance burden. However, I feel like 1/ is the path of
least resistance until the platforms with the need to do 'careful'
copying age out of use.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter
  2021-12-12 14:48     ` Dan Williams
  2021-12-13  8:20       ` Christoph Hellwig
@ 2021-12-14 13:59       ` Vivek Goyal
  1 sibling, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2021-12-14 13:59 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Sun, Dec 12, 2021 at 06:48:05AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:05 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Thu, Dec 09, 2021 at 07:38:28AM +0100, Christoph Hellwig wrote:
> > > While using the MC-safe copy routines is rather pointless on a virtual device
> > > like virtiofs,
> >
> > I was wondering about that. Is it completely pointless.
> >
> > Typically we are just mapping host page cache into qemu address space.
> > That shows as virtiofs device pfn in guest and that pfn is mapped into
> > guest application address space in mmap() call.
> >
> > Given on host its DRAM, so I would not expect machine check on load side
> > so there was no need to use machine check safe variant.
> 
> That's a broken assumption, DRAM experiences multi-bit ECC errors.
> Machine checks, data aborts, etc existed before PMEM.

So we should use MC safe variant when loading from DRAM as well?
(If needed platoform support is there).

Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-13  8:23       ` Christoph Hellwig
@ 2021-12-14 14:22         ` Vivek Goyal
  2021-12-14 16:41           ` Dan Williams
  0 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2021-12-14 14:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > Going forward, I am wondering should virtiofs use flushcache version as
> > > well. What if host filesystem is using DAX and mapping persistent memory
> > > pfn directly into qemu address space. I have never tested that.
> > >
> > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > for data persistence.
> > 
> > This sounds like it would need coordination with a paravirtualized
> > driver that can indicate whether the host side is pmem or not, like
> > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > you would still need to go explicitly cache flush any dirty page
> > because you can't necessarily trust that the guest did that already.
> 
> Do we?  The application can't really know what backend it is on, so
> it sounds like the current virtiofs implementation doesn't really, does it?

Agreed that application does not know what backend it is on. So virtiofs
just offers regular posix API where applications have to do fsync/msync
for data persistence. No support for mmap(MAP_SYNC). We don't offer persistent
memory programming model on virtiofs. That's not the expectation. DAX 
is used only to bypass guest page cache.

With this assumption, I think we might not have to use flushcache version
at all even if shared filesystem is on persistent memory on host. 

- We mmap() host files into qemu address space. So any dax store in virtiofs
  should make corresponding pages dirty in page cache on host and when
  and fsync()/msync() comes later, it should flush all the data to PMEM.

- In case of file extending writes, virtiofs falls back to regular
  FUSE_WRITE path (and not use DAX), and in that case host pmem driver
  should make sure writes are flushed to pmem immediately.

Are there any other path I am missing. If not, looks like we might not
have to use flushcache version in virtiofs at all as long as we are not
offering guest applications user space flushes and MAP_SYNC support.

We still might have to use machine check safe variant though as loads
might generate synchronous machine check. What's not clear to me is
that if this MC safe variant should be used only in case of PMEM or
should it be used in case of non-PMEM as well.

Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-14 14:22         ` Vivek Goyal
@ 2021-12-14 16:41           ` Dan Williams
  2021-12-14 20:32             ` Vivek Goyal
  0 siblings, 1 reply; 33+ messages in thread
From: Dan Williams @ 2021-12-14 16:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Tue, Dec 14, 2021 at 6:23 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> > On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > Going forward, I am wondering should virtiofs use flushcache version as
> > > > well. What if host filesystem is using DAX and mapping persistent memory
> > > > pfn directly into qemu address space. I have never tested that.
> > > >
> > > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > > for data persistence.
> > >
> > > This sounds like it would need coordination with a paravirtualized
> > > driver that can indicate whether the host side is pmem or not, like
> > > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > > you would still need to go explicitly cache flush any dirty page
> > > because you can't necessarily trust that the guest did that already.
> >
> > Do we?  The application can't really know what backend it is on, so
> > it sounds like the current virtiofs implementation doesn't really, does it?
>
> Agreed that application does not know what backend it is on. So virtiofs
> just offers regular posix API where applications have to do fsync/msync
> for data persistence. No support for mmap(MAP_SYNC). We don't offer persistent
> memory programming model on virtiofs. That's not the expectation. DAX
> is used only to bypass guest page cache.
>
> With this assumption, I think we might not have to use flushcache version
> at all even if shared filesystem is on persistent memory on host.
>
> - We mmap() host files into qemu address space. So any dax store in virtiofs
>   should make corresponding pages dirty in page cache on host and when
>   and fsync()/msync() comes later, it should flush all the data to PMEM.
>
> - In case of file extending writes, virtiofs falls back to regular
>   FUSE_WRITE path (and not use DAX), and in that case host pmem driver
>   should make sure writes are flushed to pmem immediately.
>
> Are there any other path I am missing. If not, looks like we might not
> have to use flushcache version in virtiofs at all as long as we are not
> offering guest applications user space flushes and MAP_SYNC support.
>
> We still might have to use machine check safe variant though as loads
> might generate synchronous machine check. What's not clear to me is
> that if this MC safe variant should be used only in case of PMEM or
> should it be used in case of non-PMEM as well.

It should be used on any memory address that can throw exception on
load, which is any physical address, in paths that can tolerate
memcpy() returning an error code, most I/O paths, and can tolerate
slower copy performance on older platforms that do not support MC
recovery with fast string operations, to date that's only PMEM users.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-14 16:41           ` Dan Williams
@ 2021-12-14 20:32             ` Vivek Goyal
  2021-12-14 23:43               ` Dan Williams
  2021-12-15 10:30               ` Stefan Hajnoczi
  0 siblings, 2 replies; 33+ messages in thread
From: Vivek Goyal @ 2021-12-14 20:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Tue, Dec 14, 2021 at 08:41:30AM -0800, Dan Williams wrote:
> On Tue, Dec 14, 2021 at 6:23 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> > > On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > > > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > Going forward, I am wondering should virtiofs use flushcache version as
> > > > > well. What if host filesystem is using DAX and mapping persistent memory
> > > > > pfn directly into qemu address space. I have never tested that.
> > > > >
> > > > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > > > for data persistence.
> > > >
> > > > This sounds like it would need coordination with a paravirtualized
> > > > driver that can indicate whether the host side is pmem or not, like
> > > > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > > > you would still need to go explicitly cache flush any dirty page
> > > > because you can't necessarily trust that the guest did that already.
> > >
> > > Do we?  The application can't really know what backend it is on, so
> > > it sounds like the current virtiofs implementation doesn't really, does it?
> >
> > Agreed that application does not know what backend it is on. So virtiofs
> > just offers regular posix API where applications have to do fsync/msync
> > for data persistence. No support for mmap(MAP_SYNC). We don't offer persistent
> > memory programming model on virtiofs. That's not the expectation. DAX
> > is used only to bypass guest page cache.
> >
> > With this assumption, I think we might not have to use flushcache version
> > at all even if shared filesystem is on persistent memory on host.
> >
> > - We mmap() host files into qemu address space. So any dax store in virtiofs
> >   should make corresponding pages dirty in page cache on host and when
> >   and fsync()/msync() comes later, it should flush all the data to PMEM.
> >
> > - In case of file extending writes, virtiofs falls back to regular
> >   FUSE_WRITE path (and not use DAX), and in that case host pmem driver
> >   should make sure writes are flushed to pmem immediately.
> >
> > Are there any other path I am missing. If not, looks like we might not
> > have to use flushcache version in virtiofs at all as long as we are not
> > offering guest applications user space flushes and MAP_SYNC support.
> >
> > We still might have to use machine check safe variant though as loads
> > might generate synchronous machine check. What's not clear to me is
> > that if this MC safe variant should be used only in case of PMEM or
> > should it be used in case of non-PMEM as well.
> 
> It should be used on any memory address that can throw exception on
> load, which is any physical address, in paths that can tolerate
> memcpy() returning an error code, most I/O paths, and can tolerate
> slower copy performance on older platforms that do not support MC
> recovery with fast string operations, to date that's only PMEM users.

Ok, So basically latest cpus can do fast string operations with MC
recovery so that using MC safe variant is not a problem.

Then there is range of cpus which can do MC recovery but do slower
versions of memcpy and that's where the issue is.

So if we knew that virtiofs dax window is backed by a pmem device
then we should always use MC safe variant. Even if it means paying
the price of slow version for the sake of correctness. 

But if we are not using pmem on host, then there is no point in
using MC safe variant.

IOW.

	if (virtiofs_backed_by_pmem) {
		use_mc_safe_version
	else
		use_non_mc_safe_version
	}

Now question is, how do we know if virtiofs dax window is backed by
a pmem or not. I checked virtio_pmem driver and that does not seem
to communicate anything like that. It just communicates start of the
range and size of range, nothing else.

I don't have full handle on stack of modules of virtio_pmem, but my guess
is it probably is using MC safe version always (because it does not
know anthing about the backing storage).

/me will definitely like to pay penalty of slower memcpy if virtiofs
device is not backed by a pmem.

Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-14 20:32             ` Vivek Goyal
@ 2021-12-14 23:43               ` Dan Williams
  2021-12-15 15:52                 ` Vivek Goyal
  2021-12-15 10:30               ` Stefan Hajnoczi
  1 sibling, 1 reply; 33+ messages in thread
From: Dan Williams @ 2021-12-14 23:43 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Tue, Dec 14, 2021 at 12:33 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Dec 14, 2021 at 08:41:30AM -0800, Dan Williams wrote:
> > On Tue, Dec 14, 2021 at 6:23 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> > > > On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > > > > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > > Going forward, I am wondering should virtiofs use flushcache version as
> > > > > > well. What if host filesystem is using DAX and mapping persistent memory
> > > > > > pfn directly into qemu address space. I have never tested that.
> > > > > >
> > > > > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > > > > for data persistence.
> > > > >
> > > > > This sounds like it would need coordination with a paravirtualized
> > > > > driver that can indicate whether the host side is pmem or not, like
> > > > > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > > > > you would still need to go explicitly cache flush any dirty page
> > > > > because you can't necessarily trust that the guest did that already.
> > > >
> > > > Do we?  The application can't really know what backend it is on, so
> > > > it sounds like the current virtiofs implementation doesn't really, does it?
> > >
> > > Agreed that application does not know what backend it is on. So virtiofs
> > > just offers regular posix API where applications have to do fsync/msync
> > > for data persistence. No support for mmap(MAP_SYNC). We don't offer persistent
> > > memory programming model on virtiofs. That's not the expectation. DAX
> > > is used only to bypass guest page cache.
> > >
> > > With this assumption, I think we might not have to use flushcache version
> > > at all even if shared filesystem is on persistent memory on host.
> > >
> > > - We mmap() host files into qemu address space. So any dax store in virtiofs
> > >   should make corresponding pages dirty in page cache on host and when
> > >   and fsync()/msync() comes later, it should flush all the data to PMEM.
> > >
> > > - In case of file extending writes, virtiofs falls back to regular
> > >   FUSE_WRITE path (and not use DAX), and in that case host pmem driver
> > >   should make sure writes are flushed to pmem immediately.
> > >
> > > Are there any other path I am missing. If not, looks like we might not
> > > have to use flushcache version in virtiofs at all as long as we are not
> > > offering guest applications user space flushes and MAP_SYNC support.
> > >
> > > We still might have to use machine check safe variant though as loads
> > > might generate synchronous machine check. What's not clear to me is
> > > that if this MC safe variant should be used only in case of PMEM or
> > > should it be used in case of non-PMEM as well.
> >
> > It should be used on any memory address that can throw exception on
> > load, which is any physical address, in paths that can tolerate
> > memcpy() returning an error code, most I/O paths, and can tolerate
> > slower copy performance on older platforms that do not support MC
> > recovery with fast string operations, to date that's only PMEM users.
>
> Ok, So basically latest cpus can do fast string operations with MC
> recovery so that using MC safe variant is not a problem.
>
> Then there is range of cpus which can do MC recovery but do slower
> versions of memcpy and that's where the issue is.
>
> So if we knew that virtiofs dax window is backed by a pmem device
> then we should always use MC safe variant. Even if it means paying
> the price of slow version for the sake of correctness.
>
> But if we are not using pmem on host, then there is no point in
> using MC safe variant.
>
> IOW.
>
>         if (virtiofs_backed_by_pmem) {

No, PMEM should not be considered at all relative to whether to use MC
or not, it is 100% a decision of whether you expect virtiofs users
will balk more at unhandled machine checks or performance regressions
on the platforms that set "enable_copy_mc_fragile()". See
quirk_intel_brickland_xeon_ras_cap() and
quirk_intel_purley_xeon_ras_cap() in arch/x86/kernel/quirks.c.

>                 use_mc_safe_version
>         else
>                 use_non_mc_safe_version
>         }
>
> Now question is, how do we know if virtiofs dax window is backed by
> a pmem or not. I checked virtio_pmem driver and that does not seem
> to communicate anything like that. It just communicates start of the
> range and size of range, nothing else.
>
> I don't have full handle on stack of modules of virtio_pmem, but my guess
> is it probably is using MC safe version always (because it does not
> know anthing about the backing storage).
>
> /me will definitely like to pay penalty of slower memcpy if virtiofs
> device is not backed by a pmem.

I assume you meant "not like", but again PMEM has no bearing on
whether using that device will throw machine checks. I'm sure there
are people that would make the opposite tradeoff.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-14 20:32             ` Vivek Goyal
  2021-12-14 23:43               ` Dan Williams
@ 2021-12-15 10:30               ` Stefan Hajnoczi
  2021-12-15 15:43                 ` Vivek Goyal
  1 sibling, 1 reply; 33+ messages in thread
From: Stefan Hajnoczi @ 2021-12-15 10:30 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Dan Williams, Christoph Hellwig, Vishal Verma, Dave Jiang,
	Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

[-- Attachment #1: Type: text/plain, Size: 4743 bytes --]

On Tue, Dec 14, 2021 at 03:32:43PM -0500, Vivek Goyal wrote:
> On Tue, Dec 14, 2021 at 08:41:30AM -0800, Dan Williams wrote:
> > On Tue, Dec 14, 2021 at 6:23 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> > > > On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > > > > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > > Going forward, I am wondering should virtiofs use flushcache version as
> > > > > > well. What if host filesystem is using DAX and mapping persistent memory
> > > > > > pfn directly into qemu address space. I have never tested that.
> > > > > >
> > > > > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > > > > for data persistence.
> > > > >
> > > > > This sounds like it would need coordination with a paravirtualized
> > > > > driver that can indicate whether the host side is pmem or not, like
> > > > > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > > > > you would still need to go explicitly cache flush any dirty page
> > > > > because you can't necessarily trust that the guest did that already.
> > > >
> > > > Do we?  The application can't really know what backend it is on, so
> > > > it sounds like the current virtiofs implementation doesn't really, does it?
> > >
> > > Agreed that application does not know what backend it is on. So virtiofs
> > > just offers regular posix API where applications have to do fsync/msync
> > > for data persistence. No support for mmap(MAP_SYNC). We don't offer persistent
> > > memory programming model on virtiofs. That's not the expectation. DAX
> > > is used only to bypass guest page cache.
> > >
> > > With this assumption, I think we might not have to use flushcache version
> > > at all even if shared filesystem is on persistent memory on host.
> > >
> > > - We mmap() host files into qemu address space. So any dax store in virtiofs
> > >   should make corresponding pages dirty in page cache on host and when
> > >   and fsync()/msync() comes later, it should flush all the data to PMEM.
> > >
> > > - In case of file extending writes, virtiofs falls back to regular
> > >   FUSE_WRITE path (and not use DAX), and in that case host pmem driver
> > >   should make sure writes are flushed to pmem immediately.
> > >
> > > Are there any other path I am missing. If not, looks like we might not
> > > have to use flushcache version in virtiofs at all as long as we are not
> > > offering guest applications user space flushes and MAP_SYNC support.
> > >
> > > We still might have to use machine check safe variant though as loads
> > > might generate synchronous machine check. What's not clear to me is
> > > that if this MC safe variant should be used only in case of PMEM or
> > > should it be used in case of non-PMEM as well.
> > 
> > It should be used on any memory address that can throw exception on
> > load, which is any physical address, in paths that can tolerate
> > memcpy() returning an error code, most I/O paths, and can tolerate
> > slower copy performance on older platforms that do not support MC
> > recovery with fast string operations, to date that's only PMEM users.
> 
> Ok, So basically latest cpus can do fast string operations with MC
> recovery so that using MC safe variant is not a problem.
> 
> Then there is range of cpus which can do MC recovery but do slower
> versions of memcpy and that's where the issue is.
> 
> So if we knew that virtiofs dax window is backed by a pmem device
> then we should always use MC safe variant. Even if it means paying
> the price of slow version for the sake of correctness. 
> 
> But if we are not using pmem on host, then there is no point in
> using MC safe variant.
> 
> IOW.
> 
> 	if (virtiofs_backed_by_pmem) {
> 		use_mc_safe_version
> 	else
> 		use_non_mc_safe_version
> 	}
> 
> Now question is, how do we know if virtiofs dax window is backed by
> a pmem or not. I checked virtio_pmem driver and that does not seem
> to communicate anything like that. It just communicates start of the
> range and size of range, nothing else.
> 
> I don't have full handle on stack of modules of virtio_pmem, but my guess
> is it probably is using MC safe version always (because it does not
> know anthing about the backing storage).
> 
> /me will definitely like to pay penalty of slower memcpy if virtiofs
> device is not backed by a pmem.

Reads from the page cache handle machine checks (filemap_read() ->
raw_copy_to_user()). I think virtiofs should therefore always handle
machine checks when reading from the DAX Window.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-15 10:30               ` Stefan Hajnoczi
@ 2021-12-15 15:43                 ` Vivek Goyal
  2021-12-15 17:27                   ` Stefan Hajnoczi
  0 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2021-12-15 15:43 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Dan Williams, Christoph Hellwig, Vishal Verma, Dave Jiang,
	Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Wed, Dec 15, 2021 at 10:30:50AM +0000, Stefan Hajnoczi wrote:
> On Tue, Dec 14, 2021 at 03:32:43PM -0500, Vivek Goyal wrote:
> > On Tue, Dec 14, 2021 at 08:41:30AM -0800, Dan Williams wrote:
> > > On Tue, Dec 14, 2021 at 6:23 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> > > > > On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > > > > > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > > > Going forward, I am wondering should virtiofs use flushcache version as
> > > > > > > well. What if host filesystem is using DAX and mapping persistent memory
> > > > > > > pfn directly into qemu address space. I have never tested that.
> > > > > > >
> > > > > > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > > > > > for data persistence.
> > > > > >
> > > > > > This sounds like it would need coordination with a paravirtualized
> > > > > > driver that can indicate whether the host side is pmem or not, like
> > > > > > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > > > > > you would still need to go explicitly cache flush any dirty page
> > > > > > because you can't necessarily trust that the guest did that already.
> > > > >
> > > > > Do we?  The application can't really know what backend it is on, so
> > > > > it sounds like the current virtiofs implementation doesn't really, does it?
> > > >
> > > > Agreed that application does not know what backend it is on. So virtiofs
> > > > just offers regular posix API where applications have to do fsync/msync
> > > > for data persistence. No support for mmap(MAP_SYNC). We don't offer persistent
> > > > memory programming model on virtiofs. That's not the expectation. DAX
> > > > is used only to bypass guest page cache.
> > > >
> > > > With this assumption, I think we might not have to use flushcache version
> > > > at all even if shared filesystem is on persistent memory on host.
> > > >
> > > > - We mmap() host files into qemu address space. So any dax store in virtiofs
> > > >   should make corresponding pages dirty in page cache on host and when
> > > >   and fsync()/msync() comes later, it should flush all the data to PMEM.
> > > >
> > > > - In case of file extending writes, virtiofs falls back to regular
> > > >   FUSE_WRITE path (and not use DAX), and in that case host pmem driver
> > > >   should make sure writes are flushed to pmem immediately.
> > > >
> > > > Are there any other path I am missing. If not, looks like we might not
> > > > have to use flushcache version in virtiofs at all as long as we are not
> > > > offering guest applications user space flushes and MAP_SYNC support.
> > > >
> > > > We still might have to use machine check safe variant though as loads
> > > > might generate synchronous machine check. What's not clear to me is
> > > > that if this MC safe variant should be used only in case of PMEM or
> > > > should it be used in case of non-PMEM as well.
> > > 
> > > It should be used on any memory address that can throw exception on
> > > load, which is any physical address, in paths that can tolerate
> > > memcpy() returning an error code, most I/O paths, and can tolerate
> > > slower copy performance on older platforms that do not support MC
> > > recovery with fast string operations, to date that's only PMEM users.
> > 
> > Ok, So basically latest cpus can do fast string operations with MC
> > recovery so that using MC safe variant is not a problem.
> > 
> > Then there is range of cpus which can do MC recovery but do slower
> > versions of memcpy and that's where the issue is.
> > 
> > So if we knew that virtiofs dax window is backed by a pmem device
> > then we should always use MC safe variant. Even if it means paying
> > the price of slow version for the sake of correctness. 
> > 
> > But if we are not using pmem on host, then there is no point in
> > using MC safe variant.
> > 
> > IOW.
> > 
> > 	if (virtiofs_backed_by_pmem) {
> > 		use_mc_safe_version
> > 	else
> > 		use_non_mc_safe_version
> > 	}
> > 
> > Now question is, how do we know if virtiofs dax window is backed by
> > a pmem or not. I checked virtio_pmem driver and that does not seem
> > to communicate anything like that. It just communicates start of the
> > range and size of range, nothing else.
> > 
> > I don't have full handle on stack of modules of virtio_pmem, but my guess
> > is it probably is using MC safe version always (because it does not
> > know anthing about the backing storage).
> > 
> > /me will definitely like to pay penalty of slower memcpy if virtiofs
> > device is not backed by a pmem.
> 
> Reads from the page cache handle machine checks (filemap_read() ->
> raw_copy_to_user()). I think virtiofs should therefore always handle
> machine checks when reading from the DAX Window.

IIUC, raw_copy_to_user() does not handle recovery from machine check. For
example, it can call copy_user_enhanced_fast_string() if cpu supports
X86_FEATURE_ERMS. But equivalent machine check safe version is
copy_mc_enhanced_fast_string() instead.

Hence, I don't think reading from page cache is using machine check safe
variants by default. This copy_mc_to_user() path has to be taken
explicitly for machine check safe variants. And currently only pmem driver
seems to take it by calling _copy_mc_to_iter().

Thanks
Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-14 23:43               ` Dan Williams
@ 2021-12-15 15:52                 ` Vivek Goyal
  2021-12-15 16:46                   ` Dan Williams
  0 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2021-12-15 15:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Tue, Dec 14, 2021 at 03:43:38PM -0800, Dan Williams wrote:
> On Tue, Dec 14, 2021 at 12:33 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Tue, Dec 14, 2021 at 08:41:30AM -0800, Dan Williams wrote:
> > > On Tue, Dec 14, 2021 at 6:23 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> > > > > On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > > > > > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > > > Going forward, I am wondering should virtiofs use flushcache version as
> > > > > > > well. What if host filesystem is using DAX and mapping persistent memory
> > > > > > > pfn directly into qemu address space. I have never tested that.
> > > > > > >
> > > > > > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > > > > > for data persistence.
> > > > > >
> > > > > > This sounds like it would need coordination with a paravirtualized
> > > > > > driver that can indicate whether the host side is pmem or not, like
> > > > > > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > > > > > you would still need to go explicitly cache flush any dirty page
> > > > > > because you can't necessarily trust that the guest did that already.
> > > > >
> > > > > Do we?  The application can't really know what backend it is on, so
> > > > > it sounds like the current virtiofs implementation doesn't really, does it?
> > > >
> > > > Agreed that application does not know what backend it is on. So virtiofs
> > > > just offers regular posix API where applications have to do fsync/msync
> > > > for data persistence. No support for mmap(MAP_SYNC). We don't offer persistent
> > > > memory programming model on virtiofs. That's not the expectation. DAX
> > > > is used only to bypass guest page cache.
> > > >
> > > > With this assumption, I think we might not have to use flushcache version
> > > > at all even if shared filesystem is on persistent memory on host.
> > > >
> > > > - We mmap() host files into qemu address space. So any dax store in virtiofs
> > > >   should make corresponding pages dirty in page cache on host and when
> > > >   and fsync()/msync() comes later, it should flush all the data to PMEM.
> > > >
> > > > - In case of file extending writes, virtiofs falls back to regular
> > > >   FUSE_WRITE path (and not use DAX), and in that case host pmem driver
> > > >   should make sure writes are flushed to pmem immediately.
> > > >
> > > > Are there any other path I am missing. If not, looks like we might not
> > > > have to use flushcache version in virtiofs at all as long as we are not
> > > > offering guest applications user space flushes and MAP_SYNC support.
> > > >
> > > > We still might have to use machine check safe variant though as loads
> > > > might generate synchronous machine check. What's not clear to me is
> > > > that if this MC safe variant should be used only in case of PMEM or
> > > > should it be used in case of non-PMEM as well.
> > >
> > > It should be used on any memory address that can throw exception on
> > > load, which is any physical address, in paths that can tolerate
> > > memcpy() returning an error code, most I/O paths, and can tolerate
> > > slower copy performance on older platforms that do not support MC
> > > recovery with fast string operations, to date that's only PMEM users.
> >
> > Ok, So basically latest cpus can do fast string operations with MC
> > recovery so that using MC safe variant is not a problem.
> >
> > Then there is range of cpus which can do MC recovery but do slower
> > versions of memcpy and that's where the issue is.
> >
> > So if we knew that virtiofs dax window is backed by a pmem device
> > then we should always use MC safe variant. Even if it means paying
> > the price of slow version for the sake of correctness.
> >
> > But if we are not using pmem on host, then there is no point in
> > using MC safe variant.
> >
> > IOW.
> >
> >         if (virtiofs_backed_by_pmem) {
> 
> No, PMEM should not be considered at all relative to whether to use MC
> or not, it is 100% a decision of whether you expect virtiofs users
> will balk more at unhandled machine checks or performance regressions
> on the platforms that set "enable_copy_mc_fragile()".

If we don't handle machine check, kernel will panic(), right? So that's
the trade off. Whether get higher performance (on select platforms) and
crash if MC happens OR get slower memcpy() performance (on select
platoforms) and recover from MC. Hmm...


> See
> quirk_intel_brickland_xeon_ras_cap() and
> quirk_intel_purley_xeon_ras_cap() in arch/x86/kernel/quirks.c.
> 
> >                 use_mc_safe_version
> >         else
> >                 use_non_mc_safe_version
> >         }
> >
> > Now question is, how do we know if virtiofs dax window is backed by
> > a pmem or not. I checked virtio_pmem driver and that does not seem
> > to communicate anything like that. It just communicates start of the
> > range and size of range, nothing else.
> >
> > I don't have full handle on stack of modules of virtio_pmem, but my guess
> > is it probably is using MC safe version always (because it does not
> > know anthing about the backing storage).
> >
> > /me will definitely like to pay penalty of slower memcpy if virtiofs
> > device is not backed by a pmem.
> 
> I assume you meant "not like",

Yes. It was a typo.

> but again PMEM has no bearing on
> whether using that device will throw machine checks. I'm sure there
> are people that would make the opposite tradeoff.

Why pmem driver does not have to make such trade off and it always
uses machine check variant.

As you mentioned machine checks can happen with DRAM too. So why loading
from page cache not use machine check variant (or given an option to user
allow making a choice).

BTW, stefan mentioned that we could think of adding a device feature
bit to signal whether to do MC safe memcpy() or not if it becomes
really necessary. For now probably let us stick to performance 
variant and if users demand machine check handling, then either
introduce it unconditionally or make it an opt-in based on device
feature bit.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-15 15:52                 ` Vivek Goyal
@ 2021-12-15 16:46                   ` Dan Williams
  0 siblings, 0 replies; 33+ messages in thread
From: Dan Williams @ 2021-12-15 16:46 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Christoph Hellwig, Vishal Verma, Dave Jiang, Alasdair Kergon,
	Mike Snitzer, Ira Weiny, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Stefan Hajnoczi, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

On Wed, Dec 15, 2021 at 7:53 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Dec 14, 2021 at 03:43:38PM -0800, Dan Williams wrote:
> > On Tue, Dec 14, 2021 at 12:33 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Tue, Dec 14, 2021 at 08:41:30AM -0800, Dan Williams wrote:
> > > > On Tue, Dec 14, 2021 at 6:23 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > >
> > > > > On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> > > > > > On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > > > > > > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > > > > Going forward, I am wondering should virtiofs use flushcache version as
> > > > > > > > well. What if host filesystem is using DAX and mapping persistent memory
> > > > > > > > pfn directly into qemu address space. I have never tested that.
> > > > > > > >
> > > > > > > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > > > > > > for data persistence.
> > > > > > >
> > > > > > > This sounds like it would need coordination with a paravirtualized
> > > > > > > driver that can indicate whether the host side is pmem or not, like
> > > > > > > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > > > > > > you would still need to go explicitly cache flush any dirty page
> > > > > > > because you can't necessarily trust that the guest did that already.
> > > > > >
> > > > > > Do we?  The application can't really know what backend it is on, so
> > > > > > it sounds like the current virtiofs implementation doesn't really, does it?
> > > > >
> > > > > Agreed that application does not know what backend it is on. So virtiofs
> > > > > just offers regular posix API where applications have to do fsync/msync
> > > > > for data persistence. No support for mmap(MAP_SYNC). We don't offer persistent
> > > > > memory programming model on virtiofs. That's not the expectation. DAX
> > > > > is used only to bypass guest page cache.
> > > > >
> > > > > With this assumption, I think we might not have to use flushcache version
> > > > > at all even if shared filesystem is on persistent memory on host.
> > > > >
> > > > > - We mmap() host files into qemu address space. So any dax store in virtiofs
> > > > >   should make corresponding pages dirty in page cache on host and when
> > > > >   and fsync()/msync() comes later, it should flush all the data to PMEM.
> > > > >
> > > > > - In case of file extending writes, virtiofs falls back to regular
> > > > >   FUSE_WRITE path (and not use DAX), and in that case host pmem driver
> > > > >   should make sure writes are flushed to pmem immediately.
> > > > >
> > > > > Are there any other path I am missing. If not, looks like we might not
> > > > > have to use flushcache version in virtiofs at all as long as we are not
> > > > > offering guest applications user space flushes and MAP_SYNC support.
> > > > >
> > > > > We still might have to use machine check safe variant though as loads
> > > > > might generate synchronous machine check. What's not clear to me is
> > > > > that if this MC safe variant should be used only in case of PMEM or
> > > > > should it be used in case of non-PMEM as well.
> > > >
> > > > It should be used on any memory address that can throw exception on
> > > > load, which is any physical address, in paths that can tolerate
> > > > memcpy() returning an error code, most I/O paths, and can tolerate
> > > > slower copy performance on older platforms that do not support MC
> > > > recovery with fast string operations, to date that's only PMEM users.
> > >
> > > Ok, So basically latest cpus can do fast string operations with MC
> > > recovery so that using MC safe variant is not a problem.
> > >
> > > Then there is range of cpus which can do MC recovery but do slower
> > > versions of memcpy and that's where the issue is.
> > >
> > > So if we knew that virtiofs dax window is backed by a pmem device
> > > then we should always use MC safe variant. Even if it means paying
> > > the price of slow version for the sake of correctness.
> > >
> > > But if we are not using pmem on host, then there is no point in
> > > using MC safe variant.
> > >
> > > IOW.
> > >
> > >         if (virtiofs_backed_by_pmem) {
> >
> > No, PMEM should not be considered at all relative to whether to use MC
> > or not, it is 100% a decision of whether you expect virtiofs users
> > will balk more at unhandled machine checks or performance regressions
> > on the platforms that set "enable_copy_mc_fragile()".
>
> If we don't handle machine check, kernel will panic(), right? So that's
> the trade off. Whether get higher performance (on select platforms) and
> crash if MC happens OR get slower memcpy() performance (on select
> platoforms) and recover from MC. Hmm...
>
>
> > See
> > quirk_intel_brickland_xeon_ras_cap() and
> > quirk_intel_purley_xeon_ras_cap() in arch/x86/kernel/quirks.c.
> >
> > >                 use_mc_safe_version
> > >         else
> > >                 use_non_mc_safe_version
> > >         }
> > >
> > > Now question is, how do we know if virtiofs dax window is backed by
> > > a pmem or not. I checked virtio_pmem driver and that does not seem
> > > to communicate anything like that. It just communicates start of the
> > > range and size of range, nothing else.
> > >
> > > I don't have full handle on stack of modules of virtio_pmem, but my guess
> > > is it probably is using MC safe version always (because it does not
> > > know anthing about the backing storage).
> > >
> > > /me will definitely like to pay penalty of slower memcpy if virtiofs
> > > device is not backed by a pmem.
> >
> > I assume you meant "not like",
>
> Yes. It was a typo.
>
> > but again PMEM has no bearing on
> > whether using that device will throw machine checks. I'm sure there
> > are people that would make the opposite tradeoff.
>
> Why pmem driver does not have to make such trade off and it always
> uses machine check variant.

It certainly did. I can't find the thread now, but the end result was
to accept the performance regression in favor of maximal MC handling
protection.

> As you mentioned machine checks can happen with DRAM too. So why loading
> from page cache not use machine check variant (or given an option to user
> allow making a choice).

...because regressing page cache operations is fraught, and x86
machine check architecture elicits strong feelings.

> BTW, stefan mentioned that we could think of adding a device feature
> bit to signal whether to do MC safe memcpy() or not if it becomes
> really necessary. For now probably let us stick to performance
> variant and if users demand machine check handling, then either
> introduce it unconditionally or make it an opt-in based on device
> feature bit.

Sure, it's a reasonable choice.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
  2021-12-15 15:43                 ` Vivek Goyal
@ 2021-12-15 17:27                   ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2021-12-15 17:27 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Dan Williams, Christoph Hellwig, Vishal Verma, Dave Jiang,
	Alasdair Kergon, Mike Snitzer, Ira Weiny, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Miklos Szeredi,
	Matthew Wilcox, device-mapper development, Linux NVDIMM,
	linux-s390, linux-fsdevel, virtualization

[-- Attachment #1: Type: text/plain, Size: 6569 bytes --]

On Wed, Dec 15, 2021 at 10:43:33AM -0500, Vivek Goyal wrote:
> On Wed, Dec 15, 2021 at 10:30:50AM +0000, Stefan Hajnoczi wrote:
> > On Tue, Dec 14, 2021 at 03:32:43PM -0500, Vivek Goyal wrote:
> > > On Tue, Dec 14, 2021 at 08:41:30AM -0800, Dan Williams wrote:
> > > > On Tue, Dec 14, 2021 at 6:23 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > >
> > > > > On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> > > > > > On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > > > > > > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > > > > Going forward, I am wondering should virtiofs use flushcache version as
> > > > > > > > well. What if host filesystem is using DAX and mapping persistent memory
> > > > > > > > pfn directly into qemu address space. I have never tested that.
> > > > > > > >
> > > > > > > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > > > > > > for data persistence.
> > > > > > >
> > > > > > > This sounds like it would need coordination with a paravirtualized
> > > > > > > driver that can indicate whether the host side is pmem or not, like
> > > > > > > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > > > > > > you would still need to go explicitly cache flush any dirty page
> > > > > > > because you can't necessarily trust that the guest did that already.
> > > > > >
> > > > > > Do we?  The application can't really know what backend it is on, so
> > > > > > it sounds like the current virtiofs implementation doesn't really, does it?
> > > > >
> > > > > Agreed that application does not know what backend it is on. So virtiofs
> > > > > just offers regular posix API where applications have to do fsync/msync
> > > > > for data persistence. No support for mmap(MAP_SYNC). We don't offer persistent
> > > > > memory programming model on virtiofs. That's not the expectation. DAX
> > > > > is used only to bypass guest page cache.
> > > > >
> > > > > With this assumption, I think we might not have to use flushcache version
> > > > > at all even if shared filesystem is on persistent memory on host.
> > > > >
> > > > > - We mmap() host files into qemu address space. So any dax store in virtiofs
> > > > >   should make corresponding pages dirty in page cache on host and when
> > > > >   and fsync()/msync() comes later, it should flush all the data to PMEM.
> > > > >
> > > > > - In case of file extending writes, virtiofs falls back to regular
> > > > >   FUSE_WRITE path (and not use DAX), and in that case host pmem driver
> > > > >   should make sure writes are flushed to pmem immediately.
> > > > >
> > > > > Are there any other path I am missing. If not, looks like we might not
> > > > > have to use flushcache version in virtiofs at all as long as we are not
> > > > > offering guest applications user space flushes and MAP_SYNC support.
> > > > >
> > > > > We still might have to use machine check safe variant though as loads
> > > > > might generate synchronous machine check. What's not clear to me is
> > > > > that if this MC safe variant should be used only in case of PMEM or
> > > > > should it be used in case of non-PMEM as well.
> > > > 
> > > > It should be used on any memory address that can throw exception on
> > > > load, which is any physical address, in paths that can tolerate
> > > > memcpy() returning an error code, most I/O paths, and can tolerate
> > > > slower copy performance on older platforms that do not support MC
> > > > recovery with fast string operations, to date that's only PMEM users.
> > > 
> > > Ok, So basically latest cpus can do fast string operations with MC
> > > recovery so that using MC safe variant is not a problem.
> > > 
> > > Then there is range of cpus which can do MC recovery but do slower
> > > versions of memcpy and that's where the issue is.
> > > 
> > > So if we knew that virtiofs dax window is backed by a pmem device
> > > then we should always use MC safe variant. Even if it means paying
> > > the price of slow version for the sake of correctness. 
> > > 
> > > But if we are not using pmem on host, then there is no point in
> > > using MC safe variant.
> > > 
> > > IOW.
> > > 
> > > 	if (virtiofs_backed_by_pmem) {
> > > 		use_mc_safe_version
> > > 	else
> > > 		use_non_mc_safe_version
> > > 	}
> > > 
> > > Now question is, how do we know if virtiofs dax window is backed by
> > > a pmem or not. I checked virtio_pmem driver and that does not seem
> > > to communicate anything like that. It just communicates start of the
> > > range and size of range, nothing else.
> > > 
> > > I don't have full handle on stack of modules of virtio_pmem, but my guess
> > > is it probably is using MC safe version always (because it does not
> > > know anthing about the backing storage).
> > > 
> > > /me will definitely like to pay penalty of slower memcpy if virtiofs
> > > device is not backed by a pmem.
> > 
> > Reads from the page cache handle machine checks (filemap_read() ->
> > raw_copy_to_user()). I think virtiofs should therefore always handle
> > machine checks when reading from the DAX Window.
> 
> IIUC, raw_copy_to_user() does not handle recovery from machine check. For
> example, it can call copy_user_enhanced_fast_string() if cpu supports
> X86_FEATURE_ERMS. But equivalent machine check safe version is
> copy_mc_enhanced_fast_string() instead.
> 
> Hence, I don't think reading from page cache is using machine check safe
> variants by default. This copy_mc_to_user() path has to be taken
> explicitly for machine check safe variants. And currently only pmem driver
> seems to take it by calling _copy_mc_to_iter().

Now I'm confused between copy_user_enhanced_fast_string() and
copy_mc_enhanced_fast_string(). The code is very similar, the main
difference being _ASM_EXTABLE_CPY() vs _ASM_EXTABLE_TYPE(..., ...,
EX_TYPE_DEFAULT_MCE_SAFE).

Both return IN_KERNEL_RECOV from error_context() and set mce->kflags |=
MCE_IN_KERNEL_RECOV. The difference is that
copy_user_enhanced_fast_string() also sets mce->kflags |=
MCE_IN_KERNEL_COPYIN in copy_user_enhanced_fast_string() whereas
copy_mc_enhanced_fast_string() does not.

I must be missing something:

1. What is the purpose of the extable in
   copy_user_enhanced_fast_string() if that function cannot recover from
   MCEs?

2. Why is there a "Don't try to copy the tail if machine check happened"
   comment in .Lcopy_user_handle_tail?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-12-15 17:29 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-09  6:38 devirtualize kernel access to DAX Christoph Hellwig
2021-12-09  6:38 ` [PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter() Christoph Hellwig
2021-12-12 14:22   ` Dan Williams
2021-12-13  8:27     ` Christoph Hellwig
2021-12-09  6:38 ` [PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous Christoph Hellwig
2021-12-09 21:03   ` Pankaj Gupta
2021-12-12 14:23   ` Dan Williams
2021-12-09  6:38 ` [PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag Christoph Hellwig
2021-12-12 14:24   ` Dan Williams
2021-12-13  8:40   ` Pankaj Gupta
2021-12-09  6:38 ` [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods Christoph Hellwig
2021-12-10 14:16   ` Vivek Goyal
2021-12-12 14:44     ` Dan Williams
2021-12-13  8:23       ` Christoph Hellwig
2021-12-14 14:22         ` Vivek Goyal
2021-12-14 16:41           ` Dan Williams
2021-12-14 20:32             ` Vivek Goyal
2021-12-14 23:43               ` Dan Williams
2021-12-15 15:52                 ` Vivek Goyal
2021-12-15 16:46                   ` Dan Williams
2021-12-15 10:30               ` Stefan Hajnoczi
2021-12-15 15:43                 ` Vivek Goyal
2021-12-15 17:27                   ` Stefan Hajnoczi
2021-12-13 16:17       ` Vivek Goyal
2021-12-12 14:39   ` Dan Williams
2021-12-13  8:24     ` Christoph Hellwig
2021-12-09  6:38 ` [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter Christoph Hellwig
2021-12-10 14:05   ` Vivek Goyal
2021-12-12 14:48     ` Dan Williams
2021-12-13  8:20       ` Christoph Hellwig
2021-12-13 16:43         ` Dan Williams
2021-12-14 13:59       ` Vivek Goyal
2021-12-12 15:03   ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).