ocfs2-devel.oss.oracle.com archive mirror
 help / color / mirror / Atom feed
* [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax
@ 2021-02-07 17:09 Shiyang Ruan
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
  Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
	linux-btrfs

This patchset is attempt to add CoW support for fsdax, and take XFS,
which has both reflink and fsdax feature, as an example.

One of the key mechanism need to be implemented in fsdax is CoW.  Copy
the data from srcmap before we actually write data to the destance
iomap.  And we just copy range in which data won't be changed.

Another mechanism is range comparison .  In page cache case, readpage()
is used to load data on disk to page cache in order to be able to
compare data.  In fsdax case, readpage() does not work.  So, we need
another compare data with direct access support.

With the two mechanism implemented in fsdax, we are able to make reflink
and fsdax work together in XFS.


Some of the patches are picked up from Goldwyn's patchset.  I made some
changes to adapt to this patchset.

(Rebased on v5.10)
==

Shiyang Ruan (7):
  fsdax: Output address in dax_iomap_pfn() and rename it
  fsdax: Introduce dax_copy_edges() for CoW
  fsdax: Copy data before write
  fsdax: Replace mmap entry in case of CoW
  fsdax: Dedup file range to use a compare function
  fs/xfs: Handle CoW for fsdax write() path
  fs/xfs: Add dedupe support for fsdax

 fs/btrfs/reflink.c     |   3 +-
 fs/dax.c               | 188 ++++++++++++++++++++++++++++++++++++++---
 fs/ocfs2/file.c        |   2 +-
 fs/remap_range.c       |  14 +--
 fs/xfs/xfs_bmap_util.c |   6 +-
 fs/xfs/xfs_file.c      |  30 ++++++-
 fs/xfs/xfs_inode.c     |   8 +-
 fs/xfs/xfs_inode.h     |   1 +
 fs/xfs/xfs_iomap.c     |   3 +-
 fs/xfs/xfs_iops.c      |  11 ++-
 fs/xfs/xfs_reflink.c   |  23 ++++-
 include/linux/dax.h    |   5 ++
 include/linux/fs.h     |   9 +-
 13 files changed, 270 insertions(+), 33 deletions(-)

-- 
2.30.0




_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it
  2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
  2021-02-08 15:11   ` Christoph Hellwig
  2021-02-22  7:44   ` Xiaoguang Wang
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW Shiyang Ruan
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
  Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
	linux-btrfs

Add address output in dax_iomap_pfn() in order to perform a memcpy() in
CoW case.  Since this function both output address and pfn, rename it to
dax_iomap_direct_access().

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
 fs/dax.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 5b47834f2e1b..b012b2db7ba2 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -998,8 +998,8 @@ static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
 	return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
 }
 
-static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
-			 pfn_t *pfnp)
+static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
+		void **kaddr, pfn_t *pfnp)
 {
 	const sector_t sector = dax_iomap_sector(iomap, pos);
 	pgoff_t pgoff;
@@ -1011,11 +1011,13 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
 		return rc;
 	id = dax_read_lock();
 	length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
-				   NULL, pfnp);
+				   kaddr, pfnp);
 	if (length < 0) {
 		rc = length;
 		goto out;
 	}
+	if (!pfnp)
+		goto out_check_addr;
 	rc = -EINVAL;
 	if (PFN_PHYS(length) < size)
 		goto out;
@@ -1025,6 +1027,12 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
 	if (length > 1 && !pfn_t_devmap(*pfnp))
 		goto out;
 	rc = 0;
+
+out_check_addr:
+	if (!kaddr)
+		goto out;
+	if (!*kaddr)
+		rc = -EFAULT;
 out:
 	dax_read_unlock(id);
 	return rc;
@@ -1348,7 +1356,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 			count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
 			major = VM_FAULT_MAJOR;
 		}
-		error = dax_iomap_pfn(&iomap, pos, PAGE_SIZE, &pfn);
+		error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE,
+						NULL, &pfn);
 		if (error < 0)
 			goto error_finish_iomap;
 
@@ -1566,7 +1575,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 
 	switch (iomap.type) {
 	case IOMAP_MAPPED:
-		error = dax_iomap_pfn(&iomap, pos, PMD_SIZE, &pfn);
+		error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE,
+						NULL, &pfn);
 		if (error < 0)
 			goto finish_iomap;
 
-- 
2.30.0




_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW
  2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
  2021-02-08 15:11   ` Christoph Hellwig
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write Shiyang Ruan
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
  Cc: jack, darrick.wong, david, ocfs2-devel, viro, Goldwyn Rodrigues,
	dan.j.williams, linux-btrfs

dax_copy_edges() is a helper functions performs a copy from one part of
the device to another for data not page aligned.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
 fs/dax.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/fs/dax.c b/fs/dax.c
index b012b2db7ba2..ea4e8a434900 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1038,6 +1038,47 @@ static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
 	return rc;
 }
 
+/*
+ * Copies the part of the pages not included in the write, but required for CoW
+ * because offset/offset+length are not page aligned.
+ */
+static int dax_copy_edges(loff_t pos, loff_t length, struct iomap *srcmap,
+			  void *daddr, bool pmd)
+{
+	size_t page_size = pmd ? PMD_SIZE : PAGE_SIZE;
+	loff_t offset = pos & (page_size - 1);
+	size_t size = ALIGN(offset + length, page_size);
+	loff_t end = pos + length;
+	loff_t pg_end = round_up(end, page_size);
+	void *saddr = 0;
+	int ret = 0;
+
+	ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
+	if (ret)
+		return ret;
+	/*
+	 * Copy the first part of the page
+	 * Note: we pass offset as length
+	 */
+	if (offset) {
+		if (saddr)
+			ret = copy_mc_to_kernel(daddr, saddr, offset);
+		else
+			memset(daddr, 0, offset);
+	}
+
+	/* Copy the last part of the range */
+	if (end < pg_end) {
+		if (saddr)
+			ret = copy_mc_to_kernel(daddr + offset + length,
+			       saddr + offset + length,	pg_end - end);
+		else
+			memset(daddr + offset + length, 0, pg_end - end);
+	}
+
+	return ret;
+}
+
 /*
  * The user has performed a load from a hole in the file.  Allocating a new
  * page in the file would cause excessive storage usage for workloads with
-- 
2.30.0




_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write
  2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
  2021-02-08 15:14   ` Christoph Hellwig
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
  Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
	linux-btrfs

Add dax_copy_edges() into each dax actor functions to perform CoW.

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
 fs/dax.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index ea4e8a434900..b2195cbdf2dc 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1161,7 +1161,8 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			return iov_iter_zero(min(length, end - pos), iter);
 	}
 
-	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
+	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED &&
+			!(iomap->flags & IOMAP_F_SHARED)))
 		return -EIO;
 
 	/*
@@ -1200,6 +1201,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
+		if (iomap != srcmap) {
+			ret = dax_copy_edges(pos, length, srcmap, kaddr, false);
+			if (ret)
+				break;
+		}
+
 		map_len = PFN_PHYS(map_len);
 		kaddr += offset;
 		map_len -= offset;
@@ -1311,6 +1318,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	vm_fault_t ret = 0;
 	void *entry;
 	pfn_t pfn;
+	void *kaddr;
 
 	trace_dax_pte_fault(inode, vmf, ret);
 	/*
@@ -1392,19 +1400,27 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 
 	switch (iomap.type) {
 	case IOMAP_MAPPED:
+cow:
 		if (iomap.flags & IOMAP_F_NEW) {
 			count_vm_event(PGMAJFAULT);
 			count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
 			major = VM_FAULT_MAJOR;
 		}
 		error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE,
-						NULL, &pfn);
+						&kaddr, &pfn);
 		if (error < 0)
 			goto error_finish_iomap;
 
 		entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
 						 0, write && !sync);
 
+		if (srcmap.type != IOMAP_HOLE) {
+			error = dax_copy_edges(pos, PAGE_SIZE, &srcmap, kaddr,
+					       false);
+			if (error)
+				goto error_finish_iomap;
+		}
+
 		/*
 		 * If we are doing synchronous page fault and inode needs fsync,
 		 * we can insert PTE into page tables only after that happens.
@@ -1428,6 +1444,9 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 
 		goto finish_iomap;
 	case IOMAP_UNWRITTEN:
+		if (write && iomap.flags & IOMAP_F_SHARED)
+			goto cow;
+		fallthrough;
 	case IOMAP_HOLE:
 		if (!write) {
 			ret = dax_load_hole(&xas, mapping, &entry, vmf);
@@ -1535,6 +1554,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	loff_t pos;
 	int error;
 	pfn_t pfn;
+	void *kaddr;
 
 	/*
 	 * Check whether offset isn't beyond end of file now. Caller is
@@ -1616,14 +1636,22 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 
 	switch (iomap.type) {
 	case IOMAP_MAPPED:
+cow:
 		error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE,
-						NULL, &pfn);
+						&kaddr, &pfn);
 		if (error < 0)
 			goto finish_iomap;
 
 		entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
 						DAX_PMD, write && !sync);
 
+		if (srcmap.type != IOMAP_HOLE) {
+			error = dax_copy_edges(pos, PMD_SIZE, &srcmap, kaddr,
+					       true);
+			if (error)
+				goto unlock_entry;
+		}
+
 		/*
 		 * If we are doing synchronous page fault and inode needs fsync,
 		 * we can insert PMD into page tables only after that happens.
@@ -1642,6 +1670,9 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		result = vmf_insert_pfn_pmd(vmf, pfn, write);
 		break;
 	case IOMAP_UNWRITTEN:
+		if (write && iomap.flags & IOMAP_F_SHARED)
+			goto cow;
+		fallthrough;
 	case IOMAP_HOLE:
 		if (WARN_ON_ONCE(write))
 			break;
-- 
2.30.0




_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW
  2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
                   ` (2 preceding siblings ...)
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
  2021-02-08 15:16   ` Christoph Hellwig
  2021-02-16 13:11   ` David Sterba
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function Shiyang Ruan
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
  Cc: jack, darrick.wong, david, ocfs2-devel, viro, Goldwyn Rodrigues,
	dan.j.williams, linux-btrfs

We replace the existing entry to the newly allocated one
in case of CoW. Also, we mark the entry as PAGECACHE_TAG_TOWRITE
so writeback marks this entry as writeprotected. This
helps us snapshots so new write pagefaults after snapshots
trigger a CoW.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
 fs/dax.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index b2195cbdf2dc..29698a3d2e37 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -722,6 +722,9 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
 	return 0;
 }
 
+#define DAX_IF_DIRTY		(1ULL << 0)
+#define DAX_IF_COW		(1ULL << 1)
+
 /*
  * By this point grab_mapping_entry() has ensured that we have a locked entry
  * of the appropriate size so we don't have to worry about downgrading PMDs to
@@ -731,14 +734,16 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
  */
 static void *dax_insert_entry(struct xa_state *xas,
 		struct address_space *mapping, struct vm_fault *vmf,
-		void *entry, pfn_t pfn, unsigned long flags, bool dirty)
+		void *entry, pfn_t pfn, unsigned long flags, bool insert_flags)
 {
 	void *new_entry = dax_make_entry(pfn, flags);
+	bool dirty = insert_flags & DAX_IF_DIRTY;
+	bool cow = insert_flags & DAX_IF_COW;
 
 	if (dirty)
 		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
 
-	if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) {
+	if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
 		unsigned long index = xas->xa_index;
 		/* we are replacing a zero page with block mapping */
 		if (dax_is_pmd_entry(entry))
@@ -750,7 +755,7 @@ static void *dax_insert_entry(struct xa_state *xas,
 
 	xas_reset(xas);
 	xas_lock_irq(xas);
-	if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
+	if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
 		void *old;
 
 		dax_disassociate_entry(entry, mapping, false);
@@ -774,6 +779,9 @@ static void *dax_insert_entry(struct xa_state *xas,
 	if (dirty)
 		xas_set_mark(xas, PAGECACHE_TAG_DIRTY);
 
+	if (cow)
+		xas_set_mark(xas, PAGECACHE_TAG_TOWRITE);
+
 	xas_unlock_irq(xas);
 	return entry;
 }
@@ -1319,6 +1327,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	void *entry;
 	pfn_t pfn;
 	void *kaddr;
+	unsigned long insert_flags = 0;
 
 	trace_dax_pte_fault(inode, vmf, ret);
 	/*
@@ -1444,8 +1453,10 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 
 		goto finish_iomap;
 	case IOMAP_UNWRITTEN:
-		if (write && iomap.flags & IOMAP_F_SHARED)
+		if (write && (iomap.flags & IOMAP_F_SHARED)) {
+			insert_flags |= DAX_IF_COW;
 			goto cow;
+		}
 		fallthrough;
 	case IOMAP_HOLE:
 		if (!write) {
@@ -1555,6 +1566,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	int error;
 	pfn_t pfn;
 	void *kaddr;
+	unsigned long insert_flags = 0;
 
 	/*
 	 * Check whether offset isn't beyond end of file now. Caller is
@@ -1670,14 +1682,17 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		result = vmf_insert_pfn_pmd(vmf, pfn, write);
 		break;
 	case IOMAP_UNWRITTEN:
-		if (write && iomap.flags & IOMAP_F_SHARED)
+		if (write && (iomap.flags & IOMAP_F_SHARED)) {
+			insert_flags |= DAX_IF_COW;
 			goto cow;
+		}
 		fallthrough;
 	case IOMAP_HOLE:
-		if (WARN_ON_ONCE(write))
+		if (!write) {
+			result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
 			break;
-		result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
-		break;
+		}
+		fallthrough;
 	default:
 		WARN_ON_ONCE(1);
 		break;
-- 
2.30.0




_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
  2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
                   ` (3 preceding siblings ...)
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
  2021-02-08 15:19   ` Christoph Hellwig
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path Shiyang Ruan
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
  Cc: jack, darrick.wong, david, ocfs2-devel, viro, Goldwyn Rodrigues,
	dan.j.williams, linux-btrfs

With dax we cannot deal with readpage() etc. So, we create a
funciton callback to perform the file data comparison and pass
it to generic_remap_file_range_prep() so it can use iomap-based
functions.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
 fs/btrfs/reflink.c   |  3 +-
 fs/dax.c             | 67 ++++++++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/file.c      |  2 +-
 fs/remap_range.c     | 14 +++++----
 fs/xfs/xfs_reflink.c |  2 +-
 include/linux/dax.h  |  5 ++++
 include/linux/fs.h   |  9 +++++-
 7 files changed, 92 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index 99aa87c08912..efe094f7f748 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -783,7 +783,8 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 		return ret;
 
 	return generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out,
-					    len, remap_flags);
+					    len, remap_flags,
+					    vfs_dedupe_file_range_compare);
 }
 
 loff_t btrfs_remap_file_range(struct file *src_file, loff_t off,
diff --git a/fs/dax.c b/fs/dax.c
index 29698a3d2e37..6e9391845057 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -30,6 +30,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/fs_dax.h>
 
+#define MIN(a, b) (((a) < (b)) ? (a) : (b))
+
 static inline unsigned int pe_order(enum page_entry_size pe_size)
 {
 	if (pe_size == PE_SIZE_PTE)
@@ -1827,3 +1829,68 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
 	return dax_insert_pfn_mkwrite(vmf, pfn, order);
 }
 EXPORT_SYMBOL_GPL(dax_finish_sync_fault);
+
+int dax_file_range_compare(struct inode *src, loff_t srcoff, struct inode *dest,
+		loff_t destoff, loff_t len, bool *is_same,
+		const struct iomap_ops *ops)
+{
+	void *saddr, *daddr;
+	struct iomap smap = { 0 };
+	struct iomap dmap = { 0 };
+	bool same = false;
+	loff_t cmp_len;
+	int id, ret = 0;
+
+	id = dax_read_lock();
+	while (len) {
+		ret = ops->iomap_begin(src, srcoff, len, 0, &smap, NULL);
+		if (ret < 0)
+			goto out_src;
+		cmp_len = MIN(len, smap.offset + smap.length - srcoff);
+
+		ret = ops->iomap_begin(dest, destoff, cmp_len, 0, &dmap, NULL);
+		if (ret < 0)
+			goto out_dest;
+		cmp_len = MIN(cmp_len, dmap.offset + dmap.length - destoff);
+
+		if (smap.type == IOMAP_HOLE && dmap.type == IOMAP_HOLE)
+			goto next;
+
+		if (smap.type == IOMAP_HOLE || dmap.type == IOMAP_HOLE) {
+			same = false;
+			goto next;
+		}
+
+		ret = dax_iomap_direct_access(&smap, srcoff,
+			  ALIGN(srcoff + cmp_len, PAGE_SIZE), &saddr, NULL);
+		if (ret < 0)
+			goto out_dest;
+
+		ret = dax_iomap_direct_access(&dmap, destoff,
+			  ALIGN(destoff + cmp_len, PAGE_SIZE), &daddr, NULL);
+		if (ret < 0)
+			goto out_dest;
+
+		same = !memcmp(saddr, daddr, cmp_len);
+		if (!same)
+			break;
+next:
+		len -= cmp_len;
+		srcoff += cmp_len;
+		destoff += cmp_len;
+out_dest:
+		if (ops->iomap_end)
+			ops->iomap_end(dest, destoff, cmp_len, 0, 0, &dmap);
+out_src:
+		if (ops->iomap_end)
+			ops->iomap_end(src, srcoff, len, 0, 0, &smap);
+
+		if (ret < 0)
+			goto out;
+	}
+	*is_same = same;
+out:
+	dax_read_unlock(id);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(dax_file_range_compare);
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 85979e2214b3..9d101f129d16 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2591,7 +2591,7 @@ static loff_t ocfs2_remap_file_range(struct file *file_in, loff_t pos_in,
 		goto out_unlock;
 
 	ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out,
-			&len, remap_flags);
+			&len, remap_flags, vfs_dedupe_file_range_compare);
 	if (ret < 0 || len == 0)
 		goto out_unlock;
 
diff --git a/fs/remap_range.c b/fs/remap_range.c
index e6099beefa97..fc64f7e2af5a 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -199,9 +199,9 @@ static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
  * Compare extents of two files to see if they are the same.
  * Caller must have locked both inodes to prevent write races.
  */
-static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
-					 struct inode *dest, loff_t destoff,
-					 loff_t len, bool *is_same)
+int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
+				  struct inode *dest, loff_t destoff,
+				  loff_t len, bool *is_same)
 {
 	loff_t src_poff;
 	loff_t dest_poff;
@@ -280,6 +280,7 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 out_error:
 	return error;
 }
+EXPORT_SYMBOL(vfs_dedupe_file_range_compare);
 
 /*
  * Check that the two inodes are eligible for cloning, the ranges make
@@ -291,7 +292,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
  */
 int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 				  struct file *file_out, loff_t pos_out,
-				  loff_t *len, unsigned int remap_flags)
+				  loff_t *len, unsigned int remap_flags,
+				  compare_range_t compare_range_fn)
 {
 	struct inode *inode_in = file_inode(file_in);
 	struct inode *inode_out = file_inode(file_out);
@@ -351,8 +353,8 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 	if (remap_flags & REMAP_FILE_DEDUP) {
 		bool		is_same = false;
 
-		ret = vfs_dedupe_file_range_compare(inode_in, pos_in,
-				inode_out, pos_out, *len, &is_same);
+		ret = compare_range_fn(inode_in, pos_in, inode_out, pos_out,
+				       *len, &is_same);
 		if (ret)
 			return ret;
 		if (!is_same)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 6fa05fb78189..26708c01fd78 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1309,7 +1309,7 @@ xfs_reflink_remap_prep(
 		goto out_unlock;
 
 	ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out,
-			len, remap_flags);
+			len, remap_flags, vfs_dedupe_file_range_compare);
 	if (ret || *len == 0)
 		goto out_unlock;
 
diff --git a/include/linux/dax.h b/include/linux/dax.h
index b52f084aa643..183e084c2ac7 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -238,6 +238,11 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap);
+int dax_file_range_compare(struct inode *src, loff_t srcoff,
+			   struct inode *dest, loff_t destoff,
+			   loff_t len, bool *is_same,
+			   const struct iomap_ops *ops);
+
 static inline bool dax_mapping(struct address_space *mapping)
 {
 	return mapping->host && IS_DAX(mapping->host);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8667d0cdc71e..159ec755e47b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1912,13 +1912,20 @@ extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
 				   loff_t, size_t, unsigned int);
+extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
+					 struct inode *dest, loff_t destoff,
+					 loff_t len, bool *is_same);
+typedef int (*compare_range_t)(struct inode *src, loff_t srcpos,
+			       struct inode *dest, loff_t destpos,
+			       loff_t len, bool *is_same);
 extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
 				       struct file *file_out, loff_t pos_out,
 				       size_t len, unsigned int flags);
 extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 					 struct file *file_out, loff_t pos_out,
 					 loff_t *count,
-					 unsigned int remap_flags);
+					 unsigned int remap_flags,
+					 compare_range_t compare_range_fn);
 extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
 				  struct file *file_out, loff_t pos_out,
 				  loff_t len, unsigned int remap_flags);
-- 
2.30.0




_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path
  2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
                   ` (4 preceding siblings ...)
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
  2021-02-08 15:24   ` Christoph Hellwig
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 7/7] fs/xfs: Add dedupe support for fsdax Shiyang Ruan
  2021-02-08 15:39 ` [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe " Jan Kara
  7 siblings, 1 reply; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
  Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
	linux-btrfs

In fsdax mode, WRITE and ZERO on a shared extent need CoW mechanism
performed.  After CoW, new extents needs to be remapped to the file.

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
 fs/xfs/xfs_bmap_util.c |  6 +++++-
 fs/xfs/xfs_file.c      | 10 +++++++---
 fs/xfs/xfs_iomap.c     |  3 ++-
 fs/xfs/xfs_iops.c      | 11 ++++++++---
 fs/xfs/xfs_reflink.c   |  2 ++
 5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 7371a7f7c652..eb0c7ff103f8 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -977,10 +977,14 @@ xfs_free_file_space(
 	if (offset + len > XFS_ISIZE(ip))
 		len = XFS_ISIZE(ip) - offset;
 	error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
-			&xfs_buffered_write_iomap_ops);
+		  IS_DAX(VFS_I(ip)) ?
+		  &xfs_direct_write_iomap_ops : &xfs_buffered_write_iomap_ops);
 	if (error)
 		return error;
 
+	if (xfs_is_reflink_inode(ip))
+		xfs_reflink_end_cow(ip, offset, len);
+
 	/*
 	 * If we zeroed right up to EOF and EOF straddles a page boundary we
 	 * must make sure that the post-EOF area is also zeroed because the
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f73837..ab738641a3f4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -624,9 +624,13 @@ xfs_file_dax_write(
 
 	trace_xfs_file_dax_write(ip, count, pos);
 	ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
-	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
-		i_size_write(inode, iocb->ki_pos);
-		error = xfs_setfilesize(ip, pos, ret);
+	if (ret > 0) {
+		if (iocb->ki_pos > i_size_read(inode)) {
+			i_size_write(inode, iocb->ki_pos);
+			error = xfs_setfilesize(ip, pos, ret);
+		}
+		if (xfs_is_cow_inode(ip))
+			xfs_reflink_end_cow(ip, pos, ret);
 	}
 out:
 	xfs_iunlock(ip, iolock);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7b9ff824e82d..d6d4cc0f084e 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -765,13 +765,14 @@ xfs_direct_write_iomap_begin(
 		goto out_unlock;
 
 	if (imap_needs_cow(ip, flags, &imap, nimaps)) {
+		bool need_convert = flags & IOMAP_DIRECT || IS_DAX(inode);
 		error = -EAGAIN;
 		if (flags & IOMAP_NOWAIT)
 			goto out_unlock;
 
 		/* may drop and re-acquire the ilock */
 		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
-				&lockmode, flags & IOMAP_DIRECT);
+				&lockmode, need_convert);
 		if (error)
 			goto out_unlock;
 		if (shared)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1414ab79eacf..480bda0f16d0 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -860,6 +860,7 @@ xfs_setattr_size(
 	int			error;
 	uint			lock_flags = 0;
 	bool			did_zeroing = false;
+	const struct iomap_ops	*ops;
 
 	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
 	ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL));
@@ -906,10 +907,15 @@ xfs_setattr_size(
 	 * extension, or zeroing out the rest of the block on a downward
 	 * truncate.
 	 */
+	if (IS_DAX(inode))
+		ops = &xfs_direct_write_iomap_ops;
+	else
+		ops = &xfs_buffered_write_iomap_ops;
+
 	if (newsize > oldsize) {
 		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
 		error = iomap_zero_range(inode, oldsize, newsize - oldsize,
-				&did_zeroing, &xfs_buffered_write_iomap_ops);
+				&did_zeroing, ops);
 	} else {
 		/*
 		 * iomap won't detect a dirty page over an unwritten block (or a
@@ -921,8 +927,7 @@ xfs_setattr_size(
 						     newsize);
 		if (error)
 			return error;
-		error = iomap_truncate_page(inode, newsize, &did_zeroing,
-				&xfs_buffered_write_iomap_ops);
+		error = iomap_truncate_page(inode, newsize, &did_zeroing, ops);
 	}
 
 	if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 26708c01fd78..38bde16cdb39 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1246,6 +1246,8 @@ xfs_reflink_zero_posteof(
 
 	trace_xfs_zero_eof(ip, isize, pos - isize);
 	return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
+			IS_DAX(VFS_I(ip)) ?
+			&xfs_direct_write_iomap_ops :
 			&xfs_buffered_write_iomap_ops);
 }
 
-- 
2.30.0




_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Ocfs2-devel] [PATCH 7/7] fs/xfs: Add dedupe support for fsdax
  2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
                   ` (5 preceding siblings ...)
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
  2021-02-08 15:39 ` [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe " Jan Kara
  7 siblings, 0 replies; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
  Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
	linux-btrfs

Add xfs_break_two_dax_layouts() to break layout for tow dax files.  Then
call compare range function only when files are both DAX or not.

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
 fs/xfs/xfs_file.c    | 20 ++++++++++++++++++++
 fs/xfs/xfs_inode.c   |  8 +++++++-
 fs/xfs/xfs_inode.h   |  1 +
 fs/xfs/xfs_reflink.c | 21 ++++++++++++++++++---
 4 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index ab738641a3f4..64ded96b43c8 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -791,6 +791,26 @@ xfs_break_dax_layouts(
 			0, 0, xfs_wait_dax_page(inode));
 }
 
+int
+xfs_break_two_dax_layouts(
+	struct inode		*src,
+	struct inode		*dest)
+{
+	int			error;
+	bool			retry = false;
+
+retry:
+	error = xfs_break_dax_layouts(src, &retry);
+	if (error || retry)
+		goto retry;
+
+	error = xfs_break_dax_layouts(dest, &retry);
+	if (error || retry)
+		goto retry;
+
+	return error;
+}
+
 int
 xfs_break_layouts(
 	struct inode		*inode,
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2bfbcf28b1bd..6483dbaa4d57 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3786,8 +3786,10 @@ xfs_ilock2_io_mmap(
 	struct xfs_inode	*ip2)
 {
 	int			ret;
+	struct inode		*inode1 = VFS_I(ip1);
+	struct inode		*inode2 = VFS_I(ip2);
 
-	ret = xfs_iolock_two_inodes_and_break_layout(VFS_I(ip1), VFS_I(ip2));
+	ret = xfs_iolock_two_inodes_and_break_layout(inode1, inode2);
 	if (ret)
 		return ret;
 	if (ip1 == ip2)
@@ -3795,6 +3797,10 @@ xfs_ilock2_io_mmap(
 	else
 		xfs_lock_two_inodes(ip1, XFS_MMAPLOCK_EXCL,
 				    ip2, XFS_MMAPLOCK_EXCL);
+
+	if (IS_DAX(inode1) && IS_DAX(inode2))
+		ret = xfs_break_two_dax_layouts(inode1, inode2);
+
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 751a3d1d7d84..462b61bea691 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -431,6 +431,7 @@ enum xfs_prealloc_flags {
 
 int	xfs_update_prealloc_flags(struct xfs_inode *ip,
 				  enum xfs_prealloc_flags flags);
+int	xfs_break_two_dax_layouts(struct inode *inode1, struct inode *inode2);
 int	xfs_break_layouts(struct inode *inode, uint *iolock,
 		enum layout_break_reason reason);
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 38bde16cdb39..bdb9a07e703f 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -29,6 +29,7 @@
 #include "xfs_iomap.h"
 #include "xfs_sb.h"
 #include "xfs_ag_resv.h"
+#include <linux/dax.h>
 
 /*
  * Copy on Write of Shared Blocks
@@ -1251,6 +1252,14 @@ xfs_reflink_zero_posteof(
 			&xfs_buffered_write_iomap_ops);
 }
 
+int xfs_reflink_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
+					  struct inode *dest, loff_t destoff,
+					  loff_t len, bool *is_same)
+{
+	return dax_file_range_compare(src, srcoff, dest, destoff, len, is_same,
+				      &xfs_read_iomap_ops);
+}
+
 /*
  * Prepare two files for range cloning.  Upon a successful return both inodes
  * will have the iolock and mmaplock held, the page cache of the out file will
@@ -1293,6 +1302,7 @@ xfs_reflink_remap_prep(
 	struct xfs_inode	*src = XFS_I(inode_in);
 	struct inode		*inode_out = file_inode(file_out);
 	struct xfs_inode	*dest = XFS_I(inode_out);
+	compare_range_t		compare_range_fn;
 	int			ret;
 
 	/* Lock both files against IO */
@@ -1306,12 +1316,17 @@ xfs_reflink_remap_prep(
 	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
 		goto out_unlock;
 
-	/* Don't share DAX file data for now. */
-	if (IS_DAX(inode_in) || IS_DAX(inode_out))
+	/* Don't share DAX file data with non-DAX file. */
+	if (IS_DAX(inode_in) != IS_DAX(inode_out))
 		goto out_unlock;
 
+	if (IS_DAX(inode_in))
+		compare_range_fn = xfs_reflink_dedupe_file_range_compare;
+	else
+		compare_range_fn = vfs_dedupe_file_range_compare;
+
 	ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out,
-			len, remap_flags, vfs_dedupe_file_range_compare);
+			len, remap_flags, compare_range_fn);
 	if (ret || *len == 0)
 		goto out_unlock;
 
-- 
2.30.0




_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
@ 2021-02-08 15:11   ` Christoph Hellwig
  2021-02-22  7:44   ` Xiaoguang Wang
  1 sibling, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:11 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW Shiyang Ruan
@ 2021-02-08 15:11   ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:11 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
	dan.j.williams, linux-btrfs

On Mon, Feb 08, 2021 at 01:09:19AM +0800, Shiyang Ruan wrote:
> dax_copy_edges() is a helper functions performs a copy from one part of
> the device to another for data not page aligned.

The function looks good to me, but adding a static function without a
user is not very bisection friendly.

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write Shiyang Ruan
@ 2021-02-08 15:14   ` Christoph Hellwig
  2021-02-09  1:53     ` Ruan Shiyang
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:14 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs

>  	switch (iomap.type) {
>  	case IOMAP_MAPPED:
> +cow:
>  		if (iomap.flags & IOMAP_F_NEW) {
>  			count_vm_event(PGMAJFAULT);
>  			count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
>  			major = VM_FAULT_MAJOR;
>  		}
>  		error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE,
> -						NULL, &pfn);
> +						&kaddr, &pfn);

Any chance you could look into factoring out this code into a helper
to avoid the goto magic, which is a little too convoluted?

>  	switch (iomap.type) {
>  	case IOMAP_MAPPED:
> +cow:
>  		error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE,
> -						NULL, &pfn);
> +						&kaddr, &pfn);
>  		if (error < 0)
>  			goto finish_iomap;
>  
>  		entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
>  						DAX_PMD, write && !sync);
>  
> +		if (srcmap.type != IOMAP_HOLE) {

Same here.

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
@ 2021-02-08 15:16   ` Christoph Hellwig
  2021-02-16 13:11   ` David Sterba
  1 sibling, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:16 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
	dan.j.williams, linux-btrfs

>  static void *dax_insert_entry(struct xa_state *xas,
>  		struct address_space *mapping, struct vm_fault *vmf,
> -		void *entry, pfn_t pfn, unsigned long flags, bool dirty)
> +		void *entry, pfn_t pfn, unsigned long flags, bool insert_flags)
>  {
>  	void *new_entry = dax_make_entry(pfn, flags);
> +	bool dirty = insert_flags & DAX_IF_DIRTY;
> +	bool cow = insert_flags & DAX_IF_COW;
>  
>  	if (dirty)
>  		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);

Can we just move the __mark_inode_dirty to the one caller that needs it
in a prep patch?

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function Shiyang Ruan
@ 2021-02-08 15:19   ` Christoph Hellwig
  2021-02-09  9:15     ` Ruan Shiyang
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:19 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
	dan.j.williams, linux-btrfs

On Mon, Feb 08, 2021 at 01:09:22AM +0800, Shiyang Ruan wrote:
> With dax we cannot deal with readpage() etc. So, we create a
> funciton callback to perform the file data comparison and pass

s/funciton/function/g

> +#define MIN(a, b) (((a) < (b)) ? (a) : (b))

This should use the existing min or min_t helpers.


>  int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
>  				  struct file *file_out, loff_t pos_out,
> -				  loff_t *len, unsigned int remap_flags)
> +				  loff_t *len, unsigned int remap_flags,
> +				  compare_range_t compare_range_fn)

Can we keep generic_remap_file_range_prep as-is, and add a new
dax_remap_file_range_prep, both sharing a low-level
__generic_remap_file_range_prep implementation?  And for that
implementation I'd also go for classic if/else instead of the function
pointer.

> +extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
> +					 struct inode *dest, loff_t destoff,
> +					 loff_t len, bool *is_same);

no need for the extern.

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path Shiyang Ruan
@ 2021-02-08 15:24   ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:24 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs

> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -977,10 +977,14 @@ xfs_free_file_space(
>  	if (offset + len > XFS_ISIZE(ip))
>  		len = XFS_ISIZE(ip) - offset;
>  	error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
> -			&xfs_buffered_write_iomap_ops);
> +		  IS_DAX(VFS_I(ip)) ?
> +		  &xfs_direct_write_iomap_ops : &xfs_buffered_write_iomap_ops);
>  	if (error)
>  		return error;
> +	if (xfs_is_reflink_inode(ip))
> +		xfs_reflink_end_cow(ip, offset, len);

Maybe we need to add (back) and xfs_zero_range helper that encapsulates
the details?

>  	trace_xfs_file_dax_write(ip, count, pos);
>  	ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
> -	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
> -		i_size_write(inode, iocb->ki_pos);
> -		error = xfs_setfilesize(ip, pos, ret);
> +	if (ret > 0) {
> +		if (iocb->ki_pos > i_size_read(inode)) {
> +			i_size_write(inode, iocb->ki_pos);
> +			error = xfs_setfilesize(ip, pos, ret);
> +		}
> +		if (xfs_is_cow_inode(ip))
> +			xfs_reflink_end_cow(ip, pos, ret);

Nitpick, but I'd just goto out for ret <= 0 to reduce the indentation a
bit.

>  	}
>  out:
>  	xfs_iunlock(ip, iolock);
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 7b9ff824e82d..d6d4cc0f084e 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -765,13 +765,14 @@ xfs_direct_write_iomap_begin(
>  		goto out_unlock;
>  
>  	if (imap_needs_cow(ip, flags, &imap, nimaps)) {
> +		bool need_convert = flags & IOMAP_DIRECT || IS_DAX(inode);
>  		error = -EAGAIN;
>  		if (flags & IOMAP_NOWAIT)
>  			goto out_unlock;
>  
>  		/* may drop and re-acquire the ilock */
>  		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
> -				&lockmode, flags & IOMAP_DIRECT);
> +				&lockmode, need_convert);

Why not:

		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
				&lockmode,
				(flags & IOMAP_DIRECT) || IS_DAX(inode));

?

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax
  2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
                   ` (6 preceding siblings ...)
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 7/7] fs/xfs: Add dedupe support for fsdax Shiyang Ruan
@ 2021-02-08 15:39 ` Jan Kara
  2021-02-09  1:50   ` Ruan Shiyang
  7 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2021-02-08 15:39 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs

On Mon 08-02-21 01:09:17, Shiyang Ruan wrote:
> This patchset is attempt to add CoW support for fsdax, and take XFS,
> which has both reflink and fsdax feature, as an example.
> 
> One of the key mechanism need to be implemented in fsdax is CoW.  Copy
> the data from srcmap before we actually write data to the destance
> iomap.  And we just copy range in which data won't be changed.
> 
> Another mechanism is range comparison .  In page cache case, readpage()
> is used to load data on disk to page cache in order to be able to
> compare data.  In fsdax case, readpage() does not work.  So, we need
> another compare data with direct access support.
> 
> With the two mechanism implemented in fsdax, we are able to make reflink
> and fsdax work together in XFS.
> 
> Some of the patches are picked up from Goldwyn's patchset.  I made some
> changes to adapt to this patchset.

How do you deal with HWPoison code trying to reverse-map struct page back
to inode-offset pair? This also needs to be fixed for reflink to work with
DAX.

								Honza

> 
> (Rebased on v5.10)
> ==
> 
> Shiyang Ruan (7):
>   fsdax: Output address in dax_iomap_pfn() and rename it
>   fsdax: Introduce dax_copy_edges() for CoW
>   fsdax: Copy data before write
>   fsdax: Replace mmap entry in case of CoW
>   fsdax: Dedup file range to use a compare function
>   fs/xfs: Handle CoW for fsdax write() path
>   fs/xfs: Add dedupe support for fsdax
> 
>  fs/btrfs/reflink.c     |   3 +-
>  fs/dax.c               | 188 ++++++++++++++++++++++++++++++++++++++---
>  fs/ocfs2/file.c        |   2 +-
>  fs/remap_range.c       |  14 +--
>  fs/xfs/xfs_bmap_util.c |   6 +-
>  fs/xfs/xfs_file.c      |  30 ++++++-
>  fs/xfs/xfs_inode.c     |   8 +-
>  fs/xfs/xfs_inode.h     |   1 +
>  fs/xfs/xfs_iomap.c     |   3 +-
>  fs/xfs/xfs_iops.c      |  11 ++-
>  fs/xfs/xfs_reflink.c   |  23 ++++-
>  include/linux/dax.h    |   5 ++
>  include/linux/fs.h     |   9 +-
>  13 files changed, 270 insertions(+), 33 deletions(-)
> 
> -- 
> 2.30.0
> 
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax
  2021-02-08 15:39 ` [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe " Jan Kara
@ 2021-02-09  1:50   ` Ruan Shiyang
  0 siblings, 0 replies; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-09  1:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs



On 2021/2/8 下午11:39, Jan Kara wrote:
> On Mon 08-02-21 01:09:17, Shiyang Ruan wrote:
>> This patchset is attempt to add CoW support for fsdax, and take XFS,
>> which has both reflink and fsdax feature, as an example.
>>
>> One of the key mechanism need to be implemented in fsdax is CoW.  Copy
>> the data from srcmap before we actually write data to the destance
>> iomap.  And we just copy range in which data won't be changed.
>>
>> Another mechanism is range comparison .  In page cache case, readpage()
>> is used to load data on disk to page cache in order to be able to
>> compare data.  In fsdax case, readpage() does not work.  So, we need
>> another compare data with direct access support.
>>
>> With the two mechanism implemented in fsdax, we are able to make reflink
>> and fsdax work together in XFS.
>>
>> Some of the patches are picked up from Goldwyn's patchset.  I made some
>> changes to adapt to this patchset.
> 
> How do you deal with HWPoison code trying to reverse-map struct page back
> to inode-offset pair? This also needs to be fixed for reflink to work with
> DAX.
> 

I have posted v3 patchset to add reverse-map support for dax page 
yesterday.  Please take a look at this:

   https://lkml.org/lkml/2021/2/8/347


--
Thanks,
Ruan Shiyang.

> 								Honza
> 
>>
>> (Rebased on v5.10)
>> ==
>>
>> Shiyang Ruan (7):
>>    fsdax: Output address in dax_iomap_pfn() and rename it
>>    fsdax: Introduce dax_copy_edges() for CoW
>>    fsdax: Copy data before write
>>    fsdax: Replace mmap entry in case of CoW
>>    fsdax: Dedup file range to use a compare function
>>    fs/xfs: Handle CoW for fsdax write() path
>>    fs/xfs: Add dedupe support for fsdax
>>
>>   fs/btrfs/reflink.c     |   3 +-
>>   fs/dax.c               | 188 ++++++++++++++++++++++++++++++++++++++---
>>   fs/ocfs2/file.c        |   2 +-
>>   fs/remap_range.c       |  14 +--
>>   fs/xfs/xfs_bmap_util.c |   6 +-
>>   fs/xfs/xfs_file.c      |  30 ++++++-
>>   fs/xfs/xfs_inode.c     |   8 +-
>>   fs/xfs/xfs_inode.h     |   1 +
>>   fs/xfs/xfs_iomap.c     |   3 +-
>>   fs/xfs/xfs_iops.c      |  11 ++-
>>   fs/xfs/xfs_reflink.c   |  23 ++++-
>>   include/linux/dax.h    |   5 ++
>>   include/linux/fs.h     |   9 +-
>>   13 files changed, 270 insertions(+), 33 deletions(-)
>>
>> -- 
>> 2.30.0
>>
>>
>>



_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write
  2021-02-08 15:14   ` Christoph Hellwig
@ 2021-02-09  1:53     ` Ruan Shiyang
  0 siblings, 0 replies; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-09  1:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs



On 2021/2/8 下午11:14, Christoph Hellwig wrote:
>>   	switch (iomap.type) {
>>   	case IOMAP_MAPPED:
>> +cow:
>>   		if (iomap.flags & IOMAP_F_NEW) {
>>   			count_vm_event(PGMAJFAULT);
>>   			count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
>>   			major = VM_FAULT_MAJOR;
>>   		}
>>   		error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE,
>> -						NULL, &pfn);
>> +						&kaddr, &pfn);
> 
> Any chance you could look into factoring out this code into a helper
> to avoid the goto magic, which is a little too convoluted?
> 
>>   	switch (iomap.type) {
>>   	case IOMAP_MAPPED:
>> +cow:
>>   		error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE,
>> -						NULL, &pfn);
>> +						&kaddr, &pfn);
>>   		if (error < 0)
>>   			goto finish_iomap;
>>   
>>   		entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
>>   						DAX_PMD, write && !sync);
>>   
>> +		if (srcmap.type != IOMAP_HOLE) {
> 
> Same here.

Thanks for suggestion.  I'll try it.


--
Thanks,
Ruan Shiyang.
> 
> 



_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
  2021-02-08 15:19   ` Christoph Hellwig
@ 2021-02-09  9:15     ` Ruan Shiyang
  2021-02-09  9:34       ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-09  9:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
	dan.j.williams, linux-btrfs



On 2021/2/8 下午11:19, Christoph Hellwig wrote:
> On Mon, Feb 08, 2021 at 01:09:22AM +0800, Shiyang Ruan wrote:
>> With dax we cannot deal with readpage() etc. So, we create a
>> funciton callback to perform the file data comparison and pass
> 
> s/funciton/function/g
> 
>> +#define MIN(a, b) (((a) < (b)) ? (a) : (b))
> 
> This should use the existing min or min_t helpers.
> 
> 
>>   int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
>>   				  struct file *file_out, loff_t pos_out,
>> -				  loff_t *len, unsigned int remap_flags)
>> +				  loff_t *len, unsigned int remap_flags,
>> +				  compare_range_t compare_range_fn)
> 
> Can we keep generic_remap_file_range_prep as-is, and add a new
> dax_remap_file_range_prep, both sharing a low-level
> __generic_remap_file_range_prep implementation?  And for that
> implementation I'd also go for classic if/else instead of the function
> pointer.

The dax dedupe comparison need the iomap_ops pointer as argument, so my 
understanding is that we don't modify the argument list of 
generic_remap_file_range_prep(), but move its code into 
__generic_remap_file_range_prep() whose argument list can be modified to 
accepts the iomap_ops pointer.  Then it looks like this:

```
int dax_remap_file_range_prep(struct file *file_in, loff_t pos_in,
				  struct file *file_out, loff_t pos_out,
				  loff_t *len, unsigned int remap_flags,
				  const struct iomap_ops *ops)
{
	return __generic_remap_file_range_prep(file_in, pos_in, file_out,
			pos_out, len, remap_flags, ops);
}
EXPORT_SYMBOL(dax_remap_file_range_prep);

int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
				  struct file *file_out, loff_t pos_out,
				  loff_t *len, unsigned int remap_flags)
{
	return __generic_remap_file_range_prep(file_in, pos_in, file_out,
			pos_out, len, remap_flags, NULL);
}
EXPORT_SYMBOL(generic_remap_file_range_prep);
```

Am i right?


--
Thanks,
Ruan Shiyang.

> 
>> +extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>> +					 struct inode *dest, loff_t destoff,
>> +					 loff_t len, bool *is_same);
> 
> no need for the extern.
> 
> 



_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
  2021-02-09  9:15     ` Ruan Shiyang
@ 2021-02-09  9:34       ` Christoph Hellwig
  2021-02-09  9:46         ` Ruan Shiyang
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-09  9:34 UTC (permalink / raw)
  To: Ruan Shiyang
  Cc: jack, linux-nvdimm, darrick.wong, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
	dan.j.williams, linux-btrfs

On Tue, Feb 09, 2021 at 05:15:13PM +0800, Ruan Shiyang wrote:
> The dax dedupe comparison need the iomap_ops pointer as argument, so my 
> understanding is that we don't modify the argument list of 
> generic_remap_file_range_prep(), but move its code into 
> __generic_remap_file_range_prep() whose argument list can be modified to 
> accepts the iomap_ops pointer.  Then it looks like this:

I'd say just add the iomap_ops pointer to
generic_remap_file_range_prep and do away with the extra wrappers.  We
only have three callers anyway.

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
  2021-02-09  9:34       ` Christoph Hellwig
@ 2021-02-09  9:46         ` Ruan Shiyang
  2021-02-10 13:19           ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-09  9:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
	dan.j.williams, linux-btrfs



On 2021/2/9 下午5:34, Christoph Hellwig wrote:
> On Tue, Feb 09, 2021 at 05:15:13PM +0800, Ruan Shiyang wrote:
>> The dax dedupe comparison need the iomap_ops pointer as argument, so my
>> understanding is that we don't modify the argument list of
>> generic_remap_file_range_prep(), but move its code into
>> __generic_remap_file_range_prep() whose argument list can be modified to
>> accepts the iomap_ops pointer.  Then it looks like this:
> 
> I'd say just add the iomap_ops pointer to
> generic_remap_file_range_prep and do away with the extra wrappers.  We
> only have three callers anyway.

OK.


--
Thanks,
Ruan Shiyang.
> 
> 



_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
  2021-02-09  9:46         ` Ruan Shiyang
@ 2021-02-10 13:19           ` Christoph Hellwig
  2021-02-17  3:24             ` Ruan Shiyang
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-10 13:19 UTC (permalink / raw)
  To: Ruan Shiyang
  Cc: jack, linux-nvdimm, darrick.wong, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
	dan.j.williams, linux-btrfs

On Tue, Feb 09, 2021 at 05:46:13PM +0800, Ruan Shiyang wrote:
>
>
> On 2021/2/9 下午5:34, Christoph Hellwig wrote:
>> On Tue, Feb 09, 2021 at 05:15:13PM +0800, Ruan Shiyang wrote:
>>> The dax dedupe comparison need the iomap_ops pointer as argument, so my
>>> understanding is that we don't modify the argument list of
>>> generic_remap_file_range_prep(), but move its code into
>>> __generic_remap_file_range_prep() whose argument list can be modified to
>>> accepts the iomap_ops pointer.  Then it looks like this:
>>
>> I'd say just add the iomap_ops pointer to
>> generic_remap_file_range_prep and do away with the extra wrappers.  We
>> only have three callers anyway.
>
> OK.

So looking at this again I think your proposal actaully is better,
given that the iomap variant is still DAX specific.  Sorry for
the noise.

Also I think dax_file_range_compare should use iomap_apply instead
of open coding it.

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
  2021-02-08 15:16   ` Christoph Hellwig
@ 2021-02-16 13:11   ` David Sterba
  2021-02-17  3:06     ` Ruan Shiyang
  1 sibling, 1 reply; 28+ messages in thread
From: David Sterba @ 2021-02-16 13:11 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
	dan.j.williams, linux-btrfs

On Mon, Feb 08, 2021 at 01:09:21AM +0800, Shiyang Ruan wrote:
> We replace the existing entry to the newly allocated one
> in case of CoW. Also, we mark the entry as PAGECACHE_TAG_TOWRITE
> so writeback marks this entry as writeprotected. This
> helps us snapshots so new write pagefaults after snapshots
> trigger a CoW.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
> ---
>  fs/dax.c | 31 +++++++++++++++++++++++--------
>  1 file changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index b2195cbdf2dc..29698a3d2e37 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -722,6 +722,9 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
>  	return 0;
>  }
>  
> +#define DAX_IF_DIRTY		(1ULL << 0)
> +#define DAX_IF_COW		(1ULL << 1)

The constants are ULL, but I see other flags only 'unsigned long'

> +
>  /*
>   * By this point grab_mapping_entry() has ensured that we have a locked entry
>   * of the appropriate size so we don't have to worry about downgrading PMDs to
> @@ -731,14 +734,16 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
>   */
>  static void *dax_insert_entry(struct xa_state *xas,
>  		struct address_space *mapping, struct vm_fault *vmf,
> -		void *entry, pfn_t pfn, unsigned long flags, bool dirty)
> +		void *entry, pfn_t pfn, unsigned long flags, bool insert_flags)

insert_flags is bool

>  {
>  	void *new_entry = dax_make_entry(pfn, flags);
> +	bool dirty = insert_flags & DAX_IF_DIRTY;

"insert_flags & DAX_IF_DIRTY" is "bool & ULL", this can't be right

> +	bool cow = insert_flags & DAX_IF_COW;

Same

>  
>  	if (dirty)
>  		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>  
> -	if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) {
> +	if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
>  		unsigned long index = xas->xa_index;
>  		/* we are replacing a zero page with block mapping */
>  		if (dax_is_pmd_entry(entry))
> @@ -750,7 +755,7 @@ static void *dax_insert_entry(struct xa_state *xas,
>  
>  	xas_reset(xas);
>  	xas_lock_irq(xas);
> -	if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
> +	if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
>  		void *old;
>  
>  		dax_disassociate_entry(entry, mapping, false);
> @@ -774,6 +779,9 @@ static void *dax_insert_entry(struct xa_state *xas,
>  	if (dirty)
>  		xas_set_mark(xas, PAGECACHE_TAG_DIRTY);
>  
> +	if (cow)
> +		xas_set_mark(xas, PAGECACHE_TAG_TOWRITE);
> +
>  	xas_unlock_irq(xas);
>  	return entry;
>  }
> @@ -1319,6 +1327,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	void *entry;
>  	pfn_t pfn;
>  	void *kaddr;
> +	unsigned long insert_flags = 0;
>  
>  	trace_dax_pte_fault(inode, vmf, ret);
>  	/*
> @@ -1444,8 +1453,10 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  
>  		goto finish_iomap;
>  	case IOMAP_UNWRITTEN:
> -		if (write && iomap.flags & IOMAP_F_SHARED)
> +		if (write && (iomap.flags & IOMAP_F_SHARED)) {
> +			insert_flags |= DAX_IF_COW;

Here's an example of 'unsigned long = unsigned long long', though it'll
work, it would be better to unify all the types.

>  			goto cow;
> +		}
>  		fallthrough;
>  	case IOMAP_HOLE:
>  		if (!write) {
> @@ -1555,6 +1566,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	int error;
>  	pfn_t pfn;
>  	void *kaddr;
> +	unsigned long insert_flags = 0;
>  
>  	/*
>  	 * Check whether offset isn't beyond end of file now. Caller is
> @@ -1670,14 +1682,17 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  		result = vmf_insert_pfn_pmd(vmf, pfn, write);
>  		break;
>  	case IOMAP_UNWRITTEN:
> -		if (write && iomap.flags & IOMAP_F_SHARED)
> +		if (write && (iomap.flags & IOMAP_F_SHARED)) {
> +			insert_flags |= DAX_IF_COW;
>  			goto cow;
> +		}
>  		fallthrough;
>  	case IOMAP_HOLE:
> -		if (WARN_ON_ONCE(write))
> +		if (!write) {
> +			result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
>  			break;
> -		result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
> -		break;
> +		}
> +		fallthrough;
>  	default:
>  		WARN_ON_ONCE(1);
>  		break;
> -- 
> 2.30.0
> 
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW
  2021-02-16 13:11   ` David Sterba
@ 2021-02-17  3:06     ` Ruan Shiyang
  0 siblings, 0 replies; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-17  3:06 UTC (permalink / raw)
  To: dsterba, linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel,
	darrick.wong, dan.j.williams, willy, jack, viro, linux-btrfs,
	ocfs2-devel, david, hch, rgoldwyn, Goldwyn Rodrigues



On 2021/2/16 下午9:11, David Sterba wrote:
> On Mon, Feb 08, 2021 at 01:09:21AM +0800, Shiyang Ruan wrote:
>> We replace the existing entry to the newly allocated one
>> in case of CoW. Also, we mark the entry as PAGECACHE_TAG_TOWRITE
>> so writeback marks this entry as writeprotected. This
>> helps us snapshots so new write pagefaults after snapshots
>> trigger a CoW.
>>
>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
>> ---
>>   fs/dax.c | 31 +++++++++++++++++++++++--------
>>   1 file changed, 23 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/dax.c b/fs/dax.c
>> index b2195cbdf2dc..29698a3d2e37 100644
>> --- a/fs/dax.c
>> +++ b/fs/dax.c
>> @@ -722,6 +722,9 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
>>   	return 0;
>>   }
>>   
>> +#define DAX_IF_DIRTY		(1ULL << 0)
>> +#define DAX_IF_COW		(1ULL << 1)
> 
> The constants are ULL, but I see other flags only 'unsigned long'
> 
>> +
>>   /*
>>    * By this point grab_mapping_entry() has ensured that we have a locked entry
>>    * of the appropriate size so we don't have to worry about downgrading PMDs to
>> @@ -731,14 +734,16 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
>>    */
>>   static void *dax_insert_entry(struct xa_state *xas,
>>   		struct address_space *mapping, struct vm_fault *vmf,
>> -		void *entry, pfn_t pfn, unsigned long flags, bool dirty)
>> +		void *entry, pfn_t pfn, unsigned long flags, bool insert_flags)
> 
> insert_flags is bool
> 
>>   {
>>   	void *new_entry = dax_make_entry(pfn, flags);
>> +	bool dirty = insert_flags & DAX_IF_DIRTY;
> 
> "insert_flags & DAX_IF_DIRTY" is "bool & ULL", this can't be right

This is a mistake caused by rebasing my old version patchset.  Thanks 
for pointing out.  I'll fix this in next version.
> 
>> +	bool cow = insert_flags & DAX_IF_COW;
> 
> Same
> 
>>   
>>   	if (dirty)
>>   		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>>   
>> -	if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) {
>> +	if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
>>   		unsigned long index = xas->xa_index;
>>   		/* we are replacing a zero page with block mapping */
>>   		if (dax_is_pmd_entry(entry))
>> @@ -750,7 +755,7 @@ static void *dax_insert_entry(struct xa_state *xas,
>>   
>>   	xas_reset(xas);
>>   	xas_lock_irq(xas);
>> -	if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
>> +	if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
>>   		void *old;
>>   
>>   		dax_disassociate_entry(entry, mapping, false);
>> @@ -774,6 +779,9 @@ static void *dax_insert_entry(struct xa_state *xas,
>>   	if (dirty)
>>   		xas_set_mark(xas, PAGECACHE_TAG_DIRTY);
>>   
>> +	if (cow)
>> +		xas_set_mark(xas, PAGECACHE_TAG_TOWRITE);
>> +
>>   	xas_unlock_irq(xas);
>>   	return entry;
>>   }
>> @@ -1319,6 +1327,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>>   	void *entry;
>>   	pfn_t pfn;
>>   	void *kaddr;
>> +	unsigned long insert_flags = 0;
>>   
>>   	trace_dax_pte_fault(inode, vmf, ret);
>>   	/*
>> @@ -1444,8 +1453,10 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>>   
>>   		goto finish_iomap;
>>   	case IOMAP_UNWRITTEN:
>> -		if (write && iomap.flags & IOMAP_F_SHARED)
>> +		if (write && (iomap.flags & IOMAP_F_SHARED)) {
>> +			insert_flags |= DAX_IF_COW;
> 
> Here's an example of 'unsigned long = unsigned long long', though it'll
> work, it would be better to unify all the types.

Yes, I'll fix it.


--
Thanks,
Ruan Shiyang.
> 
>>   			goto cow;
>> +		}
>>   		fallthrough;
>>   	case IOMAP_HOLE:
>>   		if (!write) {
>> @@ -1555,6 +1566,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>>   	int error;
>>   	pfn_t pfn;
>>   	void *kaddr;
>> +	unsigned long insert_flags = 0;
>>   
>>   	/*
>>   	 * Check whether offset isn't beyond end of file now. Caller is
>> @@ -1670,14 +1682,17 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>>   		result = vmf_insert_pfn_pmd(vmf, pfn, write);
>>   		break;
>>   	case IOMAP_UNWRITTEN:
>> -		if (write && iomap.flags & IOMAP_F_SHARED)
>> +		if (write && (iomap.flags & IOMAP_F_SHARED)) {
>> +			insert_flags |= DAX_IF_COW;
>>   			goto cow;
>> +		}
>>   		fallthrough;
>>   	case IOMAP_HOLE:
>> -		if (WARN_ON_ONCE(write))
>> +		if (!write) {
>> +			result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
>>   			break;
>> -		result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
>> -		break;
>> +		}
>> +		fallthrough;
>>   	default:
>>   		WARN_ON_ONCE(1);
>>   		break;
>> -- 
>> 2.30.0
>>
>>
> 
> 



_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
  2021-02-10 13:19           ` Christoph Hellwig
@ 2021-02-17  3:24             ` Ruan Shiyang
  2021-02-18 16:20               ` Darrick J. Wong
  0 siblings, 1 reply; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-17  3:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
	dan.j.williams, linux-btrfs



On 2021/2/10 下午9:19, Christoph Hellwig wrote:
> On Tue, Feb 09, 2021 at 05:46:13PM +0800, Ruan Shiyang wrote:
>>
>>
>> On 2021/2/9 下午5:34, Christoph Hellwig wrote:
>>> On Tue, Feb 09, 2021 at 05:15:13PM +0800, Ruan Shiyang wrote:
>>>> The dax dedupe comparison need the iomap_ops pointer as argument, so my
>>>> understanding is that we don't modify the argument list of
>>>> generic_remap_file_range_prep(), but move its code into
>>>> __generic_remap_file_range_prep() whose argument list can be modified to
>>>> accepts the iomap_ops pointer.  Then it looks like this:
>>>
>>> I'd say just add the iomap_ops pointer to
>>> generic_remap_file_range_prep and do away with the extra wrappers.  We
>>> only have three callers anyway.
>>
>> OK.
> 
> So looking at this again I think your proposal actaully is better,
> given that the iomap variant is still DAX specific.  Sorry for
> the noise.
> 
> Also I think dax_file_range_compare should use iomap_apply instead
> of open coding it.
> 

There are two files, which are not reflinked, need to be direct_access() 
here.  The iomap_apply() can handle one file each time.  So, it seems 
that iomap_apply() is not suitable for this case...


The pseudo code of this process is as follows:

   srclen = ops->begin(&srcmap)
   destlen = ops->begin(&destmap)

   direct_access(&srcmap, &saddr)
   direct_access(&destmap, &daddr)

   same = memcpy(saddr, daddr, min(srclen,destlen))

   ops->end(&destmap)
   ops->end(&srcmap)

I think a nested call like this is necessary.  That's why I use the open 
code way.


--
Thanks,
Ruan Shiyang.
> 



_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
  2021-02-17  3:24             ` Ruan Shiyang
@ 2021-02-18 16:20               ` Darrick J. Wong
  2021-02-25  7:35                 ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2021-02-18 16:20 UTC (permalink / raw)
  To: Ruan Shiyang
  Cc: jack, linux-nvdimm, darrick.wong, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
	dan.j.williams, linux-btrfs

On Wed, Feb 17, 2021 at 11:24:18AM +0800, Ruan Shiyang wrote:
> 
> 
> On 2021/2/10 下午9:19, Christoph Hellwig wrote:
> > On Tue, Feb 09, 2021 at 05:46:13PM +0800, Ruan Shiyang wrote:
> > > 
> > > 
> > > On 2021/2/9 下午5:34, Christoph Hellwig wrote:
> > > > On Tue, Feb 09, 2021 at 05:15:13PM +0800, Ruan Shiyang wrote:
> > > > > The dax dedupe comparison need the iomap_ops pointer as argument, so my
> > > > > understanding is that we don't modify the argument list of
> > > > > generic_remap_file_range_prep(), but move its code into
> > > > > __generic_remap_file_range_prep() whose argument list can be modified to
> > > > > accepts the iomap_ops pointer.  Then it looks like this:
> > > > 
> > > > I'd say just add the iomap_ops pointer to
> > > > generic_remap_file_range_prep and do away with the extra wrappers.  We
> > > > only have three callers anyway.
> > > 
> > > OK.
> > 
> > So looking at this again I think your proposal actaully is better,
> > given that the iomap variant is still DAX specific.  Sorry for
> > the noise.
> > 
> > Also I think dax_file_range_compare should use iomap_apply instead
> > of open coding it.
> > 
> 
> There are two files, which are not reflinked, need to be direct_access()
> here.  The iomap_apply() can handle one file each time.  So, it seems that
> iomap_apply() is not suitable for this case...
> 
> 
> The pseudo code of this process is as follows:
> 
>   srclen = ops->begin(&srcmap)
>   destlen = ops->begin(&destmap)
> 
>   direct_access(&srcmap, &saddr)
>   direct_access(&destmap, &daddr)
> 
>   same = memcpy(saddr, daddr, min(srclen,destlen))
> 
>   ops->end(&destmap)
>   ops->end(&srcmap)
> 
> I think a nested call like this is necessary.  That's why I use the open
> code way.

This might be a good place to implement an iomap_apply2() loop that
actually /does/ walk all the extents of file1 and file2.  There's now
two users of this idiom.

(Possibly structured as a "get next mappings from both" generator
function like Matthew Wilcox keeps asking for. :))

--D

> 
> --
> Thanks,
> Ruan Shiyang.
> > 
> 
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it
  2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
  2021-02-08 15:11   ` Christoph Hellwig
@ 2021-02-22  7:44   ` Xiaoguang Wang
  2021-02-23  1:32     ` [Ocfs2-devel] 回复: " ruansy.fnst
  1 sibling, 1 reply; 28+ messages in thread
From: Xiaoguang Wang @ 2021-02-22  7:44 UTC (permalink / raw)
  To: Shiyang Ruan, linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
  Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
	linux-btrfs

hi,

> Add address output in dax_iomap_pfn() in order to perform a memcpy() in
> CoW case.  Since this function both output address and pfn, rename it to
> dax_iomap_direct_access().
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
> ---
>   fs/dax.c | 20 +++++++++++++++-----
>   1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 5b47834f2e1b..b012b2db7ba2 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -998,8 +998,8 @@ static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
>   	return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
>   }
>   
> -static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> -			 pfn_t *pfnp)
> +static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
> +		void **kaddr, pfn_t *pfnp)
>   {
>   	const sector_t sector = dax_iomap_sector(iomap, pos);
>   	pgoff_t pgoff;
> @@ -1011,11 +1011,13 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
>   		return rc;
>   	id = dax_read_lock();
>   	length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
> -				   NULL, pfnp);
> +				   kaddr, pfnp);
>   	if (length < 0) {
>   		rc = length;
>   		goto out;
>   	}
> +	if (!pfnp)
Should this be "if (!*pfnp)"?

Regards,
Xiaoguang Wang
> +		goto out_check_addr;
>   	rc = -EINVAL;
>   	if (PFN_PHYS(length) < size)
>   		goto out;
> @@ -1025,6 +1027,12 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
>   	if (length > 1 && !pfn_t_devmap(*pfnp))
>   		goto out;
>   	rc = 0;
> +
> +out_check_addr:
> +	if (!kaddr)
> +		goto out;
> +	if (!*kaddr)
> +		rc = -EFAULT;
>   out:
>   	dax_read_unlock(id);
>   	return rc;
> @@ -1348,7 +1356,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>   			count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
>   			major = VM_FAULT_MAJOR;
>   		}
> -		error = dax_iomap_pfn(&iomap, pos, PAGE_SIZE, &pfn);
> +		error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE,
> +						NULL, &pfn);
>   		if (error < 0)
>   			goto error_finish_iomap;
>   
> @@ -1566,7 +1575,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>   
>   	switch (iomap.type) {
>   	case IOMAP_MAPPED:
> -		error = dax_iomap_pfn(&iomap, pos, PMD_SIZE, &pfn);
> +		error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE,
> +						NULL, &pfn);
>   		if (error < 0)
>   			goto finish_iomap;
>   
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Ocfs2-devel]  回复: [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it
  2021-02-22  7:44   ` Xiaoguang Wang
@ 2021-02-23  1:32     ` ruansy.fnst
  0 siblings, 0 replies; 28+ messages in thread
From: ruansy.fnst @ 2021-02-23  1:32 UTC (permalink / raw)
  To: Xiaoguang Wang
  Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
	ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs

> hi,
> 
> > Add address output in dax_iomap_pfn() in order to perform a memcpy() in
> > CoW case.  Since this function both output address and pfn, rename it to
> > dax_iomap_direct_access().
> >
> > Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
> > ---
> >   fs/dax.c | 20 +++++++++++++++-----
> >   1 file changed, 15 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 5b47834f2e1b..b012b2db7ba2 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -998,8 +998,8 @@ static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
> >        return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
> >   }
> >  
> > -static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> > -                      pfn_t *pfnp)
> > +static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
> > +             void **kaddr, pfn_t *pfnp)
> >   {
> >        const sector_t sector = dax_iomap_sector(iomap, pos);
> >        pgoff_t pgoff;
> > @@ -1011,11 +1011,13 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> >                return rc;
> >        id = dax_read_lock();
> >        length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
> > -                                NULL, pfnp);
> > +                                kaddr, pfnp);
> >        if (length < 0) {
> >                rc = length;
> >                goto out;
> >        }
> > +     if (!pfnp)
> Should this be "if (!*pfnp)"?

pfnp may be NULL if we only need a kaddr output.
  `dax_iomap_direct_access(iomap, pos, size, &kaddr, NULL);`

So, it's a NULL pointer check here.


--
Thanks,
Ruan Shiyang.

> 
> Regards,
> Xiaoguang Wang
> > +             goto out_check_addr;
> >        rc = -EINVAL;
> >        if (PFN_PHYS(length) < size)
> >                goto out;
> > @@ -1025,6 +1027,12 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> >        if (length > 1 && !pfn_t_devmap(*pfnp))
> >                goto out;
> >        rc = 0;
> > +
> > +out_check_addr:
> > +     if (!kaddr)
> > +             goto out;
> > +     if (!*kaddr)
> > +             rc = -EFAULT;
> >   out:
> >        dax_read_unlock(id);
> >        return rc;
> 
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
  2021-02-18 16:20               ` Darrick J. Wong
@ 2021-02-25  7:35                 ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-25  7:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: jack, Ruan Shiyang, linux-nvdimm, darrick.wong, david,
	linux-kernel, linux-xfs, ocfs2-devel, viro, Goldwyn Rodrigues,
	linux-fsdevel, dan.j.williams, linux-btrfs

On Thu, Feb 18, 2021 at 08:20:18AM -0800, Darrick J. Wong wrote:
> > I think a nested call like this is necessary.  That's why I use the open
> > code way.
> 
> This might be a good place to implement an iomap_apply2() loop that
> actually /does/ walk all the extents of file1 and file2.  There's now
> two users of this idiom.

Why do we need a special helper for that?

> (Possibly structured as a "get next mappings from both" generator
> function like Matthew Wilcox keeps asking for. :))

OTOH this might be a good first use for that.

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-02-26 16:55 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
2021-02-08 15:11   ` Christoph Hellwig
2021-02-22  7:44   ` Xiaoguang Wang
2021-02-23  1:32     ` [Ocfs2-devel] 回复: " ruansy.fnst
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW Shiyang Ruan
2021-02-08 15:11   ` Christoph Hellwig
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write Shiyang Ruan
2021-02-08 15:14   ` Christoph Hellwig
2021-02-09  1:53     ` Ruan Shiyang
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
2021-02-08 15:16   ` Christoph Hellwig
2021-02-16 13:11   ` David Sterba
2021-02-17  3:06     ` Ruan Shiyang
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function Shiyang Ruan
2021-02-08 15:19   ` Christoph Hellwig
2021-02-09  9:15     ` Ruan Shiyang
2021-02-09  9:34       ` Christoph Hellwig
2021-02-09  9:46         ` Ruan Shiyang
2021-02-10 13:19           ` Christoph Hellwig
2021-02-17  3:24             ` Ruan Shiyang
2021-02-18 16:20               ` Darrick J. Wong
2021-02-25  7:35                 ` Christoph Hellwig
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path Shiyang Ruan
2021-02-08 15:24   ` Christoph Hellwig
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 7/7] fs/xfs: Add dedupe support for fsdax Shiyang Ruan
2021-02-08 15:39 ` [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe " Jan Kara
2021-02-09  1:50   ` Ruan Shiyang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).