* [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it
2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
2021-02-08 15:11 ` Christoph Hellwig
2021-02-22 7:44 ` Xiaoguang Wang
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW Shiyang Ruan
` (6 subsequent siblings)
7 siblings, 2 replies; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
linux-btrfs
Add address output in dax_iomap_pfn() in order to perform a memcpy() in
CoW case. Since this function both output address and pfn, rename it to
dax_iomap_direct_access().
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
fs/dax.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 5b47834f2e1b..b012b2db7ba2 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -998,8 +998,8 @@ static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
}
-static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
- pfn_t *pfnp)
+static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
+ void **kaddr, pfn_t *pfnp)
{
const sector_t sector = dax_iomap_sector(iomap, pos);
pgoff_t pgoff;
@@ -1011,11 +1011,13 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
return rc;
id = dax_read_lock();
length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
- NULL, pfnp);
+ kaddr, pfnp);
if (length < 0) {
rc = length;
goto out;
}
+ if (!pfnp)
+ goto out_check_addr;
rc = -EINVAL;
if (PFN_PHYS(length) < size)
goto out;
@@ -1025,6 +1027,12 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
if (length > 1 && !pfn_t_devmap(*pfnp))
goto out;
rc = 0;
+
+out_check_addr:
+ if (!kaddr)
+ goto out;
+ if (!*kaddr)
+ rc = -EFAULT;
out:
dax_read_unlock(id);
return rc;
@@ -1348,7 +1356,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
major = VM_FAULT_MAJOR;
}
- error = dax_iomap_pfn(&iomap, pos, PAGE_SIZE, &pfn);
+ error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE,
+ NULL, &pfn);
if (error < 0)
goto error_finish_iomap;
@@ -1566,7 +1575,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
switch (iomap.type) {
case IOMAP_MAPPED:
- error = dax_iomap_pfn(&iomap, pos, PMD_SIZE, &pfn);
+ error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE,
+ NULL, &pfn);
if (error < 0)
goto finish_iomap;
--
2.30.0
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
@ 2021-02-08 15:11 ` Christoph Hellwig
2021-02-22 7:44 ` Xiaoguang Wang
1 sibling, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:11 UTC (permalink / raw)
To: Shiyang Ruan
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
2021-02-08 15:11 ` Christoph Hellwig
@ 2021-02-22 7:44 ` Xiaoguang Wang
2021-02-23 1:32 ` [Ocfs2-devel] 回复: " ruansy.fnst
1 sibling, 1 reply; 28+ messages in thread
From: Xiaoguang Wang @ 2021-02-22 7:44 UTC (permalink / raw)
To: Shiyang Ruan, linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
linux-btrfs
hi,
> Add address output in dax_iomap_pfn() in order to perform a memcpy() in
> CoW case. Since this function both output address and pfn, rename it to
> dax_iomap_direct_access().
>
> Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
> ---
> fs/dax.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 5b47834f2e1b..b012b2db7ba2 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -998,8 +998,8 @@ static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
> return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
> }
>
> -static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> - pfn_t *pfnp)
> +static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
> + void **kaddr, pfn_t *pfnp)
> {
> const sector_t sector = dax_iomap_sector(iomap, pos);
> pgoff_t pgoff;
> @@ -1011,11 +1011,13 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> return rc;
> id = dax_read_lock();
> length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
> - NULL, pfnp);
> + kaddr, pfnp);
> if (length < 0) {
> rc = length;
> goto out;
> }
> + if (!pfnp)
Should this be "if (!*pfnp)"?
Regards,
Xiaoguang Wang
> + goto out_check_addr;
> rc = -EINVAL;
> if (PFN_PHYS(length) < size)
> goto out;
> @@ -1025,6 +1027,12 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> if (length > 1 && !pfn_t_devmap(*pfnp))
> goto out;
> rc = 0;
> +
> +out_check_addr:
> + if (!kaddr)
> + goto out;
> + if (!*kaddr)
> + rc = -EFAULT;
> out:
> dax_read_unlock(id);
> return rc;
> @@ -1348,7 +1356,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
> major = VM_FAULT_MAJOR;
> }
> - error = dax_iomap_pfn(&iomap, pos, PAGE_SIZE, &pfn);
> + error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE,
> + NULL, &pfn);
> if (error < 0)
> goto error_finish_iomap;
>
> @@ -1566,7 +1575,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>
> switch (iomap.type) {
> case IOMAP_MAPPED:
> - error = dax_iomap_pfn(&iomap, pos, PMD_SIZE, &pfn);
> + error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE,
> + NULL, &pfn);
> if (error < 0)
> goto finish_iomap;
>
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Ocfs2-devel] 回复: [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it
2021-02-22 7:44 ` Xiaoguang Wang
@ 2021-02-23 1:32 ` ruansy.fnst
0 siblings, 0 replies; 28+ messages in thread
From: ruansy.fnst @ 2021-02-23 1:32 UTC (permalink / raw)
To: Xiaoguang Wang
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs
> hi,
>
> > Add address output in dax_iomap_pfn() in order to perform a memcpy() in
> > CoW case. Since this function both output address and pfn, rename it to
> > dax_iomap_direct_access().
> >
> > Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
> > ---
> > fs/dax.c | 20 +++++++++++++++-----
> > 1 file changed, 15 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 5b47834f2e1b..b012b2db7ba2 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -998,8 +998,8 @@ static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
> > return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
> > }
> >
> > -static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> > - pfn_t *pfnp)
> > +static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
> > + void **kaddr, pfn_t *pfnp)
> > {
> > const sector_t sector = dax_iomap_sector(iomap, pos);
> > pgoff_t pgoff;
> > @@ -1011,11 +1011,13 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> > return rc;
> > id = dax_read_lock();
> > length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
> > - NULL, pfnp);
> > + kaddr, pfnp);
> > if (length < 0) {
> > rc = length;
> > goto out;
> > }
> > + if (!pfnp)
> Should this be "if (!*pfnp)"?
pfnp may be NULL if we only need a kaddr output.
`dax_iomap_direct_access(iomap, pos, size, &kaddr, NULL);`
So, it's a NULL pointer check here.
--
Thanks,
Ruan Shiyang.
>
> Regards,
> Xiaoguang Wang
> > + goto out_check_addr;
> > rc = -EINVAL;
> > if (PFN_PHYS(length) < size)
> > goto out;
> > @@ -1025,6 +1027,12 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> > if (length > 1 && !pfn_t_devmap(*pfnp))
> > goto out;
> > rc = 0;
> > +
> > +out_check_addr:
> > + if (!kaddr)
> > + goto out;
> > + if (!*kaddr)
> > + rc = -EFAULT;
> > out:
> > dax_read_unlock(id);
> > return rc;
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW
2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
2021-02-08 15:11 ` Christoph Hellwig
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write Shiyang Ruan
` (5 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
Cc: jack, darrick.wong, david, ocfs2-devel, viro, Goldwyn Rodrigues,
dan.j.williams, linux-btrfs
dax_copy_edges() is a helper functions performs a copy from one part of
the device to another for data not page aligned.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
fs/dax.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/fs/dax.c b/fs/dax.c
index b012b2db7ba2..ea4e8a434900 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1038,6 +1038,47 @@ static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
return rc;
}
+/*
+ * Copies the part of the pages not included in the write, but required for CoW
+ * because offset/offset+length are not page aligned.
+ */
+static int dax_copy_edges(loff_t pos, loff_t length, struct iomap *srcmap,
+ void *daddr, bool pmd)
+{
+ size_t page_size = pmd ? PMD_SIZE : PAGE_SIZE;
+ loff_t offset = pos & (page_size - 1);
+ size_t size = ALIGN(offset + length, page_size);
+ loff_t end = pos + length;
+ loff_t pg_end = round_up(end, page_size);
+ void *saddr = 0;
+ int ret = 0;
+
+ ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
+ if (ret)
+ return ret;
+ /*
+ * Copy the first part of the page
+ * Note: we pass offset as length
+ */
+ if (offset) {
+ if (saddr)
+ ret = copy_mc_to_kernel(daddr, saddr, offset);
+ else
+ memset(daddr, 0, offset);
+ }
+
+ /* Copy the last part of the range */
+ if (end < pg_end) {
+ if (saddr)
+ ret = copy_mc_to_kernel(daddr + offset + length,
+ saddr + offset + length, pg_end - end);
+ else
+ memset(daddr + offset + length, 0, pg_end - end);
+ }
+
+ return ret;
+}
+
/*
* The user has performed a load from a hole in the file. Allocating a new
* page in the file would cause excessive storage usage for workloads with
--
2.30.0
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW Shiyang Ruan
@ 2021-02-08 15:11 ` Christoph Hellwig
0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:11 UTC (permalink / raw)
To: Shiyang Ruan
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
dan.j.williams, linux-btrfs
On Mon, Feb 08, 2021 at 01:09:19AM +0800, Shiyang Ruan wrote:
> dax_copy_edges() is a helper functions performs a copy from one part of
> the device to another for data not page aligned.
The function looks good to me, but adding a static function without a
user is not very bisection friendly.
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write
2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 1/7] fsdax: Output address in dax_iomap_pfn() and rename it Shiyang Ruan
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 2/7] fsdax: Introduce dax_copy_edges() for CoW Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
2021-02-08 15:14 ` Christoph Hellwig
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
` (4 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
linux-btrfs
Add dax_copy_edges() into each dax actor functions to perform CoW.
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
fs/dax.c | 37 ++++++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index ea4e8a434900..b2195cbdf2dc 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1161,7 +1161,8 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
return iov_iter_zero(min(length, end - pos), iter);
}
- if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
+ if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED &&
+ !(iomap->flags & IOMAP_F_SHARED)))
return -EIO;
/*
@@ -1200,6 +1201,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
break;
}
+ if (iomap != srcmap) {
+ ret = dax_copy_edges(pos, length, srcmap, kaddr, false);
+ if (ret)
+ break;
+ }
+
map_len = PFN_PHYS(map_len);
kaddr += offset;
map_len -= offset;
@@ -1311,6 +1318,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
vm_fault_t ret = 0;
void *entry;
pfn_t pfn;
+ void *kaddr;
trace_dax_pte_fault(inode, vmf, ret);
/*
@@ -1392,19 +1400,27 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
switch (iomap.type) {
case IOMAP_MAPPED:
+cow:
if (iomap.flags & IOMAP_F_NEW) {
count_vm_event(PGMAJFAULT);
count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
major = VM_FAULT_MAJOR;
}
error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE,
- NULL, &pfn);
+ &kaddr, &pfn);
if (error < 0)
goto error_finish_iomap;
entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
0, write && !sync);
+ if (srcmap.type != IOMAP_HOLE) {
+ error = dax_copy_edges(pos, PAGE_SIZE, &srcmap, kaddr,
+ false);
+ if (error)
+ goto error_finish_iomap;
+ }
+
/*
* If we are doing synchronous page fault and inode needs fsync,
* we can insert PTE into page tables only after that happens.
@@ -1428,6 +1444,9 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
goto finish_iomap;
case IOMAP_UNWRITTEN:
+ if (write && iomap.flags & IOMAP_F_SHARED)
+ goto cow;
+ fallthrough;
case IOMAP_HOLE:
if (!write) {
ret = dax_load_hole(&xas, mapping, &entry, vmf);
@@ -1535,6 +1554,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
loff_t pos;
int error;
pfn_t pfn;
+ void *kaddr;
/*
* Check whether offset isn't beyond end of file now. Caller is
@@ -1616,14 +1636,22 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
switch (iomap.type) {
case IOMAP_MAPPED:
+cow:
error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE,
- NULL, &pfn);
+ &kaddr, &pfn);
if (error < 0)
goto finish_iomap;
entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
DAX_PMD, write && !sync);
+ if (srcmap.type != IOMAP_HOLE) {
+ error = dax_copy_edges(pos, PMD_SIZE, &srcmap, kaddr,
+ true);
+ if (error)
+ goto unlock_entry;
+ }
+
/*
* If we are doing synchronous page fault and inode needs fsync,
* we can insert PMD into page tables only after that happens.
@@ -1642,6 +1670,9 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
result = vmf_insert_pfn_pmd(vmf, pfn, write);
break;
case IOMAP_UNWRITTEN:
+ if (write && iomap.flags & IOMAP_F_SHARED)
+ goto cow;
+ fallthrough;
case IOMAP_HOLE:
if (WARN_ON_ONCE(write))
break;
--
2.30.0
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write Shiyang Ruan
@ 2021-02-08 15:14 ` Christoph Hellwig
2021-02-09 1:53 ` Ruan Shiyang
0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:14 UTC (permalink / raw)
To: Shiyang Ruan
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs
> switch (iomap.type) {
> case IOMAP_MAPPED:
> +cow:
> if (iomap.flags & IOMAP_F_NEW) {
> count_vm_event(PGMAJFAULT);
> count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
> major = VM_FAULT_MAJOR;
> }
> error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE,
> - NULL, &pfn);
> + &kaddr, &pfn);
Any chance you could look into factoring out this code into a helper
to avoid the goto magic, which is a little too convoluted?
> switch (iomap.type) {
> case IOMAP_MAPPED:
> +cow:
> error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE,
> - NULL, &pfn);
> + &kaddr, &pfn);
> if (error < 0)
> goto finish_iomap;
>
> entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
> DAX_PMD, write && !sync);
>
> + if (srcmap.type != IOMAP_HOLE) {
Same here.
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write
2021-02-08 15:14 ` Christoph Hellwig
@ 2021-02-09 1:53 ` Ruan Shiyang
0 siblings, 0 replies; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-09 1:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs
On 2021/2/8 下午11:14, Christoph Hellwig wrote:
>> switch (iomap.type) {
>> case IOMAP_MAPPED:
>> +cow:
>> if (iomap.flags & IOMAP_F_NEW) {
>> count_vm_event(PGMAJFAULT);
>> count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
>> major = VM_FAULT_MAJOR;
>> }
>> error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE,
>> - NULL, &pfn);
>> + &kaddr, &pfn);
>
> Any chance you could look into factoring out this code into a helper
> to avoid the goto magic, which is a little too convoluted?
>
>> switch (iomap.type) {
>> case IOMAP_MAPPED:
>> +cow:
>> error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE,
>> - NULL, &pfn);
>> + &kaddr, &pfn);
>> if (error < 0)
>> goto finish_iomap;
>>
>> entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
>> DAX_PMD, write && !sync);
>>
>> + if (srcmap.type != IOMAP_HOLE) {
>
> Same here.
Thanks for suggestion. I'll try it.
--
Thanks,
Ruan Shiyang.
>
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW
2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
` (2 preceding siblings ...)
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 3/7] fsdax: Copy data before write Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
2021-02-08 15:16 ` Christoph Hellwig
2021-02-16 13:11 ` David Sterba
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function Shiyang Ruan
` (3 subsequent siblings)
7 siblings, 2 replies; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
Cc: jack, darrick.wong, david, ocfs2-devel, viro, Goldwyn Rodrigues,
dan.j.williams, linux-btrfs
We replace the existing entry to the newly allocated one
in case of CoW. Also, we mark the entry as PAGECACHE_TAG_TOWRITE
so writeback marks this entry as writeprotected. This
helps us snapshots so new write pagefaults after snapshots
trigger a CoW.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
fs/dax.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index b2195cbdf2dc..29698a3d2e37 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -722,6 +722,9 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
return 0;
}
+#define DAX_IF_DIRTY (1ULL << 0)
+#define DAX_IF_COW (1ULL << 1)
+
/*
* By this point grab_mapping_entry() has ensured that we have a locked entry
* of the appropriate size so we don't have to worry about downgrading PMDs to
@@ -731,14 +734,16 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
*/
static void *dax_insert_entry(struct xa_state *xas,
struct address_space *mapping, struct vm_fault *vmf,
- void *entry, pfn_t pfn, unsigned long flags, bool dirty)
+ void *entry, pfn_t pfn, unsigned long flags, bool insert_flags)
{
void *new_entry = dax_make_entry(pfn, flags);
+ bool dirty = insert_flags & DAX_IF_DIRTY;
+ bool cow = insert_flags & DAX_IF_COW;
if (dirty)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
- if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) {
+ if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
unsigned long index = xas->xa_index;
/* we are replacing a zero page with block mapping */
if (dax_is_pmd_entry(entry))
@@ -750,7 +755,7 @@ static void *dax_insert_entry(struct xa_state *xas,
xas_reset(xas);
xas_lock_irq(xas);
- if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
+ if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
void *old;
dax_disassociate_entry(entry, mapping, false);
@@ -774,6 +779,9 @@ static void *dax_insert_entry(struct xa_state *xas,
if (dirty)
xas_set_mark(xas, PAGECACHE_TAG_DIRTY);
+ if (cow)
+ xas_set_mark(xas, PAGECACHE_TAG_TOWRITE);
+
xas_unlock_irq(xas);
return entry;
}
@@ -1319,6 +1327,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
void *entry;
pfn_t pfn;
void *kaddr;
+ unsigned long insert_flags = 0;
trace_dax_pte_fault(inode, vmf, ret);
/*
@@ -1444,8 +1453,10 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
goto finish_iomap;
case IOMAP_UNWRITTEN:
- if (write && iomap.flags & IOMAP_F_SHARED)
+ if (write && (iomap.flags & IOMAP_F_SHARED)) {
+ insert_flags |= DAX_IF_COW;
goto cow;
+ }
fallthrough;
case IOMAP_HOLE:
if (!write) {
@@ -1555,6 +1566,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
int error;
pfn_t pfn;
void *kaddr;
+ unsigned long insert_flags = 0;
/*
* Check whether offset isn't beyond end of file now. Caller is
@@ -1670,14 +1682,17 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
result = vmf_insert_pfn_pmd(vmf, pfn, write);
break;
case IOMAP_UNWRITTEN:
- if (write && iomap.flags & IOMAP_F_SHARED)
+ if (write && (iomap.flags & IOMAP_F_SHARED)) {
+ insert_flags |= DAX_IF_COW;
goto cow;
+ }
fallthrough;
case IOMAP_HOLE:
- if (WARN_ON_ONCE(write))
+ if (!write) {
+ result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
break;
- result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
- break;
+ }
+ fallthrough;
default:
WARN_ON_ONCE(1);
break;
--
2.30.0
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
@ 2021-02-08 15:16 ` Christoph Hellwig
2021-02-16 13:11 ` David Sterba
1 sibling, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:16 UTC (permalink / raw)
To: Shiyang Ruan
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
dan.j.williams, linux-btrfs
> static void *dax_insert_entry(struct xa_state *xas,
> struct address_space *mapping, struct vm_fault *vmf,
> - void *entry, pfn_t pfn, unsigned long flags, bool dirty)
> + void *entry, pfn_t pfn, unsigned long flags, bool insert_flags)
> {
> void *new_entry = dax_make_entry(pfn, flags);
> + bool dirty = insert_flags & DAX_IF_DIRTY;
> + bool cow = insert_flags & DAX_IF_COW;
>
> if (dirty)
> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
Can we just move the __mark_inode_dirty to the one caller that needs it
in a prep patch?
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
2021-02-08 15:16 ` Christoph Hellwig
@ 2021-02-16 13:11 ` David Sterba
2021-02-17 3:06 ` Ruan Shiyang
1 sibling, 1 reply; 28+ messages in thread
From: David Sterba @ 2021-02-16 13:11 UTC (permalink / raw)
To: Shiyang Ruan
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
dan.j.williams, linux-btrfs
On Mon, Feb 08, 2021 at 01:09:21AM +0800, Shiyang Ruan wrote:
> We replace the existing entry to the newly allocated one
> in case of CoW. Also, we mark the entry as PAGECACHE_TAG_TOWRITE
> so writeback marks this entry as writeprotected. This
> helps us snapshots so new write pagefaults after snapshots
> trigger a CoW.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
> ---
> fs/dax.c | 31 +++++++++++++++++++++++--------
> 1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index b2195cbdf2dc..29698a3d2e37 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -722,6 +722,9 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
> return 0;
> }
>
> +#define DAX_IF_DIRTY (1ULL << 0)
> +#define DAX_IF_COW (1ULL << 1)
The constants are ULL, but I see other flags only 'unsigned long'
> +
> /*
> * By this point grab_mapping_entry() has ensured that we have a locked entry
> * of the appropriate size so we don't have to worry about downgrading PMDs to
> @@ -731,14 +734,16 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
> */
> static void *dax_insert_entry(struct xa_state *xas,
> struct address_space *mapping, struct vm_fault *vmf,
> - void *entry, pfn_t pfn, unsigned long flags, bool dirty)
> + void *entry, pfn_t pfn, unsigned long flags, bool insert_flags)
insert_flags is bool
> {
> void *new_entry = dax_make_entry(pfn, flags);
> + bool dirty = insert_flags & DAX_IF_DIRTY;
"insert_flags & DAX_IF_DIRTY" is "bool & ULL", this can't be right
> + bool cow = insert_flags & DAX_IF_COW;
Same
>
> if (dirty)
> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>
> - if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) {
> + if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
> unsigned long index = xas->xa_index;
> /* we are replacing a zero page with block mapping */
> if (dax_is_pmd_entry(entry))
> @@ -750,7 +755,7 @@ static void *dax_insert_entry(struct xa_state *xas,
>
> xas_reset(xas);
> xas_lock_irq(xas);
> - if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
> + if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
> void *old;
>
> dax_disassociate_entry(entry, mapping, false);
> @@ -774,6 +779,9 @@ static void *dax_insert_entry(struct xa_state *xas,
> if (dirty)
> xas_set_mark(xas, PAGECACHE_TAG_DIRTY);
>
> + if (cow)
> + xas_set_mark(xas, PAGECACHE_TAG_TOWRITE);
> +
> xas_unlock_irq(xas);
> return entry;
> }
> @@ -1319,6 +1327,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> void *entry;
> pfn_t pfn;
> void *kaddr;
> + unsigned long insert_flags = 0;
>
> trace_dax_pte_fault(inode, vmf, ret);
> /*
> @@ -1444,8 +1453,10 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>
> goto finish_iomap;
> case IOMAP_UNWRITTEN:
> - if (write && iomap.flags & IOMAP_F_SHARED)
> + if (write && (iomap.flags & IOMAP_F_SHARED)) {
> + insert_flags |= DAX_IF_COW;
Here's an example of 'unsigned long = unsigned long long', though it'll
work, it would be better to unify all the types.
> goto cow;
> + }
> fallthrough;
> case IOMAP_HOLE:
> if (!write) {
> @@ -1555,6 +1566,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> int error;
> pfn_t pfn;
> void *kaddr;
> + unsigned long insert_flags = 0;
>
> /*
> * Check whether offset isn't beyond end of file now. Caller is
> @@ -1670,14 +1682,17 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> result = vmf_insert_pfn_pmd(vmf, pfn, write);
> break;
> case IOMAP_UNWRITTEN:
> - if (write && iomap.flags & IOMAP_F_SHARED)
> + if (write && (iomap.flags & IOMAP_F_SHARED)) {
> + insert_flags |= DAX_IF_COW;
> goto cow;
> + }
> fallthrough;
> case IOMAP_HOLE:
> - if (WARN_ON_ONCE(write))
> + if (!write) {
> + result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
> break;
> - result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
> - break;
> + }
> + fallthrough;
> default:
> WARN_ON_ONCE(1);
> break;
> --
> 2.30.0
>
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW
2021-02-16 13:11 ` David Sterba
@ 2021-02-17 3:06 ` Ruan Shiyang
0 siblings, 0 replies; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-17 3:06 UTC (permalink / raw)
To: dsterba, linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel,
darrick.wong, dan.j.williams, willy, jack, viro, linux-btrfs,
ocfs2-devel, david, hch, rgoldwyn, Goldwyn Rodrigues
On 2021/2/16 下午9:11, David Sterba wrote:
> On Mon, Feb 08, 2021 at 01:09:21AM +0800, Shiyang Ruan wrote:
>> We replace the existing entry to the newly allocated one
>> in case of CoW. Also, we mark the entry as PAGECACHE_TAG_TOWRITE
>> so writeback marks this entry as writeprotected. This
>> helps us snapshots so new write pagefaults after snapshots
>> trigger a CoW.
>>
>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
>> ---
>> fs/dax.c | 31 +++++++++++++++++++++++--------
>> 1 file changed, 23 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/dax.c b/fs/dax.c
>> index b2195cbdf2dc..29698a3d2e37 100644
>> --- a/fs/dax.c
>> +++ b/fs/dax.c
>> @@ -722,6 +722,9 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
>> return 0;
>> }
>>
>> +#define DAX_IF_DIRTY (1ULL << 0)
>> +#define DAX_IF_COW (1ULL << 1)
>
> The constants are ULL, but I see other flags only 'unsigned long'
>
>> +
>> /*
>> * By this point grab_mapping_entry() has ensured that we have a locked entry
>> * of the appropriate size so we don't have to worry about downgrading PMDs to
>> @@ -731,14 +734,16 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
>> */
>> static void *dax_insert_entry(struct xa_state *xas,
>> struct address_space *mapping, struct vm_fault *vmf,
>> - void *entry, pfn_t pfn, unsigned long flags, bool dirty)
>> + void *entry, pfn_t pfn, unsigned long flags, bool insert_flags)
>
> insert_flags is bool
>
>> {
>> void *new_entry = dax_make_entry(pfn, flags);
>> + bool dirty = insert_flags & DAX_IF_DIRTY;
>
> "insert_flags & DAX_IF_DIRTY" is "bool & ULL", this can't be right
This is a mistake caused by rebasing my old version patchset. Thanks
for pointing out. I'll fix this in next version.
>
>> + bool cow = insert_flags & DAX_IF_COW;
>
> Same
>
>>
>> if (dirty)
>> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>>
>> - if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) {
>> + if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) {
>> unsigned long index = xas->xa_index;
>> /* we are replacing a zero page with block mapping */
>> if (dax_is_pmd_entry(entry))
>> @@ -750,7 +755,7 @@ static void *dax_insert_entry(struct xa_state *xas,
>>
>> xas_reset(xas);
>> xas_lock_irq(xas);
>> - if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
>> + if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
>> void *old;
>>
>> dax_disassociate_entry(entry, mapping, false);
>> @@ -774,6 +779,9 @@ static void *dax_insert_entry(struct xa_state *xas,
>> if (dirty)
>> xas_set_mark(xas, PAGECACHE_TAG_DIRTY);
>>
>> + if (cow)
>> + xas_set_mark(xas, PAGECACHE_TAG_TOWRITE);
>> +
>> xas_unlock_irq(xas);
>> return entry;
>> }
>> @@ -1319,6 +1327,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>> void *entry;
>> pfn_t pfn;
>> void *kaddr;
>> + unsigned long insert_flags = 0;
>>
>> trace_dax_pte_fault(inode, vmf, ret);
>> /*
>> @@ -1444,8 +1453,10 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>>
>> goto finish_iomap;
>> case IOMAP_UNWRITTEN:
>> - if (write && iomap.flags & IOMAP_F_SHARED)
>> + if (write && (iomap.flags & IOMAP_F_SHARED)) {
>> + insert_flags |= DAX_IF_COW;
>
> Here's an example of 'unsigned long = unsigned long long', though it'll
> work, it would be better to unify all the types.
Yes, I'll fix it.
--
Thanks,
Ruan Shiyang.
>
>> goto cow;
>> + }
>> fallthrough;
>> case IOMAP_HOLE:
>> if (!write) {
>> @@ -1555,6 +1566,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>> int error;
>> pfn_t pfn;
>> void *kaddr;
>> + unsigned long insert_flags = 0;
>>
>> /*
>> * Check whether offset isn't beyond end of file now. Caller is
>> @@ -1670,14 +1682,17 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>> result = vmf_insert_pfn_pmd(vmf, pfn, write);
>> break;
>> case IOMAP_UNWRITTEN:
>> - if (write && iomap.flags & IOMAP_F_SHARED)
>> + if (write && (iomap.flags & IOMAP_F_SHARED)) {
>> + insert_flags |= DAX_IF_COW;
>> goto cow;
>> + }
>> fallthrough;
>> case IOMAP_HOLE:
>> - if (WARN_ON_ONCE(write))
>> + if (!write) {
>> + result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
>> break;
>> - result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
>> - break;
>> + }
>> + fallthrough;
>> default:
>> WARN_ON_ONCE(1);
>> break;
>> --
>> 2.30.0
>>
>>
>
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
` (3 preceding siblings ...)
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 4/7] fsdax: Replace mmap entry in case of CoW Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
2021-02-08 15:19 ` Christoph Hellwig
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path Shiyang Ruan
` (2 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
Cc: jack, darrick.wong, david, ocfs2-devel, viro, Goldwyn Rodrigues,
dan.j.williams, linux-btrfs
With dax we cannot deal with readpage() etc. So, we create a
funciton callback to perform the file data comparison and pass
it to generic_remap_file_range_prep() so it can use iomap-based
functions.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
fs/btrfs/reflink.c | 3 +-
fs/dax.c | 67 ++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/file.c | 2 +-
fs/remap_range.c | 14 +++++----
fs/xfs/xfs_reflink.c | 2 +-
include/linux/dax.h | 5 ++++
include/linux/fs.h | 9 +++++-
7 files changed, 92 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index 99aa87c08912..efe094f7f748 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -783,7 +783,8 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
return ret;
return generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out,
- len, remap_flags);
+ len, remap_flags,
+ vfs_dedupe_file_range_compare);
}
loff_t btrfs_remap_file_range(struct file *src_file, loff_t off,
diff --git a/fs/dax.c b/fs/dax.c
index 29698a3d2e37..6e9391845057 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -30,6 +30,8 @@
#define CREATE_TRACE_POINTS
#include <trace/events/fs_dax.h>
+#define MIN(a, b) (((a) < (b)) ? (a) : (b))
+
static inline unsigned int pe_order(enum page_entry_size pe_size)
{
if (pe_size == PE_SIZE_PTE)
@@ -1827,3 +1829,68 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
return dax_insert_pfn_mkwrite(vmf, pfn, order);
}
EXPORT_SYMBOL_GPL(dax_finish_sync_fault);
+
+int dax_file_range_compare(struct inode *src, loff_t srcoff, struct inode *dest,
+ loff_t destoff, loff_t len, bool *is_same,
+ const struct iomap_ops *ops)
+{
+ void *saddr, *daddr;
+ struct iomap smap = { 0 };
+ struct iomap dmap = { 0 };
+ bool same = false;
+ loff_t cmp_len;
+ int id, ret = 0;
+
+ id = dax_read_lock();
+ while (len) {
+ ret = ops->iomap_begin(src, srcoff, len, 0, &smap, NULL);
+ if (ret < 0)
+ goto out_src;
+ cmp_len = MIN(len, smap.offset + smap.length - srcoff);
+
+ ret = ops->iomap_begin(dest, destoff, cmp_len, 0, &dmap, NULL);
+ if (ret < 0)
+ goto out_dest;
+ cmp_len = MIN(cmp_len, dmap.offset + dmap.length - destoff);
+
+ if (smap.type == IOMAP_HOLE && dmap.type == IOMAP_HOLE)
+ goto next;
+
+ if (smap.type == IOMAP_HOLE || dmap.type == IOMAP_HOLE) {
+ same = false;
+ goto next;
+ }
+
+ ret = dax_iomap_direct_access(&smap, srcoff,
+ ALIGN(srcoff + cmp_len, PAGE_SIZE), &saddr, NULL);
+ if (ret < 0)
+ goto out_dest;
+
+ ret = dax_iomap_direct_access(&dmap, destoff,
+ ALIGN(destoff + cmp_len, PAGE_SIZE), &daddr, NULL);
+ if (ret < 0)
+ goto out_dest;
+
+ same = !memcmp(saddr, daddr, cmp_len);
+ if (!same)
+ break;
+next:
+ len -= cmp_len;
+ srcoff += cmp_len;
+ destoff += cmp_len;
+out_dest:
+ if (ops->iomap_end)
+ ops->iomap_end(dest, destoff, cmp_len, 0, 0, &dmap);
+out_src:
+ if (ops->iomap_end)
+ ops->iomap_end(src, srcoff, len, 0, 0, &smap);
+
+ if (ret < 0)
+ goto out;
+ }
+ *is_same = same;
+out:
+ dax_read_unlock(id);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(dax_file_range_compare);
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 85979e2214b3..9d101f129d16 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2591,7 +2591,7 @@ static loff_t ocfs2_remap_file_range(struct file *file_in, loff_t pos_in,
goto out_unlock;
ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out,
- &len, remap_flags);
+ &len, remap_flags, vfs_dedupe_file_range_compare);
if (ret < 0 || len == 0)
goto out_unlock;
diff --git a/fs/remap_range.c b/fs/remap_range.c
index e6099beefa97..fc64f7e2af5a 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -199,9 +199,9 @@ static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
* Compare extents of two files to see if they are the same.
* Caller must have locked both inodes to prevent write races.
*/
-static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
- struct inode *dest, loff_t destoff,
- loff_t len, bool *is_same)
+int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
+ struct inode *dest, loff_t destoff,
+ loff_t len, bool *is_same)
{
loff_t src_poff;
loff_t dest_poff;
@@ -280,6 +280,7 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
out_error:
return error;
}
+EXPORT_SYMBOL(vfs_dedupe_file_range_compare);
/*
* Check that the two inodes are eligible for cloning, the ranges make
@@ -291,7 +292,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
*/
int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
- loff_t *len, unsigned int remap_flags)
+ loff_t *len, unsigned int remap_flags,
+ compare_range_t compare_range_fn)
{
struct inode *inode_in = file_inode(file_in);
struct inode *inode_out = file_inode(file_out);
@@ -351,8 +353,8 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
if (remap_flags & REMAP_FILE_DEDUP) {
bool is_same = false;
- ret = vfs_dedupe_file_range_compare(inode_in, pos_in,
- inode_out, pos_out, *len, &is_same);
+ ret = compare_range_fn(inode_in, pos_in, inode_out, pos_out,
+ *len, &is_same);
if (ret)
return ret;
if (!is_same)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 6fa05fb78189..26708c01fd78 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1309,7 +1309,7 @@ xfs_reflink_remap_prep(
goto out_unlock;
ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out,
- len, remap_flags);
+ len, remap_flags, vfs_dedupe_file_range_compare);
if (ret || *len == 0)
goto out_unlock;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index b52f084aa643..183e084c2ac7 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -238,6 +238,11 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
pgoff_t index);
s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap);
+int dax_file_range_compare(struct inode *src, loff_t srcoff,
+ struct inode *dest, loff_t destoff,
+ loff_t len, bool *is_same,
+ const struct iomap_ops *ops);
+
static inline bool dax_mapping(struct address_space *mapping)
{
return mapping->host && IS_DAX(mapping->host);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8667d0cdc71e..159ec755e47b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1912,13 +1912,20 @@ extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
loff_t, size_t, unsigned int);
+extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
+ struct inode *dest, loff_t destoff,
+ loff_t len, bool *is_same);
+typedef int (*compare_range_t)(struct inode *src, loff_t srcpos,
+ struct inode *dest, loff_t destpos,
+ loff_t len, bool *is_same);
extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
size_t len, unsigned int flags);
extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t *count,
- unsigned int remap_flags);
+ unsigned int remap_flags,
+ compare_range_t compare_range_fn);
extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t len, unsigned int remap_flags);
--
2.30.0
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function Shiyang Ruan
@ 2021-02-08 15:19 ` Christoph Hellwig
2021-02-09 9:15 ` Ruan Shiyang
0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:19 UTC (permalink / raw)
To: Shiyang Ruan
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
dan.j.williams, linux-btrfs
On Mon, Feb 08, 2021 at 01:09:22AM +0800, Shiyang Ruan wrote:
> With dax we cannot deal with readpage() etc. So, we create a
> funciton callback to perform the file data comparison and pass
s/funciton/function/g
> +#define MIN(a, b) (((a) < (b)) ? (a) : (b))
This should use the existing min or min_t helpers.
> int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
> struct file *file_out, loff_t pos_out,
> - loff_t *len, unsigned int remap_flags)
> + loff_t *len, unsigned int remap_flags,
> + compare_range_t compare_range_fn)
Can we keep generic_remap_file_range_prep as-is, and add a new
dax_remap_file_range_prep, both sharing a low-level
__generic_remap_file_range_prep implementation? And for that
implementation I'd also go for classic if/else instead of the function
pointer.
> +extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
> + struct inode *dest, loff_t destoff,
> + loff_t len, bool *is_same);
no need for the extern.
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
2021-02-08 15:19 ` Christoph Hellwig
@ 2021-02-09 9:15 ` Ruan Shiyang
2021-02-09 9:34 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-09 9:15 UTC (permalink / raw)
To: Christoph Hellwig
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
dan.j.williams, linux-btrfs
On 2021/2/8 下午11:19, Christoph Hellwig wrote:
> On Mon, Feb 08, 2021 at 01:09:22AM +0800, Shiyang Ruan wrote:
>> With dax we cannot deal with readpage() etc. So, we create a
>> funciton callback to perform the file data comparison and pass
>
> s/funciton/function/g
>
>> +#define MIN(a, b) (((a) < (b)) ? (a) : (b))
>
> This should use the existing min or min_t helpers.
>
>
>> int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
>> struct file *file_out, loff_t pos_out,
>> - loff_t *len, unsigned int remap_flags)
>> + loff_t *len, unsigned int remap_flags,
>> + compare_range_t compare_range_fn)
>
> Can we keep generic_remap_file_range_prep as-is, and add a new
> dax_remap_file_range_prep, both sharing a low-level
> __generic_remap_file_range_prep implementation? And for that
> implementation I'd also go for classic if/else instead of the function
> pointer.
The dax dedupe comparison need the iomap_ops pointer as argument, so my
understanding is that we don't modify the argument list of
generic_remap_file_range_prep(), but move its code into
__generic_remap_file_range_prep() whose argument list can be modified to
accepts the iomap_ops pointer. Then it looks like this:
```
int dax_remap_file_range_prep(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t *len, unsigned int remap_flags,
const struct iomap_ops *ops)
{
return __generic_remap_file_range_prep(file_in, pos_in, file_out,
pos_out, len, remap_flags, ops);
}
EXPORT_SYMBOL(dax_remap_file_range_prep);
int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t *len, unsigned int remap_flags)
{
return __generic_remap_file_range_prep(file_in, pos_in, file_out,
pos_out, len, remap_flags, NULL);
}
EXPORT_SYMBOL(generic_remap_file_range_prep);
```
Am i right?
--
Thanks,
Ruan Shiyang.
>
>> +extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>> + struct inode *dest, loff_t destoff,
>> + loff_t len, bool *is_same);
>
> no need for the extern.
>
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
2021-02-09 9:15 ` Ruan Shiyang
@ 2021-02-09 9:34 ` Christoph Hellwig
2021-02-09 9:46 ` Ruan Shiyang
0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-09 9:34 UTC (permalink / raw)
To: Ruan Shiyang
Cc: jack, linux-nvdimm, darrick.wong, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
dan.j.williams, linux-btrfs
On Tue, Feb 09, 2021 at 05:15:13PM +0800, Ruan Shiyang wrote:
> The dax dedupe comparison need the iomap_ops pointer as argument, so my
> understanding is that we don't modify the argument list of
> generic_remap_file_range_prep(), but move its code into
> __generic_remap_file_range_prep() whose argument list can be modified to
> accepts the iomap_ops pointer. Then it looks like this:
I'd say just add the iomap_ops pointer to
generic_remap_file_range_prep and do away with the extra wrappers. We
only have three callers anyway.
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
2021-02-09 9:34 ` Christoph Hellwig
@ 2021-02-09 9:46 ` Ruan Shiyang
2021-02-10 13:19 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-09 9:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
dan.j.williams, linux-btrfs
On 2021/2/9 下午5:34, Christoph Hellwig wrote:
> On Tue, Feb 09, 2021 at 05:15:13PM +0800, Ruan Shiyang wrote:
>> The dax dedupe comparison need the iomap_ops pointer as argument, so my
>> understanding is that we don't modify the argument list of
>> generic_remap_file_range_prep(), but move its code into
>> __generic_remap_file_range_prep() whose argument list can be modified to
>> accepts the iomap_ops pointer. Then it looks like this:
>
> I'd say just add the iomap_ops pointer to
> generic_remap_file_range_prep and do away with the extra wrappers. We
> only have three callers anyway.
OK.
--
Thanks,
Ruan Shiyang.
>
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
2021-02-09 9:46 ` Ruan Shiyang
@ 2021-02-10 13:19 ` Christoph Hellwig
2021-02-17 3:24 ` Ruan Shiyang
0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-10 13:19 UTC (permalink / raw)
To: Ruan Shiyang
Cc: jack, linux-nvdimm, darrick.wong, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
dan.j.williams, linux-btrfs
On Tue, Feb 09, 2021 at 05:46:13PM +0800, Ruan Shiyang wrote:
>
>
> On 2021/2/9 下午5:34, Christoph Hellwig wrote:
>> On Tue, Feb 09, 2021 at 05:15:13PM +0800, Ruan Shiyang wrote:
>>> The dax dedupe comparison need the iomap_ops pointer as argument, so my
>>> understanding is that we don't modify the argument list of
>>> generic_remap_file_range_prep(), but move its code into
>>> __generic_remap_file_range_prep() whose argument list can be modified to
>>> accepts the iomap_ops pointer. Then it looks like this:
>>
>> I'd say just add the iomap_ops pointer to
>> generic_remap_file_range_prep and do away with the extra wrappers. We
>> only have three callers anyway.
>
> OK.
So looking at this again I think your proposal actaully is better,
given that the iomap variant is still DAX specific. Sorry for
the noise.
Also I think dax_file_range_compare should use iomap_apply instead
of open coding it.
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
2021-02-10 13:19 ` Christoph Hellwig
@ 2021-02-17 3:24 ` Ruan Shiyang
2021-02-18 16:20 ` Darrick J. Wong
0 siblings, 1 reply; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-17 3:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
dan.j.williams, linux-btrfs
On 2021/2/10 下午9:19, Christoph Hellwig wrote:
> On Tue, Feb 09, 2021 at 05:46:13PM +0800, Ruan Shiyang wrote:
>>
>>
>> On 2021/2/9 下午5:34, Christoph Hellwig wrote:
>>> On Tue, Feb 09, 2021 at 05:15:13PM +0800, Ruan Shiyang wrote:
>>>> The dax dedupe comparison need the iomap_ops pointer as argument, so my
>>>> understanding is that we don't modify the argument list of
>>>> generic_remap_file_range_prep(), but move its code into
>>>> __generic_remap_file_range_prep() whose argument list can be modified to
>>>> accepts the iomap_ops pointer. Then it looks like this:
>>>
>>> I'd say just add the iomap_ops pointer to
>>> generic_remap_file_range_prep and do away with the extra wrappers. We
>>> only have three callers anyway.
>>
>> OK.
>
> So looking at this again I think your proposal actaully is better,
> given that the iomap variant is still DAX specific. Sorry for
> the noise.
>
> Also I think dax_file_range_compare should use iomap_apply instead
> of open coding it.
>
There are two files, which are not reflinked, need to be direct_access()
here. The iomap_apply() can handle one file each time. So, it seems
that iomap_apply() is not suitable for this case...
The pseudo code of this process is as follows:
srclen = ops->begin(&srcmap)
destlen = ops->begin(&destmap)
direct_access(&srcmap, &saddr)
direct_access(&destmap, &daddr)
same = memcpy(saddr, daddr, min(srclen,destlen))
ops->end(&destmap)
ops->end(&srcmap)
I think a nested call like this is necessary. That's why I use the open
code way.
--
Thanks,
Ruan Shiyang.
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
2021-02-17 3:24 ` Ruan Shiyang
@ 2021-02-18 16:20 ` Darrick J. Wong
2021-02-25 7:35 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2021-02-18 16:20 UTC (permalink / raw)
To: Ruan Shiyang
Cc: jack, linux-nvdimm, darrick.wong, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, Goldwyn Rodrigues, linux-fsdevel,
dan.j.williams, linux-btrfs
On Wed, Feb 17, 2021 at 11:24:18AM +0800, Ruan Shiyang wrote:
>
>
> On 2021/2/10 下午9:19, Christoph Hellwig wrote:
> > On Tue, Feb 09, 2021 at 05:46:13PM +0800, Ruan Shiyang wrote:
> > >
> > >
> > > On 2021/2/9 下午5:34, Christoph Hellwig wrote:
> > > > On Tue, Feb 09, 2021 at 05:15:13PM +0800, Ruan Shiyang wrote:
> > > > > The dax dedupe comparison need the iomap_ops pointer as argument, so my
> > > > > understanding is that we don't modify the argument list of
> > > > > generic_remap_file_range_prep(), but move its code into
> > > > > __generic_remap_file_range_prep() whose argument list can be modified to
> > > > > accepts the iomap_ops pointer. Then it looks like this:
> > > >
> > > > I'd say just add the iomap_ops pointer to
> > > > generic_remap_file_range_prep and do away with the extra wrappers. We
> > > > only have three callers anyway.
> > >
> > > OK.
> >
> > So looking at this again I think your proposal actaully is better,
> > given that the iomap variant is still DAX specific. Sorry for
> > the noise.
> >
> > Also I think dax_file_range_compare should use iomap_apply instead
> > of open coding it.
> >
>
> There are two files, which are not reflinked, need to be direct_access()
> here. The iomap_apply() can handle one file each time. So, it seems that
> iomap_apply() is not suitable for this case...
>
>
> The pseudo code of this process is as follows:
>
> srclen = ops->begin(&srcmap)
> destlen = ops->begin(&destmap)
>
> direct_access(&srcmap, &saddr)
> direct_access(&destmap, &daddr)
>
> same = memcpy(saddr, daddr, min(srclen,destlen))
>
> ops->end(&destmap)
> ops->end(&srcmap)
>
> I think a nested call like this is necessary. That's why I use the open
> code way.
This might be a good place to implement an iomap_apply2() loop that
actually /does/ walk all the extents of file1 and file2. There's now
two users of this idiom.
(Possibly structured as a "get next mappings from both" generator
function like Matthew Wilcox keeps asking for. :))
--D
>
> --
> Thanks,
> Ruan Shiyang.
> >
>
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function
2021-02-18 16:20 ` Darrick J. Wong
@ 2021-02-25 7:35 ` Christoph Hellwig
0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-25 7:35 UTC (permalink / raw)
To: Darrick J. Wong
Cc: jack, Ruan Shiyang, linux-nvdimm, darrick.wong, david,
linux-kernel, linux-xfs, ocfs2-devel, viro, Goldwyn Rodrigues,
linux-fsdevel, dan.j.williams, linux-btrfs
On Thu, Feb 18, 2021 at 08:20:18AM -0800, Darrick J. Wong wrote:
> > I think a nested call like this is necessary. That's why I use the open
> > code way.
>
> This might be a good place to implement an iomap_apply2() loop that
> actually /does/ walk all the extents of file1 and file2. There's now
> two users of this idiom.
Why do we need a special helper for that?
> (Possibly structured as a "get next mappings from both" generator
> function like Matthew Wilcox keeps asking for. :))
OTOH this might be a good first use for that.
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path
2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
` (4 preceding siblings ...)
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 5/7] fsdax: Dedup file range to use a compare function Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
2021-02-08 15:24 ` Christoph Hellwig
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 7/7] fs/xfs: Add dedupe support for fsdax Shiyang Ruan
2021-02-08 15:39 ` [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe " Jan Kara
7 siblings, 1 reply; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
linux-btrfs
In fsdax mode, WRITE and ZERO on a shared extent need CoW mechanism
performed. After CoW, new extents needs to be remapped to the file.
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
fs/xfs/xfs_bmap_util.c | 6 +++++-
fs/xfs/xfs_file.c | 10 +++++++---
fs/xfs/xfs_iomap.c | 3 ++-
fs/xfs/xfs_iops.c | 11 ++++++++---
fs/xfs/xfs_reflink.c | 2 ++
5 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 7371a7f7c652..eb0c7ff103f8 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -977,10 +977,14 @@ xfs_free_file_space(
if (offset + len > XFS_ISIZE(ip))
len = XFS_ISIZE(ip) - offset;
error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
- &xfs_buffered_write_iomap_ops);
+ IS_DAX(VFS_I(ip)) ?
+ &xfs_direct_write_iomap_ops : &xfs_buffered_write_iomap_ops);
if (error)
return error;
+ if (xfs_is_reflink_inode(ip))
+ xfs_reflink_end_cow(ip, offset, len);
+
/*
* If we zeroed right up to EOF and EOF straddles a page boundary we
* must make sure that the post-EOF area is also zeroed because the
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f73837..ab738641a3f4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -624,9 +624,13 @@ xfs_file_dax_write(
trace_xfs_file_dax_write(ip, count, pos);
ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
- if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
- i_size_write(inode, iocb->ki_pos);
- error = xfs_setfilesize(ip, pos, ret);
+ if (ret > 0) {
+ if (iocb->ki_pos > i_size_read(inode)) {
+ i_size_write(inode, iocb->ki_pos);
+ error = xfs_setfilesize(ip, pos, ret);
+ }
+ if (xfs_is_cow_inode(ip))
+ xfs_reflink_end_cow(ip, pos, ret);
}
out:
xfs_iunlock(ip, iolock);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7b9ff824e82d..d6d4cc0f084e 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -765,13 +765,14 @@ xfs_direct_write_iomap_begin(
goto out_unlock;
if (imap_needs_cow(ip, flags, &imap, nimaps)) {
+ bool need_convert = flags & IOMAP_DIRECT || IS_DAX(inode);
error = -EAGAIN;
if (flags & IOMAP_NOWAIT)
goto out_unlock;
/* may drop and re-acquire the ilock */
error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
- &lockmode, flags & IOMAP_DIRECT);
+ &lockmode, need_convert);
if (error)
goto out_unlock;
if (shared)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1414ab79eacf..480bda0f16d0 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -860,6 +860,7 @@ xfs_setattr_size(
int error;
uint lock_flags = 0;
bool did_zeroing = false;
+ const struct iomap_ops *ops;
ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL));
@@ -906,10 +907,15 @@ xfs_setattr_size(
* extension, or zeroing out the rest of the block on a downward
* truncate.
*/
+ if (IS_DAX(inode))
+ ops = &xfs_direct_write_iomap_ops;
+ else
+ ops = &xfs_buffered_write_iomap_ops;
+
if (newsize > oldsize) {
trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
error = iomap_zero_range(inode, oldsize, newsize - oldsize,
- &did_zeroing, &xfs_buffered_write_iomap_ops);
+ &did_zeroing, ops);
} else {
/*
* iomap won't detect a dirty page over an unwritten block (or a
@@ -921,8 +927,7 @@ xfs_setattr_size(
newsize);
if (error)
return error;
- error = iomap_truncate_page(inode, newsize, &did_zeroing,
- &xfs_buffered_write_iomap_ops);
+ error = iomap_truncate_page(inode, newsize, &did_zeroing, ops);
}
if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 26708c01fd78..38bde16cdb39 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1246,6 +1246,8 @@ xfs_reflink_zero_posteof(
trace_xfs_zero_eof(ip, isize, pos - isize);
return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
+ IS_DAX(VFS_I(ip)) ?
+ &xfs_direct_write_iomap_ops :
&xfs_buffered_write_iomap_ops);
}
--
2.30.0
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path Shiyang Ruan
@ 2021-02-08 15:24 ` Christoph Hellwig
0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2021-02-08 15:24 UTC (permalink / raw)
To: Shiyang Ruan
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -977,10 +977,14 @@ xfs_free_file_space(
> if (offset + len > XFS_ISIZE(ip))
> len = XFS_ISIZE(ip) - offset;
> error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
> - &xfs_buffered_write_iomap_ops);
> + IS_DAX(VFS_I(ip)) ?
> + &xfs_direct_write_iomap_ops : &xfs_buffered_write_iomap_ops);
> if (error)
> return error;
> + if (xfs_is_reflink_inode(ip))
> + xfs_reflink_end_cow(ip, offset, len);
Maybe we need to add (back) and xfs_zero_range helper that encapsulates
the details?
> trace_xfs_file_dax_write(ip, count, pos);
> ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
> - if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
> - i_size_write(inode, iocb->ki_pos);
> - error = xfs_setfilesize(ip, pos, ret);
> + if (ret > 0) {
> + if (iocb->ki_pos > i_size_read(inode)) {
> + i_size_write(inode, iocb->ki_pos);
> + error = xfs_setfilesize(ip, pos, ret);
> + }
> + if (xfs_is_cow_inode(ip))
> + xfs_reflink_end_cow(ip, pos, ret);
Nitpick, but I'd just goto out for ret <= 0 to reduce the indentation a
bit.
> }
> out:
> xfs_iunlock(ip, iolock);
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 7b9ff824e82d..d6d4cc0f084e 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -765,13 +765,14 @@ xfs_direct_write_iomap_begin(
> goto out_unlock;
>
> if (imap_needs_cow(ip, flags, &imap, nimaps)) {
> + bool need_convert = flags & IOMAP_DIRECT || IS_DAX(inode);
> error = -EAGAIN;
> if (flags & IOMAP_NOWAIT)
> goto out_unlock;
>
> /* may drop and re-acquire the ilock */
> error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
> - &lockmode, flags & IOMAP_DIRECT);
> + &lockmode, need_convert);
Why not:
error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
&lockmode,
(flags & IOMAP_DIRECT) || IS_DAX(inode));
?
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Ocfs2-devel] [PATCH 7/7] fs/xfs: Add dedupe support for fsdax
2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
` (5 preceding siblings ...)
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 6/7] fs/xfs: Handle CoW for fsdax write() path Shiyang Ruan
@ 2021-02-07 17:09 ` Shiyang Ruan
2021-02-08 15:39 ` [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe " Jan Kara
7 siblings, 0 replies; 28+ messages in thread
From: Shiyang Ruan @ 2021-02-07 17:09 UTC (permalink / raw)
To: linux-kernel, linux-xfs, linux-nvdimm, linux-fsdevel
Cc: jack, darrick.wong, david, ocfs2-devel, viro, dan.j.williams,
linux-btrfs
Add xfs_break_two_dax_layouts() to break layout for tow dax files. Then
call compare range function only when files are both DAX or not.
Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
---
fs/xfs/xfs_file.c | 20 ++++++++++++++++++++
fs/xfs/xfs_inode.c | 8 +++++++-
fs/xfs/xfs_inode.h | 1 +
fs/xfs/xfs_reflink.c | 21 ++++++++++++++++++---
4 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index ab738641a3f4..64ded96b43c8 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -791,6 +791,26 @@ xfs_break_dax_layouts(
0, 0, xfs_wait_dax_page(inode));
}
+int
+xfs_break_two_dax_layouts(
+ struct inode *src,
+ struct inode *dest)
+{
+ int error;
+ bool retry = false;
+
+retry:
+ error = xfs_break_dax_layouts(src, &retry);
+ if (error || retry)
+ goto retry;
+
+ error = xfs_break_dax_layouts(dest, &retry);
+ if (error || retry)
+ goto retry;
+
+ return error;
+}
+
int
xfs_break_layouts(
struct inode *inode,
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2bfbcf28b1bd..6483dbaa4d57 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3786,8 +3786,10 @@ xfs_ilock2_io_mmap(
struct xfs_inode *ip2)
{
int ret;
+ struct inode *inode1 = VFS_I(ip1);
+ struct inode *inode2 = VFS_I(ip2);
- ret = xfs_iolock_two_inodes_and_break_layout(VFS_I(ip1), VFS_I(ip2));
+ ret = xfs_iolock_two_inodes_and_break_layout(inode1, inode2);
if (ret)
return ret;
if (ip1 == ip2)
@@ -3795,6 +3797,10 @@ xfs_ilock2_io_mmap(
else
xfs_lock_two_inodes(ip1, XFS_MMAPLOCK_EXCL,
ip2, XFS_MMAPLOCK_EXCL);
+
+ if (IS_DAX(inode1) && IS_DAX(inode2))
+ ret = xfs_break_two_dax_layouts(inode1, inode2);
+
return 0;
}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 751a3d1d7d84..462b61bea691 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -431,6 +431,7 @@ enum xfs_prealloc_flags {
int xfs_update_prealloc_flags(struct xfs_inode *ip,
enum xfs_prealloc_flags flags);
+int xfs_break_two_dax_layouts(struct inode *inode1, struct inode *inode2);
int xfs_break_layouts(struct inode *inode, uint *iolock,
enum layout_break_reason reason);
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 38bde16cdb39..bdb9a07e703f 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -29,6 +29,7 @@
#include "xfs_iomap.h"
#include "xfs_sb.h"
#include "xfs_ag_resv.h"
+#include <linux/dax.h>
/*
* Copy on Write of Shared Blocks
@@ -1251,6 +1252,14 @@ xfs_reflink_zero_posteof(
&xfs_buffered_write_iomap_ops);
}
+int xfs_reflink_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
+ struct inode *dest, loff_t destoff,
+ loff_t len, bool *is_same)
+{
+ return dax_file_range_compare(src, srcoff, dest, destoff, len, is_same,
+ &xfs_read_iomap_ops);
+}
+
/*
* Prepare two files for range cloning. Upon a successful return both inodes
* will have the iolock and mmaplock held, the page cache of the out file will
@@ -1293,6 +1302,7 @@ xfs_reflink_remap_prep(
struct xfs_inode *src = XFS_I(inode_in);
struct inode *inode_out = file_inode(file_out);
struct xfs_inode *dest = XFS_I(inode_out);
+ compare_range_t compare_range_fn;
int ret;
/* Lock both files against IO */
@@ -1306,12 +1316,17 @@ xfs_reflink_remap_prep(
if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
goto out_unlock;
- /* Don't share DAX file data for now. */
- if (IS_DAX(inode_in) || IS_DAX(inode_out))
+ /* Don't share DAX file data with non-DAX file. */
+ if (IS_DAX(inode_in) != IS_DAX(inode_out))
goto out_unlock;
+ if (IS_DAX(inode_in))
+ compare_range_fn = xfs_reflink_dedupe_file_range_compare;
+ else
+ compare_range_fn = vfs_dedupe_file_range_compare;
+
ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out,
- len, remap_flags, vfs_dedupe_file_range_compare);
+ len, remap_flags, compare_range_fn);
if (ret || *len == 0)
goto out_unlock;
--
2.30.0
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax
2021-02-07 17:09 [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax Shiyang Ruan
` (6 preceding siblings ...)
2021-02-07 17:09 ` [Ocfs2-devel] [PATCH 7/7] fs/xfs: Add dedupe support for fsdax Shiyang Ruan
@ 2021-02-08 15:39 ` Jan Kara
2021-02-09 1:50 ` Ruan Shiyang
7 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2021-02-08 15:39 UTC (permalink / raw)
To: Shiyang Ruan
Cc: jack, darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs
On Mon 08-02-21 01:09:17, Shiyang Ruan wrote:
> This patchset is attempt to add CoW support for fsdax, and take XFS,
> which has both reflink and fsdax feature, as an example.
>
> One of the key mechanism need to be implemented in fsdax is CoW. Copy
> the data from srcmap before we actually write data to the destance
> iomap. And we just copy range in which data won't be changed.
>
> Another mechanism is range comparison . In page cache case, readpage()
> is used to load data on disk to page cache in order to be able to
> compare data. In fsdax case, readpage() does not work. So, we need
> another compare data with direct access support.
>
> With the two mechanism implemented in fsdax, we are able to make reflink
> and fsdax work together in XFS.
>
> Some of the patches are picked up from Goldwyn's patchset. I made some
> changes to adapt to this patchset.
How do you deal with HWPoison code trying to reverse-map struct page back
to inode-offset pair? This also needs to be fixed for reflink to work with
DAX.
Honza
>
> (Rebased on v5.10)
> ==
>
> Shiyang Ruan (7):
> fsdax: Output address in dax_iomap_pfn() and rename it
> fsdax: Introduce dax_copy_edges() for CoW
> fsdax: Copy data before write
> fsdax: Replace mmap entry in case of CoW
> fsdax: Dedup file range to use a compare function
> fs/xfs: Handle CoW for fsdax write() path
> fs/xfs: Add dedupe support for fsdax
>
> fs/btrfs/reflink.c | 3 +-
> fs/dax.c | 188 ++++++++++++++++++++++++++++++++++++++---
> fs/ocfs2/file.c | 2 +-
> fs/remap_range.c | 14 +--
> fs/xfs/xfs_bmap_util.c | 6 +-
> fs/xfs/xfs_file.c | 30 ++++++-
> fs/xfs/xfs_inode.c | 8 +-
> fs/xfs/xfs_inode.h | 1 +
> fs/xfs/xfs_iomap.c | 3 +-
> fs/xfs/xfs_iops.c | 11 ++-
> fs/xfs/xfs_reflink.c | 23 ++++-
> include/linux/dax.h | 5 ++
> include/linux/fs.h | 9 +-
> 13 files changed, 270 insertions(+), 33 deletions(-)
>
> --
> 2.30.0
>
>
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe support for fsdax
2021-02-08 15:39 ` [Ocfs2-devel] [PATCH 0/7] fsdax, xfs: Add reflink&dedupe " Jan Kara
@ 2021-02-09 1:50 ` Ruan Shiyang
0 siblings, 0 replies; 28+ messages in thread
From: Ruan Shiyang @ 2021-02-09 1:50 UTC (permalink / raw)
To: Jan Kara
Cc: darrick.wong, linux-nvdimm, david, linux-kernel, linux-xfs,
ocfs2-devel, viro, linux-fsdevel, dan.j.williams, linux-btrfs
On 2021/2/8 下午11:39, Jan Kara wrote:
> On Mon 08-02-21 01:09:17, Shiyang Ruan wrote:
>> This patchset is attempt to add CoW support for fsdax, and take XFS,
>> which has both reflink and fsdax feature, as an example.
>>
>> One of the key mechanism need to be implemented in fsdax is CoW. Copy
>> the data from srcmap before we actually write data to the destance
>> iomap. And we just copy range in which data won't be changed.
>>
>> Another mechanism is range comparison . In page cache case, readpage()
>> is used to load data on disk to page cache in order to be able to
>> compare data. In fsdax case, readpage() does not work. So, we need
>> another compare data with direct access support.
>>
>> With the two mechanism implemented in fsdax, we are able to make reflink
>> and fsdax work together in XFS.
>>
>> Some of the patches are picked up from Goldwyn's patchset. I made some
>> changes to adapt to this patchset.
>
> How do you deal with HWPoison code trying to reverse-map struct page back
> to inode-offset pair? This also needs to be fixed for reflink to work with
> DAX.
>
I have posted v3 patchset to add reverse-map support for dax page
yesterday. Please take a look at this:
https://lkml.org/lkml/2021/2/8/347
--
Thanks,
Ruan Shiyang.
> Honza
>
>>
>> (Rebased on v5.10)
>> ==
>>
>> Shiyang Ruan (7):
>> fsdax: Output address in dax_iomap_pfn() and rename it
>> fsdax: Introduce dax_copy_edges() for CoW
>> fsdax: Copy data before write
>> fsdax: Replace mmap entry in case of CoW
>> fsdax: Dedup file range to use a compare function
>> fs/xfs: Handle CoW for fsdax write() path
>> fs/xfs: Add dedupe support for fsdax
>>
>> fs/btrfs/reflink.c | 3 +-
>> fs/dax.c | 188 ++++++++++++++++++++++++++++++++++++++---
>> fs/ocfs2/file.c | 2 +-
>> fs/remap_range.c | 14 +--
>> fs/xfs/xfs_bmap_util.c | 6 +-
>> fs/xfs/xfs_file.c | 30 ++++++-
>> fs/xfs/xfs_inode.c | 8 +-
>> fs/xfs/xfs_inode.h | 1 +
>> fs/xfs/xfs_iomap.c | 3 +-
>> fs/xfs/xfs_iops.c | 11 ++-
>> fs/xfs/xfs_reflink.c | 23 ++++-
>> include/linux/dax.h | 5 ++
>> include/linux/fs.h | 9 +-
>> 13 files changed, 270 insertions(+), 33 deletions(-)
>>
>> --
>> 2.30.0
>>
>>
>>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 28+ messages in thread