* [PATCH 0/2] fsdax,xfs: fix warning messages @ 2022-11-24 14:54 Shiyang Ruan 2022-11-24 14:54 ` [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() Shiyang Ruan ` (4 more replies) 0 siblings, 5 replies; 23+ messages in thread From: Shiyang Ruan @ 2022-11-24 14:54 UTC (permalink / raw) To: linux-kernel, linux-xfs, nvdimm, linux-fsdevel Cc: djwong, david, dan.j.williams Many testcases failed in dax+reflink mode with warning message in dmesg. This also effects dax+noreflink mode if we run the test after a dax+reflink test. So, the most urgent thing is solving the warning messages. Patch 1 fixes some mistakes and adds handling of CoW cases not previously considered (srcmap is HOLE or UNWRITTEN). Patch 2 adds the implementation of unshare for fsdax. With these fixes, most warning messages in dax_associate_entry() are gone. But honestly, generic/388 will randomly failed with the warning. The case shutdown the xfs when fsstress is running, and do it for many times. I think the reason is that dax pages in use are not able to be invalidated in time when fs is shutdown. The next time dax page to be associated, it still remains the mapping value set last time. I'll keep on solving it. The warning message in dax_writeback_one() can also be fixed because of the dax unshare. Shiyang Ruan (2): fsdax,xfs: fix warning messages at dax_[dis]associate_entry() fsdax,xfs: port unshare to fsdax fs/dax.c | 166 ++++++++++++++++++++++++++++++------------- fs/xfs/xfs_iomap.c | 6 +- fs/xfs/xfs_reflink.c | 8 ++- include/linux/dax.h | 2 + 4 files changed, 129 insertions(+), 53 deletions(-) -- 2.38.1 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() 2022-11-24 14:54 [PATCH 0/2] fsdax,xfs: fix warning messages Shiyang Ruan @ 2022-11-24 14:54 ` Shiyang Ruan 2022-11-30 4:08 ` Darrick J. Wong 2022-11-30 4:12 ` Dan Williams 2022-11-24 14:54 ` [PATCH 2/2] fsdax,xfs: port unshare to fsdax Shiyang Ruan ` (3 subsequent siblings) 4 siblings, 2 replies; 23+ messages in thread From: Shiyang Ruan @ 2022-11-24 14:54 UTC (permalink / raw) To: linux-kernel, linux-xfs, nvdimm, linux-fsdevel Cc: djwong, david, dan.j.williams This patch fixes the warning message reported in dax_associate_entry() and dax_disassociate_entry(). 1. reset page->mapping and ->index when refcount counting down to 0. 2. set IOMAP_F_SHARED flag when iomap read to allow one dax page to be associated more than once for not only write but also read. 3. should zero the edge (when not aligned) if srcmap is HOLE or UNWRITTEN. 4. iterator of two files in dedupe should be executed side by side, not nested. 5. use xfs_dax_write_iomap_ops for xfs zero and truncate. Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> --- fs/dax.c | 114 ++++++++++++++++++++++++++------------------- fs/xfs/xfs_iomap.c | 6 +-- 2 files changed, 69 insertions(+), 51 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 1c6867810cbd..5ea7c0926b7f 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -398,7 +398,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, WARN_ON_ONCE(trunc && page_ref_count(page) > 1); if (dax_mapping_is_cow(page->mapping)) { /* keep the CoW flag if this page is still shared */ - if (page->index-- > 0) + if (page->index-- > 1) continue; } else WARN_ON_ONCE(page->mapping && page->mapping != mapping); @@ -840,12 +840,6 @@ static bool dax_fault_is_synchronous(const struct iomap_iter *iter, (iter->iomap.flags & IOMAP_F_DIRTY); } -static bool dax_fault_is_cow(const struct iomap_iter *iter) -{ - return (iter->flags & IOMAP_WRITE) && - (iter->iomap.flags & IOMAP_F_SHARED); -} - /* * By this point grab_mapping_entry() has ensured that we have a locked entry * of the appropriate size so we don't have to worry about downgrading PMDs to @@ -859,13 +853,14 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, { struct address_space *mapping = vmf->vma->vm_file->f_mapping; void *new_entry = dax_make_entry(pfn, flags); - bool dirty = !dax_fault_is_synchronous(iter, vmf->vma); - bool cow = dax_fault_is_cow(iter); + bool write = iter->flags & IOMAP_WRITE; + bool dirty = write && !dax_fault_is_synchronous(iter, vmf->vma); + bool shared = iter->iomap.flags & IOMAP_F_SHARED; if (dirty) __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); - if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { + if (shared || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { unsigned long index = xas->xa_index; /* we are replacing a zero page with block mapping */ if (dax_is_pmd_entry(entry)) @@ -877,12 +872,12 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, xas_reset(xas); xas_lock_irq(xas); - if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { + if (shared || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { void *old; dax_disassociate_entry(entry, mapping, false); dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address, - cow); + shared); /* * Only swap our new entry into the page cache if the current * entry is a zero page or an empty entry. If a normal PTE or @@ -902,7 +897,7 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, if (dirty) xas_set_mark(xas, PAGECACHE_TAG_DIRTY); - if (cow) + if (write && shared) xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); xas_unlock_irq(xas); @@ -1107,23 +1102,35 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size, loff_t end = pos + length; loff_t pg_end = round_up(end, align_size); bool copy_all = head_off == 0 && end == pg_end; + /* write zero at edge if srcmap is a HOLE or IOMAP_UNWRITTEN */ + bool zero_edge = srcmap->flags & IOMAP_F_SHARED || + srcmap->type == IOMAP_UNWRITTEN; void *saddr = 0; int ret = 0; - ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL); - if (ret) - return ret; + if (!zero_edge) { + ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL); + if (ret) + return ret; + } if (copy_all) { - ret = copy_mc_to_kernel(daddr, saddr, length); - return ret ? -EIO : 0; + if (zero_edge) + memset(daddr, 0, size); + else + ret = copy_mc_to_kernel(daddr, saddr, length); + goto out; } /* Copy the head part of the range */ if (head_off) { - ret = copy_mc_to_kernel(daddr, saddr, head_off); - if (ret) - return -EIO; + if (zero_edge) + memset(daddr, 0, head_off); + else { + ret = copy_mc_to_kernel(daddr, saddr, head_off); + if (ret) + return -EIO; + } } /* Copy the tail part of the range */ @@ -1131,12 +1138,19 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size, loff_t tail_off = head_off + length; loff_t tail_len = pg_end - end; - ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off, - tail_len); - if (ret) - return -EIO; + if (zero_edge) + memset(daddr + tail_off, 0, tail_len); + else { + ret = copy_mc_to_kernel(daddr + tail_off, + saddr + tail_off, tail_len); + if (ret) + return -EIO; + } } - return 0; +out: + if (zero_edge) + dax_flush(srcmap->dax_dev, daddr, size); + return ret ? -EIO : 0; } /* @@ -1235,13 +1249,9 @@ static int dax_memzero(struct iomap_iter *iter, loff_t pos, size_t size) if (ret < 0) return ret; memset(kaddr + offset, 0, size); - if (srcmap->addr != iomap->addr) { - ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap, - kaddr); - if (ret < 0) - return ret; - dax_flush(iomap->dax_dev, kaddr, PAGE_SIZE); - } else + if (iomap->flags & IOMAP_F_SHARED) + ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap, kaddr); + else dax_flush(iomap->dax_dev, kaddr + offset, size); return ret; } @@ -1258,6 +1268,15 @@ static s64 dax_zero_iter(struct iomap_iter *iter, bool *did_zero) if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) return length; + /* + * invalidate the pages whose sharing state is to be changed + * because of CoW. + */ + if (iomap->flags & IOMAP_F_SHARED) + invalidate_inode_pages2_range(iter->inode->i_mapping, + pos >> PAGE_SHIFT, + (pos + length - 1) >> PAGE_SHIFT); + do { unsigned offset = offset_in_page(pos); unsigned size = min_t(u64, PAGE_SIZE - offset, length); @@ -1318,12 +1337,13 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, struct iov_iter *iter) { const struct iomap *iomap = &iomi->iomap; - const struct iomap *srcmap = &iomi->srcmap; + const struct iomap *srcmap = iomap_iter_srcmap(iomi); loff_t length = iomap_length(iomi); loff_t pos = iomi->pos; struct dax_device *dax_dev = iomap->dax_dev; loff_t end = pos + length, done = 0; bool write = iov_iter_rw(iter) == WRITE; + bool cow = write && iomap->flags & IOMAP_F_SHARED; ssize_t ret = 0; size_t xfer; int id; @@ -1350,7 +1370,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, * into page tables. We have to tear down these mappings so that data * written by write(2) is visible in mmap. */ - if (iomap->flags & IOMAP_F_NEW) { + if (iomap->flags & IOMAP_F_NEW || cow) { invalidate_inode_pages2_range(iomi->inode->i_mapping, pos >> PAGE_SHIFT, (end - 1) >> PAGE_SHIFT); @@ -1384,8 +1404,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, break; } - if (write && - srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) { + if (cow) { ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap, kaddr); if (ret) @@ -1532,7 +1551,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, struct xa_state *xas, void **entry, bool pmd) { const struct iomap *iomap = &iter->iomap; - const struct iomap *srcmap = &iter->srcmap; + const struct iomap *srcmap = iomap_iter_srcmap(iter); size_t size = pmd ? PMD_SIZE : PAGE_SIZE; loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT; bool write = iter->flags & IOMAP_WRITE; @@ -1563,8 +1582,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags); - if (write && - srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) { + if (write && iomap->flags & IOMAP_F_SHARED) { err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr); if (err) return dax_fault_return(err); @@ -1936,15 +1954,15 @@ int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff, .len = len, .flags = IOMAP_DAX, }; - int ret; + int ret, compared = 0; - while ((ret = iomap_iter(&src_iter, ops)) > 0) { - while ((ret = iomap_iter(&dst_iter, ops)) > 0) { - dst_iter.processed = dax_range_compare_iter(&src_iter, - &dst_iter, len, same); - } - if (ret <= 0) - src_iter.processed = ret; + while ((ret = iomap_iter(&src_iter, ops)) > 0 && + (ret = iomap_iter(&dst_iter, ops)) > 0) { + compared = dax_range_compare_iter(&src_iter, &dst_iter, len, + same); + if (compared < 0) + return ret; + src_iter.processed = dst_iter.processed = compared; } return ret; } diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 07da03976ec1..d9401d0300ad 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1215,7 +1215,7 @@ xfs_read_iomap_begin( return error; error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, &nimaps, 0); - if (!error && (flags & IOMAP_REPORT)) + if (!error && ((flags & IOMAP_REPORT) || IS_DAX(inode))) error = xfs_reflink_trim_around_shared(ip, &imap, &shared); xfs_iunlock(ip, lockmode); @@ -1370,7 +1370,7 @@ xfs_zero_range( if (IS_DAX(inode)) return dax_zero_range(inode, pos, len, did_zero, - &xfs_direct_write_iomap_ops); + &xfs_dax_write_iomap_ops); return iomap_zero_range(inode, pos, len, did_zero, &xfs_buffered_write_iomap_ops); } @@ -1385,7 +1385,7 @@ xfs_truncate_page( if (IS_DAX(inode)) return dax_truncate_page(inode, pos, did_zero, - &xfs_direct_write_iomap_ops); + &xfs_dax_write_iomap_ops); return iomap_truncate_page(inode, pos, did_zero, &xfs_buffered_write_iomap_ops); } -- 2.38.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() 2022-11-24 14:54 ` [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() Shiyang Ruan @ 2022-11-30 4:08 ` Darrick J. Wong 2022-11-30 8:58 ` Shiyang Ruan 2022-11-30 4:12 ` Dan Williams 1 sibling, 1 reply; 23+ messages in thread From: Darrick J. Wong @ 2022-11-30 4:08 UTC (permalink / raw) To: Shiyang Ruan Cc: linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, dan.j.williams On Thu, Nov 24, 2022 at 02:54:53PM +0000, Shiyang Ruan wrote: > This patch fixes the warning message reported in dax_associate_entry() > and dax_disassociate_entry(). Hmm, that's quite a bit to put in a single patch, but I'll try to get through this... > 1. reset page->mapping and ->index when refcount counting down to 0. > 2. set IOMAP_F_SHARED flag when iomap read to allow one dax page to be > associated more than once for not only write but also read. That makes sense, I think. > 3. should zero the edge (when not aligned) if srcmap is HOLE or When is IOMAP_F_SHARED set on the /source/ mapping? > UNWRITTEN. > 4. iterator of two files in dedupe should be executed side by side, not > nested. Why? Also, this seems like a separate change? > 5. use xfs_dax_write_iomap_ops for xfs zero and truncate. Makes sense. > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> > --- > fs/dax.c | 114 ++++++++++++++++++++++++++------------------- > fs/xfs/xfs_iomap.c | 6 +-- > 2 files changed, 69 insertions(+), 51 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 1c6867810cbd..5ea7c0926b7f 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -398,7 +398,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, > WARN_ON_ONCE(trunc && page_ref_count(page) > 1); > if (dax_mapping_is_cow(page->mapping)) { > /* keep the CoW flag if this page is still shared */ > - if (page->index-- > 0) > + if (page->index-- > 1) Hmm. So if the fsdax "page" sharing factor drops from 2 to 1, we'll now null out the mapping and index? Before, we only did that when it dropped from 1 to 0. Does this leave the page with no mapping? And I guess a subsequent access will now take a fault to map it back in? > continue; > } else > WARN_ON_ONCE(page->mapping && page->mapping != mapping); > @@ -840,12 +840,6 @@ static bool dax_fault_is_synchronous(const struct iomap_iter *iter, > (iter->iomap.flags & IOMAP_F_DIRTY); > } > > -static bool dax_fault_is_cow(const struct iomap_iter *iter) > -{ > - return (iter->flags & IOMAP_WRITE) && > - (iter->iomap.flags & IOMAP_F_SHARED); > -} > - > /* > * By this point grab_mapping_entry() has ensured that we have a locked entry > * of the appropriate size so we don't have to worry about downgrading PMDs to > @@ -859,13 +853,14 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, > { > struct address_space *mapping = vmf->vma->vm_file->f_mapping; > void *new_entry = dax_make_entry(pfn, flags); > - bool dirty = !dax_fault_is_synchronous(iter, vmf->vma); > - bool cow = dax_fault_is_cow(iter); > + bool write = iter->flags & IOMAP_WRITE; > + bool dirty = write && !dax_fault_is_synchronous(iter, vmf->vma); > + bool shared = iter->iomap.flags & IOMAP_F_SHARED; > > if (dirty) > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > - if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { > + if (shared || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { Ah, ok, so now we're yanking the mapping if the extent is shared, presumably so that... > unsigned long index = xas->xa_index; > /* we are replacing a zero page with block mapping */ > if (dax_is_pmd_entry(entry)) > @@ -877,12 +872,12 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, > > xas_reset(xas); > xas_lock_irq(xas); > - if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { > + if (shared || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { > void *old; > > dax_disassociate_entry(entry, mapping, false); > dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address, > - cow); > + shared); ...down here we can rebuild the association, but this time we'll set the page->mapping to PAGE_MAPPING_DAX_COW? I see a lot of similar changes, so I'm guessing this is how you fixed the failures that were a result of read file A -> reflink A to B -> read file B sequences? > /* > * Only swap our new entry into the page cache if the current > * entry is a zero page or an empty entry. If a normal PTE or > @@ -902,7 +897,7 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, > if (dirty) > xas_set_mark(xas, PAGECACHE_TAG_DIRTY); > > - if (cow) > + if (write && shared) > xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); > > xas_unlock_irq(xas); > @@ -1107,23 +1102,35 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size, I think this function isn't well named. It's copying into the parts of the @daddr page that are *not* covered by @pos/@length. In other words, it's really copying *around* the range that's supplied, isn't it? > loff_t end = pos + length; > loff_t pg_end = round_up(end, align_size); > bool copy_all = head_off == 0 && end == pg_end; > + /* write zero at edge if srcmap is a HOLE or IOMAP_UNWRITTEN */ > + bool zero_edge = srcmap->flags & IOMAP_F_SHARED || When is IOMAP_F_SHARED set on the /source/ mapping? I don't understand that circumstance, so I don't understand why we want to zero around in that case. > + srcmap->type == IOMAP_UNWRITTEN; Though it's self evident why we'd do that if the source map is unwritten. > void *saddr = 0; > int ret = 0; > > - ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL); > - if (ret) > - return ret; > + if (!zero_edge) { > + ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL); > + if (ret) > + return ret; > + } > > if (copy_all) { > - ret = copy_mc_to_kernel(daddr, saddr, length); > - return ret ? -EIO : 0; > + if (zero_edge) > + memset(daddr, 0, size); > + else > + ret = copy_mc_to_kernel(daddr, saddr, length); > + goto out; > } > > /* Copy the head part of the range */ > if (head_off) { > - ret = copy_mc_to_kernel(daddr, saddr, head_off); > - if (ret) > - return -EIO; > + if (zero_edge) > + memset(daddr, 0, head_off); > + else { > + ret = copy_mc_to_kernel(daddr, saddr, head_off); > + if (ret) > + return -EIO; > + } > } > > /* Copy the tail part of the range */ > @@ -1131,12 +1138,19 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size, > loff_t tail_off = head_off + length; > loff_t tail_len = pg_end - end; > > - ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off, > - tail_len); > - if (ret) > - return -EIO; > + if (zero_edge) > + memset(daddr + tail_off, 0, tail_len); > + else { > + ret = copy_mc_to_kernel(daddr + tail_off, > + saddr + tail_off, tail_len); > + if (ret) > + return -EIO; > + } > } > - return 0; > +out: > + if (zero_edge) > + dax_flush(srcmap->dax_dev, daddr, size); > + return ret ? -EIO : 0; > } > > /* > @@ -1235,13 +1249,9 @@ static int dax_memzero(struct iomap_iter *iter, loff_t pos, size_t size) > if (ret < 0) > return ret; > memset(kaddr + offset, 0, size); > - if (srcmap->addr != iomap->addr) { > - ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap, > - kaddr); > - if (ret < 0) > - return ret; > - dax_flush(iomap->dax_dev, kaddr, PAGE_SIZE); > - } else > + if (iomap->flags & IOMAP_F_SHARED) > + ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap, kaddr); > + else > dax_flush(iomap->dax_dev, kaddr + offset, size); > return ret; > } > @@ -1258,6 +1268,15 @@ static s64 dax_zero_iter(struct iomap_iter *iter, bool *did_zero) > if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) > return length; > > + /* > + * invalidate the pages whose sharing state is to be changed > + * because of CoW. > + */ > + if (iomap->flags & IOMAP_F_SHARED) > + invalidate_inode_pages2_range(iter->inode->i_mapping, > + pos >> PAGE_SHIFT, > + (pos + length - 1) >> PAGE_SHIFT); > + > do { > unsigned offset = offset_in_page(pos); > unsigned size = min_t(u64, PAGE_SIZE - offset, length); > @@ -1318,12 +1337,13 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, > struct iov_iter *iter) > { > const struct iomap *iomap = &iomi->iomap; > - const struct iomap *srcmap = &iomi->srcmap; > + const struct iomap *srcmap = iomap_iter_srcmap(iomi); > loff_t length = iomap_length(iomi); > loff_t pos = iomi->pos; > struct dax_device *dax_dev = iomap->dax_dev; > loff_t end = pos + length, done = 0; > bool write = iov_iter_rw(iter) == WRITE; > + bool cow = write && iomap->flags & IOMAP_F_SHARED; > ssize_t ret = 0; > size_t xfer; > int id; > @@ -1350,7 +1370,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, > * into page tables. We have to tear down these mappings so that data > * written by write(2) is visible in mmap. > */ > - if (iomap->flags & IOMAP_F_NEW) { > + if (iomap->flags & IOMAP_F_NEW || cow) { > invalidate_inode_pages2_range(iomi->inode->i_mapping, > pos >> PAGE_SHIFT, > (end - 1) >> PAGE_SHIFT); > @@ -1384,8 +1404,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, > break; > } > > - if (write && > - srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) { > + if (cow) { > ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap, > kaddr); > if (ret) > @@ -1532,7 +1551,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, > struct xa_state *xas, void **entry, bool pmd) > { > const struct iomap *iomap = &iter->iomap; > - const struct iomap *srcmap = &iter->srcmap; > + const struct iomap *srcmap = iomap_iter_srcmap(iter); > size_t size = pmd ? PMD_SIZE : PAGE_SIZE; > loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT; > bool write = iter->flags & IOMAP_WRITE; > @@ -1563,8 +1582,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, > > *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags); > > - if (write && > - srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) { > + if (write && iomap->flags & IOMAP_F_SHARED) { > err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr); > if (err) > return dax_fault_return(err); > @@ -1936,15 +1954,15 @@ int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff, Does the dedupe change need to be in this patch? It looks ok both before and after, so I don't know why it's necessary. Welp, thank you for fixing the problems, at least. After a couple of days it looks like the serious problems have cleared up. --D > .len = len, > .flags = IOMAP_DAX, > }; > - int ret; > + int ret, compared = 0; > > - while ((ret = iomap_iter(&src_iter, ops)) > 0) { > - while ((ret = iomap_iter(&dst_iter, ops)) > 0) { > - dst_iter.processed = dax_range_compare_iter(&src_iter, > - &dst_iter, len, same); > - } > - if (ret <= 0) > - src_iter.processed = ret; > + while ((ret = iomap_iter(&src_iter, ops)) > 0 && > + (ret = iomap_iter(&dst_iter, ops)) > 0) { > + compared = dax_range_compare_iter(&src_iter, &dst_iter, len, > + same); > + if (compared < 0) > + return ret; > + src_iter.processed = dst_iter.processed = compared; > } > return ret; > } > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index 07da03976ec1..d9401d0300ad 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -1215,7 +1215,7 @@ xfs_read_iomap_begin( > return error; > error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, > &nimaps, 0); > - if (!error && (flags & IOMAP_REPORT)) > + if (!error && ((flags & IOMAP_REPORT) || IS_DAX(inode))) > error = xfs_reflink_trim_around_shared(ip, &imap, &shared); > xfs_iunlock(ip, lockmode); > > @@ -1370,7 +1370,7 @@ xfs_zero_range( > > if (IS_DAX(inode)) > return dax_zero_range(inode, pos, len, did_zero, > - &xfs_direct_write_iomap_ops); > + &xfs_dax_write_iomap_ops); > return iomap_zero_range(inode, pos, len, did_zero, > &xfs_buffered_write_iomap_ops); > } > @@ -1385,7 +1385,7 @@ xfs_truncate_page( > > if (IS_DAX(inode)) > return dax_truncate_page(inode, pos, did_zero, > - &xfs_direct_write_iomap_ops); > + &xfs_dax_write_iomap_ops); > return iomap_truncate_page(inode, pos, did_zero, > &xfs_buffered_write_iomap_ops); > } > -- > 2.38.1 > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() 2022-11-30 4:08 ` Darrick J. Wong @ 2022-11-30 8:58 ` Shiyang Ruan 2022-11-30 16:28 ` Darrick J. Wong 0 siblings, 1 reply; 23+ messages in thread From: Shiyang Ruan @ 2022-11-30 8:58 UTC (permalink / raw) To: Darrick J. Wong Cc: linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, dan.j.williams 在 2022/11/30 12:08, Darrick J. Wong 写道: > On Thu, Nov 24, 2022 at 02:54:53PM +0000, Shiyang Ruan wrote: >> This patch fixes the warning message reported in dax_associate_entry() >> and dax_disassociate_entry(). > > Hmm, that's quite a bit to put in a single patch, but I'll try to get > through this... Oh sorry... > >> 1. reset page->mapping and ->index when refcount counting down to 0. >> 2. set IOMAP_F_SHARED flag when iomap read to allow one dax page to be >> associated more than once for not only write but also read. > > That makes sense, I think. > >> 3. should zero the edge (when not aligned) if srcmap is HOLE or > > When is IOMAP_F_SHARED set on the /source/ mapping? In fs/xfs/xfs_iomap.c: xfs_direct_write_iomap_begin(): goto out_found_cow tag, srcmap is *not set* when the source extent is HOLE, then only iomap is set with IOMAP_F_SHARED flag. Now we come to iomap iter, when we get the srcmap by calling iomap_iter_srcmap(iter), the iomap will be returned (because srcmap isn't set). So, in this case, srcmap == iomap, we can think the source extent is a HOLE if srcmap->flag & IOMAP_F_SHARED != 0 > >> UNWRITTEN. >> 4. iterator of two files in dedupe should be executed side by side, not >> nested. > > Why? Also, this seems like a separate change? Explain below. > >> 5. use xfs_dax_write_iomap_ops for xfs zero and truncate. > > Makes sense. > >> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> >> --- >> fs/dax.c | 114 ++++++++++++++++++++++++++------------------- >> fs/xfs/xfs_iomap.c | 6 +-- >> 2 files changed, 69 insertions(+), 51 deletions(-) >> >> diff --git a/fs/dax.c b/fs/dax.c >> index 1c6867810cbd..5ea7c0926b7f 100644 >> --- a/fs/dax.c >> +++ b/fs/dax.c >> @@ -398,7 +398,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, >> WARN_ON_ONCE(trunc && page_ref_count(page) > 1); >> if (dax_mapping_is_cow(page->mapping)) { >> /* keep the CoW flag if this page is still shared */ >> - if (page->index-- > 0) >> + if (page->index-- > 1) > > Hmm. So if the fsdax "page" sharing factor drops from 2 to 1, we'll now > null out the mapping and index? Before, we only did that when it > dropped from 1 to 0. > > Does this leave the page with no mapping? And I guess a subsequent > access will now take a fault to map it back in? I confused it with --page->index, the result of "page->index--" is page->index itself. So, assume: this time, refcount is 2, >1, minus 1 to 1, then continue; next time, refcount is 1, not >1, minus 1 to 0, then clear the page->mapping. > >> continue; >> } else >> WARN_ON_ONCE(page->mapping && page->mapping != mapping); >> @@ -840,12 +840,6 @@ static bool dax_fault_is_synchronous(const struct iomap_iter *iter, >> (iter->iomap.flags & IOMAP_F_DIRTY); >> } >> >> -static bool dax_fault_is_cow(const struct iomap_iter *iter) >> -{ >> - return (iter->flags & IOMAP_WRITE) && >> - (iter->iomap.flags & IOMAP_F_SHARED); >> -} >> - >> /* >> * By this point grab_mapping_entry() has ensured that we have a locked entry >> * of the appropriate size so we don't have to worry about downgrading PMDs to >> @@ -859,13 +853,14 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, >> { >> struct address_space *mapping = vmf->vma->vm_file->f_mapping; >> void *new_entry = dax_make_entry(pfn, flags); >> - bool dirty = !dax_fault_is_synchronous(iter, vmf->vma); >> - bool cow = dax_fault_is_cow(iter); >> + bool write = iter->flags & IOMAP_WRITE; >> + bool dirty = write && !dax_fault_is_synchronous(iter, vmf->vma); >> + bool shared = iter->iomap.flags & IOMAP_F_SHARED; >> >> if (dirty) >> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); >> >> - if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { >> + if (shared || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { > > Ah, ok, so now we're yanking the mapping if the extent is shared, > presumably so that... > >> unsigned long index = xas->xa_index; >> /* we are replacing a zero page with block mapping */ >> if (dax_is_pmd_entry(entry)) >> @@ -877,12 +872,12 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, >> >> xas_reset(xas); >> xas_lock_irq(xas); >> - if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { >> + if (shared || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { >> void *old; >> >> dax_disassociate_entry(entry, mapping, false); >> dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address, >> - cow); >> + shared); > > ...down here we can rebuild the association, but this time we'll set the > page->mapping to PAGE_MAPPING_DAX_COW? I see a lot of similar changes, > so I'm guessing this is how you fixed the failures that were a result of > read file A -> reflink A to B -> read file B sequences? Yes, it even failed when mapreading a page shared by two extent of ONE file. But I remember that I had tested these cases before... > >> /* >> * Only swap our new entry into the page cache if the current >> * entry is a zero page or an empty entry. If a normal PTE or >> @@ -902,7 +897,7 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, >> if (dirty) >> xas_set_mark(xas, PAGECACHE_TAG_DIRTY); >> >> - if (cow) >> + if (write && shared) >> xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); >> >> xas_unlock_irq(xas); >> @@ -1107,23 +1102,35 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size, > > I think this function isn't well named. It's copying into the parts of > the @daddr page that are *not* covered by @pos/@length. In other words, > it's really copying *around* the range that's supplied, isn't it? Yes, I used to name it "dax_iomap_copy_edge()", which is not so good too. I'm not good at naming. > >> loff_t end = pos + length; >> loff_t pg_end = round_up(end, align_size); >> bool copy_all = head_off == 0 && end == pg_end; >> + /* write zero at edge if srcmap is a HOLE or IOMAP_UNWRITTEN */ >> + bool zero_edge = srcmap->flags & IOMAP_F_SHARED || > > When is IOMAP_F_SHARED set on the /source/ mapping? I don't understand > that circumstance, so I don't understand why we want to zero around in > that case. Please see explanation above. > >> + srcmap->type == IOMAP_UNWRITTEN; > > Though it's self evident why we'd do that if the source map is > unwritten. The new allocated destination pages for CoW is not all zeroed, not like new allocated page cache. Old data remains on it. So, if the source extent doesn't contain valid data (HOLE and UNWRITTEN extent), we need to zero the destination range around the @pos+@length. > >> void *saddr = 0; >> int ret = 0; >> >> - ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL); >> - if (ret) >> - return ret; >> + if (!zero_edge) { >> + ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL); >> + if (ret) >> + return ret; >> + } >> >> if (copy_all) { >> - ret = copy_mc_to_kernel(daddr, saddr, length); >> - return ret ? -EIO : 0; >> + if (zero_edge) >> + memset(daddr, 0, size); >> + else >> + ret = copy_mc_to_kernel(daddr, saddr, length); >> + goto out; >> } >> >> /* Copy the head part of the range */ >> if (head_off) { >> - ret = copy_mc_to_kernel(daddr, saddr, head_off); >> - if (ret) >> - return -EIO; >> + if (zero_edge) >> + memset(daddr, 0, head_off); >> + else { >> + ret = copy_mc_to_kernel(daddr, saddr, head_off); >> + if (ret) >> + return -EIO; >> + } >> } >> >> /* Copy the tail part of the range */ >> @@ -1131,12 +1138,19 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size, >> loff_t tail_off = head_off + length; >> loff_t tail_len = pg_end - end; >> >> - ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off, >> - tail_len); >> - if (ret) >> - return -EIO; >> + if (zero_edge) >> + memset(daddr + tail_off, 0, tail_len); >> + else { >> + ret = copy_mc_to_kernel(daddr + tail_off, >> + saddr + tail_off, tail_len); >> + if (ret) >> + return -EIO; >> + } >> } >> - return 0; >> +out: >> + if (zero_edge) >> + dax_flush(srcmap->dax_dev, daddr, size); >> + return ret ? -EIO : 0; >> } >> >> /* >> @@ -1235,13 +1249,9 @@ static int dax_memzero(struct iomap_iter *iter, loff_t pos, size_t size) >> if (ret < 0) >> return ret; >> memset(kaddr + offset, 0, size); >> - if (srcmap->addr != iomap->addr) { >> - ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap, >> - kaddr); >> - if (ret < 0) >> - return ret; >> - dax_flush(iomap->dax_dev, kaddr, PAGE_SIZE); >> - } else >> + if (iomap->flags & IOMAP_F_SHARED) >> + ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap, kaddr); >> + else >> dax_flush(iomap->dax_dev, kaddr + offset, size); >> return ret; >> } >> @@ -1258,6 +1268,15 @@ static s64 dax_zero_iter(struct iomap_iter *iter, bool *did_zero) >> if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) >> return length; >> >> + /* >> + * invalidate the pages whose sharing state is to be changed >> + * because of CoW. >> + */ >> + if (iomap->flags & IOMAP_F_SHARED) >> + invalidate_inode_pages2_range(iter->inode->i_mapping, >> + pos >> PAGE_SHIFT, >> + (pos + length - 1) >> PAGE_SHIFT); >> + >> do { >> unsigned offset = offset_in_page(pos); >> unsigned size = min_t(u64, PAGE_SIZE - offset, length); >> @@ -1318,12 +1337,13 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, >> struct iov_iter *iter) >> { >> const struct iomap *iomap = &iomi->iomap; >> - const struct iomap *srcmap = &iomi->srcmap; >> + const struct iomap *srcmap = iomap_iter_srcmap(iomi); >> loff_t length = iomap_length(iomi); >> loff_t pos = iomi->pos; >> struct dax_device *dax_dev = iomap->dax_dev; >> loff_t end = pos + length, done = 0; >> bool write = iov_iter_rw(iter) == WRITE; >> + bool cow = write && iomap->flags & IOMAP_F_SHARED; >> ssize_t ret = 0; >> size_t xfer; >> int id; >> @@ -1350,7 +1370,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, >> * into page tables. We have to tear down these mappings so that data >> * written by write(2) is visible in mmap. >> */ >> - if (iomap->flags & IOMAP_F_NEW) { >> + if (iomap->flags & IOMAP_F_NEW || cow) { >> invalidate_inode_pages2_range(iomi->inode->i_mapping, >> pos >> PAGE_SHIFT, >> (end - 1) >> PAGE_SHIFT); >> @@ -1384,8 +1404,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, >> break; >> } >> >> - if (write && >> - srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) { >> + if (cow) { >> ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap, >> kaddr); >> if (ret) >> @@ -1532,7 +1551,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, >> struct xa_state *xas, void **entry, bool pmd) >> { >> const struct iomap *iomap = &iter->iomap; >> - const struct iomap *srcmap = &iter->srcmap; >> + const struct iomap *srcmap = iomap_iter_srcmap(iter); >> size_t size = pmd ? PMD_SIZE : PAGE_SIZE; >> loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT; >> bool write = iter->flags & IOMAP_WRITE; >> @@ -1563,8 +1582,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, >> >> *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags); >> >> - if (write && >> - srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) { >> + if (write && iomap->flags & IOMAP_F_SHARED) { >> err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr); >> if (err) >> return dax_fault_return(err); >> @@ -1936,15 +1954,15 @@ int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff, > > Does the dedupe change need to be in this patch? It looks ok both > before and after, so I don't know why it's necessary. I'll separate this in next version. This is the old version: while ((ret = iomap_iter(&src_iter, ops)) > 0) { while ((ret = iomap_iter(&dst_iter, ops)) > 0) { dst_iter.processed = dax_range_compare_iter(&src_iter, &dst_iter, len, same); } if (ret <= 0) src_iter.processed = ret; } The inner iter (iomap_begin, actor, iomap_end) may loop more than once. In this case, the inner dest_iter updates its iomap, but the outer src_iter still keeps the old. The comparison of new dest_iomap and old src_iomap is meanless and it causes the bug. So, we need to make the two iters able to update their iomaps together. > > Welp, thank you for fixing the problems, at least. After a couple of > days it looks like the serious problems have cleared up. > I didn't test the dax code well. Sorry... -- Thanks, Ruan. > --D > >> .len = len, >> .flags = IOMAP_DAX, >> }; >> - int ret; >> + int ret, compared = 0; >> >> - while ((ret = iomap_iter(&src_iter, ops)) > 0) { >> - while ((ret = iomap_iter(&dst_iter, ops)) > 0) { >> - dst_iter.processed = dax_range_compare_iter(&src_iter, >> - &dst_iter, len, same); >> - } >> - if (ret <= 0) >> - src_iter.processed = ret; >> + while ((ret = iomap_iter(&src_iter, ops)) > 0 && >> + (ret = iomap_iter(&dst_iter, ops)) > 0) { >> + compared = dax_range_compare_iter(&src_iter, &dst_iter, len, >> + same); >> + if (compared < 0) >> + return ret; >> + src_iter.processed = dst_iter.processed = compared; >> } >> return ret; >> } >> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c >> index 07da03976ec1..d9401d0300ad 100644 >> --- a/fs/xfs/xfs_iomap.c >> +++ b/fs/xfs/xfs_iomap.c >> @@ -1215,7 +1215,7 @@ xfs_read_iomap_begin( >> return error; >> error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, >> &nimaps, 0); >> - if (!error && (flags & IOMAP_REPORT)) >> + if (!error && ((flags & IOMAP_REPORT) || IS_DAX(inode))) >> error = xfs_reflink_trim_around_shared(ip, &imap, &shared); >> xfs_iunlock(ip, lockmode); >> >> @@ -1370,7 +1370,7 @@ xfs_zero_range( >> >> if (IS_DAX(inode)) >> return dax_zero_range(inode, pos, len, did_zero, >> - &xfs_direct_write_iomap_ops); >> + &xfs_dax_write_iomap_ops); >> return iomap_zero_range(inode, pos, len, did_zero, >> &xfs_buffered_write_iomap_ops); >> } >> @@ -1385,7 +1385,7 @@ xfs_truncate_page( >> >> if (IS_DAX(inode)) >> return dax_truncate_page(inode, pos, did_zero, >> - &xfs_direct_write_iomap_ops); >> + &xfs_dax_write_iomap_ops); >> return iomap_truncate_page(inode, pos, did_zero, >> &xfs_buffered_write_iomap_ops); >> } >> -- >> 2.38.1 >> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() 2022-11-30 8:58 ` Shiyang Ruan @ 2022-11-30 16:28 ` Darrick J. Wong 0 siblings, 0 replies; 23+ messages in thread From: Darrick J. Wong @ 2022-11-30 16:28 UTC (permalink / raw) To: Shiyang Ruan Cc: linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, dan.j.williams On Wed, Nov 30, 2022 at 04:58:32PM +0800, Shiyang Ruan wrote: > > > 在 2022/11/30 12:08, Darrick J. Wong 写道: > > On Thu, Nov 24, 2022 at 02:54:53PM +0000, Shiyang Ruan wrote: > > > This patch fixes the warning message reported in dax_associate_entry() > > > and dax_disassociate_entry(). > > > > Hmm, that's quite a bit to put in a single patch, but I'll try to get > > through this... > > Oh sorry... Well you have to start somewhere. :) I often start with a megapatch for testing and later break it into smaller pieces once I've validated that the megapatch creates a solid improvement. > > > > > 1. reset page->mapping and ->index when refcount counting down to 0. > > > 2. set IOMAP_F_SHARED flag when iomap read to allow one dax page to be > > > associated more than once for not only write but also read. > > > > That makes sense, I think. > > > > > 3. should zero the edge (when not aligned) if srcmap is HOLE or > > > > When is IOMAP_F_SHARED set on the /source/ mapping? > > In fs/xfs/xfs_iomap.c: xfs_direct_write_iomap_begin(): goto out_found_cow > tag, srcmap is *not set* when the source extent is HOLE, then only iomap is > set with IOMAP_F_SHARED flag. > > Now we come to iomap iter, when we get the srcmap by calling > iomap_iter_srcmap(iter), the iomap will be returned (because srcmap isn't > set). So, in this case, srcmap == iomap, we can think the source extent is > a HOLE if srcmap->flag & IOMAP_F_SHARED != 0 Aha, got it. IOWs, this handles things like alwayscow and cowing over a hole, where we don't have a source mapping. Thanks for refreshing my memory. > > > UNWRITTEN. > > > 4. iterator of two files in dedupe should be executed side by side, not > > > nested. > > > > Why? Also, this seems like a separate change? > > Explain below. > > > > > > 5. use xfs_dax_write_iomap_ops for xfs zero and truncate. > > > > Makes sense. > > > > > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> > > > --- > > > fs/dax.c | 114 ++++++++++++++++++++++++++------------------- > > > fs/xfs/xfs_iomap.c | 6 +-- > > > 2 files changed, 69 insertions(+), 51 deletions(-) > > > > > > diff --git a/fs/dax.c b/fs/dax.c > > > index 1c6867810cbd..5ea7c0926b7f 100644 > > > --- a/fs/dax.c > > > +++ b/fs/dax.c > > > @@ -398,7 +398,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, > > > WARN_ON_ONCE(trunc && page_ref_count(page) > 1); > > > if (dax_mapping_is_cow(page->mapping)) { > > > /* keep the CoW flag if this page is still shared */ > > > - if (page->index-- > 0) > > > + if (page->index-- > 1) > > > > Hmm. So if the fsdax "page" sharing factor drops from 2 to 1, we'll now > > null out the mapping and index? Before, we only did that when it > > dropped from 1 to 0. > > > > Does this leave the page with no mapping? And I guess a subsequent > > access will now take a fault to map it back in? > > I confused it with --page->index, the result of "page->index--" is > page->index itself. Yeah, postfix operators in comparisons are not great for readability the later one gets into the night. > So, assume: > this time, refcount is 2, >1, minus 1 to 1, then continue; > next time, refcount is 1, not >1, minus 1 to 0, then clear the > page->mapping. > > > > > > continue; > > > } else > > > WARN_ON_ONCE(page->mapping && page->mapping != mapping); > > > @@ -840,12 +840,6 @@ static bool dax_fault_is_synchronous(const struct iomap_iter *iter, > > > (iter->iomap.flags & IOMAP_F_DIRTY); > > > } > > > -static bool dax_fault_is_cow(const struct iomap_iter *iter) > > > -{ > > > - return (iter->flags & IOMAP_WRITE) && > > > - (iter->iomap.flags & IOMAP_F_SHARED); > > > -} > > > - > > > /* > > > * By this point grab_mapping_entry() has ensured that we have a locked entry > > > * of the appropriate size so we don't have to worry about downgrading PMDs to > > > @@ -859,13 +853,14 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, > > > { > > > struct address_space *mapping = vmf->vma->vm_file->f_mapping; > > > void *new_entry = dax_make_entry(pfn, flags); > > > - bool dirty = !dax_fault_is_synchronous(iter, vmf->vma); > > > - bool cow = dax_fault_is_cow(iter); > > > + bool write = iter->flags & IOMAP_WRITE; > > > + bool dirty = write && !dax_fault_is_synchronous(iter, vmf->vma); > > > + bool shared = iter->iomap.flags & IOMAP_F_SHARED; > > > if (dirty) > > > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > > - if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { > > > + if (shared || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { > > > > Ah, ok, so now we're yanking the mapping if the extent is shared, > > presumably so that... > > > > > unsigned long index = xas->xa_index; > > > /* we are replacing a zero page with block mapping */ > > > if (dax_is_pmd_entry(entry)) > > > @@ -877,12 +872,12 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, > > > xas_reset(xas); > > > xas_lock_irq(xas); > > > - if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { > > > + if (shared || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { > > > void *old; > > > dax_disassociate_entry(entry, mapping, false); > > > dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address, > > > - cow); > > > + shared); > > > > ...down here we can rebuild the association, but this time we'll set the > > page->mapping to PAGE_MAPPING_DAX_COW? I see a lot of similar changes, > > so I'm guessing this is how you fixed the failures that were a result of > > read file A -> reflink A to B -> read file B sequences? > > Yes, it even failed when mapreading a page shared by two extent of ONE file. > But I remember that I had tested these cases before... > > > > > > /* > > > * Only swap our new entry into the page cache if the current > > > * entry is a zero page or an empty entry. If a normal PTE or > > > @@ -902,7 +897,7 @@ static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, > > > if (dirty) > > > xas_set_mark(xas, PAGECACHE_TAG_DIRTY); > > > - if (cow) > > > + if (write && shared) > > > xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); > > > xas_unlock_irq(xas); > > > @@ -1107,23 +1102,35 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size, > > > > I think this function isn't well named. It's copying into the parts of > > the @daddr page that are *not* covered by @pos/@length. In other words, > > it's really copying *around* the range that's supplied, isn't it? > > Yes, I used to name it "dax_iomap_copy_edge()", which is not so good too. > I'm not good at naming. dax_iomap_cow_edges() ? dax_iomap_copy_around() ? > > > loff_t end = pos + length; > > > loff_t pg_end = round_up(end, align_size); > > > bool copy_all = head_off == 0 && end == pg_end; > > > + /* write zero at edge if srcmap is a HOLE or IOMAP_UNWRITTEN */ > > > + bool zero_edge = srcmap->flags & IOMAP_F_SHARED || > > > > When is IOMAP_F_SHARED set on the /source/ mapping? I don't understand > > that circumstance, so I don't understand why we want to zero around in > > that case. > > Please see explanation above. > > > > > > + srcmap->type == IOMAP_UNWRITTEN; > > > > Though it's self evident why we'd do that if the source map is > > unwritten. > > The new allocated destination pages for CoW is not all zeroed, not like new > allocated page cache. Old data remains on it. So, if the source extent > doesn't contain valid data (HOLE and UNWRITTEN extent), we need to zero the > destination range around the @pos+@length. <nod> > > > > > void *saddr = 0; > > > int ret = 0; > > > - ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL); > > > - if (ret) > > > - return ret; > > > + if (!zero_edge) { > > > + ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL); > > > + if (ret) > > > + return ret; > > > + } > > > if (copy_all) { > > > - ret = copy_mc_to_kernel(daddr, saddr, length); > > > - return ret ? -EIO : 0; > > > + if (zero_edge) > > > + memset(daddr, 0, size); > > > + else > > > + ret = copy_mc_to_kernel(daddr, saddr, length); > > > + goto out; > > > } > > > /* Copy the head part of the range */ > > > if (head_off) { > > > - ret = copy_mc_to_kernel(daddr, saddr, head_off); > > > - if (ret) > > > - return -EIO; > > > + if (zero_edge) > > > + memset(daddr, 0, head_off); > > > + else { > > > + ret = copy_mc_to_kernel(daddr, saddr, head_off); > > > + if (ret) > > > + return -EIO; > > > + } > > > } > > > /* Copy the tail part of the range */ > > > @@ -1131,12 +1138,19 @@ static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t align_size, > > > loff_t tail_off = head_off + length; > > > loff_t tail_len = pg_end - end; > > > - ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off, > > > - tail_len); > > > - if (ret) > > > - return -EIO; > > > + if (zero_edge) > > > + memset(daddr + tail_off, 0, tail_len); > > > + else { > > > + ret = copy_mc_to_kernel(daddr + tail_off, > > > + saddr + tail_off, tail_len); > > > + if (ret) > > > + return -EIO; > > > + } > > > } > > > - return 0; > > > +out: > > > + if (zero_edge) > > > + dax_flush(srcmap->dax_dev, daddr, size); > > > + return ret ? -EIO : 0; > > > } > > > /* > > > @@ -1235,13 +1249,9 @@ static int dax_memzero(struct iomap_iter *iter, loff_t pos, size_t size) > > > if (ret < 0) > > > return ret; > > > memset(kaddr + offset, 0, size); > > > - if (srcmap->addr != iomap->addr) { > > > - ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap, > > > - kaddr); > > > - if (ret < 0) > > > - return ret; > > > - dax_flush(iomap->dax_dev, kaddr, PAGE_SIZE); > > > - } else > > > + if (iomap->flags & IOMAP_F_SHARED) > > > + ret = dax_iomap_cow_copy(pos, size, PAGE_SIZE, srcmap, kaddr); > > > + else > > > dax_flush(iomap->dax_dev, kaddr + offset, size); > > > return ret; > > > } > > > @@ -1258,6 +1268,15 @@ static s64 dax_zero_iter(struct iomap_iter *iter, bool *did_zero) > > > if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) > > > return length; > > > + /* > > > + * invalidate the pages whose sharing state is to be changed > > > + * because of CoW. > > > + */ > > > + if (iomap->flags & IOMAP_F_SHARED) > > > + invalidate_inode_pages2_range(iter->inode->i_mapping, > > > + pos >> PAGE_SHIFT, > > > + (pos + length - 1) >> PAGE_SHIFT); > > > + > > > do { > > > unsigned offset = offset_in_page(pos); > > > unsigned size = min_t(u64, PAGE_SIZE - offset, length); > > > @@ -1318,12 +1337,13 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, > > > struct iov_iter *iter) > > > { > > > const struct iomap *iomap = &iomi->iomap; > > > - const struct iomap *srcmap = &iomi->srcmap; > > > + const struct iomap *srcmap = iomap_iter_srcmap(iomi); > > > loff_t length = iomap_length(iomi); > > > loff_t pos = iomi->pos; > > > struct dax_device *dax_dev = iomap->dax_dev; > > > loff_t end = pos + length, done = 0; > > > bool write = iov_iter_rw(iter) == WRITE; > > > + bool cow = write && iomap->flags & IOMAP_F_SHARED; > > > ssize_t ret = 0; > > > size_t xfer; > > > int id; > > > @@ -1350,7 +1370,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, > > > * into page tables. We have to tear down these mappings so that data > > > * written by write(2) is visible in mmap. > > > */ > > > - if (iomap->flags & IOMAP_F_NEW) { > > > + if (iomap->flags & IOMAP_F_NEW || cow) { > > > invalidate_inode_pages2_range(iomi->inode->i_mapping, > > > pos >> PAGE_SHIFT, > > > (end - 1) >> PAGE_SHIFT); > > > @@ -1384,8 +1404,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, > > > break; > > > } > > > - if (write && > > > - srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) { > > > + if (cow) { > > > ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap, > > > kaddr); > > > if (ret) > > > @@ -1532,7 +1551,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, > > > struct xa_state *xas, void **entry, bool pmd) > > > { > > > const struct iomap *iomap = &iter->iomap; > > > - const struct iomap *srcmap = &iter->srcmap; > > > + const struct iomap *srcmap = iomap_iter_srcmap(iter); > > > size_t size = pmd ? PMD_SIZE : PAGE_SIZE; > > > loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT; > > > bool write = iter->flags & IOMAP_WRITE; > > > @@ -1563,8 +1582,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf, > > > *entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, entry_flags); > > > - if (write && > > > - srcmap->type != IOMAP_HOLE && srcmap->addr != iomap->addr) { > > > + if (write && iomap->flags & IOMAP_F_SHARED) { > > > err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr); > > > if (err) > > > return dax_fault_return(err); > > > @@ -1936,15 +1954,15 @@ int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff, > > > > Does the dedupe change need to be in this patch? It looks ok both > > before and after, so I don't know why it's necessary. > > I'll separate this in next version. > > This is the old version: > while ((ret = iomap_iter(&src_iter, ops)) > 0) { > while ((ret = iomap_iter(&dst_iter, ops)) > 0) { > dst_iter.processed = dax_range_compare_iter(&src_iter, > &dst_iter, len, same); > } > if (ret <= 0) > src_iter.processed = ret; > } > The inner iter (iomap_begin, actor, iomap_end) may loop more than once. In > this case, the inner dest_iter updates its iomap, but the outer src_iter > still keeps the old. The comparison of new dest_iomap and old src_iomap is > meanless and it causes the bug. Ahh. Got it. The two iomap iters need to advance at the same time, or the dst_iter needs to be defined in the outer loop body based on whatever pos and length were returned by iomap_iter(&src_iter...). > So, we need to make the two iters able to update their iomaps together. > > > > > Welp, thank you for fixing the problems, at least. After a couple of > > days it looks like the serious problems have cleared up. > > > > I didn't test the dax code well. Sorry... Well, it's still experimental code. :) --D > > -- > Thanks, > Ruan. > > > --D > > > > > .len = len, > > > .flags = IOMAP_DAX, > > > }; > > > - int ret; > > > + int ret, compared = 0; > > > - while ((ret = iomap_iter(&src_iter, ops)) > 0) { > > > - while ((ret = iomap_iter(&dst_iter, ops)) > 0) { > > > - dst_iter.processed = dax_range_compare_iter(&src_iter, > > > - &dst_iter, len, same); > > > - } > > > - if (ret <= 0) > > > - src_iter.processed = ret; > > > + while ((ret = iomap_iter(&src_iter, ops)) > 0 && > > > + (ret = iomap_iter(&dst_iter, ops)) > 0) { > > > + compared = dax_range_compare_iter(&src_iter, &dst_iter, len, > > > + same); > > > + if (compared < 0) > > > + return ret; > > > + src_iter.processed = dst_iter.processed = compared; > > > } > > > return ret; > > > } > > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > > > index 07da03976ec1..d9401d0300ad 100644 > > > --- a/fs/xfs/xfs_iomap.c > > > +++ b/fs/xfs/xfs_iomap.c > > > @@ -1215,7 +1215,7 @@ xfs_read_iomap_begin( > > > return error; > > > error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, > > > &nimaps, 0); > > > - if (!error && (flags & IOMAP_REPORT)) > > > + if (!error && ((flags & IOMAP_REPORT) || IS_DAX(inode))) > > > error = xfs_reflink_trim_around_shared(ip, &imap, &shared); > > > xfs_iunlock(ip, lockmode); > > > @@ -1370,7 +1370,7 @@ xfs_zero_range( > > > if (IS_DAX(inode)) > > > return dax_zero_range(inode, pos, len, did_zero, > > > - &xfs_direct_write_iomap_ops); > > > + &xfs_dax_write_iomap_ops); > > > return iomap_zero_range(inode, pos, len, did_zero, > > > &xfs_buffered_write_iomap_ops); > > > } > > > @@ -1385,7 +1385,7 @@ xfs_truncate_page( > > > if (IS_DAX(inode)) > > > return dax_truncate_page(inode, pos, did_zero, > > > - &xfs_direct_write_iomap_ops); > > > + &xfs_dax_write_iomap_ops); > > > return iomap_truncate_page(inode, pos, did_zero, > > > &xfs_buffered_write_iomap_ops); > > > } > > > -- > > > 2.38.1 > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() 2022-11-24 14:54 ` [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() Shiyang Ruan 2022-11-30 4:08 ` Darrick J. Wong @ 2022-11-30 4:12 ` Dan Williams 1 sibling, 0 replies; 23+ messages in thread From: Dan Williams @ 2022-11-30 4:12 UTC (permalink / raw) To: Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-fsdevel Cc: djwong, david, dan.j.williams Shiyang Ruan wrote: > This patch fixes the warning message reported in dax_associate_entry() > and dax_disassociate_entry(). Can you include the xfstest test number and a snippet of the warning message. > 1. reset page->mapping and ->index when refcount counting down to 0. > 2. set IOMAP_F_SHARED flag when iomap read to allow one dax page to be > associated more than once for not only write but also read. > 3. should zero the edge (when not aligned) if srcmap is HOLE or > UNWRITTEN. > 4. iterator of two files in dedupe should be executed side by side, not > nested. > 5. use xfs_dax_write_iomap_ops for xfs zero and truncate. Do these all need to be done at once, or is this 5 patches? > > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> > --- > fs/dax.c | 114 ++++++++++++++++++++++++++------------------- > fs/xfs/xfs_iomap.c | 6 +-- > 2 files changed, 69 insertions(+), 51 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 1c6867810cbd..5ea7c0926b7f 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -398,7 +398,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, > WARN_ON_ONCE(trunc && page_ref_count(page) > 1); > if (dax_mapping_is_cow(page->mapping)) { > /* keep the CoW flag if this page is still shared */ > - if (page->index-- > 0) > + if (page->index-- > 1) I think this wants either a helper function to make it clear that ->index is being used as a share count, or go ahead and rename that field in this context with something like: diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 910d880e67eb..1a409288f39d 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -103,7 +103,10 @@ struct page { }; /* See page-flags.h for PAGE_MAPPING_FLAGS */ struct address_space *mapping; - pgoff_t index; /* Our offset within mapping. */ + union { + pgoff_t index; /* Our offset within mapping. */ + unsigned long share; + }; /** * @private: Mapping-private opaque data. * Usually used for buffer_heads if PagePrivate. ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 2/2] fsdax,xfs: port unshare to fsdax 2022-11-24 14:54 [PATCH 0/2] fsdax,xfs: fix warning messages Shiyang Ruan 2022-11-24 14:54 ` [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() Shiyang Ruan @ 2022-11-24 14:54 ` Shiyang Ruan 2022-11-30 3:28 ` Darrick J. Wong 2022-11-27 18:38 ` [PATCH 0/2] fsdax,xfs: fix warning messages Darrick J. Wong ` (2 subsequent siblings) 4 siblings, 1 reply; 23+ messages in thread From: Shiyang Ruan @ 2022-11-24 14:54 UTC (permalink / raw) To: linux-kernel, linux-xfs, nvdimm, linux-fsdevel Cc: djwong, david, dan.j.williams Implement unshare in fsdax mode: copy data from srcmap to iomap. Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> --- fs/dax.c | 52 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_reflink.c | 8 +++++-- include/linux/dax.h | 2 ++ 3 files changed, 60 insertions(+), 2 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 5ea7c0926b7f..3d0bf68ab6b0 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1235,6 +1235,58 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf, } #endif /* CONFIG_FS_DAX_PMD */ +static s64 dax_unshare_iter(struct iomap_iter *iter) +{ + struct iomap *iomap = &iter->iomap; + const struct iomap *srcmap = iomap_iter_srcmap(iter); + loff_t pos = iter->pos; + loff_t length = iomap_length(iter); + int id = 0; + s64 ret = 0; + void *daddr = NULL, *saddr = NULL; + + /* don't bother with blocks that are not shared to start with */ + if (!(iomap->flags & IOMAP_F_SHARED)) + return length; + /* don't bother with holes or unwritten extents */ + if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) + return length; + + id = dax_read_lock(); + ret = dax_iomap_direct_access(iomap, pos, length, &daddr, NULL); + if (ret < 0) + goto out_unlock; + + ret = dax_iomap_direct_access(srcmap, pos, length, &saddr, NULL); + if (ret < 0) + goto out_unlock; + + ret = copy_mc_to_kernel(daddr, saddr, length); + if (ret) + ret = -EIO; + +out_unlock: + dax_read_unlock(id); + return ret; +} + +int dax_file_unshare(struct inode *inode, loff_t pos, loff_t len, + const struct iomap_ops *ops) +{ + struct iomap_iter iter = { + .inode = inode, + .pos = pos, + .len = len, + .flags = IOMAP_WRITE | IOMAP_UNSHARE | IOMAP_DAX, + }; + int ret; + + while ((ret = iomap_iter(&iter, ops)) > 0) + iter.processed = dax_unshare_iter(&iter); + return ret; +} +EXPORT_SYMBOL_GPL(dax_file_unshare); + static int dax_memzero(struct iomap_iter *iter, loff_t pos, size_t size) { const struct iomap *iomap = &iter->iomap; diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 93bdd25680bc..fe46bce8cae6 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1693,8 +1693,12 @@ xfs_reflink_unshare( inode_dio_wait(inode); - error = iomap_file_unshare(inode, offset, len, - &xfs_buffered_write_iomap_ops); + if (IS_DAX(inode)) + error = dax_file_unshare(inode, offset, len, + &xfs_dax_write_iomap_ops); + else + error = iomap_file_unshare(inode, offset, len, + &xfs_buffered_write_iomap_ops); if (error) goto out; diff --git a/include/linux/dax.h b/include/linux/dax.h index ba985333e26b..2b5ecb591059 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -205,6 +205,8 @@ static inline void dax_unlock_mapping_entry(struct address_space *mapping, } #endif +int dax_file_unshare(struct inode *inode, loff_t pos, loff_t len, + const struct iomap_ops *ops); int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, const struct iomap_ops *ops); int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, -- 2.38.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] fsdax,xfs: port unshare to fsdax 2022-11-24 14:54 ` [PATCH 2/2] fsdax,xfs: port unshare to fsdax Shiyang Ruan @ 2022-11-30 3:28 ` Darrick J. Wong 0 siblings, 0 replies; 23+ messages in thread From: Darrick J. Wong @ 2022-11-30 3:28 UTC (permalink / raw) To: Shiyang Ruan Cc: linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, dan.j.williams On Thu, Nov 24, 2022 at 02:54:54PM +0000, Shiyang Ruan wrote: > Implement unshare in fsdax mode: copy data from srcmap to iomap. > > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Heh, I had a version nearly like this in my tree. Makes reviewing easier: Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > --- > fs/dax.c | 52 ++++++++++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_reflink.c | 8 +++++-- > include/linux/dax.h | 2 ++ > 3 files changed, 60 insertions(+), 2 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 5ea7c0926b7f..3d0bf68ab6b0 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -1235,6 +1235,58 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf, > } > #endif /* CONFIG_FS_DAX_PMD */ > > +static s64 dax_unshare_iter(struct iomap_iter *iter) > +{ > + struct iomap *iomap = &iter->iomap; > + const struct iomap *srcmap = iomap_iter_srcmap(iter); > + loff_t pos = iter->pos; > + loff_t length = iomap_length(iter); > + int id = 0; > + s64 ret = 0; > + void *daddr = NULL, *saddr = NULL; > + > + /* don't bother with blocks that are not shared to start with */ > + if (!(iomap->flags & IOMAP_F_SHARED)) > + return length; > + /* don't bother with holes or unwritten extents */ > + if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) > + return length; > + > + id = dax_read_lock(); > + ret = dax_iomap_direct_access(iomap, pos, length, &daddr, NULL); > + if (ret < 0) > + goto out_unlock; > + > + ret = dax_iomap_direct_access(srcmap, pos, length, &saddr, NULL); > + if (ret < 0) > + goto out_unlock; > + > + ret = copy_mc_to_kernel(daddr, saddr, length); > + if (ret) > + ret = -EIO; > + > +out_unlock: > + dax_read_unlock(id); > + return ret; > +} > + > +int dax_file_unshare(struct inode *inode, loff_t pos, loff_t len, > + const struct iomap_ops *ops) > +{ > + struct iomap_iter iter = { > + .inode = inode, > + .pos = pos, > + .len = len, > + .flags = IOMAP_WRITE | IOMAP_UNSHARE | IOMAP_DAX, > + }; > + int ret; > + > + while ((ret = iomap_iter(&iter, ops)) > 0) > + iter.processed = dax_unshare_iter(&iter); > + return ret; > +} > +EXPORT_SYMBOL_GPL(dax_file_unshare); > + > static int dax_memzero(struct iomap_iter *iter, loff_t pos, size_t size) > { > const struct iomap *iomap = &iter->iomap; > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c > index 93bdd25680bc..fe46bce8cae6 100644 > --- a/fs/xfs/xfs_reflink.c > +++ b/fs/xfs/xfs_reflink.c > @@ -1693,8 +1693,12 @@ xfs_reflink_unshare( > > inode_dio_wait(inode); > > - error = iomap_file_unshare(inode, offset, len, > - &xfs_buffered_write_iomap_ops); > + if (IS_DAX(inode)) > + error = dax_file_unshare(inode, offset, len, > + &xfs_dax_write_iomap_ops); > + else > + error = iomap_file_unshare(inode, offset, len, > + &xfs_buffered_write_iomap_ops); > if (error) > goto out; > > diff --git a/include/linux/dax.h b/include/linux/dax.h > index ba985333e26b..2b5ecb591059 100644 > --- a/include/linux/dax.h > +++ b/include/linux/dax.h > @@ -205,6 +205,8 @@ static inline void dax_unlock_mapping_entry(struct address_space *mapping, > } > #endif > > +int dax_file_unshare(struct inode *inode, loff_t pos, loff_t len, > + const struct iomap_ops *ops); > int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, > const struct iomap_ops *ops); > int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, > -- > 2.38.1 > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-24 14:54 [PATCH 0/2] fsdax,xfs: fix warning messages Shiyang Ruan 2022-11-24 14:54 ` [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() Shiyang Ruan 2022-11-24 14:54 ` [PATCH 2/2] fsdax,xfs: port unshare to fsdax Shiyang Ruan @ 2022-11-27 18:38 ` Darrick J. Wong 2022-11-28 2:16 ` Shiyang Ruan 2022-11-30 3:59 ` Dan Williams 2022-11-30 10:30 ` [PATCH 0/2] fsdax,xfs: fix warning messages #forregzbot Thorsten Leemhuis 4 siblings, 1 reply; 23+ messages in thread From: Darrick J. Wong @ 2022-11-27 18:38 UTC (permalink / raw) To: Shiyang Ruan Cc: linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, dan.j.williams On Thu, Nov 24, 2022 at 02:54:52PM +0000, Shiyang Ruan wrote: > Many testcases failed in dax+reflink mode with warning message in dmesg. > This also effects dax+noreflink mode if we run the test after a > dax+reflink test. So, the most urgent thing is solving the warning > messages. > > Patch 1 fixes some mistakes and adds handling of CoW cases not > previously considered (srcmap is HOLE or UNWRITTEN). > Patch 2 adds the implementation of unshare for fsdax. > > With these fixes, most warning messages in dax_associate_entry() are > gone. But honestly, generic/388 will randomly failed with the warning. > The case shutdown the xfs when fsstress is running, and do it for many > times. I think the reason is that dax pages in use are not able to be > invalidated in time when fs is shutdown. The next time dax page to be > associated, it still remains the mapping value set last time. I'll keep > on solving it. > > The warning message in dax_writeback_one() can also be fixed because of > the dax unshare. This cuts down the amount of test failures quite a bit, but I think you're still missing a piece or two -- namely the part that refuses to enable S_DAX mode on a reflinked file when the inode is being loaded from disk. However, thank you for fixing dax.c, because that was the part I couldn't figure out at all. :) --D > > Shiyang Ruan (2): > fsdax,xfs: fix warning messages at dax_[dis]associate_entry() > fsdax,xfs: port unshare to fsdax > > fs/dax.c | 166 ++++++++++++++++++++++++++++++------------- > fs/xfs/xfs_iomap.c | 6 +- > fs/xfs/xfs_reflink.c | 8 ++- > include/linux/dax.h | 2 + > 4 files changed, 129 insertions(+), 53 deletions(-) > > -- > 2.38.1 > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-27 18:38 ` [PATCH 0/2] fsdax,xfs: fix warning messages Darrick J. Wong @ 2022-11-28 2:16 ` Shiyang Ruan 2022-11-28 23:08 ` Darrick J. Wong 0 siblings, 1 reply; 23+ messages in thread From: Shiyang Ruan @ 2022-11-28 2:16 UTC (permalink / raw) To: Darrick J. Wong Cc: linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, dan.j.williams 在 2022/11/28 2:38, Darrick J. Wong 写道: > On Thu, Nov 24, 2022 at 02:54:52PM +0000, Shiyang Ruan wrote: >> Many testcases failed in dax+reflink mode with warning message in dmesg. >> This also effects dax+noreflink mode if we run the test after a >> dax+reflink test. So, the most urgent thing is solving the warning >> messages. >> >> Patch 1 fixes some mistakes and adds handling of CoW cases not >> previously considered (srcmap is HOLE or UNWRITTEN). >> Patch 2 adds the implementation of unshare for fsdax. >> >> With these fixes, most warning messages in dax_associate_entry() are >> gone. But honestly, generic/388 will randomly failed with the warning. >> The case shutdown the xfs when fsstress is running, and do it for many >> times. I think the reason is that dax pages in use are not able to be >> invalidated in time when fs is shutdown. The next time dax page to be >> associated, it still remains the mapping value set last time. I'll keep >> on solving it. >> >> The warning message in dax_writeback_one() can also be fixed because of >> the dax unshare. > > This cuts down the amount of test failures quite a bit, but I think > you're still missing a piece or two -- namely the part that refuses to > enable S_DAX mode on a reflinked file when the inode is being loaded > from disk. However, thank you for fixing dax.c, because that was the > part I couldn't figure out at all. :) I didn't include it[1] in this patchset... [1] https://lore.kernel.org/linux-xfs/1663234002-17-1-git-send-email-ruansy.fnst@fujitsu.com/ -- Thanks, Ruan. > > --D > >> >> Shiyang Ruan (2): >> fsdax,xfs: fix warning messages at dax_[dis]associate_entry() >> fsdax,xfs: port unshare to fsdax >> >> fs/dax.c | 166 ++++++++++++++++++++++++++++++------------- >> fs/xfs/xfs_iomap.c | 6 +- >> fs/xfs/xfs_reflink.c | 8 ++- >> include/linux/dax.h | 2 + >> 4 files changed, 129 insertions(+), 53 deletions(-) >> >> -- >> 2.38.1 >> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-28 2:16 ` Shiyang Ruan @ 2022-11-28 23:08 ` Darrick J. Wong 0 siblings, 0 replies; 23+ messages in thread From: Darrick J. Wong @ 2022-11-28 23:08 UTC (permalink / raw) To: Shiyang Ruan Cc: linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, dan.j.williams On Mon, Nov 28, 2022 at 10:16:23AM +0800, Shiyang Ruan wrote: > > > 在 2022/11/28 2:38, Darrick J. Wong 写道: > > On Thu, Nov 24, 2022 at 02:54:52PM +0000, Shiyang Ruan wrote: > > > Many testcases failed in dax+reflink mode with warning message in dmesg. > > > This also effects dax+noreflink mode if we run the test after a > > > dax+reflink test. So, the most urgent thing is solving the warning > > > messages. > > > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not > > > previously considered (srcmap is HOLE or UNWRITTEN). > > > Patch 2 adds the implementation of unshare for fsdax. > > > > > > With these fixes, most warning messages in dax_associate_entry() are > > > gone. But honestly, generic/388 will randomly failed with the warning. > > > The case shutdown the xfs when fsstress is running, and do it for many > > > times. I think the reason is that dax pages in use are not able to be > > > invalidated in time when fs is shutdown. The next time dax page to be > > > associated, it still remains the mapping value set last time. I'll keep > > > on solving it. > > > > > > The warning message in dax_writeback_one() can also be fixed because of > > > the dax unshare. > > > > This cuts down the amount of test failures quite a bit, but I think > > you're still missing a piece or two -- namely the part that refuses to > > enable S_DAX mode on a reflinked file when the inode is being loaded > > from disk. However, thank you for fixing dax.c, because that was the > > part I couldn't figure out at all. :) > > I didn't include it[1] in this patchset... > > [1] https://lore.kernel.org/linux-xfs/1663234002-17-1-git-send-email-ruansy.fnst@fujitsu.com/ Oh, ok. I'll pull that one in. All the remaining test failures seem to be related to inode flag states or tests that trip over the lack of delalloc on dax+reflink files. --D > > -- > Thanks, > Ruan. > > > > > --D > > > > > > > > Shiyang Ruan (2): > > > fsdax,xfs: fix warning messages at dax_[dis]associate_entry() > > > fsdax,xfs: port unshare to fsdax > > > > > > fs/dax.c | 166 ++++++++++++++++++++++++++++++------------- > > > fs/xfs/xfs_iomap.c | 6 +- > > > fs/xfs/xfs_reflink.c | 8 ++- > > > include/linux/dax.h | 2 + > > > 4 files changed, 129 insertions(+), 53 deletions(-) > > > > > > -- > > > 2.38.1 > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-24 14:54 [PATCH 0/2] fsdax,xfs: fix warning messages Shiyang Ruan ` (2 preceding siblings ...) 2022-11-27 18:38 ` [PATCH 0/2] fsdax,xfs: fix warning messages Darrick J. Wong @ 2022-11-30 3:59 ` Dan Williams 2022-11-30 4:16 ` Darrick J. Wong 2022-11-30 21:27 ` Andrew Morton 2022-11-30 10:30 ` [PATCH 0/2] fsdax,xfs: fix warning messages #forregzbot Thorsten Leemhuis 4 siblings, 2 replies; 23+ messages in thread From: Dan Williams @ 2022-11-30 3:59 UTC (permalink / raw) To: Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-fsdevel Cc: djwong, david, dan.j.williams, akpm [ add Andrew ] Shiyang Ruan wrote: > Many testcases failed in dax+reflink mode with warning message in dmesg. > This also effects dax+noreflink mode if we run the test after a > dax+reflink test. So, the most urgent thing is solving the warning > messages. > > Patch 1 fixes some mistakes and adds handling of CoW cases not > previously considered (srcmap is HOLE or UNWRITTEN). > Patch 2 adds the implementation of unshare for fsdax. > > With these fixes, most warning messages in dax_associate_entry() are > gone. But honestly, generic/388 will randomly failed with the warning. > The case shutdown the xfs when fsstress is running, and do it for many > times. I think the reason is that dax pages in use are not able to be > invalidated in time when fs is shutdown. The next time dax page to be > associated, it still remains the mapping value set last time. I'll keep > on solving it. > > The warning message in dax_writeback_one() can also be fixed because of > the dax unshare. Thank you for digging in on this, I had been pinned down on CXL tasks and worried that we would need to mark FS_DAX broken for a cycle, so this is timely. My only concern is that these patches look to have significant collisions with the fsdax page reference counting reworks pending in linux-next. Although, those are still sitting in mm-unstable: http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org My preference would be to move ahead with both in which case I can help rebase these fixes on top. In that scenario everything would go through Andrew. However, if we are getting too late in the cycle for that path I think these dax-fixes take precedence, and one more cycle to let the page reference count reworks sit is ok. > Shiyang Ruan (2): > fsdax,xfs: fix warning messages at dax_[dis]associate_entry() > fsdax,xfs: port unshare to fsdax > > fs/dax.c | 166 ++++++++++++++++++++++++++++++------------- > fs/xfs/xfs_iomap.c | 6 +- > fs/xfs/xfs_reflink.c | 8 ++- > include/linux/dax.h | 2 + > 4 files changed, 129 insertions(+), 53 deletions(-) > > -- > 2.38.1 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-30 3:59 ` Dan Williams @ 2022-11-30 4:16 ` Darrick J. Wong 2022-11-30 7:05 ` Dan Williams 2022-11-30 21:27 ` Andrew Morton 1 sibling, 1 reply; 23+ messages in thread From: Darrick J. Wong @ 2022-11-30 4:16 UTC (permalink / raw) To: Dan Williams Cc: Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, akpm On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote: > [ add Andrew ] > > Shiyang Ruan wrote: > > Many testcases failed in dax+reflink mode with warning message in dmesg. > > This also effects dax+noreflink mode if we run the test after a > > dax+reflink test. So, the most urgent thing is solving the warning > > messages. > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not > > previously considered (srcmap is HOLE or UNWRITTEN). > > Patch 2 adds the implementation of unshare for fsdax. > > > > With these fixes, most warning messages in dax_associate_entry() are > > gone. But honestly, generic/388 will randomly failed with the warning. > > The case shutdown the xfs when fsstress is running, and do it for many > > times. I think the reason is that dax pages in use are not able to be > > invalidated in time when fs is shutdown. The next time dax page to be > > associated, it still remains the mapping value set last time. I'll keep > > on solving it. > > > > The warning message in dax_writeback_one() can also be fixed because of > > the dax unshare. > > Thank you for digging in on this, I had been pinned down on CXL tasks > and worried that we would need to mark FS_DAX broken for a cycle, so > this is timely. > > My only concern is that these patches look to have significant collisions with > the fsdax page reference counting reworks pending in linux-next. Although, > those are still sitting in mm-unstable: > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org > > My preference would be to move ahead with both in which case I can help > rebase these fixes on top. In that scenario everything would go through > Andrew. > > However, if we are getting too late in the cycle for that path I think > these dax-fixes take precedence, and one more cycle to let the page > reference count reworks sit is ok. Well now that raises some interesting questions -- dax and reflink are totally broken on 6.1. I was thinking about cramming them into 6.2 as a data corruption fix on the grounds that is not an acceptable state of affairs. OTOH we're past -rc7, which is **really late** to be changing core code. Then again, there aren't so many fsdax users and nobody's complained about 6.0/6.1 being busted, so perhaps the risk of regression isn't so bad? Then again, that could be a sign that this could wait, if you and Andrew are really eager to merge the reworks. Just looking at the stuff that's still broken with dax+reflink -- I noticed that xfs/550-552 (aka the dax poison tests) are still regressing on reflink filesystems. So, uh, what would this patchset need to change if the "fsdax page reference counting reworks" were applied? Would it be changing the page refcount instead of stashing that in page->index? --D > > Shiyang Ruan (2): > > fsdax,xfs: fix warning messages at dax_[dis]associate_entry() > > fsdax,xfs: port unshare to fsdax > > > > fs/dax.c | 166 ++++++++++++++++++++++++++++++------------- > > fs/xfs/xfs_iomap.c | 6 +- > > fs/xfs/xfs_reflink.c | 8 ++- > > include/linux/dax.h | 2 + > > 4 files changed, 129 insertions(+), 53 deletions(-) > > > > -- > > 2.38.1 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-30 4:16 ` Darrick J. Wong @ 2022-11-30 7:05 ` Dan Williams 2022-11-30 21:08 ` Darrick J. Wong 0 siblings, 1 reply; 23+ messages in thread From: Dan Williams @ 2022-11-30 7:05 UTC (permalink / raw) To: Darrick J. Wong, Dan Williams Cc: Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, akpm Darrick J. Wong wrote: > On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote: > > [ add Andrew ] > > > > Shiyang Ruan wrote: > > > Many testcases failed in dax+reflink mode with warning message in dmesg. > > > This also effects dax+noreflink mode if we run the test after a > > > dax+reflink test. So, the most urgent thing is solving the warning > > > messages. > > > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not > > > previously considered (srcmap is HOLE or UNWRITTEN). > > > Patch 2 adds the implementation of unshare for fsdax. > > > > > > With these fixes, most warning messages in dax_associate_entry() are > > > gone. But honestly, generic/388 will randomly failed with the warning. > > > The case shutdown the xfs when fsstress is running, and do it for many > > > times. I think the reason is that dax pages in use are not able to be > > > invalidated in time when fs is shutdown. The next time dax page to be > > > associated, it still remains the mapping value set last time. I'll keep > > > on solving it. > > > > > > The warning message in dax_writeback_one() can also be fixed because of > > > the dax unshare. > > > > Thank you for digging in on this, I had been pinned down on CXL tasks > > and worried that we would need to mark FS_DAX broken for a cycle, so > > this is timely. > > > > My only concern is that these patches look to have significant collisions with > > the fsdax page reference counting reworks pending in linux-next. Although, > > those are still sitting in mm-unstable: > > > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org > > > > My preference would be to move ahead with both in which case I can help > > rebase these fixes on top. In that scenario everything would go through > > Andrew. > > > > However, if we are getting too late in the cycle for that path I think > > these dax-fixes take precedence, and one more cycle to let the page > > reference count reworks sit is ok. > > Well now that raises some interesting questions -- dax and reflink are > totally broken on 6.1. I was thinking about cramming them into 6.2 as a > data corruption fix on the grounds that is not an acceptable state of > affairs. I agree it's not an acceptable state of affairs, but for 6.1 the answer may be to just revert to dax+reflink being forbidden again. The fact that no end user has noticed is probably a good sign that we can disable that without any one screaming. That may be the easy answer for 6.2 as well given how late this all is. > OTOH we're past -rc7, which is **really late** to be changing core code. > Then again, there aren't so many fsdax users and nobody's complained > about 6.0/6.1 being busted, so perhaps the risk of regression isn't so > bad? Then again, that could be a sign that this could wait, if you and > Andrew are really eager to merge the reworks. The page reference counting has also been languishing for a long time. A 6.2 merge would be nice, it relieves maintenance burden, but they do not start to have real end user implications until CXL memory hotplug platforms arrive and the warts in the reference counting start to show real problems in production. > Just looking at the stuff that's still broken with dax+reflink -- I > noticed that xfs/550-552 (aka the dax poison tests) are still regressing > on reflink filesystems. That's worrying because the whole point of reworking dax, xfs, and mm/memory-failure all at once was to handle the collision of poison and reflink'd dax files. > So, uh, what would this patchset need to change if the "fsdax page > reference counting reworks" were applied? Would it be changing the page > refcount instead of stashing that in page->index? Nah, it's things like switching from pages to folios and shifting how dax goes from pfns to pages. https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=cca48ba3196 Ideally fsdax would never deal in pfns at all and do everything in terms of offsets relative to a 'struct dax_device'. My gut is saying these patches, the refcount reworks, and the dax+reflink fixes, are important but not end user critical. One more status quo release does not hurt, and we can circle back to get this all straightened early in v6.3. I.e. just revert: 35fcd75af3ed xfs: fail dax mount if reflink is enabled on a partition ...for v6.1-rc8 and get back to this early in the New Year. > > --D > > > > Shiyang Ruan (2): > > > fsdax,xfs: fix warning messages at dax_[dis]associate_entry() > > > fsdax,xfs: port unshare to fsdax > > > > > > fs/dax.c | 166 ++++++++++++++++++++++++++++++------------- > > > fs/xfs/xfs_iomap.c | 6 +- > > > fs/xfs/xfs_reflink.c | 8 ++- > > > include/linux/dax.h | 2 + > > > 4 files changed, 129 insertions(+), 53 deletions(-) > > > > > > -- > > > 2.38.1 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-30 7:05 ` Dan Williams @ 2022-11-30 21:08 ` Darrick J. Wong 2022-12-01 15:39 ` Shiyang Ruan 0 siblings, 1 reply; 23+ messages in thread From: Darrick J. Wong @ 2022-11-30 21:08 UTC (permalink / raw) To: Dan Williams Cc: Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, akpm On Tue, Nov 29, 2022 at 11:05:30PM -0800, Dan Williams wrote: > Darrick J. Wong wrote: > > On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote: > > > [ add Andrew ] > > > > > > Shiyang Ruan wrote: > > > > Many testcases failed in dax+reflink mode with warning message in dmesg. > > > > This also effects dax+noreflink mode if we run the test after a > > > > dax+reflink test. So, the most urgent thing is solving the warning > > > > messages. > > > > > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not > > > > previously considered (srcmap is HOLE or UNWRITTEN). > > > > Patch 2 adds the implementation of unshare for fsdax. > > > > > > > > With these fixes, most warning messages in dax_associate_entry() are > > > > gone. But honestly, generic/388 will randomly failed with the warning. > > > > The case shutdown the xfs when fsstress is running, and do it for many > > > > times. I think the reason is that dax pages in use are not able to be > > > > invalidated in time when fs is shutdown. The next time dax page to be > > > > associated, it still remains the mapping value set last time. I'll keep > > > > on solving it. > > > > > > > > The warning message in dax_writeback_one() can also be fixed because of > > > > the dax unshare. > > > > > > Thank you for digging in on this, I had been pinned down on CXL tasks > > > and worried that we would need to mark FS_DAX broken for a cycle, so > > > this is timely. > > > > > > My only concern is that these patches look to have significant collisions with > > > the fsdax page reference counting reworks pending in linux-next. Although, > > > those are still sitting in mm-unstable: > > > > > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org > > > > > > My preference would be to move ahead with both in which case I can help > > > rebase these fixes on top. In that scenario everything would go through > > > Andrew. > > > > > > However, if we are getting too late in the cycle for that path I think > > > these dax-fixes take precedence, and one more cycle to let the page > > > reference count reworks sit is ok. > > > > Well now that raises some interesting questions -- dax and reflink are > > totally broken on 6.1. I was thinking about cramming them into 6.2 as a > > data corruption fix on the grounds that is not an acceptable state of > > affairs. > > I agree it's not an acceptable state of affairs, but for 6.1 the answer > may be to just revert to dax+reflink being forbidden again. The fact > that no end user has noticed is probably a good sign that we can disable > that without any one screaming. That may be the easy answer for 6.2 as > well given how late this all is. > > > OTOH we're past -rc7, which is **really late** to be changing core code. > > Then again, there aren't so many fsdax users and nobody's complained > > about 6.0/6.1 being busted, so perhaps the risk of regression isn't so > > bad? Then again, that could be a sign that this could wait, if you and > > Andrew are really eager to merge the reworks. > > The page reference counting has also been languishing for a long time. A > 6.2 merge would be nice, it relieves maintenance burden, but they do not > start to have real end user implications until CXL memory hotplug > platforms arrive and the warts in the reference counting start to show > real problems in production. Hm. How bad *would* it be to rebase that patchset atop this one? After overnight testing on -rc7 it looks like Ruan's patchset fixes all the problems AFAICT. Most of the remaining regressions are to mask off fragmentation testing because fsdax cow (like the directio write paths) doesn't make much use of extent size hints. > > Just looking at the stuff that's still broken with dax+reflink -- I > > noticed that xfs/550-552 (aka the dax poison tests) are still regressing > > on reflink filesystems. > > That's worrying because the whole point of reworking dax, xfs, and > mm/memory-failure all at once was to handle the collision of poison and > reflink'd dax files. I just tried out -rc7 and all three pass, so disregard this please. > > So, uh, what would this patchset need to change if the "fsdax page > > reference counting reworks" were applied? Would it be changing the page > > refcount instead of stashing that in page->index? > > Nah, it's things like switching from pages to folios and shifting how > dax goes from pfns to pages. > > https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=cca48ba3196 > > Ideally fsdax would never deal in pfns at all and do everything in terms > of offsets relative to a 'struct dax_device'. > > My gut is saying these patches, the refcount reworks, and the > dax+reflink fixes, are important but not end user critical. One more > status quo release does not hurt, and we can circle back to get this all > straightened early in v6.3. Being a data corruption fix, I don't see why we shouldn't revisit this during the 6.2 cycle, even if it comes after merging the refcounting stuff. Question for Ruan: Would it be terribly difficult to push out a v2 with the review comments applied so that we have something we can backport to 6.1; and then rebase the series atop 6.2-rc1 so we can apply it to upstream (and then apply the 6.1 version to the LTS)? Or is this too convoluted...? > I.e. just revert: > > 35fcd75af3ed xfs: fail dax mount if reflink is enabled on a partition > > ...for v6.1-rc8 and get back to this early in the New Year. Hm. Tempting. --D > > > > --D > > > > > > Shiyang Ruan (2): > > > > fsdax,xfs: fix warning messages at dax_[dis]associate_entry() > > > > fsdax,xfs: port unshare to fsdax > > > > > > > > fs/dax.c | 166 ++++++++++++++++++++++++++++++------------- > > > > fs/xfs/xfs_iomap.c | 6 +- > > > > fs/xfs/xfs_reflink.c | 8 ++- > > > > include/linux/dax.h | 2 + > > > > 4 files changed, 129 insertions(+), 53 deletions(-) > > > > > > > > -- > > > > 2.38.1 > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-30 21:08 ` Darrick J. Wong @ 2022-12-01 15:39 ` Shiyang Ruan 2022-12-01 17:47 ` Darrick J. Wong 0 siblings, 1 reply; 23+ messages in thread From: Shiyang Ruan @ 2022-12-01 15:39 UTC (permalink / raw) To: Darrick J. Wong, Dan Williams Cc: linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, akpm 在 2022/12/1 5:08, Darrick J. Wong 写道: > On Tue, Nov 29, 2022 at 11:05:30PM -0800, Dan Williams wrote: >> Darrick J. Wong wrote: >>> On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote: >>>> [ add Andrew ] >>>> >>>> Shiyang Ruan wrote: >>>>> Many testcases failed in dax+reflink mode with warning message in dmesg. >>>>> This also effects dax+noreflink mode if we run the test after a >>>>> dax+reflink test. So, the most urgent thing is solving the warning >>>>> messages. >>>>> >>>>> Patch 1 fixes some mistakes and adds handling of CoW cases not >>>>> previously considered (srcmap is HOLE or UNWRITTEN). >>>>> Patch 2 adds the implementation of unshare for fsdax. >>>>> >>>>> With these fixes, most warning messages in dax_associate_entry() are >>>>> gone. But honestly, generic/388 will randomly failed with the warning. >>>>> The case shutdown the xfs when fsstress is running, and do it for many >>>>> times. I think the reason is that dax pages in use are not able to be >>>>> invalidated in time when fs is shutdown. The next time dax page to be >>>>> associated, it still remains the mapping value set last time. I'll keep >>>>> on solving it. >>>>> >>>>> The warning message in dax_writeback_one() can also be fixed because of >>>>> the dax unshare. >>>> >>>> Thank you for digging in on this, I had been pinned down on CXL tasks >>>> and worried that we would need to mark FS_DAX broken for a cycle, so >>>> this is timely. >>>> >>>> My only concern is that these patches look to have significant collisions with >>>> the fsdax page reference counting reworks pending in linux-next. Although, >>>> those are still sitting in mm-unstable: >>>> >>>> http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org >>>> >>>> My preference would be to move ahead with both in which case I can help >>>> rebase these fixes on top. In that scenario everything would go through >>>> Andrew. >>>> >>>> However, if we are getting too late in the cycle for that path I think >>>> these dax-fixes take precedence, and one more cycle to let the page >>>> reference count reworks sit is ok. >>> >>> Well now that raises some interesting questions -- dax and reflink are >>> totally broken on 6.1. I was thinking about cramming them into 6.2 as a >>> data corruption fix on the grounds that is not an acceptable state of >>> affairs. >> >> I agree it's not an acceptable state of affairs, but for 6.1 the answer >> may be to just revert to dax+reflink being forbidden again. The fact >> that no end user has noticed is probably a good sign that we can disable >> that without any one screaming. That may be the easy answer for 6.2 as >> well given how late this all is. >> >>> OTOH we're past -rc7, which is **really late** to be changing core code. >>> Then again, there aren't so many fsdax users and nobody's complained >>> about 6.0/6.1 being busted, so perhaps the risk of regression isn't so >>> bad? Then again, that could be a sign that this could wait, if you and >>> Andrew are really eager to merge the reworks. >> >> The page reference counting has also been languishing for a long time. A >> 6.2 merge would be nice, it relieves maintenance burden, but they do not >> start to have real end user implications until CXL memory hotplug >> platforms arrive and the warts in the reference counting start to show >> real problems in production. > > Hm. How bad *would* it be to rebase that patchset atop this one? > > After overnight testing on -rc7 it looks like Ruan's patchset fixes all > the problems AFAICT. Most of the remaining regressions are to mask off > fragmentation testing because fsdax cow (like the directio write paths) > doesn't make much use of extent size hints. > >>> Just looking at the stuff that's still broken with dax+reflink -- I >>> noticed that xfs/550-552 (aka the dax poison tests) are still regressing >>> on reflink filesystems. >> >> That's worrying because the whole point of reworking dax, xfs, and >> mm/memory-failure all at once was to handle the collision of poison and >> reflink'd dax files. > > I just tried out -rc7 and all three pass, so disregard this please. > >>> So, uh, what would this patchset need to change if the "fsdax page >>> reference counting reworks" were applied? Would it be changing the page >>> refcount instead of stashing that in page->index? >> >> Nah, it's things like switching from pages to folios and shifting how >> dax goes from pfns to pages. >> >> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=cca48ba3196 >> >> Ideally fsdax would never deal in pfns at all and do everything in terms >> of offsets relative to a 'struct dax_device'. >> >> My gut is saying these patches, the refcount reworks, and the >> dax+reflink fixes, are important but not end user critical. One more >> status quo release does not hurt, and we can circle back to get this all >> straightened early in v6.3. > > Being a data corruption fix, I don't see why we shouldn't revisit this > during the 6.2 cycle, even if it comes after merging the refcounting > stuff. > > Question for Ruan: Would it be terribly difficult to push out a v2 with > the review comments applied so that we have something we can backport to > 6.1; and then rebase the series atop 6.2-rc1 so we can apply it to > upstream (and then apply the 6.1 version to the LTS)? Or is this too > convoluted...? It's fine to me. V2 has been posted just now. The big patch has been separated. -- Thanks, Ruan. > >> I.e. just revert: >> >> 35fcd75af3ed xfs: fail dax mount if reflink is enabled on a partition >> >> ...for v6.1-rc8 and get back to this early in the New Year. > > Hm. Tempting. > > --D > >>> >>> --D >>> >>>>> Shiyang Ruan (2): >>>>> fsdax,xfs: fix warning messages at dax_[dis]associate_entry() >>>>> fsdax,xfs: port unshare to fsdax >>>>> >>>>> fs/dax.c | 166 ++++++++++++++++++++++++++++++------------- >>>>> fs/xfs/xfs_iomap.c | 6 +- >>>>> fs/xfs/xfs_reflink.c | 8 ++- >>>>> include/linux/dax.h | 2 + >>>>> 4 files changed, 129 insertions(+), 53 deletions(-) >>>>> >>>>> -- >>>>> 2.38.1 >> >> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-12-01 15:39 ` Shiyang Ruan @ 2022-12-01 17:47 ` Darrick J. Wong 0 siblings, 0 replies; 23+ messages in thread From: Darrick J. Wong @ 2022-12-01 17:47 UTC (permalink / raw) To: Shiyang Ruan Cc: Dan Williams, linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david, akpm On Thu, Dec 01, 2022 at 11:39:12PM +0800, Shiyang Ruan wrote: > > > 在 2022/12/1 5:08, Darrick J. Wong 写道: > > On Tue, Nov 29, 2022 at 11:05:30PM -0800, Dan Williams wrote: > > > Darrick J. Wong wrote: > > > > On Tue, Nov 29, 2022 at 07:59:14PM -0800, Dan Williams wrote: > > > > > [ add Andrew ] > > > > > > > > > > Shiyang Ruan wrote: > > > > > > Many testcases failed in dax+reflink mode with warning message in dmesg. > > > > > > This also effects dax+noreflink mode if we run the test after a > > > > > > dax+reflink test. So, the most urgent thing is solving the warning > > > > > > messages. > > > > > > > > > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not > > > > > > previously considered (srcmap is HOLE or UNWRITTEN). > > > > > > Patch 2 adds the implementation of unshare for fsdax. > > > > > > > > > > > > With these fixes, most warning messages in dax_associate_entry() are > > > > > > gone. But honestly, generic/388 will randomly failed with the warning. > > > > > > The case shutdown the xfs when fsstress is running, and do it for many > > > > > > times. I think the reason is that dax pages in use are not able to be > > > > > > invalidated in time when fs is shutdown. The next time dax page to be > > > > > > associated, it still remains the mapping value set last time. I'll keep > > > > > > on solving it. > > > > > > > > > > > > The warning message in dax_writeback_one() can also be fixed because of > > > > > > the dax unshare. > > > > > > > > > > Thank you for digging in on this, I had been pinned down on CXL tasks > > > > > and worried that we would need to mark FS_DAX broken for a cycle, so > > > > > this is timely. > > > > > > > > > > My only concern is that these patches look to have significant collisions with > > > > > the fsdax page reference counting reworks pending in linux-next. Although, > > > > > those are still sitting in mm-unstable: > > > > > > > > > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org > > > > > > > > > > My preference would be to move ahead with both in which case I can help > > > > > rebase these fixes on top. In that scenario everything would go through > > > > > Andrew. > > > > > > > > > > However, if we are getting too late in the cycle for that path I think > > > > > these dax-fixes take precedence, and one more cycle to let the page > > > > > reference count reworks sit is ok. > > > > > > > > Well now that raises some interesting questions -- dax and reflink are > > > > totally broken on 6.1. I was thinking about cramming them into 6.2 as a > > > > data corruption fix on the grounds that is not an acceptable state of > > > > affairs. > > > > > > I agree it's not an acceptable state of affairs, but for 6.1 the answer > > > may be to just revert to dax+reflink being forbidden again. The fact > > > that no end user has noticed is probably a good sign that we can disable > > > that without any one screaming. That may be the easy answer for 6.2 as > > > well given how late this all is. > > > > > > > OTOH we're past -rc7, which is **really late** to be changing core code. > > > > Then again, there aren't so many fsdax users and nobody's complained > > > > about 6.0/6.1 being busted, so perhaps the risk of regression isn't so > > > > bad? Then again, that could be a sign that this could wait, if you and > > > > Andrew are really eager to merge the reworks. > > > > > > The page reference counting has also been languishing for a long time. A > > > 6.2 merge would be nice, it relieves maintenance burden, but they do not > > > start to have real end user implications until CXL memory hotplug > > > platforms arrive and the warts in the reference counting start to show > > > real problems in production. > > > > Hm. How bad *would* it be to rebase that patchset atop this one? > > > > After overnight testing on -rc7 it looks like Ruan's patchset fixes all > > the problems AFAICT. Most of the remaining regressions are to mask off > > fragmentation testing because fsdax cow (like the directio write paths) > > doesn't make much use of extent size hints. > > > > > > Just looking at the stuff that's still broken with dax+reflink -- I > > > > noticed that xfs/550-552 (aka the dax poison tests) are still regressing > > > > on reflink filesystems. > > > > > > That's worrying because the whole point of reworking dax, xfs, and > > > mm/memory-failure all at once was to handle the collision of poison and > > > reflink'd dax files. > > > > I just tried out -rc7 and all three pass, so disregard this please. > > > > > > So, uh, what would this patchset need to change if the "fsdax page > > > > reference counting reworks" were applied? Would it be changing the page > > > > refcount instead of stashing that in page->index? > > > > > > Nah, it's things like switching from pages to folios and shifting how > > > dax goes from pfns to pages. > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=cca48ba3196 > > > > > > Ideally fsdax would never deal in pfns at all and do everything in terms > > > of offsets relative to a 'struct dax_device'. > > > > > > My gut is saying these patches, the refcount reworks, and the > > > dax+reflink fixes, are important but not end user critical. One more > > > status quo release does not hurt, and we can circle back to get this all > > > straightened early in v6.3. > > > > Being a data corruption fix, I don't see why we shouldn't revisit this > > during the 6.2 cycle, even if it comes after merging the refcounting > > stuff. > > > > Question for Ruan: Would it be terribly difficult to push out a v2 with > > the review comments applied so that we have something we can backport to > > 6.1; and then rebase the series atop 6.2-rc1 so we can apply it to > > upstream (and then apply the 6.1 version to the LTS)? Or is this too > > convoluted...? > > It's fine to me. V2 has been posted just now. The big patch has been > separated. Ok, thank you. Since akpm/Dan aren't moving forward with the page refcounting changes for 6.2, I think I'll try to merge these fixes for 6.2-rc1 without so much rebasing. :) --D > > -- > Thanks, > Ruan. > > > > > > I.e. just revert: > > > > > > 35fcd75af3ed xfs: fail dax mount if reflink is enabled on a partition > > > > > > ...for v6.1-rc8 and get back to this early in the New Year. > > > > Hm. Tempting. > > > > --D > > > > > > > > > > --D > > > > > > > > > > Shiyang Ruan (2): > > > > > > fsdax,xfs: fix warning messages at dax_[dis]associate_entry() > > > > > > fsdax,xfs: port unshare to fsdax > > > > > > > > > > > > fs/dax.c | 166 ++++++++++++++++++++++++++++++------------- > > > > > > fs/xfs/xfs_iomap.c | 6 +- > > > > > > fs/xfs/xfs_reflink.c | 8 ++- > > > > > > include/linux/dax.h | 2 + > > > > > > 4 files changed, 129 insertions(+), 53 deletions(-) > > > > > > > > > > > > -- > > > > > > 2.38.1 > > > > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-30 3:59 ` Dan Williams 2022-11-30 4:16 ` Darrick J. Wong @ 2022-11-30 21:27 ` Andrew Morton 2022-11-30 21:48 ` Dan Williams 1 sibling, 1 reply; 23+ messages in thread From: Andrew Morton @ 2022-11-30 21:27 UTC (permalink / raw) To: Dan Williams Cc: Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-fsdevel, djwong, david On Tue, 29 Nov 2022 19:59:14 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > [ add Andrew ] > > Shiyang Ruan wrote: > > Many testcases failed in dax+reflink mode with warning message in dmesg. > > This also effects dax+noreflink mode if we run the test after a > > dax+reflink test. So, the most urgent thing is solving the warning > > messages. > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not > > previously considered (srcmap is HOLE or UNWRITTEN). > > Patch 2 adds the implementation of unshare for fsdax. > > > > With these fixes, most warning messages in dax_associate_entry() are > > gone. But honestly, generic/388 will randomly failed with the warning. > > The case shutdown the xfs when fsstress is running, and do it for many > > times. I think the reason is that dax pages in use are not able to be > > invalidated in time when fs is shutdown. The next time dax page to be > > associated, it still remains the mapping value set last time. I'll keep > > on solving it. > > > > The warning message in dax_writeback_one() can also be fixed because of > > the dax unshare. > > Thank you for digging in on this, I had been pinned down on CXL tasks > and worried that we would need to mark FS_DAX broken for a cycle, so > this is timely. > > My only concern is that these patches look to have significant collisions with > the fsdax page reference counting reworks pending in linux-next. Although, > those are still sitting in mm-unstable: > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org As far as I know, Dan's "Fix the DAX-gup mistake" series is somewhat stuck. Jan pointed out: https://lore.kernel.org/all/20221109113849.p7pwob533ijgrytu@quack3/T/#u or have Jason's issues since been addressed? > My preference would be to move ahead with both in which case I can help > rebase these fixes on top. In that scenario everything would go through > Andrew. > > However, if we are getting too late in the cycle for that path I think > these dax-fixes take precedence, and one more cycle to let the page > reference count reworks sit is ok. That sounds a decent approach. So we go with this series ("fsdax,xfs: fix warning messages") and aim at 6.3-rc1 with "Fix the DAX-gup mistake"? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-30 21:27 ` Andrew Morton @ 2022-11-30 21:48 ` Dan Williams 2022-11-30 23:08 ` Darrick J. Wong 0 siblings, 1 reply; 23+ messages in thread From: Dan Williams @ 2022-11-30 21:48 UTC (permalink / raw) To: Andrew Morton, Dan Williams Cc: Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-fsdevel, djwong, david Andrew Morton wrote: > On Tue, 29 Nov 2022 19:59:14 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > > > [ add Andrew ] > > > > Shiyang Ruan wrote: > > > Many testcases failed in dax+reflink mode with warning message in dmesg. > > > This also effects dax+noreflink mode if we run the test after a > > > dax+reflink test. So, the most urgent thing is solving the warning > > > messages. > > > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not > > > previously considered (srcmap is HOLE or UNWRITTEN). > > > Patch 2 adds the implementation of unshare for fsdax. > > > > > > With these fixes, most warning messages in dax_associate_entry() are > > > gone. But honestly, generic/388 will randomly failed with the warning. > > > The case shutdown the xfs when fsstress is running, and do it for many > > > times. I think the reason is that dax pages in use are not able to be > > > invalidated in time when fs is shutdown. The next time dax page to be > > > associated, it still remains the mapping value set last time. I'll keep > > > on solving it. > > > > > > The warning message in dax_writeback_one() can also be fixed because of > > > the dax unshare. > > > > Thank you for digging in on this, I had been pinned down on CXL tasks > > and worried that we would need to mark FS_DAX broken for a cycle, so > > this is timely. > > > > My only concern is that these patches look to have significant collisions with > > the fsdax page reference counting reworks pending in linux-next. Although, > > those are still sitting in mm-unstable: > > > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org > > As far as I know, Dan's "Fix the DAX-gup mistake" series is somewhat > stuck. Jan pointed out: > > https://lore.kernel.org/all/20221109113849.p7pwob533ijgrytu@quack3/T/#u > > or have Jason's issues since been addressed? No, they have not. I do think the current series is a step forward, but given the urgency remains low for the time being (CXL hotplug use case further out, no known collisions with ongoing folio work, and no MEMORY_DEVICE_PRIVATE users looking to build any conversions on top for 6.2) I am ok to circle back for 6.3 for that follow on work to be integrated. > > My preference would be to move ahead with both in which case I can help > > rebase these fixes on top. In that scenario everything would go through > > Andrew. > > > > However, if we are getting too late in the cycle for that path I think > > these dax-fixes take precedence, and one more cycle to let the page > > reference count reworks sit is ok. > > That sounds a decent approach. So we go with this series ("fsdax,xfs: > fix warning messages") and aim at 6.3-rc1 with "Fix the DAX-gup > mistake"? > Yeah, that's the path of least hassle. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-30 21:48 ` Dan Williams @ 2022-11-30 23:08 ` Darrick J. Wong 2022-11-30 23:58 ` Dan Williams 0 siblings, 1 reply; 23+ messages in thread From: Darrick J. Wong @ 2022-11-30 23:08 UTC (permalink / raw) To: Dan Williams Cc: Andrew Morton, Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david On Wed, Nov 30, 2022 at 01:48:59PM -0800, Dan Williams wrote: > Andrew Morton wrote: > > On Tue, 29 Nov 2022 19:59:14 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > > > > > [ add Andrew ] > > > > > > Shiyang Ruan wrote: > > > > Many testcases failed in dax+reflink mode with warning message in dmesg. > > > > This also effects dax+noreflink mode if we run the test after a > > > > dax+reflink test. So, the most urgent thing is solving the warning > > > > messages. > > > > > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not > > > > previously considered (srcmap is HOLE or UNWRITTEN). > > > > Patch 2 adds the implementation of unshare for fsdax. > > > > > > > > With these fixes, most warning messages in dax_associate_entry() are > > > > gone. But honestly, generic/388 will randomly failed with the warning. > > > > The case shutdown the xfs when fsstress is running, and do it for many > > > > times. I think the reason is that dax pages in use are not able to be > > > > invalidated in time when fs is shutdown. The next time dax page to be > > > > associated, it still remains the mapping value set last time. I'll keep > > > > on solving it. > > > > > > > > The warning message in dax_writeback_one() can also be fixed because of > > > > the dax unshare. > > > > > > Thank you for digging in on this, I had been pinned down on CXL tasks > > > and worried that we would need to mark FS_DAX broken for a cycle, so > > > this is timely. > > > > > > My only concern is that these patches look to have significant collisions with > > > the fsdax page reference counting reworks pending in linux-next. Although, > > > those are still sitting in mm-unstable: > > > > > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org > > > > As far as I know, Dan's "Fix the DAX-gup mistake" series is somewhat > > stuck. Jan pointed out: > > > > https://lore.kernel.org/all/20221109113849.p7pwob533ijgrytu@quack3/T/#u > > > > or have Jason's issues since been addressed? > > No, they have not. I do think the current series is a step forward, but > given the urgency remains low for the time being (CXL hotplug use case > further out, no known collisions with ongoing folio work, and no > MEMORY_DEVICE_PRIVATE users looking to build any conversions on top for > 6.2) I am ok to circle back for 6.3 for that follow on work to be > integrated. > > > > My preference would be to move ahead with both in which case I can help > > > rebase these fixes on top. In that scenario everything would go through > > > Andrew. > > > > > > However, if we are getting too late in the cycle for that path I think > > > these dax-fixes take precedence, and one more cycle to let the page > > > reference count reworks sit is ok. > > > > That sounds a decent approach. So we go with this series ("fsdax,xfs: > > fix warning messages") and aim at 6.3-rc1 with "Fix the DAX-gup > > mistake"? > > > > Yeah, that's the path of least hassle. Sounds good. I still want to see patch 1 of this series broken up into smaller pieces though. Once the series goes through review, do you want me to push the fixes to Linus, seeing as xfs is the only user of this functionality? --D ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages 2022-11-30 23:08 ` Darrick J. Wong @ 2022-11-30 23:58 ` Dan Williams 0 siblings, 0 replies; 23+ messages in thread From: Dan Williams @ 2022-11-30 23:58 UTC (permalink / raw) To: Darrick J. Wong, Dan Williams Cc: Andrew Morton, Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-fsdevel, david Darrick J. Wong wrote: > On Wed, Nov 30, 2022 at 01:48:59PM -0800, Dan Williams wrote: > > Andrew Morton wrote: > > > On Tue, 29 Nov 2022 19:59:14 -0800 Dan Williams <dan.j.williams@intel.com> wrote: > > > > > > > [ add Andrew ] > > > > > > > > Shiyang Ruan wrote: > > > > > Many testcases failed in dax+reflink mode with warning message in dmesg. > > > > > This also effects dax+noreflink mode if we run the test after a > > > > > dax+reflink test. So, the most urgent thing is solving the warning > > > > > messages. > > > > > > > > > > Patch 1 fixes some mistakes and adds handling of CoW cases not > > > > > previously considered (srcmap is HOLE or UNWRITTEN). > > > > > Patch 2 adds the implementation of unshare for fsdax. > > > > > > > > > > With these fixes, most warning messages in dax_associate_entry() are > > > > > gone. But honestly, generic/388 will randomly failed with the warning. > > > > > The case shutdown the xfs when fsstress is running, and do it for many > > > > > times. I think the reason is that dax pages in use are not able to be > > > > > invalidated in time when fs is shutdown. The next time dax page to be > > > > > associated, it still remains the mapping value set last time. I'll keep > > > > > on solving it. > > > > > > > > > > The warning message in dax_writeback_one() can also be fixed because of > > > > > the dax unshare. > > > > > > > > Thank you for digging in on this, I had been pinned down on CXL tasks > > > > and worried that we would need to mark FS_DAX broken for a cycle, so > > > > this is timely. > > > > > > > > My only concern is that these patches look to have significant collisions with > > > > the fsdax page reference counting reworks pending in linux-next. Although, > > > > those are still sitting in mm-unstable: > > > > > > > > http://lore.kernel.org/r/20221108162059.2ee440d5244657c4f16bdca0@linux-foundation.org > > > > > > As far as I know, Dan's "Fix the DAX-gup mistake" series is somewhat > > > stuck. Jan pointed out: > > > > > > https://lore.kernel.org/all/20221109113849.p7pwob533ijgrytu@quack3/T/#u > > > > > > or have Jason's issues since been addressed? > > > > No, they have not. I do think the current series is a step forward, but > > given the urgency remains low for the time being (CXL hotplug use case > > further out, no known collisions with ongoing folio work, and no > > MEMORY_DEVICE_PRIVATE users looking to build any conversions on top for > > 6.2) I am ok to circle back for 6.3 for that follow on work to be > > integrated. > > > > > > My preference would be to move ahead with both in which case I can help > > > > rebase these fixes on top. In that scenario everything would go through > > > > Andrew. > > > > > > > > However, if we are getting too late in the cycle for that path I think > > > > these dax-fixes take precedence, and one more cycle to let the page > > > > reference count reworks sit is ok. > > > > > > That sounds a decent approach. So we go with this series ("fsdax,xfs: > > > fix warning messages") and aim at 6.3-rc1 with "Fix the DAX-gup > > > mistake"? > > > > > > > Yeah, that's the path of least hassle. > > Sounds good. I still want to see patch 1 of this series broken up into > smaller pieces though. Once the series goes through review, do you want > me to push the fixes to Linus, seeing as xfs is the only user of this > functionality? Yes, that was my primary feedback as well, and merging through xfs makes sense to me. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages #forregzbot 2022-11-24 14:54 [PATCH 0/2] fsdax,xfs: fix warning messages Shiyang Ruan ` (3 preceding siblings ...) 2022-11-30 3:59 ` Dan Williams @ 2022-11-30 10:30 ` Thorsten Leemhuis 2022-12-12 7:06 ` Thorsten Leemhuis 4 siblings, 1 reply; 23+ messages in thread From: Thorsten Leemhuis @ 2022-11-30 10:30 UTC (permalink / raw) To: linux-kernel, linux-xfs, nvdimm, linux-fsdevel; +Cc: regressions [Note: this mail is primarily send for documentation purposes and/or for regzbot, my Linux kernel regression tracking bot. That's why I removed most or all folks from the list of recipients, but left any that looked like a mailing lists. These mails usually contain '#forregzbot' in the subject, to make them easy to spot and filter out.] On 24.11.22 15:54, Shiyang Ruan wrote: > Many testcases failed in dax+reflink mode with warning message in dmesg. > This also effects dax+noreflink mode if we run the test after a > dax+reflink test. So, the most urgent thing is solving the warning > messages. Darrick in https://lore.kernel.org/all/Y4bZGvP8Ozp+4De%2F@magnolia/ wrote "dax and reflink are totally broken on 6.1". Hence, add this to the tracking to be sure it's not forgotten. #regzbot ^introduced 35fcd75af3ed #regzbot title xfs/dax/reflink are totally broken on 6.1 #regzbot ignore-activity Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] fsdax,xfs: fix warning messages #forregzbot 2022-11-30 10:30 ` [PATCH 0/2] fsdax,xfs: fix warning messages #forregzbot Thorsten Leemhuis @ 2022-12-12 7:06 ` Thorsten Leemhuis 0 siblings, 0 replies; 23+ messages in thread From: Thorsten Leemhuis @ 2022-12-12 7:06 UTC (permalink / raw) To: linux-kernel, linux-xfs, nvdimm, linux-fsdevel; +Cc: regressions On 30.11.22 11:30, Thorsten Leemhuis wrote: > [Note: this mail is primarily send for documentation purposes and/or for > regzbot, my Linux kernel regression tracking bot. That's why I removed > most or all folks from the list of recipients, but left any that looked > like a mailing lists. These mails usually contain '#forregzbot' in the > subject, to make them easy to spot and filter out.] > > On 24.11.22 15:54, Shiyang Ruan wrote: >> Many testcases failed in dax+reflink mode with warning message in dmesg. >> This also effects dax+noreflink mode if we run the test after a >> dax+reflink test. So, the most urgent thing is solving the warning >> messages. > > Darrick in https://lore.kernel.org/all/Y4bZGvP8Ozp+4De%2F@magnolia/ > wrote "dax and reflink are totally broken on 6.1". Hence, add this to > the tracking to be sure it's not forgotten. > > #regzbot ^introduced 35fcd75af3ed > #regzbot title xfs/dax/reflink are totally broken on 6.1 > #regzbot ignore-activity #regzbot inconclusive complex issue; fixes with backports apparently planed to be merged for 6.2 Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2022-12-12 7:06 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-11-24 14:54 [PATCH 0/2] fsdax,xfs: fix warning messages Shiyang Ruan 2022-11-24 14:54 ` [PATCH 1/2] fsdax,xfs: fix warning messages at dax_[dis]associate_entry() Shiyang Ruan 2022-11-30 4:08 ` Darrick J. Wong 2022-11-30 8:58 ` Shiyang Ruan 2022-11-30 16:28 ` Darrick J. Wong 2022-11-30 4:12 ` Dan Williams 2022-11-24 14:54 ` [PATCH 2/2] fsdax,xfs: port unshare to fsdax Shiyang Ruan 2022-11-30 3:28 ` Darrick J. Wong 2022-11-27 18:38 ` [PATCH 0/2] fsdax,xfs: fix warning messages Darrick J. Wong 2022-11-28 2:16 ` Shiyang Ruan 2022-11-28 23:08 ` Darrick J. Wong 2022-11-30 3:59 ` Dan Williams 2022-11-30 4:16 ` Darrick J. Wong 2022-11-30 7:05 ` Dan Williams 2022-11-30 21:08 ` Darrick J. Wong 2022-12-01 15:39 ` Shiyang Ruan 2022-12-01 17:47 ` Darrick J. Wong 2022-11-30 21:27 ` Andrew Morton 2022-11-30 21:48 ` Dan Williams 2022-11-30 23:08 ` Darrick J. Wong 2022-11-30 23:58 ` Dan Williams 2022-11-30 10:30 ` [PATCH 0/2] fsdax,xfs: fix warning messages #forregzbot Thorsten Leemhuis 2022-12-12 7:06 ` Thorsten Leemhuis
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).