* [PATCH 01/20] iomap: better document the IOMAP_F_* flags
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 02/20] iomap: remove the unused iomap argument to __iomap_write_end Christoph Hellwig
` (18 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
Cc: Allison Collins, Darrick J . Wong
The documentation for IOMAP_F_* is a bit disorganized, and doesn't
mention the fact that most flags are set by the file system and consumed
by the iomap core, while IOMAP_F_SIZE_CHANGED is set by the core and
consumed by the file system.
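The split can be sketched in user space as follows; the flag values mirror the patch, but iomap_flags_fs_settable is an illustrative helper, not a kernel function:

```c
#include <assert.h>

/* Flag values as defined by this patch. */
#define IOMAP_F_NEW		0x01	/* set by the fs: blocks newly allocated */
#define IOMAP_F_DIRTY		0x02	/* set by the fs: fdatasync required */
#define IOMAP_F_SHARED		0x04	/* set by the fs: blocks are shared */
#define IOMAP_F_MERGED		0x08	/* set by the fs: merged mappings */
#define IOMAP_F_BUFFER_HEAD	0x10	/* set by the fs: use buffer heads */

#define IOMAP_F_SIZE_CHANGED	0x100	/* set by the iomap core */

/*
 * Illustrative helper: true if @flags contains only flags that a file
 * system may report from ->iomap_begin, i.e. nothing from the range
 * reserved for the core iomap code.
 */
static inline int iomap_flags_fs_settable(unsigned int flags)
{
	return (flags & ~(IOMAP_F_NEW | IOMAP_F_DIRTY | IOMAP_F_SHARED |
			  IOMAP_F_MERGED | IOMAP_F_BUFFER_HEAD)) == 0;
}
```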
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
include/linux/iomap.h | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 46ce730b1590..17cf63717681 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -30,21 +30,36 @@ struct vm_fault;
#define IOMAP_INLINE 0x05 /* data inline in the inode */
/*
- * Flags for all iomap mappings:
+ * Flags reported by the file system from iomap_begin:
+ *
+ * IOMAP_F_NEW indicates that the blocks have been newly allocated and need
+ * zeroing for the areas to which no data is copied.
*
* IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access
* written data and requires fdatasync to commit them to persistent storage.
+ *
+ * IOMAP_F_SHARED indicates that the blocks are shared, and will need to be
+ * unshared as part of a write.
+ *
+ * IOMAP_F_MERGED indicates that the iomap contains the merge of multiple block
+ * mappings.
+ *
+ * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
+ * buffer heads for this mapping.
*/
-#define IOMAP_F_NEW 0x01 /* blocks have been newly allocated */
-#define IOMAP_F_DIRTY 0x02 /* uncommitted metadata */
-#define IOMAP_F_BUFFER_HEAD 0x04 /* file system requires buffer heads */
-#define IOMAP_F_SIZE_CHANGED 0x08 /* file size has changed */
+#define IOMAP_F_NEW 0x01
+#define IOMAP_F_DIRTY 0x02
+#define IOMAP_F_SHARED 0x04
+#define IOMAP_F_MERGED 0x08
+#define IOMAP_F_BUFFER_HEAD 0x10
/*
- * Flags that only need to be reported for IOMAP_REPORT requests:
+ * Flags set by the core iomap code during operations:
+ *
+ * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
+ * has changed as the result of this write operation.
*/
-#define IOMAP_F_MERGED 0x10 /* contains multiple blocks/extents */
-#define IOMAP_F_SHARED 0x20 /* block shared with another file */
+#define IOMAP_F_SIZE_CHANGED 0x100
/*
* Flags from 0x1000 up are for file system specific usage:
--
2.20.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 02/20] iomap: remove the unused iomap argument to __iomap_write_end
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
2019-10-08 7:15 ` [PATCH 01/20] iomap: better document the IOMAP_F_* flags Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 03/20] iomap: always use AOP_FLAG_NOFS in iomap_write_begin Christoph Hellwig
` (17 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
Cc: Allison Collins, Darrick J . Wong
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/iomap/buffered-io.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 35c93f72c172..672f5d8efdbe 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -693,7 +693,7 @@ EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
static int
__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
- unsigned copied, struct page *page, struct iomap *iomap)
+ unsigned copied, struct page *page)
{
flush_dcache_page(page);
@@ -746,7 +746,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
page, NULL);
} else {
- ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
+ ret = __iomap_write_end(inode, pos, len, copied, page);
}
/*
--
2.20.1
* [PATCH 03/20] iomap: always use AOP_FLAG_NOFS in iomap_write_begin
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
2019-10-08 7:15 ` [PATCH 01/20] iomap: better document the IOMAP_F_* flags Christoph Hellwig
2019-10-08 7:15 ` [PATCH 02/20] iomap: remove the unused iomap argument to __iomap_write_end Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 04/20] iomap: ignore non-shared or non-data blocks in xfs_file_dirty Christoph Hellwig
` (16 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong
All callers pass AOP_FLAG_NOFS, so lift that flag to iomap_write_begin
to allow reusing the flags argument for an internal flags namespace
soon. Also remove the local index variable that is only used once.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/iomap/buffered-io.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 672f5d8efdbe..bf6a0e0b92a5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -621,7 +621,6 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
struct page **pagep, struct iomap *iomap)
{
const struct iomap_page_ops *page_ops = iomap->page_ops;
- pgoff_t index = pos >> PAGE_SHIFT;
struct page *page;
int status = 0;
@@ -636,7 +635,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
return status;
}
- page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
+ page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT,
+ AOP_FLAG_NOFS);
if (!page) {
status = -ENOMEM;
goto out_no_page;
@@ -778,7 +778,6 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iov_iter *i = data;
long status = 0;
ssize_t written = 0;
- unsigned int flags = AOP_FLAG_NOFS;
do {
struct page *page;
@@ -808,8 +807,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
break;
}
- status = iomap_write_begin(inode, pos, bytes, flags, &page,
- iomap);
+ status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
if (unlikely(status))
break;
@@ -907,8 +905,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
if (IS_ERR(rpage))
return PTR_ERR(rpage);
- status = iomap_write_begin(inode, pos, bytes,
- AOP_FLAG_NOFS, &page, iomap);
+ status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
put_page(rpage);
if (unlikely(status))
return status;
@@ -959,8 +956,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
struct page *page;
int status;
- status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page,
- iomap);
+ status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
if (status)
return status;
--
2.20.1
* [PATCH 04/20] iomap: ignore non-shared or non-data blocks in xfs_file_dirty
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (2 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 03/20] iomap: always use AOP_FLAG_NOFS in iomap_write_begin Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 05/20] iomap: move the zeroing case out of iomap_read_page_sync Christoph Hellwig
` (15 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong
iomap_file_dirty is used to unshare reflink blocks. Rename the function
to iomap_file_unshare to better document that purpose, and skip iomaps
that are not shared and don't need zeroing. This will allow us to
simplify the caller.
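In user-space terms, the two new early returns boil down to a predicate like the following (a sketch only; struct iomap_stub and iomap_needs_unshare are made up for illustration, and the constants use the pre-renumbering values):

```c
#include <assert.h>
#include <stdbool.h>

#define IOMAP_HOLE	0x01
#define IOMAP_MAPPED	0x03
#define IOMAP_UNWRITTEN	0x04
#define IOMAP_F_SHARED	0x04

struct iomap_stub {
	unsigned int type;
	unsigned int flags;
};

/* Only shared, data-bearing extents need any copy-on-write work. */
static bool iomap_needs_unshare(const struct iomap_stub *iomap)
{
	/* don't bother with blocks that are not shared to start with */
	if (!(iomap->flags & IOMAP_F_SHARED))
		return false;
	/* don't bother with holes or unwritten extents */
	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
		return false;
	return true;
}
```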
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/iomap/buffered-io.c | 15 +++++++++++----
fs/xfs/xfs_reflink.c | 2 +-
include/linux/iomap.h | 2 +-
3 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bf6a0e0b92a5..59751835f172 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -887,12 +887,19 @@ __iomap_read_page(struct inode *inode, loff_t offset)
}
static loff_t
-iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
+iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
{
long status = 0;
ssize_t written = 0;
+ /* don't bother with blocks that are not shared to start with */
+ if (!(iomap->flags & IOMAP_F_SHARED))
+ return length;
+ /* don't bother with holes or unwritten extents */
+ if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
+ return length;
+
do {
struct page *page, *rpage;
unsigned long offset; /* Offset into pagecache page */
@@ -932,14 +939,14 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
}
int
-iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
+iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
const struct iomap_ops *ops)
{
loff_t ret;
while (len) {
ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
- iomap_dirty_actor);
+ iomap_unshare_actor);
if (ret <= 0)
return ret;
pos += ret;
@@ -948,7 +955,7 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
return 0;
}
-EXPORT_SYMBOL_GPL(iomap_file_dirty);
+EXPORT_SYMBOL_GPL(iomap_file_unshare);
static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
unsigned bytes, struct iomap *iomap)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 0f08153b4994..a9634110c783 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1442,7 +1442,7 @@ xfs_reflink_dirty_extents(
flen = XFS_FSB_TO_B(mp, rlen);
if (fpos + flen > isize)
flen = isize - fpos;
- error = iomap_file_dirty(VFS_I(ip), fpos, flen,
+ error = iomap_file_unshare(VFS_I(ip), fpos, flen,
&xfs_iomap_ops);
xfs_ilock(ip, XFS_ILOCK_EXCL);
if (error)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 17cf63717681..220f6b17a1a7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -166,7 +166,7 @@ int iomap_migrate_page(struct address_space *mapping, struct page *newpage,
#else
#define iomap_migrate_page NULL
#endif
-int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
+int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
const struct iomap_ops *ops);
int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
bool *did_zero, const struct iomap_ops *ops);
--
2.20.1
* [PATCH 05/20] iomap: move the zeroing case out of iomap_read_page_sync
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (3 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 04/20] iomap: ignore non-shared or non-data blocks in xfs_file_dirty Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 06/20] iomap: use write_begin to read pages to unshare Christoph Hellwig
` (14 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong
That keeps the function a little easier to understand, and easier to
modify for pending enhancements.
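The inverted range check that replaces the old one can be expressed as a small stand-alone predicate (hypothetical helper name; all offsets are byte offsets within the page):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A sub-block range [poff, poff + plen) needs to be brought up to date
 * (zeroed or read) only when one of the write edges 'from'/'to' lands
 * strictly inside it; blocks that are fully overwritten, or not touched
 * at all, can be skipped.  This mirrors the inverted check in the patch.
 */
static bool iomap_subblock_needs_fill(unsigned int from, unsigned int to,
				      unsigned int poff, unsigned int plen)
{
	return !((from <= poff || from >= poff + plen) &&
		 (to <= poff || to >= poff + plen));
}
```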
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/iomap/buffered-io.c | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 59751835f172..d5abd8e5dca7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -562,19 +562,12 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
}
static int
-iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
- unsigned poff, unsigned plen, unsigned from, unsigned to,
- struct iomap *iomap)
+iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
+ unsigned plen, struct iomap *iomap)
{
struct bio_vec bvec;
struct bio bio;
- if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
- zero_user_segments(page, poff, from, to, poff + plen);
- iomap_set_range_uptodate(page, poff, plen);
- return 0;
- }
-
bio_init(&bio, &bvec, 1);
bio.bi_opf = REQ_OP_READ;
bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
@@ -592,7 +585,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
loff_t block_start = pos & ~(block_size - 1);
loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
unsigned from = offset_in_page(pos), to = from + len, poff, plen;
- int status = 0;
+ int status;
if (PageUptodate(page))
return 0;
@@ -603,17 +596,23 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
if (plen == 0)
break;
- if ((from > poff && from < poff + plen) ||
- (to > poff && to < poff + plen)) {
- status = iomap_read_page_sync(inode, block_start, page,
- poff, plen, from, to, iomap);
- if (status)
- break;
+ if ((from <= poff || from >= poff + plen) &&
+ (to <= poff || to >= poff + plen))
+ continue;
+
+ if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
+ zero_user_segments(page, poff, from, to, poff + plen);
+ iomap_set_range_uptodate(page, poff, plen);
+ continue;
}
+ status = iomap_read_page_sync(block_start, page, poff, plen,
+ iomap);
+ if (status)
+ return status;
} while ((block_start += plen) < block_end);
- return status;
+ return 0;
}
static int
--
2.20.1
* [PATCH 06/20] iomap: use write_begin to read pages to unshare
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (4 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 05/20] iomap: move the zeroing case out of iomap_read_page_sync Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 15:12 ` Darrick J. Wong
2019-10-08 7:15 ` [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0 Christoph Hellwig
` (13 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
Use the existing iomap write_begin code to read the pages unshared
by iomap_file_unshare. That avoids the extra ->readpage call and
extent tree lookup currently done by read_mapping_page.
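The effect of the new internal flag on the skip decision can be sketched like this (user-space model; only IOMAP_WRITE_F_UNSHARE is taken from the patch, the helper itself is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

enum {
	IOMAP_WRITE_F_UNSHARE	= (1 << 0),	/* internal to buffered-io.c */
};

/*
 * An unshare request must read every sub-block into the page cache so
 * the data can be written back to its new, unshared location, even if
 * the caller's range [from, to) covers the block completely.
 */
static bool iomap_subblock_can_skip(int flags, unsigned int from,
				    unsigned int to, unsigned int poff,
				    unsigned int plen)
{
	if (flags & IOMAP_WRITE_F_UNSHARE)
		return false;
	return (from <= poff || from >= poff + plen) &&
	       (to <= poff || to >= poff + plen);
}
```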
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 49 ++++++++++++++----------------------------
1 file changed, 16 insertions(+), 33 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d5abd8e5dca7..ac1bbed71a9b 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -548,6 +548,10 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
EXPORT_SYMBOL_GPL(iomap_migrate_page);
#endif /* CONFIG_MIGRATION */
+enum {
+ IOMAP_WRITE_F_UNSHARE = (1 << 0),
+};
+
static void
iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
{
@@ -577,7 +581,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
}
static int
-__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
struct page *page, struct iomap *iomap)
{
struct iomap_page *iop = iomap_page_create(inode, page);
@@ -596,11 +600,14 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
if (plen == 0)
break;
- if ((from <= poff || from >= poff + plen) &&
+ if (!(flags & IOMAP_WRITE_F_UNSHARE) &&
+ (from <= poff || from >= poff + plen) &&
(to <= poff || to >= poff + plen))
continue;
if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
+ if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
+ return -EIO;
zero_user_segments(page, poff, from, to, poff + plen);
iomap_set_range_uptodate(page, poff, plen);
continue;
@@ -646,7 +653,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
status = __block_write_begin_int(page, pos, len, NULL, iomap);
else
- status = __iomap_write_begin(inode, pos, len, page, iomap);
+ status = __iomap_write_begin(inode, pos, len, flags, page,
+ iomap);
if (unlikely(status))
goto out_unlock;
@@ -869,22 +877,6 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
}
EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
-static struct page *
-__iomap_read_page(struct inode *inode, loff_t offset)
-{
- struct address_space *mapping = inode->i_mapping;
- struct page *page;
-
- page = read_mapping_page(mapping, offset >> PAGE_SHIFT, NULL);
- if (IS_ERR(page))
- return page;
- if (!PageUptodate(page)) {
- put_page(page);
- return ERR_PTR(-EIO);
- }
- return page;
-}
-
static loff_t
iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
@@ -900,24 +892,15 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
return length;
do {
- struct page *page, *rpage;
- unsigned long offset; /* Offset into pagecache page */
- unsigned long bytes; /* Bytes to write to page */
-
- offset = offset_in_page(pos);
- bytes = min_t(loff_t, PAGE_SIZE - offset, length);
-
- rpage = __iomap_read_page(inode, pos);
- if (IS_ERR(rpage))
- return PTR_ERR(rpage);
+ unsigned long offset = offset_in_page(pos);
+ unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
+ struct page *page;
- status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
- put_page(rpage);
+ status = iomap_write_begin(inode, pos, bytes,
+ IOMAP_WRITE_F_UNSHARE, &page, iomap);
if (unlikely(status))
return status;
- WARN_ON_ONCE(!PageUptodate(page));
-
status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
if (unlikely(status <= 0)) {
if (WARN_ON_ONCE(status == 0))
--
2.20.1
* Re: [PATCH 06/20] iomap: use write_begin to read pages to unshare
2019-10-08 7:15 ` [PATCH 06/20] iomap: use write_begin to read pages to unshare Christoph Hellwig
@ 2019-10-08 15:12 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:12 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2019 at 09:15:13AM +0200, Christoph Hellwig wrote:
> Use the existing iomap write_begin code to read the pages unshared
> by iomap_file_unshare. That avoids the extra ->readpage call and
> extent tree lookup currently done by read_mapping_page.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
* [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (5 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 06/20] iomap: use write_begin to read pages to unshare Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 15:01 ` Darrick J. Wong
2019-10-08 7:15 ` [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
` (12 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong
Instead of keeping a separate unnamed state for uninitialized iomaps,
renumber IOMAP_HOLE to zero so that an uninitialized iomap is treated
as a hole.
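The point of the renumbering is that a zero-initialized iomap now reads as a hole without any extra bookkeeping; roughly (struct iomap_stub stands in for the kernel struct):

```c
#include <assert.h>

#define IOMAP_HOLE	0	/* no blocks allocated, need allocation */
#define IOMAP_MAPPED	2	/* blocks allocated at @addr */

struct iomap_stub {
	unsigned int type;
	unsigned int flags;
};

/*
 * With IOMAP_HOLE renumbered to 0, an iomap that was never filled in
 * (e.g. a zero-initialized srcmap) is indistinguishable from an
 * explicit hole, so "not set" and "hole" share one test.
 */
static int iomap_is_hole(const struct iomap_stub *iomap)
{
	return iomap->type == IOMAP_HOLE;
}
```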
Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/iomap.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 220f6b17a1a7..24c784e44274 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -23,11 +23,11 @@ struct vm_fault;
/*
* Types of block ranges for iomap mappings:
*/
-#define IOMAP_HOLE 0x01 /* no blocks allocated, need allocation */
-#define IOMAP_DELALLOC 0x02 /* delayed allocation blocks */
-#define IOMAP_MAPPED 0x03 /* blocks allocated at @addr */
-#define IOMAP_UNWRITTEN 0x04 /* blocks allocated at @addr in unwritten state */
-#define IOMAP_INLINE 0x05 /* data inline in the inode */
+#define IOMAP_HOLE 0 /* no blocks allocated, need allocation */
+#define IOMAP_DELALLOC 1 /* delayed allocation blocks */
+#define IOMAP_MAPPED 2 /* blocks allocated at @addr */
+#define IOMAP_UNWRITTEN 3 /* blocks allocated at @addr in unwritten state */
+#define IOMAP_INLINE 4 /* data inline in the inode */
/*
* Flags reported by the file system from iomap_begin:
--
2.20.1
* Re: [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0
2019-10-08 7:15 ` [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0 Christoph Hellwig
@ 2019-10-08 15:01 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:01 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2019 at 09:15:14AM +0200, Christoph Hellwig wrote:
> Instead of keeping a separate unnamed state for uninitialized iomaps,
> renumber IOMAP_HOLE to zero so that an uninitialized iomap is treated
> as a hole.
>
> Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
* [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (6 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0 Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 15:00 ` Darrick J. Wong
2019-10-14 23:27 ` Darrick J. Wong
2019-10-08 7:15 ` [PATCH 09/20] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations Christoph Hellwig
` (11 subsequent siblings)
19 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
The srcmap is used to identify where the read is to be performed from.
It is passed to ->iomap_begin, which can fill it in if we need to read
data for partially written blocks from a different location than the
write target. The srcmap is only supported for buffered writes so far.
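The length trimming this adds to iomap_apply can be modeled as a pure function (a sketch only; the real code works on struct iomap directly and the helper name is made up):

```c
#include <assert.h>

#define IOMAP_HOLE 0

typedef unsigned long long u64;

/*
 * When the file system fills in a separate srcmap, the operation has to
 * stop at whichever of the two mappings ends first, so that reads from
 * the srcmap and writes to the iomap both stay inside their mapping.
 */
static u64 iomap_trim_length(u64 pos, u64 length, u64 iomap_offset,
			     u64 iomap_length, unsigned int srcmap_type,
			     u64 srcmap_offset, u64 srcmap_length)
{
	u64 end = iomap_offset + iomap_length;

	if (srcmap_type != IOMAP_HOLE) {
		u64 srcmap_end = srcmap_offset + srcmap_length;

		if (srcmap_end < end)
			end = srcmap_end;
	}
	if (pos + length > end)
		length = end - pos;
	return length;
}
```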
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
srcmap if not set, adjust length down to srcmap end as well]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/dax.c | 9 ++++--
fs/ext2/inode.c | 2 +-
fs/ext4/inode.c | 2 +-
fs/gfs2/bmap.c | 3 +-
fs/iomap/apply.c | 25 ++++++++++++----
fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-------------------
fs/iomap/direct-io.c | 2 +-
fs/iomap/fiemap.c | 4 +--
fs/iomap/seek.c | 4 +--
fs/iomap/swapfile.c | 3 +-
fs/xfs/xfs_iomap.c | 9 ++++--
include/linux/iomap.h | 5 ++--
12 files changed, 80 insertions(+), 53 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 6bf81f931de3..920105457c2c 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
static loff_t
dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
- struct iomap *iomap)
+ struct iomap *iomap, struct iomap *srcmap)
{
struct block_device *bdev = iomap->bdev;
struct dax_device *dax_dev = iomap->dax_dev;
@@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
unsigned long vaddr = vmf->address;
loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
struct iomap iomap = { 0 };
+ struct iomap srcmap = { .type = IOMAP_HOLE };
unsigned flags = IOMAP_FAULT;
int error, major = 0;
bool write = vmf->flags & FAULT_FLAG_WRITE;
@@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
* the file system block size to be equal the page size, which means
* that we never have to deal with more than a single extent here.
*/
- error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
+ error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
if (iomap_errp)
*iomap_errp = error;
if (error) {
@@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
struct inode *inode = mapping->host;
vm_fault_t result = VM_FAULT_FALLBACK;
struct iomap iomap = { 0 };
+ struct iomap srcmap = { .type = IOMAP_HOLE };
pgoff_t max_pgoff;
void *entry;
loff_t pos;
@@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
* to look up our filesystem block.
*/
pos = (loff_t)xas.xa_index << PAGE_SHIFT;
- error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
+ error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
+ &srcmap);
if (error)
goto unlock_entry;
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 7004ce581a32..467c13ff6b40 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
#ifdef CONFIG_FS_DAX
static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
- unsigned flags, struct iomap *iomap)
+ unsigned flags, struct iomap *iomap, struct iomap *srcmap)
{
unsigned int blkbits = inode->i_blkbits;
unsigned long first_block = offset >> blkbits;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 516faa280ced..abaaf7d96ca4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3407,7 +3407,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
}
static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
- unsigned flags, struct iomap *iomap)
+ unsigned flags, struct iomap *iomap, struct iomap *srcmap)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
unsigned int blkbits = inode->i_blkbits;
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index f63df54a08c6..516103248272 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1149,7 +1149,8 @@ static inline bool gfs2_iomap_need_write_lock(unsigned flags)
}
static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
- unsigned flags, struct iomap *iomap)
+ unsigned flags, struct iomap *iomap,
+ struct iomap *srcmap)
{
struct gfs2_inode *ip = GFS2_I(inode);
struct metapath mp = { .mp_aheight = 1, };
diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
index 54c02aecf3cd..484dd8eda861 100644
--- a/fs/iomap/apply.c
+++ b/fs/iomap/apply.c
@@ -23,8 +23,10 @@ loff_t
iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
const struct iomap_ops *ops, void *data, iomap_actor_t actor)
{
- struct iomap iomap = { 0 };
+ struct iomap iomap = { .type = IOMAP_HOLE };
+ struct iomap srcmap = { .type = IOMAP_HOLE };
loff_t written = 0, ret;
+ u64 end;
/*
* Need to map a range from start position for length bytes. This can
@@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
* expose transient stale data. If the reserve fails, we can safely
* back out at this point as there is nothing to undo.
*/
- ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
+ ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
if (ret)
return ret;
if (WARN_ON(iomap.offset > pos))
@@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
* Cut down the length to the one actually provided by the filesystem,
* as it might not be able to give us the whole size that we requested.
*/
- if (iomap.offset + iomap.length < pos + length)
- length = iomap.offset + iomap.length - pos;
+ end = iomap.offset + iomap.length;
+ if (srcmap.type != IOMAP_HOLE)
+ end = min(end, srcmap.offset + srcmap.length);
+ if (pos + length > end)
+ length = end - pos;
/*
- * Now that we have guaranteed that the space allocation will succeed.
+ * Now that we have guaranteed that the space allocation will succeed,
* we can do the copy-in page by page without having to worry about
* failures exposing transient data.
+ *
+ * To support COW operations, we read in data for partially written
+ * blocks from the srcmap if the file system filled it in. In that
+ * case the length needs to be limited to the earlier of the two
+ * mappings' ends. If the file system did not provide a srcmap we
+ * pass the normal iomap to the actors so that they don't need
+ * special handling for the two cases.
*/
- written = actor(inode, pos, length, data, &iomap);
+ written = actor(inode, pos, length, data, &iomap,
+ srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
/*
* Now the data has been copied, commit the range we've copied. This
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ac1bbed71a9b..eb2c6d73a837 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -234,7 +234,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
static loff_t
iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
- struct iomap *iomap)
+ struct iomap *iomap, struct iomap *srcmap)
{
struct iomap_readpage_ctx *ctx = data;
struct page *page = ctx->cur_page;
@@ -382,7 +382,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
static loff_t
iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
- void *data, struct iomap *iomap)
+ void *data, struct iomap *iomap, struct iomap *srcmap)
{
struct iomap_readpage_ctx *ctx = data;
loff_t done, ret;
@@ -402,7 +402,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
ctx->cur_page_in_bio = false;
}
ret = iomap_readpage_actor(inode, pos + done, length - done,
- ctx, iomap);
+ ctx, iomap, srcmap);
}
return done;
@@ -582,7 +582,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
static int
__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
- struct page *page, struct iomap *iomap)
+ struct page *page, struct iomap *srcmap)
{
struct iomap_page *iop = iomap_page_create(inode, page);
loff_t block_size = i_blocksize(inode);
@@ -605,7 +605,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
(to <= poff || to >= poff + plen))
continue;
- if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
+ if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
return -EIO;
zero_user_segments(page, poff, from, to, poff + plen);
@@ -614,7 +614,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
}
status = iomap_read_page_sync(block_start, page, poff, plen,
- iomap);
+ srcmap);
if (status)
return status;
} while ((block_start += plen) < block_end);
@@ -624,13 +624,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
static int
iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
- struct page **pagep, struct iomap *iomap)
+ struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
{
const struct iomap_page_ops *page_ops = iomap->page_ops;
struct page *page;
int status = 0;
BUG_ON(pos + len > iomap->offset + iomap->length);
+ if (srcmap != iomap)
+ BUG_ON(pos + len > srcmap->offset + srcmap->length);
if (fatal_signal_pending(current))
return -EINTR;
@@ -648,13 +650,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
goto out_no_page;
}
- if (iomap->type == IOMAP_INLINE)
- iomap_read_inline_data(inode, page, iomap);
+ if (srcmap->type == IOMAP_INLINE)
+ iomap_read_inline_data(inode, page, srcmap);
else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
- status = __block_write_begin_int(page, pos, len, NULL, iomap);
+ status = __block_write_begin_int(page, pos, len, NULL, srcmap);
else
status = __iomap_write_begin(inode, pos, len, flags, page,
- iomap);
+ srcmap);
if (unlikely(status))
goto out_unlock;
@@ -740,16 +742,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
}
static int
-iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
- unsigned copied, struct page *page, struct iomap *iomap)
+iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
+ struct page *page, struct iomap *iomap, struct iomap *srcmap)
{
const struct iomap_page_ops *page_ops = iomap->page_ops;
loff_t old_size = inode->i_size;
int ret;
- if (iomap->type == IOMAP_INLINE) {
+ if (srcmap->type == IOMAP_INLINE) {
ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
- } else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
+ } else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
page, NULL);
} else {
@@ -780,7 +782,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
static loff_t
iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
- struct iomap *iomap)
+ struct iomap *iomap, struct iomap *srcmap)
{
struct iov_iter *i = data;
long status = 0;
@@ -814,7 +816,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
break;
}
- status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
+ status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
+ srcmap);
if (unlikely(status))
break;
@@ -825,8 +828,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
flush_dcache_page(page);
- status = iomap_write_end(inode, pos, bytes, copied, page,
- iomap);
+ status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
+ srcmap);
if (unlikely(status < 0))
break;
copied = status;
@@ -879,7 +882,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
static loff_t
iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
- struct iomap *iomap)
+ struct iomap *iomap, struct iomap *srcmap)
{
long status = 0;
ssize_t written = 0;
@@ -888,7 +891,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
if (!(iomap->flags & IOMAP_F_SHARED))
return length;
/* don't bother with holes or unwritten extents */
- if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
+ if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
return length;
do {
@@ -897,11 +900,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct page *page;
status = iomap_write_begin(inode, pos, bytes,
- IOMAP_WRITE_F_UNSHARE, &page, iomap);
+ IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
if (unlikely(status))
return status;
- status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
+ status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
+ srcmap);
if (unlikely(status <= 0)) {
if (WARN_ON_ONCE(status == 0))
return -EIO;
@@ -940,19 +944,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
EXPORT_SYMBOL_GPL(iomap_file_unshare);
static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
- unsigned bytes, struct iomap *iomap)
+ unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
{
struct page *page;
int status;
- status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
+ status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
if (status)
return status;
zero_user(page, offset, bytes);
mark_page_accessed(page);
- return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
+ return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
}
static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -964,14 +968,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
static loff_t
iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
- void *data, struct iomap *iomap)
+ void *data, struct iomap *iomap, struct iomap *srcmap)
{
bool *did_zero = data;
loff_t written = 0;
int status;
/* already zeroed? we're done. */
- if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
+ if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
return count;
do {
@@ -983,7 +987,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
if (IS_DAX(inode))
status = iomap_dax_zero(pos, offset, bytes, iomap);
else
- status = iomap_zero(inode, pos, offset, bytes, iomap);
+ status = iomap_zero(inode, pos, offset, bytes, iomap,
+ srcmap);
if (status < 0)
return status;
@@ -1033,7 +1038,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
static loff_t
iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
- void *data, struct iomap *iomap)
+ void *data, struct iomap *iomap, struct iomap *srcmap)
{
struct page *page = data;
int ret;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 1fc28c2da279..e3ccbf7daaae 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
static loff_t
iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
- void *data, struct iomap *iomap)
+ void *data, struct iomap *iomap, struct iomap *srcmap)
{
struct iomap_dio *dio = data;
diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
index f26fdd36e383..690ef2d7c6c8 100644
--- a/fs/iomap/fiemap.c
+++ b/fs/iomap/fiemap.c
@@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
static loff_t
iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
- struct iomap *iomap)
+ struct iomap *iomap, struct iomap *srcmap)
{
struct fiemap_ctx *ctx = data;
loff_t ret = length;
@@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
static loff_t
iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
- void *data, struct iomap *iomap)
+ void *data, struct iomap *iomap, struct iomap *srcmap)
{
sector_t *bno = data, addr;
diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
index c04bad4b2b43..89f61d93c0bc 100644
--- a/fs/iomap/seek.c
+++ b/fs/iomap/seek.c
@@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
static loff_t
iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
- void *data, struct iomap *iomap)
+ void *data, struct iomap *iomap, struct iomap *srcmap)
{
switch (iomap->type) {
case IOMAP_UNWRITTEN:
@@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
static loff_t
iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
- void *data, struct iomap *iomap)
+ void *data, struct iomap *iomap, struct iomap *srcmap)
{
switch (iomap->type) {
case IOMAP_HOLE:
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index 152a230f668d..a648dbf6991e 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
* distinction between written and unwritten extents.
*/
static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
- loff_t count, void *data, struct iomap *iomap)
+ loff_t count, void *data, struct iomap *iomap,
+ struct iomap *srcmap)
{
struct iomap_swapfile_info *isi = data;
int error;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index c0a492353826..016adcd7dd66 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -928,7 +928,8 @@ xfs_file_iomap_begin(
loff_t offset,
loff_t length,
unsigned flags,
- struct iomap *iomap)
+ struct iomap *iomap,
+ struct iomap *srcmap)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
@@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
loff_t offset,
loff_t length,
unsigned flags,
- struct iomap *iomap)
+ struct iomap *iomap,
+ struct iomap *srcmap)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
@@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
loff_t offset,
loff_t length,
unsigned flags,
- struct iomap *iomap)
+ struct iomap *iomap,
+ struct iomap *srcmap)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 24c784e44274..37af5f9dc722 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -127,7 +127,8 @@ struct iomap_ops {
* The actual length is returned in iomap->length.
*/
int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
- unsigned flags, struct iomap *iomap);
+ unsigned flags, struct iomap *iomap,
+ struct iomap *srcmap);
/*
* Commit and/or unreserve space previous allocated using iomap_begin.
@@ -143,7 +144,7 @@ struct iomap_ops {
* Main iomap iterator function.
*/
typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
- void *data, struct iomap *iomap);
+ void *data, struct iomap *iomap, struct iomap *srcmap);
loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
unsigned flags, const struct iomap_ops *ops, void *data,
--
2.20.1
* Re: [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
2019-10-08 7:15 ` [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
@ 2019-10-08 15:00 ` Darrick J. Wong
2019-10-09 6:28 ` Christoph Hellwig
2019-10-14 23:27 ` Darrick J. Wong
1 sibling, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:00 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2019 at 09:15:15AM +0200, Christoph Hellwig wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> The srcmap is used to identify where the read is to be performed from.
> It is passed to ->iomap_begin, which can fill it in if we need to read
> data for partially written blocks from a different location than the
> write target. The srcmap is only supported for buffered writes so far.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
> srcmap if not set, adjust length down to srcmap end as well]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/dax.c | 9 ++++--
> fs/ext2/inode.c | 2 +-
> fs/ext4/inode.c | 2 +-
> fs/gfs2/bmap.c | 3 +-
> fs/iomap/apply.c | 25 ++++++++++++----
> fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-------------------
> fs/iomap/direct-io.c | 2 +-
> fs/iomap/fiemap.c | 4 +--
> fs/iomap/seek.c | 4 +--
> fs/iomap/swapfile.c | 3 +-
> fs/xfs/xfs_iomap.c | 9 ++++--
> include/linux/iomap.h | 5 ++--
> 12 files changed, 80 insertions(+), 53 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 6bf81f931de3..920105457c2c 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
>
> static loff_t
> dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap)
> + struct iomap *iomap, struct iomap *srcmap)
> {
> struct block_device *bdev = iomap->bdev;
> struct dax_device *dax_dev = iomap->dax_dev;
> @@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> unsigned long vaddr = vmf->address;
> loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
> struct iomap iomap = { 0 };
Does this definition ^^^^^ need to be converted too? You convert the
one in iomap_apply()...
> + struct iomap srcmap = { .type = IOMAP_HOLE };
...and at the same time I wonder if we ought to have:
/*
* The @iomap and @srcmap parameters should be set to a hole
* prior to calling ->iomap_begin.
*/
#define IOMAP_EMPTY_RECORD { .type = IOMAP_HOLE }
...and later...
struct iomap srcmap = IOMAP_EMPTY_RECORD;
..but meh, I'm not sure that adds much.
> unsigned flags = IOMAP_FAULT;
> int error, major = 0;
> bool write = vmf->flags & FAULT_FLAG_WRITE;
> @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> * the file system block size to be equal the page size, which means
> * that we never have to deal with more than a single extent here.
> */
> - error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> + error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
->iomap_begin callers are never supposed to touch srcmap, right?
Maybe we ought to check that srcmap.io_type == HOLE, at least until
someone fixes this code to dax-copy the data from srcmap to iomap?
(I don't like this open-coded iomap_apply here, but fixing that is for
another day because I once tried to extract the iteration pieces and
yeurghck...)
The rest of the patch looks ok.
--D
> if (iomap_errp)
> *iomap_errp = error;
> if (error) {
> @@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> struct inode *inode = mapping->host;
> vm_fault_t result = VM_FAULT_FALLBACK;
> struct iomap iomap = { 0 };
> + struct iomap srcmap = { .type = IOMAP_HOLE };
> pgoff_t max_pgoff;
> void *entry;
> loff_t pos;
> @@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> * to look up our filesystem block.
> */
> pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> - error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
> + error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
> + &srcmap);
> if (error)
> goto unlock_entry;
>
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 7004ce581a32..467c13ff6b40 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
>
> #ifdef CONFIG_FS_DAX
> static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> - unsigned flags, struct iomap *iomap)
> + unsigned flags, struct iomap *iomap, struct iomap *srcmap)
> {
> unsigned int blkbits = inode->i_blkbits;
> unsigned long first_block = offset >> blkbits;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 516faa280ced..abaaf7d96ca4 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3407,7 +3407,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
> }
>
> static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> - unsigned flags, struct iomap *iomap)
> + unsigned flags, struct iomap *iomap, struct iomap *srcmap)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> unsigned int blkbits = inode->i_blkbits;
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index f63df54a08c6..516103248272 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1149,7 +1149,8 @@ static inline bool gfs2_iomap_need_write_lock(unsigned flags)
> }
>
> static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
> - unsigned flags, struct iomap *iomap)
> + unsigned flags, struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct gfs2_inode *ip = GFS2_I(inode);
> struct metapath mp = { .mp_aheight = 1, };
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> index 54c02aecf3cd..484dd8eda861 100644
> --- a/fs/iomap/apply.c
> +++ b/fs/iomap/apply.c
> @@ -23,8 +23,10 @@ loff_t
> iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> const struct iomap_ops *ops, void *data, iomap_actor_t actor)
> {
> - struct iomap iomap = { 0 };
> + struct iomap iomap = { .type = IOMAP_HOLE };
> + struct iomap srcmap = { .type = IOMAP_HOLE };
> loff_t written = 0, ret;
> + u64 end;
>
> /*
> * Need to map a range from start position for length bytes. This can
> @@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> * expose transient stale data. If the reserve fails, we can safely
> * back out at this point as there is nothing to undo.
> */
> - ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
> + ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
> if (ret)
> return ret;
> if (WARN_ON(iomap.offset > pos))
> @@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> * Cut down the length to the one actually provided by the filesystem,
> * as it might not be able to give us the whole size that we requested.
> */
> - if (iomap.offset + iomap.length < pos + length)
> - length = iomap.offset + iomap.length - pos;
> + end = iomap.offset + iomap.length;
> + if (srcmap.type != IOMAP_HOLE)
> + end = min(end, srcmap.offset + srcmap.length);
> + if (pos + length > end)
> + length = end - pos;
>
> /*
> - * Now that we have guaranteed that the space allocation will succeed.
> + * Now that we have guaranteed that the space allocation will succeed,
> * we can do the copy-in page by page without having to worry about
> * failures exposing transient data.
> + *
> + * To support COW operations, we read in data for partially written
> + * blocks from the srcmap if the file system filled it in. In that
> + * case the length needs to be limited to the earlier of the two
> + * mappings' ends. If the file system did not provide a srcmap we
> + * pass the normal iomap to the actors so that they don't need
> + * special handling for the two cases.
> */
> - written = actor(inode, pos, length, data, &iomap);
> + written = actor(inode, pos, length, data, &iomap,
> + srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
>
> /*
> * Now the data has been copied, commit the range we've copied. This
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ac1bbed71a9b..eb2c6d73a837 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -234,7 +234,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
>
> static loff_t
> iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap)
> + struct iomap *iomap, struct iomap *srcmap)
> {
> struct iomap_readpage_ctx *ctx = data;
> struct page *page = ctx->cur_page;
> @@ -382,7 +382,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
>
> static loff_t
> iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> struct iomap_readpage_ctx *ctx = data;
> loff_t done, ret;
> @@ -402,7 +402,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> ctx->cur_page_in_bio = false;
> }
> ret = iomap_readpage_actor(inode, pos + done, length - done,
> - ctx, iomap);
> + ctx, iomap, srcmap);
> }
>
> return done;
> @@ -582,7 +582,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>
> static int
> __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> - struct page *page, struct iomap *iomap)
> + struct page *page, struct iomap *srcmap)
> {
> struct iomap_page *iop = iomap_page_create(inode, page);
> loff_t block_size = i_blocksize(inode);
> @@ -605,7 +605,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> (to <= poff || to >= poff + plen))
> continue;
>
> - if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> + if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
> if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
> return -EIO;
> zero_user_segments(page, poff, from, to, poff + plen);
> @@ -614,7 +614,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> }
>
> status = iomap_read_page_sync(block_start, page, poff, plen,
> - iomap);
> + srcmap);
> if (status)
> return status;
> } while ((block_start += plen) < block_end);
> @@ -624,13 +624,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>
> static int
> iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> - struct page **pagep, struct iomap *iomap)
> + struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> {
> const struct iomap_page_ops *page_ops = iomap->page_ops;
> struct page *page;
> int status = 0;
>
> BUG_ON(pos + len > iomap->offset + iomap->length);
> + if (srcmap != iomap)
> + BUG_ON(pos + len > srcmap->offset + srcmap->length);
This should be a WARN_ON(...) followed by return -EIO, right?
>
> if (fatal_signal_pending(current))
> return -EINTR;
> @@ -648,13 +650,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> goto out_no_page;
> }
>
> - if (iomap->type == IOMAP_INLINE)
> - iomap_read_inline_data(inode, page, iomap);
> + if (srcmap->type == IOMAP_INLINE)
> + iomap_read_inline_data(inode, page, srcmap);
> else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> - status = __block_write_begin_int(page, pos, len, NULL, iomap);
> + status = __block_write_begin_int(page, pos, len, NULL, srcmap);
> else
> status = __iomap_write_begin(inode, pos, len, flags, page,
> - iomap);
> + srcmap);
>
> if (unlikely(status))
> goto out_unlock;
> @@ -740,16 +742,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
> }
>
> static int
> -iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> - unsigned copied, struct page *page, struct iomap *iomap)
> +iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
> + struct page *page, struct iomap *iomap, struct iomap *srcmap)
> {
> const struct iomap_page_ops *page_ops = iomap->page_ops;
> loff_t old_size = inode->i_size;
> int ret;
>
> - if (iomap->type == IOMAP_INLINE) {
> + if (srcmap->type == IOMAP_INLINE) {
> ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> - } else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> + } else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
> ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
> page, NULL);
> } else {
> @@ -780,7 +782,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>
> static loff_t
> iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap)
> + struct iomap *iomap, struct iomap *srcmap)
> {
> struct iov_iter *i = data;
> long status = 0;
> @@ -814,7 +816,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> break;
> }
>
> - status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> + status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
> + srcmap);
> if (unlikely(status))
> break;
>
> @@ -825,8 +828,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>
> flush_dcache_page(page);
>
> - status = iomap_write_end(inode, pos, bytes, copied, page,
> - iomap);
> + status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
> + srcmap);
> if (unlikely(status < 0))
> break;
> copied = status;
> @@ -879,7 +882,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>
> static loff_t
> iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap)
> + struct iomap *iomap, struct iomap *srcmap)
> {
> long status = 0;
> ssize_t written = 0;
> @@ -888,7 +891,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> if (!(iomap->flags & IOMAP_F_SHARED))
> return length;
> /* don't bother with holes or unwritten extents */
> - if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> + if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> return length;
>
> do {
> @@ -897,11 +900,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> struct page *page;
>
> status = iomap_write_begin(inode, pos, bytes,
> - IOMAP_WRITE_F_UNSHARE, &page, iomap);
> + IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
> if (unlikely(status))
> return status;
>
> - status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> + status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> + srcmap);
> if (unlikely(status <= 0)) {
> if (WARN_ON_ONCE(status == 0))
> return -EIO;
> @@ -940,19 +944,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
> EXPORT_SYMBOL_GPL(iomap_file_unshare);
>
> static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> - unsigned bytes, struct iomap *iomap)
> + unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
> {
> struct page *page;
> int status;
>
> - status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> + status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
> if (status)
> return status;
>
> zero_user(page, offset, bytes);
> mark_page_accessed(page);
>
> - return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> + return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
> }
>
> static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> @@ -964,14 +968,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
>
> static loff_t
> iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> bool *did_zero = data;
> loff_t written = 0;
> int status;
>
> /* already zeroed? we're done. */
> - if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> + if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> return count;
>
> do {
> @@ -983,7 +987,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> if (IS_DAX(inode))
> status = iomap_dax_zero(pos, offset, bytes, iomap);
> else
> - status = iomap_zero(inode, pos, offset, bytes, iomap);
> + status = iomap_zero(inode, pos, offset, bytes, iomap,
> + srcmap);
> if (status < 0)
> return status;
>
> @@ -1033,7 +1038,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
>
> static loff_t
> iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> struct page *page = data;
> int ret;
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 1fc28c2da279..e3ccbf7daaae 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>
> static loff_t
> iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> struct iomap_dio *dio = data;
>
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index f26fdd36e383..690ef2d7c6c8 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
>
> static loff_t
> iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap)
> + struct iomap *iomap, struct iomap *srcmap)
> {
> struct fiemap_ctx *ctx = data;
> loff_t ret = length;
> @@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
>
> static loff_t
> iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> sector_t *bno = data, addr;
>
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index c04bad4b2b43..89f61d93c0bc 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
>
> static loff_t
> iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> switch (iomap->type) {
> case IOMAP_UNWRITTEN:
> @@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
>
> static loff_t
> iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> switch (iomap->type) {
> case IOMAP_HOLE:
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index 152a230f668d..a648dbf6991e 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
> * distinction between written and unwritten extents.
> */
> static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
> - loff_t count, void *data, struct iomap *iomap)
> + loff_t count, void *data, struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct iomap_swapfile_info *isi = data;
> int error;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index c0a492353826..016adcd7dd66 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -928,7 +928,8 @@ xfs_file_iomap_begin(
> loff_t offset,
> loff_t length,
> unsigned flags,
> - struct iomap *iomap)
> + struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> @@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
> loff_t offset,
> loff_t length,
> unsigned flags,
> - struct iomap *iomap)
> + struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> @@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
> loff_t offset,
> loff_t length,
> unsigned flags,
> - struct iomap *iomap)
> + struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 24c784e44274..37af5f9dc722 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -127,7 +127,8 @@ struct iomap_ops {
> * The actual length is returned in iomap->length.
> */
> int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
> - unsigned flags, struct iomap *iomap);
> + unsigned flags, struct iomap *iomap,
> + struct iomap *srcmap);
>
> /*
> * Commit and/or unreserve space previous allocated using iomap_begin.
> @@ -143,7 +144,7 @@ struct iomap_ops {
> * Main iomap iterator function.
> */
> typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
> - void *data, struct iomap *iomap);
> + void *data, struct iomap *iomap, struct iomap *srcmap);
>
> loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
> unsigned flags, const struct iomap_ops *ops, void *data,
> --
> 2.20.1
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
2019-10-08 15:00 ` Darrick J. Wong
@ 2019-10-09 6:28 ` Christoph Hellwig
2019-10-09 17:16 ` Darrick J. Wong
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-09 6:28 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2019 at 08:00:44AM -0700, Darrick J. Wong wrote:
> > unsigned long vaddr = vmf->address;
> > loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
> > struct iomap iomap = { 0 };
>
> Does this definition ^^^^^ need to be converted too? You convert the
> one in iomap_apply()...
Doesn't strictly need to, but it sure would look nicer and fit the theme.
> /*
> * The @iomap and @srcmap parameters should be set to a hole
> * prior to calling ->iomap_begin.
> */
> #define IOMAP_EMPTY_RECORD { .type = IOMAP_HOLE }
>
> ...and later...
>
> struct iomap srcmap = IOMAP_EMPTY_RECORD;
>
> ..but meh, I'm not sure that adds much.
I don't really see the point.
> > unsigned flags = IOMAP_FAULT;
> > int error, major = 0;
> > bool write = vmf->flags & FAULT_FLAG_WRITE;
> > @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> > * the file system block size to be equal the page size, which means
> > * that we never have to deal with more than a single extent here.
> > */
> > - error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> > + error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
>
> ->iomap_begin callers are never supposed to touch srcmap, right?
> Maybe we ought to check that srcmap.io_type == HOLE, at least until
> someone fixes this code to dax-copy the data from srcmap to iomap?
What do you mean by touch? ->iomap_begin fills it out and then the
caller looks at it, at least for places that can deal with
read-modify-write operations (DAX currently can't).
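The protocol being described can be modeled in a few lines of userspace C (a simplified sketch with stand-in types, not the kernel code itself — `example_iomap_begin` and `select_read_map` are hypothetical names): the caller seeds srcmap as a hole, ->iomap_begin may fill it in for a COW write, and the apply loop hands the actor srcmap only when the file system actually set it.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel's iomap types. */
enum iomap_type { IOMAP_HOLE, IOMAP_MAPPED };

struct iomap {
	enum iomap_type type;
	long long offset;
	long long length;
};

/* Hypothetical ->iomap_begin: fills srcmap only for a COW write. */
static void example_iomap_begin(int is_cow, struct iomap *iomap,
				struct iomap *srcmap)
{
	iomap->type = IOMAP_MAPPED;
	iomap->offset = 0;
	iomap->length = 4096;
	if (is_cow) {
		/* read source differs from the write target */
		srcmap->type = IOMAP_MAPPED;
		srcmap->offset = 0;
		srcmap->length = 2048;
	}
	/* otherwise srcmap stays the hole the caller initialized */
}

/* Mirrors the selection iomap_apply() does before calling the actor. */
static const struct iomap *select_read_map(const struct iomap *iomap,
					   const struct iomap *srcmap)
{
	return srcmap->type != IOMAP_HOLE ? srcmap : iomap;
}
```

This is why actors that don't care about COW never need special handling: when no srcmap was filled in, they simply receive the same mapping twice.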
* Re: [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
2019-10-09 6:28 ` Christoph Hellwig
@ 2019-10-09 17:16 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-09 17:16 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
On Wed, Oct 09, 2019 at 08:28:24AM +0200, Christoph Hellwig wrote:
> On Tue, Oct 08, 2019 at 08:00:44AM -0700, Darrick J. Wong wrote:
> > > unsigned long vaddr = vmf->address;
> > > loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
> > > struct iomap iomap = { 0 };
> >
> > Does this definition ^^^^^ need to be converted too? You convert the
> > one in iomap_apply()...
>
> Doesn't strictly need to, but it sure would look nicer and fit the theme.
>
> > /*
> > * The @iomap and @srcmap parameters should be set to a hole
> > * prior to calling ->iomap_begin.
> > */
> > #define IOMAP_EMPTY_RECORD { .type = IOMAP_HOLE }
> >
> > ...and later...
> >
> > struct iomap srcmap = IOMAP_EMPTY_RECORD;
> >
> > ..but meh, I'm not sure that adds much.
>
> I don't really see the point.
Yeah. Agreed.
> > > unsigned flags = IOMAP_FAULT;
> > > int error, major = 0;
> > > bool write = vmf->flags & FAULT_FLAG_WRITE;
> > > @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> > > * the file system block size to be equal the page size, which means
> > > * that we never have to deal with more than a single extent here.
> > > */
> > > - error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> > > + error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
> >
> > ->iomap_begin callers are never supposed to touch srcmap, right?
> > Maybe we ought to check that srcmap.io_type == HOLE, at least until
> > someone fixes this code to dax-copy the data from srcmap to iomap?
>
> What do you mean by touch? ->iomap_begin fills it out and then the
> caller looks at it, at least for places that can deal with
> read-modify-write operations (DAX currently can't).
Yes, I grok that the DAX code should never get fed a shared mapping, but
maybe we ought to have a WARN_ON_ONCE just in case some filesystem AI
programmer decides to backport a fs patch that results in sending a
non-hole srcmap back to the dax iomap callers. /We/ know that you
should never do this, but does the AI know? <grumble>
(Yeah, pure paranoia on my part :P)
--D
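The guard being floated here can be sketched as follows (a userspace model of the hypothetical check, not code from the series — `dax_check_srcmap` is an invented name): DAX cannot do read-modify-write from a separate source mapping, so a filesystem handing it a non-hole srcmap would be a bug. In the kernel this would be a WARN_ON_ONCE plus an error return; the model just returns -EIO.

```c
#include <assert.h>
#include <errno.h>

enum iomap_type { IOMAP_HOLE, IOMAP_MAPPED };

struct iomap {
	enum iomap_type type;
};

/*
 * Model of the proposed sanity check: reject any ->iomap_begin result
 * that filled in a source mapping, since the DAX fault paths have no
 * code to copy data from srcmap to the write target.
 */
static int dax_check_srcmap(const struct iomap *srcmap)
{
	if (srcmap->type != IOMAP_HOLE)
		return -EIO;
	return 0;
}
```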
* Re: [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
2019-10-08 7:15 ` [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
2019-10-08 15:00 ` Darrick J. Wong
@ 2019-10-14 23:27 ` Darrick J. Wong
2019-10-15 13:00 ` Goldwyn Rodrigues
1 sibling, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-14 23:27 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2019 at 09:15:15AM +0200, Christoph Hellwig wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> The srcmap is used to identify where the read is to be performed from.
> It is passed to ->iomap_begin, which can fill it in if we need to read
> data for partially written blocks from a different location than the
> write target. The srcmap is only supported for buffered writes so far.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Goldwyn,
Since we've reworked your original patch quite extensively, could you
please have a look at (and if you approve, add an Acked-by) this new(er)
version so we can get this series moving for 5.5?
--D
> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
> srcmap if not set, adjust length down to srcmap end as well]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/dax.c | 9 ++++--
> fs/ext2/inode.c | 2 +-
> fs/ext4/inode.c | 2 +-
> fs/gfs2/bmap.c | 3 +-
> fs/iomap/apply.c | 25 ++++++++++++----
> fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-------------------
> fs/iomap/direct-io.c | 2 +-
> fs/iomap/fiemap.c | 4 +--
> fs/iomap/seek.c | 4 +--
> fs/iomap/swapfile.c | 3 +-
> fs/xfs/xfs_iomap.c | 9 ++++--
> include/linux/iomap.h | 5 ++--
> 12 files changed, 80 insertions(+), 53 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 6bf81f931de3..920105457c2c 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
>
> static loff_t
> dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap)
> + struct iomap *iomap, struct iomap *srcmap)
> {
> struct block_device *bdev = iomap->bdev;
> struct dax_device *dax_dev = iomap->dax_dev;
> @@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> unsigned long vaddr = vmf->address;
> loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
> struct iomap iomap = { 0 };
> + struct iomap srcmap = { .type = IOMAP_HOLE };
> unsigned flags = IOMAP_FAULT;
> int error, major = 0;
> bool write = vmf->flags & FAULT_FLAG_WRITE;
> @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> * the file system block size to be equal the page size, which means
> * that we never have to deal with more than a single extent here.
> */
> - error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> + error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
> if (iomap_errp)
> *iomap_errp = error;
> if (error) {
> @@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> struct inode *inode = mapping->host;
> vm_fault_t result = VM_FAULT_FALLBACK;
> struct iomap iomap = { 0 };
> + struct iomap srcmap = { .type = IOMAP_HOLE };
> pgoff_t max_pgoff;
> void *entry;
> loff_t pos;
> @@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> * to look up our filesystem block.
> */
> pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> - error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
> + error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
> + &srcmap);
> if (error)
> goto unlock_entry;
>
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 7004ce581a32..467c13ff6b40 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
>
> #ifdef CONFIG_FS_DAX
> static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> - unsigned flags, struct iomap *iomap)
> + unsigned flags, struct iomap *iomap, struct iomap *srcmap)
> {
> unsigned int blkbits = inode->i_blkbits;
> unsigned long first_block = offset >> blkbits;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 516faa280ced..abaaf7d96ca4 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3407,7 +3407,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
> }
>
> static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> - unsigned flags, struct iomap *iomap)
> + unsigned flags, struct iomap *iomap, struct iomap *srcmap)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> unsigned int blkbits = inode->i_blkbits;
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index f63df54a08c6..516103248272 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1149,7 +1149,8 @@ static inline bool gfs2_iomap_need_write_lock(unsigned flags)
> }
>
> static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
> - unsigned flags, struct iomap *iomap)
> + unsigned flags, struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct gfs2_inode *ip = GFS2_I(inode);
> struct metapath mp = { .mp_aheight = 1, };
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> index 54c02aecf3cd..484dd8eda861 100644
> --- a/fs/iomap/apply.c
> +++ b/fs/iomap/apply.c
> @@ -23,8 +23,10 @@ loff_t
> iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> const struct iomap_ops *ops, void *data, iomap_actor_t actor)
> {
> - struct iomap iomap = { 0 };
> + struct iomap iomap = { .type = IOMAP_HOLE };
> + struct iomap srcmap = { .type = IOMAP_HOLE };
> loff_t written = 0, ret;
> + u64 end;
>
> /*
> * Need to map a range from start position for length bytes. This can
> @@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> * expose transient stale data. If the reserve fails, we can safely
> * back out at this point as there is nothing to undo.
> */
> - ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
> + ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
> if (ret)
> return ret;
> if (WARN_ON(iomap.offset > pos))
> @@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> * Cut down the length to the one actually provided by the filesystem,
> * as it might not be able to give us the whole size that we requested.
> */
> - if (iomap.offset + iomap.length < pos + length)
> - length = iomap.offset + iomap.length - pos;
> + end = iomap.offset + iomap.length;
> + if (srcmap.type != IOMAP_HOLE)
> + end = min(end, srcmap.offset + srcmap.length);
> + if (pos + length > end)
> + length = end - pos;
>
> /*
> - * Now that we have guaranteed that the space allocation will succeed.
> + * Now that we have guaranteed that the space allocation will succeed,
> * we can do the copy-in page by page without having to worry about
> * failures exposing transient data.
> + *
> > + * To support COW operations, we read in data for partial blocks from
> > + * the srcmap if the file system filled it in. In that case the
> > + * length needs to be limited to the earlier of the ends of the iomaps.
> + * If the file system did not provide a srcmap we pass in the normal
> + * iomap into the actors so that they don't need to have special
> + * handling for the two cases.
> */
> - written = actor(inode, pos, length, data, &iomap);
> + written = actor(inode, pos, length, data, &iomap,
> + srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
>
> /*
> * Now the data has been copied, commit the range we've copied. This
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ac1bbed71a9b..eb2c6d73a837 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -234,7 +234,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
>
> static loff_t
> iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap)
> + struct iomap *iomap, struct iomap *srcmap)
> {
> struct iomap_readpage_ctx *ctx = data;
> struct page *page = ctx->cur_page;
> @@ -382,7 +382,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
>
> static loff_t
> iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> struct iomap_readpage_ctx *ctx = data;
> loff_t done, ret;
> @@ -402,7 +402,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> ctx->cur_page_in_bio = false;
> }
> ret = iomap_readpage_actor(inode, pos + done, length - done,
> - ctx, iomap);
> + ctx, iomap, srcmap);
> }
>
> return done;
> @@ -582,7 +582,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>
> static int
> __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> - struct page *page, struct iomap *iomap)
> + struct page *page, struct iomap *srcmap)
> {
> struct iomap_page *iop = iomap_page_create(inode, page);
> loff_t block_size = i_blocksize(inode);
> @@ -605,7 +605,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> (to <= poff || to >= poff + plen))
> continue;
>
> - if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> + if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
> if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
> return -EIO;
> zero_user_segments(page, poff, from, to, poff + plen);
> @@ -614,7 +614,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> }
>
> status = iomap_read_page_sync(block_start, page, poff, plen,
> - iomap);
> + srcmap);
> if (status)
> return status;
> } while ((block_start += plen) < block_end);
> @@ -624,13 +624,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>
> static int
> iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> - struct page **pagep, struct iomap *iomap)
> + struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> {
> const struct iomap_page_ops *page_ops = iomap->page_ops;
> struct page *page;
> int status = 0;
>
> BUG_ON(pos + len > iomap->offset + iomap->length);
> + if (srcmap != iomap)
> + BUG_ON(pos + len > srcmap->offset + srcmap->length);
>
> if (fatal_signal_pending(current))
> return -EINTR;
> @@ -648,13 +650,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> goto out_no_page;
> }
>
> - if (iomap->type == IOMAP_INLINE)
> - iomap_read_inline_data(inode, page, iomap);
> + if (srcmap->type == IOMAP_INLINE)
> + iomap_read_inline_data(inode, page, srcmap);
> else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> - status = __block_write_begin_int(page, pos, len, NULL, iomap);
> + status = __block_write_begin_int(page, pos, len, NULL, srcmap);
> else
> status = __iomap_write_begin(inode, pos, len, flags, page,
> - iomap);
> + srcmap);
>
> if (unlikely(status))
> goto out_unlock;
> @@ -740,16 +742,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
> }
>
> static int
> -iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> - unsigned copied, struct page *page, struct iomap *iomap)
> +iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
> + struct page *page, struct iomap *iomap, struct iomap *srcmap)
> {
> const struct iomap_page_ops *page_ops = iomap->page_ops;
> loff_t old_size = inode->i_size;
> int ret;
>
> - if (iomap->type == IOMAP_INLINE) {
> + if (srcmap->type == IOMAP_INLINE) {
> ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> - } else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> + } else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
> ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
> page, NULL);
> } else {
> @@ -780,7 +782,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>
> static loff_t
> iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap)
> + struct iomap *iomap, struct iomap *srcmap)
> {
> struct iov_iter *i = data;
> long status = 0;
> @@ -814,7 +816,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> break;
> }
>
> - status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> + status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
> + srcmap);
> if (unlikely(status))
> break;
>
> @@ -825,8 +828,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>
> flush_dcache_page(page);
>
> - status = iomap_write_end(inode, pos, bytes, copied, page,
> - iomap);
> + status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
> + srcmap);
> if (unlikely(status < 0))
> break;
> copied = status;
> @@ -879,7 +882,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>
> static loff_t
> iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap)
> + struct iomap *iomap, struct iomap *srcmap)
> {
> long status = 0;
> ssize_t written = 0;
> @@ -888,7 +891,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> if (!(iomap->flags & IOMAP_F_SHARED))
> return length;
> /* don't bother with holes or unwritten extents */
> - if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> + if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> return length;
>
> do {
> @@ -897,11 +900,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> struct page *page;
>
> status = iomap_write_begin(inode, pos, bytes,
> - IOMAP_WRITE_F_UNSHARE, &page, iomap);
> + IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
> if (unlikely(status))
> return status;
>
> - status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> + status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> + srcmap);
> if (unlikely(status <= 0)) {
> if (WARN_ON_ONCE(status == 0))
> return -EIO;
> @@ -940,19 +944,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
> EXPORT_SYMBOL_GPL(iomap_file_unshare);
>
> static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> - unsigned bytes, struct iomap *iomap)
> + unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
> {
> struct page *page;
> int status;
>
> - status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> + status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
> if (status)
> return status;
>
> zero_user(page, offset, bytes);
> mark_page_accessed(page);
>
> - return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> + return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
> }
>
> static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> @@ -964,14 +968,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
>
> static loff_t
> iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> bool *did_zero = data;
> loff_t written = 0;
> int status;
>
> /* already zeroed? we're done. */
> - if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> + if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> return count;
>
> do {
> @@ -983,7 +987,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> if (IS_DAX(inode))
> status = iomap_dax_zero(pos, offset, bytes, iomap);
> else
> - status = iomap_zero(inode, pos, offset, bytes, iomap);
> + status = iomap_zero(inode, pos, offset, bytes, iomap,
> + srcmap);
> if (status < 0)
> return status;
>
> @@ -1033,7 +1038,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
>
> static loff_t
> iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> struct page *page = data;
> int ret;
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 1fc28c2da279..e3ccbf7daaae 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>
> static loff_t
> iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> struct iomap_dio *dio = data;
>
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index f26fdd36e383..690ef2d7c6c8 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
>
> static loff_t
> iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap)
> + struct iomap *iomap, struct iomap *srcmap)
> {
> struct fiemap_ctx *ctx = data;
> loff_t ret = length;
> @@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
>
> static loff_t
> iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> sector_t *bno = data, addr;
>
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index c04bad4b2b43..89f61d93c0bc 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
>
> static loff_t
> iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> switch (iomap->type) {
> case IOMAP_UNWRITTEN:
> @@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
>
> static loff_t
> iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
> - void *data, struct iomap *iomap)
> + void *data, struct iomap *iomap, struct iomap *srcmap)
> {
> switch (iomap->type) {
> case IOMAP_HOLE:
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index 152a230f668d..a648dbf6991e 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
> * distinction between written and unwritten extents.
> */
> static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
> - loff_t count, void *data, struct iomap *iomap)
> + loff_t count, void *data, struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct iomap_swapfile_info *isi = data;
> int error;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index c0a492353826..016adcd7dd66 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -928,7 +928,8 @@ xfs_file_iomap_begin(
> loff_t offset,
> loff_t length,
> unsigned flags,
> - struct iomap *iomap)
> + struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> @@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
> loff_t offset,
> loff_t length,
> unsigned flags,
> - struct iomap *iomap)
> + struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> @@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
> loff_t offset,
> loff_t length,
> unsigned flags,
> - struct iomap *iomap)
> + struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 24c784e44274..37af5f9dc722 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -127,7 +127,8 @@ struct iomap_ops {
> * The actual length is returned in iomap->length.
> */
> int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
> - unsigned flags, struct iomap *iomap);
> + unsigned flags, struct iomap *iomap,
> + struct iomap *srcmap);
>
> /*
> * Commit and/or unreserve space previous allocated using iomap_begin.
> @@ -143,7 +144,7 @@ struct iomap_ops {
> * Main iomap iterator function.
> */
> typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
> - void *data, struct iomap *iomap);
> + void *data, struct iomap *iomap, struct iomap *srcmap);
>
> loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
> unsigned flags, const struct iomap_ops *ops, void *data,
> --
> 2.20.1
>
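The length clamping this patch adds to iomap_apply() — trimming the request to the shorter of the two mappings when a srcmap is present — can be modeled like so (a simplified sketch with userspace stand-in types, not the kernel code itself):

```c
#include <assert.h>

enum iomap_type { IOMAP_HOLE, IOMAP_MAPPED };

struct iomap {
	enum iomap_type type;
	long long offset;
	long long length;
};

static long long min_ll(long long a, long long b)
{
	return a < b ? a : b;
}

/*
 * Mirrors the hunk in fs/iomap/apply.c: cut the requested length down
 * to the end of iomap, and also to the end of srcmap if the file
 * system filled one in.
 */
static long long clamp_length(long long pos, long long length,
			      const struct iomap *iomap,
			      const struct iomap *srcmap)
{
	long long end = iomap->offset + iomap->length;

	if (srcmap->type != IOMAP_HOLE)
		end = min_ll(end, srcmap->offset + srcmap->length);
	if (pos + length > end)
		length = end - pos;
	return length;
}
```

Without the srcmap clamp, a write could extend past the end of the source extent and the read-modify-write step would pull in data from the wrong mapping.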
* Re: [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
2019-10-14 23:27 ` Darrick J. Wong
@ 2019-10-15 13:00 ` Goldwyn Rodrigues
0 siblings, 0 replies; 32+ messages in thread
From: Goldwyn Rodrigues @ 2019-10-15 13:00 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs
On 16:27 14/10, Darrick J. Wong wrote:
> On Tue, Oct 08, 2019 at 09:15:15AM +0200, Christoph Hellwig wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >
> > The srcmap is used to identify where the read is to be performed from.
> > It is passed to ->iomap_begin, which can fill it in if we need to read
> > data for partially written blocks from a different location than the
> > write target. The srcmap is only supported for buffered writes so far.
> >
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> Goldwyn,
>
> Since we've reworked your original patch quite extensively, could you
> please have a look at (and if you approve, add an Acked-by) this new(er)
> version so we can get this series moving for 5.5?
>
>
Sorry, was on vacation until yesterday.
Yes, I have used this version in my series and I can confirm it works
correctly with btrfs. Thanks!
> > [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
> > srcmap if not set, adjust length down to srcmap end as well]
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> > fs/dax.c | 9 ++++--
> > fs/ext2/inode.c | 2 +-
> > fs/ext4/inode.c | 2 +-
> > fs/gfs2/bmap.c | 3 +-
> > fs/iomap/apply.c | 25 ++++++++++++----
> > fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-------------------
> > fs/iomap/direct-io.c | 2 +-
> > fs/iomap/fiemap.c | 4 +--
> > fs/iomap/seek.c | 4 +--
> > fs/iomap/swapfile.c | 3 +-
> > fs/xfs/xfs_iomap.c | 9 ++++--
> > include/linux/iomap.h | 5 ++--
> > 12 files changed, 80 insertions(+), 53 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 6bf81f931de3..920105457c2c 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
> >
> > static loff_t
> > dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > - struct iomap *iomap)
> > + struct iomap *iomap, struct iomap *srcmap)
> > {
> > struct block_device *bdev = iomap->bdev;
> > struct dax_device *dax_dev = iomap->dax_dev;
> > @@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> > unsigned long vaddr = vmf->address;
> > loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
> > struct iomap iomap = { 0 };
> > + struct iomap srcmap = { .type = IOMAP_HOLE };
> > unsigned flags = IOMAP_FAULT;
> > int error, major = 0;
> > bool write = vmf->flags & FAULT_FLAG_WRITE;
> > @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> > * the file system block size to be equal the page size, which means
> > * that we never have to deal with more than a single extent here.
> > */
> > - error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> > + error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
> > if (iomap_errp)
> > *iomap_errp = error;
> > if (error) {
> > @@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> > struct inode *inode = mapping->host;
> > vm_fault_t result = VM_FAULT_FALLBACK;
> > struct iomap iomap = { 0 };
> > + struct iomap srcmap = { .type = IOMAP_HOLE };
> > pgoff_t max_pgoff;
> > void *entry;
> > loff_t pos;
> > @@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> > * to look up our filesystem block.
> > */
> > pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> > - error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
> > + error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
> > + &srcmap);
> > if (error)
> > goto unlock_entry;
> >
> > diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> > index 7004ce581a32..467c13ff6b40 100644
> > --- a/fs/ext2/inode.c
> > +++ b/fs/ext2/inode.c
> > @@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
> >
> > #ifdef CONFIG_FS_DAX
> > static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> > - unsigned flags, struct iomap *iomap)
> > + unsigned flags, struct iomap *iomap, struct iomap *srcmap)
> > {
> > unsigned int blkbits = inode->i_blkbits;
> > unsigned long first_block = offset >> blkbits;
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 516faa280ced..abaaf7d96ca4 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -3407,7 +3407,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
> > }
> >
> > static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> > - unsigned flags, struct iomap *iomap)
> > + unsigned flags, struct iomap *iomap, struct iomap *srcmap)
> > {
> > struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> > unsigned int blkbits = inode->i_blkbits;
> > diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> > index f63df54a08c6..516103248272 100644
> > --- a/fs/gfs2/bmap.c
> > +++ b/fs/gfs2/bmap.c
> > @@ -1149,7 +1149,8 @@ static inline bool gfs2_iomap_need_write_lock(unsigned flags)
> > }
> >
> > static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
> > - unsigned flags, struct iomap *iomap)
> > + unsigned flags, struct iomap *iomap,
> > + struct iomap *srcmap)
> > {
> > struct gfs2_inode *ip = GFS2_I(inode);
> > struct metapath mp = { .mp_aheight = 1, };
> > diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> > index 54c02aecf3cd..484dd8eda861 100644
> > --- a/fs/iomap/apply.c
> > +++ b/fs/iomap/apply.c
> > @@ -23,8 +23,10 @@ loff_t
> > iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> > const struct iomap_ops *ops, void *data, iomap_actor_t actor)
> > {
> > - struct iomap iomap = { 0 };
> > + struct iomap iomap = { .type = IOMAP_HOLE };
> > + struct iomap srcmap = { .type = IOMAP_HOLE };
> > loff_t written = 0, ret;
> > + u64 end;
> >
> > /*
> > * Need to map a range from start position for length bytes. This can
> > @@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> > * expose transient stale data. If the reserve fails, we can safely
> > * back out at this point as there is nothing to undo.
> > */
> > - ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
> > + ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
> > if (ret)
> > return ret;
> > if (WARN_ON(iomap.offset > pos))
> > @@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> > * Cut down the length to the one actually provided by the filesystem,
> > * as it might not be able to give us the whole size that we requested.
> > */
> > - if (iomap.offset + iomap.length < pos + length)
> > - length = iomap.offset + iomap.length - pos;
> > + end = iomap.offset + iomap.length;
> > + if (srcmap.type != IOMAP_HOLE)
> > + end = min(end, srcmap.offset + srcmap.length);
> > + if (pos + length > end)
> > + length = end - pos;
> >
> > /*
> > - * Now that we have guaranteed that the space allocation will succeed.
> > + * Now that we have guaranteed that the space allocation will succeed,
> > * we can do the copy-in page by page without having to worry about
> > * failures exposing transient data.
> > + *
> > + * To support COW operations, we read in data for partial blocks from
> > + * the srcmap if the file system filled it in. In that case the length
> > + * needs to be limited to the earlier of the ends of the two iomaps.
> > + * If the file system did not provide a srcmap, we pass the normal
> > + * iomap to the actors so that they don't need special handling for
> > + * the two cases.
> > */
> > - written = actor(inode, pos, length, data, &iomap);
> > + written = actor(inode, pos, length, data, &iomap,
> > + srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
> >
> > /*
> > * Now the data has been copied, commit the range we've copied. This
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index ac1bbed71a9b..eb2c6d73a837 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -234,7 +234,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
> >
> > static loff_t
> > iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > - struct iomap *iomap)
> > + struct iomap *iomap, struct iomap *srcmap)
> > {
> > struct iomap_readpage_ctx *ctx = data;
> > struct page *page = ctx->cur_page;
> > @@ -382,7 +382,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
> >
> > static loff_t
> > iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> > - void *data, struct iomap *iomap)
> > + void *data, struct iomap *iomap, struct iomap *srcmap)
> > {
> > struct iomap_readpage_ctx *ctx = data;
> > loff_t done, ret;
> > @@ -402,7 +402,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> > ctx->cur_page_in_bio = false;
> > }
> > ret = iomap_readpage_actor(inode, pos + done, length - done,
> > - ctx, iomap);
> > + ctx, iomap, srcmap);
> > }
> >
> > return done;
> > @@ -582,7 +582,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
> >
> > static int
> > __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> > - struct page *page, struct iomap *iomap)
> > + struct page *page, struct iomap *srcmap)
> > {
> > struct iomap_page *iop = iomap_page_create(inode, page);
> > loff_t block_size = i_blocksize(inode);
> > @@ -605,7 +605,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> > (to <= poff || to >= poff + plen))
> > continue;
> >
> > - if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> > + if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
> > if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
> > return -EIO;
> > zero_user_segments(page, poff, from, to, poff + plen);
> > @@ -614,7 +614,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> > }
> >
> > status = iomap_read_page_sync(block_start, page, poff, plen,
> > - iomap);
> > + srcmap);
> > if (status)
> > return status;
> > } while ((block_start += plen) < block_end);
> > @@ -624,13 +624,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> >
> > static int
> > iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> > - struct page **pagep, struct iomap *iomap)
> > + struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> > {
> > const struct iomap_page_ops *page_ops = iomap->page_ops;
> > struct page *page;
> > int status = 0;
> >
> > BUG_ON(pos + len > iomap->offset + iomap->length);
> > + if (srcmap != iomap)
> > + BUG_ON(pos + len > srcmap->offset + srcmap->length);
> >
> > if (fatal_signal_pending(current))
> > return -EINTR;
> > @@ -648,13 +650,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> > goto out_no_page;
> > }
> >
> > - if (iomap->type == IOMAP_INLINE)
> > - iomap_read_inline_data(inode, page, iomap);
> > + if (srcmap->type == IOMAP_INLINE)
> > + iomap_read_inline_data(inode, page, srcmap);
> > else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> > - status = __block_write_begin_int(page, pos, len, NULL, iomap);
> > + status = __block_write_begin_int(page, pos, len, NULL, srcmap);
> > else
> > status = __iomap_write_begin(inode, pos, len, flags, page,
> > - iomap);
> > + srcmap);
> >
> > if (unlikely(status))
> > goto out_unlock;
> > @@ -740,16 +742,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
> > }
> >
> > static int
> > -iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> > - unsigned copied, struct page *page, struct iomap *iomap)
> > +iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
> > + struct page *page, struct iomap *iomap, struct iomap *srcmap)
> > {
> > const struct iomap_page_ops *page_ops = iomap->page_ops;
> > loff_t old_size = inode->i_size;
> > int ret;
> >
> > - if (iomap->type == IOMAP_INLINE) {
> > + if (srcmap->type == IOMAP_INLINE) {
> > ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> > - } else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> > + } else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
> > ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
> > page, NULL);
> > } else {
> > @@ -780,7 +782,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> >
> > static loff_t
> > iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > - struct iomap *iomap)
> > + struct iomap *iomap, struct iomap *srcmap)
> > {
> > struct iov_iter *i = data;
> > long status = 0;
> > @@ -814,7 +816,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > break;
> > }
> >
> > - status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> > + status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
> > + srcmap);
> > if (unlikely(status))
> > break;
> >
> > @@ -825,8 +828,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >
> > flush_dcache_page(page);
> >
> > - status = iomap_write_end(inode, pos, bytes, copied, page,
> > - iomap);
> > + status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
> > + srcmap);
> > if (unlikely(status < 0))
> > break;
> > copied = status;
> > @@ -879,7 +882,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
> >
> > static loff_t
> > iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > - struct iomap *iomap)
> > + struct iomap *iomap, struct iomap *srcmap)
> > {
> > long status = 0;
> > ssize_t written = 0;
> > @@ -888,7 +891,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > if (!(iomap->flags & IOMAP_F_SHARED))
> > return length;
> > /* don't bother with holes or unwritten extents */
> > - if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> > + if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> > return length;
> >
> > do {
> > @@ -897,11 +900,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > struct page *page;
> >
> > status = iomap_write_begin(inode, pos, bytes,
> > - IOMAP_WRITE_F_UNSHARE, &page, iomap);
> > + IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
> > if (unlikely(status))
> > return status;
> >
> > - status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> > + status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> > + srcmap);
> > if (unlikely(status <= 0)) {
> > if (WARN_ON_ONCE(status == 0))
> > return -EIO;
> > @@ -940,19 +944,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
> > EXPORT_SYMBOL_GPL(iomap_file_unshare);
> >
> > static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> > - unsigned bytes, struct iomap *iomap)
> > + unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
> > {
> > struct page *page;
> > int status;
> >
> > - status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> > + status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
> > if (status)
> > return status;
> >
> > zero_user(page, offset, bytes);
> > mark_page_accessed(page);
> >
> > - return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> > + return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
> > }
> >
> > static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> > @@ -964,14 +968,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> >
> > static loff_t
> > iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> > - void *data, struct iomap *iomap)
> > + void *data, struct iomap *iomap, struct iomap *srcmap)
> > {
> > bool *did_zero = data;
> > loff_t written = 0;
> > int status;
> >
> > /* already zeroed? we're done. */
> > - if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> > + if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> > return count;
> >
> > do {
> > @@ -983,7 +987,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> > if (IS_DAX(inode))
> > status = iomap_dax_zero(pos, offset, bytes, iomap);
> > else
> > - status = iomap_zero(inode, pos, offset, bytes, iomap);
> > + status = iomap_zero(inode, pos, offset, bytes, iomap,
> > + srcmap);
> > if (status < 0)
> > return status;
> >
> > @@ -1033,7 +1038,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
> >
> > static loff_t
> > iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
> > - void *data, struct iomap *iomap)
> > + void *data, struct iomap *iomap, struct iomap *srcmap)
> > {
> > struct page *page = data;
> > int ret;
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 1fc28c2da279..e3ccbf7daaae 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
> >
> > static loff_t
> > iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
> > - void *data, struct iomap *iomap)
> > + void *data, struct iomap *iomap, struct iomap *srcmap)
> > {
> > struct iomap_dio *dio = data;
> >
> > diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> > index f26fdd36e383..690ef2d7c6c8 100644
> > --- a/fs/iomap/fiemap.c
> > +++ b/fs/iomap/fiemap.c
> > @@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
> >
> > static loff_t
> > iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > - struct iomap *iomap)
> > + struct iomap *iomap, struct iomap *srcmap)
> > {
> > struct fiemap_ctx *ctx = data;
> > loff_t ret = length;
> > @@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
> >
> > static loff_t
> > iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> > - void *data, struct iomap *iomap)
> > + void *data, struct iomap *iomap, struct iomap *srcmap)
> > {
> > sector_t *bno = data, addr;
> >
> > diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> > index c04bad4b2b43..89f61d93c0bc 100644
> > --- a/fs/iomap/seek.c
> > +++ b/fs/iomap/seek.c
> > @@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
> >
> > static loff_t
> > iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
> > - void *data, struct iomap *iomap)
> > + void *data, struct iomap *iomap, struct iomap *srcmap)
> > {
> > switch (iomap->type) {
> > case IOMAP_UNWRITTEN:
> > @@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
> >
> > static loff_t
> > iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
> > - void *data, struct iomap *iomap)
> > + void *data, struct iomap *iomap, struct iomap *srcmap)
> > {
> > switch (iomap->type) {
> > case IOMAP_HOLE:
> > diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> > index 152a230f668d..a648dbf6991e 100644
> > --- a/fs/iomap/swapfile.c
> > +++ b/fs/iomap/swapfile.c
> > @@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
> > * distinction between written and unwritten extents.
> > */
> > static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
> > - loff_t count, void *data, struct iomap *iomap)
> > + loff_t count, void *data, struct iomap *iomap,
> > + struct iomap *srcmap)
> > {
> > struct iomap_swapfile_info *isi = data;
> > int error;
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index c0a492353826..016adcd7dd66 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -928,7 +928,8 @@ xfs_file_iomap_begin(
> > loff_t offset,
> > loff_t length,
> > unsigned flags,
> > - struct iomap *iomap)
> > + struct iomap *iomap,
> > + struct iomap *srcmap)
> > {
> > struct xfs_inode *ip = XFS_I(inode);
> > struct xfs_mount *mp = ip->i_mount;
> > @@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
> > loff_t offset,
> > loff_t length,
> > unsigned flags,
> > - struct iomap *iomap)
> > + struct iomap *iomap,
> > + struct iomap *srcmap)
> > {
> > struct xfs_inode *ip = XFS_I(inode);
> > struct xfs_mount *mp = ip->i_mount;
> > @@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
> > loff_t offset,
> > loff_t length,
> > unsigned flags,
> > - struct iomap *iomap)
> > + struct iomap *iomap,
> > + struct iomap *srcmap)
> > {
> > struct xfs_inode *ip = XFS_I(inode);
> > struct xfs_mount *mp = ip->i_mount;
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index 24c784e44274..37af5f9dc722 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -127,7 +127,8 @@ struct iomap_ops {
> > * The actual length is returned in iomap->length.
> > */
> > int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
> > - unsigned flags, struct iomap *iomap);
> > + unsigned flags, struct iomap *iomap,
> > + struct iomap *srcmap);
> >
> > /*
> > * Commit and/or unreserve space previously allocated using iomap_begin.
> > @@ -143,7 +144,7 @@ struct iomap_ops {
> > * Main iomap iterator function.
> > */
> > typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
> > - void *data, struct iomap *iomap);
> > + void *data, struct iomap *iomap, struct iomap *srcmap);
> >
> > loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
> > unsigned flags, const struct iomap_ops *ops, void *data,
> > --
> > 2.20.1
> >
>
--
Goldwyn
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 09/20] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (7 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
` (10 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong
There is no reason not to punch out stale delalloc blocks for zeroing
operations, as they otherwise behave exactly like normal writes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_iomap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 016adcd7dd66..7031547e25ad 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1138,7 +1138,8 @@ xfs_file_iomap_end(
unsigned flags,
struct iomap *iomap)
{
- if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
+ if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) &&
+ iomap->type == IOMAP_DELALLOC)
return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
length, written, iomap);
return 0;
--
2.20.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (8 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 09/20] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 15:20 ` Darrick J. Wong
2019-10-08 7:15 ` [PATCH 11/20] xfs: pass two imaps to xfs_reflink_allocate_cow Christoph Hellwig
` (9 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
Now that iomap_file_unshare is not completely dumb, we can call it
directly without iterating the extent and reflink btrees ourselves.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_reflink.c | 103 +++----------------------------------------
1 file changed, 5 insertions(+), 98 deletions(-)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index a9634110c783..7fc728a8852b 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1381,85 +1381,6 @@ xfs_reflink_remap_prep(
return ret;
}
-/*
- * The user wants to preemptively CoW all shared blocks in this file,
- * which enables us to turn off the reflink flag. Iterate all
- * extents which are not prealloc/delalloc to see which ranges are
- * mentioned in the refcount tree, then read those blocks into the
- * pagecache, dirty them, fsync them back out, and then we can update
- * the inode flag. What happens if we run out of memory? :)
- */
-STATIC int
-xfs_reflink_dirty_extents(
- struct xfs_inode *ip,
- xfs_fileoff_t fbno,
- xfs_filblks_t end,
- xfs_off_t isize)
-{
- struct xfs_mount *mp = ip->i_mount;
- xfs_agnumber_t agno;
- xfs_agblock_t agbno;
- xfs_extlen_t aglen;
- xfs_agblock_t rbno;
- xfs_extlen_t rlen;
- xfs_off_t fpos;
- xfs_off_t flen;
- struct xfs_bmbt_irec map[2];
- int nmaps;
- int error = 0;
-
- while (end - fbno > 0) {
- nmaps = 1;
- /*
- * Look for extents in the file. Skip holes, delalloc, or
- * unwritten extents; they can't be reflinked.
- */
- error = xfs_bmapi_read(ip, fbno, end - fbno, map, &nmaps, 0);
- if (error)
- goto out;
- if (nmaps == 0)
- break;
- if (!xfs_bmap_is_real_extent(&map[0]))
- goto next;
-
- map[1] = map[0];
- while (map[1].br_blockcount) {
- agno = XFS_FSB_TO_AGNO(mp, map[1].br_startblock);
- agbno = XFS_FSB_TO_AGBNO(mp, map[1].br_startblock);
- aglen = map[1].br_blockcount;
-
- error = xfs_reflink_find_shared(mp, NULL, agno, agbno,
- aglen, &rbno, &rlen, true);
- if (error)
- goto out;
- if (rbno == NULLAGBLOCK)
- break;
-
- /* Dirty the pages */
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- fpos = XFS_FSB_TO_B(mp, map[1].br_startoff +
- (rbno - agbno));
- flen = XFS_FSB_TO_B(mp, rlen);
- if (fpos + flen > isize)
- flen = isize - fpos;
- error = iomap_file_unshare(VFS_I(ip), fpos, flen,
- &xfs_iomap_ops);
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- if (error)
- goto out;
-
- map[1].br_blockcount -= (rbno - agbno + rlen);
- map[1].br_startoff += (rbno - agbno + rlen);
- map[1].br_startblock += (rbno - agbno + rlen);
- }
-
-next:
- fbno = map[0].br_startoff + map[0].br_blockcount;
- }
-out:
- return error;
-}
-
/* Does this inode need the reflink flag? */
int
xfs_reflink_inode_has_shared_extents(
@@ -1596,10 +1517,7 @@ xfs_reflink_unshare(
xfs_off_t offset,
xfs_off_t len)
{
- struct xfs_mount *mp = ip->i_mount;
- xfs_fileoff_t fbno;
- xfs_filblks_t end;
- xfs_off_t isize;
+ struct inode *inode = VFS_I(ip);
int error;
if (!xfs_is_reflink_inode(ip))
@@ -1607,20 +1525,12 @@ xfs_reflink_unshare(
trace_xfs_reflink_unshare(ip, offset, len);
- inode_dio_wait(VFS_I(ip));
+ inode_dio_wait(inode);
- /* Try to CoW the selected ranges */
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- fbno = XFS_B_TO_FSBT(mp, offset);
- isize = i_size_read(VFS_I(ip));
- end = XFS_B_TO_FSB(mp, offset + len);
- error = xfs_reflink_dirty_extents(ip, fbno, end, isize);
+ error = iomap_file_unshare(inode, offset, len, &xfs_iomap_ops);
if (error)
- goto out_unlock;
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
- /* Wait for the IO to finish */
- error = filemap_write_and_wait(VFS_I(ip)->i_mapping);
+ goto out;
+ error = filemap_write_and_wait(inode->i_mapping);
if (error)
goto out;
@@ -1628,11 +1538,8 @@ xfs_reflink_unshare(
error = xfs_reflink_try_clear_inode_flag(ip);
if (error)
goto out;
-
return 0;
-out_unlock:
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
out:
trace_xfs_reflink_unshare_error(ip, error, _RET_IP_);
return error;
--
2.20.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents
2019-10-08 7:15 ` [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
@ 2019-10-08 15:20 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:20 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2019 at 09:15:17AM +0200, Christoph Hellwig wrote:
> Now that iomap_file_unshare is not completely dumb, we can call it
> directly without iterating the extent and reflink btrees ourselves.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 11/20] xfs: pass two imaps to xfs_reflink_allocate_cow
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (9 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 12/20] xfs: refactor xfs_file_iomap_begin_delay Christoph Hellwig
` (8 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong
xfs_reflink_allocate_cow consumes the source data fork imap, and
potentially returns the COW fork imap. Split the arguments in two
to clear up the calling conventions and to prepare for returning
a source iomap from ->iomap_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_iomap.c | 8 ++++----
fs/xfs/xfs_reflink.c | 30 +++++++++++++++---------------
fs/xfs/xfs_reflink.h | 4 ++--
3 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7031547e25ad..44ae6de02a19 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -996,9 +996,8 @@ xfs_file_iomap_begin(
goto out_found;
/* may drop and re-acquire the ilock */
- cmap = imap;
- error = xfs_reflink_allocate_cow(ip, &cmap, &shared, &lockmode,
- directio);
+ error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
+ &lockmode, directio);
if (error)
goto out_unlock;
@@ -1011,7 +1010,8 @@ xfs_file_iomap_begin(
* newly allocated address. If the data fork has a hole, copy
* the COW fork mapping to avoid allocating to the data fork.
*/
- if (directio || imap.br_startblock == HOLESTARTBLOCK)
+ if (shared &&
+ (directio || imap.br_startblock == HOLESTARTBLOCK))
imap = cmap;
end_fsb = imap.br_startoff + imap.br_blockcount;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 7fc728a8852b..19a6e4644123 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -308,13 +308,13 @@ static int
xfs_find_trim_cow_extent(
struct xfs_inode *ip,
struct xfs_bmbt_irec *imap,
+ struct xfs_bmbt_irec *cmap,
bool *shared,
bool *found)
{
xfs_fileoff_t offset_fsb = imap->br_startoff;
xfs_filblks_t count_fsb = imap->br_blockcount;
struct xfs_iext_cursor icur;
- struct xfs_bmbt_irec got;
*found = false;
@@ -322,23 +322,22 @@ xfs_find_trim_cow_extent(
* If we don't find an overlapping extent, trim the range we need to
* allocate to fit the hole we found.
*/
- if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got))
- got.br_startoff = offset_fsb + count_fsb;
- if (got.br_startoff > offset_fsb) {
+ if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, cmap))
+ cmap->br_startoff = offset_fsb + count_fsb;
+ if (cmap->br_startoff > offset_fsb) {
xfs_trim_extent(imap, imap->br_startoff,
- got.br_startoff - imap->br_startoff);
+ cmap->br_startoff - imap->br_startoff);
return xfs_inode_need_cow(ip, imap, shared);
}
*shared = true;
- if (isnullstartblock(got.br_startblock)) {
- xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
+ if (isnullstartblock(cmap->br_startblock)) {
+ xfs_trim_extent(imap, cmap->br_startoff, cmap->br_blockcount);
return 0;
}
/* real extent found - no need to allocate */
- xfs_trim_extent(&got, offset_fsb, count_fsb);
- *imap = got;
+ xfs_trim_extent(cmap, offset_fsb, count_fsb);
*found = true;
return 0;
}
@@ -348,6 +347,7 @@ int
xfs_reflink_allocate_cow(
struct xfs_inode *ip,
struct xfs_bmbt_irec *imap,
+ struct xfs_bmbt_irec *cmap,
bool *shared,
uint *lockmode,
bool convert_now)
@@ -367,7 +367,7 @@ xfs_reflink_allocate_cow(
xfs_ifork_init_cow(ip);
}
- error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
+ error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
if (error || !*shared)
return error;
if (found)
@@ -392,7 +392,7 @@ xfs_reflink_allocate_cow(
/*
* Check for an overlapping extent again now that we dropped the ilock.
*/
- error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
+ error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
if (error || !*shared)
goto out_trans_cancel;
if (found) {
@@ -411,7 +411,7 @@ xfs_reflink_allocate_cow(
nimaps = 1;
error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
- resblks, imap, &nimaps);
+ resblks, cmap, &nimaps);
if (error)
goto out_unreserve;
@@ -427,15 +427,15 @@ xfs_reflink_allocate_cow(
if (nimaps == 0)
return -ENOSPC;
convert:
- xfs_trim_extent(imap, offset_fsb, count_fsb);
+ xfs_trim_extent(cmap, offset_fsb, count_fsb);
/*
* COW fork extents are supposed to remain unwritten until we're ready
* to initiate a disk write. For direct I/O we are going to write the
* data and need the conversion, but for buffered writes we're done.
*/
- if (!convert_now || imap->br_state == XFS_EXT_NORM)
+ if (!convert_now || cmap->br_state == XFS_EXT_NORM)
return 0;
- trace_xfs_reflink_convert_cow(ip, imap);
+ trace_xfs_reflink_convert_cow(ip, cmap);
return xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
out_unreserve:
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 28a43b7f581d..d18ad7f4fb64 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -25,8 +25,8 @@ extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
bool xfs_inode_need_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
bool *shared);
-extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
- struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode,
+int xfs_reflink_allocate_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
+ struct xfs_bmbt_irec *cmap, bool *shared, uint *lockmode,
bool convert_now);
extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
xfs_off_t count);
--
2.20.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 12/20] xfs: refactor xfs_file_iomap_begin_delay
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (10 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 11/20] xfs: pass two imaps to xfs_reflink_allocate_cow Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 13/20] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
` (7 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong
Rejuggle the return path to prepare for filling out a source iomap.
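In miniature, the change swaps a flag-accumulating single `done:` exit for per-case labels that each unlock and return directly. A minimal userspace sketch of the control-flow pattern follows; all names here are hypothetical stand-ins, not the XFS code:

```c
#include <assert.h>

enum outcome { OUT_NEW, OUT_FOUND, OUT_COW };

static int unlocked;		/* stands in for the ilock state */

static int report(enum outcome o)
{
	unlocked = 1;		/* xfs_iunlock() analogue */
	return (int)o;		/* xfs_bmbt_to_iomap() analogue */
}

static int begin(int have_cow, int have_imap)
{
	if (have_cow)
		goto found_cow;
	if (have_imap)
		goto found_imap;
	/* allocation path ends here with its own return */
	return report(OUT_NEW);
found_imap:
	return report(OUT_FOUND);
found_cow:
	return report(OUT_COW);
}
```

Each exit can then pass its own flags (e.g. `IOMAP_F_NEW` vs. `IOMAP_F_SHARED`) instead of threading a `iomap_flags` variable through every path.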
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_iomap.c | 48 +++++++++++++++++++++++-----------------------
1 file changed, 24 insertions(+), 24 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 44ae6de02a19..3292dfc8030a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -539,7 +539,6 @@ xfs_file_iomap_begin_delay(
struct xfs_iext_cursor icur, ccur;
xfs_fsblock_t prealloc_blocks = 0;
bool eof = false, cow_eof = false, shared = false;
- u16 iomap_flags = 0;
int whichfork = XFS_DATA_FORK;
int error = 0;
@@ -600,8 +599,7 @@ xfs_file_iomap_begin_delay(
&ccur, &cmap);
if (!cow_eof && cmap.br_startoff <= offset_fsb) {
trace_xfs_reflink_cow_found(ip, &cmap);
- whichfork = XFS_COW_FORK;
- goto done;
+ goto found_cow;
}
}
@@ -615,7 +613,7 @@ xfs_file_iomap_begin_delay(
((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
&imap);
- goto done;
+ goto found_imap;
}
xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
@@ -629,7 +627,7 @@ xfs_file_iomap_begin_delay(
if (!shared) {
trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
&imap);
- goto done;
+ goto found_imap;
}
/*
@@ -703,35 +701,37 @@ xfs_file_iomap_begin_delay(
goto out_unlock;
}
+ if (whichfork == XFS_COW_FORK) {
+ trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
+ goto found_cow;
+ }
+
/*
* Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
* them out if the write happens to fail.
*/
- if (whichfork == XFS_DATA_FORK) {
- iomap_flags |= IOMAP_F_NEW;
- trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
- } else {
- trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
- }
-done:
- if (whichfork == XFS_COW_FORK) {
- if (imap.br_startoff > offset_fsb) {
- xfs_trim_extent(&cmap, offset_fsb,
- imap.br_startoff - offset_fsb);
- error = xfs_bmbt_to_iomap(ip, iomap, &cmap,
- IOMAP_F_SHARED);
- goto out_unlock;
- }
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
+
+found_imap:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
+
+found_cow:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ if (imap.br_startoff <= offset_fsb) {
/* ensure we only report blocks we have a reservation for */
xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
- shared = true;
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_SHARED);
}
- if (shared)
- iomap_flags |= IOMAP_F_SHARED;
- error = xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
+ xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
+ return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
+
out_unlock:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
return error;
+
}
int
--
2.20.1
* [PATCH 13/20] xfs: fill out the srcmap in iomap_begin
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (11 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 12/20] xfs: refactor xfs_file_iomap_begin_delay Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 15:26 ` Darrick J. Wong
2019-10-08 7:15 ` [PATCH 14/20] xfs: factor out a helper to calculate the end_fsb Christoph Hellwig
` (6 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
Replace our local hacks to report the source block in the main iomap
with the proper srcmap reporting.
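The idea, sketched in plain C with hypothetical names: for a shared-extent write the COW fork mapping is reported as the destination (`iomap`), and the pre-existing data fork mapping, when there is one, is reported separately as the read source (`srcmap`) rather than being copied over the destination. This is only an illustration of the split, not the kernel interface:

```c
#include <assert.h>

#define HOLE (-1L)	/* HOLESTARTBLOCK analogue for this sketch */

/*
 * Returns 1 if *srcmap_start was filled (the data fork had real blocks
 * to read back for read-modify-write), 0 if the data fork was a hole
 * and only the COW destination mapping is reported.
 */
static int fill_maps(long data_start, long cow_start,
		     long *iomap_start, long *srcmap_start)
{
	*iomap_start = cow_start;	/* the write lands in the COW fork */
	if (data_start == HOLE)
		return 0;		/* nothing to read back */
	*srcmap_start = data_start;	/* old blocks for RMW */
	return 1;
}
```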
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_iomap.c | 49 +++++++++++++++++++++++-----------------------
1 file changed, 24 insertions(+), 25 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 3292dfc8030a..ab2482a573d8 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -527,7 +527,8 @@ xfs_file_iomap_begin_delay(
loff_t offset,
loff_t count,
unsigned flags,
- struct iomap *iomap)
+ struct iomap *iomap,
+ struct iomap *srcmap)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
@@ -721,11 +722,13 @@ xfs_file_iomap_begin_delay(
found_cow:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
if (imap.br_startoff <= offset_fsb) {
- /* ensure we only report blocks we have a reservation for */
- xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
- return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_SHARED);
+ error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
+ if (error)
+ return error;
+ } else {
+ xfs_trim_extent(&cmap, offset_fsb,
+ imap.br_startoff - offset_fsb);
}
- xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
out_unlock:
@@ -933,7 +936,7 @@ xfs_file_iomap_begin(
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
- struct xfs_bmbt_irec imap;
+ struct xfs_bmbt_irec imap, cmap;
xfs_fileoff_t offset_fsb, end_fsb;
int nimaps = 1, error = 0;
bool shared = false;
@@ -947,7 +950,7 @@ xfs_file_iomap_begin(
!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
/* Reserve delalloc blocks for regular writeback. */
return xfs_file_iomap_begin_delay(inode, offset, length, flags,
- iomap);
+ iomap, srcmap);
}
/*
@@ -987,9 +990,6 @@ xfs_file_iomap_begin(
* been done up front, so we don't need to do them here.
*/
if (xfs_is_cow_inode(ip)) {
- struct xfs_bmbt_irec cmap;
- bool directio = (flags & IOMAP_DIRECT);
-
/* if zeroing doesn't need COW allocation, then we are done. */
if ((flags & IOMAP_ZERO) &&
!needs_cow_for_zeroing(&imap, nimaps))
@@ -997,23 +997,11 @@ xfs_file_iomap_begin(
/* may drop and re-acquire the ilock */
error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
- &lockmode, directio);
+ &lockmode, flags & IOMAP_DIRECT);
if (error)
goto out_unlock;
-
- /*
- * For buffered writes we need to report the address of the
- * previous block (if there was any) so that the higher level
- * write code can perform read-modify-write operations; we
- * won't need the CoW fork mapping until writeback. For direct
- * I/O, which must be block aligned, we need to report the
- * newly allocated address. If the data fork has a hole, copy
- * the COW fork mapping to avoid allocating to the data fork.
- */
- if (shared &&
- (directio || imap.br_startblock == HOLESTARTBLOCK))
- imap = cmap;
-
+ if (shared)
+ goto out_found_cow;
end_fsb = imap.br_startoff + imap.br_blockcount;
length = XFS_FSB_TO_B(mp, end_fsb) - offset;
}
@@ -1067,6 +1055,17 @@ xfs_file_iomap_begin(
trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
goto out_finish;
+out_found_cow:
+ xfs_iunlock(ip, lockmode);
+ length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount);
+ trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap);
+ if (imap.br_startblock != HOLESTARTBLOCK) {
+ error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
+ if (error)
+ return error;
+ }
+ return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
+
out_unlock:
xfs_iunlock(ip, lockmode);
return error;
--
2.20.1
* Re: [PATCH 13/20] xfs: fill out the srcmap in iomap_begin
2019-10-08 7:15 ` [PATCH 13/20] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
@ 2019-10-08 15:26 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:26 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2019 at 09:15:20AM +0200, Christoph Hellwig wrote:
> Replace our local hacks to report the source block in the main iomap
> with the proper srcmap reporting.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> ---
> fs/xfs/xfs_iomap.c | 49 +++++++++++++++++++++++-----------------------
> 1 file changed, 24 insertions(+), 25 deletions(-)
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 3292dfc8030a..ab2482a573d8 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -527,7 +527,8 @@ xfs_file_iomap_begin_delay(
> loff_t offset,
> loff_t count,
> unsigned flags,
> - struct iomap *iomap)
> + struct iomap *iomap,
> + struct iomap *srcmap)
> {
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> @@ -721,11 +722,13 @@ xfs_file_iomap_begin_delay(
> found_cow:
> xfs_iunlock(ip, XFS_ILOCK_EXCL);
> if (imap.br_startoff <= offset_fsb) {
> - /* ensure we only report blocks we have a reservation for */
> - xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
> - return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_SHARED);
> + error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
> + if (error)
> + return error;
> + } else {
> + xfs_trim_extent(&cmap, offset_fsb,
> + imap.br_startoff - offset_fsb);
> }
> - xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
> return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
>
> out_unlock:
> @@ -933,7 +936,7 @@ xfs_file_iomap_begin(
> {
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> - struct xfs_bmbt_irec imap;
> + struct xfs_bmbt_irec imap, cmap;
> xfs_fileoff_t offset_fsb, end_fsb;
> int nimaps = 1, error = 0;
> bool shared = false;
> @@ -947,7 +950,7 @@ xfs_file_iomap_begin(
> !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> /* Reserve delalloc blocks for regular writeback. */
> return xfs_file_iomap_begin_delay(inode, offset, length, flags,
> - iomap);
> + iomap, srcmap);
> }
>
> /*
> @@ -987,9 +990,6 @@ xfs_file_iomap_begin(
> * been done up front, so we don't need to do them here.
> */
> if (xfs_is_cow_inode(ip)) {
> - struct xfs_bmbt_irec cmap;
> - bool directio = (flags & IOMAP_DIRECT);
> -
> /* if zeroing doesn't need COW allocation, then we are done. */
> if ((flags & IOMAP_ZERO) &&
> !needs_cow_for_zeroing(&imap, nimaps))
> @@ -997,23 +997,11 @@ xfs_file_iomap_begin(
>
> /* may drop and re-acquire the ilock */
> error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
> - &lockmode, directio);
> + &lockmode, flags & IOMAP_DIRECT);
> if (error)
> goto out_unlock;
> -
> - /*
> - * For buffered writes we need to report the address of the
> - * previous block (if there was any) so that the higher level
> - * write code can perform read-modify-write operations; we
> - * won't need the CoW fork mapping until writeback. For direct
> - * I/O, which must be block aligned, we need to report the
> - * newly allocated address. If the data fork has a hole, copy
> - * the COW fork mapping to avoid allocating to the data fork.
> - */
> - if (shared &&
> - (directio || imap.br_startblock == HOLESTARTBLOCK))
> - imap = cmap;
> -
> + if (shared)
> + goto out_found_cow;
> end_fsb = imap.br_startoff + imap.br_blockcount;
> length = XFS_FSB_TO_B(mp, end_fsb) - offset;
> }
> @@ -1067,6 +1055,17 @@ xfs_file_iomap_begin(
> trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
> goto out_finish;
>
> +out_found_cow:
> + xfs_iunlock(ip, lockmode);
> + length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount);
> + trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap);
> + if (imap.br_startblock != HOLESTARTBLOCK) {
> + error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
> + if (error)
> + return error;
> + }
> + return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
> +
> out_unlock:
> xfs_iunlock(ip, lockmode);
> return error;
> --
> 2.20.1
>
* [PATCH 14/20] xfs: factor out a helper to calculate the end_fsb
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (12 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 13/20] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 15/20] xfs: split out a new set of read-only iomap ops Christoph Hellwig
` (5 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
Cc: Allison Collins, Darrick J . Wong
We have lots of places that want to calculate the final fsb for
an offset + count in bytes and check that the result fits into
s_maxbytes. Factor out a helper for that.
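A userspace analogue of the new helper: convert offset + count in bytes to an (exclusive) end block, clamped so it never goes past the block covering s_maxbytes. The 4k block size and these names are assumptions for the sketch only, not the kernel definitions:

```c
#include <assert.h>
#include <stdint.h>

#define BLKSHIFT 12	/* 4k "filesystem blocks" for this sketch */

static uint64_t b_to_fsb(uint64_t bytes)
{
	/* round up to the next block, like XFS_B_TO_FSB() */
	return (bytes + (1ULL << BLKSHIFT) - 1) >> BLKSHIFT;
}

static uint64_t iomap_end_fsb(uint64_t maxbytes, uint64_t offset,
			      uint64_t count)
{
	uint64_t end = b_to_fsb(offset + count);
	uint64_t max = b_to_fsb(maxbytes);

	return end < max ? end : max;
}
```

With the clamp centralized, callers no longer need to open-code the `min(..., XFS_B_TO_FSB(mp, s_maxbytes))` dance at every site.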
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_iomap.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ab2482a573d8..a9f9e8c9034a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -102,6 +102,17 @@ xfs_hole_to_iomap(
iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
}
+static inline xfs_fileoff_t
+xfs_iomap_end_fsb(
+ struct xfs_mount *mp,
+ loff_t offset,
+ loff_t count)
+{
+ ASSERT(offset <= mp->m_super->s_maxbytes);
+ return min(XFS_B_TO_FSB(mp, offset + count),
+ XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
+}
+
xfs_extlen_t
xfs_eof_alignment(
struct xfs_inode *ip,
@@ -172,8 +183,8 @@ xfs_iomap_write_direct(
int nmaps)
{
xfs_mount_t *mp = ip->i_mount;
- xfs_fileoff_t offset_fsb;
- xfs_fileoff_t last_fsb;
+ xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
+ xfs_fileoff_t last_fsb = xfs_iomap_end_fsb(mp, offset, count);
xfs_filblks_t count_fsb, resaligned;
xfs_extlen_t extsz;
int nimaps;
@@ -192,8 +203,6 @@ xfs_iomap_write_direct(
ASSERT(xfs_isilocked(ip, lockmode));
- offset_fsb = XFS_B_TO_FSBT(mp, offset);
- last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
if ((offset + count) > XFS_ISIZE(ip)) {
/*
* Assert that the in-core extent list is present since this can
@@ -533,9 +542,7 @@ xfs_file_iomap_begin_delay(
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
- xfs_fileoff_t maxbytes_fsb =
- XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
- xfs_fileoff_t end_fsb;
+ xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, count);
struct xfs_bmbt_irec imap, cmap;
struct xfs_iext_cursor icur, ccur;
xfs_fsblock_t prealloc_blocks = 0;
@@ -565,8 +572,6 @@ xfs_file_iomap_begin_delay(
goto out_unlock;
}
- end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
-
/*
* Search the data fork fork first to look up our source mapping. We
* always need the data fork map, as we have to return it to the
@@ -648,7 +653,7 @@ xfs_file_iomap_begin_delay(
* the lower level functions are updated.
*/
count = min_t(loff_t, count, 1024 * PAGE_SIZE);
- end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
+ end_fsb = xfs_iomap_end_fsb(mp, offset, count);
if (xfs_is_always_cow_inode(ip))
whichfork = XFS_COW_FORK;
@@ -674,7 +679,8 @@ xfs_file_iomap_begin_delay(
if (align)
p_end_fsb = roundup_64(p_end_fsb, align);
- p_end_fsb = min(p_end_fsb, maxbytes_fsb);
+ p_end_fsb = min(p_end_fsb,
+ XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
ASSERT(p_end_fsb > offset_fsb);
prealloc_blocks = p_end_fsb - end_fsb;
}
@@ -937,7 +943,8 @@ xfs_file_iomap_begin(
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
struct xfs_bmbt_irec imap, cmap;
- xfs_fileoff_t offset_fsb, end_fsb;
+ xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
+ xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length);
int nimaps = 1, error = 0;
bool shared = false;
u16 iomap_flags = 0;
@@ -963,12 +970,6 @@ xfs_file_iomap_begin(
if (error)
return error;
- ASSERT(offset <= mp->m_super->s_maxbytes);
- if (offset > mp->m_super->s_maxbytes - length)
- length = mp->m_super->s_maxbytes - offset;
- offset_fsb = XFS_B_TO_FSBT(mp, offset);
- end_fsb = XFS_B_TO_FSB(mp, offset + length);
-
error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
&nimaps, 0);
if (error)
@@ -1189,8 +1190,7 @@ xfs_seek_iomap_begin(
/*
* Fake a hole until the end of the file.
*/
- data_fsb = min(XFS_B_TO_FSB(mp, offset + length),
- XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
+ data_fsb = xfs_iomap_end_fsb(mp, offset, length);
}
/*
--
2.20.1
* [PATCH 15/20] xfs: split out a new set of read-only iomap ops
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (13 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 14/20] xfs: factor out a helper to calculate the end_fsb Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 15:27 ` Darrick J. Wong
2019-10-08 7:15 ` [PATCH 16/20] xfs: move xfs_file_iomap_begin_delay around Christoph Hellwig
` (4 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
Start untangling xfs_file_iomap_begin by splitting out the read-only
case into its own set of iomap_ops with a very simple iomap_begin
helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_aops.c | 9 ++++---
fs/xfs/xfs_file.c | 8 +++---
fs/xfs/xfs_iomap.c | 61 ++++++++++++++++++++++++++++++++++------------
fs/xfs/xfs_iomap.h | 1 +
fs/xfs/xfs_iops.c | 2 +-
5 files changed, 58 insertions(+), 23 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 807af5c3c347..f708a2831d2f 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -635,7 +635,7 @@ xfs_vm_bmap(
*/
if (xfs_is_cow_inode(ip) || XFS_IS_REALTIME_INODE(ip))
return 0;
- return iomap_bmap(mapping, block, &xfs_iomap_ops);
+ return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
}
STATIC int
@@ -643,7 +643,7 @@ xfs_vm_readpage(
struct file *unused,
struct page *page)
{
- return iomap_readpage(page, &xfs_iomap_ops);
+ return iomap_readpage(page, &xfs_read_iomap_ops);
}
STATIC int
@@ -653,7 +653,7 @@ xfs_vm_readpages(
struct list_head *pages,
unsigned nr_pages)
{
- return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
+ return iomap_readpages(mapping, pages, nr_pages, &xfs_read_iomap_ops);
}
static int
@@ -663,7 +663,8 @@ xfs_iomap_swapfile_activate(
sector_t *span)
{
sis->bdev = xfs_find_bdev_for_inode(file_inode(swap_file));
- return iomap_swapfile_activate(sis, swap_file, span, &xfs_iomap_ops);
+ return iomap_swapfile_activate(sis, swap_file, span,
+ &xfs_read_iomap_ops);
}
const struct address_space_operations xfs_address_space_operations = {
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 1ffb179f35d2..f9814306ed8e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -188,7 +188,7 @@ xfs_file_dio_aio_read(
file_accessed(iocb->ki_filp);
xfs_ilock(ip, XFS_IOLOCK_SHARED);
- ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
+ ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL);
xfs_iunlock(ip, XFS_IOLOCK_SHARED);
return ret;
@@ -215,7 +215,7 @@ xfs_file_dax_read(
xfs_ilock(ip, XFS_IOLOCK_SHARED);
}
- ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
+ ret = dax_iomap_rw(iocb, to, &xfs_read_iomap_ops);
xfs_iunlock(ip, XFS_IOLOCK_SHARED);
file_accessed(iocb->ki_filp);
@@ -1156,7 +1156,9 @@ __xfs_filemap_fault(
if (IS_DAX(inode)) {
pfn_t pfn;
- ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops);
+ ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
+ (write_fault && !vmf->cow_page) ?
+ &xfs_iomap_ops : &xfs_read_iomap_ops);
if (ret & VM_FAULT_NEEDDSYNC)
ret = dax_finish_sync_fault(vmf, pe_size, pfn);
} else {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index a9f9e8c9034a..0cfd973fd192 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -950,11 +950,13 @@ xfs_file_iomap_begin(
u16 iomap_flags = 0;
unsigned lockmode;
+ ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
+
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;
- if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
- !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
+ if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
+ !xfs_get_extsz_hint(ip)) {
/* Reserve delalloc blocks for regular writeback. */
return xfs_file_iomap_begin_delay(inode, offset, length, flags,
iomap, srcmap);
@@ -975,17 +977,6 @@ xfs_file_iomap_begin(
if (error)
goto out_unlock;
- if (flags & IOMAP_REPORT) {
- /* Trim the mapping to the nearest shared extent boundary. */
- error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
- if (error)
- goto out_unlock;
- }
-
- /* Non-modifying mapping requested, so we are done */
- if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
- goto out_found;
-
/*
* Break shared extents if necessary. Checks for non-blocking IO have
* been done up front, so we don't need to do them here.
@@ -1046,8 +1037,6 @@ xfs_file_iomap_begin(
trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
out_finish:
- if (shared)
- iomap_flags |= IOMAP_F_SHARED;
return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
out_found:
@@ -1150,6 +1139,48 @@ const struct iomap_ops xfs_iomap_ops = {
.iomap_end = xfs_file_iomap_end,
};
+static int
+xfs_read_iomap_begin(
+ struct inode *inode,
+ loff_t offset,
+ loff_t length,
+ unsigned flags,
+ struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_bmbt_irec imap;
+ xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
+ xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length);
+ int nimaps = 1, error = 0;
+ bool shared = false;
+ unsigned lockmode;
+
+ ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
+
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return -EIO;
+
+ error = xfs_ilock_for_iomap(ip, flags, &lockmode);
+ if (error)
+ return error;
+ error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
+ &nimaps, 0);
+ if (!error && (flags & IOMAP_REPORT))
+ error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
+ xfs_iunlock(ip, lockmode);
+
+ if (error)
+ return error;
+ trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, shared ? IOMAP_F_SHARED : 0);
+}
+
+const struct iomap_ops xfs_read_iomap_ops = {
+ .iomap_begin = xfs_read_iomap_begin,
+};
+
static int
xfs_seek_iomap_begin(
struct inode *inode,
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 71d0ae460c44..61b1fc3e5143 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -40,6 +40,7 @@ xfs_aligned_fsb_count(
}
extern const struct iomap_ops xfs_iomap_ops;
+extern const struct iomap_ops xfs_read_iomap_ops;
extern const struct iomap_ops xfs_seek_iomap_ops;
extern const struct iomap_ops xfs_xattr_iomap_ops;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index fe285d123d69..9c448a54a951 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1114,7 +1114,7 @@ xfs_vn_fiemap(
&xfs_xattr_iomap_ops);
} else {
error = iomap_fiemap(inode, fieinfo, start, length,
- &xfs_iomap_ops);
+ &xfs_read_iomap_ops);
}
xfs_iunlock(XFS_I(inode), XFS_IOLOCK_SHARED);
--
2.20.1
* Re: [PATCH 15/20] xfs: split out a new set of read-only iomap ops
2019-10-08 7:15 ` [PATCH 15/20] xfs: split out a new set of read-only iomap ops Christoph Hellwig
@ 2019-10-08 15:27 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:27 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2019 at 09:15:22AM +0200, Christoph Hellwig wrote:
> Start untangling xfs_file_iomap_begin by splitting out the read-only
> case into its own set of iomap_ops with a very simple iomap_begin
> helper.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks the same as last time...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> ---
> fs/xfs/xfs_aops.c | 9 ++++---
> fs/xfs/xfs_file.c | 8 +++---
> fs/xfs/xfs_iomap.c | 61 ++++++++++++++++++++++++++++++++++------------
> fs/xfs/xfs_iomap.h | 1 +
> fs/xfs/xfs_iops.c | 2 +-
> 5 files changed, 58 insertions(+), 23 deletions(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 807af5c3c347..f708a2831d2f 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -635,7 +635,7 @@ xfs_vm_bmap(
> */
> if (xfs_is_cow_inode(ip) || XFS_IS_REALTIME_INODE(ip))
> return 0;
> - return iomap_bmap(mapping, block, &xfs_iomap_ops);
> + return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
> }
>
> STATIC int
> @@ -643,7 +643,7 @@ xfs_vm_readpage(
> struct file *unused,
> struct page *page)
> {
> - return iomap_readpage(page, &xfs_iomap_ops);
> + return iomap_readpage(page, &xfs_read_iomap_ops);
> }
>
> STATIC int
> @@ -653,7 +653,7 @@ xfs_vm_readpages(
> struct list_head *pages,
> unsigned nr_pages)
> {
> - return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
> + return iomap_readpages(mapping, pages, nr_pages, &xfs_read_iomap_ops);
> }
>
> static int
> @@ -663,7 +663,8 @@ xfs_iomap_swapfile_activate(
> sector_t *span)
> {
> sis->bdev = xfs_find_bdev_for_inode(file_inode(swap_file));
> - return iomap_swapfile_activate(sis, swap_file, span, &xfs_iomap_ops);
> + return iomap_swapfile_activate(sis, swap_file, span,
> + &xfs_read_iomap_ops);
> }
>
> const struct address_space_operations xfs_address_space_operations = {
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 1ffb179f35d2..f9814306ed8e 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -188,7 +188,7 @@ xfs_file_dio_aio_read(
> file_accessed(iocb->ki_filp);
>
> xfs_ilock(ip, XFS_IOLOCK_SHARED);
> - ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
> + ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL);
> xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>
> return ret;
> @@ -215,7 +215,7 @@ xfs_file_dax_read(
> xfs_ilock(ip, XFS_IOLOCK_SHARED);
> }
>
> - ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
> + ret = dax_iomap_rw(iocb, to, &xfs_read_iomap_ops);
> xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>
> file_accessed(iocb->ki_filp);
> @@ -1156,7 +1156,9 @@ __xfs_filemap_fault(
> if (IS_DAX(inode)) {
> pfn_t pfn;
>
> - ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops);
> + ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
> + (write_fault && !vmf->cow_page) ?
> + &xfs_iomap_ops : &xfs_read_iomap_ops);
> if (ret & VM_FAULT_NEEDDSYNC)
> ret = dax_finish_sync_fault(vmf, pe_size, pfn);
> } else {
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index a9f9e8c9034a..0cfd973fd192 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -950,11 +950,13 @@ xfs_file_iomap_begin(
> u16 iomap_flags = 0;
> unsigned lockmode;
>
> + ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
> +
> if (XFS_FORCED_SHUTDOWN(mp))
> return -EIO;
>
> - if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
> - !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> + if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
> + !xfs_get_extsz_hint(ip)) {
> /* Reserve delalloc blocks for regular writeback. */
> return xfs_file_iomap_begin_delay(inode, offset, length, flags,
> iomap, srcmap);
> @@ -975,17 +977,6 @@ xfs_file_iomap_begin(
> if (error)
> goto out_unlock;
>
> - if (flags & IOMAP_REPORT) {
> - /* Trim the mapping to the nearest shared extent boundary. */
> - error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> - if (error)
> - goto out_unlock;
> - }
> -
> - /* Non-modifying mapping requested, so we are done */
> - if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
> - goto out_found;
> -
> /*
> * Break shared extents if necessary. Checks for non-blocking IO have
> * been done up front, so we don't need to do them here.
> @@ -1046,8 +1037,6 @@ xfs_file_iomap_begin(
> trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
>
> out_finish:
> - if (shared)
> - iomap_flags |= IOMAP_F_SHARED;
> return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
>
> out_found:
> @@ -1150,6 +1139,48 @@ const struct iomap_ops xfs_iomap_ops = {
> .iomap_end = xfs_file_iomap_end,
> };
>
> +static int
> +xfs_read_iomap_begin(
> + struct inode *inode,
> + loff_t offset,
> + loff_t length,
> + unsigned flags,
> + struct iomap *iomap,
> + struct iomap *srcmap)
> +{
> + struct xfs_inode *ip = XFS_I(inode);
> + struct xfs_mount *mp = ip->i_mount;
> + struct xfs_bmbt_irec imap;
> + xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
> + xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length);
> + int nimaps = 1, error = 0;
> + bool shared = false;
> + unsigned lockmode;
> +
> + ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
> +
> + if (XFS_FORCED_SHUTDOWN(mp))
> + return -EIO;
> +
> + error = xfs_ilock_for_iomap(ip, flags, &lockmode);
> + if (error)
> + return error;
> + error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> + &nimaps, 0);
> + if (!error && (flags & IOMAP_REPORT))
> + error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> + xfs_iunlock(ip, lockmode);
> +
> + if (error)
> + return error;
> + trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
> + return xfs_bmbt_to_iomap(ip, iomap, &imap, shared ? IOMAP_F_SHARED : 0);
> +}
> +
> +const struct iomap_ops xfs_read_iomap_ops = {
> + .iomap_begin = xfs_read_iomap_begin,
> +};
> +
> static int
> xfs_seek_iomap_begin(
> struct inode *inode,
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index 71d0ae460c44..61b1fc3e5143 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -40,6 +40,7 @@ xfs_aligned_fsb_count(
> }
>
> extern const struct iomap_ops xfs_iomap_ops;
> +extern const struct iomap_ops xfs_read_iomap_ops;
> extern const struct iomap_ops xfs_seek_iomap_ops;
> extern const struct iomap_ops xfs_xattr_iomap_ops;
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index fe285d123d69..9c448a54a951 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1114,7 +1114,7 @@ xfs_vn_fiemap(
> &xfs_xattr_iomap_ops);
> } else {
> error = iomap_fiemap(inode, fieinfo, start, length,
> - &xfs_iomap_ops);
> + &xfs_read_iomap_ops);
> }
> xfs_iunlock(XFS_I(inode), XFS_IOLOCK_SHARED);
>
> --
> 2.20.1
>
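The read-only begin handler added in this patch maps blocks without ever allocating, and the IOMAP_REPORT trimming moves with it. A tiny userspace model of that contract is sketched below; all names are hypothetical stand-ins for illustration, not the actual kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits modeled on the iomap interface. */
#define IOMAP_WRITE	(1 << 0)
#define IOMAP_ZERO	(1 << 1)
#define IOMAP_REPORT	(1 << 2)

#define IOMAP_F_SHARED	(1 << 0)

struct iomap { unsigned int flags; };

/* Stand-in for xfs_reflink_trim_around_shared(): pretend every
 * mapping examined for a REPORT request turns out to be shared. */
static int trim_around_shared(bool *shared)
{
	*shared = true;
	return 0;
}

/* Modeled on xfs_read_iomap_begin(): a read-only path never
 * allocates, so modifying flags are rejected outright, and only
 * FIEMAP-style REPORT requests pay for the shared-extent lookup. */
static int read_iomap_begin(unsigned int flags, struct iomap *iomap)
{
	bool shared = false;

	assert(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));

	if (flags & IOMAP_REPORT) {
		int error = trim_around_shared(&shared);
		if (error)
			return error;
	}
	iomap->flags = shared ? IOMAP_F_SHARED : 0;
	return 0;
}
```

The kernel version does the same dance under the inode lock; the sketch only shows why the REPORT-only trimming and the no-write assertion can live together in one small function.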
* [PATCH 16/20] xfs: move xfs_file_iomap_begin_delay around
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (14 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 15/20] xfs: split out a new set of read-only iomap ops Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
` (3 subsequent siblings)
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong
Move xfs_file_iomap_begin_delay near the end of the file next to the
other iomap functions to prepare for additional refactoring.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_iomap.c | 434 +++++++++++++++++++++++----------------------
1 file changed, 221 insertions(+), 213 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0cfd973fd192..a6a03b65c4e7 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -530,219 +530,6 @@ xfs_iomap_prealloc_size(
return alloc_blocks;
}
-static int
-xfs_file_iomap_begin_delay(
- struct inode *inode,
- loff_t offset,
- loff_t count,
- unsigned flags,
- struct iomap *iomap,
- struct iomap *srcmap)
-{
- struct xfs_inode *ip = XFS_I(inode);
- struct xfs_mount *mp = ip->i_mount;
- xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
- xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, count);
- struct xfs_bmbt_irec imap, cmap;
- struct xfs_iext_cursor icur, ccur;
- xfs_fsblock_t prealloc_blocks = 0;
- bool eof = false, cow_eof = false, shared = false;
- int whichfork = XFS_DATA_FORK;
- int error = 0;
-
- ASSERT(!XFS_IS_REALTIME_INODE(ip));
- ASSERT(!xfs_get_extsz_hint(ip));
-
- xfs_ilock(ip, XFS_ILOCK_EXCL);
-
- if (unlikely(XFS_TEST_ERROR(
- (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
- XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
- mp, XFS_ERRTAG_BMAPIFORMAT))) {
- XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
- error = -EFSCORRUPTED;
- goto out_unlock;
- }
-
- XFS_STATS_INC(mp, xs_blk_mapw);
-
- if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
- error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
- if (error)
- goto out_unlock;
- }
-
- /*
- * Search the data fork fork first to look up our source mapping. We
- * always need the data fork map, as we have to return it to the
- * iomap code so that the higher level write code can read data in to
- * perform read-modify-write cycles for unaligned writes.
- */
- eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
- if (eof)
- imap.br_startoff = end_fsb; /* fake hole until the end */
-
- /* We never need to allocate blocks for zeroing a hole. */
- if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
- xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
- goto out_unlock;
- }
-
- /*
- * Search the COW fork extent list even if we did not find a data fork
- * extent. This serves two purposes: first this implements the
- * speculative preallocation using cowextsize, so that we also unshare
- * block adjacent to shared blocks instead of just the shared blocks
- * themselves. Second the lookup in the extent list is generally faster
- * than going out to the shared extent tree.
- */
- if (xfs_is_cow_inode(ip)) {
- if (!ip->i_cowfp) {
- ASSERT(!xfs_is_reflink_inode(ip));
- xfs_ifork_init_cow(ip);
- }
- cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
- &ccur, &cmap);
- if (!cow_eof && cmap.br_startoff <= offset_fsb) {
- trace_xfs_reflink_cow_found(ip, &cmap);
- goto found_cow;
- }
- }
-
- if (imap.br_startoff <= offset_fsb) {
- /*
- * For reflink files we may need a delalloc reservation when
- * overwriting shared extents. This includes zeroing of
- * existing extents that contain data.
- */
- if (!xfs_is_cow_inode(ip) ||
- ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
- trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
- &imap);
- goto found_imap;
- }
-
- xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
-
- /* Trim the mapping to the nearest shared extent boundary. */
- error = xfs_inode_need_cow(ip, &imap, &shared);
- if (error)
- goto out_unlock;
-
- /* Not shared? Just report the (potentially capped) extent. */
- if (!shared) {
- trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
- &imap);
- goto found_imap;
- }
-
- /*
- * Fork all the shared blocks from our write offset until the
- * end of the extent.
- */
- whichfork = XFS_COW_FORK;
- end_fsb = imap.br_startoff + imap.br_blockcount;
- } else {
- /*
- * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
- * pages to keep the chunks of work done where somewhat
- * symmetric with the work writeback does. This is a completely
- * arbitrary number pulled out of thin air.
- *
- * Note that the values needs to be less than 32-bits wide until
- * the lower level functions are updated.
- */
- count = min_t(loff_t, count, 1024 * PAGE_SIZE);
- end_fsb = xfs_iomap_end_fsb(mp, offset, count);
-
- if (xfs_is_always_cow_inode(ip))
- whichfork = XFS_COW_FORK;
- }
-
- error = xfs_qm_dqattach_locked(ip, false);
- if (error)
- goto out_unlock;
-
- if (eof) {
- prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
- count, &icur);
- if (prealloc_blocks) {
- xfs_extlen_t align;
- xfs_off_t end_offset;
- xfs_fileoff_t p_end_fsb;
-
- end_offset = XFS_WRITEIO_ALIGN(mp, offset + count - 1);
- p_end_fsb = XFS_B_TO_FSBT(mp, end_offset) +
- prealloc_blocks;
-
- align = xfs_eof_alignment(ip, 0);
- if (align)
- p_end_fsb = roundup_64(p_end_fsb, align);
-
- p_end_fsb = min(p_end_fsb,
- XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
- ASSERT(p_end_fsb > offset_fsb);
- prealloc_blocks = p_end_fsb - end_fsb;
- }
- }
-
-retry:
- error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
- end_fsb - offset_fsb, prealloc_blocks,
- whichfork == XFS_DATA_FORK ? &imap : &cmap,
- whichfork == XFS_DATA_FORK ? &icur : &ccur,
- whichfork == XFS_DATA_FORK ? eof : cow_eof);
- switch (error) {
- case 0:
- break;
- case -ENOSPC:
- case -EDQUOT:
- /* retry without any preallocation */
- trace_xfs_delalloc_enospc(ip, offset, count);
- if (prealloc_blocks) {
- prealloc_blocks = 0;
- goto retry;
- }
- /*FALLTHRU*/
- default:
- goto out_unlock;
- }
-
- if (whichfork == XFS_COW_FORK) {
- trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
- goto found_cow;
- }
-
- /*
- * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
- * them out if the write happens to fail.
- */
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
- return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
-
-found_imap:
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
-
-found_cow:
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- if (imap.br_startoff <= offset_fsb) {
- error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
- if (error)
- return error;
- } else {
- xfs_trim_extent(&cmap, offset_fsb,
- imap.br_startoff - offset_fsb);
- }
- return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
-
-out_unlock:
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- return error;
-
-}
-
int
xfs_iomap_write_unwritten(
xfs_inode_t *ip,
@@ -931,6 +718,15 @@ xfs_ilock_for_iomap(
return 0;
}
+static int
+xfs_file_iomap_begin_delay(
+ struct inode *inode,
+ loff_t offset,
+ loff_t count,
+ unsigned flags,
+ struct iomap *iomap,
+ struct iomap *srcmap);
+
static int
xfs_file_iomap_begin(
struct inode *inode,
@@ -1061,6 +857,218 @@ xfs_file_iomap_begin(
return error;
}
+static int
+xfs_file_iomap_begin_delay(
+ struct inode *inode,
+ loff_t offset,
+ loff_t count,
+ unsigned flags,
+ struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_mount *mp = ip->i_mount;
+ xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
+ xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, count);
+ struct xfs_bmbt_irec imap, cmap;
+ struct xfs_iext_cursor icur, ccur;
+ xfs_fsblock_t prealloc_blocks = 0;
+ bool eof = false, cow_eof = false, shared = false;
+ int whichfork = XFS_DATA_FORK;
+ int error = 0;
+
+ ASSERT(!XFS_IS_REALTIME_INODE(ip));
+ ASSERT(!xfs_get_extsz_hint(ip));
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+ if (unlikely(XFS_TEST_ERROR(
+ (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
+ XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
+ mp, XFS_ERRTAG_BMAPIFORMAT))) {
+ XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
+ error = -EFSCORRUPTED;
+ goto out_unlock;
+ }
+
+ XFS_STATS_INC(mp, xs_blk_mapw);
+
+ if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
+ error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+ if (error)
+ goto out_unlock;
+ }
+
+ /*
+ * Search the data fork fork first to look up our source mapping. We
+ * always need the data fork map, as we have to return it to the
+ * iomap code so that the higher level write code can read data in to
+ * perform read-modify-write cycles for unaligned writes.
+ */
+ eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
+ if (eof)
+ imap.br_startoff = end_fsb; /* fake hole until the end */
+
+ /* We never need to allocate blocks for zeroing a hole. */
+ if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
+ xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
+ goto out_unlock;
+ }
+
+ /*
+ * Search the COW fork extent list even if we did not find a data fork
+ * extent. This serves two purposes: first this implements the
+ * speculative preallocation using cowextsize, so that we also unshare
+ * block adjacent to shared blocks instead of just the shared blocks
+ * themselves. Second the lookup in the extent list is generally faster
+ * than going out to the shared extent tree.
+ */
+ if (xfs_is_cow_inode(ip)) {
+ if (!ip->i_cowfp) {
+ ASSERT(!xfs_is_reflink_inode(ip));
+ xfs_ifork_init_cow(ip);
+ }
+ cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
+ &ccur, &cmap);
+ if (!cow_eof && cmap.br_startoff <= offset_fsb) {
+ trace_xfs_reflink_cow_found(ip, &cmap);
+ goto found_cow;
+ }
+ }
+
+ if (imap.br_startoff <= offset_fsb) {
+ /*
+ * For reflink files we may need a delalloc reservation when
+ * overwriting shared extents. This includes zeroing of
+ * existing extents that contain data.
+ */
+ if (!xfs_is_cow_inode(ip) ||
+ ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
+ trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
+ &imap);
+ goto found_imap;
+ }
+
+ xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
+
+ /* Trim the mapping to the nearest shared extent boundary. */
+ error = xfs_inode_need_cow(ip, &imap, &shared);
+ if (error)
+ goto out_unlock;
+
+ /* Not shared? Just report the (potentially capped) extent. */
+ if (!shared) {
+ trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
+ &imap);
+ goto found_imap;
+ }
+
+ /*
+ * Fork all the shared blocks from our write offset until the
+ * end of the extent.
+ */
+ whichfork = XFS_COW_FORK;
+ end_fsb = imap.br_startoff + imap.br_blockcount;
+ } else {
+ /*
+ * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
+ * pages to keep the chunks of work done where somewhat
+ * symmetric with the work writeback does. This is a completely
+ * arbitrary number pulled out of thin air.
+ *
+ * Note that the values needs to be less than 32-bits wide until
+ * the lower level functions are updated.
+ */
+ count = min_t(loff_t, count, 1024 * PAGE_SIZE);
+ end_fsb = xfs_iomap_end_fsb(mp, offset, count);
+
+ if (xfs_is_always_cow_inode(ip))
+ whichfork = XFS_COW_FORK;
+ }
+
+ error = xfs_qm_dqattach_locked(ip, false);
+ if (error)
+ goto out_unlock;
+
+ if (eof) {
+ prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
+ count, &icur);
+ if (prealloc_blocks) {
+ xfs_extlen_t align;
+ xfs_off_t end_offset;
+ xfs_fileoff_t p_end_fsb;
+
+ end_offset = XFS_WRITEIO_ALIGN(mp, offset + count - 1);
+ p_end_fsb = XFS_B_TO_FSBT(mp, end_offset) +
+ prealloc_blocks;
+
+ align = xfs_eof_alignment(ip, 0);
+ if (align)
+ p_end_fsb = roundup_64(p_end_fsb, align);
+
+ p_end_fsb = min(p_end_fsb,
+ XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
+ ASSERT(p_end_fsb > offset_fsb);
+ prealloc_blocks = p_end_fsb - end_fsb;
+ }
+ }
+
+retry:
+ error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
+ end_fsb - offset_fsb, prealloc_blocks,
+ whichfork == XFS_DATA_FORK ? &imap : &cmap,
+ whichfork == XFS_DATA_FORK ? &icur : &ccur,
+ whichfork == XFS_DATA_FORK ? eof : cow_eof);
+ switch (error) {
+ case 0:
+ break;
+ case -ENOSPC:
+ case -EDQUOT:
+ /* retry without any preallocation */
+ trace_xfs_delalloc_enospc(ip, offset, count);
+ if (prealloc_blocks) {
+ prealloc_blocks = 0;
+ goto retry;
+ }
+ /*FALLTHRU*/
+ default:
+ goto out_unlock;
+ }
+
+ if (whichfork == XFS_COW_FORK) {
+ trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
+ goto found_cow;
+ }
+
+ /*
+ * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
+ * them out if the write happens to fail.
+ */
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
+
+found_imap:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
+
+found_cow:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ if (imap.br_startoff <= offset_fsb) {
+ error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
+ if (error)
+ return error;
+ } else {
+ xfs_trim_extent(&cmap, offset_fsb,
+ imap.br_startoff - offset_fsb);
+ }
+ return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
+
+out_unlock:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ return error;
+}
+
static int
xfs_file_iomap_end_delalloc(
struct xfs_inode *ip,
--
2.20.1
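The ENOSPC/EDQUOT retry loop inside the function being moved is a self-contained pattern worth noting: attempt the reservation with speculative preallocation, and on a space error retry once with none. A minimal sketch of that pattern, using invented stand-in functions rather than the real XFS allocator:

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for xfs_bmapi_reserve_delalloc(): pretend space is so
 * tight that any speculative preallocation fails, while the bare
 * reservation still succeeds. */
static int reserve_delalloc(unsigned int prealloc_blocks)
{
	return prealloc_blocks ? -ENOSPC : 0;
}

/* Mirrors the retry: loop in xfs_file_iomap_begin_delay(): on
 * ENOSPC or EDQUOT, drop the speculative preallocation and try
 * exactly once more; any other error is passed through. */
static int reserve_with_retry(unsigned int prealloc_blocks)
{
	int error;

retry:
	error = reserve_delalloc(prealloc_blocks);
	switch (error) {
	case 0:
		return 0;
	case -ENOSPC:
	case -EDQUOT:
		if (prealloc_blocks) {
			prealloc_blocks = 0;
			goto retry;
		}
		/* fall through */
	default:
		return error;
	}
}
```

Keeping the retry inside one function is what makes the later refactoring in this series safe: the fallback never escapes to the caller.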
* [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (15 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 16/20] xfs: move xfs_file_iomap_begin_delay around Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 15:28 ` Darrick J. Wong
2019-10-08 7:15 ` [PATCH 18/20] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin Christoph Hellwig
` (2 subsequent siblings)
19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
Instead of lots of magic conditionals in the main write_begin
handler, this makes the intent very clear. Things will become even
better once we support delayed allocations for extent size hints
and realtime allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_bmap_util.c | 3 ++-
fs/xfs/xfs_file.c | 16 ++++++-----
fs/xfs/xfs_iomap.c | 61 +++++++++++++++---------------------------
fs/xfs/xfs_iomap.h | 3 ++-
fs/xfs/xfs_iops.c | 4 +--
fs/xfs/xfs_reflink.c | 5 ++--
6 files changed, 40 insertions(+), 52 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 0910cb75b65d..a6831b7bdc18 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1111,7 +1111,8 @@ xfs_free_file_space(
return 0;
if (offset + len > XFS_ISIZE(ip))
len = XFS_ISIZE(ip) - offset;
- error = iomap_zero_range(VFS_I(ip), offset, len, NULL, &xfs_iomap_ops);
+ error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
+ &xfs_buffered_write_iomap_ops);
if (error)
return error;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index f9814306ed8e..71ffab53a0fc 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -351,7 +351,7 @@ xfs_file_aio_write_checks(
trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
error = iomap_zero_range(inode, isize, iocb->ki_pos - isize,
- NULL, &xfs_iomap_ops);
+ NULL, &xfs_buffered_write_iomap_ops);
if (error)
return error;
} else
@@ -547,7 +547,8 @@ xfs_file_dio_aio_write(
}
trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
- ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, &xfs_dio_write_ops);
+ ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
+ &xfs_dio_write_ops);
/*
* If unaligned, this is the only IO in-flight. If it has not yet
@@ -594,7 +595,7 @@ xfs_file_dax_write(
count = iov_iter_count(from);
trace_xfs_file_dax_write(ip, count, pos);
- ret = dax_iomap_rw(iocb, from, &xfs_iomap_ops);
+ ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
i_size_write(inode, iocb->ki_pos);
error = xfs_setfilesize(ip, pos, ret);
@@ -641,7 +642,8 @@ xfs_file_buffered_aio_write(
current->backing_dev_info = inode_to_bdi(inode);
trace_xfs_file_buffered_write(ip, iov_iter_count(from), iocb->ki_pos);
- ret = iomap_file_buffered_write(iocb, from, &xfs_iomap_ops);
+ ret = iomap_file_buffered_write(iocb, from,
+ &xfs_buffered_write_iomap_ops);
if (likely(ret >= 0))
iocb->ki_pos += ret;
@@ -1158,12 +1160,14 @@ __xfs_filemap_fault(
ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
(write_fault && !vmf->cow_page) ?
- &xfs_iomap_ops : &xfs_read_iomap_ops);
+ &xfs_direct_write_iomap_ops :
+ &xfs_read_iomap_ops);
if (ret & VM_FAULT_NEEDDSYNC)
ret = dax_finish_sync_fault(vmf, pe_size, pfn);
} else {
if (write_fault)
- ret = iomap_page_mkwrite(vmf, &xfs_iomap_ops);
+ ret = iomap_page_mkwrite(vmf,
+ &xfs_buffered_write_iomap_ops);
else
ret = filemap_fault(vmf);
}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index a6a03b65c4e7..5a7499f88673 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -719,16 +719,7 @@ xfs_ilock_for_iomap(
}
static int
-xfs_file_iomap_begin_delay(
- struct inode *inode,
- loff_t offset,
- loff_t count,
- unsigned flags,
- struct iomap *iomap,
- struct iomap *srcmap);
-
-static int
-xfs_file_iomap_begin(
+xfs_direct_write_iomap_begin(
struct inode *inode,
loff_t offset,
loff_t length,
@@ -751,13 +742,6 @@ xfs_file_iomap_begin(
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;
- if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
- !xfs_get_extsz_hint(ip)) {
- /* Reserve delalloc blocks for regular writeback. */
- return xfs_file_iomap_begin_delay(inode, offset, length, flags,
- iomap, srcmap);
- }
-
/*
* Lock the inode in the manner required for the specified operation and
* check for as many conditions that would result in blocking as
@@ -857,8 +841,12 @@ xfs_file_iomap_begin(
return error;
}
+const struct iomap_ops xfs_direct_write_iomap_ops = {
+ .iomap_begin = xfs_direct_write_iomap_begin,
+};
+
static int
-xfs_file_iomap_begin_delay(
+xfs_buffered_write_iomap_begin(
struct inode *inode,
loff_t offset,
loff_t count,
@@ -877,8 +865,12 @@ xfs_file_iomap_begin_delay(
int whichfork = XFS_DATA_FORK;
int error = 0;
+ /* we can't use delayed allocations when using extent size hints */
+ if (xfs_get_extsz_hint(ip))
+ return xfs_direct_write_iomap_begin(inode, offset, count,
+ flags, iomap, srcmap);
+
ASSERT(!XFS_IS_REALTIME_INODE(ip));
- ASSERT(!xfs_get_extsz_hint(ip));
xfs_ilock(ip, XFS_ILOCK_EXCL);
@@ -1070,18 +1062,23 @@ xfs_file_iomap_begin_delay(
}
static int
-xfs_file_iomap_end_delalloc(
- struct xfs_inode *ip,
+xfs_buffered_write_iomap_end(
+ struct inode *inode,
loff_t offset,
loff_t length,
ssize_t written,
+ unsigned flags,
struct iomap *iomap)
{
+ struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
xfs_fileoff_t start_fsb;
xfs_fileoff_t end_fsb;
int error = 0;
+ if (iomap->type != IOMAP_DELALLOC)
+ return 0;
+
/*
* Behave as if the write failed if drop writes is enabled. Set the NEW
* flag to force delalloc cleanup.
@@ -1126,25 +1123,9 @@ xfs_file_iomap_end_delalloc(
return 0;
}
-static int
-xfs_file_iomap_end(
- struct inode *inode,
- loff_t offset,
- loff_t length,
- ssize_t written,
- unsigned flags,
- struct iomap *iomap)
-{
- if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) &&
- iomap->type == IOMAP_DELALLOC)
- return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
- length, written, iomap);
- return 0;
-}
-
-const struct iomap_ops xfs_iomap_ops = {
- .iomap_begin = xfs_file_iomap_begin,
- .iomap_end = xfs_file_iomap_end,
+const struct iomap_ops xfs_buffered_write_iomap_ops = {
+ .iomap_begin = xfs_buffered_write_iomap_begin,
+ .iomap_end = xfs_buffered_write_iomap_end,
};
static int
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 61b1fc3e5143..7aed28275089 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -39,7 +39,8 @@ xfs_aligned_fsb_count(
return count_fsb;
}
-extern const struct iomap_ops xfs_iomap_ops;
+extern const struct iomap_ops xfs_buffered_write_iomap_ops;
+extern const struct iomap_ops xfs_direct_write_iomap_ops;
extern const struct iomap_ops xfs_read_iomap_ops;
extern const struct iomap_ops xfs_seek_iomap_ops;
extern const struct iomap_ops xfs_xattr_iomap_ops;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 9c448a54a951..329a34af8e79 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -883,10 +883,10 @@ xfs_setattr_size(
if (newsize > oldsize) {
trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
error = iomap_zero_range(inode, oldsize, newsize - oldsize,
- &did_zeroing, &xfs_iomap_ops);
+ &did_zeroing, &xfs_buffered_write_iomap_ops);
} else {
error = iomap_truncate_page(inode, newsize, &did_zeroing,
- &xfs_iomap_ops);
+ &xfs_buffered_write_iomap_ops);
}
if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 19a6e4644123..1e18b4024b82 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1270,7 +1270,7 @@ xfs_reflink_zero_posteof(
trace_xfs_zero_eof(ip, isize, pos - isize);
return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
- &xfs_iomap_ops);
+ &xfs_buffered_write_iomap_ops);
}
/*
@@ -1527,7 +1527,8 @@ xfs_reflink_unshare(
inode_dio_wait(inode);
- error = iomap_file_unshare(inode, offset, len, &xfs_iomap_ops);
+ error = iomap_file_unshare(inode, offset, len,
+ &xfs_buffered_write_iomap_ops);
if (error)
goto out;
error = filemap_write_and_wait(inode->i_mapping);
--
2.20.1
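The split this patch performs, replacing conditionals inside one begin handler with per-path ops tables chosen at the call site, can be sketched in miniature. The names below are hypothetical models of the dispatch pattern, not the actual kernel structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Miniature of the iomap_ops dispatch: each I/O path gets its own
 * ops table up front, instead of one begin handler full of
 * "is this direct? DAX? buffered?" branches. */
struct iomap_ops {
	int (*iomap_begin)(bool extsz_hint);
};

static int direct_write_begin(bool extsz_hint)
{
	(void)extsz_hint;
	return 1;	/* marker: took the direct/real-allocation path */
}

static int buffered_write_begin(bool extsz_hint)
{
	/* Delayed allocation is incompatible with extent size hints,
	 * so the buffered path falls back to the direct-style
	 * allocator, mirroring the check this patch adds to
	 * xfs_buffered_write_iomap_begin(). */
	if (extsz_hint)
		return direct_write_begin(extsz_hint);
	return 2;	/* marker: took the delalloc path */
}

static const struct iomap_ops direct_write_iomap_ops = {
	.iomap_begin = direct_write_begin,
};

static const struct iomap_ops buffered_write_iomap_ops = {
	.iomap_begin = buffered_write_begin,
};
```

Callers such as the buffered write and direct I/O entry points then pick the table that matches their path, so the remaining conditional (the extent size hint fallback) lives in exactly one place.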
* Re: [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes
2019-10-08 7:15 ` [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
@ 2019-10-08 15:28 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:28 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
On Tue, Oct 08, 2019 at 09:15:24AM +0200, Christoph Hellwig wrote:
> Instead of lots of magic conditionals in the main write_begin
> handler this make the intent very clear. Thing will become even
> better once we support delayed allocations for extent size hints
> and realtime allocations.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks the same as last time...right?
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
* [PATCH 18/20] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (16 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 19/20] xfs: cleanup xfs_iomap_write_unwritten Christoph Hellwig
2019-10-08 7:15 ` [PATCH 20/20] xfs: improve the IOMAP_NOWAIT check for COW inodes Christoph Hellwig
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
Cc: Allison Collins, Darrick J . Wong
Renaming whichfork to allocfork in xfs_buffered_write_iomap_begin makes
the usage of this variable clearer: it selects the fork that the new
delayed allocation goes into.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_iomap.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 5a7499f88673..7a3fcfe5662c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -862,7 +862,7 @@ xfs_buffered_write_iomap_begin(
struct xfs_iext_cursor icur, ccur;
xfs_fsblock_t prealloc_blocks = 0;
bool eof = false, cow_eof = false, shared = false;
- int whichfork = XFS_DATA_FORK;
+ int allocfork = XFS_DATA_FORK;
int error = 0;
/* we can't use delayed allocations when using extent size hints */
@@ -959,7 +959,7 @@ xfs_buffered_write_iomap_begin(
* Fork all the shared blocks from our write offset until the
* end of the extent.
*/
- whichfork = XFS_COW_FORK;
+ allocfork = XFS_COW_FORK;
end_fsb = imap.br_startoff + imap.br_blockcount;
} else {
/*
@@ -975,7 +975,7 @@ xfs_buffered_write_iomap_begin(
end_fsb = xfs_iomap_end_fsb(mp, offset, count);
if (xfs_is_always_cow_inode(ip))
- whichfork = XFS_COW_FORK;
+ allocfork = XFS_COW_FORK;
}
error = xfs_qm_dqattach_locked(ip, false);
@@ -983,7 +983,7 @@ xfs_buffered_write_iomap_begin(
goto out_unlock;
if (eof) {
- prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
+ prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork, offset,
count, &icur);
if (prealloc_blocks) {
xfs_extlen_t align;
@@ -1006,11 +1006,11 @@ xfs_buffered_write_iomap_begin(
}
retry:
- error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
+ error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
end_fsb - offset_fsb, prealloc_blocks,
- whichfork == XFS_DATA_FORK ? &imap : &cmap,
- whichfork == XFS_DATA_FORK ? &icur : &ccur,
- whichfork == XFS_DATA_FORK ? eof : cow_eof);
+ allocfork == XFS_DATA_FORK ? &imap : &cmap,
+ allocfork == XFS_DATA_FORK ? &icur : &ccur,
+ allocfork == XFS_DATA_FORK ? eof : cow_eof);
switch (error) {
case 0:
break;
@@ -1027,8 +1027,8 @@ xfs_buffered_write_iomap_begin(
goto out_unlock;
}
- if (whichfork == XFS_COW_FORK) {
- trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
+ if (allocfork == XFS_COW_FORK) {
+ trace_xfs_iomap_alloc(ip, offset, count, allocfork, &cmap);
goto found_cow;
}
@@ -1037,7 +1037,7 @@ xfs_buffered_write_iomap_begin(
* them out if the write happens to fail.
*/
xfs_iunlock(ip, XFS_ILOCK_EXCL);
- trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
+ trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
found_imap:
--
2.20.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
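[Editor's illustration] The ternary pattern in the xfs_bmapi_reserve_delalloc() call site above is what makes the rename worthwhile: the variable does not describe where an existing mapping was found, it selects which fork receives the allocation and, with it, which per-fork state to use. A minimal userspace sketch of that selector pattern, with simplified stand-in types (not the kernel's xfs structures):

```c
/* Stand-in fork selector, mirroring XFS_DATA_FORK / XFS_COW_FORK. */
enum fork { DATA_FORK, COW_FORK };

/* Stand-ins for the per-fork state passed to the reserve call:
 * &imap vs &cmap, &icur vs &ccur, eof vs cow_eof in the patch. */
struct state {
	int data_map;
	int cow_map;
};

/* The allocation target fork picks which per-fork state is used,
 * just like the "allocfork == XFS_DATA_FORK ? ... : ..." ternaries. */
static int *map_for(struct state *s, enum fork allocfork)
{
	return allocfork == DATA_FORK ? &s->data_map : &s->cow_map;
}
```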
* [PATCH 19/20] xfs: cleanup xfs_iomap_write_unwritten
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (17 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 18/20] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
2019-10-08 7:15 ` [PATCH 20/20] xfs: improve the IOMAP_NOWAIT check for COW inodes Christoph Hellwig
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong
Move more checks into the helpers that determine if we need a COW
operation or allocation, and split the return path for when an existing
data fork allocation has been found versus when a new allocation is
made.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_iomap.c | 74 ++++++++++++++++++++++++----------------------
1 file changed, 38 insertions(+), 36 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7a3fcfe5662c..cc3d8fa011fc 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -642,23 +642,42 @@ xfs_iomap_write_unwritten(
static inline bool
imap_needs_alloc(
struct inode *inode,
+ unsigned flags,
struct xfs_bmbt_irec *imap,
int nimaps)
{
- return !nimaps ||
- imap->br_startblock == HOLESTARTBLOCK ||
- imap->br_startblock == DELAYSTARTBLOCK ||
- (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
+ /* don't allocate blocks when just zeroing */
+ if (flags & IOMAP_ZERO)
+ return false;
+ if (!nimaps ||
+ imap->br_startblock == HOLESTARTBLOCK ||
+ imap->br_startblock == DELAYSTARTBLOCK)
+ return true;
+ /* we convert unwritten extents before copying the data for DAX */
+ if (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN)
+ return true;
+ return false;
}
static inline bool
-needs_cow_for_zeroing(
+imap_needs_cow(
+ struct xfs_inode *ip,
+ unsigned int flags,
struct xfs_bmbt_irec *imap,
int nimaps)
{
- return nimaps &&
- imap->br_startblock != HOLESTARTBLOCK &&
- imap->br_state != XFS_EXT_UNWRITTEN;
+ if (!xfs_is_cow_inode(ip))
+ return false;
+
+ /* when zeroing we don't have to COW holes or unwritten extents */
+ if (flags & IOMAP_ZERO) {
+ if (!nimaps ||
+ imap->br_startblock == HOLESTARTBLOCK ||
+ imap->br_state == XFS_EXT_UNWRITTEN)
+ return false;
+ }
+
+ return true;
}
static int
@@ -734,7 +753,6 @@ xfs_direct_write_iomap_begin(
xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length);
int nimaps = 1, error = 0;
bool shared = false;
- u16 iomap_flags = 0;
unsigned lockmode;
ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
@@ -761,12 +779,7 @@ xfs_direct_write_iomap_begin(
* Break shared extents if necessary. Checks for non-blocking IO have
* been done up front, so we don't need to do them here.
*/
- if (xfs_is_cow_inode(ip)) {
- /* if zeroing doesn't need COW allocation, then we are done. */
- if ((flags & IOMAP_ZERO) &&
- !needs_cow_for_zeroing(&imap, nimaps))
- goto out_found;
-
+ if (imap_needs_cow(ip, flags, &imap, nimaps)) {
/* may drop and re-acquire the ilock */
error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
&lockmode, flags & IOMAP_DIRECT);
@@ -778,18 +791,17 @@ xfs_direct_write_iomap_begin(
length = XFS_FSB_TO_B(mp, end_fsb) - offset;
}
- /* Don't need to allocate over holes when doing zeroing operations. */
- if (flags & IOMAP_ZERO)
- goto out_found;
+ if (imap_needs_alloc(inode, flags, &imap, nimaps))
+ goto allocate_blocks;
- if (!imap_needs_alloc(inode, &imap, nimaps))
- goto out_found;
+ xfs_iunlock(ip, lockmode);
+ trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
- /* If nowait is set bail since we are going to make allocations. */
- if (flags & IOMAP_NOWAIT) {
- error = -EAGAIN;
+allocate_blocks:
+ error = -EAGAIN;
+ if (flags & IOMAP_NOWAIT)
goto out_unlock;
- }
/*
* We cap the maximum length we map to a sane size to keep the chunks
@@ -808,22 +820,12 @@ xfs_direct_write_iomap_begin(
*/
if (lockmode == XFS_ILOCK_EXCL)
xfs_ilock_demote(ip, lockmode);
- error = xfs_iomap_write_direct(ip, offset, length, &imap,
- nimaps);
+ error = xfs_iomap_write_direct(ip, offset, length, &imap, nimaps);
if (error)
return error;
- iomap_flags |= IOMAP_F_NEW;
trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
-
-out_finish:
- return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
-
-out_found:
- ASSERT(nimaps);
- xfs_iunlock(ip, lockmode);
- trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
- goto out_finish;
+ return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
out_found_cow:
xfs_iunlock(ip, lockmode);
--
2.20.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
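[Editor's illustration] The two helpers this patch reshapes, imap_needs_alloc() and imap_needs_cow(), are pure predicates over the found mapping, so their decision tables can be modeled in userspace C. The sketch below uses simplified stand-in types and an illustrative IOMAP_ZERO value (not the kernel's), and omits the DAX-unwritten case for brevity:

```c
#include <stdbool.h>

#define IOMAP_ZERO 0x4		/* illustrative value, not the kernel's */

/* Collapsed stand-in for br_startblock / br_state: hole and delalloc
 * model HOLESTARTBLOCK and DELAYSTARTBLOCK start blocks. */
enum ext_state { EXT_HOLE, EXT_DELALLOC, EXT_UNWRITTEN, EXT_WRITTEN };

struct extent {
	enum ext_state state;
};

/* Model of imap_needs_alloc(): a new allocation is needed when there is
 * no mapping, a hole, or delalloc -- but zeroing never allocates. */
static bool needs_alloc(unsigned flags, const struct extent *ext, int nmaps)
{
	if (flags & IOMAP_ZERO)
		return false;
	return !nmaps ||
	       ext->state == EXT_HOLE ||
	       ext->state == EXT_DELALLOC;
}

/* Model of imap_needs_cow(): only reflinked inodes COW, and when
 * zeroing, holes and unwritten extents need no COW. */
static bool needs_cow(bool is_cow_inode, unsigned flags,
		      const struct extent *ext, int nmaps)
{
	if (!is_cow_inode)
		return false;
	if (flags & IOMAP_ZERO) {
		if (!nmaps ||
		    ext->state == EXT_HOLE ||
		    ext->state == EXT_UNWRITTEN)
			return false;
	}
	return true;
}
```

With the checks folded in like this, the caller reduces to two branches, `goto allocate_blocks` and the COW path, matching the straightened control flow in the hunk above.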
* [PATCH 20/20] xfs: improve the IOMAP_NOWAIT check for COW inodes
2019-10-08 7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
` (18 preceding siblings ...)
2019-10-08 7:15 ` [PATCH 19/20] xfs: cleanup xfs_iomap_write_unwritten Christoph Hellwig
@ 2019-10-08 7:15 ` Christoph Hellwig
19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08 7:15 UTC (permalink / raw)
To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong
Only bail out once we know that a COW allocation is actually required,
similar to how we handle normal data fork allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_iomap.c | 23 +++++------------------
1 file changed, 5 insertions(+), 18 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index cc3d8fa011fc..46dee9c9e6ef 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -693,15 +693,8 @@ xfs_ilock_for_iomap(
* COW writes may allocate delalloc space or convert unwritten COW
* extents, so we need to make sure to take the lock exclusively here.
*/
- if (xfs_is_cow_inode(ip) && is_write) {
- /*
- * FIXME: It could still overwrite on unshared extents and not
- * need allocation.
- */
- if (flags & IOMAP_NOWAIT)
- return -EAGAIN;
+ if (xfs_is_cow_inode(ip) && is_write)
mode = XFS_ILOCK_EXCL;
- }
/*
* Extents not yet cached requires exclusive access, don't block. This
@@ -760,12 +753,6 @@ xfs_direct_write_iomap_begin(
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;
- /*
- * Lock the inode in the manner required for the specified operation and
- * check for as many conditions that would result in blocking as
- * possible. This removes most of the non-blocking checks from the
- * mapping code below.
- */
error = xfs_ilock_for_iomap(ip, flags, &lockmode);
if (error)
return error;
@@ -775,11 +762,11 @@ xfs_direct_write_iomap_begin(
if (error)
goto out_unlock;
- /*
- * Break shared extents if necessary. Checks for non-blocking IO have
- * been done up front, so we don't need to do them here.
- */
if (imap_needs_cow(ip, flags, &imap, nimaps)) {
+ error = -EAGAIN;
+ if (flags & IOMAP_NOWAIT)
+ goto out_unlock;
+
/* may drop and re-acquire the ilock */
error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
&lockmode, flags & IOMAP_DIRECT);
--
2.20.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
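[Editor's illustration] The behavioral change in patch 20 is when -EAGAIN is raised for IOMAP_NOWAIT writers: before, any write to a reflinked inode bailed early in xfs_ilock_for_iomap(); after, the code takes the lock, inspects the mapping, and bails only if a COW allocation is really required, so overwrites of unshared extents proceed. A userspace sketch of the new ordering, with simplified stand-in names and an illustrative flag value (not the kernel's):

```c
#define EAGAIN 11		/* illustrative errno value */
#define IOMAP_NOWAIT 0x8	/* illustrative flag value */

#include <stdbool.h>

/* Sketch of the reordered check: decide first whether blocking
 * allocation work is required, and only then honor IOMAP_NOWAIT. */
static int write_begin(bool is_cow_inode, bool extent_is_shared,
		       unsigned flags)
{
	/* Stand-in for imap_needs_cow(): COW allocation is required
	 * only when a reflinked inode maps a shared extent. */
	bool needs_cow_alloc = is_cow_inode && extent_is_shared;

	if (needs_cow_alloc) {
		if (flags & IOMAP_NOWAIT)
			return -EAGAIN;	/* would block on allocation */
		/* ... xfs_reflink_allocate_cow() would run here ... */
	}
	return 0;	/* mapping usable as-is, no blocking work */
}
```

This resolves the FIXME the patch deletes: an overwrite of an already-unshared extent on a COW inode no longer fails spuriously under NOWAIT.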