All of lore.kernel.org
 help / color / mirror / Atom feed
* reflink direct I/O improvements V2
@ 2017-02-01 21:42 Christoph Hellwig
  2017-02-01 21:42 ` [PATCH 1/4] xfs: reject all unaligned direct writes to reflinked files Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-01 21:42 UTC (permalink / raw)
  To: linux-xfs

This series improves the reflink direct I/O path, by avoiding a pointless
detour through delayed allocations and doing the allocations directly
from the iomap code.

Changes since V1:
 - align the alignment code with the regular direct I/O path
 - rebase on top the patches from Darrick to use unwritten extents
   in the direct I/O COW path

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] xfs: reject all unaligned direct writes to reflinked files
  2017-02-01 21:42 reflink direct I/O improvements V2 Christoph Hellwig
@ 2017-02-01 21:42 ` Christoph Hellwig
  2017-02-02 23:01   ` Darrick J. Wong
  2017-02-01 21:42 ` [PATCH 2/4] xfs: introduce xfs_aligned_fsb_count Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-01 21:42 UTC (permalink / raw)
  To: linux-xfs

We currently fall back from direct to buffered writes if we detect a
remaining shared extent in the iomap_begin callback.  But by the time
iomap_begin is called for the potentially unaligned end block we might
have already written most of the data to disk, which we'd now write
again using buffered I/O.  To avoid this reject all writes to reflinked
files before starting I/O so that we are guaranteed to only write the
data once.

The alternative would be to unshare the unaligned start and/or end block
before doing the I/O. I think that's doable, and will actually be
required to support reflinks on DAX file system.  But it will take a
little more time and I'd rather get rid of the double write ASAP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_file.c  |  7 +++++++
 fs/xfs/xfs_iomap.c | 12 +-----------
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index bbb9eb6..06236bf 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -527,6 +527,13 @@ xfs_file_dio_aio_write(
 	if ((iocb->ki_pos & mp->m_blockmask) ||
 	    ((iocb->ki_pos + count) & mp->m_blockmask)) {
 		unaligned_io = 1;
+
+		/*
+		 * We can't properly handle unaligned direct I/O to reflink
+		 * files yet, as we can't unshare a partial block.
+		 */
+		if (xfs_is_reflink_inode(ip))
+			return -EREMCHG;
 		iolock = XFS_IOLOCK_EXCL;
 	} else {
 		iolock = XFS_IOLOCK_SHARED;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 767208f..d17aa3b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1026,17 +1026,7 @@ xfs_file_iomap_begin(
 		if (error)
 			goto out_unlock;
 
-		/*
-		 * We're here because we're trying to do a directio write to a
-		 * region that isn't aligned to a filesystem block.  If the
-		 * extent is shared, fall back to buffered mode to handle the
-		 * RMW.
-		 */
-		if (!(flags & IOMAP_REPORT) && shared) {
-			trace_xfs_reflink_bounce_dio_write(ip, &imap);
-			error = -EREMCHG;
-			goto out_unlock;
-		}
+		ASSERT((flags & IOMAP_REPORT) || !shared);
 	}
 
 	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] xfs: introduce xfs_aligned_fsb_count
  2017-02-01 21:42 reflink direct I/O improvements V2 Christoph Hellwig
  2017-02-01 21:42 ` [PATCH 1/4] xfs: reject all unaligned direct writes to reflinked files Christoph Hellwig
@ 2017-02-01 21:42 ` Christoph Hellwig
  2017-02-02 22:56   ` Darrick J. Wong
  2017-02-01 21:42 ` [PATCH 3/4] xfs: go straight to real allocations for direct I/O COW writes Christoph Hellwig
  2017-02-01 21:42 ` [PATCH 4/4] xfs: allocate direct I/O COW blocks in iomap_begin Christoph Hellwig
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-01 21:42 UTC (permalink / raw)
  To: linux-xfs

Factor a helper to calculate the extent-size aligned block out of the
iomap code, so that it can be reused by the upcoming reflink dio code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 11 ++---------
 fs/xfs/xfs_iomap.h | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d17aa3b..9d811eb 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -162,7 +162,7 @@ xfs_iomap_write_direct(
 	xfs_fileoff_t	last_fsb;
 	xfs_filblks_t	count_fsb, resaligned;
 	xfs_fsblock_t	firstfsb;
-	xfs_extlen_t	extsz, temp;
+	xfs_extlen_t	extsz;
 	int		nimaps;
 	int		quota_flag;
 	int		rt;
@@ -203,14 +203,7 @@ xfs_iomap_write_direct(
 	}
 	count_fsb = last_fsb - offset_fsb;
 	ASSERT(count_fsb > 0);
-
-	resaligned = count_fsb;
-	if (unlikely(extsz)) {
-		if ((temp = do_mod(offset_fsb, extsz)))
-			resaligned += temp;
-		if ((temp = do_mod(resaligned, extsz)))
-			resaligned += extsz - temp;
-	}
+	resaligned = xfs_aligned_fsb_count(offset_fsb, count_fsb, extsz);
 
 	if (unlikely(rt)) {
 		resrtextents = qblocks = resaligned;
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 6d45cf0..1fef4a5 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -33,6 +33,26 @@ void xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *,
 		struct xfs_bmbt_irec *);
 xfs_extlen_t xfs_eof_alignment(struct xfs_inode *ip, xfs_extlen_t extsize);
 
+static inline xfs_filblks_t
+xfs_aligned_fsb_count(
+	xfs_fileoff_t		offset_fsb,
+	xfs_filblks_t		count_fsb,
+	xfs_extlen_t		extsz)
+{
+	if (extsz) {
+		xfs_extlen_t	align;
+
+		align = do_mod(offset_fsb, extsz);
+		if (align)
+			count_fsb += align;
+		align = do_mod(count_fsb, extsz);
+		if (align)
+			count_fsb += extsz - align;
+	}
+
+	return count_fsb;
+}
+
 extern struct iomap_ops xfs_iomap_ops;
 extern struct iomap_ops xfs_xattr_iomap_ops;
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] xfs: go straight to real allocations for direct I/O COW writes
  2017-02-01 21:42 reflink direct I/O improvements V2 Christoph Hellwig
  2017-02-01 21:42 ` [PATCH 1/4] xfs: reject all unaligned direct writes to reflinked files Christoph Hellwig
  2017-02-01 21:42 ` [PATCH 2/4] xfs: introduce xfs_aligned_fsb_count Christoph Hellwig
@ 2017-02-01 21:42 ` Christoph Hellwig
  2017-02-01 21:42 ` [PATCH 4/4] xfs: allocate direct I/O COW blocks in iomap_begin Christoph Hellwig
  3 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-01 21:42 UTC (permalink / raw)
  To: linux-xfs

When we allocate COW fork blocks for direct I/O writes we currently first
create a delayed allocation, and then convert it to a real allocation
once we've got the delayed one.

As there is no good reason for that this patch instead makes use call
xfs_bmapi_write from the COW allocation path.  The only interesting bits
are a few tweaks the low-level allocator to allow for this, most notably
the need to remove the call to xfs_bmap_extsize_align for the cowextsize
in xfs_bmap_btalloc - for the existing convert case it's a no-op, but
for the direct allocation case it would blow up our block reservation
way beyond what we reserved for the transaction.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c |  3 +-
 fs/xfs/xfs_reflink.c     | 94 +++++++++++++++++++++++++++++++++---------------
 2 files changed, 68 insertions(+), 29 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 1566e9d..2b4ad84 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -2909,13 +2909,14 @@ xfs_bmap_add_extent_hole_real(
 	ASSERT(!isnullstartblock(new->br_startblock));
 	ASSERT(!bma->cur ||
 	       !(bma->cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL));
-	ASSERT(whichfork != XFS_COW_FORK);
 
 	XFS_STATS_INC(mp, xs_add_exlist);
 
 	state = 0;
 	if (whichfork == XFS_ATTR_FORK)
 		state |= BMAP_ATTRFORK;
+	if (whichfork == XFS_COW_FORK)
+		state |= BMAP_COWFORK;
 
 	/*
 	 * Check and set flags if this segment has a left neighbor.
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index a2e1fff..1a96741 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -391,62 +391,100 @@ __xfs_reflink_allocate_cow(
 	xfs_fileoff_t		end_fsb)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_bmbt_irec	imap;
+	struct xfs_bmbt_irec	imap, got;
 	struct xfs_defer_ops	dfops;
 	struct xfs_trans	*tp;
 	xfs_fsblock_t		first_block;
-	int			nimaps = 1, error;
-	bool			shared;
+	int			nimaps, error, lockmode;
+	bool			shared, trimmed;
+	xfs_filblks_t		resaligned;
+	xfs_extlen_t		resblks;
+	xfs_extnum_t		idx;
 
-	xfs_defer_init(&dfops, &first_block);
+	resaligned = xfs_aligned_fsb_count(*offset_fsb, end_fsb - *offset_fsb,
+			xfs_get_cowextsz_hint(ip));
+	resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
 
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0,
-			XFS_TRANS_RESERVE, &tp);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
 	if (error)
 		return error;
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	lockmode = XFS_ILOCK_EXCL;
+	xfs_ilock(ip, lockmode);
 
-	/* Read extent from the source file. */
-	nimaps = 1;
-	error = xfs_bmapi_read(ip, *offset_fsb, end_fsb - *offset_fsb,
-			&imap, &nimaps, 0);
-	if (error)
-		goto out_unlock;
-	ASSERT(nimaps == 1);
+	/*
+	 * Even if the extent is not shared we might have a preallocation for
+	 * it in the COW fork.  If so use it.
+	 */
+	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, *offset_fsb, &idx, &got) &&
+	    got.br_startoff <= *offset_fsb) {
+		/* If we have a real allocation in the COW fork we're done. */
+		if (!isnullstartblock(got.br_startblock)) {
+			xfs_trim_extent(&got, *offset_fsb,
+					end_fsb - *offset_fsb);
+			*offset_fsb = got.br_startoff + got.br_blockcount;
+			goto out_trans_cancel;
+		}
+	} else {
+		nimaps = 1;
+		error = xfs_bmapi_read(ip, *offset_fsb, end_fsb - *offset_fsb,
+				&imap, &nimaps, 0);
+		if (error)
+			goto out_trans_cancel;
+		ASSERT(nimaps == 1);
+
+		/* Trim the mapping to the nearest shared extent boundary. */
+		error = xfs_reflink_trim_around_shared(ip, &imap, &shared,
+				&trimmed);
+		if (error)
+			goto out_trans_cancel;
+
+		if (!shared) {
+			*offset_fsb = imap.br_startoff + imap.br_blockcount;
+			goto out_trans_cancel;
+		}
 
-	/* Make sure there's a CoW reservation for it. */
-	error = xfs_reflink_reserve_cow(ip, &imap, &shared);
+		*offset_fsb = imap.br_startoff;
+		end_fsb = imap.br_startoff + imap.br_blockcount;
+	}
+
+	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
+			XFS_QMOPT_RES_REGBLKS);
 	if (error)
 		goto out_trans_cancel;
 
-	if (!shared) {
-		*offset_fsb = imap.br_startoff + imap.br_blockcount;
-		goto out_trans_cancel;
-	}
+	xfs_trans_ijoin(tp, ip, 0);
+
+	xfs_defer_init(&dfops, &first_block);
+	nimaps = 1;
 
 	/* Allocate the entire reservation as unwritten blocks. */
-	xfs_trans_ijoin(tp, ip, 0);
-	error = xfs_bmapi_write(tp, ip, imap.br_startoff, imap.br_blockcount,
+	error = xfs_bmapi_write(tp, ip, *offset_fsb, end_fsb - *offset_fsb,
 			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, &first_block,
-			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK),
-			&imap, &nimaps, &dfops);
+			resblks, &imap, &nimaps, &dfops);
 	if (error)
-		goto out_trans_cancel;
+		goto out_bmap_cancel;
 
 	/* Finish up. */
 	error = xfs_defer_finish(&tp, &dfops, NULL);
 	if (error)
-		goto out_trans_cancel;
+		goto out_bmap_cancel;
 
 	error = xfs_trans_commit(tp);
+	if (error)
+		goto out_unlock;
 
 	*offset_fsb = imap.br_startoff + imap.br_blockcount;
+
 out_unlock:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(ip, lockmode);
 	return error;
-out_trans_cancel:
+
+out_bmap_cancel:
 	xfs_defer_cancel(&dfops);
+	xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0,
+			XFS_QMOPT_RES_REGBLKS);
+out_trans_cancel:
 	xfs_trans_cancel(tp);
 	goto out_unlock;
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] xfs: allocate direct I/O COW blocks in iomap_begin
  2017-02-01 21:42 reflink direct I/O improvements V2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2017-02-01 21:42 ` [PATCH 3/4] xfs: go straight to real allocations for direct I/O COW writes Christoph Hellwig
@ 2017-02-01 21:42 ` Christoph Hellwig
  2017-02-03  0:55   ` Darrick J. Wong
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-01 21:42 UTC (permalink / raw)
  To: linux-xfs

Instead of preallocating all the required COW blocks in the high-level
write code do it inside the iomap code, like we do for all other I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c    |   8 ---
 fs/xfs/xfs_iomap.c   |  31 +++++------
 fs/xfs/xfs_reflink.c | 147 ++++++++++++++++++++-------------------------------
 fs/xfs/xfs_reflink.h |   5 +-
 fs/xfs/xfs_trace.h   |   2 -
 5 files changed, 71 insertions(+), 122 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 06236bf..56ac5b7 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -559,14 +559,6 @@ xfs_file_dio_aio_write(
 	}
 
 	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
-
-	/* If this is a block-aligned directio CoW, remap immediately. */
-	if (xfs_is_reflink_inode(ip) && !unaligned_io) {
-		ret = xfs_reflink_allocate_cow_range(ip, iocb->ki_pos, count);
-		if (ret)
-			goto out;
-	}
-
 	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, xfs_dio_write_end_io);
 out:
 	xfs_iunlock(ip, iolock);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 9d811eb..de717e7 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -995,37 +995,31 @@ xfs_file_iomap_begin(
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	end_fsb = XFS_B_TO_FSB(mp, offset + length);
 
-	if (xfs_is_reflink_inode(ip) &&
-	    (flags & IOMAP_WRITE) && (flags & IOMAP_DIRECT)) {
-		shared = xfs_reflink_find_cow_mapping(ip, offset, &imap);
-		if (shared) {
-			xfs_iunlock(ip, lockmode);
-			goto alloc_done;
-		}
-		ASSERT(!isnullstartblock(imap.br_startblock));
-	}
-
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
 			       &nimaps, 0);
 	if (error)
 		goto out_unlock;
 
-	if ((flags & IOMAP_REPORT) ||
-	    (xfs_is_reflink_inode(ip) &&
-	     (flags & IOMAP_WRITE) && (flags & IOMAP_DIRECT))) {
+	if (flags & IOMAP_REPORT) {
 		/* Trim the mapping to the nearest shared extent boundary. */
 		error = xfs_reflink_trim_around_shared(ip, &imap, &shared,
 				&trimmed);
 		if (error)
 			goto out_unlock;
-
-		ASSERT((flags & IOMAP_REPORT) || !shared);
 	}
 
 	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
-		error = xfs_reflink_reserve_cow(ip, &imap, &shared);
-		if (error)
-			goto out_unlock;
+		if (flags & IOMAP_DIRECT) {
+			/* may drop and re-acquire the ilock */
+			error = xfs_reflink_allocate_cow(ip, offset_fsb, end_fsb,
+					&imap, &shared, &lockmode);
+			if (error)
+				goto out_unlock;
+		} else {
+			error = xfs_reflink_reserve_cow(ip, &imap, &shared);
+			if (error)
+				goto out_unlock;
+		}
 
 		end_fsb = imap.br_startoff + imap.br_blockcount;
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
@@ -1054,7 +1048,6 @@ xfs_file_iomap_begin(
 		if (error)
 			return error;
 
-alloc_done:
 		iomap->flags = IOMAP_F_NEW;
 		trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
 	} else {
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 1a96741..d5a2cf2 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -384,74 +384,84 @@ xfs_reflink_convert_cow(
 }
 
 /* Allocate all CoW reservations covering a range of blocks in a file. */
-static int
-__xfs_reflink_allocate_cow(
+int
+xfs_reflink_allocate_cow(
 	struct xfs_inode	*ip,
-	xfs_fileoff_t		*offset_fsb,
-	xfs_fileoff_t		end_fsb)
+	xfs_fileoff_t		offset_fsb,
+	xfs_fileoff_t		end_fsb,
+	struct xfs_bmbt_irec	*imap,
+	bool			*shared,
+	uint			*lockmode)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_bmbt_irec	imap, got;
+	struct xfs_bmbt_irec	got;
 	struct xfs_defer_ops	dfops;
-	struct xfs_trans	*tp;
+	struct xfs_trans	*tp = NULL;
 	xfs_fsblock_t		first_block;
-	int			nimaps, error, lockmode;
-	bool			shared, trimmed;
+	int			nimaps, error = 0;
+	bool			trimmed;
 	xfs_filblks_t		resaligned;
-	xfs_extlen_t		resblks;
+	xfs_extlen_t		resblks = 0;
 	xfs_extnum_t		idx;
 
-	resaligned = xfs_aligned_fsb_count(*offset_fsb, end_fsb - *offset_fsb,
-			xfs_get_cowextsz_hint(ip));
-	resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
-
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
-	if (error)
-		return error;
-
-	lockmode = XFS_ILOCK_EXCL;
-	xfs_ilock(ip, lockmode);
+retry:
+	ASSERT(xfs_is_reflink_inode(ip));
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
 
 	/*
 	 * Even if the extent is not shared we might have a preallocation for
 	 * it in the COW fork.  If so use it.
 	 */
-	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, *offset_fsb, &idx, &got) &&
-	    got.br_startoff <= *offset_fsb) {
+	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &idx, &got) &&
+	    got.br_startoff <= offset_fsb) {
+		*shared = true;
+
 		/* If we have a real allocation in the COW fork we're done. */
 		if (!isnullstartblock(got.br_startblock)) {
-			xfs_trim_extent(&got, *offset_fsb,
-					end_fsb - *offset_fsb);
-			*offset_fsb = got.br_startoff + got.br_blockcount;
-			goto out_trans_cancel;
+			xfs_trim_extent(&got, offset_fsb, end_fsb - offset_fsb);
+			*imap = got;
+			goto out;
 		}
+
+		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
 	} else {
-		nimaps = 1;
-		error = xfs_bmapi_read(ip, *offset_fsb, end_fsb - *offset_fsb,
-				&imap, &nimaps, 0);
+		error = xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
+		if (error || !*shared)
+			goto out;
+	}
+
+	if (!tp) {
+		resaligned = xfs_aligned_fsb_count(offset_fsb,
+			end_fsb - offset_fsb, xfs_get_cowextsz_hint(ip));
+		resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
+
+		xfs_iunlock(ip, *lockmode);
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
+		*lockmode = XFS_ILOCK_EXCL;
+		xfs_ilock(ip, *lockmode);
+
 		if (error)
-			goto out_trans_cancel;
-		ASSERT(nimaps == 1);
+			return error;
 
-		/* Trim the mapping to the nearest shared extent boundary. */
-		error = xfs_reflink_trim_around_shared(ip, &imap, &shared,
-				&trimmed);
+		error = xfs_qm_dqattach_locked(ip, 0);
 		if (error)
-			goto out_trans_cancel;
+			goto out;
 
-		if (!shared) {
-			*offset_fsb = imap.br_startoff + imap.br_blockcount;
-			goto out_trans_cancel;
-		}
+		/* Read extent from the source file. */
+		nimaps = 1;
+		error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
+				imap, &nimaps, 0);
+		if (error)
+			goto out;
+		ASSERT(nimaps == 1);
 
-		*offset_fsb = imap.br_startoff;
-		end_fsb = imap.br_startoff + imap.br_blockcount;
+		goto retry;
 	}
 
 	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
 			XFS_QMOPT_RES_REGBLKS);
 	if (error)
-		goto out_trans_cancel;
+		goto out;
 
 	xfs_trans_ijoin(tp, ip, 0);
 
@@ -459,9 +469,9 @@ __xfs_reflink_allocate_cow(
 	nimaps = 1;
 
 	/* Allocate the entire reservation as unwritten blocks. */
-	error = xfs_bmapi_write(tp, ip, *offset_fsb, end_fsb - *offset_fsb,
+	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
 			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, &first_block,
-			resblks, &imap, &nimaps, &dfops);
+			resblks, imap, &nimaps, &dfops);
 	if (error)
 		goto out_bmap_cancel;
 
@@ -470,60 +480,15 @@ __xfs_reflink_allocate_cow(
 	if (error)
 		goto out_bmap_cancel;
 
-	error = xfs_trans_commit(tp);
-	if (error)
-		goto out_unlock;
-
-	*offset_fsb = imap.br_startoff + imap.br_blockcount;
-
-out_unlock:
-	xfs_iunlock(ip, lockmode);
-	return error;
+	return xfs_trans_commit(tp);
 
 out_bmap_cancel:
 	xfs_defer_cancel(&dfops);
 	xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0,
 			XFS_QMOPT_RES_REGBLKS);
-out_trans_cancel:
-	xfs_trans_cancel(tp);
-	goto out_unlock;
-}
-
-/* Allocate all CoW reservations covering a part of a file. */
-int
-xfs_reflink_allocate_cow_range(
-	struct xfs_inode	*ip,
-	xfs_off_t		offset,
-	xfs_off_t		count)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
-	int			error;
-
-	ASSERT(xfs_is_reflink_inode(ip));
-
-	trace_xfs_reflink_allocate_cow_range(ip, offset, count);
-
-	/*
-	 * Make sure that the dquots are there.
-	 */
-	error = xfs_qm_dqattach(ip, 0);
-	if (error)
-		return error;
-
-	while (offset_fsb < end_fsb) {
-		error = __xfs_reflink_allocate_cow(ip, &offset_fsb, end_fsb);
-		if (error) {
-			trace_xfs_reflink_allocate_cow_range_error(ip, error,
-					_RET_IP_);
-			goto out;
-		}
-	}
-
-	/* Convert the CoW extents to regular. */
-	error = xfs_reflink_convert_cow(ip, offset, count);
 out:
+	if (tp)
+		xfs_trans_cancel(tp);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 1583c47..08792a4 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -28,8 +28,9 @@ extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
 
 extern int xfs_reflink_reserve_cow(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *imap, bool *shared);
-extern int xfs_reflink_allocate_cow_range(struct xfs_inode *ip,
-		xfs_off_t offset, xfs_off_t count);
+extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
+		xfs_fileoff_t offset_fsb, xfs_fileoff_t end_fsb,
+		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
 extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
 extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d3d11905..5f0a0d6 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3248,7 +3248,6 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_enospc);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
 
 DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
-DEFINE_RW_EVENT(xfs_reflink_allocate_cow_range);
 
 DEFINE_INODE_IREC_EVENT(xfs_reflink_bounce_dio_write);
 DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
@@ -3258,7 +3257,6 @@ DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range);
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_end_cow);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_remap);
 
-DEFINE_INODE_ERROR_EVENT(xfs_reflink_allocate_cow_range_error);
 DEFINE_INODE_ERROR_EVENT(xfs_reflink_cancel_cow_range_error);
 DEFINE_INODE_ERROR_EVENT(xfs_reflink_end_cow_error);
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] xfs: introduce xfs_aligned_fsb_count
  2017-02-01 21:42 ` [PATCH 2/4] xfs: introduce xfs_aligned_fsb_count Christoph Hellwig
@ 2017-02-02 22:56   ` Darrick J. Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2017-02-02 22:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Feb 01, 2017 at 10:42:37PM +0100, Christoph Hellwig wrote:
> Factor a helper to calculate the extent-size aligned block out of the
> iomap code, so that it can be reused by the upcoming reflink dio code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks like a reasonable enough code replacement, so:
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iomap.c | 11 ++---------
>  fs/xfs/xfs_iomap.h | 20 ++++++++++++++++++++
>  2 files changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index d17aa3b..9d811eb 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -162,7 +162,7 @@ xfs_iomap_write_direct(
>  	xfs_fileoff_t	last_fsb;
>  	xfs_filblks_t	count_fsb, resaligned;
>  	xfs_fsblock_t	firstfsb;
> -	xfs_extlen_t	extsz, temp;
> +	xfs_extlen_t	extsz;
>  	int		nimaps;
>  	int		quota_flag;
>  	int		rt;
> @@ -203,14 +203,7 @@ xfs_iomap_write_direct(
>  	}
>  	count_fsb = last_fsb - offset_fsb;
>  	ASSERT(count_fsb > 0);
> -
> -	resaligned = count_fsb;
> -	if (unlikely(extsz)) {
> -		if ((temp = do_mod(offset_fsb, extsz)))
> -			resaligned += temp;
> -		if ((temp = do_mod(resaligned, extsz)))
> -			resaligned += extsz - temp;
> -	}
> +	resaligned = xfs_aligned_fsb_count(offset_fsb, count_fsb, extsz);
>  
>  	if (unlikely(rt)) {
>  		resrtextents = qblocks = resaligned;
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index 6d45cf0..1fef4a5 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -33,6 +33,26 @@ void xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *,
>  		struct xfs_bmbt_irec *);
>  xfs_extlen_t xfs_eof_alignment(struct xfs_inode *ip, xfs_extlen_t extsize);
>  
> +static inline xfs_filblks_t
> +xfs_aligned_fsb_count(
> +	xfs_fileoff_t		offset_fsb,
> +	xfs_filblks_t		count_fsb,
> +	xfs_extlen_t		extsz)
> +{
> +	if (extsz) {
> +		xfs_extlen_t	align;
> +
> +		align = do_mod(offset_fsb, extsz);
> +		if (align)
> +			count_fsb += align;
> +		align = do_mod(count_fsb, extsz);
> +		if (align)
> +			count_fsb += extsz - align;
> +	}
> +
> +	return count_fsb;
> +}
> +
>  extern struct iomap_ops xfs_iomap_ops;
>  extern struct iomap_ops xfs_xattr_iomap_ops;
>  
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] xfs: reject all unaligned direct writes to reflinked files
  2017-02-01 21:42 ` [PATCH 1/4] xfs: reject all unaligned direct writes to reflinked files Christoph Hellwig
@ 2017-02-02 23:01   ` Darrick J. Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2017-02-02 23:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Feb 01, 2017 at 10:42:36PM +0100, Christoph Hellwig wrote:
> We currently fall back from direct to buffered writes if we detect a
> remaining shared extent in the iomap_begin callback.  But by the time
> iomap_begin is called for the potentially unaligned end block we might
> have already written most of the data to disk, which we'd now write
> again using buffered I/O.  To avoid this reject all writes to reflinked
> files before starting I/O so that we are guaranteed to only write the
> data once.
> 
> The alternative would be to unshare the unaligned start and/or end block
> before doing the I/O. I think that's doable, and will actually be
> required to support reflinks on DAX file system.  But it will take a
> little more time and I'd rather get rid of the double write ASAP.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_file.c  |  7 +++++++
>  fs/xfs/xfs_iomap.c | 12 +-----------
>  2 files changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index bbb9eb6..06236bf 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -527,6 +527,13 @@ xfs_file_dio_aio_write(
>  	if ((iocb->ki_pos & mp->m_blockmask) ||
>  	    ((iocb->ki_pos + count) & mp->m_blockmask)) {
>  		unaligned_io = 1;
> +
> +		/*
> +		 * We can't properly handle unaligned direct I/O to reflink
> +		 * files yet, as we can't unshare a partial block.
> +		 */
> +		if (xfs_is_reflink_inode(ip))
> +			return -EREMCHG;
>  		iolock = XFS_IOLOCK_EXCL;
>  	} else {
>  		iolock = XFS_IOLOCK_SHARED;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 767208f..d17aa3b 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1026,17 +1026,7 @@ xfs_file_iomap_begin(
>  		if (error)
>  			goto out_unlock;
>  
> -		/*
> -		 * We're here because we're trying to do a directio write to a
> -		 * region that isn't aligned to a filesystem block.  If the
> -		 * extent is shared, fall back to buffered mode to handle the
> -		 * RMW.
> -		 */
> -		if (!(flags & IOMAP_REPORT) && shared) {
> -			trace_xfs_reflink_bounce_dio_write(ip, &imap);

Can you either port the tracepoint to the new hunk or just kill it
entirely?

(If you opt for killing it, I'll just remove it from xfs_trace.h when I
import this patch.)

Otherwise,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> -			error = -EREMCHG;
> -			goto out_unlock;
> -		}
> +		ASSERT((flags & IOMAP_REPORT) || !shared);
>  	}
>  
>  	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] xfs: allocate direct I/O COW blocks in iomap_begin
  2017-02-01 21:42 ` [PATCH 4/4] xfs: allocate direct I/O COW blocks in iomap_begin Christoph Hellwig
@ 2017-02-03  0:55   ` Darrick J. Wong
  2017-02-03 10:53     ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2017-02-03  0:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Feb 01, 2017 at 10:42:39PM +0100, Christoph Hellwig wrote:
> Instead of preallocating all the required COW blocks in the high-level
> write code do it inside the iomap code, like we do for all other I/O.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_file.c    |   8 ---
>  fs/xfs/xfs_iomap.c   |  31 +++++------
>  fs/xfs/xfs_reflink.c | 147 ++++++++++++++++++++-------------------------------
>  fs/xfs/xfs_reflink.h |   5 +-
>  fs/xfs/xfs_trace.h   |   2 -
>  5 files changed, 71 insertions(+), 122 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 06236bf..56ac5b7 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -559,14 +559,6 @@ xfs_file_dio_aio_write(
>  	}
>  
>  	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
> -
> -	/* If this is a block-aligned directio CoW, remap immediately. */
> -	if (xfs_is_reflink_inode(ip) && !unaligned_io) {
> -		ret = xfs_reflink_allocate_cow_range(ip, iocb->ki_pos, count);
> -		if (ret)
> -			goto out;
> -	}
> -
>  	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, xfs_dio_write_end_io);
>  out:
>  	xfs_iunlock(ip, iolock);
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 9d811eb..de717e7 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -995,37 +995,31 @@ xfs_file_iomap_begin(
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	end_fsb = XFS_B_TO_FSB(mp, offset + length);
>  
> -	if (xfs_is_reflink_inode(ip) &&
> -	    (flags & IOMAP_WRITE) && (flags & IOMAP_DIRECT)) {
> -		shared = xfs_reflink_find_cow_mapping(ip, offset, &imap);
> -		if (shared) {
> -			xfs_iunlock(ip, lockmode);
> -			goto alloc_done;
> -		}
> -		ASSERT(!isnullstartblock(imap.br_startblock));
> -	}
> -
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
>  			       &nimaps, 0);
>  	if (error)
>  		goto out_unlock;
>  
> -	if ((flags & IOMAP_REPORT) ||
> -	    (xfs_is_reflink_inode(ip) &&
> -	     (flags & IOMAP_WRITE) && (flags & IOMAP_DIRECT))) {
> +	if (flags & IOMAP_REPORT) {
>  		/* Trim the mapping to the nearest shared extent boundary. */
>  		error = xfs_reflink_trim_around_shared(ip, &imap, &shared,
>  				&trimmed);
>  		if (error)
>  			goto out_unlock;
> -
> -		ASSERT((flags & IOMAP_REPORT) || !shared);
>  	}
>  
>  	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
> -		error = xfs_reflink_reserve_cow(ip, &imap, &shared);
> -		if (error)
> -			goto out_unlock;
> +		if (flags & IOMAP_DIRECT) {
> +			/* may drop and re-acquire the ilock */
> +			error = xfs_reflink_allocate_cow(ip, offset_fsb, end_fsb,
> +					&imap, &shared, &lockmode);
> +			if (error)
> +				goto out_unlock;
> +		} else {
> +			error = xfs_reflink_reserve_cow(ip, &imap, &shared);
> +			if (error)
> +				goto out_unlock;
> +		}
>  
>  		end_fsb = imap.br_startoff + imap.br_blockcount;
>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
> @@ -1054,7 +1048,6 @@ xfs_file_iomap_begin(
>  		if (error)
>  			return error;
>  
> -alloc_done:
>  		iomap->flags = IOMAP_F_NEW;
>  		trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
>  	} else {
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 1a96741..d5a2cf2 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -384,74 +384,84 @@ xfs_reflink_convert_cow(
>  }
>  
>  /* Allocate all CoW reservations covering a range of blocks in a file. */
> -static int
> -__xfs_reflink_allocate_cow(
> +int
> +xfs_reflink_allocate_cow(
>  	struct xfs_inode	*ip,
> -	xfs_fileoff_t		*offset_fsb,
> -	xfs_fileoff_t		end_fsb)
> +	xfs_fileoff_t		offset_fsb,
> +	xfs_fileoff_t		end_fsb,
> +	struct xfs_bmbt_irec	*imap,
> +	bool			*shared,
> +	uint			*lockmode)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
> -	struct xfs_bmbt_irec	imap, got;
> +	struct xfs_bmbt_irec	got;
>  	struct xfs_defer_ops	dfops;
> -	struct xfs_trans	*tp;
> +	struct xfs_trans	*tp = NULL;
>  	xfs_fsblock_t		first_block;
> -	int			nimaps, error, lockmode;
> -	bool			shared, trimmed;
> +	int			nimaps, error = 0;
> +	bool			trimmed;
>  	xfs_filblks_t		resaligned;
> -	xfs_extlen_t		resblks;
> +	xfs_extlen_t		resblks = 0;
>  	xfs_extnum_t		idx;
>  
> -	resaligned = xfs_aligned_fsb_count(*offset_fsb, end_fsb - *offset_fsb,
> -			xfs_get_cowextsz_hint(ip));
> -	resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
> -
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
> -	if (error)
> -		return error;
> -
> -	lockmode = XFS_ILOCK_EXCL;
> -	xfs_ilock(ip, lockmode);
> +retry:
> +	ASSERT(xfs_is_reflink_inode(ip));
> +	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
>  
>  	/*
>  	 * Even if the extent is not shared we might have a preallocation for
>  	 * it in the COW fork.  If so use it.
>  	 */
> -	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, *offset_fsb, &idx, &got) &&
> -	    got.br_startoff <= *offset_fsb) {
> +	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &idx, &got) &&
> +	    got.br_startoff <= offset_fsb) {
> +		*shared = true;
> +
>  		/* If we have a real allocation in the COW fork we're done. */
>  		if (!isnullstartblock(got.br_startblock)) {
> -			xfs_trim_extent(&got, *offset_fsb,
> -					end_fsb - *offset_fsb);
> -			*offset_fsb = got.br_startoff + got.br_blockcount;
> -			goto out_trans_cancel;
> +			xfs_trim_extent(&got, offset_fsb, end_fsb - offset_fsb);
> +			*imap = got;
> +			goto out;
>  		}
> +
> +		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
>  	} else {
> -		nimaps = 1;
> -		error = xfs_bmapi_read(ip, *offset_fsb, end_fsb - *offset_fsb,
> -				&imap, &nimaps, 0);
> +		error = xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
> +		if (error || !*shared)
> +			goto out;
> +	}
> +
> +	if (!tp) {
> +		resaligned = xfs_aligned_fsb_count(offset_fsb,
> +			end_fsb - offset_fsb, xfs_get_cowextsz_hint(ip));
> +		resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
> +
> +		xfs_iunlock(ip, *lockmode);
> +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
> +		*lockmode = XFS_ILOCK_EXCL;
> +		xfs_ilock(ip, *lockmode);
> +
>  		if (error)
> -			goto out_trans_cancel;
> -		ASSERT(nimaps == 1);
> +			return error;
>  
> -		/* Trim the mapping to the nearest shared extent boundary. */
> -		error = xfs_reflink_trim_around_shared(ip, &imap, &shared,
> -				&trimmed);
> +		error = xfs_qm_dqattach_locked(ip, 0);
>  		if (error)
> -			goto out_trans_cancel;
> +			goto out;
>  
> -		if (!shared) {
> -			*offset_fsb = imap.br_startoff + imap.br_blockcount;
> -			goto out_trans_cancel;
> -		}
> +		/* Read extent from the source file. */
> +		nimaps = 1;
> +		error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> +				imap, &nimaps, 0);
> +		if (error)
> +			goto out;
> +		ASSERT(nimaps == 1);
>  
> -		*offset_fsb = imap.br_startoff;
> -		end_fsb = imap.br_startoff + imap.br_blockcount;
> +		goto retry;
>  	}
>  
>  	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
>  			XFS_QMOPT_RES_REGBLKS);
>  	if (error)
> -		goto out_trans_cancel;
> +		goto out;
>  
>  	xfs_trans_ijoin(tp, ip, 0);
>  
> @@ -459,9 +469,9 @@ __xfs_reflink_allocate_cow(
>  	nimaps = 1;
>  
>  	/* Allocate the entire reservation as unwritten blocks. */
> -	error = xfs_bmapi_write(tp, ip, *offset_fsb, end_fsb - *offset_fsb,
> +	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
>  			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, &first_block,
> -			resblks, &imap, &nimaps, &dfops);
> +			resblks, imap, &nimaps, &dfops);
>  	if (error)
>  		goto out_bmap_cancel;
>  
> @@ -470,60 +480,15 @@ __xfs_reflink_allocate_cow(
>  	if (error)
>  		goto out_bmap_cancel;
>  
> -	error = xfs_trans_commit(tp);
> -	if (error)
> -		goto out_unlock;
> -
> -	*offset_fsb = imap.br_startoff + imap.br_blockcount;
> -
> -out_unlock:
> -	xfs_iunlock(ip, lockmode);
> -	return error;
> +	return xfs_trans_commit(tp);
>  
>  out_bmap_cancel:
>  	xfs_defer_cancel(&dfops);
>  	xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0,
>  			XFS_QMOPT_RES_REGBLKS);
> -out_trans_cancel:
> -	xfs_trans_cancel(tp);
> -	goto out_unlock;
> -}
> -
> -/* Allocate all CoW reservations covering a part of a file. */
> -int
> -xfs_reflink_allocate_cow_range(
> -	struct xfs_inode	*ip,
> -	xfs_off_t		offset,
> -	xfs_off_t		count)
> -{
> -	struct xfs_mount	*mp = ip->i_mount;
> -	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
> -	int			error;
> -
> -	ASSERT(xfs_is_reflink_inode(ip));
> -
> -	trace_xfs_reflink_allocate_cow_range(ip, offset, count);
> -
> -	/*
> -	 * Make sure that the dquots are there.
> -	 */
> -	error = xfs_qm_dqattach(ip, 0);
> -	if (error)
> -		return error;
> -
> -	while (offset_fsb < end_fsb) {
> -		error = __xfs_reflink_allocate_cow(ip, &offset_fsb, end_fsb);
> -		if (error) {
> -			trace_xfs_reflink_allocate_cow_range_error(ip, error,
> -					_RET_IP_);
> -			goto out;
> -		}
> -	}
> -
> -	/* Convert the CoW extents to regular. */
> -	error = xfs_reflink_convert_cow(ip, offset, count);

I think this is incorrect -- we create an unwritten extent in the cow
fork above (BMAP_COWFORK | BMAP_PREALLOC) but we no longer convert the
part we're actually writing to a real extent.  This means that
_reflink_cow won't actually remap anything that we've written.

I saw generic/{139,183,187,188} fail and then g/190 crashed.  There's
something incorrect going on for sure, though it's late and I won't
have time to pinpoint it tonight.

--D

>  out:
> +	if (tp)
> +		xfs_trans_cancel(tp);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 1583c47..08792a4 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -28,8 +28,9 @@ extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
>  
>  extern int xfs_reflink_reserve_cow(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *imap, bool *shared);
> -extern int xfs_reflink_allocate_cow_range(struct xfs_inode *ip,
> -		xfs_off_t offset, xfs_off_t count);
> +extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
> +		xfs_fileoff_t offset_fsb, xfs_fileoff_t end_fsb,
> +		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
>  extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
>  		xfs_off_t count);
>  extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index d3d11905..5f0a0d6 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3248,7 +3248,6 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_enospc);
>  DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
>  
>  DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
> -DEFINE_RW_EVENT(xfs_reflink_allocate_cow_range);
>  
>  DEFINE_INODE_IREC_EVENT(xfs_reflink_bounce_dio_write);
>  DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
> @@ -3258,7 +3257,6 @@ DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range);
>  DEFINE_SIMPLE_IO_EVENT(xfs_reflink_end_cow);
>  DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_remap);
>  
> -DEFINE_INODE_ERROR_EVENT(xfs_reflink_allocate_cow_range_error);
>  DEFINE_INODE_ERROR_EVENT(xfs_reflink_cancel_cow_range_error);
>  DEFINE_INODE_ERROR_EVENT(xfs_reflink_end_cow_error);
>  
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] xfs: allocate direct I/O COW blocks in iomap_begin
  2017-02-03  0:55   ` Darrick J. Wong
@ 2017-02-03 10:53     ` Christoph Hellwig
  2017-02-05 20:13       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-03 10:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Thu, Feb 02, 2017 at 04:55:23PM -0800, Darrick J. Wong wrote:
> On Wed, Feb 01, 2017 at 10:42:39PM +0100, Christoph Hellwig wrote:
> > Instead of preallocating all the required COW blocks in the high-level
> > write code do it inside the iomap code, like we do for all other I/O.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_file.c    |   8 ---
> >  fs/xfs/xfs_iomap.c   |  31 +++++------
> >  fs/xfs/xfs_reflink.c | 147 ++++++++++++++++++++-------------------------------
> >  fs/xfs/xfs_reflink.h |   5 +-
> >  fs/xfs/xfs_trace.h   |   2 -
> >  5 files changed, 71 insertions(+), 122 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 06236bf..56ac5b7 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -559,14 +559,6 @@ xfs_file_dio_aio_write(
> >  	}
> >  
> >  	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
> > -
> > -	/* If this is a block-aligned directio CoW, remap immediately. */
> > -	if (xfs_is_reflink_inode(ip) && !unaligned_io) {
> > -		ret = xfs_reflink_allocate_cow_range(ip, iocb->ki_pos, count);
> > -		if (ret)
> > -			goto out;
> > -	}
> > -
> >  	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, xfs_dio_write_end_io);
> >  out:
> >  	xfs_iunlock(ip, iolock);
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 9d811eb..de717e7 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -995,37 +995,31 @@ xfs_file_iomap_begin(
> >  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> >  	end_fsb = XFS_B_TO_FSB(mp, offset + length);
> >  
> > -	if (xfs_is_reflink_inode(ip) &&
> > -	    (flags & IOMAP_WRITE) && (flags & IOMAP_DIRECT)) {
> > -		shared = xfs_reflink_find_cow_mapping(ip, offset, &imap);
> > -		if (shared) {
> > -			xfs_iunlock(ip, lockmode);
> > -			goto alloc_done;
> > -		}
> > -		ASSERT(!isnullstartblock(imap.br_startblock));
> > -	}
> > -
> >  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> >  			       &nimaps, 0);
> >  	if (error)
> >  		goto out_unlock;
> >  
> > -	if ((flags & IOMAP_REPORT) ||
> > -	    (xfs_is_reflink_inode(ip) &&
> > -	     (flags & IOMAP_WRITE) && (flags & IOMAP_DIRECT))) {
> > +	if (flags & IOMAP_REPORT) {
> >  		/* Trim the mapping to the nearest shared extent boundary. */
> >  		error = xfs_reflink_trim_around_shared(ip, &imap, &shared,
> >  				&trimmed);
> >  		if (error)
> >  			goto out_unlock;
> > -
> > -		ASSERT((flags & IOMAP_REPORT) || !shared);
> >  	}
> >  
> >  	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
> > -		error = xfs_reflink_reserve_cow(ip, &imap, &shared);
> > -		if (error)
> > -			goto out_unlock;
> > +		if (flags & IOMAP_DIRECT) {
> > +			/* may drop and re-acquire the ilock */
> > +			error = xfs_reflink_allocate_cow(ip, offset_fsb, end_fsb,
> > +					&imap, &shared, &lockmode);
> > +			if (error)
> > +				goto out_unlock;
> > +		} else {
> > +			error = xfs_reflink_reserve_cow(ip, &imap, &shared);
> > +			if (error)
> > +				goto out_unlock;
> > +		}
> >  
> >  		end_fsb = imap.br_startoff + imap.br_blockcount;
> >  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
> > @@ -1054,7 +1048,6 @@ xfs_file_iomap_begin(
> >  		if (error)
> >  			return error;
> >  
> > -alloc_done:
> >  		iomap->flags = IOMAP_F_NEW;
> >  		trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
> >  	} else {
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 1a96741..d5a2cf2 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -384,74 +384,84 @@ xfs_reflink_convert_cow(
> >  }
> >  
> >  /* Allocate all CoW reservations covering a range of blocks in a file. */
> > -static int
> > -__xfs_reflink_allocate_cow(
> > +int
> > +xfs_reflink_allocate_cow(
> >  	struct xfs_inode	*ip,
> > -	xfs_fileoff_t		*offset_fsb,
> > -	xfs_fileoff_t		end_fsb)
> > +	xfs_fileoff_t		offset_fsb,
> > +	xfs_fileoff_t		end_fsb,
> > +	struct xfs_bmbt_irec	*imap,
> > +	bool			*shared,
> > +	uint			*lockmode)
> >  {
> >  	struct xfs_mount	*mp = ip->i_mount;
> > -	struct xfs_bmbt_irec	imap, got;
> > +	struct xfs_bmbt_irec	got;
> >  	struct xfs_defer_ops	dfops;
> > -	struct xfs_trans	*tp;
> > +	struct xfs_trans	*tp = NULL;
> >  	xfs_fsblock_t		first_block;
> > -	int			nimaps, error, lockmode;
> > -	bool			shared, trimmed;
> > +	int			nimaps, error = 0;
> > +	bool			trimmed;
> >  	xfs_filblks_t		resaligned;
> > -	xfs_extlen_t		resblks;
> > +	xfs_extlen_t		resblks = 0;
> >  	xfs_extnum_t		idx;
> >  
> > -	resaligned = xfs_aligned_fsb_count(*offset_fsb, end_fsb - *offset_fsb,
> > -			xfs_get_cowextsz_hint(ip));
> > -	resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
> > -
> > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
> > -	if (error)
> > -		return error;
> > -
> > -	lockmode = XFS_ILOCK_EXCL;
> > -	xfs_ilock(ip, lockmode);
> > +retry:
> > +	ASSERT(xfs_is_reflink_inode(ip));
> > +	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
> >  
> >  	/*
> >  	 * Even if the extent is not shared we might have a preallocation for
> >  	 * it in the COW fork.  If so use it.
> >  	 */
> > -	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, *offset_fsb, &idx, &got) &&
> > -	    got.br_startoff <= *offset_fsb) {
> > +	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &idx, &got) &&
> > +	    got.br_startoff <= offset_fsb) {
> > +		*shared = true;
> > +
> >  		/* If we have a real allocation in the COW fork we're done. */
> >  		if (!isnullstartblock(got.br_startblock)) {
> > -			xfs_trim_extent(&got, *offset_fsb,
> > -					end_fsb - *offset_fsb);
> > -			*offset_fsb = got.br_startoff + got.br_blockcount;
> > -			goto out_trans_cancel;
> > +			xfs_trim_extent(&got, offset_fsb, end_fsb - offset_fsb);
> > +			*imap = got;
> > +			goto out;
> >  		}
> > +
> > +		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> >  	} else {
> > -		nimaps = 1;
> > -		error = xfs_bmapi_read(ip, *offset_fsb, end_fsb - *offset_fsb,
> > -				&imap, &nimaps, 0);
> > +		error = xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
> > +		if (error || !*shared)
> > +			goto out;
> > +	}
> > +
> > +	if (!tp) {
> > +		resaligned = xfs_aligned_fsb_count(offset_fsb,
> > +			end_fsb - offset_fsb, xfs_get_cowextsz_hint(ip));
> > +		resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
> > +
> > +		xfs_iunlock(ip, *lockmode);
> > +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
> > +		*lockmode = XFS_ILOCK_EXCL;
> > +		xfs_ilock(ip, *lockmode);
> > +
> >  		if (error)
> > -			goto out_trans_cancel;
> > -		ASSERT(nimaps == 1);
> > +			return error;
> >  
> > -		/* Trim the mapping to the nearest shared extent boundary. */
> > -		error = xfs_reflink_trim_around_shared(ip, &imap, &shared,
> > -				&trimmed);
> > +		error = xfs_qm_dqattach_locked(ip, 0);
> >  		if (error)
> > -			goto out_trans_cancel;
> > +			goto out;
> >  
> > -		if (!shared) {
> > -			*offset_fsb = imap.br_startoff + imap.br_blockcount;
> > -			goto out_trans_cancel;
> > -		}
> > +		/* Read extent from the source file. */
> > +		nimaps = 1;
> > +		error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> > +				imap, &nimaps, 0);
> > +		if (error)
> > +			goto out;
> > +		ASSERT(nimaps == 1);
> >  
> > -		*offset_fsb = imap.br_startoff;
> > -		end_fsb = imap.br_startoff + imap.br_blockcount;
> > +		goto retry;
> >  	}
> >  
> >  	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
> >  			XFS_QMOPT_RES_REGBLKS);
> >  	if (error)
> > -		goto out_trans_cancel;
> > +		goto out;
> >  
> >  	xfs_trans_ijoin(tp, ip, 0);
> >  
> > @@ -459,9 +469,9 @@ __xfs_reflink_allocate_cow(
> >  	nimaps = 1;
> >  
> >  	/* Allocate the entire reservation as unwritten blocks. */
> > -	error = xfs_bmapi_write(tp, ip, *offset_fsb, end_fsb - *offset_fsb,
> > +	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
> >  			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, &first_block,
> > -			resblks, &imap, &nimaps, &dfops);
> > +			resblks, imap, &nimaps, &dfops);
> >  	if (error)
> >  		goto out_bmap_cancel;
> >  
> > @@ -470,60 +480,15 @@ __xfs_reflink_allocate_cow(
> >  	if (error)
> >  		goto out_bmap_cancel;
> >  
> > -	error = xfs_trans_commit(tp);
> > -	if (error)
> > -		goto out_unlock;
> > -
> > -	*offset_fsb = imap.br_startoff + imap.br_blockcount;
> > -
> > -out_unlock:
> > -	xfs_iunlock(ip, lockmode);
> > -	return error;
> > +	return xfs_trans_commit(tp);
> >  
> >  out_bmap_cancel:
> >  	xfs_defer_cancel(&dfops);
> >  	xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0,
> >  			XFS_QMOPT_RES_REGBLKS);
> > -out_trans_cancel:
> > -	xfs_trans_cancel(tp);
> > -	goto out_unlock;
> > -}
> > -
> > -/* Allocate all CoW reservations covering a part of a file. */
> > -int
> > -xfs_reflink_allocate_cow_range(
> > -	struct xfs_inode	*ip,
> > -	xfs_off_t		offset,
> > -	xfs_off_t		count)
> > -{
> > -	struct xfs_mount	*mp = ip->i_mount;
> > -	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> > -	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
> > -	int			error;
> > -
> > -	ASSERT(xfs_is_reflink_inode(ip));
> > -
> > -	trace_xfs_reflink_allocate_cow_range(ip, offset, count);
> > -
> > -	/*
> > -	 * Make sure that the dquots are there.
> > -	 */
> > -	error = xfs_qm_dqattach(ip, 0);
> > -	if (error)
> > -		return error;
> > -
> > -	while (offset_fsb < end_fsb) {
> > -		error = __xfs_reflink_allocate_cow(ip, &offset_fsb, end_fsb);
> > -		if (error) {
> > -			trace_xfs_reflink_allocate_cow_range_error(ip, error,
> > -					_RET_IP_);
> > -			goto out;
> > -		}
> > -	}
> > -
> > -	/* Convert the CoW extents to regular. */
> > -	error = xfs_reflink_convert_cow(ip, offset, count);
> 
> I think this is incorrect -- we create an unwritten extent in the cow
> fork above (BMAP_COWFORK | BMAP_PREALLOC) but we no longer convert the
> part we're actually writing to a real extent.  This means that
> _reflink_cow won't actually remap anything that we've written.

It's not correct and I messed up when rebasing - the version I tested
still had it and worked.  Sigh..

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] xfs: allocate direct I/O COW blocks in iomap_begin
  2017-02-03 10:53     ` Christoph Hellwig
@ 2017-02-05 20:13       ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-02-05 20:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Fri, Feb 03, 2017 at 11:53:59AM +0100, Christoph Hellwig wrote:
> > I think this is incorrect -- we create an unwritten extent in the cow
> > fork above (BMAP_COWFORK | BMAP_PREALLOC) but we no longer convert the
> > part we're actually writing to a real extent.  This means that
> > _reflink_cow won't actually remap anything that we've written.
> 
> It's not correct and I messed up when rebasing - the version I tested
> still had it and worked.  Sigh..

Or rather I hadn't finished rebasing and the version I had was totally
b0rked.  Proper one in progress now..

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-02-05 20:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-01 21:42 reflink direct I/O improvements V2 Christoph Hellwig
2017-02-01 21:42 ` [PATCH 1/4] xfs: reject all unaligned direct writes to reflinked files Christoph Hellwig
2017-02-02 23:01   ` Darrick J. Wong
2017-02-01 21:42 ` [PATCH 2/4] xfs: introduce xfs_aligned_fsb_count Christoph Hellwig
2017-02-02 22:56   ` Darrick J. Wong
2017-02-01 21:42 ` [PATCH 3/4] xfs: go straight to real allocations for direct I/O COW writes Christoph Hellwig
2017-02-01 21:42 ` [PATCH 4/4] xfs: allocate direct I/O COW blocks in iomap_begin Christoph Hellwig
2017-02-03  0:55   ` Darrick J. Wong
2017-02-03 10:53     ` Christoph Hellwig
2017-02-05 20:13       ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.