linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] xfs: reflink fixes
@ 2018-01-21  5:33 Darrick J. Wong
  2018-01-21  5:33 ` [PATCH 1/6] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-21  5:33 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

Running generic/232 with quotas and reflink demonstrated that there was
something wrong with the way we did quota accounting -- on an otherwise
idle system, fs-wide du block count numbers didn't match the quota
reports.  I started digging into why the quota accounting was wrong, and
the following are the results of my bug hunt.

The first patch teaches the reflink code to break layout leases before
commencing the block remapping work.  This time we avoid the "looping
trying to get a lock" that Christoph complained about, in favor of
dropping both locks and retrying if we can't cleanly break the layouts
without waiting.

The second patch changes the source file locking (if src != dest) during
a reflink operation to take the shared locks when possible.  The only
thing changing in the source file is the setting of the reflink iflag,
for which we will still take ILOCK_EXCL.  The net result of this is
less lock contention during fsstress and a 30% lower runtime, not that
anyone cares about fsstress benchmarking. :)

Patch three ensure that we attach dquots to inodes before we start
reflinking their blocks.  This could lead to quota undercharging; an
fstest to check this will be sent separately.

Patch four reorganizes the copy on write quota updating code to reflect
how the CoW fork works now.  In short, the CoW fork is entirely in
memory, so we can only use the in-memory quota reservation counters for
all CoW blocks; the accounting only becomes permanent if we remap an
extent into the data fork.

Patch five creates a separate i_cow_blocks counter to track all the CoW
blocks assigned to a file, which makes changing a file's uid/gid/prjid
easier, makes reporting cow blocks via stat easy, and enables various
cleanups.

Patch six fixes a serious potential corruption problem with the cow
extent allocation -- when we allocate into the CoW fork with the cow
extent size hint set, the allocator enlarges the allocation request to
try to hit alignment goals.  However, if the allocated extent does not
actually fulfill any of the requested range, we send a garbage
zero-length extent back to the iomap code (which also doesn't notice),
and the write lands at the startblock of the garbage extent.  The fix is
to detect that we didn't fill the entire requested range and fix up the
returned mapping so that we always fill the first block of the
requested allocation.

--D

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/6] xfs: reflink should break pnfs leases before sharing blocks
  2018-01-21  5:33 [PATCH 0/6] xfs: reflink fixes Darrick J. Wong
@ 2018-01-21  5:33 ` Darrick J. Wong
  2018-01-23 12:05   ` Brian Foster
  2018-01-21  5:34 ` [PATCH 2/6] xfs: only grab shared inode locks for source file during reflink Darrick J. Wong
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-21  5:33 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Before we share blocks between files, we need to break the pnfs leases
on the layout before we start slicing and dicing the block map.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_reflink.c |   46 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 47aea2e..ce523dd 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1245,6 +1245,48 @@ xfs_reflink_remap_blocks(
 }
 
 /*
+ * Grab the exclusive iolock for a data copy from src to dest, making
+ * sure to abide vfs locking order (lowest pointer value goes first) and
+ * breaking the pnfs layout leases on dest before proceeding.
+ */
+static int
+xfs_iolock_two_inodes_and_break_layout(
+	struct inode		*src,
+	struct inode		*dest)
+{
+	bool			src_first = src < dest;
+	bool			src_last = src > dest;
+	int			error;
+
+retry:
+	if (src_first) {
+		inode_lock(src);
+		inode_lock_nested(dest, I_MUTEX_NONDIR2);
+	} else {
+		inode_lock(dest);
+	}
+
+	error = break_layout(dest, false);
+	if (error == -EWOULDBLOCK) {
+		inode_unlock(dest);
+		if (src_first)
+			inode_unlock(src);
+		error = break_layout(dest, true);
+		if (error)
+			return error;
+		goto retry;
+	} else if (error) {
+		inode_unlock(dest);
+		if (src_first)
+			inode_unlock(src);
+		return error;
+	}
+	if (src_last)
+		inode_lock_nested(src, I_MUTEX_NONDIR2);
+	return 0;
+}
+
+/*
  * Link a range of blocks from one file to another.
  */
 int
@@ -1274,7 +1316,9 @@ xfs_reflink_remap_range(
 		return -EIO;
 
 	/* Lock both files against IO */
-	lock_two_nondirectories(inode_in, inode_out);
+	ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out);
+	if (ret)
+		return ret;
 	if (same_inode)
 		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
 	else


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/6] xfs: only grab shared inode locks for source file during reflink
  2018-01-21  5:33 [PATCH 0/6] xfs: reflink fixes Darrick J. Wong
  2018-01-21  5:33 ` [PATCH 1/6] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
@ 2018-01-21  5:34 ` Darrick J. Wong
  2018-01-23 12:05   ` Brian Foster
  2018-01-21  5:34 ` [PATCH 3/6] xfs: call xfs_qm_dqattach before performing reflink operations Darrick J. Wong
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-21  5:34 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Reflink and dedupe operations remap blocks from a source file into a
destination file.  The destination file needs exclusive locks on all
levels because we're updating its block map, but the source file isn't
undergoing any block map changes so we can use a shared lock.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_reflink.c |   50 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index ce523dd..5d1ff5a 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1202,13 +1202,16 @@ xfs_reflink_remap_blocks(
 
 	/* drange = (destoff, destoff + len); srange = (srcoff, srcoff + len) */
 	while (len) {
+		uint		lock_mode;
+
 		trace_xfs_reflink_remap_blocks_loop(src, srcoff, len,
 				dest, destoff);
+
 		/* Read extent from the source file */
 		nimaps = 1;
-		xfs_ilock(src, XFS_ILOCK_EXCL);
+		lock_mode = xfs_ilock_data_map_shared(src);
 		error = xfs_bmapi_read(src, srcoff, len, &imap, &nimaps, 0);
-		xfs_iunlock(src, XFS_ILOCK_EXCL);
+		xfs_iunlock(src, lock_mode);
 		if (error)
 			goto err;
 		ASSERT(nimaps == 1);
@@ -1260,7 +1263,7 @@ xfs_iolock_two_inodes_and_break_layout(
 
 retry:
 	if (src_first) {
-		inode_lock(src);
+		inode_lock_shared(src);
 		inode_lock_nested(dest, I_MUTEX_NONDIR2);
 	} else {
 		inode_lock(dest);
@@ -1270,7 +1273,7 @@ xfs_iolock_two_inodes_and_break_layout(
 	if (error == -EWOULDBLOCK) {
 		inode_unlock(dest);
 		if (src_first)
-			inode_unlock(src);
+			inode_unlock_shared(src);
 		error = break_layout(dest, true);
 		if (error)
 			return error;
@@ -1278,14 +1281,36 @@ xfs_iolock_two_inodes_and_break_layout(
 	} else if (error) {
 		inode_unlock(dest);
 		if (src_first)
-			inode_unlock(src);
+			inode_unlock_shared(src);
 		return error;
 	}
 	if (src_last)
-		inode_lock_nested(src, I_MUTEX_NONDIR2);
+		down_read_nested(&src->i_rwsem, I_MUTEX_NONDIR2);
 	return 0;
 }
 
+static void
+xfs_reflink_mmaplock_two(
+	struct xfs_inode	*src,
+	struct xfs_inode	*dest)
+{
+	int			i = 0;
+
+	if (src->i_ino == dest->i_ino) {
+		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
+		return;
+	}
+
+	if (src->i_ino < dest->i_ino) {
+		xfs_ilock(src, XFS_MMAPLOCK_SHARED);
+		i++;
+	}
+	xfs_ilock(dest, XFS_MMAPLOCK_EXCL + (i << XFS_MMAPLOCK_SHIFT));
+	i++;
+	if (src->i_ino > dest->i_ino)
+		xfs_ilock(src, XFS_MMAPLOCK_SHARED + (i << XFS_MMAPLOCK_SHIFT));
+}
+
 /*
  * Link a range of blocks from one file to another.
  */
@@ -1319,10 +1344,7 @@ xfs_reflink_remap_range(
 	ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out);
 	if (ret)
 		return ret;
-	if (same_inode)
-		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
-	else
-		xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
+	xfs_reflink_mmaplock_two(src, dest);
 
 	/* Check file eligibility and prepare for block sharing. */
 	ret = -EINVAL;
@@ -1385,10 +1407,12 @@ xfs_reflink_remap_range(
 			is_dedupe);
 
 out_unlock:
-	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
+	xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
+	if (!same_inode)
+		xfs_iunlock(src, XFS_MMAPLOCK_SHARED);
+	inode_unlock(inode_out);
 	if (!same_inode)
-		xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
-	unlock_two_nondirectories(inode_in, inode_out);
+		inode_unlock_shared(inode_in);
 	if (ret)
 		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
 	return ret;


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/6] xfs: call xfs_qm_dqattach before performing reflink operations
  2018-01-21  5:33 [PATCH 0/6] xfs: reflink fixes Darrick J. Wong
  2018-01-21  5:33 ` [PATCH 1/6] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
  2018-01-21  5:34 ` [PATCH 2/6] xfs: only grab shared inode locks for source file during reflink Darrick J. Wong
@ 2018-01-21  5:34 ` Darrick J. Wong
  2018-01-23 12:05   ` Brian Foster
  2018-01-21  5:34 ` [PATCH 4/6] xfs: CoW fork operations should only update quota reservations Darrick J. Wong
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-21  5:34 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Ensure that we've attached all the necessary dquots before performing
reflink operations so that quota accounting is accurate.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_reflink.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)


diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 5d1ff5a..947d0637 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -54,6 +54,7 @@
 #include "xfs_rmap_btree.h"
 #include "xfs_sb.h"
 #include "xfs_ag_resv.h"
+#include "xfs_qm.h"
 
 /*
  * Copy on Write of Shared Blocks
@@ -282,6 +283,10 @@ xfs_reflink_reserve_cow(
 	 * tree.
 	 */
 
+	error = xfs_qm_dqattach_locked(ip, 0);
+	if (error)
+		return error;
+
 	if (!xfs_iext_lookup_extent(ip, ifp, imap->br_startoff, &icur, &got))
 		eof = true;
 	if (!eof && got.br_startoff <= imap->br_startoff) {
@@ -396,6 +401,10 @@ xfs_reflink_allocate_cow(
 	ASSERT(xfs_is_reflink_inode(ip));
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
 
+	error = xfs_qm_dqattach_locked(ip, 0);
+	if (error)
+		return error;
+
 	/*
 	 * Even if the extent is not shared we might have a preallocation for
 	 * it in the COW fork.  If so use it.
@@ -1356,6 +1365,14 @@ xfs_reflink_remap_range(
 	if (IS_DAX(inode_in) || IS_DAX(inode_out))
 		goto out_unlock;
 
+	/* Attach dquots to both inodes */
+	ret = xfs_qm_dqattach(src, 0);
+	if (ret)
+		goto out_unlock;
+	ret = xfs_qm_dqattach(dest, 0);
+	if (ret)
+		goto out_unlock;
+
 	ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out,
 			&len, is_dedupe);
 	if (ret <= 0)


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/6] xfs: CoW fork operations should only update quota reservations
  2018-01-21  5:33 [PATCH 0/6] xfs: reflink fixes Darrick J. Wong
                   ` (2 preceding siblings ...)
  2018-01-21  5:34 ` [PATCH 3/6] xfs: call xfs_qm_dqattach before performing reflink operations Darrick J. Wong
@ 2018-01-21  5:34 ` Darrick J. Wong
  2018-01-21  5:34 ` [PATCH 5/6] xfs: track CoW blocks separately in the inode Darrick J. Wong
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-21  5:34 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Since the CoW fork only exists in memory, it is incorrect to update the
on-disk quota block counts when we modify the CoW fork.  Unlike the data
fork, even real extents in the CoW fork are only reservations (on-disk
they're owned by the refcountbt) so they must not be tracked in the on
disk quota info.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |  203 ++++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_reflink.c     |    8 +-
 2 files changed, 196 insertions(+), 15 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 068e8fb..8df2df5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -52,6 +52,145 @@
 #include "xfs_refcount.h"
 #include "xfs_icache.h"
 
+/*
+ * Data/Attr Fork Mapping Lifecycle
+ *
+ * The data fork contains the block mappings between logical blocks in a file
+ * and physical blocks on the disk.  The XFS notions of delayed allocation
+ * reservations, unwritten extents, and real extents follow well known
+ * conventions in the filesystem world.
+ *
+ * As a side note, the attribute fork does the same for extended attribute
+ * blocks, though the logical block offsets are not available to userspace and
+ * the only valid states are HOLE and REAL.
+ *
+ * Metadata involved outside of the block mapping itself are as follows:
+ *
+ * - i_delayed_blks: Number of blocks that are reserved for delayed allocation.
+ * - i_cow_blocks: Number of blocks reserved for copy on write staging.
+ *
+ * - di_nblocks: Number of blocks (on-disk) assigned to the inode.
+ *
+ * - d_bcount: Number of quota blocks accounted for by on-disk metadata.
+ * - q_res_bcount: Number of quota blocks reserved in-core for future writes +
+ *           blocks mentioned by on-disk metadata.
+ *
+ * - qt_blk_res: Number of quota blocks reserved in-core for this transaction.
+ *           Unused reservation is given back to q_res_bcount on commit.
+ * - qt_bcount: Number of quota blocks used by this transaction from
+ *           qt_blk_res.  d_bcount is increased by this on commit.
+ * - qt_delbcount: Number of quota blocks used by this transaction from
+ *           q_res_bcount but not q_res_bcount.  d_bcount is increased by this
+ *           on commit.
+ *
+ * - sb_fdblocks: Number of free blocks recorded in the superblock on disk.
+ * - fdblocks: Number of free blocks recorded in the superblock minus any
+ *           in-core reservations made in anticipation of future writes.
+ *
+ * - t_blk_res: Number of blocks reserved out of fdblocks for a transaction.
+ *           When the transaction commits, t_blk_res - t_blk_res_used is given
+ *           back to fdblocks.
+ * - t_blk_res_used: Number of blocks used by this transaction that were
+ *           reserved for this transaction.
+ * - t_fdblocks_del: Number of blocks by which fdblocks and sb_fdblocks will
+ *           have to decrease at commit.
+ * - t_res_fdblocks_delta: Number of blocks by which sb_fdblocks will have to
+ *           decrease at commit.  We assume that fdblocks was decreased
+ *           prior to the transaction.
+ *
+ * Data fork block mappings have four logical states:
+ *
+ *    +--------> UNWRITTEN <------+
+ *    |              ^            |
+ *    |              v            v
+ * DELALLOC <----> HOLE <------> REAL
+ *    |                           ^
+ *    |                           |
+ *    +---------------------------+
+ *
+ * The state transitions and required metadata updates are as follows:
+ *
+ * - HOLE to DELALLOC: Increase i_delayed_blks and q_res_bcount, and decrease
+ *           fdblocks.
+ * - HOLE to REAL: Increase di_nblocks and qt_bcount, and decrease fdblocks.
+ * - HOLE to UNWRITTEN: Same as above.
+ *
+ * - DELALLOC to UNWRITTEN: Increase di_nblocks and qt_delbcount, and decrease
+ *           i_delayed_blks.
+ * - DELALLOC to REAL: Same as above.
+ * - DELALLOC to HOLE: Increase fdblocks, and decrease i_delayed_blks and
+ *           q_res_bcount.
+ *
+ * - UNWRITTEN to HOLE: Decrease di_nblocks and q_bcount, and increase fdblocks.
+ * - UNWRITTEN to REAL: No change.
+ *
+ * - REAL to UNWRITTEN: No change.
+ * - REAL to HOLE: Decrease di_nblocks and q_bcount, and increase fdblocks.
+ *
+ * Note in particular that delalloc reservations have "transaction-less"
+ * quota reservations via q_res_bcount.  If the reservation is allocated,
+ * qt_delbcount is used to increment d_bcount without touching q_res_bcount.
+ * Filling a hole with an allocated extent, by contrast, uses qt_blk_res
+ * to make a reservation in q_res_bcount, qt_bcount to record the number
+ * of allocated blocks; at commit qt_bcount is added to d_bcount and
+ * qt_blk_res - qt_bcount is added back to q_res_bcount.
+ *
+ * Copy on Write Fork Mapping Lifecycle
+ *
+ * The CoW fork handles things differently from the data fork because its
+ * mappings only exist in memory-- the refcount btree is the on-disk owner of
+ * the extents until they're remapped into the data fork.  Therefore,
+ * unwritten and real extents in the CoW fork are treated the same way as
+ * delayed allocation extents.  Quota and fdblock changes only exist in
+ * memory, which requires some twists in the bmap functions.
+ *
+ * The CoW fork extent state diagram looks like this:
+ *
+ *    +--------> UNWRITTEN -------+
+ *    |              ^            |
+ *    |              v            v
+ * DELALLOC <----> HOLE <------- REAL
+ *
+ * Holes are still holes.  Delayed allocation extents reserve blocks for
+ * landing future writes, just like they do in the data fork.  However, unlike
+ * the data fork, unwritten extents signal an extent that has been allocated
+ * but is not currently undergoing writeback.  Real extents are undergoing
+ * writeback, and when that writeback finishes the corresponding data fork
+ * extent will be punched out and the CoW fork counterpart moved to the new
+ * hole in the data fork.
+ *
+ * The state transitions and required metadata updates are as follows:
+ *
+ * - HOLE to DELALLOC: Increase i_cow_blocks and q_res_bcount, and decrease
+ *           fdblocks.
+ * - HOLE to UNWRITTEN: Same as above, but since we reserved quota via
+ *           qt_blk_res (which increased q_res_bcount) when we allocate the
+ *           extent we have to decrease qt_blk_res so that the commit doesn't
+ *           give the allocated CoW blocks back.
+ *
+ * - DELALLOC to UNWRITTEN: No change.
+ * - DELALLOC to HOLE: Decrease i_cow_blocks and q_res_bcount, and increase
+ *           fdblocks.
+ *
+ * - UNWRITTEN to HOLE: Same as DELALLOC to HOLE.
+ * - UNWRITTEN to REAL: No change.
+ *
+ * - REAL to HOLE: This transition happens when we've finished a write
+ *           operation and need to move the mapping to the data fork.  We
+ *           punch the correspond data fork mappings, which decreases
+ *           qt_bcount.  Then we map the CoW fork mapping into the hole we
+ *           just cleared out of the data fork, which increases qt_bcount.
+ *           There's a subtlety here -- if we promoted a write over a hole to
+ *           CoW, there will be a net increase in qt_bcount, which is fine
+ *           because we already reserved the quota when we filled the CoW
+ *           fork.  Finally, we punch the CoW fork mapping, which decreases
+ *           q_res_bcount.
+ *
+ * Notice how all CoW fork extents use transactionless quota reservations and
+ * the in-core fdblocks to maintain state, and we avoid updating any on-disk
+ * metadata.  This is essential to maintain metadata correctness if the system
+ * goes down.
+ */
 
 kmem_zone_t		*xfs_bmap_free_item_zone;
 
@@ -3337,6 +3476,39 @@ xfs_bmap_btalloc_filestreams(
 	return 0;
 }
 
+/* Deal with CoW fork accounting when we allocate a block. */
+static void
+xfs_bmap_btalloc_cow(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args)
+{
+	/* Filling a previously reserved extent; nothing to do here. */
+	if (ap->wasdel)
+		return;
+
+	/*
+	 * The CoW fork only exists in memory, so the on-disk quota accounting
+	 * must not incude any CoW fork extents.  Therefore, CoW blocks are
+	 * only tracked in the in-core dquot block count (q_res_bcount).
+	 *
+	 * If we get here, we're filling a CoW hole with a real (non-delalloc)
+	 * CoW extent having reserved enough blocks from both q_res_bcount and
+	 * qt_blk_res to guarantee that we won't run out of space.  The unused
+	 * qt_blk_res is given back to q_res_bcount when the transaction
+	 * commits.
+	 *
+	 * We don't want the quota accounting for our newly allocated blocks
+	 * to be given back, so we must decrease qt_blk_res without decreasing
+	 * q_res_bcount.
+	 *
+	 * Note: If we're allocating a delalloc extent, we already reserved
+	 * the q_res_bcount blocks, so no quota accounting update is needed
+	 * here.
+	 */
+	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
+			-(long)args->len);
+}
+
 STATIC int
 xfs_bmap_btalloc(
 	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
@@ -3571,19 +3743,22 @@ xfs_bmap_btalloc(
 			*ap->firstblock = args.fsbno;
 		ASSERT(nullfb || fb_agno <= args.agno);
 		ap->length = args.len;
-		if (!(ap->flags & XFS_BMAPI_COWFORK))
-			ap->ip->i_d.di_nblocks += args.len;
-		xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
 		if (ap->wasdel)
 			ap->ip->i_delayed_blks -= args.len;
-		/*
-		 * Adjust the disk quota also. This was reserved
-		 * earlier.
-		 */
-		xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
-			ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT :
-					XFS_TRANS_DQ_BCOUNT,
-			(long) args.len);
+		if (ap->flags & XFS_BMAPI_COWFORK) {
+			xfs_bmap_btalloc_cow(ap, &args);
+		} else {
+			ap->ip->i_d.di_nblocks += args.len;
+			xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
+			/*
+			 * Adjust the disk quota also. This was reserved
+			 * earlier.
+			 */
+			xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
+				ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT :
+						XFS_TRANS_DQ_BCOUNT,
+				(long) args.len);
+		}
 	} else {
 		ap->blkno = NULLFSBLOCK;
 		ap->length = 0;
@@ -4776,6 +4951,7 @@ xfs_bmap_del_extent_cow(
 	struct xfs_bmbt_irec	new;
 	xfs_fileoff_t		del_endoff, got_endoff;
 	int			state = BMAP_COWFORK;
+	int			error;
 
 	XFS_STATS_INC(mp, xs_del_exlist);
 
@@ -4832,6 +5008,11 @@ xfs_bmap_del_extent_cow(
 		xfs_iext_insert(ip, icur, &new, state);
 		break;
 	}
+
+	/* Remove the quota reservation */
+	error = xfs_trans_reserve_quota_nblks(NULL, ip,
+			-(long)del->br_blockcount, 0, XFS_QMOPT_RES_REGBLKS);
+	ASSERT(error == 0);
 }
 
 /*
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 947d0637..7bd7873 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -608,10 +608,6 @@ xfs_reflink_cancel_cow_blocks(
 					del.br_startblock, del.br_blockcount,
 					NULL);
 
-			/* Update quota accounting */
-			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
-					-(long)del.br_blockcount);
-
 			/* Roll the transaction */
 			xfs_defer_ijoin(&dfops, ip);
 			error = xfs_defer_finish(tpp, &dfops);
@@ -804,6 +800,10 @@ xfs_reflink_end_cow(
 		if (error)
 			goto out_defer;
 
+		/* Charge this new data fork mapping to the on-disk quota. */
+		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT,
+				(long)del.br_blockcount);
+
 		/* Remove the mapping from the CoW fork. */
 		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
 


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/6] xfs: track CoW blocks separately in the inode
  2018-01-21  5:33 [PATCH 0/6] xfs: reflink fixes Darrick J. Wong
                   ` (3 preceding siblings ...)
  2018-01-21  5:34 ` [PATCH 4/6] xfs: CoW fork operations should only update quota reservations Darrick J. Wong
@ 2018-01-21  5:34 ` Darrick J. Wong
  2018-01-21  5:34 ` [PATCH 6/6] xfs: fix up cowextsz allocation shortfalls Darrick J. Wong
  2018-01-22 23:25 ` [PATCH 0/6] xfs: reflink fixes Darrick J. Wong
  6 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-21  5:34 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Track the number of blocks reserved in the CoW fork so that we can
move the quota reservations whenever we chown, and don't account for
CoW fork delalloc reservations in i_delayed_blks.  This should make
chown work properly for quota reservations, enables us to fully
account for real extents in the cow fork in the file stat info, and
improves the post-eof scanning decisions because we're no longer
confusing data fork delalloc extents with cow fork delalloc extents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c      |   16 ++++++++++++----
 fs/xfs/libxfs/xfs_inode_buf.c |    1 +
 fs/xfs/xfs_bmap_util.c        |    5 +++++
 fs/xfs/xfs_icache.c           |    3 ++-
 fs/xfs/xfs_inode.c            |   11 +++++------
 fs/xfs/xfs_inode.h            |    1 +
 fs/xfs/xfs_iops.c             |    3 ++-
 fs/xfs/xfs_itable.c           |    3 ++-
 fs/xfs/xfs_qm.c               |    2 +-
 fs/xfs/xfs_reflink.c          |    4 ++--
 fs/xfs/xfs_super.c            |    1 +
 11 files changed, 34 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 8df2df5..ad0dc56 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3505,6 +3505,7 @@ xfs_bmap_btalloc_cow(
 	 * the q_res_bcount blocks, so no quota accounting update is needed
 	 * here.
 	 */
+	ap->ip->i_cow_blocks += args->len;
 	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
 			-(long)args->len);
 }
@@ -3743,13 +3744,13 @@ xfs_bmap_btalloc(
 			*ap->firstblock = args.fsbno;
 		ASSERT(nullfb || fb_agno <= args.agno);
 		ap->length = args.len;
-		if (ap->wasdel)
-			ap->ip->i_delayed_blks -= args.len;
 		if (ap->flags & XFS_BMAPI_COWFORK) {
 			xfs_bmap_btalloc_cow(ap, &args);
 		} else {
 			ap->ip->i_d.di_nblocks += args.len;
 			xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
+			if (ap->wasdel)
+				ap->ip->i_delayed_blks -= args.len;
 			/*
 			 * Adjust the disk quota also. This was reserved
 			 * earlier.
@@ -4116,7 +4117,10 @@ xfs_bmapi_reserve_delalloc(
 		goto out_unreserve_blocks;
 
 
-	ip->i_delayed_blks += alen;
+	if (whichfork == XFS_COW_FORK)
+		ip->i_cow_blocks += alen;
+	else
+		ip->i_delayed_blks += alen;
 
 	got->br_startoff = aoff;
 	got->br_startblock = nullstartblock(indlen);
@@ -4859,7 +4863,10 @@ xfs_bmap_del_extent_delay(
 			isrt ? XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS);
 	if (error)
 		return error;
-	ip->i_delayed_blks -= del->br_blockcount;
+	if (whichfork == XFS_COW_FORK)
+		ip->i_cow_blocks -= del->br_blockcount;
+	else
+		ip->i_delayed_blks -= del->br_blockcount;
 
 	if (got->br_startoff == del->br_startoff)
 		state |= BMAP_LEFT_FILLING;
@@ -5010,6 +5017,7 @@ xfs_bmap_del_extent_cow(
 	}
 
 	/* Remove the quota reservation */
+	ip->i_cow_blocks -= del->br_blockcount;
 	error = xfs_trans_reserve_quota_nblks(NULL, ip,
 			-(long)del->br_blockcount, 0, XFS_QMOPT_RES_REGBLKS);
 	ASSERT(error == 0);
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index ae1f58e..e43630b 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -624,6 +624,7 @@ xfs_iread(
 
 	ASSERT(ip->i_d.di_version >= 2);
 	ip->i_delayed_blks = 0;
+	ip->i_cow_blocks = 0;
 
 	/*
 	 * Mark the buffer containing the inode as something to keep
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6d37ab4..c572789 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1991,6 +1991,7 @@ xfs_swap_extents(
 	/* Swap the cow forks. */
 	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
 		xfs_extnum_t	extnum;
+		unsigned int	cowblocks;
 
 		ASSERT(ip->i_cformat == XFS_DINODE_FMT_EXTENTS);
 		ASSERT(tip->i_cformat == XFS_DINODE_FMT_EXTENTS);
@@ -2011,6 +2012,10 @@ xfs_swap_extents(
 			xfs_inode_set_cowblocks_tag(tip);
 		else
 			xfs_inode_clear_cowblocks_tag(tip);
+
+		cowblocks = tip->i_cow_blocks;
+		tip->i_cow_blocks = ip->i_cow_blocks;
+		ip->i_cow_blocks = cowblocks;
 	}
 
 	xfs_trans_log_inode(tp, ip,  src_log_flags);
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 2da7a2e..1344206 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -80,6 +80,7 @@ xfs_inode_alloc(
 	memset(&ip->i_df, 0, sizeof(xfs_ifork_t));
 	ip->i_flags = 0;
 	ip->i_delayed_blks = 0;
+	ip->i_cow_blocks = 0;
 	memset(&ip->i_d, 0, sizeof(ip->i_d));
 
 	return ip;
@@ -1668,7 +1669,7 @@ xfs_prep_free_cowblocks(
 	 * Just clear the tag if we have an empty cow fork or none at all. It's
 	 * possible the inode was fully unshared since it was originally tagged.
 	 */
-	if (!xfs_is_reflink_inode(ip) || !ifp->if_bytes) {
+	if (!xfs_is_reflink_inode(ip) || ip->i_cow_blocks == 0) {
 		trace_xfs_inode_free_cowblocks_invalid(ip);
 		xfs_inode_clear_cowblocks_tag(ip);
 		return false;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b36c57a..c432f3a 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1493,15 +1493,13 @@ xfs_itruncate_clear_reflink_flags(
 	struct xfs_inode	*ip)
 {
 	struct xfs_ifork	*dfork;
-	struct xfs_ifork	*cfork;
 
 	if (!xfs_is_reflink_inode(ip))
 		return;
 	dfork = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
-	cfork = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	if (dfork->if_bytes == 0 && cfork->if_bytes == 0)
+	if (dfork->if_bytes == 0 && ip->i_cow_blocks == 0)
 		ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
-	if (cfork->if_bytes == 0)
+	if (ip->i_cow_blocks == 0)
 		xfs_inode_clear_cowblocks_tag(ip);
 }
 
@@ -1654,7 +1652,7 @@ xfs_release(
 		truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
 		if (truncated) {
 			xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
-			if (ip->i_delayed_blks > 0) {
+			if (ip->i_delayed_blks > 0 || ip->i_cow_blocks > 0) {
 				error = filemap_flush(VFS_I(ip)->i_mapping);
 				if (error)
 					return error;
@@ -1894,7 +1892,8 @@ xfs_inactive(
 
 	if (S_ISREG(VFS_I(ip)->i_mode) &&
 	    (ip->i_d.di_size != 0 || XFS_ISIZE(ip) != 0 ||
-	     ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0))
+	     ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0 ||
+	     ip->i_cow_blocks > 0))
 		truncate = 1;
 
 	error = xfs_qm_dqattach(ip, 0);
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 386b0bb..d53705a 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -62,6 +62,7 @@ typedef struct xfs_inode {
 	/* Miscellaneous state. */
 	unsigned long		i_flags;	/* see defined flags below */
 	unsigned int		i_delayed_blks;	/* count of delay alloc blks */
+	unsigned int		i_cow_blocks;	/* count of cow fork blocks */
 
 	struct xfs_icdinode	i_d;		/* most of ondisk inode */
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 56475fc..6c3381c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -513,7 +513,8 @@ xfs_vn_getattr(
 	stat->mtime = inode->i_mtime;
 	stat->ctime = inode->i_ctime;
 	stat->blocks =
-		XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
+		XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks +
+				  ip->i_cow_blocks);
 
 	if (ip->i_d.di_version == 3) {
 		if (request_mask & STATX_BTIME) {
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index cf0bab0..e71d2f3 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -122,7 +122,8 @@ xfs_bulkstat_one_int(
 	case XFS_DINODE_FMT_BTREE:
 		buf->bs_rdev = 0;
 		buf->bs_blksize = mp->m_sb.sb_blocksize;
-		buf->bs_blocks = dic->di_nblocks + ip->i_delayed_blks;
+		buf->bs_blocks = dic->di_nblocks + ip->i_delayed_blks +
+				 ip->i_cow_blocks;
 		break;
 	}
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 923e7f9..3e53269 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1847,7 +1847,7 @@ xfs_qm_vop_chown_reserve(
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
 	ASSERT(XFS_IS_QUOTA_RUNNING(mp));
 
-	delblks = ip->i_delayed_blks;
+	delblks = ip->i_delayed_blks + ip->i_cow_blocks;
 	blkflags = XFS_IS_REALTIME_INODE(ip) ?
 			XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS;
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 7bd7873..3c2fd63 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -628,7 +628,7 @@ xfs_reflink_cancel_cow_blocks(
 	}
 
 	/* clear tag if cow fork is emptied */
-	if (!ifp->if_bytes)
+	if (ip->i_cow_blocks == 0)
 		xfs_inode_clear_cowblocks_tag(ip);
 
 	return error;
@@ -713,7 +713,7 @@ xfs_reflink_end_cow(
 	trace_xfs_reflink_end_cow(ip, offset, count);
 
 	/* No COW extents?  That's easy! */
-	if (ifp->if_bytes == 0)
+	if (ip->i_cow_blocks == 0)
 		return 0;
 
 	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f3e0001..9d04cfb 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -989,6 +989,7 @@ xfs_fs_destroy_inode(
 	xfs_inactive(ip);
 
 	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0);
+	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_cow_blocks == 0);
 	XFS_STATS_INC(ip->i_mount, vn_reclaim);
 
 	/*


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/6] xfs: fix up cowextsz allocation shortfalls
  2018-01-21  5:33 [PATCH 0/6] xfs: reflink fixes Darrick J. Wong
                   ` (4 preceding siblings ...)
  2018-01-21  5:34 ` [PATCH 5/6] xfs: track CoW blocks separately in the inode Darrick J. Wong
@ 2018-01-21  5:34 ` Darrick J. Wong
  2018-01-22 23:25 ` [PATCH 0/6] xfs: reflink fixes Darrick J. Wong
  6 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-21  5:34 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

In xfs_bmap_btalloc, we try using the CoW extent size hint to force
allocations to align (offset-wise) to cowextsz granularity to reduce CoW
fragmentation.  This works fine until we cannot satisfy the allocation
with enough blocks to cover the requested range and the alignment hints.
If this happens, return an unaligned region because if we don't the
extent trim functions cause us to return a zero-length extent to iomap,
which iomap doesn't catch and thus blows up.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap.c               |    2 +-
 fs/xfs/libxfs/xfs_bmap.c |   21 +++++++++++++++++++--
 2 files changed, 20 insertions(+), 3 deletions(-)


diff --git a/fs/iomap.c b/fs/iomap.c
index e5de772..aec35a0 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -63,7 +63,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
 	if (ret)
 		return ret;
-	if (WARN_ON(iomap.offset > pos))
+	if (WARN_ON(iomap.offset > pos) || WARN_ON(iomap.length == 0))
 		return -EIO;
 
 	/*
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index ad0dc56..e64a499 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3480,8 +3480,20 @@ xfs_bmap_btalloc_filestreams(
 static void
 xfs_bmap_btalloc_cow(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args)
+	struct xfs_alloc_arg	*args,
+	xfs_fileoff_t		orig_offset,
+	xfs_extlen_t		orig_length)
 {
+	/*
+	 * If we didn't get enough blocks to satisfy the cowextsize
+	 * aligned request, break the alignment and return whatever we
+	 * got; it's the best we can do.
+	 */
+	if (ap->length <= orig_length)
+		ap->offset = orig_offset;
+	else if (ap->offset + ap->length < orig_offset + orig_length)
+		ap->offset = orig_offset + orig_length - ap->length;
+
 	/* Filling a previously reserved extent; nothing to do here. */
 	if (ap->wasdel)
 		return;
@@ -3520,6 +3532,8 @@ xfs_bmap_btalloc(
 	xfs_agnumber_t	fb_agno;	/* ag number of ap->firstblock */
 	xfs_agnumber_t	ag;
 	xfs_alloc_arg_t	args;
+	xfs_fileoff_t	orig_offset;
+	xfs_extlen_t	orig_length;
 	xfs_extlen_t	blen;
 	xfs_extlen_t	nextminlen = 0;
 	int		nullfb;		/* true if ap->firstblock isn't set */
@@ -3529,6 +3543,8 @@ xfs_bmap_btalloc(
 	int		stripe_align;
 
 	ASSERT(ap->length);
+	orig_offset = ap->offset;
+	orig_length = ap->length;
 
 	mp = ap->ip->i_mount;
 
@@ -3745,7 +3761,8 @@ xfs_bmap_btalloc(
 		ASSERT(nullfb || fb_agno <= args.agno);
 		ap->length = args.len;
 		if (ap->flags & XFS_BMAPI_COWFORK) {
-			xfs_bmap_btalloc_cow(ap, &args);
+			xfs_bmap_btalloc_cow(ap, &args, orig_offset,
+					orig_length);
 		} else {
 			ap->ip->i_d.di_nblocks += args.len;
 			xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/6] xfs: reflink fixes
  2018-01-21  5:33 [PATCH 0/6] xfs: reflink fixes Darrick J. Wong
                   ` (5 preceding siblings ...)
  2018-01-21  5:34 ` [PATCH 6/6] xfs: fix up cowextsz allocation shortfalls Darrick J. Wong
@ 2018-01-22 23:25 ` Darrick J. Wong
  6 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-22 23:25 UTC (permalink / raw)
  To: linux-xfs

On Sat, Jan 20, 2018 at 09:33:49PM -0800, Darrick J. Wong wrote:
> Hi all,
> 
> Running generic/232 with quotas and reflink demonstrated that there was
> something wrong with the way we did quota accounting -- on an otherwise
> idle system, fs-wide du block count numbers didn't match the quota
> reports.  I started digging into why the quota accounting was wrong, and
> the following are the results of my bug hunt.

Well, I wasn't expecting 4.15 to be delayed again, but I guess it has.
To disambiguate: this series is intended for 4.16, even though I already
sent out the AGFL series tagged for 4.17.

--D

> The first patch teaches the reflink code to break layout leases before
> commencing the block remapping work.  This time we avoid the "looping
> trying to get a lock" that Christoph complained about, in favor of
> dropping both locks and retrying if we can't cleanly break the layouts
> without waiting.
> 
> The second patch changes the source file locking (if src != dest) during
> a reflink operation to take the shared locks when possible.  The only
> thing changing in the source file is the setting of the reflink iflag,
> for which we will still take ILOCK_EXCL.  The net result of this is
> less lock contention during fsstress and a 30% lower runtime, not that
> anyone cares about fsstress benchmarking. :)
> 
> Patch three ensure that we attach dquots to inodes before we start
> reflinking their blocks.  This could lead to quota undercharging; an
> fstest to check this will be sent separately.
> 
> Patch four reorganizes the copy on write quota updating code to reflect
> how the CoW fork works now.  In short, the CoW fork is entirely in
> memory, so we can only use the in-memory quota reservation counters for
> all CoW blocks; the accounting only becomes permanent if we remap an
> extent into the data fork.
> 
> Patch five creates a separate i_cow_blocks counter to track all the CoW
> blocks assigned to a file, which makes changing a file's uid/gid/prjid
> easier, makes reporting cow blocks via stat easy, and enables various
> cleanups.
> 
> Patch six fixes a serious potential corruption problem with the cow
> extent allocation -- when we allocate into the CoW fork with the cow
> extent size hint set, the allocator enlarges the allocation request to
> try to hit alignment goals.  However, if the allocated extent does not
> actually fulfill any of the requested range, we send a garbage
> zero-length extent back to the iomap code (which also doesn't notice),
> and the write lands at the startblock of the garbage extent.  The fix is
> to detect that we didn't fill the entire requested range and fix up the
> returned mapping so that we always fill the first block of the
> requested allocation.
> 
> --D
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/6] xfs: reflink should break pnfs leases before sharing blocks
  2018-01-21  5:33 ` [PATCH 1/6] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
@ 2018-01-23 12:05   ` Brian Foster
  2018-01-23 18:19     ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2018-01-23 12:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sat, Jan 20, 2018 at 09:33:55PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Before we share blocks between files, we need to break the pnfs leases
> on the layout before we start slicing and dicing the block map.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_reflink.c |   46 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 45 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 47aea2e..ce523dd 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1245,6 +1245,48 @@ xfs_reflink_remap_blocks(
>  }
>  
>  /*
> + * Grab the exclusive iolock for a data copy from src to dest, making
> + * sure to abide vfs locking order (lowest pointer value goes first) and
> + * breaking the pnfs layout leases on dest before proceeding.
> + */
> +static int
> +xfs_iolock_two_inodes_and_break_layout(
> +	struct inode		*src,
> +	struct inode		*dest)
> +{
> +	bool			src_first = src < dest;
> +	bool			src_last = src > dest;
> +	int			error;
> +
> +retry:
> +	if (src_first) {
> +		inode_lock(src);
> +		inode_lock_nested(dest, I_MUTEX_NONDIR2);
> +	} else {
> +		inode_lock(dest);
> +	}
> +
> +	error = break_layout(dest, false);
> +	if (error == -EWOULDBLOCK) {
> +		inode_unlock(dest);
> +		if (src_first)
> +			inode_unlock(src);
> +		error = break_layout(dest, true);
> +		if (error)
> +			return error;
> +		goto retry;
> +	} else if (error) {
> +		inode_unlock(dest);
> +		if (src_first)
> +			inode_unlock(src);
> +		return error;
> +	}
> +	if (src_last)
> +		inode_lock_nested(src, I_MUTEX_NONDIR2);
> +	return 0;
> +}
> +

I'm not really familiar with the lease code so it's not clear to me, for
example, why the loop is necessary here and whatnot. But it seems the
above could be factored into a slightly more generic helper patterned
after xfs_break_layouts(). For example, create an
xfs_break_two_nondir_layouts() in xfs_pnfs.c with the exact same
semantics/structure as the former, just replace the iolock calls with
the lock/unlock_two_nondir() calls..?

Brian

> +/*
>   * Link a range of blocks from one file to another.
>   */
>  int
> @@ -1274,7 +1316,9 @@ xfs_reflink_remap_range(
>  		return -EIO;
>  
>  	/* Lock both files against IO */
> -	lock_two_nondirectories(inode_in, inode_out);
> +	ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out);
> +	if (ret)
> +		return ret;
>  	if (same_inode)
>  		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
>  	else
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] xfs: only grab shared inode locks for source file during reflink
  2018-01-21  5:34 ` [PATCH 2/6] xfs: only grab shared inode locks for source file during reflink Darrick J. Wong
@ 2018-01-23 12:05   ` Brian Foster
  2018-01-23 18:23     ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2018-01-23 12:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sat, Jan 20, 2018 at 09:34:03PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Reflink and dedupe operations remap blocks from a source file into a
> destination file.  The destination file needs exclusive locks on all
> levels because we're updating its block map, but the source file isn't
> undergoing any block map changes so we can use a shared lock.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_reflink.c |   50 +++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 37 insertions(+), 13 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index ce523dd..5d1ff5a 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1202,13 +1202,16 @@ xfs_reflink_remap_blocks(
>  
>  	/* drange = (destoff, destoff + len); srange = (srcoff, srcoff + len) */
>  	while (len) {
> +		uint		lock_mode;
> +
>  		trace_xfs_reflink_remap_blocks_loop(src, srcoff, len,
>  				dest, destoff);
> +
>  		/* Read extent from the source file */
>  		nimaps = 1;
> -		xfs_ilock(src, XFS_ILOCK_EXCL);
> +		lock_mode = xfs_ilock_data_map_shared(src);
>  		error = xfs_bmapi_read(src, srcoff, len, &imap, &nimaps, 0);
> -		xfs_iunlock(src, XFS_ILOCK_EXCL);
> +		xfs_iunlock(src, lock_mode);
>  		if (error)
>  			goto err;
>  		ASSERT(nimaps == 1);
> @@ -1260,7 +1263,7 @@ xfs_iolock_two_inodes_and_break_layout(
>  
>  retry:
>  	if (src_first) {
> -		inode_lock(src);
> +		inode_lock_shared(src);

Hm, I guess this could make my comment on the previous patch more
difficult. Oh well.

>  		inode_lock_nested(dest, I_MUTEX_NONDIR2);
>  	} else {
>  		inode_lock(dest);
> @@ -1270,7 +1273,7 @@ xfs_iolock_two_inodes_and_break_layout(
>  	if (error == -EWOULDBLOCK) {
>  		inode_unlock(dest);
>  		if (src_first)
> -			inode_unlock(src);
> +			inode_unlock_shared(src);
>  		error = break_layout(dest, true);
>  		if (error)
>  			return error;
> @@ -1278,14 +1281,36 @@ xfs_iolock_two_inodes_and_break_layout(
>  	} else if (error) {
>  		inode_unlock(dest);
>  		if (src_first)
> -			inode_unlock(src);
> +			inode_unlock_shared(src);
>  		return error;
>  	}
>  	if (src_last)
> -		inode_lock_nested(src, I_MUTEX_NONDIR2);
> +		down_read_nested(&src->i_rwsem, I_MUTEX_NONDIR2);
>  	return 0;
>  }
>  
> +static void
> +xfs_reflink_mmaplock_two(
> +	struct xfs_inode	*src,
> +	struct xfs_inode	*dest)
> +{
> +	int			i = 0;
> +
> +	if (src->i_ino == dest->i_ino) {
> +		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
> +		return;
> +	}
> +
> +	if (src->i_ino < dest->i_ino) {
> +		xfs_ilock(src, XFS_MMAPLOCK_SHARED);
> +		i++;
> +	}
> +	xfs_ilock(dest, XFS_MMAPLOCK_EXCL + (i << XFS_MMAPLOCK_SHIFT));
> +	i++;
> +	if (src->i_ino > dest->i_ino)
> +		xfs_ilock(src, XFS_MMAPLOCK_SHARED + (i << XFS_MMAPLOCK_SHIFT));
> +}
> +

I am kind of wondering if this one could be replaced with a refactor of
xfs_lock_two_inodes() to take two sets of lock flags (then create a
wrapper to preserve the current signature that just passes the same set
of flags for both inodes).

Brian

>  /*
>   * Link a range of blocks from one file to another.
>   */
> @@ -1319,10 +1344,7 @@ xfs_reflink_remap_range(
>  	ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out);
>  	if (ret)
>  		return ret;
> -	if (same_inode)
> -		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
> -	else
> -		xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
> +	xfs_reflink_mmaplock_two(src, dest);
>  
>  	/* Check file eligibility and prepare for block sharing. */
>  	ret = -EINVAL;
> @@ -1385,10 +1407,12 @@ xfs_reflink_remap_range(
>  			is_dedupe);
>  
>  out_unlock:
> -	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
> +	xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
> +	if (!same_inode)
> +		xfs_iunlock(src, XFS_MMAPLOCK_SHARED);
> +	inode_unlock(inode_out);
>  	if (!same_inode)
> -		xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
> -	unlock_two_nondirectories(inode_in, inode_out);
> +		inode_unlock_shared(inode_in);
>  	if (ret)
>  		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
>  	return ret;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/6] xfs: call xfs_qm_dqattach before performing reflink operations
  2018-01-21  5:34 ` [PATCH 3/6] xfs: call xfs_qm_dqattach before performing reflink operations Darrick J. Wong
@ 2018-01-23 12:05   ` Brian Foster
  2018-01-23 18:27     ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2018-01-23 12:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sat, Jan 20, 2018 at 09:34:10PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Ensure that we've attached all the necessary dquots before performing
> reflink operations so that quota accounting is accurate.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_reflink.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 5d1ff5a..947d0637 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -54,6 +54,7 @@
>  #include "xfs_rmap_btree.h"
>  #include "xfs_sb.h"
>  #include "xfs_ag_resv.h"
> +#include "xfs_qm.h"
>  
>  /*
>   * Copy on Write of Shared Blocks
> @@ -282,6 +283,10 @@ xfs_reflink_reserve_cow(
>  	 * tree.
>  	 */
>  
> +	error = xfs_qm_dqattach_locked(ip, 0);
> +	if (error)
> +		return error;
> +

The same call exists further down in the function. Was the intent to
move it? I suspect we don't need it twice, at least.

>  	if (!xfs_iext_lookup_extent(ip, ifp, imap->br_startoff, &icur, &got))
>  		eof = true;
>  	if (!eof && got.br_startoff <= imap->br_startoff) {
> @@ -396,6 +401,10 @@ xfs_reflink_allocate_cow(
>  	ASSERT(xfs_is_reflink_inode(ip));
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
>  
> +	error = xfs_qm_dqattach_locked(ip, 0);
> +	if (error)
> +		return error;
> +

Similar pattern here, but for this one the assert above suggests we
could have the shared lock. xfs_qm_dqattach_locked() looks like it
expects the exclusive lock (and that's what it looks like the second
call deals with). Hm?

Brian

>  	/*
>  	 * Even if the extent is not shared we might have a preallocation for
>  	 * it in the COW fork.  If so use it.
> @@ -1356,6 +1365,14 @@ xfs_reflink_remap_range(
>  	if (IS_DAX(inode_in) || IS_DAX(inode_out))
>  		goto out_unlock;
>  
> +	/* Attach dquots to both inodes */
> +	ret = xfs_qm_dqattach(src, 0);
> +	if (ret)
> +		goto out_unlock;
> +	ret = xfs_qm_dqattach(dest, 0);
> +	if (ret)
> +		goto out_unlock;
> +
>  	ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out,
>  			&len, is_dedupe);
>  	if (ret <= 0)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/6] xfs: reflink should break pnfs leases before sharing blocks
  2018-01-23 12:05   ` Brian Foster
@ 2018-01-23 18:19     ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-23 18:19 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Jan 23, 2018 at 07:05:20AM -0500, Brian Foster wrote:
> On Sat, Jan 20, 2018 at 09:33:55PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Before we share blocks between files, we need to break the pnfs leases
> > on the layout before we start slicing and dicing the block map.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_reflink.c |   46 +++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 45 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 47aea2e..ce523dd 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -1245,6 +1245,48 @@ xfs_reflink_remap_blocks(
> >  }
> >  
> >  /*
> > + * Grab the exclusive iolock for a data copy from src to dest, making
> > + * sure to abide vfs locking order (lowest pointer value goes first) and
> > + * breaking the pnfs layout leases on dest before proceeding.
> > + */
> > +static int
> > +xfs_iolock_two_inodes_and_break_layout(
> > +	struct inode		*src,
> > +	struct inode		*dest)
> > +{
> > +	bool			src_first = src < dest;
> > +	bool			src_last = src > dest;
> > +	int			error;
> > +
> > +retry:
> > +	if (src_first) {
> > +		inode_lock(src);
> > +		inode_lock_nested(dest, I_MUTEX_NONDIR2);
> > +	} else {
> > +		inode_lock(dest);
> > +	}
> > +
> > +	error = break_layout(dest, false);
> > +	if (error == -EWOULDBLOCK) {
> > +		inode_unlock(dest);
> > +		if (src_first)
> > +			inode_unlock(src);
> > +		error = break_layout(dest, true);
> > +		if (error)
> > +			return error;
> > +		goto retry;
> > +	} else if (error) {
> > +		inode_unlock(dest);
> > +		if (src_first)
> > +			inode_unlock(src);
> > +		return error;
> > +	}
> > +	if (src_last)
> > +		inode_lock_nested(src, I_MUTEX_NONDIR2);
> > +	return 0;
> > +}
> > +
> 
> I'm not really familiar with the lease code so it's not clear to me, for
> example, why the loop is necessary here and whatnot.

Christoph objected to the original version of this patch grabbing
IOLOCK_EXCL (or any lock) and then spinning on a break-unlock-break-lock
loop trying to break the leases on dest.  I didn't see anything obvious
that could stall in break_layout for a long time, but the fact that it
has a 'wait' parameter strongly implies that something in that path
could sleep.

> But it seems the above could be factored into a slightly more generic
> helper patterned after xfs_break_layouts(). For example, create an
> xfs_break_two_nondir_layouts() in xfs_pnfs.c with the exact same
> semantics/structure as the former, just replace the iolock calls with
> the lock/unlock_two_nondir() calls..?

The awkward part of this patch is that we really only need a function
like this to do things like reflink (read from one file, write to
another) so that's why it's in xfs_reflink.c... but I'll move on to the
next patch for more of a reply. :)

--D

> Brian
> 
> > +/*
> >   * Link a range of blocks from one file to another.
> >   */
> >  int
> > @@ -1274,7 +1316,9 @@ xfs_reflink_remap_range(
> >  		return -EIO;
> >  
> >  	/* Lock both files against IO */
> > -	lock_two_nondirectories(inode_in, inode_out);
> > +	ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out);
> > +	if (ret)
> > +		return ret;
> >  	if (same_inode)
> >  		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
> >  	else
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] xfs: only grab shared inode locks for source file during reflink
  2018-01-23 12:05   ` Brian Foster
@ 2018-01-23 18:23     ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-23 18:23 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Jan 23, 2018 at 07:05:34AM -0500, Brian Foster wrote:
> On Sat, Jan 20, 2018 at 09:34:03PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Reflink and dedupe operations remap blocks from a source file into a
> > destination file.  The destination file needs exclusive locks on all
> > levels because we're updating its block map, but the source file isn't
> > undergoing any block map changes so we can use a shared lock.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_reflink.c |   50 +++++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 37 insertions(+), 13 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index ce523dd..5d1ff5a 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -1202,13 +1202,16 @@ xfs_reflink_remap_blocks(
> >  
> >  	/* drange = (destoff, destoff + len); srange = (srcoff, srcoff + len) */
> >  	while (len) {
> > +		uint		lock_mode;
> > +
> >  		trace_xfs_reflink_remap_blocks_loop(src, srcoff, len,
> >  				dest, destoff);
> > +
> >  		/* Read extent from the source file */
> >  		nimaps = 1;
> > -		xfs_ilock(src, XFS_ILOCK_EXCL);
> > +		lock_mode = xfs_ilock_data_map_shared(src);
> >  		error = xfs_bmapi_read(src, srcoff, len, &imap, &nimaps, 0);
> > -		xfs_iunlock(src, XFS_ILOCK_EXCL);
> > +		xfs_iunlock(src, lock_mode);
> >  		if (error)
> >  			goto err;
> >  		ASSERT(nimaps == 1);
> > @@ -1260,7 +1263,7 @@ xfs_iolock_two_inodes_and_break_layout(
> >  
> >  retry:
> >  	if (src_first) {
> > -		inode_lock(src);
> > +		inode_lock_shared(src);
> 
> Hm, I guess this could make my comment on the previous patch more
> difficult. Oh well.

Yeah, there's nowhere else in the xfs code where we do this (read lock
one file, write lock another), afaict.

> >  		inode_lock_nested(dest, I_MUTEX_NONDIR2);
> >  	} else {
> >  		inode_lock(dest);
> > @@ -1270,7 +1273,7 @@ xfs_iolock_two_inodes_and_break_layout(
> >  	if (error == -EWOULDBLOCK) {
> >  		inode_unlock(dest);
> >  		if (src_first)
> > -			inode_unlock(src);
> > +			inode_unlock_shared(src);
> >  		error = break_layout(dest, true);
> >  		if (error)
> >  			return error;
> > @@ -1278,14 +1281,36 @@ xfs_iolock_two_inodes_and_break_layout(
> >  	} else if (error) {
> >  		inode_unlock(dest);
> >  		if (src_first)
> > -			inode_unlock(src);
> > +			inode_unlock_shared(src);
> >  		return error;
> >  	}
> >  	if (src_last)
> > -		inode_lock_nested(src, I_MUTEX_NONDIR2);
> > +		down_read_nested(&src->i_rwsem, I_MUTEX_NONDIR2);
> >  	return 0;
> >  }
> >  
> > +static void
> > +xfs_reflink_mmaplock_two(
> > +	struct xfs_inode	*src,
> > +	struct xfs_inode	*dest)
> > +{
> > +	int			i = 0;
> > +
> > +	if (src->i_ino == dest->i_ino) {
> > +		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
> > +		return;
> > +	}
> > +
> > +	if (src->i_ino < dest->i_ino) {
> > +		xfs_ilock(src, XFS_MMAPLOCK_SHARED);
> > +		i++;
> > +	}
> > +	xfs_ilock(dest, XFS_MMAPLOCK_EXCL + (i << XFS_MMAPLOCK_SHIFT));
> > +	i++;
> > +	if (src->i_ino > dest->i_ino)
> > +		xfs_ilock(src, XFS_MMAPLOCK_SHARED + (i << XFS_MMAPLOCK_SHIFT));
> > +}
> > +
> 
> I am kind of wondering if this one could be replaced with a refactor of
> xfs_lock_two_inodes() to take two sets of lock flags (then create a
> wrapper to preserve the current signature that just passes the same set
> of flags for both inodes).

Yes, I'll do that.

--D

> Brian
> 
> >  /*
> >   * Link a range of blocks from one file to another.
> >   */
> > @@ -1319,10 +1344,7 @@ xfs_reflink_remap_range(
> >  	ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out);
> >  	if (ret)
> >  		return ret;
> > -	if (same_inode)
> > -		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
> > -	else
> > -		xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
> > +	xfs_reflink_mmaplock_two(src, dest);
> >  
> >  	/* Check file eligibility and prepare for block sharing. */
> >  	ret = -EINVAL;
> > @@ -1385,10 +1407,12 @@ xfs_reflink_remap_range(
> >  			is_dedupe);
> >  
> >  out_unlock:
> > -	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
> > +	xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
> > +	if (!same_inode)
> > +		xfs_iunlock(src, XFS_MMAPLOCK_SHARED);
> > +	inode_unlock(inode_out);
> >  	if (!same_inode)
> > -		xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
> > -	unlock_two_nondirectories(inode_in, inode_out);
> > +		inode_unlock_shared(inode_in);
> >  	if (ret)
> >  		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
> >  	return ret;
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/6] xfs: call xfs_qm_dqattach before performing reflink operations
  2018-01-23 12:05   ` Brian Foster
@ 2018-01-23 18:27     ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-01-23 18:27 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Jan 23, 2018 at 07:05:57AM -0500, Brian Foster wrote:
> On Sat, Jan 20, 2018 at 09:34:10PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Ensure that we've attached all the necessary dquots before performing
> > reflink operations so that quota accounting is accurate.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_reflink.c |   17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 5d1ff5a..947d0637 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -54,6 +54,7 @@
> >  #include "xfs_rmap_btree.h"
> >  #include "xfs_sb.h"
> >  #include "xfs_ag_resv.h"
> > +#include "xfs_qm.h"
> >  
> >  /*
> >   * Copy on Write of Shared Blocks
> > @@ -282,6 +283,10 @@ xfs_reflink_reserve_cow(
> >  	 * tree.
> >  	 */
> >  
> > +	error = xfs_qm_dqattach_locked(ip, 0);
> > +	if (error)
> > +		return error;
> > +
> 
> The same call exists further down in the function. Was the intent to
> move it? I suspect we don't need it twice, at least.

Drat, I don't know why either of these are there, I think I got paste-happy?

OH, yuck, this is the debug patch from an earlier revision.  The only
dqattach we actually need is...

> >  	if (!xfs_iext_lookup_extent(ip, ifp, imap->br_startoff, &icur, &got))
> >  		eof = true;
> >  	if (!eof && got.br_startoff <= imap->br_startoff) {
> > @@ -396,6 +401,10 @@ xfs_reflink_allocate_cow(
> >  	ASSERT(xfs_is_reflink_inode(ip));
> >  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
> >  
> > +	error = xfs_qm_dqattach_locked(ip, 0);
> > +	if (error)
> > +		return error;
> > +
> 
> Similar pattern here, but for this one the assert above suggests we
> could have the shared lock. xfs_qm_dqattach_locked() looks like it
> expects the exclusive lock (and that's what it looks like the second
> call deals with). Hm?
> 
> Brian
> 
> >  	/*
> >  	 * Even if the extent is not shared we might have a preallocation for
> >  	 * it in the COW fork.  If so use it.
> > @@ -1356,6 +1365,14 @@ xfs_reflink_remap_range(
> >  	if (IS_DAX(inode_in) || IS_DAX(inode_out))
> >  		goto out_unlock;
> >  
> > +	/* Attach dquots to both inodes */
> > +	ret = xfs_qm_dqattach(src, 0);
> > +	if (ret)
> > +		goto out_unlock;
> > +	ret = xfs_qm_dqattach(dest, 0);

...this one, because we're not changing the src file's block allocations,
but we /are/ forgetting to ensure they're attached to dest.

Sorry for the noise, I'll send out the proper patch later today.

--D

> > +	if (ret)
> > +		goto out_unlock;
> > +
> >  	ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out,
> >  			&len, is_dedupe);
> >  	if (ret <= 0)
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-01-23 18:27 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-21  5:33 [PATCH 0/6] xfs: reflink fixes Darrick J. Wong
2018-01-21  5:33 ` [PATCH 1/6] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
2018-01-23 12:05   ` Brian Foster
2018-01-23 18:19     ` Darrick J. Wong
2018-01-21  5:34 ` [PATCH 2/6] xfs: only grab shared inode locks for source file during reflink Darrick J. Wong
2018-01-23 12:05   ` Brian Foster
2018-01-23 18:23     ` Darrick J. Wong
2018-01-21  5:34 ` [PATCH 3/6] xfs: call xfs_qm_dqattach before performing reflink operations Darrick J. Wong
2018-01-23 12:05   ` Brian Foster
2018-01-23 18:27     ` Darrick J. Wong
2018-01-21  5:34 ` [PATCH 4/6] xfs: CoW fork operations should only update quota reservations Darrick J. Wong
2018-01-21  5:34 ` [PATCH 5/6] xfs: track CoW blocks separately in the inode Darrick J. Wong
2018-01-21  5:34 ` [PATCH 6/6] xfs: fix up cowextsz allocation shortfalls Darrick J. Wong
2018-01-22 23:25 ` [PATCH 0/6] xfs: reflink fixes Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).