linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/7] xfs: reflink/scrub/quota fixes
@ 2018-01-27  1:10 Darrick J. Wong
  2018-01-27  1:10 ` [PATCH 1/7] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-27  1:10 UTC (permalink / raw)
  To: hch, bfoster, darrick.wong; +Cc: linux-xfs

Hi all,

This is a re-roll of five patches that have generated a lot of review
comments; hopefully they're ready to go.  Some of the patches already
have reviewed-bys, but the discussion so choppy I decided that I'd just
republish the whole set.

Running generic/232 with quotas and reflink demonstrated that there was
something wrong with the way we did quota accounting -- on an otherwise
idle system, fs-wide du block count numbers didn't match the quota
reports.  I started digging into why the quota accounting was wrong, and
the following are the results of my bug hunt.  Regression tests have
been sent to the fstests lists.

The first patch fixes reflink to break pnfs leases before reflinking as
we take the iolocks of both inodes.  Patches 2-3 restructure the locking
code to take shared locks on the source file whenever possible, to
reduce lock contention.

The fourth patch reorganizes the copy on write quota updating code to
reflect how the CoW fork works now.  In short, the CoW fork is entirely
in memory, so we can only use the in-memory quota reservation counters
for all CoW blocks; the accounting only becomes permanent if we remap an
extent into the data fork.

Patch 5 creates a separate i_cow_blocks counter to track all the CoW
blocks assigned to a file, which makes changing a file's uid/gid/prjid
easier, makes reporting cow blocks via stat easy, and enables various
cleanups.  This also makes it so that we avoid unnecesary eofblocks
scanning on a file with speculative cow preallocations.

Patches 6-7 fixes a serious corruption problem with direct io cow
writes.  If extent allocation returns no error and no mapping, we must
return ENOSPC to the caller because otherwise we pass a garbage mapping
to the dio code and the write shoots into space.  Bad.

Patch 8 fixes a problem with direct IO writes (regular and cow)
returning ENOSPC when there's plenty of free space.  When we allocate
into the CoW fork with the cow extent size hint set, the allocator
enlarges the allocation request to try to hit alignment goals.  However,
if the allocated extent does not actually fulfill any of the requested
range, we send a garbage zero-length extent back to the iomap code
(which also doesn't notice), and the write lands at the startblock of
the garbage extent.  The fix is to detect that we didn't fill the entire
requested range and fix up the returned mapping so that we always fill
the first block of the requested allocation and for CoW to check the
validity returned mappings before blowing things up.  This also fixes a
problem where a direct io overwrite can return ENOSPC to userspace
because of the same fragmentation problem.

Sorry the maintainer^Wpatches has been so volatile and rapidfire
recently; the end of the 4.15 cycle and some major $distrokernel
deadlines have made for a hellishly busy week.

--D

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/7] xfs: reflink should break pnfs leases before sharing blocks
  2018-01-27  1:10 [PATCH v3 0/7] xfs: reflink/scrub/quota fixes Darrick J. Wong
@ 2018-01-27  1:10 ` Darrick J. Wong
  2018-01-27  7:31   ` Christoph Hellwig
  2018-01-27  1:10 ` [PATCH 2/7] xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes Darrick J. Wong
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-27  1:10 UTC (permalink / raw)
  To: hch, bfoster, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Before we share blocks between files, we need to break the pnfs leases
on the layout before we start slicing and dicing the block map.  The
structure of this function sets us up for the lock contention reduction
in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_reflink.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index bcc2ad4..bac464f 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1245,6 +1245,50 @@ xfs_reflink_remap_blocks(
 }
 
 /*
+ * Grab the exclusive iolock for a data copy from src to dest, making
+ * sure to abide vfs locking order (lowest pointer value goes first) and
+ * breaking the pnfs layout leases on dest before proceeding.  The loop
+ * is needed because we cannot call the blocking break_layout() with the
+ * src iolock held, and therefore have to back out both locks.
+ */
+static int
+xfs_iolock_two_inodes_and_break_layout(
+	struct inode		*src,
+	struct inode		*dest)
+{
+	int			error;
+
+retry:
+	if (src < dest) {
+		inode_lock(src);
+		inode_lock_nested(dest, I_MUTEX_NONDIR2);
+	} else {
+		/* src >= dest */
+		inode_lock(dest);
+	}
+
+	error = break_layout(dest, false);
+	if (error == -EWOULDBLOCK) {
+		inode_unlock(dest);
+		if (src < dest)
+			inode_unlock(src);
+		error = break_layout(dest, true);
+		if (error)
+			return error;
+		goto retry;
+	}
+	if (error) {
+		inode_unlock(dest);
+		if (src < dest)
+			inode_unlock(src);
+		return error;
+	}
+	if (src > dest)
+		inode_lock_nested(src, I_MUTEX_NONDIR2);
+	return 0;
+}
+
+/*
  * Link a range of blocks from one file to another.
  */
 int
@@ -1274,7 +1318,9 @@ xfs_reflink_remap_range(
 		return -EIO;
 
 	/* Lock both files against IO */
-	lock_two_nondirectories(inode_in, inode_out);
+	ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out);
+	if (ret)
+		return ret;
 	if (same_inode)
 		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
 	else


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/7] xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes
  2018-01-27  1:10 [PATCH v3 0/7] xfs: reflink/scrub/quota fixes Darrick J. Wong
  2018-01-27  1:10 ` [PATCH 1/7] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
@ 2018-01-27  1:10 ` Darrick J. Wong
  2018-01-27  7:32   ` Christoph Hellwig
  2018-01-27  1:10 ` [PATCH 3/7] xfs: only grab shared inode locks for source file during reflink Darrick J. Wong
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-27  1:10 UTC (permalink / raw)
  To: hch, bfoster, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor xfs_lock_two_inodes to take separate locking modes for each
inode.  Specifically, this enables us to take a SHARED lock on one inode
and an EXCL lock on the other.  The lock class (MMAPLOCK/ILOCK) must be
the same for each inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_bmap_util.c |    4 ++--
 fs/xfs/xfs_inode.c     |   49 +++++++++++++++++++++++++++++++-----------------
 fs/xfs/xfs_inode.h     |    3 ++-
 fs/xfs/xfs_reflink.c   |    5 +++--
 4 files changed, 39 insertions(+), 22 deletions(-)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6d37ab4..c83f549 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1872,7 +1872,7 @@ xfs_swap_extents(
 	 */
 	lock_two_nondirectories(VFS_I(ip), VFS_I(tip));
 	lock_flags = XFS_MMAPLOCK_EXCL;
-	xfs_lock_two_inodes(ip, tip, XFS_MMAPLOCK_EXCL);
+	xfs_lock_two_inodes(ip, XFS_MMAPLOCK_EXCL, tip, XFS_MMAPLOCK_EXCL);
 
 	/* Verify that both files have the same format */
 	if ((VFS_I(ip)->i_mode & S_IFMT) != (VFS_I(tip)->i_mode & S_IFMT)) {
@@ -1919,7 +1919,7 @@ xfs_swap_extents(
 	 * Lock and join the inodes to the tansaction so that transaction commit
 	 * or cancel will unlock the inodes from this point onwards.
 	 */
-	xfs_lock_two_inodes(ip, tip, XFS_ILOCK_EXCL);
+	xfs_lock_two_inodes(ip, XFS_ILOCK_EXCL, tip, XFS_ILOCK_EXCL);
 	lock_flags |= XFS_ILOCK_EXCL;
 	xfs_trans_ijoin(tp, ip, 0);
 	xfs_trans_ijoin(tp, tip, 0);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 5366fb6..e7f6d52 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -546,23 +546,36 @@ xfs_lock_inodes(
 
 /*
  * xfs_lock_two_inodes() can only be used to lock one type of lock at a time -
- * the iolock, the mmaplock or the ilock, but not more than one at a time. If we
- * lock more than one at a time, lockdep will report false positives saying we
- * have violated locking orders.
+ * the mmaplock or the ilock, but not more than one type at a time. If we lock
+ * more than one at a time, lockdep will report false positives saying we have
+ * violated locking orders.  The iolock must be double-locked separately since
+ * we use i_rwsem for that.  We now support taking one lock EXCL and the other
+ * SHARED.
  */
 void
 xfs_lock_two_inodes(
-	xfs_inode_t		*ip0,
-	xfs_inode_t		*ip1,
-	uint			lock_mode)
+	struct xfs_inode	*ip0,
+	uint			ip0_mode,
+	struct xfs_inode	*ip1,
+	uint			ip1_mode)
 {
-	xfs_inode_t		*temp;
+	struct xfs_inode	*temp;
+	uint			mode_temp;
 	int			attempts = 0;
 	xfs_log_item_t		*lp;
 
-	ASSERT(!(lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL)));
-	if (lock_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL))
-		ASSERT(!(lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)));
+	ASSERT(hweight32(ip0_mode) == 1);
+	ASSERT(hweight32(ip1_mode) == 1);
+	ASSERT(!(ip0_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL)));
+	ASSERT(!(ip1_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL)));
+	ASSERT(!(ip0_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) ||
+	       !(ip0_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)));
+	ASSERT(!(ip1_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) ||
+	       !(ip1_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)));
+	ASSERT(!(ip1_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) ||
+	       !(ip0_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)));
+	ASSERT(!(ip0_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) ||
+	       !(ip1_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL)));
 
 	ASSERT(ip0->i_ino != ip1->i_ino);
 
@@ -570,10 +583,13 @@ xfs_lock_two_inodes(
 		temp = ip0;
 		ip0 = ip1;
 		ip1 = temp;
+		mode_temp = ip0_mode;
+		ip0_mode = ip1_mode;
+		ip1_mode = mode_temp;
 	}
 
  again:
-	xfs_ilock(ip0, xfs_lock_inumorder(lock_mode, 0));
+	xfs_ilock(ip0, xfs_lock_inumorder(ip0_mode, 0));
 
 	/*
 	 * If the first lock we have locked is in the AIL, we must TRY to get
@@ -582,18 +598,17 @@ xfs_lock_two_inodes(
 	 */
 	lp = (xfs_log_item_t *)ip0->i_itemp;
 	if (lp && (lp->li_flags & XFS_LI_IN_AIL)) {
-		if (!xfs_ilock_nowait(ip1, xfs_lock_inumorder(lock_mode, 1))) {
-			xfs_iunlock(ip0, lock_mode);
+		if (!xfs_ilock_nowait(ip1, xfs_lock_inumorder(ip1_mode, 1))) {
+			xfs_iunlock(ip0, ip0_mode);
 			if ((++attempts % 5) == 0)
 				delay(1); /* Don't just spin the CPU */
 			goto again;
 		}
 	} else {
-		xfs_ilock(ip1, xfs_lock_inumorder(lock_mode, 1));
+		xfs_ilock(ip1, xfs_lock_inumorder(ip1_mode, 1));
 	}
 }
 
-
 void
 __xfs_iflock(
 	struct xfs_inode	*ip)
@@ -1421,7 +1436,7 @@ xfs_link(
 	if (error)
 		goto std_return;
 
-	xfs_lock_two_inodes(sip, tdp, XFS_ILOCK_EXCL);
+	xfs_lock_two_inodes(sip, XFS_ILOCK_EXCL, tdp, XFS_ILOCK_EXCL);
 
 	xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
@@ -2585,7 +2600,7 @@ xfs_remove(
 		goto std_return;
 	}
 
-	xfs_lock_two_inodes(dp, ip, XFS_ILOCK_EXCL);
+	xfs_lock_two_inodes(dp, XFS_ILOCK_EXCL, ip, XFS_ILOCK_EXCL);
 
 	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 386b0bb..3e8dc99 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -423,7 +423,8 @@ void		xfs_iunpin_wait(xfs_inode_t *);
 #define xfs_ipincount(ip)	((unsigned int) atomic_read(&ip->i_pincount))
 
 int		xfs_iflush(struct xfs_inode *, struct xfs_buf **);
-void		xfs_lock_two_inodes(xfs_inode_t *, xfs_inode_t *, uint);
+void		xfs_lock_two_inodes(struct xfs_inode *ip0, uint ip0_mode,
+				struct xfs_inode *ip1, uint ip1_mode);
 
 xfs_extlen_t	xfs_get_extsz_hint(struct xfs_inode *ip);
 xfs_extlen_t	xfs_get_cowextsz_hint(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index bac464f..bcc58c2 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -944,7 +944,7 @@ xfs_reflink_set_inode_flag(
 	if (src->i_ino == dest->i_ino)
 		xfs_ilock(src, XFS_ILOCK_EXCL);
 	else
-		xfs_lock_two_inodes(src, dest, XFS_ILOCK_EXCL);
+		xfs_lock_two_inodes(src, XFS_ILOCK_EXCL, dest, XFS_ILOCK_EXCL);
 
 	if (!xfs_is_reflink_inode(src)) {
 		trace_xfs_reflink_set_inode_flag(src);
@@ -1324,7 +1324,8 @@ xfs_reflink_remap_range(
 	if (same_inode)
 		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
 	else
-		xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
+		xfs_lock_two_inodes(src, XFS_MMAPLOCK_EXCL, dest,
+				XFS_MMAPLOCK_EXCL);
 
 	/* Check file eligibility and prepare for block sharing. */
 	ret = -EINVAL;


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/7] xfs: only grab shared inode locks for source file during reflink
  2018-01-27  1:10 [PATCH v3 0/7] xfs: reflink/scrub/quota fixes Darrick J. Wong
  2018-01-27  1:10 ` [PATCH 1/7] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
  2018-01-27  1:10 ` [PATCH 2/7] xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes Darrick J. Wong
@ 2018-01-27  1:10 ` Darrick J. Wong
  2018-01-27  7:33   ` Christoph Hellwig
  2018-01-27  1:10 ` [PATCH 4/7] xfs: treat CoW fork operations as delalloc for quota accounting Darrick J. Wong
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-27  1:10 UTC (permalink / raw)
  To: hch, bfoster, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Reflink and dedupe operations remap blocks from a source file into a
destination file.  The destination file needs exclusive locks on all
levels because we're updating its block map, but the source file isn't
undergoing any block map changes so we can use a shared lock.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_reflink.c |   25 +++++++++++++++----------
 include/linux/fs.h   |    5 +++++
 2 files changed, 20 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index bcc58c2..85a119e 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1202,13 +1202,16 @@ xfs_reflink_remap_blocks(
 
 	/* drange = (destoff, destoff + len); srange = (srcoff, srcoff + len) */
 	while (len) {
+		uint		lock_mode;
+
 		trace_xfs_reflink_remap_blocks_loop(src, srcoff, len,
 				dest, destoff);
+
 		/* Read extent from the source file */
 		nimaps = 1;
-		xfs_ilock(src, XFS_ILOCK_EXCL);
+		lock_mode = xfs_ilock_data_map_shared(src);
 		error = xfs_bmapi_read(src, srcoff, len, &imap, &nimaps, 0);
-		xfs_iunlock(src, XFS_ILOCK_EXCL);
+		xfs_iunlock(src, lock_mode);
 		if (error)
 			goto err;
 		ASSERT(nimaps == 1);
@@ -1260,7 +1263,7 @@ xfs_iolock_two_inodes_and_break_layout(
 
 retry:
 	if (src < dest) {
-		inode_lock(src);
+		inode_lock_shared(src);
 		inode_lock_nested(dest, I_MUTEX_NONDIR2);
 	} else {
 		/* src >= dest */
@@ -1271,7 +1274,7 @@ xfs_iolock_two_inodes_and_break_layout(
 	if (error == -EWOULDBLOCK) {
 		inode_unlock(dest);
 		if (src < dest)
-			inode_unlock(src);
+			inode_unlock_shared(src);
 		error = break_layout(dest, true);
 		if (error)
 			return error;
@@ -1280,11 +1283,11 @@ xfs_iolock_two_inodes_and_break_layout(
 	if (error) {
 		inode_unlock(dest);
 		if (src < dest)
-			inode_unlock(src);
+			inode_unlock_shared(src);
 		return error;
 	}
 	if (src > dest)
-		inode_lock_nested(src, I_MUTEX_NONDIR2);
+		inode_lock_shared_nested(src, I_MUTEX_NONDIR2);
 	return 0;
 }
 
@@ -1324,7 +1327,7 @@ xfs_reflink_remap_range(
 	if (same_inode)
 		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
 	else
-		xfs_lock_two_inodes(src, XFS_MMAPLOCK_EXCL, dest,
+		xfs_lock_two_inodes(src, XFS_MMAPLOCK_SHARED, dest,
 				XFS_MMAPLOCK_EXCL);
 
 	/* Check file eligibility and prepare for block sharing. */
@@ -1393,10 +1396,12 @@ xfs_reflink_remap_range(
 			is_dedupe);
 
 out_unlock:
-	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
+	xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
+	if (!same_inode)
+		xfs_iunlock(src, XFS_MMAPLOCK_SHARED);
+	inode_unlock(inode_out);
 	if (!same_inode)
-		xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
-	unlock_two_nondirectories(inode_in, inode_out);
+		inode_unlock_shared(inode_in);
 	if (ret)
 		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
 	return ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7f8d96d..5cbeab8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -748,6 +748,11 @@ static inline void inode_lock_nested(struct inode *inode, unsigned subclass)
 	down_write_nested(&inode->i_rwsem, subclass);
 }
 
+static inline void inode_lock_shared_nested(struct inode *inode, unsigned subclass)
+{
+	down_read_nested(&inode->i_rwsem, subclass);
+}
+
 void lock_two_nondirectories(struct inode *, struct inode*);
 void unlock_two_nondirectories(struct inode *, struct inode*);
 


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/7] xfs: treat CoW fork operations as delalloc for quota accounting
  2018-01-27  1:10 [PATCH v3 0/7] xfs: reflink/scrub/quota fixes Darrick J. Wong
                   ` (2 preceding siblings ...)
  2018-01-27  1:10 ` [PATCH 3/7] xfs: only grab shared inode locks for source file during reflink Darrick J. Wong
@ 2018-01-27  1:10 ` Darrick J. Wong
  2018-01-27  7:33   ` Christoph Hellwig
  2018-01-29 12:26   ` Brian Foster
  2018-01-27  1:10 ` [PATCH 5/7] iomap: warn on zero-length mappings Darrick J. Wong
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-27  1:10 UTC (permalink / raw)
  To: hch, bfoster, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Since the CoW fork only exists in memory, it is incorrect to update the
on-disk quota block counts when we modify the CoW fork.  Unlike the data
fork, even real extents in the CoW fork are only delalloc-style
reservations (on-disk they're owned by the refcountbt) so they must not
be tracked in the on disk quota info.  Ensure the i_delayed_blks
accounting reflects this too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |   32 ++++++++++++++++++++++++++++++--
 fs/xfs/xfs_reflink.c     |   15 +++++++++++----
 2 files changed, 41 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 4582f55..cad21fd 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3343,8 +3343,35 @@ xfs_bmap_btalloc_accounting(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args)
 {
-	if (!(ap->flags & XFS_BMAPI_COWFORK))
-		ap->ip->i_d.di_nblocks += args->len;
+	if (ap->flags & XFS_BMAPI_COWFORK) {
+		/*
+		 * COW fork blocks are in-core only and thus are treated as
+		 * in-core quota reservation (like delalloc blocks) even when
+		 * converted to real blocks. The quota reservation is not
+		 * accounted to disk until blocks are remapped to the data
+		 * fork. So if these blocks were previously delalloc, we
+		 * already have quota reservation and there's nothing to do
+		 * yet.
+		 */
+		if (ap->wasdel)
+			return;
+
+		/*
+		 * Otherwise, we've allocated blocks in a hole. The transaction
+		 * has acquired in-core quota reservation for this extent.
+		 * Rather than account these as real blocks, however, we reduce
+		 * the transaction quota reservation based on the allocation.
+		 * This essentially transfers the transaction quota reservation
+		 * to that of a delalloc extent.
+		 */
+		ap->ip->i_delayed_blks += args->len;
+		xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
+				-(long)args->len);
+		return;
+	}
+
+	/* data/attr fork only */
+	ap->ip->i_d.di_nblocks += args->len;
 	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
 	if (ap->wasdel)
 		ap->ip->i_delayed_blks -= args->len;
@@ -4820,6 +4847,7 @@ xfs_bmap_del_extent_cow(
 		xfs_iext_insert(ip, icur, &new, state);
 		break;
 	}
+	ip->i_delayed_blks -= del->br_blockcount;
 }
 
 /*
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 85a119e..c4f0aff 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
 					del.br_startblock, del.br_blockcount,
 					NULL);
 
-			/* Update quota accounting */
-			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
-					-(long)del.br_blockcount);
-
 			/* Roll the transaction */
 			xfs_defer_ijoin(&dfops, ip);
 			error = xfs_defer_finish(tpp, &dfops);
@@ -613,6 +609,13 @@ xfs_reflink_cancel_cow_blocks(
 
 			/* Remove the mapping from the CoW fork. */
 			xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
+
+			/* Remove the quota reservation */
+			error = xfs_trans_reserve_quota_nblks(NULL, ip,
+					-(long)del.br_blockcount, 0,
+					XFS_QMOPT_RES_REGBLKS);
+			if (error)
+				break;
 		} else {
 			/* Didn't do anything, push cursor back. */
 			xfs_iext_prev(ifp, &icur);
@@ -795,6 +798,10 @@ xfs_reflink_end_cow(
 		if (error)
 			goto out_defer;
 
+		/* Charge this new data fork mapping to the on-disk quota. */
+		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
+				(long)del.br_blockcount);
+
 		/* Remove the mapping from the CoW fork. */
 		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
 


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/7] iomap: warn on zero-length mappings
  2018-01-27  1:10 [PATCH v3 0/7] xfs: reflink/scrub/quota fixes Darrick J. Wong
                   ` (3 preceding siblings ...)
  2018-01-27  1:10 ` [PATCH 4/7] xfs: treat CoW fork operations as delalloc for quota accounting Darrick J. Wong
@ 2018-01-27  1:10 ` Darrick J. Wong
  2018-01-27  7:34   ` Christoph Hellwig
  2018-01-27 18:08   ` [PATCH v2 " Darrick J. Wong
  2018-01-27  1:10 ` [PATCH 6/7] xfs: check reflink allocation mappings Darrick J. Wong
  2018-01-27  1:10 ` [PATCH 7/7] xfs: don't screw up direct writes when freesp is fragmented Darrick J. Wong
  6 siblings, 2 replies; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-27  1:10 UTC (permalink / raw)
  To: hch, bfoster, darrick.wong; +Cc: linux-xfs, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

Don't let the iomap callback get away with feeding us a garbage zero
length mapping -- there was a bug in xfs that resulted in those leaking
out to hilarious effect.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/iomap.c b/fs/iomap.c
index e5de772..aec35a0 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -63,7 +63,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
 	if (ret)
 		return ret;
-	if (WARN_ON(iomap.offset > pos))
+	if (WARN_ON(iomap.offset > pos) || WARN_ON(iomap.length == 0))
 		return -EIO;
 
 	/*


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/7] xfs: check reflink allocation mappings
  2018-01-27  1:10 [PATCH v3 0/7] xfs: reflink/scrub/quota fixes Darrick J. Wong
                   ` (4 preceding siblings ...)
  2018-01-27  1:10 ` [PATCH 5/7] iomap: warn on zero-length mappings Darrick J. Wong
@ 2018-01-27  1:10 ` Darrick J. Wong
  2018-01-27  7:35   ` Christoph Hellwig
  2018-01-27  1:10 ` [PATCH 7/7] xfs: don't screw up direct writes when freesp is fragmented Darrick J. Wong
  6 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-27  1:10 UTC (permalink / raw)
  To: hch, bfoster, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

There's a really bad bug in xfs_reflink_allocate_cow -- if bmapi_write
can return a zero error code but no mappings.  This happens if there's
an extent size hint (which causes allocation requests to be rounded to
extsz granularity internally), but there wasn't a big enough chunk of
free space to start filling at the extsz granularity and fill even one
block of the range that we actually requested.

In any case, if we got no mappings we can't possibly do anything useful
with the contents of imap, so we must bail out with ENOSPC here.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_reflink.c |    7 +++++++
 1 file changed, 7 insertions(+)


diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index c4f0aff..2702469 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -464,6 +464,13 @@ xfs_reflink_allocate_cow(
 	error = xfs_trans_commit(tp);
 	if (error)
 		return error;
+
+	/*
+	 * Allocation succeeded but the requested range was not even partially
+	 * satisfied?  Bail out!
+	 */
+	if (nimaps == 0)
+		return -ENOSPC;
 convert:
 	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb,
 			&dfops);


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/7] xfs: don't screw up direct writes when freesp is fragmented
  2018-01-27  1:10 [PATCH v3 0/7] xfs: reflink/scrub/quota fixes Darrick J. Wong
                   ` (5 preceding siblings ...)
  2018-01-27  1:10 ` [PATCH 6/7] xfs: check reflink allocation mappings Darrick J. Wong
@ 2018-01-27  1:10 ` Darrick J. Wong
  6 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-27  1:10 UTC (permalink / raw)
  To: hch, bfoster, darrick.wong; +Cc: linux-xfs, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

xfs_bmap_btalloc is given a range of file offset blocks that must be
allocated to some data/attr/cow fork.  If the fork has an extent size
hint associated with it, the request will be enlarged on both ends to
try to satisfy the alignment hint.  If free space is fragmentated,
sometimes we can allocate some blocks but not enough to fulfill any of
the requested range.  Since bmapi_allocate always trims the new extent
mapping to match the originally requested range, this results in
bmapi_write returning zero and no mapping.

The consequences of this vary -- buffered writes will simply re-call
bmapi_write until it can satisfy at least one block from the original
request.  Direct IO overwrites notice nmaps == 0 and return -ENOSPC
through the dio mechanism out to userspace with the weird result that
writes fail even when we have enough space because the ENOSPC return
overrides any partial write status.  For direct CoW writes the situation
was disastrous because nobody notices us returning an invalid zero-length
wrong-offset mapping to iomap and the write goes off into space.

Therefore, if free space is so fragmented that we managed to allocate
some space but not enough to map into even a single block of the
original allocation request range, we should break the alignment hint in
order to guarantee at least some forward progress for the direct write.
If we return a short allocation to iomap_apply it'll call back about the
remaining blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index cad21fd..35660e9 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3390,6 +3390,8 @@ xfs_bmap_btalloc(
 	xfs_agnumber_t	fb_agno;	/* ag number of ap->firstblock */
 	xfs_agnumber_t	ag;
 	xfs_alloc_arg_t	args;
+	xfs_fileoff_t	orig_offset;
+	xfs_extlen_t	orig_length;
 	xfs_extlen_t	blen;
 	xfs_extlen_t	nextminlen = 0;
 	int		nullfb;		/* true if ap->firstblock isn't set */
@@ -3399,6 +3401,8 @@ xfs_bmap_btalloc(
 	int		stripe_align;
 
 	ASSERT(ap->length);
+	orig_offset = ap->offset;
+	orig_length = ap->length;
 
 	mp = ap->ip->i_mount;
 
@@ -3614,6 +3618,22 @@ xfs_bmap_btalloc(
 			*ap->firstblock = args.fsbno;
 		ASSERT(nullfb || fb_agno <= args.agno);
 		ap->length = args.len;
+		/*
+		 * If the extent size hint is active, we tried to round the
+		 * caller's allocation request offset down to extsz and the
+		 * length up to another extsz boundary.  If we found a free
+		 * extent we mapped it in starting at this new offset.  If the
+		 * newly mapped space isn't long enough to cover the entire
+		 * range of offsets that was originally requested, move the
+		 * mapping up so that we can fill as much of the caller's
+		 * original request as possible.  Free space is apparently
+		 * very fragmented so we're unlikely to be able to satisfy the
+		 * hints anyway.
+		 */
+		if (ap->length <= orig_length)
+			ap->offset = orig_offset;
+		else if (ap->offset + ap->length < orig_offset + orig_length)
+			ap->offset = orig_offset + orig_length - ap->length;
 		xfs_bmap_btalloc_accounting(ap, &args);
 	} else {
 		ap->blkno = NULLFSBLOCK;


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/7] xfs: reflink should break pnfs leases before sharing blocks
  2018-01-27  1:10 ` [PATCH 1/7] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
@ 2018-01-27  7:31   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-01-27  7:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, bfoster, linux-xfs

Still not a big fan of the code flow, but I guess we need the fix for
now and can clean up later..

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/7] xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes
  2018-01-27  1:10 ` [PATCH 2/7] xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes Darrick J. Wong
@ 2018-01-27  7:32   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-01-27  7:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, bfoster, linux-xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/7] xfs: only grab shared inode locks for source file during reflink
  2018-01-27  1:10 ` [PATCH 3/7] xfs: only grab shared inode locks for source file during reflink Darrick J. Wong
@ 2018-01-27  7:33   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-01-27  7:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, bfoster, linux-xfs, viro, linux-fsdevel

Looks fine, although the fs.h addition would usually warrant a Cc to
Al and fsdevel.  (thus included as a full quote)

Reviewed-by: Christoph Hellwig <hch@lst.de>

On Fri, Jan 26, 2018 at 05:10:28PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Reflink and dedupe operations remap blocks from a source file into a
> destination file.  The destination file needs exclusive locks on all
> levels because we're updating its block map, but the source file isn't
> undergoing any block map changes so we can use a shared lock.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_reflink.c |   25 +++++++++++++++----------
>  include/linux/fs.h   |    5 +++++
>  2 files changed, 20 insertions(+), 10 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index bcc58c2..85a119e 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1202,13 +1202,16 @@ xfs_reflink_remap_blocks(
>  
>  	/* drange = (destoff, destoff + len); srange = (srcoff, srcoff + len) */
>  	while (len) {
> +		uint		lock_mode;
> +
>  		trace_xfs_reflink_remap_blocks_loop(src, srcoff, len,
>  				dest, destoff);
> +
>  		/* Read extent from the source file */
>  		nimaps = 1;
> -		xfs_ilock(src, XFS_ILOCK_EXCL);
> +		lock_mode = xfs_ilock_data_map_shared(src);
>  		error = xfs_bmapi_read(src, srcoff, len, &imap, &nimaps, 0);
> -		xfs_iunlock(src, XFS_ILOCK_EXCL);
> +		xfs_iunlock(src, lock_mode);
>  		if (error)
>  			goto err;
>  		ASSERT(nimaps == 1);
> @@ -1260,7 +1263,7 @@ xfs_iolock_two_inodes_and_break_layout(
>  
>  retry:
>  	if (src < dest) {
> -		inode_lock(src);
> +		inode_lock_shared(src);
>  		inode_lock_nested(dest, I_MUTEX_NONDIR2);
>  	} else {
>  		/* src >= dest */
> @@ -1271,7 +1274,7 @@ xfs_iolock_two_inodes_and_break_layout(
>  	if (error == -EWOULDBLOCK) {
>  		inode_unlock(dest);
>  		if (src < dest)
> -			inode_unlock(src);
> +			inode_unlock_shared(src);
>  		error = break_layout(dest, true);
>  		if (error)
>  			return error;
> @@ -1280,11 +1283,11 @@ xfs_iolock_two_inodes_and_break_layout(
>  	if (error) {
>  		inode_unlock(dest);
>  		if (src < dest)
> -			inode_unlock(src);
> +			inode_unlock_shared(src);
>  		return error;
>  	}
>  	if (src > dest)
> -		inode_lock_nested(src, I_MUTEX_NONDIR2);
> +		inode_lock_shared_nested(src, I_MUTEX_NONDIR2);
>  	return 0;
>  }
>  
> @@ -1324,7 +1327,7 @@ xfs_reflink_remap_range(
>  	if (same_inode)
>  		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
>  	else
> -		xfs_lock_two_inodes(src, XFS_MMAPLOCK_EXCL, dest,
> +		xfs_lock_two_inodes(src, XFS_MMAPLOCK_SHARED, dest,
>  				XFS_MMAPLOCK_EXCL);
>  
>  	/* Check file eligibility and prepare for block sharing. */
> @@ -1393,10 +1396,12 @@ xfs_reflink_remap_range(
>  			is_dedupe);
>  
>  out_unlock:
> -	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
> +	xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
> +	if (!same_inode)
> +		xfs_iunlock(src, XFS_MMAPLOCK_SHARED);
> +	inode_unlock(inode_out);
>  	if (!same_inode)
> -		xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
> -	unlock_two_nondirectories(inode_in, inode_out);
> +		inode_unlock_shared(inode_in);
>  	if (ret)
>  		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
>  	return ret;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7f8d96d..5cbeab8 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -748,6 +748,11 @@ static inline void inode_lock_nested(struct inode *inode, unsigned subclass)
>  	down_write_nested(&inode->i_rwsem, subclass);
>  }
>  
> +static inline void inode_lock_shared_nested(struct inode *inode, unsigned subclass)
> +{
> +	down_read_nested(&inode->i_rwsem, subclass);
> +}
> +
>  void lock_two_nondirectories(struct inode *, struct inode*);
>  void unlock_two_nondirectories(struct inode *, struct inode*);
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/7] xfs: treat CoW fork operations as delalloc for quota accounting
  2018-01-27  1:10 ` [PATCH 4/7] xfs: treat CoW fork operations as delalloc for quota accounting Darrick J. Wong
@ 2018-01-27  7:33   ` Christoph Hellwig
  2018-01-29 12:26   ` Brian Foster
  1 sibling, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-01-27  7:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, bfoster, linux-xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] iomap: warn on zero-length mappings
  2018-01-27  1:10 ` [PATCH 5/7] iomap: warn on zero-length mappings Darrick J. Wong
@ 2018-01-27  7:34   ` Christoph Hellwig
  2018-01-27 18:04     ` Darrick J. Wong
  2018-01-27 18:08   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-01-27  7:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: bfoster, linux-xfs

> -	if (WARN_ON(iomap.offset > pos))
> +	if (WARN_ON(iomap.offset > pos) || WARN_ON(iomap.length == 0))

Btw, shouldn't this be one WARN_ON with both conditions inside?
It's not like we can spot which one it was based on the output
anyway.

Alternatively just have two conditionals, which allows us to spot
what went wrong.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/7] xfs: check reflink allocation mappings
  2018-01-27  1:10 ` [PATCH 6/7] xfs: check reflink allocation mappings Darrick J. Wong
@ 2018-01-27  7:35   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-01-27  7:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, bfoster, linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] iomap: warn on zero-length mappings
  2018-01-27  7:34   ` Christoph Hellwig
@ 2018-01-27 18:04     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-27 18:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: bfoster, linux-xfs

On Fri, Jan 26, 2018 at 11:34:50PM -0800, Christoph Hellwig wrote:
> > -	if (WARN_ON(iomap.offset > pos))
> > +	if (WARN_ON(iomap.offset > pos) || WARN_ON(iomap.length == 0))
> 
> Btw, shouldn't this be one WARN_ON with both conditions inside?
> It's not like we can spot which one it was based on the output
> anyway.
> 
> Alternatively just have two conditionals, which allows us to spot
> what went wrong.

Ah, right, I forgot that WARN_ON doesn't print the failing condition
unlike our ASSERT.

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 5/7] iomap: warn on zero-length mappings
  2018-01-27  1:10 ` [PATCH 5/7] iomap: warn on zero-length mappings Darrick J. Wong
  2018-01-27  7:34   ` Christoph Hellwig
@ 2018-01-27 18:08   ` Darrick J. Wong
  1 sibling, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-27 18:08 UTC (permalink / raw)
  To: hch, bfoster; +Cc: linux-xfs, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

Don't let the iomap callback get away with feeding us a garbage zero
length mapping -- there was a bug in xfs that resulted in those leaking
out to hilarious effect.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index e5de772..afd1635 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -65,6 +65,8 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 		return ret;
 	if (WARN_ON(iomap.offset > pos))
 		return -EIO;
+	if (WARN_ON(iomap.length == 0))
+		return -EIO;
 
 	/*
 	 * Cut down the length to the one actually provided by the filesystem,

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/7] xfs: treat CoW fork operations as delalloc for quota accounting
  2018-01-27  1:10 ` [PATCH 4/7] xfs: treat CoW fork operations as delalloc for quota accounting Darrick J. Wong
  2018-01-27  7:33   ` Christoph Hellwig
@ 2018-01-29 12:26   ` Brian Foster
  2018-01-29 23:00     ` Darrick J. Wong
  1 sibling, 1 reply; 19+ messages in thread
From: Brian Foster @ 2018-01-29 12:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

On Fri, Jan 26, 2018 at 05:10:34PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Since the CoW fork only exists in memory, it is incorrect to update the
> on-disk quota block counts when we modify the CoW fork.  Unlike the data
> fork, even real extents in the CoW fork are only delalloc-style
> reservations (on-disk they're owned by the refcountbt) so they must not
> be tracked in the on disk quota info.  Ensure the i_delayed_blks
> accounting reflects this too.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Patch doesn't apply.. I'm guessing because xfs_bmap_btalloc_accounting()
doesn't exist. Is this series incomplete, or perhaps not consistent with
for-next..?

Brian

>  fs/xfs/libxfs/xfs_bmap.c |   32 ++++++++++++++++++++++++++++++--
>  fs/xfs/xfs_reflink.c     |   15 +++++++++++----
>  2 files changed, 41 insertions(+), 6 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 4582f55..cad21fd 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3343,8 +3343,35 @@ xfs_bmap_btalloc_accounting(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args)
>  {
> -	if (!(ap->flags & XFS_BMAPI_COWFORK))
> -		ap->ip->i_d.di_nblocks += args->len;
> +	if (ap->flags & XFS_BMAPI_COWFORK) {
> +		/*
> +		 * COW fork blocks are in-core only and thus are treated as
> +		 * in-core quota reservation (like delalloc blocks) even when
> +		 * converted to real blocks. The quota reservation is not
> +		 * accounted to disk until blocks are remapped to the data
> +		 * fork. So if these blocks were previously delalloc, we
> +		 * already have quota reservation and there's nothing to do
> +		 * yet.
> +		 */
> +		if (ap->wasdel)
> +			return;
> +
> +		/*
> +		 * Otherwise, we've allocated blocks in a hole. The transaction
> +		 * has acquired in-core quota reservation for this extent.
> +		 * Rather than account these as real blocks, however, we reduce
> +		 * the transaction quota reservation based on the allocation.
> +		 * This essentially transfers the transaction quota reservation
> +		 * to that of a delalloc extent.
> +		 */
> +		ap->ip->i_delayed_blks += args->len;
> +		xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
> +				-(long)args->len);
> +		return;
> +	}
> +
> +	/* data/attr fork only */
> +	ap->ip->i_d.di_nblocks += args->len;
>  	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
>  	if (ap->wasdel)
>  		ap->ip->i_delayed_blks -= args->len;
> @@ -4820,6 +4847,7 @@ xfs_bmap_del_extent_cow(
>  		xfs_iext_insert(ip, icur, &new, state);
>  		break;
>  	}
> +	ip->i_delayed_blks -= del->br_blockcount;
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 85a119e..c4f0aff 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
>  					del.br_startblock, del.br_blockcount,
>  					NULL);
>  
> -			/* Update quota accounting */
> -			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
> -					-(long)del.br_blockcount);
> -
>  			/* Roll the transaction */
>  			xfs_defer_ijoin(&dfops, ip);
>  			error = xfs_defer_finish(tpp, &dfops);
> @@ -613,6 +609,13 @@ xfs_reflink_cancel_cow_blocks(
>  
>  			/* Remove the mapping from the CoW fork. */
>  			xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> +
> +			/* Remove the quota reservation */
> +			error = xfs_trans_reserve_quota_nblks(NULL, ip,
> +					-(long)del.br_blockcount, 0,
> +					XFS_QMOPT_RES_REGBLKS);
> +			if (error)
> +				break;
>  		} else {
>  			/* Didn't do anything, push cursor back. */
>  			xfs_iext_prev(ifp, &icur);
> @@ -795,6 +798,10 @@ xfs_reflink_end_cow(
>  		if (error)
>  			goto out_defer;
>  
> +		/* Charge this new data fork mapping to the on-disk quota. */
> +		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
> +				(long)del.br_blockcount);
> +
>  		/* Remove the mapping from the CoW fork. */
>  		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/7] xfs: treat CoW fork operations as delalloc for quota accounting
  2018-01-29 12:26   ` Brian Foster
@ 2018-01-29 23:00     ` Darrick J. Wong
  2018-01-30 11:48       ` Brian Foster
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2018-01-29 23:00 UTC (permalink / raw)
  To: Brian Foster; +Cc: hch, linux-xfs

On Mon, Jan 29, 2018 at 07:26:38AM -0500, Brian Foster wrote:
> On Fri, Jan 26, 2018 at 05:10:34PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Since the CoW fork only exists in memory, it is incorrect to update the
> > on-disk quota block counts when we modify the CoW fork.  Unlike the data
> > fork, even real extents in the CoW fork are only delalloc-style
> > reservations (on-disk they're owned by the refcountbt) so they must not
> > be tracked in the on disk quota info.  Ensure the i_delayed_blks
> > accounting reflects this too.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> Patch doesn't apply.. I'm guessing because xfs_bmap_btalloc_accounting()
> doesn't exist. Is this series incomplete, or perhaps not consistent with
> for-next..?

Yeah, sorry, I think I forgot to send the middle patch when I sent this
series....

...no, wait, "xfs: refactor accounting updates out of xfs_bmap_btalloc"
was already reviewed, so these last few patches were supposed to apply
atop that, and that went on top of for-next.

--D

> 
> Brian
> 
> >  fs/xfs/libxfs/xfs_bmap.c |   32 ++++++++++++++++++++++++++++++--
> >  fs/xfs/xfs_reflink.c     |   15 +++++++++++----
> >  2 files changed, 41 insertions(+), 6 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 4582f55..cad21fd 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -3343,8 +3343,35 @@ xfs_bmap_btalloc_accounting(
> >  	struct xfs_bmalloca	*ap,
> >  	struct xfs_alloc_arg	*args)
> >  {
> > -	if (!(ap->flags & XFS_BMAPI_COWFORK))
> > -		ap->ip->i_d.di_nblocks += args->len;
> > +	if (ap->flags & XFS_BMAPI_COWFORK) {
> > +		/*
> > +		 * COW fork blocks are in-core only and thus are treated as
> > +		 * in-core quota reservation (like delalloc blocks) even when
> > +		 * converted to real blocks. The quota reservation is not
> > +		 * accounted to disk until blocks are remapped to the data
> > +		 * fork. So if these blocks were previously delalloc, we
> > +		 * already have quota reservation and there's nothing to do
> > +		 * yet.
> > +		 */
> > +		if (ap->wasdel)
> > +			return;
> > +
> > +		/*
> > +		 * Otherwise, we've allocated blocks in a hole. The transaction
> > +		 * has acquired in-core quota reservation for this extent.
> > +		 * Rather than account these as real blocks, however, we reduce
> > +		 * the transaction quota reservation based on the allocation.
> > +		 * This essentially transfers the transaction quota reservation
> > +		 * to that of a delalloc extent.
> > +		 */
> > +		ap->ip->i_delayed_blks += args->len;
> > +		xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
> > +				-(long)args->len);
> > +		return;
> > +	}
> > +
> > +	/* data/attr fork only */
> > +	ap->ip->i_d.di_nblocks += args->len;
> >  	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> >  	if (ap->wasdel)
> >  		ap->ip->i_delayed_blks -= args->len;
> > @@ -4820,6 +4847,7 @@ xfs_bmap_del_extent_cow(
> >  		xfs_iext_insert(ip, icur, &new, state);
> >  		break;
> >  	}
> > +	ip->i_delayed_blks -= del->br_blockcount;
> >  }
> >  
> >  /*
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 85a119e..c4f0aff 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
> >  					del.br_startblock, del.br_blockcount,
> >  					NULL);
> >  
> > -			/* Update quota accounting */
> > -			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
> > -					-(long)del.br_blockcount);
> > -
> >  			/* Roll the transaction */
> >  			xfs_defer_ijoin(&dfops, ip);
> >  			error = xfs_defer_finish(tpp, &dfops);
> > @@ -613,6 +609,13 @@ xfs_reflink_cancel_cow_blocks(
> >  
> >  			/* Remove the mapping from the CoW fork. */
> >  			xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> > +
> > +			/* Remove the quota reservation */
> > +			error = xfs_trans_reserve_quota_nblks(NULL, ip,
> > +					-(long)del.br_blockcount, 0,
> > +					XFS_QMOPT_RES_REGBLKS);
> > +			if (error)
> > +				break;
> >  		} else {
> >  			/* Didn't do anything, push cursor back. */
> >  			xfs_iext_prev(ifp, &icur);
> > @@ -795,6 +798,10 @@ xfs_reflink_end_cow(
> >  		if (error)
> >  			goto out_defer;
> >  
> > +		/* Charge this new data fork mapping to the on-disk quota. */
> > +		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
> > +				(long)del.br_blockcount);
> > +
> >  		/* Remove the mapping from the CoW fork. */
> >  		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> >  
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/7] xfs: treat CoW fork operations as delalloc for quota accounting
  2018-01-29 23:00     ` Darrick J. Wong
@ 2018-01-30 11:48       ` Brian Foster
  0 siblings, 0 replies; 19+ messages in thread
From: Brian Foster @ 2018-01-30 11:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

On Mon, Jan 29, 2018 at 03:00:04PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 29, 2018 at 07:26:38AM -0500, Brian Foster wrote:
> > On Fri, Jan 26, 2018 at 05:10:34PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Since the CoW fork only exists in memory, it is incorrect to update the
> > > on-disk quota block counts when we modify the CoW fork.  Unlike the data
> > > fork, even real extents in the CoW fork are only delalloc-style
> > > reservations (on-disk they're owned by the refcountbt) so they must not
> > > be tracked in the on disk quota info.  Ensure the i_delayed_blks
> > > accounting reflects this too.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > 
> > Patch doesn't apply.. I'm guessing because xfs_bmap_btalloc_accounting()
> > doesn't exist. Is this series incomplete, or perhaps not consistent with
> > for-next..?
> 
> Yeah, sorry, I think I forgot to send the middle patch when I sent this
> series....
> 
> ...no, wait, "xfs: refactor accounting updates out of xfs_bmap_btalloc"
> was already reviewed, so these last few patches were supposed to apply
> atop that, and that went on top of for-next.
> 

Hmm.. I think I saw that a day or two ago, but then for-next moved
forward/backward and when I tried to apply this series in full, the head
of for-next was commit 75d4a13b ("xfs: fix non-debug build compiler
warnings"), which is behind the original refactoring patch. :/

Annnyways.. it looks like these are all merged now as of this morning,
so moving on..

Brian

> --D
> 
> > 
> > Brian
> > 
> > >  fs/xfs/libxfs/xfs_bmap.c |   32 ++++++++++++++++++++++++++++++--
> > >  fs/xfs/xfs_reflink.c     |   15 +++++++++++----
> > >  2 files changed, 41 insertions(+), 6 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > > index 4582f55..cad21fd 100644
> > > --- a/fs/xfs/libxfs/xfs_bmap.c
> > > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > > @@ -3343,8 +3343,35 @@ xfs_bmap_btalloc_accounting(
> > >  	struct xfs_bmalloca	*ap,
> > >  	struct xfs_alloc_arg	*args)
> > >  {
> > > -	if (!(ap->flags & XFS_BMAPI_COWFORK))
> > > -		ap->ip->i_d.di_nblocks += args->len;
> > > +	if (ap->flags & XFS_BMAPI_COWFORK) {
> > > +		/*
> > > +		 * COW fork blocks are in-core only and thus are treated as
> > > +		 * in-core quota reservation (like delalloc blocks) even when
> > > +		 * converted to real blocks. The quota reservation is not
> > > +		 * accounted to disk until blocks are remapped to the data
> > > +		 * fork. So if these blocks were previously delalloc, we
> > > +		 * already have quota reservation and there's nothing to do
> > > +		 * yet.
> > > +		 */
> > > +		if (ap->wasdel)
> > > +			return;
> > > +
> > > +		/*
> > > +		 * Otherwise, we've allocated blocks in a hole. The transaction
> > > +		 * has acquired in-core quota reservation for this extent.
> > > +		 * Rather than account these as real blocks, however, we reduce
> > > +		 * the transaction quota reservation based on the allocation.
> > > +		 * This essentially transfers the transaction quota reservation
> > > +		 * to that of a delalloc extent.
> > > +		 */
> > > +		ap->ip->i_delayed_blks += args->len;
> > > +		xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
> > > +				-(long)args->len);
> > > +		return;
> > > +	}
> > > +
> > > +	/* data/attr fork only */
> > > +	ap->ip->i_d.di_nblocks += args->len;
> > >  	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> > >  	if (ap->wasdel)
> > >  		ap->ip->i_delayed_blks -= args->len;
> > > @@ -4820,6 +4847,7 @@ xfs_bmap_del_extent_cow(
> > >  		xfs_iext_insert(ip, icur, &new, state);
> > >  		break;
> > >  	}
> > > +	ip->i_delayed_blks -= del->br_blockcount;
> > >  }
> > >  
> > >  /*
> > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > > index 85a119e..c4f0aff 100644
> > > --- a/fs/xfs/xfs_reflink.c
> > > +++ b/fs/xfs/xfs_reflink.c
> > > @@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
> > >  					del.br_startblock, del.br_blockcount,
> > >  					NULL);
> > >  
> > > -			/* Update quota accounting */
> > > -			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
> > > -					-(long)del.br_blockcount);
> > > -
> > >  			/* Roll the transaction */
> > >  			xfs_defer_ijoin(&dfops, ip);
> > >  			error = xfs_defer_finish(tpp, &dfops);
> > > @@ -613,6 +609,13 @@ xfs_reflink_cancel_cow_blocks(
> > >  
> > >  			/* Remove the mapping from the CoW fork. */
> > >  			xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> > > +
> > > +			/* Remove the quota reservation */
> > > +			error = xfs_trans_reserve_quota_nblks(NULL, ip,
> > > +					-(long)del.br_blockcount, 0,
> > > +					XFS_QMOPT_RES_REGBLKS);
> > > +			if (error)
> > > +				break;
> > >  		} else {
> > >  			/* Didn't do anything, push cursor back. */
> > >  			xfs_iext_prev(ifp, &icur);
> > > @@ -795,6 +798,10 @@ xfs_reflink_end_cow(
> > >  		if (error)
> > >  			goto out_defer;
> > >  
> > > +		/* Charge this new data fork mapping to the on-disk quota. */
> > > +		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
> > > +				(long)del.br_blockcount);
> > > +
> > >  		/* Remove the mapping from the CoW fork. */
> > >  		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> > >  
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-01-30 11:48 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-27  1:10 [PATCH v3 0/7] xfs: reflink/scrub/quota fixes Darrick J. Wong
2018-01-27  1:10 ` [PATCH 1/7] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
2018-01-27  7:31   ` Christoph Hellwig
2018-01-27  1:10 ` [PATCH 2/7] xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes Darrick J. Wong
2018-01-27  7:32   ` Christoph Hellwig
2018-01-27  1:10 ` [PATCH 3/7] xfs: only grab shared inode locks for source file during reflink Darrick J. Wong
2018-01-27  7:33   ` Christoph Hellwig
2018-01-27  1:10 ` [PATCH 4/7] xfs: treat CoW fork operations as delalloc for quota accounting Darrick J. Wong
2018-01-27  7:33   ` Christoph Hellwig
2018-01-29 12:26   ` Brian Foster
2018-01-29 23:00     ` Darrick J. Wong
2018-01-30 11:48       ` Brian Foster
2018-01-27  1:10 ` [PATCH 5/7] iomap: warn on zero-length mappings Darrick J. Wong
2018-01-27  7:34   ` Christoph Hellwig
2018-01-27 18:04     ` Darrick J. Wong
2018-01-27 18:08   ` [PATCH v2 " Darrick J. Wong
2018-01-27  1:10 ` [PATCH 6/7] xfs: check reflink allocation mappings Darrick J. Wong
2018-01-27  7:35   ` Christoph Hellwig
2018-01-27  1:10 ` [PATCH 7/7] xfs: don't screw up direct writes when freesp is fragmented Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).