All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] xfs: reflink/scrub/quota fixes
@ 2018-01-26  2:04 Darrick J. Wong
  2018-01-26  2:04 ` [PATCH 1/6] xfs: refactor accounting updates out of xfs_bmap_btalloc Darrick J. Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Darrick J. Wong @ 2018-01-26  2:04 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This is a rollup of all the remaining patches I've sent in the past 9
days or so.  If all goes well I hope to land this during the 4.16 merge.

Running generic/232 with quotas and reflink demonstrated that there was
something wrong with the way we did quota accounting -- on an otherwise
idle system, fs-wide du block count numbers didn't match the quota
reports.  I started digging into why the quota accounting was wrong, and
the following are the results of my bug hunt.

Patches 1-2 reorganize the copy on write quota updating code to reflect
how the CoW fork works now.  In short, the CoW fork is entirely in
memory, so we can only use the in-memory quota reservation counters for
all CoW blocks; the accounting only becomes permanent if we remap an
extent into the data fork.

Patch 3 creates a separate i_cow_blocks counter to track all the CoW
blocks assigned to a file, which makes changing a file's uid/gid/prjid
easier, makes reporting cow blocks via stat easy, and enables various
cleanups.

Patch 4 fixes a serious corruption problem with the cow extent
allocation -- when we allocate into the CoW fork with the cow extent
size hint set, the allocator enlarges the allocation request to try to
hit alignment goals.  However, if the allocated extent does not actually
fulfill any of the requested range, we send a garbage zero-length extent
back to the iomap code (which also doesn't notice), and the write lands
at the startblock of the garbage extent.  The fix is to detect that we
didn't fill the entire requested range and fix up the returned mapping
so that we always fill the first block of the requested allocation and
for CoW to check the validity returned mappings before blowing things
up.  This also fixes a problem where a direct io overwrite can return
ENOSPC to userspace because of the same fragmentation problem.

The 5th patch fixes a hard to reproduce race between truncate and
buffered CoW writeback.  Writeback doesn't seem to take i_rwsem (aka the
iolock) when it's gathering io requests, so it races with a
truncate-down, which starts by reducing i_size before moving on to
removing the pages from the page cache.  If the writeback was to a
shared block and we "win" the race, we'll return the CoW mapping but
it'll be deemed invalid because it's beyond EOF.  We then blow an assert
when we try to go to the data fork for a valid mapping (because we
should never do that!) though even any data fork mapping will also be
invalidated and we ultimately don't write anything.

The 6th patch fixes a NULL pointer deref because we incorrectly
freed the inode btree cursor if there's an error while counting the
blocks in the inode btree for rmapbt cross-referencing.

Anyway, with this set applied I think we're ready to remove the reflink
EXPERIMENTAL tag during the 4.16 cycle.

--D

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/6] xfs: refactor accounting updates out of xfs_bmap_btalloc
  2018-01-26  2:04 [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Darrick J. Wong
@ 2018-01-26  2:04 ` Darrick J. Wong
  2018-01-26 12:19   ` Christoph Hellwig
  2018-01-26 19:06   ` Brian Foster
  2018-01-26  2:05 ` [PATCH 2/6] xfs: CoW fork operations should only update quota reservations Darrick J. Wong
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 15+ messages in thread
From: Darrick J. Wong @ 2018-01-26  2:04 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Move all the inode and quota accounting updates out of xfs_bmap_btalloc
in preparation for fixing some quota accounting problems with copy on
write.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |   34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 0c9c9cd..6ad79ea 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3337,6 +3337,26 @@ xfs_bmap_btalloc_filestreams(
 	return 0;
 }
 
+/* Update all inode and quota accounting for the allocation we just did. */
+static void
+xfs_bmap_btalloc_accounting(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args)
+{
+	if (!(ap->flags & XFS_BMAPI_COWFORK))
+		ap->ip->i_d.di_nblocks += args->len;
+	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
+	if (ap->wasdel)
+		ap->ip->i_delayed_blks -= args->len;
+	/*
+	 * Adjust the disk quota also. This was reserved
+	 * earlier.
+	 */
+	xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
+		ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT : XFS_TRANS_DQ_BCOUNT,
+		(long) args->len);
+}
+
 STATIC int
 xfs_bmap_btalloc(
 	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
@@ -3571,19 +3591,7 @@ xfs_bmap_btalloc(
 			*ap->firstblock = args.fsbno;
 		ASSERT(nullfb || fb_agno <= args.agno);
 		ap->length = args.len;
-		if (!(ap->flags & XFS_BMAPI_COWFORK))
-			ap->ip->i_d.di_nblocks += args.len;
-		xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
-		if (ap->wasdel)
-			ap->ip->i_delayed_blks -= args.len;
-		/*
-		 * Adjust the disk quota also. This was reserved
-		 * earlier.
-		 */
-		xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
-			ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT :
-					XFS_TRANS_DQ_BCOUNT,
-			(long) args.len);
+		xfs_bmap_btalloc_accounting(ap, &args);
 	} else {
 		ap->blkno = NULLFSBLOCK;
 		ap->length = 0;


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/6] xfs: CoW fork operations should only update quota reservations
  2018-01-26  2:04 [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Darrick J. Wong
  2018-01-26  2:04 ` [PATCH 1/6] xfs: refactor accounting updates out of xfs_bmap_btalloc Darrick J. Wong
@ 2018-01-26  2:05 ` Darrick J. Wong
  2018-01-26 19:06   ` Brian Foster
  2018-01-26  2:05 ` [PATCH 3/6] xfs: track CoW blocks separately in the inode Darrick J. Wong
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2018-01-26  2:05 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Since the CoW fork only exists in memory, it is incorrect to update the
on-disk quota block counts when we modify the CoW fork.  Unlike the data
fork, even real extents in the CoW fork are only reservations (on-disk
they're owned by the refcountbt) so they must not be tracked in the on
disk quota info.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |   33 ++++++++++++++++++++++++++++++---
 fs/xfs/libxfs/xfs_bmap.h |    2 +-
 fs/xfs/xfs_reflink.c     |   12 ++++++------
 3 files changed, 37 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6ad79ea..a59d5be 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3343,8 +3343,26 @@ xfs_bmap_btalloc_accounting(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args)
 {
-	if (!(ap->flags & XFS_BMAPI_COWFORK))
-		ap->ip->i_d.di_nblocks += args->len;
+	if (ap->flags & XFS_BMAPI_COWFORK) {
+		/* Filling a previously reserved extent; nothing to do here. */
+		if (ap->wasdel)
+			return;
+
+		/*
+		 * If we get here, we're filling a CoW hole with a real
+		 * (non-delalloc) CoW extent having reserved enough blocks
+		 * from both q_res_bcount and qt_blk_res to guarantee that we
+		 * won't run out of space.  The unused qt_blk_res is given
+		 * back to q_res_bcount when the transaction commits, so we
+		 * must decrease qt_blk_res without decreasing q_res_bcount.
+		 */
+		xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
+				-(long)args->len);
+		return;
+	}
+
+	/* data/attr fork only */
+	ap->ip->i_d.di_nblocks += args->len;
 	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
 	if (ap->wasdel)
 		ap->ip->i_delayed_blks -= args->len;
@@ -4761,13 +4779,15 @@ xfs_bmap_del_extent_cow(
 	struct xfs_inode	*ip,
 	struct xfs_iext_cursor	*icur,
 	struct xfs_bmbt_irec	*got,
-	struct xfs_bmbt_irec	*del)
+	struct xfs_bmbt_irec	*del,
+	bool			free_quotares)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
 	struct xfs_bmbt_irec	new;
 	xfs_fileoff_t		del_endoff, got_endoff;
 	int			state = BMAP_COWFORK;
+	int			error;
 
 	XFS_STATS_INC(mp, xs_del_exlist);
 
@@ -4824,6 +4844,13 @@ xfs_bmap_del_extent_cow(
 		xfs_iext_insert(ip, icur, &new, state);
 		break;
 	}
+
+	/* Remove the quota reservation */
+	if (!free_quotares)
+		return;
+	error = xfs_trans_reserve_quota_nblks(NULL, ip,
+			-(long)del->br_blockcount, 0, XFS_QMOPT_RES_REGBLKS);
+	ASSERT(error == 0);
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index e36d757..e99f28f 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -224,7 +224,7 @@ int	xfs_bmap_del_extent_delay(struct xfs_inode *ip, int whichfork,
 		struct xfs_bmbt_irec *del);
 void	xfs_bmap_del_extent_cow(struct xfs_inode *ip,
 		struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *got,
-		struct xfs_bmbt_irec *del);
+		struct xfs_bmbt_irec *del, bool free_quotares);
 uint	xfs_default_attroffset(struct xfs_inode *ip);
 int	xfs_bmap_collapse_extents(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t *next_fsb, xfs_fileoff_t offset_shift_fsb,
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 82abff6..3644a08 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
 					del.br_startblock, del.br_blockcount,
 					NULL);
 
-			/* Update quota accounting */
-			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
-					-(long)del.br_blockcount);
-
 			/* Roll the transaction */
 			xfs_defer_ijoin(&dfops, ip);
 			error = xfs_defer_finish(tpp, &dfops);
@@ -612,7 +608,7 @@ xfs_reflink_cancel_cow_blocks(
 			}
 
 			/* Remove the mapping from the CoW fork. */
-			xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
+			xfs_bmap_del_extent_cow(ip, &icur, &got, &del, true);
 		} else {
 			/* Didn't do anything, push cursor back. */
 			xfs_iext_prev(ifp, &icur);
@@ -795,8 +791,12 @@ xfs_reflink_end_cow(
 		if (error)
 			goto out_defer;
 
+		/* Charge this new data fork mapping to the on-disk quota. */
+		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
+				(long)del.br_blockcount);
+
 		/* Remove the mapping from the CoW fork. */
-		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
+		xfs_bmap_del_extent_cow(ip, &icur, &got, &del, false);
 
 		xfs_defer_ijoin(&dfops, ip);
 		error = xfs_defer_finish(&tp, &dfops);


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/6] xfs: track CoW blocks separately in the inode
  2018-01-26  2:04 [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Darrick J. Wong
  2018-01-26  2:04 ` [PATCH 1/6] xfs: refactor accounting updates out of xfs_bmap_btalloc Darrick J. Wong
  2018-01-26  2:05 ` [PATCH 2/6] xfs: CoW fork operations should only update quota reservations Darrick J. Wong
@ 2018-01-26  2:05 ` Darrick J. Wong
  2018-01-26  2:05 ` [PATCH 4/6] xfs: don't screw up direct writes when freesp is fragmented Darrick J. Wong
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2018-01-26  2:05 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Track the number of blocks reserved in the CoW fork so that we can
move the quota reservations whenever we chown, and don't account for
CoW fork delalloc reservations in i_delayed_blks.  This should make
chown work properly for quota reservations, enables us to fully
account for real extents in the cow fork in the file stat info, and
improves the post-eof scanning decisions because we're no longer
confusing data fork delalloc extents with cow fork delalloc extents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c      |   12 ++++++++++--
 fs/xfs/libxfs/xfs_inode_buf.c |    1 +
 fs/xfs/xfs_bmap_util.c        |    5 +++++
 fs/xfs/xfs_icache.c           |    3 ++-
 fs/xfs/xfs_inode.c            |   11 +++++------
 fs/xfs/xfs_inode.h            |    1 +
 fs/xfs/xfs_iops.c             |    3 ++-
 fs/xfs/xfs_itable.c           |    3 ++-
 fs/xfs/xfs_qm.c               |    2 +-
 fs/xfs/xfs_reflink.c          |    4 ++--
 fs/xfs/xfs_super.c            |    1 +
 11 files changed, 32 insertions(+), 14 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a59d5be..032befb 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3356,6 +3356,7 @@ xfs_bmap_btalloc_accounting(
 		 * back to q_res_bcount when the transaction commits, so we
 		 * must decrease qt_blk_res without decreasing q_res_bcount.
 		 */
+		ap->ip->i_cow_blocks += args->len;
 		xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
 				-(long)args->len);
 		return;
@@ -3954,7 +3955,10 @@ xfs_bmapi_reserve_delalloc(
 		goto out_unreserve_blocks;
 
 
-	ip->i_delayed_blks += alen;
+	if (whichfork == XFS_COW_FORK)
+		ip->i_cow_blocks += alen;
+	else
+		ip->i_delayed_blks += alen;
 
 	got->br_startoff = aoff;
 	got->br_startblock = nullstartblock(indlen);
@@ -4694,7 +4698,10 @@ xfs_bmap_del_extent_delay(
 			isrt ? XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS);
 	if (error)
 		return error;
-	ip->i_delayed_blks -= del->br_blockcount;
+	if (whichfork == XFS_COW_FORK)
+		ip->i_cow_blocks -= del->br_blockcount;
+	else
+		ip->i_delayed_blks -= del->br_blockcount;
 
 	if (got->br_startoff == del->br_startoff)
 		state |= BMAP_LEFT_FILLING;
@@ -4846,6 +4853,7 @@ xfs_bmap_del_extent_cow(
 	}
 
 	/* Remove the quota reservation */
+	ip->i_cow_blocks -= del->br_blockcount;
 	if (!free_quotares)
 		return;
 	error = xfs_trans_reserve_quota_nblks(NULL, ip,
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index d7e7e58..d73ded3 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -622,6 +622,7 @@ xfs_iread(
 
 	ASSERT(ip->i_d.di_version >= 2);
 	ip->i_delayed_blks = 0;
+	ip->i_cow_blocks = 0;
 
 	/*
 	 * Mark the buffer containing the inode as something to keep
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6d37ab4..c572789 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1991,6 +1991,7 @@ xfs_swap_extents(
 	/* Swap the cow forks. */
 	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
 		xfs_extnum_t	extnum;
+		unsigned int	cowblocks;
 
 		ASSERT(ip->i_cformat == XFS_DINODE_FMT_EXTENTS);
 		ASSERT(tip->i_cformat == XFS_DINODE_FMT_EXTENTS);
@@ -2011,6 +2012,10 @@ xfs_swap_extents(
 			xfs_inode_set_cowblocks_tag(tip);
 		else
 			xfs_inode_clear_cowblocks_tag(tip);
+
+		cowblocks = tip->i_cow_blocks;
+		tip->i_cow_blocks = ip->i_cow_blocks;
+		ip->i_cow_blocks = cowblocks;
 	}
 
 	xfs_trans_log_inode(tp, ip,  src_log_flags);
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 2da7a2e..1344206 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -80,6 +80,7 @@ xfs_inode_alloc(
 	memset(&ip->i_df, 0, sizeof(xfs_ifork_t));
 	ip->i_flags = 0;
 	ip->i_delayed_blks = 0;
+	ip->i_cow_blocks = 0;
 	memset(&ip->i_d, 0, sizeof(ip->i_d));
 
 	return ip;
@@ -1668,7 +1669,7 @@ xfs_prep_free_cowblocks(
 	 * Just clear the tag if we have an empty cow fork or none at all. It's
 	 * possible the inode was fully unshared since it was originally tagged.
 	 */
-	if (!xfs_is_reflink_inode(ip) || !ifp->if_bytes) {
+	if (!xfs_is_reflink_inode(ip) || ip->i_cow_blocks == 0) {
 		trace_xfs_inode_free_cowblocks_invalid(ip);
 		xfs_inode_clear_cowblocks_tag(ip);
 		return false;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f8e8802..57ebc60 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1508,15 +1508,13 @@ xfs_itruncate_clear_reflink_flags(
 	struct xfs_inode	*ip)
 {
 	struct xfs_ifork	*dfork;
-	struct xfs_ifork	*cfork;
 
 	if (!xfs_is_reflink_inode(ip))
 		return;
 	dfork = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
-	cfork = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	if (dfork->if_bytes == 0 && cfork->if_bytes == 0)
+	if (dfork->if_bytes == 0 && ip->i_cow_blocks == 0)
 		ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
-	if (cfork->if_bytes == 0)
+	if (ip->i_cow_blocks == 0)
 		xfs_inode_clear_cowblocks_tag(ip);
 }
 
@@ -1669,7 +1667,7 @@ xfs_release(
 		truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
 		if (truncated) {
 			xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
-			if (ip->i_delayed_blks > 0) {
+			if (ip->i_delayed_blks > 0 || ip->i_cow_blocks > 0) {
 				error = filemap_flush(VFS_I(ip)->i_mapping);
 				if (error)
 					return error;
@@ -1909,7 +1907,8 @@ xfs_inactive(
 
 	if (S_ISREG(VFS_I(ip)->i_mode) &&
 	    (ip->i_d.di_size != 0 || XFS_ISIZE(ip) != 0 ||
-	     ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0))
+	     ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0 ||
+	     ip->i_cow_blocks > 0))
 		truncate = 1;
 
 	error = xfs_qm_dqattach(ip, 0);
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index ff56486..6feee8a 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -62,6 +62,7 @@ typedef struct xfs_inode {
 	/* Miscellaneous state. */
 	unsigned long		i_flags;	/* see defined flags below */
 	unsigned int		i_delayed_blks;	/* count of delay alloc blks */
+	unsigned int		i_cow_blocks;	/* count of cow fork blocks */
 
 	struct xfs_icdinode	i_d;		/* most of ondisk inode */
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 56475fc..6c3381c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -513,7 +513,8 @@ xfs_vn_getattr(
 	stat->mtime = inode->i_mtime;
 	stat->ctime = inode->i_ctime;
 	stat->blocks =
-		XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
+		XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks +
+				  ip->i_cow_blocks);
 
 	if (ip->i_d.di_version == 3) {
 		if (request_mask & STATX_BTIME) {
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index d583105..412d7eb 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -122,7 +122,8 @@ xfs_bulkstat_one_int(
 	case XFS_DINODE_FMT_BTREE:
 		buf->bs_rdev = 0;
 		buf->bs_blksize = mp->m_sb.sb_blocksize;
-		buf->bs_blocks = dic->di_nblocks + ip->i_delayed_blks;
+		buf->bs_blocks = dic->di_nblocks + ip->i_delayed_blks +
+				 ip->i_cow_blocks;
 		break;
 	}
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 5b848f4..28f12f8 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1847,7 +1847,7 @@ xfs_qm_vop_chown_reserve(
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
 	ASSERT(XFS_IS_QUOTA_RUNNING(mp));
 
-	delblks = ip->i_delayed_blks;
+	delblks = ip->i_delayed_blks + ip->i_cow_blocks;
 	blkflags = XFS_IS_REALTIME_INODE(ip) ?
 			XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS;
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 3644a08..9a6c545 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -619,7 +619,7 @@ xfs_reflink_cancel_cow_blocks(
 	}
 
 	/* clear tag if cow fork is emptied */
-	if (!ifp->if_bytes)
+	if (ip->i_cow_blocks == 0)
 		xfs_inode_clear_cowblocks_tag(ip);
 
 	return error;
@@ -704,7 +704,7 @@ xfs_reflink_end_cow(
 	trace_xfs_reflink_end_cow(ip, offset, count);
 
 	/* No COW extents?  That's easy! */
-	if (ifp->if_bytes == 0)
+	if (ip->i_cow_blocks == 0)
 		return 0;
 
 	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f3e0001..9d04cfb 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -989,6 +989,7 @@ xfs_fs_destroy_inode(
 	xfs_inactive(ip);
 
 	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0);
+	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_cow_blocks == 0);
 	XFS_STATS_INC(ip->i_mount, vn_reclaim);
 
 	/*


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/6] xfs: don't screw up direct writes when freesp is fragmented
  2018-01-26  2:04 [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Darrick J. Wong
                   ` (2 preceding siblings ...)
  2018-01-26  2:05 ` [PATCH 3/6] xfs: track CoW blocks separately in the inode Darrick J. Wong
@ 2018-01-26  2:05 ` Darrick J. Wong
  2018-01-26 19:24   ` Brian Foster
  2018-01-26  2:05 ` [PATCH 5/6] xfs: skip CoW writes past EOF when writeback races with truncate Darrick J. Wong
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2018-01-26  2:05 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

xfs_bmap_btalloc is given a range of file offset blocks that must be
allocated to some data/attr/cow fork.  If the fork has an extent size
hint associated with it, the request will be enlarged on both ends to
try to satisfy the alignment hint.  If free space is fragmentated,
sometimes we can allocate some blocks but not enough to fulfill any of
the requested range.  Since bmapi_allocate always trims the new extent
mapping to match the originally requested range, this results in
bmapi_write returning zero and no mapping.

The consequences of this vary -- buffered writes will simply re-call
bmapi_write until it can satisfy at least one block from the original
request.  Direct IO overwrites notice nmaps == 0 and return -ENOSPC
through the dio mechanism out to userspace with the weird result that
writes fail even when we have enough space because the ENOSPC return
overrides any partial write status.  For direct CoW writes the situation
is disastrous because nobody notices us returning an invalid zero-length
wrong-offset mapping to iomap and the write goes off into space.

First of all, teach iomap and xfs_reflink_allocate_cow to check the
mappings being returned and bail out with ENOSPC if we can't get any
blocks.  Second of all, if free space is so fragmented we can't satisfy
even a single block of the original allocation request but did get a few
blocks, we should break the alignment hint in order to guarantee at
least some forward progress for the direct write.  If we return a short
allocation to iomap_apply it'll call back about the remaining blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap.c               |    2 +-
 fs/xfs/libxfs/xfs_bmap.c |   15 +++++++++++++++
 fs/xfs/xfs_reflink.c     |    6 ++++++
 3 files changed, 22 insertions(+), 1 deletion(-)


diff --git a/fs/iomap.c b/fs/iomap.c
index e5de772..aec35a0 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -63,7 +63,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
 	if (ret)
 		return ret;
-	if (WARN_ON(iomap.offset > pos))
+	if (WARN_ON(iomap.offset > pos) || WARN_ON(iomap.length == 0))
 		return -EIO;
 
 	/*
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 032befb..1fb885c 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3386,6 +3386,8 @@ xfs_bmap_btalloc(
 	xfs_agnumber_t	fb_agno;	/* ag number of ap->firstblock */
 	xfs_agnumber_t	ag;
 	xfs_alloc_arg_t	args;
+	xfs_fileoff_t	orig_offset;
+	xfs_extlen_t	orig_length;
 	xfs_extlen_t	blen;
 	xfs_extlen_t	nextminlen = 0;
 	int		nullfb;		/* true if ap->firstblock isn't set */
@@ -3395,6 +3397,8 @@ xfs_bmap_btalloc(
 	int		stripe_align;
 
 	ASSERT(ap->length);
+	orig_offset = ap->offset;
+	orig_length = ap->length;
 
 	mp = ap->ip->i_mount;
 
@@ -3610,6 +3614,17 @@ xfs_bmap_btalloc(
 			*ap->firstblock = args.fsbno;
 		ASSERT(nullfb || fb_agno <= args.agno);
 		ap->length = args.len;
+		/*
+		 * If we didn't get enough blocks to fill even one block of
+		 * the original request, break the alignment and return
+		 * whatever we got; it's the best we can do.  Free space is
+		 * fragmented enough that we cannot honor the extent size
+		 * hints but we can still make some forward progress.
+		 */
+		if (ap->length <= orig_length)
+			ap->offset = orig_offset;
+		else if (ap->offset + ap->length < orig_offset + orig_length)
+			ap->offset = orig_offset + orig_length - ap->length;
 		xfs_bmap_btalloc_accounting(ap, &args);
 	} else {
 		ap->blkno = NULLFSBLOCK;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 9a6c545..9eca8aa 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -464,6 +464,12 @@ xfs_reflink_allocate_cow(
 	error = xfs_trans_commit(tp);
 	if (error)
 		return error;
+	/*
+	 * Allocation succeeded but our request was not even partially
+	 * satisfied?  Bail out.
+	 */
+	if (nimaps == 0)
+		return -ENOSPC;
 convert:
 	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb,
 			&dfops);


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/6] xfs: skip CoW writes past EOF when writeback races with truncate
  2018-01-26  2:04 [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Darrick J. Wong
                   ` (3 preceding siblings ...)
  2018-01-26  2:05 ` [PATCH 4/6] xfs: don't screw up direct writes when freesp is fragmented Darrick J. Wong
@ 2018-01-26  2:05 ` Darrick J. Wong
  2018-01-26 12:21   ` Christoph Hellwig
  2018-01-26  2:05 ` [PATCH 6/6] xfs: don't clobber inobt/finobt cursors when xref with rmap Darrick J. Wong
  2018-01-26 12:18 ` [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Christoph Hellwig
  6 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2018-01-26  2:05 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Every so often we blow the ASSERT(type != XFS_IO_COW) in xfs_map_blocks
when running fsstress, as we do in generic/269.  The cause of this is
writeback racing with truncate -- writeback doesn't take the iolock, so
truncate can sneak in to decrease i_size and truncate page cache while
writeback is gathering buffer heads to schedule writeout.

If we hit this race on a block that has a CoW mapping, we'll get a valid
imap from the CoW fork but the reduced i_size trims the mapping to zero
length (which makes it invalid), so we call xfs_map_blocks to try again.
This doesn't do much anyway, since any mapping we get out of that will
also be invalid, so we might as well skip the assert and just stop.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)


diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2e094c7..2ebc42e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -390,6 +390,19 @@ xfs_map_blocks(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	/*
+	 * Apparently truncate can race with writeback since writeback doesn't
+	 * take the iolock and truncate decreases the file size before it
+	 * starts truncating the pages between new_size and old_size.
+	 * Therefore, we can end up in the situation where writeback gets a
+	 * CoW fork mapping but the truncate makes the mapping invalid and we
+	 * end up in here trying to get a new mapping.  Bail out here so that
+	 * we simply never get a valid mapping and so we drop the write
+	 * altogether.  The page truncation will kill the contents anyway.
+	 */
+	if (type == XFS_IO_COW && offset > i_size_read(inode))
+		return 0;
+
 	ASSERT(type != XFS_IO_COW);
 	if (type == XFS_IO_UNWRITTEN)
 		bmapi_flags |= XFS_BMAPI_IGSTATE;


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/6] xfs: don't clobber inobt/finobt cursors when xref with rmap
  2018-01-26  2:04 [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Darrick J. Wong
                   ` (4 preceding siblings ...)
  2018-01-26  2:05 ` [PATCH 5/6] xfs: skip CoW writes past EOF when writeback races with truncate Darrick J. Wong
@ 2018-01-26  2:05 ` Darrick J. Wong
  2018-01-26 12:18 ` [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Christoph Hellwig
  6 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2018-01-26  2:05 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Even if we can't use the inobt/finobt cursors to count the number of
inode btree blocks, we are never allowed to clobber the cursor of the
btree being checked, so don't do this.  Found by fuzzing level = ones
in xfs/364.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/scrub/ialloc.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index 21c850a..63ab3f9 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -391,12 +391,12 @@ xfs_scrub_iallocbt_xref_rmap_btreeblks(
 
 	/* Check that we saw as many inobt blocks as the rmap says. */
 	error = xfs_btree_count_blocks(sc->sa.ino_cur, &inobt_blocks);
-	if (!xfs_scrub_should_check_xref(sc, &error, &sc->sa.ino_cur))
+	if (!xfs_scrub_process_error(sc, 0, 0, &error))
 		return;
 
 	if (sc->sa.fino_cur) {
 		error = xfs_btree_count_blocks(sc->sa.fino_cur, &finobt_blocks);
-		if (!xfs_scrub_should_check_xref(sc, &error, &sc->sa.fino_cur))
+		if (!xfs_scrub_process_error(sc, 0, 0, &error))
 			return;
 	}
 


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/6] xfs: reflink/scrub/quota fixes
  2018-01-26  2:04 [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Darrick J. Wong
                   ` (5 preceding siblings ...)
  2018-01-26  2:05 ` [PATCH 6/6] xfs: don't clobber inobt/finobt cursors when xref with rmap Darrick J. Wong
@ 2018-01-26 12:18 ` Christoph Hellwig
  6 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2018-01-26 12:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Oh, I haven't notice the repost.  But I think most of my
comments on the previous iteration of some of these patches still
apply.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/6] xfs: refactor accounting updates out of xfs_bmap_btalloc
  2018-01-26  2:04 ` [PATCH 1/6] xfs: refactor accounting updates out of xfs_bmap_btalloc Darrick J. Wong
@ 2018-01-26 12:19   ` Christoph Hellwig
  2018-01-26 19:06   ` Brian Foster
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2018-01-26 12:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jan 25, 2018 at 06:04:54PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Move all the inode and quota accounting updates out of xfs_bmap_btalloc
> in preparation for fixing some quota accounting problems with copy on
> write.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c |   34 +++++++++++++++++++++-------------
>  1 file changed, 21 insertions(+), 13 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 0c9c9cd..6ad79ea 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3337,6 +3337,26 @@ xfs_bmap_btalloc_filestreams(
>  	return 0;
>  }
>  
> +/* Update all inode and quota accounting for the allocation we just did. */
> +static void
> +xfs_bmap_btalloc_accounting(
> +	struct xfs_bmalloca	*ap,
> +	struct xfs_alloc_arg	*args)
> +{
> +	if (!(ap->flags & XFS_BMAPI_COWFORK))
> +		ap->ip->i_d.di_nblocks += args->len;
> +	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> +	if (ap->wasdel)
> +		ap->ip->i_delayed_blks -= args->len;
> +	/*
> +	 * Adjust the disk quota also. This was reserved
> +	 * earlier.

Please use the full line length.  Or in fact just remove the comment
as it seems rather confusing given that we can deal with reservations
or not depending on the wasdel status.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] xfs: skip CoW writes past EOF when writeback races with truncate
  2018-01-26  2:05 ` [PATCH 5/6] xfs: skip CoW writes past EOF when writeback races with truncate Darrick J. Wong
@ 2018-01-26 12:21   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2018-01-26 12:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

> +	/*
> +	 * Apparently truncate can race with writeback since writeback doesn't

The apparently here looks rather snarky, I'd remove it.

> +	 * take the iolock and truncate decreases the file size before it
> +	 * starts truncating the pages between new_size and old_size.
> +	 * Therefore, we can end up in the situation where writeback gets a
> +	 * CoW fork mapping but the truncate makes the mapping invalid and we
> +	 * end up in here trying to get a new mapping.  Bail out here so that
> +	 * we simply never get a valid mapping and so we drop the write
> +	 * altogether.  The page truncation will kill the contents anyway.
> +	 */
> +	if (type == XFS_IO_COW && offset > i_size_read(inode))
> +		return 0;

Otherwise this looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/6] xfs: refactor accounting updates out of xfs_bmap_btalloc
  2018-01-26  2:04 ` [PATCH 1/6] xfs: refactor accounting updates out of xfs_bmap_btalloc Darrick J. Wong
  2018-01-26 12:19   ` Christoph Hellwig
@ 2018-01-26 19:06   ` Brian Foster
  1 sibling, 0 replies; 15+ messages in thread
From: Brian Foster @ 2018-01-26 19:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jan 25, 2018 at 06:04:54PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Move all the inode and quota accounting updates out of xfs_bmap_btalloc
> in preparation for fixing some quota accounting problems with copy on
> write.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Modulo Christoph's comments:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_bmap.c |   34 +++++++++++++++++++++-------------
>  1 file changed, 21 insertions(+), 13 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 0c9c9cd..6ad79ea 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3337,6 +3337,26 @@ xfs_bmap_btalloc_filestreams(
>  	return 0;
>  }
>  
> +/* Update all inode and quota accounting for the allocation we just did. */
> +static void
> +xfs_bmap_btalloc_accounting(
> +	struct xfs_bmalloca	*ap,
> +	struct xfs_alloc_arg	*args)
> +{
> +	if (!(ap->flags & XFS_BMAPI_COWFORK))
> +		ap->ip->i_d.di_nblocks += args->len;
> +	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> +	if (ap->wasdel)
> +		ap->ip->i_delayed_blks -= args->len;
> +	/*
> +	 * Adjust the disk quota also. This was reserved
> +	 * earlier.
> +	 */
> +	xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
> +		ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT : XFS_TRANS_DQ_BCOUNT,
> +		(long) args->len);
> +}
> +
>  STATIC int
>  xfs_bmap_btalloc(
>  	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
> @@ -3571,19 +3591,7 @@ xfs_bmap_btalloc(
>  			*ap->firstblock = args.fsbno;
>  		ASSERT(nullfb || fb_agno <= args.agno);
>  		ap->length = args.len;
> -		if (!(ap->flags & XFS_BMAPI_COWFORK))
> -			ap->ip->i_d.di_nblocks += args.len;
> -		xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> -		if (ap->wasdel)
> -			ap->ip->i_delayed_blks -= args.len;
> -		/*
> -		 * Adjust the disk quota also. This was reserved
> -		 * earlier.
> -		 */
> -		xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
> -			ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT :
> -					XFS_TRANS_DQ_BCOUNT,
> -			(long) args.len);
> +		xfs_bmap_btalloc_accounting(ap, &args);
>  	} else {
>  		ap->blkno = NULLFSBLOCK;
>  		ap->length = 0;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6] xfs: CoW fork operations should only update quota reservations
  2018-01-26  2:05 ` [PATCH 2/6] xfs: CoW fork operations should only update quota reservations Darrick J. Wong
@ 2018-01-26 19:06   ` Brian Foster
  2018-01-26 19:20     ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Foster @ 2018-01-26 19:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jan 25, 2018 at 06:05:00PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Since the CoW fork only exists in memory, it is incorrect to update the
> on-disk quota block counts when we modify the CoW fork.  Unlike the data
> fork, even real extents in the CoW fork are only reservations (on-disk
> they're owned by the refcountbt) so they must not be tracked in the on
> disk quota info.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Looks sane.. just a couple comments from my braindump on the previous
patch and a few comment suggestions that help me understand this better
(feel free to tweak, rewrite, etc.).

>  fs/xfs/libxfs/xfs_bmap.c |   33 ++++++++++++++++++++++++++++++---
>  fs/xfs/libxfs/xfs_bmap.h |    2 +-
>  fs/xfs/xfs_reflink.c     |   12 ++++++------
>  3 files changed, 37 insertions(+), 10 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 6ad79ea..a59d5be 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3343,8 +3343,26 @@ xfs_bmap_btalloc_accounting(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args)
>  {
> -	if (!(ap->flags & XFS_BMAPI_COWFORK))
> -		ap->ip->i_d.di_nblocks += args->len;
> +	if (ap->flags & XFS_BMAPI_COWFORK) {

/*
 * COW fork blocks are in-core only and thus are treated as in-core
 * quota reservation (like delalloc blocks) even when converted to real
 * blocks. The quota reservation is not accounted to disk until blocks
 * are remapped to the data fork. So if these blocks were previously
 * delalloc, we already have quota reservation and there's nothing to do
 * yet.
 */

> +		/* Filling a previously reserved extent; nothing to do here. */
> +		if (ap->wasdel)
> +			return;
> +
> +		/*
> +		 * If we get here, we're filling a CoW hole with a real
> +		 * (non-delalloc) CoW extent having reserved enough blocks
> +		 * from both q_res_bcount and qt_blk_res to guarantee that we
> +		 * won't run out of space.  The unused qt_blk_res is given
> +		 * back to q_res_bcount when the transaction commits, so we
> +		 * must decrease qt_blk_res without decreasing q_res_bcount.
> +		 */

/*
 * Otherwise, we've allocated blocks in a hole. The transaction has
 * acquired in-core quota reservation for this extent. Rather than
 * account these as real blocks, however, we reduce the transaction
 * quota reservation based on the allocation. This essentially transfers
 * the transaction quota reservation to that of a delalloc extent.
 */

> +		xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
> +				-(long)args->len);
> +		return;
> +	}
> +
> +	/* data/attr fork only */
> +	ap->ip->i_d.di_nblocks += args->len;
>  	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
>  	if (ap->wasdel)
>  		ap->ip->i_delayed_blks -= args->len;
> @@ -4761,13 +4779,15 @@ xfs_bmap_del_extent_cow(
>  	struct xfs_inode	*ip,
>  	struct xfs_iext_cursor	*icur,
>  	struct xfs_bmbt_irec	*got,
> -	struct xfs_bmbt_irec	*del)
> +	struct xfs_bmbt_irec	*del,
> +	bool			free_quotares)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
>  	struct xfs_bmbt_irec	new;
>  	xfs_fileoff_t		del_endoff, got_endoff;
>  	int			state = BMAP_COWFORK;
> +	int			error;
>  
>  	XFS_STATS_INC(mp, xs_del_exlist);
>  
> @@ -4824,6 +4844,13 @@ xfs_bmap_del_extent_cow(
>  		xfs_iext_insert(ip, icur, &new, state);
>  		break;
>  	}
> +
> +	/* Remove the quota reservation */
> +	if (!free_quotares)
> +		return;
> +	error = xfs_trans_reserve_quota_nblks(NULL, ip,
> +			-(long)del->br_blockcount, 0, XFS_QMOPT_RES_REGBLKS);
> +	ASSERT(error == 0);

Might as well pull this into the only free_quotares = true caller.

Brian

>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index e36d757..e99f28f 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -224,7 +224,7 @@ int	xfs_bmap_del_extent_delay(struct xfs_inode *ip, int whichfork,
>  		struct xfs_bmbt_irec *del);
>  void	xfs_bmap_del_extent_cow(struct xfs_inode *ip,
>  		struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *got,
> -		struct xfs_bmbt_irec *del);
> +		struct xfs_bmbt_irec *del, bool free_quotares);
>  uint	xfs_default_attroffset(struct xfs_inode *ip);
>  int	xfs_bmap_collapse_extents(struct xfs_trans *tp, struct xfs_inode *ip,
>  		xfs_fileoff_t *next_fsb, xfs_fileoff_t offset_shift_fsb,
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 82abff6..3644a08 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
>  					del.br_startblock, del.br_blockcount,
>  					NULL);
>  
> -			/* Update quota accounting */
> -			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
> -					-(long)del.br_blockcount);
> -
>  			/* Roll the transaction */
>  			xfs_defer_ijoin(&dfops, ip);
>  			error = xfs_defer_finish(tpp, &dfops);
> @@ -612,7 +608,7 @@ xfs_reflink_cancel_cow_blocks(
>  			}
>  
>  			/* Remove the mapping from the CoW fork. */
> -			xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> +			xfs_bmap_del_extent_cow(ip, &icur, &got, &del, true);
>  		} else {
>  			/* Didn't do anything, push cursor back. */
>  			xfs_iext_prev(ifp, &icur);
> @@ -795,8 +791,12 @@ xfs_reflink_end_cow(
>  		if (error)
>  			goto out_defer;
>  
> +		/* Charge this new data fork mapping to the on-disk quota. */
> +		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
> +				(long)del.br_blockcount);
> +
>  		/* Remove the mapping from the CoW fork. */
> -		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> +		xfs_bmap_del_extent_cow(ip, &icur, &got, &del, false);
>  
>  		xfs_defer_ijoin(&dfops, ip);
>  		error = xfs_defer_finish(&tp, &dfops);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6] xfs: CoW fork operations should only update quota reservations
  2018-01-26 19:06   ` Brian Foster
@ 2018-01-26 19:20     ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2018-01-26 19:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jan 26, 2018 at 02:06:49PM -0500, Brian Foster wrote:
> On Thu, Jan 25, 2018 at 06:05:00PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Since the CoW fork only exists in memory, it is incorrect to update the
> > on-disk quota block counts when we modify the CoW fork.  Unlike the data
> > fork, even real extents in the CoW fork are only reservations (on-disk
> > they're owned by the refcountbt) so they must not be tracked in the on
> > disk quota info.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> Looks sane.. just a couple comments from my braindump on the previous
> patch and a few comment suggestions that help me understand this better
> (feel free to tweak, rewrite, etc.).
> 
> >  fs/xfs/libxfs/xfs_bmap.c |   33 ++++++++++++++++++++++++++++++---
> >  fs/xfs/libxfs/xfs_bmap.h |    2 +-
> >  fs/xfs/xfs_reflink.c     |   12 ++++++------
> >  3 files changed, 37 insertions(+), 10 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 6ad79ea..a59d5be 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -3343,8 +3343,26 @@ xfs_bmap_btalloc_accounting(
> >  	struct xfs_bmalloca	*ap,
> >  	struct xfs_alloc_arg	*args)
> >  {
> > -	if (!(ap->flags & XFS_BMAPI_COWFORK))
> > -		ap->ip->i_d.di_nblocks += args->len;
> > +	if (ap->flags & XFS_BMAPI_COWFORK) {
> 
> /*
>  * COW fork blocks are in-core only and thus are treated as in-core
>  * quota reservation (like delalloc blocks) even when converted to real
>  * blocks. The quota reservation is not accounted to disk until blocks
>  * are remapped to the data fork. So if these blocks were previously
>  * delalloc, we already have quota reservation and there's nothing to do
>  * yet.
>  */

Ok.

> 
> > +		/* Filling a previously reserved extent; nothing to do here. */
> > +		if (ap->wasdel)
> > +			return;
> > +
> > +		/*
> > +		 * If we get here, we're filling a CoW hole with a real
> > +		 * (non-delalloc) CoW extent having reserved enough blocks
> > +		 * from both q_res_bcount and qt_blk_res to guarantee that we
> > +		 * won't run out of space.  The unused qt_blk_res is given
> > +		 * back to q_res_bcount when the transaction commits, so we
> > +		 * must decrease qt_blk_res without decreasing q_res_bcount.
> > +		 */
> 
> /*
>  * Otherwise, we've allocated blocks in a hole. The transaction has
>  * acquired in-core quota reservation for this extent. Rather than
>  * account these as real blocks, however, we reduce the transaction
>  * quota reservation based on the allocation. This essentially transfers
>  * the transaction quota reservation to that of a delalloc extent.
>  */

Ok.

> 
> > +		xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
> > +				-(long)args->len);
> > +		return;
> > +	}
> > +
> > +	/* data/attr fork only */
> > +	ap->ip->i_d.di_nblocks += args->len;
> >  	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> >  	if (ap->wasdel)
> >  		ap->ip->i_delayed_blks -= args->len;
> > @@ -4761,13 +4779,15 @@ xfs_bmap_del_extent_cow(
> >  	struct xfs_inode	*ip,
> >  	struct xfs_iext_cursor	*icur,
> >  	struct xfs_bmbt_irec	*got,
> > -	struct xfs_bmbt_irec	*del)
> > +	struct xfs_bmbt_irec	*del,
> > +	bool			free_quotares)
> >  {
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> >  	struct xfs_bmbt_irec	new;
> >  	xfs_fileoff_t		del_endoff, got_endoff;
> >  	int			state = BMAP_COWFORK;
> > +	int			error;
> >  
> >  	XFS_STATS_INC(mp, xs_del_exlist);
> >  
> > @@ -4824,6 +4844,13 @@ xfs_bmap_del_extent_cow(
> >  		xfs_iext_insert(ip, icur, &new, state);
> >  		break;
> >  	}
> > +
> > +	/* Remove the quota reservation */
> > +	if (!free_quotares)
> > +		return;
> > +	error = xfs_trans_reserve_quota_nblks(NULL, ip,
> > +			-(long)del->br_blockcount, 0, XFS_QMOPT_RES_REGBLKS);
> > +	ASSERT(error == 0);
> 
> Might as well pull this into the only free_quotares = true caller.

Done.  Thx for the review!

--D

> Brian
> 
> >  }
> >  
> >  /*
> > diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> > index e36d757..e99f28f 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.h
> > +++ b/fs/xfs/libxfs/xfs_bmap.h
> > @@ -224,7 +224,7 @@ int	xfs_bmap_del_extent_delay(struct xfs_inode *ip, int whichfork,
> >  		struct xfs_bmbt_irec *del);
> >  void	xfs_bmap_del_extent_cow(struct xfs_inode *ip,
> >  		struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *got,
> > -		struct xfs_bmbt_irec *del);
> > +		struct xfs_bmbt_irec *del, bool free_quotares);
> >  uint	xfs_default_attroffset(struct xfs_inode *ip);
> >  int	xfs_bmap_collapse_extents(struct xfs_trans *tp, struct xfs_inode *ip,
> >  		xfs_fileoff_t *next_fsb, xfs_fileoff_t offset_shift_fsb,
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 82abff6..3644a08 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
> >  					del.br_startblock, del.br_blockcount,
> >  					NULL);
> >  
> > -			/* Update quota accounting */
> > -			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
> > -					-(long)del.br_blockcount);
> > -
> >  			/* Roll the transaction */
> >  			xfs_defer_ijoin(&dfops, ip);
> >  			error = xfs_defer_finish(tpp, &dfops);
> > @@ -612,7 +608,7 @@ xfs_reflink_cancel_cow_blocks(
> >  			}
> >  
> >  			/* Remove the mapping from the CoW fork. */
> > -			xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> > +			xfs_bmap_del_extent_cow(ip, &icur, &got, &del, true);
> >  		} else {
> >  			/* Didn't do anything, push cursor back. */
> >  			xfs_iext_prev(ifp, &icur);
> > @@ -795,8 +791,12 @@ xfs_reflink_end_cow(
> >  		if (error)
> >  			goto out_defer;
> >  
> > +		/* Charge this new data fork mapping to the on-disk quota. */
> > +		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
> > +				(long)del.br_blockcount);
> > +
> >  		/* Remove the mapping from the CoW fork. */
> > -		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> > +		xfs_bmap_del_extent_cow(ip, &icur, &got, &del, false);
> >  
> >  		xfs_defer_ijoin(&dfops, ip);
> >  		error = xfs_defer_finish(&tp, &dfops);
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/6] xfs: don't screw up direct writes when freesp is fragmented
  2018-01-26  2:05 ` [PATCH 4/6] xfs: don't screw up direct writes when freesp is fragmented Darrick J. Wong
@ 2018-01-26 19:24   ` Brian Foster
  2018-01-26 19:49     ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Foster @ 2018-01-26 19:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jan 25, 2018 at 06:05:12PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> xfs_bmap_btalloc is given a range of file offset blocks that must be
> allocated to some data/attr/cow fork.  If the fork has an extent size
> hint associated with it, the request will be enlarged on both ends to
> try to satisfy the alignment hint.  If free space is fragmentated,
> sometimes we can allocate some blocks but not enough to fulfill any of
> the requested range.  Since bmapi_allocate always trims the new extent
> mapping to match the originally requested range, this results in
> bmapi_write returning zero and no mapping.
> 
> The consequences of this vary -- buffered writes will simply re-call
> bmapi_write until it can satisfy at least one block from the original
> request.  Direct IO overwrites notice nmaps == 0 and return -ENOSPC
> through the dio mechanism out to userspace with the weird result that
> writes fail even when we have enough space because the ENOSPC return
> overrides any partial write status.  For direct CoW writes the situation
> is disastrous because nobody notices us returning an invalid zero-length
> wrong-offset mapping to iomap and the write goes off into space.
> 
> First of all, teach iomap and xfs_reflink_allocate_cow to check the
> mappings being returned and bail out with ENOSPC if we can't get any
> blocks.  Second of all, if free space is so fragmented we can't satisfy
> even a single block of the original allocation request but did get a few
> blocks, we should break the alignment hint in order to guarantee at
> least some forward progress for the direct write.  If we return a short
> allocation to iomap_apply it'll call back about the remaining blocks.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/iomap.c               |    2 +-
>  fs/xfs/libxfs/xfs_bmap.c |   15 +++++++++++++++
>  fs/xfs/xfs_reflink.c     |    6 ++++++
>  3 files changed, 22 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index e5de772..aec35a0 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -63,7 +63,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>  	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
>  	if (ret)
>  		return ret;
> -	if (WARN_ON(iomap.offset > pos))
> +	if (WARN_ON(iomap.offset > pos) || WARN_ON(iomap.length == 0))
>  		return -EIO;
>  
>  	/*
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 032befb..1fb885c 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3386,6 +3386,8 @@ xfs_bmap_btalloc(
>  	xfs_agnumber_t	fb_agno;	/* ag number of ap->firstblock */
>  	xfs_agnumber_t	ag;
>  	xfs_alloc_arg_t	args;
> +	xfs_fileoff_t	orig_offset;
> +	xfs_extlen_t	orig_length;
>  	xfs_extlen_t	blen;
>  	xfs_extlen_t	nextminlen = 0;
>  	int		nullfb;		/* true if ap->firstblock isn't set */
> @@ -3395,6 +3397,8 @@ xfs_bmap_btalloc(
>  	int		stripe_align;
>  
>  	ASSERT(ap->length);
> +	orig_offset = ap->offset;
> +	orig_length = ap->length;
>  
>  	mp = ap->ip->i_mount;
>  
> @@ -3610,6 +3614,17 @@ xfs_bmap_btalloc(
>  			*ap->firstblock = args.fsbno;
>  		ASSERT(nullfb || fb_agno <= args.agno);
>  		ap->length = args.len;
> +		/*
> +		 * If we didn't get enough blocks to fill even one block of
> +		 * the original request, break the alignment and return
> +		 * whatever we got; it's the best we can do.  Free space is
> +		 * fragmented enough that we cannot honor the extent size
> +		 * hints but we can still make some forward progress.
> +		 */
> +		if (ap->length <= orig_length)
> +			ap->offset = orig_offset;
> +		else if (ap->offset + ap->length < orig_offset + orig_length)
> +			ap->offset = orig_offset + orig_length - ap->length;

Ok, but do we really want to shift the extent placement for any
allocation that is short? E.g., what if the allocation is short but
still covers the target offset? ISTM there's a tradeoff here between
filling the range asked for vs. following the heuristic in
xfs_bmap_extsize_align(). Even if we do end up shifting to prioritize
the target range, that doesn't seem to be what the comment describes by
saying "... we didn't get enough blocks to fill even one block of the
original request." :P

>  		xfs_bmap_btalloc_accounting(ap, &args);
>  	} else {
>  		ap->blkno = NULLFSBLOCK;
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 9a6c545..9eca8aa 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -464,6 +464,12 @@ xfs_reflink_allocate_cow(
>  	error = xfs_trans_commit(tp);
>  	if (error)
>  		return error;
> +	/*
> +	 * Allocation succeeded but our request was not even partially
> +	 * satisfied?  Bail out.
> +	 */
> +	if (nimaps == 0)
> +		return -ENOSPC;

I think this should probably be a separate patch. This part is a
straightforward bug fix for a potentially critical issue (i.e., a no
brainer). While not that much code, the change above is an enhancement
with a bit more complexity to consider.

Brian

>  convert:
>  	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb,
>  			&dfops);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/6] xfs: don't screw up direct writes when freesp is fragmented
  2018-01-26 19:24   ` Brian Foster
@ 2018-01-26 19:49     ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2018-01-26 19:49 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Jan 26, 2018 at 02:24:57PM -0500, Brian Foster wrote:
> On Thu, Jan 25, 2018 at 06:05:12PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > xfs_bmap_btalloc is given a range of file offset blocks that must be
> > allocated to some data/attr/cow fork.  If the fork has an extent size
> > hint associated with it, the request will be enlarged on both ends to
> > try to satisfy the alignment hint.  If free space is fragmentated,
> > sometimes we can allocate some blocks but not enough to fulfill any of
> > the requested range.  Since bmapi_allocate always trims the new extent
> > mapping to match the originally requested range, this results in
> > bmapi_write returning zero and no mapping.
> > 
> > The consequences of this vary -- buffered writes will simply re-call
> > bmapi_write until it can satisfy at least one block from the original
> > request.  Direct IO overwrites notice nmaps == 0 and return -ENOSPC
> > through the dio mechanism out to userspace with the weird result that
> > writes fail even when we have enough space because the ENOSPC return
> > overrides any partial write status.  For direct CoW writes the situation
> > is disastrous because nobody notices us returning an invalid zero-length
> > wrong-offset mapping to iomap and the write goes off into space.
> > 
> > First of all, teach iomap and xfs_reflink_allocate_cow to check the
> > mappings being returned and bail out with ENOSPC if we can't get any
> > blocks.  Second of all, if free space is so fragmented we can't satisfy
> > even a single block of the original allocation request but did get a few
> > blocks, we should break the alignment hint in order to guarantee at
> > least some forward progress for the direct write.  If we return a short
> > allocation to iomap_apply it'll call back about the remaining blocks.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/iomap.c               |    2 +-
> >  fs/xfs/libxfs/xfs_bmap.c |   15 +++++++++++++++
> >  fs/xfs/xfs_reflink.c     |    6 ++++++
> >  3 files changed, 22 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index e5de772..aec35a0 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> > @@ -63,7 +63,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> >  	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
> >  	if (ret)
> >  		return ret;
> > -	if (WARN_ON(iomap.offset > pos))
> > +	if (WARN_ON(iomap.offset > pos) || WARN_ON(iomap.length == 0))
> >  		return -EIO;
> >  
> >  	/*
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 032befb..1fb885c 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -3386,6 +3386,8 @@ xfs_bmap_btalloc(
> >  	xfs_agnumber_t	fb_agno;	/* ag number of ap->firstblock */
> >  	xfs_agnumber_t	ag;
> >  	xfs_alloc_arg_t	args;
> > +	xfs_fileoff_t	orig_offset;
> > +	xfs_extlen_t	orig_length;
> >  	xfs_extlen_t	blen;
> >  	xfs_extlen_t	nextminlen = 0;
> >  	int		nullfb;		/* true if ap->firstblock isn't set */
> > @@ -3395,6 +3397,8 @@ xfs_bmap_btalloc(
> >  	int		stripe_align;
> >  
> >  	ASSERT(ap->length);
> > +	orig_offset = ap->offset;
> > +	orig_length = ap->length;
> >  
> >  	mp = ap->ip->i_mount;
> >  
> > @@ -3610,6 +3614,17 @@ xfs_bmap_btalloc(
> >  			*ap->firstblock = args.fsbno;
> >  		ASSERT(nullfb || fb_agno <= args.agno);
> >  		ap->length = args.len;
> > +		/*
> > +		 * If we didn't get enough blocks to fill even one block of
> > +		 * the original request, break the alignment and return
> > +		 * whatever we got; it's the best we can do.  Free space is
> > +		 * fragmented enough that we cannot honor the extent size
> > +		 * hints but we can still make some forward progress.
> > +		 */
> > +		if (ap->length <= orig_length)
> > +			ap->offset = orig_offset;
> > +		else if (ap->offset + ap->length < orig_offset + orig_length)
> > +			ap->offset = orig_offset + orig_length - ap->length;
> 
> Ok, but do we really want to shift the extent placement for any
> allocation that is short? E.g., what if the allocation is short but
> still covers the target offset? ISTM there's a tradeoff here between
> filling the range asked for vs. following the heuristic in
> xfs_bmap_extsize_align(). Even if we do end up shifting to prioritize
> the target range, that doesn't seem to be what the comment describes by
> saying "... we didn't get enough blocks to fill even one block of the
> original request." :P

Yeah, Eric complained about the wording of the comment on IRC too.
How about:

/*
 * If the extent size hint is active, we tried to round the caller's
 * allocation request offset down to extsz and the length up to another
 * extsz boundary.  If we found a free extent we mapped it in starting
 * at this new offset.  If the newly mapped space isn't long enough to
 * cover the entire range of offsets that was originally requested, move
 * the mapping up so that we can fill as much of the caller's original
 * request as possible.  Free space is apparently very fragmented so
 * we're unlikely to be able to satisfy the hints anyway.
 */
if (ap->length <= orig_length)
	ap->offset = orig_offset;
else if (ap->offset + ap->length < orig_offset + orig_length)
	ap->offset = orig_offset + orig_length - ap->length;

> 
> >  		xfs_bmap_btalloc_accounting(ap, &args);
> >  	} else {
> >  		ap->blkno = NULLFSBLOCK;
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 9a6c545..9eca8aa 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -464,6 +464,12 @@ xfs_reflink_allocate_cow(
> >  	error = xfs_trans_commit(tp);
> >  	if (error)
> >  		return error;
> > +	/*
> > +	 * Allocation succeeded but our request was not even partially
> > +	 * satisfied?  Bail out.
> > +	 */
> > +	if (nimaps == 0)
> > +		return -ENOSPC;
> 
> I think this should probably be a separate patch. This part is a
> straightforward bug fix for a potentially critical issue (i.e., a no
> brainer). While not that much code, the change above is an enhancement
> with a bit more complexity to consider.

Ok, will do.

--D

> Brian
> 
> >  convert:
> >  	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb,
> >  			&dfops);
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-01-26 19:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-26  2:04 [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Darrick J. Wong
2018-01-26  2:04 ` [PATCH 1/6] xfs: refactor accounting updates out of xfs_bmap_btalloc Darrick J. Wong
2018-01-26 12:19   ` Christoph Hellwig
2018-01-26 19:06   ` Brian Foster
2018-01-26  2:05 ` [PATCH 2/6] xfs: CoW fork operations should only update quota reservations Darrick J. Wong
2018-01-26 19:06   ` Brian Foster
2018-01-26 19:20     ` Darrick J. Wong
2018-01-26  2:05 ` [PATCH 3/6] xfs: track CoW blocks separately in the inode Darrick J. Wong
2018-01-26  2:05 ` [PATCH 4/6] xfs: don't screw up direct writes when freesp is fragmented Darrick J. Wong
2018-01-26 19:24   ` Brian Foster
2018-01-26 19:49     ` Darrick J. Wong
2018-01-26  2:05 ` [PATCH 5/6] xfs: skip CoW writes past EOF when writeback races with truncate Darrick J. Wong
2018-01-26 12:21   ` Christoph Hellwig
2018-01-26  2:05 ` [PATCH 6/6] xfs: don't clobber inobt/finobt cursors when xref with rmap Darrick J. Wong
2018-01-26 12:18 ` [PATCH v2 0/6] xfs: reflink/scrub/quota fixes Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.