All of lore.kernel.org
 help / color / mirror / Atom feed
* delalloc and reflink fixes & tweaks
@ 2018-09-17 20:53 Christoph Hellwig
  2018-09-17 20:53 ` [PATCH 01/10] xfs: fix transaction leak in xfs_reflink_allocate_cow() Christoph Hellwig
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs

Hi all,

this series starts with a for-4.19 fix originally from Dave, and then
has various tweaks, including a fix for a potential uninitialized data
exposure that are 4.20 material.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/10] xfs: fix transaction leak in xfs_reflink_allocate_cow()
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
@ 2018-09-17 20:53 ` Christoph Hellwig
  2018-09-17 23:51   ` Darrick J. Wong
  2018-09-17 20:53 ` [PATCH 02/10] xfs: don't bring in extents in xfs_bmap_punch_delalloc_range Christoph Hellwig
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

From: Dave Chinner <dchinner@redhat.com>

When xfs_reflink_allocate_cow() allocates a transaction, it drops
the ILOCK to perform the operation. This Introduces a race condition
where another thread modifying the file can perform the COW
allocation operation underneath us. This result in the retry loop
finding an allocated block and jumping straight to the conversion
code. It does not, however, cancel the transaction it holds and so
this gets leaked. This results in a lockdep warning:

================================================
WARNING: lock held when returning to user space!
4.18.5 #1 Not tainted
------------------------------------------------
worker/6123 is leaving the kernel with locks still held!
1 lock held by worker/6123:
 #0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220

And eventually the filesystem deadlocks because it runs out of log
space that is reserved by the leaked transaction and never gets
released.

The logic flow in xfs_reflink_allocate_cow() is a convoluted mess of
gotos - it's no surprise that it has bug where the flow through
several goto jumps then fails to clean up context from a non-obvious
logic path. CLean up the logic flow and make sure every path does
the right thing.

Reported-by: Alexander Y. Fomichev <git.user@gmail.com>
Tested-by: Alexander Y. Fomichev <git.user@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200981
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: slight refactor]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_reflink.c | 127 ++++++++++++++++++++++++++-----------------
 1 file changed, 77 insertions(+), 50 deletions(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 38f405415b88..d60d0eeed7b9 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -352,6 +352,47 @@ xfs_reflink_convert_cow(
 	return error;
 }
 
+/*
+ * Find the extent that maps the given range in the COW fork. Even if the extent
+ * is not shared we might have a preallocation for it in the COW fork. If so we
+ * use it that rather than trigger a new allocation.
+ */
+static int
+xfs_find_trim_cow_extent(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*imap,
+	bool			*shared,
+	bool			*found)
+{
+	xfs_fileoff_t		offset_fsb = imap->br_startoff;
+	xfs_filblks_t		count_fsb = imap->br_blockcount;
+	struct xfs_iext_cursor	icur;
+	struct xfs_bmbt_irec	got;
+	bool			trimmed;
+
+	*found = false;
+
+	/*
+	 * If we don't find an overlapping extent, trim the range we need to
+	 * allocate to fit the hole we found.
+	 */
+	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got) ||
+	    got.br_startoff > offset_fsb)
+		return xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
+
+	*shared = true;
+	if (isnullstartblock(got.br_startblock)) {
+		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
+		return 0;
+	}
+
+	/* real extent found - no need to allocate */
+	xfs_trim_extent(&got, offset_fsb, count_fsb);
+	*imap = got;
+	*found = true;
+	return 0;
+}
+
 /* Allocate all CoW reservations covering a range of blocks in a file. */
 int
 xfs_reflink_allocate_cow(
@@ -363,78 +404,64 @@ xfs_reflink_allocate_cow(
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		offset_fsb = imap->br_startoff;
 	xfs_filblks_t		count_fsb = imap->br_blockcount;
-	struct xfs_bmbt_irec	got;
-	struct xfs_trans	*tp = NULL;
+	struct xfs_trans	*tp;
 	int			nimaps, error = 0;
-	bool			trimmed;
+	bool			found;
 	xfs_filblks_t		resaligned;
 	xfs_extlen_t		resblks = 0;
-	struct xfs_iext_cursor	icur;
 
-retry:
-	ASSERT(xfs_is_reflink_inode(ip));
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+	ASSERT(xfs_is_reflink_inode(ip));
 
-	/*
-	 * Even if the extent is not shared we might have a preallocation for
-	 * it in the COW fork.  If so use it.
-	 */
-	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got) &&
-	    got.br_startoff <= offset_fsb) {
-		*shared = true;
-
-		/* If we have a real allocation in the COW fork we're done. */
-		if (!isnullstartblock(got.br_startblock)) {
-			xfs_trim_extent(&got, offset_fsb, count_fsb);
-			*imap = got;
-			goto convert;
-		}
+	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
+	if (error || !*shared)
+		return error;
+	if (found)
+		goto convert;
 
-		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
-	} else {
-		error = xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
-		if (error || !*shared)
-			goto out;
-	}
+	resaligned = xfs_aligned_fsb_count(imap->br_startoff,
+		imap->br_blockcount, xfs_get_cowextsz_hint(ip));
+	resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
 
-	if (!tp) {
-		resaligned = xfs_aligned_fsb_count(imap->br_startoff,
-			imap->br_blockcount, xfs_get_cowextsz_hint(ip));
-		resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
+	xfs_iunlock(ip, *lockmode);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
+	*lockmode = XFS_ILOCK_EXCL;
+	xfs_ilock(ip, *lockmode);
 
-		xfs_iunlock(ip, *lockmode);
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
-		*lockmode = XFS_ILOCK_EXCL;
-		xfs_ilock(ip, *lockmode);
+	if (error)
+		return error;
 
-		if (error)
-			return error;
+	error = xfs_qm_dqattach_locked(ip, false);
+	if (error)
+		goto out_trans_cancel;
 
-		error = xfs_qm_dqattach_locked(ip, false);
-		if (error)
-			goto out;
-		goto retry;
+	/*
+	 * Check for an overlapping extent again now that we dropped the ilock.
+	 */
+	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
+	if (error || !*shared)
+		goto out_trans_cancel;
+	if (found) {
+		xfs_trans_cancel(tp);
+		goto convert;
 	}
 
 	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
 			XFS_QMOPT_RES_REGBLKS);
 	if (error)
-		goto out;
+		goto out_trans_cancel;
 
 	xfs_trans_ijoin(tp, ip, 0);
 
-	nimaps = 1;
-
 	/* Allocate the entire reservation as unwritten blocks. */
+	nimaps = 1;
 	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
 			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
 			resblks, imap, &nimaps);
 	if (error)
-		goto out_trans_cancel;
+		goto out_unreserve;
 
 	xfs_inode_set_cowblocks_tag(ip);
-
-	/* Finish up. */
 	error = xfs_trans_commit(tp);
 	if (error)
 		return error;
@@ -447,12 +474,12 @@ xfs_reflink_allocate_cow(
 		return -ENOSPC;
 convert:
 	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb);
-out_trans_cancel:
+
+out_unreserve:
 	xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0,
 			XFS_QMOPT_RES_REGBLKS);
-out:
-	if (tp)
-		xfs_trans_cancel(tp);
+out_trans_cancel:
+	xfs_trans_cancel(tp);
 	return error;
 }
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 02/10] xfs: don't bring in extents in xfs_bmap_punch_delalloc_range
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
  2018-09-17 20:53 ` [PATCH 01/10] xfs: fix transaction leak in xfs_reflink_allocate_cow() Christoph Hellwig
@ 2018-09-17 20:53 ` Christoph Hellwig
  2018-09-20 20:23   ` Darrick J. Wong
  2018-09-17 20:53 ` [PATCH 03/10] xfs: remove XFS_IO_INVALID Christoph Hellwig
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs

This function is only used to punch out delayed allocations on I/O
failure, which means we need to have read the extents earlier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index addbd74ecd8e..418e1e725f3a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -702,13 +702,9 @@ xfs_bmap_punch_delalloc_range(
 	struct xfs_iext_cursor	icur;
 	int			error = 0;
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
-		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
-		if (error)
-			goto out_unlock;
-	}
+	ASSERT(ifp->if_flags & XFS_IFEXTENTS);
 
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
 		goto out_unlock;
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 03/10] xfs: remove XFS_IO_INVALID
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
  2018-09-17 20:53 ` [PATCH 01/10] xfs: fix transaction leak in xfs_reflink_allocate_cow() Christoph Hellwig
  2018-09-17 20:53 ` [PATCH 02/10] xfs: don't bring in extents in xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2018-09-17 20:53 ` Christoph Hellwig
  2018-09-20 20:31   ` Darrick J. Wong
  2018-09-17 20:53 ` [PATCH 04/10] xfs: simplify the IOMAP_ZERO check in xfs_file_iomap_begin a bit Christoph Hellwig
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs

The invalid state isn't any different from a hole, so merge the two
states.  Use the more descriptive hole name, but keep it as the first
value of the enum to catch uninitialized fields.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c |  4 ++--
 fs/xfs/xfs_aops.h | 14 ++++++--------
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 49f5f5896a43..338b9d9984e0 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -917,7 +917,7 @@ xfs_vm_writepage(
 	struct writeback_control *wbc)
 {
 	struct xfs_writepage_ctx wpc = {
-		.io_type = XFS_IO_INVALID,
+		.io_type = XFS_IO_HOLE,
 	};
 	int			ret;
 
@@ -933,7 +933,7 @@ xfs_vm_writepages(
 	struct writeback_control *wbc)
 {
 	struct xfs_writepage_ctx wpc = {
-		.io_type = XFS_IO_INVALID,
+		.io_type = XFS_IO_HOLE,
 	};
 	int			ret;
 
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 9af867951a10..494b4338446e 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -12,21 +12,19 @@ extern struct bio_set xfs_ioend_bioset;
  * Types of I/O for bmap clustering and I/O completion tracking.
  */
 enum {
-	XFS_IO_INVALID,		/* initial state */
+	XFS_IO_HOLE,		/* covers region without any block allocation */
 	XFS_IO_DELALLOC,	/* covers delalloc region */
 	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
 	XFS_IO_OVERWRITE,	/* covers already allocated extent */
 	XFS_IO_COW,		/* covers copy-on-write extent */
-	XFS_IO_HOLE,		/* covers region without any block allocation */
 };
 
 #define XFS_IO_TYPES \
-	{ XFS_IO_INVALID,		"invalid" }, \
-	{ XFS_IO_DELALLOC,		"delalloc" }, \
-	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
-	{ XFS_IO_OVERWRITE,		"overwrite" }, \
-	{ XFS_IO_COW,			"CoW" }, \
-	{ XFS_IO_HOLE,			"hole" }
+	{ XFS_IO_HOLE,			"hole" },	\
+	{ XFS_IO_DELALLOC,		"delalloc" },	\
+	{ XFS_IO_UNWRITTEN,		"unwritten" },	\
+	{ XFS_IO_OVERWRITE,		"overwrite" },	\
+	{ XFS_IO_COW,			"CoW" }
 
 /*
  * Structure for buffered I/O completions.
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 04/10] xfs: simplify the IOMAP_ZERO check in xfs_file_iomap_begin a bit
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-09-17 20:53 ` [PATCH 03/10] xfs: remove XFS_IO_INVALID Christoph Hellwig
@ 2018-09-17 20:53 ` Christoph Hellwig
  2018-09-20 20:31   ` Darrick J. Wong
  2018-09-26 15:17   ` Brian Foster
  2018-09-17 20:53 ` [PATCH 05/10] xfs: handle zeroing in xfs_file_iomap_begin_delay Christoph Hellwig
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs

Merge the two cases for reflink vs not into a single one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 6320aca39f39..9595a3c59ade 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1048,16 +1048,20 @@ xfs_file_iomap_begin(
 	if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
 		goto out_found;
 
+	/*
+	 * Don't need to allocate over holes when doing zeroing operations,
+	 * unless we need to COW and have an existing extent.
+	 */
+	if ((flags & IOMAP_ZERO) &&
+	    (!xfs_is_reflink_inode(ip) ||
+	     !needs_cow_for_zeroing(&imap, nimaps)))
+		goto out_found;
+
 	/*
 	 * Break shared extents if necessary. Checks for non-blocking IO have
 	 * been done up front, so we don't need to do them here.
 	 */
 	if (xfs_is_reflink_inode(ip)) {
-		/* if zeroing doesn't need COW allocation, then we are done. */
-		if ((flags & IOMAP_ZERO) &&
-		    !needs_cow_for_zeroing(&imap, nimaps))
-			goto out_found;
-
 		if (flags & IOMAP_DIRECT) {
 			/* may drop and re-acquire the ilock */
 			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
@@ -1074,10 +1078,6 @@ xfs_file_iomap_begin(
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
 	}
 
-	/* Don't need to allocate over holes when doing zeroing operations. */
-	if (flags & IOMAP_ZERO)
-		goto out_found;
-
 	if (!imap_needs_alloc(inode, &imap, nimaps))
 		goto out_found;
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 05/10] xfs: handle zeroing in xfs_file_iomap_begin_delay
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-09-17 20:53 ` [PATCH 04/10] xfs: simplify the IOMAP_ZERO check in xfs_file_iomap_begin a bit Christoph Hellwig
@ 2018-09-17 20:53 ` Christoph Hellwig
  2018-09-17 20:53 ` [PATCH 06/10] xfs: always allocate blocks as unwritten for file data Christoph Hellwig
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs

We only need to allocate blocks for zeroing for reflink inodes,
and for we currently have a special case for reflink files in
the otherwise direct I/O path that I'd like to get rid of.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 9595a3c59ade..854b91080002 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -62,6 +62,21 @@ xfs_bmbt_to_iomap(
 	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
 }
 
+static void
+xfs_hole_to_iomap(
+	struct xfs_inode	*ip,
+	struct iomap		*iomap,
+	xfs_fileoff_t		offset_fsb,
+	xfs_fileoff_t		end_fsb)
+{
+	iomap->addr = IOMAP_NULL_ADDR;
+	iomap->type = IOMAP_HOLE;
+	iomap->offset = XFS_FSB_TO_B(ip->i_mount, offset_fsb);
+	iomap->length = XFS_FSB_TO_B(ip->i_mount, end_fsb - offset_fsb);
+	iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
+	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
+}
+
 xfs_extlen_t
 xfs_eof_alignment(
 	struct xfs_inode	*ip,
@@ -502,6 +517,7 @@ xfs_file_iomap_begin_delay(
 	struct inode		*inode,
 	loff_t			offset,
 	loff_t			count,
+	unsigned		flags,
 	struct iomap		*iomap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
@@ -539,8 +555,12 @@ xfs_file_iomap_begin_delay(
 	}
 
 	eof = !xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got);
-	if (!eof && got.br_startoff <= offset_fsb) {
-		if (xfs_is_reflink_inode(ip)) {
+	if (eof)
+		got.br_startoff = maxbytes_fsb;
+	if (got.br_startoff <= offset_fsb) {
+		if (xfs_is_reflink_inode(ip) &&
+		    ((flags & IOMAP_WRITE) ||
+		     got.br_state != XFS_EXT_UNWRITTEN)) {
 			bool		shared;
 
 			end_fsb = min(XFS_B_TO_FSB(mp, offset + count),
@@ -555,6 +575,11 @@ xfs_file_iomap_begin_delay(
 		goto done;
 	}
 
+	if (flags & IOMAP_ZERO) {
+		xfs_hole_to_iomap(ip, iomap, offset_fsb, got.br_startoff);
+		goto out_unlock;
+	}
+
 	error = xfs_qm_dqattach_locked(ip, false);
 	if (error)
 		goto out_unlock;
@@ -1009,10 +1034,11 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
+	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
-		return xfs_file_iomap_begin_delay(inode, offset, length, iomap);
+		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
+				iomap);
 	}
 
 	/*
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 06/10] xfs: always allocate blocks as unwritten for file data
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-09-17 20:53 ` [PATCH 05/10] xfs: handle zeroing in xfs_file_iomap_begin_delay Christoph Hellwig
@ 2018-09-17 20:53 ` Christoph Hellwig
  2018-09-17 20:53 ` [PATCH 07/10] xfs: handle extent size hints in xfs_file_iomap_begin_delay Christoph Hellwig
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs

XFS historically had a small race that could lead to exposing
uninitialized data in case of a crash.  If we are filling holes using
buffered I/O we convert the delayed allocation to a real allocation
before writing out the data.  If we crash after the blocks were
allocated, but before the data was written this could lead to reading
uninitialized blocks (or leaked data from a previous allocation that was
reused).  Now that we have the CIL logging extent format changes is
cheap, so we can switch to always allocating blocks as unwritten.
Note that this is not be strictly necessary for writes that append
beyond i_size, but given that we have to log a transaction in that
case anyway we might as well give all block allocations a uniform
treatment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c  | 3 +--
 fs/xfs/xfs_aops.h  | 2 --
 fs/xfs/xfs_iomap.c | 4 ++--
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 338b9d9984e0..775cdcfe70c2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -437,8 +437,7 @@ xfs_map_blocks(
 			imap.br_blockcount = cow_fsb - imap.br_startoff;
 
 		if (isnullstartblock(imap.br_startblock)) {
-			/* got a delalloc extent */
-			wpc->io_type = XFS_IO_DELALLOC;
+			wpc->io_type = XFS_IO_UNWRITTEN;
 			goto allocate_blocks;
 		}
 
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 494b4338446e..f0710c54cf68 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -13,7 +13,6 @@ extern struct bio_set xfs_ioend_bioset;
  */
 enum {
 	XFS_IO_HOLE,		/* covers region without any block allocation */
-	XFS_IO_DELALLOC,	/* covers delalloc region */
 	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
 	XFS_IO_OVERWRITE,	/* covers already allocated extent */
 	XFS_IO_COW,		/* covers copy-on-write extent */
@@ -21,7 +20,6 @@ enum {
 
 #define XFS_IO_TYPES \
 	{ XFS_IO_HOLE,			"hole" },	\
-	{ XFS_IO_DELALLOC,		"delalloc" },	\
 	{ XFS_IO_UNWRITTEN,		"unwritten" },	\
 	{ XFS_IO_OVERWRITE,		"overwrite" },	\
 	{ XFS_IO_COW,			"CoW" }
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 854b91080002..e12ff5e9a8ec 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -687,11 +687,11 @@ xfs_iomap_write_allocate(
 	xfs_trans_t	*tp;
 	int		nimaps;
 	int		error = 0;
-	int		flags = XFS_BMAPI_DELALLOC;
+	int		flags = XFS_BMAPI_DELALLOC | XFS_BMAPI_PREALLOC;
 	int		nres;
 
 	if (whichfork == XFS_COW_FORK)
-		flags |= XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
+		flags |= XFS_BMAPI_COWFORK;
 
 	/*
 	 * Make sure that the dquots are there.
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/10] xfs: handle extent size hints in xfs_file_iomap_begin_delay
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-09-17 20:53 ` [PATCH 06/10] xfs: always allocate blocks as unwritten for file data Christoph Hellwig
@ 2018-09-17 20:53 ` Christoph Hellwig
  2018-09-26 15:17   ` Brian Foster
  2018-09-17 20:53 ` [PATCH 08/10] xfs: remove the unused shared argument to xfs_reflink_reserve_cow Christoph Hellwig
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs

For files with extent size hints we always allocate unwritten extents
first and then convert them to real extents later to avoid exposing
stale data outside the blocks actually written.  But there is no
reason we can't do unwritten extent allocations from delalloc blocks,
in fact we already do that for COW operations.

Note that this does not handle files on the RT subvolume (yet), as
we removed code handling RT allocations from the delalloc path this
would be a slightly bigger project, and it doesn't really help with
simplifying the COW path either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e12ff5e9a8ec..2d08ace09e25 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -533,7 +533,6 @@ xfs_file_iomap_begin_delay(
 	xfs_fsblock_t		prealloc_blocks = 0;
 
 	ASSERT(!XFS_IS_REALTIME_INODE(ip));
-	ASSERT(!xfs_get_extsz_hint(ip));
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 
@@ -1035,7 +1034,7 @@ xfs_file_iomap_begin(
 		return -EIO;
 
 	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
-			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
+			!IS_DAX(inode) && !XFS_IS_REALTIME_INODE(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
 		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
 				iomap);
@@ -1088,17 +1087,9 @@ xfs_file_iomap_begin(
 	 * been done up front, so we don't need to do them here.
 	 */
 	if (xfs_is_reflink_inode(ip)) {
-		if (flags & IOMAP_DIRECT) {
-			/* may drop and re-acquire the ilock */
-			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
-					&lockmode);
-			if (error)
-				goto out_unlock;
-		} else {
-			error = xfs_reflink_reserve_cow(ip, &imap, &shared);
-			if (error)
-				goto out_unlock;
-		}
+		error = xfs_reflink_allocate_cow(ip, &imap, &shared, &lockmode);
+		if (error)
+			goto out_unlock;
 
 		end_fsb = imap.br_startoff + imap.br_blockcount;
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/10] xfs: remove the unused shared argument to xfs_reflink_reserve_cow
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-09-17 20:53 ` [PATCH 07/10] xfs: handle extent size hints in xfs_file_iomap_begin_delay Christoph Hellwig
@ 2018-09-17 20:53 ` Christoph Hellwig
  2018-09-17 20:53 ` [PATCH 09/10] xfs: remove the unused trimmed argument from xfs_reflink_trim_around_shared Christoph Hellwig
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c   |  4 +---
 fs/xfs/xfs_reflink.c | 12 +++++-------
 fs/xfs/xfs_reflink.h |  2 +-
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 2d08ace09e25..0d815b00643b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -560,12 +560,10 @@ xfs_file_iomap_begin_delay(
 		if (xfs_is_reflink_inode(ip) &&
 		    ((flags & IOMAP_WRITE) ||
 		     got.br_state != XFS_EXT_UNWRITTEN)) {
-			bool		shared;
-
 			end_fsb = min(XFS_B_TO_FSB(mp, offset + count),
 					maxbytes_fsb);
 			xfs_trim_extent(&got, offset_fsb, end_fsb - offset_fsb);
-			error = xfs_reflink_reserve_cow(ip, &got, &shared);
+			error = xfs_reflink_reserve_cow(ip, &got);
 			if (error)
 				goto out_unlock;
 		}
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index d60d0eeed7b9..0b41d29d433d 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -241,7 +241,7 @@ xfs_reflink_trim_around_shared(
 /*
  * Trim the passed in imap to the next shared/unshared extent boundary, and
  * if imap->br_startoff points to a shared extent reserve space for it in the
- * COW fork.  In this case *shared is set to true, else to false.
+ * COW fork.
  *
  * Note that imap will always contain the block numbers for the existing blocks
  * in the data fork, as the upper layers need them for read-modify-write
@@ -250,14 +250,14 @@ xfs_reflink_trim_around_shared(
 int
 xfs_reflink_reserve_cow(
 	struct xfs_inode	*ip,
-	struct xfs_bmbt_irec	*imap,
-	bool			*shared)
+	struct xfs_bmbt_irec	*imap)
 {
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
 	struct xfs_bmbt_irec	got;
 	int			error = 0;
 	bool			eof = false, trimmed;
 	struct xfs_iext_cursor	icur;
+	bool			shared;
 
 	/*
 	 * Search the COW fork extent list first.  This serves two purposes:
@@ -273,18 +273,16 @@ xfs_reflink_reserve_cow(
 	if (!eof && got.br_startoff <= imap->br_startoff) {
 		trace_xfs_reflink_cow_found(ip, imap);
 		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
-
-		*shared = true;
 		return 0;
 	}
 
 	/* Trim the mapping to the nearest shared extent boundary. */
-	error = xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
+	error = xfs_reflink_trim_around_shared(ip, imap, &shared, &trimmed);
 	if (error)
 		return error;
 
 	/* Not shared?  Just report the (potentially capped) extent. */
-	if (!*shared)
+	if (!shared)
 		return 0;
 
 	/*
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index c585ad9552b2..b77f4079022a 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -13,7 +13,7 @@ extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *irec, bool *shared, bool *trimmed);
 
 extern int xfs_reflink_reserve_cow(struct xfs_inode *ip,
-		struct xfs_bmbt_irec *imap, bool *shared);
+		struct xfs_bmbt_irec *imap);
 extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
 extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/10] xfs: remove the unused trimmed argument from xfs_reflink_trim_around_shared
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
                   ` (7 preceding siblings ...)
  2018-09-17 20:53 ` [PATCH 08/10] xfs: remove the unused shared argument to xfs_reflink_reserve_cow Christoph Hellwig
@ 2018-09-17 20:53 ` Christoph Hellwig
  2018-09-17 20:53 ` [PATCH 10/10] xfs: use a separate iomap_ops for delalloc writes Christoph Hellwig
  2018-09-17 21:23 ` delalloc and reflink fixes & tweaks Dave Chinner
  10 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c |  4 ++--
 fs/xfs/xfs_iomap.c     |  5 ++---
 fs/xfs/xfs_reflink.c   | 15 +++++----------
 fs/xfs/xfs_reflink.h   |  2 +-
 4 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 418e1e725f3a..3f9d4d2777e4 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -406,10 +406,10 @@ xfs_getbmap_report_one(
 	struct xfs_bmbt_irec	*got)
 {
 	struct kgetbmap		*p = out + bmv->bmv_entries;
-	bool			shared = false, trimmed = false;
+	bool			shared = false;
 	int			error;
 
-	error = xfs_reflink_trim_around_shared(ip, got, &shared, &trimmed);
+	error = xfs_reflink_trim_around_shared(ip, got, &shared);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0d815b00643b..9079196bbc35 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1025,7 +1025,7 @@ xfs_file_iomap_begin(
 	struct xfs_bmbt_irec	imap;
 	xfs_fileoff_t		offset_fsb, end_fsb;
 	int			nimaps = 1, error = 0;
-	bool			shared = false, trimmed = false;
+	bool			shared = false;
 	unsigned		lockmode;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
@@ -1061,8 +1061,7 @@ xfs_file_iomap_begin(
 
 	if (flags & IOMAP_REPORT) {
 		/* Trim the mapping to the nearest shared extent boundary. */
-		error = xfs_reflink_trim_around_shared(ip, &imap, &shared,
-				&trimmed);
+		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
 		if (error)
 			goto out_unlock;
 	}
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 0b41d29d433d..e0111d31140b 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -182,8 +182,7 @@ int
 xfs_reflink_trim_around_shared(
 	struct xfs_inode	*ip,
 	struct xfs_bmbt_irec	*irec,
-	bool			*shared,
-	bool			*trimmed)
+	bool			*shared)
 {
 	xfs_agnumber_t		agno;
 	xfs_agblock_t		agbno;
@@ -209,7 +208,7 @@ xfs_reflink_trim_around_shared(
 	if (error)
 		return error;
 
-	*shared = *trimmed = false;
+	*shared = false;
 	if (fbno == NULLAGBLOCK) {
 		/* No shared blocks at all. */
 		return 0;
@@ -222,8 +221,6 @@ xfs_reflink_trim_around_shared(
 		 */
 		irec->br_blockcount = flen;
 		*shared = true;
-		if (flen != aglen)
-			*trimmed = true;
 		return 0;
 	} else {
 		/*
@@ -233,7 +230,6 @@ xfs_reflink_trim_around_shared(
 		 * start of the shared region.
 		 */
 		irec->br_blockcount = fbno - agbno;
-		*trimmed = true;
 		return 0;
 	}
 }
@@ -255,7 +251,7 @@ xfs_reflink_reserve_cow(
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
 	struct xfs_bmbt_irec	got;
 	int			error = 0;
-	bool			eof = false, trimmed;
+	bool			eof = false;
 	struct xfs_iext_cursor	icur;
 	bool			shared;
 
@@ -277,7 +273,7 @@ xfs_reflink_reserve_cow(
 	}
 
 	/* Trim the mapping to the nearest shared extent boundary. */
-	error = xfs_reflink_trim_around_shared(ip, imap, &shared, &trimmed);
+	error = xfs_reflink_trim_around_shared(ip, imap, &shared);
 	if (error)
 		return error;
 
@@ -366,7 +362,6 @@ xfs_find_trim_cow_extent(
 	xfs_filblks_t		count_fsb = imap->br_blockcount;
 	struct xfs_iext_cursor	icur;
 	struct xfs_bmbt_irec	got;
-	bool			trimmed;
 
 	*found = false;
 
@@ -376,7 +371,7 @@ xfs_find_trim_cow_extent(
 	 */
 	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got) ||
 	    got.br_startoff > offset_fsb)
-		return xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
+		return xfs_reflink_trim_around_shared(ip, imap, shared);
 
 	*shared = true;
 	if (isnullstartblock(got.br_startblock)) {
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index b77f4079022a..7f47202b5639 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -10,7 +10,7 @@ extern int xfs_reflink_find_shared(struct xfs_mount *mp, struct xfs_trans *tp,
 		xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_extlen_t aglen,
 		xfs_agblock_t *fbno, xfs_extlen_t *flen, bool find_maximal);
 extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
-		struct xfs_bmbt_irec *irec, bool *shared, bool *trimmed);
+		struct xfs_bmbt_irec *irec, bool *shared);
 
 extern int xfs_reflink_reserve_cow(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *imap);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/10] xfs: use a separate iomap_ops for delalloc writes
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
                   ` (8 preceding siblings ...)
  2018-09-17 20:53 ` [PATCH 09/10] xfs: remove the unused trimmed argument from xfs_reflink_trim_around_shared Christoph Hellwig
@ 2018-09-17 20:53 ` Christoph Hellwig
  2018-09-26 15:18   ` Brian Foster
  2018-09-17 21:23 ` delalloc and reflink fixes & tweaks Dave Chinner
  10 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-17 20:53 UTC (permalink / raw)
  To: linux-xfs

Introduce a new xfs_delalloc_iomap_ops instance that is passed to
iomap_apply when we are doing delayed allocations.  This keeps the code
more neatly separated for future work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c |   3 +-
 fs/xfs/xfs_file.c      |   6 +-
 fs/xfs/xfs_iomap.c     | 177 ++++++++++++++++++++---------------------
 fs/xfs/xfs_iomap.h     |   1 +
 fs/xfs/xfs_iops.c      |   4 +-
 fs/xfs/xfs_reflink.c   |   2 +-
 6 files changed, 95 insertions(+), 98 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 3f9d4d2777e4..414dbc31139c 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1165,7 +1165,8 @@ xfs_free_file_space(
 		return 0;
 	if (offset + len > XFS_ISIZE(ip))
 		len = XFS_ISIZE(ip) - offset;
-	error = iomap_zero_range(VFS_I(ip), offset, len, NULL, &xfs_iomap_ops);
+	error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
+			&xfs_delalloc_iomap_ops);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 61a5ad2600e8..fa8fbc84eec8 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -355,7 +355,7 @@ xfs_file_aio_write_checks(
 	
 		trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
 		error = iomap_zero_range(inode, isize, iocb->ki_pos - isize,
-				NULL, &xfs_iomap_ops);
+				NULL, &xfs_delalloc_iomap_ops);
 		if (error)
 			return error;
 	} else
@@ -633,7 +633,7 @@ xfs_file_buffered_aio_write(
 	current->backing_dev_info = inode_to_bdi(inode);
 
 	trace_xfs_file_buffered_write(ip, iov_iter_count(from), iocb->ki_pos);
-	ret = iomap_file_buffered_write(iocb, from, &xfs_iomap_ops);
+	ret = iomap_file_buffered_write(iocb, from, &xfs_delalloc_iomap_ops);
 	if (likely(ret >= 0))
 		iocb->ki_pos += ret;
 
@@ -1077,7 +1077,7 @@ __xfs_filemap_fault(
 			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
 	} else {
 		if (write_fault)
-			ret = iomap_page_mkwrite(vmf, &xfs_iomap_ops);
+			ret = iomap_page_mkwrite(vmf, &xfs_delalloc_iomap_ops);
 		else
 			ret = filemap_fault(vmf);
 	}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 9079196bbc35..1bfc40ce581a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -512,8 +512,16 @@ xfs_iomap_prealloc_size(
 	return alloc_blocks;
 }
 
+static int xfs_file_iomap_begin(struct inode *inode, loff_t offset,
+		loff_t length, unsigned flags, struct iomap *iomap);
+
+/*
+ * Start writing to a file, allocating blocks using delayed allocations if
+ * needed.  Note that in case of doing COW overwrites the iomap needs to return
+ * the address of the existing data.
+ */
 static int
-xfs_file_iomap_begin_delay(
+xfs_delalloc_iomap_begin(
 	struct inode		*inode,
 	loff_t			offset,
 	loff_t			count,
@@ -532,7 +540,12 @@ xfs_file_iomap_begin_delay(
 	struct xfs_iext_cursor	icur;
 	xfs_fsblock_t		prealloc_blocks = 0;
 
-	ASSERT(!XFS_IS_REALTIME_INODE(ip));
+	/*
+	 * We currently don't support the RT subvolume special cases in the
+	 * delalloc path, so revert to real allocations.
+	 */
+	if (XFS_IS_REALTIME_INODE(ip))
+		return xfs_file_iomap_begin(inode, offset, count, flags, iomap);
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 
@@ -658,6 +671,73 @@ xfs_file_iomap_begin_delay(
 	return error;
 }
 
+static int
+xfs_delalloc_iomap_end(
+	struct inode		*inode,
+	loff_t			offset,
+	loff_t			length,
+	ssize_t			written,
+	unsigned		flags,
+	struct iomap		*iomap)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fileoff_t		start_fsb;
+	xfs_fileoff_t		end_fsb;
+	int			error = 0;
+
+	if (iomap->type != IOMAP_DELALLOC)
+		return 0;
+
+	/*
+	 * Behave as if the write failed if drop writes is enabled. Set the NEW
+	 * flag to force delalloc cleanup.
+	 */
+	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DROP_WRITES)) {
+		iomap->flags |= IOMAP_F_NEW;
+		written = 0;
+	}
+
+	/*
+	 * start_fsb refers to the first unused block after a short write. If
+	 * nothing was written, round offset down to point at the first block in
+	 * the range.
+	 */
+	if (unlikely(!written))
+		start_fsb = XFS_B_TO_FSBT(mp, offset);
+	else
+		start_fsb = XFS_B_TO_FSB(mp, offset + written);
+	end_fsb = XFS_B_TO_FSB(mp, offset + length);
+
+	/*
+	 * Trim delalloc blocks if they were allocated by this write and we
+	 * didn't manage to write the whole range.
+	 *
+	 * We don't need to care about racing delalloc as we hold i_mutex
+	 * across the reserve/allocate/unreserve calls. If there are delalloc
+	 * blocks in the range, they are ours.
+	 */
+	if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) {
+		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
+					 XFS_FSB_TO_B(mp, end_fsb) - 1);
+
+		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
+					       end_fsb - start_fsb);
+		if (error && !XFS_FORCED_SHUTDOWN(mp)) {
+			xfs_alert(mp, "%s: unable to clean up ino %lld",
+				__func__, ip->i_ino);
+			return error;
+		}
+	}
+
+	return 0;
+}
+
+const struct iomap_ops xfs_delalloc_iomap_ops = {
+	.iomap_begin		= xfs_delalloc_iomap_begin,
+	.iomap_end		= xfs_delalloc_iomap_end,
+};
+
 /*
  * Pass in a delayed allocate extent, convert it to real extents;
  * return to the caller the extent we create which maps on top of
@@ -1028,16 +1108,13 @@ xfs_file_iomap_begin(
 	bool			shared = false;
 	unsigned		lockmode;
 
+	ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)) ||
+		(flags & IOMAP_DIRECT) || IS_DAX(inode));
+	ASSERT(!(flags & IOMAP_ZERO) || XFS_IS_REALTIME_INODE(ip));
+
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
-			!IS_DAX(inode) && !XFS_IS_REALTIME_INODE(ip)) {
-		/* Reserve delalloc blocks for regular writeback. */
-		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
-				iomap);
-	}
-
 	/*
 	 * Lock the inode in the manner required for the specified operation and
 	 * check for as many conditions that would result in blocking as
@@ -1070,15 +1147,6 @@ xfs_file_iomap_begin(
 	if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
 		goto out_found;
 
-	/*
-	 * Don't need to allocate over holes when doing zeroing operations,
-	 * unless we need to COW and have an existing extent.
-	 */
-	if ((flags & IOMAP_ZERO) &&
-	    (!xfs_is_reflink_inode(ip) ||
-	     !needs_cow_for_zeroing(&imap, nimaps)))
-		goto out_found;
-
 	/*
 	 * Break shared extents if necessary. Checks for non-blocking IO have
 	 * been done up front, so we don't need to do them here.
@@ -1148,81 +1216,8 @@ xfs_file_iomap_begin(
 	return error;
 }
 
-static int
-xfs_file_iomap_end_delalloc(
-	struct xfs_inode	*ip,
-	loff_t			offset,
-	loff_t			length,
-	ssize_t			written,
-	struct iomap		*iomap)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		start_fsb;
-	xfs_fileoff_t		end_fsb;
-	int			error = 0;
-
-	/*
-	 * Behave as if the write failed if drop writes is enabled. Set the NEW
-	 * flag to force delalloc cleanup.
-	 */
-	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DROP_WRITES)) {
-		iomap->flags |= IOMAP_F_NEW;
-		written = 0;
-	}
-
-	/*
-	 * start_fsb refers to the first unused block after a short write. If
-	 * nothing was written, round offset down to point at the first block in
-	 * the range.
-	 */
-	if (unlikely(!written))
-		start_fsb = XFS_B_TO_FSBT(mp, offset);
-	else
-		start_fsb = XFS_B_TO_FSB(mp, offset + written);
-	end_fsb = XFS_B_TO_FSB(mp, offset + length);
-
-	/*
-	 * Trim delalloc blocks if they were allocated by this write and we
-	 * didn't manage to write the whole range.
-	 *
-	 * We don't need to care about racing delalloc as we hold i_mutex
-	 * across the reserve/allocate/unreserve calls. If there are delalloc
-	 * blocks in the range, they are ours.
-	 */
-	if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) {
-		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
-					 XFS_FSB_TO_B(mp, end_fsb) - 1);
-
-		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-					       end_fsb - start_fsb);
-		if (error && !XFS_FORCED_SHUTDOWN(mp)) {
-			xfs_alert(mp, "%s: unable to clean up ino %lld",
-				__func__, ip->i_ino);
-			return error;
-		}
-	}
-
-	return 0;
-}
-
-static int
-xfs_file_iomap_end(
-	struct inode		*inode,
-	loff_t			offset,
-	loff_t			length,
-	ssize_t			written,
-	unsigned		flags,
-	struct iomap		*iomap)
-{
-	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
-		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
-				length, written, iomap);
-	return 0;
-}
-
 const struct iomap_ops xfs_iomap_ops = {
 	.iomap_begin		= xfs_file_iomap_begin,
-	.iomap_end		= xfs_file_iomap_end,
 };
 
 static int
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index c6170548831b..fe4cde2c30c9 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -42,6 +42,7 @@ xfs_aligned_fsb_count(
 }
 
 extern const struct iomap_ops xfs_iomap_ops;
+extern const struct iomap_ops xfs_delalloc_iomap_ops;
 extern const struct iomap_ops xfs_xattr_iomap_ops;
 
 #endif /* __XFS_IOMAP_H__*/
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index c3e74f9128e8..fb8035e3ba0b 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -865,10 +865,10 @@ xfs_setattr_size(
 	if (newsize > oldsize) {
 		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
 		error = iomap_zero_range(inode, oldsize, newsize - oldsize,
-				&did_zeroing, &xfs_iomap_ops);
+				&did_zeroing, &xfs_delalloc_iomap_ops);
 	} else {
 		error = iomap_truncate_page(inode, newsize, &did_zeroing,
-				&xfs_iomap_ops);
+				&xfs_delalloc_iomap_ops);
 	}
 
 	if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index e0111d31140b..ac94ace45424 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1393,7 +1393,7 @@ xfs_reflink_dirty_extents(
 			if (fpos + flen > isize)
 				flen = isize - fpos;
 			error = iomap_file_dirty(VFS_I(ip), fpos, flen,
-					&xfs_iomap_ops);
+					&xfs_delalloc_iomap_ops);
 			xfs_ilock(ip, XFS_ILOCK_EXCL);
 			if (error)
 				goto out;
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: delalloc and reflink fixes & tweaks
  2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
                   ` (9 preceding siblings ...)
  2018-09-17 20:53 ` [PATCH 10/10] xfs: use a separate iomap_ops for delalloc writes Christoph Hellwig
@ 2018-09-17 21:23 ` Dave Chinner
  2018-09-18 18:17   ` Christoph Hellwig
  10 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2018-09-17 21:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Sep 17, 2018 at 10:53:44PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series starts with a for-4.19 fix originally from Dave, and then
> has various tweaks, including a fix for a potential uninitialized data
> exposure that are 4.20 material.

Nice! I'll get the fix ready for 4.19-rc5 and look at the rest for
the 4.20 tree.

Do you have any numbers that demonstrate the performance impact of
the preallocation change? We've always intended to fix this exposure
issue as you've done, I'm just interested in the sort of impact it
has (if any). Allowing extent size hints to use delalloc should
substantially improve performance for workloads that use extent size
hints, so I don't expect the news to be all bad. :)

Also, with this change to use unwritten extents for all delalloc
extents, we can start doing speculative preallocation for writes
into holes inside EOF without leaving uninitialised/unzeroed blocks
laying around. Using spec.  prealloc for "extend-via-truncate,
sequential write to fill hole" workloads would greatly benefit from
that. Did you have any plans to look at that code?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/10] xfs: fix transaction leak in xfs_reflink_allocate_cow()
  2018-09-17 20:53 ` [PATCH 01/10] xfs: fix transaction leak in xfs_reflink_allocate_cow() Christoph Hellwig
@ 2018-09-17 23:51   ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-09-17 23:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, Dave Chinner

On Mon, Sep 17, 2018 at 10:53:45PM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When xfs_reflink_allocate_cow() allocates a transaction, it drops
> the ILOCK to perform the operation. This Introduces a race condition
> where another thread modifying the file can perform the COW
> allocation operation underneath us. This result in the retry loop
> finding an allocated block and jumping straight to the conversion
> code. It does not, however, cancel the transaction it holds and so
> this gets leaked. This results in a lockdep warning:
> 
> ================================================
> WARNING: lock held when returning to user space!
> 4.18.5 #1 Not tainted
> ------------------------------------------------
> worker/6123 is leaving the kernel with locks still held!
> 1 lock held by worker/6123:
>  #0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220
> 
> And eventually the filesystem deadlocks because it runs out of log
> space that is reserved by the leaked transaction and never gets
> released.
> 
> The logic flow in xfs_reflink_allocate_cow() is a convoluted mess of
> gotos - it's no surprise that it has bug where the flow through
> several goto jumps then fails to clean up context from a non-obvious
> logic path. CLean up the logic flow and make sure every path does
> the right thing.
> 
> Reported-by: Alexander Y. Fomichev <git.user@gmail.com>
> Tested-by: Alexander Y. Fomichev <git.user@gmail.com>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200981
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> [hch: slight refactor]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_reflink.c | 127 ++++++++++++++++++++++++++-----------------
>  1 file changed, 77 insertions(+), 50 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 38f405415b88..d60d0eeed7b9 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -352,6 +352,47 @@ xfs_reflink_convert_cow(
>  	return error;
>  }
>  
> +/*
> + * Find the extent that maps the given range in the COW fork. Even if the extent
> + * is not shared we might have a preallocation for it in the COW fork. If so we
> + * use it that rather than trigger a new allocation.
> + */
> +static int
> +xfs_find_trim_cow_extent(
> +	struct xfs_inode	*ip,
> +	struct xfs_bmbt_irec	*imap,
> +	bool			*shared,
> +	bool			*found)
> +{
> +	xfs_fileoff_t		offset_fsb = imap->br_startoff;
> +	xfs_filblks_t		count_fsb = imap->br_blockcount;
> +	struct xfs_iext_cursor	icur;
> +	struct xfs_bmbt_irec	got;
> +	bool			trimmed;
> +
> +	*found = false;
> +
> +	/*
> +	 * If we don't find an overlapping extent, trim the range we need to
> +	 * allocate to fit the hole we found.
> +	 */
> +	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got) ||
> +	    got.br_startoff > offset_fsb)
> +		return xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
> +
> +	*shared = true;
> +	if (isnullstartblock(got.br_startblock)) {
> +		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> +		return 0;
> +	}
> +
> +	/* real extent found - no need to allocate */
> +	xfs_trim_extent(&got, offset_fsb, count_fsb);
> +	*imap = got;
> +	*found = true;
> +	return 0;
> +}
> +
>  /* Allocate all CoW reservations covering a range of blocks in a file. */
>  int
>  xfs_reflink_allocate_cow(
> @@ -363,78 +404,64 @@ xfs_reflink_allocate_cow(
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		offset_fsb = imap->br_startoff;
>  	xfs_filblks_t		count_fsb = imap->br_blockcount;
> -	struct xfs_bmbt_irec	got;
> -	struct xfs_trans	*tp = NULL;
> +	struct xfs_trans	*tp;
>  	int			nimaps, error = 0;
> -	bool			trimmed;
> +	bool			found;
>  	xfs_filblks_t		resaligned;
>  	xfs_extlen_t		resblks = 0;
> -	struct xfs_iext_cursor	icur;
>  
> -retry:
> -	ASSERT(xfs_is_reflink_inode(ip));
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> +	ASSERT(xfs_is_reflink_inode(ip));
>  
> -	/*
> -	 * Even if the extent is not shared we might have a preallocation for
> -	 * it in the COW fork.  If so use it.
> -	 */
> -	if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got) &&
> -	    got.br_startoff <= offset_fsb) {
> -		*shared = true;
> -
> -		/* If we have a real allocation in the COW fork we're done. */
> -		if (!isnullstartblock(got.br_startblock)) {
> -			xfs_trim_extent(&got, offset_fsb, count_fsb);
> -			*imap = got;
> -			goto convert;
> -		}
> +	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
> +	if (error || !*shared)
> +		return error;
> +	if (found)
> +		goto convert;
>  
> -		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> -	} else {
> -		error = xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed);
> -		if (error || !*shared)
> -			goto out;
> -	}
> +	resaligned = xfs_aligned_fsb_count(imap->br_startoff,
> +		imap->br_blockcount, xfs_get_cowextsz_hint(ip));
> +	resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
>  
> -	if (!tp) {
> -		resaligned = xfs_aligned_fsb_count(imap->br_startoff,
> -			imap->br_blockcount, xfs_get_cowextsz_hint(ip));
> -		resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned);
> +	xfs_iunlock(ip, *lockmode);
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
> +	*lockmode = XFS_ILOCK_EXCL;
> +	xfs_ilock(ip, *lockmode);
>  
> -		xfs_iunlock(ip, *lockmode);
> -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
> -		*lockmode = XFS_ILOCK_EXCL;
> -		xfs_ilock(ip, *lockmode);
> +	if (error)
> +		return error;
>  
> -		if (error)
> -			return error;
> +	error = xfs_qm_dqattach_locked(ip, false);
> +	if (error)
> +		goto out_trans_cancel;
>  
> -		error = xfs_qm_dqattach_locked(ip, false);
> -		if (error)
> -			goto out;
> -		goto retry;
> +	/*
> +	 * Check for an overlapping extent again now that we dropped the ilock.
> +	 */
> +	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
> +	if (error || !*shared)
> +		goto out_trans_cancel;
> +	if (found) {
> +		xfs_trans_cancel(tp);
> +		goto convert;
>  	}
>  
>  	error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
>  			XFS_QMOPT_RES_REGBLKS);
>  	if (error)
> -		goto out;
> +		goto out_trans_cancel;
>  
>  	xfs_trans_ijoin(tp, ip, 0);
>  
> -	nimaps = 1;
> -
>  	/* Allocate the entire reservation as unwritten blocks. */
> +	nimaps = 1;
>  	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
>  			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
>  			resblks, imap, &nimaps);
>  	if (error)
> -		goto out_trans_cancel;
> +		goto out_unreserve;
>  
>  	xfs_inode_set_cowblocks_tag(ip);
> -
> -	/* Finish up. */
>  	error = xfs_trans_commit(tp);
>  	if (error)
>  		return error;
> @@ -447,12 +474,12 @@ xfs_reflink_allocate_cow(
>  		return -ENOSPC;
>  convert:
>  	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb);
> -out_trans_cancel:
> +
> +out_unreserve:
>  	xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0,
>  			XFS_QMOPT_RES_REGBLKS);
> -out:
> -	if (tp)
> -		xfs_trans_cancel(tp);
> +out_trans_cancel:
> +	xfs_trans_cancel(tp);
>  	return error;
>  }
>  
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: delalloc and reflink fixes & tweaks
  2018-09-17 21:23 ` delalloc and reflink fixes & tweaks Dave Chinner
@ 2018-09-18 18:17   ` Christoph Hellwig
  2018-09-18 23:00     ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-18 18:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Tue, Sep 18, 2018 at 07:23:23AM +1000, Dave Chinner wrote:
> Do you have any numbers that demonstrate the performance impact of
> the preallocation change? We've always intended to fix this exposure
> issue as you've done, I'm just interested in the sort of impact it
> has (if any).

For the absolute worst case - completely random 4k writes to a sparse
file I see a slowdown of about 3%.  Which is less than the improvement
that we saw from removing buffer heads.

> Also, with this change to use unwritten extents for all delalloc
> extents, we can start doing speculative preallocation for writes
> into holes inside EOF without leaving uninitialised/unzeroed blocks
> laying around.

Careful.  We already have issues because delalloc blocks before EOF don't
ever get reclaimed.  This triggers up on xfs/442 with 1k blocksize for
me.  I actually have a fix for that now, but that will require dropping
one of the cleanup patches from this series, so expect a respin.

If we want to more generic preallocation I guess we should follow the
example of the COW direct I/O path and mark all preallocated extents
as unwritten - that way we know delalloc extents that have the unwritten
bit set can be safely reclaimed.  But that is more work than I want
to do for this merge window at least.

> Using spec.  prealloc for "extend-via-truncate,
> sequential write to fill hole" workloads would greatly benefit from
> that. Did you have any plans to look at that code?

Not anytime soon.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: delalloc and reflink fixes & tweaks
  2018-09-18 18:17   ` Christoph Hellwig
@ 2018-09-18 23:00     ` Dave Chinner
  2018-09-19  5:40       ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2018-09-18 23:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Tue, Sep 18, 2018 at 08:17:28PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 18, 2018 at 07:23:23AM +1000, Dave Chinner wrote:
> > Do you have any numbers that demonstrate the performance impact of
> > the preallocation change? We've always intended to fix this exposure
> > issue as you've done, I'm just interested in the sort of impact it
> > has (if any).
> 
> For the absolute worst case - completely random 4k writes to a sparse
> file I see a slowdown of about 3%.  Which is less than the improvement
> that we saw from removing buffer heads.

Nice! I didn't expect a huge hit, and this is right in the ballpark
of what I was expecting. I think we can live with this.

> > Also, with this change to use unwritten extents for all delalloc
> > extents, we can start doing speculative preallocation for writes
> > into holes inside EOF without leaving uninitialised/unzeroed blocks
> > laying around.
> 
> Careful.  We already have issues because delalloc blocks before EOF don't
> ever get reclaimed.  This triggers up on xfs/442 with 1k blocksize for
> me.  I actually have a fix for that now, but that will require dropping
> one of the cleanup patches from this series, so expect a respin.

Yeah, I didn't say everything already worked, just that using
preallocation for delalloc gets rid of the stale data exposure
problem that has prevented us from doing this in the past.

(Technically speaking, it has already been done in the past - ~1999,
IIRC - but that got yanked pretty quickly when the stale data
exposure problems were reported ;)

> If we want to more generic preallocation I guess we should follow the
> example of the COW direct I/O path and mark all preallocated extents
> as unwritten - that way we know delalloc extents that have the unwritten
> bit set can be safely reclaimed.  But that is more work than I want
> to do for this merge window at least.

That's sounds like a good idea, and a good direction to head
towards. No immediate hurry, just trying to understand where you
might be going with these changes.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: delalloc and reflink fixes & tweaks
  2018-09-18 23:00     ` Dave Chinner
@ 2018-09-19  5:40       ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-19  5:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Wed, Sep 19, 2018 at 09:00:57AM +1000, Dave Chinner wrote:
> > Careful.  We already have issues because delalloc blocks before EOF don't
> > ever get reclaimed.  This triggers up on xfs/442 with 1k blocksize for
> > me.  I actually have a fix for that now, but that will require dropping
> > one of the cleanup patches from this series, so expect a respin.
> 
> Yeah, I didn't say everything already worked, just that using
> preallocation for delalloc gets rid of the stale data exposure
> problem that has prevented us from doing this in the past.

And I think I have to do the mark preallocations as unwritten now
given that my other fix idea that I wasted the whole day on yesterday
didn't turn out to actually work properly.  Sigh..

> That's sounds like a good idea, and a good direction to head
> towards. No immediate hurry, just trying to understand where you
> might be going with these changes.

I've been working on adding (or resurrecting) and always_cow mode,
which helped to expose all kinds of interesting bugs.  That mode
again will be that base for atomic writes (including safe overwrites)
and a zone block device (aka SMR) mode that also can't write in place.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 02/10] xfs: don't bring in extents in xfs_bmap_punch_delalloc_range
  2018-09-17 20:53 ` [PATCH 02/10] xfs: don't bring in extents in xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2018-09-20 20:23   ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-09-20 20:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Sep 17, 2018 at 10:53:46PM +0200, Christoph Hellwig wrote:
> This function is only used to punch out delayed allocations on I/O
> failure, which means we need to have read the extents earlier.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_bmap_util.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index addbd74ecd8e..418e1e725f3a 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -702,13 +702,9 @@ xfs_bmap_punch_delalloc_range(
>  	struct xfs_iext_cursor	icur;
>  	int			error = 0;
>  
> -	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> -		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
> -		if (error)
> -			goto out_unlock;
> -	}
> +	ASSERT(ifp->if_flags & XFS_IFEXTENTS);
>  
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
>  		goto out_unlock;
>  
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 03/10] xfs: remove XFS_IO_INVALID
  2018-09-17 20:53 ` [PATCH 03/10] xfs: remove XFS_IO_INVALID Christoph Hellwig
@ 2018-09-20 20:31   ` Darrick J. Wong
  2018-09-27 18:38     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-09-20 20:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Sep 17, 2018 at 10:53:47PM +0200, Christoph Hellwig wrote:
> The invalid state isn't any different from a hole, so merge the two
> states.  Use the more descriptive hole name, but keep it as the first
> value of the enum to catch uninitialized fields.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

It's probably worth mentioning that I'm just reading patches *without*
having previously shoved all the day's patches into a branch, built it,
and started xfstests.... :)

> ---
>  fs/xfs/xfs_aops.c |  4 ++--
>  fs/xfs/xfs_aops.h | 14 ++++++--------
>  2 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 49f5f5896a43..338b9d9984e0 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -917,7 +917,7 @@ xfs_vm_writepage(
>  	struct writeback_control *wbc)
>  {
>  	struct xfs_writepage_ctx wpc = {
> -		.io_type = XFS_IO_INVALID,
> +		.io_type = XFS_IO_HOLE,

Hm.  So I guess if we ever saw XFS_IO_INVALID that meant "we never did
find any extents and so never set io_type", right?

Just checking assumptions here, having let everything page out of my
brain these past 6 weeks...

--D

>  	};
>  	int			ret;
>  
> @@ -933,7 +933,7 @@ xfs_vm_writepages(
>  	struct writeback_control *wbc)
>  {
>  	struct xfs_writepage_ctx wpc = {
> -		.io_type = XFS_IO_INVALID,
> +		.io_type = XFS_IO_HOLE,
>  	};
>  	int			ret;
>  
> diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
> index 9af867951a10..494b4338446e 100644
> --- a/fs/xfs/xfs_aops.h
> +++ b/fs/xfs/xfs_aops.h
> @@ -12,21 +12,19 @@ extern struct bio_set xfs_ioend_bioset;
>   * Types of I/O for bmap clustering and I/O completion tracking.
>   */
>  enum {
> -	XFS_IO_INVALID,		/* initial state */
> +	XFS_IO_HOLE,		/* covers region without any block allocation */
>  	XFS_IO_DELALLOC,	/* covers delalloc region */
>  	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
>  	XFS_IO_OVERWRITE,	/* covers already allocated extent */
>  	XFS_IO_COW,		/* covers copy-on-write extent */
> -	XFS_IO_HOLE,		/* covers region without any block allocation */
>  };
>  
>  #define XFS_IO_TYPES \
> -	{ XFS_IO_INVALID,		"invalid" }, \
> -	{ XFS_IO_DELALLOC,		"delalloc" }, \
> -	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
> -	{ XFS_IO_OVERWRITE,		"overwrite" }, \
> -	{ XFS_IO_COW,			"CoW" }, \
> -	{ XFS_IO_HOLE,			"hole" }
> +	{ XFS_IO_HOLE,			"hole" },	\
> +	{ XFS_IO_DELALLOC,		"delalloc" },	\
> +	{ XFS_IO_UNWRITTEN,		"unwritten" },	\
> +	{ XFS_IO_OVERWRITE,		"overwrite" },	\
> +	{ XFS_IO_COW,			"CoW" }
>  
>  /*
>   * Structure for buffered I/O completions.
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/10] xfs: simplify the IOMAP_ZERO check in xfs_file_iomap_begin a bit
  2018-09-17 20:53 ` [PATCH 04/10] xfs: simplify the IOMAP_ZERO check in xfs_file_iomap_begin a bit Christoph Hellwig
@ 2018-09-20 20:31   ` Darrick J. Wong
  2018-09-26 15:17   ` Brian Foster
  1 sibling, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-09-20 20:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Sep 17, 2018 at 10:53:48PM +0200, Christoph Hellwig wrote:
> Merge the two cases for reflink vs not into a single one.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iomap.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 6320aca39f39..9595a3c59ade 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1048,16 +1048,20 @@ xfs_file_iomap_begin(
>  	if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
>  		goto out_found;
>  
> +	/*
> +	 * Don't need to allocate over holes when doing zeroing operations,
> +	 * unless we need to COW and have an existing extent.
> +	 */
> +	if ((flags & IOMAP_ZERO) &&
> +	    (!xfs_is_reflink_inode(ip) ||
> +	     !needs_cow_for_zeroing(&imap, nimaps)))
> +		goto out_found;
> +
>  	/*
>  	 * Break shared extents if necessary. Checks for non-blocking IO have
>  	 * been done up front, so we don't need to do them here.
>  	 */
>  	if (xfs_is_reflink_inode(ip)) {
> -		/* if zeroing doesn't need COW allocation, then we are done. */
> -		if ((flags & IOMAP_ZERO) &&
> -		    !needs_cow_for_zeroing(&imap, nimaps))
> -			goto out_found;
> -
>  		if (flags & IOMAP_DIRECT) {
>  			/* may drop and re-acquire the ilock */
>  			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
> @@ -1074,10 +1078,6 @@ xfs_file_iomap_begin(
>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
>  	}
>  
> -	/* Don't need to allocate over holes when doing zeroing operations. */
> -	if (flags & IOMAP_ZERO)
> -		goto out_found;
> -
>  	if (!imap_needs_alloc(inode, &imap, nimaps))
>  		goto out_found;
>  
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/10] xfs: simplify the IOMAP_ZERO check in xfs_file_iomap_begin a bit
  2018-09-17 20:53 ` [PATCH 04/10] xfs: simplify the IOMAP_ZERO check in xfs_file_iomap_begin a bit Christoph Hellwig
  2018-09-20 20:31   ` Darrick J. Wong
@ 2018-09-26 15:17   ` Brian Foster
  2018-09-27 18:40     ` Christoph Hellwig
  1 sibling, 1 reply; 26+ messages in thread
From: Brian Foster @ 2018-09-26 15:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Sep 17, 2018 at 10:53:48PM +0200, Christoph Hellwig wrote:
> Merge the two cases for reflink vs not into a single one.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 6320aca39f39..9595a3c59ade 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1048,16 +1048,20 @@ xfs_file_iomap_begin(
>  	if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
>  		goto out_found;
>  
> +	/*
> +	 * Don't need to allocate over holes when doing zeroing operations,
> +	 * unless we need to COW and have an existing extent.
> +	 */
> +	if ((flags & IOMAP_ZERO) &&
> +	    (!xfs_is_reflink_inode(ip) ||
> +	     !needs_cow_for_zeroing(&imap, nimaps)))
> +		goto out_found;
> +
>  	/*
>  	 * Break shared extents if necessary. Checks for non-blocking IO have
>  	 * been done up front, so we don't need to do them here.
>  	 */
>  	if (xfs_is_reflink_inode(ip)) {
> -		/* if zeroing doesn't need COW allocation, then we are done. */
> -		if ((flags & IOMAP_ZERO) &&
> -		    !needs_cow_for_zeroing(&imap, nimaps))
> -			goto out_found;
> -
>  		if (flags & IOMAP_DIRECT) {
>  			/* may drop and re-acquire the ilock */
>  			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
> @@ -1074,10 +1078,6 @@ xfs_file_iomap_begin(
>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
>  	}
>  
> -	/* Don't need to allocate over holes when doing zeroing operations. */
> -	if (flags & IOMAP_ZERO)
> -		goto out_found;
> -

Hmm, can't this subtlely change behavior? Suppose we get a zero range
call over a non-shared delalloc extent of a reflinked file. The current
code gets into the reflink branch above, needs_cow_for_zeroing() returns
true because it's not an unwritten extent, xfs_reflink_reserve_cow()
doesn't ultimately do anything because the range is not shared, we skip
out via the check above and presumably the core iomap code zeroes the
page.

With this change, it looks like that scenario plays out the same until
we get to imap_needs_alloc(), which returns true and brings us to do an
allocation... I guess this changes a bit in the follow on patches, but
the IOMAP_ZERO check that remains still makes the logic look funny.
Should we ever get here with IOMAP_ZERO after the final patch to switch
to separate ops?

Brian

>  	if (!imap_needs_alloc(inode, &imap, nimaps))
>  		goto out_found;
>  
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 07/10] xfs: handle extent size hints in xfs_file_iomap_begin_delay
  2018-09-17 20:53 ` [PATCH 07/10] xfs: handle extent size hints in xfs_file_iomap_begin_delay Christoph Hellwig
@ 2018-09-26 15:17   ` Brian Foster
  2018-10-01 12:38     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Foster @ 2018-09-26 15:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Sep 17, 2018 at 10:53:51PM +0200, Christoph Hellwig wrote:
> For files with extent size hints we always allocate unwritten extents
> first and then convert them to real extents later to avoid exposing
> stale data outside the blocks actually written.  But there is no
> reason we can't do unwritten extent allocations from delalloc blocks,
> in fact we already do that for COW operations.
> 
> Note that this does not handle files on the RT subvolume (yet), as
> we removed code handling RT allocations from the delalloc path this
> would be a slightly bigger project, and it doesn't really help with
> simplifying the COW path either.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c | 17 ++++-------------
>  1 file changed, 4 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index e12ff5e9a8ec..2d08ace09e25 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -533,7 +533,6 @@ xfs_file_iomap_begin_delay(
>  	xfs_fsblock_t		prealloc_blocks = 0;
>  
>  	ASSERT(!XFS_IS_REALTIME_INODE(ip));
> -	ASSERT(!xfs_get_extsz_hint(ip));
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  
> @@ -1035,7 +1034,7 @@ xfs_file_iomap_begin(
>  		return -EIO;
>  
>  	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
> -			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> +			!IS_DAX(inode) && !XFS_IS_REALTIME_INODE(ip)) {

AFAICT this fails to honor the extent size hint for buffered writes:

# xfs_io -fc "truncate 1m" -c "extsize 32k" -c "pwrite 0 4k" -c "fiemap -v" /mnt/file
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0001 sec (35.511 MiB/sec and 9090.9091 ops/sec)
/mnt/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..7]:          80..87               8   0x1
   1: [8..2047]:       hole              2040

... and with an unpatched kernel:

# xfs_io -fc "truncate 1m" -c "extsize 32k" -c "pwrite 0 4k" -c "fiemap -v" /mnt/file                                                                                                                                       
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0017 sec (2.283 MiB/sec and 584.4535 ops/sec)
/mnt/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..7]:          192..199             8   0x0
   1: [8..63]:         200..255            56 0x801
   2: [64..2047]:      hole              1984

I'd guess this is due to the extent overlap check in
xfs_bmap_extsize_align(), but I haven't verified.

Brian

>  		/* Reserve delalloc blocks for regular writeback. */
>  		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
>  				iomap);
> @@ -1088,17 +1087,9 @@ xfs_file_iomap_begin(
>  	 * been done up front, so we don't need to do them here.
>  	 */
>  	if (xfs_is_reflink_inode(ip)) {
> -		if (flags & IOMAP_DIRECT) {
> -			/* may drop and re-acquire the ilock */
> -			error = xfs_reflink_allocate_cow(ip, &imap, &shared,
> -					&lockmode);
> -			if (error)
> -				goto out_unlock;
> -		} else {
> -			error = xfs_reflink_reserve_cow(ip, &imap, &shared);
> -			if (error)
> -				goto out_unlock;
> -		}
> +		error = xfs_reflink_allocate_cow(ip, &imap, &shared, &lockmode);
> +		if (error)
> +			goto out_unlock;
>  
>  		end_fsb = imap.br_startoff + imap.br_blockcount;
>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/10] xfs: use a separate iomap_ops for delalloc writes
  2018-09-17 20:53 ` [PATCH 10/10] xfs: use a separate iomap_ops for delalloc writes Christoph Hellwig
@ 2018-09-26 15:18   ` Brian Foster
  2018-10-01 12:40     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Foster @ 2018-09-26 15:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Sep 17, 2018 at 10:53:54PM +0200, Christoph Hellwig wrote:
> Introduce a new xfs_delalloc_iomap_ops instance that is passed to
> iomap_apply when we are doing delayed allocations.  This keeps the code
> more neatly separated for future work.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_bmap_util.c |   3 +-
>  fs/xfs/xfs_file.c      |   6 +-
>  fs/xfs/xfs_iomap.c     | 177 ++++++++++++++++++++---------------------
>  fs/xfs/xfs_iomap.h     |   1 +
>  fs/xfs/xfs_iops.c      |   4 +-
>  fs/xfs/xfs_reflink.c   |   2 +-
>  6 files changed, 95 insertions(+), 98 deletions(-)
> 
...
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 9079196bbc35..1bfc40ce581a 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -512,8 +512,16 @@ xfs_iomap_prealloc_size(
>  	return alloc_blocks;
>  }
>  
> +static int xfs_file_iomap_begin(struct inode *inode, loff_t offset,
> +		loff_t length, unsigned flags, struct iomap *iomap);
> +
> +/*
> + * Start writing to a file, allocating blocks using delayed allocations if
> + * needed.  Note that in case of doing COW overwrites the iomap needs to return
> + * the address of the existing data.
> + */
>  static int
> -xfs_file_iomap_begin_delay(
> +xfs_delalloc_iomap_begin(

Ok, but wouldn't something like xfs_buffered_iomap_begin/end() be a
better name? Technically a real extent may already exist in this
codepath, so I find the delalloc name kind of confusing.

>  	struct inode		*inode,
>  	loff_t			offset,
>  	loff_t			count,
...
> @@ -658,6 +671,73 @@ xfs_file_iomap_begin_delay(
>  	return error;
>  }
>  
> +static int
> +xfs_delalloc_iomap_end(
> +	struct inode		*inode,
> +	loff_t			offset,
> +	loff_t			length,
> +	ssize_t			written,
> +	unsigned		flags,
> +	struct iomap		*iomap)
> +{
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_fileoff_t		start_fsb;
> +	xfs_fileoff_t		end_fsb;
> +	int			error = 0;
> +
> +	if (iomap->type != IOMAP_DELALLOC)
> +		return 0;

Any reason we don't continue to filter out !IOMAP_WRITE as well?

Brian

> +
> +	/*
> +	 * Behave as if the write failed if drop writes is enabled. Set the NEW
> +	 * flag to force delalloc cleanup.
> +	 */
> +	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DROP_WRITES)) {
> +		iomap->flags |= IOMAP_F_NEW;
> +		written = 0;
> +	}
> +
> +	/*
> +	 * start_fsb refers to the first unused block after a short write. If
> +	 * nothing was written, round offset down to point at the first block in
> +	 * the range.
> +	 */
> +	if (unlikely(!written))
> +		start_fsb = XFS_B_TO_FSBT(mp, offset);
> +	else
> +		start_fsb = XFS_B_TO_FSB(mp, offset + written);
> +	end_fsb = XFS_B_TO_FSB(mp, offset + length);
> +
> +	/*
> +	 * Trim delalloc blocks if they were allocated by this write and we
> +	 * didn't manage to write the whole range.
> +	 *
> +	 * We don't need to care about racing delalloc as we hold i_mutex
> +	 * across the reserve/allocate/unreserve calls. If there are delalloc
> +	 * blocks in the range, they are ours.
> +	 */
> +	if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) {
> +		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
> +					 XFS_FSB_TO_B(mp, end_fsb) - 1);
> +
> +		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> +					       end_fsb - start_fsb);
> +		if (error && !XFS_FORCED_SHUTDOWN(mp)) {
> +			xfs_alert(mp, "%s: unable to clean up ino %lld",
> +				__func__, ip->i_ino);
> +			return error;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +const struct iomap_ops xfs_delalloc_iomap_ops = {
> +	.iomap_begin		= xfs_delalloc_iomap_begin,
> +	.iomap_end		= xfs_delalloc_iomap_end,
> +};
> +
>  /*
>   * Pass in a delayed allocate extent, convert it to real extents;
>   * return to the caller the extent we create which maps on top of
> @@ -1028,16 +1108,13 @@ xfs_file_iomap_begin(
>  	bool			shared = false;
>  	unsigned		lockmode;
>  
> +	ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)) ||
> +		(flags & IOMAP_DIRECT) || IS_DAX(inode));
> +	ASSERT(!(flags & IOMAP_ZERO) || XFS_IS_REALTIME_INODE(ip));
> +
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
> -			!IS_DAX(inode) && !XFS_IS_REALTIME_INODE(ip)) {
> -		/* Reserve delalloc blocks for regular writeback. */
> -		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
> -				iomap);
> -	}
> -
>  	/*
>  	 * Lock the inode in the manner required for the specified operation and
>  	 * check for as many conditions that would result in blocking as
> @@ -1070,15 +1147,6 @@ xfs_file_iomap_begin(
>  	if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
>  		goto out_found;
>  
> -	/*
> -	 * Don't need to allocate over holes when doing zeroing operations,
> -	 * unless we need to COW and have an existing extent.
> -	 */
> -	if ((flags & IOMAP_ZERO) &&
> -	    (!xfs_is_reflink_inode(ip) ||
> -	     !needs_cow_for_zeroing(&imap, nimaps)))
> -		goto out_found;
> -
>  	/*
>  	 * Break shared extents if necessary. Checks for non-blocking IO have
>  	 * been done up front, so we don't need to do them here.
> @@ -1148,81 +1216,8 @@ xfs_file_iomap_begin(
>  	return error;
>  }
>  
> -static int
> -xfs_file_iomap_end_delalloc(
> -	struct xfs_inode	*ip,
> -	loff_t			offset,
> -	loff_t			length,
> -	ssize_t			written,
> -	struct iomap		*iomap)
> -{
> -	struct xfs_mount	*mp = ip->i_mount;
> -	xfs_fileoff_t		start_fsb;
> -	xfs_fileoff_t		end_fsb;
> -	int			error = 0;
> -
> -	/*
> -	 * Behave as if the write failed if drop writes is enabled. Set the NEW
> -	 * flag to force delalloc cleanup.
> -	 */
> -	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DROP_WRITES)) {
> -		iomap->flags |= IOMAP_F_NEW;
> -		written = 0;
> -	}
> -
> -	/*
> -	 * start_fsb refers to the first unused block after a short write. If
> -	 * nothing was written, round offset down to point at the first block in
> -	 * the range.
> -	 */
> -	if (unlikely(!written))
> -		start_fsb = XFS_B_TO_FSBT(mp, offset);
> -	else
> -		start_fsb = XFS_B_TO_FSB(mp, offset + written);
> -	end_fsb = XFS_B_TO_FSB(mp, offset + length);
> -
> -	/*
> -	 * Trim delalloc blocks if they were allocated by this write and we
> -	 * didn't manage to write the whole range.
> -	 *
> -	 * We don't need to care about racing delalloc as we hold i_mutex
> -	 * across the reserve/allocate/unreserve calls. If there are delalloc
> -	 * blocks in the range, they are ours.
> -	 */
> -	if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) {
> -		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
> -					 XFS_FSB_TO_B(mp, end_fsb) - 1);
> -
> -		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> -					       end_fsb - start_fsb);
> -		if (error && !XFS_FORCED_SHUTDOWN(mp)) {
> -			xfs_alert(mp, "%s: unable to clean up ino %lld",
> -				__func__, ip->i_ino);
> -			return error;
> -		}
> -	}
> -
> -	return 0;
> -}
> -
> -static int
> -xfs_file_iomap_end(
> -	struct inode		*inode,
> -	loff_t			offset,
> -	loff_t			length,
> -	ssize_t			written,
> -	unsigned		flags,
> -	struct iomap		*iomap)
> -{
> -	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
> -		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
> -				length, written, iomap);
> -	return 0;
> -}
> -
>  const struct iomap_ops xfs_iomap_ops = {
>  	.iomap_begin		= xfs_file_iomap_begin,
> -	.iomap_end		= xfs_file_iomap_end,
>  };
>  
>  static int
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index c6170548831b..fe4cde2c30c9 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -42,6 +42,7 @@ xfs_aligned_fsb_count(
>  }
>  
>  extern const struct iomap_ops xfs_iomap_ops;
> +extern const struct iomap_ops xfs_delalloc_iomap_ops;
>  extern const struct iomap_ops xfs_xattr_iomap_ops;
>  
>  #endif /* __XFS_IOMAP_H__*/
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index c3e74f9128e8..fb8035e3ba0b 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -865,10 +865,10 @@ xfs_setattr_size(
>  	if (newsize > oldsize) {
>  		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
>  		error = iomap_zero_range(inode, oldsize, newsize - oldsize,
> -				&did_zeroing, &xfs_iomap_ops);
> +				&did_zeroing, &xfs_delalloc_iomap_ops);
>  	} else {
>  		error = iomap_truncate_page(inode, newsize, &did_zeroing,
> -				&xfs_iomap_ops);
> +				&xfs_delalloc_iomap_ops);
>  	}
>  
>  	if (error)
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index e0111d31140b..ac94ace45424 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1393,7 +1393,7 @@ xfs_reflink_dirty_extents(
>  			if (fpos + flen > isize)
>  				flen = isize - fpos;
>  			error = iomap_file_dirty(VFS_I(ip), fpos, flen,
> -					&xfs_iomap_ops);
> +					&xfs_delalloc_iomap_ops);
>  			xfs_ilock(ip, XFS_ILOCK_EXCL);
>  			if (error)
>  				goto out;
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 03/10] xfs: remove XFS_IO_INVALID
  2018-09-20 20:31   ` Darrick J. Wong
@ 2018-09-27 18:38     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-27 18:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Thu, Sep 20, 2018 at 01:31:27PM -0700, Darrick J. Wong wrote:
> >  {
> >  	struct xfs_writepage_ctx wpc = {
> > -		.io_type = XFS_IO_INVALID,
> > +		.io_type = XFS_IO_HOLE,
> 
> Hm.  So I guess if we ever saw XFS_IO_INVALID that meant "we never did
> find any extents and so never set io_type", right?

Yes.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/10] xfs: simplify the IOMAP_ZERO check in xfs_file_iomap_begin a bit
  2018-09-26 15:17   ` Brian Foster
@ 2018-09-27 18:40     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-09-27 18:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Wed, Sep 26, 2018 at 11:17:12AM -0400, Brian Foster wrote:
> With this change, it looks like that scenario plays out the same until
> we get to imap_needs_alloc(), which returns true and brings us to do an
> allocation... I guess this changes a bit in the follow on patches, but
> the IOMAP_ZERO check that remains still makes the logic look funny.
> Should we ever get here with IOMAP_ZERO after the final patch to switch
> to separate ops?

Good point.  I thought we wouldn't ever get here, but until the next
patch that isn't actually true.  I'll reorder and add asserts.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 07/10] xfs: handle extent size hints in xfs_file_iomap_begin_delay
  2018-09-26 15:17   ` Brian Foster
@ 2018-10-01 12:38     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-10-01 12:38 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Wed, Sep 26, 2018 at 11:17:24AM -0400, Brian Foster wrote:
> I'd guess this is due to the extent overlap check in
> xfs_bmap_extsize_align(), but I haven't verified.

The problem is that we don't honor extent size hints at all when
converting delalloc to real extents.  This will be a much bigger
change, so I'll defer it for now.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/10] xfs: use a separate iomap_ops for delalloc writes
  2018-09-26 15:18   ` Brian Foster
@ 2018-10-01 12:40     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2018-10-01 12:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Wed, Sep 26, 2018 at 11:18:06AM -0400, Brian Foster wrote:
> >  static int
> > -xfs_file_iomap_begin_delay(
> > +xfs_delalloc_iomap_begin(
> 
> Ok, but wouldn't something like xfs_buffered_iomap_begin/end() be a
> better name? Technically a real extent may already exist in this
> codepath, so I find the delalloc name kind of confusing.

The point is that it does a delalloc allocation.  I'd always rather
document what it does than the (current) use case.

Either way I've dropped the patch for now, as it seems a bit pointless
as long as we still don't use delayed allocations for files with
extent size hints.

> > +	xfs_fileoff_t		start_fsb;
> > +	xfs_fileoff_t		end_fsb;
> > +	int			error = 0;
> > +
> > +	if (iomap->type != IOMAP_DELALLOC)
> > +		return 0;
> 
> Any reason we don't continue to filter out !IOMAP_WRITE as well?

We are only using it for writes (or zeroing, but the same logic should
apply there)

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-10-01 19:18 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-17 20:53 delalloc and reflink fixes & tweaks Christoph Hellwig
2018-09-17 20:53 ` [PATCH 01/10] xfs: fix transaction leak in xfs_reflink_allocate_cow() Christoph Hellwig
2018-09-17 23:51   ` Darrick J. Wong
2018-09-17 20:53 ` [PATCH 02/10] xfs: don't bring in extents in xfs_bmap_punch_delalloc_range Christoph Hellwig
2018-09-20 20:23   ` Darrick J. Wong
2018-09-17 20:53 ` [PATCH 03/10] xfs: remove XFS_IO_INVALID Christoph Hellwig
2018-09-20 20:31   ` Darrick J. Wong
2018-09-27 18:38     ` Christoph Hellwig
2018-09-17 20:53 ` [PATCH 04/10] xfs: simplify the IOMAP_ZERO check in xfs_file_iomap_begin a bit Christoph Hellwig
2018-09-20 20:31   ` Darrick J. Wong
2018-09-26 15:17   ` Brian Foster
2018-09-27 18:40     ` Christoph Hellwig
2018-09-17 20:53 ` [PATCH 05/10] xfs: handle zeroing in xfs_file_iomap_begin_delay Christoph Hellwig
2018-09-17 20:53 ` [PATCH 06/10] xfs: always allocate blocks as unwritten for file data Christoph Hellwig
2018-09-17 20:53 ` [PATCH 07/10] xfs: handle extent size hints in xfs_file_iomap_begin_delay Christoph Hellwig
2018-09-26 15:17   ` Brian Foster
2018-10-01 12:38     ` Christoph Hellwig
2018-09-17 20:53 ` [PATCH 08/10] xfs: remove the unused shared argument to xfs_reflink_reserve_cow Christoph Hellwig
2018-09-17 20:53 ` [PATCH 09/10] xfs: remove the unused trimmed argument from xfs_reflink_trim_around_shared Christoph Hellwig
2018-09-17 20:53 ` [PATCH 10/10] xfs: use a separate iomap_ops for delalloc writes Christoph Hellwig
2018-09-26 15:18   ` Brian Foster
2018-10-01 12:40     ` Christoph Hellwig
2018-09-17 21:23 ` delalloc and reflink fixes & tweaks Dave Chinner
2018-09-18 18:17   ` Christoph Hellwig
2018-09-18 23:00     ` Dave Chinner
2018-09-19  5:40       ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.