linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/4] xfs: fix stale disk exposure after crash
@ 2020-05-23 16:49 Darrick J. Wong
  2020-05-23 16:49 ` [PATCH 1/4] xfs: don't fail unwritten extent conversion on writeback due to edquot Darrick J. Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Darrick J. Wong @ 2020-05-23 16:49 UTC (permalink / raw)
  To: darrick.wong
  Cc: Christoph Hellwig, Brian Foster, Christoph Hellwig, linux-xfs,
	hch, bfoster

Hi all,

This set of patches try to shrink the window during which a crash during
writeback can expose stale disk contents.  The first patch causes
delalloc reservations to be converted to unwritten extents for any
writeback that's going on within EOF.  The second patch fixes a minor
error encountered during writeback; and the third patch fixes
speculative preallocation to work when the EOF block could be unwritten.

v3 puts the extent tree walk into the dynamic prealloc sizing function.
v4 fixes a bug with walking the extent map backwards.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=stale-exposure

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=stale-exposure
---
 fs/xfs/libxfs/xfs_bmap.c |   29 +++++++-----
 fs/xfs/xfs_iomap.c       |  107 +++++++++++++++++++++++-----------------------
 2 files changed, 71 insertions(+), 65 deletions(-)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] xfs: don't fail unwritten extent conversion on writeback due to edquot
  2020-05-23 16:49 [PATCH v4 0/4] xfs: fix stale disk exposure after crash Darrick J. Wong
@ 2020-05-23 16:49 ` Darrick J. Wong
  2020-05-23 16:49 ` [PATCH 2/4] xfs: measure all contiguous previous extents for prealloc size Darrick J. Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2020-05-23 16:49 UTC (permalink / raw)
  To: darrick.wong; +Cc: Brian Foster, Christoph Hellwig, linux-xfs, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

During writeback, it's possible for the quota block reservation in
xfs_iomap_write_unwritten to fail with EDQUOT because we hit the quota
limit.  This causes writeback errors for data that was already written
to disk, when it's not even guaranteed that the bmbt will expand to
exceed the quota limit.  Irritatingly, this condition is reported to
userspace as EIO by fsync, which is confusing.

We wrote the data, so allow the reservation.  That might put us slightly
above the hard limit, but it's better than losing data after a write.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index bb590a267a7f..ac970b13b1f8 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -563,7 +563,7 @@ xfs_iomap_write_unwritten(
 		xfs_trans_ijoin(tp, ip, 0);
 
 		error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
-				XFS_QMOPT_RES_REGBLKS);
+				XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES);
 		if (error)
 			goto error_on_bmapi_transaction;
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] xfs: measure all contiguous previous extents for prealloc size
  2020-05-23 16:49 [PATCH v4 0/4] xfs: fix stale disk exposure after crash Darrick J. Wong
  2020-05-23 16:49 ` [PATCH 1/4] xfs: don't fail unwritten extent conversion on writeback due to edquot Darrick J. Wong
@ 2020-05-23 16:49 ` Darrick J. Wong
  2020-05-24  9:14   ` Christoph Hellwig
  2020-05-24 17:16   ` [PATCH v2 " Darrick J. Wong
  2020-05-23 16:49 ` [PATCH 3/4] xfs: refactor xfs_iomap_prealloc_size Darrick J. Wong
  2020-05-23 16:49 ` [PATCH 4/4] xfs: force writes to delalloc regions to unwritten Darrick J. Wong
  3 siblings, 2 replies; 13+ messages in thread
From: Darrick J. Wong @ 2020-05-23 16:49 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When we're estimating a new speculative preallocation length for an
extending write, we should walk backwards through the extent list to
determine the number of number of blocks that are physically and
logically contiguous with the write offset, and use that as an input to
the preallocation size computation.

This way, preallocation length is truly measured by the effectiveness of
the allocator in giving us contiguous allocations without being
influenced by the state of a given extent.  This fixes both the problem
where ZERO_RANGE within an EOF can reduce preallocation, and prevents
the unnecessary shrinkage of preallocation when delalloc extents are
turned into unwritten extents.

This was found as a regression in xfs/014 after changing delalloc writes
to create unwritten extents during writeback.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iomap.c |   40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ac970b13b1f8..de990365397e 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -377,15 +377,17 @@ xfs_iomap_prealloc_size(
 	loff_t			count,
 	struct xfs_iext_cursor	*icur)
 {
+	struct xfs_iext_cursor	ncur = *icur;
+	struct xfs_bmbt_irec	prev, got;
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	struct xfs_bmbt_irec	prev;
-	int			shift = 0;
 	int64_t			freesp;
 	xfs_fsblock_t		qblocks;
-	int			qshift = 0;
 	xfs_fsblock_t		alloc_blocks = 0;
+	xfs_extlen_t		plen;
+	int			shift = 0;
+	int			qshift = 0;
 
 	if (offset + count <= XFS_ISIZE(ip))
 		return 0;
@@ -400,7 +402,7 @@ xfs_iomap_prealloc_size(
 	 */
 	if ((mp->m_flags & XFS_MOUNT_ALLOCSIZE) ||
 	    XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
-	    !xfs_iext_peek_prev_extent(ifp, icur, &prev) ||
+	    !xfs_iext_prev_extent(ifp, &ncur, &prev) ||
 	    prev.br_startoff + prev.br_blockcount < offset_fsb)
 		return mp->m_allocsize_blocks;
 
@@ -413,16 +415,28 @@ xfs_iomap_prealloc_size(
 	 * preallocation size.
 	 *
 	 * If the extent is a hole, then preallocation is essentially disabled.
-	 * Otherwise we take the size of the preceding data extent as the basis
-	 * for the preallocation size. If the size of the extent is greater than
-	 * half the maximum extent length, then use the current offset as the
-	 * basis. This ensures that for large files the preallocation size
-	 * always extends to MAXEXTLEN rather than falling short due to things
-	 * like stripe unit/width alignment of real extents.
+	 * Otherwise we take the size of the preceding data extents as the basis
+	 * for the preallocation size. Note that we don't care if the previous
+	 * extents are written or not.
+	 *
+	 * If the size of the extents is greater than half the maximum extent
+	 * length, then use the current offset as the basis. This ensures that
+	 * for large files the preallocation size always extends to MAXEXTLEN
+	 * rather than falling short due to things like stripe unit/width
+	 * alignment of real extents.
 	 */
-	if (prev.br_blockcount <= (MAXEXTLEN >> 1))
-		alloc_blocks = prev.br_blockcount << 1;
-	else
+	plen = prev.br_blockcount;
+	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
+		if (plen > MAXEXTLEN / 2 ||
+		    isnullstartblock(got.br_startblock) ||
+		    got.br_startoff + got.br_blockcount != prev.br_startoff ||
+		    got.br_startblock + got.br_blockcount != prev.br_startblock)
+			break;
+		plen += got.br_blockcount;
+		prev = got;
+	}
+	alloc_blocks = plen << 1;
+	if (alloc_blocks > MAXEXTLEN)
 		alloc_blocks = XFS_B_TO_FSB(mp, offset);
 	if (!alloc_blocks)
 		goto check_writeio;


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] xfs: refactor xfs_iomap_prealloc_size
  2020-05-23 16:49 [PATCH v4 0/4] xfs: fix stale disk exposure after crash Darrick J. Wong
  2020-05-23 16:49 ` [PATCH 1/4] xfs: don't fail unwritten extent conversion on writeback due to edquot Darrick J. Wong
  2020-05-23 16:49 ` [PATCH 2/4] xfs: measure all contiguous previous extents for prealloc size Darrick J. Wong
@ 2020-05-23 16:49 ` Darrick J. Wong
  2020-05-24 17:17   ` [PATCH v2 " Darrick J. Wong
  2020-05-23 16:49 ` [PATCH 4/4] xfs: force writes to delalloc regions to unwritten Darrick J. Wong
  3 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2020-05-23 16:49 UTC (permalink / raw)
  To: darrick.wong
  Cc: Christoph Hellwig, Christoph Hellwig, linux-xfs, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor xfs_iomap_prealloc_size to be the function that dynamically
computes the per-file preallocation size by moving the allocsize= case
to the caller.  Break up the huge comment preceding the function to
annotate the relevant parts of the code, and remove the impossible
check_writeio case.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c |   83 ++++++++++++++++++++++------------------------------
 1 file changed, 35 insertions(+), 48 deletions(-)


diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index de990365397e..ad695c77a472 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -352,22 +352,10 @@ xfs_quota_calc_throttle(
 }
 
 /*
- * If we are doing a write at the end of the file and there are no allocations
- * past this one, then extend the allocation out to the file system's write
- * iosize.
- *
  * If we don't have a user specified preallocation size, dynamically increase
  * the preallocation size as the size of the file grows.  Cap the maximum size
  * at a single extent or less if the filesystem is near full. The closer the
- * filesystem is to full, the smaller the maximum prealocation.
- *
- * As an exception we don't do any preallocation at all if the file is smaller
- * than the minimum preallocation and we are using the default dynamic
- * preallocation scheme, as it is likely this is the only write to the file that
- * is going to be done.
- *
- * We clean up any extra space left over when the file is closed in
- * xfs_inactive().
+ * filesystem is to being full, the smaller the maximum preallocation.
  */
 STATIC xfs_fsblock_t
 xfs_iomap_prealloc_size(
@@ -389,41 +377,28 @@ xfs_iomap_prealloc_size(
 	int			shift = 0;
 	int			qshift = 0;
 
-	if (offset + count <= XFS_ISIZE(ip))
-		return 0;
-
-	if (!(mp->m_flags & XFS_MOUNT_ALLOCSIZE) &&
-	    (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_allocsize_blocks)))
+	/*
+	 * As an exception we don't do any preallocation at all if the file is
+	 * smaller than the minimum preallocation and we are using the default
+	 * dynamic preallocation scheme, as it is likely this is the only write
+	 * to the file that is going to be done.
+	 */
+	if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_allocsize_blocks))
 		return 0;
 
 	/*
-	 * If an explicit allocsize is set, the file is small, or we
-	 * are writing behind a hole, then use the minimum prealloc:
+	 * Use the minimum preallocation size for small files or if we are
+	 * writing right after a hole.
 	 */
-	if ((mp->m_flags & XFS_MOUNT_ALLOCSIZE) ||
-	    XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
+	if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
 	    !xfs_iext_prev_extent(ifp, &ncur, &prev) ||
 	    prev.br_startoff + prev.br_blockcount < offset_fsb)
 		return mp->m_allocsize_blocks;
 
 	/*
-	 * Determine the initial size of the preallocation. We are beyond the
-	 * current EOF here, but we need to take into account whether this is
-	 * a sparse write or an extending write when determining the
-	 * preallocation size.  Hence we need to look up the extent that ends
-	 * at the current write offset and use the result to determine the
-	 * preallocation size.
-	 *
-	 * If the extent is a hole, then preallocation is essentially disabled.
-	 * Otherwise we take the size of the preceding data extents as the basis
-	 * for the preallocation size. Note that we don't care if the previous
-	 * extents are written or not.
-	 *
-	 * If the size of the extents is greater than half the maximum extent
-	 * length, then use the current offset as the basis. This ensures that
-	 * for large files the preallocation size always extends to MAXEXTLEN
-	 * rather than falling short due to things like stripe unit/width
-	 * alignment of real extents.
+	 * Take the size of the preceding data extents as the basis for the
+	 * preallocation size. Note that we don't care if the previous extents
+	 * are written or not.
 	 */
 	plen = prev.br_blockcount;
 	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
@@ -435,19 +410,25 @@ xfs_iomap_prealloc_size(
 		plen += got.br_blockcount;
 		prev = got;
 	}
+
+	/*
+	 * If the size of the extents is greater than half the maximum extent
+	 * length, then use the current offset as the basis.  This ensures that
+	 * for large files the preallocation size always extends to MAXEXTLEN
+	 * rather than falling short due to things like stripe unit/width
+	 * alignment of real extents.
+	 */
 	alloc_blocks = plen << 1;
 	if (alloc_blocks > MAXEXTLEN)
 		alloc_blocks = XFS_B_TO_FSB(mp, offset);
-	if (!alloc_blocks)
-		goto check_writeio;
 	qblocks = alloc_blocks;
 
 	/*
 	 * MAXEXTLEN is not a power of two value but we round the prealloc down
 	 * to the nearest power of two value after throttling. To prevent the
-	 * round down from unconditionally reducing the maximum supported prealloc
-	 * size, we round up first, apply appropriate throttling, round down and
-	 * cap the value to MAXEXTLEN.
+	 * round down from unconditionally reducing the maximum supported
+	 * prealloc size, we round up first, apply appropriate throttling,
+	 * round down and cap the value to MAXEXTLEN.
 	 */
 	alloc_blocks = XFS_FILEOFF_MIN(roundup_pow_of_two(MAXEXTLEN),
 				       alloc_blocks);
@@ -508,7 +489,6 @@ xfs_iomap_prealloc_size(
 	 */
 	while (alloc_blocks && alloc_blocks >= freesp)
 		alloc_blocks >>= 4;
-check_writeio:
 	if (alloc_blocks < mp->m_allocsize_blocks)
 		alloc_blocks = mp->m_allocsize_blocks;
 	trace_xfs_iomap_prealloc_size(ip, alloc_blocks, shift,
@@ -975,9 +955,16 @@ xfs_buffered_write_iomap_begin(
 	if (error)
 		goto out_unlock;
 
-	if (eof) {
-		prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork, offset,
-				count, &icur);
+	if (eof && offset + count > XFS_ISIZE(ip)) {
+		/*
+		 * Determine the initial size of the preallocation.
+		 * We clean up any extra preallocation when the file is closed.
+		 */
+		if (mp->m_flags & XFS_MOUNT_ALLOCSIZE)
+			prealloc_blocks = mp->m_allocsize_blocks;
+		else
+			prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork,
+						offset, count, &icur);
 		if (prealloc_blocks) {
 			xfs_extlen_t	align;
 			xfs_off_t	end_offset;


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] xfs: force writes to delalloc regions to unwritten
  2020-05-23 16:49 [PATCH v4 0/4] xfs: fix stale disk exposure after crash Darrick J. Wong
                   ` (2 preceding siblings ...)
  2020-05-23 16:49 ` [PATCH 3/4] xfs: refactor xfs_iomap_prealloc_size Darrick J. Wong
@ 2020-05-23 16:49 ` Darrick J. Wong
  3 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2020-05-23 16:49 UTC (permalink / raw)
  To: darrick.wong; +Cc: Christoph Hellwig, Brian Foster, linux-xfs, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When writing to a delalloc region in the data fork, commit the new
allocations (of the da reservation) as unwritten so that the mappings
are only marked written once writeback completes successfully.  This
fixes the problem of stale data exposure if the system goes down during
targeted writeback of a specific region of a file, as tested by
generic/042.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c |   29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index fda13cd7add0..f8fe83c9348d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4193,17 +4193,7 @@ xfs_bmapi_allocate(
 	bma->got.br_blockcount = bma->length;
 	bma->got.br_state = XFS_EXT_NORM;
 
-	/*
-	 * In the data fork, a wasdelay extent has been initialized, so
-	 * shouldn't be flagged as unwritten.
-	 *
-	 * For the cow fork, however, we convert delalloc reservations
-	 * (extents allocated for speculative preallocation) to
-	 * allocated unwritten extents, and only convert the unwritten
-	 * extents to real extents when we're about to write the data.
-	 */
-	if ((!bma->wasdel || (bma->flags & XFS_BMAPI_COWFORK)) &&
-	    (bma->flags & XFS_BMAPI_PREALLOC))
+	if (bma->flags & XFS_BMAPI_PREALLOC)
 		bma->got.br_state = XFS_EXT_UNWRITTEN;
 
 	if (bma->wasdel)
@@ -4611,8 +4601,23 @@ xfs_bmapi_convert_delalloc(
 	bma.offset = bma.got.br_startoff;
 	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
 	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
+
+	/*
+	 * When we're converting the delalloc reservations backing dirty pages
+	 * in the page cache, we must be careful about how we create the new
+	 * extents:
+	 *
+	 * New CoW fork extents are created unwritten, turned into real extents
+	 * when we're about to write the data to disk, and mapped into the data
+	 * fork after the write finishes.  End of story.
+	 *
+	 * New data fork extents must be mapped in as unwritten and converted
+	 * to real extents after the write succeeds to avoid exposing stale
+	 * disk contents if we crash.
+	 */
+	bma.flags = XFS_BMAPI_PREALLOC;
 	if (whichfork == XFS_COW_FORK)
-		bma.flags = XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
+		bma.flags |= XFS_BMAPI_COWFORK;
 
 	if (!xfs_iext_peek_prev_extent(ifp, &bma.icur, &bma.prev))
 		bma.prev.br_startoff = NULLFILEOFF;


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] xfs: measure all contiguous previous extents for prealloc size
  2020-05-23 16:49 ` [PATCH 2/4] xfs: measure all contiguous previous extents for prealloc size Darrick J. Wong
@ 2020-05-24  9:14   ` Christoph Hellwig
  2020-05-24 17:16     ` Darrick J. Wong
  2020-05-24 17:16   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2020-05-24  9:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, bfoster

I really, really hate the pointless use of shits where the
multiplication makes it obvious what is going on, and still allows
the compiler to opimize it in whatever way it wants.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 2/4] xfs: measure all contiguous previous extents for prealloc size
  2020-05-23 16:49 ` [PATCH 2/4] xfs: measure all contiguous previous extents for prealloc size Darrick J. Wong
  2020-05-24  9:14   ` Christoph Hellwig
@ 2020-05-24 17:16   ` Darrick J. Wong
  2020-05-25 13:28     ` Christoph Hellwig
  2020-05-26 13:46     ` Brian Foster
  1 sibling, 2 replies; 13+ messages in thread
From: Darrick J. Wong @ 2020-05-24 17:16 UTC (permalink / raw)
  To: linux-xfs, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When we're estimating a new speculative preallocation length for an
extending write, we should walk backwards through the extent list to
determine the number of number of blocks that are physically and
logically contiguous with the write offset, and use that as an input to
the preallocation size computation.

This way, preallocation length is truly measured by the effectiveness of
the allocator in giving us contiguous allocations without being
influenced by the state of a given extent.  This fixes both the problem
where ZERO_RANGE within an EOF can reduce preallocation, and prevents
the unnecessary shrinkage of preallocation when delalloc extents are
turned into unwritten extents.

This was found as a regression in xfs/014 after changing delalloc writes
to create unwritten extents during writeback.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: multiplication instead of lshift
---
 fs/xfs/xfs_iomap.c |   40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7d8966ce630a..e74a8c2c94ce 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -377,15 +377,17 @@ xfs_iomap_prealloc_size(
 	loff_t			count,
 	struct xfs_iext_cursor	*icur)
 {
+	struct xfs_iext_cursor	ncur = *icur;
+	struct xfs_bmbt_irec	prev, got;
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	struct xfs_bmbt_irec	prev;
-	int			shift = 0;
 	int64_t			freesp;
 	xfs_fsblock_t		qblocks;
-	int			qshift = 0;
 	xfs_fsblock_t		alloc_blocks = 0;
+	xfs_extlen_t		plen;
+	int			shift = 0;
+	int			qshift = 0;
 
 	if (offset + count <= XFS_ISIZE(ip))
 		return 0;
@@ -400,7 +402,7 @@ xfs_iomap_prealloc_size(
 	 */
 	if ((mp->m_flags & XFS_MOUNT_ALLOCSIZE) ||
 	    XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
-	    !xfs_iext_peek_prev_extent(ifp, icur, &prev) ||
+	    !xfs_iext_prev_extent(ifp, &ncur, &prev) ||
 	    prev.br_startoff + prev.br_blockcount < offset_fsb)
 		return mp->m_allocsize_blocks;
 
@@ -413,16 +415,28 @@ xfs_iomap_prealloc_size(
 	 * preallocation size.
 	 *
 	 * If the extent is a hole, then preallocation is essentially disabled.
-	 * Otherwise we take the size of the preceding data extent as the basis
-	 * for the preallocation size. If the size of the extent is greater than
-	 * half the maximum extent length, then use the current offset as the
-	 * basis. This ensures that for large files the preallocation size
-	 * always extends to MAXEXTLEN rather than falling short due to things
-	 * like stripe unit/width alignment of real extents.
+	 * Otherwise we take the size of the preceding data extents as the basis
+	 * for the preallocation size. Note that we don't care if the previous
+	 * extents are written or not.
+	 *
+	 * If the size of the extents is greater than half the maximum extent
+	 * length, then use the current offset as the basis. This ensures that
+	 * for large files the preallocation size always extends to MAXEXTLEN
+	 * rather than falling short due to things like stripe unit/width
+	 * alignment of real extents.
 	 */
-	if (prev.br_blockcount <= (MAXEXTLEN >> 1))
-		alloc_blocks = prev.br_blockcount << 1;
-	else
+	plen = prev.br_blockcount;
+	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
+		if (plen > MAXEXTLEN / 2 ||
+		    isnullstartblock(got.br_startblock) ||
+		    got.br_startoff + got.br_blockcount != prev.br_startoff ||
+		    got.br_startblock + got.br_blockcount != prev.br_startblock)
+			break;
+		plen += got.br_blockcount;
+		prev = got;
+	}
+	alloc_blocks = plen * 2;
+	if (alloc_blocks > MAXEXTLEN)
 		alloc_blocks = XFS_B_TO_FSB(mp, offset);
 	if (!alloc_blocks)
 		goto check_writeio;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] xfs: measure all contiguous previous extents for prealloc size
  2020-05-24  9:14   ` Christoph Hellwig
@ 2020-05-24 17:16     ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2020-05-24 17:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, bfoster

On Sun, May 24, 2020 at 02:14:46AM -0700, Christoph Hellwig wrote:
> I really, really hate the pointless use of shits where the
> multiplication makes it obvious what is going on, and still allows
> the compiler to opimize it in whatever way it wants.

<grumble> Fine, I'll change it back....

--D

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 3/4] xfs: refactor xfs_iomap_prealloc_size
  2020-05-23 16:49 ` [PATCH 3/4] xfs: refactor xfs_iomap_prealloc_size Darrick J. Wong
@ 2020-05-24 17:17   ` Darrick J. Wong
  2020-05-26 13:46     ` Brian Foster
  0 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2020-05-24 17:17 UTC (permalink / raw)
  To: Christoph Hellwig, Christoph Hellwig, linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor xfs_iomap_prealloc_size to be the function that dynamically
computes the per-file preallocation size by moving the allocsize= case
to the caller.  Break up the huge comment preceding the function to
annotate the relevant parts of the code, and remove the impossible
check_writeio case.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
v2: minor rebase due to changes in previous patch
---
 fs/xfs/xfs_iomap.c |   83 ++++++++++++++++++++++------------------------------
 1 file changed, 35 insertions(+), 48 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e74a8c2c94ce..b9a8c3798e08 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -352,22 +352,10 @@ xfs_quota_calc_throttle(
 }
 
 /*
- * If we are doing a write at the end of the file and there are no allocations
- * past this one, then extend the allocation out to the file system's write
- * iosize.
- *
  * If we don't have a user specified preallocation size, dynamically increase
  * the preallocation size as the size of the file grows.  Cap the maximum size
  * at a single extent or less if the filesystem is near full. The closer the
- * filesystem is to full, the smaller the maximum prealocation.
- *
- * As an exception we don't do any preallocation at all if the file is smaller
- * than the minimum preallocation and we are using the default dynamic
- * preallocation scheme, as it is likely this is the only write to the file that
- * is going to be done.
- *
- * We clean up any extra space left over when the file is closed in
- * xfs_inactive().
+ * filesystem is to being full, the smaller the maximum preallocation.
  */
 STATIC xfs_fsblock_t
 xfs_iomap_prealloc_size(
@@ -389,41 +377,28 @@ xfs_iomap_prealloc_size(
 	int			shift = 0;
 	int			qshift = 0;
 
-	if (offset + count <= XFS_ISIZE(ip))
-		return 0;
-
-	if (!(mp->m_flags & XFS_MOUNT_ALLOCSIZE) &&
-	    (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_allocsize_blocks)))
+	/*
+	 * As an exception we don't do any preallocation at all if the file is
+	 * smaller than the minimum preallocation and we are using the default
+	 * dynamic preallocation scheme, as it is likely this is the only write
+	 * to the file that is going to be done.
+	 */
+	if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_allocsize_blocks))
 		return 0;
 
 	/*
-	 * If an explicit allocsize is set, the file is small, or we
-	 * are writing behind a hole, then use the minimum prealloc:
+	 * Use the minimum preallocation size for small files or if we are
+	 * writing right after a hole.
 	 */
-	if ((mp->m_flags & XFS_MOUNT_ALLOCSIZE) ||
-	    XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
+	if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
 	    !xfs_iext_prev_extent(ifp, &ncur, &prev) ||
 	    prev.br_startoff + prev.br_blockcount < offset_fsb)
 		return mp->m_allocsize_blocks;
 
 	/*
-	 * Determine the initial size of the preallocation. We are beyond the
-	 * current EOF here, but we need to take into account whether this is
-	 * a sparse write or an extending write when determining the
-	 * preallocation size.  Hence we need to look up the extent that ends
-	 * at the current write offset and use the result to determine the
-	 * preallocation size.
-	 *
-	 * If the extent is a hole, then preallocation is essentially disabled.
-	 * Otherwise we take the size of the preceding data extents as the basis
-	 * for the preallocation size. Note that we don't care if the previous
-	 * extents are written or not.
-	 *
-	 * If the size of the extents is greater than half the maximum extent
-	 * length, then use the current offset as the basis. This ensures that
-	 * for large files the preallocation size always extends to MAXEXTLEN
-	 * rather than falling short due to things like stripe unit/width
-	 * alignment of real extents.
+	 * Take the size of the preceding data extents as the basis for the
+	 * preallocation size. Note that we don't care if the previous extents
+	 * are written or not.
 	 */
 	plen = prev.br_blockcount;
 	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
@@ -435,19 +410,25 @@ xfs_iomap_prealloc_size(
 		plen += got.br_blockcount;
 		prev = got;
 	}
+
+	/*
+	 * If the size of the extents is greater than half the maximum extent
+	 * length, then use the current offset as the basis.  This ensures that
+	 * for large files the preallocation size always extends to MAXEXTLEN
+	 * rather than falling short due to things like stripe unit/width
+	 * alignment of real extents.
+	 */
 	alloc_blocks = plen * 2;
 	if (alloc_blocks > MAXEXTLEN)
 		alloc_blocks = XFS_B_TO_FSB(mp, offset);
-	if (!alloc_blocks)
-		goto check_writeio;
 	qblocks = alloc_blocks;
 
 	/*
 	 * MAXEXTLEN is not a power of two value but we round the prealloc down
 	 * to the nearest power of two value after throttling. To prevent the
-	 * round down from unconditionally reducing the maximum supported prealloc
-	 * size, we round up first, apply appropriate throttling, round down and
-	 * cap the value to MAXEXTLEN.
+	 * round down from unconditionally reducing the maximum supported
+	 * prealloc size, we round up first, apply appropriate throttling,
+	 * round down and cap the value to MAXEXTLEN.
 	 */
 	alloc_blocks = XFS_FILEOFF_MIN(roundup_pow_of_two(MAXEXTLEN),
 				       alloc_blocks);
@@ -508,7 +489,6 @@ xfs_iomap_prealloc_size(
 	 */
 	while (alloc_blocks && alloc_blocks >= freesp)
 		alloc_blocks >>= 4;
-check_writeio:
 	if (alloc_blocks < mp->m_allocsize_blocks)
 		alloc_blocks = mp->m_allocsize_blocks;
 	trace_xfs_iomap_prealloc_size(ip, alloc_blocks, shift,
@@ -975,9 +955,16 @@ xfs_buffered_write_iomap_begin(
 	if (error)
 		goto out_unlock;
 
-	if (eof) {
-		prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork, offset,
-				count, &icur);
+	if (eof && offset + count > XFS_ISIZE(ip)) {
+		/*
+		 * Determine the initial size of the preallocation.
+		 * We clean up any extra preallocation when the file is closed.
+		 */
+		if (mp->m_flags & XFS_MOUNT_ALLOCSIZE)
+			prealloc_blocks = mp->m_allocsize_blocks;
+		else
+			prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork,
+						offset, count, &icur);
 		if (prealloc_blocks) {
 			xfs_extlen_t	align;
 			xfs_off_t	end_offset;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/4] xfs: measure all contiguous previous extents for prealloc size
  2020-05-24 17:16   ` [PATCH v2 " Darrick J. Wong
@ 2020-05-25 13:28     ` Christoph Hellwig
  2020-05-26 13:46     ` Brian Foster
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2020-05-25 13:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, bfoster

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/4] xfs: measure all contiguous previous extents for prealloc size
  2020-05-24 17:16   ` [PATCH v2 " Darrick J. Wong
  2020-05-25 13:28     ` Christoph Hellwig
@ 2020-05-26 13:46     ` Brian Foster
  1 sibling, 0 replies; 13+ messages in thread
From: Brian Foster @ 2020-05-26 13:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

On Sun, May 24, 2020 at 10:16:12AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we're estimating a new speculative preallocation length for an
> extending write, we should walk backwards through the extent list to
> determine the number of number of blocks that are physically and
> logically contiguous with the write offset, and use that as an input to
> the preallocation size computation.
> 
> This way, preallocation length is truly measured by the effectiveness of
> the allocator in giving us contiguous allocations without being
> influenced by the state of a given extent.  This fixes both the problem
> where ZERO_RANGE within an EOF can reduce preallocation, and prevents
> the unnecessary shrinkage of preallocation when delalloc extents are
> turned into unwritten extents.
> 
> This was found as a regression in xfs/014 after changing delalloc writes
> to create unwritten extents during writeback.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: multiplication instead of lshift
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_iomap.c |   40 +++++++++++++++++++++++++++-------------
>  1 file changed, 27 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 7d8966ce630a..e74a8c2c94ce 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -377,15 +377,17 @@ xfs_iomap_prealloc_size(
>  	loff_t			count,
>  	struct xfs_iext_cursor	*icur)
>  {
> +	struct xfs_iext_cursor	ncur = *icur;
> +	struct xfs_bmbt_irec	prev, got;
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	struct xfs_bmbt_irec	prev;
> -	int			shift = 0;
>  	int64_t			freesp;
>  	xfs_fsblock_t		qblocks;
> -	int			qshift = 0;
>  	xfs_fsblock_t		alloc_blocks = 0;
> +	xfs_extlen_t		plen;
> +	int			shift = 0;
> +	int			qshift = 0;
>  
>  	if (offset + count <= XFS_ISIZE(ip))
>  		return 0;
> @@ -400,7 +402,7 @@ xfs_iomap_prealloc_size(
>  	 */
>  	if ((mp->m_flags & XFS_MOUNT_ALLOCSIZE) ||
>  	    XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
> -	    !xfs_iext_peek_prev_extent(ifp, icur, &prev) ||
> +	    !xfs_iext_prev_extent(ifp, &ncur, &prev) ||
>  	    prev.br_startoff + prev.br_blockcount < offset_fsb)
>  		return mp->m_allocsize_blocks;
>  
> @@ -413,16 +415,28 @@ xfs_iomap_prealloc_size(
>  	 * preallocation size.
>  	 *
>  	 * If the extent is a hole, then preallocation is essentially disabled.
> -	 * Otherwise we take the size of the preceding data extent as the basis
> -	 * for the preallocation size. If the size of the extent is greater than
> -	 * half the maximum extent length, then use the current offset as the
> -	 * basis. This ensures that for large files the preallocation size
> -	 * always extends to MAXEXTLEN rather than falling short due to things
> -	 * like stripe unit/width alignment of real extents.
> +	 * Otherwise we take the size of the preceding data extents as the basis
> +	 * for the preallocation size. Note that we don't care if the previous
> +	 * extents are written or not.
> +	 *
> +	 * If the size of the extents is greater than half the maximum extent
> +	 * length, then use the current offset as the basis. This ensures that
> +	 * for large files the preallocation size always extends to MAXEXTLEN
> +	 * rather than falling short due to things like stripe unit/width
> +	 * alignment of real extents.
>  	 */
> -	if (prev.br_blockcount <= (MAXEXTLEN >> 1))
> -		alloc_blocks = prev.br_blockcount << 1;
> -	else
> +	plen = prev.br_blockcount;
> +	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
> +		if (plen > MAXEXTLEN / 2 ||
> +		    isnullstartblock(got.br_startblock) ||
> +		    got.br_startoff + got.br_blockcount != prev.br_startoff ||
> +		    got.br_startblock + got.br_blockcount != prev.br_startblock)
> +			break;
> +		plen += got.br_blockcount;
> +		prev = got;
> +	}
> +	alloc_blocks = plen * 2;
> +	if (alloc_blocks > MAXEXTLEN)
>  		alloc_blocks = XFS_B_TO_FSB(mp, offset);
>  	if (!alloc_blocks)
>  		goto check_writeio;
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/4] xfs: refactor xfs_iomap_prealloc_size
  2020-05-24 17:17   ` [PATCH v2 " Darrick J. Wong
@ 2020-05-26 13:46     ` Brian Foster
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Foster @ 2020-05-26 13:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Christoph Hellwig, linux-xfs

On Sun, May 24, 2020 at 10:17:09AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Refactor xfs_iomap_prealloc_size to be the function that dynamically
> computes the per-file preallocation size by moving the allocsize= case
> to the caller.  Break up the huge comment preceding the function to
> annotate the relevant parts of the code, and remove the impossible
> check_writeio case.
> 
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> v2: minor rebase due to changes in previous patch
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_iomap.c |   83 ++++++++++++++++++++++------------------------------
>  1 file changed, 35 insertions(+), 48 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index e74a8c2c94ce..b9a8c3798e08 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -352,22 +352,10 @@ xfs_quota_calc_throttle(
>  }
>  
>  /*
> - * If we are doing a write at the end of the file and there are no allocations
> - * past this one, then extend the allocation out to the file system's write
> - * iosize.
> - *
>   * If we don't have a user specified preallocation size, dynamically increase
>   * the preallocation size as the size of the file grows.  Cap the maximum size
>   * at a single extent or less if the filesystem is near full. The closer the
> - * filesystem is to full, the smaller the maximum prealocation.
> - *
> - * As an exception we don't do any preallocation at all if the file is smaller
> - * than the minimum preallocation and we are using the default dynamic
> - * preallocation scheme, as it is likely this is the only write to the file that
> - * is going to be done.
> - *
> - * We clean up any extra space left over when the file is closed in
> - * xfs_inactive().
> + * filesystem is to being full, the smaller the maximum preallocation.
>   */
>  STATIC xfs_fsblock_t
>  xfs_iomap_prealloc_size(
> @@ -389,41 +377,28 @@ xfs_iomap_prealloc_size(
>  	int			shift = 0;
>  	int			qshift = 0;
>  
> -	if (offset + count <= XFS_ISIZE(ip))
> -		return 0;
> -
> -	if (!(mp->m_flags & XFS_MOUNT_ALLOCSIZE) &&
> -	    (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_allocsize_blocks)))
> +	/*
> +	 * As an exception we don't do any preallocation at all if the file is
> +	 * smaller than the minimum preallocation and we are using the default
> +	 * dynamic preallocation scheme, as it is likely this is the only write
> +	 * to the file that is going to be done.
> +	 */
> +	if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_allocsize_blocks))
>  		return 0;
>  
>  	/*
> -	 * If an explicit allocsize is set, the file is small, or we
> -	 * are writing behind a hole, then use the minimum prealloc:
> +	 * Use the minimum preallocation size for small files or if we are
> +	 * writing right after a hole.
>  	 */
> -	if ((mp->m_flags & XFS_MOUNT_ALLOCSIZE) ||
> -	    XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
> +	if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
>  	    !xfs_iext_prev_extent(ifp, &ncur, &prev) ||
>  	    prev.br_startoff + prev.br_blockcount < offset_fsb)
>  		return mp->m_allocsize_blocks;
>  
>  	/*
> -	 * Determine the initial size of the preallocation. We are beyond the
> -	 * current EOF here, but we need to take into account whether this is
> -	 * a sparse write or an extending write when determining the
> -	 * preallocation size.  Hence we need to look up the extent that ends
> -	 * at the current write offset and use the result to determine the
> -	 * preallocation size.
> -	 *
> -	 * If the extent is a hole, then preallocation is essentially disabled.
> -	 * Otherwise we take the size of the preceding data extents as the basis
> -	 * for the preallocation size. Note that we don't care if the previous
> -	 * extents are written or not.
> -	 *
> -	 * If the size of the extents is greater than half the maximum extent
> -	 * length, then use the current offset as the basis. This ensures that
> -	 * for large files the preallocation size always extends to MAXEXTLEN
> -	 * rather than falling short due to things like stripe unit/width
> -	 * alignment of real extents.
> +	 * Take the size of the preceding data extents as the basis for the
> +	 * preallocation size. Note that we don't care if the previous extents
> +	 * are written or not.
>  	 */
>  	plen = prev.br_blockcount;
>  	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
> @@ -435,19 +410,25 @@ xfs_iomap_prealloc_size(
>  		plen += got.br_blockcount;
>  		prev = got;
>  	}
> +
> +	/*
> +	 * If the size of the extents is greater than half the maximum extent
> +	 * length, then use the current offset as the basis.  This ensures that
> +	 * for large files the preallocation size always extends to MAXEXTLEN
> +	 * rather than falling short due to things like stripe unit/width
> +	 * alignment of real extents.
> +	 */
>  	alloc_blocks = plen * 2;
>  	if (alloc_blocks > MAXEXTLEN)
>  		alloc_blocks = XFS_B_TO_FSB(mp, offset);
> -	if (!alloc_blocks)
> -		goto check_writeio;
>  	qblocks = alloc_blocks;
>  
>  	/*
>  	 * MAXEXTLEN is not a power of two value but we round the prealloc down
>  	 * to the nearest power of two value after throttling. To prevent the
> -	 * round down from unconditionally reducing the maximum supported prealloc
> -	 * size, we round up first, apply appropriate throttling, round down and
> -	 * cap the value to MAXEXTLEN.
> +	 * round down from unconditionally reducing the maximum supported
> +	 * prealloc size, we round up first, apply appropriate throttling,
> +	 * round down and cap the value to MAXEXTLEN.
>  	 */
>  	alloc_blocks = XFS_FILEOFF_MIN(roundup_pow_of_two(MAXEXTLEN),
>  				       alloc_blocks);
> @@ -508,7 +489,6 @@ xfs_iomap_prealloc_size(
>  	 */
>  	while (alloc_blocks && alloc_blocks >= freesp)
>  		alloc_blocks >>= 4;
> -check_writeio:
>  	if (alloc_blocks < mp->m_allocsize_blocks)
>  		alloc_blocks = mp->m_allocsize_blocks;
>  	trace_xfs_iomap_prealloc_size(ip, alloc_blocks, shift,
> @@ -975,9 +955,16 @@ xfs_buffered_write_iomap_begin(
>  	if (error)
>  		goto out_unlock;
>  
> -	if (eof) {
> -		prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork, offset,
> -				count, &icur);
> +	if (eof && offset + count > XFS_ISIZE(ip)) {
> +		/*
> +		 * Determine the initial size of the preallocation.
> +		 * We clean up any extra preallocation when the file is closed.
> +		 */
> +		if (mp->m_flags & XFS_MOUNT_ALLOCSIZE)
> +			prealloc_blocks = mp->m_allocsize_blocks;
> +		else
> +			prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork,
> +						offset, count, &icur);
>  		if (prealloc_blocks) {
>  			xfs_extlen_t	align;
>  			xfs_off_t	end_offset;
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] xfs: don't fail unwritten extent conversion on writeback due to edquot
  2020-05-22  2:52 [PATCH v3 0/4] xfs: fix stale disk exposure after crash Darrick J. Wong
@ 2020-05-22  2:53 ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2020-05-22  2:53 UTC (permalink / raw)
  To: darrick.wong; +Cc: Brian Foster, Christoph Hellwig, linux-xfs, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

During writeback, it's possible for the quota block reservation in
xfs_iomap_write_unwritten to fail with EDQUOT because we hit the quota
limit.  This causes writeback errors for data that was already written
to disk, when it's not even guaranteed that the bmbt will expand to
exceed the quota limit.  Irritatingly, this condition is reported to
userspace as EIO by fsync, which is confusing.

We wrote the data, so allow the reservation.  That might put us slightly
above the hard limit, but it's better than losing data after a write.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index bb590a267a7f..ac970b13b1f8 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -563,7 +563,7 @@ xfs_iomap_write_unwritten(
 		xfs_trans_ijoin(tp, ip, 0);
 
 		error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
-				XFS_QMOPT_RES_REGBLKS);
+				XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES);
 		if (error)
 			goto error_on_bmapi_transaction;
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-05-26 13:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-23 16:49 [PATCH v4 0/4] xfs: fix stale disk exposure after crash Darrick J. Wong
2020-05-23 16:49 ` [PATCH 1/4] xfs: don't fail unwritten extent conversion on writeback due to edquot Darrick J. Wong
2020-05-23 16:49 ` [PATCH 2/4] xfs: measure all contiguous previous extents for prealloc size Darrick J. Wong
2020-05-24  9:14   ` Christoph Hellwig
2020-05-24 17:16     ` Darrick J. Wong
2020-05-24 17:16   ` [PATCH v2 " Darrick J. Wong
2020-05-25 13:28     ` Christoph Hellwig
2020-05-26 13:46     ` Brian Foster
2020-05-23 16:49 ` [PATCH 3/4] xfs: refactor xfs_iomap_prealloc_size Darrick J. Wong
2020-05-24 17:17   ` [PATCH v2 " Darrick J. Wong
2020-05-26 13:46     ` Brian Foster
2020-05-23 16:49 ` [PATCH 4/4] xfs: force writes to delalloc regions to unwritten Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2020-05-22  2:52 [PATCH v3 0/4] xfs: fix stale disk exposure after crash Darrick J. Wong
2020-05-22  2:53 ` [PATCH 1/4] xfs: don't fail unwritten extent conversion on writeback due to edquot Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).