All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: stable@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, Brian Foster <bfoster@redhat.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>
Subject: [PATCH 26/27] xfs: use iomap new flag for newly allocated delalloc blocks
Date: Sat,  1 Apr 2017 08:40:25 +0200	[thread overview]
Message-ID: <20170401064026.5783-27-hch@lst.de> (raw)
In-Reply-To: <20170401064026.5783-1-hch@lst.de>

From: Brian Foster <bfoster@redhat.com>

commit f65e6fad293b3a5793b7fa2044800506490e7a2e upstream.

Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
failure") fixed one regression in the iomap error handling code and
exposed another. The fundamental problem is that if a buffered write
is a rewrite of preexisting delalloc blocks and the write fails, the
failure handling code can punch out preexisting blocks with valid
file data.

This was reproduced directly by sub-block writes in the LTP
kernel/syscalls/write/write03 test. A first 100 byte write allocates
a single block in a file. A subsequent 100 byte write fails and
punches out the block, including the data successfully written by
the previous write.

To address this problem, update the ->iomap_begin() handler to
distinguish newly allocated delalloc blocks from preexisting
delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
->iomap_end() handler to decide when a failed or short write should
punch out delalloc blocks.

This introduces the subtle requirement that ->iomap_begin() should
never combine newly allocated delalloc blocks with existing blocks
in the resulting iomap descriptor. This can occur when a new
delalloc reservation merges with a neighboring extent that is part
of the current write, for example. Therefore, drop the
post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
just return the record inserted into the fork. This ensures only new
blocks are returned and thus that preexisting delalloc blocks are
always handled as "found" blocks and not punched out on a failed
rewrite.

Reported-by: Xiong Zhou <xzhou@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 24 ++++++++++++++----------
 fs/xfs/xfs_iomap.c       | 16 +++++++++++-----
 2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a4322526ea5f..ec93395eccdc 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4253,6 +4253,19 @@ xfs_bmapi_read(
 	return 0;
 }
 
+/*
+ * Add a delayed allocation extent to an inode. Blocks are reserved from the
+ * global pool and the extent inserted into the inode in-core extent tree.
+ *
+ * On entry, got refers to the first extent beyond the offset of the extent to
+ * allocate or eof is specified if no such extent exists. On return, got refers
+ * to the extent record that was inserted to the inode fork.
+ *
+ * Note that the allocated extent may have been merged with contiguous extents
+ * during insertion into the inode fork. Thus, got does not reflect the current
+ * state of the inode fork on return. If necessary, the caller can use lastx to
+ * look up the updated record in the inode fork.
+ */
 int
 xfs_bmapi_reserve_delalloc(
 	struct xfs_inode	*ip,
@@ -4339,13 +4352,8 @@ xfs_bmapi_reserve_delalloc(
 	got->br_startblock = nullstartblock(indlen);
 	got->br_blockcount = alen;
 	got->br_state = XFS_EXT_NORM;
-	xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
 
-	/*
-	 * Update our extent pointer, given that xfs_bmap_add_extent_hole_delay
-	 * might have merged it into one of the neighbouring ones.
-	 */
-	xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *lastx), got);
+	xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
 
 	/*
 	 * Tag the inode if blocks were preallocated. Note that COW fork
@@ -4357,10 +4365,6 @@ xfs_bmapi_reserve_delalloc(
 	if (whichfork == XFS_COW_FORK && (prealloc || aoff < off || alen > len))
 		xfs_inode_set_cowblocks_tag(ip);
 
-	ASSERT(got->br_startoff <= aoff);
-	ASSERT(got->br_startoff + got->br_blockcount >= aoff + alen);
-	ASSERT(isnullstartblock(got->br_startblock));
-	ASSERT(got->br_state == XFS_EXT_NORM);
 	return 0;
 
 out_unreserve_blocks:
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 5211887cbcd2..360562484e7b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -637,6 +637,11 @@ xfs_file_iomap_begin_delay(
 		goto out_unlock;
 	}
 
+	/*
+	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
+	 * them out if the write happens to fail.
+	 */
+	iomap->flags = IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
 done:
 	if (isnullstartblock(got.br_startblock))
@@ -1061,7 +1066,8 @@ xfs_file_iomap_end_delalloc(
 	struct xfs_inode	*ip,
 	loff_t			offset,
 	loff_t			length,
-	ssize_t			written)
+	ssize_t			written,
+	struct iomap		*iomap)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		start_fsb;
@@ -1080,14 +1086,14 @@ xfs_file_iomap_end_delalloc(
 	end_fsb = XFS_B_TO_FSB(mp, offset + length);
 
 	/*
-	 * Trim back delalloc blocks if we didn't manage to write the whole
-	 * range reserved.
+	 * Trim delalloc blocks if they were allocated by this write and we
+	 * didn't manage to write the whole range.
 	 *
 	 * We don't need to care about racing delalloc as we hold i_mutex
 	 * across the reserve/allocate/unreserve calls. If there are delalloc
 	 * blocks in the range, they are ours.
 	 */
-	if (start_fsb < end_fsb) {
+	if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) {
 		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
 					 XFS_FSB_TO_B(mp, end_fsb) - 1);
 
@@ -1117,7 +1123,7 @@ xfs_file_iomap_end(
 {
 	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
 		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
-				length, written);
+				length, written, iomap);
 	return 0;
 }
 
-- 
2.11.0


  parent reply	other threads:[~2017-04-01  6:42 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-01  6:39 4.9-stable updates for XFS Christoph Hellwig
2017-04-01  6:40 ` [PATCH 01/27] xfs: only update mount/resv fields on success in __xfs_ag_resv_init Christoph Hellwig
2017-04-01  6:40 ` [PATCH 02/27] xfs: use per-AG reservations for the finobt Christoph Hellwig
2017-04-01  6:40 ` [PATCH 03/27] xfs: pull up iolock from xfs_free_eofblocks() Christoph Hellwig
2017-04-01  6:40 ` [PATCH 04/27] xfs: sync eofblocks scans under iolock are livelock prone Christoph Hellwig
2017-04-01  6:40 ` [PATCH 05/27] xfs: fix eofblocks race with file extending async dio writes Christoph Hellwig
2017-04-01  6:40 ` [PATCH 06/27] xfs: fix toctou race when locking an inode to access the data map Christoph Hellwig
2017-04-01  6:40 ` [PATCH 07/27] xfs: fail _dir_open when readahead fails Christoph Hellwig
2017-04-01  6:40 ` [PATCH 08/27] xfs: filter out obviously bad btree pointers Christoph Hellwig
2017-04-01  6:40 ` [PATCH 09/27] xfs: check for obviously bad level values in the bmbt root Christoph Hellwig
2017-04-01  6:40 ` [PATCH 10/27] xfs: verify free block header fields Christoph Hellwig
2017-04-01  6:40 ` [PATCH 11/27] xfs: allow unwritten extents in the CoW fork Christoph Hellwig
2017-04-01  6:40 ` [PATCH 12/27] xfs: mark speculative prealloc CoW fork extents unwritten Christoph Hellwig
2017-04-01  6:40 ` [PATCH 13/27] xfs: reset b_first_retry_time when clear the retry status of xfs_buf_t Christoph Hellwig
2017-04-01  6:40 ` [PATCH 14/27] xfs: reject all unaligned direct writes to reflinked files Christoph Hellwig
2017-04-01  6:40 ` [PATCH 15/27] xfs: update ctime and mtime on clone destinatation inodes Christoph Hellwig
2017-04-01  6:40 ` [PATCH 16/27] xfs: don't fail xfs_extent_busy allocation Christoph Hellwig
2017-04-01  6:40 ` [PATCH 17/27] xfs: handle indlen shortage on delalloc extent merge Christoph Hellwig
2017-04-01  6:40 ` [PATCH 18/27] xfs: split indlen reservations fairly when under reserved Christoph Hellwig
2017-04-01  6:40 ` [PATCH 19/27] xfs: fix uninitialized variable in _reflink_convert_cow Christoph Hellwig
2017-04-01  6:40 ` [PATCH 20/27] xfs: don't reserve blocks for right shift transactions Christoph Hellwig
2017-04-01  6:40 ` [PATCH 21/27] xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment Christoph Hellwig
2017-04-01  6:40 ` [PATCH 22/27] xfs: tune down agno asserts in the bmap code Christoph Hellwig
2017-04-01  6:40 ` [PATCH 23/27] xfs: only reclaim unwritten COW extents periodically Christoph Hellwig
2017-04-01  6:40 ` [PATCH 24/27] xfs: fix and streamline error handling in xfs_end_io Christoph Hellwig
2017-04-01  6:40 ` [PATCH 25/27] xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask Christoph Hellwig
2017-04-01  6:40 ` Christoph Hellwig [this message]
2017-04-01  6:40 ` [PATCH 27/27] xfs: try any AG when allocating the first btree block when reflinking Christoph Hellwig
2017-04-01 17:32 ` 4.9-stable updates for XFS Greg KH
  -- strict thread matches above, loose matches on Subject: below --
2017-03-27  8:48 Christoph Hellwig
2017-03-27  8:49 ` [PATCH 26/27] xfs: use iomap new flag for newly allocated delalloc blocks Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170401064026.5783-27-hch@lst.de \
    --to=hch@lst.de \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.