All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Dave Chinner <dchinner@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Leah Rumancik <leah.rumancik@gmail.com>
Subject: [PATCH 6.1 11/45] xfs: punching delalloc extents on write failure is racy
Date: Thu, 23 May 2024 15:13:02 +0200	[thread overview]
Message-ID: <20240523130332.921321869@linuxfoundation.org> (raw)
In-Reply-To: <20240523130332.496202557@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Chinner <dchinner@redhat.com>

[ Upstream commit 198dd8aedee6a7d2de0dfa739f9a008a938f6848 ]

xfs_buffered_write_iomap_end() has a comment about the safety of
punching delalloc extents based holding the IOLOCK_EXCL. This
comment is wrong, and punching delalloc extents is not race free.

When we punch out a delalloc extent after a write failure in
xfs_buffered_write_iomap_end(), we punch out the page cache with
truncate_pagecache_range() before we punch out the delalloc extents.
At this point, we only hold the IOLOCK_EXCL, so there is nothing
stopping mmap() write faults racing with this cleanup operation,
reinstantiating a folio over the range we are about to punch and
hence requiring the delalloc extent to be kept.

If this race condition is hit, we can end up with a dirty page in
the page cache that has no delalloc extent or space reservation
backing it. This leads to bad things happening at writeback time.

To avoid this race condition, we need the page cache truncation to
be atomic w.r.t. the extent manipulation. We can do this by holding
the mapping->invalidate_lock exclusively across this operation -
this will prevent new pages from being inserted into the page cache
whilst we are removing the pages and the backing extent and space
reservation.

Taking the mapping->invalidate_lock exclusively in the buffered
write IO path is safe - it naturally nests inside the IOLOCK (see
truncate and fallocate paths). iomap_zero_range() can be called from
under the mapping->invalidate_lock (from the truncate path via
either xfs_zero_eof() or xfs_truncate_page(), but iomap_zero_iter()
will not instantiate new delalloc pages (because it skips holes) and
hence will not ever need to punch out delalloc extents on failure.

Fix the locking issue, and clean up the code logic a little to avoid
unnecessary work if we didn't allocate the delalloc extent or wrote
the entire region we allocated.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_iomap.c |   41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1147,6 +1147,10 @@ xfs_buffered_write_iomap_end(
 		written = 0;
 	}
 
+	/* If we didn't reserve the blocks, we're not allowed to punch them. */
+	if (!(iomap->flags & IOMAP_F_NEW))
+		return 0;
+
 	/*
 	 * start_fsb refers to the first unused block after a short write. If
 	 * nothing was written, round offset down to point at the first block in
@@ -1158,27 +1162,28 @@ xfs_buffered_write_iomap_end(
 		start_fsb = XFS_B_TO_FSB(mp, offset + written);
 	end_fsb = XFS_B_TO_FSB(mp, offset + length);
 
+	/* Nothing to do if we've written the entire delalloc extent */
+	if (start_fsb >= end_fsb)
+		return 0;
+
 	/*
-	 * Trim delalloc blocks if they were allocated by this write and we
-	 * didn't manage to write the whole range.
-	 *
-	 * We don't need to care about racing delalloc as we hold i_mutex
-	 * across the reserve/allocate/unreserve calls. If there are delalloc
-	 * blocks in the range, they are ours.
+	 * Lock the mapping to avoid races with page faults re-instantiating
+	 * folios and dirtying them via ->page_mkwrite between the page cache
+	 * truncation and the delalloc extent removal. Failing to do this can
+	 * leave dirty pages with no space reservation in the cache.
 	 */
-	if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) {
-		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
-					 XFS_FSB_TO_B(mp, end_fsb) - 1);
-
-		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-					       end_fsb - start_fsb);
-		if (error && !xfs_is_shutdown(mp)) {
-			xfs_alert(mp, "%s: unable to clean up ino %lld",
-				__func__, ip->i_ino);
-			return error;
-		}
-	}
+	filemap_invalidate_lock(inode->i_mapping);
+	truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
+				 XFS_FSB_TO_B(mp, end_fsb) - 1);
 
+	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
+				       end_fsb - start_fsb);
+	filemap_invalidate_unlock(inode->i_mapping);
+	if (error && !xfs_is_shutdown(mp)) {
+		xfs_alert(mp, "%s: unable to clean up ino %lld",
+			__func__, ip->i_ino);
+		return error;
+	}
 	return 0;
 }
 



  parent reply	other threads:[~2024-05-23 13:19 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-23 13:12 [PATCH 6.1 00/45] 6.1.92-rc1 review Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 01/45] drm/amd/display: Fix division by zero in setup_dsc_config Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 02/45] net: ks8851: Fix another TX stall caused by wrong ISR flag handling Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 03/45] ice: pass VSI pointer into ice_vc_isvalid_q_id Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 04/45] ice: remove unnecessary duplicate checks for VF VSI ID Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 05/45] pinctrl: core: handle radix_tree_insert() errors in pinctrl_register_one_pin() Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 06/45] mfd: stpmic1: Fix swapped mask/unmask in irq chip Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 07/45] nfsd: dont allow nfsd threads to be signalled Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 08/45] KEYS: trusted: Fix memory leak in tpm2_key_encode() Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 09/45] mmc: core: Add HS400 tuning in HS400es initialization Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 10/45] xfs: write page faults in iomap are not buffered writes Greg Kroah-Hartman
2024-05-23 13:13 ` Greg Kroah-Hartman [this message]
2024-05-23 13:13 ` [PATCH 6.1 12/45] xfs: use byte ranges for write cleanup ranges Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 13/45] xfs,iomap: move delalloc punching to iomap Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 14/45] iomap: buffered write failure should not truncate the page cache Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 15/45] xfs: xfs_bmap_punch_delalloc_range() should take a byte range Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 16/45] iomap: write iomap validity checks Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 17/45] xfs: use iomap_valid method to detect stale cached iomaps Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 18/45] xfs: drop write error injection is unfixable, remove it Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 19/45] xfs: fix off-by-one-block in xfs_discard_folio() Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 20/45] xfs: fix incorrect error-out in xfs_remove Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 21/45] xfs: fix sb write verify for lazysbcount Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 22/45] xfs: fix incorrect i_nlink caused by inode racing Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 23/45] xfs: invalidate block device page cache during unmount Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 24/45] xfs: attach dquots to inode before reading data/cow fork mappings Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 25/45] xfs: wait iclog complete before tearing down AIL Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 26/45] xfs: fix super block buf log item UAF during force shutdown Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 27/45] xfs: hoist refcount record merge predicates Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 28/45] xfs: estimate post-merge refcounts correctly Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 29/45] xfs: invalidate xfs_bufs when allocating cow extents Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 30/45] xfs: allow inode inactivation during a ro mount log recovery Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 31/45] xfs: fix log recovery when unknown rocompat bits are set Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 32/45] xfs: get root inode correctly at bulkstat Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 33/45] xfs: short circuit xfs_growfs_data_private() if delta is zero Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 34/45] arm64: atomics: lse: remove stale dependency on JUMP_LABEL Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 35/45] drm/amdgpu: Fix possible NULL dereference in amdgpu_ras_query_error_status_helper() Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 36/45] binder: fix max_thread type inconsistency Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 37/45] usb: dwc3: Wait unconditionally after issuing EndXfer command Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 38/45] net: usb: ax88179_178a: fix link status when link is set to down/up Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 39/45] usb: typec: ucsi: displayport: Fix potential deadlock Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 40/45] usb: typec: tipd: fix event checking for tps6598x Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 41/45] serial: kgdboc: Fix NMI-safety problems from keyboard reset code Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 42/45] remoteproc: mediatek: Make sure IPI buffer fits in L2TCM Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 43/45] KEYS: trusted: Do not use WARN when encode fails Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 44/45] admin-guide/hw-vuln/core-scheduling: fix return type of PR_SCHED_CORE_GET Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 45/45] docs: kernel_include.py: Cope with docutils 0.21 Greg Kroah-Hartman
2024-05-23 17:03 ` [PATCH 6.1 00/45] 6.1.92-rc1 review SeongJae Park
2024-05-23 18:17 ` Mark Brown
2024-05-23 20:21 ` Florian Fainelli
2024-05-24  8:13 ` Anders Roxell
2024-05-24 11:21 ` Pavel Machek
2024-05-24 14:37 ` Shuah Khan
2024-05-24 15:20 ` Jon Hunter
2024-05-24 20:35 ` Mateusz Jończyk
2024-05-24 20:38 ` Ron Economos
2024-05-25  1:06 ` Kelsey Steele
2024-05-25 16:25 ` Allen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240523130332.921321869@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=leah.rumancik@gmail.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.