All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Brian Foster <bfoster@redhat.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Amir Goldstein <amir73il@gmail.com>
Subject: [PATCH 5.10 58/65] xfs: hold buffer across unpin and potential shutdown processing
Date: Mon,  1 Aug 2022 13:47:15 +0200	[thread overview]
Message-ID: <20220801114136.094866787@linuxfoundation.org> (raw)
In-Reply-To: <20220801114133.641770326@linuxfoundation.org>

From: Brian Foster <bfoster@redhat.com>

commit 84d8949e770745b16a7e8a68dcb1d0f3687bdee9 upstream.

The special processing used to simulate a buffer I/O failure on fs
shutdown has a difficult to reproduce race that can result in a use
after free of the associated buffer. Consider a buffer that has been
committed to the on-disk log and thus is AIL resident. The buffer
lands on the writeback delwri queue, but is subsequently locked,
committed and pinned by another transaction before submitted for
I/O. At this point, the buffer is stuck on the delwri queue as it
cannot be submitted for I/O until it is unpinned. A log checkpoint
I/O failure occurs sometime later, which aborts the bli. The unpin
handler is called with the aborted log item, drops the bli reference
count, the pin count, and falls into the I/O failure simulation
path.

The potential problem here is that once the pin count falls to zero
in ->iop_unpin(), xfsaild is free to retry delwri submission of the
buffer at any time, before the unpin handler even completes. If
delwri queue submission wins the race to the buffer lock, it
observes the shutdown state and simulates the I/O failure itself.
This releases both the bli and delwri queue holds and frees the
buffer while xfs_buf_item_unpin() sits on xfs_buf_lock() waiting to
run through the same failure sequence. This problem is rare and
requires many iterations of fstest generic/019 (which simulates disk
I/O failures) to reproduce.

To avoid this problem, grab a hold on the buffer before the log item
is unpinned if the associated item has been aborted and will require
a simulated I/O failure. The hold is already required for the
simulated I/O failure, so the ordering simply guarantees the unpin
handler access to the buffer before it is unpinned and thus
processed by the AIL. This particular ordering is required so long
as the AIL does not acquire a reference on the bli, which is the
long term solution to this problem.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_buf_item.c |   37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -393,17 +393,8 @@ xfs_buf_item_pin(
 }
 
 /*
- * This is called to unpin the buffer associated with the buf log
- * item which was previously pinned with a call to xfs_buf_item_pin().
- *
- * Also drop the reference to the buf item for the current transaction.
- * If the XFS_BLI_STALE flag is set and we are the last reference,
- * then free up the buf log item and unlock the buffer.
- *
- * If the remove flag is set we are called from uncommit in the
- * forced-shutdown path.  If that is true and the reference count on
- * the log item is going to drop to zero we need to free the item's
- * descriptor in the transaction.
+ * This is called to unpin the buffer associated with the buf log item which
+ * was previously pinned with a call to xfs_buf_item_pin().
  */
 STATIC void
 xfs_buf_item_unpin(
@@ -420,12 +411,26 @@ xfs_buf_item_unpin(
 
 	trace_xfs_buf_item_unpin(bip);
 
+	/*
+	 * Drop the bli ref associated with the pin and grab the hold required
+	 * for the I/O simulation failure in the abort case. We have to do this
+	 * before the pin count drops because the AIL doesn't acquire a bli
+	 * reference. Therefore if the refcount drops to zero, the bli could
+	 * still be AIL resident and the buffer submitted for I/O (and freed on
+	 * completion) at any point before we return. This can be removed once
+	 * the AIL properly holds a reference on the bli.
+	 */
 	freed = atomic_dec_and_test(&bip->bli_refcount);
-
+	if (freed && !stale && remove)
+		xfs_buf_hold(bp);
 	if (atomic_dec_and_test(&bp->b_pin_count))
 		wake_up_all(&bp->b_waiters);
 
-	if (freed && stale) {
+	 /* nothing to do but drop the pin count if the bli is active */
+	if (!freed)
+		return;
+
+	if (stale) {
 		ASSERT(bip->bli_flags & XFS_BLI_STALE);
 		ASSERT(xfs_buf_islocked(bp));
 		ASSERT(bp->b_flags & XBF_STALE);
@@ -468,13 +473,13 @@ xfs_buf_item_unpin(
 			ASSERT(bp->b_log_item == NULL);
 		}
 		xfs_buf_relse(bp);
-	} else if (freed && remove) {
+	} else if (remove) {
 		/*
 		 * The buffer must be locked and held by the caller to simulate
-		 * an async I/O failure.
+		 * an async I/O failure. We acquired the hold for this case
+		 * before the buffer was unpinned.
 		 */
 		xfs_buf_lock(bp);
-		xfs_buf_hold(bp);
 		bp->b_flags |= XBF_ASYNC;
 		xfs_buf_ioend_fail(bp);
 	}



  parent reply	other threads:[~2022-08-01 11:58 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-01 11:46 [PATCH 5.10 00/65] 5.10.135-rc1 review Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 01/65] Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 02/65] Revert "ocfs2: mount shared volume without ha stack" Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 03/65] ntfs: fix use-after-free in ntfs_ucsncmp() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 04/65] s390/archrandom: prevent CPACF trng invocations in interrupt context Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 05/65] nouveau/svm: Fix to migrate all requested pages Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 06/65] watch_queue: Fix missing rcu annotation Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 07/65] watch_queue: Fix missing locking in add_watch_to_object() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 08/65] tcp: Fix data-races around sysctl_tcp_dsack Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 09/65] tcp: Fix a data-race around sysctl_tcp_app_win Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 10/65] tcp: Fix a data-race around sysctl_tcp_adv_win_scale Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 11/65] tcp: Fix a data-race around sysctl_tcp_frto Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 12/65] tcp: Fix a data-race around sysctl_tcp_nometrics_save Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 13/65] tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 14/65] ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS) Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 15/65] ice: do not setup vlan for loopback VSI Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 16/65] scsi: ufs: host: Hold reference returned by of_parse_phandle() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 17/65] Revert "tcp: change pingpong threshold to 3" Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 18/65] tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 19/65] tcp: Fix a data-race around sysctl_tcp_limit_output_bytes Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 20/65] tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 21/65] net: ping6: Fix memleak in ipv6_renew_options() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 22/65] ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 23/65] net/tls: Remove the context from the list in tls_device_down Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 24/65] igmp: Fix data-races around sysctl_igmp_qrv Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 25/65] net: sungem_phy: Add of_node_put() for reference returned by of_get_parent() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 26/65] tcp: Fix a data-race around sysctl_tcp_min_tso_segs Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 27/65] tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 28/65] tcp: Fix a data-race around sysctl_tcp_autocorking Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 29/65] tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 30/65] Documentation: fix sctp_wmem in ip-sysctl.rst Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 31/65] macsec: fix NULL deref in macsec_add_rxsa Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 32/65] macsec: fix error message in macsec_add_rxsa and _txsa Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 33/65] macsec: limit replay window size with XPN Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 34/65] macsec: always read MACSEC_SA_ATTR_PN as a u64 Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 35/65] net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 36/65] tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 37/65] tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 38/65] tcp: Fix a data-race around sysctl_tcp_comp_sack_nr Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 39/65] tcp: Fix data-races around sysctl_tcp_reflect_tos Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 40/65] i40e: Fix interface init with MSI interrupts (no MSI-X) Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 41/65] sctp: fix sleep in atomic context bug in timer handlers Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.10 42/65] netfilter: nf_queue: do not allow packet truncation below transport header offset Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 43/65] virtio-net: fix the race between refill work and close Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 44/65] perf symbol: Correct address for bss symbols Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 45/65] sfc: disable softirqs for ptp TX Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 46/65] sctp: leave the err path free in sctp_stream_init to sctp_stream_free Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 47/65] ARM: crypto: comment out gcc warning that breaks clang builds Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 48/65] page_alloc: fix invalid watermark check on a negative value Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 49/65] mt7601u: add USB device ID for some versions of XiaoDu WiFi Dongle Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 50/65] ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 51/65] EDAC/ghes: Set the DIMM label unconditionally Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 52/65] docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 53/65] xfs: refactor xfs_file_fsync Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 54/65] xfs: xfs_log_force_lsn isnt passed a LSN Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 55/65] xfs: prevent UAF in xfs_log_item_in_current_chkpt Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 56/65] xfs: fix log intent recovery ENOSPC shutdowns when inactivating inodes Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 57/65] xfs: force the log offline when log intent item recovery fails Greg Kroah-Hartman
2022-08-01 11:47 ` Greg Kroah-Hartman [this message]
2022-08-01 11:47 ` [PATCH 5.10 59/65] xfs: remove dead stale buf unpin handling code Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 60/65] xfs: logging the on disk inode LSN can make it go backwards Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 61/65] xfs: Enforce attr3 buffer recovery order Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 62/65] x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 63/65] bpf: Consolidate shared test timing code Greg Kroah-Hartman
2022-08-02 12:13   ` Pavel Machek
2022-08-02 13:27     ` Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 64/65] bpf: Add PROG_TEST_RUN support for sk_lookup programs Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.10 65/65] selftests: bpf: Dont run sk_lookup in verifier tests Greg Kroah-Hartman
2022-08-01 14:26 ` [PATCH 5.10 00/65] 5.10.135-rc1 review Jon Hunter
2022-08-01 20:53 ` Florian Fainelli
2022-08-01 22:14 ` Daniel Díaz
2022-08-01 22:21 ` Shuah Khan
2022-08-02  5:28 ` Guenter Roeck
2022-08-02 11:28 ` Rudi Heitbaum
2022-08-02 17:01 ` Pavel Machek
2022-08-02 17:33 ` Sudip Mukherjee (Codethink)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220801114136.094866787@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=amir73il@gmail.com \
    --cc=bfoster@redhat.com \
    --cc=djwong@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.