All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, "Darrick J. Wong" <djwong@kernel.org>,
	Dave Chinner <dchinner@redhat.com>,
	Leah Rumancik <leah.rumancik@gmail.com>
Subject: [PATCH 6.1 24/45] xfs: attach dquots to inode before reading data/cow fork mappings
Date: Thu, 23 May 2024 15:13:15 +0200	[thread overview]
Message-ID: <20240523130333.409174044@linuxfoundation.org> (raw)
In-Reply-To: <20240523130332.496202557@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 4c6dbfd2756bd83a0085ed804e2bb7be9cc16bc5 ]

I've been running near-continuous integration testing of online fsck,
and I've noticed that once a day, one of the ARM VMs will fail the test
with out of order records in the data fork.

xfs/804 races fsstress with online scrub (aka scan but do not change
anything), so I think this might be a bug in the core xfs code.  This
also only seems to trigger if one runs the test for more than ~6 minutes
via TIME_FACTOR=13 or something.
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/tree/tests/xfs/804?h=djwong-wtf

I added a debugging patch to the kernel to check the data fork extents
after taking the ILOCK, before dropping ILOCK, and before and after each
bmapping operation.  So far I've narrowed it down to the delalloc code
inserting a record in the wrong place in the iext tree:

xfs_bmap_add_extent_hole_delay, near line 2691:

	case 0:
		/*
		 * New allocation is not contiguous with another
		 * delayed allocation.
		 * Insert a new entry.
		 */
		oldlen = newlen = 0;
		xfs_iunlock_check_datafork(ip);		<-- ok here
		xfs_iext_insert(ip, icur, new, state);
		xfs_iunlock_check_datafork(ip);		<-- bad here
		break;
	}

I recorded the state of the data fork mappings and iext cursor state
when a corrupt data fork is detected immediately after the
xfs_bmap_add_extent_hole_delay call in xfs_bmapi_reserve_delalloc:

ino 0x140bb3 func xfs_bmapi_reserve_delalloc line 4164 data fork:
    ino 0x140bb3 nr 0x0 nr_real 0x0 offset 0xb9 blockcount 0x1f startblock 0x935de2 state 1
    ino 0x140bb3 nr 0x1 nr_real 0x1 offset 0xe6 blockcount 0xa startblock 0xffffffffe0007 state 0
    ino 0x140bb3 nr 0x2 nr_real 0x1 offset 0xd8 blockcount 0xe startblock 0x935e01 state 0

Here we see that a delalloc extent was inserted into the wrong position
in the iext leaf, same as all the other times.  The extra trace data I
collected are as follows:

ino 0x140bb3 fork 0 oldoff 0xe6 oldlen 0x4 oldprealloc 0x6 isize 0xe6000
    ino 0x140bb3 oldgotoff 0xea oldgotstart 0xfffffffffffffffe oldgotcount 0x0 oldgotstate 0
    ino 0x140bb3 crapgotoff 0x0 crapgotstart 0x0 crapgotcount 0x0 crapgotstate 0
    ino 0x140bb3 freshgotoff 0xd8 freshgotstart 0x935e01 freshgotcount 0xe freshgotstate 0
    ino 0x140bb3 nowgotoff 0xe6 nowgotstart 0xffffffffe0007 nowgotcount 0xa nowgotstate 0
    ino 0x140bb3 oldicurpos 1 oldleafnr 2 oldleaf 0xfffffc00f0609a00
    ino 0x140bb3 crapicurpos 2 crapleafnr 2 crapleaf 0xfffffc00f0609a00
    ino 0x140bb3 freshicurpos 1 freshleafnr 2 freshleaf 0xfffffc00f0609a00
    ino 0x140bb3 newicurpos 1 newleafnr 3 newleaf 0xfffffc00f0609a00

The first line shows that xfs_bmapi_reserve_delalloc was called with
whichfork=XFS_DATA_FORK, off=0xe6, len=0x4, prealloc=6.

The second line ("oldgot") shows the contents of @got at the beginning
of the call, which are the results of the first iext lookup in
xfs_buffered_write_iomap_begin.

Line 3 ("crapgot") is the result of duplicating the cursor at the start
of the body of xfs_bmapi_reserve_delalloc and performing a fresh lookup
at @off.

Line 4 ("freshgot") is the result of a new xfs_iext_get_extent right
before the call to xfs_bmap_add_extent_hole_delay.  Totally garbage.

Line 5 ("nowgot") is contents of @got after the
xfs_bmap_add_extent_hole_delay call.

Line 6 is the contents of @icur at the beginning fo the call.  Lines 7-9
are the contents of the iext cursors at the point where the block
mappings were sampled.

I think @oldgot is a HOLESTARTBLOCK extent because the first lookup
didn't find anything, so we filled in imap with "fake hole until the
end".  At the time of the first lookup, I suspect that there's only one
32-block unwritten extent in the mapping (hence oldicurpos==1) but by
the time we get to recording crapgot, crapicurpos==2.

Dave then added:

Ok, that's much simpler to reason about, and implies the smoke is
coming from xfs_buffered_write_iomap_begin() or
xfs_bmapi_reserve_delalloc(). I suspect the former - it does a lot
of stuff with the ILOCK_EXCL held.....

.... including calling xfs_qm_dqattach_locked().

xfs_buffered_write_iomap_begin
  ILOCK_EXCL
  look up icur
  xfs_qm_dqattach_locked
    xfs_qm_dqattach_one
      xfs_qm_dqget_inode
        dquot cache miss
        xfs_iunlock(ip, XFS_ILOCK_EXCL);
        error = xfs_qm_dqread(mp, id, type, can_alloc, &dqp);
        xfs_ilock(ip, XFS_ILOCK_EXCL);
  ....
  xfs_bmapi_reserve_delalloc(icur)

Yup, that's what is letting the magic smoke out -
xfs_qm_dqattach_locked() can cycle the ILOCK. If that happens, we
can pass a stale icur to xfs_bmapi_reserve_delalloc() and it all
goes downhill from there.

Back to Darrick now:

So.  Fix this by moving the dqattach_locked call up before we take the
ILOCK, like all the other callers in that file.

Fixes: a526c85c2236 ("xfs: move xfs_file_iomap_begin_delay around") # goes further back than this
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_iomap.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -968,6 +968,10 @@ xfs_buffered_write_iomap_begin(
 
 	ASSERT(!XFS_IS_REALTIME_INODE(ip));
 
+	error = xfs_qm_dqattach(ip);
+	if (error)
+		return error;
+
 	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
 	if (error)
 		return error;
@@ -1071,10 +1075,6 @@ xfs_buffered_write_iomap_begin(
 			allocfork = XFS_COW_FORK;
 	}
 
-	error = xfs_qm_dqattach_locked(ip, false);
-	if (error)
-		goto out_unlock;
-
 	if (eof && offset + count > XFS_ISIZE(ip)) {
 		/*
 		 * Determine the initial size of the preallocation.



  parent reply	other threads:[~2024-05-23 13:20 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-23 13:12 [PATCH 6.1 00/45] 6.1.92-rc1 review Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 01/45] drm/amd/display: Fix division by zero in setup_dsc_config Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 02/45] net: ks8851: Fix another TX stall caused by wrong ISR flag handling Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 03/45] ice: pass VSI pointer into ice_vc_isvalid_q_id Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 04/45] ice: remove unnecessary duplicate checks for VF VSI ID Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 05/45] pinctrl: core: handle radix_tree_insert() errors in pinctrl_register_one_pin() Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 06/45] mfd: stpmic1: Fix swapped mask/unmask in irq chip Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 07/45] nfsd: dont allow nfsd threads to be signalled Greg Kroah-Hartman
2024-05-23 13:12 ` [PATCH 6.1 08/45] KEYS: trusted: Fix memory leak in tpm2_key_encode() Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 09/45] mmc: core: Add HS400 tuning in HS400es initialization Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 10/45] xfs: write page faults in iomap are not buffered writes Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 11/45] xfs: punching delalloc extents on write failure is racy Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 12/45] xfs: use byte ranges for write cleanup ranges Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 13/45] xfs,iomap: move delalloc punching to iomap Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 14/45] iomap: buffered write failure should not truncate the page cache Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 15/45] xfs: xfs_bmap_punch_delalloc_range() should take a byte range Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 16/45] iomap: write iomap validity checks Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 17/45] xfs: use iomap_valid method to detect stale cached iomaps Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 18/45] xfs: drop write error injection is unfixable, remove it Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 19/45] xfs: fix off-by-one-block in xfs_discard_folio() Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 20/45] xfs: fix incorrect error-out in xfs_remove Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 21/45] xfs: fix sb write verify for lazysbcount Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 22/45] xfs: fix incorrect i_nlink caused by inode racing Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 23/45] xfs: invalidate block device page cache during unmount Greg Kroah-Hartman
2024-05-23 13:13 ` Greg Kroah-Hartman [this message]
2024-05-23 13:13 ` [PATCH 6.1 25/45] xfs: wait iclog complete before tearing down AIL Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 26/45] xfs: fix super block buf log item UAF during force shutdown Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 27/45] xfs: hoist refcount record merge predicates Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 28/45] xfs: estimate post-merge refcounts correctly Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 29/45] xfs: invalidate xfs_bufs when allocating cow extents Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 30/45] xfs: allow inode inactivation during a ro mount log recovery Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 31/45] xfs: fix log recovery when unknown rocompat bits are set Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 32/45] xfs: get root inode correctly at bulkstat Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 33/45] xfs: short circuit xfs_growfs_data_private() if delta is zero Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 34/45] arm64: atomics: lse: remove stale dependency on JUMP_LABEL Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 35/45] drm/amdgpu: Fix possible NULL dereference in amdgpu_ras_query_error_status_helper() Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 36/45] binder: fix max_thread type inconsistency Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 37/45] usb: dwc3: Wait unconditionally after issuing EndXfer command Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 38/45] net: usb: ax88179_178a: fix link status when link is set to down/up Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 39/45] usb: typec: ucsi: displayport: Fix potential deadlock Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 40/45] usb: typec: tipd: fix event checking for tps6598x Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 41/45] serial: kgdboc: Fix NMI-safety problems from keyboard reset code Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 42/45] remoteproc: mediatek: Make sure IPI buffer fits in L2TCM Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 43/45] KEYS: trusted: Do not use WARN when encode fails Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 44/45] admin-guide/hw-vuln/core-scheduling: fix return type of PR_SCHED_CORE_GET Greg Kroah-Hartman
2024-05-23 13:13 ` [PATCH 6.1 45/45] docs: kernel_include.py: Cope with docutils 0.21 Greg Kroah-Hartman
2024-05-23 17:03 ` [PATCH 6.1 00/45] 6.1.92-rc1 review SeongJae Park
2024-05-23 18:17 ` Mark Brown
2024-05-23 20:21 ` Florian Fainelli
2024-05-24  8:13 ` Anders Roxell
2024-05-24 11:21 ` Pavel Machek
2024-05-24 14:37 ` Shuah Khan
2024-05-24 15:20 ` Jon Hunter
2024-05-24 20:35 ` Mateusz Jończyk
2024-05-24 20:38 ` Ron Economos
2024-05-25  1:06 ` Kelsey Steele
2024-05-25 16:25 ` Allen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240523130333.409174044@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=leah.rumancik@gmail.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.