All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>, Dave Chinner <dchinner@redhat.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	david@fromorbit.com, hch@infradead.org
Subject: [PATCH 15/14] fstest: regression test for writeback corruption bug
Date: Wed, 9 Nov 2022 10:22:50 -0800	[thread overview]
Message-ID: <Y2vv+tdWVEStVpaO@magnolia> (raw)
In-Reply-To: <166801774453.3992140.241667783932550826.stgit@magnolia>

From: Darrick J. Wong <djwong@kernel.org>

This is a regression test for a data corruption bug that existed in XFS'
copy on write code between 4.9 and 4.19.  The root cause is a
concurrency bug wherein we would drop ILOCK_SHARED after querying the
CoW fork in xfs_map_cow and retake it before querying the data fork in
xfs_map_blocks.  See the test description for a lot more details.

Cc: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/924     |  197 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/924.out |    2 +
 2 files changed, 199 insertions(+)
 create mode 100755 tests/xfs/924
 create mode 100644 tests/xfs/924.out

diff --git a/tests/xfs/924 b/tests/xfs/924
new file mode 100755
index 0000000000..486afefedd
--- /dev/null
+++ b/tests/xfs/924
@@ -0,0 +1,197 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle.  All Rights Reserved.
+#
+# FS QA Test 924
+#
+# This is a regression test for a data corruption bug that existed in XFS' copy
+# on write code between 4.9 and 4.19.  The root cause is a concurrency bug
+# wherein we would drop ILOCK_SHARED after querying the CoW fork in xfs_map_cow
+# and retake it before querying the data fork in xfs_map_blocks.  If a second
+# thread changes the CoW fork mappings between the two calls, it's possible for
+# xfs_map_blocks to return a zero-block mapping, which results in writeback
+# being elided for that block.  Elided writeback of dirty data results in
+# silent loss of writes.
+#
+# Worse yet, kernels from that era still used buffer heads, which means that an
+# elided writeback leaves the page clean but the bufferheads dirty.  Due to a
+# naïve optimization in mark_buffer_dirty, the SetPageDirty call is elided if
+# the bufferhead is dirty, which means that a subsequent rewrite of the data
+# block will never result in the page being marked dirty, and all subsequent
+# writes are lost.
+#
+# It turns out that Christoph Hellwig unwittingly fixed the race in commit
+# 5c665e5b5af6 ("xfs: remove xfs_map_cow"), and no testcase was ever written.
+# Four years later, we hit it on a production 4.14 kernel.  This testcase
+# relies on a debugging knob that introduces artificial delays into writeback.
+#
+# Before the race, the file blocks 0-1 are not shared and blocks 2-5 are
+# shared.  There are no extents in CoW fork.
+#
+# Two threads race like this:
+#
+# Thread 1 (writeback block 0)     | Thread 2  (write to block 2)
+# ---------------------------------|--------------------------------
+#                                  |
+# 1. Check if block 0 in CoW fork  |
+#    from xfs_map_cow.             |
+#                                  |
+# 2. Block 0 not found in CoW      |
+#    fork; the block is considered |
+#    not shared.                   |
+#                                  |
+# 3. xfs_map_blocks looks up data  |
+#    fork to get a map covering    |
+#    block 0.                      |
+#                                  |
+# 4. It gets a data fork mapping   |
+#    for block 0 with length 2.    |
+#                                  |
+#                                  | 1. A buffered write to block 2 sees
+#                                  |    that it is a shared block and no
+#                                  |    extent covers block 2 in CoW fork.
+#                                  |
+#                                  |    It creates a new CoW fork mapping.
+#                                  |    Due to the cowextsize, the new
+#                                  |    extent starts at block 0 with
+#                                  |    length 128.
+#                                  |
+#                                  |
+# 5. It lookup CoW fork again to   |
+#    trim the map (0, 2) to a      |
+#    shared block boundary.        |
+#                                  |
+# 5a. It finds (0, 128) in CoW fork|
+# 5b. It trims the data fork map   |
+#     from (0, 1) to (0, 0) (!!!)  |
+#                                  |
+# 6. The xfs_imap_valid call after |
+#    the xfs_map_blocks call checks|
+#    if the mapping (0, 0) covers  |
+#    block 0.  The result is "NO". |
+#                                  |
+# 7. Since block 0 has no physical |
+#    block mapped, it's not added  |
+#    to the ioend.  This is the    |
+#    first problem.                |
+#                                  |
+# 8. xfs_add_to_ioend usually      |
+#    clears the bufferhead dirty   |
+#    flag  Because this is skipped,|
+#    we leave the page clean with  |
+#    the associated buffer head(s) |
+#    dirty (the second problem).   |
+#    Now the dirty state is        |
+#    inconsistent.
+#
+# On newer kernels, this is also a functionality test for the ifork sequence
+# counter because the writeback completions will change the data fork and force
+# revalidations of the wb mapping.
+#
+. ./common/preamble
+_begin_fstest auto quick clone
+
+# Import common functions.
+. ./common/reflink
+. ./common/inject
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs xfs
+_fixed_by_kernel_commit 5c665e5b5af6 "xfs: remove xfs_map_cow"
+_require_error_injection
+_require_scratch_reflink
+_require_cp_reflink
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount >> $seqres.full
+
+knob="$(_find_xfs_mountdev_errortag_knob $SCRATCH_DEV "wb_delay_ms")"
+test -w "$knob" || _notrun "Kernel does not have wb_delay_ms error injector"
+
+blksz=65536
+_require_congruent_file_oplen $SCRATCH_MNT $blksz
+
+# Make sure we have sufficient extent size to create speculative CoW
+# preallocations.
+$XFS_IO_PROG -c 'cowextsize 1m' $SCRATCH_MNT
+
+# Write out a file with the first two blocks unshared and the rest shared.
+_pwrite_byte 0x59 0 $((160 * blksz)) $SCRATCH_MNT/file >> $seqres.full
+_pwrite_byte 0x59 0 $((160 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+_cp_reflink $SCRATCH_MNT/file $SCRATCH_MNT/file.reflink
+
+_pwrite_byte 0x58 0 $((2 * blksz)) $SCRATCH_MNT/file >> $seqres.full
+_pwrite_byte 0x58 0 $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+# Avoid creation of large folios on newer kernels by cycling the mount and
+# immediately writing to the page cache.
+_scratch_cycle_mount
+
+# Write the same data to file.compare as we're about to do to file.  Do this
+# before slowing down writeback to avoid unnecessary delay.
+_pwrite_byte 0x57 0 $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+_pwrite_byte 0x56 $((2 * blksz)) $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+# Introduce a half-second wait to each writeback block mapping call.  This
+# gives us a chance to race speculative cow prealloc with writeback.
+wb_delay=500
+echo $wb_delay > $knob
+curval="$(cat $knob)"
+test "$curval" -eq $wb_delay || echo "expected wb_delay_ms == $wb_delay"
+
+# Start thread 1 + writeback above
+$XFS_IO_PROG -c "pwrite -S 0x57 0 $((2 * blksz))" \
+	-c 'bmap -celpv' -c 'bmap -elpv' \
+	-c 'fsync' $SCRATCH_MNT/file >> $seqres.full &
+sleep 1
+
+# Start a sentry to look for evidence of the XFS_ERRORTAG_REPORT logging.  If
+# we see that, we know we've forced writeback to revalidate a mapping.  The
+# test has been successful, so turn off the delay.
+sentryfile=$TEST_DIR/$seq.sentry
+wait_for_errortag() {
+	while [ -e "$sentryfile" ]; do
+		if _check_dmesg_for XFS_ERRTAG_WB_DELAY_MS; then
+			echo 0 > $knob
+			break;
+		fi
+		sleep 1
+	done
+}
+touch $sentryfile
+wait_for_errortag &
+
+# Start thread 2 to create the cowextsize reservation
+$XFS_IO_PROG -c "pwrite -S 0x56 $((2 * blksz)) $((2 * blksz))" \
+	-c 'bmap -celpv' -c 'bmap -elpv' \
+	-c 'fsync' $SCRATCH_MNT/file >> $seqres.full
+rm -f $sentryfile
+
+_check_dmesg_for XFS_ERRTAG_WB_DELAY_MS
+saw_delay=$?
+
+# Flush everything to disk.  If the bug manifests, then after the cycle,
+# file should have stale 0x58 in block 0 because we silently dropped a write.
+_scratch_cycle_mount
+
+if ! cmp -s $SCRATCH_MNT/file $SCRATCH_MNT/file.compare; then
+	echo file and file.compare do not match
+	$XFS_IO_PROG -c 'bmap -celpv' -c 'bmap -elpv' $SCRATCH_MNT/file >> $seqres.full
+	echo file.compare
+	od -tx1 -Ad -c $SCRATCH_MNT/file.compare
+	echo file
+	od -tx1 -Ad -c $SCRATCH_MNT/file
+elif [ $saw_delay -ne 0 ]; then
+	# The files matched, but nothing got logged about the revalidation?
+	echo "Expected to hear about XFS_ERRTAG_WB_DELAY_MS?"
+fi
+
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/924.out b/tests/xfs/924.out
new file mode 100644
index 0000000000..c6655da35a
--- /dev/null
+++ b/tests/xfs/924.out
@@ -0,0 +1,2 @@
+QA output created by 924
+Silence is golden

  parent reply	other threads:[~2022-11-09 18:22 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-09 18:15 [PATCHSET RFCRAP v2 00/14] xfs, iomap: fix data corruption due to stale cached iomaps Darrick J. Wong
2022-11-09 18:15 ` [PATCH 01/14] xfs: write page faults in iomap are not buffered writes Darrick J. Wong
2022-11-09 18:15 ` [PATCH 02/14] xfs: punching delalloc extents on write failure is racy Darrick J. Wong
2022-11-09 18:16 ` [PATCH 03/14] xfs: use byte ranges for write cleanup ranges Darrick J. Wong
2022-11-09 18:16 ` [PATCH 04/14] xfs: buffered write failure should not truncate the page cache Darrick J. Wong
2022-11-15  8:32   ` Christoph Hellwig
2022-11-09 18:16 ` [PATCH 05/14] iomap: write iomap validity checks Darrick J. Wong
2022-11-09 18:16 ` [PATCH 06/14] xfs: use iomap_valid method to detect stale cached iomaps Darrick J. Wong
2022-11-09 18:16 ` [PATCH 07/14] xfs: drop write error injection is unfixable, remove it Darrick J. Wong
2022-11-09 18:16 ` [PATCH 08/14] iomap: pass iter to ->iomap_begin implementations Darrick J. Wong
2022-11-15  8:36   ` Christoph Hellwig
2022-11-09 18:16 ` [PATCH 09/14] iomap: pass iter to ->iomap_end implementations Darrick J. Wong
2022-11-09 18:16 ` [PATCH 10/14] iomap: pass a private pointer to iomap_file_buffered_write Darrick J. Wong
2022-11-15  8:37   ` Christoph Hellwig
2022-11-09 18:16 ` [PATCH 11/14] xfs: move the seq counters for buffered writes to a private struct Darrick J. Wong
2022-11-15  8:38   ` Christoph Hellwig
2022-11-09 18:16 ` [PATCH 12/14] xfs: validate COW fork sequence counters during buffered writes Darrick J. Wong
2022-11-09 18:16 ` [PATCH 13/14] xfs: add debug knob to slow down writeback for fun Darrick J. Wong
2022-11-09 18:17 ` [PATCH 14/14] xfs: add debug knob to slow down write " Darrick J. Wong
2022-11-09 18:22 ` Darrick J. Wong [this message]
2022-11-09 18:23 ` [PATCH 16/14] fstest: regression test for writes racing with reclaim writeback Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2vv+tdWVEStVpaO@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.