All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	wen.gang.wang@oracle.com, fstests <fstests@vger.kernel.org>
Subject: [RFC PATCH] xfs: regression test for writeback corruption bug
Date: Tue, 29 Nov 2022 13:57:14 -0800	[thread overview]
Message-ID: <Y4aAOn7CUTr9tUBN@magnolia> (raw)
In-Reply-To: <Y4VejsHGU/tZuRYs@magnolia>

From: Darrick J. Wong <djwong@kernel.org>

This is a regression test for a data corruption bug that existed in XFS'
copy on write code between 4.9 and 4.19.  The root cause is a
concurrency bug wherein we would drop ILOCK_SHARED after querying the
CoW fork in xfs_map_cow and retake it before querying the data fork in
xfs_map_blocks.  See the test description for a lot more details.

Cc: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/rc         |   15 ++++
 common/tracing    |   69 +++++++++++++++++
 tests/xfs/924     |  215 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/924.out |    2 
 4 files changed, 301 insertions(+)
 create mode 100644 common/tracing
 create mode 100755 tests/xfs/924
 create mode 100644 tests/xfs/924.out

diff --git a/common/rc b/common/rc
index d71fc0603f..b1b7a3e553 100644
--- a/common/rc
+++ b/common/rc
@@ -3625,6 +3625,21 @@ _check_xflag()
 	fi
 }
 
+# Make sure the given file access mode is set to use the pagecache.  If
+# userspace or kernel don't support statx or STATX_ATTR_DAX, we assume that
+# means pagecache.  The sole parameter must be a directory.
+_require_pagecache_access() {
+	local testfile="$1/testfile"
+
+	touch "$testfile"
+	if ! _check_s_dax "$testfile" 0 &>> $seqres.full; then
+		rm -f "$testfile"
+		_notrun 'test requires pagecache access'
+	fi
+
+	rm -f "$testfile"
+}
+
 # Check if dax mount options are supported
 #
 # $1 can be either 'dax=always' or 'dax'
diff --git a/common/tracing b/common/tracing
new file mode 100644
index 0000000000..35e5ed41c2
--- /dev/null
+++ b/common/tracing
@@ -0,0 +1,69 @@
+##/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle.  All Rights Reserved.
+#
+# Routines for dealing with ftrace (or any other tracing).
+
+_require_ftrace() {
+	local ftrace_dir="/sys/kernel/debug/tracing/instances/"
+	test -d "$ftrace_dir" || _notrun "kernel does not support ftrace"
+
+	# Give this fstest its own ftrace buffer so that we don't mess up
+	# any other tracers that might be running.
+	FTRACE_DIR="$ftrace_dir/fstests.$seq"
+	test -d "$FTRACE_DIR" && rmdir "$FTRACE_DIR"
+}
+
+_ftrace_cleanup() {
+	if [ -d "$FTRACE_DIR" ]; then
+		_ftrace_ignore_events
+		# Removing an ftrace buffer requires rmdir, even though the
+		# virtual directory contains children.
+		rmdir "$FTRACE_DIR"
+	fi
+}
+
+# Intercept the given events.  Arguments may be regular expressions.
+_ftrace_record_events() {
+	local pwd="$PWD"
+
+	test -n "$FTRACE_DIR" || _fail "_require_ftrace not run?"
+	mkdir "$FTRACE_DIR"
+	cd "$FTRACE_DIR/events/" || _fail "$FTRACE_DIR: ftrace not set up?"
+
+	for arg in "$@"; do
+		for tp in */${arg}; do
+			# Replace slashes with semicolons per ftrace convention
+			echo "${tp////:}" >> ../set_event
+		done
+	done
+	cd "$pwd"
+}
+
+# Stop intercepting the given events.  If no arguments, stops all events.
+_ftrace_ignore_events() {
+	local pwd="$PWD"
+
+	test -n "$FTRACE_DIR" || _fail "_require_ftrace not run?"
+	cd "$FTRACE_DIR/events/" || _fail "$FTRACE_DIR: ftrace not set up?"
+
+	if [ "$#" -eq 0 ]; then
+		echo > ../set_event
+	else
+		for arg in "$@"; do
+			for tp in */${arg}; do
+				# Replace slashes with semicolons per ftrace convention
+				echo "!${tp////:}" >> ../set_event
+			done
+		done
+	fi
+
+	cd "$pwd"
+}
+
+# Dump whatever was written to the ftrace buffer since the last time this
+# helper was called.
+_ftrace_dump() {
+	test -n "$FTRACE_DIR" || _fail "_require_ftrace not run?"
+	(cd "$FTRACE_DIR" && cat trace)
+}
diff --git a/tests/xfs/924 b/tests/xfs/924
new file mode 100755
index 0000000000..81f8ba2743
--- /dev/null
+++ b/tests/xfs/924
@@ -0,0 +1,215 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle.  All Rights Reserved.
+#
+# FS QA Test 924
+#
+# This is a regression test for a data corruption bug that existed in XFS' copy
+# on write code between 4.9 and 4.19.  The root cause is a concurrency bug
+# wherein we would drop ILOCK_SHARED after querying the CoW fork in xfs_map_cow
+# and retake it before querying the data fork in xfs_map_blocks.  If a second
+# thread changes the CoW fork mappings between the two calls, it's possible for
+# xfs_map_blocks to return a zero-block mapping, which results in writeback
+# being elided for that block.  Elided writeback of dirty data results in
+# silent loss of writes.
+#
+# Worse yet, kernels from that era still used buffer heads, which means that an
+# elided writeback leaves the page clean but the bufferheads dirty.  Due to a
+# naïve optimization in mark_buffer_dirty, the SetPageDirty call is elided if
+# the bufferhead is dirty, which means that a subsequent rewrite of the data
+# block will never result in the page being marked dirty, and all subsequent
+# writes are lost.
+#
+# It turns out that Christoph Hellwig unwittingly fixed the race in commit
+# 5c665e5b5af6 ("xfs: remove xfs_map_cow"), and no testcase was ever written.
+# Four years later, we hit it on a production 4.14 kernel.  This testcase
+# relies on a debugging knob that introduces artificial delays into writeback.
+#
+# Before the race, the file blocks 0-1 are not shared and blocks 2-5 are
+# shared.  There are no extents in CoW fork.
+#
+# Two threads race like this:
+#
+# Thread 1 (writeback block 0)     | Thread 2  (write to block 2)
+# ---------------------------------|--------------------------------
+#                                  |
+# 1. Check if block 0 in CoW fork  |
+#    from xfs_map_cow.             |
+#                                  |
+# 2. Block 0 not found in CoW      |
+#    fork; the block is considered |
+#    not shared.                   |
+#                                  |
+# 3. xfs_map_blocks looks up data  |
+#    fork to get a map covering    |
+#    block 0.                      |
+#                                  |
+# 4. It gets a data fork mapping   |
+#    for block 0 with length 2.    |
+#                                  |
+#                                  | 1. A buffered write to block 2 sees
+#                                  |    that it is a shared block and no
+#                                  |    extent covers block 2 in CoW fork.
+#                                  |
+#                                  |    It creates a new CoW fork mapping.
+#                                  |    Due to the cowextsize, the new
+#                                  |    extent starts at block 0 with
+#                                  |    length 128.
+#                                  |
+#                                  |
+# 5. It lookup CoW fork again to   |
+#    trim the map (0, 2) to a      |
+#    shared block boundary.        |
+#                                  |
+# 5a. It finds (0, 128) in CoW fork|
+# 5b. It trims the data fork map   |
+#     from (0, 1) to (0, 0) (!!!)  |
+#                                  |
+# 6. The xfs_imap_valid call after |
+#    the xfs_map_blocks call checks|
+#    if the mapping (0, 0) covers  |
+#    block 0.  The result is "NO". |
+#                                  |
+# 7. Since block 0 has no physical |
+#    block mapped, it's not added  |
+#    to the ioend.  This is the    |
+#    first problem.                |
+#                                  |
+# 8. xfs_add_to_ioend usually      |
+#    clears the bufferhead dirty   |
+#    flag  Because this is skipped,|
+#    we leave the page clean with  |
+#    the associated buffer head(s) |
+#    dirty (the second problem).   |
+#    Now the dirty state is        |
+#    inconsistent.
+#
+# On newer kernels, this is also a functionality test for the ifork sequence
+# counter because the writeback completions will change the data fork and force
+# revalidations of the wb mapping.
+#
+. ./common/preamble
+_begin_fstest auto quick clone
+
+# Import common functions.
+. ./common/reflink
+. ./common/inject
+. ./common/tracing
+
+# real QA test starts here
+_cleanup()
+{
+	_ftrace_cleanup
+	cd /
+	rm -r -f $tmp.* $sentryfile $tracefile
+}
+
+# Modify as appropriate.
+_supported_fs xfs
+_fixed_by_kernel_commit 5c665e5b5af6 "xfs: remove xfs_map_cow"
+_require_ftrace
+_require_error_injection
+_require_scratch_reflink
+_require_cp_reflink
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount >> $seqres.full
+
+# This is a pagecache test, so try to disable fsdax mode.
+$XFS_IO_PROG -c 'chattr -x' $SCRATCH_MNT &> $seqres.full
+_require_pagecache_access $SCRATCH_MNT
+
+knob="$(_find_xfs_mountdev_errortag_knob $SCRATCH_DEV "wb_delay_ms")"
+test -w "$knob" || _notrun "Kernel does not have wb_delay_ms error injector"
+
+blksz=65536
+_require_congruent_file_oplen $SCRATCH_MNT $blksz
+
+# Make sure we have sufficient extent size to create speculative CoW
+# preallocations.
+$XFS_IO_PROG -c 'cowextsize 1m' $SCRATCH_MNT
+
+# Write out a file with the first two blocks unshared and the rest shared.
+_pwrite_byte 0x59 0 $((160 * blksz)) $SCRATCH_MNT/file >> $seqres.full
+_pwrite_byte 0x59 0 $((160 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+_cp_reflink $SCRATCH_MNT/file $SCRATCH_MNT/file.reflink
+
+_pwrite_byte 0x58 0 $((2 * blksz)) $SCRATCH_MNT/file >> $seqres.full
+_pwrite_byte 0x58 0 $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+# Avoid creation of large folios on newer kernels by cycling the mount and
+# immediately writing to the page cache.
+_scratch_cycle_mount
+
+# Write the same data to file.compare as we're about to do to file.  Do this
+# before slowing down writeback to avoid unnecessary delay.
+_pwrite_byte 0x57 0 $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+_pwrite_byte 0x56 $((2 * blksz)) $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+# Introduce a half-second wait to each writeback block mapping call.  This
+# gives us a chance to race speculative cow prealloc with writeback.
+wb_delay=500
+echo $wb_delay > $knob
+curval="$(cat $knob)"
+test "$curval" -eq $wb_delay || echo "expected wb_delay_ms == $wb_delay"
+
+_ftrace_record_events 'xfs_wb*iomap_invalid'
+
+# Start thread 1 + writeback above
+$XFS_IO_PROG -c "pwrite -S 0x57 0 $((2 * blksz))" \
+	-c 'bmap -celpv' -c 'bmap -elpv' \
+	-c 'fsync' $SCRATCH_MNT/file >> $seqres.full &
+sleep 1
+
+# Start a sentry to look for evidence of the XFS_ERRORTAG_REPORT logging.  If
+# we see that, we know we've forced writeback to revalidate a mapping.  The
+# test has been successful, so turn off the delay.
+sentryfile=$TEST_DIR/$seq.sentry
+tracefile=$TEST_DIR/$seq.ftrace
+wait_for_errortag() {
+	while [ -e "$sentryfile" ]; do
+		_ftrace_dump | grep iomap_invalid >> "$tracefile"
+		if grep -q iomap_invalid "$tracefile"; then
+			echo 0 > "$knob"
+			_ftrace_ignore_events
+			break;
+		fi
+		sleep 0.5
+	done
+}
+touch $sentryfile
+wait_for_errortag &
+
+# Start thread 2 to create the cowextsize reservation
+$XFS_IO_PROG -c "pwrite -S 0x56 $((2 * blksz)) $((2 * blksz))" \
+	-c 'bmap -celpv' -c 'bmap -elpv' \
+	-c 'fsync' $SCRATCH_MNT/file >> $seqres.full
+rm -f $sentryfile
+
+cat "$tracefile" >> $seqres.full
+grep -q iomap_invalid "$tracefile"
+saw_invalidation=$?
+
+# Flush everything to disk.  If the bug manifests, then after the cycle,
+# file should have stale 0x58 in block 0 because we silently dropped a write.
+_scratch_cycle_mount
+
+if ! cmp -s $SCRATCH_MNT/file $SCRATCH_MNT/file.compare; then
+	echo file and file.compare do not match
+	$XFS_IO_PROG -c 'bmap -celpv' -c 'bmap -elpv' $SCRATCH_MNT/file >> $seqres.full
+	echo file.compare
+	od -tx1 -Ad -c $SCRATCH_MNT/file.compare
+	echo file
+	od -tx1 -Ad -c $SCRATCH_MNT/file
+elif [ $saw_invalidation -ne 0 ]; then
+	# The files matched, but nothing got logged about the revalidation?
+	echo "Expected to hear about writeback iomap invalidations?"
+fi
+
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/924.out b/tests/xfs/924.out
new file mode 100644
index 0000000000..c6655da35a
--- /dev/null
+++ b/tests/xfs/924.out
@@ -0,0 +1,2 @@
+QA output created by 924
+Silence is golden

  parent reply	other threads:[~2022-11-29 21:58 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-23  5:58 [PATCH 0/9 V4] xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-23  5:58 ` [PATCH 1/9] xfs: write page faults in iomap are not buffered writes Dave Chinner
2022-11-23  5:58 ` [PATCH 2/9] xfs: punching delalloc extents on write failure is racy Dave Chinner
2022-11-23  5:58 ` [PATCH 3/9] xfs: use byte ranges for write cleanup ranges Dave Chinner
2022-11-23  5:58 ` [PATCH 4/9] xfs,iomap: move delalloc punching to iomap Dave Chinner
2022-11-23  5:58 ` [PATCH 5/9] iomap: buffered write failure should not truncate the page cache Dave Chinner
2022-11-23 13:24   ` kernel test robot
2022-11-23  5:58 ` [PATCH 6/9] xfs: xfs_bmap_punch_delalloc_range() should take a byte range Dave Chinner
2022-11-23  5:58 ` [PATCH 7/9] iomap: write iomap validity checks Dave Chinner
2022-11-23  5:58 ` [PATCH 8/9] xfs: use iomap_valid method to detect stale cached iomaps Dave Chinner
2022-11-23  5:58 ` [PATCH 9/9] xfs: drop write error injection is unfixable, remove it Dave Chinner
2022-11-28 22:34 ` [PATCH 10/9] xfs: add debug knob to slow down writeback for fun Darrick J. Wong
2022-11-28 23:30   ` Dave Chinner
2022-11-28 23:39     ` Darrick J. Wong
2022-11-29  1:21   ` [PATCH v2 " Darrick J. Wong
2022-11-29  1:34     ` Dave Chinner
2022-11-29 21:53       ` Darrick J. Wong
2022-11-29 21:57     ` Darrick J. Wong [this message]
2022-11-30 17:34       ` [RFC PATCH] xfs: regression test for writeback corruption bug Zorro Lang
2022-11-30 19:04         ` Darrick J. Wong
2022-12-01 15:21           ` Zorro Lang
2022-12-01 15:53             ` Darrick J. Wong
2022-11-28 22:34 ` [PATCH 11/9] xfs: add debug knob to slow down write for fun Darrick J. Wong
2022-11-29  1:22   ` [PATCH v2 " Darrick J. Wong
2022-11-29  1:37     ` Dave Chinner
2022-11-29 21:53       ` Darrick J. Wong
2022-11-29 21:55     ` [RFC PATCH] xfs: regression test for writes racing with reclaim writeback Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y4aAOn7CUTr9tUBN@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=wen.gang.wang@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.