* [PATCH] generic: test race between block map change and writeback
@ 2017-08-31  4:02 Eryu Guan
  2017-10-09  8:17 ` Eryu Guan
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Eryu Guan @ 2017-08-31  4:02 UTC
  To: fstests; +Cc: linux-xfs, Eryu Guan

Run delalloc writes & append writes & non-data-integrity syncs
concurrently to test the race between block map change vs writeback.

This covers an XFS bug in which data could be written to the wrong
block and delayed allocation blocks could be leaked, because the block
map was changed by the removal of speculatively allocated eofblocks
while writeback was in progress.

This test partially mimics what the lustre-racer[1] test does; the bug
was first found using that test.

[1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD

Signed-off-by: Eryu Guan <eguan@redhat.com>
---

This may not reproduce the bug on all hosts, but it does reproduce the XFS
corruption issue reliably on my various test hosts.

 tests/generic/451     | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/451.out |   2 +
 tests/generic/group   |   1 +
 3 files changed, 133 insertions(+)
 create mode 100755 tests/generic/451
 create mode 100644 tests/generic/451.out

diff --git a/tests/generic/451 b/tests/generic/451
new file mode 100755
index 000000000000..72cdd1c01de2
--- /dev/null
+++ b/tests/generic/451
@@ -0,0 +1,130 @@
+#! /bin/bash
+# FS QA Test 451
+#
+# Run delalloc writes & append writes & non-data-integrity syncs concurrently
+# to test the race between block map change vs writeback.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2017 Red Hat Inc. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+MAXFILES=200
+BLOCK_SZ=65536
+
+LOOP_CNT=12
+LOOP_TIME=5
+PROC_CNT=16
+
+stop=$tmp.stop
+
+# get a random file to work on
+getfile()
+{
+	echo $SCRATCH_MNT/$((RANDOM % MAXFILES))
+}
+
+# delalloc write a relative big file to get enough dirty pages to be written
+# back, and XFS needs big enough file to trigger speculative preallocations, so
+# freeing these eofblocks could change the extent record
+do_write()
+{
+	local blockcount=$((RANDOM % 100))
+	local filesize=$((blockcount * BLOCK_SZ))
+	$XFS_IO_PROG -ftc "pwrite -b $BLOCK_SZ 0 $filesize" `getfile` >/dev/null 2>&1
+}
+
+# append another dirty page to the file, the writeback might pick it up too if
+# the file is already under writeback
+do_append()
+{
+	echo "test string" >> `getfile`
+}
+
+# issue WB_SYNC_NONE writeback with the '-w' option of sync_range xfs_io
+# command, so that the last dirty page from append write can be picked up in
+# this writeback cycle. This is not mandatory but could help reproduce XFS
+# corruption more easily.
+do_writeback()
+{
+	$XFS_IO_PROG -c "sync_range -w 0 0" `getfile` >/dev/null 2>&1
+}
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+# do fsck after each iteration in test
+_require_scratch_nocheck
+_require_xfs_io_command "sync_range"
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# loop for $LOOP_CNT iterations, and each iteration starts $PROC_CNT processes
+# for each operation and runs for $LOOP_TIME seconds, and check filesystem
+# consistency after each iteration
+for i in `seq 1 $LOOP_CNT`; do
+	rm -f $stop
+	for j in `seq 1 $PROC_CNT`; do
+		while [ ! -e $stop ]; do
+			do_write
+		done &
+
+		while [ ! -e $stop ]; do
+			do_append
+		done &
+
+		while [ ! -e $stop ]; do
+			do_writeback
+		done &
+	done
+	sleep $LOOP_TIME
+	touch $stop
+	wait
+
+	_scratch_unmount
+	# test exits here if fs is inconsistent
+	_check_scratch_fs
+	_scratch_mount
+done
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/451.out b/tests/generic/451.out
new file mode 100644
index 000000000000..db924411b72f
--- /dev/null
+++ b/tests/generic/451.out
@@ -0,0 +1,2 @@
+QA output created by 451
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 044ec3f355ed..b4bd66bc65a9 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -453,3 +453,4 @@
 448 auto quick rw
 449 auto quick acl enospc
 450 auto quick rw
+451 auto rw
-- 
2.13.5



* Re: [PATCH] generic: test race between block map change and writeback
  2017-08-31  4:02 [PATCH] generic: test race between block map change and writeback Eryu Guan
@ 2017-10-09  8:17 ` Eryu Guan
  2017-10-09 16:12 ` Brian Foster
  2017-10-10 12:44 ` Xiong Zhou
  2 siblings, 0 replies; 12+ messages in thread
From: Eryu Guan @ 2017-10-09  8:17 UTC
  To: fstests; +Cc: linux-xfs

On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan wrote:
> Run delalloc writes & append writes & non-data-integrity syncs
> concurrently to test the race between block map change vs writeback.
> 
> This is to cover an XFS bug that data could be written to wrong
> block and delay allocated blocks are leaked because the block map
> was changed due to the removal of speculative allocated eofblocks
> when writeback is in progress.
> 
> And this test partially mimics what lustre-racer[1] test does, using
> which this bug was first found.
> 
> [1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> 
> Signed-off-by: Eryu Guan <eguan@redhat.com>

Ping on this test.

Eryu



* Re: [PATCH] generic: test race between block map change and writeback
  2017-08-31  4:02 [PATCH] generic: test race between block map change and writeback Eryu Guan
  2017-10-09  8:17 ` Eryu Guan
@ 2017-10-09 16:12 ` Brian Foster
  2017-10-10  4:36   ` Eryu Guan
  2017-10-10 12:44 ` Xiong Zhou
  2 siblings, 1 reply; 12+ messages in thread
From: Brian Foster @ 2017-10-09 16:12 UTC
  To: Eryu Guan; +Cc: fstests, linux-xfs

On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan wrote:
> Run delalloc writes & append writes & non-data-integrity syncs
> concurrently to test the race between block map change vs writeback.
> 
> This is to cover an XFS bug that data could be written to wrong
> block and delay allocated blocks are leaked because the block map
> was changed due to the removal of speculative allocated eofblocks
> when writeback is in progress.
> 
> And this test partially mimics what lustre-racer[1] test does, using
> which this bug was first found.
> 
> [1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> 
> Signed-off-by: Eryu Guan <eguan@redhat.com>
> ---
> 
> This may not reproduce the bug on all hosts, but it does reproduce the XFS
> corruption issue reliably on my different test hosts.
> 

Was this problem fixed already or are we still waiting on a fix?

>  tests/generic/451     | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/451.out |   2 +
>  tests/generic/group   |   1 +
>  3 files changed, 133 insertions(+)
>  create mode 100755 tests/generic/451
>  create mode 100644 tests/generic/451.out
> 
> diff --git a/tests/generic/451 b/tests/generic/451
> new file mode 100755
> index 000000000000..72cdd1c01de2
> --- /dev/null
> +++ b/tests/generic/451
> @@ -0,0 +1,130 @@
> +#! /bin/bash
> +# FS QA Test 451
> +#
> +# Run delalloc writes & append writes & non-data-integrity syncs concurrently
> +# to test the race between block map change vs writeback.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2017 Red Hat Inc. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +MAXFILES=200
> +BLOCK_SZ=65536
> +
> +LOOP_CNT=12

If I skip the failure detection below, the test runs for 100s on my vm.
Otherwise it fails consistently within ~45s (worst case in 5 or 6
tries). Do you observe differently? If not, I'm wondering if we could
speed up the common case and reduce the number of iterations.

> +LOOP_TIME=5
> +PROC_CNT=16
> +
> +stop=$tmp.stop
> +
> +# get a random file to work on
> +getfile()
> +{
> +	echo $SCRATCH_MNT/$((RANDOM % MAXFILES))
> +}
> +
> +# delalloc write a relative big file to get enough dirty pages to be written
> +# back, and XFS needs big enough file to trigger speculative preallocations, so
> +# freeing these eofblocks could change the extent record
> +do_write()
> +{
> +	local blockcount=$((RANDOM % 100))
> +	local filesize=$((blockcount * BLOCK_SZ))
> +	$XFS_IO_PROG -ftc "pwrite -b $BLOCK_SZ 0 $filesize" `getfile` >/dev/null 2>&1

Long line here. Otherwise the rest of the test looks good.

Brian

> +}
> +
> +# append another dirty page to the file, the writeback might pick it up too if
> +# the file is already under writeback
> +do_append()
> +{
> +	echo "test string" >> `getfile`
> +}
> +
> +# issue WB_SYNC_NONE writeback with the '-w' option of sync_range xfs_io
> +# command, so that the last dirty page from append write can be picked up in
> +# this writeback cycle. This is not mandatory but could help reproduce XFS
> +# corruption more easily.
> +do_writeback()
> +{
> +	$XFS_IO_PROG -c "sync_range -w 0 0" `getfile` >/dev/null 2>&1
> +}
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +# do fsck after each iteration in test
> +_require_scratch_nocheck
> +_require_xfs_io_command "sync_range"
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +
> +# loop for $LOOP_CNT iterations, and each iteration starts $PROC_CNT processes
> +# for each operation and runs for $LOOP_TIME seconds, and check filesystem
> +# consistency after each iteration
> +for i in `seq 1 $LOOP_CNT`; do
> +	rm -f $stop
> +	for j in `seq 1 $PROC_CNT`; do
> +		while [ ! -e $stop ]; do
> +			do_write
> +		done &
> +
> +		while [ ! -e $stop ]; do
> +			do_append
> +		done &
> +
> +		while [ ! -e $stop ]; do
> +			do_writeback
> +		done &
> +	done
> +	sleep $LOOP_TIME
> +	touch $stop
> +	wait
> +
> +	_scratch_unmount
> +	# test exits here if fs is inconsistent
> +	_check_scratch_fs
> +	_scratch_mount
> +done
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/451.out b/tests/generic/451.out
> new file mode 100644
> index 000000000000..db924411b72f
> --- /dev/null
> +++ b/tests/generic/451.out
> @@ -0,0 +1,2 @@
> +QA output created by 451
> +Silence is golden
> diff --git a/tests/generic/group b/tests/generic/group
> index 044ec3f355ed..b4bd66bc65a9 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -453,3 +453,4 @@
>  448 auto quick rw
>  449 auto quick acl enospc
>  450 auto quick rw
> +451 auto rw
> -- 
> 2.13.5
> 


* Re: [PATCH] generic: test race between block map change and writeback
  2017-10-09 16:12 ` Brian Foster
@ 2017-10-10  4:36   ` Eryu Guan
  2017-10-10  5:24     ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Eryu Guan @ 2017-10-10  4:36 UTC
  To: Brian Foster; +Cc: fstests, linux-xfs

On Mon, Oct 09, 2017 at 12:12:55PM -0400, Brian Foster wrote:
> On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan wrote:
> > Run delalloc writes & append writes & non-data-integrity syncs
> > concurrently to test the race between block map change vs writeback.
> > 
> > This is to cover an XFS bug that data could be written to wrong
> > block and delay allocated blocks are leaked because the block map
> > was changed due to the removal of speculative allocated eofblocks
> > when writeback is in progress.
> > 
> > And this test partially mimics what lustre-racer[1] test does, using
> > which this bug was first found.
> > 
> > [1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> > 
> > Signed-off-by: Eryu Guan <eguan@redhat.com>
> > ---
> > 
> > This may not reproduce the bug on all hosts, but it does reproduce the XFS
> > corruption issue reliably on my different test hosts.
> > 
> 
> Was this problem fixed already or are we still waiting on a fix?

It's still an unfixed problem. Dave provided a test patch (which did fix
the bug for me) then Christoph suggested a fix based on seqlock, and
things stalled there. (I'm happy to pick up the work, but I'm not that
familiar with all the allocation paths that could change the extent map,
so I may need some guidance and time to play with it.)

> 
> >  tests/generic/451     | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/generic/451.out |   2 +
> >  tests/generic/group   |   1 +
> >  3 files changed, 133 insertions(+)
> >  create mode 100755 tests/generic/451
> >  create mode 100644 tests/generic/451.out
> > 
> > diff --git a/tests/generic/451 b/tests/generic/451
> > new file mode 100755
> > index 000000000000..72cdd1c01de2
> > --- /dev/null
> > +++ b/tests/generic/451
> > @@ -0,0 +1,130 @@
> > +#! /bin/bash
> > +# FS QA Test 451
> > +#
> > +# Run delalloc writes & append writes & non-data-integrity syncs concurrently
> > +# to test the race between block map change vs writeback.
> > +#
> > +#-----------------------------------------------------------------------
> > +# Copyright (c) 2017 Red Hat Inc. All Rights Reserved.
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#-----------------------------------------------------------------------
> > +#
> > +
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1	# failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	rm -f $tmp.*
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +
> > +MAXFILES=200
> > +BLOCK_SZ=65536
> > +
> > +LOOP_CNT=12
> 
> If I skip the failure detection below, the test runs for 100s on my vm.
> Otherwise it fails consistently within ~45s (worst case in 5 or 6
> tries). Do you observe differently? If not, I'm wondering if we could
> speed up the common case and reduce the number of iterations.

On my test vm, around 60% of runs failed for me, and the run time of
failed runs varied from 6s to 65s. A successful run needs around 70s. I
think I can reduce LOOP_CNT to 10; with that, more than 50% of runs
failed for me and a successful run needs around 60s on my test vm.
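
The run times quoted above can be cross-checked with a little arithmetic.
A minimal sketch, assuming the only fixed cost per iteration is the
`sleep $LOOP_TIME` in the test loop; the helper name `min_runtime` is made
up for illustration and is not part of fstests:

```shell
# Lower bound on the run time of a passing run: every iteration sleeps
# for LOOP_TIME seconds before the stop file is created.
min_runtime()
{
	# $1 = LOOP_CNT (iterations), $2 = LOOP_TIME (seconds per iteration)
	echo $(($1 * $2))
}

min_runtime 12 5	# values as posted: prints 60
min_runtime 10 5	# LOOP_CNT reduced to 10: prints 50
```

Compared with the roughly 70s and 60s reported for passing runs, this
suggests about 10s of overhead in both cases, presumably spent waiting for
the workers to exit and on the unmount/fsck/mount cycle.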

> 
> > +LOOP_TIME=5
> > +PROC_CNT=16
> > +
> > +stop=$tmp.stop
> > +
> > +# get a random file to work on
> > +getfile()
> > +{
> > +	echo $SCRATCH_MNT/$((RANDOM % MAXFILES))
> > +}
> > +
> > +# delalloc write a relative big file to get enough dirty pages to be written
> > +# back, and XFS needs big enough file to trigger speculative preallocations, so
> > +# freeing these eofblocks could change the extent record
> > +do_write()
> > +{
> > +	local blockcount=$((RANDOM % 100))
> > +	local filesize=$((blockcount * BLOCK_SZ))
> > +	$XFS_IO_PROG -ftc "pwrite -b $BLOCK_SZ 0 $filesize" `getfile` >/dev/null 2>&1
> 
> Long line here. Otherwise the rest of the test looks good.

Sure, will fix that. Thanks a lot for the review!
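
One possible way to wrap the long line is a backslash continuation before
the redirections. This is only a sketch, not necessarily what a later
revision of the test uses; the default values for XFS_IO_PROG and
SCRATCH_MNT are assumptions added so the snippet can be sourced outside
fstests:

```shell
# Stand-in defaults (assumptions); fstests normally provides these.
XFS_IO_PROG=${XFS_IO_PROG:-xfs_io}
SCRATCH_MNT=${SCRATCH_MNT:-/mnt/scratch}
MAXFILES=200
BLOCK_SZ=65536

# get a random file to work on
getfile()
{
	echo $SCRATCH_MNT/$((RANDOM % MAXFILES))
}

do_write()
{
	local blockcount=$((RANDOM % 100))
	local filesize=$((blockcount * BLOCK_SZ))
	# same command as in the patch, split to stay within 80 columns
	$XFS_IO_PROG -ftc "pwrite -b $BLOCK_SZ 0 $filesize" `getfile` \
		>/dev/null 2>&1
}
```

The command and its arguments are unchanged; only the redirections move
to a continuation line.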

Eryu
> 
> Brian
> 
> > +}
> > +
> > +# append another dirty page to the file, the writeback might pick it up too if
> > +# the file is already under writeback
> > +do_append()
> > +{
> > +	echo "test string" >> `getfile`
> > +}
> > +
> > +# issue WB_SYNC_NONE writeback with the '-w' option of sync_range xfs_io
> > +# command, so that the last dirty page from append write can be picked up in
> > +# this writeback cycle. This is not mandatory but could help reproduce XFS
> > +# corruption more easily.
> > +do_writeback()
> > +{
> > +	$XFS_IO_PROG -c "sync_range -w 0 0" `getfile` >/dev/null 2>&1
> > +}
> > +
> > +# remove previous $seqres.full before test
> > +rm -f $seqres.full
> > +
> > +# real QA test starts here
> > +_supported_fs generic
> > +_supported_os Linux
> > +# do fsck after each iteration in test
> > +_require_scratch_nocheck
> > +_require_xfs_io_command "sync_range"
> > +
> > +_scratch_mkfs >>$seqres.full 2>&1
> > +_scratch_mount
> > +
> > +# loop for $LOOP_CNT iterations, and each iteration starts $PROC_CNT processes
> > +# for each operation and runs for $LOOP_TIME seconds, and check filesystem
> > +# consistency after each iteration
> > +for i in `seq 1 $LOOP_CNT`; do
> > +	rm -f $stop
> > +	for j in `seq 1 $PROC_CNT`; do
> > +		while [ ! -e $stop ]; do
> > +			do_write
> > +		done &
> > +
> > +		while [ ! -e $stop ]; do
> > +			do_append
> > +		done &
> > +
> > +		while [ ! -e $stop ]; do
> > +			do_writeback
> > +		done &
> > +	done
> > +	sleep $LOOP_TIME
> > +	touch $stop
> > +	wait
> > +
> > +	_scratch_unmount
> > +	# test exits here if fs is inconsistent
> > +	_check_scratch_fs
> > +	_scratch_mount
> > +done
> > +
> > +echo "Silence is golden"
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/generic/451.out b/tests/generic/451.out
> > new file mode 100644
> > index 000000000000..db924411b72f
> > --- /dev/null
> > +++ b/tests/generic/451.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 451
> > +Silence is golden
> > diff --git a/tests/generic/group b/tests/generic/group
> > index 044ec3f355ed..b4bd66bc65a9 100644
> > --- a/tests/generic/group
> > +++ b/tests/generic/group
> > @@ -453,3 +453,4 @@
> >  448 auto quick rw
> >  449 auto quick acl enospc
> >  450 auto quick rw
> > +451 auto rw
> > -- 
> > 2.13.5
> > 


* Re: [PATCH] generic: test race between block map change and writeback
  2017-10-10  4:36   ` Eryu Guan
@ 2017-10-10  5:24     ` Dave Chinner
  2017-10-10 10:56       ` Brian Foster
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2017-10-10  5:24 UTC
  To: Eryu Guan; +Cc: Brian Foster, fstests, linux-xfs

On Tue, Oct 10, 2017 at 12:36:49PM +0800, Eryu Guan wrote:
> On Mon, Oct 09, 2017 at 12:12:55PM -0400, Brian Foster wrote:
> > On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan wrote:
> > > Run delalloc writes & append writes & non-data-integrity syncs
> > > concurrently to test the race between block map change vs writeback.
> > > 
> > > This is to cover an XFS bug that data could be written to wrong
> > > block and delay allocated blocks are leaked because the block map
> > > was changed due to the removal of speculative allocated eofblocks
> > > when writeback is in progress.
> > > 
> > > And this test partially mimics what lustre-racer[1] test does, using
> > > which this bug was first found.
> > > 
> > > [1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> > > 
> > > Signed-off-by: Eryu Guan <eguan@redhat.com>
> > > ---
> > > 
> > > This may not reproduce the bug on all hosts, but it does reproduce the XFS
> > > corruption issue reliably on my different test hosts.
> > > 
> > 
> > Was this problem fixed already or are we still waiting on a fix?
> 
> It's still an unfixed problem. Dave provided a test patch (which did fix
> the bug for me)

The test patch I provided broke the COW writeback path, primarily
because it's a separate mapping path and the change I made doesn't
work at all well with it....

> then Christoph suggested a fix based on seqlock, and
> things stalled there.

I had a look at doing that and got stalled on the fact that, again,
the COW writeback is completely separate to the existing block
mapping during writeback path and so applying a seqlock algorithm is
pretty difficult.

Basically, to fix the problem, we first need to merge the COW and
delalloc paths in the writepage code and then we'll have a sane base
on which to apply a proper fix...

(we need to do this to get rid of the bufferhead dependency, anyway)

> (I'm happy to pick up the work, but I'm not that
> familiar with all the allocation paths that could change the extent map,
> so I may need some guidance and time to play with it.)

There's some black magic in amongst it all. I'll spend some time on
it again over the next week and see what I come up with...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] generic: test race between block map change and writeback
  2017-10-10  5:24     ` Dave Chinner
@ 2017-10-10 10:56       ` Brian Foster
  2017-10-11  5:30         ` Dave Chinner
  2017-10-11 10:33         ` Eryu Guan
  0 siblings, 2 replies; 12+ messages in thread
From: Brian Foster @ 2017-10-10 10:56 UTC
  To: Dave Chinner; +Cc: Eryu Guan, fstests, linux-xfs

On Tue, Oct 10, 2017 at 04:24:59PM +1100, Dave Chinner wrote:
> On Tue, Oct 10, 2017 at 12:36:49PM +0800, Eryu Guan wrote:
> > On Mon, Oct 09, 2017 at 12:12:55PM -0400, Brian Foster wrote:
> > > On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan wrote:
> > > > Run delalloc writes & append writes & non-data-integrity syncs
> > > > concurrently to test the race between block map change vs writeback.
> > > > 
> > > > This is to cover an XFS bug that data could be written to wrong
> > > > block and delay allocated blocks are leaked because the block map
> > > > was changed due to the removal of speculative allocated eofblocks
> > > > when writeback is in progress.
> > > > 
> > > > And this test partially mimics what lustre-racer[1] test does, using
> > > > which this bug was first found.
> > > > 
> > > > [1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> > > > 
> > > > Signed-off-by: Eryu Guan <eguan@redhat.com>
> > > > ---
> > > > 
> > > > This may not reproduce the bug on all hosts, but it does reproduce the XFS
> > > > corruption issue reliably on my different test hosts.
> > > > 
> > > 
> > > Was this problem fixed already or are we still waiting on a fix?
> > 
> > It's still an unfixed problem. Dave provided a test patch (which did fix
> > the bug for me)
> 
> The test patch I provided broke the COW writeback path, primarily
> because it's a separate mapping path and the change I made doesn't
> work at all well with it....
> 
> > then Christoph suggested a fix based on seqlock, and
> > things stalled there.
> 
> I had a look at doing that and got stalled on the fact that, again,
> the COW writeback is completely separate to the existing block
> mapping during writeback path and so applying a seqlock algorithm is
> pretty difficult.
> 
> Basically, to fix the problem, we first need to merge the COW and
> delalloc paths in the writepage code and then we'll have a sane base
> on which to apply a proper fix...
> 
> (we need to do this to get rid of the bufferhead dependency, anyway)
> 
> > (I'm happy to pick up the work, but I'm not that
> > familiar with all the allocation paths that could change the extent map,
> > so I may need some guidance and time to play with it.)
> 
> There's some black magic in amongst it all. I'll spend some time on
> it again over the next week and see what I come up with...
> 

Hmm, is this[1] the test patch/thread associated with this test case? If
so, I'm still wondering why we can't just trim the mapping to eof like
the previous code had effectively done for so long..? Eryu, does the
appended diff address this test case?
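
As a rough model of what the appended diff does: xfs_trim_extent_eof()
clamps the mapping returned by xfs_map_blocks() to the range [0, EOF) in
filesystem blocks, with the in-core file size rounded up to a whole
block. A minimal shell sketch of that arithmetic follows; the helper name
`trim_to_eof` and its argument layout are hypothetical and exist only for
illustration, they are not part of XFS or fstests:

```shell
# Shell model of trimming a mapping to EOF (illustration only, not
# kernel code). Extent values are in filesystem blocks, sizes in bytes.
trim_to_eof()
{
	# $1 = extent start offset, $2 = extent block count,
	# $3 = in-core file size in bytes, $4 = filesystem block size
	startoff=$1
	blockcount=$2
	# round the file size up to whole blocks, like XFS_B_TO_FSB()
	eof_fsb=$((($3 + $4 - 1) / $4))

	if [ "$startoff" -ge "$eof_fsb" ]; then
		# extent lies entirely beyond EOF, nothing left to map
		blockcount=0
	elif [ $((startoff + blockcount)) -gt "$eof_fsb" ]; then
		# extent crosses EOF, drop the post-EOF tail
		blockcount=$((eof_fsb - startoff))
	fi
	echo "$startoff $blockcount"
}

# 16-block extent at offset 0, 10000-byte file, 4k blocks
trim_to_eof 0 16 10000 4096	# prints "0 3"
```

With a trimmed mapping, writeback never holds a cached extent that
covers speculative post-EOF blocks, so freeing those blocks cannot
invalidate the part of the mapping that writeback is still using.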

Note that I'm not saying that there isn't also a similar mapping
validation issue associated with user interaction (as opposed to
eofblocks), but if so, I am skeptical that this test reproduces it. IOW,
I think the latter should be independently verified (I don't see any
follow up to that in the previous thread) and may very well warrant a
unique test.

Brian

[1] https://marc.info/?l=linux-xfs&m=150407819630651&w=2

--- 8< ---

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 044a363..dd3fb7b 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3852,6 +3852,17 @@ xfs_trim_extent(
 	}
 }
 
+/* trim extent to within eof */
+void
+xfs_trim_extent_eof(
+	struct xfs_bmbt_irec	*irec,
+	struct xfs_inode	*ip)
+
+{
+	xfs_trim_extent(irec, 0, XFS_B_TO_FSB(ip->i_mount,
+					      i_size_read(VFS_I(ip))));
+}
+
 /*
  * Trim the returned map to the required bounds
  */
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 851982a..502e0d8 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -208,6 +208,7 @@ void	xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
 
 void	xfs_trim_extent(struct xfs_bmbt_irec *irec, xfs_fileoff_t bno,
 		xfs_filblks_t len);
+void	xfs_trim_extent_eof(struct xfs_bmbt_irec *, struct xfs_inode *);
 int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
 void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
 void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1dbc5cf..3ab6d9d 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -423,7 +423,7 @@ xfs_map_blocks(
 				imap);
 		if (!error)
 			trace_xfs_map_blocks_alloc(ip, offset, count, type, imap);
-		return error;
+		goto out_trim;
 	}
 
 #ifdef DEBUG
@@ -435,7 +435,9 @@ xfs_map_blocks(
 #endif
 	if (nimaps)
 		trace_xfs_map_blocks_found(ip, offset, count, type, imap);
-	return 0;
+out_trim:
+	xfs_trim_extent_eof(imap, ip);
+	return error;
 }
 
 STATIC bool
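
As a rough illustration of what trimming a mapping to EOF means, here is a minimal userspace sketch in C. It is not the kernel code: `struct extent`, `bytes_to_blocks()` and `trim_extent_eof()` are hypothetical stand-ins for `struct xfs_bmbt_irec`, `XFS_B_TO_FSB()` and the helper proposed in the diff above, and the block size is passed in explicitly rather than taken from the mount.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_bmbt_irec. */
struct extent {
	uint64_t startoff;	/* first file block covered by the mapping */
	uint64_t blockcount;	/* number of blocks in the mapping */
};

/* Round a byte count up to whole filesystem blocks. */
uint64_t bytes_to_blocks(uint64_t bytes, uint64_t blocksize)
{
	assert(blocksize != 0);
	return (bytes + blocksize - 1) / blocksize;
}

/* Clamp the mapping so it never extends past the block containing EOF. */
void trim_extent_eof(struct extent *ext, uint64_t isize, uint64_t blocksize)
{
	uint64_t eof_block = bytes_to_blocks(isize, blocksize);
	uint64_t end = ext->startoff + ext->blockcount;

	if (ext->startoff >= eof_block)
		ext->blockcount = 0;	/* mapping lies entirely beyond EOF */
	else if (end > eof_block)
		ext->blockcount = eof_block - ext->startoff;
	/* otherwise the mapping is already within EOF and is left alone */
}
```

In this model, with 4096-byte blocks and a file size of 40961 bytes, a mapping covering blocks 0-63 is clamped to 11 blocks, so anything cached past EOF is simply not used.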


* Re: [PATCH] generic: test race between block map change and writeback
  2017-08-31  4:02 [PATCH] generic: test race between block map change and writeback Eryu Guan
  2017-10-09  8:17 ` Eryu Guan
  2017-10-09 16:12 ` Brian Foster
@ 2017-10-10 12:44 ` Xiong Zhou
  2 siblings, 0 replies; 12+ messages in thread
From: Xiong Zhou @ 2017-10-10 12:44 UTC (permalink / raw)
  To: Eryu Guan; +Cc: fstests, linux-xfs

On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan wrote:
> Run delalloc writes & append writes & non-data-integrity syncs
> concurrently to test the race between block map change vs writeback.
> 
> This is to cover an XFS bug that data could be written to wrong
> block and delay allocated blocks are leaked because the block map
> was changed due to the removal of speculative allocated eofblocks
> when writeback is in progress.
> 
> And this test partially mimics what lustre-racer[1] test does, using
> which this bug was first found.
> 
> [1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> 
> Signed-off-by: Eryu Guan <eguan@redhat.com>
> ---
> 
> This may not reproduce the bug on all hosts, but it does reproduce the XFS
> corruption issue reliably on my different test hosts.
> 
>  tests/generic/451     | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/451.out |   2 +
>  tests/generic/group   |   1 +
>  3 files changed, 133 insertions(+)
>  create mode 100755 tests/generic/451
>  create mode 100644 tests/generic/451.out
> 
> diff --git a/tests/generic/451 b/tests/generic/451
> new file mode 100755
> index 000000000000..72cdd1c01de2
> --- /dev/null
> +++ b/tests/generic/451
> @@ -0,0 +1,130 @@
> +#! /bin/bash
> +# FS QA Test 451
> +#
> +# Run delalloc writes & append writes & non-data-integrity syncs concurrently
> +# to test the race between block map change vs writeback.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2017 Red Hat Inc. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +MAXFILES=200
> +BLOCK_SZ=65536
> +
> +LOOP_CNT=12
> +LOOP_TIME=5
> +PROC_CNT=16
> +
> +stop=$tmp.stop
> +
> +# get a random file to work on
> +getfile()
> +{
> +	echo $SCRATCH_MNT/$((RANDOM % MAXFILES))
> +}
> +
> +# delalloc write a relative big file to get enough dirty pages to be written
> +# back, and XFS needs big enough file to trigger speculative preallocations, so
> +# freeing these eofblocks could change the extent record
> +do_write()
> +{
> +	local blockcount=$((RANDOM % 100))
> +	local filesize=$((blockcount * BLOCK_SZ))
> +	$XFS_IO_PROG -ftc "pwrite -b $BLOCK_SZ 0 $filesize" `getfile` >/dev/null 2>&1
> +}
> +
> +# append another dirty page to the file, the writeback might pick it up too if
> +# the file is already under writeback
> +do_append()
> +{
> +	echo "test string" >> `getfile`
> +}
> +
> +# issue WB_SYNC_NONE writeback with the '-w' option of sync_range xfs_io
> +# command, so that the last dirty page from append write can be picked up in
> +# this writeback cycle. This is not mandatory but could help reproduce XFS
> +# corruption more easily.
> +do_writeback()
> +{
> +	$XFS_IO_PROG -c "sync_range -w 0 0" `getfile` >/dev/null 2>&1
> +}

How about adding a do_read() to read some data back and check it?

Thanks,
Xiong

> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +# do fsck after each iteration in test
> +_require_scratch_nocheck
> +_require_xfs_io_command "sync_range"
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +
> +# loop for $LOOP_CNT iterations, and each iteration starts $PROC_CNT processes
> +# for each operation and runs for $LOOP_TIME seconds, and check filesystem
> +# consistency after each iteration
> +for i in `seq 1 $LOOP_CNT`; do
> +	rm -f $stop
> +	for j in `seq 1 $PROC_CNT`; do
> +		while [ ! -e $stop ]; do
> +			do_write
> +		done &
> +
> +		while [ ! -e $stop ]; do
> +			do_append
> +		done &
> +
> +		while [ ! -e $stop ]; do
> +			do_writeback
> +		done &
> +	done
> +	sleep $LOOP_TIME
> +	touch $stop
> +	wait
> +
> +	_scratch_unmount
> +	# test exits here if fs is inconsistent
> +	_check_scratch_fs
> +	_scratch_mount
> +done
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/451.out b/tests/generic/451.out
> new file mode 100644
> index 000000000000..db924411b72f
> --- /dev/null
> +++ b/tests/generic/451.out
> @@ -0,0 +1,2 @@
> +QA output created by 451
> +Silence is golden
> diff --git a/tests/generic/group b/tests/generic/group
> index 044ec3f355ed..b4bd66bc65a9 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -453,3 +453,4 @@
>  448 auto quick rw
>  449 auto quick acl enospc
>  450 auto quick rw
> +451 auto rw
> -- 
> 2.13.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] generic: test race between block map change and writeback
  2017-10-10 10:56       ` Brian Foster
@ 2017-10-11  5:30         ` Dave Chinner
  2017-10-11  9:45           ` Brian Foster
  2017-10-11 10:33         ` Eryu Guan
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2017-10-11  5:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: Eryu Guan, fstests, linux-xfs

On Tue, Oct 10, 2017 at 06:56:22AM -0400, Brian Foster wrote:
> On Tue, Oct 10, 2017 at 04:24:59PM +1100, Dave Chinner wrote:
> > On Tue, Oct 10, 2017 at 12:36:49PM +0800, Eryu Guan wrote:
> > > On Mon, Oct 09, 2017 at 12:12:55PM -0400, Brian Foster wrote:
> > > > On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan wrote:
> > > > > Run delalloc writes & append writes & non-data-integrity syncs
> > > > > concurrently to test the race between block map change vs writeback.
> > > > > 
> > > > > This is to cover an XFS bug that data could be written to wrong
> > > > > block and delay allocated blocks are leaked because the block map
> > > > > was changed due to the removal of speculative allocated eofblocks
> > > > > when writeback is in progress.
> > > > > 
> > > > > And this test partially mimics what lustre-racer[1] test does, using
> > > > > which this bug was first found.
> > > > > 
> > > > > [1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> > > > > 
> > > > > Signed-off-by: Eryu Guan <eguan@redhat.com>
> > > > > ---
> > > > > 
> > > > > This may not reproduce the bug on all hosts, but it does reproduce the XFS
> > > > > corruption issue reliably on my different test hosts.
> > > > > 
> > > > 
> > > > Was this problem fixed already or are we still waiting on a fix?
> > > 
> > > It's still an unfixed problem. Dave provided a test patch (which did fix
> > > the bug for me)
> > 
> > The test patch I provided broke the COW writeback path, primarily
> > because it's a separate mapping path and the change I made doesn't
> > work at all well with it....
> > 
> > > then Christoph suggested a fix based on seqlock, and
> > > things stalled there.
> > 
> > I had a look at doing that and got stalled on the fact that, again,
> > the COW writeback is completely separate to the existing block
> > mapping during writeback path and so applying a seqlock algorithm is
> > pretty difficult.
> > 
> > Basically, to fix the problem, we first need to merge the COW and
> > delalloc paths in the writepage code and then we'll have a sane base
> > on which to apply a proper fix...
> > 
> > (we need to do this to get rid of the bufferhead dependency, anyway)
> > 
> > > (I'm happy to pick up the work, but I'm not that
> > > familiar with all the allocation paths that could change the extent map,
> > > so I may need some guidance and time to play with it.)
> > 
> > There's some black magic in amongst it all. I'll spend some time on
> > it again over the next week and see what I come up with...
> > 
> 
> Hmm, is this[1] the test patch/thread associated with this test case? If
> so, I'm still wondering why we can't just trim the mapping to eof like
> the previous code had effectively done for so long..? Eryu, does the
> appended diff address this test case?

I'm not sure that is sufficient. To me, it addresses the symptom, not
the root problem. The cached extent can go stale at any time, so
we really need to ensure that it cannot go unnoticed in any
circumstance, not just EOF trimming....

I'm working on a patch right now that unifies the writeback mapping
mechanisms so we can apply something like a seqlock (a.k.a. a
generation number) to a cached extent, and that solves the general
problem of caching extent lookup results without inode locks held.
We do this in several places, and we've had problems in the past
that we've worked around by reducing the number of cached extents
to 1 (e.g. xfs_iomap_write_allocate()).

Hence I think it's something we really need to solve rather than
continuing to add case-by-case work arounds every time we have this
problem...
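
The generation-number idea can be sketched in userspace C as follows. This is only a single-threaded model under stated assumptions: the structures and function names are hypothetical, there are no locks or memory barriers, and it is not the patch being worked on. A real seqlock would also have to cope with a writer running concurrently with the reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an inode's extent map. */
struct extent_map {
	uint64_t generation;	/* bumped on every change to the map */
	uint64_t startoff;
	uint64_t blockcount;
};

/* A lookup result cached by writeback, tagged with the generation it saw. */
struct cached_extent {
	uint64_t generation;
	uint64_t startoff;
	uint64_t blockcount;
};

/* Snapshot the mapping; in a kernel this would run with the inode lock held. */
void map_lookup(const struct extent_map *map, struct cached_extent *cache)
{
	assert(map != NULL && cache != NULL);
	cache->generation = map->generation;
	cache->startoff = map->startoff;
	cache->blockcount = map->blockcount;
}

/* Any modification (truncate, eofblocks removal, allocation) bumps the generation. */
void map_modify(struct extent_map *map, uint64_t startoff, uint64_t blockcount)
{
	map->startoff = startoff;
	map->blockcount = blockcount;
	map->generation++;
}

/* The cached copy may be used only if the map is unchanged since the lookup. */
bool cache_is_valid(const struct extent_map *map, const struct cached_extent *cache)
{
	return cache->generation == map->generation;
}
```

The point of the design is that the check does not care why the map changed: a caller that finds `cache_is_valid()` false just repeats the lookup, which covers EOF trimming and every other modification alike.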

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] generic: test race between block map change and writeback
  2017-10-11  5:30         ` Dave Chinner
@ 2017-10-11  9:45           ` Brian Foster
  2017-10-11 10:42             ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Foster @ 2017-10-11  9:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eryu Guan, fstests, linux-xfs

On Wed, Oct 11, 2017 at 04:30:25PM +1100, Dave Chinner wrote:
> On Tue, Oct 10, 2017 at 06:56:22AM -0400, Brian Foster wrote:
> > On Tue, Oct 10, 2017 at 04:24:59PM +1100, Dave Chinner wrote:
> > > On Tue, Oct 10, 2017 at 12:36:49PM +0800, Eryu Guan wrote:
> > > > On Mon, Oct 09, 2017 at 12:12:55PM -0400, Brian Foster wrote:
> > > > > On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan wrote:
> > > > > > Run delalloc writes & append writes & non-data-integrity syncs
> > > > > > concurrently to test the race between block map change vs writeback.
> > > > > > 
> > > > > > This is to cover an XFS bug that data could be written to wrong
> > > > > > block and delay allocated blocks are leaked because the block map
> > > > > > was changed due to the removal of speculative allocated eofblocks
> > > > > > when writeback is in progress.
> > > > > > 
> > > > > > And this test partially mimics what lustre-racer[1] test does, using
> > > > > > which this bug was first found.
> > > > > > 
> > > > > > [1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> > > > > > 
> > > > > > Signed-off-by: Eryu Guan <eguan@redhat.com>
> > > > > > ---
> > > > > > 
> > > > > > This may not reproduce the bug on all hosts, but it does reproduce the XFS
> > > > > > corruption issue reliably on my different test hosts.
> > > > > > 
> > > > > 
> > > > > Was this problem fixed already or are we still waiting on a fix?
> > > > 
> > > > It's still an unfixed problem. Dave provided a test patch (which did fix
> > > > the bug for me)
> > > 
> > > The test patch I provided broke the COW writeback path, primarily
> > > because it's a separate mapping path and the change I made doesn't
> > > work at all well with it....
> > > 
> > > > then Christoph suggested a fix based on seqlock, and
> > > > things stalled there.
> > > 
> > > I had a look at doing that and got stalled on the fact that, again,
> > > the COW writeback is completely separate to the existing block
> > > mapping during writeback path and so applying a seqlock algorithm is
> > > pretty difficult.
> > > 
> > > Basically, to fix the problem, we first need to merge the COW and
> > > delalloc paths in the writepage code and then we'll have a sane base
> > > on which to apply a proper fix...
> > > 
> > > (we need to do this to get rid of the bufferhead dependency, anyway)
> > > 
> > > > (I'm happy to pick up the work, but I'm not that
> > > > familiar with all the allocation paths that could change the extent map,
> > > > so I may need some guidance and time to play with it.)
> > > 
> > > There's some black magic in amongst it all. I'll spend some time on
> > > it again over the next week and see what I come up with...
> > > 
> > 
> > Hmm, is this[1] the test patch/thread associated with this test case? If
> > so, I'm still wondering why we can't just trim the mapping to eof like
> > the previous code had effectively done for so long..? Eryu, does the
> > appended diff address this test case?
> 
> I'm not sure that is sufficient. To me, it addresses the symptom, not
> the root problem. The cached extent can go stale at any time, so
> we really need to ensure that it cannot go unnoticed in any
> circumstance, not just EOF trimming....
> 

I agree that it may not be sufficient. But the fact remains that the
only currently reproducible component of this is a regression as of the
page writeback rework that killed off the old cluster_write() bits. I've
asked a couple times about proving out the broader design flaw of the
mapping going stale leading to a tangible problem (using
instrumentation, if necessary) without any feedback so far, so I'm going
to consider that a theoretical problem until that happens. To put it
another way, I don't think this test is sufficient validation of the
root problem. ;)

The intent is not to avoid fixing the root problem, but to suggest that
we classify it as a second part of a two part fix. I think the benefits
of doing so are twofold:

1.) The aforementioned change provides a straightforward and practical
fix for a reproducible regression (i.e., the workaround is more likely
-rc material and stable fodder).

2.) Using the simple regression fix to address this particular test
nudges us to also consider a better, more thorough test for the broader
design flaw.

I think it would be a bit of a shame to fix this kind of longstanding
design flaw using a regression test that only tests for a particular
symptom, as you put it. Simple changes to speculative preallocation in
the future could potentially render it (silently) ineffective.

> I'm working on a patch right now that unifies the writeback mapping
> mechanisms so we can apply something like a seqlock (a.k.a a
> generation number) to a cached extent, and that solves the general
> problem of caching extent lookup results without inode locks held.
> We do this in several places, and we've had problems in the past
> that we've worked around by reducing the number of cached extents
> to 1 (e.g. xfs_iomap_write_allocate()).
> 

Sounds interesting.

> Hence I think it's something we really need to solve rather than
> continuing to add case-by-case work arounds every time we have this
> problem...
> 

The workaround doesn't elide the need for the design fix. The latter can
essentially replace the former, but a workaround first allows us to fix
the regression more quickly and with limited risk to older kernels. It
looks like this regression was introduced in v4.6, thus taking over a
year to be teased out.

If we assume that you are going to continue to work out a design fix for
the root problem of the writeback mapping becoming invalid (and perhaps
I'll take a stab at another test that more thoroughly tests that
problem), do you see any problems with the patch itself? If not, do you
object to getting it posted for review in the meantime?

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH] generic: test race between block map change and writeback
  2017-10-10 10:56       ` Brian Foster
  2017-10-11  5:30         ` Dave Chinner
@ 2017-10-11 10:33         ` Eryu Guan
  1 sibling, 0 replies; 12+ messages in thread
From: Eryu Guan @ 2017-10-11 10:33 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, fstests, linux-xfs

On Tue, Oct 10, 2017 at 06:56:22AM -0400, Brian Foster wrote:
> On Tue, Oct 10, 2017 at 04:24:59PM +1100, Dave Chinner wrote:
> > On Tue, Oct 10, 2017 at 12:36:49PM +0800, Eryu Guan wrote:
> > > On Mon, Oct 09, 2017 at 12:12:55PM -0400, Brian Foster wrote:
> > > > On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan wrote:
> > > > > Run delalloc writes & append writes & non-data-integrity syncs
> > > > > concurrently to test the race between block map change vs writeback.
> > > > > 
> > > > > This is to cover an XFS bug that data could be written to wrong
> > > > > block and delay allocated blocks are leaked because the block map
> > > > > was changed due to the removal of speculative allocated eofblocks
> > > > > when writeback is in progress.
> > > > > 
> > > > > And this test partially mimics what lustre-racer[1] test does, using
> > > > > which this bug was first found.
> > > > > 
> > > > > [1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> > > > > 
> > > > > Signed-off-by: Eryu Guan <eguan@redhat.com>
> > > > > ---
> > > > > 
> > > > > This may not reproduce the bug on all hosts, but it does reproduce the XFS
> > > > > corruption issue reliably on my different test hosts.
> > > > > 
> > > > 
> > > > Was this problem fixed already or are we still waiting on a fix?
> > > 
> > > It's still an unfixed problem. Dave provided a test patch (which did fix
> > > the bug for me)
> > 
> > The test patch I provided broke the COW writeback path, primarily
> > because it's a separate mapping path and the change I made doesn't
> > work at all well with it....
> > 
> > > then Christoph suggested a fix based on seqlock, and
> > > things stalled there.
> > 
> > I had a look at doing that and got stalled on the fact that, again,
> > the COW writeback is completely separate to the existing block
> > mapping during writeback path and so applying a seqlock algorithm is
> > pretty difficult.
> > 
> > Basically, to fix the problem, we first need to merge the COW and
> > delalloc paths in the writepage code and then we'll have a sane base
> > on which to apply a proper fix...
> > 
> > (we need to do this to get rid of the bufferhead dependency, anyway)
> > 
> > > (I'm happy to pick up the work, but I'm not that
> > > familiar with all the allocation paths that could change the extent map,
> > > so I may need some guidance and time to play with it.)
> > 
> > There's some black magic in amongst it all. I'll spend some time on
> > it again over the next week and see what I come up with...
> > 
> 
> Hmm, is this[1] the test patch/thread associated with this test case? If
> so, I'm still wondering why we can't just trim the mapping to eof like
> the previous code had effectively done for so long..? Eryu, does the
> appended diff address this test case?

Yes, the appended patch fixed my test failure; it survived 20+
iterations for me.

Thanks,
Eryu


* Re: [PATCH] generic: test race between block map change and writeback
  2017-10-11  9:45           ` Brian Foster
@ 2017-10-11 10:42             ` Dave Chinner
  2017-10-11 13:47               ` Brian Foster
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2017-10-11 10:42 UTC (permalink / raw)
  To: Brian Foster; +Cc: Eryu Guan, fstests, linux-xfs

On Wed, Oct 11, 2017 at 05:45:53AM -0400, Brian Foster wrote:
> On Wed, Oct 11, 2017 at 04:30:25PM +1100, Dave Chinner wrote:
> > On Tue, Oct 10, 2017 at 06:56:22AM -0400, Brian Foster wrote:
> > > On Tue, Oct 10, 2017 at 04:24:59PM +1100, Dave Chinner wrote:
> > > > On Tue, Oct 10, 2017 at 12:36:49PM +0800, Eryu Guan wrote:
> > > > > On Mon, Oct 09, 2017 at 12:12:55PM -0400, Brian Foster
> > > > > wrote:
> > > > > > On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan
> > > > > > wrote:
> > > > > > > Run delalloc writes & append writes &
> > > > > > > non-data-integrity syncs concurrently to test the race
> > > > > > > between block map change vs writeback.
> > > > > > > 
> > > > > > > This is to cover an XFS bug that data could be written
> > > > > > > to wrong block and delay allocated blocks are leaked
> > > > > > > because the block map was changed due to the removal
> > > > > > > of speculative allocated eofblocks when writeback is
> > > > > > > in progress.
> > > > > > > 
> > > > > > > And this test partially mimics what lustre-racer[1]
> > > > > > > test does, using which this bug was first found.
> > > > > > > 
> > > > > > > [1]
> > > > > > > https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> > > > > > > 
> > > > > > > Signed-off-by: Eryu Guan <eguan@redhat.com> ---
> > > > > > > 
> > > > > > > This may not reproduce the bug on all hosts, but it
> > > > > > > does reproduce the XFS corruption issue reliably on my
> > > > > > > different test hosts.
> > > > > > > 
> > > > > > 
> > > > > > Was this problem fixed already or are we still waiting
> > > > > > on a fix?
> > > > > 
> > > > > It's still an unfixed problem. Dave provided a test patch
> > > > > (which did fix the bug for me)
> > > > 
> > > > The test patch I provided broke the COW writeback path,
> > > > primarily because it's a separate mapping path and the
> > > > change I made doesn't work at all well with it....
> > > > 
> > > > > then Christoph suggested a fix based on seqlock, and
> > > > > things stalled there.
> > > > 
> > > > I had a look at doing that and got stalled on the fact that,
> > > > again, the COW writeback is completely separate to the
> > > > existing block mapping during writeback path and so applying
> > > > a seqlock algorithm is pretty difficult.
> > > > 
> > > > Basically, to fix the problem, we first need to merge the
> > > > COW and delalloc paths in the writepage code and then we'll
> > > > have a sane base on which to apply a proper fix...
> > > > 
> > > > (we need to do this to get rid of the bufferhead dependency,
> > > > anyway)
> > > > 
> > > > > (I'm happy to pick up the work, but I'm not that familiar
> > > > > with all the allocation paths that could change the extent
> > > > > map, so I may need some guidance and time to play with
> > > > > it.)
> > > > 
> > > > There's some black magic in amongst it all. I'll spend some
> > > > time on it again over the next week and see what I come up
> > > > with...
> > > > 
> > > 
> > > Hmm, is this[1] the test patch/thread associated with this
> > > test case? If so, I'm still wondering why we can't just trim
> > > the mapping to eof like the previous code had effectively done
> > > for so long..? Eryu, does the appended diff address this test
> > > case?
> > 
> > I'm not sure that is sufficient. To me, it addresses the symptom,
> > not the root problem. The cached extent can go stale at any
> > time, so we really need to ensure that it cannot go unnoticed in
> > any circumstance, not just EOF trimming....
> > 
> 
> I agree that it may not be sufficient. But the fact remains that
> the only currently reproducible component of this is a regression
> as of the page writeback rework that killed off the old
> cluster_write() bits. I've asked a couple times about proving out
> the broader design flaw of the mapping going stale leading to a
> tangible problem (using instrumentation, if necessary) without any
> feedback so far, so I'm going to consider that a theoretical
> problem until that happens.

It's most definitely not theoretical - I can show you the scars if
you want.  We know it's a real problem and have for years, so I see
no need to "prove" anything here. The recent regression was
introduced because we broke one of the badly documented bandaids
we did years ago to solve a specific xfstests failure.

Keep in mind that these bandaids were done back when nobody had the
knowledge to realise that there was a general problem.  SGI had bled
away all of its original XFS expertise and most of us working on it
only had a couple of years' experience. Nobody really understood the
big picture about any of the complex XFS code.

Hence the result was that the stupid moron who kept tripping over
the problems only knew just enough to work around the problems. He
didn't have the knowledge base needed to recognise there was a
common underlying cause to many of the problems that were occurring
in algorithms inherited from the Irix code base. We were struggling
just to get tests to pass without data corruption or filesystem
shutdowns being reported.

e.g. the xfs_map_buffer -> xfs_iomap_write_allocate map coherency
problem that concurrent fsstress tests in xfstests kept tripping
over got "fixed" like this:

commit e4143a1cf5973e3443c0650fc4c35292d3b7baa8
Author: David Chinner <dgc@sgi.com>
Date:   Fri Nov 23 16:29:11 2007 +1100

    [XFS] Fix transaction overrun during writeback.
    
    Prevent transaction overrun in xfs_iomap_write_allocate() if we race with
    a truncate that overlaps the delalloc range we were planning to allocate.
    
    If we race, we may allocate into a hole and that requires block
    allocation. At this point in time we don't have a reservation for block
    allocation (apart from metadata blocks) and so allocating into a hole
    rather than a delalloc region results in overflowing the transaction block
    reservation.
    
    Fix it by only allowing a single extent to be allocated at a time.
    
    SGI-PV: 972757
    SGI-Modid: xfs-linux-melb:xfs-kern:30005a
    
    Signed-off-by: David Chinner <dgc@sgi.com>
    Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>

IOWs, if we passed two maps from xfs_bmapi_read() to
xfs_iomap_write_allocate() then the second map might be stale by
the time we used it. The fix didn't solve the cached map problem -
it just mitigated it to the point where it didn't cause corruption
or shutdowns.
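
To make the "second map might be stale" case concrete, here is a small userspace model in C. Everything in it is an assumption made for illustration: `struct file_map`, the `race` callback and both `allocate_*` functions are hypothetical and only mimic the shape of the problem, not xfs_iomap_write_allocate() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTENTS 4

/* Hypothetical model of a file's delalloc extents; not an XFS structure. */
struct file_map {
	uint64_t startoff[MAX_EXTENTS];
	uint64_t blockcount[MAX_EXTENTS];
	size_t   count;
};

/* Model of a racing truncate: drop every extent after the first `keep`. */
void truncate_map(struct file_map *map, size_t keep)
{
	if (keep < map->count)
		map->count = keep;
}

/* Convenience race callback: truncate the file down to its first extent. */
void truncate_to_one(struct file_map *map)
{
	truncate_map(map, 1);
}

/* Batched: all lookups are taken up front, so later entries can go stale. */
uint64_t allocate_batched(struct file_map *map, void (*race)(struct file_map *))
{
	struct file_map snap = *map;	/* cached copy, never revalidated */
	uint64_t done = 0;
	size_t i;

	assert(snap.count <= MAX_EXTENTS);
	for (i = 0; i < snap.count; i++) {
		done += snap.blockcount[i];	/* may act on an extent that is gone */
		if (race)
			race(map);
	}
	return done;
}

/* One at a time: the live map is consulted again before every step. */
uint64_t allocate_one_at_a_time(struct file_map *map, void (*race)(struct file_map *))
{
	uint64_t done = 0;
	size_t i;

	assert(map->count <= MAX_EXTENTS);
	for (i = 0; i < map->count; i++) {	/* map->count is re-read each pass */
		done += map->blockcount[i];
		if (race)
			race(map);
	}
	return done;
}
```

With two 8-block extents and a truncate racing in after the first step, the batched variant still processes 16 blocks (the second 8 from a stale entry), while the one-at-a-time variant stops at 8. That narrows the window, as the commit above did, but the single map it does use can still go stale before it is consumed.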

And so here we are, 10 years later, dealing with the same "cached
map without locks held is stale" problems in the writeback code....

And, FWIW, it looks to me like the new COW writeback code has a
bunch of interesting coherency issues that have been worked around
because there isn't a general solution for ensuring cached maps are
valid. Yeah, I tripped over XFS_BMAPI_DELALLOC for the first
time today and could not understand what it was there for from the
code....

> I think it would be a bit of a shame to fix this kind of longstanding
> design flaw using a regression test that only tests for a particular
> symptom, as you put it. Simple changes to speculative preallocation in
> the future could potentially render it (silently) ineffective.

As I've just mentioned, there's a bunch of existing xfstests that
trip over the stale cached extent problem I describe above. That's
how we found them and patched them in the first place.

> The workaround doesn't elide the need for the design fix. The latter can
> essentially replace the former, but a workaround first allows us to fix
> the regression more quickly and with limited risk to older kernels. It
> looks like this regression was introduced in v4.6, thus taking over a
> year to be teased out.

I guess the difference here is that I'm just not interested in
trying to work around problems like this anymore. We need to
understand and fix them properly to ensure we kill them dead for
good and they won't rise from the dead ten years later and bite us
again. Then we can decide if a targeted workaround is appropriate
as a first step for backports....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] generic: test race between block map change and writeback
  2017-10-11 10:42             ` Dave Chinner
@ 2017-10-11 13:47               ` Brian Foster
  0 siblings, 0 replies; 12+ messages in thread
From: Brian Foster @ 2017-10-11 13:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eryu Guan, fstests, linux-xfs

On Wed, Oct 11, 2017 at 09:42:34PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 05:45:53AM -0400, Brian Foster wrote:
> > On Wed, Oct 11, 2017 at 04:30:25PM +1100, Dave Chinner wrote:
> > > On Tue, Oct 10, 2017 at 06:56:22AM -0400, Brian Foster wrote:
> > > > On Tue, Oct 10, 2017 at 04:24:59PM +1100, Dave Chinner wrote:
> > > > > On Tue, Oct 10, 2017 at 12:36:49PM +0800, Eryu Guan wrote:
> > > > > > On Mon, Oct 09, 2017 at 12:12:55PM -0400, Brian Foster
> > > > > > wrote:
> > > > > > > On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan
> > > > > > > wrote:
> > > > > > > > Run delalloc writes & append writes &
> > > > > > > > non-data-integrity syncs concurrently to test the race
> > > > > > > > between block map change vs writeback.
> > > > > > > > 
> > > > > > > > This is to cover an XFS bug that data could be written
> > > > > > > > to wrong block and delay allocated blocks are leaked
> > > > > > > > because the block map was changed due to the removal
> > > > > > > > of speculative allocated eofblocks when writeback is
> > > > > > > > in progress.
> > > > > > > > 
> > > > > > > > And this test partially mimics what lustre-racer[1]
> > > > > > > > test does, using which this bug was first found.
> > > > > > > > 
> > > > > > > > [1]
> > > > > > > > https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> > > > > > > > 
> > > > > > > > Signed-off-by: Eryu Guan <eguan@redhat.com> ---
> > > > > > > > 
> > > > > > > > This may not reproduce the bug on all hosts, but it
> > > > > > > > does reproduce the XFS corruption issue reliably on my
> > > > > > > > different test hosts.
> > > > > > > > 
> > > > > > > 
> > > > > > > Was this problem fixed already or are we still waiting
> > > > > > > on a fix?
> > > > > > 
> > > > > > It's still an unfixed problem. Dave provided a test patch
> > > > > > (which did fix the bug for me)
> > > > > 
> > > > > The test patch I provided broke the COW writeback path,
> > > > > primarily because it's a separate mapping path and the
> > > > > change I made doesn't work at all well with it....
> > > > > 
> > > > > > then Christoph suggested a fix based on seqlock, and
> > > > > > things stalled there.
> > > > > 
> > > > > I had a look at doing that and got stalled on the fact that,
> > > > > again, the COW writeback is completely separate to the
> > > > > existing block mapping during writeback path and so applying
> > > > > a seqlock algorithm is pretty difficult.
> > > > > 
> > > > > Basically, to fix the problem, we first need to merge the
> > > > > COW and delalloc paths in the writepage code and then we'll
> > > > > have a sane base on which to apply a proper fix...
> > > > > 
> > > > > (we need to do this to get rid of the bufferhead dependency,
> > > > > anyway)
> > > > > 
> > > > > > (I'm happy to pick up the work, but I'm not that familiar
> > > > > > with all the allocation paths that could change the extent
> > > > > > map, so I may need some guidance and time to play with
> > > > > > it.)
> > > > > 
> > > > > There's some black magic in amongst it all. I'll spend some
> > > > > time on it again over the next week and see what I come up
> > > > > with...
> > > > > 
> > > > 
> > > > Hmm, is this[1] the test patch/thread associated with this
> > > > test case? If so, I'm still wondering why we can't just trim
> > > > the mapping to eof like the previous code had effectively done
> > > > for so long..? Eryu, does the appended diff address this test
> > > > case?
> > > 
> > > I'm not sure that is sufficient. To me it addresses the symptom,
> > > not the root problem. The cached extent can go stale at any
> > > time, so we really need to ensure that cannot go unnoticed in
> > > any circumstance, not just EOF trimming....
> > > 
> > 
> > I agree that it may not be sufficient. But the fact remains that
> > the only currently reproducible component of this is a regression
> > as of the page writeback rework that killed off the old
> > cluster_write() bits. I've asked a couple times about proving out
> > the broader design flaw of the mapping going stale leading to a
> > tangible problem (using instrumentation, if necessary) without any
> > feedback so far, so I'm going to consider that a theoretical
> > problem until that happens.
> 
> It's most definitely not theoretical - I can show you the scars if
> you want.  We know it's a real problem and have for years, so I see
> no need to "prove" anything here. The recent regression was
> introduced because we broke one of the badly documented bandaids
> we did years ago to solve a specific xfstests failure.
> 

It's not so much about proving it in general as about doing so in the
context of writeback, and also attempting to ensure we have a test case
that sufficiently verifies a fix in that context.

> Keep in mind that these bandaids were done back when nobody had the
> knowledge to realise that there was a general problem.  SGI had bled
> away all of its original XFS expertise and most of us working on it
> only had a couple of years experience. Nobody really understood the
> big picture about any of the complex XFS code.
> 
> Hence the result was that the stupid moron who kept tripping over
> the problems only knew just enough to work around the problems. He
> didn't have the knowledge base needed to recognise there was a
> common underlying cause to many of the problems that were occurring
> in algorithms inherited from the Irix code base. We were struggling
> just to get tests to pass without data corruption or filesystem
> shutdowns being reported.
> 
> e.g. the xfs_map_buffer -> xfs_iomap_write_allocate map coherency
> problem that concurrent fsstress tests in xfstests kept tripping
> over got "fixed" like this:
> 
> commit e4143a1cf5973e3443c0650fc4c35292d3b7baa8
> Author: David Chinner <dgc@sgi.com>
> Date:   Fri Nov 23 16:29:11 2007 +1100
> 
>     [XFS] Fix transaction overrun during writeback.
>     
>     Prevent transaction overrun in xfs_iomap_write_allocate() if we race with
>     a truncate that overlaps the delalloc range we were planning to allocate.
>     
>     If we race, we may allocate into a hole and that requires block
>     allocation. At this point in time we don't have a reservation for block
>     allocation (apart from metadata blocks) and so allocating into a hole
>     rather than a delalloc region results in overflowing the transaction block
>     reservation.
>     
>     Fix it by only allowing a single extent to be allocated at a time.
>     
>     SGI-PV: 972757
>     SGI-Modid: xfs-linux-melb:xfs-kern:30005a
>     
>     Signed-off-by: David Chinner <dgc@sgi.com>
>     Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
> 
> IOWs, if we passed two maps from xfs_bmapi_read() to
> xfs_iomap_write_allocate() then the second map might be stale by
> the time we used it. The fix didn't solve the cached map problem -
> it just mitigated it to the point where it didn't cause corruption
> or shutdowns.
> 
> And so here we are, 10 years later, dealing with the same "cached
> map without locks held is stale" problems in the writeback code....
> 

Ok, thanks for the background. That suggests a broader approach is
worthwhile regardless of whether this variant is "theoretical." While I
don't have this historical context, note again that I'm not contending
we don't have (or never had) this kind of mapping issue anywhere in XFS.
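
To make the "cached map goes stale" pattern concrete, here is a minimal
userspace sketch of detecting it with a sequence counter, loosely in the
spirit of the seqlock idea mentioned earlier in the thread. It is not XFS
code and not a proposed fix: every name in it (demo_inode, cached_map,
demo_read_map, demo_trim_map, demo_map_valid) is hypothetical, and it is
single-threaded, so it ignores the locking and memory ordering a real
implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode's extent map; not an XFS structure. */
struct demo_inode {
	unsigned int map_seq;	/* bumped on every extent map change */
	uint64_t start_block;	/* first disk block of the mapped extent */
	uint64_t nr_blocks;	/* length of the mapped extent in blocks */
};

/* A mapping cached by "writeback", plus the sequence it was sampled at. */
struct cached_map {
	unsigned int seq;
	uint64_t start_block;
	uint64_t nr_blocks;
};

/* Sample the current map (assumed to happen under the inode lock). */
static struct cached_map demo_read_map(const struct demo_inode *ip)
{
	struct cached_map m = { ip->map_seq, ip->start_block, ip->nr_blocks };

	return m;
}

/* Any map change, e.g. trimming post-EOF blocks, bumps the sequence. */
static void demo_trim_map(struct demo_inode *ip, uint64_t new_len)
{
	assert(new_len <= ip->nr_blocks);
	ip->nr_blocks = new_len;
	ip->map_seq++;
}

/* The cached copy is only usable if the map has not changed since. */
static bool demo_map_valid(const struct demo_inode *ip,
			   const struct cached_map *m)
{
	return m->seq == ip->map_seq;
}
```

The point of the sketch is only that a caller holding a cached_map would
check demo_map_valid() before each use and re-read the map on failure,
which catches any map change rather than just the EOF trimming case.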

> And, FWIW, it looks to me like the new COW writeback code has a
> bunch of interesting coherency issues that have been worked around
> because there isn't a general solution for ensuring cached maps are
> valid. Yeah, I tripped over XFS_BMAPI_DELALLOC today for the first
> time and could not understand what it was there for from the
> code....
> 
> > I think it would be a bit of a shame to fix this kind of longstanding
> > design flaw using a regression test that only tests for a particular
> > symptom, as you put it. Simple changes to speculative preallocation in
> > the future could potentially render it (silently) ineffective.
> 
> As I've just mentioned, there's a bunch of existing xfstests that
> trip over the stale cached extent problem I describe above. That's
> how we found them and patched them in the first place.
> 
> > The workaround doesn't elide the need for the design fix. The latter can
> > essentially replace the former, but a workaround first allows us to fix
> > the regression more quickly and with limited risk to older kernels. It
> > looks like this regression was introduced in v4.6, thus taking over a
> > year to be teased out.
> 
> I guess the difference here is that I'm just not interested in
> trying to work around problems like this anymore. We need to
> understand and fix them properly to ensure we kill them dead for
> good and they won't rise from the dead ten years later and bite us
> again. Then we can decide if a targeted workaround is appropriate
> as a first step for backports....
> 

The following items:

1.) The severity of the bug as a file/fs corruption vector.

2.) We don't yet have a fix more than a month after the regression was
reported and diagnosed to a fairly trivial change in behavior associated
with the writeback rework.

3.) We have a regression test that the workaround addresses and, from
feedback so far, the only known problem with the workaround is that it
doesn't implement a generic enough solution to prevent the same problem
we've had historically in other places.

4.) We need to backport whatever fix we end up with back to v4.6 (before
reflink/COW was relevant) and to distro kernels.

5.) A quick glance at the commit log alone of the RFC to fix whatever is
wrong with writeback mapping (to facilitate the broader fix) shows that
it is notably more complex than the workaround.

... are, taken together, more than enough to suggest to me that a workaround is
worthwhile. The writeback problem really should have been fixed by now
and doing so doesn't mean we have to stop pursuing a broader fix for
related problems.

I think I'm going to post that patch in the meantime and as suggested
earlier, I'll try to play around with another test in addition to this
one to see if we can have something that reproduces any other writeback
problems outside of the scope of speculative preallocation.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


end of thread, other threads:[~2017-10-11 13:47 UTC | newest]

Thread overview: 12+ messages
2017-08-31  4:02 [PATCH] generic: test race between block map change and writeback Eryu Guan
2017-10-09  8:17 ` Eryu Guan
2017-10-09 16:12 ` Brian Foster
2017-10-10  4:36   ` Eryu Guan
2017-10-10  5:24     ` Dave Chinner
2017-10-10 10:56       ` Brian Foster
2017-10-11  5:30         ` Dave Chinner
2017-10-11  9:45           ` Brian Foster
2017-10-11 10:42             ` Dave Chinner
2017-10-11 13:47               ` Brian Foster
2017-10-11 10:33         ` Eryu Guan
2017-10-10 12:44 ` Xiong Zhou
