fstests.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fstests: delete btrfs/064 it makes no sense
@ 2020-09-29 15:50 Anand Jain
  2020-09-29 15:55 ` Filipe Manana
  2020-09-30  4:44 ` [PATCH v2] fstests: btrfs/064 add a comment to the test case header Anand Jain
  0 siblings, 2 replies; 10+ messages in thread
From: Anand Jain @ 2020-09-29 15:50 UTC (permalink / raw)
  To: fstests; +Cc: linux-btrfs, josef, guaneryu, johannes.thumshirn

btrfs/064 aimed to test balance and replace concurrency while the stress
test is running in the background.

However, as the balance and the replace operation are mutually
exclusive, so they can never run concurrently.

So long this test case is showed successful because the btrfs replace's
return error was captured only into seqfull.out.

 ERROR: ioctl(DEV_REPLACE_START) '/mnt/scratch': add/delete/balance/replace/resize operation in progress

Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 tests/btrfs/064     | 105 --------------------------------------------
 tests/btrfs/064.out |   2 -
 2 files changed, 107 deletions(-)
 delete mode 100755 tests/btrfs/064
 delete mode 100644 tests/btrfs/064.out

diff --git a/tests/btrfs/064 b/tests/btrfs/064
deleted file mode 100755
index 683a69f113bf..000000000000
--- a/tests/btrfs/064
+++ /dev/null
@@ -1,105 +0,0 @@
-#! /bin/bash
-# SPDX-License-Identifier: GPL-2.0
-# Copyright (C) 2014 Red Hat Inc. All rights reserved.
-#
-# FSQA Test No. btrfs/064
-#
-# Run btrfs balance and replace operations simultaneously with fsstress
-# running in background.
-#
-seq=`basename $0`
-seqres=$RESULT_DIR/$seq
-echo "QA output created by $seq"
-
-here=`pwd`
-tmp=/tmp/$$
-status=1
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-_cleanup()
-{
-	cd /
-	rm -f $tmp.*
-}
-
-# get standard environment, filters and checks
-. ./common/rc
-. ./common/filter
-
-# real QA test starts here
-_supported_fs btrfs
-# we check scratch dev after each loop
-_require_scratch_nocheck
-_require_scratch_dev_pool 5
-_require_scratch_dev_pool_equal_size
-_btrfs_get_profile_configs replace
-
-rm -f $seqres.full
-
-run_test()
-{
-	local mkfs_opts=$1
-	local saved_scratch_dev_pool=$SCRATCH_DEV_POOL
-
-	echo "Test $mkfs_opts" >>$seqres.full
-
-	# remove the last device from the SCRATCH_DEV_POOL list so
-	# _scratch_pool_mkfs won't use all devices in pool
-	local last_dev="`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $NF}'`"
-	SCRATCH_DEV_POOL=`echo $SCRATCH_DEV_POOL | sed -e "s# *$last_dev *##"`
-	_scratch_pool_mkfs $mkfs_opts >>$seqres.full 2>&1
-	# make sure we created btrfs with desired options
-	if [ $? -ne 0 ]; then
-		echo "mkfs $mkfs_opts failed"
-		SCRATCH_DEV_POOL=$saved_scratch_dev_pool
-		return
-	fi
-	_scratch_mount >>$seqres.full 2>&1
-	SCRATCH_DEV_POOL=$saved_scratch_dev_pool
-
-	args=`_scale_fsstress_args -p 20 -n 100 $FSSTRESS_AVOID -d $SCRATCH_MNT/stressdir`
-	echo "Run fsstress $args" >>$seqres.full
-	$FSSTRESS_PROG $args >/dev/null 2>&1 &
-	fsstress_pid=$!
-
-	echo -n "Start balance worker: " >>$seqres.full
-	_btrfs_stress_balance $SCRATCH_MNT >/dev/null 2>&1 &
-	balance_pid=$!
-	echo "$balance_pid" >>$seqres.full
-
-	echo -n "Start replace worker: " >>$seqres.full
-	_btrfs_stress_replace $SCRATCH_MNT >>$seqres.full 2>&1 &
-	replace_pid=$!
-	echo "$replace_pid" >>$seqres.full
-
-	echo "Wait for fsstress to exit and kill all background workers" >>$seqres.full
-	wait $fsstress_pid
-	kill $balance_pid $replace_pid
-	wait
-	# wait for the balance and replace operations to finish
-	while ps aux | grep "balance start" | grep -qv grep; do
-		sleep 1
-	done
-	while ps aux | grep "replace start" | grep -qv grep; do
-		sleep 1
-	done
-
-	echo "Scrub the filesystem" >>$seqres.full
-	$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >>$seqres.full 2>&1
-	if [ $? -ne 0 ]; then
-		echo "Scrub find errors in \"$mkfs_opts\" test" | tee -a $seqres.full
-	fi
-
-	_scratch_unmount
-	# we called _require_scratch_nocheck instead of _require_scratch
-	# do check after test for each profile config
-	_check_scratch_fs
-}
-
-echo "Silence is golden"
-for t in "${_btrfs_profile_configs[@]}"; do
-	run_test "$t"
-done
-
-status=0
-exit
diff --git a/tests/btrfs/064.out b/tests/btrfs/064.out
deleted file mode 100644
index d90765460090..000000000000
--- a/tests/btrfs/064.out
+++ /dev/null
@@ -1,2 +0,0 @@
-QA output created by 064
-Silence is golden
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] fstests: delete btrfs/064 it makes no sense
  2020-09-29 15:50 [PATCH] fstests: delete btrfs/064 it makes no sense Anand Jain
@ 2020-09-29 15:55 ` Filipe Manana
  2020-09-29 16:02   ` Josef Bacik
  2020-09-30  4:44 ` [PATCH v2] fstests: btrfs/064 add a comment to the test case header Anand Jain
  1 sibling, 1 reply; 10+ messages in thread
From: Filipe Manana @ 2020-09-29 15:55 UTC (permalink / raw)
  To: Anand Jain
  Cc: fstests, linux-btrfs, Josef Bacik, Eryu Guan, Johannes Thumshirn

On Tue, Sep 29, 2020 at 4:50 PM Anand Jain <anand.jain@oracle.com> wrote:
>
> btrfs/064 aimed to test balance and replace concurrency while the stress
> test is running in the background.
>
> However, as the balance and the replace operation are mutually
> exclusive, so they can never run concurrently.

And it's good to have a test that verifies that attempting to run them
concurrently doesn't cause any problems, like crashes, memory leaks or
some sort of filesystem corruption.

For example btrfs/187, which I wrote sometime ago, tests that running
send, balance and deduplication in parallel doesn't result in crashes,
since in the past they were allowed to run concurrently.

I see no point in removing the test, it's useful.

Thanks.

>
> So long this test case is showed successful because the btrfs replace's
> return error was captured only into seqfull.out.
>
>  ERROR: ioctl(DEV_REPLACE_START) '/mnt/scratch': add/delete/balance/replace/resize operation in progress
>
> Reported-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  tests/btrfs/064     | 105 --------------------------------------------
>  tests/btrfs/064.out |   2 -
>  2 files changed, 107 deletions(-)
>  delete mode 100755 tests/btrfs/064
>  delete mode 100644 tests/btrfs/064.out
>
> diff --git a/tests/btrfs/064 b/tests/btrfs/064
> deleted file mode 100755
> index 683a69f113bf..000000000000
> --- a/tests/btrfs/064
> +++ /dev/null
> @@ -1,105 +0,0 @@
> -#! /bin/bash
> -# SPDX-License-Identifier: GPL-2.0
> -# Copyright (C) 2014 Red Hat Inc. All rights reserved.
> -#
> -# FSQA Test No. btrfs/064
> -#
> -# Run btrfs balance and replace operations simultaneously with fsstress
> -# running in background.
> -#
> -seq=`basename $0`
> -seqres=$RESULT_DIR/$seq
> -echo "QA output created by $seq"
> -
> -here=`pwd`
> -tmp=/tmp/$$
> -status=1
> -trap "_cleanup; exit \$status" 0 1 2 3 15
> -
> -_cleanup()
> -{
> -       cd /
> -       rm -f $tmp.*
> -}
> -
> -# get standard environment, filters and checks
> -. ./common/rc
> -. ./common/filter
> -
> -# real QA test starts here
> -_supported_fs btrfs
> -# we check scratch dev after each loop
> -_require_scratch_nocheck
> -_require_scratch_dev_pool 5
> -_require_scratch_dev_pool_equal_size
> -_btrfs_get_profile_configs replace
> -
> -rm -f $seqres.full
> -
> -run_test()
> -{
> -       local mkfs_opts=$1
> -       local saved_scratch_dev_pool=$SCRATCH_DEV_POOL
> -
> -       echo "Test $mkfs_opts" >>$seqres.full
> -
> -       # remove the last device from the SCRATCH_DEV_POOL list so
> -       # _scratch_pool_mkfs won't use all devices in pool
> -       local last_dev="`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $NF}'`"
> -       SCRATCH_DEV_POOL=`echo $SCRATCH_DEV_POOL | sed -e "s# *$last_dev *##"`
> -       _scratch_pool_mkfs $mkfs_opts >>$seqres.full 2>&1
> -       # make sure we created btrfs with desired options
> -       if [ $? -ne 0 ]; then
> -               echo "mkfs $mkfs_opts failed"
> -               SCRATCH_DEV_POOL=$saved_scratch_dev_pool
> -               return
> -       fi
> -       _scratch_mount >>$seqres.full 2>&1
> -       SCRATCH_DEV_POOL=$saved_scratch_dev_pool
> -
> -       args=`_scale_fsstress_args -p 20 -n 100 $FSSTRESS_AVOID -d $SCRATCH_MNT/stressdir`
> -       echo "Run fsstress $args" >>$seqres.full
> -       $FSSTRESS_PROG $args >/dev/null 2>&1 &
> -       fsstress_pid=$!
> -
> -       echo -n "Start balance worker: " >>$seqres.full
> -       _btrfs_stress_balance $SCRATCH_MNT >/dev/null 2>&1 &
> -       balance_pid=$!
> -       echo "$balance_pid" >>$seqres.full
> -
> -       echo -n "Start replace worker: " >>$seqres.full
> -       _btrfs_stress_replace $SCRATCH_MNT >>$seqres.full 2>&1 &
> -       replace_pid=$!
> -       echo "$replace_pid" >>$seqres.full
> -
> -       echo "Wait for fsstress to exit and kill all background workers" >>$seqres.full
> -       wait $fsstress_pid
> -       kill $balance_pid $replace_pid
> -       wait
> -       # wait for the balance and replace operations to finish
> -       while ps aux | grep "balance start" | grep -qv grep; do
> -               sleep 1
> -       done
> -       while ps aux | grep "replace start" | grep -qv grep; do
> -               sleep 1
> -       done
> -
> -       echo "Scrub the filesystem" >>$seqres.full
> -       $BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >>$seqres.full 2>&1
> -       if [ $? -ne 0 ]; then
> -               echo "Scrub find errors in \"$mkfs_opts\" test" | tee -a $seqres.full
> -       fi
> -
> -       _scratch_unmount
> -       # we called _require_scratch_nocheck instead of _require_scratch
> -       # do check after test for each profile config
> -       _check_scratch_fs
> -}
> -
> -echo "Silence is golden"
> -for t in "${_btrfs_profile_configs[@]}"; do
> -       run_test "$t"
> -done
> -
> -status=0
> -exit
> diff --git a/tests/btrfs/064.out b/tests/btrfs/064.out
> deleted file mode 100644
> index d90765460090..000000000000
> --- a/tests/btrfs/064.out
> +++ /dev/null
> @@ -1,2 +0,0 @@
> -QA output created by 064
> -Silence is golden
> --
> 2.25.1
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fstests: delete btrfs/064 it makes no sense
  2020-09-29 15:55 ` Filipe Manana
@ 2020-09-29 16:02   ` Josef Bacik
  2020-09-29 16:13     ` Filipe Manana
  0 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2020-09-29 16:02 UTC (permalink / raw)
  To: fdmanana, Anand Jain; +Cc: fstests, linux-btrfs, Eryu Guan, Johannes Thumshirn

On 9/29/20 11:55 AM, Filipe Manana wrote:
> On Tue, Sep 29, 2020 at 4:50 PM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> btrfs/064 aimed to test balance and replace concurrency while the stress
>> test is running in the background.
>>
>> However, as the balance and the replace operation are mutually
>> exclusive, so they can never run concurrently.
> 
> And it's good to have a test that verifies that attempting to run them
> concurrently doesn't cause any problems, like crashes, memory leaks or
> some sort of filesystem corruption.
> 
> For example btrfs/187, which I wrote sometime ago, tests that running
> send, balance and deduplication in parallel doesn't result in crashes,
> since in the past they were allowed to run concurrently.
> 
> I see no point in removing the test, it's useful.

My confusion was around whether this test was actually testing what we 
think it should be testing.  If this test was meant to make sure that 
replace works while we've got load on the fs, then clearly it's not 
doing what we think it's doing.

In this case if we're ok with it exercising the exclusion path then I 
think we at least need to update the comment at the beginning of the 
test so it's clear what the purpose of the test is.  And then we need to 
make sure we do actually have a test where device replace is properly 
exercised.  I _think_ it is with btrfs/065 and some others, so just 
updating the comment here would be enough.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fstests: delete btrfs/064 it makes no sense
  2020-09-29 16:02   ` Josef Bacik
@ 2020-09-29 16:13     ` Filipe Manana
  2020-09-29 17:26       ` Josef Bacik
  0 siblings, 1 reply; 10+ messages in thread
From: Filipe Manana @ 2020-09-29 16:13 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Anand Jain, fstests, linux-btrfs, Eryu Guan, Johannes Thumshirn

On Tue, Sep 29, 2020 at 5:02 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On 9/29/20 11:55 AM, Filipe Manana wrote:
> > On Tue, Sep 29, 2020 at 4:50 PM Anand Jain <anand.jain@oracle.com> wrote:
> >>
> >> btrfs/064 aimed to test balance and replace concurrency while the stress
> >> test is running in the background.
> >>
> >> However, as the balance and the replace operation are mutually
> >> exclusive, so they can never run concurrently.
> >
> > And it's good to have a test that verifies that attempting to run them
> > concurrently doesn't cause any problems, like crashes, memory leaks or
> > some sort of filesystem corruption.
> >
> > For example btrfs/187, which I wrote sometime ago, tests that running
> > send, balance and deduplication in parallel doesn't result in crashes,
> > since in the past they were allowed to run concurrently.
> >
> > I see no point in removing the test, it's useful.
>
> My confusion was around whether this test was actually testing what we
> think it should be testing.  If this test was meant to make sure that
> replace works while we've got load on the fs, then clearly it's not
> doing what we think it's doing.

Given that neither the test's description nor the changelog mention
that it expects device replace and balance to be able to run
concurrently,
that errors are explicitly ignored and redirected to $seqres.full, and
we don't do any sort of validation after device replace operations, it
makes it clear to me it's a stress test.

>
> In this case if we're ok with it exercising the exclusion path then I
> think we at least need to update the comment at the beginning of the
> test so it's clear what the purpose of the test is.  And then we need to
> make sure we do actually have a test where device replace is properly
> exercised.  I _think_ it is with btrfs/065 and some others, so just
> updating the comment here would be enough.  Thanks,
>
> Josef



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fstests: delete btrfs/064 it makes no sense
  2020-09-29 16:13     ` Filipe Manana
@ 2020-09-29 17:26       ` Josef Bacik
  2020-09-30  4:14         ` Anand Jain
  0 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2020-09-29 17:26 UTC (permalink / raw)
  To: fdmanana; +Cc: Anand Jain, fstests, linux-btrfs, Eryu Guan, Johannes Thumshirn

On 9/29/20 12:13 PM, Filipe Manana wrote:
> On Tue, Sep 29, 2020 at 5:02 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> On 9/29/20 11:55 AM, Filipe Manana wrote:
>>> On Tue, Sep 29, 2020 at 4:50 PM Anand Jain <anand.jain@oracle.com> wrote:
>>>>
>>>> btrfs/064 aimed to test balance and replace concurrency while the stress
>>>> test is running in the background.
>>>>
>>>> However, as the balance and the replace operation are mutually
>>>> exclusive, so they can never run concurrently.
>>>
>>> And it's good to have a test that verifies that attempting to run them
>>> concurrently doesn't cause any problems, like crashes, memory leaks or
>>> some sort of filesystem corruption.
>>>
>>> For example btrfs/187, which I wrote sometime ago, tests that running
>>> send, balance and deduplication in parallel doesn't result in crashes,
>>> since in the past they were allowed to run concurrently.
>>>
>>> I see no point in removing the test, it's useful.
>>
>> My confusion was around whether this test was actually testing what we
>> think it should be testing.  If this test was meant to make sure that
>> replace works while we've got load on the fs, then clearly it's not
>> doing what we think it's doing.
> 
> Given that neither the test's description nor the changelog mention
> that it expects device replace and balance to be able to run
> concurrently,
> that errors are explicitly ignored and redirected to $seqres.full, and
> we don't do any sort of validation after device replace operations, it
> makes it clear to me it's a stress test.
> 

Sure but I spent a while looking at it when it was failing being very 
confused.  In my mind my snapshot-stress.sh is a stress test, because 
its meant to run without errors.  The changelog and description are 
sufficiently vague enough that it appeared that Eryu meant to write a 
test that actually did a replace and balance at the same time.  The test 
clearly isn't doing that, so we need to update the description so it's 
clear that's what's going on.  And then I wanted to make sure that we do 
in fact have a test that stresses replace in these scenarios, because I 
want to make sure we actually test replace as well.

Not ripping it out is fine, but updating the description so I'm not 
confused in a couple years when I trip over this again would be nice. 
Thanks,

Josef

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fstests: delete btrfs/064 it makes no sense
  2020-09-29 17:26       ` Josef Bacik
@ 2020-09-30  4:14         ` Anand Jain
  2020-09-30  9:16           ` Filipe Manana
  0 siblings, 1 reply; 10+ messages in thread
From: Anand Jain @ 2020-09-30  4:14 UTC (permalink / raw)
  To: Josef Bacik, fdmanana; +Cc: fstests, linux-btrfs, Eryu Guan, Johannes Thumshirn

On 30/9/20 1:26 am, Josef Bacik wrote:
> On 9/29/20 12:13 PM, Filipe Manana wrote:
>> On Tue, Sep 29, 2020 at 5:02 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>>
>>> On 9/29/20 11:55 AM, Filipe Manana wrote:
>>>> On Tue, Sep 29, 2020 at 4:50 PM Anand Jain <anand.jain@oracle.com> 
>>>> wrote:
>>>>>
>>>>> btrfs/064 aimed to test balance and replace concurrency while the 
>>>>> stress
>>>>> test is running in the background.
>>>>>
>>>>> However, as the balance and the replace operation are mutually
>>>>> exclusive, so they can never run concurrently.
>>>>
>>>> And it's good to have a test that verifies that attempting to run them
>>>> concurrently doesn't cause any problems, like crashes, memory leaks or
>>>> some sort of filesystem corruption.
>>>>
>>>> For example btrfs/187, which I wrote sometime ago, tests that running
>>>> send, balance and deduplication in parallel doesn't result in crashes,
>>>> since in the past they were allowed to run concurrently.
>>>>
>>>> I see no point in removing the test, it's useful.
>>>
>>> My confusion was around whether this test was actually testing what we
>>> think it should be testing.  If this test was meant to make sure that
>>> replace works while we've got load on the fs, then clearly it's not
>>> doing what we think it's doing.
>>
>> Given that neither the test's description nor the changelog mention
>> that it expects device replace and balance to be able to run
>> concurrently,
>> that errors are explicitly ignored and redirected to $seqres.full, and
>> we don't do any sort of validation after device replace operations, it
>> makes it clear to me it's a stress test.
>>
> 
> Sure but I spent a while looking at it when it was failing being very 
> confused.  In my mind my snapshot-stress.sh is a stress test, because 
> its meant to run without errors.  The changelog and description are 
> sufficiently vague enough that it appeared that Eryu meant to write a 
> test that actually did a replace and balance at the same time.  The test 
> clearly isn't doing that, so we need to update the description so it's 
> clear that's what's going on.  And then I wanted to make sure that we do 
> in fact have a test that stresses replace in these scenarios, because I 
> want to make sure we actually test replace as well.
> 
> Not ripping it out is fine, but updating the description so I'm not 
> confused in a couple years when I trip over this again would be nice. 
> Thanks,
> 

As of now, we have the following balance concurrency tests.
-----
028 balance and unlink fsstress concurrency [1]
060 balance and subvol ops concurrency with fsstress [2]
061 balance and scrub concurrency with fsstress [2]
062 balance and defrag concurrency with fsstress [2]
063 balance and remount concurrency with fsstress [2]
064 balance and replace concurrency with fsstress  [2]
177 balance and resize concurrency
187 balance, send and dedupe concurrency
190 balance with qgroup

[1]
args=`_scale_fsstress_args -z \
         -f write=10 -f unlink=10 \
         -f creat=10 -f fsync=10 \
         -f fsync=10 -n 100000 -p 2 \
         -d $SCRATCH_MNT/stress_dir`

[2]
args=`_scale_fsstress_args -p 20 -n 100 $FSSTRESS_AVOID -d 
$SCRATCH_MNT/stressdir`
-----

064 shall test balance with fsstress in the background. The replace 
thread is kept out with the early check of BTRFS_FS_EXCL_OP in the kernel.
I am ok with the 064 headers updated, will send v2.


Also, it turns out that this test case helped to find a btrfs-progs bug.
Its patch [1] is sent to the ML.
   [1] btrfs-progs: fix return code for failed replace start

Thanks, Anand


> Josef


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] fstests: btrfs/064 add a comment to the test case header
  2020-09-29 15:50 [PATCH] fstests: delete btrfs/064 it makes no sense Anand Jain
  2020-09-29 15:55 ` Filipe Manana
@ 2020-09-30  4:44 ` Anand Jain
  2020-09-30 12:42   ` Josef Bacik
  1 sibling, 1 reply; 10+ messages in thread
From: Anand Jain @ 2020-09-30  4:44 UTC (permalink / raw)
  To: fstests; +Cc: linux-btrfs, josef, guaneryu, johannes.thumshirn

It appears that the goal of this case was to test balance and
replace concurrency. But these two operations aren't meant to run
concurrently and the replace failing errors are captured in the
seqres.full output. Which are expected errors. To avoid further
confusion, this patch adds comments. The reason to keep this
test case is at the Link.

Link: https://patchwork.kernel.org/patch/11806307/
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: 
Instead of removing the test case, add comments to clarify.
Title is changed
 old: fstests: delete btrfs/064 it makes no sens

 tests/btrfs/064 | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/btrfs/064 b/tests/btrfs/064
index 683a69f113bf..ce73acf7872a 100755
--- a/tests/btrfs/064
+++ b/tests/btrfs/064
@@ -4,9 +4,11 @@
 #
 # FSQA Test No. btrfs/064
 #
-# Run btrfs balance and replace operations simultaneously with fsstress
-# running in background.
-#
+# Run btrfs balance and replace operations simultaneously with fsstress running
+# in the background, check with the scrub if all the blocks are ok.
+# Balance and replace operations are mutually exclusive operations they can't
+# run simultaneously. One of them is expected to fail when the other is running.
+
 seq=`basename $0`
 seqres=$RESULT_DIR/$seq
 echo "QA output created by $seq"
@@ -62,6 +64,8 @@ run_test()
 	$FSSTRESS_PROG $args >/dev/null 2>&1 &
 	fsstress_pid=$!
 
+	# Start both balance and replace in the background.
+	# Either balance or replace shall run, the other fails.
 	echo -n "Start balance worker: " >>$seqres.full
 	_btrfs_stress_balance $SCRATCH_MNT >/dev/null 2>&1 &
 	balance_pid=$!
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] fstests: delete btrfs/064 it makes no sense
  2020-09-30  4:14         ` Anand Jain
@ 2020-09-30  9:16           ` Filipe Manana
  2020-09-30 10:01             ` Anand Jain
  0 siblings, 1 reply; 10+ messages in thread
From: Filipe Manana @ 2020-09-30  9:16 UTC (permalink / raw)
  To: Anand Jain
  Cc: Josef Bacik, fstests, linux-btrfs, Eryu Guan, Johannes Thumshirn

On Wed, Sep 30, 2020 at 5:14 AM Anand Jain <anand.jain@oracle.com> wrote:
>
> On 30/9/20 1:26 am, Josef Bacik wrote:
> > On 9/29/20 12:13 PM, Filipe Manana wrote:
> >> On Tue, Sep 29, 2020 at 5:02 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >>>
> >>> On 9/29/20 11:55 AM, Filipe Manana wrote:
> >>>> On Tue, Sep 29, 2020 at 4:50 PM Anand Jain <anand.jain@oracle.com>
> >>>> wrote:
> >>>>>
> >>>>> btrfs/064 aimed to test balance and replace concurrency while the
> >>>>> stress
> >>>>> test is running in the background.
> >>>>>
> >>>>> However, as the balance and the replace operation are mutually
> >>>>> exclusive, so they can never run concurrently.
> >>>>
> >>>> And it's good to have a test that verifies that attempting to run them
> >>>> concurrently doesn't cause any problems, like crashes, memory leaks or
> >>>> some sort of filesystem corruption.
> >>>>
> >>>> For example btrfs/187, which I wrote sometime ago, tests that running
> >>>> send, balance and deduplication in parallel doesn't result in crashes,
> >>>> since in the past they were allowed to run concurrently.
> >>>>
> >>>> I see no point in removing the test, it's useful.
> >>>
> >>> My confusion was around whether this test was actually testing what we
> >>> think it should be testing.  If this test was meant to make sure that
> >>> replace works while we've got load on the fs, then clearly it's not
> >>> doing what we think it's doing.
> >>
> >> Given that neither the test's description nor the changelog mention
> >> that it expects device replace and balance to be able to run
> >> concurrently,
> >> that errors are explicitly ignored and redirected to $seqres.full, and
> >> we don't do any sort of validation after device replace operations, it
> >> makes it clear to me it's a stress test.
> >>
> >
> > Sure but I spent a while looking at it when it was failing being very
> > confused.  In my mind my snapshot-stress.sh is a stress test, because
> > its meant to run without errors.  The changelog and description are
> > sufficiently vague enough that it appeared that Eryu meant to write a
> > test that actually did a replace and balance at the same time.  The test
> > clearly isn't doing that, so we need to update the description so it's
> > clear that's what's going on.  And then I wanted to make sure that we do
> > in fact have a test that stresses replace in these scenarios, because I
> > want to make sure we actually test replace as well.
> >
> > Not ripping it out is fine, but updating the description so I'm not
> > confused in a couple years when I trip over this again would be nice.
> > Thanks,
> >
>
> As of now, we have the following balance concurrency tests.
> -----
> 028 balance and unlink fsstress concurrency [1]
> 060 balance and subvol ops concurrency with fsstress [2]
> 061 balance and scrub concurrency with fsstress [2]
> 062 balance and defrag concurrency with fsstress [2]
> 063 balance and remount concurrency with fsstress [2]
> 064 balance and replace concurrency with fsstress  [2]
> 177 balance and resize concurrency

No, 177 does not test balance and resize concurrency.
It tests balance when a swap file exists. And the resize happens
(starts and ends) before setting the swap file and before doing the
balance.

Thanks.


> 187 balance, send and dedupe concurrency
> 190 balance with qgroup
>
> [1]
> args=`_scale_fsstress_args -z \
>          -f write=10 -f unlink=10 \
>          -f creat=10 -f fsync=10 \
>          -f fsync=10 -n 100000 -p 2 \
>          -d $SCRATCH_MNT/stress_dir`
>
> [2]
> args=`_scale_fsstress_args -p 20 -n 100 $FSSTRESS_AVOID -d
> $SCRATCH_MNT/stressdir`
> -----
>
> 064 shall test balance with fsstress in the background. The replace
> thread is kept out with the early check of BTRFS_FS_EXCL_OP in the kernel.
> I am ok with the 064 headers updated, will send v2.
>
>
> Also, it turns out that this test case helped to find a btrfs-progs bug.
> Its patch [1] is sent to the ML.
>    [1] btrfs-progs: fix return code for failed replace start
>
> Thanks, Anand
>
>
> > Josef
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fstests: delete btrfs/064 it makes no sense
  2020-09-30  9:16           ` Filipe Manana
@ 2020-09-30 10:01             ` Anand Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2020-09-30 10:01 UTC (permalink / raw)
  To: fdmanana; +Cc: Josef Bacik, fstests, linux-btrfs, Eryu Guan, Johannes Thumshirn



On 30/9/20 5:16 pm, Filipe Manana wrote:
> On Wed, Sep 30, 2020 at 5:14 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> On 30/9/20 1:26 am, Josef Bacik wrote:
>>> On 9/29/20 12:13 PM, Filipe Manana wrote:
>>>> On Tue, Sep 29, 2020 at 5:02 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>>
>>>>> On 9/29/20 11:55 AM, Filipe Manana wrote:
>>>>>> On Tue, Sep 29, 2020 at 4:50 PM Anand Jain <anand.jain@oracle.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> btrfs/064 aimed to test balance and replace concurrency while the
>>>>>>> stress
>>>>>>> test is running in the background.
>>>>>>>
>>>>>>> However, as the balance and the replace operation are mutually
>>>>>>> exclusive, so they can never run concurrently.
>>>>>>
>>>>>> And it's good to have a test that verifies that attempting to run them
>>>>>> concurrently doesn't cause any problems, like crashes, memory leaks or
>>>>>> some sort of filesystem corruption.
>>>>>>
>>>>>> For example btrfs/187, which I wrote sometime ago, tests that running
>>>>>> send, balance and deduplication in parallel doesn't result in crashes,
>>>>>> since in the past they were allowed to run concurrently.
>>>>>>
>>>>>> I see no point in removing the test, it's useful.
>>>>>
>>>>> My confusion was around whether this test was actually testing what we
>>>>> think it should be testing.  If this test was meant to make sure that
>>>>> replace works while we've got load on the fs, then clearly it's not
>>>>> doing what we think it's doing.
>>>>
>>>> Given that neither the test's description nor the changelog mention
>>>> that it expects device replace and balance to be able to run
>>>> concurrently,
>>>> that errors are explicitly ignored and redirected to $seqres.full, and
>>>> we don't do any sort of validation after device replace operations, it
>>>> makes it clear to me it's a stress test.
>>>>
>>>
>>> Sure but I spent a while looking at it when it was failing being very
>>> confused.  In my mind my snapshot-stress.sh is a stress test, because
>>> its meant to run without errors.  The changelog and description are
>>> sufficiently vague enough that it appeared that Eryu meant to write a
>>> test that actually did a replace and balance at the same time.  The test
>>> clearly isn't doing that, so we need to update the description so it's
>>> clear that's what's going on.  And then I wanted to make sure that we do
>>> in fact have a test that stresses replace in these scenarios, because I
>>> want to make sure we actually test replace as well.
>>>
>>> Not ripping it out is fine, but updating the description so I'm not
>>> confused in a couple years when I trip over this again would be nice.
>>> Thanks,
>>>
>>
>> As of now, we have the following balance concurrency tests.
>> -----
>> 028 balance and unlink fsstress concurrency [1]
>> 060 balance and subvol ops concurrency with fsstress [2]
>> 061 balance and scrub concurrency with fsstress [2]
>> 062 balance and defrag concurrency with fsstress [2]
>> 063 balance and remount concurrency with fsstress [2]
>> 064 balance and replace concurrency with fsstress  [2]
>> 177 balance and resize concurrency
> 
> No, 177 does not test balance and resize concurrency.
> It tests balance when a swap file exists. And the resize happens
> (starts and ends) before setting the swap file and before doing the
> balance.

   You are right. Thanks for correcting.
-Anand

> 
> Thanks.
> 
> 
>> 187 balance, send and dedupe concurrency
>> 190 balance with qgroup
>>
>> [1]
>> args=`_scale_fsstress_args -z \
>>           -f write=10 -f unlink=10 \
>>           -f creat=10 -f fsync=10 \
>>           -f fsync=10 -n 100000 -p 2 \
>>           -d $SCRATCH_MNT/stress_dir`
>>
>> [2]
>> args=`_scale_fsstress_args -p 20 -n 100 $FSSTRESS_AVOID -d
>> $SCRATCH_MNT/stressdir`
>> -----
>>
>> 064 shall test balance with fsstress in the background. The replace
>> thread is kept out with the early check of BTRFS_FS_EXCL_OP in the kernel.
>> I am ok with the 064 headers updated, will send v2.
>>
>>
>> Also, it turns out that this test case helped to find a btrfs-progs bug.
>> Its patch [1] is sent to the ML.
>>     [1] btrfs-progs: fix return code for failed replace start
>>
>> Thanks, Anand
>>
>>
>>> Josef
>>
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] fstests: btrfs/064 add a comment to the test case header
  2020-09-30  4:44 ` [PATCH v2] fstests: btrfs/064 add a comment to the test case header Anand Jain
@ 2020-09-30 12:42   ` Josef Bacik
  0 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2020-09-30 12:42 UTC (permalink / raw)
  To: Anand Jain, fstests; +Cc: linux-btrfs, guaneryu, johannes.thumshirn

On 9/30/20 12:44 AM, Anand Jain wrote:
> It appears that the goal of this case was to test balance and
> replace concurrency. But these two operations aren't meant to run
> concurrently and the replace failing errors are captured in the
> seqres.full output. Which are expected errors. To avoid further
> confusion, this patch adds comments. The reason to keep this
> test case is at the Link.
> 
> Link: https://patchwork.kernel.org/patch/11806307/
> Reported-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-09-30 12:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-29 15:50 [PATCH] fstests: delete btrfs/064 it makes no sense Anand Jain
2020-09-29 15:55 ` Filipe Manana
2020-09-29 16:02   ` Josef Bacik
2020-09-29 16:13     ` Filipe Manana
2020-09-29 17:26       ` Josef Bacik
2020-09-30  4:14         ` Anand Jain
2020-09-30  9:16           ` Filipe Manana
2020-09-30 10:01             ` Anand Jain
2020-09-30  4:44 ` [PATCH v2] fstests: btrfs/064 add a comment to the test case header Anand Jain
2020-09-30 12:42   ` Josef Bacik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).