* [PATCH 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT @ 2019-09-11 13:17 kaixuxia 2019-09-13 17:36 ` Brian Foster 0 siblings, 1 reply; 7+ messages in thread From: kaixuxia @ 2019-09-11 13:17 UTC (permalink / raw) To: fstests; +Cc: linux-xfs, Eryu Guan, Darrick J. Wong, newtongao, jasperwang There is ABBA deadlock bug between the AGI and AGF when performing rename() with RENAME_WHITEOUT flag, and add this testcase to make sure the rename() call works well. Signed-off-by: kaixuxia <kaixuxia@tencent.com> --- tests/xfs/512 | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ tests/xfs/512.out | 2 ++ tests/xfs/group | 1 + 3 files changed, 102 insertions(+) create mode 100755 tests/xfs/512 create mode 100644 tests/xfs/512.out diff --git a/tests/xfs/512 b/tests/xfs/512 new file mode 100755 index 0000000..754f102 --- /dev/null +++ b/tests/xfs/512 @@ -0,0 +1,99 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2019 Tencent. All Rights Reserved. +# +# FS QA Test 512 +# +# Test the ABBA deadlock case between the AGI and AGF When performing +# rename operation with RENAME_WHITEOUT flag. +# +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/renameat2 + +rm -f $seqres.full + +# real QA test starts here +_supported_fs xfs +_supported_os Linux +_require_scratch_nocheck +_requires_renameat2 whiteout + +prepare_file() +{ + # create many small files for the rename with RENAME_WHITEOUT + i=0 + while [ $i -le $files ]; do + file=$SCRATCH_MNT/f$i + echo > $file >/dev/null 2>&1 + let i=$i+1 + done +} + +rename_whiteout() +{ + # create the rename targetdir + renamedir=$SCRATCH_MNT/renamedir + mkdir $renamedir + + # a long filename could increase the possibility that target_dp + # allocate new blocks(acquire the AGF lock) to store the filename + longnamepre=FFFsafdsagafsadfagasdjfalskdgakdlsglkasdg + + # now try to do rename with RENAME_WHITEOUT flag + i=0 + while [ $i -le $files ]; do + src/renameat2 -w $SCRATCH_MNT/f$i $renamedir/$longnamepre$i >/dev/null 2>&1 + let i=$i+1 + done +} + +create_file() +{ + # create the targetdir + createdir=$SCRATCH_MNT/createdir + mkdir $createdir + + # try to create file at the same time to hit the deadlock + i=0 + while [ $i -le $files ]; do + file=$createdir/f$i + echo > $file >/dev/null 2>&1 + let i=$i+1 + done +} + +_scratch_mkfs_xfs -bsize=1024 -dagcount=1 >> $seqres.full 2>&1 || + _fail "mkfs failed" +_scratch_mount + +files=250000 + +prepare_file +rename_whiteout & +create_file & + +wait +echo Silence is golden + +# Failure comes in the form of a deadlock. + +# success, all done +status=0 +exit diff --git a/tests/xfs/512.out b/tests/xfs/512.out new file mode 100644 index 0000000..0aabdef --- /dev/null +++ b/tests/xfs/512.out @@ -0,0 +1,2 @@ +QA output created by 512 +Silence is golden diff --git a/tests/xfs/group b/tests/xfs/group index a7ad300..ed250d6 100644 --- a/tests/xfs/group +++ b/tests/xfs/group @@ -509,3 +509,4 @@ 509 auto ioctl 510 auto ioctl quick 511 auto quick quota +512 auto rename -- 1.8.3.1 -- kaixuxia ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT 2019-09-11 13:17 [PATCH 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT kaixuxia @ 2019-09-13 17:36 ` Brian Foster 2019-09-15 3:34 ` Eryu Guan 2019-09-16 5:12 ` kaixuxia 0 siblings, 2 replies; 7+ messages in thread From: Brian Foster @ 2019-09-13 17:36 UTC (permalink / raw) To: kaixuxia Cc: fstests, linux-xfs, Eryu Guan, Darrick J. Wong, newtongao, jasperwang On Wed, Sep 11, 2019 at 09:17:08PM +0800, kaixuxia wrote: > There is ABBA deadlock bug between the AGI and AGF when performing > rename() with RENAME_WHITEOUT flag, and add this testcase to make > sure the rename() call works well. > > Signed-off-by: kaixuxia <kaixuxia@tencent.com> > --- > tests/xfs/512 | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > tests/xfs/512.out | 2 ++ > tests/xfs/group | 1 + > 3 files changed, 102 insertions(+) > create mode 100755 tests/xfs/512 > create mode 100644 tests/xfs/512.out > > diff --git a/tests/xfs/512 b/tests/xfs/512 > new file mode 100755 > index 0000000..754f102 > --- /dev/null > +++ b/tests/xfs/512 > @@ -0,0 +1,99 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (c) 2019 Tencent. All Rights Reserved. > +# > +# FS QA Test 512 > +# > +# Test the ABBA deadlock case between the AGI and AGF When performing > +# rename operation with RENAME_WHITEOUT flag. > +# > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + cd / > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > +. ./common/renameat2 > + > +rm -f $seqres.full > + > +# real QA test starts here > +_supported_fs xfs > +_supported_os Linux > +_require_scratch_nocheck Why _nocheck? AFAICT the filesystem shouldn't end up intentionally corrupted. > +_requires_renameat2 whiteout > + > +prepare_file() > +{ > + # create many small files for the rename with RENAME_WHITEOUT > + i=0 > + while [ $i -le $files ]; do > + file=$SCRATCH_MNT/f$i > + echo > $file >/dev/null 2>&1 > + let i=$i+1 > + done Something like the following is a bit more simple, IMO: for i in $(seq 1 $files); do touch $SCRATCH_MNT/f.$i done The same goes for the other while loops below that increment up to $files. > +} > + > +rename_whiteout() > +{ > + # create the rename targetdir > + renamedir=$SCRATCH_MNT/renamedir > + mkdir $renamedir > + > + # a long filename could increase the possibility that target_dp > + # allocate new blocks(acquire the AGF lock) to store the filename > + longnamepre=FFFsafdsagafsadfagasdjfalskdgakdlsglkasdg > + The max filename length is 256 bytes. You could do something like the following to increase name length (leaving room for the file index and terminating NULL) if it helps the test: prefix=`for i in $(seq 0 245); do echo -n a; done` > + # now try to do rename with RENAME_WHITEOUT flag > + i=0 > + while [ $i -le $files ]; do > + src/renameat2 -w $SCRATCH_MNT/f$i $renamedir/$longnamepre$i >/dev/null 2>&1 > + let i=$i+1 > + done > +} > + > +create_file() > +{ > + # create the targetdir > + createdir=$SCRATCH_MNT/createdir > + mkdir $createdir > + > + # try to create file at the same time to hit the deadlock > + i=0 > + while [ $i -le $files ]; do > + file=$createdir/f$i > + echo > $file >/dev/null 2>&1 > + let i=$i+1 > + done > +} You could generalize this function to take a target directory parameter and just call it twice (once to prepare and again for the create workload). > + > +_scratch_mkfs_xfs -bsize=1024 -dagcount=1 >> $seqres.full 2>&1 || > + _fail "mkfs failed" Why -bsize=1k? Does that make the reproducer more effective? > +_scratch_mount > + > +files=250000 > + Have you tested effectiveness of reproducing the issue with smaller file counts? A brief comment here to document where the value comes from might be useful. Somewhat related, how long does this test take on fixed kernels? > +prepare_file > +rename_whiteout & > +create_file & > + > +wait > +echo Silence is golden > + > +# Failure comes in the form of a deadlock. > + I wonder if this should be in the dangerous group as well. I go back and forth on that though because I tend to filter out dangerous tests and the test won't be so risky once the fix proliferates. Perhaps that's just a matter of removing it from the dangerous group after a long enough period of time. Brian > +# success, all done > +status=0 > +exit > diff --git a/tests/xfs/512.out b/tests/xfs/512.out > new file mode 100644 > index 0000000..0aabdef > --- /dev/null > +++ b/tests/xfs/512.out > @@ -0,0 +1,2 @@ > +QA output created by 512 > +Silence is golden > diff --git a/tests/xfs/group b/tests/xfs/group > index a7ad300..ed250d6 100644 > --- a/tests/xfs/group > +++ b/tests/xfs/group > @@ -509,3 +509,4 @@ > 509 auto ioctl > 510 auto ioctl quick > 511 auto quick quota > +512 auto rename > -- > 1.8.3.1 > > -- > kaixuxia ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT 2019-09-13 17:36 ` Brian Foster @ 2019-09-15 3:34 ` Eryu Guan 2019-09-15 11:47 ` Brian Foster 2019-09-16 5:12 ` kaixuxia 1 sibling, 1 reply; 7+ messages in thread From: Eryu Guan @ 2019-09-15 3:34 UTC (permalink / raw) To: Brian Foster Cc: kaixuxia, fstests, linux-xfs, Darrick J. Wong, newtongao, jasperwang On Fri, Sep 13, 2019 at 01:36:24PM -0400, Brian Foster wrote: > On Wed, Sep 11, 2019 at 09:17:08PM +0800, kaixuxia wrote: > > There is ABBA deadlock bug between the AGI and AGF when performing > > rename() with RENAME_WHITEOUT flag, and add this testcase to make > > sure the rename() call works well. > > > > Signed-off-by: kaixuxia <kaixuxia@tencent.com> > > --- > > tests/xfs/512 | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > tests/xfs/512.out | 2 ++ > > tests/xfs/group | 1 + > > 3 files changed, 102 insertions(+) > > create mode 100755 tests/xfs/512 > > create mode 100644 tests/xfs/512.out > > > > diff --git a/tests/xfs/512 b/tests/xfs/512 > > new file mode 100755 > > index 0000000..754f102 > > --- /dev/null > > +++ b/tests/xfs/512 > > @@ -0,0 +1,99 @@ > > +#! /bin/bash > > +# SPDX-License-Identifier: GPL-2.0 > > +# Copyright (c) 2019 Tencent. All Rights Reserved. > > +# > > +# FS QA Test 512 > > +# > > +# Test the ABBA deadlock case between the AGI and AGF When performing > > +# rename operation with RENAME_WHITEOUT flag. > > +# > > +seq=`basename $0` > > +seqres=$RESULT_DIR/$seq > > +echo "QA output created by $seq" > > + > > +here=`pwd` > > +tmp=/tmp/$$ > > +status=1 # failure is the default! > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > + > > +_cleanup() > > +{ > > + cd / > > + rm -f $tmp.* > > +} > > + > > +# get standard environment, filters and checks > > +. ./common/rc > > +. ./common/filter > > +. ./common/renameat2 > > + > > +rm -f $seqres.full > > + > > +# real QA test starts here > > +_supported_fs xfs > > +_supported_os Linux > > +_require_scratch_nocheck > > Why _nocheck? AFAICT the filesystem shouldn't end up intentionally > corrupted. There was a comment in v1, but not in this v2, we should keep that comment. > > > +_requires_renameat2 whiteout > > + > > +prepare_file() > > +{ > > + # create many small files for the rename with RENAME_WHITEOUT > > + i=0 > > + while [ $i -le $files ]; do > > + file=$SCRATCH_MNT/f$i > > + echo > $file >/dev/null 2>&1 > > + let i=$i+1 > > + done > > Something like the following is a bit more simple, IMO: > > for i in $(seq 1 $files); do > touch $SCRATCH_MNT/f.$i > done > > The same goes for the other while loops below that increment up to > $files. Agreed, but looks like echo (which is a bash builtin) is faster than touch (which requires forking new process every loop). > > > +} > > + > > +rename_whiteout() > > +{ > > + # create the rename targetdir > > + renamedir=$SCRATCH_MNT/renamedir > > + mkdir $renamedir > > + > > + # a long filename could increase the possibility that target_dp > > + # allocate new blocks(acquire the AGF lock) to store the filename > > + longnamepre=FFFsafdsagafsadfagasdjfalskdgakdlsglkasdg > > + > > The max filename length is 256 bytes. You could do something like the > following to increase name length (leaving room for the file index and > terminating NULL) if it helps the test: > > prefix=`for i in $(seq 0 245); do echo -n a; done` Or prefix=`$PERL_PROG -e 'print "a"x256;'` ? Which seems a bit simpler to me. > > > + # now try to do rename with RENAME_WHITEOUT flag > > + i=0 > > + while [ $i -le $files ]; do > > + src/renameat2 -w $SCRATCH_MNT/f$i $renamedir/$longnamepre$i >/dev/null 2>&1 > > + let i=$i+1 > > + done > > +} > > + > > +create_file() > > +{ > > + # create the targetdir > > + createdir=$SCRATCH_MNT/createdir > > + mkdir $createdir > > + > > + # try to create file at the same time to hit the deadlock > > + i=0 > > + while [ $i -le $files ]; do > > + file=$createdir/f$i > > + echo > $file >/dev/null 2>&1 > > + let i=$i+1 > > + done > > +} > > You could generalize this function to take a target directory parameter > and just call it twice (once to prepare and again for the create > workload). > > > + > > +_scratch_mkfs_xfs -bsize=1024 -dagcount=1 >> $seqres.full 2>&1 || > > + _fail "mkfs failed" > > Why -bsize=1k? Does that make the reproducer more effective? > > > +_scratch_mount > > + > > +files=250000 > > + > > Have you tested effectiveness of reproducing the issue with smaller file > counts? A brief comment here to document where the value comes from > might be useful. Somewhat related, how long does this test take on fixed > kernels? > > > +prepare_file > > +rename_whiteout & > > +create_file & > > + > > +wait > > +echo Silence is golden > > + > > +# Failure comes in the form of a deadlock. > > + > > I wonder if this should be in the dangerous group as well. I go back and > forth on that though because I tend to filter out dangerous tests and > the test won't be so risky once the fix proliferates. Perhaps that's > just a matter of removing it from the dangerous group after a long > enough period of time. The deadlock has been fixed, so I think it's fine to leave dangerous group. > > Brian Thanks a lot for the review! Eryu > > > +# success, all done > > +status=0 > > +exit > > diff --git a/tests/xfs/512.out b/tests/xfs/512.out > > new file mode 100644 > > index 0000000..0aabdef > > --- /dev/null > > +++ b/tests/xfs/512.out > > @@ -0,0 +1,2 @@ > > +QA output created by 512 > > +Silence is golden > > diff --git a/tests/xfs/group b/tests/xfs/group > > index a7ad300..ed250d6 100644 > > --- a/tests/xfs/group > > +++ b/tests/xfs/group > > @@ -509,3 +509,4 @@ > > 509 auto ioctl > > 510 auto ioctl quick > > 511 auto quick quota > > +512 auto rename > > -- > > 1.8.3.1 > > > > -- > > kaixuxia ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT 2019-09-15 3:34 ` Eryu Guan @ 2019-09-15 11:47 ` Brian Foster 2019-09-15 13:27 ` Eryu Guan 0 siblings, 1 reply; 7+ messages in thread From: Brian Foster @ 2019-09-15 11:47 UTC (permalink / raw) To: Eryu Guan Cc: kaixuxia, fstests, linux-xfs, Darrick J. Wong, newtongao, jasperwang On Sun, Sep 15, 2019 at 11:34:35AM +0800, Eryu Guan wrote: > On Fri, Sep 13, 2019 at 01:36:24PM -0400, Brian Foster wrote: > > On Wed, Sep 11, 2019 at 09:17:08PM +0800, kaixuxia wrote: > > > There is ABBA deadlock bug between the AGI and AGF when performing > > > rename() with RENAME_WHITEOUT flag, and add this testcase to make > > > sure the rename() call works well. > > > > > > Signed-off-by: kaixuxia <kaixuxia@tencent.com> > > > --- > > > tests/xfs/512 | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > tests/xfs/512.out | 2 ++ > > > tests/xfs/group | 1 + > > > 3 files changed, 102 insertions(+) > > > create mode 100755 tests/xfs/512 > > > create mode 100644 tests/xfs/512.out > > > > > > diff --git a/tests/xfs/512 b/tests/xfs/512 > > > new file mode 100755 > > > index 0000000..754f102 > > > --- /dev/null > > > +++ b/tests/xfs/512 > > > @@ -0,0 +1,99 @@ > > > +#! /bin/bash > > > +# SPDX-License-Identifier: GPL-2.0 > > > +# Copyright (c) 2019 Tencent. All Rights Reserved. > > > +# > > > +# FS QA Test 512 > > > +# > > > +# Test the ABBA deadlock case between the AGI and AGF When performing > > > +# rename operation with RENAME_WHITEOUT flag. > > > +# > > > +seq=`basename $0` > > > +seqres=$RESULT_DIR/$seq > > > +echo "QA output created by $seq" > > > + > > > +here=`pwd` > > > +tmp=/tmp/$$ > > > +status=1 # failure is the default! > > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > > + > > > +_cleanup() > > > +{ > > > + cd / > > > + rm -f $tmp.* > > > +} > > > + > > > +# get standard environment, filters and checks > > > +. ./common/rc > > > +. ./common/filter > > > +. ./common/renameat2 > > > + > > > +rm -f $seqres.full > > > + > > > +# real QA test starts here > > > +_supported_fs xfs > > > +_supported_os Linux > > > +_require_scratch_nocheck > > > > Why _nocheck? AFAICT the filesystem shouldn't end up intentionally > > corrupted. > > There was a comment in v1, but not in this v2, we should keep that > comment. > > > > > > +_requires_renameat2 whiteout > > > + > > > +prepare_file() > > > +{ > > > + # create many small files for the rename with RENAME_WHITEOUT > > > + i=0 > > > + while [ $i -le $files ]; do > > > + file=$SCRATCH_MNT/f$i > > > + echo > $file >/dev/null 2>&1 > > > + let i=$i+1 > > > + done > > > > Something like the following is a bit more simple, IMO: > > > > for i in $(seq 1 $files); do > > touch $SCRATCH_MNT/f.$i > > done > > > > The same goes for the other while loops below that increment up to > > $files. > > Agreed, but looks like echo (which is a bash builtin) is faster than > touch (which requires forking new process every loop). > Ah, interesting. I suppose that makes sense if there's tangible benefit. Would that benefit stand if we created an internal _touch helper or some such instead of open-coding it everywhere? > > > > > +} > > > + > > > +rename_whiteout() > > > +{ > > > + # create the rename targetdir > > > + renamedir=$SCRATCH_MNT/renamedir > > > + mkdir $renamedir > > > + > > > + # a long filename could increase the possibility that target_dp > > > + # allocate new blocks(acquire the AGF lock) to store the filename > > > + longnamepre=FFFsafdsagafsadfagasdjfalskdgakdlsglkasdg > > > + > > > > The max filename length is 256 bytes. You could do something like the > > following to increase name length (leaving room for the file index and > > terminating NULL) if it helps the test: > > > > prefix=`for i in $(seq 0 245); do echo -n a; done` > > Or > > prefix=`$PERL_PROG -e 'print "a"x256;'` > > ? Which seems a bit simpler to me. > > > > > > + # now try to do rename with RENAME_WHITEOUT flag > > > + i=0 > > > + while [ $i -le $files ]; do > > > + src/renameat2 -w $SCRATCH_MNT/f$i $renamedir/$longnamepre$i >/dev/null 2>&1 > > > + let i=$i+1 > > > + done > > > +} > > > + > > > +create_file() > > > +{ > > > + # create the targetdir > > > + createdir=$SCRATCH_MNT/createdir > > > + mkdir $createdir > > > + > > > + # try to create file at the same time to hit the deadlock > > > + i=0 > > > + while [ $i -le $files ]; do > > > + file=$createdir/f$i > > > + echo > $file >/dev/null 2>&1 > > > + let i=$i+1 > > > + done > > > +} > > > > You could generalize this function to take a target directory parameter > > and just call it twice (once to prepare and again for the create > > workload). > > > > > + > > > +_scratch_mkfs_xfs -bsize=1024 -dagcount=1 >> $seqres.full 2>&1 || > > > + _fail "mkfs failed" > > > > Why -bsize=1k? Does that make the reproducer more effective? > > > > > +_scratch_mount > > > + > > > +files=250000 > > > + > > > > Have you tested effectiveness of reproducing the issue with smaller file > > counts? A brief comment here to document where the value comes from > > might be useful. Somewhat related, how long does this test take on fixed > > kernels? > > > > > +prepare_file > > > +rename_whiteout & > > > +create_file & > > > + > > > +wait > > > +echo Silence is golden > > > + > > > +# Failure comes in the form of a deadlock. > > > + > > > > I wonder if this should be in the dangerous group as well. I go back and > > forth on that though because I tend to filter out dangerous tests and > > the test won't be so risky once the fix proliferates. Perhaps that's > > just a matter of removing it from the dangerous group after a long > > enough period of time. > > The deadlock has been fixed, so I think it's fine to leave dangerous > group. > Do you mean to leave it in or out? In general, what's the approach for dealing with dangerous tests that are no longer dangerous? Leave them indefinitely or remove them after a period of time? I tend to skip dangerous tests on regression runs just because I'm not looking for a deadlock or crash to disrupt a long running test. Brian > > > > Brian > > Thanks a lot for the review! > > Eryu > > > > > > +# success, all done > > > +status=0 > > > +exit > > > diff --git a/tests/xfs/512.out b/tests/xfs/512.out > > > new file mode 100644 > > > index 0000000..0aabdef > > > --- /dev/null > > > +++ b/tests/xfs/512.out > > > @@ -0,0 +1,2 @@ > > > +QA output created by 512 > > > +Silence is golden > > > diff --git a/tests/xfs/group b/tests/xfs/group > > > index a7ad300..ed250d6 100644 > > > --- a/tests/xfs/group > > > +++ b/tests/xfs/group > > > @@ -509,3 +509,4 @@ > > > 509 auto ioctl > > > 510 auto ioctl quick > > > 511 auto quick quota > > > +512 auto rename > > > -- > > > 1.8.3.1 > > > > > > -- > > > kaixuxia ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT 2019-09-15 11:47 ` Brian Foster @ 2019-09-15 13:27 ` Eryu Guan 0 siblings, 0 replies; 7+ messages in thread From: Eryu Guan @ 2019-09-15 13:27 UTC (permalink / raw) To: Brian Foster Cc: kaixuxia, fstests, linux-xfs, Darrick J. Wong, newtongao, jasperwang On Sun, Sep 15, 2019 at 07:47:51AM -0400, Brian Foster wrote: > On Sun, Sep 15, 2019 at 11:34:35AM +0800, Eryu Guan wrote: > > On Fri, Sep 13, 2019 at 01:36:24PM -0400, Brian Foster wrote: > > > On Wed, Sep 11, 2019 at 09:17:08PM +0800, kaixuxia wrote: > > > > There is ABBA deadlock bug between the AGI and AGF when performing > > > > rename() with RENAME_WHITEOUT flag, and add this testcase to make > > > > sure the rename() call works well. > > > > > > > > Signed-off-by: kaixuxia <kaixuxia@tencent.com> > > > > --- > > > > tests/xfs/512 | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > tests/xfs/512.out | 2 ++ > > > > tests/xfs/group | 1 + > > > > 3 files changed, 102 insertions(+) > > > > create mode 100755 tests/xfs/512 > > > > create mode 100644 tests/xfs/512.out > > > > > > > > diff --git a/tests/xfs/512 b/tests/xfs/512 > > > > new file mode 100755 > > > > index 0000000..754f102 > > > > --- /dev/null > > > > +++ b/tests/xfs/512 > > > > @@ -0,0 +1,99 @@ > > > > +#! /bin/bash > > > > +# SPDX-License-Identifier: GPL-2.0 > > > > +# Copyright (c) 2019 Tencent. All Rights Reserved. > > > > +# > > > > +# FS QA Test 512 > > > > +# > > > > +# Test the ABBA deadlock case between the AGI and AGF When performing > > > > +# rename operation with RENAME_WHITEOUT flag. > > > > +# > > > > +seq=`basename $0` > > > > +seqres=$RESULT_DIR/$seq > > > > +echo "QA output created by $seq" > > > > + > > > > +here=`pwd` > > > > +tmp=/tmp/$$ > > > > +status=1 # failure is the default! > > > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > > > + > > > > +_cleanup() > > > > +{ > > > > + cd / > > > > + rm -f $tmp.* > > > > +} > > > > + > > > > +# get standard environment, filters and checks > > > > +. ./common/rc > > > > +. ./common/filter > > > > +. ./common/renameat2 > > > > + > > > > +rm -f $seqres.full > > > > + > > > > +# real QA test starts here > > > > +_supported_fs xfs > > > > +_supported_os Linux > > > > +_require_scratch_nocheck > > > > > > Why _nocheck? AFAICT the filesystem shouldn't end up intentionally > > > corrupted. > > > > There was a comment in v1, but not in this v2, we should keep that > > comment. > > > > > > > > > +_requires_renameat2 whiteout > > > > + > > > > +prepare_file() > > > > +{ > > > > + # create many small files for the rename with RENAME_WHITEOUT > > > > + i=0 > > > > + while [ $i -le $files ]; do > > > > + file=$SCRATCH_MNT/f$i > > > > + echo > $file >/dev/null 2>&1 > > > > + let i=$i+1 > > > > + done > > > > > > Something like the following is a bit more simple, IMO: > > > > > > for i in $(seq 1 $files); do > > > touch $SCRATCH_MNT/f.$i > > > done > > > > > > The same goes for the other while loops below that increment up to > > > $files. > > > > Agreed, but looks like echo (which is a bash builtin) is faster than > > touch (which requires forking new process every loop). > > > > Ah, interesting. I suppose that makes sense if there's tangible benefit. > Would that benefit stand if we created an internal _touch helper or some > such instead of open-coding it everywhere? IMHO, creating a _touch helper doesn't gain much than open-coding it, it's probably just a one-liner helper anyway. > > > > > > > > +} > > > > + > > > > +rename_whiteout() > > > > +{ > > > > + # create the rename targetdir > > > > + renamedir=$SCRATCH_MNT/renamedir > > > > + mkdir $renamedir > > > > + > > > > + # a long filename could increase the possibility that target_dp > > > > + # allocate new blocks(acquire the AGF lock) to store the filename > > > > + longnamepre=FFFsafdsagafsadfagasdjfalskdgakdlsglkasdg > > > > + > > > > > > The max filename length is 256 bytes. You could do something like the > > > following to increase name length (leaving room for the file index and > > > terminating NULL) if it helps the test: > > > > > > prefix=`for i in $(seq 0 245); do echo -n a; done` > > > > Or > > > > prefix=`$PERL_PROG -e 'print "a"x256;'` > > > > ? Which seems a bit simpler to me. > > > > > > > > > + # now try to do rename with RENAME_WHITEOUT flag > > > > + i=0 > > > > + while [ $i -le $files ]; do > > > > + src/renameat2 -w $SCRATCH_MNT/f$i $renamedir/$longnamepre$i >/dev/null 2>&1 > > > > + let i=$i+1 > > > > + done > > > > +} > > > > + > > > > +create_file() > > > > +{ > > > > + # create the targetdir > > > > + createdir=$SCRATCH_MNT/createdir > > > > + mkdir $createdir > > > > + > > > > + # try to create file at the same time to hit the deadlock > > > > + i=0 > > > > + while [ $i -le $files ]; do > > > > + file=$createdir/f$i > > > > + echo > $file >/dev/null 2>&1 > > > > + let i=$i+1 > > > > + done > > > > +} > > > > > > You could generalize this function to take a target directory parameter > > > and just call it twice (once to prepare and again for the create > > > workload). > > > > > > > + > > > > +_scratch_mkfs_xfs -bsize=1024 -dagcount=1 >> $seqres.full 2>&1 || > > > > + _fail "mkfs failed" > > > > > > Why -bsize=1k? Does that make the reproducer more effective? > > > > > > > +_scratch_mount > > > > + > > > > +files=250000 > > > > + > > > > > > Have you tested effectiveness of reproducing the issue with smaller file > > > counts? A brief comment here to document where the value comes from > > > might be useful. Somewhat related, how long does this test take on fixed > > > kernels? > > > > > > > +prepare_file > > > > +rename_whiteout & > > > > +create_file & > > > > + > > > > +wait > > > > +echo Silence is golden > > > > + > > > > +# Failure comes in the form of a deadlock. > > > > + > > > > > > I wonder if this should be in the dangerous group as well. I go back and > > > forth on that though because I tend to filter out dangerous tests and > > > the test won't be so risky once the fix proliferates. Perhaps that's > > > just a matter of removing it from the dangerous group after a long > > > enough period of time. > > > > The deadlock has been fixed, so I think it's fine to leave dangerous > > group. > > > > Do you mean to leave it in or out? Don't add 'dangerous' group. > > In general, what's the approach for dealing with dangerous tests that > are no longer dangerous? Leave them indefinitely or remove them after a > period of time? I tend to skip dangerous tests on regression runs just > because I'm not looking for a deadlock or crash to disrupt a long > running test. I think the definition of 'dangerous' group is a bit vague over time (at least to me), I'd define it as 'would crash or hang the test machine and cause tests to be interrupted & re-run'. So I'd remove them after the problem is fixed in latest upstream kernel/userspace tools. And I tend to not add new 'dangerous' tests, as we usually push such tests to fstests upstream only after the fixes are upstreamed, and only add new 'dangerous' tests when we know the fix is not going to be there for the near future. Thanks, Eryu ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT 2019-09-13 17:36 ` Brian Foster 2019-09-15 3:34 ` Eryu Guan @ 2019-09-16 5:12 ` kaixuxia 2019-09-16 6:33 ` Eryu Guan 1 sibling, 1 reply; 7+ messages in thread From: kaixuxia @ 2019-09-16 5:12 UTC (permalink / raw) To: Brian Foster Cc: fstests, linux-xfs, Eryu Guan, Darrick J. Wong, newtongao, jasperwang On 2019/9/14 1:36, Brian Foster wrote: > On Wed, Sep 11, 2019 at 09:17:08PM +0800, kaixuxia wrote: >> There is ABBA deadlock bug between the AGI and AGF when performing >> rename() with RENAME_WHITEOUT flag, and add this testcase to make >> sure the rename() call works well. >> >> Signed-off-by: kaixuxia <kaixuxia@tencent.com> >> --- >> tests/xfs/512 | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> tests/xfs/512.out | 2 ++ >> tests/xfs/group | 1 + >> 3 files changed, 102 insertions(+) >> create mode 100755 tests/xfs/512 >> create mode 100644 tests/xfs/512.out >> >> diff --git a/tests/xfs/512 b/tests/xfs/512 >> new file mode 100755 >> index 0000000..754f102 >> --- /dev/null >> +++ b/tests/xfs/512 >> @@ -0,0 +1,99 @@ >> +#! /bin/bash >> +# SPDX-License-Identifier: GPL-2.0 >> +# Copyright (c) 2019 Tencent. All Rights Reserved. >> +# >> +# FS QA Test 512 >> +# >> +# Test the ABBA deadlock case between the AGI and AGF When performing >> +# rename operation with RENAME_WHITEOUT flag. >> +# >> +seq=`basename $0` >> +seqres=$RESULT_DIR/$seq >> +echo "QA output created by $seq" >> + >> +here=`pwd` >> +tmp=/tmp/$$ >> +status=1 # failure is the default! >> +trap "_cleanup; exit \$status" 0 1 2 3 15 >> + >> +_cleanup() >> +{ >> + cd / >> + rm -f $tmp.* >> +} >> + >> +# get standard environment, filters and checks >> +. ./common/rc >> +. ./common/filter >> +. ./common/renameat2 >> + >> +rm -f $seqres.full >> + >> +# real QA test starts here >> +_supported_fs xfs >> +_supported_os Linux >> +_require_scratch_nocheck > > Why _nocheck? AFAICT the filesystem shouldn't end up intentionally > corrupted. Will add the comment in the next version. > >> +_requires_renameat2 whiteout >> + >> +prepare_file() >> +{ >> + # create many small files for the rename with RENAME_WHITEOUT >> + i=0 >> + while [ $i -le $files ]; do >> + file=$SCRATCH_MNT/f$i >> + echo > $file >/dev/null 2>&1 >> + let i=$i+1 >> + done > > Something like the following is a bit more simple, IMO: > > for i in $(seq 1 $files); do > touch $SCRATCH_MNT/f.$i > done > > The same goes for the other while loops below that increment up to > $files. > >> +} >> + >> +rename_whiteout() >> +{ >> + # create the rename targetdir >> + renamedir=$SCRATCH_MNT/renamedir >> + mkdir $renamedir >> + >> + # a long filename could increase the possibility that target_dp >> + # allocate new blocks(acquire the AGF lock) to store the filename >> + longnamepre=FFFsafdsagafsadfagasdjfalskdgakdlsglkasdg >> + > > The max filename length is 256 bytes. You could do something like the > following to increase name length (leaving room for the file index and > terminating NULL) if it helps the test: > > prefix=`for i in $(seq 0 245); do echo -n a; done` > >> + # now try to do rename with RENAME_WHITEOUT flag >> + i=0 >> + while [ $i -le $files ]; do >> + src/renameat2 -w $SCRATCH_MNT/f$i $renamedir/$longnamepre$i >/dev/null 2>&1 >> + let i=$i+1 >> + done >> +} >> + >> +create_file() >> +{ >> + # create the targetdir >> + createdir=$SCRATCH_MNT/createdir >> + mkdir $createdir >> + >> + # try to create file at the same time to hit the deadlock >> + i=0 >> + while [ $i -le $files ]; do >> + file=$createdir/f$i >> + echo > $file >/dev/null 2>&1 >> + let i=$i+1 >> + done >> +} > > You could generalize this function to take a target directory parameter > and just call it twice (once to prepare and again for the create > workload). Right. > >> + >> +_scratch_mkfs_xfs -bsize=1024 -dagcount=1 >> $seqres.full 2>&1 || >> + _fail "mkfs failed" > > Why -bsize=1k? Does that make the reproducer more effective? > The smaller block size, the more frequently that the blocks are allocated when creating files... >> +_scratch_mount >> + >> +files=250000 >> + > > Have you tested effectiveness of reproducing the issue with smaller file > counts? A brief comment here to document where the value comes from > might be useful. Somewhat related, how long does this test take on fixed > kernels? Hmm, 250000 is just a random number big enough for this test. The smaller file counts maybe is enough. Actually, the create_file() call run fast than the rename_whiteout() call, so we can choose two different file counts for them. Anyway, I would address the file counts problem in V3... > >> +prepare_file >> +rename_whiteout & >> +create_file & >> + >> +wait >> +echo Silence is golden >> + >> +# Failure comes in the form of a deadlock. >> + > > I wonder if this should be in the dangerous group as well. I go back and > forth on that though because I tend to filter out dangerous tests and > the test won't be so risky once the fix proliferates. Perhaps that's > just a matter of removing it from the dangerous group after a long > enough period of time. > > Brian > >> +# success, all done >> +status=0 >> +exit >> diff --git a/tests/xfs/512.out b/tests/xfs/512.out >> new file mode 100644 >> index 0000000..0aabdef >> --- /dev/null >> +++ b/tests/xfs/512.out >> @@ -0,0 +1,2 @@ >> +QA output created by 512 >> +Silence is golden >> diff --git a/tests/xfs/group b/tests/xfs/group >> index a7ad300..ed250d6 100644 >> --- a/tests/xfs/group >> +++ b/tests/xfs/group >> @@ -509,3 +509,4 @@ >> 509 auto ioctl >> 510 auto ioctl quick >> 511 auto quick quota >> +512 auto rename >> -- >> 1.8.3.1 >> >> -- >> kaixuxia -- kaixuxia ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT 2019-09-16 5:12 ` kaixuxia @ 2019-09-16 6:33 ` Eryu Guan 0 siblings, 0 replies; 7+ messages in thread From: Eryu Guan @ 2019-09-16 6:33 UTC (permalink / raw) To: kaixuxia Cc: Brian Foster, fstests, linux-xfs, Eryu Guan, Darrick J. Wong, newtongao, jasperwang On Mon, Sep 16, 2019 at 01:12:37PM +0800, kaixuxia wrote: > > On 2019/9/14 1:36, Brian Foster wrote: > > On Wed, Sep 11, 2019 at 09:17:08PM +0800, kaixuxia wrote: > >> There is ABBA deadlock bug between the AGI and AGF when performing > >> rename() with RENAME_WHITEOUT flag, and add this testcase to make > >> sure the rename() call works well. > >> > >> Signed-off-by: kaixuxia <kaixuxia@tencent.com> > >> --- > >> tests/xfs/512 | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >> tests/xfs/512.out | 2 ++ > >> tests/xfs/group | 1 + > >> 3 files changed, 102 insertions(+) > >> create mode 100755 tests/xfs/512 > >> create mode 100644 tests/xfs/512.out > >> > >> diff --git a/tests/xfs/512 b/tests/xfs/512 > >> new file mode 100755 > >> index 0000000..754f102 > >> --- /dev/null > >> +++ b/tests/xfs/512 > >> @@ -0,0 +1,99 @@ > >> +#! /bin/bash > >> +# SPDX-License-Identifier: GPL-2.0 > >> +# Copyright (c) 2019 Tencent. All Rights Reserved. > >> +# > >> +# FS QA Test 512 > >> +# > >> +# Test the ABBA deadlock case between the AGI and AGF When performing > >> +# rename operation with RENAME_WHITEOUT flag. > >> +# > >> +seq=`basename $0` > >> +seqres=$RESULT_DIR/$seq > >> +echo "QA output created by $seq" > >> + > >> +here=`pwd` > >> +tmp=/tmp/$$ > >> +status=1 # failure is the default! > >> +trap "_cleanup; exit \$status" 0 1 2 3 15 > >> + > >> +_cleanup() > >> +{ > >> + cd / > >> + rm -f $tmp.* > >> +} > >> + > >> +# get standard environment, filters and checks > >> +. ./common/rc > >> +. ./common/filter > >> +. ./common/renameat2 > >> + > >> +rm -f $seqres.full > >> + > >> +# real QA test starts here > >> +_supported_fs xfs > >> +_supported_os Linux > >> +_require_scratch_nocheck > > > > Why _nocheck? AFAICT the filesystem shouldn't end up intentionally > > corrupted. > > Will add the comment in the next version. > > > >> +_requires_renameat2 whiteout > >> + > >> +prepare_file() > >> +{ > >> + # create many small files for the rename with RENAME_WHITEOUT > >> + i=0 > >> + while [ $i -le $files ]; do > >> + file=$SCRATCH_MNT/f$i > >> + echo > $file >/dev/null 2>&1 > >> + let i=$i+1 > >> + done > > > > Something like the following is a bit more simple, IMO: > > > > for i in $(seq 1 $files); do > > touch $SCRATCH_MNT/f.$i > > done > > > > The same goes for the other while loops below that increment up to > > $files. > > > >> +} > >> + > >> +rename_whiteout() > >> +{ > >> + # create the rename targetdir > >> + renamedir=$SCRATCH_MNT/renamedir > >> + mkdir $renamedir > >> + > >> + # a long filename could increase the possibility that target_dp > >> + # allocate new blocks(acquire the AGF lock) to store the filename > >> + longnamepre=FFFsafdsagafsadfagasdjfalskdgakdlsglkasdg > >> + > > > > The max filename length is 256 bytes. You could do something like the > > following to increase name length (leaving room for the file index and > > terminating NULL) if it helps the test: > > > > prefix=`for i in $(seq 0 245); do echo -n a; done` > > > >> + # now try to do rename with RENAME_WHITEOUT flag > >> + i=0 > >> + while [ $i -le $files ]; do > >> + src/renameat2 -w $SCRATCH_MNT/f$i $renamedir/$longnamepre$i >/dev/null 2>&1 > >> + let i=$i+1 > >> + done > >> +} > >> + > >> +create_file() > >> +{ > >> + # create the targetdir > >> + createdir=$SCRATCH_MNT/createdir > >> + mkdir $createdir > >> + > >> + # try to create file at the same time to hit the deadlock > >> + i=0 > >> + while [ $i -le $files ]; do > >> + file=$createdir/f$i > >> + echo > $file >/dev/null 2>&1 > >> + let i=$i+1 > >> + done > >> +} > > > > You could generalize this function to take a target directory parameter > > and just call it twice (once to prepare and again for the create > > workload). > > Right. > > > >> + > >> +_scratch_mkfs_xfs -bsize=1024 -dagcount=1 >> $seqres.full 2>&1 || > >> + _fail "mkfs failed" > > > > Why -bsize=1k? Does that make the reproducer more effective? > > > The smaller block size, the more frequently that the blocks are allocated > when creating files... If 1k block size is not a "must have", we could leave that part to MKFS_OPTIONS and don't specify block size in the test, i.e. testing with different MKFS_OPTIONS could cover various block sizes, e.g. MKFS_OPTIONS="-bsize=1k,-m crc=1,reflink=1" ./check ... > >> +_scratch_mount > >> + > >> +files=250000 > >> + > > > > Have you tested effectiveness of reproducing the issue with smaller file > > counts? A brief comment here to document where the value comes from > > might be useful. Somewhat related, how long does this test take on fixed > > kernels? > > Hmm, 250000 is just a random number big enough for this test. The smaller > file counts maybe is enough. Actually, the create_file() call run fast > than the rename_whiteout() call, so we can choose two different file > counts for them. Anyway, I would address the file counts problem in V3... Yeah, it's good to figure out a small file count (no need to be smallest) that could reproduce the deadlock reliably. Thanks, Eryu > > > >> +prepare_file > >> +rename_whiteout & > >> +create_file & > >> + > >> +wait > >> +echo Silence is golden > >> + > >> +# Failure comes in the form of a deadlock. > >> + > > > > I wonder if this should be in the dangerous group as well. I go back and > > forth on that though because I tend to filter out dangerous tests and > > the test won't be so risky once the fix proliferates. Perhaps that's > > just a matter of removing it from the dangerous group after a long > > enough period of time. > > > > Brian > > > >> +# success, all done > >> +status=0 > >> +exit > >> diff --git a/tests/xfs/512.out b/tests/xfs/512.out > >> new file mode 100644 > >> index 0000000..0aabdef > >> --- /dev/null > >> +++ b/tests/xfs/512.out > >> @@ -0,0 +1,2 @@ > >> +QA output created by 512 > >> +Silence is golden > >> diff --git a/tests/xfs/group b/tests/xfs/group > >> index a7ad300..ed250d6 100644 > >> --- a/tests/xfs/group > >> +++ b/tests/xfs/group > >> @@ -509,3 +509,4 @@ > >> 509 auto ioctl > >> 510 auto ioctl quick > >> 511 auto quick quota > >> +512 auto rename > >> -- > >> 1.8.3.1 > >> > >> -- > >> kaixuxia > > -- > kaixuxia ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-09-16 6:33 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-09-11 13:17 [PATCH 2/2] xfs: test the deadlock between the AGI and AGF with RENAME_WHITEOUT kaixuxia 2019-09-13 17:36 ` Brian Foster 2019-09-15 3:34 ` Eryu Guan 2019-09-15 11:47 ` Brian Foster 2019-09-15 13:27 ` Eryu Guan 2019-09-16 5:12 ` kaixuxia 2019-09-16 6:33 ` Eryu Guan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).