From: Filipe Manana <fdmanana@gmail.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	fstests <fstests@vger.kernel.org>
Subject: Re: [PATCH v3] btrfs: Add a test for dead looping balance after balance cancel
Date: Wed, 20 May 2020 13:38:11 +0100
Message-ID: <CAL3q7H7ofKeuRrsGDdBdzxjOqNcF03E9Z0fpPdpbf86=8eTwFQ@mail.gmail.com>
In-Reply-To: <e374561f-397a-8ab3-e2d4-cac423da6636@gmx.com>

On Wed, May 20, 2020 at 12:32 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2020/5/20 6:02 PM, Qu Wenruo wrote:
> >
> >
> > On 2020/5/20 5:29 PM, Filipe Manana wrote:
> >> On Wed, May 20, 2020 at 8:59 AM Qu Wenruo <wqu@suse.com> wrote:
> >>>
> >>> Test if canceling a running balance can cause later balance to dead
> >>> loop.
> >>>
> >>> The fix is titled "btrfs: relocation: Clear the DEAD_RELOC_TREE bit for
> >>>  orphan roots to prevent runaway balance".
> >>>
> >>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >>> ---
> >>> Changelog:
> >>> v2:
> >>> - Remove lsof debug output
> >>> v3:
> >>> - Remove ps debug output
> >>> ---
> >>>  tests/btrfs/213     | 64 +++++++++++++++++++++++++++++++++++++++++++++
> >>>  tests/btrfs/213.out |  2 ++
> >>>  tests/btrfs/group   |  1 +
> >>>  3 files changed, 67 insertions(+)
> >>>  create mode 100755 tests/btrfs/213
> >>>  create mode 100644 tests/btrfs/213.out
> >>>
> >>> diff --git a/tests/btrfs/213 b/tests/btrfs/213
> >>> new file mode 100755
> >>> index 00000000..f56b0279
> >>> --- /dev/null
> >>> +++ b/tests/btrfs/213
> >>> @@ -0,0 +1,64 @@
> >>> +#! /bin/bash
> >>> +# SPDX-License-Identifier: GPL-2.0
> >>> +# Copyright (C) 2020 SUSE Linux Products GmbH. All Rights Reserved.
> >>> +#
> >>> +# FS QA Test 213
> >>> +#
> >>> +# Test if canceling a running balance can lead to dead looping balance
> >>> +#
> >>> +seq=`basename $0`
> >>> +seqres=$RESULT_DIR/$seq
> >>> +echo "QA output created by $seq"
> >>> +
> >>> +here=`pwd`
> >>> +tmp=/tmp/$$
> >>> +status=1       # failure is the default!
> >>> +trap "_cleanup; exit \$status" 0 1 2 3 15
> >>> +
> >>> +_cleanup()
> >>> +{
> >>> +       cd /
> >>> +       rm -f $tmp.*
> >>> +}
> >>> +
> >>> +# get standard environment, filters and checks
> >>> +. ./common/rc
> >>> +. ./common/filter
> >>> +
> >>> +# remove previous $seqres.full before test
> >>> +rm -f $seqres.full
> >>> +
> >>> +# Modify as appropriate.
> >>> +_supported_fs btrfs
> >>> +_supported_os Linux
> >>> +_require_scratch
> >>> +_require_command "$KILLALL_PROG" killall
> >>> +
> >>> +_scratch_mkfs >> $seqres.full
> >>> +_scratch_mount
> >>> +
> >>> +runtime=4
> >>> +
> >>> +# Create enough IO so that we need around $runtime seconds to relocate it
> >>> +dd if=/dev/zero bs=1M of="$SCRATCH_MNT/file" oflag=sync status=none \
> >>> +       &> /dev/null &
> >>> +dd_pid=$!
> >>> +sleep $runtime
> >>> +"$KILLALL_PROG" -q dd &> /dev/null
> >>
> >> Do we really need the killall program? There's only one dd process.
> >>
> >> We should also kill the dd process at _cleanup(), as killing the test
> >> during the sleep above will result in the dd process not being killed.
> >
> > The main problem here is that I can't find a good way to kill dd.
> > A plain 'kill $dd_pid' doesn't seem to kill it properly: my debug
> > lsof/ps still shows dd running, and the fs fails to unmount.
> >
> > 'kill -KILL $dd_pid' kills it well, but causes extra output for the
> > terminated dd.
> >
> > Only 'killall -q dd' works as expected.
> >
> > Any good advice on this?
>
> Oh, fstests "kindly" overrides dd, so "kill $dd_pid" only kills the
> child bash running that dd function, not the real dd command.
> I'll use xfs_io to ensure we get no extra wrapper.

Indeed, I was doing this in the test:

ps -eo pid,ppid,pgid,comm > /tmp/ps.txt
my_pid=$$
echo "dd_pid = $dd_pid mypid = $my_pid" >> /tmp/ps.txt
kill -9 $dd_pid
wait $dd_pid

And was noticing that the ps.txt file had:

23467   937 23467 check
...
23840 23467 23467 213
...
24034 23840 23467 213
24039 24034 23467 dd
...
dd_pid = 24034 mypid = 23840

So dd_pid (24034) matched a bash subshell running the test: a child of
the main test process (23840) and the parent of the real dd process (24039).

Before digging further I saw your reply and checked that common/rc
provides a function named dd(), a wrapper around dd that invokes
dd twice, which is why you needed killall.
Interesting, I wasn't aware of that wrapper.
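
For reference, a minimal sketch of the effect seen in the ps listing above.
It does not use the real dd() from common/rc: wrapper() is a hypothetical
stand-in function, sleep stands in for dd, and the pidfile exists only so
the demo can learn the PID of the real command. Under those assumptions it
shows that $! after backgrounding a shell function is the PID of the
subshell running the function, and that killing that subshell leaves the
real command running.

```shell
#!/bin/bash
# Sketch only: wrapper() is a hypothetical stand-in for a shell function that
# shadows a real command. It is NOT the dd() wrapper from common/rc.
pidfile=$(mktemp)

wrapper()
{
	# Backgrounded here only so the demo can record the real command's PID.
	sleep 30 &
	echo $! > "$pidfile"
	wait $!
}

wrapper &
wrapper_pid=$!		# PID of the subshell running wrapper(), not of sleep

# Wait until the function has recorded the PID of the real command.
while [ ! -s "$pidfile" ]; do sleep 0.1; done
real_pid=$(cat "$pidfile")

# Same steps as in the test: kill what $! pointed at, then reap it.
kill -9 "$wrapper_pid"
wait "$wrapper_pid" 2>/dev/null

# The real command outlives the subshell that started it.
if kill -0 "$real_pid" 2>/dev/null; then
	survived=yes
else
	survived=no
fi
echo "wrapper_pid=$wrapper_pid real_pid=$real_pid survived=$survived"

# Clean up the orphaned command and the temporary file.
kill "$real_pid"
rm -f "$pidfile"
```

The two PIDs differ and the stand-in command is still alive after the
kill, which matches the case where dd kept running and the fs could not be
unmounted.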

Thanks.



>
> Thanks,
> Qu
>
> >
> >>
> >>> +wait $dd_pid
> >>> +
> >>> +# Now balance should take at least $runtime seconds, we can cancel it at
> >>> +# $runtime/2 to ensure a successful cancel.
> >>> +$BTRFS_UTIL_PROG balance start -d --bg "$SCRATCH_MNT"
> >>> +sleep $(($runtime / 2))
> >>> +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
> >>> +
> >>> +# Now check if we can finish relocating metadata, which should finish very
> >>> +# quickly
> >>> +$BTRFS_UTIL_PROG balance start -m "$SCRATCH_MNT" >> $seqres.full
> >>
> >> Why redirect this one to $seqres.full and not the other balance? What
> >> kind of useful information it provides?
> >
> > Not really much, just an indicator to show that the balance finishes as
> > expected.
> > And we don't want it in the golden output, as an mkfs change may create
> > more metadata chunks and cause a difference.
> >
> > For other ones, like the one to be canceled, we don't really care that much.
> >
> > Thanks,
> > Qu
> >>
> >>> +
> >>> +echo "Silence is golden"
> >>> +
> >>> +# success, all done
> >>> +status=0
> >>> +exit
> >>> diff --git a/tests/btrfs/213.out b/tests/btrfs/213.out
> >>> new file mode 100644
> >>> index 00000000..bd8f2430
> >>> --- /dev/null
> >>> +++ b/tests/btrfs/213.out
> >>> @@ -0,0 +1,2 @@
> >>> +QA output created by 213
> >>> +Silence is golden
> >>> diff --git a/tests/btrfs/group b/tests/btrfs/group
> >>> index 8d65bddd..fe4d5bb3 100644
> >>> --- a/tests/btrfs/group
> >>> +++ b/tests/btrfs/group
> >>> @@ -215,3 +215,4 @@
> >>>  210 auto quick qgroup snapshot
> >>>  211 auto quick log prealloc
> >>>  212 auto balance dangerous
> >>> +213 auto fast balance dangerous
> >>
> >> fast -> quick
> >>
> >> Thanks.
> >>
> >>> --
> >>> 2.26.2
> >>>
> >>
> >>
> >
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
