All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eryu Guan <eguan@redhat.com>
To: Dave Chinner <david@fromorbit.com>, Brian Foster <bfoster@redhat.com>
Cc: xiao yang <yangx.jy@cn.fujitsu.com>,
	fstests@vger.kernel.org, darrick.wong@oracle.com
Subject: Re: [PATCH v2 2/2] xfs: Regression test for invalid sb_logsunit
Date: Tue, 16 Jan 2018 12:02:54 +0800	[thread overview]
Message-ID: <20180116040254.GE3102@eguan.usersys.redhat.com> (raw)
In-Reply-To: <20180115210352.GY16421@dastard>

On Tue, Jan 16, 2018 at 08:03:52AM +1100, Dave Chinner wrote:
> On Mon, Jan 15, 2018 at 07:45:23AM -0500, Brian Foster wrote:
> > On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote:
> > > On Fri, Jan 12, 2018 at 04:50:05PM +0800, Eryu Guan wrote:
> > > > On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote:
> > > > > On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote:
> > > > > > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote:
> > > > > > > If log stripe unit isn't a multiple of the fs blocksize and mounting,
> > > > > > > the invalid sb_logsunit leads to crash as soon as we try to write to
> > > > > > > the log.
> > > > > > > 
> > > > > > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com>
> > > > > > > ---
> > > > > > >  tests/xfs/437     | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > >  tests/xfs/437.out |  2 ++
> > > > > > >  tests/xfs/group   |  1 +
> > > > > > >  3 files changed, 76 insertions(+)
> > > > > > >  create mode 100755 tests/xfs/437
> > > > > > >  create mode 100644 tests/xfs/437.out
> > > > > > > 
> > > > > > > diff --git a/tests/xfs/437 b/tests/xfs/437
> > > > > > > new file mode 100755
> > > > > > > index 0000000..f2b84ad
> > > > > > > --- /dev/null
> > > > > > > +++ b/tests/xfs/437
> > > > > > > @@ -0,0 +1,73 @@
> > > > > > > +#! /bin/bash
> > > > > > > +# FS QA Test No. 437
> > > > > > > +#
> > > > > > > +# Regression test for commit:
> > > > > > > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize")
> > > > > > > +#
> > > > > > > +# If log stripe unit isn't a multiple of the fs blocksize and mounting,
> > > > > > > +# the invalid sb_logsunit leads to crash as soon as we try to write to
> > > > > > > +# the log.
> > > > > > > +#
> > > > > > > +#-----------------------------------------------------------------------
> > > > > > > +# Copyright (c) 2018 FUJITSU.  All Rights Reserved.
> > > > > > > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com>
> > > > > > > +#
> > > > > > > +# This program is free software; you can redistribute it and/or
> > > > > > > +# modify it under the terms of the GNU General Public License as
> > > > > > > +# published by the Free Software Foundation.
> > > > > > > +#
> > > > > > > +# This program is distributed in the hope that it would be useful,
> > > > > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > > > > +# GNU General Public License for more details.
> > > > > > > +#
> > > > > > > +# You should have received a copy of the GNU General Public License
> > > > > > > +# along with this program; if not, write the Free Software Foundation,
> > > > > > > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > > > > > > +#-----------------------------------------------------------------------
> > > > > > > +
> > > > > > > +seq=`basename "$0"`
> > > > > > > +seqres="$RESULT_DIR/$seq"
> > > > > > > +echo "QA output created by $seq"
> > > > > > > +
> > > > > > > +here=`pwd`
> > > > > > > +tmp=/tmp/$$
> > > > > > > +status=1    # failure is the default!
> > > > > > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > > > > > +
> > > > > > > +_cleanup()
> > > > > > > +{
> > > > > > > +	rm -rf $tmp.*
> > > > > > > +}
> > > > > > > +
> > > > > > > +# get standard environment and checks
> > > > > > > +. ./common/rc
> > > > > > > +
> > > > > > > +# real QA test starts here
> > > > > > > +_supported_os Linux
> > > > > > > +_supported_fs xfs
> > > > > > > +_require_scratch
> > > > > > 
> > > > > > This test triggers ASSERT failure and warning on debug build, thus
> > > > > > failed _dmesg_check, I think we need _disable_dmesg_check (and some
> > > > > > comments) too.
> > > > > > 
> > > > > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size
> > > > > > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort!
> > > > > > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679
> > > > > > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs]
> > > > > 
> > > > > If it triggers debug asserts which have been put there to catch bugs
> > > > > in other utilities (like mkfs) on a current upstream debug kernel,
> > > > > then the test should not be part of the auto and quick groups.
> > > > 
> > > > This test corrupts the filesystem on purpose (I didn't make this clear
> > > > in my last reply, sorry about that), and xfs detects the corruption &
> > > > refuses the mount instead of crashing. So I think the ASSERT failure is
> > > > expected on debug build and we just need to ignore it.
> > > 
> > > Except that after that assert, the block device is now busy and can't
> > > be released, so we can't use the scratch device anymore. All tests
> > > after this assert has been triggered are going to fail because
> > > the block device is busy....
> > > 
> > > That's the whole point of adding debug asserts in cases like this -
> > > they are supposed to stop test execution in it's tracks and leave a
> > > corpse to analyse. The auto group regression tests are not supposed
> > > to take the machine down on normal test configs (i.e.
> > > CONFIG_XFS_DEBUG=y).
> > > 
> > 
> > FWIW, my understanding of the dangerous group has always been that it's
> > for tests that when they trigger a regression, forcibly affect the
> > entire system as such (lockup, hang, crash, etc.). IMO, a test that

I had the same understanding of dangerous group. And I recommended the
usage of "-g auto -x dangerous" before[1], and Dave acknowledged this
dangerous group usage[2] :)

[1] https://www.spinics.net/lists/linux-btrfs/msg57312.html
[2] https://www.spinics.net/lists/linux-btrfs/msg57330.html

> > intentionally generates an XFS assert doesn't really follow the spirit
> > of that categorization, even though technically an assert can cause a
> > BUG. This is why we have CONFIG_XFS_ASSERT_FATAL now, because unless
> > that is disabled any auto test that reproduces a regression (assuming we
> > have broad/robust assert coverage) can very easily BUG() and have the
> > associated effect on the system.
> 
> And this is exactly why I opposed the CONFIG_XFS_ASSERT_FATAL
> inclusion. It started us down the slippery slope of allowing people
> to ignore asserts that indicate an impossible condition had
> occurred.
> 
> This was a bug found via fuzzing, not user problems in the wild. The
> assert had functioned perfectly well up to this point as a mechanism
> for preventing the XFS tools from creating a corrupt log stripe
> unit, and it still does....
> 
> That's the point here - a corrupt log stripe unit *should never
> happen* because it should trigger a CRC validation failure long
> before we get to parsing it. If we've got a corrupt value with a
> correct CRC, then we damn well need to understand the cause
> *immediately* because something else has gone very, very wrong.
> 
> > The flipside of course is that one should generally be able to run
> > xfstests against a FATAL_ASSERTS=y kernel that otherwise has no
> > regressions and not bork the system because a test generates a BUG() as
> > part of a successful run.
> 
> That is the historical behaviour of CONFIG_XFS_DEBUG=y and the
> historical policy for xfstests. Adding CONFIG_XFS_ASSERT_FATAL does
> not change that because (at least) some of use still treat asserts
> as fatal.
> 
> > Perhaps the root problem here is that the
> > kernel generates an assert from a codepath that "successfully" handles
> > the erroneous situation.
> 
> We do that all over the place - the assert is there because it
> indicates a fatal problem has occurred that should never happen.
> For production systems, we still have to handle that gracefully
> because people do fuzz testing of "should never happen" conditions
> and then get upset we fail to detect this. That does not mean
> the ASSERT is wrong - it makes it even more important that
> developers know when their code hits these cases....
> 
> If we really want to test these "should not ever happen" conditions
> that trigger asserts on debug kernels as "everyone always runs"
> regression testing, then these need to _notrun on CONFIG_XFS_DEBUG=y
> kernels.  fstests already knows that it's running on a debug kernel,
> so adding a _requires_production_kernel check may be the way around
> this being considered a dangerous test.

This reminds that we already have a _require_no_xfs_debug rule, as used
in xfs/115, which is known to trigger ASSERT failure on debug build. So
we can do the same in this new test.

Thanks,
Eryu

> 
> > We already spit out several error messages and
> > fail the mount, so my .02 (fwiw) would be to leave the test as
> > auto+dangerous, delete the useless assert and let people affected by
> > older kernels (w/ FATAL_ASSERT behavior) filter out the test as
> > appropriate in those environments. (Looking back, I see Darrick already
> > suggested to delete the assert as well...).
> 
> ... and this specific assert caught multiple bugs I introduced while
> refactoring mkfs recently. That's it's purpose, and it works for
> that purpose very well.
> 
> > > If you want to test dangerous regression tests, then you need to
> > > *opt-in* by adding "-g dangerous", not force everyone else to
> > > opt-out by having to run "-g auto -x dangerous".
> > > 
> > 
> > I often explicitly filter out the dangerous group as above because we
> > already have more than a handful of tests that are in both groups. I
> > don't think they've historically been mutually exclusive.
> 
> Which means we've already deviated from the historic policy. That
> doesn't mean it the right direction to take.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-01-16  4:02 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-11  8:25 [PATCH] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-11 18:07 ` Darrick J. Wong
2018-01-12  1:58   ` Xiao Yang
2018-01-12  6:14   ` [PATCH v2] syscalls/madvise09.c: Use custom mount point instead of /sys/fs/cgroup/memory xiao yang
2018-01-12  6:14     ` [PATCH v2] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-12  6:19     ` [PATCH v2] syscalls/madvise09.c: Use custom mount point instead of /sys/fs/cgroup/memory Xiao Yang
2018-01-12  6:16   ` [PATCH v2 1/2] common/xfs: Check if write supports [-c|-d] option in xfs_db xiao yang
2018-01-12  6:16     ` [PATCH v2 2/2] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-12  7:49       ` Eryu Guan
2018-01-12  8:36         ` Dave Chinner
2018-01-12  8:50           ` Eryu Guan
2018-01-12 16:41             ` Darrick J. Wong
2018-01-13  2:23             ` Dave Chinner
2018-01-15  6:29               ` Eryu Guan
2018-01-15  7:48                 ` [PATCH v3 1/3] common/xfs: Check if write supports [-c|-d] option in xfs_db xiao yang
2018-01-15  7:48                   ` [PATCH v3 2/3] common/filter: factor out expected XFS warnings for mount xiao yang
2018-01-15  7:48                   ` [PATCH v3 3/3] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-15 12:45               ` [PATCH v2 2/2] " Brian Foster
2018-01-15 21:03                 ` Dave Chinner
2018-01-16  4:02                   ` Eryu Guan [this message]
2018-01-16  6:41                     ` Xiao Yang
2018-01-16  7:26                     ` [PATCH v4 1/3] common/xfs: Check if write supports [-c|-d] option in xfs_db xiao yang
2018-01-16  7:26                       ` [PATCH v4 2/3] common/filter: factor out expected XFS warnings for mount xiao yang
2018-01-18  8:48                         ` Eryu Guan
2018-01-18  8:56                           ` Xiao Yang
2018-01-16  7:26                       ` [PATCH v4 3/3] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-18  8:46                         ` Eryu Guan
2018-01-18 10:49                           ` [PATCH v5 1/3] common/xfs: Check if write supports [-c|-d] option in xfs_db xiao yang
2018-01-18 10:49                             ` [PATCH v5 2/3] common/filter: Factor out expected XFS warnings for assert xiao yang
2018-01-18 18:29                               ` Darrick J. Wong
2018-01-19  2:51                                 ` Eryu Guan
2018-01-19  4:04                                 ` Xiao Yang
2018-01-19  5:38                                 ` [PATCH v6 2/3] common: Add _require_no_xfs_bug_on_assert && Factor out filter_xfs_dmesg xiao yang
2018-01-19  5:38                                   ` [PATCH v6 3/3] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-18 10:49                             ` [PATCH v5 " xiao yang
2018-01-18 18:19                             ` [PATCH v5 1/3] common/xfs: Check if write supports [-c|-d] option in xfs_db Darrick J. Wong
2018-01-16  8:50                     ` [PATCH v2 2/2] xfs: Regression test for invalid sb_logsunit Dave Chinner
2018-01-16 14:09                       ` Brian Foster
2018-01-18  8:44                         ` Eryu Guan
2018-01-16 13:58                   ` Brian Foster
2018-01-12  7:44     ` [PATCH v2 1/2] common/xfs: Check if write supports [-c|-d] option in xfs_db Eryu Guan
2018-01-12 16:43     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180116040254.GE3102@eguan.usersys.redhat.com \
    --to=eguan@redhat.com \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=fstests@vger.kernel.org \
    --cc=yangx.jy@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.