From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: guaneryu@gmail.com, linux-xfs@vger.kernel.org,
	fstests@vger.kernel.org, guan@eryu.me
Subject: Re: [PATCH 2/6] common: capture metadump output if xfs filesystem check fails
Date: Thu, 11 Feb 2021 13:35:24 -0500
Message-ID: <20210211183524.GH222065@bfoster>
In-Reply-To: <20210211181234.GE7193@magnolia>

On Thu, Feb 11, 2021 at 10:12:34AM -0800, Darrick J. Wong wrote:
> On Thu, Feb 11, 2021 at 08:59:58AM -0500, Brian Foster wrote:
> > On Tue, Feb 09, 2021 at 06:56:30PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Capture metadump output when various userspace repair and checker tools
> > > fail or indicate corruption, to aid in debugging.  We don't bother to
> > > annotate xfs_check because it's bitrotting.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  README     |    2 ++
> > >  common/xfs |   26 ++++++++++++++++++++++++++
> > >  2 files changed, 28 insertions(+)
> > > 
> > > 
> > > diff --git a/README b/README
> > > index 43bb0cee..36f72088 100644
> > > --- a/README
> > > +++ b/README
> > > @@ -109,6 +109,8 @@ Preparing system for tests:
> > >               - Set TEST_FS_MODULE_RELOAD=1 to unload the module and reload
> > >                 it between test invocations.  This assumes that the name of
> > >                 the module is the same as FSTYP.
> > > +	     - Set SNAPSHOT_CORRUPT_XFS=1 to record compressed metadumps of XFS
> > > +	       filesystems if the various stages of _check_xfs_filesystem fail.
> > >  
> > >          - or add a case to the switch in common/config assigning
> > >            these variables based on the hostname of your test
> > > diff --git a/common/xfs b/common/xfs
> > > index 2156749d..ad1eb6ee 100644
> > > --- a/common/xfs
> > > +++ b/common/xfs
> > > @@ -432,6 +432,21 @@ _supports_xfs_scrub()
> > >  	return 0
> > >  }
> > >  
> > > +# Save a compressed snapshot of a corrupt xfs filesystem for later debugging.
> > > +_snapshot_xfs() {
> > 
> > The term snapshot has a well known meaning. Can we just call this
> > _metadump_xfs()?
> 
> Ok.
> 
> > 
> > > +	local metadump="$1"
> > > +	local device="$2"
> > > +	local logdev="$3"
> > > +	local options="-a -o"
> > > +
> > > +	if [ "$logdev" != "none" ]; then
> > > +		options="$options -l $logdev"
> > > +	fi
> > > +
> > > +	$XFS_METADUMP_PROG $options "$device" "$metadump" >> "$seqres.full" 2>&1
> > > +	gzip -f "$metadump" >> "$seqres.full" 2>&1 &
> > 
> > Why compress in the background?
> 
> Sometimes the metadumps can become very large and I don't tend to have a
> lot of space on the test appliances for storing blobs.
> 
> Also, I was under the impression that it was customary for people to
> share compressed metadumps of crashes, so why not save everyone a step?
> 
> I do this in the background to avoid holding up the next fstest.
> 
> > I wonder if we should just skip the
> > compression step since this requires an option to enable in the first
> > place..
> 
> Seeing as it's optional, I think that's all the more reason to compress.
> 

That's fair. It was more the background task that I was concerned about.
If the issue is that the compression takes too long, ISTM there's a
similar risk of the background compression conflicting with ongoing
tests. E.g., we have various tests that scale out I/O threads to extreme
levels and could delay the compression even longer (or vice versa), and
we have no way to prevent multiple background compression tasks from
starting and competing as tests continue to run.

What about allowing the user to specify an optional env var in the
config file to provide a compression command to use? If set, compress
the file in the foreground. Then the user can determine whether
compression is necessary at all, and if so, which compression tool
provides a suitable time/space tradeoff for the test environment
(e.g., something like lz4 might be faster than gzip or bzip2 at the cost
of space).
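
A rough sketch of the idea, reusing the helper from the patch under the
_metadump_xfs name suggested above. DUMP_COMPRESSOR is a hypothetical
variable name picked for illustration, not an existing fstests knob, and
the early return on metadump failure is likewise an assumption rather
than something in the posted patch:

```shell
# Sketch only: DUMP_COMPRESSOR is a hypothetical config variable holding
# a compression command (and any flags) chosen by the user.
_metadump_xfs() {
	local metadump="$1"
	local device="$2"
	local logdev="$3"
	local options="-a -o"

	if [ "$logdev" != "none" ]; then
		options="$options -l $logdev"
	fi

	# Assumption: skip compression if the metadump itself failed.
	$XFS_METADUMP_PROG $options "$device" "$metadump" >> "$seqres.full" 2>&1 || return 1

	# No compressor configured: leave the raw metadump in place.
	[ -n "$DUMP_COMPRESSOR" ] || return 0

	# Compress in the foreground so the work cannot overlap the next test.
	$DUMP_COMPRESSOR "$metadump" >> "$seqres.full" 2>&1
}
```

With something like that, a user who wants the current behavior minus
the background job could set DUMP_COMPRESSOR="gzip -f" in the config
file, and anyone who leaves it unset gets uncompressed images.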

Brian

> > 
> > > +}
> > > +
> > >  # run xfs_check and friends on a FS.
> > >  _check_xfs_filesystem()
> > >  {
> > ...
> > > @@ -540,6 +564,8 @@ _check_xfs_filesystem()
> > >  			cat $tmp.repair				>>$seqres.full
> > >  			echo "*** end xfs_repair output"	>>$seqres.full
> > >  
> > > +			test "$SNAPSHOT_CORRUPT_XFS" = "1" && \
> > > +				_snapshot_xfs "$seqres.rebuildrepair.md" "$device" "$2"
> > 
> > Why do we collect so many metadump images? Shouldn't all but the last
> > TEST_XFS_REPAIR_REBUILD thing not modify the fs? If so, it seems like we
> > should be able to collect one image (and perhaps just call it
> > "$seqres.$device.md") if any of the first several checks flag a problem.
> 
> Yes, the number of metadumps collected can be reduced to two.  One if
> scrub or logprint or repair -n fail, and a second one if the user set
> TEST_XFS_REPAIR_REBUILD=1 and either the repair or the repair -n fail.
> 
> Will change that.
> 
> --D
> 
> > 
> > Brian
> > 
> > >  			ok=0
> > >  		fi
> > >  		rm -f $tmp.repair
> > > 
> > 
> 


Thread overview: 19+ messages
2021-02-10  2:56 [PATCHSET 0/6] fstests: various improvements to the test framework Darrick J. Wong
2021-02-10  2:56 ` [PATCH 1/6] config: wrap xfs_metadump as $XFS_METADUMP_PROG like the other tools Darrick J. Wong
2021-02-11 13:58   ` Brian Foster
2021-02-10  2:56 ` [PATCH 2/6] common: capture metadump output if xfs filesystem check fails Darrick J. Wong
2021-02-11 13:59   ` Brian Foster
2021-02-11 18:12     ` Darrick J. Wong
2021-02-11 18:35       ` Brian Foster [this message]
2021-02-11 19:05         ` Darrick J. Wong
2021-02-10  2:56 ` [PATCH 3/6] check: allow '-e testid' to exclude a single test Darrick J. Wong
2021-02-11 14:00   ` Brian Foster
2021-02-10  2:56 ` [PATCH 4/6] check: don't abort on non-existent excluded groups Darrick J. Wong
2021-02-11 14:00   ` Brian Foster
2021-02-11 17:27     ` Darrick J. Wong
2021-02-11 18:01       ` Brian Foster
2021-02-10  2:56 ` [PATCH 5/6] check: run tests in exactly the order specified Darrick J. Wong
2021-02-11 14:00   ` Brian Foster
2021-02-11 17:28     ` Darrick J. Wong
2021-02-10  2:56 ` [PATCH 6/6] fuzzy: capture core dumps from repair utilities Darrick J. Wong
2021-02-11 14:00   ` Brian Foster