From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=187c=6L=vger.kernel.org=fstests-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-6.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID,
	DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,
	SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 42AA0C81857
	for <linux-fstests@archiver.kernel.org>; Mon, 27 Apr 2020 23:07:10 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id F0B7A20775
	for <linux-fstests@archiver.kernel.org>; Mon, 27 Apr 2020 23:07:09 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MrU9agQB"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726233AbgD0XHJ (ORCPT
        <rfc822;linux-fstests@archiver.kernel.org>);
        Mon, 27 Apr 2020 19:07:09 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59084 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL)
        by vger.kernel.org with ESMTP id S1726224AbgD0XHJ (ORCPT
        <rfc822;fstests@vger.kernel.org>); Mon, 27 Apr 2020 19:07:09 -0400
Received: from mail-io1-xd42.google.com (mail-io1-xd42.google.com [IPv6:2607:f8b0:4864:20::d42])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 24685C0610D5
        for <fstests@vger.kernel.org>; Mon, 27 Apr 2020 16:07:09 -0700 (PDT)
Received: by mail-io1-xd42.google.com with SMTP id o127so20844801iof.0
        for <fstests@vger.kernel.org>; Mon, 27 Apr 2020 16:07:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=LbZ3KNynKElNXXUKdSx7sKX2j+VYwSw0W9lyFQc/Xi4=;
        b=MrU9agQB0vV8PzGyLUNvAnCc9KdQUjyBp5iH/xYOIcl3SCQOrs+9MbQgwtKnmCgqZN
         7J8m1huYvjByl18D1AdDTX4uH1ZuHKXuE2Z7QDGPcx/RJHBxKMdKbXGouSr8g28UBA9o
         hu7F1vg26I5RuIkXaSoIYjlID8Pi6GpRQDNmh7UDHoK1NfBbJhLsbTqriM+IZbBUSVCx
         xjfOxh88NViU/LubGuPme000E0YQ+25ZiLnbMr5sT9LEM2Ix1wBUX3JmxuPdRmKxgnd5
         BOimaS/gICrtNeAH8/JEPiigZo1d6l9yn2A4kKgj9HeInmPlq9Uyp2g6m+Dw41NPldyh
         Yx0Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=LbZ3KNynKElNXXUKdSx7sKX2j+VYwSw0W9lyFQc/Xi4=;
        b=aA9M7+r7dqopetn9JJJRW70Rtvw9PFNEzNieqU39LlNhmPvdhKQEuy9pRFkEPhvAqD
         bHjWUS0cFJyf4hV1OnTqSVlH9ZaxPylw1Aq/BLZiIqplN2mJdvU14+K8dmabC1wqpVRZ
         SBNBgM0RlbcJ6Fm7bUOqyBRpemuPrg9cTtxorEalk0yQNfizsJx1XHYx+zsTLHNw1ncq
         okZc4JjBXL4bVmhu0qs0TMdKxiyYDe5twbRAdwq2vIJMXpv9eDOi5V8/xKgtkW3Nj3Vy
         ytqYk4Dw+Y8s9N+OUcyo5wOaPvNoTKLTtVYmKcGwbta5I+UQvXqii3nExLHCqR1M3WXK
         Uxhw==
X-Gm-Message-State: AGi0PuYj7F095qHZBpu4cDForjrG6n2AR5GdLAEy9NXotNkCHuI74bHe
        srksJvR87D2Hfw7ujNG8eDIdSVHW7WABzFQkIwWVea0Ay0jx
X-Google-Smtp-Source: APiQypKyFB/ycJEE67dYqKBn/tbx3J/SrQdv52y1hv5gONaoe4TZ2yMITbHfbt08xdN2a+w2o9zQkiwudCM8hGh/yok=
X-Received: by 2002:a05:6638:3d2:: with SMTP id r18mr23503412jaq.6.1588028828093;
 Mon, 27 Apr 2020 16:07:08 -0700 (PDT)
MIME-Version: 1.0
References: <20200424165117.6433-1-raghavan.arvind@gmail.com>
 <CAOQ4uxi7ZgRbCUzCppmobDZ+2vAmUhzim16SmUgCMB0jYRMNkQ@mail.gmail.com> <CAM2wg3QmfjbAgAQLOGuLxoJwKcB2BxnEgb9LyuxF2S_Z1FFw6g@mail.gmail.com>
In-Reply-To: <CAM2wg3QmfjbAgAQLOGuLxoJwKcB2BxnEgb9LyuxF2S_Z1FFw6g@mail.gmail.com>
From:   Arvind Raghavan <raghavan.arvind@gmail.com>
Date:   Mon, 27 Apr 2020 18:06:56 -0500
Message-ID: <CAM2wg3SyE8eMNMN3mJoua0P8dprqXiAVoUhgzdND_hEmywCGXg@mail.gmail.com>
Subject: Re: [PATCH] fstest: Crashmonkey rename tests ported to xfstest
To:     Amir Goldstein <amir73il@gmail.com>
Cc:     Jayashree Mohan <jayashree2912@gmail.com>,
        Vijay Chidambaram <vijay@cs.utexas.edu>,
        fstests@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"
Sender: fstests-owner@vger.kernel.org
Precedence: bulk
List-ID: <fstests.vger.kernel.org>
X-Mailing-List: fstests@vger.kernel.org

On Mon, Apr 27, 2020 at 5:07 PM Arvind Raghavan
<raghavan.arvind@gmail.com> wrote:
>
> Thanks for the feedback!
>
> On Sun, Apr 26, 2020 at 2:29 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Fri, Apr 24, 2020 at 7:52 PM Arvind Raghavan
> > <raghavan.arvind@gmail.com> wrote:
> > >
> > > This patch aims to add tests to the xfstest suite to check whether
> > > renamed files recover after a crash. These test cases were generated by
> > > CrashMonkey, a crash-consistency testing framework built at the SASLab
> > > at UT Austin.
> > >
> > > This patch batches 37 tests into a single test, each of which creates a
> > > file, renames it, syncs one of the files/parent directories, and ensures
> > > that the synced file/dir is equivalent before and after a crash.
> > >
> > > This test takes around 12-15 seconds to run locally and is thus placed
> > > in the 'quick' group.
> > >
> > > Signed-off-by: Arvind Raghavan <raghavan.arvind@gmail.com>
> >
> > Nice work. Some things to improve...
> >
> > > ---
> > >  tests/generic/484     | 179 ++++++++++++++++++++++++++++++++++++++++++
> > >  tests/generic/484.out |   2 +
> > >  tests/generic/group   |   1 +
> > >  3 files changed, 182 insertions(+)
> > >  create mode 100755 tests/generic/484
> > >  create mode 100644 tests/generic/484.out
> > >
> > > diff --git a/tests/generic/484 b/tests/generic/484
> > > new file mode 100755
> > > index 00000000..b27d828d
> > > --- /dev/null
> > > +++ b/tests/generic/484
> > > @@ -0,0 +1,179 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2020 The University of Texas at Austin.  All Rights Reserved.
> > > +#
> > > +# FS QA Test 484
> > > +#
> > > +# Test case created by CrashMonkey
> > > +#
> > > +# Test if we create a rename a file and persist it that the
> > > +# appropriate files exist.
> > > +#
> > > +seq=`basename $0`
> > > +seqres=$RESULT_DIR/$seq
> > > +echo "QA output created by $seq"
> > > +
> > > +here=`pwd`
> > > +tmp=/tmp/$$
> > > +status=1       # failure is the default!
> > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > +
> > > +_cleanup()
> > > +{
> > > +       _cleanup_flakey
> > > +       cd /
> > > +       rm -f $tmp.*
> > > +}
> > > +
> > > +# get standard environment, filters and checks
> > > +. ./common/rc
> > > +. ./common/filter
> > > +. ./common/dmflakey
> > > +
> > > +# 256MB in byte
> > > +fssize=$((2**20 * 256))
> > > +
> > > +# remove previous $seqres.full before test
> > > +rm -f $seqres.full
> > > +
> > > +# real QA test starts here
> > > +_supported_fs generic
> > > +_supported_os Linux
> > > +_require_scratch_nocheck
> > > +_require_dm_target flakey
> > > +
> > > +# initialize scratch device
> > > +_scratch_mkfs_sized $fssize >> $seqres.full 2>&1
> >
> > Why limit the fs size?
> > In general this results is test coverage for non real life filesystems,
> > so sub-optimal IMO.
>
> It was an artifact from the originally Crashmonkey test harness, but
> we can go ahead and remove it now.
>
> > > +_require_metadata_journaling $SCRATCH_DEV
> > > +_init_flakey
> > > +
> > > +stat_opt='-c  %n - blocks: %b size: %s inode: %i links: %h'
> > > +
> > > +# Usage: general_stat [--data-only] file1 [--data-only] [file2] ..
> > > +# If the --data-only flag precedes a file, then only that file's
> > > +# data will be printed. If a file is synced, then general_stat
> > > +# should provide identical output before and after a crash.
> >
> > This helper seems overly complicated for the use case in this
> > test, but it is local to the test. I understand it is auto generated for
> > other tests but that's not the way to go.
> > I suggest you look into using existing FSSUM_PROG tool.
>
> I agree that seems like a more elegant solution. We essentially
> have four cases that we need to check:
>
> (1) fsync file (meta + data)
> (2) fdatasync file
> (3) fsync dir
> (4) fdatasync dir
>
> FSSUM_PROG works great for (3) and (4), since it looks like its
> intended to be used only on directories. md5sum works well for
> (2), but it doesn't capture the metadata needed for (1).
> FSSUM_PROG has a way to print a full "manifest", which contains
> the metadata/data hashes for each file, so I believe we can
> grep/cut it out of the manifest output to test for (1). I'll send
> out a V2 patch if you think this seems like a reasonable
> approach.
>
> Also, see below for a note on FSSUM_PROG.

Upon looking more closely into fssum, I've changed my mind. It
looks like fssum walks a directory recursively, adding all file
names, metadata, and data into a checksum. This is useful for a
filesytem with stronger guarantees, but under POSIX standards,
the only guarantees for fsycing a dir are that its immediate
dirents are correct and that its metadata is synced, not
necessarily the metadata of its children.

Because of this, I believe it would be more readable and concise
to rewrite this function instead of grep/cutting the output of
fssum. I can clean it up by removing the echos and case/shifting
and turn it into something that concisely expresses the
following:

fsync file     -> stat file, md5sum file
fdatasync file -> md5sum file
fsync dir      -> stat dir, ls dir
fdatasync dir  -> ls dir

Would those changes make more sense?

> > > +general_stat() {
> > > +       local data_only="false"
> > > +       while (( "$#" )); do
> > > +               case $1 in
> > > +                       --data-only)
> > > +                               data_only="true"
> > > +                               ;;
> > > +                       *)
> > > +                               local path="$1"
> > > +                               echo "-- $path --"
> > > +                               if [ ! -e "$path" ]; then
> > > +                                       echo "Doesn't exist!"
> > > +                               elif [ "$data_only" = "true" ] && [ -d "$path" ]; then
> > > +                                       echo "Directory Data"
> > > +                                       [ -z "$(ls -A $path)" ] || ls -1 "$path" | sort
> >
> > [ -z "$(ls -A $path)" ] || seems to add nothing.
> >
> > > +                               elif [ "$data_only" = "true" ]; then
> > > +                                       echo "File Data"
> > > +                                       od "$path"
> > > +                               elif [ -d "$path" ]; then
> > > +                                       echo "Directory Metadata"
> > > +                                       stat "$stat_opt" "$path"
> > > +                                       echo "Directory Data"
> > > +                                       [ -z "$(ls -A $path)" ] || ls -1 "$path" | sort
> > > +                               else
> > > +                                       echo "File Metadata"
> > > +                                       stat "$stat_opt" "$path"
> > > +                                       echo "File Data"
> > > +                                       od "$path"
> > > +                               fi
> > > +                               data_only="false"
> > > +                               ;;
> > > +               esac
> > > +               shift
> > > +       done
> > > +}
> > > +
> > > +check_consistency()
> > > +{
> > > +       before=$(general_stat "$@")
> > > +       _flakey_drop_and_remount
> > > +       after=$(general_stat "$@")
> > > +
> > > +       if [ "$before" != "$after" ]; then
> > > +               echo -e "Before:\n$before"
> > > +               echo -e "After:\n$after"
> > > +       fi
> > > +
> > > +       # Using _unmount_flakey nondeterministically fails here with
> > > +       # "target is busy". Calling 'sync' doesn't fix the issue. Lazy
> > > +       # unmount waits for lingering processes to exit.
> >
> > No way. None of the other flaky tests have this kludge.
> > You must be doing something wrong, or there is another bug
> > in the infrastructure that needs to be fixed instead of being papered over.
>
> Yeah, this was a bug on my end. I've fixed it and removed all of
> the lazy unmounts.
>
> > > +       $UMOUNT_PROG -l $SCRATCH_MNT
> > > +       _check_scratch_fs $FLAKEY_DEV
> > > +}
> > > +
> > > +clean_dir()
> > > +{
> > > +       _mount_flakey
> > > +       rm -rf $(find $SCRATCH_MNT/* | grep -v "lost+found")
> >
> > Use testdir=$SCRATCH_MNT/testdir for base of your test and you
> > won't need this blacklisting of lost+found.
> >
> > > +       $UMOUNT_PROG -l $SCRATCH_MNT
> >
> > And maybe this is what you are doing wrong??
> > Why is this unmount lazy?
> >
> > > +}
> > > +
> > > +do_fsync_check() {
> > > +       local file="$1"
> > > +       local sync_op="$2"
> > > +
> > > +       if [ $sync_op == "fdatasync" ]; then
> > > +               $XFS_IO_PROG -c "fdatasync" $file
> >
> > You don't really need to put this statement inside if/else
> >
> > > +               check_consistency --data-only $file
> >
> > And you could probably also pass sync_op to check_consistency()
> > instead of converting it here. In other words, this helper does not
> > do much, so you could get rid of it. up to you.
> >
> > Thanks,
> > Amir.
>
> I believe there is a minor bug in the fssum program. Below are
> the relevant lines from the help output.
>
> usage: fssum <options> <path>
>     options:
>     ...
>     -[ugoamcdxe]: specify which fields to include in checksum
>                   calculation.
>          ...
>          x      : include xattrs
>          ...
>     ...
>     -x path     : exclude path when building checksum
>                   (multiple ok)
>     ...
>
> There seems to be a duplicate flag which prevents the user from
> passing in -x to include the xattrs field in checksum
> calculation. This can be fixed with a simple patch that changes
> the latter flag to "-i path", and updates the documentation to
> say "ignores path when building checksum". I've got such a commit
> built locally than I can send out in a patch if you think it
> makes sense. None of the tests in the repo use the flag as AFAIK,
> but we will end up using it for the xattr tests.