From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42AA0C81857 for ; Mon, 27 Apr 2020 23:07:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F0B7A20775 for ; Mon, 27 Apr 2020 23:07:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MrU9agQB" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726233AbgD0XHJ (ORCPT ); Mon, 27 Apr 2020 19:07:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59084 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1726224AbgD0XHJ (ORCPT ); Mon, 27 Apr 2020 19:07:09 -0400 Received: from mail-io1-xd42.google.com (mail-io1-xd42.google.com [IPv6:2607:f8b0:4864:20::d42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 24685C0610D5 for ; Mon, 27 Apr 2020 16:07:09 -0700 (PDT) Received: by mail-io1-xd42.google.com with SMTP id o127so20844801iof.0 for ; Mon, 27 Apr 2020 16:07:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=LbZ3KNynKElNXXUKdSx7sKX2j+VYwSw0W9lyFQc/Xi4=; b=MrU9agQB0vV8PzGyLUNvAnCc9KdQUjyBp5iH/xYOIcl3SCQOrs+9MbQgwtKnmCgqZN 7J8m1huYvjByl18D1AdDTX4uH1ZuHKXuE2Z7QDGPcx/RJHBxKMdKbXGouSr8g28UBA9o hu7F1vg26I5RuIkXaSoIYjlID8Pi6GpRQDNmh7UDHoK1NfBbJhLsbTqriM+IZbBUSVCx xjfOxh88NViU/LubGuPme000E0YQ+25ZiLnbMr5sT9LEM2Ix1wBUX3JmxuPdRmKxgnd5 BOimaS/gICrtNeAH8/JEPiigZo1d6l9yn2A4kKgj9HeInmPlq9Uyp2g6m+Dw41NPldyh Yx0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=LbZ3KNynKElNXXUKdSx7sKX2j+VYwSw0W9lyFQc/Xi4=; b=aA9M7+r7dqopetn9JJJRW70Rtvw9PFNEzNieqU39LlNhmPvdhKQEuy9pRFkEPhvAqD bHjWUS0cFJyf4hV1OnTqSVlH9ZaxPylw1Aq/BLZiIqplN2mJdvU14+K8dmabC1wqpVRZ SBNBgM0RlbcJ6Fm7bUOqyBRpemuPrg9cTtxorEalk0yQNfizsJx1XHYx+zsTLHNw1ncq okZc4JjBXL4bVmhu0qs0TMdKxiyYDe5twbRAdwq2vIJMXpv9eDOi5V8/xKgtkW3Nj3Vy ytqYk4Dw+Y8s9N+OUcyo5wOaPvNoTKLTtVYmKcGwbta5I+UQvXqii3nExLHCqR1M3WXK Uxhw== X-Gm-Message-State: AGi0PuYj7F095qHZBpu4cDForjrG6n2AR5GdLAEy9NXotNkCHuI74bHe srksJvR87D2Hfw7ujNG8eDIdSVHW7WABzFQkIwWVea0Ay0jx X-Google-Smtp-Source: APiQypKyFB/ycJEE67dYqKBn/tbx3J/SrQdv52y1hv5gONaoe4TZ2yMITbHfbt08xdN2a+w2o9zQkiwudCM8hGh/yok= X-Received: by 2002:a05:6638:3d2:: with SMTP id r18mr23503412jaq.6.1588028828093; Mon, 27 Apr 2020 16:07:08 -0700 (PDT) MIME-Version: 1.0 References: <20200424165117.6433-1-raghavan.arvind@gmail.com> In-Reply-To: From: Arvind Raghavan Date: Mon, 27 Apr 2020 18:06:56 -0500 Message-ID: Subject: Re: [PATCH] fstest: Crashmonkey rename tests ported to xfstest To: Amir Goldstein Cc: Jayashree Mohan , Vijay Chidambaram , fstests@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Sender: fstests-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: fstests@vger.kernel.org On Mon, Apr 27, 2020 at 5:07 PM Arvind Raghavan wrote: > > Thanks for the feedback! > > On Sun, Apr 26, 2020 at 2:29 AM Amir Goldstein wrote: > > > > On Fri, Apr 24, 2020 at 7:52 PM Arvind Raghavan > > wrote: > > > > > > This patch aims to add tests to the xfstest suite to check whether > > > renamed files recover after a crash. These test cases were generated by > > > CrashMonkey, a crash-consistency testing framework built at the SASLab > > > at UT Austin. > > > > > > This patch batches 37 tests into a single test, each of which creates a > > > file, renames it, syncs one of the files/parent directories, and ensures > > > that the synced file/dir is equivalent before and after a crash. > > > > > > This test takes around 12-15 seconds to run locally and is thus placed > > > in the 'quick' group. > > > > > > Signed-off-by: Arvind Raghavan > > > > Nice work. Some things to improve... > > > > > --- > > > tests/generic/484 | 179 ++++++++++++++++++++++++++++++++++++++++++ > > > tests/generic/484.out | 2 + > > > tests/generic/group | 1 + > > > 3 files changed, 182 insertions(+) > > > create mode 100755 tests/generic/484 > > > create mode 100644 tests/generic/484.out > > > > > > diff --git a/tests/generic/484 b/tests/generic/484 > > > new file mode 100755 > > > index 00000000..b27d828d > > > --- /dev/null > > > +++ b/tests/generic/484 > > > @@ -0,0 +1,179 @@ > > > +#! /bin/bash > > > +# SPDX-License-Identifier: GPL-2.0 > > > +# Copyright (c) 2020 The University of Texas at Austin. All Rights Reserved. > > > +# > > > +# FS QA Test 484 > > > +# > > > +# Test case created by CrashMonkey > > > +# > > > +# Test if we create a rename a file and persist it that the > > > +# appropriate files exist. > > > +# > > > +seq=`basename $0` > > > +seqres=$RESULT_DIR/$seq > > > +echo "QA output created by $seq" > > > + > > > +here=`pwd` > > > +tmp=/tmp/$$ > > > +status=1 # failure is the default! > > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > > + > > > +_cleanup() > > > +{ > > > + _cleanup_flakey > > > + cd / > > > + rm -f $tmp.* > > > +} > > > + > > > +# get standard environment, filters and checks > > > +. ./common/rc > > > +. ./common/filter > > > +. ./common/dmflakey > > > + > > > +# 256MB in byte > > > +fssize=$((2**20 * 256)) > > > + > > > +# remove previous $seqres.full before test > > > +rm -f $seqres.full > > > + > > > +# real QA test starts here > > > +_supported_fs generic > > > +_supported_os Linux > > > +_require_scratch_nocheck > > > +_require_dm_target flakey > > > + > > > +# initialize scratch device > > > +_scratch_mkfs_sized $fssize >> $seqres.full 2>&1 > > > > Why limit the fs size? > > In general this results is test coverage for non real life filesystems, > > so sub-optimal IMO. > > It was an artifact from the originally Crashmonkey test harness, but > we can go ahead and remove it now. > > > > +_require_metadata_journaling $SCRATCH_DEV > > > +_init_flakey > > > + > > > +stat_opt='-c %n - blocks: %b size: %s inode: %i links: %h' > > > + > > > +# Usage: general_stat [--data-only] file1 [--data-only] [file2] .. > > > +# If the --data-only flag precedes a file, then only that file's > > > +# data will be printed. If a file is synced, then general_stat > > > +# should provide identical output before and after a crash. > > > > This helper seems overly complicated for the use case in this > > test, but it is local to the test. I understand it is auto generated for > > other tests but that's not the way to go. > > I suggest you look into using existing FSSUM_PROG tool. > > I agree that seems like a more elegant solution. We essentially > have four cases that we need to check: > > (1) fsync file (meta + data) > (2) fdatasync file > (3) fsync dir > (4) fdatasync dir > > FSSUM_PROG works great for (3) and (4), since it looks like its > intended to be used only on directories. md5sum works well for > (2), but it doesn't capture the metadata needed for (1). > FSSUM_PROG has a way to print a full "manifest", which contains > the metadata/data hashes for each file, so I believe we can > grep/cut it out of the manifest output to test for (1). I'll send > out a V2 patch if you think this seems like a reasonable > approach. > > Also, see below for a note on FSSUM_PROG. Upon looking more closely into fssum, I've changed my mind. It looks like fssum walks a directory recursively, adding all file names, metadata, and data into a checksum. This is useful for a filesytem with stronger guarantees, but under POSIX standards, the only guarantees for fsycing a dir are that its immediate dirents are correct and that its metadata is synced, not necessarily the metadata of its children. Because of this, I believe it would be more readable and concise to rewrite this function instead of grep/cutting the output of fssum. I can clean it up by removing the echos and case/shifting and turn it into something that concisely expresses the following: fsync file -> stat file, md5sum file fdatasync file -> md5sum file fsync dir -> stat dir, ls dir fdatasync dir -> ls dir Would those changes make more sense? > > > +general_stat() { > > > + local data_only="false" > > > + while (( "$#" )); do > > > + case $1 in > > > + --data-only) > > > + data_only="true" > > > + ;; > > > + *) > > > + local path="$1" > > > + echo "-- $path --" > > > + if [ ! -e "$path" ]; then > > > + echo "Doesn't exist!" > > > + elif [ "$data_only" = "true" ] && [ -d "$path" ]; then > > > + echo "Directory Data" > > > + [ -z "$(ls -A $path)" ] || ls -1 "$path" | sort > > > > [ -z "$(ls -A $path)" ] || seems to add nothing. > > > > > + elif [ "$data_only" = "true" ]; then > > > + echo "File Data" > > > + od "$path" > > > + elif [ -d "$path" ]; then > > > + echo "Directory Metadata" > > > + stat "$stat_opt" "$path" > > > + echo "Directory Data" > > > + [ -z "$(ls -A $path)" ] || ls -1 "$path" | sort > > > + else > > > + echo "File Metadata" > > > + stat "$stat_opt" "$path" > > > + echo "File Data" > > > + od "$path" > > > + fi > > > + data_only="false" > > > + ;; > > > + esac > > > + shift > > > + done > > > +} > > > + > > > +check_consistency() > > > +{ > > > + before=$(general_stat "$@") > > > + _flakey_drop_and_remount > > > + after=$(general_stat "$@") > > > + > > > + if [ "$before" != "$after" ]; then > > > + echo -e "Before:\n$before" > > > + echo -e "After:\n$after" > > > + fi > > > + > > > + # Using _unmount_flakey nondeterministically fails here with > > > + # "target is busy". Calling 'sync' doesn't fix the issue. Lazy > > > + # unmount waits for lingering processes to exit. > > > > No way. None of the other flaky tests have this kludge. > > You must be doing something wrong, or there is another bug > > in the infrastructure that needs to be fixed instead of being papered over. > > Yeah, this was a bug on my end. I've fixed it and removed all of > the lazy unmounts. > > > > + $UMOUNT_PROG -l $SCRATCH_MNT > > > + _check_scratch_fs $FLAKEY_DEV > > > +} > > > + > > > +clean_dir() > > > +{ > > > + _mount_flakey > > > + rm -rf $(find $SCRATCH_MNT/* | grep -v "lost+found") > > > > Use testdir=$SCRATCH_MNT/testdir for base of your test and you > > won't need this blacklisting of lost+found. > > > > > + $UMOUNT_PROG -l $SCRATCH_MNT > > > > And maybe this is what you are doing wrong?? > > Why is this unmount lazy? > > > > > +} > > > + > > > +do_fsync_check() { > > > + local file="$1" > > > + local sync_op="$2" > > > + > > > + if [ $sync_op == "fdatasync" ]; then > > > + $XFS_IO_PROG -c "fdatasync" $file > > > > You don't really need to put this statement inside if/else > > > > > + check_consistency --data-only $file > > > > And you could probably also pass sync_op to check_consistency() > > instead of converting it here. In other words, this helper does not > > do much, so you could get rid of it. up to you. > > > > Thanks, > > Amir. > > I believe there is a minor bug in the fssum program. Below are > the relevant lines from the help output. > > usage: fssum > options: > ... > -[ugoamcdxe]: specify which fields to include in checksum > calculation. > ... > x : include xattrs > ... > ... > -x path : exclude path when building checksum > (multiple ok) > ... > > There seems to be a duplicate flag which prevents the user from > passing in -x to include the xattrs field in checksum > calculation. This can be fixed with a simple patch that changes > the latter flag to "-i path", and updates the documentation to > say "ignores path when building checksum". I've got such a commit > built locally than I can send out in a patch if you think it > makes sense. None of the tests in the repo use the flag as AFAIK, > but we will end up using it for the xattr tests.