From: Jayashree Mohan
Date: Wed, 7 Nov 2018 19:41:50 -0600
Subject: Re: [PATCH] fstest: CrashMonkey tests ported to xfstest
To: Theodore Ts'o
Cc: Dave Chinner, Eryu Guan, fstests, Vijaychidambaram Velayudhan Pillai,
    Amir Goldstein, Filipe Manana

Hi all,

We understand the concern about testing times. To choose a middle
ground, Ted's suggestion of using _scratch_mkfs_sized works best for
the CrashMonkey-specific tests. These tests involve very few files, so
a 100MB file system suffices. I tested the patch on ext4, xfs, btrfs
and f2fs on a partition of this size; the overhead due to
_check_scratch_fs after each subtest is in the range of 3-5 seconds
for all these file systems. If this is tolerable, we can force a
smaller file system size for all CrashMonkey tests.

Does this sound reasonable to you?
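Concretely, the relevant part of each test would look something like
this (a minimal sketch, not the patch itself; the helpers are the
existing fstests ones, while the exact size and the placement of the
per-subtest check are illustrative and up for discussion):

    _require_scratch

    # Keep the scratch fs small (100MB here) so that the per-subtest
    # check below stays in the 3-5 second range regardless of the
    # size of the underlying scratch device.
    _scratch_mkfs_sized $((100 * 1024 * 1024)) >> $seqres.full 2>&1
    _scratch_mount

    # ... exercise one crash-consistency subtest ...

    _scratch_unmount

    # Verify consistency after every subtest.
    _check_scratch_fs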
Thanks,
Jayashree Mohan

On Tue, Nov 6, 2018 at 10:04 PM Theodore Y. Ts'o wrote:
>
> On Wed, Nov 07, 2018 at 01:09:22PM +1100, Dave Chinner wrote:
> > > Running on a 200MB partition, addition of this check added only around 3-4
> > > seconds of delay in total for this patch consisting of 37 tests. Currently
> > > this patch takes about 12-15 seconds to run to completion on my 200MB
> > > partition.
> >
> > What filesystem, and what about 20GB scratch partitions (which are
> > common)? i.e. Checking cost is different on different filesystems,
> > different capacity devices and even different userspace versions of
> > the same filesystem utilities. It is most definitely not free, and
> > in some cases can be prohibitively expensive.
>
> For the CrashMonkey tests, one solution might be to force the use of a
> small file system on the scratch disk (e.g., using _scratch_mkfs_sized).
>
> > I suspect we've lost sight of the fact that fstests was /primarily/
> > a filesystem developer test suite, not a distro regression test
> > suite. If the test suite becomes too cumbersome and slow for
> > developers to use effectively, then it will get used less during
> > development and that's a *really, really bad outcome*.
>
> I agree with your concern.
>
> > It even takes half an hour to run the quick group on my fast
> > machine, which really isn't very quick anymore because of the sheer
> > number of tests in the quick group. Half an hour is too slow for
> > effective change feedback - feedback within 5 minutes is
> > necessary, otherwise the developer will context switch to something
> > else while waiting and lose all focus on what they were doing. This
> > leads to highly inefficient developers.
>
> At Google we were willing to live with a 10 minute "fssmoke" subset,
> but admittedly, that's grown to 15-20 minutes in recent years. So
> trying to create a "smoke" group that is only 5 minutes SGTM.
>
> > The only tests that I've seen discover new bugs recently are those
> > that run fsx, fsstress or some other semi-randomised workloads that
> > are combined with some other operation. These tests find the bugs
> > that fine-grained, targeted regression tests will never uncover,
> > and so in many cases running most of these integration/regression
> > tests doesn't provide any value to the developer.
>
> Yeah, what I used to do is assume that if the test run survived past
> generic/013 (which uses fsstress), it would pass the rest of the
> tests, and I would move on to reviewing the next commit.
> Unfortunately we've added so many ext4-specific tests (which run in
> front of generic) that this trick no longer works. I haven't gotten
> annoyed enough to hack in some way to reorder the tests that get run
> so the highest-value tests run first, and then send a "90+% chance
> the commit is good, running the rest of the tests" message, but it
> has occurred to me....
>
> > Perhaps we need to recategorise the tests into new groups.
>
> Agreed. Either we need to change what tests we leave in "quick", or
> we need to create a new group "smoke" where quick is an attribute of
> the group as a whole, not an attribute of each test in the "quick"
> group.
>
> > Perhaps we need to scale the fstests infrastructure to support
> > thousands of tests efficiently.
>
> On my "when I find the round tuit, or when I can get a GSOC or intern
> to work on it, whichever comes first" list is to enhance gce-xfstests
> so it can shard the tests for a particular fs configuration so they
> use a group of VMs, instead of just using a separate VM for each
> config scenario (e.g., dax, blocksize < page size, bigalloc, ext3
> compat, etc.)
>
> It might mean using ~100 VMs instead of the current 10 that I use,
> but if it means the tests complete in a tenth of the time, the total
> cost for doing a full integration test won't change by that much. The
> bigger problem is that people might have to ask permission to increase
> the GCE quotas from the defaults used on new accounts.
>
> For those people who are trying to run xfstests on bare metal, I'm not
> sure there's that much that can be done to improve things; did you
> have some ideas? Or were you assuming that step one would require
> buying many more physical test machines in your test cluster?
>
> (Maybe IBM will be willing to give you a much bigger test machine
> budget? :-)
>
> - Ted
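P.S. On the "smoke" group idea above, for concreteness: as far as I
can tell it only needs a new tag in the per-directory group files plus
the usual -g invocation, roughly as below (the test number and tag set
here are purely illustrative):

    # tests/generic/group: tag a ~5 minute subset with "smoke"
    013 auto quick smoke

    # then run only that subset:
    ./check -g smoke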