On Thu, Feb 14, 2019 at 01:56:34PM -0800, Omar Sandoval wrote:
> > 3) Making blktests more stable/useful.  For someone who is not a block
> > layer specialist, it can be hard to determine whether the problem is a
> > kernel bug,
>
> From my experience with running xfstests at Facebook, the same thing
> goes for xfstests :) The filesystem developers on the team are the only
> ones that can make sense of any test failures.

What I've done for xfstests is to make it so easy that even a
University Professor (or Graduate Student) can run it.  That's why I
created the {kvm,gce}-xfstests test appliance:

https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md

I've been trying to integrate blktests into the test appliance, to
make it really easy to run as well.

> Have you encountered issues where missing config options have caused
> test failures? Or you want the config options for maximum coverage? If
> you have examples of the former, I'll fix them up. For the latter, I
> have a list somewhere that I can add to the blktests repository.

There were a few cases where a missing config caused test failures;
most of the time, it simply causes tests to get skipped.  But figuring
out how to enable the nvme or srp tests required turning on a *large*
number of modules.  Figuring that out was painful, and required
multiple tries.

One of the things that I've done is to create kernel defconfigs to
make it really easy for someone to build a kernel for testing
purposes.  The defconfigs suitable for running xfstests under KVM or
GCE can be found here:

https://github.com/tytso/xfstests-bld/blob/master/kernel-configs/x86_64-config-4.14

I've attached below the defconfig I've been developing that is
suitable for both xfstests and blktests.  I try to create minimal
defconfigs so I can build kernels more quickly, especially if I need
to do a bisection search.  (A sketch of how I turn one of these
configs into a test kernel is appended at the very end of this
message.)

> My (undocumented) rule of thumb has been that blktests shouldn't assume
> anything newer than whatever ships on Debian oldstable. I can document
> that requirement.

That's definitely not true for the nvme tests; the nvme-cli from
Debian stable is *not* sufficient.  This is why I've started building
nvme-cli as part of the test appliance in xfstests-bld.

I'm now somewhat suspicious that some of the problems are because the
latest HEAD of the nvme-cli git tree may print messages to standard
out that are subtly different from those printed by the version of
nvme-cli that was used to develop some of the nvme tests.
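For anyone else who hits this, building a current nvme-cli by hand is
quick; the sketch below is roughly what the test appliance build does
(the tag and install step are illustrative, not exactly what
xfstests-bld pins):

    # Build nvme-cli from the upstream git tree instead of relying on
    # the (older) distro package.  v1.7 is just the tag visible in the
    # report below; check out whatever you want to test against.
    git clone https://github.com/linux-nvme/nvme-cli.git
    cd nvme-cli
    git checkout v1.7
    make
    sudo make install    # install prefix comes from the Makefile defaults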
> blktests is new, so we have some rough edges, but I'd like to think that
> we're trying to do the right things. Please report the cases where we're
> not and we'll get them fixed up.

I have been gradually reporting them to linux-block@.  Here is the
full set of test failures I've been working through.  My goal is that
eventually, someone will be able to run "gce-xfstests --blktests" in
their kernel development tree, and in less than 45 minutes they would
get an e-mail that looks like the following, except there wouldn't be
any failures reported.  :-)

					- Ted

CMDLINE:   --blktests
FSTESTIMG: gce-xfstests/xfstests-201902111955
FSTESTPRJ: gce-xfstests
FSTESTVER: blktests      5f1e24c (Mon, 11 Feb 2019 10:08:14 -0800)
FSTESTVER: fio           fio-3.2 (Fri, 3 Nov 2017 15:23:49 -0600)
FSTESTVER: fsverity      bdebc45 (Wed, 5 Sep 2018 21:32:22 -0700)
FSTESTVER: ima-evm-utils 0267fa1 (Mon, 3 Dec 2018 06:11:35 -0500)
FSTESTVER: nvme-cli      v1.7-22-gf716974 (Wed, 6 Feb 2019 16:03:58 -0700)
FSTESTVER: quota         59b280e (Mon, 5 Feb 2018 16:48:22 +0100)
FSTESTVER: stress-ng     7d0353cf (Sun, 20 Jan 2019 03:30:03 +0000)
FSTESTVER: syzkaller     2103a236 (Fri, 18 Jan 2019 13:20:33 +0100)
FSTESTVER: xfsprogs      v4.19.0 (Fri, 9 Nov 2018 14:31:04 -0600)
FSTESTVER: xfstests-bld  11be69c (Mon, 11 Feb 2019 18:57:39 -0500)
FSTESTVER: xfstests      linux-v3.8-2293-g6f7f9398 (Mon, 11 Feb 2019 19:42:24 -0500)
FSTESTSET: ""
FSTESTEXC: ""
FSTESTOPT: "blktests aex"
CPUS:      "2"
MEM:       "7680"
BEGIN BLKTESTS Tue Feb 12 00:41:13 EST 2019
block/024 (do I/O faster than a jiffy and check iostats times) [failed]
loop/002 (try various loop device block sizes) [failed]
nvme/002 (create many subsystems and test discovery) [failed]
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [failed]
[ 1857.726308] WARNING: possible recursive locking detected
nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [failed]
nvme/015 (unit test for NVMe flush for file backed ns) [failed]
nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [failed]
nvme/017 (create/delete many file-ns and test discovery) [failed]
srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [failed]
srp/011 (Block I/O on top of multipath concurrently with logout and login) [failed]
Run: block/001 block/002 block/003 block/004 block/005 block/006 block/009
 block/010 block/012 block/013 block/014 block/015 block/016 block/017
 block/018 block/020 block/021 block/023 block/024 block/025 block/028
 loop/001 loop/002 loop/003 loop/004 loop/005 loop/006 loop/007 nvme/002
 nvme/003 nvme/004 nvme/005 nvme/006 nvme/007 nvme/008 nvme/009 nvme/010
 nvme/011 nvme/012 nvme/013 nvme/014 nvme/015 nvme/016 nvme/017 nvme/019
 nvme/020 nvme/021 nvme/022 nvme/023 nvme/024 nvme/025 nvme/026 nvme/027
 nvme/028 scsi/001 scsi/002 scsi/003 scsi/004 scsi/005 scsi/006 srp/001
 srp/002 srp/005 srp/006 srp/007 srp/008 srp/009 srp/010 srp/011 srp/012
 srp/013
Failures: block/024 loop/002 nvme/002 nvme/012 nvme/013 nvme/015 nvme/016
 nvme/017 srp/002 srp/011
Failed 10 of 71 tests
END BLKTESTS Tue Feb 12 01:18:37 EST 2019

Feb 12 01:11:34 xfstests-tytso-20190212003935 kernel: [ 1857.726308] WARNING: possible recursive locking detected
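As mentioned above, here is the sketch of how I turn one of those
defconfigs into a test kernel.  It's a minimal recipe, and the paths
are just examples; adjust them for wherever your trees live:

    # From the top of a kernel source tree, assuming xfstests-bld is
    # checked out alongside it.
    cp ../xfstests-bld/kernel-configs/x86_64-config-4.14 .config
    make olddefconfig        # fill in defaults for newer config options
    make -j$(nproc)
    # kvm-xfstests / gce-xfstests can then boot the resulting kernel
    # for an xfstests or blktests run.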