From: Amir Goldstein <amir73il@gmail.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>,
	Bart Van Assche <bvanassche@acm.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	Pankaj Raghav <pankydev8@gmail.com>,
	Josef Bacik <josef@toxicpanda.com>,
	jmeneghi@redhat.com, Jan Kara <jack@suse.cz>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Dan Williams <dan.j.williams@intel.com>, Jake Edge <jake@lwn.net>,
	Klaus Jensen <its@irrelevant.dk>,
	fstests <fstests@vger.kernel.org>, Zorro Lang <zlang@redhat.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [RFC: kdevops] Standardizing on failure rate nomenclature for expunges
Date: Wed, 6 Jul 2022 19:35:57 +0300	[thread overview]
Message-ID: <CAOQ4uxgr8v=h1xi=sfJD9uSp6DR_iAiXScd68Ov7=6Cm-iA+ZA@mail.gmail.com> (raw)
In-Reply-To: <YsWcZbBALgWKS88+@mit.edu>

On Wed, Jul 6, 2022 at 5:30 PM Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Wed, Jul 06, 2022 at 01:11:16PM +0300, Amir Goldstein wrote:
> >
> > So I am wondering what is the status today, because I rarely
> > see fstests failure reports from kernel test bot on the list, but there
> > are some reports.
> >
> > Does anybody have a clue which hw/fs/config/group of fstests the
> > kernel test bot is running on linux-next?
>
> The zero-day test bot only reports test regressions.  So they have some
> list of tests that have failed in the past, and they only report *new*
> test failures.  This is not just true for fstests; it's also true
> for things like check warnings and compiler warnings --- and I suspect
> it's those sorts of reports that caused the zero-day bot to keep
> state, and to filter out test failures and/or check warnings and/or
> compiler warnings, so that only new test failures and/or new compiler
> warnings are reported.  If they didn't, they would be spamming kernel
> developers, and given how.... "kind and understanding" kernel
> developers are at getting spammed, especially when sometimes the
> complaints are bogus ones (either test bugs or compiler bugs), my
> guess is that they did the filtering out of sheer self-defense.  It
> certainly wasn't something requested by a file system developer as far
> as I know.
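
If it helps to make this concrete, that filtering presumably boils down
to a set difference against saved state.  A sketch only -- the state
file name and format are made up, this is not what 0-day actually runs:

    import json
    from pathlib import Path

    STATE = Path("known-failures.json")  # hypothetical state file

    def report_new_failures(current_failures):
        """Report only failures that were not already known."""
        known = set(json.loads(STATE.read_text())) if STATE.exists() else set()
        new = sorted(set(current_failures) - known)
        for test in new:
            print(f"NEW failure: {test}")
        # Remember everything seen so far, so future runs stay quiet
        # about already-known failures.
        STATE.write_text(json.dumps(sorted(known | set(current_failures))))
        return new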
>
>
> So this is how I think an automated system for "drive-by testers"
> should work.  First, the tester would specify the baseline/origin tag,
> and the testing system would run the tests on the baseline once.
> Hopefully, the test runner already has exclude files so that kernel
> bugs that cause an immediate kernel crash or deadlock would already
> be in the exclude list.  But as I've discovered this weekend, for file
> systems that I haven't tried in a few years, like udf or ubifs, etc.,
> there may be missing exclusions for tests that cause the test VM to
> stop responding and/or crash.
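
(As an aside, the expunge handling itself is simple enough -- assuming
the usual one-test-per-line format with '#' comments, something like:)

    def tests_to_run(all_tests, exclude_file):
        """Drop tests listed in an exclude/expunge file (one test per
        line, '#' starts a comment)."""
        with open(exclude_file) as f:
            excluded = {line.split("#", 1)[0].strip() for line in f}
        excluded.discard("")  # ignore blank/comment-only lines
        return [t for t in all_tests if t not in excluded]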
>
> I have a planned improvement where, if you are using gce-xfstests's
> lightweight test manager, since the LTM is constantly reading the
> serial console, a deadlock can be detected and the LTM can restart the
> VM.  The VM can then disambiguate between a forced reboot caused by the
> LTM and a forced shutdown caused by the use of a preemptible VM (a
> planned feature not yet fully implemented), and the test runner
> can skip the tests already run, and skip the test which caused the
> crash or deadlock, and this could be reported so that eventually, the
> test could be added to the exclude file to benefit those people who
> are using kvm-xfstests.  (This is an example of a planned improvement
> in xfstests-bld which, if someone is interested in helping to implement
> it, they should give me a ring.)
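
For the console-watchdog part, I imagine something along these lines --
all of the helpers here are hypothetical callbacks, since I don't know
the LTM internals:

    import re

    # Patterns that typically show up on the serial console when the
    # kernel has crashed or a task has hung.
    TROUBLE = re.compile(r"hung_task|Kernel panic|BUG:|soft lockup")

    def watch_console(read_line, current_test, mark_test_bad, restart_vm):
        """Watch serial console output; on a crash/hang, remember the
        running test (a candidate for the exclude file) and restart the
        VM with a reason the guest can distinguish from a
        preemptible-VM shutdown."""
        for line in iter(read_line, None):  # read_line() returns None at EOF
            if TROUBLE.search(line):
                mark_test_bad(current_test())
                restart_vm(reason="watchdog")
                return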
>
> Once the tests which are failing given a particular baseline are
> known, this state would then get saved, and then now the tests can be
> run on the drive-by developer's changes.  We can now compare the known
> failures for the baseline, with the changed kernels, and if there are
> any new failures, there are two possibilities: (a) this was a new
> failure caused by the drive-by developer's changes, or (b) this was a
> pre-existing known flake.
>
> To disambiguate between these two cases, we now run the failed test N
> times (where N is probably something like 10-50 times; I normally use
> 25 times) on the changed kernel, and get the failure rate.  If the
> failure rate is 100%, then this is almost certainly (a).  If the
> failure rate is < 100% (and greater than 0%), then we rerun the failed
> test on the baseline kernel N times as well; if the baseline failure
> rate is 0%, then we should do a bisection search to determine the
> guilty commit.
>
> If the failure rate on the changed kernel is 0%, then this is either
> an extremely rare flake, in which case we might need to increase N ---
> or it's an example of a test failure which is sensitive to the order
> of the tests which are run, in which case we may need to rerun all of
> the tests in order up to the failed test.
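
If I read the triage logic right, it comes down to something like the
following (run_changed/run_baseline standing in for actual ./check
invocations returning True on a pass; N=25 per your numbers):

    def failure_rate(run_test, test, n=25):
        """Fraction of n standalone runs of `test` that fail."""
        return sum(1 for _ in range(n) if not run_test(test)) / n

    def triage(run_changed, run_baseline, test, n=25):
        changed = failure_rate(run_changed, test, n)
        if changed == 1.0:
            return "regression"      # (a): almost certainly the new changes
        if changed > 0.0:
            if failure_rate(run_baseline, test, n) == 0.0:
                return "bisect"      # reproducible only after the changes
            return "known flake"     # (b): pre-existing flakiness
        # Never reproduced standalone: very rare flake (increase n) or
        # order-dependent -- rerun the sequence up to the failed test.
        return "rare flake or order-dependent"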
>
> This is right now what I do when processing patches for upstream.
> It's also rather similar to what we're doing for the XFS stable
> backports, because it's much more efficient than running the baseline
> tests 100 times (which can take a week of continuous testing per
> Luis's comments) --- we only run tests dozens (or more) of times where
> a potential flake has been found, as opposed to *all* tests.  It's all
> done manually, but it would be great if we could automate this to make
> life easier for XFS stable backporters, and *also* for drive-by
> developers.
>

This process sounds like it could get us to mostly unattended regression
testing, so it sounds good.

I do wonder whether there is more that fstests developers could do to
assist, such as annotating new (and existing) tests to aid in that effort.

For example, there might be a case for tagging a test as "this is a very
reliable test that should have no failures at all - if there is a failure
then something is surely wrong".
I wonder if it would help to have a group like that, and how many
tests such a group would include.
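
E.g. if such a group existed (call it "reliable" -- a hypothetical
name, there is no such group in fstests today), a test runner could
skip the whole rerun dance for its members:

    def needs_rerun(test, failed, reliable_tests):
        """Decide whether a failure warrants the rerun-N-times dance.
        `reliable_tests` is the hypothetical "never flakes" group: a
        failure there is reported as a hard regression right away."""
        if not failed:
            return False
        if test in reliable_tests:
            print(f"{test}: failure in 'reliable' group, report as-is")
            return False
        return True  # potential flake; rerun N times to estimate the rate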

> And again, if anyone is interested in helping with this, especially if
> you're familiar with shell, python 3, and/or the Go language, please
> contact me off-line.
>

Please keep me in the loop when you have a prototype; I may be able
to help test it.

Thanks,
Amir.
