From: Luis Chamberlain <mcgrof@kernel.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Theodore Ts'o <tytso@mit.edu>,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	amir73il@gmail.com, pankydev8@gmail.com, josef@toxicpanda.com,
	jmeneghi@redhat.com, Jan Kara <jack@suse.cz>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Dan Williams <dan.j.williams@intel.com>, Jake Edge <jake@lwn.net>,
	Klaus Jensen <its@irrelevant.dk>
Subject: Re: [RFC: kdevops] Standardizing on failure rate nomenclature for expunges
Date: Thu, 7 Jul 2022 14:16:13 -0700
Message-ID: <YsdNHQZdlT1IU5dv@bombadil.infradead.org>
In-Reply-To: <1abb9307-509d-e2dc-5756-ebc297a62538@acm.org>

On Sun, Jul 03, 2022 at 07:54:11AM -0700, Bart Van Assche wrote:
> On 7/3/22 06:32, Theodore Ts'o wrote:
> > On Sat, Jul 02, 2022 at 02:48:12PM -0700, Bart Van Assche wrote:
> > > 
> > > I strongly disagree with annotating tests with failure rates. My opinion is
> > > that on a given test setup a test either should pass 100% of the time or
> > > fail 100% of the time.
> > 
> > My opinion is also that no child should ever go to bed hungry, and we
> > should end world hunger.
> 
> In my view the above comment is unfair. The first year after I wrote the
> SRP tests in blktests I submitted multiple fixes for kernel bugs encountered
> by running these tests. Although it took a significant effort, after about
> one year the test itself and the kernel code it triggered finally resulted
> in reliable operation of the test. After that initial stabilization period
> these tests uncovered regressions in many kernel development cycles, even in
> the v5.19-rc cycle.
> 
> Since I'm not very familiar with xfstests I do not know what makes the
> stress tests in this test suite fail. Would it be useful to modify the code
> that decides the test outcome to remove the flakiness, e.g. by only checking
> that the stress tests do not trigger any unwanted behavior, e.g. kernel
> warnings or filesystem inconsistencies?

Filesystems and the block layer sit on top of many other layers in the
kernel, and any of those layers can introduce nondeterminism. To get
deterministic test results here we must first rule out nondeterminism in
those other areas of the kernel, and that will take a long time. Things
like kunit tests will help here, along with adding more tests for the
smaller layers. The list is long.

At LSFMM I mentioned how blktests block/009 had an odd failure rate of
about 1/669 a while ago. The failure was real, and it took a while to
figure out the root cause. Jan Kara's patches fixed it, and they are
not trivial to backport to ancient enterprise kernels ;)

Another more recent one was the nondeterministic RCU CPU stall warnings
with a failure rate of about 1/80 on zbd/006, and that led to some
interesting revelations about how QEMU's use of discard was shitty and
just needed to be enhanced.
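
To put those two rates in perspective, here is a rough sketch of how
many back-to-back runs it takes before you can expect to see such a
failure at least once. It assumes every run is independent and fails
with the same fixed probability, which is a simplification and not
something measured here. The function names and the 95% confidence
level are made up for illustration; they are not part of blktests or
kdevops.

```python
import math


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - failure_rate)**n >= confidence.

    Assumption: runs are independent and share one fixed failure rate.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


def chance_of_seeing_failure(failure_rate: float, runs: int) -> float:
    """Probability of at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - failure_rate) ** runs


# The two rates quoted above: about 1/669 and about 1/80.
for name, rate in (("block/009", 1 / 669), ("zbd/006", 1 / 80)):
    n = runs_needed(rate)
    print(f"{name}: {n} runs for a "
          f"{chance_of_seeing_failure(rate, n):.1%} chance of one failure")
```

Under that assumption a 1/80 failure needs a few hundred runs and a
1/669 failure needs on the order of two thousand before a clean result
says much, which is why a single passing run tells you very little
about tests like these.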

Yes, you can probably make zbd/006 more atomic and split it into 10
tests, but I don't think we can escape the lack of determinism in
certain areas of the kernel. We can *work to improve* it, but again,
that will take time, and I am not sure many folks really want that
either.

  Luis

