From: Luis Chamberlain <mcgrof@kernel.org>
To: Sasha Levin <sashal@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-xfs@vger.kernel.org, gregkh@linuxfoundation.org,
	Alexander.Levin@microsoft.com, stable@vger.kernel.org,
	amir73il@gmail.com, hch@infradead.org
Subject: Re: [PATCH v2 00/10] xfs: stable fixes for v4.19.y
Date: Mon, 11 Feb 2019 11:46:06 -0800	[thread overview]
Message-ID: <20190211194606.GO11489@garbanzo.do-not-panic.com> (raw)
In-Reply-To: <20190209215627.GB69686@sasha-vm>

On Sat, Feb 09, 2019 at 04:56:27PM -0500, Sasha Levin wrote:
> On Fri, Feb 08, 2019 at 02:17:26PM -0800, Luis Chamberlain wrote:
> > On Fri, Feb 08, 2019 at 01:06:20AM -0500, Sasha Levin wrote:
> > Have you found pmem
> > issues not present on other sections?
> 
> Originally I've added this because the xfs folks suggested that pmem vs
> block exercises very different code paths and we should be testing both
> of them.
> 
> Looking at the baseline I have, it seems that there are differences
> between the failing tests. For example, with "MKFS_OPTIONS='-f -m
> crc=1,reflink=0,rmapbt=0, -i sparse=0'",

That's my "xfs" section.

> generic/524 seems to fail on pmem but not on block.

This is useful, thanks! Can you get the failure rate? How often does it
fail when you run the test? Always? Does it *never* fail on block? How
many consecutive runs did you do on block?

To help with this, oscheck has naggy-check.sh; you could run it until
a failure is hit:

./naggy-check.sh -f -s xfs generic/524

And on another host:

./naggy-check.sh -f -s xfs_pmem generic/524
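
If you want something quicker without oscheck, a bare loop over
fstests' ./check gives a rough failure rate too. Just a sketch,
assuming you are inside an fstests checkout with that section's
config active:

  i=0
  while ./check generic/524; do
          i=$((i + 1))
          echo "generic/524 passed $i consecutive runs"
  done
  echo "generic/524 failed after $i passing runs"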

> > Any reason you don't name the sections with finer granularity?
> > It would help me in ensuring that when we revise both of our tests we
> > can more easily ensure we're talking about apples, pears, or bananas.
> 
> Nope, I'll happily rename them if there are "official" names for it :)

Well, since I am pushing out the stable fixes and am using oscheck to
be transparent about how I test and what I track, and since I'm using
section names, yes, it would be useful to me. Simply adding a _pmem
suffix to the pmem sections would suffice.
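
For instance, something along these lines in the config; this is only
a sketch, the devices here are made up and not my actual setup:

  [xfs]
  TEST_DEV=/dev/loop16
  MKFS_OPTIONS='-f -m crc=1,reflink=0,rmapbt=0, -i sparse=0'

  [xfs_pmem]
  TEST_DEV=/dev/pmem0
  MKFS_OPTIONS='-f -m crc=1,reflink=0,rmapbt=0, -i sparse=0'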

> > FWIW, I run two different bare metal hosts now, and each has a VM guest
> > per section above. One host I use for tracking stable, the other host for
> > my changes. This ensures I don't mess things up easier and I can re-test
> > any time fast.
> > 
> > I dedicate a VM guest to test *one* section. I do this with oscheck
> > easily:
> > 
> > ./oscheck.sh --test-section xfs_nocrc | tee log-xfs-4.19.18+
> > 
> > For instance will just test xfs_nocrc section. On average each section
> > takes about 1 hour to run.
> 
> We have a similar setup then. I just spawn the VM on azure for each
> section and run them all in parallel that way.

Indeed.

> I thought oscheck runs everything on a single VM,

By default it does.

> is it a built in
> mechanism to spawn a VM for each config?

Yes:

./oscheck.sh --test-section xfs_nocrc_512

For instance, that will test section xfs_nocrc_512 *only* on that host.
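
So fanning the sections out over guests can be done with a small
wrapper; a rough sketch, with made-up host names and assuming each
guest already has oscheck set up:

  for s in xfs xfs_nocrc xfs_nocrc_512 xfs_pmem; do
          ssh "oscheck-$s" "cd oscheck && ./oscheck.sh --test-section $s" &
  done
  wait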

> If so, I can add some code in
> to support azure and we can use the same codebase.

Groovy. I believe the next step will be for you to send me your delta
of expunges, and then I can run naggy-check.sh on them to see if I can
reach similar results. I believe you have a larger expunge list. I
suspect part of that may be because you don't have certain quirks
handled. We will see. But getting this right and syncing our testing
should yield good confirmation of failures.
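
Comparing the two lists should be trivial, something like the below;
just a sketch, the paths are made up and assume one expunge file per
section:

  comm -13 <(sort expunges/xfs.txt) <(sort sasha-expunges/xfs.txt)

That prints the tests you expunge which I don't.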

> > I could run the tests on raw nvme and do away with the guests, but
> > that loses some of my ability to debug on crashes easily and out to
> > baremetal.. but curious, how long do your tests takes? How about per
> > section? Say just the default "xfs" section?
> 
> I think that the longest config takes about 5 hours, otherwise
> everything tends to take about 2 hours.

Oh wow, mine are only 1 hour each. Guess I got a decent rig now :)

> I basically run these on "repeat" until I issue a stop order, so in a
> timespan of 48 hours some configs run ~20 times and some only ~10.

I see... so you iterate over all the tests many times a day, and this
is how you've built your expunge list. Correct?

That could explain how you end up with a larger set. It can also mean
some tests only fail at a non-100% failure rate; for these I'm
annotating the failure rate as a comment on each expunge line. Having a
consistent format for this, and a properly agreed-upon term, would be
good. Right now I just note how often I have to run a test before
hitting a failure. This provides a rough estimate of how many times one
should iterate running the test in a loop before detecting a failure.
Of course this may not always be accurate, given that systems vary and
that could have an impact on the failure rate... but at least it
provides some guidance. It would be interesting to see whether we end
up with similar failure rates for tests that don't always fail, and if
there is a divergence, how big it is.
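
For instance, just a strawman for the format, with invented numbers:

  generic/524 # fails about 1 in 10 runs on xfs_pmem
  generic/475 # fails about 1 in 3 runs on xfs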

  Luis


Thread overview: 28+ messages
2019-02-04 16:54 [PATCH v2 00/10] xfs: stable fixes for v4.19.y Luis Chamberlain
2019-02-04 16:54 ` [PATCH v2 01/10] xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat Luis Chamberlain
2019-02-04 16:54 ` [PATCH v2 02/10] xfs: cancel COW blocks before swapext Luis Chamberlain
2019-02-04 16:54 ` [PATCH v2 03/10] xfs: Fix error code in 'xfs_ioc_getbmap()' Luis Chamberlain
2019-02-04 16:54 ` [PATCH v2 04/10] xfs: fix overflow in xfs_attr3_leaf_verify Luis Chamberlain
2019-02-04 16:54 ` [PATCH v2 05/10] xfs: fix shared extent data corruption due to missing cow reservation Luis Chamberlain
2019-02-04 16:54 ` [PATCH v2 06/10] xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers Luis Chamberlain
2019-02-04 16:54 ` [PATCH v2 07/10] xfs: delalloc -> unwritten COW fork allocation can go wrong Luis Chamberlain
2019-02-04 16:54 ` [PATCH v2 08/10] fs/xfs: fix f_ffree value for statfs when project quota is set Luis Chamberlain
2019-02-04 16:54 ` [PATCH v2 09/10] xfs: fix PAGE_MASK usage in xfs_free_file_space Luis Chamberlain
2019-02-04 16:54 ` [PATCH v2 10/10] xfs: fix inverted return from xfs_btree_sblock_verify_crc Luis Chamberlain
2019-02-05  6:44 ` [PATCH v2 00/10] xfs: stable fixes for v4.19.y Amir Goldstein
2019-02-05 22:06 ` Dave Chinner
2019-02-06  4:05   ` Sasha Levin
2019-02-06 21:54     ` Dave Chinner
2019-02-08  6:06       ` Sasha Levin
2019-02-08 20:06         ` Luis Chamberlain
2019-02-08 21:29         ` Dave Chinner
2019-02-09 17:53           ` Sasha Levin
2019-02-08 22:17         ` Luis Chamberlain
2019-02-09 21:56           ` Sasha Levin
2019-02-11 19:46             ` Luis Chamberlain [this message]
2019-02-08 19:48   ` Luis Chamberlain
2019-02-08 21:32     ` Dave Chinner
2019-02-08 21:50       ` Luis Chamberlain
2019-02-10 22:12         ` Dave Chinner
2019-02-11 20:09     ` Luis Chamberlain
2019-02-10  0:06 ` Sasha Levin
