kdevops.lists.linux.dev archive mirror
From: Amir Goldstein <amir73il@gmail.com>
To: Frederick Lawler <fred@cloudflare.com>
Cc: mcgrof@kernel.org, kdevops@lists.linux.dev,
	kernel-team@cloudflare.com,  linux-fsdevel@vger.kernel.org,
	Chandan Babu R <chandan.babu@oracle.com>,
	 Leah Rumancik <leah.rumancik@gmail.com>,
	"Darrick J. Wong" <djwong@kernel.org>
Subject: Re: [RFC PATCH kdevops 0/2] augment expunge list for v6.1.53
Date: Sat, 16 Sep 2023 12:23:54 +0300
Message-ID: <CAOQ4uxiGYF8EhqxM91_vrGSVYoX7dAf154btVobbsj=RUQNWAQ@mail.gmail.com>
In-Reply-To: <20230915234857.1613994-1-fred@cloudflare.com>

Hi Frederick!

Nice to see you joining the kdevops gang :)

On Sat, Sep 16, 2023 at 2:49 AM Frederick Lawler <fred@cloudflare.com> wrote:
>
> In an effort to test and prepare patches from XFS to stable 6.1.y [1], I needed
> to make a baseline for v6.1.53 to verify that the backported patches do not
> introduce regressions (if any). However, after a 'make fstests-baseline', we
> observed that compared to v6.1.42, v6.1.53 introduced more than expected
> expunges to XFS. This RFC is an attempt to put some eyes to this and open up a
> discussion.

I refreshed the v6.1.42 expunge list very recently against up-to-date fstests:

commit 0b58b02f08d26ea23b6ff58d9b24488c266f32d0
Author: Amir Goldstein <amir73il@gmail.com>
Date:   Sat Aug 12 12:29:57 2023 +0300

    xfs: expunge new failing tests

    After update of fstests branch to tag v2023.08.06

There are zero changes in xfs code between v6.1.42..v6.1.53, so the
regressions you observed are unlikely to be due to a code change.
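
One way to double-check the "zero changes" claim is to list the commits
that touch fs/xfs between the two tags. The following is a minimal
sketch, assuming a local clone of the stable tree that has both tags;
the function names and the repo_dir argument are hypothetical and not
part of kdevops.

```python
import subprocess


def xfs_log_command(old_tag, new_tag, path="fs/xfs"):
    """Build a git command listing commits in old_tag..new_tag that touch path."""
    return ["git", "log", "--oneline", f"{old_tag}..{new_tag}", "--", path]


def commits_touching_path(repo_dir, old_tag, new_tag, path="fs/xfs"):
    """Run the command in repo_dir and return one string per commit.

    repo_dir is assumed to be a clone that contains both tags.
    An empty list means no commit in the range touched the path.
    """
    result = subprocess.run(
        xfs_log_command(old_tag, new_tag, path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

An empty result for v6.1.42..v6.1.53 would support looking at the test
environment rather than the kernel.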

If it is not easy for you to test v6.1.42 on a k8s host, I can re-run
the baseline loop with a v6.1.53 kernel to verify there are no
regressions, but I am betting there won't be.
So the failures you are seeing must be due to some difference between
our setups.

Note that when I started to use kdevops with libvirt, we observed many
random errors that were eventually attributed to faulty code in the
qemu nvme driver.

I am not ruling out the possibility that the expunge lists that Luis
or I prepared for xfs in some versions (5.10.y, 6.1.y, etc.) are
tainted with failures related to our specific setups.

AFAIK, we never bothered to create two different baselines from scratch in
two different envs (e.g. libvirt and GCE/OCI) and compare them.

But as it is, you already have my baseline from libvirt/kvm -
I don't think that it makes sense to add to 6.1.y expunge lists
failures due to test env change, unless you were able to prove that either:
1. Those tests did not run in my env
2. Your env manages to expose a bug that my env did not expose
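
The two cases above can be separated mechanically once the list of
tests that ran in the baseline env is available. This is a rough
sketch, not a kdevops tool: it assumes an expunge list holds one test
name per line with an optional "#" comment, and the function names and
test names are made up for illustration.

```python
def parse_expunge(text):
    """Return the set of test names in an expunge list.

    Assumed format: one test per line, optionally followed by a
    "#" comment. Blank and comment-only lines are skipped.
    """
    tests = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            tests.add(name)
    return tests


def classify_new_failures(baseline_expunges, new_expunges, baseline_ran):
    """Split failures seen only in the new env into two sorted lists.

    never_ran: the test did not run in the baseline env (case 1).
    ran_and_passed: it ran and was not expunged there, so the failure
    is either an env difference or a newly exposed bug (case 2) and
    needs analysis.
    """
    new_only = new_expunges - baseline_expunges
    never_ran = new_only - baseline_ran
    ran_and_passed = new_only & baseline_ran
    return sorted(never_ran), sorted(ran_and_passed)


# Hypothetical data, for illustration only.
baseline = parse_expunge("generic/001 # known flaky\nxfs/010\n")
new = parse_expunge("xfs/010\nxfs/020\nxfs/030 # hangs\n")
ran = {"generic/001", "xfs/010", "xfs/020"}
never_ran, ran_and_passed = classify_new_failures(baseline, new, ran)
```

Only the second list calls for failure analysis; the first one only
needs a run of those tests in the baseline env.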

I can help with #1 by committing results from a run in my env.
#2 is harder - you will need to analyse the failures in your env
and understand them.

Whenever I see new failures, I always analyse them before adding
to the expunge list and I try to add a comment explaining either the
observed reason for failure or the missing fix if I know it.

>
> At Cloudflare, the Linux team does not have an easy way to obtain dedicated and
> easily configurable server infrastructure to execute kdevops filesystem testing,
> but we do have an easily-configurable kubernetes infrastructure. I prepared a
> POC to spin up virtual machines [2] in kubernetes to emulate what terraform
> may do for OpenStack, Azure, AWS, etc... to perform this test. Therefore, the
> configuration option is set to SKIP_BRINGUP=y
>
> In this baseline, I spun up XFS workflow nodes for:
> - xfs_crc
> - xfs_logdev
> - xfs_nocrc
> - xfs_nocrc_512
> - xfs_reflink
> - xfs_reflink_1024
> - xfs_reflink_normapbt
> - xfs_rtdev
>
> Each node is running a vanilla-stable 6.1.y (6.1.53), and the image is based on
> latest Debian SID [3]. Each node also has its own dedicated /data and /media
> partitions to store Linux, fstests, etc... and sparse-images respectively.
>
> In v6.1.42, we don't currently have expunges for xfs_reflink_normapbt, and
> xfs_reflink. So those are _new_. The rest had significant additions. However,
> not all nodes finished their testing after >12hrs of run time. Some appeared to
> be stuck, in particular xfs_rtdev, and never finished (reason unknown).
> I CTRL+C and ran 'make fstests-results'.
>
> I prepared a fork [4] where the results 6.1.53.xz can be found.
>
> These patches are based on top of commit 0ec98182f4a9 ("bootlinux/fstests:
> remove odd hplip user")
>
> Links:
> 1: https://lore.kernel.org/all/CAOQ4uxgvawD4=4g8BaRiNvyvKN1oreuov_ie6sK6arq3bf8fxw@mail.gmail.com/
> 2: https://kubevirt.io/api-reference/v1.0.0/definitions.html#_v1_virtualmachine
> 3: https://cloud.debian.org/images/cloud/sid/daily/latest/ (debian-sid-genericcloud-amd64-daily.qcow2)
> 4: https://github.com/fredlawl/kdevops/commit/afcb8fe7c4498d2be5386e191db3534f651a3730#diff-0677846133ad9128bf752f674b3c8da437c12ce28f48d8890b9f66d0dcb3717c
>
> Frederick Lawler (2):
>   fstests/xfs: copy 6.1.42 baseline for v6.1.53

In this commit you also copied the ext4 and btrfs expunge lists.
That is not needed, as you are not changing them and do not intend to.

I don't think that forking the xfs lists is going to be needed at all
once you have verified what happened - if your findings are indeed
correct, they probably belong in the v6.1.42 expunge list.

>   xfs: merge common expunge lists for v6.1.53

The title of this commit does not represent the change correctly.
What this commit does is to add many new tests to the 6.1.53
expunge list.

Your confusion must come from seeing my commits like:
8745d44 xfs: merge common expunge lists for v6.1.42

What these commits do is to merge common failures
in xfs_* config specific expunge lists into the common all.txt
expunge list - there are scripts that do that:
./scripts/workflows/fstests/{find,remove}-common-failures.sh
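
The idea behind such a merge can be sketched with set operations. This
is only an illustration of the concept, not what the scripts above
actually run; the function name and the sample data are hypothetical.

```python
def merge_common_failures(per_config):
    """Move failures shared by every config into one common set.

    per_config maps a config name (e.g. "xfs_crc") to the set of
    tests in its expunge list. Returns (common, remaining): common is
    what would go into all.txt, remaining is what stays per config.
    The input is not modified.
    """
    if not per_config:
        return set(), {}
    common = set.intersection(*per_config.values())
    remaining = {name: tests - common for name, tests in per_config.items()}
    return common, remaining


# Hypothetical per-config lists, for illustration only.
lists = {
    "xfs_crc": {"generic/100", "xfs/200"},
    "xfs_nocrc": {"generic/100", "xfs/300"},
}
common, remaining = merge_common_failures(lists)
```

With this sample data only generic/100 fails in every config, so it is
the only entry that moves to the common list.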

Thanks,
Amir.

Thread overview: 5+ messages
2023-09-15 23:48 [RFC PATCH kdevops 0/2] augment expunge list for v6.1.53 Frederick Lawler
2023-09-15 23:48 ` [RFC PATCH kdevops 1/2] fstests/xfs: copy 6.1.42 baseline " Frederick Lawler
2023-09-15 23:48 ` [RFC PATCH kdevops 2/2] xfs: merge common expunge lists " Frederick Lawler
2023-09-16  9:23 ` Amir Goldstein [this message]
2023-09-18 18:52   ` [RFC PATCH kdevops 0/2] augment expunge list " Luis Chamberlain
