kdevops.lists.linux.dev archive mirror
* [RFC PATCH kdevops 0/2] augment expunge list for v6.1.53
@ 2023-09-15 23:48 Frederick Lawler
  2023-09-15 23:48 ` [RFC PATCH kdevops 1/2] fstests/xfs: copy 6.1.42 baseline " Frederick Lawler
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Frederick Lawler @ 2023-09-15 23:48 UTC (permalink / raw)
  To: amir73il, mcgrof; +Cc: kdevops, kernel-team, linux-fsdevel, Frederick Lawler

In an effort to test and prepare patches from XFS to stable 6.1.y [1], I needed 
to make a baseline for v6.1.53 to verify that the backported patches do not 
introduce regressions (if any). However, after a 'make fstests-baseline', we 
observed that compared to v6.1.42, v6.1.53 introduced more than expected 
expunges to XFS. This RFC is an attempt to put some eyes to this and open up a 
discussion.

At Cloudflare, the Linux team does not have an easy way to obtain dedicated and
easily configurable server infrastructure to execute kdevops filesystem testing, 
but we do have an easily-configurable kubernetes infrastructure. I prepared a 
POC to spin up virtual machines [2] in kubernetes to emulate what terraform 
may do for OpenStack, Azure, AWS, etc... to perform this test. Therefore, the 
configuration option is set to SKIP_BRINGUP=y

In this baseline, I spun up XFS workflow nodes for:
- xfs_crc
- xfs_logdev
- xfs_nocrc
- xfs_nocrc_512
- xfs_reflink
- xfs_reflink_1024
- xfs_reflink_normapbt
- xfs_rtdev

Each node is running a vanilla-stable 6.1.y (6.1.53), and the image is based on 
the latest Debian SID [3]. Each node also has its own dedicated /data and /media
partitions to store Linux, fstests, etc. and the sparse images, respectively.

In v6.1.42, we don't currently have expunges for xfs_reflink_normapbt, and 
xfs_reflink. So those are _new_. The rest had significant additions. However, 
not all nodes finished their testing after >12hrs of run time. Some appeared to 
be stuck, in particular xfs_rtdev, and never finished (reason unknown).
I interrupted the run with CTRL+C and ran 'make fstests-results'.

I prepared a fork [4] where the results 6.1.53.xz can be found.

These patches are based on top of commit 0ec98182f4a9 ("bootlinux/fstests: 
remove odd hplip user")

Links:
1: https://lore.kernel.org/all/CAOQ4uxgvawD4=4g8BaRiNvyvKN1oreuov_ie6sK6arq3bf8fxw@mail.gmail.com/
2: https://kubevirt.io/api-reference/v1.0.0/definitions.html#_v1_virtualmachine
3: https://cloud.debian.org/images/cloud/sid/daily/latest/ (debian-sid-genericcloud-amd64-daily.qcow2)
4: https://github.com/fredlawl/kdevops/commit/afcb8fe7c4498d2be5386e191db3534f651a3730#diff-0677846133ad9128bf752f674b3c8da437c12ce28f48d8890b9f66d0dcb3717c

Frederick Lawler (2):
  fstests/xfs: copy 6.1.42 baseline for v6.1.53
  xfs: merge common expunge lists for v6.1.53

 .../expunges/6.1.53/btrfs/unassigned/all.txt  | 38 +++++++++++
 .../btrfs/unassigned/btrfs_noraid56.txt       |  2 +
 .../6.1.53/btrfs/unassigned/btrfs_simple.txt  |  2 +
 .../btrfs/unassigned/btrfs_simple_zns.txt     | 65 +++++++++++++++++++
 .../expunges/6.1.53/ext4/unassigned/all.txt   | 21 ++++++
 .../unassigned/ext4_advanced_features.txt     |  1 +
 .../6.1.53/ext4/unassigned/ext4_defaults.txt  |  5 ++
 .../expunges/6.1.53/xfs/unassigned/all.txt    | 40 ++++++++++++
 .../6.1.53/xfs/unassigned/xfs_crc.txt         |  7 ++
 .../6.1.53/xfs/unassigned/xfs_logdev.txt      | 26 ++++++++
 .../6.1.53/xfs/unassigned/xfs_nocrc.txt       |  7 ++
 .../6.1.53/xfs/unassigned/xfs_nocrc_512.txt   | 12 ++++
 .../6.1.53/xfs/unassigned/xfs_reflink.txt     |  5 ++
 .../xfs/unassigned/xfs_reflink_1024.txt       | 12 ++++
 .../xfs/unassigned/xfs_reflink_normapbt.txt   | 10 +++
 .../6.1.53/xfs/unassigned/xfs_rtdev.txt       | 49 ++++++++++++++
 16 files changed, 302 insertions(+)
 create mode 100644 workflows/fstests/expunges/6.1.53/btrfs/unassigned/all.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_noraid56.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_simple.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_simple_zns.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/ext4/unassigned/all.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/ext4/unassigned/ext4_advanced_features.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/ext4/unassigned/ext4_defaults.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/all.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_crc.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_logdev.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc_512.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_1024.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_normapbt.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_rtdev.txt

-- 
2.34.1



* [RFC PATCH kdevops 1/2] fstests/xfs: copy 6.1.42 baseline for v6.1.53
  2023-09-15 23:48 [RFC PATCH kdevops 0/2] augment expunge list for v6.1.53 Frederick Lawler
@ 2023-09-15 23:48 ` Frederick Lawler
  2023-09-15 23:48 ` [RFC PATCH kdevops 2/2] xfs: merge common expunge lists " Frederick Lawler
  2023-09-16  9:23 ` [RFC PATCH kdevops 0/2] augment expunge list " Amir Goldstein
  2 siblings, 0 replies; 5+ messages in thread
From: Frederick Lawler @ 2023-09-15 23:48 UTC (permalink / raw)
  To: amir73il, mcgrof; +Cc: kdevops, kernel-team, linux-fsdevel, Frederick Lawler

Signed-off-by: Frederick Lawler <fred@cloudflare.com>
---
 .../expunges/6.1.53/btrfs/unassigned/all.txt  | 38 +++++++++++
 .../btrfs/unassigned/btrfs_noraid56.txt       |  2 +
 .../6.1.53/btrfs/unassigned/btrfs_simple.txt  |  2 +
 .../btrfs/unassigned/btrfs_simple_zns.txt     | 65 +++++++++++++++++++
 .../expunges/6.1.53/ext4/unassigned/all.txt   | 21 ++++++
 .../unassigned/ext4_advanced_features.txt     |  1 +
 .../6.1.53/ext4/unassigned/ext4_defaults.txt  |  5 ++
 .../expunges/6.1.53/xfs/unassigned/all.txt    | 40 ++++++++++++
 .../6.1.53/xfs/unassigned/xfs_crc.txt         |  1 +
 .../6.1.53/xfs/unassigned/xfs_logdev.txt      | 10 +++
 .../6.1.53/xfs/unassigned/xfs_nocrc.txt       |  2 +
 .../6.1.53/xfs/unassigned/xfs_nocrc_512.txt   |  7 ++
 .../xfs/unassigned/xfs_reflink_1024.txt       |  1 +
 .../6.1.53/xfs/unassigned/xfs_rtdev.txt       |  1 +
 14 files changed, 196 insertions(+)
 create mode 100644 workflows/fstests/expunges/6.1.53/btrfs/unassigned/all.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_noraid56.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_simple.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_simple_zns.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/ext4/unassigned/all.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/ext4/unassigned/ext4_advanced_features.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/ext4/unassigned/ext4_defaults.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/all.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_crc.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_logdev.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc_512.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_1024.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_rtdev.txt

diff --git a/workflows/fstests/expunges/6.1.53/btrfs/unassigned/all.txt b/workflows/fstests/expunges/6.1.53/btrfs/unassigned/all.txt
new file mode 100644
index 000000000000..10aeaff40275
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/btrfs/unassigned/all.txt
@@ -0,0 +1,38 @@
+btrfs/011 # crash on raid6_avx5121_gen_syndrome https://gist.github.com/mcgrof/4f4d59a6d6057d2147949cbc49a41b13
+btrfs/029 # lazy baseline - failure found in at least two sections
+btrfs/080 # fails on section btrfs_simple but we expect this to fail for others https://gist.github.com/mcgrof/7ae85812aeacd62ab221eda2fab4552e
+btrfs/099
+btrfs/131 # lazy baseline - failure found in at least two sections
+btrfs/175
+btrfs/176
+btrfs/194
+btrfs/197
+btrfs/216
+btrfs/219
+btrfs/220
+btrfs/223
+btrfs/225
+btrfs/232 # kernel warning btrfs_noraid56 WARNING: CPU: 5 PID: 823784 at fs/btrfs/space-info.h:110 btrfs_free_reserved_data_space+0x179/0x190 https://gist.github.com/mcgrof/041f78010f8094a75cfa9a3a7bcb7d02
+btrfs/238
+btrfs/249
+btrfs/254 # lazy baseline - failure found in at least two sections
+btrfs/258 # lazy baseline - failure found in at least two sections
+btrfs/263 # lazy baseline - failure found in at least two sections
+generic/208 # lazy baseline - dmesg failure rate is about 1/60 try with zswap pressure on the host https://gist.github.com/mcgrof/8296499615800048d658abbadb7ebe22
+generic/224 # fails with a hang on btrfs_raid56 section but let's skip for all sections for now
+generic/226 # lazy baseline - failure found in at least two sections - failure rate 1/20 requires a fix in btrfs-progs I have queued https://gist.github.com/mcgrof/81771a86ef0b90152e142e597d5f4147
+generic/241
+generic/260
+generic/300 # fails on btrfs_simple so chances are other sections should fail too - failure rate is about 1/10 hung task on btrfs_wait_ordered_extents() https://gist.github.com/mcgrof/2696e71d3322becfe3811260fbe1ec3a
+generic/371 # found to have taken once 10 times the amount it took to run the first run so 305s vs 26s for 1173% difference on btrfs_simple, this variablity should be looked into
+generic/373 # lazy baseline - failure found in at least two sections
+generic/374 # lazy baseline - failure found in at least two sections
+generic/471 # broken test
+generic/509 # low-hanging-fruit: device-mapper reload ioctl on flakey-test device or resource busy #  https://gist.github.com/mcgrof/05be5f0b6b9c669bef9481ace6299529
+generic/633
+generic/644
+generic/645
+generic/648 # fails on btrfs_noraid56 section but the error seems generic so skip for now see that section for details - failure rate is about 1/50 https://gist.github.com/mcgrof/c3f6dae20800da6f1bda607d0c0275b3
+generic/673 # lazy baseline - failure found in at least two sections - failure rate 1/4
+generic/679 # lazy baseline - failure found in at least two sections
+shared/298
diff --git a/workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_noraid56.txt b/workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_noraid56.txt
new file mode 100644
index 000000000000..991297edae0b
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_noraid56.txt
@@ -0,0 +1,2 @@
+btrfs/270
+generic/118 # failure rate is about 1/15
diff --git a/workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_simple.txt b/workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_simple.txt
new file mode 100644
index 000000000000..97b689365c66
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_simple.txt
@@ -0,0 +1,2 @@
+btrfs/049 # fails with a low failure rate of about 1/70 https://gist.github.com/mcgrof/ad2d0752f17bc64dacef47dc639e949b
+btrfs/270
diff --git a/workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_simple_zns.txt b/workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_simple_zns.txt
new file mode 100644
index 000000000000..1fb24b6c77b1
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/btrfs/unassigned/btrfs_simple_zns.txt
@@ -0,0 +1,65 @@
+btrfs/003
+btrfs/006
+btrfs/011
+btrfs/038
+btrfs/060
+btrfs/061
+btrfs/062
+btrfs/063
+btrfs/064
+btrfs/065
+btrfs/066
+btrfs/067
+btrfs/068
+btrfs/069
+btrfs/070
+btrfs/071
+btrfs/072
+btrfs/073
+btrfs/074
+btrfs/076
+btrfs/090
+btrfs/132
+btrfs/141
+btrfs/150
+btrfs/151
+btrfs/161
+btrfs/162
+btrfs/163
+btrfs/164
+btrfs/167
+btrfs/184
+btrfs/207
+btrfs/218
+btrfs/233
+btrfs/236
+btrfs/237
+btrfs/239
+btrfs/242
+btrfs/248
+btrfs/271
+generic/015
+generic/027 # hangs forever
+generic/066 # Fails in loop 14
+generic/083
+generic/102
+generic/113
+generic/171
+generic/173
+generic/174
+generic/204 # hangs forever
+generic/269
+generic/273
+generic/275
+generic/297
+generic/298
+generic/301
+generic/320
+generic/333 # stuff just hangs in this
+generic/334 # ran for 3607 seconds then hang detected, failure rate 1/2 - https://gist.github.com/mcgrof/2442b9b7fc015eb8551c018f388beb53
+generic/387
+generic/416
+generic/427
+generic/492
+generic/520
+generic/626 # failure rate 1/3 hung task https://gist.github.com/mcgrof/d4b349e3298fd3b889e790b825a96c77
diff --git a/workflows/fstests/expunges/6.1.53/ext4/unassigned/all.txt b/workflows/fstests/expunges/6.1.53/ext4/unassigned/all.txt
new file mode 100644
index 000000000000..9841c2233b0b
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/ext4/unassigned/all.txt
@@ -0,0 +1,21 @@
+ext4/025 # lazy baseline - failure found in at least two sections
+ext4/034 # lazy baseline - failure found in at least two sections
+generic/050 # lazy baseline - failure found in at least two sections https://gist.github.com/mcgrof/e598053099af6b5eb83c3f318d38bab5
+generic/082 # lazy baseline - failure found in at least two sections
+generic/219 # lazy baseline - failure found in at least two sections
+generic/230 # lazy baseline - failure found in at least two sections
+generic/231 # lazy baseline - failure found in at least two sections
+generic/232 # lazy baseline - failure found in at least two sections
+generic/233 # lazy baseline - failure found in at least two sections
+generic/235 # lazy baseline - failure found in at least two sections
+generic/241 # lazy baseline - failure found in at least two sections
+generic/270 # lazy baseline - failure found in at least two sections
+generic/382 # lazy baseline - failure found in at least two sections
+generic/388 # lazy baseline - failure rate 1/30 failure only appears on xunit file no *.bad file https://gist.github.com/mcgrof/3ec6a4603548d240e5d33a2831a55683
+generic/398 # lazy baseline - failure found in at least two sections
+generic/471 # broken test
+generic/566 # lazy baseline - failure found in at least two sections
+generic/587 # lazy baseline - failure found in at least two sections
+generic/600 # lazy baseline - failure found in at least two sections
+generic/601 # lazy baseline - failure found in at least two sections
+generic/607 # lazy baseline - failure found in at least two sections
diff --git a/workflows/fstests/expunges/6.1.53/ext4/unassigned/ext4_advanced_features.txt b/workflows/fstests/expunges/6.1.53/ext4/unassigned/ext4_advanced_features.txt
new file mode 100644
index 000000000000..4723bde6329f
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/ext4/unassigned/ext4_advanced_features.txt
@@ -0,0 +1 @@
+generic/477 # always fails and never fails with the defaults section after 300 runs https://gist.github.com/mcgrof/8ca888e9f41553573c22ea61b36aa165
diff --git a/workflows/fstests/expunges/6.1.53/ext4/unassigned/ext4_defaults.txt b/workflows/fstests/expunges/6.1.53/ext4/unassigned/ext4_defaults.txt
new file mode 100644
index 000000000000..f79db6016660
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/ext4/unassigned/ext4_defaults.txt
@@ -0,0 +1,5 @@
+generic/127 # sometimes may take about 10 times more than it typically takes (336 seconds)
+generic/459 # failure rate 1/20 https://gist.github.com/mcgrof/58818ec26ca195b22a3cfe8da5e40a7a
+generic/476 # seems to run for over 3603 seconds..
+generic/581 # failure rate is 1/20
+generic/622 # failure rate 1/10
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/all.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/all.txt
new file mode 100644
index 000000000000..a9bfe501e0f5
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/all.txt
@@ -0,0 +1,40 @@
+generic/050 # lazy baseline - failure found in at least two sections
+generic/054 # lazy baseline - failure found in at least two sections
+generic/055 # lazy baseline - failure found in at least two sections
+generic/081 # lazy baseline - failure found in at least two sections
+generic/108 # lazy baseline - failure found in at least two sections
+generic/175 # buggy test uses uninitialized variable i as input argument to truncate
+generic/204 # lazy baseline - failure found in at least two sections
+generic/223 # lazy baseline - failure found in at least two sections
+generic/241
+generic/273 # lazy baseline - failure found in at least two sections
+generic/297 # buggy test uses uninitialized variable i as input argument to truncate
+generic/298 # buggy test uses uninitialized variable i as input argument to truncate
+generic/361 # lazy baseline - failure found in at least two sections
+generic/455 # fails on two sections already
+generic/459 # fails on multiple sections maybe xfsprogs version?
+generic/459 # lazy baseline - failure found in at least two sections
+generic/471 # broken test
+generic/475 # flaky test
+generic/482 # flaky test - failure rate is about 1/15 https://gist.github.com/mcgrof/048243ac4435ee055d7a0e38a2c082da
+generic/530 # lazy baseline - failure rate about 1/15 https://gist.github.com/mcgrof/4129074db592c170e6bf748aa11d783d
+generic/604 # buggy test
+shared/298 # lazy baseline - failure found in at least two sections
+xfs/005 # lazy baseline - failure found in at least two sections
+xfs/008 # unreliable test - fails because "holes has value of 44 holes is NOT in range 45 .. 55" - this is a _within_tolerance test - not an reliable check
+xfs/009 # fails on multiple sections already
+xfs/016 # unreliable test - fails on multiple sections already because of non deterministic calculation of log size
+xfs/059 # fails on multiple sections already
+xfs/060 # fails on multiple sections already
+xfs/154 # lazy baseline - failure found in at least two sections
+xfs/155 # xfs_repair fails (possibly after reducing RAM size to 3GB)
+xfs/157 # lazy baseline - failure found in at least two sections
+xfs/158 # lazy baseline - failure found in at least two sections
+xfs/168 # lazy baseline - failure found in at least two sections
+xfs/199 # lazy baseline - failure found in at least two sections
+xfs/216 # fails on multiple sections maybe xfsprogs version change?
+xfs/294 # lazy baseline - failure found in at least two sections
+xfs/301 # fails on multiple sections already
+xfs/495 # lazy baseline - failure found in at least two sections
+xfs/506
+xfs/598
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_crc.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_crc.txt
new file mode 100644
index 000000000000..51f9ff242061
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_crc.txt
@@ -0,0 +1 @@
+generic/299
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_logdev.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_logdev.txt
new file mode 100644
index 000000000000..db5f60dcf5bf
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_logdev.txt
@@ -0,0 +1,10 @@
+generic/042
+generic/704
+generic/730
+generic/731
+xfs/017
+xfs/045
+xfs/160
+xfs/161
+xfs/273
+xfs/438
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc.txt
new file mode 100644
index 000000000000..5a4c1ed3368b
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc.txt
@@ -0,0 +1,2 @@
+generic/589 # failure rate is 1/10
+xfs/195
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc_512.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc_512.txt
new file mode 100644
index 000000000000..eba91e9ba338
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc_512.txt
@@ -0,0 +1,7 @@
+generic/388 # failure only appears on xunit file no *.bad file
+generic/618
+generic/681
+generic/682
+xfs/071
+xfs/220
+xfs/295 # failure rate is about 1/30 xfs_logprint: unknown log operation type (0) Bad data in log
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_1024.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_1024.txt
new file mode 100644
index 000000000000..4e222f35568a
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_1024.txt
@@ -0,0 +1 @@
+xfs/014 # failure rate is about 1/20
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_rtdev.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_rtdev.txt
new file mode 100644
index 000000000000..a27042912692
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_rtdev.txt
@@ -0,0 +1 @@
+xfs/002 # xfs_growfs: log growth not supported yet
-- 
2.34.1



* [RFC PATCH kdevops 2/2] xfs: merge common expunge lists for v6.1.53
  2023-09-15 23:48 [RFC PATCH kdevops 0/2] augment expunge list for v6.1.53 Frederick Lawler
  2023-09-15 23:48 ` [RFC PATCH kdevops 1/2] fstests/xfs: copy 6.1.42 baseline " Frederick Lawler
@ 2023-09-15 23:48 ` Frederick Lawler
  2023-09-16  9:23 ` [RFC PATCH kdevops 0/2] augment expunge list " Amir Goldstein
  2 siblings, 0 replies; 5+ messages in thread
From: Frederick Lawler @ 2023-09-15 23:48 UTC (permalink / raw)
  To: amir73il, mcgrof; +Cc: kdevops, kernel-team, linux-fsdevel, Frederick Lawler

Signed-off-by: Frederick Lawler <fred@cloudflare.com>
---
 .../6.1.53/xfs/unassigned/xfs_crc.txt         |  6 +++
 .../6.1.53/xfs/unassigned/xfs_logdev.txt      | 16 +++++++
 .../6.1.53/xfs/unassigned/xfs_nocrc.txt       |  5 ++
 .../6.1.53/xfs/unassigned/xfs_nocrc_512.txt   |  5 ++
 .../6.1.53/xfs/unassigned/xfs_reflink.txt     |  5 ++
 .../xfs/unassigned/xfs_reflink_1024.txt       | 11 +++++
 .../xfs/unassigned/xfs_reflink_normapbt.txt   | 10 ++++
 .../6.1.53/xfs/unassigned/xfs_rtdev.txt       | 48 +++++++++++++++++++
 8 files changed, 106 insertions(+)
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink.txt
 create mode 100644 workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_normapbt.txt

diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_crc.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_crc.txt
index 51f9ff242061..05c42bab9f3d 100644
--- a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_crc.txt
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_crc.txt
@@ -1 +1,7 @@
 generic/299
+generic/471
+xfs/075
+xfs/270
+xfs/506
+xfs/513
+xfs/557
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_logdev.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_logdev.txt
index db5f60dcf5bf..32d81aa51620 100644
--- a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_logdev.txt
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_logdev.txt
@@ -1,10 +1,26 @@
 generic/042
+generic/054
+generic/055
+generic/081
+generic/108
+generic/204
+generic/223
+generic/361
+generic/459
+generic/471
 generic/704
 generic/730
 generic/731
+shared/298
+xfs/008
 xfs/017
 xfs/045
+xfs/075
+xfs/158
 xfs/160
 xfs/161
+xfs/199
+xfs/270
 xfs/273
+xfs/294
 xfs/438
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc.txt
index 5a4c1ed3368b..5359fe78a91c 100644
--- a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc.txt
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc.txt
@@ -1,2 +1,7 @@
+generic/471
 generic/589 # failure rate is 1/10
+xfs/075
 xfs/195
+xfs/506
+xfs/513
+xfs/557
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc_512.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc_512.txt
index eba91e9ba338..e58ba3291e43 100644
--- a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc_512.txt
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_nocrc_512.txt
@@ -1,7 +1,12 @@
 generic/388 # failure only appears on xunit file no *.bad file
+generic/471
 generic/618
 generic/681
 generic/682
 xfs/071
+xfs/075
 xfs/220
 xfs/295 # failure rate is about 1/30 xfs_logprint: unknown log operation type (0) Bad data in log
+xfs/506
+xfs/513
+xfs/557
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink.txt
new file mode 100644
index 000000000000..cb826775acee
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink.txt
@@ -0,0 +1,5 @@
+generic/175
+generic/297
+generic/298
+generic/471
+xfs/075
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_1024.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_1024.txt
index 4e222f35568a..67172755b9bd 100644
--- a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_1024.txt
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_1024.txt
@@ -1 +1,12 @@
+generic/175
+generic/297
+generic/298
+generic/471
 xfs/014 # failure rate is about 1/20
+xfs/075
+xfs/168
+xfs/179
+xfs/270
+xfs/506
+xfs/513
+xfs/557
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_normapbt.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_normapbt.txt
new file mode 100644
index 000000000000..97eb2ba03d40
--- /dev/null
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_reflink_normapbt.txt
@@ -0,0 +1,10 @@
+generic/175
+generic/297
+generic/298
+generic/471
+xfs/075
+xfs/179
+xfs/270
+xfs/506
+xfs/513
+xfs/557
diff --git a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_rtdev.txt b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_rtdev.txt
index a27042912692..783bf7adfce9 100644
--- a/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_rtdev.txt
+++ b/workflows/fstests/expunges/6.1.53/xfs/unassigned/xfs_rtdev.txt
@@ -1 +1,49 @@
+generic/012
+generic/013
+generic/015
+generic/016
+generic/021
+generic/022
+generic/027
+generic/058
+generic/060
+generic/061
+generic/063
+generic/074
+generic/075
+generic/077
+generic/096
+generic/102
+generic/112
+generic/113
+generic/171
+generic/172
+generic/173
+generic/174
+generic/175
+generic/204
+generic/224
+generic/226
+generic/251
+generic/256
+generic/269
+generic/270
+generic/273
+generic/274
+generic/275
+generic/297
+generic/298
+generic/300
+generic/312
+generic/361
+generic/371
+generic/416
+generic/427
+generic/449
+generic/471
+generic/488
+generic/511
+generic/515
+generic/520
+generic/558
 xfs/002 # xfs_growfs: log growth not supported yet
-- 
2.34.1



* Re: [RFC PATCH kdevops 0/2] augment expunge list for v6.1.53
  2023-09-15 23:48 [RFC PATCH kdevops 0/2] augment expunge list for v6.1.53 Frederick Lawler
  2023-09-15 23:48 ` [RFC PATCH kdevops 1/2] fstests/xfs: copy 6.1.42 baseline " Frederick Lawler
  2023-09-15 23:48 ` [RFC PATCH kdevops 2/2] xfs: merge common expunge lists " Frederick Lawler
@ 2023-09-16  9:23 ` Amir Goldstein
  2023-09-18 18:52   ` Luis Chamberlain
  2 siblings, 1 reply; 5+ messages in thread
From: Amir Goldstein @ 2023-09-16  9:23 UTC (permalink / raw)
  To: Frederick Lawler
  Cc: mcgrof, kdevops, kernel-team, linux-fsdevel, Chandan Babu R,
	Leah Rumancik, Darrick J. Wong

Hi Frederick!

Nice to see you joining the kdevops gang :)

On Sat, Sep 16, 2023 at 2:49 AM Frederick Lawler <fred@cloudflare.com> wrote:
>
> In an effort to test and prepare patches from XFS to stable 6.1.y [1], I needed
> to make a baseline for v6.1.53 to verify that the backported patches do not
> introduce regressions (if any). However, after a 'make fstests-baseline', we
> observed that compared to v6.1.42, v6.1.53 introduced more than expected
> expunges to XFS. This RFC is an attempt to put some eyes to this and open up a
> discussion.

I refreshed the v6.1.42 expunge list very recently against up-to-date fstests:

commit 0b58b02f08d26ea23b6ff58d9b24488c266f32d0
Author: Amir Goldstein <amir73il@gmail.com>
Date:   Sat Aug 12 12:29:57 2023 +0300

    xfs: expunge new failing tests

    After update of fstests branch to tag v2023.08.06

There are zero changes in xfs code between v6.1.42..v6.1.53, so the
regressions you observed are unlikely to be due to a code change.

If it is not easy for you to test on a v6.1.42 k8s host, I can re-run the
baseline loop with the v6.1.53 kernel to verify there are no regressions,
but I am betting there won't be. So the failures you are seeing must be due
to some difference between our setups.

Note that when I started to use kdevops with libvirt, we observed many
random errors that were eventually attributed to faulty code in the qemu
nvme driver.

I am not ruling out the possibility that the expunge lists that Luis or I
prepared for xfs in some version (5.10.y, 6.1.y, etc.) are tainted with
failures related to our specific setup.

AFAIK, we never bothered to create two different baselines from scratch in
two different envs (e.g. libvirt and GCE/OCI) and compare them.

But as it is, you already have my baseline from libvirt/kvm -
I don't think that it makes sense to add to 6.1.y expunge lists
failures due to test env change, unless you were able to prove that either:
1. Those tests did not run in my env
2. Your env manages to expose a bug that my env did not expose

I can help with #1 by committing results from a run in my env.
#2 is harder - you will need to analyse the failures in your env
and understand them.
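
A possible first step for either point is to list exactly which tests a new
expunge file adds on top of the existing baseline. The following is only a
minimal sketch and is not part of kdevops: the helper names parse_expunges
and new_expunges are hypothetical, and it assumes the "test/name # comment"
line format used by the expunge files in this series. The sample data is the
xfs_crc.txt list before and after patch 2/2.

```python
# Sketch only: hypothetical helpers, not part of kdevops.

def parse_expunges(text):
    """Return the set of test names in an expunge list.

    Assumes one test per line, optionally followed by a '#' comment.
    Blank lines and comment-only lines are ignored.
    """
    tests = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            tests.add(name)
    return tests


def new_expunges(old_text, new_text):
    """Return the sorted tests present in new_text but not in old_text."""
    return sorted(parse_expunges(new_text) - parse_expunges(old_text))


# xfs_crc.txt as copied from the 6.1.42 baseline (patch 1/2).
OLD_XFS_CRC = "generic/299\n"

# xfs_crc.txt after patch 2/2.
NEW_XFS_CRC = """\
generic/299
generic/471
xfs/075
xfs/270
xfs/506
xfs/513
xfs/557
"""

for test in new_expunges(OLD_XFS_CRC, NEW_XFS_CRC):
    print(test)
```

Each test printed this way is a candidate that still needs to be checked
against points 1 and 2 above before it belongs in an expunge list.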

Whenever I see new failures, I always analyse them before adding
to the expunge list and I try to add a comment explaining either the
observed reason for failure or the missing fix if I know it.

>
> At Cloudflare, the Linux team does not have an easy way to obtain dedicated and
> easily configurable server infrastructure to execute kdevops filesystem testing,
> but we do have an easily-configurable kubernetes infrastructure. I prepared a
> POC to spin up virtual machines [2] in kubernetes to emulate what terraform
> may do for OpenStack, Azure, AWS, etc... to perform this test. Therefore, the
> configuration option is set to SKIP_BRINGUP=y
>
> In this baseline, I spun up XFS workflow nodes for:
> - xfs_crc
> - xfs_logdev
> - xfs_nocrc
> - xfs_nocrc_512
> - xfs_reflink
> - xfs_reflink_1024
> - xfs_reflink_normapbt
> - xfs_rtdev
>
> Each node is running a vanilla-stable 6.1.y (6.1.53), and the image is based on
> latest Debian SID [3]. Each node also has its own dedicated /data and /media
> partitions to store Linux, fstests, etc. and the sparse images, respectively.
>
> In v6.1.42, we don't currently have expunges for xfs_reflink_normapbt, and
> xfs_reflink. So those are _new_. The rest had significant additions. However,
> not all nodes finished their testing after >12hrs of run time. Some appeared to
> be stuck, in particular xfs_rtdev, and never finished (reason unknown).
> I CTRL+C and ran 'make fstests-results'.
>
> I prepared a fork [4] where the results 6.1.53.xz can be found.
>
> These patches are based on top of commit 0ec98182f4a9 ("bootlinux/fstests:
> remove odd hplip user")
>
> Links:
> 1: https://lore.kernel.org/all/CAOQ4uxgvawD4=4g8BaRiNvyvKN1oreuov_ie6sK6arq3bf8fxw@mail.gmail.com/
> 2: https://kubevirt.io/api-reference/v1.0.0/definitions.html#_v1_virtualmachine
> 3: https://cloud.debian.org/images/cloud/sid/daily/latest/ (debian-sid-genericcloud-amd64-daily.qcow2)
> 4: https://github.com/fredlawl/kdevops/commit/afcb8fe7c4498d2be5386e191db3534f651a3730#diff-0677846133ad9128bf752f674b3c8da437c12ce28f48d8890b9f66d0dcb3717c
>
> Frederick Lawler (2):
>   fstests/xfs: copy 6.1.42 baseline for v6.1.53

In this commit you copied also the ext4 and btrfs expunge lists.
That is not needed as you are not changing or intend to change them.

I don't think that forking xfs lists is going to be needed at all
once you verified what happened - if your findings are indeed
correct they probably belong in the v6.1.42 expunge list.

>   xfs: merge common expunge lists for v6.1.53

The title of this commit does not represent the change correctly.
What this commit does is to add many new tests to the 6.1.53
expunge list.

Your confusion must come from seeing my commits like:
8745d44 xfs: merge common expunge lists for v6.1.42

What these commits do is to merge common failures
in xfs_* config specific expunge lists into the common all.txt
expunge list - there are scripts that do that:
./scripts/workflows/fstests/{find,remove}-common-failures.sh
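
To illustrate the idea of merging common failures, here is a minimal sketch
in Python. It does not reproduce the scripts named above; the function name
common_failures is hypothetical, and the min_sections threshold of 2 is an
assumption taken from the "failure found in at least two sections" comments
in all.txt. The sample lists are three of the per-section files as they look
after patch 2/2.

```python
# Sketch only: a hypothetical helper, not the kdevops scripts.
from collections import Counter


def common_failures(sections, min_sections=2):
    """Return the sorted tests that appear in at least min_sections lists.

    sections maps a section name (e.g. "xfs_crc") to an iterable of test
    names taken from that section's expunge list.
    """
    counts = Counter()
    for tests in sections.values():
        # Use a set so a test listed twice in one file counts once.
        counts.update(set(tests))
    return sorted(t for t, n in counts.items() if n >= min_sections)


SECTIONS = {
    "xfs_crc": [
        "generic/299", "generic/471", "xfs/075", "xfs/270",
        "xfs/506", "xfs/513", "xfs/557",
    ],
    "xfs_reflink": [
        "generic/175", "generic/297", "generic/298", "generic/471",
        "xfs/075",
    ],
    "xfs_reflink_normapbt": [
        "generic/175", "generic/297", "generic/298", "generic/471",
        "xfs/075", "xfs/179", "xfs/270", "xfs/506", "xfs/513", "xfs/557",
    ],
}

# Candidates for all.txt: tests failing in at least two sections.
for test in common_failures(SECTIONS):
    print(test)
```

Tests reported this way would move to all.txt and be dropped from the
per-section lists, which is the change the "merge common expunge lists"
title describes.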

Thanks,
Amir.


* Re: [RFC PATCH kdevops 0/2] augment expunge list for v6.1.53
  2023-09-16  9:23 ` [RFC PATCH kdevops 0/2] augment expunge list " Amir Goldstein
@ 2023-09-18 18:52   ` Luis Chamberlain
  0 siblings, 0 replies; 5+ messages in thread
From: Luis Chamberlain @ 2023-09-18 18:52 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Frederick Lawler, kdevops, kernel-team, linux-fsdevel,
	Chandan Babu R, Leah Rumancik, Darrick J. Wong

On Sat, Sep 16, 2023 at 12:23:54PM +0300, Amir Goldstein wrote:
> AFAIK, we never bothered to create two different baselines from scratch in
> two different envs (e.g. libvirt and GCE/OCI) and compare them.

The experience so far has been that sparse files help find more issues
than real drives, which is why they are the default. Even if you use a
cloud solution you can use sparse files too. Only recently did I add
support for using real NVMe drives and creating partitions on them. That
should find fewer issues; however, I did the work in order to be able to
test LBS devices.

> But as it is, you already have my baseline from libvirt/kvm -
> I don't think that it makes sense to add to 6.1.y expunge lists
> failures due to test env change, unless you were able to prove that either:
> 1. Those tests did not run in my env
> 2. Your env manages to expose a bug that my env did not expose
> 
> I can help with #1 by committing results from a run in my env.
> #2 is harder - you will need to analyse the failures in your env
> and understand them.

My guess so far is that the older expunges used an older version of
fstests.

  Luis

