* [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
@ 2022-06-16 18:27 Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 1/8] xfs: use kmem_cache_free() for kmem_cache objects Leah Rumancik
                   ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Leah Rumancik @ 2022-06-16 18:27 UTC (permalink / raw)
  To: linux-xfs; +Cc: mcgrof, Leah Rumancik

The patch testing has been increased to 100 runs per test on each 
config. A baseline without the patches was established with 100 runs 
to help detect hard failures / tests with a high fail rate. Any 
failures seen in the backports branch but not in the baseline branch 
were then run 1000+ times on both the baseline and backport branches 
and the failure rates compared. The failures seen on the 5.15 
baseline are listed at 
https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23. 
No regressions were seen with these patches.

To make the review process easier, I have been coordinating with Amir 
who has been testing this same set of patches on 5.10. He will be 
sending out the corresponding 5.10 series shortly.

Change log from v1 
(https://lore.kernel.org/all/20220603184701.3117780-1-leah.rumancik@gmail.com/):
- Increased testing
- Reduced patch set to overlap with 5.10 patches

Thanks,
Leah

Brian Foster (1):
  xfs: punch out data fork delalloc blocks on COW writeback failure

Darrick J. Wong (4):
  xfs: remove all COW fork extents when remounting readonly
  xfs: prevent UAF in xfs_log_item_in_current_chkpt
  xfs: only bother with sync_filesystem during readonly remount
  xfs: use setattr_copy to set vfs inode attributes

Dave Chinner (1):
  xfs: check sb_meta_uuid for dabuf buffer recovery

Rustam Kovhaev (1):
  xfs: use kmem_cache_free() for kmem_cache objects

Yang Xu (1):
  xfs: Fix the free logic of state in xfs_attr_node_hasname

 fs/xfs/libxfs/xfs_attr.c      | 17 +++++------
 fs/xfs/xfs_aops.c             | 15 ++++++++--
 fs/xfs/xfs_buf_item_recover.c |  2 +-
 fs/xfs/xfs_extfree_item.c     |  6 ++--
 fs/xfs/xfs_iops.c             | 56 ++---------------------------------
 fs/xfs/xfs_log_cil.c          |  6 ++--
 fs/xfs/xfs_pnfs.c             |  3 +-
 fs/xfs/xfs_super.c            | 21 +++++++++----
 8 files changed, 47 insertions(+), 79 deletions(-)

-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 5.15 CANDIDATE v2 1/8] xfs: use kmem_cache_free() for kmem_cache objects
  2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
@ 2022-06-16 18:27 ` Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 2/8] xfs: punch out data fork delalloc blocks on COW writeback failure Leah Rumancik
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Leah Rumancik @ 2022-06-16 18:27 UTC (permalink / raw)
  To: linux-xfs; +Cc: mcgrof, Rustam Kovhaev, Darrick J . Wong, Leah Rumancik

From: Rustam Kovhaev <rkovhaev@gmail.com>

[ Upstream commit c30a0cbd07ecc0eec7b3cd568f7b1c7bb7913f93 ]

For kmalloc() allocations, SLOB prepends the blocks with a 4-byte header
in which it stores the size of the allocated block. Blocks allocated
with kmem_cache_alloc() do not have that header.

SLOB explodes when you allocate memory with kmem_cache_alloc() and then
try to free it with kfree() instead of kmem_cache_free(): SLOB assumes
there is a header when there is none, reads garbage into the size
variable, and corrupts the adjacent objects, which eventually leads to a
hang or panic.

Let's make XFS work with SLOB by using the proper free function.

Fixes: 9749fee83f38 ("xfs: enable the xfs_defer mechanism to process extents to free")
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 fs/xfs/xfs_extfree_item.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 3f8a0713573a..a4b8caa2c601 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -482,7 +482,7 @@ xfs_extent_free_finish_item(
 			free->xefi_startblock,
 			free->xefi_blockcount,
 			&free->xefi_oinfo, free->xefi_skip_discard);
-	kmem_free(free);
+	kmem_cache_free(xfs_bmap_free_item_zone, free);
 	return error;
 }
 
@@ -502,7 +502,7 @@ xfs_extent_free_cancel_item(
 	struct xfs_extent_free_item	*free;
 
 	free = container_of(item, struct xfs_extent_free_item, xefi_list);
-	kmem_free(free);
+	kmem_cache_free(xfs_bmap_free_item_zone, free);
 }
 
 const struct xfs_defer_op_type xfs_extent_free_defer_type = {
@@ -564,7 +564,7 @@ xfs_agfl_free_finish_item(
 	extp->ext_len = free->xefi_blockcount;
 	efdp->efd_next_extent++;
 
-	kmem_free(free);
+	kmem_cache_free(xfs_bmap_free_item_zone, free);
 	return error;
 }
 
-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH 5.15 CANDIDATE v2 2/8] xfs: punch out data fork delalloc blocks on COW writeback failure
  2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 1/8] xfs: use kmem_cache_free() for kmem_cache objects Leah Rumancik
@ 2022-06-16 18:27 ` Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 3/8] xfs: Fix the free logic of state in xfs_attr_node_hasname Leah Rumancik
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Leah Rumancik @ 2022-06-16 18:27 UTC (permalink / raw)
  To: linux-xfs; +Cc: mcgrof, Brian Foster, Darrick J . Wong, Leah Rumancik

From: Brian Foster <bfoster@redhat.com>

[ Upstream commit 5ca5916b6bc93577c360c06cb7cdf71adb9b5faf ]

If writeback I/O to a COW extent fails, the COW fork blocks are
punched out and the data fork blocks left alone. It is possible for
COW fork blocks to overlap non-shared data fork blocks (due to
cowextsz hint prealloc), however, and writeback unconditionally maps
to the COW fork whenever blocks exist at the corresponding offset of
the page undergoing writeback. This means it's quite possible for a
COW fork extent to overlap delalloc data fork blocks, writeback to
convert and map to the COW fork blocks, writeback to fail, and
finally for ioend completion to cancel the COW fork blocks and leave
stale data fork delalloc blocks around in the inode. The blocks are
effectively stale because writeback failure also discards dirty page
state.

If this occurs, it is likely to trigger assert failures, free space
accounting corruption and failures in unrelated file operations. For
example, a subsequent reflink attempt of the affected file to a new
target file will trip over the stale delalloc in the source file and
fail. Several of these issues are occasionally reproduced by
generic/648, but are reproducible on demand with the right sequence
of operations and timely I/O error injection.

To fix this problem, update the ioend failure path to also punch out
underlying data fork delalloc blocks on I/O error. This is analogous
to the writeback submission failure path in xfs_discard_page() where
we might fail to map data fork delalloc blocks and consistent with
the successful COW writeback completion path, which is responsible
for unmapping from the data fork and remapping in COW fork blocks.

Fixes: 787eb485509f ("xfs: fix and streamline error handling in xfs_end_io")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 fs/xfs/xfs_aops.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 34fc6148032a..c8c15c3c3147 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -82,6 +82,7 @@ xfs_end_ioend(
 	struct iomap_ioend	*ioend)
 {
 	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
+	struct xfs_mount	*mp = ip->i_mount;
 	xfs_off_t		offset = ioend->io_offset;
 	size_t			size = ioend->io_size;
 	unsigned int		nofs_flag;
@@ -97,18 +98,26 @@ xfs_end_ioend(
 	/*
 	 * Just clean up the in-memory structures if the fs has been shut down.
 	 */
-	if (xfs_is_shutdown(ip->i_mount)) {
+	if (xfs_is_shutdown(mp)) {
 		error = -EIO;
 		goto done;
 	}
 
 	/*
-	 * Clean up any COW blocks on an I/O error.
+	 * Clean up all COW blocks and underlying data fork delalloc blocks on
+	 * I/O error. The delalloc punch is required because this ioend was
+	 * mapped to blocks in the COW fork and the associated pages are no
+	 * longer dirty. If we don't remove delalloc blocks here, they become
+	 * stale and can corrupt free space accounting on unmount.
 	 */
 	error = blk_status_to_errno(ioend->io_bio->bi_status);
 	if (unlikely(error)) {
-		if (ioend->io_flags & IOMAP_F_SHARED)
+		if (ioend->io_flags & IOMAP_F_SHARED) {
 			xfs_reflink_cancel_cow_range(ip, offset, size, true);
+			xfs_bmap_punch_delalloc_range(ip,
+						      XFS_B_TO_FSBT(mp, offset),
+						      XFS_B_TO_FSB(mp, size));
+		}
 		goto done;
 	}
 
-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH 5.15 CANDIDATE v2 3/8] xfs: Fix the free logic of state in xfs_attr_node_hasname
  2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 1/8] xfs: use kmem_cache_free() for kmem_cache objects Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 2/8] xfs: punch out data fork delalloc blocks on COW writeback failure Leah Rumancik
@ 2022-06-16 18:27 ` Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 4/8] xfs: remove all COW fork extents when remounting readonly Leah Rumancik
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Leah Rumancik @ 2022-06-16 18:27 UTC (permalink / raw)
  To: linux-xfs; +Cc: mcgrof, Yang Xu, Darrick J . Wong, Leah Rumancik

From: Yang Xu <xuyang2018.jy@fujitsu.com>

[ Upstream commit a1de97fe296c52eafc6590a3506f4bbd44ecb19a ]

When running xfstests xfs/126 on the latest upstream kernel, it hangs on some machines.
By adding a getxattr operation after the xattr has been corrupted, I can reproduce the hang 100% of the time.

The deadlock as below:
[983.923403] task:setfattr        state:D stack:    0 pid:17639 ppid: 14687 flags:0x00000080
[  983.923405] Call Trace:
[  983.923410]  __schedule+0x2c4/0x700
[  983.923412]  schedule+0x37/0xa0
[  983.923414]  schedule_timeout+0x274/0x300
[  983.923416]  __down+0x9b/0xf0
[  983.923451]  ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[  983.923453]  down+0x3b/0x50
[  983.923471]  xfs_buf_lock+0x33/0xf0 [xfs]
[  983.923490]  xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[  983.923508]  xfs_buf_get_map+0x4c/0x320 [xfs]
[  983.923525]  xfs_buf_read_map+0x53/0x310 [xfs]
[  983.923541]  ? xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923560]  xfs_trans_read_buf_map+0x1cf/0x360 [xfs]
[  983.923575]  ? xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923590]  xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923606]  xfs_da3_node_read+0x1f/0x40 [xfs]
[  983.923621]  xfs_da3_node_lookup_int+0x69/0x4a0 [xfs]
[  983.923624]  ? kmem_cache_alloc+0x12e/0x270
[  983.923637]  xfs_attr_node_hasname+0x6e/0xa0 [xfs]
[  983.923651]  xfs_has_attr+0x6e/0xd0 [xfs]
[  983.923664]  xfs_attr_set+0x273/0x320 [xfs]
[  983.923683]  xfs_xattr_set+0x87/0xd0 [xfs]
[  983.923686]  __vfs_removexattr+0x4d/0x60
[  983.923688]  __vfs_removexattr_locked+0xac/0x130
[  983.923689]  vfs_removexattr+0x4e/0xf0
[  983.923690]  removexattr+0x4d/0x80
[  983.923693]  ? __check_object_size+0xa8/0x16b
[  983.923695]  ? strncpy_from_user+0x47/0x1a0
[  983.923696]  ? getname_flags+0x6a/0x1e0
[  983.923697]  ? _cond_resched+0x15/0x30
[  983.923699]  ? __sb_start_write+0x1e/0x70
[  983.923700]  ? mnt_want_write+0x28/0x50
[  983.923701]  path_removexattr+0x9b/0xb0
[  983.923702]  __x64_sys_removexattr+0x17/0x20
[  983.923704]  do_syscall_64+0x5b/0x1a0
[  983.923705]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  983.923707] RIP: 0033:0x7f080f10ee1b

When getxattr calls xfs_attr_node_get, xfs_da3_node_lookup_int fails with -EFSCORRUPTED in
xfs_attr_node_hasname because we used blocktrash to corrupt the block in xfs/126.
xfs_attr_node_hasname then frees the state internally, so xfs_attr_node_get never releases
the buffers held in the state.

The subsequent removexattr then hangs on one of those buffers.

This bug was introduced by commit 07120f1abdff ("xfs: Add xfs_has_attr and subroutines"),
which added the xfs_attr_node_hasname helper and stated that the caller would be responsible
for freeing the state in this case. However, when xfs_da3_node_lookup_int fails,
xfs_attr_node_hasname frees the state itself instead of leaving that to the caller.

Fix this bug by moving the freeing of the state into the callers.

Also, use "goto error/out" instead of returning the error directly in
xfs_attr_node_addname_find_attr and xfs_attr_node_removename_setup, because these
functions must now free the state themselves.

Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines")
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 fs/xfs/libxfs/xfs_attr.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index fbc9d816882c..23523b802539 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1077,21 +1077,18 @@ xfs_attr_node_hasname(
 
 	state = xfs_da_state_alloc(args);
 	if (statep != NULL)
-		*statep = NULL;
+		*statep = state;
 
 	/*
 	 * Search to see if name exists, and get back a pointer to it.
 	 */
 	error = xfs_da3_node_lookup_int(state, &retval);
-	if (error) {
-		xfs_da_state_free(state);
-		return error;
-	}
+	if (error)
+		retval = error;
 
-	if (statep != NULL)
-		*statep = state;
-	else
+	if (!statep)
 		xfs_da_state_free(state);
+
 	return retval;
 }
 
@@ -1112,7 +1109,7 @@ xfs_attr_node_addname_find_attr(
 	 */
 	retval = xfs_attr_node_hasname(args, &dac->da_state);
 	if (retval != -ENOATTR && retval != -EEXIST)
-		return retval;
+		goto error;
 
 	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
 		goto error;
@@ -1337,7 +1334,7 @@ int xfs_attr_node_removename_setup(
 
 	error = xfs_attr_node_hasname(args, state);
 	if (error != -EEXIST)
-		return error;
+		goto out;
 	error = 0;
 
 	ASSERT((*state)->path.blk[(*state)->path.active - 1].bp != NULL);
-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH 5.15 CANDIDATE v2 4/8] xfs: remove all COW fork extents when remounting readonly
  2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
                   ` (2 preceding siblings ...)
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 3/8] xfs: Fix the free logic of state in xfs_attr_node_hasname Leah Rumancik
@ 2022-06-16 18:27 ` Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 5/8] xfs: check sb_meta_uuid for dabuf buffer recovery Leah Rumancik
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Leah Rumancik @ 2022-06-16 18:27 UTC (permalink / raw)
  To: linux-xfs
  Cc: mcgrof, Darrick J. Wong, Dave Chinner, Chandan Babu R, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 089558bc7ba785c03815a49c89e28ad9b8de51f9 ]

As part of multiple customer escalations due to file data corruption
after copy on write operations, I wrote some fstests that use fsstress
to hammer on COW to shake things loose.  Regrettably, I caught some
filesystem shutdowns due to incorrect rmap operations with the following
loop:

mount <filesystem>				# (0)
fsstress <run only readonly ops> &		# (1)
while true; do
	fsstress <run all ops>
	mount -o remount,ro			# (2)
	fsstress <run only readonly ops>
	mount -o remount,rw			# (3)
done

When (2) happens, notice that (1) is still running.  xfs_remount_ro will
call xfs_blockgc_stop to walk the inode cache to free all the COW
extents, but the blockgc mechanism races with (1)'s reader threads to
take IOLOCKs and loses, which means that it doesn't clean them all out.
Call such a file (A).

When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
walks the ondisk refcount btree and frees any COW extent that it finds.
This function does not check the inode cache, which means that incore
COW forks of inode (A) is now inconsistent with the ondisk metadata.  If
one of those former COW extents are allocated and mapped into another
file (B) and someone triggers a COW to the stale reservation in (A), A's
dirty data will be written into (B) and once that's done, those blocks
will be transferred to (A)'s data fork without bumping the refcount.

The results are catastrophic -- file (B) and the refcount btree are now
corrupt.  Solve this race by forcing the xfs_blockgc_free_space to run
synchronously, which causes xfs_icwalk to return to inodes that were
skipped because the blockgc code couldn't take the IOLOCK.  This is safe
to do here because the VFS has already prohibited new writer threads.

Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 fs/xfs/xfs_super.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 170fee98c45c..23673703618a 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1768,7 +1768,10 @@ static int
 xfs_remount_ro(
 	struct xfs_mount	*mp)
 {
-	int error;
+	struct xfs_icwalk	icw = {
+		.icw_flags	= XFS_ICWALK_FLAG_SYNC,
+	};
+	int			error;
 
 	/*
 	 * Cancel background eofb scanning so it cannot race with the final
@@ -1776,8 +1779,13 @@ xfs_remount_ro(
 	 */
 	xfs_blockgc_stop(mp);
 
-	/* Get rid of any leftover CoW reservations... */
-	error = xfs_blockgc_free_space(mp, NULL);
+	/*
+	 * Clear out all remaining COW staging extents and speculative post-EOF
+	 * preallocations so that we don't leave inodes requiring inactivation
+	 * cleanups during reclaim on a read-only mount.  We must process every
+	 * cached inode, so this requires a synchronous cache scan.
+	 */
+	error = xfs_blockgc_free_space(mp, &icw);
 	if (error) {
 		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 		return error;
-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH 5.15 CANDIDATE v2 5/8] xfs: check sb_meta_uuid for dabuf buffer recovery
  2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
                   ` (3 preceding siblings ...)
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 4/8] xfs: remove all COW fork extents when remounting readonly Leah Rumancik
@ 2022-06-16 18:27 ` Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 6/8] xfs: prevent UAF in xfs_log_item_in_current_chkpt Leah Rumancik
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Leah Rumancik @ 2022-06-16 18:27 UTC (permalink / raw)
  To: linux-xfs; +Cc: mcgrof, Dave Chinner, Darrick J . Wong, Leah Rumancik

From: Dave Chinner <dchinner@redhat.com>

[ Upstream commit 09654ed8a18cfd45027a67d6cbca45c9ea54feab ]

Got a report that a repeated crash test of a container host would
eventually fail with a log recovery error preventing the system from
mounting the root filesystem. It manifested as a directory leaf node
corruption on writeback like so:

 XFS (loop0): Mounting V5 Filesystem
 XFS (loop0): Starting recovery (logdev: internal)
 XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
 XFS (loop0): Unmount and run xfs_repair
 XFS (loop0): First 128 bytes of corrupted metadata buffer:
 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
 XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
 XFS (loop0): Please unmount the filesystem and rectify the problem(s)
 XFS (loop0): log mount/recovery failed: error -117
 XFS (loop0): log mount failed

Tracing indicated that we were recovering changes from a transaction
at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
That is, log recovery was overwriting a buffer with newer changes on
disk than was in the transaction. Tracing indicated that we were
hitting the "recovery immediately" case in
xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
buffer.

The code was extracting the LSN correctly, then ignoring it because
the UUID in the buffer did not match the superblock UUID. The
problem arises because the UUID check uses the wrong UUID - it
should be checking the sb_meta_uuid, not sb_uuid. This filesystem
has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
correct matching sb_meta_uuid in it, it's just the code checked it
against the wrong superblock uuid.

There is no corruption in the filesystem, and failing to recover the
buffer due to a write verifier failure means the recovery bug did
not propagate the corruption to disk. Hence there is no corruption
before or after this bug has manifested, the impact is limited
simply to an unmountable filesystem....

This was missed back in 2015 during an audit of incorrect sb_uuid
usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
to validate against sb_meta_uuid") that fixed the magic32 buffers to
validate against sb_meta_uuid instead of sb_uuid. It missed the
magicda buffers....

Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 fs/xfs/xfs_buf_item_recover.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index a476c7ef5d53..991fbf1eb564 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn(
 	}
 
 	if (lsn != (xfs_lsn_t)-1) {
-		if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
+		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
 			goto recover_immediately;
 		return lsn;
 	}
-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH 5.15 CANDIDATE v2 6/8] xfs: prevent UAF in xfs_log_item_in_current_chkpt
  2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
                   ` (4 preceding siblings ...)
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 5/8] xfs: check sb_meta_uuid for dabuf buffer recovery Leah Rumancik
@ 2022-06-16 18:27 ` Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 7/8] xfs: only bother with sync_filesystem during readonly remount Leah Rumancik
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Leah Rumancik @ 2022-06-16 18:27 UTC (permalink / raw)
  To: linux-xfs; +Cc: mcgrof, Darrick J. Wong, Dave Chinner, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit f8d92a66e810acbef6ddbc0bd0cbd9b117ce8acd ]

While I was running with KASAN and lockdep enabled, I stumbled upon a
KASAN report about a UAF to a freed CIL checkpoint.  Looking at the
comment for xfs_log_item_in_current_chkpt, it seems pretty obvious to me
that the original patch to xfs_defer_finish_noroll should have done
something to lock the CIL to prevent it from switching the CIL contexts
while the predicate runs.

For upper level code that needs to know if a given log item is new
enough not to need relogging, add a new wrapper that takes the CIL
context lock long enough to sample the current CIL context.  This is
kind of racy in that the CIL can switch the contexts immediately after
sampling, but that's ok because the consequence is that the defer ops
code is a little slow to relog items.

 ==================================================================
 BUG: KASAN: use-after-free in xfs_log_item_in_current_chkpt+0x139/0x160 [xfs]
 Read of size 8 at addr ffff88804ea5f608 by task fsstress/527999

 CPU: 1 PID: 527999 Comm: fsstress Tainted: G      D      5.16.0-rc4-xfsx #rc4
 Call Trace:
  <TASK>
  dump_stack_lvl+0x45/0x59
  print_address_description.constprop.0+0x1f/0x140
  kasan_report.cold+0x83/0xdf
  xfs_log_item_in_current_chkpt+0x139/0x160
  xfs_defer_finish_noroll+0x3bb/0x1e30
  __xfs_trans_commit+0x6c8/0xcf0
  xfs_reflink_remap_extent+0x66f/0x10e0
  xfs_reflink_remap_blocks+0x2dd/0xa90
  xfs_file_remap_range+0x27b/0xc30
  vfs_dedupe_file_range_one+0x368/0x420
  vfs_dedupe_file_range+0x37c/0x5d0
  do_vfs_ioctl+0x308/0x1260
  __x64_sys_ioctl+0xa1/0x170
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f2c71a2950b
 Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffe8c0e03c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00005600862a8740 RCX: 00007f2c71a2950b
 RDX: 00005600862a7be0 RSI: 00000000c0189436 RDI: 0000000000000004
 RBP: 000000000000000b R08: 0000000000000027 R09: 0000000000000003
 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000005a
 R13: 00005600862804a8 R14: 0000000000016000 R15: 00005600862a8a20
  </TASK>

 Allocated by task 464064:
  kasan_save_stack+0x1e/0x50
  __kasan_kmalloc+0x81/0xa0
  kmem_alloc+0xcd/0x2c0 [xfs]
  xlog_cil_ctx_alloc+0x17/0x1e0 [xfs]
  xlog_cil_push_work+0x141/0x13d0 [xfs]
  process_one_work+0x7f6/0x1380
  worker_thread+0x59d/0x1040
  kthread+0x3b0/0x490
  ret_from_fork+0x1f/0x30

 Freed by task 51:
  kasan_save_stack+0x1e/0x50
  kasan_set_track+0x21/0x30
  kasan_set_free_info+0x20/0x30
  __kasan_slab_free+0xed/0x130
  slab_free_freelist_hook+0x7f/0x160
  kfree+0xde/0x340
  xlog_cil_committed+0xbfd/0xfe0 [xfs]
  xlog_cil_process_committed+0x103/0x1c0 [xfs]
  xlog_state_do_callback+0x45d/0xbd0 [xfs]
  xlog_ioend_work+0x116/0x1c0 [xfs]
  process_one_work+0x7f6/0x1380
  worker_thread+0x59d/0x1040
  kthread+0x3b0/0x490
  ret_from_fork+0x1f/0x30

 Last potentially related work creation:
  kasan_save_stack+0x1e/0x50
  __kasan_record_aux_stack+0xb7/0xc0
  insert_work+0x48/0x2e0
  __queue_work+0x4e7/0xda0
  queue_work_on+0x69/0x80
  xlog_cil_push_now.isra.0+0x16b/0x210 [xfs]
  xlog_cil_force_seq+0x1b7/0x850 [xfs]
  xfs_log_force_seq+0x1c7/0x670 [xfs]
  xfs_file_fsync+0x7c1/0xa60 [xfs]
  __x64_sys_fsync+0x52/0x80
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

 The buggy address belongs to the object at ffff88804ea5f600
  which belongs to the cache kmalloc-256 of size 256
 The buggy address is located 8 bytes inside of
  256-byte region [ffff88804ea5f600, ffff88804ea5f700)
 The buggy address belongs to the page:
 page:ffffea00013a9780 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804ea5ea00 pfn:0x4ea5e
 head:ffffea00013a9780 order:1 compound_mapcount:0
 flags: 0x4fff80000010200(slab|head|node=1|zone=1|lastcpupid=0xfff)
 raw: 04fff80000010200 ffffea0001245908 ffffea00011bd388 ffff888004c42b40
 raw: ffff88804ea5ea00 0000000000100009 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff88804ea5f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff88804ea5f580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 >ffff88804ea5f600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
  ffff88804ea5f680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88804ea5f700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ==================================================================

Fixes: 4e919af7827a ("xfs: periodically relog deferred intent items")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 fs/xfs/xfs_log_cil.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 6c93c8ada6f3..b59cc9c0961c 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -1442,9 +1442,9 @@ xlog_cil_force_seq(
  */
 bool
 xfs_log_item_in_current_chkpt(
-	struct xfs_log_item *lip)
+	struct xfs_log_item	*lip)
 {
-	struct xfs_cil_ctx *ctx = lip->li_mountp->m_log->l_cilp->xc_ctx;
+	struct xfs_cil		*cil = lip->li_mountp->m_log->l_cilp;
 
 	if (list_empty(&lip->li_cil))
 		return false;
@@ -1454,7 +1454,7 @@ xfs_log_item_in_current_chkpt(
 	 * first checkpoint it is written to. Hence if it is different to the
 	 * current sequence, we're in a new checkpoint.
 	 */
-	return lip->li_seq == ctx->sequence;
+	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
 }
 
 /*
-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH 5.15 CANDIDATE v2 7/8] xfs: only bother with sync_filesystem during readonly remount
  2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
                   ` (5 preceding siblings ...)
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 6/8] xfs: prevent UAF in xfs_log_item_in_current_chkpt Leah Rumancik
@ 2022-06-16 18:27 ` Leah Rumancik
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 8/8] xfs: use setattr_copy to set vfs inode attributes Leah Rumancik
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Leah Rumancik @ 2022-06-16 18:27 UTC (permalink / raw)
  To: linux-xfs; +Cc: mcgrof, Darrick J. Wong, Dave Chinner, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit b97cca3ba9098522e5a1c3388764ead42640c1a5 ]

In commit 02b9984d6408, we pushed a sync_filesystem() call from the VFS
into xfs_fs_remount.  The only time that we ever need to push dirty file
data or metadata to disk for a remount is if we're remounting the
filesystem read only, so this really could be moved to xfs_remount_ro.

Once we've moved the call site, actually check the return value from
sync_filesystem.

Fixes: 02b9984d6408 ("fs: push sync_filesystem() down to the file system's remount_fs()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 fs/xfs/xfs_super.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 23673703618a..e8d19916ba99 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1773,6 +1773,11 @@ xfs_remount_ro(
 	};
 	int			error;
 
+	/* Flush all the dirty data to disk. */
+	error = sync_filesystem(mp->m_super);
+	if (error)
+		return error;
+
 	/*
 	 * Cancel background eofb scanning so it cannot race with the final
 	 * log force+buftarg wait and deadlock the remount.
@@ -1851,8 +1856,6 @@ xfs_fs_reconfigure(
 	if (error)
 		return error;
 
-	sync_filesystem(mp->m_super);
-
 	/* inode32 -> inode64 */
 	if (xfs_has_small_inums(mp) && !xfs_has_small_inums(new_mp)) {
 		mp->m_features &= ~XFS_FEAT_SMALL_INUMS;
-- 
2.36.1.476.g0c4daa206d-goog



* [PATCH 5.15 CANDIDATE v2 8/8] xfs: use setattr_copy to set vfs inode attributes
  2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
                   ` (6 preceding siblings ...)
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 7/8] xfs: only bother with sync_filesystem during readonly remount Leah Rumancik
@ 2022-06-16 18:27 ` Leah Rumancik
  2022-06-17  7:27   ` Amir Goldstein
  2022-06-22  0:07 ` [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Luis Chamberlain
  2022-06-22 16:23 ` Darrick J. Wong
  9 siblings, 1 reply; 32+ messages in thread
From: Leah Rumancik @ 2022-06-16 18:27 UTC (permalink / raw)
  To: linux-xfs
  Cc: mcgrof, Darrick J. Wong, Dave Chinner, Christoph Hellwig,
	Christian Brauner, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit e014f37db1a2d109afa750042ac4d69cf3e3d88e ]

Filipe Manana pointed out that XFS' behavior w.r.t. setuid/setgid
revocation isn't consistent with btrfs[1] or ext4.  Those two
filesystems use the VFS function setattr_copy to convey certain
attributes from struct iattr into the VFS inode structure.

Andrey Zhadchenko reported[2] that XFS uses the wrong user namespace to
decide if it should clear setgid and setuid on a file attribute update.
This is a second symptom of the problem that Filipe noticed.

XFS, on the other hand, open-codes setattr_copy in xfs_setattr_mode,
xfs_setattr_nonsize, and xfs_setattr_time.  Regrettably, setattr_copy is
/not/ a simple copy function; it contains additional logic to clear the
setgid bit when setting the mode, and XFS' version no longer matches.

The VFS implements its own setuid/setgid stripping logic, which
establishes consistent behavior.  It's a tad unfortunate that it's
scattered across notify_change, should_remove_suid, and setattr_copy but
XFS should really follow the Linux VFS.  Adapt XFS to use the VFS
functions and get rid of the old functions.

[1] https://lore.kernel.org/fstests/CAL3q7H47iNQ=Wmk83WcGB-KBJVOEtR9+qGczzCeXJ9Y2KCV25Q@mail.gmail.com/
[2] https://lore.kernel.org/linux-xfs/20220221182218.748084-1-andrey.zhadchenko@virtuozzo.com/

Fixes: 7fa294c8991c ("userns: Allow chown and setgid preservation")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 fs/xfs/xfs_iops.c | 56 +++--------------------------------------------
 fs/xfs/xfs_pnfs.c |  3 ++-
 2 files changed, 5 insertions(+), 54 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a607d6aca5c4..1eb71275e5b0 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -634,37 +634,6 @@ xfs_vn_getattr(
 	return 0;
 }
 
-static void
-xfs_setattr_mode(
-	struct xfs_inode	*ip,
-	struct iattr		*iattr)
-{
-	struct inode		*inode = VFS_I(ip);
-	umode_t			mode = iattr->ia_mode;
-
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-
-	inode->i_mode &= S_IFMT;
-	inode->i_mode |= mode & ~S_IFMT;
-}
-
-void
-xfs_setattr_time(
-	struct xfs_inode	*ip,
-	struct iattr		*iattr)
-{
-	struct inode		*inode = VFS_I(ip);
-
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-
-	if (iattr->ia_valid & ATTR_ATIME)
-		inode->i_atime = iattr->ia_atime;
-	if (iattr->ia_valid & ATTR_CTIME)
-		inode->i_ctime = iattr->ia_ctime;
-	if (iattr->ia_valid & ATTR_MTIME)
-		inode->i_mtime = iattr->ia_mtime;
-}
-
 static int
 xfs_vn_change_ok(
 	struct user_namespace	*mnt_userns,
@@ -763,16 +732,6 @@ xfs_setattr_nonsize(
 		gid = (mask & ATTR_GID) ? iattr->ia_gid : igid;
 		uid = (mask & ATTR_UID) ? iattr->ia_uid : iuid;
 
-		/*
-		 * CAP_FSETID overrides the following restrictions:
-		 *
-		 * The set-user-ID and set-group-ID bits of a file will be
-		 * cleared upon successful return from chown()
-		 */
-		if ((inode->i_mode & (S_ISUID|S_ISGID)) &&
-		    !capable(CAP_FSETID))
-			inode->i_mode &= ~(S_ISUID|S_ISGID);
-
 		/*
 		 * Change the ownerships and register quota modifications
 		 * in the transaction.
@@ -784,7 +743,6 @@ xfs_setattr_nonsize(
 				olddquot1 = xfs_qm_vop_chown(tp, ip,
 							&ip->i_udquot, udqp);
 			}
-			inode->i_uid = uid;
 		}
 		if (!gid_eq(igid, gid)) {
 			if (XFS_IS_GQUOTA_ON(mp)) {
@@ -795,15 +753,10 @@ xfs_setattr_nonsize(
 				olddquot2 = xfs_qm_vop_chown(tp, ip,
 							&ip->i_gdquot, gdqp);
 			}
-			inode->i_gid = gid;
 		}
 	}
 
-	if (mask & ATTR_MODE)
-		xfs_setattr_mode(ip, iattr);
-	if (mask & (ATTR_ATIME|ATTR_CTIME|ATTR_MTIME))
-		xfs_setattr_time(ip, iattr);
-
+	setattr_copy(mnt_userns, inode, iattr);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
 	XFS_STATS_INC(mp, xs_ig_attrchg);
@@ -1028,11 +981,8 @@ xfs_setattr_size(
 		xfs_inode_clear_eofblocks_tag(ip);
 	}
 
-	if (iattr->ia_valid & ATTR_MODE)
-		xfs_setattr_mode(ip, iattr);
-	if (iattr->ia_valid & (ATTR_ATIME|ATTR_CTIME|ATTR_MTIME))
-		xfs_setattr_time(ip, iattr);
-
+	ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID)));
+	setattr_copy(mnt_userns, inode, iattr);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
 	XFS_STATS_INC(mp, xs_ig_attrchg);
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index 5e1d29d8b2e7..8865f7d4404a 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -283,7 +283,8 @@ xfs_fs_commit_blocks(
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-	xfs_setattr_time(ip, iattr);
+	ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID)));
+	setattr_copy(&init_user_ns, inode, iattr);
 	if (update_isize) {
 		i_size_write(inode, iattr->ia_size);
 		ip->i_disk_size = iattr->ia_size;
-- 
2.36.1.476.g0c4daa206d-goog



* Re: [PATCH 5.15 CANDIDATE v2 8/8] xfs: use setattr_copy to set vfs inode attributes
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 8/8] xfs: use setattr_copy to set vfs inode attributes Leah Rumancik
@ 2022-06-17  7:27   ` Amir Goldstein
  0 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2022-06-17  7:27 UTC (permalink / raw)
  To: Leah Rumancik, Christoph Hellwig, Christian Brauner
  Cc: linux-xfs, Luis R. Rodriguez, Darrick J. Wong, Dave Chinner

On Thu, Jun 16, 2022 at 9:29 PM Leah Rumancik <leah.rumancik@gmail.com> wrote:
>
> From: "Darrick J. Wong" <djwong@kernel.org>
>
> [ Upstream commit e014f37db1a2d109afa750042ac4d69cf3e3d88e ]
>
> Filipe Manana pointed out that XFS' behavior w.r.t. setuid/setgid
> revocation isn't consistent with btrfs[1] or ext4.  Those two
> filesystems use the VFS function setattr_copy to convey certain
> attributes from struct iattr into the VFS inode structure.
>
> Andrey Zhadchenko reported[2] that XFS uses the wrong user namespace to
> decide if it should clear setgid and setuid on a file attribute update.
> This is a second symptom of the problem that Filipe noticed.
>
> XFS, on the other hand, open-codes setattr_copy in xfs_setattr_mode,
> xfs_setattr_nonsize, and xfs_setattr_time.  Regrettably, setattr_copy is
> /not/ a simple copy function; it contains additional logic to clear the
> setgid bit when setting the mode, and XFS' version no longer matches.
>
> The VFS implements its own setuid/setgid stripping logic, which
> establishes consistent behavior.  It's a tad unfortunate that it's
> scattered across notify_change, should_remove_suid, and setattr_copy but
> XFS should really follow the Linux VFS.  Adapt XFS to use the VFS
> functions and get rid of the old functions.
>
> [1] https://lore.kernel.org/fstests/CAL3q7H47iNQ=Wmk83WcGB-KBJVOEtR9+qGczzCeXJ9Y2KCV25Q@mail.gmail.com/
> [2] https://lore.kernel.org/linux-xfs/20220221182218.748084-1-andrey.zhadchenko@virtuozzo.com/
>
> Fixes: 7fa294c8991c ("userns: Allow chown and setgid preservation")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> ---

Hi Leah,

Dave has raised a concern [1] about backporting fixes in this area because
there are other vfs fixes still in the works.

For the fix "xfs: fix up non-directory creation in SGID directories",
both Christian
and Christoph said it should be backported regardless [2].

I am not sure what they would have to say about this one, though.

Christian, Christoph,

In your opinion, is this one also worth backporting regardless of possible
future vfs fixes?

I did test the 5.10 backports both with and without these two patches,
so I could go either way.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-xfs/20220602005238.GK227878@dread.disaster.area/
[2] https://lore.kernel.org/linux-xfs/CAOQ4uxg4=m9zEFbDAKXx7CP7HYiMwtsYSJvq076oKpy-OhK1uw@mail.gmail.com/


* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
                   ` (7 preceding siblings ...)
  2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 8/8] xfs: use setattr_copy to set vfs inode attributes Leah Rumancik
@ 2022-06-22  0:07 ` Luis Chamberlain
  2022-06-22 21:44   ` Theodore Ts'o
  2022-06-22 21:52   ` [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
  2022-06-22 16:23 ` Darrick J. Wong
  9 siblings, 2 replies; 32+ messages in thread
From: Luis Chamberlain @ 2022-06-22  0:07 UTC (permalink / raw)
  To: Leah Rumancik, Amir Goldstein, Josef Bacik, Chuck Lever,
	chandanrmail, Sweet Tea Dorminy, Pankaj Raghav
  Cc: linux-xfs, fstests

On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23. 

The coverage for XFS is using profiles which seem to be inspired
by ext4's different mkfs configurations.

Long ago (2019) I had asked that we strive to address the popular
configurations for XFS so that what was back then oscheck (now kdevops)
could cover them when considering stable XFS patch candidates. That was
so long ago no one should be surprised you didn't get the memo:

https://lkml.kernel.org/r/20190208194829.GJ11489@garbanzo.do-not-panic.com

This has grown to cover more now:

https://github.com/linux-kdevops/kdevops/blob/master/playbooks/roles/fstests/templates/xfs/xfs.config

For instance xfs_bigblock and xfs_reflink_normapbt.

My litmus test back then *and* today is to ensure we have no regressions
on the test sections supported by kdevops for XFS as reflected above.
Without that confidence I'd be really reluctant to support stable
efforts.

If you use kdevops, it should be easy to set up even if you are not
using local virtualization technologies. For instance I just fired
up an AWS cloud m5ad.4xlarge image which has 2 nvme drives, which
mimics the reqs for the methodology of using loopback files:

https://github.com/linux-kdevops/kdevops/blob/master/docs/seeing-more-issues.md

GCE is supported as well, as are Azure and OpenStack, and even custom
OpenStack solutions...

Also, I see on the above URL you posted there is a TODO in the gist which
says, "find a better route for publishing these". If you were to use
kdevops for this it would have the immediate gain in that kdevops users
could reproduce your findings and help augment it.

However, if using kdevops as a landing home for this is too big a step for you,
we could use a new git tree which just tracks expunges and then kdevops can
use it as a git subtree, as I had suggested at LSFMM. The benefit of using a
git subtree is that any runner can then make use of it. And note that we
track both fstests and blktests.

The downside of kdevops using a new git subtree is just that kdevops
developers would have to work with two trees: one for code changes just
for kdevops and one for the git subtree for expunges. That workflow would
be new, but I don't suspect it would be a really big issue beyond the
initial growing pains of adapting. I have used git subtrees extensively
and the best rule of thumb is just to ensure you keep the code for the git
subtree in its own directory. You can either immediately upstream your
delta or carry the delta until you are ready to try to push those
changes. Right now kdevops uses the directory workflows/fstests/expunges/
for expunges. Your runner could use whatever it wishes.

We should also discuss whether we want to add the respective
*.bad, *.dmesg, and *.all result files for expunged entries, or if
we should be pushing these out to a new shared storage area. Right now
kdevops keeps track of results in the directory workflows/fstests/results/
but this is a path on .gitignore. If we *do* want to use github and a
shared git subtree perhaps a workflows/fstests/artifacts/kdevops/ would
make sense for the kdevops runner? Then that namespace allows other
runners to also add files, but we all share expunges / tribal knowledge.

Thoughts?

  Luis


* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
                   ` (8 preceding siblings ...)
  2022-06-22  0:07 ` [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Luis Chamberlain
@ 2022-06-22 16:23 ` Darrick J. Wong
  2022-06-22 16:35   ` Darrick J. Wong
  9 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2022-06-22 16:23 UTC (permalink / raw)
  To: Leah Rumancik; +Cc: linux-xfs, mcgrof

On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> The patch testing has been increased to 100 runs per test on each 
> config. A baseline without the patches was established with 100 runs 
> to help detect hard failures / tests with a high fail rate. Any 
> failures seen in the backports branch but not in the baseline branch 
> were then run 1000+ times on both the baseline and backport branches 
> and the failure rates compared. The failures seen on the 5.15 
> baseline are listed at 
> https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23. 
> No regressions were seen with these patches.
> 
> To make the review process easier, I have been coordinating with Amir 
> who has been testing this same set of patches on 5.10. He will be 
> sending out the corresponding 5.10 series shortly.
> 
> Change log from v1 
> (https://lore.kernel.org/all/20220603184701.3117780-1-leah.rumancik@gmail.com/):
> - Increased testing
> - Reduced patch set to overlap with 5.10 patches
> 
> Thanks,
> Leah
> 
> Brian Foster (1):
>   xfs: punch out data fork delalloc blocks on COW writeback failure
> 
> Darrick J. Wong (4):
>   xfs: remove all COW fork extents when remounting readonly
>   xfs: prevent UAF in xfs_log_item_in_current_chkpt
>   xfs: only bother with sync_filesystem during readonly remount

5.15 already has the vfs fixes to make sync_fs/sync_filesystem actually
return error codes, right?

>   xfs: use setattr_copy to set vfs inode attributes
> 
> Dave Chinner (1):
>   xfs: check sb_meta_uuid for dabuf buffer recovery
> 
> Rustam Kovhaev (1):
>   xfs: use kmem_cache_free() for kmem_cache objects
> 
> Yang Xu (1):
>   xfs: Fix the free logic of state in xfs_attr_node_hasname

This one trips me up every time I look at it, but this looks correct.

If the answer to the above question is yes, then:
Acked-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
>  fs/xfs/libxfs/xfs_attr.c      | 17 +++++------
>  fs/xfs/xfs_aops.c             | 15 ++++++++--
>  fs/xfs/xfs_buf_item_recover.c |  2 +-
>  fs/xfs/xfs_extfree_item.c     |  6 ++--
>  fs/xfs/xfs_iops.c             | 56 ++---------------------------------
>  fs/xfs/xfs_log_cil.c          |  6 ++--
>  fs/xfs/xfs_pnfs.c             |  3 +-
>  fs/xfs/xfs_super.c            | 21 +++++++++----
>  8 files changed, 47 insertions(+), 79 deletions(-)
> 
> -- 
> 2.36.1.476.g0c4daa206d-goog
> 


* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-22 16:23 ` Darrick J. Wong
@ 2022-06-22 16:35   ` Darrick J. Wong
  2022-06-22 21:29     ` Leah Rumancik
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2022-06-22 16:35 UTC (permalink / raw)
  To: Leah Rumancik; +Cc: linux-xfs, mcgrof

On Wed, Jun 22, 2022 at 09:23:07AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> > The patch testing has been increased to 100 runs per test on each 
> > config. A baseline without the patches was established with 100 runs 
> > to help detect hard failures / tests with a high fail rate. Any 
> > failures seen in the backports branch but not in the baseline branch 
> > were then run 1000+ times on both the baseline and backport branches 
> > and the failure rates compared. The failures seen on the 5.15 
> > baseline are listed at 
> > https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23. 
> > No regressions were seen with these patches.
> > 
> > To make the review process easier, I have been coordinating with Amir 
> > who has been testing this same set of patches on 5.10. He will be 
> > sending out the corresponding 5.10 series shortly.
> > 
> > Change log from v1 
> > (https://lore.kernel.org/all/20220603184701.3117780-1-leah.rumancik@gmail.com/):
> > - Increased testing
> > - Reduced patch set to overlap with 5.10 patches
> > 
> > Thanks,
> > Leah
> > 
> > Brian Foster (1):
> >   xfs: punch out data fork delalloc blocks on COW writeback failure
> > 
> > Darrick J. Wong (4):
> >   xfs: remove all COW fork extents when remounting readonly
> >   xfs: prevent UAF in xfs_log_item_in_current_chkpt
> >   xfs: only bother with sync_filesystem during readonly remount
> 
> 5.15 already has the vfs fixes to make sync_fs/sync_filesystem actually
> return error codes, right?
> 
> >   xfs: use setattr_copy to set vfs inode attributes
> > 
> > Dave Chinner (1):
> >   xfs: check sb_meta_uuid for dabuf buffer recovery
> > 
> > Rustam Kovhaev (1):
> >   xfs: use kmem_cache_free() for kmem_cache objects
> > 
> > Yang Xu (1):
> >   xfs: Fix the free logic of state in xfs_attr_node_hasname
> 
> This one trips me up every time I look at it, but this looks correct.
> 
> If the answer to the above question is yes, then:
> Acked-by: Darrick J. Wong <djwong@kernel.org>

I should've mentioned that this is acked-by for patches 1-7, since Amir
posted a question about patch 8 that seems not to have been answered(?)

(Do all the new setgid inheritance tests actually pass with patch 8
applied?  The VFS fixes were thorny enough that I'm not as confident
that they've all been applied to 5.15.y...)

--D

> --D
> 
> > 
> >  fs/xfs/libxfs/xfs_attr.c      | 17 +++++------
> >  fs/xfs/xfs_aops.c             | 15 ++++++++--
> >  fs/xfs/xfs_buf_item_recover.c |  2 +-
> >  fs/xfs/xfs_extfree_item.c     |  6 ++--
> >  fs/xfs/xfs_iops.c             | 56 ++---------------------------------
> >  fs/xfs/xfs_log_cil.c          |  6 ++--
> >  fs/xfs/xfs_pnfs.c             |  3 +-
> >  fs/xfs/xfs_super.c            | 21 +++++++++----
> >  8 files changed, 47 insertions(+), 79 deletions(-)
> > 
> > -- 
> > 2.36.1.476.g0c4daa206d-goog
> > 


* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-22 16:35   ` Darrick J. Wong
@ 2022-06-22 21:29     ` Leah Rumancik
  2022-06-23  4:53       ` Amir Goldstein
  0 siblings, 1 reply; 32+ messages in thread
From: Leah Rumancik @ 2022-06-22 21:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, mcgrof

On Wed, Jun 22, 2022 at 09:35:19AM -0700, Darrick J. Wong wrote:
> On Wed, Jun 22, 2022 at 09:23:07AM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> > > The patch testing has been increased to 100 runs per test on each 
> > > config. A baseline without the patches was established with 100 runs 
> > > to help detect hard failures / tests with a high fail rate. Any 
> > > failures seen in the backports branch but not in the baseline branch 
> > > were then run 1000+ times on both the baseline and backport branches 
> > > and the failure rates compared. The failures seen on the 5.15 
> > > baseline are listed at 
> > > https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23. 
> > > No regressions were seen with these patches.
> > > 
> > > To make the review process easier, I have been coordinating with Amir 
> > > who has been testing this same set of patches on 5.10. He will be 
> > > sending out the corresponding 5.10 series shortly.
> > > 
> > > Change log from v1 
> > > (https://lore.kernel.org/all/20220603184701.3117780-1-leah.rumancik@gmail.com/):
> > > - Increased testing
> > > - Reduced patch set to overlap with 5.10 patches
> > > 
> > > Thanks,
> > > Leah
> > > 
> > > Brian Foster (1):
> > >   xfs: punch out data fork delalloc blocks on COW writeback failure
> > > 
> > > Darrick J. Wong (4):
> > >   xfs: remove all COW fork extents when remounting readonly
> > >   xfs: prevent UAF in xfs_log_item_in_current_chkpt
> > >   xfs: only bother with sync_filesystem during readonly remount
> > 
> > 5.15 already has the vfs fixes to make sync_fs/sync_filesystem actually
> > return error codes, right?
Confirmed "vfs: make sync_filesystem return errors from ->sync_fs" made
it into 5.15.y (935745abcf4c695a18b9af3fbe295e322547a114).

> > 
> > >   xfs: use setattr_copy to set vfs inode attributes
> > > 
> > > Dave Chinner (1):
> > >   xfs: check sb_meta_uuid for dabuf buffer recovery
> > > 
> > > Rustam Kovhaev (1):
> > >   xfs: use kmem_cache_free() for kmem_cache objects
> > > 
> > > Yang Xu (1):
> > >   xfs: Fix the free logic of state in xfs_attr_node_hasname
> > 
> > This one trips me up every time I look at it, but this looks correct.
> > 
> > If the answer to the above question is yes, then:
> > Acked-by: Darrick J. Wong <djwong@kernel.org>
> 
> I should've mentioned that this is acked-by for patches 1-7, since Amir
> posted a question about patch 8 that seems not to have been answered(?)
> 
> (Do all the new setgid inheritance tests actually pass with patch 8
> applied?  The VFS fixes were thorny enough that I'm not as confident
> that they've all been applied to 5.15.y...)

The setgid tests (e.g. generic/314, generic/444, generic/673, generic/68[3-7],
xfs/019) ran without issues. The dax config did have failures on both the
baseline and backports branch but the other configs all passed cleanly.
There have been some changes to these tests since the last time I updated
fstests, though, so I'll go ahead and rerun this set to be sure.

Best,
Leah

> 
> --D
> 
> > --D
> > 
> > > 
> > >  fs/xfs/libxfs/xfs_attr.c      | 17 +++++------
> > >  fs/xfs/xfs_aops.c             | 15 ++++++++--
> > >  fs/xfs/xfs_buf_item_recover.c |  2 +-
> > >  fs/xfs/xfs_extfree_item.c     |  6 ++--
> > >  fs/xfs/xfs_iops.c             | 56 ++---------------------------------
> > >  fs/xfs/xfs_log_cil.c          |  6 ++--
> > >  fs/xfs/xfs_pnfs.c             |  3 +-
> > >  fs/xfs/xfs_super.c            | 21 +++++++++----
> > >  8 files changed, 47 insertions(+), 79 deletions(-)
> > > 
> > > -- 
> > > 2.36.1.476.g0c4daa206d-goog
> > > 


* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-22  0:07 ` [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Luis Chamberlain
@ 2022-06-22 21:44   ` Theodore Ts'o
  2022-06-23  5:31     ` Amir Goldstein
  2022-06-23 21:31     ` Luis Chamberlain
  2022-06-22 21:52   ` [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
  1 sibling, 2 replies; 32+ messages in thread
From: Theodore Ts'o @ 2022-06-22 21:44 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Leah Rumancik, Amir Goldstein, Josef Bacik, Chuck Lever,
	chandanrmail, Sweet Tea Dorminy, Pankaj Raghav, linux-xfs,
	fstests

On Tue, Jun 21, 2022 at 05:07:10PM -0700, Luis Chamberlain wrote:
> On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> > https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23. 
> 
> The coverage for XFS is using profiles which seem to come inspired
> by ext4's different mkfs configurations.

That's not correct, actually.  It's using the gce-xfstests test
framework which is part of the xfstests-bld[1][2] system that I
maintain, yes.  However, the actual config profiles were obtained via
discussions with Darrick and represent the actual configs which the
XFS maintainer uses to test the upstream XFS tree before deciding to
push to Linus.  We figure if it's good enough for the XFS Maintainer,
it's good enough for us.  :-)

[1] https://thunk.org/gce-xfstests
[2] https://github.com/tytso/xfstests-bld

If you think the XFS Maintainer should be running more configs, I
invite you to have that conversation with Darrick.

> GCE is supported as well, so is Azure and OpenStack, and even custom
> openstack solutions...

The way kdevops works is quite different from how gce-xfstests works,
since the latter is a VM-native solution.  Which is to say, when we kick
off a test, VM's are launched, one per config, which provides for better
parallelization, and then once everything is completed, the VM's are
automatically shut down and go away; so it's far more efficient in
terms of using cloud resources.  The Lightweight Test Manager will then
take the JUnit XML files, plus all of the test artifacts, and these
get combined into a single test report.

The lightweight test manager runs in a small VM, and this is the only
VM which is consuming resources until we ask it to do some work.  For
example:

    gce-xfstests ltm -c xfs --repo stable.git --commit v5.18.6 -c xfs/all -g auto

That single command will result in the LTM launching a large builder
VM which quickly builds the kernel.  (And it uses ccache, and a
persistent cache disk, but even if we've never built the kernel, it
can complete the build in a few minutes.)  Then we launch 12 VM's, one
for each config, and since they don't need to be optimized for fast
builds, we can run most of the VM's with a smaller amount of memory,
to better stress test the file system.  (But for the dax config, we'll
launch a VM with more memory, since we need to simulate the PMEM
device using raw memory.)  Once each VM completes its test run, it
uploads its test artifacts and results XML file to Google Cloud
Storage.  When all of the VM's complete, the LTM VM downloads all
of the result files from GCS, combines them into a single
result file, and then sends e-mail with a summary of the results.

It's optimized for developers, and for our use cases.  I'm sure
kdevops is much more general, since it can work for hardware-based
test machines, as well as many other cloud stacks, and it's also
optimized for the QA department --- not surprising, given where
kdevops came from.

> Also, I see on the above URL you posted there is a TODO in the gist which
> says, "find a better route for publishing these". If you were to use
> kdevops for this it would have the immediate gain in that kdevops users
> could reproduce your findings and help augment it.

Sure, but with our system, kvm-xfstests and gce-xfstests users can
*easily* reproduce our findings and can help augment it.  :-)

As far as sharing expunge files, as I've observed before, these files
tend to be very specific to the test configuration --- the number of
CPU's, and the amount of memory, the characteristics of the storage
device, etc.  So what works for one developer's test setup will not
necessarily work for others --- and I'm not convinced that trying to
get everyone standardized on the One True Test Setup is actually an
advantage.  Some people may be using large RAID Arrays; some might be
using fast flash; some might be using some kind of emulated log
structured block device; some might be using eMMC flash.  And that's a
*good* thing.

We also have a very different philosophy about how to use expunge
files.  In particular, if there is a test which is only failing 0.5% of
the time, I don't think it makes sense to put that test into an
expunge file.

In general, we are only placing tests into expunge files when a test
causes the system under test to crash, or it takes *WAAAY* too
long, or it's a clear test bug that is too hard to fix for real, so we
just suppress the test for that config for now.  (Example: tests in
xfstests for quota don't understand clustered allocation.)

So we want to run the tests, even if we know they will fail, and have a
way of annotating that a test is known to fail for a particular kernel
version, or if it's a flaky test, what the expected flake percentage
is for that particular test.  For flaky tests, we'd like to be able to
automatically retry running the test, so that we can flag when a flaky
test has become a hard failure, or when a flaky test has radically changed
how often it fails.  We haven't implemented all of this yet, but this
is something whose design space we're exploring at the moment.
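As a sketch of the kind of bookkeeping described above, a harness could compare a test's known baseline flake rate against fresh results from a candidate branch and classify the outcome. Everything here (the verdict names and the thresholds) is hypothetical, not code from gce-xfstests, kdevops, or fstests:

```c
#include <assert.h>

/*
 * Hypothetical sketch of flaky-test bookkeeping: compare a test's
 * baseline failure rate against results from a candidate branch.
 * Verdict names and thresholds are invented for illustration.
 */
enum flake_verdict { FLAKE_OK, FLAKE_REGRESSED, FLAKE_HARD_FAILURE };

static enum flake_verdict classify_flake(unsigned base_fail, unsigned base_runs,
					 unsigned cand_fail, unsigned cand_runs)
{
	double base_rate = base_runs ? (double)base_fail / base_runs : 0.0;
	double cand_rate = cand_runs ? (double)cand_fail / cand_runs : 0.0;

	/* A formerly flaky test that now fails every run is a hard failure. */
	if (cand_runs && cand_fail == cand_runs)
		return FLAKE_HARD_FAILURE;
	/* Flag a test whose failure rate jumped well past its baseline. */
	if (cand_rate > 2.0 * base_rate + 0.01)
		return FLAKE_REGRESSED;
	return FLAKE_OK;
}
```

With run counts like Leah's (1000+ per branch), a test failing 5 times on the baseline and 6 on the candidate would classify as FLAKE_OK, while a jump to 100 failures would be flagged as a regression.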

More generally, I think competition is a good thing, and for areas
where we are still exploring the best way to automate tests, not just
from a QA department's perspective, but from a file system developer's
perspective, having multiple systems where we can explore these ideas
can be a good thing.

Cheers,

						- Ted


* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-22  0:07 ` [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Luis Chamberlain
  2022-06-22 21:44   ` Theodore Ts'o
@ 2022-06-22 21:52   ` Leah Rumancik
  2022-06-23 21:40     ` Luis Chamberlain
  1 sibling, 1 reply; 32+ messages in thread
From: Leah Rumancik @ 2022-06-22 21:52 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Amir Goldstein, Josef Bacik, Chuck Lever, chandanrmail,
	Sweet Tea Dorminy, Pankaj Raghav, linux-xfs, fstests

On Tue, Jun 21, 2022 at 05:07:10PM -0700, Luis Chamberlain wrote:
> On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> > https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23. 
> 
> The coverage for XFS is using profiles which seem to come inspired
> by ext4's different mkfs configurations.
The configs I am using for the backports testing were developed with
Darrick's help. If you guys agree on a different set of configs, I'd be
happy to update my configs moving forward. As these patches have been
tested both on 5.10 with those configs and on 5.15 with my configs, I
don't think this should block this set of patches.

- Leah
> 
> Long ago (2019) I had asked we strive to address popular configurations
> for XFS so that what would be back then oscheck (now kdevops) can cover
> them for stable XFS patch candidate test consideration. That was so long
> ago no one should be surprised you didn't get the memo:
> 
> https://lkml.kernel.org/r/20190208194829.GJ11489@garbanzo.do-not-panic.com
> 
> This has grown to cover more now:
> 
> https://github.com/linux-kdevops/kdevops/blob/master/playbooks/roles/fstests/templates/xfs/xfs.config
> 
> For instance xfs_bigblock and xfs_reflink_normapbt.
> 
> My litmus test back then *and* today is to ensure we have no regressions
> on the test sections supported by kdevops for XFS as reflected above.
> Without that confidence I'd be really reluctant to support stable
> efforts.
> 
> If you use kdevops, it should be easy to set up even if you are not
> using local virtualization technologies. For instance I just fired
> up an AWS cloud m5ad.4xlarge image which has 2 nvme drives, which
> mimics the reqs for the methodology of using loopback files:
> 
> https://github.com/linux-kdevops/kdevops/blob/master/docs/seeing-more-issues.md
> 
> GCE is supported as well, so is Azure and OpenStack, and even custom
> openstack solutions...
> 
> Also, I see on the above URL you posted there is a TODO in the gist which
> says, "find a better route for publishing these". If you were to use
> kdevops for this it would have the immediate gain in that kdevops users
> could reproduce your findings and help augment it.
> 
> However if using kdevops as a landing home for this is too large for you,
> we could use a new git tree which just tracks expunges and then kdevops can
> use it as a git subtree as I had suggested at LSFMM. The benefit of using a
> git subtree is then any runner can make use of it. And note that we
> track both fstests and blktests.
> 
> The downside of kdevops using a new git subtree is just that kdevops
> developers would have to work with two trees: one for code changes to
> kdevops itself and one for the git subtree for expunges. That workflow
> would be new. I don't suspect it would be a big issue beyond the
> initial growing pains of adapting. I have used git subtrees extensively
> and the best rule of thumb is just to ensure you keep the code for the git
> subtree in its own directory. You can either immediately upstream your
> delta or carry the delta until you are ready to try to push those
> changes. Right now kdevops uses the directory workflows/fstests/expunges/
> for expunges. Your runner could use whatever it wishes.
> 
> We should discuss if we just also want to add the respective found
> *.bad, *.dmesg *.all files for results for expunged entries, or if
> we should be pushing these out to a new shared storage area. Right now
> kdevops keeps track of results in the directory workflows/fstests/results/
> but this is a path on .gitignore. If we *do* want to use github and a
> shared git subtree perhaps a workflows/fstests/artifacts/kdevops/ would
> make sense for the kdevops runner ? Then that namespace allows other
> runners to also add files, but we all share expunges / tribal knowledge.
> 
> Thoughts?
> 
>   Luis

* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-22 21:29     ` Leah Rumancik
@ 2022-06-23  4:53       ` Amir Goldstein
  2022-06-23  6:28         ` Amir Goldstein
  0 siblings, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2022-06-23  4:53 UTC (permalink / raw)
  To: Leah Rumancik; +Cc: Darrick J. Wong, linux-xfs, Luis R. Rodriguez

On Thu, Jun 23, 2022 at 12:38 AM Leah Rumancik <leah.rumancik@gmail.com> wrote:
>
> On Wed, Jun 22, 2022 at 09:35:19AM -0700, Darrick J. Wong wrote:
> > On Wed, Jun 22, 2022 at 09:23:07AM -0700, Darrick J. Wong wrote:
> > > On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> > > > The patch testing has been increased to 100 runs per test on each
> > > > config. A baseline without the patches was established with 100 runs
> > > > to help detect hard failures / tests with a high fail rate. Any
> > > > failures seen in the backports branch but not in the baseline branch
> > > > were then run 1000+ times on both the baseline and backport branches
> > > > and the failure rates compared. The failures seen on the 5.15
> > > > baseline are listed at
> > > > https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23.
> > > > No regressions were seen with these patches.
> > > >
> > > > To make the review process easier, I have been coordinating with Amir
> > > > who has been testing this same set of patches on 5.10. He will be
> > > > sending out the corresponding 5.10 series shortly.
> > > >
> > > > Change log from v1
> > > > (https://lore.kernel.org/all/20220603184701.3117780-1-leah.rumancik@gmail.com/):
> > > > - Increased testing
> > > > - Reduced patch set to overlap with 5.10 patches
> > > >
> > > > Thanks,
> > > > Leah
> > > >
> > > > Brian Foster (1):
> > > >   xfs: punch out data fork delalloc blocks on COW writeback failure
> > > >
> > > > Darrick J. Wong (4):
> > > >   xfs: remove all COW fork extents when remounting readonly
> > > >   xfs: prevent UAF in xfs_log_item_in_current_chkpt
> > > >   xfs: only bother with sync_filesystem during readonly remount
> > >
> > > 5.15 already has the vfs fixes to make sync_fs/sync_filesystem actually
> > > return error codes, right?
> Confirmed "vfs: make sync_filesystem return errors from ->sync_fs" made
> it into 5.15.y (935745abcf4c695a18b9af3fbe295e322547a114).

Confirmed that it made it into 5.10.y and that
2719c7160dcf ("vfs: make freeze_super abort when sync_filesystem returns error")
also made it to both 5.10.y and 5.15.y

>
> > >
> > > >   xfs: use setattr_copy to set vfs inode attributes
> > > >
> > > > Dave Chinner (1):
> > > >   xfs: check sb_meta_uuid for dabuf buffer recovery
> > > >
> > > > Rustam Kovhaev (1):
> > > >   xfs: use kmem_cache_free() for kmem_cache objects
> > > >
> > > > Yang Xu (1):
> > > >   xfs: Fix the free logic of state in xfs_attr_node_hasname
> > >
> > > This one trips me up every time I look at it, but this looks correct.
> > >
> > > If the answer to the above question is yes, then:
> > > Acked-by: Darrick J. Wong <djwong@kernel.org>
> >
> > I should've mentioned that this is acked-by for patches 1-7, since Amir
> > posted a question about patch 8 that seems not to have been answered(?)
> >
> > (Do all the new setgid inheritance tests actually pass with patch 8
> > applied?  The VFS fixes were thorny enough that I'm not as confident
> > that they've all been applied to 5.15.y...)
>
> The setgid tests (e.g. generic/314, generic/444, generic/673, generic/68[3-7],
> xfs/019) ran without issues. The dax config did have failures on both the
> baseline and backports branch but the other configs all passed cleanly.
> There were some changes to these tests since the last time I updated fstests
> though so I'll go ahead and rerun this set to be sure.

That's not all.
There are also Christian's vfstest tests, which check SGID with
and without idmapped mounts.
This is especially important for 5.15.y, which has idmapped mounts
support.

Anyway, as agreed, we are dropping this patch for this round.
So let's talk procedure -
You got an Acked-by from Darrick for the entire series minus this one
patch, so, per the earlier request, please post the reduced v3 series
of the 5.15 CANDIDATE to the xfs list with Darrick's Acked-by on all
patches, to allow developers to see the series in its final form before
posting to stable.

The cover letter should list the changes since v2, noting the dropped
patch.

I don't know what the customary time to wait for a "last notice before
merge" is, but I would say you can send your series to stable in a day
or two if there are no last-minute comments.

Anyway, the post to stable is also public (don't forget to cc the xfs
list), and after patches are queued to stable there is also time for
developers to object. (The stable process sends emails to patch authors
and reviewers before merging them.)

I suggest, as a convention, that the post to stable be tagged as v4
with the CANDIDATE prefix removed and the changelog looking
something like this:

Changes since v3:
- Post to stable

Changes from v2:
- Drop SGID fix
- Added Acks from Darrick

Changes from v1:
(https://lore.kernel.org/all/20220603184701.3117780-1-leah.rumancik@gmail.com/):
- Increased testing
- Reduced patch set to overlap with 5.10 patches

I will follow the same procedure for the 5.10 series as soon as all
of Darrick's concerns are addressed.

Thanks,
Amir.

* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-22 21:44   ` Theodore Ts'o
@ 2022-06-23  5:31     ` Amir Goldstein
  2022-06-23 21:39       ` Luis Chamberlain
  2022-06-23 21:31     ` Luis Chamberlain
  1 sibling, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2022-06-23  5:31 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Luis Chamberlain, Leah Rumancik, Josef Bacik, Chuck Lever,
	chandanrmail, Sweet Tea Dorminy, Pankaj Raghav, linux-xfs,
	fstests

> It's optimized for developers, and for our use cases.  I'm sure
> kdevops is much more general, since it can work for hardware-based
> test machines, as well as many other cloud stacks, and it's also
> optimized for the QA department --- not surprising, given where
> kdevops has come from.
>

[...]

>
> We also have a very different philosophy about how to use expunge
> files.  In particular, if there is a test which is only failing 0.5% of
> the time, I don't think it makes sense to put that test into an
> expunge file.
>
> In general, we are only placing tests into expunge files when
> it causes the system under test to crash, or it takes *WAAAY* too
> long, or it's a clear test bug that is too hard to fix for real, so we
> just suppress the test for that config for now.  (Example: tests in
> xfstests for quota don't understand clustered allocation.)
>
> So we want to run the tests, even if we know it will fail, and have a
> way of annotating that a test is known to fail for a particular kernel
> version, or if it's a flaky test, what the expected flake percentage
> is for that particular test.  For flaky tests, we'd like to be able to
> automatically retry running the test, so we can flag when a flaky
> test has become a hard failure, or a flaky test has radically changed
> how often it fails.  We haven't implemented all of this yet, but this
> is design space that we're exploring at the moment.
>
> More generally, I think competition is a good thing, and for areas
> where we are still exploring the best way to automate tests, not just
> from a QA department's perspective, but from a file system developer's
> perspective, having multiple systems where we can explore these ideas
> can be a good thing.
>

I very much agree with Ted on that point.

As a user and big fan of both kdevops and fstests-bld I wouldn't
want to have to choose one over the other, not even to choose
a unified expunge list.

I think we are still at a point where this diversity makes our ecosystem
stronger rather than causing duplicate work.

To put it in more blunt terms, the core test suite, fstests, is not
very reliable. Neither kdevops nor fstests-bld addresses all the
reliability issues (and they contribute some of their own).
So we need the community to run both to get better and more
reliable filesystem test coverage.

Nevertheless, we should continue to share as much experience and as
many data points as we can during this co-opetition stage in order to
improve both systems.

Thanks,
Amir.

* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-23  4:53       ` Amir Goldstein
@ 2022-06-23  6:28         ` Amir Goldstein
  0 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2022-06-23  6:28 UTC (permalink / raw)
  To: Leah Rumancik; +Cc: Darrick J. Wong, linux-xfs, Luis R. Rodriguez

On Thu, Jun 23, 2022 at 7:53 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Jun 23, 2022 at 12:38 AM Leah Rumancik <leah.rumancik@gmail.com> wrote:
> >
> > On Wed, Jun 22, 2022 at 09:35:19AM -0700, Darrick J. Wong wrote:
> > > On Wed, Jun 22, 2022 at 09:23:07AM -0700, Darrick J. Wong wrote:
> > > > On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> > > > > The patch testing has been increased to 100 runs per test on each
> > > > > config. A baseline without the patches was established with 100 runs
> > > > > to help detect hard failures / tests with a high fail rate. Any
> > > > > failures seen in the backports branch but not in the baseline branch
> > > > > were then run 1000+ times on both the baseline and backport branches
> > > > > and the failure rates compared. The failures seen on the 5.15
> > > > > baseline are listed at
> > > > > https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23.
> > > > > No regressions were seen with these patches.
> > > > >
> > > > > To make the review process easier, I have been coordinating with Amir
> > > > > who has been testing this same set of patches on 5.10. He will be
> > > > > sending out the corresponding 5.10 series shortly.
> > > > >
> > > > > Change log from v1
> > > > > (https://lore.kernel.org/all/20220603184701.3117780-1-leah.rumancik@gmail.com/):
> > > > > - Increased testing
> > > > > - Reduced patch set to overlap with 5.10 patches
> > > > >
> > > > > Thanks,
> > > > > Leah
> > > > >
> > > > > Brian Foster (1):
> > > > >   xfs: punch out data fork delalloc blocks on COW writeback failure
> > > > >
> > > > > Darrick J. Wong (4):
> > > > >   xfs: remove all COW fork extents when remounting readonly
> > > > >   xfs: prevent UAF in xfs_log_item_in_current_chkpt
> > > > >   xfs: only bother with sync_filesystem during readonly remount
> > > >
> > > > 5.15 already has the vfs fixes to make sync_fs/sync_filesystem actually
> > > > return error codes, right?
> > Confirmed "vfs: make sync_filesystem return errors from ->sync_fs" made
> > it into 5.15.y (935745abcf4c695a18b9af3fbe295e322547a114).
>
> Confirmed that it made it into 5.10.y and that
> 2719c7160dcf ("vfs: make freeze_super abort when sync_filesystem returns error")
> also made it to both 5.10.y and 5.15.y

Correcting myself:
All of these vfs fixes are in 5.15.y:

* 2d86293c7075 - (xfs/vfs-5.17-fixes) xfs: return errors in xfs_fs_sync_fs
* dd5532a4994b - quota: make dquot_quota_sync return errors from ->sync_fs
* 5679897eb104 - vfs: make sync_filesystem return errors from ->sync_fs
* 2719c7160dcf - vfs: make freeze_super abort when sync_filesystem returns error

5.10.y has only 2719c7160dcf and dd5532a4994b, which applied cleanly
and are outside of the fs/xfs/* AUTOSEL ban.

I have backported the two other patches to 5.10, but I may defer them
and the readonly remount patch to the next submission.
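Since stable backports are assigned new commit IDs, checking whether a
fix landed in a stable branch is easiest by subject line. A minimal
sketch of such a check (the helper and log text here are illustrative,
not part of any existing tooling):

```python
# Determine which fixes from a list are present in stable-branch log output.
# Stable backports get new commit IDs, so match on the subject line
# rather than on the upstream SHA.

FIXES = [
    "xfs: return errors in xfs_fs_sync_fs",
    "quota: make dquot_quota_sync return errors from ->sync_fs",
    "vfs: make sync_filesystem return errors from ->sync_fs",
    "vfs: make freeze_super abort when sync_filesystem returns error",
]

def missing_fixes(log_text, fixes=FIXES):
    """Return the fixes whose subject lines do not appear in the log text."""
    return [subject for subject in fixes if subject not in log_text]
```

In practice log_text would come from running `git log --oneline` over
the relevant stable branch in a linux-stable clone.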

Thanks,
Amir.

* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-22 21:44   ` Theodore Ts'o
  2022-06-23  5:31     ` Amir Goldstein
@ 2022-06-23 21:31     ` Luis Chamberlain
  2022-06-24  5:32       ` Theodore Ts'o
  1 sibling, 1 reply; 32+ messages in thread
From: Luis Chamberlain @ 2022-06-23 21:31 UTC (permalink / raw)
  To: Theodore Ts'o, Darrick J. Wong
  Cc: Leah Rumancik, Amir Goldstein, Josef Bacik, Chuck Lever,
	chandanrmail, Sweet Tea Dorminy, Pankaj Raghav, linux-xfs,
	fstests

On Wed, Jun 22, 2022 at 05:44:30PM -0400, Theodore Ts'o wrote:
> On Tue, Jun 21, 2022 at 05:07:10PM -0700, Luis Chamberlain wrote:
> > On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> > > https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23. 
> > 
> > The coverage for XFS is using profiles which seem to come inspired
> > by ext4's different mkfs configurations.
> 
> That's not correct, actually.  It's using the gce-xfstests test
> framework which is part of the xfstests-bld[1][2] system that I
> maintain, yes.  However, the actual config profiles were obtained via
> discussions from Darrick and represent the actual configs which the
> XFS maintainer uses to test the upstream XFS tree before deciding to
> push to Linus.  We figure if it's good enough for the XFS Maintainer,
> it's good enough for us.  :-)
> 
> [1] https://thunk.org/gce-xfstests
> [2] https://github.com/tytso/xfstests-bld
> 
> If you think the XFS Maintainer should be running more configs, I
> invite you to have that conversation with Darrick.

Sorry, I did not realize that the test configurations for XFS were already
agreed upon with Darrick for stable for the v5.15 effort.

Darrick, long ago when I started to test xfs for stable, I published
what I had suggested, and it seemed to cover the ground back then in
2019:

https://lore.kernel.org/all/20190208194829.GJ11489@garbanzo.do-not-panic.com/T/#m14e299ce476de104f9ee2038b8d002001e579515

If there is something missing from what we use on kdevops for stable
consideration, I'd like to augment it. Note that kdevops supports many
sections, some of which are optional per distribution: each
distribution can opt in, but likewise we can pick sensible defaults for
stable kernels, and per release too. The list of configurations
supported is:

https://github.com/linux-kdevops/kdevops/blob/master/playbooks/roles/fstests/templates/xfs/xfs.config

For stable today we use all sections except xfs_bigblock and xfs_realtimedev.

Do you have any advice on what to stick to for both v5.10 and v5.15
stable, for both kdevops and gce-xfstests? It would seem odd if we
are not testing the same set of profiles as a minimum requirement.

Likewise, the same question applies to Linus' tree and linux-next, as
in the future my hope is we get to the point where kdevops *will* send
out notices for new regressions detected.

> > GCE is supported as well, so is Azure and OpenStack, and even custom
> > openstack solutions...
> 
> The way kdevops work is quite different from how gce-xfstests work,
> since it is a VM native solution.

To be clear, you seem to suggest gce-xfstests is a VM-native solution.
I'd also like to clarify that kdevops supports native VMs, cloud, and
bare metal. With kdevops you pick your bringup method.

> Which is to 

<-- a description of how gce-xfstests works -->

Today all artifacts are gathered by kdevops locally; they are not
uploaded anywhere. How they get consumed is yet to be determined,
but typically I put the output into a gist manually and then refer
to the gist's URL in the expunge entry.

Uploading them could, and probably should, be an option, but it is not
yet clear where to upload them to. A team will soon be looking into
doing some more parsing of the results into a pretty flexible form /
introspection.

> It's optimized for developers, and for our use cases.  I'm sure
> kdevops is much more general, since it can work for hardware-based
> test machines, as well as many other cloud stacks, and it's also
> optimized for the QA department --- not surprising, given where
> kdevops has come from.

kdevops started as an effort for kernel development and filesystem
testing. That is why the initial guest configuration was 8 GiB
of RAM and 4 vCPUs, which suffices for local builds / development.
I always did kernel development on guests back in the day and still do
to this day.

It also has support for email reports, and you get the xunit summary
output *and* a git diff of the expunges should a new regression
be found.

A QA team was never involved, other than later learning kdevops existed
and that the kernel team was using it to proactively find issues.
Later, kdevops was used to report bugs proactively, as it was finding a
lot more issues than typical fstests QA setups find.

> > Also, I see on the above URL you posted there is a TODO in the gist which
> > says, "find a better route for publishing these". If you were to use
> > kdevops for this it would have the immediate gain in that kdevops users
> > could reproduce your findings and help augment it.
> 
> Sure, but with our system, kvm-xfstests and gce-xfstests users can
> *easily* reproduce our findings and can help augment it.  :-)

Sure, the TODO item on the URL seemed to indicate there was a desire to
find a better place to put failures.

> As far as sharing expunge files, as I've observed before, these files
> tend to be very specific to the test configuration --- the number of
> CPU's, and the amount of memory, the characteristics of the storage
> device, etc.

And as I also noted at LSFMM, it is not impossible to address this
either, if we want to. We can simply use a namespace per test runner
and a generic test configuration.

A parent directory simply would represent the test runner. We have two
main ones for stable:

  * gce-xfstests
  * kdevops

So they can just be the parent directory.

Then I think we can probably agree upon 4 GiB RAM / 4 vCPUs per guest
on x86_64 as a typical standard requirement. So something like
x86_64_mem4g_cpus4. Then there is the drive setup. kdevops defaults
to loopback files on NVMe drives for both cloud and native KVM guests,
so that can be nvme_loopback. It is not clear what gce-xfstests uses,
but it can probably be described just as well.
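Concretely, such a namespace could look something like this (all names
below are illustrative, not an agreed-upon layout):

```
expunges/
  gce-xfstests/
    x86_64_mem4g_cpus4/
      nvme_loopback/
        xfs_reflink.txt
  kdevops/
    x86_64_mem4g_cpus4/
      nvme_loopback/
        xfs_reflink.txt
```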

> So what works for one developer's test setup will not
> necessarily work for others 

True, but that does not mean we cannot automate an agreed-upon setup,
especially if you want to enable folks to reproduce results. We can.

> --- and I'm not convinced that trying to
> get everyone standardized on the One True Test Setup is actually an
> advantage.

That is not a goal; the goal is to allow variability! And to share
results in the most efficient way.

It just turns out that an extremely simple setup we *can* *enable*
*many* folks to set up easily with local VMs, and which reproduces
*more* issues today, is NVMe drives + loopback files. You are probably
correct that this methodology was perhaps not as well tested before,
and this is probably *why* we find more issues today. But so far it is
true that:

 * all issues found are real and sometimes hard to reproduce with direct
   drives
 * this methodology is easy to bring up
 * it is finding more issues

This is why this is just today's default for kdevops. It does not
mean you can't *grow* it to add support for other drive setups. In fact
this is needed for testing ZNS drives.

> Some people may be using large RAID Arrays; some might be
> using fast flash; some might be using some kind of emulated log
> structured block device; some might be using eMMC flash.  And that's a
> *good* thing.

Absolutely!

> We also have a very different philosophy about how to use expunge
> files.

Yes, but that does not mean we can't share them.

And the variability which exists today *can* also be expressed.

> In particular, if there is a test which is only failing 0.5% of
> the time, I don't think it makes sense to put that test into an
> expunge file.

This preference can be expressed through Kconfig, and support can be
added for it.

> More generally, I think competition is a good thing, and for areas
> where we are still exploring the best way to automate tests, not just
> from a QA department's perspective, but from a file system developer's
> perspective, having multiple systems where we can explore these ideas
> can be a good thing.

Sure, sure, but again, that does not mean we can't or shouldn't
consider sharing some things. Differences in strategy on how to process
expunge files can be discussed so that later I can add support for them.

I still think we can share at the very least configurations and
expunges with known failure rates (even if they are runner/config
specific).
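As a sketch of what "expunges with known failure rates" could look
like, an entry might carry an annotation that a runner parses to decide
on retries and to flag drift (the format and thresholds here are made
up for illustration):

```python
import re

# Hypothetical annotated expunge line format:
#   generic/475  # failure_rate=0.5%
ENTRY_RE = re.compile(r"^(\S+)\s*#\s*failure_rate=([\d.]+)%")

def parse_entry(line):
    """Parse an annotated expunge line into (test_name, failure_rate)."""
    m = ENTRY_RE.match(line.strip())
    if not m:
        return None
    return m.group(1), float(m.group(2)) / 100.0

def is_regression(failures, runs, known_rate, factor=3.0, min_runs=100):
    """Flag a flaky test whose observed rate is well above the recorded one."""
    if runs < min_runs:
        return False  # too little data to judge
    return failures / runs > known_rate * factor
```

A runner could rerun annotated tests many times and only report the
ones whose observed rate drifts well past the recorded flake rate.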

  Luis

* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-23  5:31     ` Amir Goldstein
@ 2022-06-23 21:39       ` Luis Chamberlain
  0 siblings, 0 replies; 32+ messages in thread
From: Luis Chamberlain @ 2022-06-23 21:39 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Theodore Ts'o, Leah Rumancik, Josef Bacik, Chuck Lever,
	chandanrmail, Sweet Tea Dorminy, Pankaj Raghav, linux-xfs,
	fstests

On Thu, Jun 23, 2022 at 08:31:30AM +0300, Amir Goldstein wrote:
> To put it in more blunt terms, the core test suite, fstests, is not
> very reliable. Neither kdevops nor fstests-bld address all the
> reliability issue (and they contribute some of their own).
> So we need the community to run both to get better and more
> reliable filesystem test coverage.

The generic pains with fstests / blktests surely can be shared, and
perhaps that is just a thing we need to start doing more regularly at
LSFMM rather than as a one-off.

> Nevertheless, we should continue to share as much experience
> and data points as we can during this co-opetition stage in order to
> improve both systems.

Yes, my point was not about killing something off; it was about sharing
data points, and I think we should at least share configs.

I personally see value in sharing expunges, but indeed if we do we'd
have to decide whether to put them up on GitHub with just the expunge
list alone, whether to also upload artifacts to the same tree, or
whether to dump all the artifacts into a storage pool somewhere. Some
artifacts can grow to insane sizes if a test is bogus: I once ran into
one which was at least 2 GiB of output in a *.bad file; IIRC it was for
ZNS for btrfs, or for a blktests zbd test, where the output was just an
error repeating itself over and over. We could just have a size limit
on these. And if experience is to show us anything, perhaps adopt an
epoch scheme if we use git.

  Luis

* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-22 21:52   ` [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
@ 2022-06-23 21:40     ` Luis Chamberlain
  0 siblings, 0 replies; 32+ messages in thread
From: Luis Chamberlain @ 2022-06-23 21:40 UTC (permalink / raw)
  To: Leah Rumancik
  Cc: Amir Goldstein, Josef Bacik, Chuck Lever, chandanrmail,
	Sweet Tea Dorminy, Pankaj Raghav, linux-xfs, fstests

On Wed, Jun 22, 2022 at 02:52:18PM -0700, Leah Rumancik wrote:
> On Tue, Jun 21, 2022 at 05:07:10PM -0700, Luis Chamberlain wrote:
> > On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> > > https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23. 
> > 
> > The coverage for XFS is using profiles which seem to come inspired
> > by ext4's different mkfs configurations.
> The configs I am using for the backports testing were developed with
> Darrick's help.

Sorry for the noise then.

> If you guys agree on a different set of configs, I'd be
> happy to update my configs moving forward.

Indeed it would be great to unify on target test configs at the very least.

  Luis

* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-23 21:31     ` Luis Chamberlain
@ 2022-06-24  5:32       ` Theodore Ts'o
  2022-06-24 22:54         ` Luis Chamberlain
  0 siblings, 1 reply; 32+ messages in thread
From: Theodore Ts'o @ 2022-06-24  5:32 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Darrick J. Wong, Leah Rumancik, Amir Goldstein, Josef Bacik,
	Chuck Lever, chandanrmail, Sweet Tea Dorminy, Pankaj Raghav,
	linux-xfs, fstests

On Thu, Jun 23, 2022 at 02:31:12PM -0700, Luis Chamberlain wrote:
> 
> To be clear, you seem to suggest gce-xfstests is a VM native solution.
> I'd also like to clarify that kdevops supports native VMs, cloud and
> baremetal. With kdevops you pick your bringup method.

Yes, that was my point.  Because gce-xfstests is a VM native solution,
it has some advantages, such as the ability to take advantage of the
fact that it's trivially easy to start up multiple cloud VM's which
can run in parallel --- and then the VM's shut themselves down once
they are done running the test, which saves cost and is more
efficient.

It is *because* we are a VM-native solution that we can optimize
in certain ways, because we don't have to also support a bare metal
setup.  So yes, the fact that kdevops also supports bare metal is
certainly granted.  That kind of flexibility is an advantage for
kdevops, certainly; but being able to fully take advantage of the
unique attributes of cloud VMs can also be a good thing.

(I've already made offers to folks working at other cloud vendors that
if they are interested in adding support for other cloud systems
beyond GCE, I'm happy to work with them to enable the use of other
XXX-xfstests test-appliance runners.)

> kdevops started as an effort for kernel development and filesystems
> testing. It is why the initial guest configuration was to use 8 GiB
> of RAM and 4 vcpus, that suffices to do local builds / development.
> I always did kernel development on guests back in the day still do
> to this day.

For kvm-xfstests, the default RAM size for the VM is 2GB.  One of the
reasons why I was interested in low-memory configurations is that
ext4 is often used in smaller devices (such as embedded systems and
mobile handsets) --- and running in memory-constrained environments
can turn up bugs that otherwise are much harder to reproduce on a
system with more memory.

Separating the kernel build system from the test VM's means that the
build can take place on a really powerful machine (either my desktop
with 48 cores and gobs and gobs of memory, or a build VM if you are
using the Lightweight Test Manager's Kernel Compilation Service) so
builds go much faster.  And then, of course, we can then launch a
dozen VM's, one for each test config.  If you force the build to be
done on the test VM, then you either give up parallelism, or you waste
time by building the kernel N times on N test VM's.

And in the case of the android-xfstests, which communicates with a
phone or tablet over a debugging serial cable and Android's fastboot
protocol, of *course* it would be insane to want to build the kernel
on the system under test!

So I've ***always*** done the kernel build on a machine or VM separate
from the System Under Test.  At least for my use cases, it just makes
a heck of a lot more sense.

And that's fine.  I'm *not* trying to convince everyone that my test
infrastructure is the one everyone should standardize on.  Which, quite
frankly, I sometimes think you have been evangelizing.  I believe very
strongly that the choice of test infrastructure is a personal one,
which is heavily dependent on each developer's workflow, and trying to
get everyone to standardize on a single test infrastructure is likely
going to work about as well as trying to get everyone to standardize on
a single text editor.

(Although obviously emacs is the one true editor.  :-)

> Sure, the TODO item on the URL seemed to indicate there was a desire to
> find a better place to put failures.

I'm not convinced the "better place" is expunge files.  I suspect it
may need to be some kind of database.  Darrick tells me that he stores
his test results in a postgres database.  (Which is way better than
what I'm doing, which is an mbox file and mail search tools.)

Currently, Leah is using flat text files for the XFS 5.15 stable
backports effort, plus some tools that parse and analyze those text
files.
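The flat-text approach can still go a fair way. A sketch of the kind of
analysis such tools might do, assuming a made-up "test fails runs" line
format rather than the actual one used for the backports effort:

```python
def parse_results(text):
    """Parse lines of the form 'test fails runs' into {test: (fails, runs)}."""
    results = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        results[parts[0]] = (int(parts[1]), int(parts[2]))
    return results

def regressions(baseline_text, backport_text, factor=2.0):
    """List tests whose backport failure rate clearly exceeds the baseline."""
    base = parse_results(baseline_text)
    back = parse_results(backport_text)
    out = []
    for test, (fails, runs) in back.items():
        rate = fails / runs
        b_fails, b_runs = base.get(test, (0, runs))
        b_rate = b_fails / b_runs if b_runs else 0.0
        # Flag any failure not seen in baseline, or a rate well above it.
        if rate > 0 and rate > b_rate * factor:
            out.append(test)
    return out
```

The same comparison generalizes to a database backend; only the parsing
layer changes.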

I'll also note that the number of baseline kernel versions is much
smaller if you are primarily testing an enterprise Linux distribution,
such as SLES.  And if you are working with stable kernels, you can
probably get away with updating the baseline for each LTS
kernel every so often.  But for upstream kernel development the
number of kernel versions for which a developer might want to track
flaky percentages is far greater, and will need to be updated at
least once every kernel development cycle, and possibly more
frequently than that.  Which is why I'm not entirely sure a flat text
file, such as an expunge file, is really the right answer.  I can
completely understand why Darrick is using a Postgres database.

So there is clearly more thought and design required here, in my
opinion.

> That is not a goal, the goal is allow variability! And share results
> in the most efficient way.

Sure, but are expunge files the most efficient way to "share results"?
If we have a huge amount of variability, such that we have a large
number of directories with different test configs and different
hardware configs, each with different expunge files, I'm not sure how
useful that actually is.  Are we expecting users to do a "git clone",
and then start browsing all of these different expunge files by hand?

It might perhaps be useful to get a bit more clarity about how we
expect the shared results would be used, because that might drive some
of the design decisions about the best way to store these "results".

Cheers,

					- Ted

* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-24  5:32       ` Theodore Ts'o
@ 2022-06-24 22:54         ` Luis Chamberlain
  2022-06-25  2:21           ` Theodore Ts'o
  2022-06-25  7:28           ` sharing fstests results (Was: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)) Amir Goldstein
  0 siblings, 2 replies; 32+ messages in thread
From: Luis Chamberlain @ 2022-06-24 22:54 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Darrick J. Wong, Leah Rumancik, Amir Goldstein, Josef Bacik,
	Chuck Lever, chandanrmail, Sweet Tea Dorminy, Pankaj Raghav,
	Zorro Lang, linux-xfs, fstests

On Fri, Jun 24, 2022 at 01:32:23AM -0400, Theodore Ts'o wrote:
> On Thu, Jun 23, 2022 at 02:31:12PM -0700, Luis Chamberlain wrote:
> > 
> > To be clear, you seem to suggest gce-xfstests is a VM native solution.
> > I'd also like to clarify that kdevops supports native VMs, cloud and
> > baremetal. With kdevops you pick your bringup method.
> 
> Yes, that was my point.  Because gce-xfstests is a VM native solution,
> it has some advantages, such as the ability to take advantage of the
> fact that it's trivially easy to start up multiple cloud VM's which
> can run in parallel --- and then the VM's shut themselves down once
> they are done running the test, which saves cost and is more
> efficient.

Perhaps I am not understanding what you are suggesting with a VM native
solution. What do you mean by that? A full KVM VM inside the cloud?

Anyway, kdevops has support to bring up whatever type of node you want
in the cloud providers: GCE, AWS, Azure, and OpenStack, and even custom
OpenStack solutions. That could be a VM or a high end bare metal node.
It does this by using terraform and providing the variability through
kconfig.

The initial 'make bringup' brings nodes up, and then all work runs on
each in parallel for fstests as you run 'make fstests-baseline'. At the
end you just run 'make destroy'.

> It is *because* that we are a VM-native solution that we can optimize
> in certain ways because we don't have to also support a bare metal
> setup.  So yes, the fact that kdevops also supports bare metal is
> certainly granted.  That kind of flexibility is an advantage for
> kdevops, certainly; but being able to fully take advantage of the
> unique attributes of cloud VM's can also be a good thing.

Yes, agreed. That is why I focused on technology that would support
all cloud providers, not just one.

I had not touched the AWS code, for example, in 2 years; I just
went and tried a bringup and it worked in 10 minutes. Most of the time
was spent getting my .aws/credentials file set up with information from
the website.

> > kdevops started as an effort for kernel development and filesystems
> > testing. It is why the initial guest configuration was to use 8 GiB
> > of RAM and 4 vcpus, that suffices to do local builds / development.
> > I always did kernel development on guests back in the day still do
> > to this day.
> 
> For kvm-xfstests, the default RAM size for the VM is 2GB.  One of the
> reasons why I was interested in low-memory configurations is because
> ext4 is often used in smaller devices (such as embedded systems and
> mobile handsets) --- and running in memory constrained environments
> can turn up bugs that otherwise are much harder to reproduce on a
> system with more memory.

Yes, I agree. We started with 8 GiB. Long ago while at SUSE I tried 2 GiB
and ran into the xfs/074 issue of requiring more due to xfs_scratch.
Then later Amir ran into snags with xfs/084 and generic/627 due to
OOMs. So in terms of XFS, to avoid OOMs with just the tests we need 3 GiB.

> Separating the kernel build system from the test VM's means that the
> build can take place on a really powerful machine (either my desktop
> with 48 cores and gobs and gobs of memory, or a build VM if you are
> using the Lightweight Test Manager's Kernel Compilation Service) so
> builds go much faster.  And then, of course, we can then launch a
> dozen VM's, one for each test config.  If you force the build to be
> done on the test VM, then you either give up parallelism, or you waste
> time by building the kernel N times on N test VM's.

The build is done once but I agree this can be optimized for kdevops.

Right now in kdevops the git clone and build of the kernel does take
place on each guest, and that requires at least 3 GiB of RAM. Shallow
git clone support was added as an option to help here, but the ideal
thing will be to just build locally or perhaps, as you suggest, use a
dedicated build VM.

> And in the case of the android-xfstests, which communicates with a
> phone or tablet over a debugging serial cable and Android's fastboot
> protocol, of *course* it would be insane to want to build the kernel
> on the system under test!
> 
> So I've ***always*** done the kernel build on a machine or VM separate
> from the System Under Test.  At least for my use cases, it just makes
> a heck of a lot more sense.

Support for this will be added to kdevops.

> And that's fine.  I'm *not* trying to convince everyone that my test
> infrastructure everyone should standardize on.  Which quite frankly, I
> sometimes think you have been evangelizing.  I believe very strongly
> that the choice of test infrastructures is a personal choice, which is
> heavily dependent on each developer's workflow, and trying to get
> everyone to standardize on a single test infrastructure is likely
> going to work as well as trying to get everyone to standardize on a
> single text editor.

What I think we *should* standardize on is at least configurations
for testing. And now the dialog of how / if we track / share failures
is also important.

What runner you use is up to you.

> (Although obviously emacs is the one true editor.  :-)
> 
> > Sure, the TODO item on the URL seemed to indicate there was a desire to
> > find a better place to put failures.
> 
> I'm not convinced the "better place" is expunge files.  I suspect it
> may need to be some kind of database.  Darrick tells me that he stores
> his test results in a postgres database.  (Which is way better than
> what I'm doing which is an mbox file and using mail search tools.)
> 
> Currently, Leah is using flat text files for the XFS 5.15 stable
> backports effort, plus some tools that parse and analyze those text
> files.

Where does not matter yet; what I'd like to refocus on is *if* sharing
is desirable by folks. We can discuss *how* and *where* if we do think
it is worth sharing.

If folks would like to evaluate this I'd encourage doing so perhaps
after a specific distro release moving forward, and not backtracking.

But for stable kernels I'd imagine it may be easier to see value in
sharing.

> I'll also note that the number of baseline kernel versions is much
> smaller if you are primarily testing an enterprise Linux distribution,
> such as SLES.

Much smaller than what? Android? If so then perhaps. Just recall that
enterprise distros support kernels for at least 10 years.

> And if you are working with stable kernels, you can
> probably get away with having updating the baseline for each LTS
> kernel every so often.  But for upstream kernels development the
> number of kernel versions for which a developer might want to track
> flaky percentages and far greater, and will need to be updated at
> least once every kernel development cycle, and possibly more
> frequently than that.  Which is why I'm not entirely sure a flat text
> file, such as an expunge file, is really the right answer.  I can
> completely understand why Darrick is using a Postgres database.
> 
> So there is clearly more thought and design required here, in my
> opinion.

Sure, let's talk about it, *if* we do find it valuable to share.
kdevops already has stuff in a consistent format, which can change
or be ported. We first just need to decide if we, as a community,
want to share.

The flakiness annotations are important too, and we have a thread
about that, which I have to go back to at some point.

> > That is not a goal, the goal is allow variability! And share results
> > in the most efficient way.
> 
> Sure, but are expunge files the most efficient way to "share results"?

There are three things we want to do if we are going to talk about
sharing results:

a) Consuming expunges so check for the Node Under Test (NUT) can expand
   on the expunges given criteria (flakiness, crash requirements)

b) Sharing updates to expunges per kernel / distro / runner / node-config
   and making patches to this easy.

c) Making updates for failures easy to read for a developer / community.
   These would be in the form of an email or results file for a test
   run through some sort of kernel-ci.

Let's start with a):

We can adopt runners to use anything. My gut tells me postgres is
a bit large unless we need socket communication. I can think of two
ways to go here then. Perhaps others have some other ideas?

1) We go lightweight on the db, maybe sqlite3 ? And embrace the same
   postgres db schema as used by Darrick if he sees value in sharing
   this. If we do this I think it doesn't make sense to *require*
   sqlite3 on the NUT (nodes), for many reasons, so parsing the db
   on the host to a flat file to be used by the node does seem
   ideal.

2) Keep postgres and provide a REST api for queries from the host to
   this server so it can then construct a flat file / directory
   interpretation of expunges for the nodes under test (NUT).

Given the minimum requirements desirable on the NUTs I think in the end
a flat file hierarchy is nice so to not incur some new dependency on
them.

Determinism is important for tests though, so snapshotting an
interpretation of the expunges at a specific point in time is also important.
So the database would need to be versioned per updates, so a test is
checkpointed against a specific version of the expunge db.

If we come to some sort of consensus then this code for parsing an
expunge set can be used directly from fstests' check script, so the
interpretation and use can be done in one place for all test runners.
We also have additional criteria which we may want for the expunges.
For instance, if we had flakiness percentage annotated somehow then
fstests' check could be passed an argument to only include expunges
given a certain flakiness level of some sort, or for example only
include expunges for tests which are known to crash.
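
For illustration, a minimal sketch of such a filter, assuming a
hypothetical 'flaky=NN%' annotation on expunge lines (no such
convention exists yet; entries without an annotation are treated as
always expunged):

```shell
# Hypothetical sketch only: filter an expunge file by a minimum
# flakiness level.  Assumes made-up entries of the form:
#   xfs/123 # flaky=80%
#   generic/475 # crashes
# Entries with no flaky= annotation are always expunged.
filter_expunges()
{
	file="$1"
	min_flaky="$2"	# e.g. 50 to only expunge tests >= 50% flaky

	while read -r test rest; do
		case "$test" in
		""|\#*) continue ;;	# skip blank and comment-only lines
		esac
		pct=$(printf '%s\n' "$rest" | sed -n 's/.*flaky=\([0-9]*\)%.*/\1/p')
		if [ -z "$pct" ] || [ "$pct" -ge "$min_flaky" ]; then
			printf '%s\n' "$test"
		fi
	done < "$file"
}
```

check would then only expunge the tests this prints, given some
--min-flaky style knob.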

Generating the files from a db is nice. But what gains do we have
with using a db then?

Now let's move on to b) sharing the expunges and sending patches for
updates. I think sending a patch against a flat file reads a lot easier
except for the comments / flakiness levels / crash considerations / and
artifacts. For kdevops' purposes this reads well today as we don't
upload artifacts anywhere and just refer to them on GitHub gists as best
effort / optional. There is no convention yet on expressing flakiness
but some tests do mention "failure rate" in one way or another.

So we want to evaluate if we want to share not only expunges but other
metadata associated with why a new test can be expunged or removed:

 * flakiness percentage
 * causes a kernel crash?
 * bogus test?
 * expunged due to a slew of other reasons, some of them maybe
   categorized and shared, some of them not

And do we want to share artifacts? If so how? Perhaps an optional URL,
with another component describing what it is, gist, or a tarball, etc.

Then for the last part, c) making failures easy to read for a developer,
let's review what could be done. I gather gce-xfstests explains the
xunit results summary. Right now kdevops' kernel-ci stuff just sends
an email with the same but also a diff to the expunge file hierarchy
augmented for the target kernel directory being tested. The developer
would just go and edit the line with metadata as a comment, but that
is just because we lack a structure for it. If we strive to share
an expunge list I think it would be wise to consider structure for
this metadata.

Perhaps:

<test> # <crashes>|<flakiness-percent>|<fs-skip-reason>|<artifact-type>|<artifact-dir-url>|<comments>

Where:

test:                         xfs/123 or btrfs/234
crashes:                      can be either Y or N
flakiness-percent:            e.g. 80%
fs-skip-reason:               can be an enum to represent a series of
                              fs specific reasons why a test may not be
                              applicable or should be skipped
artifact-type:                optional, if present the type of artifact,
                              can be an enum to represent a gist test
                              description, or a tarball
artifact-dir-url:             optional, path to the artifact
comments:                     additional comments
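
To make that concrete, entries under such a format might look like
the following (every value below is invented for illustration, not
taken from any real expunge list):

```
xfs/123 # Y|80%||gist|https://example.com/artifacts/xfs-123|OOM then crash, see artifact
btrfs/234 # N|5%||||rare ENOSPC flake, still being bisected
generic/484 # N||fs-no-rw-remount|||not applicable on readonly configs
```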

All the above considered, a), b) and c), yes I think a flat file
model works well as an option. I'd love to hear others' feedback.

> If we have a huge amount of variability, such that we have a large
> number of directories with different test configs and different
> hardware configs, each with different expunge files, I'm not sure how
> useful that actually is.

*If* you want to share I think it would be useful.

At least kdevops uses a flat file model with no artifacts, just the
expunges and comments, and over time it has been very useful, even for
reviewing historic issues on older kernels; simply using something
like 'git grep xfs/123' gives me a quick sense of the history of
issues of a test.

> Are we expecting users to do a "git clone",
> and then start browsing all of these different expunge files by hand?

If we want to extend the fstests check script to look for this, it could
be an optional directory, and an argument could be passed to check to
enable its hunt for it, so that if passed it would look for the
runner / kernel / host-type. For instance, today we already have
a function run on initialization of the check script which looks for
the fstests config file as follows:

known_hosts()
{
	[ "$HOST_CONFIG_DIR" ] || HOST_CONFIG_DIR=`pwd`/configs

	[ -f /etc/xfsqa.config ] && export HOST_OPTIONS=/etc/xfsqa.config
	[ -f $HOST_CONFIG_DIR/$HOST ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST
	[ -f $HOST_CONFIG_DIR/$HOST.config ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST.config
}

We could have something similar look for an expunge directory, with
say --expunge-auto-look, and that could be something like:

process_expunge_dir()
{
	[ "$HOST_EXPUNGE_DIR" ] || HOST_EXPUNGE_DIR=`pwd`/expunges

	[ -d /etc/fstests/expunges/$HOST ] && export HOST_EXPUNGES=/etc/fstests/expunges/$HOST
	[ -d $HOST_EXPUNGE_DIR/$HOST ] && export HOST_EXPUNGES=$HOST_EXPUNGE_DIR/$HOST
}

The runner and the host-type could be specified:

./check --runner <gce-xfstests|kdevops|whatever> --host-type <kvm-8vcpus-2gb>

And so we can have it look for these directories, and if any of them
are present, process them (cumulative):

  * HOST_EXPUNGES/any/$fstype/                       - regardless of kernel, host type and runner
  * HOST_EXPUNGES/$kernel/$fstype/any                - common between runners for any host type
  * HOST_EXPUNGES/$kernel/$fstype/$hostype           - common between runners for a host type
  * HOST_EXPUNGES/$kernel/$fstype/$hostype/$runner   - only present for the runner

The aggregate set of expunges is used.

Additional criteria could be passed to check to ensure that only
expunges that meet the criteria are used to skip tests for the
run, provided we can agree on some metadata for that.
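
As a rough sketch of the cumulative lookup (the directory layout is
the one proposed above; the *.txt file naming and everything else
here is an assumption, not an agreed-on convention):

```shell
# Hypothetical sketch: build the cumulative expunge set from the
# directory hierarchy described above.  File naming is made up.
aggregate_expunges()
{
	base="$1" kernel="$2" fstype="$3" host_type="$4" runner="$5"

	# walk from most generic to most specific, concatenating all
	# expunge lists found, then dedupe the test names
	for dir in \
		"$base/any/$fstype" \
		"$base/$kernel/$fstype/any" \
		"$base/$kernel/$fstype/$host_type" \
		"$base/$kernel/$fstype/$host_type/$runner"
	do
		[ -d "$dir" ] || continue
		cat "$dir"/*.txt 2>/dev/null
	done | grep -v '^#' | awk '{print $1}' | sort -u
}
```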

> It might perhaps be useful to get a bit more clarity about how we
> expect the shared results would be used, because that might drive some
> of the design decisions about the best way to store these "results".

Sure.

  Luis

* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-24 22:54         ` Luis Chamberlain
@ 2022-06-25  2:21           ` Theodore Ts'o
  2022-06-25 18:49             ` Luis Chamberlain
  2022-06-25  7:28           ` sharing fstests results (Was: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)) Amir Goldstein
  1 sibling, 1 reply; 32+ messages in thread
From: Theodore Ts'o @ 2022-06-25  2:21 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Darrick J. Wong, Leah Rumancik, Amir Goldstein, Josef Bacik,
	Chuck Lever, chandanrmail, Sweet Tea Dorminy, Pankaj Raghav,
	Zorro Lang, linux-xfs, fstests

On Fri, Jun 24, 2022 at 03:54:44PM -0700, Luis Chamberlain wrote:
> 
> Perhaps I am not understanding what you are suggesting with a VM native
> solution. What do you mean by that? A full KVM VM inside the cloud?

"Cloud native" is the better way to put things.  Cloud VM's
are designed to be ephemeral, so the concept of "node bringup" really
doesn't enter into the picture.

When I run the "build-appliance" command, this creates a test
appliance image.  Which is to say, we create a root file system image,
and then "freeze" it into a VM image.

For kvm-xfstests this is a qcow image which is run in snapshot mode,
which means that if any changes are made to the root file system, those
changes disappear when the VM exits.

For gce-xfstests, we create an image which can be used to quickly
bring up a VM which contains a block device which contains a copy of
that image as the root file system.  What's so special about this?  I
can create a dozen, or a hundred VM's, all with a copy of that same
image.  So I can do something like

   gce-xfstests ltm -c ext4/all -g full gs://gce-xfstests/bzImage-5.4.200

and this will launch a dozen VM's, with each VM testing a single test
configuration with the kernel found at gs://gce-xfstests/bzImage-5.4.200
in Google Cloud Storage, GCS (the rough equivalent of AWS's S3).

And then I can run

   gce-xfstests ltm -c ext4/all -g full --repo stable.git --commit v5.10.124

And this will launch a build VM which is nice and powerful to
*quickly* build the 5.10.124 kernel as found in the stable git tree,
and then launch a dozen additional VM's to test that built kernel
against all of the test configs defined for ext4/all, one VM per each
fs config.

And after running

   gce-xfstests ltm -c ext4/all -g full --repo stable.git --commit v5.15.49
   gce-xfstests ltm -c ext4/all -g full --repo stable.git --commit v5.18.6

... now there will be ~50 VM's all running tests in parallel.  So this
is far faster than doing a "node bringup", and since I am running all
of the tests in parallel, I will get the test results back in a much
shorter amount of wall clock time.  And, as the runs for each test
config complete, the VM's will disappear (after first uploading the
test results into GCS), and I will stop getting charged for them.

And if I were to launch additional tests runs, each containing their
own set of VM's:

   gce-xfstests ltm -c xfs/all -g full --repo stable.git --commit v5.15.49
   gce-xfstests ltm -c xfs/all -g full --repo stable.git --commit v5.18.6
   gce-xfstests ltm -c f2fs/all -g full --repo stable.git --commit v5.15.49

I can very quickly have over 100 test VM's running in parallel, and as
the tests complete, they are automatically shut down and destroyed ---
which means that we don't store state in the VM.  Instead the state is
stored in a Google Cloud Storage (Amazon S3) bucket, with e-mail sent
with a summary of results.

VM's can get started much more quickly than "make bringup", since
we're not running puppet or ansible to configure each node.  Instead,
we get a clone of the test appliance:

% gce-xfstests describe-image
archiveSizeBytes: '1315142848'
creationTimestamp: '2022-06-20T21:46:24.797-07:00'
description: Linux Kernel File System Test Appliance
diskSizeGb: '10'
family: xfstests
  ...
labels:
  blktests: gaf97b55
  fio: fio-3_30
  fsverity: v1_5
  ima-evm-utils: v1_3_2
  nvme-cli: v1_16
  quota: v4_05-43-gd2256ac
  util-linux: v2_38
  xfsprogs: v5_18_0
  xfstests: v2022_06_05-13-gbc442c4b
  xfstests-bld: g8548bd11
  zz_build-distro: bullseye
  ...

And since these images are cheap to keep around (5-6 cents/month), I
can keep a bunch of older versions of test appliances around, in case
I want to see if a test regression might be caused by a newer version
of the test appliance.  So I can run "gce-xfstests -I
xfstests-202001021302" and this will create a VM using the test
appliance that I built on January 2, 2020.  It also means that I can
release a new test appliance to the xfstests-cloud project for public
use, and if someone wants to pin their testing to a known version of
the test appliance, they can do that.

So the test appliance VM's can be much more dynamic than kdevops
nodes, because they can be created and deleted without a care in the
world.  This is enabled by the fact that there isn't any state which
is stored on the VM.  In contrast, in order to harvest test results
from a kdevops node, you have to ssh into the node and try to find the
test results.

In contrast, I can just run "gce-xfstests ls-results" to see all of
the results that have been saved to GCS, and I can fetch a particular
test result to my laptop via a single command: "gce-xfstests
get-results tytso-20220624210238".  No need to ssh to a host node, and
then ssh to the kdevops test node, yadda, yadda, yadda --- and if you
run "make destroy" you lose all of the test result history on that node,
right?

Speaking of saving the test result history, a full set of test
results/artifacts for a dozen ext4 configs is around 12MB for the
tar.xz file, and Google Cloud Storage is a penny/GB/month for nearline
storage, and 0.4 cents/GB/month for coldline storage, so I can afford
to keep a *lot* of test results/artifacts for quite a while, which can
occasionally be handy for doing some historic research.


See the difference?

						- Ted

* Re: sharing fstests results (Was: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1))
  2022-06-24 22:54         ` Luis Chamberlain
  2022-06-25  2:21           ` Theodore Ts'o
@ 2022-06-25  7:28           ` Amir Goldstein
  2022-06-25 19:35             ` Luis Chamberlain
  1 sibling, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2022-06-25  7:28 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Theodore Ts'o, Darrick J. Wong, Leah Rumancik, Josef Bacik,
	Chuck Lever, chandanrmail, Sweet Tea Dorminy, Pankaj Raghav,
	Zorro Lang, linux-xfs, fstests

[Subject change was long due...]

On Sat, Jun 25, 2022 at 1:54 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Fri, Jun 24, 2022 at 01:32:23AM -0400, Theodore Ts'o wrote:
[...]

> >
> > > Sure, the TODO item on the URL seemed to indicate there was a desire to
> > > find a better place to put failures.
> >
> > I'm not convinced the "better place" is expunge files.  I suspect it
> > may need to be some kind of database.  Darrick tells me that he stores
> > his test results in a postgres database.  (Which is way better than
> > what I'm doing which is an mbox file and using mail search tools.)
> >
> > Currently, Leah is using flat text files for the XFS 5.15 stable
> > backports effort, plus some tools that parse and analyze those text
> > files.
>
> Where does not matter yet, what I'd like to refocus on is *if* sharing
> is desirable by folks. We can discuss *how* and *where* if we do think
> it is worth to share.
>
> If folks would like to evaluate this I'd encourage to do so perhaps
> after a specific distro release moving forward, and to not backtrack.
>
> But for stable kernels I'd imagine it may be easier to see value in
> sharing.
>
> > I'll also note that the number of baseline kernel versions is much
> > smaller if you are primarily testing an enterprise Linux distribution,
> > such as SLES.
>
> Much smaller than what? Android? If so then perhaps. Just recall that
> Enterprise supports kernels for at least 10 years.
>
> > And if you are working with stable kernels, you can
> > probably get away with having updating the baseline for each LTS
> > kernel every so often.  But for upstream kernels development the
> > number of kernel versions for which a developer might want to track
> > flaky percentages and far greater, and will need to be updated at
> > least once every kernel development cycle, and possibly more
> > frequently than that.  Which is why I'm not entirely sure a flat text
> > file, such as an expunge file, is really the right answer.  I can
> > completely understand why Darrick is using a Postgres database.
> >
> > So there is clearly more thought and design required here, in my
> > opinion.
>
> Sure, let's talk about it, *if* we do find it valuable to share.
> kdevops already has stuff in a format which is consistent, that
> can change or be ported. We first just need to decide if we want
> to as a community share.
>
> The flakyness annotations are important too, and we have a thread
> about that, which I have to go and get back to at some point.
>
> > > That is not a goal, the goal is allow variability! And share results
> > > in the most efficient way.
> >
> > Sure, but are expunge files the most efficient way to "share results"?
>
> There are three things we want to do if we are going to talk about
> sharing results:
>
> a) Consuming expunges so check.sh for the Node Under Test (NUT) can expand
>    on the expunges given a criteria (flakyness, crash requirements)
>
> b) Sharing updates to expunges per kernel / distro / runner / node-config
>    and making patches to this easy.
>
> c) Making updates for failures easy to read for a developer / community.
>    These would be in the form of an email or results file for a test
>    run through some sort of kernel-ci.
>
> Let's start with a):
>
> We can adopt runners to use anything. My gut tells me postgres is
> a bit large unless we need socket communication. I can think of two
> ways to go here then. Perhaps others have some other ideas?
>
> 1) We go lightweight on the db, maybe sqlite3 ? And embrace the same
>    postgres db schema as used by Darrick if he sees value in sharing
>    this. If we do this I think it does't make sense to *require*
>    sqlite3 on the NUT (nodes), for many reasons, so parsing the db
>    on the host to a flat file to be used by the node does seem
>    ideal.
>
> 2) Keep postgres and provide a REST api for queries from the host to
>    this server so it can then construct a flat file / directory
>    interpreation of expunges for the nodes under test (NUT).
>
> Given the minimum requirements desirable on the NUTs I think in the end
> a flat file hierarchy is nice so to not incur some new dependency on
> them.
>
> Determinism is important for tests though so snapshotting a reflection
> interpretion of expunges at a specific point in time is also important.
> So the database would need to be versioned per updates, so a test is
> checkpointed against a specific version of the expunge db.

Using the terminology "expunge db" is wrong here because it suggests
that flakey tests (which are obviously part of that db) should be in
an expunge list as is done in kdevops, and that is not how
Josef/Ted/Darrick treat the flakey tests.

The discussion should be around sharing fstests "results", not expunge
lists. Sharing expunge lists for tests that should not be run at all
with a certain kernel/distro/xfsprogs has great value on its own, and I
think the kdevops hierarchical expunge lists are a very good place to
share this *deterministic* information, but only as long as those lists
absolutely do not contain non-deterministic test expunges.

For example, this is a deterministic expunge list that may be worth sharing:
https://github.com/linux-kdevops/kdevops/blob/master/workflows/fstests/expunges/any/xfs/reqs-xfsprogs-5.10.txt
Because for all the tests (it's just one), the failure is analysed
and found to be deterministic and related to the topic of the expunge.

However, this is also a classic example of an expunge list that could
be auto-generated by the test runner if xfs/540 had the annotations:

_fixed_in_version xfsprogs 5.13
_fixed_by_git_commit xfsprogs 5f062427 \
      "xfs_repair: validate alignment of inherited rt extent hints"

>
> If we come to some sort of consensus then this code for parsing an
> expunge set can be used from directly on fstests's check script, so the
> interpreation and use can be done in one place for all test runners.
> We also have additional criteria which we may want for the expunges.
> For instance, if we had flakyness percentage annotated somehow then
> fstests's check could be passed an argument to only include expunges
> given a certain flakyness level of some sort, or for example only
> include expunges for tests which are known to crash.
>
> Generating the files from a db is nice. But what gains do we have
> with using a db then?
>
> Now let's move on to b) sharing the expunges and sending patches for
> updates. I think sending a patch against a flat file reads a lot easier
> except for the comments / flakyness levels / crash consideration / and
> artifacts. For kdevop's purposes this reads well today as we don't
> upload artifacts anywhere and just refer to them on github gists as best
> effort / optional. There is no convention yet on expression of flakyness
> but some tests do mention "failure rate" in one way or another.
>
> So we want to evaluate if we want to share not only expunges but other
> meta data associated to why a new test can be expunged or removed:
>
>  * flakyness percentage
>  * cause a kernel crash?
>  * bogus test?
>  * expunged due to a slew of a tons of other reasons, some of them maybe
>    categorized and shared, some of them not
>
> And do we want to share artifacts? If so how? Perhaps an optional URL,
> with another component describing what it is, gist, or a tarball, etc.
>
> Then for the last part c) making failures easy to read to a developer
> let's review what could be done. I gather gce-xfstests explains the
> xunit results summary. Right now kdevop's kernel-ci stuff just sends
> an email with the same but also a diff to the expunge file hierarchy
> augmented for the target kernel directory being tested. The developer
> would just go and edit the line with meta data as a comment, but that
> is just because we lack a structure for it. If we strive to share
> an expunge list I think it would be wise to consider structure for
> this metadata.
>
> Perhaps:
>
> <test> # <crashes>|<flayness-percent-as-fraction>|<fs-skip-reason>|<artifact-type>|<artifact-dir-url>|<comments>
>
> Where:
>
> test:                         xfs/123 or btrfs/234
> crashes:                      can be either Y or N
> flayness-percent-as-percentage: 80%
> fs-skip-reason:               can be an enum to represent a series of
>                               fs specific reasons why a test may not be
>                               applicable or should be skipped
> artifact-type:                optional, if present the type of artifact,
>                               can be enum to represent a gist test
>                               description, or a tarball
> artifact-dir-url:             optional, path to the artifact
> comments:                     additional comments
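
For illustration, a line in that proposed format could be parsed with a small shell sketch. The format itself is only the proposal above, not an existing fstests convention, and the sample values and variable names here are made up:

```shell
# Parse one line of the proposed pipe-delimited expunge metadata format.
# Hypothetical sample line; field names follow the proposal above.
line='xfs/123 # Y|80%|not-applicable|gist|https://gist.example/abc|crashes under load'

test_name=${line%% *}        # everything before the first space
meta=${line#*"# "}           # everything after "# "

# Split the metadata on '|' into the six proposed fields.
IFS='|' read -r crashes flakiness skip_reason artifact_type artifact_url comments <<EOF
$meta
EOF

echo "test=$test_name crashes=$crashes flakiness=$flakiness reason=$skip_reason"
```

A consumer like a results analyzer could then decide, per test, whether a failure is a known crasher, merely flaky, or something new.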
>
> All the above considered, a) b) and c), yes I think a flat file
> model works well as an option. I'd love to hear other's feedback.
>
> > If we have a huge amount of variability, such that we have a large
> > number of directories with different test configs and different
> > hardware configs, each with different expunge files, I'm not sure how
> > useful that actually is.
>
> *If* you want to share I think it would be useful.
>
> At least kdevops uses a flat file model with no artifacts, just the
> expunges and comments, and over time it has been very useful, even to be
> able to review historic issues on older kernels by simply using
> something like 'git grep xfs/123' gives me a quick sense of history of
> issues of a test.
>
> > Are we expecting users to do a "git clone",
> > and then start browsing all of these different expunge files by hand?
>
> If we want to extend the fstests check script to look for this, it could
> be an optional directory, and an argument could be passed to check to
> enable its hunt for it, so that if passed it would look for the
> runner / kernel / host-type. For instance today we already have
> a function on initialization for the check script which looks for
> the fstests' config file as follows:
>
> known_hosts()
> {
>         [ "$HOST_CONFIG_DIR" ] || HOST_CONFIG_DIR=`pwd`/configs
>
>         [ -f /etc/xfsqa.config ] && export HOST_OPTIONS=/etc/xfsqa.config
>         [ -f $HOST_CONFIG_DIR/$HOST ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST
>         [ -f $HOST_CONFIG_DIR/$HOST.config ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST.config
> }
>
> We could have something similar look for an expunge directory via say
> --expunge-auto-look and that could be something like:
>
> process_expunge_dir()
> {
>         [ "$HOST_EXPUNGE_DIR" ] || HOST_EXPUNGE_DIR=`pwd`/expunges
>
>         [ -d /etc/fstests/expunges/$HOST ] && export HOST_EXPUNGES=/etc/fstests/expunges/$HOST
>         [ -d $HOST_EXPUNGE_DIR/$HOST ] && export HOST_EXPUNGES=$HOST_EXPUNGE_DIR/$HOST
> }
>
> The runner could be specified, and the host-type
>
> ./check --runner <gce-xfstests|kdevops|whatever> --host-type <kvm-8vcpus-2gb>
>
> And so we can have it look for these directories and if any of these
> exist, they are processed (cumulative):
>
>   * HOST_EXPUNGES/any/$fstype/                        - regardless of kernel, host type and runner
>   * HOST_EXPUNGES/$kernel/$fstype/any                 - common between runners for any host type
>   * HOST_EXPUNGES/$kernel/$fstype/$hosttype           - common between runners for a host type
>   * HOST_EXPUNGES/$kernel/$fstype/$hosttype/$runner   - only present for the runner
>
> The aggregate set of expunges is used.
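
As a rough illustration of that cumulative lookup (the directory layout, file names, and test names below are hypothetical, mirroring the sketch above):

```shell
# Build a throw-away expunge hierarchy and aggregate it cumulatively,
# mirroring the lookup order sketched above. Paths are hypothetical.
tmp=$(mktemp -d)
kernel=5.15 fstype=xfs hosttype=kvm-8vcpus-2gb
mkdir -p "$tmp/any/$fstype" "$tmp/$kernel/$fstype/$hosttype"
echo "generic/019" > "$tmp/any/$fstype/crashes.txt"
echo "xfs/501"     > "$tmp/$kernel/$fstype/$hosttype/crashes.txt"

# Walk from most generic to most specific; missing levels are skipped,
# and the union of all matching lists is what gets expunged.
aggregate=$(
    for dir in "$tmp/any/$fstype" \
               "$tmp/$kernel/$fstype/any" \
               "$tmp/$kernel/$fstype/$hosttype"; do
        [ -d "$dir" ] && cat "$dir"/*.txt
    done | sort -u
)
echo "$aggregate"
rm -rf "$tmp"
```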
>
> Additional criteria could be passed to check to ensure that only
> certain expunges that meet the criteria are used to skip tests for the
> run, provided we can agree on some metadata for that.
>
> > It might perhaps be useful to get a bit more clarity about how we
> > expect the shared results would be used, because that might drive some
> > of the design decisions about the best way to store these "results".
>

As a requirement, what I am looking for is a way to search for anything
known to the community about failures in test FS/NNN.

Because when I get an alert on a possible regression, that's the fastest
way for me to triage and understand how much effort I should put into
the investigation of that failure and which directions I should look into.

Right now, I look at the test header comment and git log, I grep the
kdevops expunge lists to look for juicy details, and I search lore for
mentions of that test.

In fact, I already have an auto generated index of lore fstests
mentions in xfs patch discussions [1] that I just grep for failures found
when testing xfs. For LTS testing, I found it to be the best way to
find candidate fix patches that I may have missed.

I would love to have more sources to get search results from.
There doesn't even need to be a standard form for the search or results.

If Leah, Darrick, Ted and Josef would provide me with a script to search
their home brewed fstests db, I would just run all those scripts and
see what they have to tell me about FS/NNN in some form of human
readable format that I can understand.

Going forward, we can try to standardize the search and results
format, but for getting better requirements you first need users!

Thanks,
Amir.

[1] https://github.com/amir73il/b4/blob/xfs-5.10.y/xfs-5.10..5.17-rn.rst

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-25  2:21           ` Theodore Ts'o
@ 2022-06-25 18:49             ` Luis Chamberlain
  2022-06-25 21:14               ` Theodore Ts'o
  0 siblings, 1 reply; 32+ messages in thread
From: Luis Chamberlain @ 2022-06-25 18:49 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Darrick J. Wong, Leah Rumancik, Amir Goldstein, Josef Bacik,
	Chuck Lever, chandanrmail, Sweet Tea Dorminy, Pankaj Raghav,
	Zorro Lang, linux-xfs, fstests

On Fri, Jun 24, 2022 at 10:21:50PM -0400, Theodore Ts'o wrote:
> On Fri, Jun 24, 2022 at 03:54:44PM -0700, Luis Chamberlain wrote:
> > 
> > Perhaps I am not understanding what you are suggesting with a VM native
> > solution. What do you mean by that? A full KVM VM inside the cloud?
> 
> "Cloud native" is the better way to put things.  Cloud VM's
> are designed to be ephemeral, so the concept of "node bringup" really
> doesn't enter into the picture.
> 
> When I run the "build-appliance" command, this creates a test
> appliance image.  Which is to say, we create a root file system image,
> and then "freeze" it into a VM image.

So this seems to build an image from a base distro image. Is that right?
And it would seem your goal is then to store that image so it can
be re-used.

> For kvm-xfstests this is a qcow image which is run in snapshot mode,
> which means that if any changes is made to the root file system, those
> changes disappear when the VM exits.

Sure, so you build the image once and reuse it, makes sense.

You are optimizing usage for GCE. That makes sense. The goal behind
kdevops was to use technology which can *enable* any optimizations in
a cloud agnostic way. What APIs become public is up to the cloud
provider, and one cloud agnostic way to manage cloud solutions using
open source tools is with terraform, and so that is used today. If an API
is not yet available through terraform, kdevops could simply use whatever
cloud tooling for additional hooks. But having the ability to ramp up
regardless of cloud provider was extremely important to me from the
beginning.

Optimizing is certainly possible, always :)

Likewise, if you are using local virtualization, we can save vagrant
images in the Vagrant Cloud if we wanted, which would allow pre-built
setups to be saved:

https://app.vagrantup.com/boxes/search

That could speed up bringup for local KVM / VirtualBox guests.

In fact since vagrant images are also just tarballs with qcow2 files,
I do wonder if they can be also leveraged for cloud deployments. Or if
the inverse is true, if your qcow2 images can be used for vagrant
purposes as well. If you're curious:

https://github.com/linux-kdevops/kdevops/blob/master/docs/custom-vagrant-boxes.md

What approach you use is up to you. From a Linux distribution
perspective, being able to do reproducible builds was important too, and
so a lot of effort was put into ensuring that cooking up a final state
from an initial distro release is supported.

> I can very quickly have over 100 test VM's running in parallel, and as
> the tests complete, they are automatically shutdown and destroyed ----
> which means that we don't store state in the VM.  Instead the state is
> stored in a Google Cloud Storage (Amazon S3) bucket, with e-mail sent
> with a summary of results.

Using cloud object storage is certainly nice if you can afford it. I
think it is valuable, but likewise it should be optional, and so with
kdevops support is welcomed should someone want to do that. What you
describe is not impossible with kdevops; it is just not done today, but
could be enabled.

> VM's can get started much more quickly than "make bringup", since
> we're not running puppet or ansible to configure each node.

You can easily just use pre-built images as well instead of doing
the build from a base distro release, just as you could use custom
vagrant images for local KVM guests.

The usage of ansible to *build* fstests and install it can be done once
too, and that image saved, exported, etc., and then re-used. The kernel
config I maintain on kdevops has been tested to work on local KVM
virtualization setups, but also on all supported cloud providers.

So I think there is certainly value in learning from the way you are
optimizing cloud usage for GCE and generalizing that for *any* cloud
provider.

The steps to *build* an image from a base distro release are glossed
over, but that alone takes effort and is made pretty well
distro-agnostic within kdevops too.

> In contrast, I can just run "gce-xfstests ls-results" to see all of
> the results that has been saved to GCS, and I can fetch a particular
> test result to my laptop via a single command: "gce-xfstests
> get-results tytso-20220624210238".  No need to ssh to a host node, and
> then ssh to the kdevops test node, yadda, yadda, yadda --- and if you
> run "make destroy" you lose all of the test result history on that node,
> right?

Actually all the *.bad and *.dmesg files, as well as the final xunit
results for all nodes for failed tests, are copied over locally to the
host running kdevops. Xunit files are also merged to represent a final
full set of results. So no, they are not destroyed. If you wanted to keep
all files even for non-failed tests we could add that as a new Kconfig bool.

Support for stashing results into object storage sure would be nice, agreed.

> See the difference?

Yes you have optimized usage of GCE. Good stuff, lots to learn from that effort!
Thanks for sharing the details!

  Luis


* Re: sharing fstests results (Was: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1))
  2022-06-25  7:28           ` sharing fstests results (Was: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)) Amir Goldstein
@ 2022-06-25 19:35             ` Luis Chamberlain
  2022-06-25 21:50               ` Theodore Ts'o
  0 siblings, 1 reply; 32+ messages in thread
From: Luis Chamberlain @ 2022-06-25 19:35 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Theodore Ts'o, Darrick J. Wong, Leah Rumancik, Josef Bacik,
	Chuck Lever, chandanrmail, Sweet Tea Dorminy, Pankaj Raghav,
	Zorro Lang, linux-xfs, fstests

On Sat, Jun 25, 2022 at 10:28:32AM +0300, Amir Goldstein wrote:
> On Sat, Jun 25, 2022 at 1:54 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > Determinism is important for tests though so snapshotting a reflection
> > interpretion of expunges at a specific point in time is also important.
> > So the database would need to be versioned per updates, so a test is
> > checkpointed against a specific version of the expunge db.
> 
> Using the terminology "expunge db" is wrong here because it suggests
> that flaky tests (which are obviously part of that db) should be in an
> expunge list as is done in kdevops, and that is not how Josef/Ted/Darrick
> treat the flaky tests.

There are flaky tests which can cause a crash, and that is why I started
to expunge these. Not all flaky tests cause a crash though. And so, this
is why in the format I suggested you can specify metadata such as if a
test caused a crash.

At this point I agree that the way kdevops simply skips flaky tests
which do not cause a crash should be changed: if a test is just known
to fail non-deterministically, but without a crash, it would be good
to simply not treat that failure as fatal at the end. If however the
failure rate does change, it would be useful to update that
information. Without metadata one cannot process that sort of stuff.

> The discussion should be around sharing fstests "results", not expunge
> lists. Sharing expunge lists for tests that should not be run at all
> with a certain kernel/distro/xfsprogs has great value on its own, and I
> think the kdevops hierarchical expunge lists are a very good place to
> share that kind of *deterministic* information, but only as long as
> those lists absolutely do not contain non-deterministic test expunges.

The way the expunge list is processed could simply be modified in
kdevops so that non-deterministic tests are not expunged but also not
treated as fatal at the end. But think about it, the exception is when
the non-deterministic failure does not lead to a crash, no?

> > > It might perhaps be useful to get a bit more clarity about how we
> > > expect the shared results would be used, because that might drive some
> > > of the design decisions about the best way to store these "results".
> >
> 
> As a requirement, what I am looking for is a way to search for anything
> known to the community about failures in test FS/NNN.

Here's the thing though. Not all developers have incentives to share.
For a while SLE didn't have public expunges; that changed after openSUSE
Leap 15.3, as it has binary compatibility with SLE 15.3, and so the same
failures under workflows/fstests/expunges/opensuse-leap/15.3/ are applicable.
It is up to each distro whether they wish to share, and without a public
vehicle to do so, why would they, or how would they?

For upstream and stable I would hope there are more incentives to share.
But again, no shared home has ever existed before, and I don't think
there was ever a dialog about sharing a home for these.

> Because when I get an alert on a possible regression, that's the fastest
> way for me to triage and understand how much effort I should put into
> the investigation of that failure and which directions I should look into.
> 
> Right now, I look at the test header comment and git log, I grep the
> kdepops expunge lists to look for juicy details and I search lore for
> mentions of that test.
> 
> In fact, I already have an auto generated index of lore fstests
> mentions in xfs patch discussions [1] that I just grep for failures found
> when testing xfs. For LTS testing, I found it to be the best way to
> find candidate fix patches that I may have missed.

This effort is valuable and thanks for doing all this.

> Going forward, we can try to standardize the search and results
> format, but for getting better requirements you first need users!

As you are witness to, running fstests against any fs takes a lot of
time and patience, and as I have noted, not many have incentives to
share. So the best I could do is provide the solution to enable folks to
reproduce testing as fast and as easily as possible, and let folks who
are interested in sharing do so. And obviously at least I did get a major
enterprise distro to share some results. Hopefully others will follow.

So I expect the format for sharing to be led by those who have a clear
incentive to do so. Folks working on upstream, or stable stakeholders,
seem like obvious candidates. And then it is just volunteer work.

  Luis


* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-25 18:49             ` Luis Chamberlain
@ 2022-06-25 21:14               ` Theodore Ts'o
  2022-07-01 23:08                 ` Luis Chamberlain
  0 siblings, 1 reply; 32+ messages in thread
From: Theodore Ts'o @ 2022-06-25 21:14 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Darrick J. Wong, Leah Rumancik, Amir Goldstein, Josef Bacik,
	Chuck Lever, chandanrmail, Sweet Tea Dorminy, Pankaj Raghav,
	Zorro Lang, linux-xfs, fstests

On Sat, Jun 25, 2022 at 11:49:54AM -0700, Luis Chamberlain wrote:
> You are optimizing usage for GCE. That makes sense.

This particular usage model is not unique to GCE.  A very similar
thing can be done using Microsoft Azure, Amazon Web Services and
Oracle Cloud Services.  And I've talked to some folks who might be
interested in taking the Test Appliance that is currently built for
use with KVM, Android, and GCE, and extending it to support other
Cloud infrastructures.  So the concept of these optimizations are not
unique to GCE, which is why I've been calling this approach "cloud
native".

Perhaps one other difference is that I make the test appliance images
available, so people don't *have* to build them from scratch.  They
can just download the qcow2 image from:

    https://www.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests

And for GCE, there is the public image project, xfstests-cloud, just
like there are public images for debian in the debian-cloud project,
for Fedora in the fedora-cloud project, etc.  Of course, for full GPL
compliance, how to build these images from source is fully available,
which is why the images are carefully tagged so all of the git commit
versions and the automated scripts used to build the image are fully
available for anyone who wants to replicate the build.  *BUT*, they
don't have to build the test environment if they are just getting
started.

One of the things which I am trying to do is to make the "out of box"
experience as simple as possible, which means I don't want to force
users to build the test appliance or run "make bringup" if they don't
have to.   

Of course, someone who is doing xfstests development will need to
learn how to build their own test appliance.  But for someone who is
just getting started, the goal is to make the learning curve as flat
as possible.

One of the other important design principles for me was that I didn't
want to require that the VMs have networking access, nor did I want to
require users to have to run random scripts via sudo or as root.  (Some
of this was because of corporate security requirements at the time.)
This also had the benefit that I'm not asking the user to set up ssh
keys if they are using kvm-xfstests, but can instead rely on the serial
console.

> The goal behind kdevops was to use technology which can *enable* any
> optimizations in a cloud agnostic way.

Fair enough.  My goal for kvm-xfstests and gce-xfstests was to make
developer velocity the primary goal.  Portability to different cloud
systems took a back seat.  I don't apologize for this, since over the
many years that I've been personally using {kvm,gce}-xfstests, the
fact that I can use my native kernel development environment, and have
the test environment pluck the kernel straight out of my build tree,
has paid for itself many times over.

If I had to push test/debug kernel code to a public git tree just so
the test VM can pull down the code and build it in the test VM a
second time --- I'd say, "no thank you, absolutely not."  Having to do
this would slow me down, and as I said, developer velocity is king.  I
want to be able to save a patch from my mail user agent, apply the
patch, and then give the code a test, *without* having to interact
with a public git tree.

Maybe you can do that with kdevops --- but it's not at all obvious
how.  With kvm-xfstests, I have a quickstart doc which gives
instructions, and then it's just a matter of running the command
"kvm-xfstests smoke" or "kvm-xfstests shell" from the developer's
kernel tree.  No muss, no fuss, no dirty dishes....

> In fact since vagrant images are also just tarballs with qcow2 files,
> I do wonder if they can be also leveraged for cloud deployments. Or if
> the inverse is true, if your qcow2 images can be used for vagrant
> purposes as well. 

Well, my qcow2 images don't come with ssh keys, since they are
optimized to be launched from the kvm-xfstests script, where the tests
to be run are passed in via the boot command line:

% kvm-xfstests smoke --no-action
Detected kbuild config; using /build/ext4-4.14 for kernel
Using kernel /build/ext4-4.14/arch/x86/boot/bzImage
Networking disabled.
Would execute:
         ionice -n 5 /usr/bin/kvm -boot order=c -net none -machine type=pc,accel=kvm:tcg \
	 -cpu host -drive file=/usr/projects/xfstests-bld/build-64/test-appliance/root_fs.img,if=virtio,snapshot=on \
	 ....
	-gdb tcp:localhost:7499 --kernel /build/ext4-4.14/arch/x86/boot/bzImage \
	--append "quiet loglevel=0 root=/dev/vda console=ttyS0,115200 nokaslr fstestcfg=4k fstestset=-g,quick fstestopt=aex fstesttz=America/New_York fstesttyp=ext4 fstestapi=1.5 orig_cmdline=c21va2UgLS1uby1hY3Rpb24="

The boot command line options "fstestcfg=4k", "fstestset=-g,quick",
"fstesttyp=ext4", etc., are how the test appliance knows which tests to
run.  So that means *all* the developer needs to do is type the command
"kvm-xfstests smoke".
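
The appliance side of that could look roughly like the sketch below, pulling fstest* parameters out of the kernel command line. This is an illustration, not the actual xfstests-bld code: real init scripts would read /proc/cmdline, and the uppercase variable names are assumptions; here a literal string is used so the sketch is self-contained.

```shell
# Sketch: parse fstest* options from a kernel command line the way a
# test appliance init script might. A literal string stands in for
# /proc/cmdline to keep the sketch self-contained.
cmdline="quiet loglevel=0 root=/dev/vda fstestcfg=4k fstestset=-g,quick fstesttyp=ext4 fstestapi=1.5"

for word in $cmdline; do
    case "$word" in
        fstestcfg=*) FSTESTCFG=${word#fstestcfg=} ;;
        fstestset=*) FSTESTSET=${word#fstestset=} ;;
        fstesttyp=*) FSTESTTYP=${word#fstesttyp=} ;;
        fstestapi=*) FSTESTAPI=${word#fstestapi=} ;;
    esac
done

echo "config=$FSTESTCFG set=$FSTESTSET fstype=$FSTESTTYP api=$FSTESTAPI"
```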

(By the way, it's a simple config option in ~/.config/kvm-xfstests if
you are a btrfs or xfs developer, and you want the default file system
type to be btrfs or xfs.  Of course you can explicitly specify a test
config if you are an ext4 developer and you want to test how a test
runs on xfs: "kvm-xfstests -c xfs/4k generic/223".)

There's no need to set up ssh keys, push the kernel to a public git
tree, ssh into the test VM, yadda, yadda, yadda.  Just one single
command line and you're *done*.

This is what I meant by the fact that kvm-xfstests is optimized for a
file system developer's workflow, which I claim is very different from
what a QA department might want.  I added that capability to
gce-xfstests later, but it's very separate from the very simple
command lines for a file system developer.  If I want the lightweight
test manager to watch a git tree, and to kick off a build whenever a
branch changes, and then run a set of tests, I can do that, but that's
a *very* different command and a very different use case, and I've
optimized for that separately:

    gce-xfstests ltm -c ext4/all -g auto --repo ext4.dev --watch dev

This is what I call the QA department's workflow.  Which is also
totally valid.  But I believe in optimizing for each workflow
separately, and being somewhat opinionated in my choices.

For example, the test appliance uses Debian.  Period.  And that's
because I didn't see the point of investing time in making that be
flexible.  My test infrastructure is optimized for a ***kernel***
developer, and from that perspective, the distro for the test
environment is totally irrelevant.

I understand that if you are working for SuSE, then maybe you would
want to insist on a test environment based on OpenSuSE, or if you're
working for Red Hat, you'd want to use Fedora.  If so, then
kvm-xfstests is not for you.  I'd much rather optimize for a *kernel*
developer, not a Linux distribution's QA department.  They can use
kdevops if they want, for that use case.  :-)

Cheers,

							- Ted


* Re: sharing fstests results (Was: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1))
  2022-06-25 19:35             ` Luis Chamberlain
@ 2022-06-25 21:50               ` Theodore Ts'o
  2022-07-01 23:13                 ` Luis Chamberlain
  0 siblings, 1 reply; 32+ messages in thread
From: Theodore Ts'o @ 2022-06-25 21:50 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Amir Goldstein, Darrick J. Wong, Leah Rumancik, Josef Bacik,
	Chuck Lever, chandanrmail, Sweet Tea Dorminy, Pankaj Raghav,
	Zorro Lang, linux-xfs, fstests

On Sat, Jun 25, 2022 at 12:35:50PM -0700, Luis Chamberlain wrote:
> 
> The way the expunge list is process could simply be modified in kdevops
> so that non-deterministic tests are not expunged but also not treated as
> fatal at the end. But think about it, the exception is if the non-deterministic
> failure does not lead to a crash, no?

That's what I'm doing today, but once we have a better test analysis
system, what I think the only thing which should be excluded is:

   a)   bugs which cause the kernel to crash
   b)   test bugs
   c)   tests which take ***forever*** for a particular configuration
   	    (and for which  we probably get enough coverage through
	    other configs)

If we have a non-deterministic failure, which is due to a kernel bug,
I don't see any reason why we should skip the test.  We just need to
have a fully-featured enough test results analyzer so that we can
distinguish between known failures, known flaky failures, and new test
regressions.
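
A minimal sketch of that kind of classification, assuming a flat known-failures list of the sort described later in this mail (the test names and list contents below are made-up examples, and a real analyzer would also track failure rates to tell known-flaky from new regressions):

```shell
# Classify a run's failing tests against a flat known-failures list.
# Both lists are illustrative.
known_failures="generic/681
generic/682
generic/692"

run_failures="generic/682
generic/455"

report=$(
    for t in $run_failures; do
        # grep -qx matches the whole line exactly, so xfs/12 won't
        # accidentally match xfs/123.
        if printf '%s\n' "$known_failures" | grep -qx "$t"; then
            echo "$t: known failure"
        else
            echo "$t: NEW regression"
        fi
    done
)
echo "$report"
```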

So for example, the new tests generic/681, generic/682, and
generic/692 are causing deterministic failures for the ext4/encrypt
config.  Right now, this is being tracked manually in a flat text
file:

generic/68[12]  encrypt   Failure percentage: 100%
    The directory does grow, but blocks aren't charged to either root or
    the non-privileged users' quota.  So this appears to be a real bug.
    Testing shows this goes all the way back to at least 4.14.

It's currently not tagged by kernel version, because I mostly only
care about upstream.  So once it's fixed upstream, I stop caring about
it.  In the ideal world, we'd track the kernel commit which fixed the
test failure, and when the fix propagated to the various stable
kernels, etc.

I've also resisted putting it in an expunge file, since if it did, I
would ignore it forever.  If it stays in my face, I'm more likely to
fix it, even if it's on my personal time.

> Here's the thing though. Not all developers have incentives to share.

Part of this is the amount of *time* that it takes to share this
information.  Right now, a lot of sharing takes place on the weekly
ext4 conference call.  It doesn't take Eric Whitney a lot of time to
mention that he's seeing a particular test failure, and I can quickly
search my test summary Unix mbox file and say, "yep, I've seen this
fail a couple of times before, starting in February 2020 --- but it's
super rare."

And since Darrick attends the weekly ext4 video chats, once or twice
we've asked him about some test failures on some esoteric xfs config,
such as realtime with an external logdev, and he might say, "oh yeah,
that's a known test bug.  pull this branch from my public xfstests
tree, I just haven't had time to push those fixes upstream yet."

(And I don't blame him for that; I just recently pushed some ext4 test
bug fixes, some of which I had initially sent to the list in late
April --- but on code review, changes were requested, and I just
didn't have *time* to clean up fixes in response to the code reviews.
So the fix which was good enough to suppress the failures sat in my
tree, but didn't go upstream since it was deemed not ready for
upstream.  I'm all for decreasing tech debt in xfstests; but do
understand that sometimes this means fixes to known test bugs will
stay in developers' git trees, since we're all overloaded.)

It's a similar problem with test failures.  Simply reporting a test
failure isn't *that* hard.  But the analysis, even if it's something
like:

generic/68[12]  encrypt   Failure percentage: 100%
    The directory does grow, but blocks aren't charged to either root or
    the non-privileged users' quota.....
    
... is the critical bit that people *really* want, and it takes real
developer time to come up with that kind of information.  In the ideal
world, I'd have an army of trained minions to run down this kind of
stuff.  In the real world, sometimes this stuff happens after
midnight, local time, on a Friday night.

(Note that Android and Chrome OS, both of which are big users of
fscrypt, don't use quota.  So if I were to open a bug tracker entry on
it, the bug would get prioritized to P2 or P3, and never be heard from
again, since there's no business reason to prioritize fixing it.
Which is why some of this happens on personal time.)

      	      	    	      	    	       - Ted


* Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
  2022-06-25 21:14               ` Theodore Ts'o
@ 2022-07-01 23:08                 ` Luis Chamberlain
  0 siblings, 0 replies; 32+ messages in thread
From: Luis Chamberlain @ 2022-07-01 23:08 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Darrick J. Wong, Leah Rumancik, Amir Goldstein, Josef Bacik,
	Chuck Lever, chandanrmail, Sweet Tea Dorminy, Pankaj Raghav,
	Zorro Lang, linux-xfs, fstests

On Sat, Jun 25, 2022 at 05:14:17PM -0400, Theodore Ts'o wrote:
> On Sat, Jun 25, 2022 at 11:49:54AM -0700, Luis Chamberlain wrote:
> > You are optimizing usage for GCE. That makes sense.
> 
> This particular usage model is not unique to GCE.  A very similar
> thing can be done using Microsoft Azure, Amazon Web Services and
> Oracle Cloud Services.  And I've talked to some folks who might be
> interested in taking the Test Appliance that is currently built for
> use with KVM, Android, and GCE, and extending it to support other
> Cloud infrastructures.  So the concept of these optimizations are not
> unique to GCE, which is why I've been calling this approach "cloud
> native".

I think we have similar goals. I'd like to eventually generalize
what you have done for enablement through *any* cloud.

And I suspect this may be useful beyond kernel development too, and so
there is value in that for other things.

> Perhaps one other difference is that I make the test appliance images
> available, so people don't *have* to build them from scratch.  They
> can just download the qcow2 image from:
> 
>     https://www.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests

It may make sense for us to consider containers for some of this.

If a distro doesn't have one, for example, well then we just have
to do the build-it-all step.

> And for GCE, there is the public image project, xfstests-cloud, just
> like there are public images for debian in the debian-cloud project,
> for Fedora in the fedora-cloud project, etc.  Of course, for full GPL
> compliance, how to build these images from source is fully available,
> which is why the images are carefully tagged so all of the git commit
> versions and the automated scripts used to build the image are fully
> available for anyone who wants to replicate the build.  *BUT*, they
> don't have to build the test environment if they are just getting
> started.
> 
> One of the things which I am trying to do is to make the "out of box"
> experience as simple as possible, which means I don't want to force
> users to build the test appliance or run "make bringup" if they don't
> have to.   

You are misunderstanding the goal with 'make bringup': if you already
have pre-built images you can use them and you have less to do. You
*don't* have to run 'make fstests' if you already have that set up.

'make bringup' just abstracts general initial stage nodes, whether on
cloud or local virt.

'make linux' however does get / build / install linux.

And for local virtualization, where vagrant images are used, one could
enhance these further too. They are just compressed tarballs with a
qcow2 file, at least when libvirt is used. Since kdevops works off of
these, you can then also use pre-built images with all the kernels /
modules needed and even binaries. I've recently extended the docs to
help folks who wish to optimize on that front:

https://github.com/linux-kdevops/kdevops/blob/master/docs/custom-vagrant-boxes.md

Each stage has its own reproducible builds aspect to it.

So if one *had* these enhanced vagrant images with kernels, one could
just skip the build stage and jump straight to testing after bringup.

I do wonder if we could share similar qcow2 images for cloud testing
too and for vagrant. If we could... there is a pretty big win.

> Of course, someone who is doing xfstests development will need to
> learn how to build their own test appliance.  But for someone who is
> just getting started, the goal is to make the learning curve as flat
> as possible.

Yup.

> One of the other important design principles for me was that I
> didn't want to require that the VMs have networking access, nor did
> I want to require users to run random scripts via sudo or as root.
> (Some of this was because of corporate security requirements at the
> time.)  This also had the benefit that I'm not asking the user to
> set up ssh keys if they are using kvm-xfstests, but instead rely on
> the serial console.

Philosophy.

> > The goal behind kdevops was to use technology which can *enable* any
> > optimizations in a cloud agnostic way.
> 
> Fair enough.  My goal for kvm-xfstests and gce-xfstests was to make
> developer velocity the primary goal.  Portability to different cloud
> systems took a back seat.  I don't apologize for this, since over the
> many years that I've been personally using {kvm,gce}-xfstests, the
> fact that I can use my native kernel development environment, and have
> the test environment pluck the kernel straight out of my build tree,
> has paid for itself many times over.

Yes, I realize that. No one typically has time to do that. Which is why,
when I had requirements from a prior $employer to make the tech cloud
agnostic, I decided it was tech best shared. It was not easy.

> If I had to push test/debug kernel code to a public git tree just so
> the test VM can pull down the code and build it in the test VM a
> second time --- I'd say, "no thank you, absolutely not."  Having to do
> this would slow me down, and as I said, developer velocity is king.  I
> want to be able to save a patch from my mail user agent, apply the
> patch, and then give the code a test, *without* having to interact
> with a public git tree.

Every developer may have a different way to work and do Linux kernel
development.

> Maybe you can do that with kdevops --- but it's not at all obvious
> how.

The above just explained what you *don't* want to do, not what you want.
But you explained to me in private a while ago that you expect to do
fast local test builds pushed to the guest.

I think you're just missing that the goal is to support variability
and enable that variability. If such variability is not supported,
then it's just a matter of adding a few kconfig options and then
adding support for it. So yes, it's possible, and it's a matter of
taking a bit of time to do that workflow. My kdev workflow before was
to just work with large guests, and use 'localmodconfig' kernels,
which are very small, and so build time is fast, especially after the
first build. The other workflow I then supported was the distro-world
one, where we tested a "kernel of the day", which is a kernel on a repo
somewhere. So upgrading is just ensuring you have a repo, `zypper in`
the kernel, reboot and test.

To support the workflow you have, I'd like to evaluate both a local virt
solution and cloud (for any cloud vendor). For local virt, using 9p seems
to make sense. For cloud, not so sure.
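For the local-virt side, a sketch of what that could look like with
vagrant-libvirt: a 9p synced folder lets the guest see the host build
tree directly, so a freshly built kernel never has to travel through a
public git tree or the network. This is a hypothetical configuration
fragment, not kdevops' actual config; the paths are made up, and the
`type`/`accessmode` options are those documented for vagrant-libvirt
synced folders:

```ruby
# Hypothetical Vagrantfile fragment: expose the host kernel build tree
# to the guest over 9p (virtio), so the guest can install straight out
# of the developer's working directory.
Vagrant.configure("2") do |config|
  config.vm.synced_folder "/home/dev/linux",  # host build tree (made up path)
                          "/host/linux",      # mount point inside the guest
                          type: "9p",
                          accessmode: "squash"  # map all guest access to one host uid
end
```

Inside the guest, `make modules_install install` run from the 9p mount
would then pick up whatever was just built on the host.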

I think we really digress from the subject at hand though. This
conversation is useful but it really is just noise to a lot of people.

  Luis

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharing fstests results (Was: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1))
  2022-06-25 21:50               ` Theodore Ts'o
@ 2022-07-01 23:13                 ` Luis Chamberlain
  0 siblings, 0 replies; 32+ messages in thread
From: Luis Chamberlain @ 2022-07-01 23:13 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Amir Goldstein, Darrick J. Wong, Leah Rumancik, Josef Bacik,
	Chuck Lever, chandanrmail, Sweet Tea Dorminy, Pankaj Raghav,
	Zorro Lang, linux-xfs, fstests

On Sat, Jun 25, 2022 at 05:50:26PM -0400, Theodore Ts'o wrote:
> On Sat, Jun 25, 2022 at 12:35:50PM -0700, Luis Chamberlain wrote:
> > Here's the thing though. Not all developers have incentives to share.
> 
> Part of this is the amount of *time* that it takes to share this
> information.

There are many reasons. In the end we keep digressing, but I see no
expressed interest to share, and so we can just keep moving on
with how things are.

  Luis

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2022-07-01 23:14 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-16 18:27 [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 1/8] xfs: use kmem_cache_free() for kmem_cache objects Leah Rumancik
2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 2/8] xfs: punch out data fork delalloc blocks on COW writeback failure Leah Rumancik
2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 3/8] xfs: Fix the free logic of state in xfs_attr_node_hasname Leah Rumancik
2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 4/8] xfs: remove all COW fork extents when remounting readonly Leah Rumancik
2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 5/8] xfs: check sb_meta_uuid for dabuf buffer recovery Leah Rumancik
2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 6/8] xfs: prevent UAF in xfs_log_item_in_current_chkpt Leah Rumancik
2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 7/8] xfs: only bother with sync_filesystem during readonly remount Leah Rumancik
2022-06-16 18:27 ` [PATCH 5.15 CANDIDATE v2 8/8] xfs: use setattr_copy to set vfs inode attributes Leah Rumancik
2022-06-17  7:27   ` Amir Goldstein
2022-06-22  0:07 ` [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Luis Chamberlain
2022-06-22 21:44   ` Theodore Ts'o
2022-06-23  5:31     ` Amir Goldstein
2022-06-23 21:39       ` Luis Chamberlain
2022-06-23 21:31     ` Luis Chamberlain
2022-06-24  5:32       ` Theodore Ts'o
2022-06-24 22:54         ` Luis Chamberlain
2022-06-25  2:21           ` Theodore Ts'o
2022-06-25 18:49             ` Luis Chamberlain
2022-06-25 21:14               ` Theodore Ts'o
2022-07-01 23:08                 ` Luis Chamberlain
2022-06-25  7:28           ` sharing fstests results (Was: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)) Amir Goldstein
2022-06-25 19:35             ` Luis Chamberlain
2022-06-25 21:50               ` Theodore Ts'o
2022-07-01 23:13                 ` Luis Chamberlain
2022-06-22 21:52   ` [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1) Leah Rumancik
2022-06-23 21:40     ` Luis Chamberlain
2022-06-22 16:23 ` Darrick J. Wong
2022-06-22 16:35   ` Darrick J. Wong
2022-06-22 21:29     ` Leah Rumancik
2022-06-23  4:53       ` Amir Goldstein
2022-06-23  6:28         ` Amir Goldstein
