[PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2)

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2)
@ 2022-06-01 10:45 Amir Goldstein
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories Amir Goldstein
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Amir Goldstein @ 2022-06-01 10:45 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: Dave Chinner, Brian Foster, Christoph Hellwig, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests

Hi all!

This is the 2nd part of a collection of stable patch candidates that
I collected from xfs releases v5.11..v5.18 [1][2].

The patches in this part are from circa v5.11..v5.12.
They have been through >30 auto group runs with the kdevops configs
reflink_normapbt, reflink, reflink_1024, nocrc, nocrc_512.
No regressions from baseline were observed.

At least three of the fixes have regression tests in fstests that were
used to verify the fix. I also annotated [3] the fix commits in the tests.

It is worth noting that one series and another patch from v5.12 were
removed from the stable candidates queue before this posting.

The "extent count overflow" series [4] was removed following feedback
from Dave Chinner that is it not practically relevant for LTS users.

The patch "xfs: don't reuse busy extents on extent trim" was removed
following a regression it caused in 5.10.y [5] that was discovered in
early stages of testing of this part.  The process works! :-)

I would like to thank Luis for his huge part in this still ongoing effort
and I would like to thank Samsung for contributing the hardware resources
to drive this effort.

Your inputs on these stable candidates are most welcome!

Thanks,
Amir.

[1] https://github.com/amir73il/b4/blob/xfs-5.10.y/xfs-5.10..5.17-fixes.rst
[2] https://github.com/amir73il/b4/blob/xfs-5.10.y/xfs-5.18-fixes.rst
[3] https://lore.kernel.org/fstests/20220520143249.2103631-1-amir73il@gmail.com/
[4] https://lore.kernel.org/linux-xfs/20220525083814.GH1098723@dread.disaster.area/
[5] https://lore.kernel.org/linux-xfs/YpY6hUknor2S1iMd@bfoster/T/#mf1add66b8309a75a8984f28ea08718f22033bce7

Brian Foster (3):
  xfs: sync lazy sb accounting on quiesce of read-only mounts
  xfs: restore shutdown check in mapped write fault path
  xfs: consider shutdown in bmapbt cursor delete assert

Christoph Hellwig (1):
  xfs: fix up non-directory creation in SGID directories

Darrick J. Wong (3):
  xfs: fix chown leaking delalloc quota blocks when fssetxattr fails
  xfs: fix incorrect root dquot corruption error when switching
    group/project quota types
  xfs: force log and push AIL to clear pinned inodes when aborting mount

Jeffrey Mitchell (1):
  xfs: set inode size after creating symlink

 fs/xfs/libxfs/xfs_btree.c | 33 +++++---------
 fs/xfs/xfs_dquot.c        | 39 +++++++++++++++-
 fs/xfs/xfs_inode.c        | 14 +++---
 fs/xfs/xfs_iomap.c        |  3 ++
 fs/xfs/xfs_log.c          | 28 ++++++++----
 fs/xfs/xfs_log.h          |  1 +
 fs/xfs/xfs_mount.c        | 93 +++++++++++++++++++--------------------
 fs/xfs/xfs_qm.c           | 92 +++++++++++++++-----------------------
 fs/xfs/xfs_symlink.c      |  1 +
 9 files changed, 161 insertions(+), 143 deletions(-)

-- 
2.25.1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories
  2022-06-01 10:45 [PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2) Amir Goldstein
@ 2022-06-01 10:45 ` Amir Goldstein
  2022-06-02  0:52   ` Dave Chinner
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 2/8] xfs: set inode size after creating symlink Amir Goldstein
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2022-06-01 10:45 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: Dave Chinner, Brian Foster, Christoph Hellwig, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests,
	Christian Brauner

From: Christoph Hellwig <hch@lst.de>

commit 01ea173e103edd5ec41acec65b9261b87e123fc2 upstream.

XFS always inherits the SGID bit if it is set on the parent inode, while
the generic inode_init_owner does not do this in a few cases where it can
create a possible security problem, see commit 0fa3ecd87848
("Fix up non-directory creation in SGID directories") for details.

Switch XFS to use the generic helper for the normal path to fix this,
just keeping the simple field inheritance open coded for the case of the
non-sgid case with the bsdgrpid mount option.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_inode.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e958b1c74561..e20d8af80216 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -802,6 +802,7 @@ xfs_ialloc(
 	xfs_buf_t	**ialloc_context,
 	xfs_inode_t	**ipp)
 {
+	struct inode	*dir = pip ? VFS_I(pip) : NULL;
 	struct xfs_mount *mp = tp->t_mountp;
 	xfs_ino_t	ino;
 	xfs_inode_t	*ip;
@@ -847,18 +848,17 @@ xfs_ialloc(
 		return error;
 	ASSERT(ip != NULL);
 	inode = VFS_I(ip);
-	inode->i_mode = mode;
 	set_nlink(inode, nlink);
-	inode->i_uid = current_fsuid();
 	inode->i_rdev = rdev;
 	ip->i_d.di_projid = prid;
 
-	if (pip && XFS_INHERIT_GID(pip)) {
-		inode->i_gid = VFS_I(pip)->i_gid;
-		if ((VFS_I(pip)->i_mode & S_ISGID) && S_ISDIR(mode))
-			inode->i_mode |= S_ISGID;
+	if (dir && !(dir->i_mode & S_ISGID) &&
+	    (mp->m_flags & XFS_MOUNT_GRPID)) {
+		inode->i_uid = current_fsuid();
+		inode->i_gid = dir->i_gid;
+		inode->i_mode = mode;
 	} else {
-		inode->i_gid = current_fsgid();
+		inode_init_owner(inode, dir, mode);
 	}
 
 	/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5.10 CANDIDATE 2/8] xfs: set inode size after creating symlink
  2022-06-01 10:45 [PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2) Amir Goldstein
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories Amir Goldstein
@ 2022-06-01 10:45 ` Amir Goldstein
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 3/8] xfs: sync lazy sb accounting on quiesce of read-only mounts Amir Goldstein
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2022-06-01 10:45 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: Dave Chinner, Brian Foster, Christoph Hellwig, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests,
	Jeffrey Mitchell

From: Jeffrey Mitchell <jeffrey.mitchell@starlab.io>

commit 8aa921a95335d0a8c8e2be35a44467e7c91ec3e4 upstream.

When XFS creates a new symlink, it writes its size to disk but not to the
VFS inode. This causes i_size_read() to return 0 for that symlink until
it is re-read from disk, for example when the system is rebooted.

I found this inconsistency while protecting directories with eCryptFS.
The command "stat path/to/symlink/in/ecryptfs" will report "Size: 0" if
the symlink was created after the last reboot on an XFS root.

Call i_size_write() in xfs_symlink()

Signed-off-by: Jeffrey Mitchell <jeffrey.mitchell@starlab.io>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_symlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 8e88a7ca387e..8d3abf06c54f 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -300,6 +300,7 @@ xfs_symlink(
 		}
 		ASSERT(pathlen == 0);
 	}
+	i_size_write(VFS_I(ip), ip->i_d.di_size);
 
 	/*
 	 * Create the directory entry for the symlink.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5.10 CANDIDATE 3/8] xfs: sync lazy sb accounting on quiesce of read-only mounts
  2022-06-01 10:45 [PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2) Amir Goldstein
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories Amir Goldstein
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 2/8] xfs: set inode size after creating symlink Amir Goldstein
@ 2022-06-01 10:45 ` Amir Goldstein
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 4/8] xfs: fix chown leaking delalloc quota blocks when fssetxattr fails Amir Goldstein
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2022-06-01 10:45 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: Dave Chinner, Brian Foster, Christoph Hellwig, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests, Gao Xiang,
	Allison Henderson, Darrick J . Wong, Bill O'Donnell

From: Brian Foster <bfoster@redhat.com>

commit 50d25484bebe94320c49dd1347d3330c7063bbdb upstream.

xfs_log_sbcount() syncs the superblock specifically to accumulate
the in-core percpu superblock counters and commit them to disk. This
is required to maintain filesystem consistency across quiesce
(freeze, read-only mount/remount) or unmount when lazy superblock
accounting is enabled because individual transactions do not update
the superblock directly.

This mechanism works as expected for writable mounts, but
xfs_log_sbcount() skips the update for read-only mounts. Read-only
mounts otherwise still allow log recovery and write out an unmount
record during log quiesce. If a read-only mount performs log
recovery, it can modify the in-core superblock counters and write an
unmount record when the filesystem unmounts without ever syncing the
in-core counters. This leaves the filesystem with a clean log but in
an inconsistent state with regard to lazy sb counters.

Update xfs_log_sbcount() to use the same logic
xfs_log_unmount_write() uses to determine when to write an unmount
record. This ensures that lazy accounting is always synced before
the log is cleaned. Refactor this logic into a new helper to
distinguish between a writable filesystem and a writable log.
Specifically, the log is writable unless the filesystem is mounted
with the norecovery mount option, the underlying log device is
read-only, or the filesystem is shutdown. Drop the freeze state
check because the update is already allowed during the freezing
process and no context calls this function on an already frozen fs.
Also, retain the shutdown check in xfs_log_unmount_write() to catch
the case where the preceding log force might have triggered a
shutdown.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_log.c   | 28 ++++++++++++++++++++--------
 fs/xfs/xfs_log.h   |  1 +
 fs/xfs/xfs_mount.c |  3 +--
 3 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index fa2d05e65ff1..b445e63cbc3c 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -347,6 +347,25 @@ xlog_tic_add_region(xlog_ticket_t *tic, uint len, uint type)
 	tic->t_res_num++;
 }
 
+bool
+xfs_log_writable(
+	struct xfs_mount	*mp)
+{
+	/*
+	 * Never write to the log on norecovery mounts, if the block device is
+	 * read-only, or if the filesystem is shutdown. Read-only mounts still
+	 * allow internal writes for log recovery and unmount purposes, so don't
+	 * restrict that case here.
+	 */
+	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
+		return false;
+	if (xfs_readonly_buftarg(mp->m_log->l_targ))
+		return false;
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return false;
+	return true;
+}
+
 /*
  * Replenish the byte reservation required by moving the grant write head.
  */
@@ -886,15 +905,8 @@ xfs_log_unmount_write(
 {
 	struct xlog		*log = mp->m_log;
 
-	/*
-	 * Don't write out unmount record on norecovery mounts or ro devices.
-	 * Or, if we are doing a forced umount (typically because of IO errors).
-	 */
-	if (mp->m_flags & XFS_MOUNT_NORECOVERY ||
-	    xfs_readonly_buftarg(log->l_targ)) {
-		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
+	if (!xfs_log_writable(mp))
 		return;
-	}
 
 	xfs_log_force(mp, XFS_LOG_SYNC);
 
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 58c3fcbec94a..98c913da7587 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -127,6 +127,7 @@ int	  xfs_log_reserve(struct xfs_mount *mp,
 int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
 void      xfs_log_unmount(struct xfs_mount *mp);
 int	  xfs_log_force_umount(struct xfs_mount *mp, int logerror);
+bool	xfs_log_writable(struct xfs_mount *mp);
 
 struct xlog_ticket *xfs_log_ticket_get(struct xlog_ticket *ticket);
 void	  xfs_log_ticket_put(struct xlog_ticket *ticket);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 7110507a2b6b..a62b8a574409 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1176,8 +1176,7 @@ xfs_fs_writable(
 int
 xfs_log_sbcount(xfs_mount_t *mp)
 {
-	/* allow this to proceed during the freeze sequence... */
-	if (!xfs_fs_writable(mp, SB_FREEZE_COMPLETE))
+	if (!xfs_log_writable(mp))
 		return 0;
 
 	/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5.10 CANDIDATE 4/8] xfs: fix chown leaking delalloc quota blocks when fssetxattr fails
  2022-06-01 10:45 [PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2) Amir Goldstein
                   ` (2 preceding siblings ...)
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 3/8] xfs: sync lazy sb accounting on quiesce of read-only mounts Amir Goldstein
@ 2022-06-01 10:45 ` Amir Goldstein
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 5/8] xfs: fix incorrect root dquot corruption error when switching group/project quota types Amir Goldstein
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2022-06-01 10:45 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: Dave Chinner, Brian Foster, Christoph Hellwig, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests

From: "Darrick J. Wong" <djwong@kernel.org>

commit 1aecf3734a95f3c167d1495550ca57556d33f7ec upstream.

While refactoring the quota code to create a function to allocate inode
change transactions, I noticed that xfs_qm_vop_chown_reserve does more
than just make reservations: it also *modifies* the incore counts
directly to handle the owner id change for the delalloc blocks.

I then observed that the fssetxattr code continues validating input
arguments after making the quota reservation but before dirtying the
transaction.  If the routine decides to error out, it fails to undo the
accounting switch!  This leads to incorrect quota reservation and
failure down the line.

We can fix this by making the reservation function do only that -- for
the new dquot, it reserves ondisk and delalloc blocks to the
transaction, and the old dquot hangs on to its incore reservation for
now.  Once we actually switch the dquots, we can then update the incore
reservations because we've dirtied the transaction and it's too late to
turn back now.

No fixes tag because this has been broken since the start of git.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_qm.c | 92 +++++++++++++++++++------------------------------
 1 file changed, 35 insertions(+), 57 deletions(-)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index b2a9abee8b2b..64e5da33733b 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1785,6 +1785,29 @@ xfs_qm_vop_chown(
 	xfs_trans_mod_dquot(tp, newdq, bfield, ip->i_d.di_nblocks);
 	xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_ICOUNT, 1);
 
+	/*
+	 * Back when we made quota reservations for the chown, we reserved the
+	 * ondisk blocks + delalloc blocks with the new dquot.  Now that we've
+	 * switched the dquots, decrease the new dquot's block reservation
+	 * (having already bumped up the real counter) so that we don't have
+	 * any reservation to give back when we commit.
+	 */
+	xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_RES_BLKS,
+			-ip->i_delayed_blks);
+
+	/*
+	 * Give the incore reservation for delalloc blocks back to the old
+	 * dquot.  We don't normally handle delalloc quota reservations
+	 * transactionally, so just lock the dquot and subtract from the
+	 * reservation.  Dirty the transaction because it's too late to turn
+	 * back now.
+	 */
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	xfs_dqlock(prevdq);
+	ASSERT(prevdq->q_blk.reserved >= ip->i_delayed_blks);
+	prevdq->q_blk.reserved -= ip->i_delayed_blks;
+	xfs_dqunlock(prevdq);
+
 	/*
 	 * Take an extra reference, because the inode is going to keep
 	 * this dquot pointer even after the trans_commit.
@@ -1807,84 +1830,39 @@ xfs_qm_vop_chown_reserve(
 	uint			flags)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	uint64_t		delblks;
 	unsigned int		blkflags;
-	struct xfs_dquot	*udq_unres = NULL;
-	struct xfs_dquot	*gdq_unres = NULL;
-	struct xfs_dquot	*pdq_unres = NULL;
 	struct xfs_dquot	*udq_delblks = NULL;
 	struct xfs_dquot	*gdq_delblks = NULL;
 	struct xfs_dquot	*pdq_delblks = NULL;
-	int			error;
-
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
 	ASSERT(XFS_IS_QUOTA_RUNNING(mp));
 
-	delblks = ip->i_delayed_blks;
 	blkflags = XFS_IS_REALTIME_INODE(ip) ?
 			XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS;
 
 	if (XFS_IS_UQUOTA_ON(mp) && udqp &&
-	    i_uid_read(VFS_I(ip)) != udqp->q_id) {
+	    i_uid_read(VFS_I(ip)) != udqp->q_id)
 		udq_delblks = udqp;
-		/*
-		 * If there are delayed allocation blocks, then we have to
-		 * unreserve those from the old dquot, and add them to the
-		 * new dquot.
-		 */
-		if (delblks) {
-			ASSERT(ip->i_udquot);
-			udq_unres = ip->i_udquot;
-		}
-	}
+
 	if (XFS_IS_GQUOTA_ON(ip->i_mount) && gdqp &&
-	    i_gid_read(VFS_I(ip)) != gdqp->q_id) {
+	    i_gid_read(VFS_I(ip)) != gdqp->q_id)
 		gdq_delblks = gdqp;
-		if (delblks) {
-			ASSERT(ip->i_gdquot);
-			gdq_unres = ip->i_gdquot;
-		}
-	}
 
 	if (XFS_IS_PQUOTA_ON(ip->i_mount) && pdqp &&
-	    ip->i_d.di_projid != pdqp->q_id) {
+	    ip->i_d.di_projid != pdqp->q_id)
 		pdq_delblks = pdqp;
-		if (delblks) {
-			ASSERT(ip->i_pdquot);
-			pdq_unres = ip->i_pdquot;
-		}
-	}
-
-	error = xfs_trans_reserve_quota_bydquots(tp, ip->i_mount,
-				udq_delblks, gdq_delblks, pdq_delblks,
-				ip->i_d.di_nblocks, 1, flags | blkflags);
-	if (error)
-		return error;
 
 	/*
-	 * Do the delayed blks reservations/unreservations now. Since, these
-	 * are done without the help of a transaction, if a reservation fails
-	 * its previous reservations won't be automatically undone by trans
-	 * code. So, we have to do it manually here.
+	 * Reserve enough quota to handle blocks on disk and reserved for a
+	 * delayed allocation.  We'll actually transfer the delalloc
+	 * reservation between dquots at chown time, even though that part is
+	 * only semi-transactional.
 	 */
-	if (delblks) {
-		/*
-		 * Do the reservations first. Unreservation can't fail.
-		 */
-		ASSERT(udq_delblks || gdq_delblks || pdq_delblks);
-		ASSERT(udq_unres || gdq_unres || pdq_unres);
-		error = xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
-			    udq_delblks, gdq_delblks, pdq_delblks,
-			    (xfs_qcnt_t)delblks, 0, flags | blkflags);
-		if (error)
-			return error;
-		xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
-				udq_unres, gdq_unres, pdq_unres,
-				-((xfs_qcnt_t)delblks), 0, blkflags);
-	}
-
-	return 0;
+	return xfs_trans_reserve_quota_bydquots(tp, ip->i_mount, udq_delblks,
+			gdq_delblks, pdq_delblks,
+			ip->i_d.di_nblocks + ip->i_delayed_blks,
+			1, blkflags | flags);
 }
 
 int
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5.10 CANDIDATE 5/8] xfs: fix incorrect root dquot corruption error when switching group/project quota types
  2022-06-01 10:45 [PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2) Amir Goldstein
                   ` (3 preceding siblings ...)
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 4/8] xfs: fix chown leaking delalloc quota blocks when fssetxattr fails Amir Goldstein
@ 2022-06-01 10:45 ` Amir Goldstein
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 6/8] xfs: restore shutdown check in mapped write fault path Amir Goldstein
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2022-06-01 10:45 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: Dave Chinner, Brian Foster, Christoph Hellwig, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests, Chandan Babu R

From: "Darrick J. Wong" <djwong@kernel.org>

commit 45068063efb7dd0a8d115c106aa05d9ab0946257 upstream.

While writing up a regression test for broken behavior when a chprojid
request fails, I noticed that we were logging corruption notices about
the root dquot of the group/project quota file at mount time when
testing V4 filesystems.

In commit afeda6000b0c, I was trying to improve ondisk dquot validation
by making sure that when we load an ondisk dquot into memory on behalf
of an incore dquot, the dquot id and type matches.  Unfortunately, I
forgot that V4 filesystems only have two quota files, and can switch
that file between group and project quota types at mount time.  When we
perform that switch, we'll try to load the default quota limits from the
root dquot prior to running quotacheck and log a corruption error when
the types don't match.

This is inconsequential because quotacheck will reset the second quota
file as part of doing the switch, but we shouldn't leave scary messages
in the kernel log.

Fixes: afeda6000b0c ("xfs: validate ondisk/incore dquot flags")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_dquot.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 1d95ed387d66..80c4579d6835 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -500,6 +500,42 @@ xfs_dquot_alloc(
 	return dqp;
 }
 
+/* Check the ondisk dquot's id and type match what the incore dquot expects. */
+static bool
+xfs_dquot_check_type(
+	struct xfs_dquot	*dqp,
+	struct xfs_disk_dquot	*ddqp)
+{
+	uint8_t			ddqp_type;
+	uint8_t			dqp_type;
+
+	ddqp_type = ddqp->d_type & XFS_DQTYPE_REC_MASK;
+	dqp_type = xfs_dquot_type(dqp);
+
+	if (be32_to_cpu(ddqp->d_id) != dqp->q_id)
+		return false;
+
+	/*
+	 * V5 filesystems always expect an exact type match.  V4 filesystems
+	 * expect an exact match for user dquots and for non-root group and
+	 * project dquots.
+	 */
+	if (xfs_sb_version_hascrc(&dqp->q_mount->m_sb) ||
+	    dqp_type == XFS_DQTYPE_USER || dqp->q_id != 0)
+		return ddqp_type == dqp_type;
+
+	/*
+	 * V4 filesystems support either group or project quotas, but not both
+	 * at the same time.  The non-user quota file can be switched between
+	 * group and project quota uses depending on the mount options, which
+	 * means that we can encounter the other type when we try to load quota
+	 * defaults.  Quotacheck will soon reset the the entire quota file
+	 * (including the root dquot) anyway, but don't log scary corruption
+	 * reports to dmesg.
+	 */
+	return ddqp_type == XFS_DQTYPE_GROUP || ddqp_type == XFS_DQTYPE_PROJ;
+}
+
 /* Copy the in-core quota fields in from the on-disk buffer. */
 STATIC int
 xfs_dquot_from_disk(
@@ -512,8 +548,7 @@ xfs_dquot_from_disk(
 	 * Ensure that we got the type and ID we were looking for.
 	 * Everything else was checked by the dquot buffer verifier.
 	 */
-	if ((ddqp->d_type & XFS_DQTYPE_REC_MASK) != xfs_dquot_type(dqp) ||
-	    be32_to_cpu(ddqp->d_id) != dqp->q_id) {
+	if (!xfs_dquot_check_type(dqp, ddqp)) {
 		xfs_alert_tag(bp->b_mount, XFS_PTAG_VERIFIER_ERROR,
 			  "Metadata corruption detected at %pS, quota %u",
 			  __this_address, dqp->q_id);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5.10 CANDIDATE 6/8] xfs: restore shutdown check in mapped write fault path
  2022-06-01 10:45 [PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2) Amir Goldstein
                   ` (4 preceding siblings ...)
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 5/8] xfs: fix incorrect root dquot corruption error when switching group/project quota types Amir Goldstein
@ 2022-06-01 10:45 ` Amir Goldstein
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 7/8] xfs: consider shutdown in bmapbt cursor delete assert Amir Goldstein
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 8/8] xfs: force log and push AIL to clear pinned inodes when aborting mount Amir Goldstein
  7 siblings, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2022-06-01 10:45 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: Dave Chinner, Brian Foster, Christoph Hellwig, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests, Eric Sandeen

From: Brian Foster <bfoster@redhat.com>

commit e4826691cc7e5458bcb659935d0092bcf3f08c20 upstream.

XFS triggers an iomap warning in the write fault path due to a
!PageUptodate() page if a write fault happens to occur on a page
that recently failed writeback. The iomap writeback error handling
code can clear the Uptodate flag if no portion of the page is
submitted for I/O. This is reproduced by fstest generic/019, which
combines various forms of I/O with simulated disk failures that
inevitably lead to filesystem shutdown (which then unconditionally
fails page writeback).

This is a regression introduced by commit f150b4234397 ("xfs: split
the iomap ops for buffered vs direct writes") due to the removal of
a shutdown check and explicit error return in the ->iomap_begin()
path used by the write fault path. The explicit error return
historically translated to a SIGBUS, but now carries on with iomap
processing where it complains about the unexpected state. Restore
the shutdown check to xfs_buffered_write_iomap_begin() to restore
historical behavior.

Fixes: f150b4234397 ("xfs: split the iomap ops for buffered vs direct writes")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_iomap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7b9ff824e82d..74bc2beadc23 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -870,6 +870,9 @@ xfs_buffered_write_iomap_begin(
 	int			allocfork = XFS_DATA_FORK;
 	int			error = 0;

+	if (XFS_FORCED_SHUTDOWN(mp))
+		return -EIO;
+
 	/* we can't use delayed allocations when using extent size hints */
 	if (xfs_get_extsz_hint(ip))
 		return xfs_direct_write_iomap_begin(inode, offset, count,
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5.10 CANDIDATE 7/8] xfs: consider shutdown in bmapbt cursor delete assert
  2022-06-01 10:45 [PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2) Amir Goldstein
                   ` (5 preceding siblings ...)
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 6/8] xfs: restore shutdown check in mapped write fault path Amir Goldstein
@ 2022-06-01 10:45 ` Amir Goldstein
  2022-06-02  0:38   ` Dave Chinner
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 8/8] xfs: force log and push AIL to clear pinned inodes when aborting mount Amir Goldstein
  7 siblings, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2022-06-01 10:45 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: Dave Chinner, Brian Foster, Christoph Hellwig, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests

From: Brian Foster <bfoster@redhat.com>

commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream.

The assert in xfs_btree_del_cursor() checks that the bmapbt block
allocation field has been handled correctly before the cursor is
freed. This field is used for accurate calculation of indirect block
reservation requirements (for delayed allocations), for example.
generic/019 reproduces a scenario where this assert fails because
the filesystem has shutdown while in the middle of a bmbt record
insertion. This occurs after a bmbt block has been allocated via the
cursor but before the higher level bmap function (i.e.
xfs_bmap_add_extent_hole_real()) completes and resets the field.

Update the assert to accommodate the transient state if the
filesystem has shutdown. While here, clean up the indentation and
comments in the function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++---------------------
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 2d25bab68764..9f9f9feccbcd 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -353,20 +353,17 @@ xfs_btree_free_block(
  */
 void
 xfs_btree_del_cursor(
-	xfs_btree_cur_t	*cur,		/* btree cursor */
-	int		error)		/* del because of error */
+	struct xfs_btree_cur	*cur,		/* btree cursor */
+	int			error)		/* del because of error */
 {
-	int		i;		/* btree level */
+	int			i;		/* btree level */
 
 	/*
-	 * Clear the buffer pointers, and release the buffers.
-	 * If we're doing this in the face of an error, we
-	 * need to make sure to inspect all of the entries
-	 * in the bc_bufs array for buffers to be unlocked.
-	 * This is because some of the btree code works from
-	 * level n down to 0, and if we get an error along
-	 * the way we won't have initialized all the entries
-	 * down to 0.
+	 * Clear the buffer pointers and release the buffers. If we're doing
+	 * this because of an error, inspect all of the entries in the bc_bufs
+	 * array for buffers to be unlocked. This is because some of the btree
+	 * code works from level n down to 0, and if we get an error along the
+	 * way we won't have initialized all the entries down to 0.
 	 */
 	for (i = 0; i < cur->bc_nlevels; i++) {
 		if (cur->bc_bufs[i])
@@ -374,17 +371,11 @@ xfs_btree_del_cursor(
 		else if (!error)
 			break;
 	}
-	/*
-	 * Can't free a bmap cursor without having dealt with the
-	 * allocated indirect blocks' accounting.
-	 */
-	ASSERT(cur->bc_btnum != XFS_BTNUM_BMAP ||
-	       cur->bc_ino.allocated == 0);
-	/*
-	 * Free the cursor.
-	 */
+
+	ASSERT(cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 ||
+	       XFS_FORCED_SHUTDOWN(cur->bc_mp));
 	if (unlikely(cur->bc_flags & XFS_BTREE_STAGING))
-		kmem_free((void *)cur->bc_ops);
+		kmem_free(cur->bc_ops);
 	kmem_cache_free(xfs_btree_cur_zone, cur);
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5.10 CANDIDATE 8/8] xfs: force log and push AIL to clear pinned inodes when aborting mount
  2022-06-01 10:45 [PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2) Amir Goldstein
                   ` (6 preceding siblings ...)
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 7/8] xfs: consider shutdown in bmapbt cursor delete assert Amir Goldstein
@ 2022-06-01 10:45 ` Amir Goldstein
  7 siblings, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2022-06-01 10:45 UTC (permalink / raw)
  To: Darrick J . Wong
  Cc: Dave Chinner, Brian Foster, Christoph Hellwig, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests, Dave Chinner

From: "Darrick J. Wong" <djwong@kernel.org>

commit d336f7ebc65007f5831e2297e6f3383ae8dbf8ed upstream.

If we allocate quota inodes in the process of mounting a filesystem but
then decide to abort the mount, it's possible that the quota inodes are
sitting around pinned by the log.  Now that inode reclaim relies on the
AIL to flush inodes, we have to force the log and push the AIL in
between releasing the quota inodes and kicking off reclaim to tear down
all the incore inodes.  Do this by extracting the bits we need from the
unmount path and reusing them.  As an added bonus, failed writes during
a failed mount will not retry forever now.

This was originally found during a fuzz test of metadata directories
(xfs/1546), but the actual symptom was that reclaim hung up on the quota
inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_mount.c | 90 +++++++++++++++++++++++-----------------------
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index a62b8a574409..44b05e1d5d32 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -631,6 +631,47 @@ xfs_check_summary_counts(
 	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
 }
 
+/*
+ * Flush and reclaim dirty inodes in preparation for unmount. Inodes and
+ * internal inode structures can be sitting in the CIL and AIL at this point,
+ * so we need to unpin them, write them back and/or reclaim them before unmount
+ * can proceed.
+ *
+ * An inode cluster that has been freed can have its buffer still pinned in
+ * memory because the transaction is still sitting in a iclog. The stale inodes
+ * on that buffer will be pinned to the buffer until the transaction hits the
+ * disk and the callbacks run. Pushing the AIL will skip the stale inodes and
+ * may never see the pinned buffer, so nothing will push out the iclog and
+ * unpin the buffer.
+ *
+ * Hence we need to force the log to unpin everything first. However, log
+ * forces don't wait for the discards they issue to complete, so we have to
+ * explicitly wait for them to complete here as well.
+ *
+ * Then we can tell the world we are unmounting so that error handling knows
+ * that the filesystem is going away and we should error out anything that we
+ * have been retrying in the background.  This will prevent never-ending
+ * retries in AIL pushing from hanging the unmount.
+ *
+ * Finally, we can push the AIL to clean all the remaining dirty objects, then
+ * reclaim the remaining inodes that are still in memory at this point in time.
+ */
+static void
+xfs_unmount_flush_inodes(
+	struct xfs_mount	*mp)
+{
+	xfs_log_force(mp, XFS_LOG_SYNC);
+	xfs_extent_busy_wait_all(mp);
+	flush_workqueue(xfs_discard_wq);
+
+	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
+
+	xfs_ail_push_all_sync(mp->m_ail);
+	cancel_delayed_work_sync(&mp->m_reclaim_work);
+	xfs_reclaim_inodes(mp);
+	xfs_health_unmount(mp);
+}
+
 /*
  * This function does the following on an initial mount of a file system:
  *	- reads the superblock from disk and init the mount struct
@@ -1005,7 +1046,7 @@ xfs_mountfs(
 	/* Clean out dquots that might be in memory after quotacheck. */
 	xfs_qm_unmount(mp);
 	/*
-	 * Cancel all delayed reclaim work and reclaim the inodes directly.
+	 * Flush all inode reclamation work and flush the log.
 	 * We have to do this /after/ rtunmount and qm_unmount because those
 	 * two will have scheduled delayed reclaim for the rt/quota inodes.
 	 *
@@ -1015,11 +1056,8 @@ xfs_mountfs(
 	 * qm_unmount_quotas and therefore rely on qm_unmount to release the
 	 * quota inodes.
 	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
  out_log_dealloc:
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 	xfs_log_mount_cancel(mp);
  out_fail_wait:
 	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
@@ -1060,47 +1098,7 @@ xfs_unmountfs(
 	xfs_rtunmount_inodes(mp);
 	xfs_irele(mp->m_rootip);
 
-	/*
-	 * We can potentially deadlock here if we have an inode cluster
-	 * that has been freed has its buffer still pinned in memory because
-	 * the transaction is still sitting in a iclog. The stale inodes
-	 * on that buffer will be pinned to the buffer until the
-	 * transaction hits the disk and the callbacks run. Pushing the AIL will
-	 * skip the stale inodes and may never see the pinned buffer, so
-	 * nothing will push out the iclog and unpin the buffer. Hence we
-	 * need to force the log here to ensure all items are flushed into the
-	 * AIL before we go any further.
-	 */
-	xfs_log_force(mp, XFS_LOG_SYNC);
-
-	/*
-	 * Wait for all busy extents to be freed, including completion of
-	 * any discard operation.
-	 */
-	xfs_extent_busy_wait_all(mp);
-	flush_workqueue(xfs_discard_wq);
-
-	/*
-	 * We now need to tell the world we are unmounting. This will allow
-	 * us to detect that the filesystem is going away and we should error
-	 * out anything that we have been retrying in the background. This will
-	 * prevent neverending retries in AIL pushing from hanging the unmount.
-	 */
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
-
-	/*
-	 * Flush all pending changes from the AIL.
-	 */
-	xfs_ail_push_all_sync(mp->m_ail);
-
-	/*
-	 * Reclaim all inodes. At this point there should be no dirty inodes and
-	 * none should be pinned or locked. Stop background inode reclaim here
-	 * if it is still running.
-	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
 
 	xfs_qm_unmount(mp);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 7/8] xfs: consider shutdown in bmapbt cursor delete assert
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 7/8] xfs: consider shutdown in bmapbt cursor delete assert Amir Goldstein
@ 2022-06-02  0:38   ` Dave Chinner
  2022-06-02  4:24     ` Amir Goldstein
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2022-06-02  0:38 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Darrick J . Wong, Brian Foster, Christoph Hellwig,
	Luis Chamberlain, Adam Manzanares, Tyler Hicks, linux-xfs,
	fstests

On Wed, Jun 01, 2022 at 01:45:46PM +0300, Amir Goldstein wrote:
> From: Brian Foster <bfoster@redhat.com>
> 
> commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream.
> 
> The assert in xfs_btree_del_cursor() checks that the bmapbt block
> allocation field has been handled correctly before the cursor is
> freed. This field is used for accurate calculation of indirect block
> reservation requirements (for delayed allocations), for example.
> generic/019 reproduces a scenario where this assert fails because
> the filesystem has shutdown while in the middle of a bmbt record
> insertion. This occurs after a bmbt block has been allocated via the
> cursor but before the higher level bmap function (i.e.
> xfs_bmap_add_extent_hole_real()) completes and resets the field.
> 
> Update the assert to accommodate the transient state if the
> filesystem has shutdown. While here, clean up the indentation and
> comments in the function.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++---------------------
>  1 file changed, 12 insertions(+), 21 deletions(-)

https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=56486f307100e8fc66efa2ebd8a71941fa10bf6f

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories
  2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories Amir Goldstein
@ 2022-06-02  0:52   ` Dave Chinner
  2022-06-02  4:13     ` Amir Goldstein
  2022-06-02 10:17     ` Christian Brauner
  0 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2022-06-02  0:52 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Darrick J . Wong, Brian Foster, Christoph Hellwig,
	Luis Chamberlain, Adam Manzanares, Tyler Hicks, linux-xfs,
	fstests, Christian Brauner

On Wed, Jun 01, 2022 at 01:45:40PM +0300, Amir Goldstein wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> commit 01ea173e103edd5ec41acec65b9261b87e123fc2 upstream.
> 
> XFS always inherits the SGID bit if it is set on the parent inode, while
> the generic inode_init_owner does not do this in a few cases where it can
> create a possible security problem, see commit 0fa3ecd87848
> ("Fix up non-directory creation in SGID directories") for details.

inode_init_owner() introduces a bunch more SGID problems because
it strips the SGID bit from the mode passed to it, but all the code
outside it still sees the SGID bit set. IIRC, that means we do the
wrong thing when ACLs are present. IIRC, there's an LTP test for
this CVE now, and it also has a variant which uses ACLs and that
fails too....

I'm kinda wary about mentioning a security fix and then not
backporting the entire set of fixes the CVE requires in the same
patchset.  I have no idea what the status of the VFS level fixes
that are needed to fix this properly - I thought they were done and
reviewed, but they don't appear to be in 5.19 yet.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories
  2022-06-02  0:52   ` Dave Chinner
@ 2022-06-02  4:13     ` Amir Goldstein
  2022-06-02 10:31       ` Christian Brauner
  2022-06-02 10:17     ` Christian Brauner
  1 sibling, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2022-06-02  4:13 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J . Wong, Brian Foster, Christoph Hellwig,
	Luis Chamberlain, Adam Manzanares, Tyler Hicks, linux-xfs,
	fstests, Christian Brauner, Yang Xu

On Thu, Jun 2, 2022 at 3:52 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Jun 01, 2022 at 01:45:40PM +0300, Amir Goldstein wrote:
> > From: Christoph Hellwig <hch@lst.de>
> >
> > commit 01ea173e103edd5ec41acec65b9261b87e123fc2 upstream.
> >
> > XFS always inherits the SGID bit if it is set on the parent inode, while
> > the generic inode_init_owner does not do this in a few cases where it can
> > create a possible security problem, see commit 0fa3ecd87848
> > ("Fix up non-directory creation in SGID directories") for details.
>
> inode_init_owner() introduces a bunch more SGID problems because
> it strips the SGID bit from the mode passed to it, but all the code
> outside it still sees the SGID bit set. IIRC, that means we do the
> wrong thing when ACLs are present. IIRC, there's an LTP test for
> this CVE now, and it also has a variant which uses ACLs and that
> fails too....

Good point.
I think Christian's vfstests probably tests more cases than what LTP
does at this point.

Christian, Yang,

It would be nice if you could annotate the relevant fstests with
_fixed_by_kernel_commit, which will make it easier to find
all relevant commits to backport when tests are failing on LTS
kernel.

>
> I'm kinda wary about mentioning a security fix and then not
> backporting the entire set of fixes the CVE requires in the same
> patchset.  I have no idea what the status of the VFS level fixes
> that are needed to fix this properly - I thought they were done and
> reviewed, but they don't appear to be in 5.19 yet.
>

No, it looks like tihs is still in review.

Christoph, Cristian, Yang,

What do you think is best to do w.r.t this patch?

Wait for all the current known issues to be fixed in upstream and then
backport all known fixes?

Backport whatever fixes are available in upstream now at the same
backport series?

Take this now and the rest later?

To be on the safe side, until there is consensus about the best way
to fix LTS, I will omit this fix from my weekly post to stable.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 7/8] xfs: consider shutdown in bmapbt cursor delete assert
  2022-06-02  0:38   ` Dave Chinner
@ 2022-06-02  4:24     ` Amir Goldstein
  2022-06-02  5:15       ` Dave Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2022-06-02  4:24 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J . Wong, Brian Foster, Christoph Hellwig,
	Luis Chamberlain, Adam Manzanares, Tyler Hicks, linux-xfs,
	fstests

On Thu, Jun 2, 2022 at 3:38 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Jun 01, 2022 at 01:45:46PM +0300, Amir Goldstein wrote:
> > From: Brian Foster <bfoster@redhat.com>
> >
> > commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream.
> >
> > The assert in xfs_btree_del_cursor() checks that the bmapbt block
> > allocation field has been handled correctly before the cursor is
> > freed. This field is used for accurate calculation of indirect block
> > reservation requirements (for delayed allocations), for example.
> > generic/019 reproduces a scenario where this assert fails because
> > the filesystem has shutdown while in the middle of a bmbt record
> > insertion. This occurs after a bmbt block has been allocated via the
> > cursor but before the higher level bmap function (i.e.
> > xfs_bmap_add_extent_hole_real()) completes and resets the field.
> >
> > Update the assert to accommodate the transient state if the
> > filesystem has shutdown. While here, clean up the indentation and
> > comments in the function.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++---------------------
> >  1 file changed, 12 insertions(+), 21 deletions(-)
>
> https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=56486f307100e8fc66efa2ebd8a71941fa10bf6f
>

Warm from the over :)

I will need more time to verify that this new fix is not breaking LTS
but I don't think that it should be blocking taking the old 5.12 fix now.
Right?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 7/8] xfs: consider shutdown in bmapbt cursor delete assert
  2022-06-02  4:24     ` Amir Goldstein
@ 2022-06-02  5:15       ` Dave Chinner
  2022-06-02  5:55         ` Amir Goldstein
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2022-06-02  5:15 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Darrick J . Wong, Brian Foster, Christoph Hellwig,
	Luis Chamberlain, Adam Manzanares, Tyler Hicks, linux-xfs,
	fstests

On Thu, Jun 02, 2022 at 07:24:26AM +0300, Amir Goldstein wrote:
> On Thu, Jun 2, 2022 at 3:38 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Wed, Jun 01, 2022 at 01:45:46PM +0300, Amir Goldstein wrote:
> > > From: Brian Foster <bfoster@redhat.com>
> > >
> > > commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream.
> > >
> > > The assert in xfs_btree_del_cursor() checks that the bmapbt block
> > > allocation field has been handled correctly before the cursor is
> > > freed. This field is used for accurate calculation of indirect block
> > > reservation requirements (for delayed allocations), for example.
> > > generic/019 reproduces a scenario where this assert fails because
> > > the filesystem has shutdown while in the middle of a bmbt record
> > > insertion. This occurs after a bmbt block has been allocated via the
> > > cursor but before the higher level bmap function (i.e.
> > > xfs_bmap_add_extent_hole_real()) completes and resets the field.
> > >
> > > Update the assert to accommodate the transient state if the
> > > filesystem has shutdown. While here, clean up the indentation and
> > > comments in the function.
> > >
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++---------------------
> > >  1 file changed, 12 insertions(+), 21 deletions(-)
> >
> > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=56486f307100e8fc66efa2ebd8a71941fa10bf6f
> >
> 
> Warm from the over :)
> 
> I will need more time to verify that this new fix is not breaking LTS
> but I don't think that it should be blocking taking the old 5.12 fix now.
> Right?

Rule #1: don't introduce new bugs into stable kernels.

This commit has a known (and fixed) bug in it. If you are going to
back port it to a stable kernel, then you need to also pull in the
fix for that commit, too.

But the bigger question is this: why propose backports of commits
that only change debug code?

ASSERT()s are not compiled into production kernels - they are only
compiled into developer builds when CONFIG_XFS_DEBUG=y is set. It is
test code, not production code, hence nobody will be using this in
production kernels.

I don't see the value in backporting debug fixes unless there
is some other dependency that requires them. But if you are going to
back port them, Rule #1 applies.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 7/8] xfs: consider shutdown in bmapbt cursor delete assert
  2022-06-02  5:15       ` Dave Chinner
@ 2022-06-02  5:55         ` Amir Goldstein
  2022-06-03  9:39           ` Amir Goldstein
  0 siblings, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2022-06-02  5:55 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J . Wong, Brian Foster, Christoph Hellwig,
	Luis Chamberlain, Adam Manzanares, Tyler Hicks, linux-xfs,
	fstests

On Thu, Jun 2, 2022 at 8:15 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Jun 02, 2022 at 07:24:26AM +0300, Amir Goldstein wrote:
> > On Thu, Jun 2, 2022 at 3:38 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Wed, Jun 01, 2022 at 01:45:46PM +0300, Amir Goldstein wrote:
> > > > From: Brian Foster <bfoster@redhat.com>
> > > >
> > > > commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream.
> > > >
> > > > The assert in xfs_btree_del_cursor() checks that the bmapbt block
> > > > allocation field has been handled correctly before the cursor is
> > > > freed. This field is used for accurate calculation of indirect block
> > > > reservation requirements (for delayed allocations), for example.
> > > > generic/019 reproduces a scenario where this assert fails because
> > > > the filesystem has shutdown while in the middle of a bmbt record
> > > > insertion. This occurs after a bmbt block has been allocated via the
> > > > cursor but before the higher level bmap function (i.e.
> > > > xfs_bmap_add_extent_hole_real()) completes and resets the field.
> > > >
> > > > Update the assert to accommodate the transient state if the
> > > > filesystem has shutdown. While here, clean up the indentation and
> > > > comments in the function.
> > > >
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++---------------------
> > > >  1 file changed, 12 insertions(+), 21 deletions(-)
> > >
> > > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=56486f307100e8fc66efa2ebd8a71941fa10bf6f
> > >
> >
> > Warm from the over :)
> >
> > I will need more time to verify that this new fix is not breaking LTS
> > but I don't think that it should be blocking taking the old 5.12 fix now.
> > Right?
>
> Rule #1: don't introduce new bugs into stable kernels.
>
> This commit has a known (and fixed) bug in it. If you are going to
> back port it to a stable kernel, then you need to also pull in the
> fix for that commit, too.

Oh. I misunderstood.
I thought this wasn't a Fixes: situation.
I thought you pointed me to another related bug fix.

>
> But the bigger question is this: why propose backports of commits
> that only change debug code?
>
> ASSERT()s are not compiled into production kernels - they are only
> compiled into developer builds when CONFIG_XFS_DEBUG=y is set. It is
> test code, not production code, hence nobody will be using this in
> production kernels.
>
> I don't see the value in backporting debug fixes unless there
> is some other dependency that requires them.

The value is in testing of LTS kernel.

For my backport work to be serious, I need to do serious testing.
Serious means running as many tests as I can and running the tests
on many configs and many times over.

When I first joined Luis in testing LTS baseline, CONFIG_XFS_DEBUG
was not enabled on the tested kernels.

I enabled it so I could get better test coverage for fstests that use
error injection and tests that check for asserts.

This helped me find a regression with one of the backported patches [1].

IOW, for LTS code to be in good quality, it needs to also have the
correct assertions.

For the same reason, I am also going to queue the following as stable
candidate:

756b1c343333 xfs: use current->journal_info for detecting transaction recursion

Because it has already proved to be helpful in detecting bugs on
our internal product tests.

> But if you are going to back port them, Rule #1 applies.
>

Of course. I will defer sending this patch to stable and test
it along with the new fix.

Thanks!
Amir.

[1] https://lore.kernel.org/linux-xfs/YpY6hUknor2S1iMd@bfoster/T/#mf1add66b8309a75a8984f28ea08718f22033bce7

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories
  2022-06-02  0:52   ` Dave Chinner
  2022-06-02  4:13     ` Amir Goldstein
@ 2022-06-02 10:17     ` Christian Brauner
  1 sibling, 0 replies; 21+ messages in thread
From: Christian Brauner @ 2022-06-02 10:17 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Amir Goldstein, Darrick J . Wong, Brian Foster,
	Christoph Hellwig, Luis Chamberlain, Adam Manzanares,
	Tyler Hicks, linux-xfs, fstests, Christian Brauner

On Thu, Jun 02, 2022 at 10:52:38AM +1000, Dave Chinner wrote:
> On Wed, Jun 01, 2022 at 01:45:40PM +0300, Amir Goldstein wrote:
> > From: Christoph Hellwig <hch@lst.de>
> > 
> > commit 01ea173e103edd5ec41acec65b9261b87e123fc2 upstream.
> > 
> > XFS always inherits the SGID bit if it is set on the parent inode, while
> > the generic inode_init_owner does not do this in a few cases where it can
> > create a possible security problem, see commit 0fa3ecd87848
> > ("Fix up non-directory creation in SGID directories") for details.
> 
> inode_init_owner() introduces a bunch more SGID problems because
> it strips the SGID bit from the mode passed to it, but all the code
> outside it still sees the SGID bit set. IIRC, that means we do the
> wrong thing when ACLs are present. IIRC, there's an LTP test for
> this CVE now, and it also has a variant which uses ACLs and that
> fails too....
> 
> I'm kinda wary about mentioning a security fix and then not
> backporting the entire set of fixes the CVE requires in the same
> patchset.  I have no idea what the status of the VFS level fixes
> that are needed to fix this properly - I thought they were done and
> reviewed, but they don't appear to be in 5.19 yet.

There were a few outstanding issues and we didn't receive a new
submission for them right before or during the merge window.

I'm at a conference this week but I'll get back to review the patches
and associated tests on Monday.

Christian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories
  2022-06-02  4:13     ` Amir Goldstein
@ 2022-06-02 10:31       ` Christian Brauner
  2022-06-08  8:25         ` Amir Goldstein
  0 siblings, 1 reply; 21+ messages in thread
From: Christian Brauner @ 2022-06-02 10:31 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Dave Chinner, Darrick J . Wong, Brian Foster, Christoph Hellwig,
	Luis Chamberlain, Adam Manzanares, Tyler Hicks, linux-xfs,
	fstests, Christian Brauner, Yang Xu

On Thu, Jun 02, 2022 at 07:13:56AM +0300, Amir Goldstein wrote:
> On Thu, Jun 2, 2022 at 3:52 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Wed, Jun 01, 2022 at 01:45:40PM +0300, Amir Goldstein wrote:
> > > From: Christoph Hellwig <hch@lst.de>
> > >
> > > commit 01ea173e103edd5ec41acec65b9261b87e123fc2 upstream.
> > >
> > > XFS always inherits the SGID bit if it is set on the parent inode, while
> > > the generic inode_init_owner does not do this in a few cases where it can
> > > create a possible security problem, see commit 0fa3ecd87848
> > > ("Fix up non-directory creation in SGID directories") for details.
> >
> > inode_init_owner() introduces a bunch more SGID problems because
> > it strips the SGID bit from the mode passed to it, but all the code
> > outside it still sees the SGID bit set. IIRC, that means we do the
> > wrong thing when ACLs are present. IIRC, there's an LTP test for
> > this CVE now, and it also has a variant which uses ACLs and that
> > fails too....
> 
> Good point.
> I think Christian's vfstests probably tests more cases than what LTP
> does at this point.

I think so, yes. There will also be more tests coming into fstests.

> 
> Christian, Yang,
> 
> It would be nice if you could annotate the relevant fstests with
> _fixed_by_kernel_commit, which will make it easier to find
> all relevant commits to backport when tests are failing on LTS
> kernel.
> 
> >
> > I'm kinda wary about mentioning a security fix and then not
> > backporting the entire set of fixes the CVE requires in the same
> > patchset.  I have no idea what the status of the VFS level fixes
> > that are needed to fix this properly - I thought they were done and
> > reviewed, but they don't appear to be in 5.19 yet.
> >
> 
> No, it looks like tihs is still in review.
> 
> Christoph, Cristian, Yang,
> 
> What do you think is best to do w.r.t this patch?
> 
> Wait for all the current known issues to be fixed in upstream and then
> backport all known fixes?
> 
> Backport whatever fixes are available in upstream now at the same
> backport series?
> 
> Take this now and the rest later?

Imho, backporting this patch is useful. It fixes a basic issue.
What Dave mentioned is that if ACLs/umask are in play things become
order dependent I've pointed out on the patch that aims to fix this: If
no ACLs are supported then umask is stripped in vfs and if they are it's
stripped in the fs. So if umask strips S_IXGRP in the vfs then no setgid
bit is inherited. If it's stripped in the fs post inode_init_owner()
setgid bit is tripped and thus not inherited.. The vfs patch unifies
this behavior.

Christian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 7/8] xfs: consider shutdown in bmapbt cursor delete assert
  2022-06-02  5:55         ` Amir Goldstein
@ 2022-06-03  9:39           ` Amir Goldstein
  0 siblings, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2022-06-03  9:39 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J . Wong, Brian Foster, Christoph Hellwig,
	Luis Chamberlain, Adam Manzanares, Tyler Hicks, linux-xfs,
	fstests

On Thu, Jun 2, 2022 at 8:55 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Jun 2, 2022 at 8:15 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Thu, Jun 02, 2022 at 07:24:26AM +0300, Amir Goldstein wrote:
> > > On Thu, Jun 2, 2022 at 3:38 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Wed, Jun 01, 2022 at 01:45:46PM +0300, Amir Goldstein wrote:
> > > > > From: Brian Foster <bfoster@redhat.com>
> > > > >
> > > > > commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream.
> > > > >
> > > > > The assert in xfs_btree_del_cursor() checks that the bmapbt block
> > > > > allocation field has been handled correctly before the cursor is
> > > > > freed. This field is used for accurate calculation of indirect block
> > > > > reservation requirements (for delayed allocations), for example.
> > > > > generic/019 reproduces a scenario where this assert fails because
> > > > > the filesystem has shutdown while in the middle of a bmbt record
> > > > > insertion. This occurs after a bmbt block has been allocated via the
> > > > > cursor but before the higher level bmap function (i.e.
> > > > > xfs_bmap_add_extent_hole_real()) completes and resets the field.
> > > > >
> > > > > Update the assert to accommodate the transient state if the
> > > > > filesystem has shutdown. While here, clean up the indentation and
> > > > > comments in the function.
> > > > >
> > > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > ---
> > > > >  fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++---------------------
> > > > >  1 file changed, 12 insertions(+), 21 deletions(-)
> > > >
> > > > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=56486f307100e8fc66efa2ebd8a71941fa10bf6f
> > > >
> > >
> > > Warm from the over :)
> > >
> > > I will need more time to verify that this new fix is not breaking LTS
> > > but I don't think that it should be blocking taking the old 5.12 fix now.
> > > Right?
> >
> > Rule #1: don't introduce new bugs into stable kernels.
> >
> > This commit has a known (and fixed) bug in it. If you are going to
> > back port it to a stable kernel, then you need to also pull in the
> > fix for that commit, too.
>
> Oh. I misunderstood.
> I thought this wasn't a Fixes: situation.
> I thought you pointed me to another related bug fix.
>

Just to make sure we are all on the same page.

I have applied both patches to my test tree:
1. 1cd738b13ae9 xfs: consider shutdown in bmapbt cursor delete assert
2. 56486f307100 xfs: assert in xfs_btree_del_cursor should take into
account error

Patch #2 looks pretty safe and it only affects builds with XFS_WARN/DEBUG,
so I am not too concerned about a soaking period.
I plan to send it along with patch #1 to stable after a few more test runs.

If my understanding is correct, the ASSERT has been there since git epoc.
The too strict ASSERT was relaxed two times by patch #1 and then by patch #2.

Maybe I am missing something, but I do not see how applying patch #1
introduces a bug, but anyway, I am going to send both patches together.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories
  2022-06-02 10:31       ` Christian Brauner
@ 2022-06-08  8:25         ` Amir Goldstein
  2022-06-08  8:26           ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Amir Goldstein @ 2022-06-08  8:25 UTC (permalink / raw)
  To: Christian Brauner, Christoph Hellwig
  Cc: Dave Chinner, Darrick J . Wong, Brian Foster, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests,
	Christian Brauner, Yang Xu

On Thu, Jun 2, 2022 at 1:31 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Thu, Jun 02, 2022 at 07:13:56AM +0300, Amir Goldstein wrote:
> > On Thu, Jun 2, 2022 at 3:52 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Wed, Jun 01, 2022 at 01:45:40PM +0300, Amir Goldstein wrote:
> > > > From: Christoph Hellwig <hch@lst.de>
> > > >
> > > > commit 01ea173e103edd5ec41acec65b9261b87e123fc2 upstream.
> > > >
> > > > XFS always inherits the SGID bit if it is set on the parent inode, while
> > > > the generic inode_init_owner does not do this in a few cases where it can
> > > > create a possible security problem, see commit 0fa3ecd87848
> > > > ("Fix up non-directory creation in SGID directories") for details.
> > >
> > > inode_init_owner() introduces a bunch more SGID problems because
> > > it strips the SGID bit from the mode passed to it, but all the code
> > > outside it still sees the SGID bit set. IIRC, that means we do the
> > > wrong thing when ACLs are present. IIRC, there's an LTP test for
> > > this CVE now, and it also has a variant which uses ACLs and that
> > > fails too....
> >
> > Good point.
> > I think Christian's vfstests probably tests more cases than what LTP
> > does at this point.
>
> I think so, yes. There will also be more tests coming into fstests.
>
> >
> > Christian, Yang,
> >
> > It would be nice if you could annotate the relevant fstests with
> > _fixed_by_kernel_commit, which will make it easier to find
> > all relevant commits to backport when tests are failing on LTS
> > kernel.
> >
> > >
> > > I'm kinda wary about mentioning a security fix and then not
> > > backporting the entire set of fixes the CVE requires in the same
> > > patchset.  I have no idea what the status of the VFS level fixes
> > > that are needed to fix this properly - I thought they were done and
> > > reviewed, but they don't appear to be in 5.19 yet.
> > >
> >
> > No, it looks like tihs is still in review.
> >
> > Christoph, Cristian, Yang,
> >
> > What do you think is best to do w.r.t this patch?
> >
> > Wait for all the current known issues to be fixed in upstream and then
> > backport all known fixes?
> >
> > Backport whatever fixes are available in upstream now at the same
> > backport series?
> >
> > Take this now and the rest later?
>
> Imho, backporting this patch is useful. It fixes a basic issue.
> What Dave mentioned is that if ACLs/umask are in play things become
> order dependent I've pointed out on the patch that aims to fix this: If
> no ACLs are supported then umask is stripped in vfs and if they are it's
> stripped in the fs. So if umask strips S_IXGRP in the vfs then no setgid
> bit is inherited. If it's stripped in the fs post inode_init_owner()
> setgid bit is tripped and thus not inherited.. The vfs patch unifies
> this behavior.
>

TBH, I am having a hard time following the expected vs. actual
behavior in all the cases at all points in time.

Christoph,

As the author of this patch, do you have an opinion w.r.t backporting
this patch alongs with vs. independent of followup fixes?
wait for future fixes yet to come?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories
  2022-06-08  8:25         ` Amir Goldstein
@ 2022-06-08  8:26           ` Christoph Hellwig
  2022-06-08  9:15             ` Amir Goldstein
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2022-06-08  8:26 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Christoph Hellwig, Dave Chinner,
	Darrick J . Wong, Brian Foster, Luis Chamberlain,
	Adam Manzanares, Tyler Hicks, linux-xfs, fstests,
	Christian Brauner, Yang Xu

On Wed, Jun 08, 2022 at 11:25:10AM +0300, Amir Goldstein wrote:
> TBH, I am having a hard time following the expected vs. actual
> behavior in all the cases at all points in time.
> 
> Christoph,
> 
> As the author of this patch, do you have an opinion w.r.t backporting
> this patch alongs with vs. independent of followup fixes?
> wait for future fixes yet to come?

To me backporting it seems good and useful, as it fixes a relatively
big problem.  The remaining issues seem minor compared to that.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories
  2022-06-08  8:26           ` Christoph Hellwig
@ 2022-06-08  9:15             ` Amir Goldstein
  0 siblings, 0 replies; 21+ messages in thread
From: Amir Goldstein @ 2022-06-08  9:15 UTC (permalink / raw)
  To: Christoph Hellwig, Dave Chinner
  Cc: Christian Brauner, Darrick J . Wong, Brian Foster,
	Luis Chamberlain, Adam Manzanares, Tyler Hicks, linux-xfs,
	fstests, Christian Brauner, Yang Xu, Leah Rumancik

On Wed, Jun 8, 2022 at 11:26 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Jun 08, 2022 at 11:25:10AM +0300, Amir Goldstein wrote:
> > TBH, I am having a hard time following the expected vs. actual
> > behavior in all the cases at all points in time.
> >
> > Christoph,
> >
> > As the author of this patch, do you have an opinion w.r.t backporting
> > this patch alongs with vs. independent of followup fixes?
> > wait for future fixes yet to come?
>
> To me backporting it seems good and useful, as it fixes a relatively
> big problem.  The remaining issues seem minor compared to that.
>

Thanks!

Dave,

I am not seeing any progress with the remaining VFS issues,
so rather not hold up fixing this "big problem" on future changes.

Do the two opinions from Christian and Christoph help you
ease your mind about the concerns that you raised?

Anyway, I now have these two patches queued and tests so
I can send them along in the same batch, whenever that is:

e014f37db xfs: use setattr_copy to set vfs inode attributes
01ea173e1 xfs: fix up non-directory creation in SGID directories

The setattr_copy patch is also part of Leah's candidates for 5.15,
so if you approve, I will include both patches above in my
next posting (after part 3) to align with Leah's 5.15 candidates.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-06-08  9:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-01 10:45 [PATCH 5.10 CANDIDATE 0/8] xfs stable candidate patches for 5.10.y (part 2) Amir Goldstein
2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 1/8] xfs: fix up non-directory creation in SGID directories Amir Goldstein
2022-06-02  0:52   ` Dave Chinner
2022-06-02  4:13     ` Amir Goldstein
2022-06-02 10:31       ` Christian Brauner
2022-06-08  8:25         ` Amir Goldstein
2022-06-08  8:26           ` Christoph Hellwig
2022-06-08  9:15             ` Amir Goldstein
2022-06-02 10:17     ` Christian Brauner
2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 2/8] xfs: set inode size after creating symlink Amir Goldstein
2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 3/8] xfs: sync lazy sb accounting on quiesce of read-only mounts Amir Goldstein
2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 4/8] xfs: fix chown leaking delalloc quota blocks when fssetxattr fails Amir Goldstein
2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 5/8] xfs: fix incorrect root dquot corruption error when switching group/project quota types Amir Goldstein
2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 6/8] xfs: restore shutdown check in mapped write fault path Amir Goldstein
2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 7/8] xfs: consider shutdown in bmapbt cursor delete assert Amir Goldstein
2022-06-02  0:38   ` Dave Chinner
2022-06-02  4:24     ` Amir Goldstein
2022-06-02  5:15       ` Dave Chinner
2022-06-02  5:55         ` Amir Goldstein
2022-06-03  9:39           ` Amir Goldstein
2022-06-01 10:45 ` [PATCH 5.10 CANDIDATE 8/8] xfs: force log and push AIL to clear pinned inodes when aborting mount Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.