All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] xfs: validate extsz hints against rt extent size when rtinherit is set
@ 2021-05-25  6:15 Darrick J. Wong
  2021-05-25 10:49 ` Carlos Maiolino
  2021-05-25 11:54 ` Brian Foster
  0 siblings, 2 replies; 5+ messages in thread
From: Darrick J. Wong @ 2021-05-25  6:15 UTC (permalink / raw)
  To: linux-xfs, bfoster, hch

From: Darrick J. Wong <djwong@kernel.org>

The RTINHERIT bit can be set on a directory so that newly created
regular files will have the REALTIME bit set to store their data on the
realtime volume.  If an extent size hint (and EXTSZINHERIT) are set on
the directory, the hint will also be copied into the new file.

As pointed out in previous patches, for realtime files we require the
extent size hint be an integer multiple of the realtime extent, but we
don't perform the same validation on a directory with both RTINHERIT and
EXTSZINHERIT set, even though the only use-case of that combination is
to propagate extent size hints into new realtime files.  This leads to
inode corruption errors when the bad values are propagated.

Because there may be existing filesystems with such a configuration, we
cannot simply amend the inode verifier to trip on these directories and
call it a day because that will cause previously "working" filesystems
to start throwing errors abruptly.  Note that it's valid to have
directories with rtinherit set even if there is no realtime volume, in
which case the problem does not manifest because rtinherit is ignored if
there's no realtime device; and it's possible that someone set the flag,
crashed, repaired the filesystem (which clears the hint on the realtime
file) and continued.

Therefore, mitigate this issue in several ways: First, if we try to
write out an inode with both rtinherit/extszinherit set and an unaligned
extent size hint, turn off the hint to correct the error.  Second, if
someone tries to misconfigure a directory via the fssetxattr ioctl, fail
the ioctl.  Third, reverify both extent size hint values when we
propagate heritable inode attributes from parent to child, to prevent
misconfigurations from spreading.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v2: disable incorrect hints at runtime instead of whacking filesystems
    with verifier errors
v3: revise the comment in the verifier to describe the source of the
    problem, the observable symptoms, and how the solution fits the
    historical context
---
 fs/xfs/libxfs/xfs_inode_buf.c   |   22 ++++++++++++++++++++++
 fs/xfs/libxfs/xfs_trans_inode.c |   15 +++++++++++++++
 fs/xfs/xfs_inode.c              |   29 +++++++++++++++++++++++++++++
 fs/xfs/xfs_ioctl.c              |   15 +++++++++++++++
 4 files changed, 81 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 045118c7bf78..f3254a4f4cb4 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -589,6 +589,28 @@ xfs_inode_validate_extsize(
 	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
 	extsize_bytes = XFS_FSB_TO_B(mp, extsize);
 
+	/*
+	 * This comment describes a historic gap in this verifier function.
+	 *
+	 * On older kernels, the extent size hint verifier doesn't check that
+	 * the extent size hint is an integer multiple of the realtime extent
+	 * size on a directory with both RTINHERIT and EXTSZINHERIT flags set.
+	 * The verifier has always enforced the alignment rule for regular
+	 * files with the REALTIME flag set.
+	 *
+	 * If a directory with a misaligned extent size hint is allowed to
+	 * propagate that hint into a new regular realtime file, the result
+	 * is that the inode cluster buffer verifier will trigger a corruption
+	 * shutdown the next time it is run.
+	 *
+	 * Unfortunately, there could be filesystems with these misconfigured
+	 * directories in the wild, so we cannot add a check to this verifier
+	 * at this time because that will result a new source of directory
+	 * corruption errors when reading an existing filesystem.  Instead, we
+	 * permit the misconfiguration to pass through the verifiers so that
+	 * callers of this function can correct and mitigate externally.
+	 */
+
 	if (rt_flag)
 		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
 	else
diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
index 78324e043e25..325f2dceec13 100644
--- a/fs/xfs/libxfs/xfs_trans_inode.c
+++ b/fs/xfs/libxfs/xfs_trans_inode.c
@@ -142,6 +142,21 @@ xfs_trans_log_inode(
 		flags |= XFS_ILOG_CORE;
 	}
 
+	/*
+	 * Inode verifiers on older kernels don't check that the extent size
+	 * hint is an integer multiple of the rt extent size on a directory
+	 * with both rtinherit and extszinherit flags set.  If we're logging a
+	 * directory that is misconfigured in this way, clear the hint.
+	 */
+	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
+	    (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
+	    (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
+		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
+				   XFS_DIFLAG_EXTSZINHERIT);
+		ip->i_extsize = 0;
+		flags |= XFS_ILOG_CORE;
+	}
+
 	/*
 	 * Record the specific change for fdatasync optimisation. This allows
 	 * fdatasync to skip log forces for inodes that are only timestamp
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 0369eb22c1bb..e4c2da4566f1 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -690,6 +690,7 @@ xfs_inode_inherit_flags(
 	const struct xfs_inode	*pip)
 {
 	unsigned int		di_flags = 0;
+	xfs_failaddr_t		failaddr;
 	umode_t			mode = VFS_I(ip)->i_mode;
 
 	if (S_ISDIR(mode)) {
@@ -729,6 +730,24 @@ xfs_inode_inherit_flags(
 		di_flags |= XFS_DIFLAG_FILESTREAM;
 
 	ip->i_diflags |= di_flags;
+
+	/*
+	 * Inode verifiers on older kernels only check that the extent size
+	 * hint is an integer multiple of the rt extent size on realtime files.
+	 * They did not check the hint alignment on a directory with both
+	 * rtinherit and extszinherit flags set.  If the misaligned hint is
+	 * propagated from a directory into a new realtime file, new file
+	 * allocations will fail due to math errors in the rt allocator and/or
+	 * trip the verifiers.  Validate the hint settings in the new file so
+	 * that we don't let broken hints propagate.
+	 */
+	failaddr = xfs_inode_validate_extsize(ip->i_mount, ip->i_extsize,
+			VFS_I(ip)->i_mode, ip->i_diflags);
+	if (failaddr) {
+		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
+				   XFS_DIFLAG_EXTSZINHERIT);
+		ip->i_extsize = 0;
+	}
 }
 
 /* Propagate di_flags2 from a parent inode to a child inode. */
@@ -737,12 +756,22 @@ xfs_inode_inherit_flags2(
 	struct xfs_inode	*ip,
 	const struct xfs_inode	*pip)
 {
+	xfs_failaddr_t		failaddr;
+
 	if (pip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) {
 		ip->i_diflags2 |= XFS_DIFLAG2_COWEXTSIZE;
 		ip->i_cowextsize = pip->i_cowextsize;
 	}
 	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
 		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
+
+	/* Don't let invalid cowextsize hints propagate. */
+	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
+			VFS_I(ip)->i_mode, ip->i_diflags, ip->i_diflags2);
+	if (failaddr) {
+		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+		ip->i_cowextsize = 0;
+	}
 }
 
 /*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 6407921aca96..1fe4c1fc0aea 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1291,6 +1291,21 @@ xfs_ioctl_setattr_check_extsize(
 
 	new_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
 
+	/*
+	 * Inode verifiers on older kernels don't check that the extent size
+	 * hint is an integer multiple of the rt extent size on a directory
+	 * with both rtinherit and extszinherit flags set.  Don't let sysadmins
+	 * misconfigure directories.
+	 */
+	if ((new_diflags & XFS_DIFLAG_RTINHERIT) &&
+	    (new_diflags & XFS_DIFLAG_EXTSZINHERIT)) {
+		unsigned int	rtextsize_bytes;
+
+		rtextsize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
+		if (fa->fsx_extsize % rtextsize_bytes)
+			return -EINVAL;
+	}
+
 	failaddr = xfs_inode_validate_extsize(ip->i_mount,
 			XFS_B_TO_FSB(mp, fa->fsx_extsize),
 			VFS_I(ip)->i_mode, new_diflags);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] xfs: validate extsz hints against rt extent size when rtinherit is set
  2021-05-25  6:15 [PATCH v3] xfs: validate extsz hints against rt extent size when rtinherit is set Darrick J. Wong
@ 2021-05-25 10:49 ` Carlos Maiolino
  2021-05-25 17:20   ` Darrick J. Wong
  2021-05-25 11:54 ` Brian Foster
  1 sibling, 1 reply; 5+ messages in thread
From: Carlos Maiolino @ 2021-05-25 10:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster, hch

Hi Darrick.

On Mon, May 24, 2021 at 11:15:31PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> The RTINHERIT bit can be set on a directory so that newly created
> regular files will have the REALTIME bit set to store their data on the
> realtime volume.  If an extent size hint (and EXTSZINHERIT) are set on
> the directory, the hint will also be copied into the new file.
> 
> As pointed out in previous patches, for realtime files we require the
> extent size hint be an integer multiple of the realtime extent, but we
> don't perform the same validation on a directory with both RTINHERIT and
> EXTSZINHERIT set, even though the only use-case of that combination is
> to propagate extent size hints into new realtime files.  This leads to
> inode corruption errors when the bad values are propagated.
> 
> Because there may be existing filesystems with such a configuration, we
> cannot simply amend the inode verifier to trip on these directories and
> call it a day because that will cause previously "working" filesystems
> to start throwing errors abruptly.  Note that it's valid to have
> directories with rtinherit set even if there is no realtime volume, in
> which case the problem does not manifest because rtinherit is ignored if
> there's no realtime device; and it's possible that someone set the flag,
> crashed, repaired the filesystem (which clears the hint on the realtime
> file) and continued.
> 
> Therefore, mitigate this issue in several ways: First, if we try to
> write out an inode with both rtinherit/extszinherit set and an unaligned
> extent size hint, turn off the hint to correct the error.  Second, if
> someone tries to misconfigure a directory via the fssetxattr ioctl, fail
> the ioctl.  Third, reverify both extent size hint values when we
> propagate heritable inode attributes from parent to child, to prevent
> misconfigurations from spreading.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> v2: disable incorrect hints at runtime instead of whacking filesystems
>     with verifier errors
> v3: revise the comment in the verifier to describe the source of the
>     problem, the observable symptoms, and how the solution fits the
>     historical context

IMHO the patch is fine, I have just one comment I'd like to address though:

> +	/*
> +	 * Inode verifiers on older kernels don't check that the extent size
> +	 * hint is an integer multiple of the rt extent size on a directory
> +	 * with both rtinherit and extszinherit flags set.  If we're logging a
> +	 * directory that is misconfigured in this way, clear the hint.
> +	 */
> +	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
> +	    (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
> +	    (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
> +		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
> +				   XFS_DIFLAG_EXTSZINHERIT);
> +		ip->i_extsize = 0;
> +		flags |= XFS_ILOG_CORE;
> +	}
> +
...
> +	 * that we don't let broken hints propagate.
> +	 */
> +	failaddr = xfs_inode_validate_extsize(ip->i_mount, ip->i_extsize,
> +			VFS_I(ip)->i_mode, ip->i_diflags);
> +	if (failaddr) {
> +		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
> +				   XFS_DIFLAG_EXTSZINHERIT);
> +		ip->i_extsize = 0;
> +	}
>  }

...
> +	/* Don't let invalid cowextsize hints propagate. */
> +	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
> +			VFS_I(ip)->i_mode, ip->i_diflags, ip->i_diflags2);
> +	if (failaddr) {
> +		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
> +		ip->i_cowextsize = 0;
> +	}
>  }

In all cases above, wouldn't be interesting to at least log the fact we are
resetting the extent size? At least in debug mode? This may let users clueless
on why the extent size has been reset, or at least give us some debug data when
required?


The patch itself looks fine, with or without logging the extsize reset, you can
add:

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

Cheers

>  
>  /*
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 6407921aca96..1fe4c1fc0aea 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1291,6 +1291,21 @@ xfs_ioctl_setattr_check_extsize(
>  
>  	new_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
>  
> +	/*
> +	 * Inode verifiers on older kernels don't check that the extent size
> +	 * hint is an integer multiple of the rt extent size on a directory
> +	 * with both rtinherit and extszinherit flags set.  Don't let sysadmins
> +	 * misconfigure directories.
> +	 */
> +	if ((new_diflags & XFS_DIFLAG_RTINHERIT) &&
> +	    (new_diflags & XFS_DIFLAG_EXTSZINHERIT)) {
> +		unsigned int	rtextsize_bytes;
> +
> +		rtextsize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
> +		if (fa->fsx_extsize % rtextsize_bytes)
> +			return -EINVAL;
> +	}
> +
>  	failaddr = xfs_inode_validate_extsize(ip->i_mount,
>  			XFS_B_TO_FSB(mp, fa->fsx_extsize),
>  			VFS_I(ip)->i_mode, new_diflags);
> 

-- 
Carlos


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] xfs: validate extsz hints against rt extent size when rtinherit is set
  2021-05-25  6:15 [PATCH v3] xfs: validate extsz hints against rt extent size when rtinherit is set Darrick J. Wong
  2021-05-25 10:49 ` Carlos Maiolino
@ 2021-05-25 11:54 ` Brian Foster
  2021-05-25 17:29   ` Darrick J. Wong
  1 sibling, 1 reply; 5+ messages in thread
From: Brian Foster @ 2021-05-25 11:54 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

On Mon, May 24, 2021 at 11:15:31PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> The RTINHERIT bit can be set on a directory so that newly created
> regular files will have the REALTIME bit set to store their data on the
> realtime volume.  If an extent size hint (and EXTSZINHERIT) are set on
> the directory, the hint will also be copied into the new file.
> 
> As pointed out in previous patches, for realtime files we require the
> extent size hint be an integer multiple of the realtime extent, but we
> don't perform the same validation on a directory with both RTINHERIT and
> EXTSZINHERIT set, even though the only use-case of that combination is
> to propagate extent size hints into new realtime files.  This leads to
> inode corruption errors when the bad values are propagated.
> 
> Because there may be existing filesystems with such a configuration, we
> cannot simply amend the inode verifier to trip on these directories and
> call it a day because that will cause previously "working" filesystems
> to start throwing errors abruptly.  Note that it's valid to have
> directories with rtinherit set even if there is no realtime volume, in
> which case the problem does not manifest because rtinherit is ignored if
> there's no realtime device; and it's possible that someone set the flag,
> crashed, repaired the filesystem (which clears the hint on the realtime
> file) and continued.
> 
> Therefore, mitigate this issue in several ways: First, if we try to
> write out an inode with both rtinherit/extszinherit set and an unaligned
> extent size hint, turn off the hint to correct the error.  Second, if
> someone tries to misconfigure a directory via the fssetxattr ioctl, fail
> the ioctl.  Third, reverify both extent size hint values when we
> propagate heritable inode attributes from parent to child, to prevent
> misconfigurations from spreading.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> v2: disable incorrect hints at runtime instead of whacking filesystems
>     with verifier errors
> v3: revise the comment in the verifier to describe the source of the
>     problem, the observable symptoms, and how the solution fits the
>     historical context
> ---

I agree with Carlos in that I'd prefer to see some kind of one-shot
warning to indicate that the problem has been identified in the fs. Some
users may want to run a repair sequence (or would otherwise have no idea
they should consider one) just to put things back in a correct state.
(Perhaps putting the clearing code in a common helper with a flag to
toggle between extszhint and cowextszhint would facilitate?).

That aside, the patch looks Ok to me:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_inode_buf.c   |   22 ++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_trans_inode.c |   15 +++++++++++++++
>  fs/xfs/xfs_inode.c              |   29 +++++++++++++++++++++++++++++
>  fs/xfs/xfs_ioctl.c              |   15 +++++++++++++++
>  4 files changed, 81 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 045118c7bf78..f3254a4f4cb4 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -589,6 +589,28 @@ xfs_inode_validate_extsize(
>  	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
>  	extsize_bytes = XFS_FSB_TO_B(mp, extsize);
>  
> +	/*
> +	 * This comment describes a historic gap in this verifier function.
> +	 *
> +	 * On older kernels, the extent size hint verifier doesn't check that
> +	 * the extent size hint is an integer multiple of the realtime extent
> +	 * size on a directory with both RTINHERIT and EXTSZINHERIT flags set.
> +	 * The verifier has always enforced the alignment rule for regular
> +	 * files with the REALTIME flag set.
> +	 *
> +	 * If a directory with a misaligned extent size hint is allowed to
> +	 * propagate that hint into a new regular realtime file, the result
> +	 * is that the inode cluster buffer verifier will trigger a corruption
> +	 * shutdown the next time it is run.
> +	 *
> +	 * Unfortunately, there could be filesystems with these misconfigured
> +	 * directories in the wild, so we cannot add a check to this verifier
> +	 * at this time because that will result a new source of directory
> +	 * corruption errors when reading an existing filesystem.  Instead, we
> +	 * permit the misconfiguration to pass through the verifiers so that
> +	 * callers of this function can correct and mitigate externally.
> +	 */
> +
>  	if (rt_flag)
>  		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
>  	else
> diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> index 78324e043e25..325f2dceec13 100644
> --- a/fs/xfs/libxfs/xfs_trans_inode.c
> +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> @@ -142,6 +142,21 @@ xfs_trans_log_inode(
>  		flags |= XFS_ILOG_CORE;
>  	}
>  
> +	/*
> +	 * Inode verifiers on older kernels don't check that the extent size
> +	 * hint is an integer multiple of the rt extent size on a directory
> +	 * with both rtinherit and extszinherit flags set.  If we're logging a
> +	 * directory that is misconfigured in this way, clear the hint.
> +	 */
> +	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
> +	    (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
> +	    (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
> +		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
> +				   XFS_DIFLAG_EXTSZINHERIT);
> +		ip->i_extsize = 0;
> +		flags |= XFS_ILOG_CORE;
> +	}
> +
>  	/*
>  	 * Record the specific change for fdatasync optimisation. This allows
>  	 * fdatasync to skip log forces for inodes that are only timestamp
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 0369eb22c1bb..e4c2da4566f1 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -690,6 +690,7 @@ xfs_inode_inherit_flags(
>  	const struct xfs_inode	*pip)
>  {
>  	unsigned int		di_flags = 0;
> +	xfs_failaddr_t		failaddr;
>  	umode_t			mode = VFS_I(ip)->i_mode;
>  
>  	if (S_ISDIR(mode)) {
> @@ -729,6 +730,24 @@ xfs_inode_inherit_flags(
>  		di_flags |= XFS_DIFLAG_FILESTREAM;
>  
>  	ip->i_diflags |= di_flags;
> +
> +	/*
> +	 * Inode verifiers on older kernels only check that the extent size
> +	 * hint is an integer multiple of the rt extent size on realtime files.
> +	 * They did not check the hint alignment on a directory with both
> +	 * rtinherit and extszinherit flags set.  If the misaligned hint is
> +	 * propagated from a directory into a new realtime file, new file
> +	 * allocations will fail due to math errors in the rt allocator and/or
> +	 * trip the verifiers.  Validate the hint settings in the new file so
> +	 * that we don't let broken hints propagate.
> +	 */
> +	failaddr = xfs_inode_validate_extsize(ip->i_mount, ip->i_extsize,
> +			VFS_I(ip)->i_mode, ip->i_diflags);
> +	if (failaddr) {
> +		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
> +				   XFS_DIFLAG_EXTSZINHERIT);
> +		ip->i_extsize = 0;
> +	}
>  }
>  
>  /* Propagate di_flags2 from a parent inode to a child inode. */
> @@ -737,12 +756,22 @@ xfs_inode_inherit_flags2(
>  	struct xfs_inode	*ip,
>  	const struct xfs_inode	*pip)
>  {
> +	xfs_failaddr_t		failaddr;
> +
>  	if (pip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) {
>  		ip->i_diflags2 |= XFS_DIFLAG2_COWEXTSIZE;
>  		ip->i_cowextsize = pip->i_cowextsize;
>  	}
>  	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
>  		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
> +
> +	/* Don't let invalid cowextsize hints propagate. */
> +	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
> +			VFS_I(ip)->i_mode, ip->i_diflags, ip->i_diflags2);
> +	if (failaddr) {
> +		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
> +		ip->i_cowextsize = 0;
> +	}
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 6407921aca96..1fe4c1fc0aea 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1291,6 +1291,21 @@ xfs_ioctl_setattr_check_extsize(
>  
>  	new_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
>  
> +	/*
> +	 * Inode verifiers on older kernels don't check that the extent size
> +	 * hint is an integer multiple of the rt extent size on a directory
> +	 * with both rtinherit and extszinherit flags set.  Don't let sysadmins
> +	 * misconfigure directories.
> +	 */
> +	if ((new_diflags & XFS_DIFLAG_RTINHERIT) &&
> +	    (new_diflags & XFS_DIFLAG_EXTSZINHERIT)) {
> +		unsigned int	rtextsize_bytes;
> +
> +		rtextsize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
> +		if (fa->fsx_extsize % rtextsize_bytes)
> +			return -EINVAL;
> +	}
> +
>  	failaddr = xfs_inode_validate_extsize(ip->i_mount,
>  			XFS_B_TO_FSB(mp, fa->fsx_extsize),
>  			VFS_I(ip)->i_mode, new_diflags);
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] xfs: validate extsz hints against rt extent size when rtinherit is set
  2021-05-25 10:49 ` Carlos Maiolino
@ 2021-05-25 17:20   ` Darrick J. Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2021-05-25 17:20 UTC (permalink / raw)
  To: linux-xfs, bfoster, hch

On Tue, May 25, 2021 at 12:49:02PM +0200, Carlos Maiolino wrote:
> Hi Darrick.
> 
> On Mon, May 24, 2021 at 11:15:31PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > The RTINHERIT bit can be set on a directory so that newly created
> > regular files will have the REALTIME bit set to store their data on the
> > realtime volume.  If an extent size hint (and EXTSZINHERIT) are set on
> > the directory, the hint will also be copied into the new file.
> > 
> > As pointed out in previous patches, for realtime files we require the
> > extent size hint be an integer multiple of the realtime extent, but we
> > don't perform the same validation on a directory with both RTINHERIT and
> > EXTSZINHERIT set, even though the only use-case of that combination is
> > to propagate extent size hints into new realtime files.  This leads to
> > inode corruption errors when the bad values are propagated.
> > 
> > Because there may be existing filesystems with such a configuration, we
> > cannot simply amend the inode verifier to trip on these directories and
> > call it a day because that will cause previously "working" filesystems
> > to start throwing errors abruptly.  Note that it's valid to have
> > directories with rtinherit set even if there is no realtime volume, in
> > which case the problem does not manifest because rtinherit is ignored if
> > there's no realtime device; and it's possible that someone set the flag,
> > crashed, repaired the filesystem (which clears the hint on the realtime
> > file) and continued.
> > 
> > Therefore, mitigate this issue in several ways: First, if we try to
> > write out an inode with both rtinherit/extszinherit set and an unaligned
> > extent size hint, turn off the hint to correct the error.  Second, if
> > someone tries to misconfigure a directory via the fssetxattr ioctl, fail
> > the ioctl.  Third, reverify both extent size hint values when we
> > propagate heritable inode attributes from parent to child, to prevent
> > misconfigurations from spreading.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > v2: disable incorrect hints at runtime instead of whacking filesystems
> >     with verifier errors
> > v3: revise the comment in the verifier to describe the source of the
> >     problem, the observable symptoms, and how the solution fits the
> >     historical context
> 
> IMHO the patch is fine, I have just one comment I'd like to address though:
> 
> > +	/*
> > +	 * Inode verifiers on older kernels don't check that the extent size
> > +	 * hint is an integer multiple of the rt extent size on a directory
> > +	 * with both rtinherit and extszinherit flags set.  If we're logging a
> > +	 * directory that is misconfigured in this way, clear the hint.
> > +	 */
> > +	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
> > +	    (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
> > +	    (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
> > +		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
> > +				   XFS_DIFLAG_EXTSZINHERIT);
> > +		ip->i_extsize = 0;
> > +		flags |= XFS_ILOG_CORE;
> > +	}
> > +
> ...
> > +	 * that we don't let broken hints propagate.
> > +	 */
> > +	failaddr = xfs_inode_validate_extsize(ip->i_mount, ip->i_extsize,
> > +			VFS_I(ip)->i_mode, ip->i_diflags);
> > +	if (failaddr) {
> > +		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
> > +				   XFS_DIFLAG_EXTSZINHERIT);
> > +		ip->i_extsize = 0;
> > +	}
> >  }
> 
> ...
> > +	/* Don't let invalid cowextsize hints propagate. */
> > +	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
> > +			VFS_I(ip)->i_mode, ip->i_diflags, ip->i_diflags2);
> > +	if (failaddr) {
> > +		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
> > +		ip->i_cowextsize = 0;
> > +	}
> >  }
> 
> In all cases above, wouldn't be interesting to at least log the fact we are
> resetting the extent size? At least in debug mode? This may let users clueless
> on why the extent size has been reset, or at least give us some debug data when
> required?

Ok, I'll add an xfs_info message to log the fact that we are changing
inode attributes.  Thanks for the review!

--D

> 
> 
> The patch itself looks fine, with or without logging the extsize reset, you can
> add:
> 
> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> 
> Cheers
> 
> >  
> >  /*
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 6407921aca96..1fe4c1fc0aea 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1291,6 +1291,21 @@ xfs_ioctl_setattr_check_extsize(
> >  
> >  	new_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
> >  
> > +	/*
> > +	 * Inode verifiers on older kernels don't check that the extent size
> > +	 * hint is an integer multiple of the rt extent size on a directory
> > +	 * with both rtinherit and extszinherit flags set.  Don't let sysadmins
> > +	 * misconfigure directories.
> > +	 */
> > +	if ((new_diflags & XFS_DIFLAG_RTINHERIT) &&
> > +	    (new_diflags & XFS_DIFLAG_EXTSZINHERIT)) {
> > +		unsigned int	rtextsize_bytes;
> > +
> > +		rtextsize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
> > +		if (fa->fsx_extsize % rtextsize_bytes)
> > +			return -EINVAL;
> > +	}
> > +
> >  	failaddr = xfs_inode_validate_extsize(ip->i_mount,
> >  			XFS_B_TO_FSB(mp, fa->fsx_extsize),
> >  			VFS_I(ip)->i_mode, new_diflags);
> > 
> 
> -- 
> Carlos
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] xfs: validate extsz hints against rt extent size when rtinherit is set
  2021-05-25 11:54 ` Brian Foster
@ 2021-05-25 17:29   ` Darrick J. Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2021-05-25 17:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, hch

On Tue, May 25, 2021 at 07:54:04AM -0400, Brian Foster wrote:
> On Mon, May 24, 2021 at 11:15:31PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > The RTINHERIT bit can be set on a directory so that newly created
> > regular files will have the REALTIME bit set to store their data on the
> > realtime volume.  If an extent size hint (and EXTSZINHERIT) are set on
> > the directory, the hint will also be copied into the new file.
> > 
> > As pointed out in previous patches, for realtime files we require the
> > extent size hint be an integer multiple of the realtime extent, but we
> > don't perform the same validation on a directory with both RTINHERIT and
> > EXTSZINHERIT set, even though the only use-case of that combination is
> > to propagate extent size hints into new realtime files.  This leads to
> > inode corruption errors when the bad values are propagated.
> > 
> > Because there may be existing filesystems with such a configuration, we
> > cannot simply amend the inode verifier to trip on these directories and
> > call it a day because that will cause previously "working" filesystems
> > to start throwing errors abruptly.  Note that it's valid to have
> > directories with rtinherit set even if there is no realtime volume, in
> > which case the problem does not manifest because rtinherit is ignored if
> > there's no realtime device; and it's possible that someone set the flag,
> > crashed, repaired the filesystem (which clears the hint on the realtime
> > file) and continued.
> > 
> > Therefore, mitigate this issue in several ways: First, if we try to
> > write out an inode with both rtinherit/extszinherit set and an unaligned
> > extent size hint, turn off the hint to correct the error.  Second, if
> > someone tries to misconfigure a directory via the fssetxattr ioctl, fail
> > the ioctl.  Third, reverify both extent size hint values when we
> > propagate heritable inode attributes from parent to child, to prevent
> > misconfigurations from spreading.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > v2: disable incorrect hints at runtime instead of whacking filesystems
> >     with verifier errors
> > v3: revise the comment in the verifier to describe the source of the
> >     problem, the observable symptoms, and how the solution fits the
> >     historical context
> > ---
> 
> I agree with Carlos in that I'd prefer to see some kind of one-shot
> warning to indicate that the problem has been identified in the fs. Some
> users may want to run a repair sequence (or would otherwise have no idea
> they should consider one) just to put things back in a correct state.

Done.  xfs_trans_log_inode now has:

	/*
	 * Inode verifiers on older kernels don't check that the extent size
	 * hint is an integer multiple of the rt extent size on a directory
	 * with both rtinherit and extszinherit flags set.  If we're logging a
	 * directory that is misconfigured in this way, clear the hint.
	 */
	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
	    (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
	    (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
		xfs_info_once(ip->i_mount,
	"Correcting misaligned extent size hint in inode 0x%llx.", ip->i_ino);
		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
				   XFS_DIFLAG_EXTSZINHERIT);
		ip->i_extsize = 0;
		flags |= XFS_ILOG_CORE;
	}

> (Perhaps putting the clearing code in a common helper with a flag to
> toggle between extszhint and cowextszhint would facilitate?).

The cowextsize hint shouldn't have this problem since you can't set the
cow hint unless reflink is enabled, and you can't enable reflink if
there's a realtime volume attached.  The appropriate extra checks will
need to be added to xfs_inode_validate_cowextsize, of course.

> That aside, the patch looks Ok to me:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Thanks for the review!

--D

> 
> >  fs/xfs/libxfs/xfs_inode_buf.c   |   22 ++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_trans_inode.c |   15 +++++++++++++++
> >  fs/xfs/xfs_inode.c              |   29 +++++++++++++++++++++++++++++
> >  fs/xfs/xfs_ioctl.c              |   15 +++++++++++++++
> >  4 files changed, 81 insertions(+)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> > index 045118c7bf78..f3254a4f4cb4 100644
> > --- a/fs/xfs/libxfs/xfs_inode_buf.c
> > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> > @@ -589,6 +589,28 @@ xfs_inode_validate_extsize(
> >  	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
> >  	extsize_bytes = XFS_FSB_TO_B(mp, extsize);
> >  
> > +	/*
> > +	 * This comment describes a historic gap in this verifier function.
> > +	 *
> > +	 * On older kernels, the extent size hint verifier doesn't check that
> > +	 * the extent size hint is an integer multiple of the realtime extent
> > +	 * size on a directory with both RTINHERIT and EXTSZINHERIT flags set.
> > +	 * The verifier has always enforced the alignment rule for regular
> > +	 * files with the REALTIME flag set.
> > +	 *
> > +	 * If a directory with a misaligned extent size hint is allowed to
> > +	 * propagate that hint into a new regular realtime file, the result
> > +	 * is that the inode cluster buffer verifier will trigger a corruption
> > +	 * shutdown the next time it is run.
> > +	 *
> > +	 * Unfortunately, there could be filesystems with these misconfigured
> > +	 * directories in the wild, so we cannot add a check to this verifier
> > +	 * at this time because that will result a new source of directory
> > +	 * corruption errors when reading an existing filesystem.  Instead, we
> > +	 * permit the misconfiguration to pass through the verifiers so that
> > +	 * callers of this function can correct and mitigate externally.
> > +	 */
> > +
> >  	if (rt_flag)
> >  		blocksize_bytes = mp->m_sb.sb_rextsize << mp->m_sb.sb_blocklog;
> >  	else
> > diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> > index 78324e043e25..325f2dceec13 100644
> > --- a/fs/xfs/libxfs/xfs_trans_inode.c
> > +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> > @@ -142,6 +142,21 @@ xfs_trans_log_inode(
> >  		flags |= XFS_ILOG_CORE;
> >  	}
> >  
> > +	/*
> > +	 * Inode verifiers on older kernels don't check that the extent size
> > +	 * hint is an integer multiple of the rt extent size on a directory
> > +	 * with both rtinherit and extszinherit flags set.  If we're logging a
> > +	 * directory that is misconfigured in this way, clear the hint.
> > +	 */
> > +	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
> > +	    (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) &&
> > +	    (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
> > +		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
> > +				   XFS_DIFLAG_EXTSZINHERIT);
> > +		ip->i_extsize = 0;
> > +		flags |= XFS_ILOG_CORE;
> > +	}
> > +
> >  	/*
> >  	 * Record the specific change for fdatasync optimisation. This allows
> >  	 * fdatasync to skip log forces for inodes that are only timestamp
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 0369eb22c1bb..e4c2da4566f1 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -690,6 +690,7 @@ xfs_inode_inherit_flags(
> >  	const struct xfs_inode	*pip)
> >  {
> >  	unsigned int		di_flags = 0;
> > +	xfs_failaddr_t		failaddr;
> >  	umode_t			mode = VFS_I(ip)->i_mode;
> >  
> >  	if (S_ISDIR(mode)) {
> > @@ -729,6 +730,24 @@ xfs_inode_inherit_flags(
> >  		di_flags |= XFS_DIFLAG_FILESTREAM;
> >  
> >  	ip->i_diflags |= di_flags;
> > +
> > +	/*
> > +	 * Inode verifiers on older kernels only check that the extent size
> > +	 * hint is an integer multiple of the rt extent size on realtime files.
> > +	 * They did not check the hint alignment on a directory with both
> > +	 * rtinherit and extszinherit flags set.  If the misaligned hint is
> > +	 * propagated from a directory into a new realtime file, new file
> > +	 * allocations will fail due to math errors in the rt allocator and/or
> > +	 * trip the verifiers.  Validate the hint settings in the new file so
> > +	 * that we don't let broken hints propagate.
> > +	 */
> > +	failaddr = xfs_inode_validate_extsize(ip->i_mount, ip->i_extsize,
> > +			VFS_I(ip)->i_mode, ip->i_diflags);
> > +	if (failaddr) {
> > +		ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE |
> > +				   XFS_DIFLAG_EXTSZINHERIT);
> > +		ip->i_extsize = 0;
> > +	}
> >  }
> >  
> >  /* Propagate di_flags2 from a parent inode to a child inode. */
> > @@ -737,12 +756,22 @@ xfs_inode_inherit_flags2(
> >  	struct xfs_inode	*ip,
> >  	const struct xfs_inode	*pip)
> >  {
> > +	xfs_failaddr_t		failaddr;
> > +
> >  	if (pip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) {
> >  		ip->i_diflags2 |= XFS_DIFLAG2_COWEXTSIZE;
> >  		ip->i_cowextsize = pip->i_cowextsize;
> >  	}
> >  	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
> >  		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
> > +
> > +	/* Don't let invalid cowextsize hints propagate. */
> > +	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
> > +			VFS_I(ip)->i_mode, ip->i_diflags, ip->i_diflags2);
> > +	if (failaddr) {
> > +		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
> > +		ip->i_cowextsize = 0;
> > +	}
> >  }
> >  
> >  /*
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 6407921aca96..1fe4c1fc0aea 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1291,6 +1291,21 @@ xfs_ioctl_setattr_check_extsize(
> >  
> >  	new_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
> >  
> > +	/*
> > +	 * Inode verifiers on older kernels don't check that the extent size
> > +	 * hint is an integer multiple of the rt extent size on a directory
> > +	 * with both rtinherit and extszinherit flags set.  Don't let sysadmins
> > +	 * misconfigure directories.
> > +	 */
> > +	if ((new_diflags & XFS_DIFLAG_RTINHERIT) &&
> > +	    (new_diflags & XFS_DIFLAG_EXTSZINHERIT)) {
> > +		unsigned int	rtextsize_bytes;
> > +
> > +		rtextsize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
> > +		if (fa->fsx_extsize % rtextsize_bytes)
> > +			return -EINVAL;
> > +	}
> > +
> >  	failaddr = xfs_inode_validate_extsize(ip->i_mount,
> >  			XFS_B_TO_FSB(mp, fa->fsx_extsize),
> >  			VFS_I(ip)->i_mode, new_diflags);
> > 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-05-25 17:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-25  6:15 [PATCH v3] xfs: validate extsz hints against rt extent size when rtinherit is set Darrick J. Wong
2021-05-25 10:49 ` Carlos Maiolino
2021-05-25 17:20   ` Darrick J. Wong
2021-05-25 11:54 ` Brian Foster
2021-05-25 17:29   ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.