linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: xfs <linux-xfs@vger.kernel.org>, Alex Lyakas <alex@zadara.com>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
Date: Thu, 5 Dec 2019 09:36:18 -0500	[thread overview]
Message-ID: <20191205143618.GA48368@bfoster> (raw)
In-Reply-To: <20191204170340.GR7335@magnolia>

On Wed, Dec 04, 2019 at 09:03:40AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
> and swidth values could cause xfs_repair to fail loudly.  The problem
> here is that repair calculates the where mkfs should have allocated the
> root inode, based on the superblock geometry.  The allocation decisions
> depend on sunit, which means that we really can't go updating sunit if
> it would lead to a subsequent repair failure on an otherwise correct
> filesystem.
> 
> Port the computation code from xfs_repair and teach mount to avoid the
> ondisk update if it would cause problems for repair.  We allow the mount
> to proceed (and new allocations will reflect this new geometry) because
> we've never screened this kind of thing before.
> 
> [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
> 
> Reported-by: Alex Lyakas <alex@zadara.com>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: compute the root inode location directly
> ---
>  fs/xfs/libxfs/xfs_ialloc.c |   81 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_ialloc.h |    1 +
>  fs/xfs/xfs_mount.c         |   51 ++++++++++++++++++----------
>  3 files changed, 115 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 988cde7744e6..6df9bcc96251 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -2909,3 +2909,84 @@ xfs_ialloc_setup_geometry(
>  	else
>  		igeo->ialloc_align = 0;
>  }
> +
> +/*
> + * Compute the location of the root directory inode that is laid out by mkfs.
> + * The @sunit parameter will be copied from the superblock if it is negative.
> + */
> +xfs_ino_t
> +xfs_ialloc_calc_rootino(
> +	struct xfs_mount	*mp,
> +	int			sunit)
> +{
> +	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
> +	xfs_agino_t		first_agino;
> +	xfs_agblock_t		first_bno;
> +
> +	if (sunit < 0)
> +		sunit = mp->m_sb.sb_unit;
> +
> +	/*
> +	 * Pre-calculate the geometry of ag 0. We know what it looks like
> +	 * because we know what mkfs does: 2 allocation btree roots (by block
> +	 * and by size), the inode allocation btree root, the free inode
> +	 * allocation btree root (if enabled) and some number of blocks to
> +	 * prefill the agfl.
> +	 *
> +	 * Because the current shape of the btrees may differ from the current
> +	 * shape, we open code the mkfs freelist block count here. mkfs creates
> +	 * single level trees, so the calculation is pretty straight forward for
> +	 * the trees that use the AGFL.
> +	 */
> +

I know this code is lifted from userspace, but.. "the current shape of
the btrees may differ from the current shape, .." Eh?

> +	/* free space by block btree root comes after the ag headers */
> +	first_bno = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
> +
> +	/* free space by length btree root */
> +	first_bno += 1;
> +
> +	/* inode btree root */
> +	first_bno += 1;
> +
> +	/* agfl */
> +	first_bno += (2 * min_t(xfs_agblock_t, 2, mp->m_ag_maxlevels)) + 1;
> +

This is a little subtle from the userspace code. The extra +1 here is
where we go from pointing at metadata blocks (i.e. bnobt root) to the
first free block (i.e., past metadata blocks), right? If so, I wonder if
this should be incorporated either at the beginning or end with a
comment for explanation (i.e. "Start by pointing at the first block
after the AG headers and increment by size of applicable metadata to
locate the first free block ...").

I'm guessing this is a historical artifact of the userspace code as
features were added. The fact that userspace uses different variables
somewhat helps self-document in that context, and we lose that here.

> +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> +		first_bno++;
> +
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +		first_bno++;
> +		/* agfl blocks */
> +		first_bno += min_t(xfs_agblock_t, 2, mp->m_rmap_maxlevels);
> +	}
> +
> +	if (xfs_sb_version_hasreflink(&mp->m_sb))
> +		first_bno++;
> +
> +	/*
> +	 * If the log is allocated in the first allocation group we need to
> +	 * add the number of blocks used by the log to the above calculation.
> +	 *
> +	 * This can happens with filesystems that only have a single
> +	 * allocation group, or very odd geometries created by old mkfs
> +	 * versions on very small filesystems.
> +	 */
> +	if (mp->m_sb.sb_logstart &&
> +	    XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == 0)
> +		 first_bno += mp->m_sb.sb_logblocks;
> +
> +	/*
> +	 * ditto the location of the first inode chunks in the fs ('/')
> +	 */
> +	if (xfs_sb_version_hasdalign(&mp->m_sb) && igeo->ialloc_align > 0) {
> +		first_agino = XFS_AGB_TO_AGINO(mp, roundup(first_bno, sunit));
> +	} else if (xfs_sb_version_hasalign(&mp->m_sb) &&
> +		   mp->m_sb.sb_inoalignmt > 1)  {
> +		first_agino = XFS_AGB_TO_AGINO(mp,
> +				roundup(first_bno, mp->m_sb.sb_inoalignmt));
> +	} else  {
> +		first_agino = XFS_AGB_TO_AGINO(mp, first_bno);
> +	}
> +
> +	return XFS_AGINO_TO_INO(mp, 0, first_agino);
> +}
> diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> index 323592d563d5..72b3468b97b1 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.h
> +++ b/fs/xfs/libxfs/xfs_ialloc.h
> @@ -152,5 +152,6 @@ int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
>  
>  int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
>  void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
> +xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
>  
>  #endif	/* __XFS_IALLOC_H__ */
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index fca65109cf24..a4eb3ae34a84 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -363,9 +363,10 @@ xfs_readsb(
>   * Update alignment values based on mount options and sb values
>   */
>  STATIC int
> -xfs_update_alignment(xfs_mount_t *mp)
> +xfs_update_alignment(
> +	struct xfs_mount	*mp)
>  {
> -	xfs_sb_t	*sbp = &(mp->m_sb);
> +	struct xfs_sb		*sbp = &mp->m_sb;
>  
>  	if (mp->m_dalign) {
>  		/*
> @@ -398,28 +399,42 @@ xfs_update_alignment(xfs_mount_t *mp)
>  			}
>  		}
>  
> -		/*
> -		 * Update superblock with new values
> -		 * and log changes
> -		 */
> -		if (xfs_sb_version_hasdalign(sbp)) {
> -			if (sbp->sb_unit != mp->m_dalign) {
> -				sbp->sb_unit = mp->m_dalign;
> -				mp->m_update_sb = true;
> -			}
> -			if (sbp->sb_width != mp->m_swidth) {
> -				sbp->sb_width = mp->m_swidth;
> -				mp->m_update_sb = true;
> -			}
> -		} else {
> +		/* Update superblock with new values and log changes. */
> +		if (!xfs_sb_version_hasdalign(sbp)) {
>  			xfs_warn(mp,
>  	"cannot change alignment: superblock does not support data alignment");
>  			return -EINVAL;
>  		}
> +
> +		if (sbp->sb_unit == mp->m_dalign &&
> +		    sbp->sb_width == mp->m_swidth)
> +			return 0;
> +
> +		/*
> +		 * If the sunit/swidth change would move the precomputed root
> +		 * inode value, we must reject the ondisk change because repair
> +		 * will stumble over that.  However, we allow the mount to
> +		 * proceed because we never rejected this combination before.
> +		 */
> +		if (sbp->sb_rootino !=
> +		    xfs_ialloc_calc_rootino(mp, mp->m_dalign)) {
> +			xfs_warn(mp,
> +	"cannot change stripe alignment: would require moving root inode");
> +

FWIW, I read this error message as the mount option was ignored. I don't
much care whether we ignore the mount option or simply the on-disk
update, but the error could be a bit more clear in the latter case.
Also, what is the expected behavior for xfs_info in the latter
situation?

Brian

> +			/*
> +			 * XXX: Next time we add a new incompat feature, this
> +			 * should start returning -EINVAL.
> +			 */
> +			return 0;
> +		}
> +
> +		sbp->sb_unit = mp->m_dalign;
> +		sbp->sb_width = mp->m_swidth;
> +		mp->m_update_sb = true;
>  	} else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
>  		    xfs_sb_version_hasdalign(&mp->m_sb)) {
> -			mp->m_dalign = sbp->sb_unit;
> -			mp->m_swidth = sbp->sb_width;
> +		mp->m_dalign = sbp->sb_unit;
> +		mp->m_swidth = sbp->sb_width;
>  	}
>  
>  	return 0;
> 


  reply	other threads:[~2019-12-05 14:36 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-04 17:03 [PATCH] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
2019-12-05 14:36 ` Brian Foster [this message]
2019-12-05 21:42   ` Darrick J. Wong
2019-12-06 10:57     ` Brian Foster
2019-12-13 16:10       ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191205143618.GA48368@bfoster \
    --to=bfoster@redhat.com \
    --cc=alex@zadara.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).