From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: xfs <linux-xfs@vger.kernel.org>, Alex Lyakas <alex@zadara.com>
Subject: Re: [RFC PATCH] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
Date: Tue, 3 Dec 2019 08:21:40 +1100 [thread overview]
Message-ID: <20191202212140.GG2695@dread.disaster.area> (raw)
In-Reply-To: <20191202173538.GD7335@magnolia>
On Mon, Dec 02, 2019 at 09:35:38AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
> and swidth values could cause xfs_repair to fail loudly. The problem
> here is that repair calculates the where mkfs should have allocated the
> root inode, based on the superblock geometry. The allocation decisions
> depend on sunit, which means that we really can't go updating sunit if
> it would lead to a subsequent repair failure on an otherwise correct
> filesystem.
>
> Port the computation code from xfs_repair and teach mount to avoid the
> ondisk update if it would cause problems for repair. We allow the mount
> to proceed (and new allocations will reflect this new geometry) because
> we've never screened this kind of thing before.
>
> [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
....
> +/*
> + * Compute the first and last inodes numbers of the inode chunk that was
> + * preallocated for the root directory.
> + */
> +void
> +xfs_ialloc_find_prealloc(
> + struct xfs_mount *mp,
> + xfs_agino_t *first_agino,
> + xfs_agino_t *last_agino)
> +{
> + struct xfs_ino_geometry *igeo = M_IGEO(mp);
> + xfs_agblock_t first_bno;
> +
> + /*
> + * Pre-calculate the geometry of ag 0. We know what it looks like
> + * because we know what mkfs does: 2 allocation btree roots (by block
> + * and by size), the inode allocation btree root, the free inode
> + * allocation btree root (if enabled) and some number of blocks to
> + * prefill the agfl.
> + *
> + * Because the current shape of the btrees may differ from the current
> + * shape, we open code the mkfs freelist block count here. mkfs creates
> + * single level trees, so the calculation is pertty straight forward for
pretty.
> + * the trees that use the AGFL.
> + */
> +
> + /* free space by block btree root comes after the ag headers */
> + first_bno = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
> +
> + /* free space by length btree root */
> + first_bno += 1;
> +
> + /* inode btree root */
> + first_bno += 1;
> +
> + /* agfl */
> + first_bno += (2 * min(2U, mp->m_ag_maxlevels)) + 1;
min_t(xfs_agblock_t, 2, mp->m_ag_maxlevels) ?
> +
> + if (xfs_sb_version_hasfinobt(&mp->m_sb))
> + first_bno++;
> +
> + if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> + first_bno += min(2U, mp->m_rmap_maxlevels); /* agfl blocks */
same.
> + first_bno++;
> + }
> +
> + if (xfs_sb_version_hasreflink(&mp->m_sb))
> + first_bno++;
> +
> + /*
> + * If the log is allocated in the first allocation group we need to
> + * add the number of blocks used by the log to the above calculation.
> + *
> + * This can happens with filesystems that only have a single
> + * allocation group, or very odd geometries created by old mkfs
> + * versions on very small filesystems.
> + */
> + if (mp->m_sb.sb_logstart &&
> + XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == 0) {
> +
> + /*
> + * XXX(hch): verify that sb_logstart makes sense?
> + */
> + first_bno += mp->m_sb.sb_logblocks;
> + }
> +
> + /*
> + * ditto the location of the first inode chunks in the fs ('/')
> + */
> + if (xfs_sb_version_hasdalign(&mp->m_sb) && igeo->ialloc_align > 0) {
> + *first_agino = XFS_AGB_TO_AGINO(mp,
> + roundup(first_bno, mp->m_sb.sb_unit));
> + } else if (xfs_sb_version_hasalign(&mp->m_sb) &&
> + mp->m_sb.sb_inoalignmt > 1) {
> + *first_agino = XFS_AGB_TO_AGINO(mp,
> + roundup(first_bno, mp->m_sb.sb_inoalignmt));
> + } else {
> + *first_agino = XFS_AGB_TO_AGINO(mp, first_bno);
> + }
> +
> + ASSERT(igeo->ialloc_blks > 0);
> +
> + if (igeo->ialloc_blks > 1)
> + *last_agino = *first_agino + XFS_INODES_PER_CHUNK;
> + else
> + *last_agino = XFS_AGB_TO_AGINO(mp, first_bno + 1);
Isn't last_agino of the first inode of the next chunk? i.e. this is
an off-by-one...
> +}
> diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> index 323592d563d5..9d9fe7b488b8 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.h
> +++ b/fs/xfs/libxfs/xfs_ialloc.h
> @@ -152,5 +152,7 @@ int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
>
> int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
> void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
> +void xfs_ialloc_find_prealloc(struct xfs_mount *mp, xfs_agino_t *first_agino,
> + xfs_agino_t *last_agino);
>
> #endif /* __XFS_IALLOC_H__ */
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 7b35d62ede9f..d830a9e13817 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -891,6 +891,9 @@ xfs_ioc_fsgeometry(
>
> xfs_fs_geometry(&mp->m_sb, &fsgeo, struct_version);
>
> + fsgeo.sunit = mp->m_sb.sb_unit;
> + fsgeo.swidth = mp->m_sb.sb_width;
Why?
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index fca65109cf24..0323a89256c7 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -368,6 +368,11 @@ xfs_update_alignment(xfs_mount_t *mp)
> xfs_sb_t *sbp = &(mp->m_sb);
>
> if (mp->m_dalign) {
> + uint32_t old_su;
> + uint32_t old_sw;
> + xfs_agino_t first;
> + xfs_agino_t last;
> +
> /*
> * If stripe unit and stripe width are not multiples
> * of the fs blocksize turn off alignment.
> @@ -398,24 +403,38 @@ xfs_update_alignment(xfs_mount_t *mp)
> }
> }
>
> - /*
> - * Update superblock with new values
> - * and log changes
> - */
> - if (xfs_sb_version_hasdalign(sbp)) {
> - if (sbp->sb_unit != mp->m_dalign) {
> - sbp->sb_unit = mp->m_dalign;
> - mp->m_update_sb = true;
> - }
> - if (sbp->sb_width != mp->m_swidth) {
> - sbp->sb_width = mp->m_swidth;
> - mp->m_update_sb = true;
> - }
> - } else {
> + /* Update superblock with new values and log changes. */
> + if (!xfs_sb_version_hasdalign(sbp)) {
> xfs_warn(mp,
> "cannot change alignment: superblock does not support data alignment");
> return -EINVAL;
> }
> +
> + if (sbp->sb_unit == mp->m_dalign &&
> + sbp->sb_width == mp->m_swidth)
> + return 0;
> +
> + old_su = sbp->sb_unit;
> + old_sw = sbp->sb_width;
> + sbp->sb_unit = mp->m_dalign;
> + sbp->sb_width = mp->m_swidth;
> + xfs_ialloc_find_prealloc(mp, &first, &last);
We just chuck last away? why calculate it then? And why not just
pass mp->m_dalign/mp->m_swidth into the function rather than setting
them in the sb and then having to undo the change? i.e.
rootino = xfs_ialloc_calc_rootino(mp, mp->m_dalign, mp->m_swidth);
if (sbp->sb_rootino != rootino) {
.....
}
> +
> + /*
> + * If the sunit/swidth change would move the precomputed root
> + * inode value, we must reject the ondisk change because repair
> + * will stumble over that. However, we allow the mount to
> + * proceed because we never rejected this combination before.
> + */
> + if (sbp->sb_rootino != XFS_AGINO_TO_INO(mp, 0, first)) {
> + sbp->sb_unit = old_su;
> + sbp->sb_width = old_sw;
> + xfs_warn(mp,
> + "cannot change alignment: would require moving root inode");
"cannot change stripe alignment: ..." ?
Should this also return EINVAL, as per above when the DALIGN sb
feature bit is not set?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2019-12-02 21:21 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-02 17:35 [RFC PATCH] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
2019-12-02 21:21 ` Dave Chinner [this message]
2019-12-03 2:30 ` Darrick J. Wong
2019-12-03 21:21 ` Dave Chinner
2019-12-04 16:51 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191202212140.GG2695@dread.disaster.area \
--to=david@fromorbit.com \
--cc=alex@zadara.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).