linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
@ 2019-12-18 19:37 Darrick J. Wong
  2019-12-18 19:37 ` [PATCH 1/3] xfs: refactor agfl length computation function Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Darrick J. Wong @ 2019-12-18 19:37 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster, david, alex

Hi all,

Alex Lyakas discovered that it's possible to mount a filesystem with
sunit/swidth options that will cause a subsequent xfs_repair run to fail
to find the root directory.  This series adds some code to strengthen
the kernel's mount option checking to avoid updating the superblock if
this happens.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=sunit-updates

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] xfs: refactor agfl length computation function
  2019-12-18 19:37 [PATCH v5 0/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
@ 2019-12-18 19:37 ` Darrick J. Wong
  2019-12-19 13:17   ` Brian Foster
  2019-12-24  8:31   ` Christoph Hellwig
  2019-12-18 19:37 ` [PATCH 2/3] xfs: split the sunit parameter update into two parts Darrick J. Wong
  2019-12-18 19:37 ` [PATCH 3/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
  2 siblings, 2 replies; 9+ messages in thread
From: Darrick J. Wong @ 2019-12-18 19:37 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster, david, alex

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor xfs_alloc_min_freelist to accept a NULL @pag argument, in which
case it returns the largest possible minimum length.  This will be used
in an upcoming patch to compute the length of the AGFL at mkfs time.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index c284e10af491..fc93fd88ec89 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2248,24 +2248,32 @@ xfs_alloc_longest_free_extent(
 	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
 }
 
+/*
+ * Compute the minimum length of the AGFL in the given AG.  If @pag is NULL,
+ * return the largest possible minimum length.
+ */
 unsigned int
 xfs_alloc_min_freelist(
 	struct xfs_mount	*mp,
 	struct xfs_perag	*pag)
 {
+	/* AG btrees have at least 1 level. */
+	static const uint8_t	fake_levels[XFS_BTNUM_AGF] = {1, 1, 1};
+	const uint8_t		*levels = pag ? pag->pagf_levels : fake_levels;
 	unsigned int		min_free;
 
+	ASSERT(mp->m_ag_maxlevels > 0);
+
 	/* space needed by-bno freespace btree */
-	min_free = min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_BNOi] + 1,
+	min_free = min_t(unsigned int, levels[XFS_BTNUM_BNOi] + 1,
 				       mp->m_ag_maxlevels);
 	/* space needed by-size freespace btree */
-	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
+	min_free += min_t(unsigned int, levels[XFS_BTNUM_CNTi] + 1,
 				       mp->m_ag_maxlevels);
 	/* space needed reverse mapping used space btree */
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
-		min_free += min_t(unsigned int,
-				  pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
-				  mp->m_rmap_maxlevels);
+		min_free += min_t(unsigned int, levels[XFS_BTNUM_RMAPi] + 1,
+						mp->m_rmap_maxlevels);
 
 	return min_free;
 }


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] xfs: split the sunit parameter update into two parts
  2019-12-18 19:37 [PATCH v5 0/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
  2019-12-18 19:37 ` [PATCH 1/3] xfs: refactor agfl length computation function Darrick J. Wong
@ 2019-12-18 19:37 ` Darrick J. Wong
  2019-12-19 13:18   ` Brian Foster
  2019-12-18 19:37 ` [PATCH 3/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
  2 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2019-12-18 19:37 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster, david, alex

From: Darrick J. Wong <darrick.wong@oracle.com>

If the administrator provided a sunit= mount option, we need to validate
the raw parameter, convert the mount option units (512b blocks) into the
internal unit (fs blocks), and then validate that the (now cooked)
parameter doesn't screw anything up on disk.  The incore inode geometry
computation can depend on the new sunit option, but a subsequent patch
will make validating the cooked value depends on the computed inode
geometry, so break the sunit update into two steps.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_mount.c |  126 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 75 insertions(+), 51 deletions(-)


diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index fca65109cf24..3750f0074c74 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -360,66 +360,79 @@ xfs_readsb(
 }
 
 /*
- * Update alignment values based on mount options and sb values
+ * If we were provided with new sunit/swidth values as mount options, make sure
+ * that they pass basic alignment and superblock feature checks, and convert
+ * them into the same units (FSB) that everything else expects.  This step
+ * /must/ be done before computing the inode geometry.
  */
 STATIC int
-xfs_update_alignment(xfs_mount_t *mp)
+xfs_validate_new_dalign(
+	struct xfs_mount	*mp)
 {
-	xfs_sb_t	*sbp = &(mp->m_sb);
+	if (mp->m_dalign == 0)
+		return 0;
 
-	if (mp->m_dalign) {
+	/*
+	 * If stripe unit and stripe width are not multiples
+	 * of the fs blocksize turn off alignment.
+	 */
+	if ((BBTOB(mp->m_dalign) & mp->m_blockmask) ||
+	    (BBTOB(mp->m_swidth) & mp->m_blockmask)) {
+		xfs_warn(mp,
+	"alignment check failed: sunit/swidth vs. blocksize(%d)",
+			mp->m_sb.sb_blocksize);
+		return -EINVAL;
+	} else {
 		/*
-		 * If stripe unit and stripe width are not multiples
-		 * of the fs blocksize turn off alignment.
+		 * Convert the stripe unit and width to FSBs.
 		 */
-		if ((BBTOB(mp->m_dalign) & mp->m_blockmask) ||
-		    (BBTOB(mp->m_swidth) & mp->m_blockmask)) {
+		mp->m_dalign = XFS_BB_TO_FSBT(mp, mp->m_dalign);
+		if (mp->m_dalign && (mp->m_sb.sb_agblocks % mp->m_dalign)) {
 			xfs_warn(mp,
-		"alignment check failed: sunit/swidth vs. blocksize(%d)",
-				sbp->sb_blocksize);
+		"alignment check failed: sunit/swidth vs. agsize(%d)",
+				 mp->m_sb.sb_agblocks);
 			return -EINVAL;
-		} else {
-			/*
-			 * Convert the stripe unit and width to FSBs.
-			 */
-			mp->m_dalign = XFS_BB_TO_FSBT(mp, mp->m_dalign);
-			if (mp->m_dalign && (sbp->sb_agblocks % mp->m_dalign)) {
-				xfs_warn(mp,
-			"alignment check failed: sunit/swidth vs. agsize(%d)",
-					 sbp->sb_agblocks);
-				return -EINVAL;
-			} else if (mp->m_dalign) {
-				mp->m_swidth = XFS_BB_TO_FSBT(mp, mp->m_swidth);
-			} else {
-				xfs_warn(mp,
-			"alignment check failed: sunit(%d) less than bsize(%d)",
-					 mp->m_dalign, sbp->sb_blocksize);
-				return -EINVAL;
-			}
-		}
-
-		/*
-		 * Update superblock with new values
-		 * and log changes
-		 */
-		if (xfs_sb_version_hasdalign(sbp)) {
-			if (sbp->sb_unit != mp->m_dalign) {
-				sbp->sb_unit = mp->m_dalign;
-				mp->m_update_sb = true;
-			}
-			if (sbp->sb_width != mp->m_swidth) {
-				sbp->sb_width = mp->m_swidth;
-				mp->m_update_sb = true;
-			}
+		} else if (mp->m_dalign) {
+			mp->m_swidth = XFS_BB_TO_FSBT(mp, mp->m_swidth);
 		} else {
 			xfs_warn(mp,
-	"cannot change alignment: superblock does not support data alignment");
+		"alignment check failed: sunit(%d) less than bsize(%d)",
+				 mp->m_dalign, mp->m_sb.sb_blocksize);
 			return -EINVAL;
 		}
+	}
+
+	/* Update superblock with new values and log changes. */
+	if (!xfs_sb_version_hasdalign(&mp->m_sb)) {
+		xfs_warn(mp,
+"cannot change alignment: superblock does not support data alignment");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Update alignment values based on mount options and sb values
+ */
+STATIC int
+xfs_update_alignment(
+	struct xfs_mount	*mp)
+{
+	struct xfs_sb		*sbp = &mp->m_sb;
+
+	if (mp->m_dalign) {
+		if (sbp->sb_unit == mp->m_dalign &&
+		    sbp->sb_width == mp->m_swidth)
+			return 0;
+
+		sbp->sb_unit = mp->m_dalign;
+		sbp->sb_width = mp->m_swidth;
+		mp->m_update_sb = true;
 	} else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
 		    xfs_sb_version_hasdalign(&mp->m_sb)) {
-			mp->m_dalign = sbp->sb_unit;
-			mp->m_swidth = sbp->sb_width;
+		mp->m_dalign = sbp->sb_unit;
+		mp->m_swidth = sbp->sb_width;
 	}
 
 	return 0;
@@ -648,12 +661,12 @@ xfs_mountfs(
 	}
 
 	/*
-	 * Check if sb_agblocks is aligned at stripe boundary
-	 * If sb_agblocks is NOT aligned turn off m_dalign since
-	 * allocator alignment is within an ag, therefore ag has
-	 * to be aligned at stripe boundary.
+	 * If we were given new sunit/swidth options, do some basic validation
+	 * checks and convert the incore dalign and swidth values to the
+	 * same units (FSB) that everything else uses.  This /must/ happen
+	 * before computing the inode geometry.
 	 */
-	error = xfs_update_alignment(mp);
+	error = xfs_validate_new_dalign(mp);
 	if (error)
 		goto out;
 
@@ -664,6 +677,17 @@ xfs_mountfs(
 	xfs_rmapbt_compute_maxlevels(mp);
 	xfs_refcountbt_compute_maxlevels(mp);
 
+	/*
+	 * Check if sb_agblocks is aligned at stripe boundary.  If sb_agblocks
+	 * is NOT aligned turn off m_dalign since allocator alignment is within
+	 * an ag, therefore ag has to be aligned at stripe boundary.  Note that
+	 * we must compute the free space and rmap btree geometry before doing
+	 * this.
+	 */
+	error = xfs_update_alignment(mp);
+	if (error)
+		goto out;
+
 	/* enable fail_at_unmount as default */
 	mp->m_fail_unmount = true;
 


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
  2019-12-18 19:37 [PATCH v5 0/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
  2019-12-18 19:37 ` [PATCH 1/3] xfs: refactor agfl length computation function Darrick J. Wong
  2019-12-18 19:37 ` [PATCH 2/3] xfs: split the sunit parameter update into two parts Darrick J. Wong
@ 2019-12-18 19:37 ` Darrick J. Wong
  2019-12-19 13:21   ` Brian Foster
  2 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2019-12-18 19:37 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster, david, alex

From: Darrick J. Wong <darrick.wong@oracle.com>

Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
and swidth values could cause xfs_repair to fail loudly.  The problem
here is that repair calculates the where mkfs should have allocated the
root inode, based on the superblock geometry.  The allocation decisions
depend on sunit, which means that we really can't go updating sunit if
it would lead to a subsequent repair failure on an otherwise correct
filesystem.

Port from xfs_repair some code that computes the location of the root
inode and teach mount to skip the ondisk update if it would cause
problems for repair.  Along the way we'll update the documentation,
provide a function for computing the minimum AGFL size instead of
open-coding it, and cut down some indenting in the mount code.

Note that we allow the mount to proceed (and new allocations will
reflect this new geometry) because we've never screened this kind of
thing before.  We'll have to wait for a new future incompat feature to
enforce correct behavior, alas.

Note that the geometry reporting always uses the superblock values, not
the incore ones, so that is what xfs_info and xfs_growfs will report.

[1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d

Reported-by: Alex Lyakas <alex@zadara.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_ialloc.c |   70 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_ialloc.h |    1 +
 fs/xfs/xfs_mount.c         |   45 ++++++++++++++++++++++++++++
 fs/xfs/xfs_trace.h         |   21 +++++++++++++
 4 files changed, 136 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 988cde7744e6..eeec7c8d93fd 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2909,3 +2909,73 @@ xfs_ialloc_setup_geometry(
 	else
 		igeo->ialloc_align = 0;
 }
+
+/*
+ * Compute the location of the root directory inode that is laid out by mkfs.
+ * The @sunit parameter will be copied from the superblock if it is negative.
+ */
+xfs_ino_t
+xfs_ialloc_calc_rootino(
+	struct xfs_mount	*mp,
+	int			sunit)
+{
+	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
+	xfs_agblock_t		first_bno;
+
+	if (sunit < 0)
+		sunit = mp->m_sb.sb_unit;
+
+	/*
+	 * Pre-calculate the geometry of AG 0.  We know what it looks like
+	 * because libxfs knows how to create allocation groups now.
+	 *
+	 * first_bno is the first block in which mkfs could possibly have
+	 * allocated the root directory inode, once we factor in the metadata
+	 * that mkfs formats before it.  Namely, the four AG headers...
+	 */
+	first_bno = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
+
+	/* ...the two free space btree roots... */
+	first_bno += 2;
+
+	/* ...the inode btree root... */
+	first_bno += 1;
+
+	/* ...the initial AGFL... */
+	first_bno += xfs_alloc_min_freelist(mp, NULL);
+
+	/* ...the free inode btree root... */
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		first_bno++;
+
+	/* ...the reverse mapping btree root... */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		first_bno++;
+
+	/* ...the reference count btree... */
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		first_bno++;
+
+	/*
+	 * ...and the log, if it is allocated in the first allocation group.
+	 *
+	 * This can happen with filesystems that only have a single
+	 * allocation group, or very odd geometries created by old mkfs
+	 * versions on very small filesystems.
+	 */
+	if (mp->m_sb.sb_logstart &&
+	    XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == 0)
+		 first_bno += mp->m_sb.sb_logblocks;
+
+	/*
+	 * Now round first_bno up to whatever allocation alignment is given
+	 * by the filesystem or was passed in.
+	 */
+	if (xfs_sb_version_hasdalign(&mp->m_sb) && igeo->ialloc_align > 0)
+		first_bno = roundup(first_bno, sunit);
+	else if (xfs_sb_version_hasalign(&mp->m_sb) &&
+			mp->m_sb.sb_inoalignmt > 1)
+		first_bno = roundup(first_bno, mp->m_sb.sb_inoalignmt);
+
+	return XFS_AGINO_TO_INO(mp, 0, XFS_AGB_TO_AGINO(mp, first_bno));
+}
diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
index 323592d563d5..72b3468b97b1 100644
--- a/fs/xfs/libxfs/xfs_ialloc.h
+++ b/fs/xfs/libxfs/xfs_ialloc.h
@@ -152,5 +152,6 @@ int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
 
 int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
 void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
+xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
 
 #endif	/* __XFS_IALLOC_H__ */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 3750f0074c74..7f94c6b20920 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -31,7 +31,7 @@
 #include "xfs_reflink.h"
 #include "xfs_extent_busy.h"
 #include "xfs_health.h"
-
+#include "xfs_trace.h"
 
 static DEFINE_MUTEX(xfs_uuid_table_mutex);
 static int xfs_uuid_table_size;
@@ -359,6 +359,42 @@ xfs_readsb(
 	return error;
 }
 
+/*
+ * If the sunit/swidth change would move the precomputed root inode value, we
+ * must reject the ondisk change because repair will stumble over that.
+ * However, we allow the mount to proceed because we never rejected this
+ * combination before.  Returns true to update the sb, false otherwise.
+ */
+static inline int
+xfs_check_new_dalign(
+	struct xfs_mount	*mp,
+	int			new_dalign,
+	bool			*update_sb)
+{
+	struct xfs_sb		*sbp = &mp->m_sb;
+	xfs_ino_t		calc_ino;
+
+	calc_ino = xfs_ialloc_calc_rootino(mp, new_dalign);
+	trace_xfs_check_new_dalign(mp, new_dalign, calc_ino);
+
+	if (sbp->sb_rootino == calc_ino) {
+		*update_sb = true;
+		return 0;
+	}
+
+	xfs_warn(mp,
+"Cannot change stripe alignment; would require moving root inode.");
+
+	/*
+	 * XXX: Next time we add a new incompat feature, this should start
+	 * returning -EINVAL to fail the mount.  Until then, spit out a warning
+	 * that we're ignoring the administrator's instructions.
+	 */
+	xfs_warn(mp, "Skipping superblock stripe alignment update.");
+	*update_sb = false;
+	return 0;
+}
+
 /*
  * If we were provided with new sunit/swidth values as mount options, make sure
  * that they pass basic alignment and superblock feature checks, and convert
@@ -422,10 +458,17 @@ xfs_update_alignment(
 	struct xfs_sb		*sbp = &mp->m_sb;
 
 	if (mp->m_dalign) {
+		bool		update_sb;
+		int		error;
+
 		if (sbp->sb_unit == mp->m_dalign &&
 		    sbp->sb_width == mp->m_swidth)
 			return 0;
 
+		error = xfs_check_new_dalign(mp, mp->m_dalign, &update_sb);
+		if (error || !update_sb)
+			return error;
+
 		sbp->sb_unit = mp->m_dalign;
 		sbp->sb_width = mp->m_swidth;
 		mp->m_update_sb = true;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index c13bb3655e48..a86be7f807ee 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3573,6 +3573,27 @@ DEFINE_KMEM_EVENT(kmem_alloc_large);
 DEFINE_KMEM_EVENT(kmem_realloc);
 DEFINE_KMEM_EVENT(kmem_zone_alloc);
 
+TRACE_EVENT(xfs_check_new_dalign,
+	TP_PROTO(struct xfs_mount *mp, int new_dalign, xfs_ino_t calc_rootino),
+	TP_ARGS(mp, new_dalign, calc_rootino),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(int, new_dalign)
+		__field(xfs_ino_t, sb_rootino)
+		__field(xfs_ino_t, calc_rootino)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->new_dalign = new_dalign;
+		__entry->sb_rootino = mp->m_sb.sb_rootino;
+		__entry->calc_rootino = calc_rootino;
+	),
+	TP_printk("dev %d:%d new_dalign %d sb_rootino %llu calc_rootino %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->new_dalign, __entry->sb_rootino,
+		  __entry->calc_rootino)
+)
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] xfs: refactor agfl length computation function
  2019-12-18 19:37 ` [PATCH 1/3] xfs: refactor agfl length computation function Darrick J. Wong
@ 2019-12-19 13:17   ` Brian Foster
  2019-12-24  8:31   ` Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Brian Foster @ 2019-12-19 13:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, alex

On Wed, Dec 18, 2019 at 11:37:28AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Refactor xfs_alloc_min_freelist to accept a NULL @pag argument, in which
> case it returns the largest possible minimum length.  This will be used
> in an upcoming patch to compute the length of the AGFL at mkfs time.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_alloc.c |   18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index c284e10af491..fc93fd88ec89 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2248,24 +2248,32 @@ xfs_alloc_longest_free_extent(
>  	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
>  }
>  
> +/*
> + * Compute the minimum length of the AGFL in the given AG.  If @pag is NULL,
> + * return the largest possible minimum length.
> + */
>  unsigned int
>  xfs_alloc_min_freelist(
>  	struct xfs_mount	*mp,
>  	struct xfs_perag	*pag)
>  {
> +	/* AG btrees have at least 1 level. */
> +	static const uint8_t	fake_levels[XFS_BTNUM_AGF] = {1, 1, 1};
> +	const uint8_t		*levels = pag ? pag->pagf_levels : fake_levels;
>  	unsigned int		min_free;
>  
> +	ASSERT(mp->m_ag_maxlevels > 0);
> +
>  	/* space needed by-bno freespace btree */
> -	min_free = min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_BNOi] + 1,
> +	min_free = min_t(unsigned int, levels[XFS_BTNUM_BNOi] + 1,
>  				       mp->m_ag_maxlevels);
>  	/* space needed by-size freespace btree */
> -	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
> +	min_free += min_t(unsigned int, levels[XFS_BTNUM_CNTi] + 1,
>  				       mp->m_ag_maxlevels);
>  	/* space needed reverse mapping used space btree */
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> -		min_free += min_t(unsigned int,
> -				  pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
> -				  mp->m_rmap_maxlevels);
> +		min_free += min_t(unsigned int, levels[XFS_BTNUM_RMAPi] + 1,
> +						mp->m_rmap_maxlevels);
>  
>  	return min_free;
>  }
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] xfs: split the sunit parameter update into two parts
  2019-12-18 19:37 ` [PATCH 2/3] xfs: split the sunit parameter update into two parts Darrick J. Wong
@ 2019-12-19 13:18   ` Brian Foster
  0 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2019-12-19 13:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, alex

On Wed, Dec 18, 2019 at 11:37:35AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> If the administrator provided a sunit= mount option, we need to validate
> the raw parameter, convert the mount option units (512b blocks) into the
> internal unit (fs blocks), and then validate that the (now cooked)
> parameter doesn't screw anything up on disk.  The incore inode geometry
> computation can depend on the new sunit option, but a subsequent patch
> will make validating the cooked value depends on the computed inode
> geometry, so break the sunit update into two steps.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_mount.c |  126 +++++++++++++++++++++++++++++++---------------------
>  1 file changed, 75 insertions(+), 51 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index fca65109cf24..3750f0074c74 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -360,66 +360,79 @@ xfs_readsb(
>  }
>  
>  /*
> - * Update alignment values based on mount options and sb values
> + * If we were provided with new sunit/swidth values as mount options, make sure
> + * that they pass basic alignment and superblock feature checks, and convert
> + * them into the same units (FSB) that everything else expects.  This step
> + * /must/ be done before computing the inode geometry.
>   */
>  STATIC int
> -xfs_update_alignment(xfs_mount_t *mp)
> +xfs_validate_new_dalign(
> +	struct xfs_mount	*mp)
>  {
...
> +		} else if (mp->m_dalign) {
> +			mp->m_swidth = XFS_BB_TO_FSBT(mp, mp->m_swidth);
>  		} else {
>  			xfs_warn(mp,
> -	"cannot change alignment: superblock does not support data alignment");
> +		"alignment check failed: sunit(%d) less than bsize(%d)",
> +				 mp->m_dalign, mp->m_sb.sb_blocksize);
>  			return -EINVAL;
>  		}
> +	}
> +
> +	/* Update superblock with new values and log changes. */

We can probably drop the above comment since this hunk no longer does
either. ;) With that fixed:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	if (!xfs_sb_version_hasdalign(&mp->m_sb)) {
> +		xfs_warn(mp,
> +"cannot change alignment: superblock does not support data alignment");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Update alignment values based on mount options and sb values
> + */
> +STATIC int
> +xfs_update_alignment(
> +	struct xfs_mount	*mp)
> +{
> +	struct xfs_sb		*sbp = &mp->m_sb;
> +
> +	if (mp->m_dalign) {
> +		if (sbp->sb_unit == mp->m_dalign &&
> +		    sbp->sb_width == mp->m_swidth)
> +			return 0;
> +
> +		sbp->sb_unit = mp->m_dalign;
> +		sbp->sb_width = mp->m_swidth;
> +		mp->m_update_sb = true;
>  	} else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
>  		    xfs_sb_version_hasdalign(&mp->m_sb)) {
> -			mp->m_dalign = sbp->sb_unit;
> -			mp->m_swidth = sbp->sb_width;
> +		mp->m_dalign = sbp->sb_unit;
> +		mp->m_swidth = sbp->sb_width;
>  	}
>  
>  	return 0;
> @@ -648,12 +661,12 @@ xfs_mountfs(
>  	}
>  
>  	/*
> -	 * Check if sb_agblocks is aligned at stripe boundary
> -	 * If sb_agblocks is NOT aligned turn off m_dalign since
> -	 * allocator alignment is within an ag, therefore ag has
> -	 * to be aligned at stripe boundary.
> +	 * If we were given new sunit/swidth options, do some basic validation
> +	 * checks and convert the incore dalign and swidth values to the
> +	 * same units (FSB) that everything else uses.  This /must/ happen
> +	 * before computing the inode geometry.
>  	 */
> -	error = xfs_update_alignment(mp);
> +	error = xfs_validate_new_dalign(mp);
>  	if (error)
>  		goto out;
>  
> @@ -664,6 +677,17 @@ xfs_mountfs(
>  	xfs_rmapbt_compute_maxlevels(mp);
>  	xfs_refcountbt_compute_maxlevels(mp);
>  
> +	/*
> +	 * Check if sb_agblocks is aligned at stripe boundary.  If sb_agblocks
> +	 * is NOT aligned turn off m_dalign since allocator alignment is within
> +	 * an ag, therefore ag has to be aligned at stripe boundary.  Note that
> +	 * we must compute the free space and rmap btree geometry before doing
> +	 * this.
> +	 */
> +	error = xfs_update_alignment(mp);
> +	if (error)
> +		goto out;
> +
>  	/* enable fail_at_unmount as default */
>  	mp->m_fail_unmount = true;
>  
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
  2019-12-18 19:37 ` [PATCH 3/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
@ 2019-12-19 13:21   ` Brian Foster
  2019-12-19 15:39     ` Darrick J. Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2019-12-19 13:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, alex

On Wed, Dec 18, 2019 at 11:37:41AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
> and swidth values could cause xfs_repair to fail loudly.  The problem
> here is that repair calculates the where mkfs should have allocated the
> root inode, based on the superblock geometry.  The allocation decisions
> depend on sunit, which means that we really can't go updating sunit if
> it would lead to a subsequent repair failure on an otherwise correct
> filesystem.
> 
> Port from xfs_repair some code that computes the location of the root
> inode and teach mount to skip the ondisk update if it would cause
> problems for repair.  Along the way we'll update the documentation,
> provide a function for computing the minimum AGFL size instead of
> open-coding it, and cut down some indenting in the mount code.
> 
> Note that we allow the mount to proceed (and new allocations will
> reflect this new geometry) because we've never screened this kind of
> thing before.  We'll have to wait for a new future incompat feature to
> enforce correct behavior, alas.
> 
> Note that the geometry reporting always uses the superblock values, not
> the incore ones, so that is what xfs_info and xfs_growfs will report.
> 
> [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
> 
> Reported-by: Alex Lyakas <alex@zadara.com>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_ialloc.c |   70 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_ialloc.h |    1 +
>  fs/xfs/xfs_mount.c         |   45 ++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h         |   21 +++++++++++++
>  4 files changed, 136 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 988cde7744e6..eeec7c8d93fd 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -2909,3 +2909,73 @@ xfs_ialloc_setup_geometry(
>  	else
>  		igeo->ialloc_align = 0;
>  }
> +
> +/*
> + * Compute the location of the root directory inode that is laid out by mkfs.
> + * The @sunit parameter will be copied from the superblock if it is negative.
> + */
> +xfs_ino_t
> +xfs_ialloc_calc_rootino(
> +	struct xfs_mount	*mp,
> +	int			sunit)
> +{
> +	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
> +	xfs_agblock_t		first_bno;
> +
> +	if (sunit < 0)
> +		sunit = mp->m_sb.sb_unit;

What's the purpose of this check? Can this actually happen?

That question aside, the rest LGTM:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +
> +	/*
> +	 * Pre-calculate the geometry of AG 0.  We know what it looks like
> +	 * because libxfs knows how to create allocation groups now.
> +	 *
> +	 * first_bno is the first block in which mkfs could possibly have
> +	 * allocated the root directory inode, once we factor in the metadata
> +	 * that mkfs formats before it.  Namely, the four AG headers...
> +	 */
> +	first_bno = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
> +
> +	/* ...the two free space btree roots... */
> +	first_bno += 2;
> +
> +	/* ...the inode btree root... */
> +	first_bno += 1;
> +
> +	/* ...the initial AGFL... */
> +	first_bno += xfs_alloc_min_freelist(mp, NULL);
> +
> +	/* ...the free inode btree root... */
> +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> +		first_bno++;
> +
> +	/* ...the reverse mapping btree root... */
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> +		first_bno++;
> +
> +	/* ...the reference count btree... */
> +	if (xfs_sb_version_hasreflink(&mp->m_sb))
> +		first_bno++;
> +
> +	/*
> +	 * ...and the log, if it is allocated in the first allocation group.
> +	 *
> +	 * This can happen with filesystems that only have a single
> +	 * allocation group, or very odd geometries created by old mkfs
> +	 * versions on very small filesystems.
> +	 */
> +	if (mp->m_sb.sb_logstart &&
> +	    XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == 0)
> +		 first_bno += mp->m_sb.sb_logblocks;
> +
> +	/*
> +	 * Now round first_bno up to whatever allocation alignment is given
> +	 * by the filesystem or was passed in.
> +	 */
> +	if (xfs_sb_version_hasdalign(&mp->m_sb) && igeo->ialloc_align > 0)
> +		first_bno = roundup(first_bno, sunit);
> +	else if (xfs_sb_version_hasalign(&mp->m_sb) &&
> +			mp->m_sb.sb_inoalignmt > 1)
> +		first_bno = roundup(first_bno, mp->m_sb.sb_inoalignmt);
> +
> +	return XFS_AGINO_TO_INO(mp, 0, XFS_AGB_TO_AGINO(mp, first_bno));
> +}
> diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> index 323592d563d5..72b3468b97b1 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.h
> +++ b/fs/xfs/libxfs/xfs_ialloc.h
> @@ -152,5 +152,6 @@ int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
>  
>  int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
>  void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
> +xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
>  
>  #endif	/* __XFS_IALLOC_H__ */
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 3750f0074c74..7f94c6b20920 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -31,7 +31,7 @@
>  #include "xfs_reflink.h"
>  #include "xfs_extent_busy.h"
>  #include "xfs_health.h"
> -
> +#include "xfs_trace.h"
>  
>  static DEFINE_MUTEX(xfs_uuid_table_mutex);
>  static int xfs_uuid_table_size;
> @@ -359,6 +359,42 @@ xfs_readsb(
>  	return error;
>  }
>  
> +/*
> + * If the sunit/swidth change would move the precomputed root inode value, we
> + * must reject the ondisk change because repair will stumble over that.
> + * However, we allow the mount to proceed because we never rejected this
> + * combination before.  Returns true to update the sb, false otherwise.
> + */
> +static inline int
> +xfs_check_new_dalign(
> +	struct xfs_mount	*mp,
> +	int			new_dalign,
> +	bool			*update_sb)
> +{
> +	struct xfs_sb		*sbp = &mp->m_sb;
> +	xfs_ino_t		calc_ino;
> +
> +	calc_ino = xfs_ialloc_calc_rootino(mp, new_dalign);
> +	trace_xfs_check_new_dalign(mp, new_dalign, calc_ino);
> +
> +	if (sbp->sb_rootino == calc_ino) {
> +		*update_sb = true;
> +		return 0;
> +	}
> +
> +	xfs_warn(mp,
> +"Cannot change stripe alignment; would require moving root inode.");
> +
> +	/*
> +	 * XXX: Next time we add a new incompat feature, this should start
> +	 * returning -EINVAL to fail the mount.  Until then, spit out a warning
> +	 * that we're ignoring the administrator's instructions.
> +	 */
> +	xfs_warn(mp, "Skipping superblock stripe alignment update.");
> +	*update_sb = false;
> +	return 0;
> +}
> +
>  /*
>   * If we were provided with new sunit/swidth values as mount options, make sure
>   * that they pass basic alignment and superblock feature checks, and convert
> @@ -422,10 +458,17 @@ xfs_update_alignment(
>  	struct xfs_sb		*sbp = &mp->m_sb;
>  
>  	if (mp->m_dalign) {
> +		bool		update_sb;
> +		int		error;
> +
>  		if (sbp->sb_unit == mp->m_dalign &&
>  		    sbp->sb_width == mp->m_swidth)
>  			return 0;
>  
> +		error = xfs_check_new_dalign(mp, mp->m_dalign, &update_sb);
> +		if (error || !update_sb)
> +			return error;
> +
>  		sbp->sb_unit = mp->m_dalign;
>  		sbp->sb_width = mp->m_swidth;
>  		mp->m_update_sb = true;
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index c13bb3655e48..a86be7f807ee 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3573,6 +3573,27 @@ DEFINE_KMEM_EVENT(kmem_alloc_large);
>  DEFINE_KMEM_EVENT(kmem_realloc);
>  DEFINE_KMEM_EVENT(kmem_zone_alloc);
>  
> +TRACE_EVENT(xfs_check_new_dalign,
> +	TP_PROTO(struct xfs_mount *mp, int new_dalign, xfs_ino_t calc_rootino),
> +	TP_ARGS(mp, new_dalign, calc_rootino),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(int, new_dalign)
> +		__field(xfs_ino_t, sb_rootino)
> +		__field(xfs_ino_t, calc_rootino)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->new_dalign = new_dalign;
> +		__entry->sb_rootino = mp->m_sb.sb_rootino;
> +		__entry->calc_rootino = calc_rootino;
> +	),
> +	TP_printk("dev %d:%d new_dalign %d sb_rootino %llu calc_rootino %llu",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->new_dalign, __entry->sb_rootino,
> +		  __entry->calc_rootino)
> +)
> +
>  #endif /* _TRACE_XFS_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
  2019-12-19 13:21   ` Brian Foster
@ 2019-12-19 15:39     ` Darrick J. Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2019-12-19 15:39 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, david, alex

On Thu, Dec 19, 2019 at 08:21:30AM -0500, Brian Foster wrote:
> On Wed, Dec 18, 2019 at 11:37:41AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
> > and swidth values could cause xfs_repair to fail loudly.  The problem
> > here is that repair calculates the where mkfs should have allocated the
> > root inode, based on the superblock geometry.  The allocation decisions
> > depend on sunit, which means that we really can't go updating sunit if
> > it would lead to a subsequent repair failure on an otherwise correct
> > filesystem.
> > 
> > Port from xfs_repair some code that computes the location of the root
> > inode and teach mount to skip the ondisk update if it would cause
> > problems for repair.  Along the way we'll update the documentation,
> > provide a function for computing the minimum AGFL size instead of
> > open-coding it, and cut down some indenting in the mount code.
> > 
> > Note that we allow the mount to proceed (and new allocations will
> > reflect this new geometry) because we've never screened this kind of
> > thing before.  We'll have to wait for a new future incompat feature to
> > enforce correct behavior, alas.
> > 
> > Note that the geometry reporting always uses the superblock values, not
> > the incore ones, so that is what xfs_info and xfs_growfs will report.
> > 
> > [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
> > 
> > Reported-by: Alex Lyakas <alex@zadara.com>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_ialloc.c |   70 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_ialloc.h |    1 +
> >  fs/xfs/xfs_mount.c         |   45 ++++++++++++++++++++++++++++
> >  fs/xfs/xfs_trace.h         |   21 +++++++++++++
> >  4 files changed, 136 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > index 988cde7744e6..eeec7c8d93fd 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > @@ -2909,3 +2909,73 @@ xfs_ialloc_setup_geometry(
> >  	else
> >  		igeo->ialloc_align = 0;
> >  }
> > +
> > +/*
> > + * Compute the location of the root directory inode that is laid out by mkfs.
> > + * The @sunit parameter will be copied from the superblock if it is negative.
> > + */
> > +xfs_ino_t
> > +xfs_ialloc_calc_rootino(
> > +	struct xfs_mount	*mp,
> > +	int			sunit)
> > +{
> > +	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
> > +	xfs_agblock_t		first_bno;
> > +
> > +	if (sunit < 0)
> > +		sunit = mp->m_sb.sb_unit;
> 
> What's the purpose of this check? Can this actually happen?

I'd written the mkfs / repair callers to pass in -1 to read the
superblock sunit value, but TBH it's not strictly necessary, so
I'll pull it out.

> That question aside, the rest LGTM:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Thanks for the review. :)

--D

> > +
> > +	/*
> > +	 * Pre-calculate the geometry of AG 0.  We know what it looks like
> > +	 * because libxfs knows how to create allocation groups now.
> > +	 *
> > +	 * first_bno is the first block in which mkfs could possibly have
> > +	 * allocated the root directory inode, once we factor in the metadata
> > +	 * that mkfs formats before it.  Namely, the four AG headers...
> > +	 */
> > +	first_bno = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
> > +
> > +	/* ...the two free space btree roots... */
> > +	first_bno += 2;
> > +
> > +	/* ...the inode btree root... */
> > +	first_bno += 1;
> > +
> > +	/* ...the initial AGFL... */
> > +	first_bno += xfs_alloc_min_freelist(mp, NULL);
> > +
> > +	/* ...the free inode btree root... */
> > +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > +		first_bno++;
> > +
> > +	/* ...the reverse mapping btree root... */
> > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > +		first_bno++;
> > +
> > +	/* ...the reference count btree... */
> > +	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > +		first_bno++;
> > +
> > +	/*
> > +	 * ...and the log, if it is allocated in the first allocation group.
> > +	 *
> > +	 * This can happen with filesystems that only have a single
> > +	 * allocation group, or very odd geometries created by old mkfs
> > +	 * versions on very small filesystems.
> > +	 */
> > +	if (mp->m_sb.sb_logstart &&
> > +	    XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == 0)
> > +		 first_bno += mp->m_sb.sb_logblocks;
> > +
> > +	/*
> > +	 * Now round first_bno up to whatever allocation alignment is given
> > +	 * by the filesystem or was passed in.
> > +	 */
> > +	if (xfs_sb_version_hasdalign(&mp->m_sb) && igeo->ialloc_align > 0)
> > +		first_bno = roundup(first_bno, sunit);
> > +	else if (xfs_sb_version_hasalign(&mp->m_sb) &&
> > +			mp->m_sb.sb_inoalignmt > 1)
> > +		first_bno = roundup(first_bno, mp->m_sb.sb_inoalignmt);
> > +
> > +	return XFS_AGINO_TO_INO(mp, 0, XFS_AGB_TO_AGINO(mp, first_bno));
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> > index 323592d563d5..72b3468b97b1 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.h
> > +++ b/fs/xfs/libxfs/xfs_ialloc.h
> > @@ -152,5 +152,6 @@ int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
> >  
> >  int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
> >  void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
> > +xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
> >  
> >  #endif	/* __XFS_IALLOC_H__ */
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index 3750f0074c74..7f94c6b20920 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -31,7 +31,7 @@
> >  #include "xfs_reflink.h"
> >  #include "xfs_extent_busy.h"
> >  #include "xfs_health.h"
> > -
> > +#include "xfs_trace.h"
> >  
> >  static DEFINE_MUTEX(xfs_uuid_table_mutex);
> >  static int xfs_uuid_table_size;
> > @@ -359,6 +359,42 @@ xfs_readsb(
> >  	return error;
> >  }
> >  
> > +/*
> > + * If the sunit/swidth change would move the precomputed root inode value, we
> > + * must reject the ondisk change because repair will stumble over that.
> > + * However, we allow the mount to proceed because we never rejected this
> > + * combination before.  Returns true to update the sb, false otherwise.
> > + */
> > +static inline int
> > +xfs_check_new_dalign(
> > +	struct xfs_mount	*mp,
> > +	int			new_dalign,
> > +	bool			*update_sb)
> > +{
> > +	struct xfs_sb		*sbp = &mp->m_sb;
> > +	xfs_ino_t		calc_ino;
> > +
> > +	calc_ino = xfs_ialloc_calc_rootino(mp, new_dalign);
> > +	trace_xfs_check_new_dalign(mp, new_dalign, calc_ino);
> > +
> > +	if (sbp->sb_rootino == calc_ino) {
> > +		*update_sb = true;
> > +		return 0;
> > +	}
> > +
> > +	xfs_warn(mp,
> > +"Cannot change stripe alignment; would require moving root inode.");
> > +
> > +	/*
> > +	 * XXX: Next time we add a new incompat feature, this should start
> > +	 * returning -EINVAL to fail the mount.  Until then, spit out a warning
> > +	 * that we're ignoring the administrator's instructions.
> > +	 */
> > +	xfs_warn(mp, "Skipping superblock stripe alignment update.");
> > +	*update_sb = false;
> > +	return 0;
> > +}
> > +
> >  /*
> >   * If we were provided with new sunit/swidth values as mount options, make sure
> >   * that they pass basic alignment and superblock feature checks, and convert
> > @@ -422,10 +458,17 @@ xfs_update_alignment(
> >  	struct xfs_sb		*sbp = &mp->m_sb;
> >  
> >  	if (mp->m_dalign) {
> > +		bool		update_sb;
> > +		int		error;
> > +
> >  		if (sbp->sb_unit == mp->m_dalign &&
> >  		    sbp->sb_width == mp->m_swidth)
> >  			return 0;
> >  
> > +		error = xfs_check_new_dalign(mp, mp->m_dalign, &update_sb);
> > +		if (error || !update_sb)
> > +			return error;
> > +
> >  		sbp->sb_unit = mp->m_dalign;
> >  		sbp->sb_width = mp->m_swidth;
> >  		mp->m_update_sb = true;
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index c13bb3655e48..a86be7f807ee 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -3573,6 +3573,27 @@ DEFINE_KMEM_EVENT(kmem_alloc_large);
> >  DEFINE_KMEM_EVENT(kmem_realloc);
> >  DEFINE_KMEM_EVENT(kmem_zone_alloc);
> >  
> > +TRACE_EVENT(xfs_check_new_dalign,
> > +	TP_PROTO(struct xfs_mount *mp, int new_dalign, xfs_ino_t calc_rootino),
> > +	TP_ARGS(mp, new_dalign, calc_rootino),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(int, new_dalign)
> > +		__field(xfs_ino_t, sb_rootino)
> > +		__field(xfs_ino_t, calc_rootino)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = mp->m_super->s_dev;
> > +		__entry->new_dalign = new_dalign;
> > +		__entry->sb_rootino = mp->m_sb.sb_rootino;
> > +		__entry->calc_rootino = calc_rootino;
> > +	),
> > +	TP_printk("dev %d:%d new_dalign %d sb_rootino %llu calc_rootino %llu",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __entry->new_dalign, __entry->sb_rootino,
> > +		  __entry->calc_rootino)
> > +)
> > +
> >  #endif /* _TRACE_XFS_H */
> >  
> >  #undef TRACE_INCLUDE_PATH
> > 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] xfs: refactor agfl length computation function
  2019-12-18 19:37 ` [PATCH 1/3] xfs: refactor agfl length computation function Darrick J. Wong
  2019-12-19 13:17   ` Brian Foster
@ 2019-12-24  8:31   ` Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2019-12-24  8:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster, david, alex

> +/*
> + * Compute the minimum length of the AGFL in the given AG.  If @pag is NULL,
> + * return the largest possible minimum length.
> + */
>  unsigned int
>  xfs_alloc_min_freelist(
>  	struct xfs_mount	*mp,
>  	struct xfs_perag	*pag)
>  {
> +	/* AG btrees have at least 1 level. */
> +	static const uint8_t	fake_levels[XFS_BTNUM_AGF] = {1, 1, 1};
> +	const uint8_t		*levels = pag ? pag->pagf_levels : fake_levels;
>  	unsigned int		min_free;

Yikes.  This is just the nastiest calling convention possible.  Why
not factor out a xfs_alloc_min_freelist_levels helpers that gets the
levels pointer passed, and we get something much saner.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-12-24  8:31 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-18 19:37 [PATCH v5 0/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
2019-12-18 19:37 ` [PATCH 1/3] xfs: refactor agfl length computation function Darrick J. Wong
2019-12-19 13:17   ` Brian Foster
2019-12-24  8:31   ` Christoph Hellwig
2019-12-18 19:37 ` [PATCH 2/3] xfs: split the sunit parameter update into two parts Darrick J. Wong
2019-12-19 13:18   ` Brian Foster
2019-12-18 19:37 ` [PATCH 3/3] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
2019-12-19 13:21   ` Brian Foster
2019-12-19 15:39     ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).