* [PATCH 0/5] xfs: fix various problems
@ 2018-02-23  1:59 Darrick J. Wong
  2018-02-23  1:59 ` [PATCH 1/5] xfs: don't iunlock the quota ip when quota block allocation fails Darrick J. Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-02-23  1:59 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This series fixes miscellaneous problems in XFS.  The first two fix some
crashes and locking problems that happen when we're fed fs images with
corrupt directories or AGFs.

The third patch speeds up inode reclaim by avoiding an unnecessary
transaction allocation when there's nothing in the CoW fork.

The final two patches fix the AGFL wrapping problems we've seen with the
v5 disk format once and for all.  At mount time we fix any AGFL wrapping
problems that an old kernel might have left for us, and at unmount or
freeze or remount-ro time we ensure that the AGFLs we leave behind for
other filesystems to mount do not wrap around the end of the block.
This should fix all the intermittent agf verifier failures people see.
Note that the mitigation is switched off for rmap or reflink filesystems
on the assumption that any kernel that supports those features also has
the 4.5 agfl padding patch applied.
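
For readers who want the arithmetic behind the wrapping problem, here is a
minimal userspace sketch.  It is not part of this series.  The struct and
the agfl_slots() helper are hypothetical stand-ins that mirror the field
layout of struct xfs_agfl, and the 40-byte figure assumes an ABI that pads
the header to 8-byte alignment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the v5 AGFL header (illustrative names only).
 * The fields add up to 36 bytes; packing keeps the compiler from rounding
 * the struct up to 40 bytes on ABIs that align uint64_t to 8 bytes.
 */
struct agfl_hdr_packed {
	uint32_t	magicnum;
	uint32_t	seqno;
	uint8_t		uuid[16];
	uint64_t	lsn;
	uint32_t	crc;
} __attribute__((packed));

_Static_assert(sizeof(struct agfl_hdr_packed) == 36,
	       "packed AGFL header should be 36 bytes");

/* Number of 32-bit block slots left in a sector after the header. */
static unsigned int
agfl_slots(unsigned int sectsize, unsigned int hdrsize)
{
	return (unsigned int)((sectsize - hdrsize) / sizeof(uint32_t));
}
```

With 512-byte sectors, agfl_slots(512, 36) gives 119 and agfl_slots(512, 40)
gives 118, so a list that wraps after slot 117 on one kernel looks one
entry off to a kernel that expects it to wrap after slot 118.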

These patches are all against 4.16-rc2, though aside from the first two
I intend them all for 4.17.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
[2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
[3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel


* [PATCH 1/5] xfs: don't iunlock the quota ip when quota block allocation fails
  2018-02-23  1:59 [PATCH 0/5] xfs: fix various problems Darrick J. Wong
@ 2018-02-23  1:59 ` Darrick J. Wong
  2018-02-27 13:55   ` Brian Foster
  2018-02-23  1:59 ` [PATCH 2/5] xfs: convert a few more directory asserts to corruption returns Darrick J. Wong
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-02-23  1:59 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

In xfs_qm_dqalloc, we join the locked quota inode to the transaction we
use to allocate blocks.  If the allocation or mapping fails, we're not
allowed to unlock the inode because the transaction code is in charge of
unlocking it for us.  Therefore, remove the iunlock call to avoid
tripping asserts about unbalanced locking and hanging the mount.

Found by corrupting the AGF and allocating space in the filesystem
(quotacheck) immediately after mount.  The upcoming agfl wrapping fixup
test will trigger this scenario.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_dquot.c |    2 --
 1 file changed, 2 deletions(-)


diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 43572f8..2410acc 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -394,8 +394,6 @@ xfs_qm_dqalloc(
 error1:
 	xfs_defer_cancel(&dfops);
 error0:
-	xfs_iunlock(quotip, XFS_ILOCK_EXCL);
-
 	return error;
 }
 



* [PATCH 2/5] xfs: convert a few more directory asserts to corruption returns
  2018-02-23  1:59 [PATCH 0/5] xfs: fix various problems Darrick J. Wong
  2018-02-23  1:59 ` [PATCH 1/5] xfs: don't iunlock the quota ip when quota block allocation fails Darrick J. Wong
@ 2018-02-23  1:59 ` Darrick J. Wong
  2018-02-27 13:55   ` Brian Foster
  2018-02-23  2:00 ` [PATCH 3/5] xfs: check for cow blocks before trying to clear them during inode reclaim Darrick J. Wong
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-02-23  1:59 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Yet another round of playing whack-a-mole with directory code that
asserts on corrupt on-disk metadata when it really should be returning
-EFSCORRUPTED.  Found by an xfs/391 crash while lastbit fuzzing
ltail.bestcount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_dir2_leaf.c |    3 ++-
 fs/xfs/libxfs/xfs_dir2_node.c |    5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index d7e630f..d61d52d 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -1415,7 +1415,8 @@ xfs_dir2_leaf_removename(
 	oldbest = be16_to_cpu(bf[0].length);
 	ltp = xfs_dir2_leaf_tail_p(args->geo, leaf);
 	bestsp = xfs_dir2_leaf_bests_p(ltp);
-	ASSERT(be16_to_cpu(bestsp[db]) == oldbest);
+	if (be16_to_cpu(bestsp[db]) != oldbest)
+		return -EFSCORRUPTED;
 	/*
 	 * Mark the former data entry unused.
 	 */
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 239d97a..0839ffe 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -387,8 +387,9 @@ xfs_dir2_leaf_to_node(
 	dp->d_ops->free_hdr_from_disk(&freehdr, free);
 	leaf = lbp->b_addr;
 	ltp = xfs_dir2_leaf_tail_p(args->geo, leaf);
-	ASSERT(be32_to_cpu(ltp->bestcount) <=
-				(uint)dp->i_d.di_size / args->geo->blksize);
+	if (be32_to_cpu(ltp->bestcount) >
+				(uint)dp->i_d.di_size / args->geo->blksize)
+		return -EFSCORRUPTED;
 
 	/*
 	 * Copy freespace entries from the leaf block to the new block.



* [PATCH 3/5] xfs: check for cow blocks before trying to clear them during inode reclaim
  2018-02-23  1:59 [PATCH 0/5] xfs: fix various problems Darrick J. Wong
  2018-02-23  1:59 ` [PATCH 1/5] xfs: don't iunlock the quota ip when quota block allocation fails Darrick J. Wong
  2018-02-23  1:59 ` [PATCH 2/5] xfs: convert a few more directory asserts to corruption returns Darrick J. Wong
@ 2018-02-23  2:00 ` Darrick J. Wong
  2018-02-27 13:55   ` Brian Foster
  2018-02-23  2:00 ` [PATCH 4/5] xfs: convert XFS_AGFL_SIZE to a helper function Darrick J. Wong
  2018-02-23  2:00 ` [PATCH 5/5] xfs: fix agfl wrapping Darrick J. Wong
  4 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-02-23  2:00 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

There's no point in allocating a transaction and locking the inode in
preparation to clear cow blocks if there actually aren't any cow fork
extents.  Therefore, move the xfs_reflink_cancel_cow_range hunk to
xfs_inactive and check the cow ifp first.  This makes inode reclamation
run faster.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_inode.c |    5 +++++
 fs/xfs/xfs_super.c |    9 ---------
 2 files changed, 5 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 604ee38..50fbbf5 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1872,6 +1872,7 @@ xfs_inactive(
 	xfs_inode_t	*ip)
 {
 	struct xfs_mount	*mp;
+	struct xfs_ifork	*cow_ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
 	int			error;
 	int			truncate = 0;
 
@@ -1892,6 +1893,10 @@ xfs_inactive(
 	if (mp->m_flags & XFS_MOUNT_RDONLY)
 		return;
 
+	/* Try to clean out the cow blocks if there are any. */
+	if (xfs_is_reflink_inode(ip) && cow_ifp->if_bytes > 0)
+		xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
+
 	if (VFS_I(ip)->i_nlink != 0) {
 		/*
 		 * force is true because we are evicting an inode from the
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 7aba628..624a802 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -970,7 +970,6 @@ xfs_fs_destroy_inode(
 	struct inode		*inode)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
-	int			error;
 
 	trace_xfs_destroy_inode(ip);
 
@@ -978,14 +977,6 @@ xfs_fs_destroy_inode(
 	XFS_STATS_INC(ip->i_mount, vn_rele);
 	XFS_STATS_INC(ip->i_mount, vn_remove);
 
-	if (xfs_is_reflink_inode(ip)) {
-		error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
-		if (error && !XFS_FORCED_SHUTDOWN(ip->i_mount))
-			xfs_warn(ip->i_mount,
-"Error %d while evicting CoW blocks for inode %llu.",
-					error, ip->i_ino);
-	}
-
 	xfs_inactive(ip);
 
 	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0);



* [PATCH 4/5] xfs: convert XFS_AGFL_SIZE to a helper function
  2018-02-23  1:59 [PATCH 0/5] xfs: fix various problems Darrick J. Wong
                   ` (2 preceding siblings ...)
  2018-02-23  2:00 ` [PATCH 3/5] xfs: check for cow blocks before trying to clear them during inode reclaim Darrick J. Wong
@ 2018-02-23  2:00 ` Darrick J. Wong
  2018-02-27 19:34   ` Brian Foster
  2018-02-23  2:00 ` [PATCH 5/5] xfs: fix agfl wrapping Darrick J. Wong
  4 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-02-23  2:00 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Dave Chinner

From: Dave Chinner <dchinner@redhat.com>

The AGFL size calculation is about to get more complex, so let's turn
the macro into a function first and remove the macro.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: forward port to newer kernel, simplify the helper]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c  |   31 ++++++++++++++++++++++++-------
 fs/xfs/libxfs/xfs_alloc.h  |    2 ++
 fs/xfs/libxfs/xfs_format.h |   13 +------------
 fs/xfs/scrub/agheader.c    |    6 +++---
 fs/xfs/xfs_fsops.c         |    2 +-
 5 files changed, 31 insertions(+), 23 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index c02781a..36101e5 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -53,6 +53,23 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 		xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
 
+/*
+ * Size of the AGFL.  For CRC-enabled filesystems we steal a couple of slots in
+ * the beginning of the block for a proper header with the location information
+ * and CRC.
+ */
+unsigned int
+xfs_agfl_size(
+	struct xfs_mount	*mp)
+{
+	unsigned int		size = mp->m_sb.sb_sectsize;
+
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		size -= sizeof(struct xfs_agfl);
+
+	return size / sizeof(xfs_agblock_t);
+}
+
 unsigned int
 xfs_refc_block(
 	struct xfs_mount	*mp)
@@ -550,7 +567,7 @@ xfs_agfl_verify(
 	if (bp->b_pag && be32_to_cpu(agfl->agfl_seqno) != bp->b_pag->pag_agno)
 		return __this_address;
 
-	for (i = 0; i < XFS_AGFL_SIZE(mp); i++) {
+	for (i = 0; i < xfs_agfl_size(mp); i++) {
 		if (be32_to_cpu(agfl->agfl_bno[i]) != NULLAGBLOCK &&
 		    be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
 			return __this_address;
@@ -2266,7 +2283,7 @@ xfs_alloc_get_freelist(
 	bno = be32_to_cpu(agfl_bno[be32_to_cpu(agf->agf_flfirst)]);
 	be32_add_cpu(&agf->agf_flfirst, 1);
 	xfs_trans_brelse(tp, agflbp);
-	if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp))
+	if (be32_to_cpu(agf->agf_flfirst) == xfs_agfl_size(mp))
 		agf->agf_flfirst = 0;
 
 	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
@@ -2377,7 +2394,7 @@ xfs_alloc_put_freelist(
 			be32_to_cpu(agf->agf_seqno), &agflbp)))
 		return error;
 	be32_add_cpu(&agf->agf_fllast, 1);
-	if (be32_to_cpu(agf->agf_fllast) == XFS_AGFL_SIZE(mp))
+	if (be32_to_cpu(agf->agf_fllast) == xfs_agfl_size(mp))
 		agf->agf_fllast = 0;
 
 	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
@@ -2395,7 +2412,7 @@ xfs_alloc_put_freelist(
 
 	xfs_alloc_log_agf(tp, agbp, logflags);
 
-	ASSERT(be32_to_cpu(agf->agf_flcount) <= XFS_AGFL_SIZE(mp));
+	ASSERT(be32_to_cpu(agf->agf_flcount) <= xfs_agfl_size(mp));
 
 	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
 	blockp = &agfl_bno[be32_to_cpu(agf->agf_fllast)];
@@ -2428,9 +2445,9 @@ xfs_agf_verify(
 	if (!(agf->agf_magicnum == cpu_to_be32(XFS_AGF_MAGIC) &&
 	      XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum)) &&
 	      be32_to_cpu(agf->agf_freeblks) <= be32_to_cpu(agf->agf_length) &&
-	      be32_to_cpu(agf->agf_flfirst) < XFS_AGFL_SIZE(mp) &&
-	      be32_to_cpu(agf->agf_fllast) < XFS_AGFL_SIZE(mp) &&
-	      be32_to_cpu(agf->agf_flcount) <= XFS_AGFL_SIZE(mp)))
+	      be32_to_cpu(agf->agf_flfirst) < xfs_agfl_size(mp) &&
+	      be32_to_cpu(agf->agf_fllast) < xfs_agfl_size(mp) &&
+	      be32_to_cpu(agf->agf_flcount) <= xfs_agfl_size(mp)))
 		return __this_address;
 
 	if (be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]) < 1 ||
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 65a0caf..a311a24 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -26,6 +26,8 @@ struct xfs_trans;
 
 extern struct workqueue_struct *xfs_alloc_wq;
 
+unsigned int xfs_agfl_size(struct xfs_mount *mp);
+
 /*
  * Freespace allocation types.  Argument to xfs_alloc_[v]extent.
  */
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 1acb584..42956d8 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -803,24 +803,13 @@ typedef struct xfs_agi {
 		&(XFS_BUF_TO_AGFL(bp)->agfl_bno[0]) : \
 		(__be32 *)(bp)->b_addr)
 
-/*
- * Size of the AGFL.  For CRC-enabled filesystes we steal a couple of
- * slots in the beginning of the block for a proper header with the
- * location information and CRC.
- */
-#define XFS_AGFL_SIZE(mp) \
-	(((mp)->m_sb.sb_sectsize - \
-	 (xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
-		sizeof(struct xfs_agfl) : 0)) / \
-	  sizeof(xfs_agblock_t))
-
 typedef struct xfs_agfl {
 	__be32		agfl_magicnum;
 	__be32		agfl_seqno;
 	uuid_t		agfl_uuid;
 	__be64		agfl_lsn;
 	__be32		agfl_crc;
-	__be32		agfl_bno[];	/* actually XFS_AGFL_SIZE(mp) */
+	__be32		agfl_bno[];	/* actually xfs_agfl_size(mp) */
 } __attribute__((packed)) xfs_agfl_t;
 
 #define XFS_AGFL_CRC_OFF	offsetof(struct xfs_agfl, agfl_crc)
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index fd97552..fd383c5 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -80,7 +80,7 @@ xfs_scrub_walk_agfl(
 	}
 
 	/* first to the end */
-	for (i = flfirst; i < XFS_AGFL_SIZE(mp); i++) {
+	for (i = flfirst; i < xfs_agfl_size(mp); i++) {
 		error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
 		if (error)
 			return error;
@@ -664,7 +664,7 @@ xfs_scrub_agf(
 	if (agfl_last > agfl_first)
 		fl_count = agfl_last - agfl_first + 1;
 	else
-		fl_count = XFS_AGFL_SIZE(mp) - agfl_first + agfl_last + 1;
+		fl_count = xfs_agfl_size(mp) - agfl_first + agfl_last + 1;
 	if (agfl_count != 0 && fl_count != agfl_count)
 		xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
 
@@ -791,7 +791,7 @@ xfs_scrub_agfl(
 	/* Allocate buffer to ensure uniqueness of AGFL entries. */
 	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
 	agflcount = be32_to_cpu(agf->agf_flcount);
-	if (agflcount > XFS_AGFL_SIZE(sc->mp)) {
+	if (agflcount > xfs_agfl_size(sc->mp)) {
 		xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
 		goto out;
 	}
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8b45456..5237927 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -217,7 +217,7 @@ xfs_growfs_data_private(
 		}
 
 		agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, bp);
-		for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
+		for (bucket = 0; bucket < xfs_agfl_size(mp); bucket++)
 			agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
 
 		error = xfs_bwrite(bp);



* [PATCH 5/5] xfs: fix agfl wrapping
  2018-02-23  1:59 [PATCH 0/5] xfs: fix various problems Darrick J. Wong
                   ` (3 preceding siblings ...)
  2018-02-23  2:00 ` [PATCH 4/5] xfs: convert XFS_AGFL_SIZE to a helper function Darrick J. Wong
@ 2018-02-23  2:00 ` Darrick J. Wong
  2018-02-23  4:40   ` Darrick J. Wong
                     ` (3 more replies)
  4 siblings, 4 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-02-23  2:00 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile     |    1 
 fs/xfs/xfs_fixups.c |  310 +++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_fixups.h |   26 ++++
 fs/xfs/xfs_mount.c  |   21 +++
 fs/xfs/xfs_super.c  |   10 ++
 5 files changed, 367 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/xfs_fixups.c
 create mode 100644 fs/xfs/xfs_fixups.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index b03c77e..f88368a 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
 				   xfs_extent_busy.o \
 				   xfs_file.o \
 				   xfs_filestream.o \
+				   xfs_fixups.o \
 				   xfs_fsmap.o \
 				   xfs_fsops.o \
 				   xfs_globals.o \
diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
new file mode 100644
index 0000000..0cad7bb
--- /dev/null
+++ b/fs/xfs/xfs_fixups.c
@@ -0,0 +1,310 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_alloc.h"
+#include "xfs_trans.h"
+#include "xfs_fixups.h"
+
+/*
+ * v5 AGFL padding defects
+ *
+ * When the v5 format was first introduced, there was a defect in the struct
+ * xfs_agfl definition that resulted in XFS_AGFL_SIZE returning different
+ * values depending on the compiler padding.  On a fs with 512-byte sectors,
+ * this meant that XFS_AGFL_SIZE was 119 on i386, but 118 on x64.  Commit
+ * 96f859d52bcb1 ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is
+ * correct") changed the definition to disable padding the end of the
+ * structure, and was accepted into Linux 4.5.  Since then, the AGFL has
+ * always used the larger size (e.g. 119 entries on a 512b sector fs).
+ *
+ * Unfortunately, pre-4.5 kernels can produce filesystems with AGFLs that wrap
+ * at the smaller size, and those kernels are not prepared to handle the
+ * longer size.  This typically manifests itself as an AGF verifier corruption
+ * error followed by a filesystem shutdown.  While we encourage admins to stay
+ * current with software, we would like to avoid this intermittent breakage.
+ *
+ * Any v5 filesystem which has a feature bit set for a feature that was
+ * introduced after Linux 4.5 will not have this problem, as such filesystems
+ * cannot be mounted on older kernels.  v4 filesystems are also unaffected.
+ *
+ * Therefore, we add two fixup functions -- the first runs at mount time to
+ * detect a short-wrapped AGFL and fix it; the second runs at unmount, freeze,
+ * or remount-ro time to move a wrapped AGFL to the beginning of the list.
+ * This reduces the likelihood of a screwup to the scenario where you have (a)
+ * a filesystem with no post-4.5 features (reflink, rmap), (b) the AGFL wraps,
+ * (c) the filesystem goes down leaving a dirty log, and (d) the dirty
+ * filesystem is mounted on an old kernel.
+ */
+
+/*
+ * Decide if we need to have the agfl wrapping fixes applied.  This only
+ * affects v5 filesystems that do not have any features enabled that did not
+ * exist when the agfl padding fix went in.
+ *
+ * Features already present when the fix went in were finobt, ftype, spinodes.
+ * If we see something new (e.g. reflink) then don't bother.
+ */
+#define XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED \
+		(~(XFS_SB_FEAT_RO_COMPAT_FINOBT))
+#define XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED \
+		(~(XFS_SB_FEAT_INCOMPAT_FTYPE | \
+		   XFS_SB_FEAT_INCOMPAT_SPINODES))
+#define XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED \
+		(~0)
+static inline bool xfs_sb_version_needs_agfl_wrap_fixes(struct xfs_sb *sbp)
+{
+	return xfs_sb_version_hascrc(sbp) &&
+		!xfs_sb_has_incompat_feature(sbp,
+			XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED) &&
+		!xfs_sb_has_ro_compat_feature(sbp,
+			XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED) &&
+		!xfs_sb_has_incompat_log_feature(sbp,
+			XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED);
+}
+
+/*
+ * Fix an AGFL wrapping that falls short of the end of the block by filling the
+ * gap at the end of the block.
+ */
+STATIC int
+xfs_fixup_freelist_wrap_mount(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agfbp,
+	struct xfs_perag	*pag)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf;
+	struct xfs_buf		*agflbp;
+	__be32			*agfl_bno;
+	xfs_agnumber_t		agno;
+	uint32_t		agfl_size;
+	uint32_t		flfirst;
+	uint32_t		fllast;
+	int32_t			active;
+	int			offset;
+	int			len;
+	int			error;
+
+	if (pag->pagf_flcount == 0)
+		return 0;
+
+	agfl_size = xfs_agfl_size(mp);
+	agf = XFS_BUF_TO_AGF(agfbp);
+	agno = be32_to_cpu(agf->agf_seqno);
+	flfirst = be32_to_cpu(agf->agf_flfirst);
+	fllast = be32_to_cpu(agf->agf_fllast);
+
+	/* Make sure we're either spot on or off by 1. */
+	active = fllast - flfirst + 1;
+	if (active <= 0)
+		active += agfl_size;
+	if (active == pag->pagf_flcount)
+		return 0;
+	else if (active != pag->pagf_flcount + 1)
+		return -EFSCORRUPTED;
+
+	/* Would this have even passed muster on an old system? */
+	if (flfirst >= agfl_size - 1 || fllast >= agfl_size - 1 ||
+	    pag->pagf_flcount > agfl_size - 1)
+		return -EFSCORRUPTED;
+
+	/*
+	 * Convert a 40-byte-padded agfl into a 36-byte-padded AGFL.
+	 * Therefore, we need to move the AGFL blocks
+	 * bno[flfirst..agfl_size - 2] to bno[flfirst + 1...agfl_size - 1].
+	 *
+	 * Reusing the example above, if we had flfirst == 116, we need
+	 * to move bno[116] and bno[117] into bno[117] and bno[118],
+	 * respectively, and then increment flfirst.
+	 */
+	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
+	if (error)
+		return error;
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
+
+	len = (agfl_size - flfirst - 1) * sizeof(xfs_agblock_t);
+	memmove(&agfl_bno[flfirst + 1], &agfl_bno[flfirst], len);
+	offset = (char *)&agfl_bno[flfirst + 1] - (char *)agflbp->b_addr;
+	be32_add_cpu(&agf->agf_flfirst, 1);
+
+	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
+	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
+	xfs_trans_brelse(tp, agflbp);
+	agflbp = NULL;
+	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST);
+
+	return 0;
+}
+
+/*
+ * Fix an AGFL that touches the end of the block by moving the first or last
+ * part of the list elsewhere in the AGFL so that old kernels don't trip over
+ * wrapping issues.
+ */
+STATIC int
+xfs_fixup_freelist_wrap_unmount(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agfbp,
+	struct xfs_perag	*pag)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf;
+	struct xfs_buf		*agflbp;
+	__be32			*agfl_bno;
+	xfs_agnumber_t		agno;
+	uint32_t		agfl_size;
+	uint32_t		flfirst;
+	uint32_t		fllast;
+	int			offset;
+	int			len;
+	int			error;
+
+	agfl_size = xfs_agfl_size(mp);
+	agf = XFS_BUF_TO_AGF(agfbp);
+	agno = be32_to_cpu(agf->agf_seqno);
+	flfirst = be32_to_cpu(agf->agf_flfirst);
+	fllast = be32_to_cpu(agf->agf_fllast);
+
+	/* Empty AGFL?  Make sure we aren't pointing at the end. */
+	if (pag->pagf_flcount == 0) {
+		if (flfirst >= agfl_size || fllast >= agfl_size) {
+			agf->agf_flfirst = cpu_to_be32(1);
+			agf->agf_fllast = 0;
+			xfs_alloc_log_agf(tp, agfbp,
+					XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
+		}
+		return 0;
+	}
+
+	/* If we don't hit the end, we're done. */
+	if (flfirst < fllast && fllast != agfl_size - 1)
+		return 0;
+
+	/*
+	 * Move the start of a wrapped list towards the start of the agfl block.
+	 * Therefore, move the AGFL blocks bno[flfirst..agfl_size - 1] to
+	 * bno[fllast + 1..fllast + agfl_size - flfirst].
+	 * Then we reset flfirst and fllast appropriately.
+	 *
+	 * Reusing the example above, if we had flfirst == 117 and fllast == 4,
+	 * we need to move bno[117] and bno[118] into bno[5] and bno[6],
+	 * respectively, and then reset flfirst and fllast.
+	 *
+	 * If it's just the last block that touches the end, only move that.
+	 */
+	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
+	if (error)
+		return error;
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
+
+	if (fllast == agfl_size - 1) {
+		/* Back the AGFL off from the end of the block. */
+		len = sizeof(xfs_agblock_t);
+		agfl_bno[flfirst - 1] = agfl_bno[agfl_size - 1];
+		offset = (char *)&agfl_bno[flfirst - 1] - (char *)agflbp->b_addr;
+		be32_add_cpu(&agf->agf_fllast, -1);
+		be32_add_cpu(&agf->agf_flfirst, -1);
+	} else {
+		/* Move the first part of the AGFL towards the front. */
+		len = (agfl_size - flfirst) * sizeof(xfs_agblock_t);
+		memcpy(&agfl_bno[fllast + 1], &agfl_bno[flfirst], len);
+		offset = (char *)&agfl_bno[fllast + 1] - (char *)agflbp->b_addr;
+		agf->agf_flfirst = 0;
+		agf->agf_fllast = cpu_to_be32(pag->pagf_flcount - 1);
+	}
+
+	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
+	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
+	xfs_trans_brelse(tp, agflbp);
+	agflbp = NULL;
+	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
+
+	return 0;
+}
+
+typedef int (*xfs_agf_apply_fn_t)(struct xfs_trans *tp, struct xfs_buf *agfbp,
+		struct xfs_perag *pag);
+
+/* Apply something to every AGF. */
+STATIC int
+xfs_fixup_agf_apply(
+	struct xfs_mount	*mp,
+	xfs_agf_apply_fn_t	fn)
+{
+	struct xfs_trans	*tp;
+	struct xfs_perag	*pag;
+	struct xfs_buf		*agfbp;
+	xfs_agnumber_t		agno;
+	int			error;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, 0, 0, 0, &tp);
+	if (error)
+		return error;
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		error = xfs_alloc_read_agf(mp, tp, agno, 0, &agfbp);
+		if (error)
+			goto cancel;
+		if (!agfbp) {
+			error = -ENOMEM;
+			goto cancel;
+		}
+		pag = xfs_perag_get(mp, agno);
+		error = fn(tp, agfbp, pag);
+		xfs_perag_put(pag);
+		xfs_trans_brelse(tp, agfbp);
+		if (error)
+			goto cancel;
+	}
+
+	return xfs_trans_commit(tp);
+cancel:
+	xfs_trans_cancel(tp);
+	return error;
+}
+
+/* Fix AGFL wrapping so we can use the filesystem. */
+int
+xfs_fixup_agfl_wrap_mount(
+	struct xfs_mount	*mp)
+{
+	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
+		return 0;
+
+	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_mount);
+}
+
+/* Fix AGFL wrapping so old kernels can use this filesystem. */
+int
+xfs_fixup_agfl_wrap_unmount(
+	struct xfs_mount	*mp)
+{
+	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
+		return 0;
+
+	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_unmount);
+}
diff --git a/fs/xfs/xfs_fixups.h b/fs/xfs/xfs_fixups.h
new file mode 100644
index 0000000..fb52a96
--- /dev/null
+++ b/fs/xfs/xfs_fixups.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef	__XFS_FIXUPS_H__
+#define	__XFS_FIXUPS_H__
+
+int xfs_fixup_agfl_wrap_mount(struct xfs_mount *mp);
+int xfs_fixup_agfl_wrap_unmount(struct xfs_mount *mp);
+
+#endif /* __XFS_FIXUPS_H__ */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 98fd41c..eb284aa 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -46,7 +46,7 @@
 #include "xfs_refcount_btree.h"
 #include "xfs_reflink.h"
 #include "xfs_extent_busy.h"
-
+#include "xfs_fixups.h"
 
 static DEFINE_MUTEX(xfs_uuid_table_mutex);
 static int xfs_uuid_table_size;
@@ -875,6 +875,16 @@ xfs_mountfs(
 	}
 
 	/*
+	 * Make sure our AGFL counters do not wrap the end of the block
+	 * in a troublesome manner.
+	 */
+	error = xfs_fixup_agfl_wrap_mount(mp);
+	if (error) {
+		xfs_warn(mp, "Failed to fix agfl wrapping.  Run xfs_repair.");
+		goto out_log_dealloc;
+	}
+
+	/*
 	 * Get and sanity-check the root inode.
 	 * Save the pointer to it in the mount structure.
 	 */
@@ -1128,6 +1138,15 @@ xfs_unmountfs(
 	xfs_qm_unmount(mp);
 
 	/*
+	 * Make sure our AGFL counters do not wrap the end of the block
+	 * in a troublesome manner for old kernels.
+	 */
+	error = xfs_fixup_agfl_wrap_unmount(mp);
+	if (error)
+		xfs_warn(mp, "Unable to fix agfl wrapping.  "
+				"This may cause problems on next mount.");
+
+	/*
 	 * Unreserve any blocks we have so that when we unmount we don't account
 	 * the reserved free space as used. This is really only necessary for
 	 * lazy superblock counting because it trusts the incore superblock
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 624a802..d9aa39a 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -50,6 +50,7 @@
 #include "xfs_refcount_item.h"
 #include "xfs_bmap_item.h"
 #include "xfs_reflink.h"
+#include "xfs_fixups.h"
 
 #include <linux/namei.h>
 #include <linux/dax.h>
@@ -1206,6 +1207,15 @@ xfs_quiesce_attr(
 	xfs_reclaim_inodes(mp, 0);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
 
+	/*
+	 * Make sure our AGFL counters do not wrap the end of the block
+	 * in a troublesome manner for old kernels.
+	 */
+	error = xfs_fixup_agfl_wrap_unmount(mp);
+	if (error)
+		xfs_warn(mp, "Unable to fix agfl wrapping.  "
+				"This may cause problems on next mount.");
+
 	/* Push the superblock and write an unmount record */
 	error = xfs_log_sbcount(mp);
 	if (error)



* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-02-23  2:00 ` [PATCH 5/5] xfs: fix agfl wrapping Darrick J. Wong
@ 2018-02-23  4:40   ` Darrick J. Wong
  2018-02-23 20:33   ` Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-02-23  4:40 UTC (permalink / raw)
  To: linux-xfs

On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>

Yay, no commit message!

"Fix the AGFL wrapping issues introduced with the v5 disk format and
semi-corrected later in 4.5.  In short, if we find an incorrectly
wrapped agfl at mount time, fix the list wrapping so that we can start
operations.  At unmount time, adjust all the agfls so that they don't
touch the controversial last element.  Skip all this fuss if the
filesystem has features that didn't exist prior to the agfl fixing in
4.5."
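
To make the mount-time check concrete, here is a minimal userspace sketch
of how an incorrectly wrapped list can be told apart from a good one.  It
is an illustration modelled on the off-by-one test in the patch, not the
kernel code itself.  The names agfl_active(), agfl_classify() and the enum
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum agfl_wrap { AGFL_OK, AGFL_SHORT_WRAP, AGFL_CORRUPT };

/* Count the slots spanned by [flfirst, fllast] in a circular list. */
static int32_t
agfl_active(uint32_t flfirst, uint32_t fllast, uint32_t agfl_size)
{
	int32_t		active = (int32_t)fllast - (int32_t)flfirst + 1;

	if (active <= 0)
		active += (int32_t)agfl_size;
	return active;
}

/*
 * Compare the spanned slots against the recorded count.  One extra slot
 * suggests the list was wrapped by a kernel that used the smaller size.
 */
static enum agfl_wrap
agfl_classify(uint32_t flfirst, uint32_t fllast, uint32_t flcount,
	      uint32_t agfl_size)
{
	int32_t		active;

	if (flcount == 0)
		return AGFL_OK;

	active = agfl_active(flfirst, fllast, agfl_size);
	if (active == (int32_t)flcount)
		return AGFL_OK;
	if (active == (int32_t)flcount + 1)
		return AGFL_SHORT_WRAP;
	return AGFL_CORRUPT;
}
```

For example, a four-entry list that an old kernel stored in slots 116, 117,
0 and 1 spans five slots when measured against a 119-slot AGFL, so
agfl_classify(116, 1, 4, 119) reports AGFL_SHORT_WRAP.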

--D

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile     |    1 
>  fs/xfs/xfs_fixups.c |  310 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_fixups.h |   26 ++++
>  fs/xfs/xfs_mount.c  |   21 +++
>  fs/xfs/xfs_super.c  |   10 ++
>  5 files changed, 367 insertions(+), 1 deletion(-)
>  create mode 100644 fs/xfs/xfs_fixups.c
>  create mode 100644 fs/xfs/xfs_fixups.h
> 
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index b03c77e..f88368a 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
>  				   xfs_extent_busy.o \
>  				   xfs_file.o \
>  				   xfs_filestream.o \
> +				   xfs_fixups.o \
>  				   xfs_fsmap.o \
>  				   xfs_fsops.o \
>  				   xfs_globals.o \
> diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> new file mode 100644
> index 0000000..0cad7bb
> --- /dev/null
> +++ b/fs/xfs/xfs_fixups.c
> @@ -0,0 +1,310 @@
> +/*
> + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_sb.h"
> +#include "xfs_mount.h"
> +#include "xfs_alloc.h"
> +#include "xfs_trans.h"
> +#include "xfs_fixups.h"
> +
> +/*
> + * v5 AGFL padding defects
> + *
> + * When the v5 format was first introduced, there was a defect in the struct
> + * xfs_agfl definition that resulted in XFS_AGFL_SIZE returning different
> + * values depending on the compiler padding.  On a fs with 512-byte sectors,
> + * this meant that XFS_AGFL_SIZE was 119 on i386, but 118 on x64.  Commit
> + * 96f859d52bcb1 ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is
> + * correct") changed the definition to disable padding the end of the
> + * structure, and was accepted into Linux 4.5.  Since then, the AGFL has
> + * always used the larger size (e.g. 119 entries on a 512b sector fs).
> + *
> + * Unfortunately, pre-4.5 kernels can produce filesystems with AGFLs that wrap
> + * at the smaller size, and those kernels are not prepared to handle the
> + * longer size.  This typically manifests itself as an AGF verifier corruption
> + * error followed by a filesystem shutdown.  While we encourage admins to stay
> + * current with software, we would like to avoid this intermittent breakage.
> + *
> + * Any v5 filesystem which has a feature bit set for a feature that was
> + * introduced after Linux 4.5 will not have this problem, as such filesystems
> + * cannot be mounted on older kernels.  v4 filesystems are also unaffected.
> + *
> + * Therefore, we add two fixup functions -- the first runs at mount time to
> + * detect a short-wrapped AGFL and fix it; the second runs at unmount, freeze,
> + * or remount-ro time to move a wrapped AGFL to the beginning of the list.
> + * This reduces the likelihood of a screwup to the scenario where you have (a)
> + * a filesystem with no post-4.5 features (reflink, rmap), (b) the AGFL wraps,
> + * (c) the filesystem goes down leaving a dirty log, and (d) the dirty
> + * filesystem is mounted on an old kernel.
> + */
> +
> +/*
> + * Decide if we need to have the agfl wrapping fixes applied.  This only
> + * affects v5 filesystems that do not have any features enabled that did not
> + * exist when the agfl padding fix went in.
> + *
> + * Features already present when the fix went in were finobt, ftype, spinodes.
> + * If we see something new (e.g. reflink) then don't bother.
> + */
> +#define XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED \
> +		(~(XFS_SB_FEAT_RO_COMPAT_FINOBT))
> +#define XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED \
> +		(~(XFS_SB_FEAT_INCOMPAT_FTYPE | \
> +		   XFS_SB_FEAT_INCOMPAT_SPINODES))
> +#define XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED \
> +		(~0)
> +static inline bool xfs_sb_version_needs_agfl_wrap_fixes(struct xfs_sb *sbp)
> +{
> +	return xfs_sb_version_hascrc(sbp) &&
> +		!xfs_sb_has_incompat_feature(sbp,
> +			XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> +		!xfs_sb_has_ro_compat_feature(sbp,
> +			XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> +		!xfs_sb_has_incompat_log_feature(sbp,
> +			XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED);
> +}
> +
> +/*
> + * Fix an AGFL wrapping that falls short of the end of the block by filling the
> + * gap at the end of the block.
> + */
> +STATIC int
> +xfs_fixup_freelist_wrap_mount(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agfbp,
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_agf		*agf;
> +	struct xfs_buf		*agflbp;
> +	__be32			*agfl_bno;
> +	xfs_agnumber_t		agno;
> +	uint32_t		agfl_size;
> +	uint32_t		flfirst;
> +	uint32_t		fllast;
> +	int32_t			active;
> +	int			offset;
> +	int			len;
> +	int			error;
> +
> +	if (pag->pagf_flcount == 0)
> +		return 0;
> +
> +	agfl_size = xfs_agfl_size(mp);
> +	agf = XFS_BUF_TO_AGF(agfbp);
> +	agno = be32_to_cpu(agf->agf_seqno);
> +	flfirst = be32_to_cpu(agf->agf_flfirst);
> +	fllast = be32_to_cpu(agf->agf_fllast);
> +
> +	/* Make sure we're either spot on or off by 1. */
> +	active = fllast - flfirst + 1;
> +	if (active <= 0)
> +		active += agfl_size;
> +	if (active == pag->pagf_flcount)
> +		return 0;
> +	else if (active != pag->pagf_flcount + 1)
> +		return -EFSCORRUPTED;
> +
> +	/* Would this have even passed muster on an old system? */
> +	if (flfirst >= agfl_size - 1 || fllast >= agfl_size - 1 ||
> +	    pag->pagf_flcount > agfl_size - 1)
> +		return -EFSCORRUPTED;
> +
> +	/*
> +	 * Convert a 40-byte-padded agfl into a 36-byte-padded AGFL.
> +	 * Therefore, we need to move the AGFL blocks
> +	 * bno[flfirst..agfl_size - 2] to bno[flfirst + 1...agfl_size - 1].
> +	 *
> +	 * Reusing the example above, if we had flfirst == 116, we need
> +	 * to move bno[116] and bno[117] into bno[117] and bno[118],
> +	 * respectively, and then increment flfirst.
> +	 */
> +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> +	if (error)
> +		return error;
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> +
> +	len = (agfl_size - flfirst - 1) * sizeof(xfs_agblock_t);
> +	memmove(&agfl_bno[flfirst + 1], &agfl_bno[flfirst], len);
> +	offset = (char *)&agfl_bno[flfirst + 1] - (char *)agflbp->b_addr;
> +	be32_add_cpu(&agf->agf_flfirst, 1);
> +
> +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> +	xfs_trans_brelse(tp, agflbp);
> +	agflbp = NULL;
> +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST);
> +
> +	return 0;
> +}
> +
> +/*
> + * Fix an AGFL that touches the end of the block by moving the first or last
> + * part of the list elsewhere in the AGFL so that old kernels don't trip over
> + * wrapping issues.
> + */
> +STATIC int
> +xfs_fixup_freelist_wrap_unmount(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agfbp,
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_agf		*agf;
> +	struct xfs_buf		*agflbp;
> +	__be32			*agfl_bno;
> +	xfs_agnumber_t		agno;
> +	uint32_t		agfl_size;
> +	uint32_t		flfirst;
> +	uint32_t		fllast;
> +	int			offset;
> +	int			len;
> +	int			error;
> +
> +	agfl_size = xfs_agfl_size(mp);
> +	agf = XFS_BUF_TO_AGF(agfbp);
> +	agno = be32_to_cpu(agf->agf_seqno);
> +	flfirst = be32_to_cpu(agf->agf_flfirst);
> +	fllast = be32_to_cpu(agf->agf_fllast);
> +
> +	/* Empty AGFL?  Make sure we aren't pointing at the end. */
> +	if (pag->pagf_flcount == 0) {
> +		if (flfirst >= agfl_size || fllast >= agfl_size) {
> +			agf->agf_flfirst = cpu_to_be32(1);
> +			agf->agf_fllast = 0;
> +			xfs_alloc_log_agf(tp, agfbp,
> +					XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> +		}
> +		return 0;
> +	}
> +
> +	/* If we don't hit the end, we're done. */
> +	if (flfirst < fllast && fllast != agfl_size - 1)
> +		return 0;
> +
> +	/*
> +	 * Move a start of a wrapped list towards the start of the agfl block.
> +	 * Therefore, we need to move the AGFL blocks
> +	 * bno[flfirst..agfl_size - 1] to bno[fllast + 1...agfl_size - flfirst].
> +	 * Then we reset flfirst and fllast appropriately.
> +	 *
> +	 * Reusing the example above, if we had flfirst == 117 and fllast == 4,
> +	 * we need to move bno[117] and bno[118] into bno[5] and bno[6],
> +	 * respectively, and then reset flfirst and fllast.
> +	 *
> +	 * If it's just the last block that touches the end, only move that.
> +	 */
> +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> +	if (error)
> +		return error;
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> +
> +	if (fllast == agfl_size - 1) {
> +		/* Back the AGFL off from the end of the block. */
> +		len = sizeof(xfs_agblock_t);
> +		agfl_bno[flfirst - 1] = agfl_bno[agfl_size - 1];
> +		offset = (char *)&agfl_bno[flfirst - 1] - (char *)agflbp->b_addr;
> +		be32_add_cpu(&agf->agf_fllast, -1);
> +		be32_add_cpu(&agf->agf_flfirst, -1);
> +	} else {
> +		/* Move the first part of the AGFL towards the front. */
> +		len = (agfl_size - flfirst) * sizeof(xfs_agblock_t);
> +		memcpy(&agfl_bno[fllast + 1], &agfl_bno[flfirst], len);
> +		offset = (char *)&agfl_bno[fllast + 1] - (char *)agflbp->b_addr;
> +		agf->agf_flfirst = 0;
> +		agf->agf_fllast = cpu_to_be32(pag->pagf_flcount - 1);
> +	}
> +
> +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> +	xfs_trans_brelse(tp, agflbp);
> +	agflbp = NULL;
> +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> +
> +	return 0;
> +}
> +
> +typedef int (*xfs_agf_apply_fn_t)(struct xfs_trans *tp, struct xfs_buf *agfbp,
> +		struct xfs_perag *pag);
> +
> +/* Apply something to every AGF. */
> +STATIC int
> +xfs_fixup_agf_apply(
> +	struct xfs_mount	*mp,
> +	xfs_agf_apply_fn_t	fn)
> +{
> +	struct xfs_trans	*tp;
> +	struct xfs_perag	*pag;
> +	struct xfs_buf		*agfbp;
> +	xfs_agnumber_t		agno;
> +	int			error;
> +
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, 0, 0, 0, &tp);
> +	if (error)
> +		return error;
> +
> +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> +		error = xfs_alloc_read_agf(mp, tp, agno, 0, &agfbp);
> +		if (error)
> +			goto cancel;
> +		if (!agfbp) {
> +			error = -ENOMEM;
> +			goto cancel;
> +		}
> +		pag = xfs_perag_get(mp, agno);
> +		error = fn(tp, agfbp, pag);
> +		xfs_perag_put(pag);
> +		xfs_trans_brelse(tp, agfbp);
> +		if (error)
> +			goto cancel;
> +	}
> +
> +	return xfs_trans_commit(tp);
> +cancel:
> +	xfs_trans_cancel(tp);
> +	return error;
> +}
> +
> +/* Fix AGFL wrapping so we can use the filesystem. */
> +int
> +xfs_fixup_agfl_wrap_mount(
> +	struct xfs_mount	*mp)
> +{
> +	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
> +		return 0;
> +
> +	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_mount);
> +}
> +
> +/* Fix AGFL wrapping so old kernels can use this filesystem. */
> +int
> +xfs_fixup_agfl_wrap_unmount(
> +	struct xfs_mount	*mp)
> +{
> +	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
> +		return 0;
> +
> +	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_unmount);
> +}
> diff --git a/fs/xfs/xfs_fixups.h b/fs/xfs/xfs_fixups.h
> new file mode 100644
> index 0000000..fb52a96
> --- /dev/null
> +++ b/fs/xfs/xfs_fixups.h
> @@ -0,0 +1,26 @@
> +/*
> + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef	__XFS_FIXUPS_H__
> +#define	__XFS_FIXUPS_H__
> +
> +int xfs_fixup_agfl_wrap_mount(struct xfs_mount *mp);
> +int xfs_fixup_agfl_wrap_unmount(struct xfs_mount *mp);
> +
> +#endif /* __XFS_FIXUPS_H__ */
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 98fd41c..eb284aa 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -46,7 +46,7 @@
>  #include "xfs_refcount_btree.h"
>  #include "xfs_reflink.h"
>  #include "xfs_extent_busy.h"
> -
> +#include "xfs_fixups.h"
>  
>  static DEFINE_MUTEX(xfs_uuid_table_mutex);
>  static int xfs_uuid_table_size;
> @@ -875,6 +875,16 @@ xfs_mountfs(
>  	}
>  
>  	/*
> +	 * Make sure our AGFL counters do not wrap the end of the block
> +	 * in a troublesome manner.
> +	 */
> +	error = xfs_fixup_agfl_wrap_mount(mp);
> +	if (error) {
> +		xfs_warn(mp, "Failed to fix agfl wrapping.  Run xfs_repair.");
> +		goto out_log_dealloc;
> +	}
> +
> +	/*
>  	 * Get and sanity-check the root inode.
>  	 * Save the pointer to it in the mount structure.
>  	 */
> @@ -1128,6 +1138,15 @@ xfs_unmountfs(
>  	xfs_qm_unmount(mp);
>  
>  	/*
> +	 * Make sure our AGFL counters do not wrap the end of the block
> +	 * in a troublesome manner for old kernels.
> +	 */
> +	error = xfs_fixup_agfl_wrap_unmount(mp);
> +	if (error)
> +		xfs_warn(mp, "Unable to fix agfl wrapping.  "
> +				"This may cause problems on next mount.");
> +
> +	/*
>  	 * Unreserve any blocks we have so that when we unmount we don't account
>  	 * the reserved free space as used. This is really only necessary for
>  	 * lazy superblock counting because it trusts the incore superblock
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 624a802..d9aa39a 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -50,6 +50,7 @@
>  #include "xfs_refcount_item.h"
>  #include "xfs_bmap_item.h"
>  #include "xfs_reflink.h"
> +#include "xfs_fixups.h"
>  
>  #include <linux/namei.h>
>  #include <linux/dax.h>
> @@ -1206,6 +1207,15 @@ xfs_quiesce_attr(
>  	xfs_reclaim_inodes(mp, 0);
>  	xfs_reclaim_inodes(mp, SYNC_WAIT);
>  
> +	/*
> +	 * Make sure our AGFL counters do not wrap the end of the block
> +	 * in a troublesome manner for old kernels.
> +	 */
> +	error = xfs_fixup_agfl_wrap_unmount(mp);
> +	if (error)
> +		xfs_warn(mp, "Unable to fix agfl wrapping.  "
> +				"This may cause problems on next mount.");
> +
>  	/* Push the superblock and write an unmount record */
>  	error = xfs_log_sbcount(mp);
>  	if (error)
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-02-23  2:00 ` [PATCH 5/5] xfs: fix agfl wrapping Darrick J. Wong
  2018-02-23  4:40   ` Darrick J. Wong
@ 2018-02-23 20:33   ` Darrick J. Wong
  2018-02-27 19:35   ` Brian Foster
  2018-03-01  6:42   ` [PATCH v2 " Darrick J. Wong
  3 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-02-23 20:33 UTC (permalink / raw)
  To: linux-xfs

On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile     |    1 
>  fs/xfs/xfs_fixups.c |  310 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_fixups.h |   26 ++++
>  fs/xfs/xfs_mount.c  |   21 +++
>  fs/xfs/xfs_super.c  |   10 ++
>  5 files changed, 367 insertions(+), 1 deletion(-)
>  create mode 100644 fs/xfs/xfs_fixups.c
>  create mode 100644 fs/xfs/xfs_fixups.h
> 
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index b03c77e..f88368a 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
>  				   xfs_extent_busy.o \
>  				   xfs_file.o \
>  				   xfs_filestream.o \
> +				   xfs_fixups.o \
>  				   xfs_fsmap.o \
>  				   xfs_fsops.o \
>  				   xfs_globals.o \
> diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> new file mode 100644
> index 0000000..0cad7bb
> --- /dev/null
> +++ b/fs/xfs/xfs_fixups.c
> @@ -0,0 +1,310 @@
> +/*
> + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_sb.h"
> +#include "xfs_mount.h"
> +#include "xfs_alloc.h"
> +#include "xfs_trans.h"
> +#include "xfs_fixups.h"
> +
> +/*
> + * v5 AGFL padding defects
> + *
> + * When the v5 format was first introduced, there was a defect in the struct
> + * xfs_agfl definition that resulted in XFS_AGFL_SIZE returning different
> + * values depending on the compiler padding.  On a fs with 512-byte sectors,
> + * this meant that XFS_AGFL_SIZE was 119 on i386, but 118 on x64.  Commit
> + * 96f859d52bcb1 ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is
> + * correct") changed the definition to disable padding the end of the
> + * structure, and was accepted into Linux 4.5.  Since then, the AGFL has
> + * always used the larger size (e.g. 119 entries on a 512b sector fs).
> + *
> + * Unfortunately, pre-4.5 kernels can produce filesystems with AGFLs that wrap
> + * at the smaller size, and those kernels are not prepared to handle the
> + * longer size.  This typically manifests itself as an AGF verifier corruption
> + * error followed by a filesystem shutdown.  While we encourage admins to stay
> + * current with software, we would like to avoid this intermittent breakage.
> + *
> + * Any v5 filesystem which has a feature bit set for a feature that was
> + * introduced after Linux 4.5 will not have this problem, as such filesystems
> + * cannot be mounted on older kernels.  v4 filesystems are also unaffected.
> + *
> + * Therefore, we add two fixup functions -- the first runs at mount time to
> + * detect a short-wrapped AGFL and fix it; the second runs at unmount, freeze,
> + * or remount-ro time to move a wrapped AGFL to the beginning of the list.
> + * This reduces the likelihood of a screwup to the scenario where you have (a)
> + * a filesystem with no post-4.5 features (reflink, rmap), (b) the AGFL wraps,
> + * (c) the filesystem goes down leaving a dirty log, and (d) the dirty
> + * filesystem is mounted on an old kernel.
> + */
> +
> +/*
> + * Decide if we need to have the agfl wrapping fixes applied.  This only
> + * affects v5 filesystems that do not have any features enabled that did not
> + * exist when the agfl padding fix went in.
> + *
> + * Features already present when the fix went in were finobt, ftype, spinodes.
> + * If we see something new (e.g. reflink) then don't bother.
> + */
> +#define XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED \
> +		(~(XFS_SB_FEAT_RO_COMPAT_FINOBT))
> +#define XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED \
> +		(~(XFS_SB_FEAT_INCOMPAT_FTYPE | \
> +		   XFS_SB_FEAT_INCOMPAT_SPINODES))
> +#define XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED \
> +		(~0)
> +static inline bool xfs_sb_version_needs_agfl_wrap_fixes(struct xfs_sb *sbp)
> +{
> +	return xfs_sb_version_hascrc(sbp) &&
> +		!xfs_sb_has_incompat_feature(sbp,
> +			XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> +		!xfs_sb_has_ro_compat_feature(sbp,
> +			XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> +		!xfs_sb_has_incompat_log_feature(sbp,
> +			XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED);
> +}
> +
> +/*
> + * Fix an AGFL wrapping that falls short of the end of the block by filling the
> + * gap at the end of the block.
> + */
> +STATIC int
> +xfs_fixup_freelist_wrap_mount(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agfbp,
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_agf		*agf;
> +	struct xfs_buf		*agflbp;
> +	__be32			*agfl_bno;
> +	xfs_agnumber_t		agno;
> +	uint32_t		agfl_size;
> +	uint32_t		flfirst;
> +	uint32_t		fllast;
> +	int32_t			active;
> +	int			offset;
> +	int			len;
> +	int			error;
> +
> +	if (pag->pagf_flcount == 0)
> +		return 0;
> +
> +	agfl_size = xfs_agfl_size(mp);
> +	agf = XFS_BUF_TO_AGF(agfbp);
> +	agno = be32_to_cpu(agf->agf_seqno);
> +	flfirst = be32_to_cpu(agf->agf_flfirst);
> +	fllast = be32_to_cpu(agf->agf_fllast);
> +
> +	/* Make sure we're either spot on or off by 1. */
> +	active = fllast - flfirst + 1;
> +	if (active <= 0)
> +		active += agfl_size;
> +	if (active == pag->pagf_flcount)
> +		return 0;
> +	else if (active != pag->pagf_flcount + 1)
> +		return -EFSCORRUPTED;
> +
> +	/* Would this have even passed muster on an old system? */
> +	if (flfirst >= agfl_size - 1 || fllast >= agfl_size - 1 ||
> +	    pag->pagf_flcount > agfl_size - 1)
> +		return -EFSCORRUPTED;
> +
> +	/*
> +	 * Convert a 40-byte-padded agfl into a 36-byte-padded AGFL.
> +	 * Therefore, we need to move the AGFL blocks
> +	 * bno[flfirst..agfl_size - 2] to bno[flfirst + 1...agfl_size - 1].
> +	 *
> +	 * Reusing the example above, if we had flfirst == 116, we need
> +	 * to move bno[116] and bno[117] into bno[117] and bno[118],
> +	 * respectively, and then increment flfirst.
> +	 */
> +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> +	if (error)
> +		return error;
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> +
> +	len = (agfl_size - flfirst - 1) * sizeof(xfs_agblock_t);
> +	memmove(&agfl_bno[flfirst + 1], &agfl_bno[flfirst], len);
> +	offset = (char *)&agfl_bno[flfirst + 1] - (char *)agflbp->b_addr;
> +	be32_add_cpu(&agf->agf_flfirst, 1);
> +
> +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> +	xfs_trans_brelse(tp, agflbp);
> +	agflbp = NULL;
> +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST);
> +
> +	return 0;
> +}
> +
> +/*
> + * Fix an AGFL that touches the end of the block by moving the first or last
> + * part of the list elsewhere in the AGFL so that old kernels don't trip over
> + * wrapping issues.
> + */
> +STATIC int
> +xfs_fixup_freelist_wrap_unmount(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agfbp,
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_agf		*agf;
> +	struct xfs_buf		*agflbp;
> +	__be32			*agfl_bno;
> +	xfs_agnumber_t		agno;
> +	uint32_t		agfl_size;
> +	uint32_t		flfirst;
> +	uint32_t		fllast;
> +	int			offset;
> +	int			len;
> +	int			error;
> +
> +	agfl_size = xfs_agfl_size(mp);
> +	agf = XFS_BUF_TO_AGF(agfbp);
> +	agno = be32_to_cpu(agf->agf_seqno);
> +	flfirst = be32_to_cpu(agf->agf_flfirst);
> +	fllast = be32_to_cpu(agf->agf_fllast);
> +
> +	/* Empty AGFL?  Make sure we aren't pointing at the end. */
> +	if (pag->pagf_flcount == 0) {
> +		if (flfirst >= agfl_size || fllast >= agfl_size) {
> +			agf->agf_flfirst = cpu_to_be32(1);
> +			agf->agf_fllast = 0;
> +			xfs_alloc_log_agf(tp, agfbp,
> +					XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> +		}
> +		return 0;
> +	}
> +
> +	/* If we don't hit the end, we're done. */
> +	if (flfirst < fllast && fllast != agfl_size - 1)
> +		return 0;
> +
> +	/*
> +	 * Move a start of a wrapped list towards the start of the agfl block.
> +	 * Therefore, we need to move the AGFL blocks
> +	 * bno[flfirst..agfl_size - 1] to bno[fllast + 1...agfl_size - flfirst].
> +	 * Then we reset flfirst and fllast appropriately.
> +	 *
> +	 * Reusing the example above, if we had flfirst == 117 and fllast == 4,
> +	 * we need to move bno[117] and bno[118] into bno[5] and bno[6],
> +	 * respectively, and then reset flfirst and fllast.
> +	 *
> +	 * If it's just the last block that touches the end, only move that.
> +	 */
> +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> +	if (error)
> +		return error;
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> +
> +	if (fllast == agfl_size - 1) {
> +		/* Back the AGFL off from the end of the block. */
> +		len = sizeof(xfs_agblock_t);
> +		agfl_bno[flfirst - 1] = agfl_bno[agfl_size - 1];
> +		offset = (char *)&agfl_bno[flfirst - 1] - (char *)agflbp->b_addr;
> +		be32_add_cpu(&agf->agf_fllast, -1);
> +		be32_add_cpu(&agf->agf_flfirst, -1);
> +	} else {
> +		/* Move the first part of the AGFL towards the front. */
> +		len = (agfl_size - flfirst) * sizeof(xfs_agblock_t);
> +		memcpy(&agfl_bno[fllast + 1], &agfl_bno[flfirst], len);
> +		offset = (char *)&agfl_bno[fllast + 1] - (char *)agflbp->b_addr;
> +		agf->agf_flfirst = 0;
> +		agf->agf_fllast = cpu_to_be32(pag->pagf_flcount - 1);
> +	}
> +
> +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> +	xfs_trans_brelse(tp, agflbp);
> +	agflbp = NULL;
> +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> +
> +	return 0;
> +}
> +
> +typedef int (*xfs_agf_apply_fn_t)(struct xfs_trans *tp, struct xfs_buf *agfbp,
> +		struct xfs_perag *pag);
> +
> +/* Apply something to every AGF. */
> +STATIC int
> +xfs_fixup_agf_apply(
> +	struct xfs_mount	*mp,
> +	xfs_agf_apply_fn_t	fn)
> +{
> +	struct xfs_trans	*tp;
> +	struct xfs_perag	*pag;
> +	struct xfs_buf		*agfbp;
> +	xfs_agnumber_t		agno;
> +	int			error;
> +
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, 0, 0, 0, &tp);

This can get called when we're in freeze context, so I think this needs
to be:

error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, 0, 0,
		XFS_TRANS_NO_WRITECOUNT, &tp);

I saw xfs/119 cough up an error about locking problems and deadlock.

--D

> +	if (error)
> +		return error;
> +
> +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> +		error = xfs_alloc_read_agf(mp, tp, agno, 0, &agfbp);
> +		if (error)
> +			goto cancel;
> +		if (!agfbp) {
> +			error = -ENOMEM;
> +			goto cancel;
> +		}
> +		pag = xfs_perag_get(mp, agno);
> +		error = fn(tp, agfbp, pag);
> +		xfs_perag_put(pag);
> +		xfs_trans_brelse(tp, agfbp);
> +		if (error)
> +			goto cancel;
> +	}
> +
> +	return xfs_trans_commit(tp);
> +cancel:
> +	xfs_trans_cancel(tp);
> +	return error;
> +}
> +
> +/* Fix AGFL wrapping so we can use the filesystem. */
> +int
> +xfs_fixup_agfl_wrap_mount(
> +	struct xfs_mount	*mp)
> +{
> +	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
> +		return 0;
> +
> +	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_mount);
> +}
> +
> +/* Fix AGFL wrapping so old kernels can use this filesystem. */
> +int
> +xfs_fixup_agfl_wrap_unmount(
> +	struct xfs_mount	*mp)
> +{
> +	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
> +		return 0;
> +
> +	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_unmount);
> +}
> diff --git a/fs/xfs/xfs_fixups.h b/fs/xfs/xfs_fixups.h
> new file mode 100644
> index 0000000..fb52a96
> --- /dev/null
> +++ b/fs/xfs/xfs_fixups.h
> @@ -0,0 +1,26 @@
> +/*
> + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef	__XFS_FIXUPS_H__
> +#define	__XFS_FIXUPS_H__
> +
> +int xfs_fixup_agfl_wrap_mount(struct xfs_mount *mp);
> +int xfs_fixup_agfl_wrap_unmount(struct xfs_mount *mp);
> +
> +#endif /* __XFS_FIXUPS_H__ */
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 98fd41c..eb284aa 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -46,7 +46,7 @@
>  #include "xfs_refcount_btree.h"
>  #include "xfs_reflink.h"
>  #include "xfs_extent_busy.h"
> -
> +#include "xfs_fixups.h"
>  
>  static DEFINE_MUTEX(xfs_uuid_table_mutex);
>  static int xfs_uuid_table_size;
> @@ -875,6 +875,16 @@ xfs_mountfs(
>  	}
>  
>  	/*
> +	 * Make sure our AGFL counters do not wrap the end of the block
> +	 * in a troublesome manner.
> +	 */
> +	error = xfs_fixup_agfl_wrap_mount(mp);
> +	if (error) {
> +		xfs_warn(mp, "Failed to fix agfl wrapping.  Run xfs_repair.");
> +		goto out_log_dealloc;
> +	}
> +
> +	/*
>  	 * Get and sanity-check the root inode.
>  	 * Save the pointer to it in the mount structure.
>  	 */
> @@ -1128,6 +1138,15 @@ xfs_unmountfs(
>  	xfs_qm_unmount(mp);
>  
>  	/*
> +	 * Make sure our AGFL counters do not wrap the end of the block
> +	 * in a troublesome manner for old kernels.
> +	 */
> +	error = xfs_fixup_agfl_wrap_unmount(mp);
> +	if (error)
> +		xfs_warn(mp, "Unable to fix agfl wrapping.  "
> +				"This may cause problems on next mount.");
> +
> +	/*
>  	 * Unreserve any blocks we have so that when we unmount we don't account
>  	 * the reserved free space as used. This is really only necessary for
>  	 * lazy superblock counting because it trusts the incore superblock
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 624a802..d9aa39a 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -50,6 +50,7 @@
>  #include "xfs_refcount_item.h"
>  #include "xfs_bmap_item.h"
>  #include "xfs_reflink.h"
> +#include "xfs_fixups.h"
>  
>  #include <linux/namei.h>
>  #include <linux/dax.h>
> @@ -1206,6 +1207,15 @@ xfs_quiesce_attr(
>  	xfs_reclaim_inodes(mp, 0);
>  	xfs_reclaim_inodes(mp, SYNC_WAIT);
>  
> +	/*
> +	 * Make sure our AGFL counters do not wrap the end of the block
> +	 * in a troublesome manner for old kernels.
> +	 */
> +	error = xfs_fixup_agfl_wrap_unmount(mp);
> +	if (error)
> +		xfs_warn(mp, "Unable to fix agfl wrapping.  "
> +				"This may cause problems on next mount.");
> +
>  	/* Push the superblock and write an unmount record */
>  	error = xfs_log_sbcount(mp);
>  	if (error)
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/5] xfs: don't iunlock the quota ip when quota block allocation fails
  2018-02-23  1:59 ` [PATCH 1/5] xfs: don't iunlock the quota ip when quota block allocation fails Darrick J. Wong
@ 2018-02-27 13:55   ` Brian Foster
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Foster @ 2018-02-27 13:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Feb 22, 2018 at 05:59:48PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In xfs_qm_dqalloc, we join the locked quota inode to the transaction we
> use to allocate blocks.  If the allocation or mapping fails, we're not
> allowed to unlock the inode because the transaction code is in charge of
> unlocking it for us.  Therefore, remove the iunlock call to avoid
> blowing asserts about unbalanced locking + mount hang.
> 
> Found by corrupting the AGF and allocating space in the filesystem
> (quotacheck) immediately after mount.  The upcoming agfl wrapping fixup
> test will trigger this scenario.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_dquot.c |    2 --
>  1 file changed, 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index 43572f8..2410acc 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -394,8 +394,6 @@ xfs_qm_dqalloc(
>  error1:
>  	xfs_defer_cancel(&dfops);
>  error0:
> -	xfs_iunlock(quotip, XFS_ILOCK_EXCL);
> -
>  	return error;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/5] xfs: convert a few more directory asserts to corruption returns
  2018-02-23  1:59 ` [PATCH 2/5] xfs: convert a few more directory asserts to corruption returns Darrick J. Wong
@ 2018-02-27 13:55   ` Brian Foster
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Foster @ 2018-02-27 13:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Feb 22, 2018 at 05:59:54PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Yet another round of playing whack-a-mole with directory code that
> asserts on corrupt on-disk metadata when it really should be returning
> -EFSCORRUPTED instead of ASSERTing.  Found by a xfs/391 crash while
> lastbit fuzzing of ltail.bestcount.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_dir2_leaf.c |    3 ++-
>  fs/xfs/libxfs/xfs_dir2_node.c |    5 +++--
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
> index d7e630f..d61d52d 100644
> --- a/fs/xfs/libxfs/xfs_dir2_leaf.c
> +++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
> @@ -1415,7 +1415,8 @@ xfs_dir2_leaf_removename(
>  	oldbest = be16_to_cpu(bf[0].length);
>  	ltp = xfs_dir2_leaf_tail_p(args->geo, leaf);
>  	bestsp = xfs_dir2_leaf_bests_p(ltp);
> -	ASSERT(be16_to_cpu(bestsp[db]) == oldbest);
> +	if (be16_to_cpu(bestsp[db]) != oldbest)
> +		return -EFSCORRUPTED;
>  	/*
>  	 * Mark the former data entry unused.
>  	 */
> diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
> index 239d97a..0839ffe 100644
> --- a/fs/xfs/libxfs/xfs_dir2_node.c
> +++ b/fs/xfs/libxfs/xfs_dir2_node.c
> @@ -387,8 +387,9 @@ xfs_dir2_leaf_to_node(
>  	dp->d_ops->free_hdr_from_disk(&freehdr, free);
>  	leaf = lbp->b_addr;
>  	ltp = xfs_dir2_leaf_tail_p(args->geo, leaf);
> -	ASSERT(be32_to_cpu(ltp->bestcount) <=
> -				(uint)dp->i_d.di_size / args->geo->blksize);
> +	if (be32_to_cpu(ltp->bestcount) >
> +				(uint)dp->i_d.di_size / args->geo->blksize)
> +		return -EFSCORRUPTED;
>  
>  	/*
>  	 * Copy freespace entries from the leaf block to the new block.
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] xfs: check for cow blocks before trying to clear them during inode reclaim
  2018-02-23  2:00 ` [PATCH 3/5] xfs: check for cow blocks before trying to clear them during inode reclaim Darrick J. Wong
@ 2018-02-27 13:55   ` Brian Foster
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Foster @ 2018-02-27 13:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Feb 22, 2018 at 06:00:00PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> There's no point in allocating a transaction and locking the inode in
> preparation to clear cow blocks if there aren't actually any cow fork
> extents.  Therefore, move the xfs_reflink_cancel_cow_range hunk to
> xfs_inactive and check the cow ifp first.  This makes inode reclamation
> run faster.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_inode.c |    5 +++++
>  fs/xfs/xfs_super.c |    9 ---------
>  2 files changed, 5 insertions(+), 9 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 604ee38..50fbbf5 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1872,6 +1872,7 @@ xfs_inactive(
>  	xfs_inode_t	*ip)
>  {
>  	struct xfs_mount	*mp;
> +	struct xfs_ifork	*cow_ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
>  	int			error;
>  	int			truncate = 0;
>  
> @@ -1892,6 +1893,10 @@ xfs_inactive(
>  	if (mp->m_flags & XFS_MOUNT_RDONLY)
>  		return;
>  
> +	/* Try to clean out the cow blocks if there are any. */
> +	if (xfs_is_reflink_inode(ip) && cow_ifp->if_bytes > 0)
> +		xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
> +
>  	if (VFS_I(ip)->i_nlink != 0) {
>  		/*
>  		 * force is true because we are evicting an inode from the
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 7aba628..624a802 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -970,7 +970,6 @@ xfs_fs_destroy_inode(
>  	struct inode		*inode)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
> -	int			error;
>  
>  	trace_xfs_destroy_inode(ip);
>  
> @@ -978,14 +977,6 @@ xfs_fs_destroy_inode(
>  	XFS_STATS_INC(ip->i_mount, vn_rele);
>  	XFS_STATS_INC(ip->i_mount, vn_remove);
>  
> -	if (xfs_is_reflink_inode(ip)) {
> -		error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
> -		if (error && !XFS_FORCED_SHUTDOWN(ip->i_mount))
> -			xfs_warn(ip->i_mount,
> -"Error %d while evicting CoW blocks for inode %llu.",
> -					error, ip->i_ino);
> -	}
> -
>  	xfs_inactive(ip);
>  
>  	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0);
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/5] xfs: convert XFS_AGFL_SIZE to a helper function
  2018-02-23  2:00 ` [PATCH 4/5] xfs: convert XFS_AGFL_SIZE to a helper function Darrick J. Wong
@ 2018-02-27 19:34   ` Brian Foster
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Foster @ 2018-02-27 19:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Dave Chinner

On Thu, Feb 22, 2018 at 06:00:06PM -0800, Darrick J. Wong wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The AGFL size calculation is about to get more complex, so let's turn
> the macro into a function first and remove the macro.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> [darrick: forward port to newer kernel, simplify the helper]
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_alloc.c  |   31 ++++++++++++++++++++++++-------
>  fs/xfs/libxfs/xfs_alloc.h  |    2 ++
>  fs/xfs/libxfs/xfs_format.h |   13 +------------
>  fs/xfs/scrub/agheader.c    |    6 +++---
>  fs/xfs/xfs_fsops.c         |    2 +-
>  5 files changed, 31 insertions(+), 23 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index c02781a..36101e5 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -53,6 +53,23 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
>  STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
>  		xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
>  
> +/*
> + * Size of the AGFL.  For CRC-enabled filesystems we steal a couple of slots in
> + * the beginning of the block for a proper header with the location information
> + * and CRC.
> + */
> +unsigned int
> +xfs_agfl_size(
> +	struct xfs_mount	*mp)
> +{
> +	unsigned int		size = mp->m_sb.sb_sectsize;
> +
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		size -= sizeof(struct xfs_agfl);
> +
> +	return size / sizeof(xfs_agblock_t);
> +}
> +
>  unsigned int
>  xfs_refc_block(
>  	struct xfs_mount	*mp)
> @@ -550,7 +567,7 @@ xfs_agfl_verify(
>  	if (bp->b_pag && be32_to_cpu(agfl->agfl_seqno) != bp->b_pag->pag_agno)
>  		return __this_address;
>  
> -	for (i = 0; i < XFS_AGFL_SIZE(mp); i++) {
> +	for (i = 0; i < xfs_agfl_size(mp); i++) {
>  		if (be32_to_cpu(agfl->agfl_bno[i]) != NULLAGBLOCK &&
>  		    be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
>  			return __this_address;
> @@ -2266,7 +2283,7 @@ xfs_alloc_get_freelist(
>  	bno = be32_to_cpu(agfl_bno[be32_to_cpu(agf->agf_flfirst)]);
>  	be32_add_cpu(&agf->agf_flfirst, 1);
>  	xfs_trans_brelse(tp, agflbp);
> -	if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp))
> +	if (be32_to_cpu(agf->agf_flfirst) == xfs_agfl_size(mp))
>  		agf->agf_flfirst = 0;
>  
>  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> @@ -2377,7 +2394,7 @@ xfs_alloc_put_freelist(
>  			be32_to_cpu(agf->agf_seqno), &agflbp)))
>  		return error;
>  	be32_add_cpu(&agf->agf_fllast, 1);
> -	if (be32_to_cpu(agf->agf_fllast) == XFS_AGFL_SIZE(mp))
> +	if (be32_to_cpu(agf->agf_fllast) == xfs_agfl_size(mp))
>  		agf->agf_fllast = 0;
>  
>  	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> @@ -2395,7 +2412,7 @@ xfs_alloc_put_freelist(
>  
>  	xfs_alloc_log_agf(tp, agbp, logflags);
>  
> -	ASSERT(be32_to_cpu(agf->agf_flcount) <= XFS_AGFL_SIZE(mp));
> +	ASSERT(be32_to_cpu(agf->agf_flcount) <= xfs_agfl_size(mp));
>  
>  	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
>  	blockp = &agfl_bno[be32_to_cpu(agf->agf_fllast)];
> @@ -2428,9 +2445,9 @@ xfs_agf_verify(
>  	if (!(agf->agf_magicnum == cpu_to_be32(XFS_AGF_MAGIC) &&
>  	      XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum)) &&
>  	      be32_to_cpu(agf->agf_freeblks) <= be32_to_cpu(agf->agf_length) &&
> -	      be32_to_cpu(agf->agf_flfirst) < XFS_AGFL_SIZE(mp) &&
> -	      be32_to_cpu(agf->agf_fllast) < XFS_AGFL_SIZE(mp) &&
> -	      be32_to_cpu(agf->agf_flcount) <= XFS_AGFL_SIZE(mp)))
> +	      be32_to_cpu(agf->agf_flfirst) < xfs_agfl_size(mp) &&
> +	      be32_to_cpu(agf->agf_fllast) < xfs_agfl_size(mp) &&
> +	      be32_to_cpu(agf->agf_flcount) <= xfs_agfl_size(mp)))
>  		return __this_address;
>  
>  	if (be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]) < 1 ||
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index 65a0caf..a311a24 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -26,6 +26,8 @@ struct xfs_trans;
>  
>  extern struct workqueue_struct *xfs_alloc_wq;
>  
> +unsigned int xfs_agfl_size(struct xfs_mount *mp);
> +
>  /*
>   * Freespace allocation types.  Argument to xfs_alloc_[v]extent.
>   */
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 1acb584..42956d8 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -803,24 +803,13 @@ typedef struct xfs_agi {
>  		&(XFS_BUF_TO_AGFL(bp)->agfl_bno[0]) : \
>  		(__be32 *)(bp)->b_addr)
>  
> -/*
> - * Size of the AGFL.  For CRC-enabled filesystes we steal a couple of
> - * slots in the beginning of the block for a proper header with the
> - * location information and CRC.
> - */
> -#define XFS_AGFL_SIZE(mp) \
> -	(((mp)->m_sb.sb_sectsize - \
> -	 (xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
> -		sizeof(struct xfs_agfl) : 0)) / \
> -	  sizeof(xfs_agblock_t))
> -
>  typedef struct xfs_agfl {
>  	__be32		agfl_magicnum;
>  	__be32		agfl_seqno;
>  	uuid_t		agfl_uuid;
>  	__be64		agfl_lsn;
>  	__be32		agfl_crc;
> -	__be32		agfl_bno[];	/* actually XFS_AGFL_SIZE(mp) */
> +	__be32		agfl_bno[];	/* actually xfs_agfl_size(mp) */
>  } __attribute__((packed)) xfs_agfl_t;
>  
>  #define XFS_AGFL_CRC_OFF	offsetof(struct xfs_agfl, agfl_crc)
> diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
> index fd97552..fd383c5 100644
> --- a/fs/xfs/scrub/agheader.c
> +++ b/fs/xfs/scrub/agheader.c
> @@ -80,7 +80,7 @@ xfs_scrub_walk_agfl(
>  	}
>  
>  	/* first to the end */
> -	for (i = flfirst; i < XFS_AGFL_SIZE(mp); i++) {
> +	for (i = flfirst; i < xfs_agfl_size(mp); i++) {
>  		error = fn(sc, be32_to_cpu(agfl_bno[i]), priv);
>  		if (error)
>  			return error;
> @@ -664,7 +664,7 @@ xfs_scrub_agf(
>  	if (agfl_last > agfl_first)
>  		fl_count = agfl_last - agfl_first + 1;
>  	else
> -		fl_count = XFS_AGFL_SIZE(mp) - agfl_first + agfl_last + 1;
> +		fl_count = xfs_agfl_size(mp) - agfl_first + agfl_last + 1;
>  	if (agfl_count != 0 && fl_count != agfl_count)
>  		xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
>  
> @@ -791,7 +791,7 @@ xfs_scrub_agfl(
>  	/* Allocate buffer to ensure uniqueness of AGFL entries. */
>  	agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
>  	agflcount = be32_to_cpu(agf->agf_flcount);
> -	if (agflcount > XFS_AGFL_SIZE(sc->mp)) {
> +	if (agflcount > xfs_agfl_size(sc->mp)) {
>  		xfs_scrub_block_set_corrupt(sc, sc->sa.agf_bp);
>  		goto out;
>  	}
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 8b45456..5237927 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -217,7 +217,7 @@ xfs_growfs_data_private(
>  		}
>  
>  		agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, bp);
> -		for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
> +		for (bucket = 0; bucket < xfs_agfl_size(mp); bucket++)
>  			agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
>  
>  		error = xfs_bwrite(bp);
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-02-23  2:00 ` [PATCH 5/5] xfs: fix agfl wrapping Darrick J. Wong
  2018-02-23  4:40   ` Darrick J. Wong
  2018-02-23 20:33   ` Darrick J. Wong
@ 2018-02-27 19:35   ` Brian Foster
  2018-02-27 21:03     ` Darrick J. Wong
  2018-03-01  6:37     ` Darrick J. Wong
  2018-03-01  6:42   ` [PATCH v2 " Darrick J. Wong
  3 siblings, 2 replies; 26+ messages in thread
From: Brian Foster @ 2018-02-27 19:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile     |    1 
>  fs/xfs/xfs_fixups.c |  310 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_fixups.h |   26 ++++
>  fs/xfs/xfs_mount.c  |   21 +++
>  fs/xfs/xfs_super.c  |   10 ++
>  5 files changed, 367 insertions(+), 1 deletion(-)
>  create mode 100644 fs/xfs/xfs_fixups.c
>  create mode 100644 fs/xfs/xfs_fixups.h
> 
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index b03c77e..f88368a 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
>  				   xfs_extent_busy.o \
>  				   xfs_file.o \
>  				   xfs_filestream.o \
> +				   xfs_fixups.o \
>  				   xfs_fsmap.o \
>  				   xfs_fsops.o \
>  				   xfs_globals.o \
> diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> new file mode 100644
> index 0000000..0cad7bb
> --- /dev/null
> +++ b/fs/xfs/xfs_fixups.c
> @@ -0,0 +1,310 @@
> +/*
> + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_sb.h"
> +#include "xfs_mount.h"
> +#include "xfs_alloc.h"
> +#include "xfs_trans.h"
> +#include "xfs_fixups.h"
> +
> +/*
> + * v5 AGFL padding defects
> + *
> + * When the v5 format was first introduced, there was a defect in the struct
> + * xfs_agfl definition that resulted in XFS_AGFL_SIZE returning different

XFS_AGFL_SIZE() no longer exists as of the previous patch. It might be
better to refer to "the agfl size" generically.

> + * values depending on the compiler padding.  On a fs with 512-byte sectors,
> + * this meant that XFS_AGFL_SIZE was 119 on i386, but 118 on x64.  Commit
> + * 96f859d52bcb1 ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is
> + * correct") changed the definition to disable padding the end of the
> + * structure, and was accepted into Linux 4.5.  Since then, the AGFL has
> + * always used the larger size (e.g. 119 entries on a 512b sector fs).
> + *
> + * Unfortunately, pre-4.5 kernels can produce filesystems with AGFLs that wrap
> + * at the smaller size, and those kernels are not prepared to handle the
> + * longer size.  This typically manifests itself as an AGF verifier corruption
> + * error followed by a filesystem shutdown.  While we encourage admins to stay
> + * current with software, we would like to avoid this intermittent breakage.
> + *
> + * Any v5 filesystem which has a feature bit set for a feature that was
> + * introduced after Linux 4.5 will not have this problem, as such filesystems
> + * cannot be mounted on older kernels.  v4 filesystems are also unaffected.
> + *
> + * Therefore, we add two fixup functions -- the first runs at mount time to
> + * detect a short-wrapped AGFL and fix it; the second runs at unmount, freeze,
> + * or remount-ro time to move a wrapped AGFL to the beginning of the list.
> + * This reduces the likelihood of a screwup to the scenario where you have (a)
> + * a filesystem with no post-4.5 features (reflink, rmap), (b) the AGFL wraps,
> + * (c) the filesystem goes down leaving a dirty log, and (d) the dirty
> + * filesystem is mounted on an old kernel.
> + */
> +

While the mount vs. unmount time fixups serve different purposes and
have different conditions, do we really need two separate fixup
implementations? E.g., if we have to unwrap the AGFL at unmount time,
why not just reuse that fixup at mount time (when needed) as well? I
suspect the difference would only be the length of what we can consider
valid in the agfl at the time.

> +/*
> + * Decide if we need to have the agfl wrapping fixes applied.  This only
> + * affects v5 filesystems that do not have any features enabled that did not
> + * exist when the agfl padding fix went in.
> + *
> + * Features already present when the fix went in were finobt, ftype, spinodes.
> + * If we see something new (e.g. reflink) then don't bother.
> + */
> +#define XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED \
> +		(~(XFS_SB_FEAT_RO_COMPAT_FINOBT))
> +#define XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED \
> +		(~(XFS_SB_FEAT_INCOMPAT_FTYPE | \
> +		   XFS_SB_FEAT_INCOMPAT_SPINODES))
> +#define XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED \
> +		(~0)
> +static inline bool xfs_sb_version_needs_agfl_wrap_fixes(struct xfs_sb *sbp)
> +{
> +	return xfs_sb_version_hascrc(sbp) &&
> +		!xfs_sb_has_incompat_feature(sbp,
> +			XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> +		!xfs_sb_has_ro_compat_feature(sbp,
> +			XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> +		!xfs_sb_has_incompat_log_feature(sbp,
> +			XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED);
> +}
> +
> +/*
> + * Fix an AGFL wrapping that falls short of the end of the block by filling the
> + * gap at the end of the block.
> + */
> +STATIC int
> +xfs_fixup_freelist_wrap_mount(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agfbp,
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_agf		*agf;
> +	struct xfs_buf		*agflbp;
> +	__be32			*agfl_bno;
> +	xfs_agnumber_t		agno;
> +	uint32_t		agfl_size;
> +	uint32_t		flfirst;
> +	uint32_t		fllast;
> +	int32_t			active;
> +	int			offset;
> +	int			len;
> +	int			error;
> +
> +	if (pag->pagf_flcount == 0)
> +		return 0;
> +
> +	agfl_size = xfs_agfl_size(mp);
> +	agf = XFS_BUF_TO_AGF(agfbp);
> +	agno = be32_to_cpu(agf->agf_seqno);
> +	flfirst = be32_to_cpu(agf->agf_flfirst);
> +	fllast = be32_to_cpu(agf->agf_fllast);
> +
> +	/* Make sure we're either spot on or off by 1. */
> +	active = fllast - flfirst + 1;
> +	if (active <= 0)
> +		active += agfl_size;
> +	if (active == pag->pagf_flcount)
> +		return 0;
> +	else if (active != pag->pagf_flcount + 1)
> +		return -EFSCORRUPTED;
> +

So we're not attempting to cover the case where the agfl has 1 more
block than the agfl size (i.e., the case where an fs goes back to a
kernel with an unpacked header)?

I'm wondering whether using the unmount algorithm in both places (as
noted above) would also facilitate this mechanism to work both ways
(unpacked <-> packed), provided particular conditions are met. I suppose
that could be considered regardless, but the less variance the better
IMO.
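
For reference, the "spot on or off by 1" check in the hunk above can be
modelled in userspace. This is only a sketch under assumptions: the
function name agfl_active() is made up for illustration, and the sizes
(119 packed, 118 unpacked) are the 512-byte-sector values quoted in the
patch comment.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): count the slots between
 * flfirst and fllast inclusive, wrapping at agfl_size.  This mirrors
 * the "active" computation in xfs_fixup_freelist_wrap_mount().
 */
static int32_t agfl_active(uint32_t flfirst, uint32_t fllast,
			   uint32_t agfl_size)
{
	int32_t active = (int32_t)fllast - (int32_t)flfirst + 1;

	/* A wrapped list gives a non-positive difference. */
	if (active <= 0)
		active += (int32_t)agfl_size;
	return active;
}
```

Take a list that an old kernel wrapped at 118 slots, occupying slots
116, 117, 0, 1, 2 (flcount == 5).  agfl_active(116, 2, 118) gives 5,
but a kernel that wraps at 119 computes agfl_active(116, 2, 119) == 6,
i.e. flcount + 1, which is the mismatch the mount-time fixup looks for.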

> +	/* Would this have even passed muster on an old system? */

Comment doesn't really explain what is going on here..?

> +	if (flfirst >= agfl_size - 1 || fllast >= agfl_size - 1 ||
> +	    pag->pagf_flcount > agfl_size - 1)
> +		return -EFSCORRUPTED;
> +

I take it these are checks for whether the agfl was previously corrupted
based on the unpacked size? FWIW, it might be a bit more clear to use an
old/unpacked_agfl_size variable to declare intent (here and in some of
the comments and whatnot that follow).
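
To make the packed vs. unpacked sizes concrete, here is a minimal sketch
of the arithmetic.  The macro and function names are hypothetical; the
36- and 40-byte header sizes and the 119/118 results for 512-byte
sectors are taken from the comments in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not identifiers from the patch. */
#define AGFL_HDR_PACKED		36u	/* magic + seqno + uuid + lsn + crc */
#define AGFL_HDR_UNPACKED	40u	/* same header with tail padding */

/* Number of 4-byte AGFL slots that fit in a sector after the header. */
static uint32_t agfl_slots(uint32_t sectsize, uint32_t hdr_bytes)
{
	assert(sectsize > hdr_bytes);
	return (sectsize - hdr_bytes) / (uint32_t)sizeof(uint32_t);
}

/*
 * Would these AGF counters have passed a bounds check that uses the
 * unpacked (smaller) size?  This is the inverse of the -EFSCORRUPTED
 * test in the quoted hunk, written against an explicit unpacked size.
 */
static int agfl_ok_for_unpacked(uint32_t sectsize, uint32_t flfirst,
				uint32_t fllast, uint32_t flcount)
{
	uint32_t unpacked = agfl_slots(sectsize, AGFL_HDR_UNPACKED);

	return flfirst < unpacked && fllast < unpacked &&
	       flcount <= unpacked;
}
```

With 512-byte sectors, agfl_slots(512, AGFL_HDR_PACKED) is 119 and
agfl_slots(512, AGFL_HDR_UNPACKED) is 118, so "agfl_size - 1" in the
patch is the unpacked size in disguise.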

> +	/*
> +	 * Convert a 40-byte-padded agfl into a 36-byte-padded AGFL.
> +	 * Therefore, we need to move the AGFL blocks
> +	 * bno[flfirst..agfl_size - 2] to bno[flfirst + 1...agfl_size - 1].
> +	 *
> +	 * Reusing the example above, if we had flfirst == 116, we need
> +	 * to move bno[116] and bno[117] into bno[117] and bno[118],
> +	 * respectively, and then increment flfirst.
> +	 */

Kind of a strange example to use given that it doesn't mention
wrapping.. this only triggers if the agfl was previously wrapped, right?

> +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> +	if (error)
> +		return error;
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> +
> +	len = (agfl_size - flfirst - 1) * sizeof(xfs_agblock_t);
> +	memmove(&agfl_bno[flfirst + 1], &agfl_bno[flfirst], len);
> +	offset = (char *)&agfl_bno[flfirst + 1] - (char *)agflbp->b_addr;
> +	be32_add_cpu(&agf->agf_flfirst, 1);
> +
> +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> +	xfs_trans_brelse(tp, agflbp);
> +	agflbp = NULL;
> +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST);
> +
> +	return 0;
> +}
> +
> +/*
> + * Fix an AGFL that touches the end of the block by moving the first or last
> + * part of the list elsewhere in the AGFL so that old kernels don't trip over
> + * wrapping issues.
> + */
> +STATIC int
> +xfs_fixup_freelist_wrap_unmount(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agfbp,
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_agf		*agf;
> +	struct xfs_buf		*agflbp;
> +	__be32			*agfl_bno;
> +	xfs_agnumber_t		agno;
> +	uint32_t		agfl_size;
> +	uint32_t		flfirst;
> +	uint32_t		fllast;
> +	int			offset;
> +	int			len;
> +	int			error;
> +
> +	agfl_size = xfs_agfl_size(mp);
> +	agf = XFS_BUF_TO_AGF(agfbp);
> +	agno = be32_to_cpu(agf->agf_seqno);
> +	flfirst = be32_to_cpu(agf->agf_flfirst);
> +	fllast = be32_to_cpu(agf->agf_fllast);
> +
> +	/* Empty AGFL?  Make sure we aren't pointing at the end. */
> +	if (pag->pagf_flcount == 0) {
> +		if (flfirst >= agfl_size || fllast >= agfl_size) {

When would either field be >= agfl_size? Isn't that where they wrap?

> +			agf->agf_flfirst = cpu_to_be32(1);
> +			agf->agf_fllast = 0;
> +			xfs_alloc_log_agf(tp, agfbp,
> +					XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> +		}
> +		return 0;
> +	}
> +
> +	/* If we don't hit the end, we're done. */
> +	if (flfirst < fllast && fllast != agfl_size - 1)
> +		return 0;
> +
> +	/*
> +	 * Move a start of a wrapped list towards the start of the agfl block.

FWIW, this and the subsequent comments kind of gave me the impression
that the agfl would be "shifted" towards the start of the block. The
code doesn't do that, but rather rotates the start/head of the agfl to
the tail and adjusts the pointers (changing the effective order). That
seems technically Ok, but I had to grok the code and read back to
understand the comment rather than the other way around.
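
A small userspace model of that rotation, as a sketch only: struct
agfl_model and agfl_rotate_head_to_tail() are made-up names, the array
is host-endian rather than __be32, and the 119-slot size is the
512-byte-sector example from the patch comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AGFL_SIZE	119u	/* packed size for 512-byte sectors */

/* Hypothetical in-memory stand-in for the AGF counters plus AGFL. */
struct agfl_model {
	uint32_t	bno[AGFL_SIZE];
	uint32_t	flfirst;
	uint32_t	fllast;
	uint32_t	flcount;
};

/*
 * Model of the "else" branch in the unmount fixup: copy the entries in
 * bno[flfirst..AGFL_SIZE - 1] to just past fllast, then reset the
 * pointers so the list starts at slot 0.  The old head of the list
 * ends up as its tail.
 */
static void agfl_rotate_head_to_tail(struct agfl_model *l)
{
	uint32_t nhead = AGFL_SIZE - l->flfirst;

	/* Only valid for a wrapped list with room between tail and head. */
	assert(l->flfirst > l->fllast);
	assert(l->fllast + 1 + nhead <= l->flfirst);

	memcpy(&l->bno[l->fllast + 1], &l->bno[l->flfirst],
	       nhead * sizeof(l->bno[0]));
	l->flfirst = 0;
	l->fllast = l->flcount - 1;
}
```

With flfirst == 117, fllast == 4 and flcount == 7, the logical order
before the call is bno[117], bno[118], bno[0..4]; afterwards it is
bno[0..4] followed by the two former head entries in bno[5] and bno[6].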

> +	 * Therefore, we need to move the AGFL blocks
> +	 * bno[flfirst..agfl_size - 1] to bno[fllast + 1...agfl_size - flfirst].
> +	 * Then we reset flfirst and fllast appropriately.
> +	 *
> +	 * Reusing the example above, if we had flfirst == 117 and fllast == 4,
> +	 * we need to move bno[117] and bno[118] into bno[5] and bno[6],
> +	 * respectively, and then reset flfirst and fllast.
> +	 *
> +	 * If it's just the last block that touches the end, only move that.
> +	 */
> +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> +	if (error)
> +		return error;
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> +
> +	if (fllast == agfl_size - 1) {
> +		/* Back the AGFL off from the end of the block. */
> +		len = sizeof(xfs_agblock_t);
> +		agfl_bno[flfirst - 1] = agfl_bno[agfl_size - 1];

What if the agfl is full (flfirst == 0)?
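
To spell the concern out: flfirst is a uint32_t, so "flfirst - 1" wraps
when flfirst == 0.  A minimal sketch, assuming the quoted hunk is read
correctly; both function names below are hypothetical.

```c
#include <stdint.h>

/*
 * Destination index used by the "back off from the end" branch.
 * Unsigned arithmetic wraps, so flfirst == 0 yields UINT32_MAX rather
 * than a negative number.
 */
static uint32_t back_off_dest_index(uint32_t flfirst)
{
	return flfirst - 1;
}

/*
 * Hypothetical guard: the one-slot back-off only makes sense when the
 * list ends at the last slot and there is a free slot in front of it.
 */
static int back_off_has_room(uint32_t flfirst, uint32_t fllast,
			     uint32_t agfl_size)
{
	return fllast == agfl_size - 1 && flfirst > 0;
}
```

For a full 119-slot list (flfirst == 0, fllast == 118),
back_off_dest_index(0) is UINT32_MAX, so the store would land far
outside agfl_bno[] unless something like back_off_has_room() rejects
the case first.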

> +		offset = (char *)&agfl_bno[flfirst - 1] - (char *)agflbp->b_addr;
> +		be32_add_cpu(&agf->agf_fllast, -1);
> +		be32_add_cpu(&agf->agf_flfirst, -1);
> +	} else {
> +		/* Move the first part of the AGFL towards the front. */
> +		len = (agfl_size - flfirst) * sizeof(xfs_agblock_t);
> +		memcpy(&agfl_bno[fllast + 1], &agfl_bno[flfirst], len);
> +		offset = (char *)&agfl_bno[fllast + 1] - (char *)agflbp->b_addr;
> +		agf->agf_flfirst = 0;
> +		agf->agf_fllast = cpu_to_be32(pag->pagf_flcount - 1);

Similar question here as above.. it looks like the copy may be a no-op,
but we still reset flfirst/fllast (which may also be fine, but
unnecessary).

The more interesting case here is flcount == 1 and flfirst == fllast,
where it looks like we'd wrongly reset first/last (and perhaps copy more
than we should..?).
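
A sketch of that single-entry case, modelling only the predicate and
the copy length from the quoted code.  The function names are
hypothetical and the 119-slot size is the 512-byte-sector example.

```c
#include <stdint.h>

/* The "if we don't hit the end, we're done" test from the hunk. */
static int fixup_returns_early(uint32_t flfirst, uint32_t fllast,
			       uint32_t agfl_size)
{
	return flfirst < fllast && fllast != agfl_size - 1;
}

/* Number of entries the "else" branch would copy to bno[fllast + 1]. */
static uint32_t else_branch_copy_entries(uint32_t flfirst,
					 uint32_t agfl_size)
{
	return agfl_size - flfirst;
}
```

With flcount == 1 and flfirst == fllast == 10, fixup_returns_early() is
false because 10 < 10 does not hold, so the code falls through to the
else branch.  That branch would then copy
else_branch_copy_entries(10, 119) == 109 entries starting at bno[11],
which overlaps the source and runs one slot past the end of a 119-slot
array, before resetting flfirst and fllast to 0.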

Ugh. Remind me again why we can't just detect a mismatch, fail the mount
and tell the user to repair? :P Random thought... have we considered
anything simpler and less invasive for this issue than low-level buffer
manipulation? For example, could we address the unpacked -> packed case
by just allocating a block to fill the gap at the end of the wrapped
agfl? Perhaps that could even be done more discreetly in
xfs_alloc_fix_freelist() when we already have a transaction set up, etc.
That wouldn't help going the other direction, but maybe we could
consider similar logic to free an extraneous block and then worry about
protecting users who won't upgrade broken kernels separately if that
really continues to be a problem.

Brian

> +	}
> +
> +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> +	xfs_trans_brelse(tp, agflbp);
> +	agflbp = NULL;
> +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> +
> +	return 0;
> +}
> +
> +typedef int (*xfs_agf_apply_fn_t)(struct xfs_trans *tp, struct xfs_buf *agfbp,
> +		struct xfs_perag *pag);
> +
> +/* Apply something to every AGF. */
> +STATIC int
> +xfs_fixup_agf_apply(
> +	struct xfs_mount	*mp,
> +	xfs_agf_apply_fn_t	fn)
> +{
> +	struct xfs_trans	*tp;
> +	struct xfs_perag	*pag;
> +	struct xfs_buf		*agfbp;
> +	xfs_agnumber_t		agno;
> +	int			error;
> +
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, 0, 0, 0, &tp);
> +	if (error)
> +		return error;
> +
> +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> +		error = xfs_alloc_read_agf(mp, tp, agno, 0, &agfbp);
> +		if (error)
> +			goto cancel;
> +		if (!agfbp) {
> +			error = -ENOMEM;
> +			goto cancel;
> +		}
> +		pag = xfs_perag_get(mp, agno);
> +		error = fn(tp, agfbp, pag);
> +		xfs_perag_put(pag);
> +		xfs_trans_brelse(tp, agfbp);
> +		if (error)
> +			goto cancel;
> +	}
> +
> +	return xfs_trans_commit(tp);
> +cancel:
> +	xfs_trans_cancel(tp);
> +	return error;
> +}
> +
> +/* Fix AGFL wrapping so we can use the filesystem. */
> +int
> +xfs_fixup_agfl_wrap_mount(
> +	struct xfs_mount	*mp)
> +{
> +	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
> +		return 0;
> +
> +	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_mount);
> +}
> +
> +/* Fix AGFL wrapping so old kernels can use this filesystem. */
> +int
> +xfs_fixup_agfl_wrap_unmount(
> +	struct xfs_mount	*mp)
> +{
> +	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
> +		return 0;
> +
> +	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_unmount);
> +}
> diff --git a/fs/xfs/xfs_fixups.h b/fs/xfs/xfs_fixups.h
> new file mode 100644
> index 0000000..fb52a96
> --- /dev/null
> +++ b/fs/xfs/xfs_fixups.h
> @@ -0,0 +1,26 @@
> +/*
> + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> + *
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> + */
> +#ifndef	__XFS_FIXUPS_H__
> +#define	__XFS_FIXUPS_H__
> +
> +int xfs_fixup_agfl_wrap_mount(struct xfs_mount *mp);
> +int xfs_fixup_agfl_wrap_unmount(struct xfs_mount *mp);
> +
> +#endif /* __XFS_FIXUPS_H__ */
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 98fd41c..eb284aa 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -46,7 +46,7 @@
>  #include "xfs_refcount_btree.h"
>  #include "xfs_reflink.h"
>  #include "xfs_extent_busy.h"
> -
> +#include "xfs_fixups.h"
>  
>  static DEFINE_MUTEX(xfs_uuid_table_mutex);
>  static int xfs_uuid_table_size;
> @@ -875,6 +875,16 @@ xfs_mountfs(
>  	}
>  
>  	/*
> +	 * Make sure our AGFL counters do not wrap the end of the block
> +	 * in a troublesome manner.
> +	 */
> +	error = xfs_fixup_agfl_wrap_mount(mp);
> +	if (error) {
> +		xfs_warn(mp, "Failed to fix agfl wrapping.  Run xfs_repair.");
> +		goto out_log_dealloc;
> +	}
> +
> +	/*
>  	 * Get and sanity-check the root inode.
>  	 * Save the pointer to it in the mount structure.
>  	 */
> @@ -1128,6 +1138,15 @@ xfs_unmountfs(
>  	xfs_qm_unmount(mp);
>  
>  	/*
> +	 * Make sure our AGFL counters do not wrap the end of the block
> +	 * in a troublesome manner for old kernels.
> +	 */
> +	error = xfs_fixup_agfl_wrap_unmount(mp);
> +	if (error)
> +		xfs_warn(mp, "Unable to fix agfl wrapping.  "
> +				"This may cause problems on next mount.");
> +
> +	/*
>  	 * Unreserve any blocks we have so that when we unmount we don't account
>  	 * the reserved free space as used. This is really only necessary for
>  	 * lazy superblock counting because it trusts the incore superblock
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 624a802..d9aa39a 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -50,6 +50,7 @@
>  #include "xfs_refcount_item.h"
>  #include "xfs_bmap_item.h"
>  #include "xfs_reflink.h"
> +#include "xfs_fixups.h"
>  
>  #include <linux/namei.h>
>  #include <linux/dax.h>
> @@ -1206,6 +1207,15 @@ xfs_quiesce_attr(
>  	xfs_reclaim_inodes(mp, 0);
>  	xfs_reclaim_inodes(mp, SYNC_WAIT);
>  
> +	/*
> +	 * Make sure our AGFL counters do not wrap the end of the block
> +	 * in a troublesome manner for old kernels.
> +	 */
> +	error = xfs_fixup_agfl_wrap_unmount(mp);
> +	if (error)
> +		xfs_warn(mp, "Unable to fix agfl wrapping.  "
> +				"This may cause problems on next mount.");
> +
>  	/* Push the superblock and write an unmount record */
>  	error = xfs_log_sbcount(mp);
>  	if (error)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-02-27 19:35   ` Brian Foster
@ 2018-02-27 21:03     ` Darrick J. Wong
  2018-02-28 22:43       ` Brian Foster
  2018-03-01  6:37     ` Darrick J. Wong
  1 sibling, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-02-27 21:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/Makefile     |    1 
> >  fs/xfs/xfs_fixups.c |  310 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_fixups.h |   26 ++++
> >  fs/xfs/xfs_mount.c  |   21 +++
> >  fs/xfs/xfs_super.c  |   10 ++
> >  5 files changed, 367 insertions(+), 1 deletion(-)
> >  create mode 100644 fs/xfs/xfs_fixups.c
> >  create mode 100644 fs/xfs/xfs_fixups.h
> > 
> > 
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index b03c77e..f88368a 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
> >  				   xfs_extent_busy.o \
> >  				   xfs_file.o \
> >  				   xfs_filestream.o \
> > +				   xfs_fixups.o \
> >  				   xfs_fsmap.o \
> >  				   xfs_fsops.o \
> >  				   xfs_globals.o \
> > diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> > new file mode 100644
> > index 0000000..0cad7bb
> > --- /dev/null
> > +++ b/fs/xfs/xfs_fixups.c
> > @@ -0,0 +1,310 @@
> > +/*
> > + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_sb.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_alloc.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_fixups.h"
> > +
> > +/*
> > + * v5 AGFL padding defects
> > + *
> > + * When the v5 format was first introduced, there was a defect in the struct
> > + * xfs_agfl definition that resulted in XFS_AGFL_SIZE returning different
> 
> XFS_AGFL_SIZE() no longer exists as of the previous patch. It might be
> better to refer to "the agfl size" generically.

How about:

"...the agfl length calculation formula (at the time known as
XFS_AGFL_SIZE)..."

> > + * values depending on the compiler padding.  On a fs with 512-byte sectors,
> > + * this meant that XFS_AGFL_SIZE was 119 on i386, but 118 on x64.  Commit
> > + * 96f859d52bcb1 ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is
> > + * correct") changed the definition to disable padding the end of the
> > + * structure, and was accepted into Linux 4.5.  Since then, the AGFL has
> > + * always used the larger size (e.g. 119 entries on a 512b sector fs).
> > + *
> > + * Unfortunately, pre-4.5 kernels can produce filesystems with AGFLs that wrap
> > + * at the smaller size, and those kernels are not prepared to handle the
> > + * longer size.  This typically manifests itself as an AGF verifier corruption
> > + * error followed by a filesystem shutdown.  While we encourage admins to stay
> > + * current with software, we would like to avoid this intermittent breakage.
> > + *
> > + * Any v5 filesystem which has a feature bit set for a feature that was
> > + * introduced after Linux 4.5 will not have this problem, because it cannot be
> > + * mounted on older kernels.  v4 filesystems are also unaffected.
> > + *
> > + * Therefore, we add two fixup functions -- the first runs at mount time to
> > + * detect a short-wrapped AGFL and fix it; the second runs at unmount, freeze,
> > + * or remount-ro time to move a wrapped AGFL to the beginning of the list.
> > + * This reduces the likelihood of a screwup to the scenario where you have (a)
> > + * a filesystem with no post-4.5 features (reflink, rmap), (b) the AGFL wraps,
> > + * (c) the filesystem goes down leaving a dirty log, and (d) the dirty
> > + * filesystem is mounted on an old kernel.
> > + */
> > +
> 
> While the mount vs. unmount time fixups serve different purposes and
> have different conditions, do we really need two separate fixup
> implementations? E.g., if we have to unwrap the AGFL at unmount time,
> why not just reuse that fixup at mount time (when needed) as well? I
> suspect the difference would only be the length of what we can consider
> valid in the agfl at the time.

They're doing two different things.  I will use ASCII art to demonstrate
the mount-time function:

Let's say the AGFL block has space for 10 items.  It looks like this:

         0   1   2   3   4   5   6   7   8   9
Header |   |   |   |   |   |   |   |   |   |   |


When the fs is freshly formatted or repaired, the AGFL will look like:

         0   1   2   3   4   5   6   7   8   9
Header | A | B | C | D | E | F |   |   |   |   |
         ^-- flfirst         ^-- fllast

Due to the padding problems prior to 4.5 ("4.5 agfl padding fix"),
XFS_AGFL_SIZE would return 10 on 32-bit systems and 9 on 64-bit systems.
Therefore, if the AGFL wrapped on a 64-bit kernel we would end up with:

         0   1   2   3   4   5   6   7   8   9
Header | D | E | F |   |   |   | A | B | C | ? |
                 ^-- fllast      ^-- flfirst

Note that block 9's contents are undefined because we didn't write
anything there.  Now that the "4.5 agfl padding fix" has corrected
XFS_AGFL_SIZE to return 10 in all cases, this looks like an AGF
corruption because flcount is 6 but the distance between flfirst and
fllast is 7.  So long as the list wraps and the mismatch is exactly one
block we can fix this.  The mount time fixer in this patch does this by
moving the records between flfirst and xfs_agfl_size (A, B, and C) to
the end of the block to close the gap:

         0   1   2   3   4   5   6   7   8   9
Header | D | E | F |   |   |   |   | A | B | C |
                 ^-- fllast          ^-- flfirst

If the count is off by more than 1 then the AGF is truly corrupt and we
bail out.
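
To make the mount-time case concrete, here is a rough userspace sketch of
the same arithmetic on the 10-slot toy AGFL from the diagrams.  This is not
the kernel code: struct toy_agf, toy_active() and toy_fix_mount() are
made-up names, the slots hold letters instead of block numbers, and the
explicit check that the list wraps is an assumption about how the fixer
ought to behave rather than something the posted patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AGFL_SLOTS 10	/* toy size from the diagrams, not a real AGFL */

/* Hypothetical stand-in for the three AGF free list counters. */
struct toy_agf {
	uint32_t flfirst;
	uint32_t fllast;
	uint32_t flcount;
};

/* Number of slots spanned by flfirst..fllast, allowing for wraparound. */
static int32_t toy_active(const struct toy_agf *agf, uint32_t size)
{
	int32_t active = (int32_t)agf->fllast - (int32_t)agf->flfirst + 1;

	if (active <= 0)
		active += (int32_t)size;
	return active;
}

/*
 * Returns 0 if nothing needed fixing, 1 if the one-slot gap at the end of
 * the block was closed, and -1 if the mismatch is anything other than the
 * one-slot padding problem.
 */
static int toy_fix_mount(struct toy_agf *agf, char *bno, uint32_t size)
{
	int32_t active;

	if (agf->flcount == 0)
		return 0;

	active = toy_active(agf, size);
	if (active == (int32_t)agf->flcount)
		return 0;
	if (active != (int32_t)agf->flcount + 1)
		return -1;

	/* Would this list have been valid with the old (size - 1) length? */
	if (agf->flfirst >= size - 1 || agf->fllast >= size - 1 ||
	    agf->flcount > size - 1)
		return -1;

	/* Assumption: only a wrapped list can have the gap at the end. */
	if (agf->flfirst <= agf->fllast)
		return -1;

	/* Slide bno[flfirst..size-2] up one slot to bno[flfirst+1..size-1]. */
	memmove(&bno[agf->flfirst + 1], &bno[agf->flfirst],
		(size - agf->flfirst - 1) * sizeof(bno[0]));
	agf->flfirst++;
	return 1;
}
```

Feeding it the broken layout above (flfirst = 6, fllast = 2, flcount = 6)
moves A, B and C into slots 7-9 and bumps flfirst to 7.  The same pointers
with an flcount of 4 are rejected as real corruption.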

Now let's say that we run for a while and want to unmount, but our AGFL
wraps like this:

         0   1   2   3   4   5   6   7   8   9
Header | T | U | V |   |   |   |   | Q | R | S |
                 ^-- fllast          ^-- flfirst

We don't know that the next kernel to mount this filesystem will have
the "4.5 agfl padding fix" applied to it; if it does not, it will flag
the AGF as corrupted because flcount is 6 but in its view the distance
between flfirst and fllast (which omits bno[9]) is 5.  We don't want
it to choke on that, so the unmount fixer moves all the records between
flfirst and the end (Q, R, and S) towards the start of the block and
resets flfirst/fllast:

         0   1   2   3   4   5   6   7   8   9
Header | T | U | V | Q | R | S |   |   |   |   |
         ^-- flfirst         ^-- fllast

Since the AGFL no longer wraps at all, it doesn't matter if the next
kernel to mount this filesystem has the "4.5 agfl padding fix" applied;
all kernels can handle this correctly.
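
The unmount-time move can be sketched the same way.  Again this is a toy
under stated assumptions: the names are invented, the counters are assumed
to be consistent already, and only the wrapped case from the diagram is
handled (the "fllast touches the last slot without wrapping" case from the
patch is left out).  I use memmove() here because the source and
destination ranges can overlap when there are fewer free slots than head
entries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the three AGF free list counters. */
struct toy_agf {
	uint32_t flfirst;
	uint32_t fllast;
	uint32_t flcount;
};

/*
 * Move the head segment bno[flfirst..size-1] of a wrapped list into the
 * slots right after fllast, then reset the pointers so that the list
 * occupies bno[0..flcount-1].  Returns 1 if the list was rewritten and 0
 * if it was empty or did not wrap.
 */
static int toy_fix_unmount(struct toy_agf *agf, char *bno, uint32_t size)
{
	uint32_t head_len;

	if (agf->flcount == 0 || agf->flfirst <= agf->fllast)
		return 0;

	head_len = size - agf->flfirst;
	memmove(&bno[agf->fllast + 1], &bno[agf->flfirst],
		head_len * sizeof(bno[0]));
	agf->flfirst = 0;
	agf->fllast = agf->flcount - 1;
	return 1;
}
```

Note that this rotates the head of the list behind the tail rather than
shifting everything, so the effective order changes from Q R S T U V to
T U V Q R S.  The AGFL is just a pool of free blocks, so the order should
not matter.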

> > +/*
> > + * Decide if we need to have the agfl wrapping fixes applied.  This only
> > + * affects v5 filesystems that do not have any features enabled that did not
> > + * exist when the agfl padding fix went in.
> > + *
> > + * Features already present when the fix went in were finobt, ftype, spinodes.
> > + * If we see something new (e.g. reflink) then don't bother.
> > + */
> > +#define XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED \
> > +		(~(XFS_SB_FEAT_RO_COMPAT_FINOBT))
> > +#define XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED \
> > +		(~(XFS_SB_FEAT_INCOMPAT_FTYPE | \
> > +		   XFS_SB_FEAT_INCOMPAT_SPINODES))
> > +#define XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED \
> > +		(~0)
> > +static inline bool xfs_sb_version_needs_agfl_wrap_fixes(struct xfs_sb *sbp)
> > +{
> > +	return xfs_sb_version_hascrc(sbp) &&
> > +		!xfs_sb_has_incompat_feature(sbp,
> > +			XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> > +		!xfs_sb_has_ro_compat_feature(sbp,
> > +			XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> > +		!xfs_sb_has_incompat_log_feature(sbp,
> > +			XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED);
> > +}
> > +
> > +/*
> > + * Fix an AGFL wrapping that falls short of the end of the block by filling the
> > + * gap at the end of the block.
> > + */
> > +STATIC int
> > +xfs_fixup_freelist_wrap_mount(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_buf		*agfbp,
> > +	struct xfs_perag	*pag)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_agf		*agf;
> > +	struct xfs_buf		*agflbp;
> > +	__be32			*agfl_bno;
> > +	xfs_agnumber_t		agno;
> > +	uint32_t		agfl_size;
> > +	uint32_t		flfirst;
> > +	uint32_t		fllast;
> > +	int32_t			active;
> > +	int			offset;
> > +	int			len;
> > +	int			error;
> > +
> > +	if (pag->pagf_flcount == 0)
> > +		return 0;
> > +
> > +	agfl_size = xfs_agfl_size(mp);
> > +	agf = XFS_BUF_TO_AGF(agfbp);
> > +	agno = be32_to_cpu(agf->agf_seqno);
> > +	flfirst = be32_to_cpu(agf->agf_flfirst);
> > +	fllast = be32_to_cpu(agf->agf_fllast);
> > +
> > +	/* Make sure we're either spot on or off by 1. */
> > +	active = fllast - flfirst + 1;
> > +	if (active <= 0)
> > +		active += agfl_size;
> > +	if (active == pag->pagf_flcount)
> > +		return 0;
> > +	else if (active != pag->pagf_flcount + 1)
> > +		return -EFSCORRUPTED;
> > +
> 
> So we're not attempting to cover the case where the agfl has 1 more
> block than the agfl size (i.e., the case where an fs goes back to a
> kernel with an unpacked header)?

We don't know how the next kernel to touch this filesystem will define
XFS_AGFL_SIZE -- it could be a 4.5+ kernel (same agfl size), a 32-bit
pre-4.5 kernel (same agfl size), or a 64-bit pre-4.5 kernel (small agfl
size).
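
For anyone wondering where the two sizes come from, the arithmetic can be
sketched in userspace.  The field list below is only an approximation of
the v5 AGFL header, so treat it as an assumption, and the toy_ names are
invented for this mail.  The 36- and 40-byte header sizes and the 119/118
results for 512-byte sectors are the ones quoted in the patch comments.
__attribute__((packed)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy copy of the v5 AGFL header; the exact field list is an assumption. */
struct toy_agfl_hdr_packed {
	uint32_t magicnum;
	uint32_t seqno;
	uint8_t  uuid[16];
	uint64_t lsn;
	uint32_t crc;
} __attribute__((packed));

/* Number of 32-bit block number slots that fit after the header. */
static uint32_t toy_agfl_slots(uint32_t sectsize, uint32_t hdr_bytes)
{
	return (uint32_t)((sectsize - hdr_bytes) / sizeof(uint32_t));
}
```

With the packed 36-byte header a 512-byte sector holds 119 slots.  Without
the packed attribute, a compiler that aligns 64-bit fields to 8 bytes would
likely round the structure up to 40 bytes, which leaves 118 slots.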

> I'm wondering whether using the unmount algorithm in both places (as
> noted above) would also facilitate this mechanism to work both ways
> (unpacked <-> packed), provided particular conditions are met. I suppose
> that could be considered regardless, but the less variance the better
> IMO.

In principle I suppose we could have a single function to handle all the
rearranging, but I'd rather have two functions to handle the two cases.
I'm open to refactoring the common parts out of the
xfs_fixup_freelist_wrap_*mount functions.

> > +	/* Would this have even passed muster on an old system? */
> 
> Comment doesn't really explain what is going on here..?

/*
 * If the distance between flfirst and fllast mismatches flcount by more
 * than 1, then there's more wrong with this agfl than just the padding
 * problem.  Bail out completely, which will force the admin to run
 * xfs_repair.
 */

> 
> > +	if (flfirst >= agfl_size - 1 || fllast >= agfl_size - 1 ||
> > +	    pag->pagf_flcount > agfl_size - 1)
> > +		return -EFSCORRUPTED;
> > +
> 
> I take it these are checks for whether the agfl was previously corrupted
> based on the unpacked size? FWIW, it might be a bit more clear to use an
> old/unpacked_agfl_size variable to declare intent (here and in some of
> the comments and whatnot that follow).

Ok.

> > +	/*
> > +	 * Convert a 40-byte-padded agfl into a 36-byte-padded AGFL.
> > +	 * Therefore, we need to move the AGFL blocks
> > +	 * bno[flfirst..agfl_size - 2] to bno[flfirst + 1...agfl_size - 1].
> > +	 *
> > +	 * Reusing the example above, if we had flfirst == 116, we need
> > +	 * to move bno[116] and bno[117] into bno[117] and bno[118],
> > +	 * respectively, and then increment flfirst.
> > +	 */
> 
> Kind of a strange example to use given that it doesn't mention
> wrapping.. this only triggers if the agfl was previously wrapped, right?

Right.  I used to have a check that if the list didn't wrap then we
would just exit because no fixing is required... but it must've fallen
out.  I agree that we don't want to rely on a subtlety to keep us out of
this code if the agfl wasn't initially wrapped.

> > +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> > +	if (error)
> > +		return error;
> > +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > +
> > +	len = (agfl_size - flfirst - 1) * sizeof(xfs_agblock_t);
> > +	memmove(&agfl_bno[flfirst + 1], &agfl_bno[flfirst], len);
> > +	offset = (char *)&agfl_bno[flfirst + 1] - (char *)agflbp->b_addr;
> > +	be32_add_cpu(&agf->agf_flfirst, 1);
> > +
> > +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> > +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> > +	xfs_trans_brelse(tp, agflbp);
> > +	agflbp = NULL;
> > +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Fix an AGFL that touches the end of the block by moving the first or last
> > + * part of the list elsewhere in the AGFL so that old kernels don't trip over
> > + * wrapping issues.
> > + */
> > +STATIC int
> > +xfs_fixup_freelist_wrap_unmount(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_buf		*agfbp,
> > +	struct xfs_perag	*pag)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_agf		*agf;
> > +	struct xfs_buf		*agflbp;
> > +	__be32			*agfl_bno;
> > +	xfs_agnumber_t		agno;
> > +	uint32_t		agfl_size;
> > +	uint32_t		flfirst;
> > +	uint32_t		fllast;
> > +	int			offset;
> > +	int			len;
> > +	int			error;
> > +
> > +	agfl_size = xfs_agfl_size(mp);
> > +	agf = XFS_BUF_TO_AGF(agfbp);
> > +	agno = be32_to_cpu(agf->agf_seqno);
> > +	flfirst = be32_to_cpu(agf->agf_flfirst);
> > +	fllast = be32_to_cpu(agf->agf_fllast);
> > +
> > +	/* Empty AGFL?  Make sure we aren't pointing at the end. */
> > +	if (pag->pagf_flcount == 0) {
> > +		if (flfirst >= agfl_size || fllast >= agfl_size) {
> 
> When would either field be >= agfl_size? Isn't that where they wrap?

Oops, you're right, that should be flfirst >= agfl_size - 1.

Or really,

old_agfl_size = agfl_size - 1
if flfirst >= old_agfl_size or fllast >= old_agfl_size:
	blahblahblah
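
Spelled out as C, the check could look roughly like this.  The helper name
is hypothetical and it only covers the empty-list test; the caller would
still reset flfirst and fllast as the quoted patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if either pointer of an empty AGFL sits at or beyond the last slot
 * that a kernel using the shorter (unpacked) AGFL length knows about.
 */
static bool toy_empty_agfl_needs_reset(uint32_t flfirst, uint32_t fllast,
				       uint32_t agfl_size)
{
	uint32_t old_agfl_size = agfl_size - 1;

	return flfirst >= old_agfl_size || fllast >= old_agfl_size;
}
```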

> 
> > +			agf->agf_flfirst = cpu_to_be32(1);
> > +			agf->agf_fllast = 0;
> > +			xfs_alloc_log_agf(tp, agfbp,
> > +					XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> > +		}
> > +		return 0;
> > +	}
> > +
> > +	/* If we don't hit the end, we're done. */
> > +	if (flfirst < fllast && fllast != agfl_size - 1)
> > +		return 0;
> > +
> > +	/*
> > +	 * Move a start of a wrapped list towards the start of the agfl block.
> 
> FWIW, this and the subsequent comments kind of gave me the impression
> that the agfl would be "shifted" towards the start of the block. The
> code doesn't do that, but rather rotates the start/head of the agfl to
> the tail and adjusts the pointers (changing the effective order). That
> seems technically Ok, but I had to grok the code and read back to
> understand the comment rather than the other way around.

Yeah.  Sorry about the confusion, maybe it would be better if I took out
these confusing comments and replaced them all with the ascii art above?

> > +	 * Therefore, we need to move the AGFL blocks
> > +	 * bno[flfirst..agfl_size - 1] to bno[fllast + 1...agfl_size - flfirst].
> > +	 * Then we reset flfirst and fllast appropriately.
> > +	 *
> > +	 * Reusing the example above, if we had flfirst == 117 and fllast == 4,
> > +	 * we need to move bno[117] and bno[118] into bno[5] and bno[6],
> > +	 * respectively, and then reset flfirst and fllast.
> > +	 *
> > +	 * If it's just the last block that touches the end, only move that.
> > +	 */
> > +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> > +	if (error)
> > +		return error;
> > +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > +
> > +	if (fllast == agfl_size - 1) {
> > +		/* Back the AGFL off from the end of the block. */
> > +		len = sizeof(xfs_agblock_t);
> > +		agfl_bno[flfirst - 1] = agfl_bno[agfl_size - 1];
> 
> What if the agfl is full (flfirst == 0)?

I don't think we ever unmount with a full agfl -- we're guaranteed room
for at least 118 items in the AGFL, and the maximum size the agfl will
ever need is 8 blocks for each of bnobt, cntbt, rmapbt.

However, we could just detect a full agfl and free the end block.

> > +		offset = (char *)&agfl_bno[flfirst - 1] - (char *)agflbp->b_addr;
> > +		be32_add_cpu(&agf->agf_fllast, -1);
> > +		be32_add_cpu(&agf->agf_flfirst, -1);
> > +	} else {

This needs to check for flfirst > fllast.

> > +		/* Move the first part of the AGFL towards the front. */
> > +		len = (agfl_size - flfirst) * sizeof(xfs_agblock_t);
> > +		memcpy(&agfl_bno[fllast + 1], &agfl_bno[flfirst], len);
> > +		offset = (char *)&agfl_bno[fllast + 1] - (char *)agflbp->b_addr;
> > +		agf->agf_flfirst = 0;
> > +		agf->agf_fllast = cpu_to_be32(pag->pagf_flcount - 1);
> 
> Similar question here as above.. it looks like the copy may be a no-op,
> but we still reset flfirst/fllast (which may also be fine, but
> unnecessary).
> 
> The more interesting case here is flcount == 1 and flfirst == fllast,
> where it looks like we'd wrongly reset first/last (and perhaps copy more
> than we should..?).
> 
> Ugh. Remind me again why we can't just detect a mismatch, fail the mount
> and tell the user to repair? :P

Not all the distros included xfs_repair in the initrd or the logic to
call it (as opposed to "fsck -y") if the root fs fails to mount, which
means that if this wrapping problem hits a root filesystem then the
administrator will have to reboot with a recovery iso and run xfs_repair
from there.  That's going to generate a /lot/ of support work compared
to us figuring out how to repair the filesystems automatically.

> Random thought... have we considered anything less invasive/more
> simple for this issue than low level buffer manipulation? For example,
> could we address the unpacked -> packed case by just allocating a
> block to fill the gap at the end of the wrapped agfl?

The allocation could fail, in which case we'd be forced to resort to
list manipulation.

> Perhaps that could even be done more discreetly in
> xfs_alloc_fix_freelist() when we already have a transaction set up,
> etc.

Possible, though this will add more logic to every hot allocation call
for something that only needs fixing at mount and at unmount.  I think
there's a stronger argument for doing the unmount cleanup on every
fix_freelist since that would eliminate the small chance that a new
kernel crashes and gets recovered on an old kernel.

> That wouldn't help going the other direction, but maybe we could
> consider similar logic to free an extraneous block and then worry
> about protecting users who won't upgrade broken kernels separately if
> that really continues to be a problem.

It is already a problem here.

--D

> Brian
> 
> > +	}
> > +
> > +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> > +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> > +	xfs_trans_brelse(tp, agflbp);
> > +	agflbp = NULL;
> > +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> > +
> > +	return 0;
> > +}
> > +
> > +typedef int (*xfs_agf_apply_fn_t)(struct xfs_trans *tp, struct xfs_buf *agfbp,
> > +		struct xfs_perag *pag);
> > +
> > +/* Apply something to every AGF. */
> > +STATIC int
> > +xfs_fixup_agf_apply(
> > +	struct xfs_mount	*mp,
> > +	xfs_agf_apply_fn_t	fn)
> > +{
> > +	struct xfs_trans	*tp;
> > +	struct xfs_perag	*pag;
> > +	struct xfs_buf		*agfbp;
> > +	xfs_agnumber_t		agno;
> > +	int			error;
> > +
> > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, 0, 0, 0, &tp);
> > +	if (error)
> > +		return error;
> > +
> > +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> > +		error = xfs_alloc_read_agf(mp, tp, agno, 0, &agfbp);
> > +		if (error)
> > +			goto cancel;
> > +		if (!agfbp) {
> > +			error = -ENOMEM;
> > +			goto cancel;
> > +		}
> > +		pag = xfs_perag_get(mp, agno);
> > +		error = fn(tp, agfbp, pag);
> > +		xfs_perag_put(pag);
> > +		xfs_trans_brelse(tp, agfbp);
> > +		if (error)
> > +			goto cancel;
> > +	}
> > +
> > +	return xfs_trans_commit(tp);
> > +cancel:
> > +	xfs_trans_cancel(tp);
> > +	return error;
> > +}
> > +
> > +/* Fix AGFL wrapping so we can use the filesystem. */
> > +int
> > +xfs_fixup_agfl_wrap_mount(
> > +	struct xfs_mount	*mp)
> > +{
> > +	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
> > +		return 0;
> > +
> > +	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_mount);
> > +}
> > +
> > +/* Fix AGFL wrapping so old kernels can use this filesystem. */
> > +int
> > +xfs_fixup_agfl_wrap_unmount(
> > +	struct xfs_mount	*mp)
> > +{
> > +	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
> > +		return 0;
> > +
> > +	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_unmount);
> > +}
> > diff --git a/fs/xfs/xfs_fixups.h b/fs/xfs/xfs_fixups.h
> > new file mode 100644
> > index 0000000..fb52a96
> > --- /dev/null
> > +++ b/fs/xfs/xfs_fixups.h
> > @@ -0,0 +1,26 @@
> > +/*
> > + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#ifndef	__XFS_FIXUPS_H__
> > +#define	__XFS_FIXUPS_H__
> > +
> > +int xfs_fixup_agfl_wrap_mount(struct xfs_mount *mp);
> > +int xfs_fixup_agfl_wrap_unmount(struct xfs_mount *mp);
> > +
> > +#endif /* __XFS_FIXUPS_H__ */
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index 98fd41c..eb284aa 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -46,7 +46,7 @@
> >  #include "xfs_refcount_btree.h"
> >  #include "xfs_reflink.h"
> >  #include "xfs_extent_busy.h"
> > -
> > +#include "xfs_fixups.h"
> >  
> >  static DEFINE_MUTEX(xfs_uuid_table_mutex);
> >  static int xfs_uuid_table_size;
> > @@ -875,6 +875,16 @@ xfs_mountfs(
> >  	}
> >  
> >  	/*
> > +	 * Make sure our AGFL counters do not wrap the end of the block
> > +	 * in a troublesome manner.
> > +	 */
> > +	error = xfs_fixup_agfl_wrap_mount(mp);
> > +	if (error) {
> > +		xfs_warn(mp, "Failed to fix agfl wrapping.  Run xfs_repair.");
> > +		goto out_log_dealloc;
> > +	}
> > +
> > +	/*
> >  	 * Get and sanity-check the root inode.
> >  	 * Save the pointer to it in the mount structure.
> >  	 */
> > @@ -1128,6 +1138,15 @@ xfs_unmountfs(
> >  	xfs_qm_unmount(mp);
> >  
> >  	/*
> > +	 * Make sure our AGFL counters do not wrap the end of the block
> > +	 * in a troublesome manner for old kernels.
> > +	 */
> > +	error = xfs_fixup_agfl_wrap_unmount(mp);
> > +	if (error)
> > +		xfs_warn(mp, "Unable to fix agfl wrapping.  "
> > +				"This may cause problems on next mount.");
> > +
> > +	/*
> >  	 * Unreserve any blocks we have so that when we unmount we don't account
> >  	 * the reserved free space as used. This is really only necessary for
> >  	 * lazy superblock counting because it trusts the incore superblock
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 624a802..d9aa39a 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -50,6 +50,7 @@
> >  #include "xfs_refcount_item.h"
> >  #include "xfs_bmap_item.h"
> >  #include "xfs_reflink.h"
> > +#include "xfs_fixups.h"
> >  
> >  #include <linux/namei.h>
> >  #include <linux/dax.h>
> > @@ -1206,6 +1207,15 @@ xfs_quiesce_attr(
> >  	xfs_reclaim_inodes(mp, 0);
> >  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> >  
> > +	/*
> > +	 * Make sure our AGFL counters do not wrap the end of the block
> > +	 * in a troublesome manner for old kernels.
> > +	 */
> > +	error = xfs_fixup_agfl_wrap_unmount(mp);
> > +	if (error)
> > +		xfs_warn(mp, "Unable to fix agfl wrapping.  "
> > +				"This may cause problems on next mount.");
> > +
> >  	/* Push the superblock and write an unmount record */
> >  	error = xfs_log_sbcount(mp);
> >  	if (error)
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-02-27 21:03     ` Darrick J. Wong
@ 2018-02-28 22:43       ` Brian Foster
  2018-02-28 23:20         ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Foster @ 2018-02-28 22:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/Makefile     |    1 
> > >  fs/xfs/xfs_fixups.c |  310 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_fixups.h |   26 ++++
> > >  fs/xfs/xfs_mount.c  |   21 +++
> > >  fs/xfs/xfs_super.c  |   10 ++
> > >  5 files changed, 367 insertions(+), 1 deletion(-)
> > >  create mode 100644 fs/xfs/xfs_fixups.c
> > >  create mode 100644 fs/xfs/xfs_fixups.h
> > > 
> > > 
> > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > index b03c77e..f88368a 100644
> > > --- a/fs/xfs/Makefile
> > > +++ b/fs/xfs/Makefile
> > > @@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
> > >  				   xfs_extent_busy.o \
> > >  				   xfs_file.o \
> > >  				   xfs_filestream.o \
> > > +				   xfs_fixups.o \
> > >  				   xfs_fsmap.o \
> > >  				   xfs_fsops.o \
> > >  				   xfs_globals.o \
> > > diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> > > new file mode 100644
> > > index 0000000..0cad7bb
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_fixups.c
> > > @@ -0,0 +1,310 @@
> > > +/*
> > > + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> > > + *
> > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License
> > > + * as published by the Free Software Foundation; either version 2
> > > + * of the License, or (at your option) any later version.
> > > + *
> > > + * This program is distributed in the hope that it would be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License
> > > + * along with this program; if not, write the Free Software Foundation,
> > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > + */
> > > +#include "xfs.h"
> > > +#include "xfs_fs.h"
> > > +#include "xfs_shared.h"
> > > +#include "xfs_format.h"
> > > +#include "xfs_log_format.h"
> > > +#include "xfs_trans_resv.h"
> > > +#include "xfs_sb.h"
> > > +#include "xfs_mount.h"
> > > +#include "xfs_alloc.h"
> > > +#include "xfs_trans.h"
> > > +#include "xfs_fixups.h"
> > > +
> > > +/*
> > > + * v5 AGFL padding defects
> > > + *
> > > + * When the v5 format was first introduced, there was a defect in the struct
> > > + * xfs_agfl definition that resulted in XFS_AGFL_SIZE returning different
> > 
> > XFS_AGFL_SIZE() no longer exists as of the previous patch. It might be
> > better to refer to "the agfl size" generically.
> 
> How about:
> 
> "...the agfl length calculation formula (at the time known as
> XFS_AGFL_SIZE)..."
> 

The formula formerly known as..? ;) Heh, sounds fine.

> > > + * values depending on the compiler padding.  On a fs with 512-byte sectors,
> > > + * this meant that XFS_AGFL_SIZE was 119 on i386, but 118 on x64.  Commit
> > > + * 96f859d52bcb1 ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is
> > > + * correct") changed the definition to disable padding the end of the
> > > + * structure, and was accepted into Linux 4.5.  Since then, the AGFL has
> > > + * always used the larger size (e.g. 119 entries on a 512b sector fs).
> > > + *
> > > + * Unfortunately, pre-4.5 kernels can produce filesystems with AGFLs that wrap
> > > + * at the smaller size, and those kernels are not prepared to handle the
> > > + * longer size.  This typically manifests itself as an AGF verifier corruption
> > > + * error followed by a filesystem shutdown.  While we encourage admins to stay
> > > + * current with software, we would like to avoid this intermittent breakage.
> > > + *
> > > + * Any v5 filesystem which has a feature bit set for a feature that was
> > > + * introduced after Linux 4.5 will not have this problem, because it cannot be
> > > + * mounted on older kernels.  v4 filesystems are also unaffected.
> > > + *
> > > + * Therefore, we add two fixup functions -- the first runs at mount time to
> > > + * detect a short-wrapped AGFL and fix it; the second runs at unmount, freeze,
> > > + * or remount-ro time to move a wrapped AGFL to the beginning of the list.
> > > + * This reduces the likelihood of a screwup to the scenario where you have (a)
> > > + * a filesystem with no post-4.5 features (reflink, rmap), (b) the AGFL wraps,
> > > + * (c) the filesystem goes down leaving a dirty log, and (d) the dirty
> > > + * filesystem is mounted on an old kernel.
> > > + */
> > > +
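
For reference, the 119 vs. 118 arithmetic in the comment above can be
reproduced in userspace.  This is only a sketch: the struct below is an
illustrative stand-in whose field layout is assumed, not the kernel's
struct xfs_agfl, and agfl_entries() is a made-up helper.  The 36 and 40
byte header sizes are the ones the patch itself mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the AGFL header; layout is an assumption. */
struct agfl_hdr_packed {
	uint32_t	magicnum;
	uint32_t	seqno;
	uint8_t		uuid[16];
	uint64_t	lsn;
	uint32_t	crc;
} __attribute__((packed));	/* without this, 64-bit ABIs pad to 40 */

/* Number of 4-byte block pointers that fit after a header of hdr_bytes. */
static unsigned int
agfl_entries(unsigned int sect_bytes, unsigned int hdr_bytes)
{
	assert(sect_bytes > hdr_bytes);
	return (sect_bytes - hdr_bytes) / (unsigned int)sizeof(uint32_t);
}
```

With 512-byte sectors, a 36-byte header leaves room for 119 entries and
a 40-byte header for 118, which is the one-slot disagreement the fixups
are built around.
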
> > 
> > While the mount vs. unmount time fixups serve different purposes and
> > have different conditions, do we really need two separate fixup
> > implementations? E.g., if we have to unwrap the AGFL at unmount time,
> > why not just reuse that fixup at mount time (when needed) as well? I
> > suspect the difference would only be the length of what we can consider
> > valid in the agfl at the time.
> 
> They're doing two different things.  I will use ASCII art to demonstrate
> the mount-time function:
> 

Yes, the question was whether we need to do two different things. For
example...

> Let's say the AGFL block has space for 10 items.  It looks like this:
> 
>          0   1   2   3   4   5   6   7   8   9
> Header |   |   |   |   |   |   |   |   |   |   |
> 
> 
> When the fs is freshly formatted or repaired, the AGFL will look like:
> 
>          0   1   2   3   4   5   6   7   8   9
> Header | A | B | C | D | E | F |   |   |   |   |
>          ^-- flfirst         ^-- fllast
> 
> Due to the padding problems prior to 4.5 ("4.5 agfl padding fix"),
> XFS_AGFL_SIZE would return 10 on 32-bit systems and 9 on 64-bit systems.
> Therefore, if the AGFL wrapped on a 64-bit kernel we would end up with:
> 
>          0   1   2   3   4   5   6   7   8   9
> Header | D | E | F |   |   |   | A | B | C | ? |
>                  ^-- fllast      ^-- flfirst
> 
> Note that block 9's contents are undefined because we didn't write
> anything there.  Now that the "4.5 agfl padding fix" has corrected
> XFS_AGFL_SIZE to return 10 in all cases, this looks like an AGF
> corruption because flcount is 6 but the distance between flfirst and
> fllast is 7.  So long as the list wraps and the mismatch is exactly one
> block we can fix this.  The mount time fixer in this patch does this by
> moving the records between flfirst and xfs_agfl_size (A, B, and C) to
> the end of the block to close the gap:
> 
>          0   1   2   3   4   5   6   7   8   9
> Header | D | E | F |   |   |   |   | A | B | C |
>                  ^-- fllast          ^-- flfirst
> 
> If the count is off by more than 1 then the AGF is truly corrupt and we
> bail out.
> 
> Now let's say that we run for a while and want to unmount, but our AGFL
> wraps like this:
> 
>          0   1   2   3   4   5   6   7   8   9
> Header | T | U | V |   |   |   |   | Q | R | S |
>                  ^-- fllast          ^-- flfirst
> 
> We don't know that the next kernel to mount this filesystem will have
> the "4.5 agfl padding fix" applied to it; if it does not, it will flag
> the AGF as corrupted because flcount is 6 but in its view the distance
> between flfirst and fllast (which omits bno[9]) is 5.  We don't want
> it to choke on that, so the unmount fixer moves all the records between
> flfirst and the end (Q, R, and S) towards the start of the block and
> resets flfirst/fllast:
> 
>          0   1   2   3   4   5   6   7   8   9
> Header | T | U | V | Q | R | S |   |   |   |   |
>          ^-- flfirst         ^-- fllast
> 

Could we use this basic algorithm at mount time as well? I know it
wouldn't be _exactly_ the same operation at mount time as it is for
unmount since slot 9 is a gap in the former case, but afaict the only
difference from an algorithmic perspective is the length of the shift.

IOW, if we were to parameterize the length of the shift and have the
mount fixup call the unmount algorithm, would it not also address the
problem?

> Since the AGFL no longer wraps at all, it doesn't matter if the next
> kernel to mount this filesystem has the "4.5 agfl padding fix" applied;
> all kernels can handle this correctly.
> 
> > > +/*
> > > + * Decide if we need to have the agfl wrapping fixes applied.  This only
> > > + * affects v5 filesystems that do not have any features enabled that did not
> > > + * exist when the agfl padding fix went in.
> > > + *
> > > + * Features already present when the fix went in were finobt, ftype, spinodes.
> > > + * If we see something new (e.g. reflink) then don't bother.
> > > + */
> > > +#define XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED \
> > > +		(~(XFS_SB_FEAT_RO_COMPAT_FINOBT))
> > > +#define XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED \
> > > +		(~(XFS_SB_FEAT_INCOMPAT_FTYPE | \
> > > +		   XFS_SB_FEAT_INCOMPAT_SPINODES))
> > > +#define XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED \
> > > +		(~0)
> > > +static inline bool xfs_sb_version_needs_agfl_wrap_fixes(struct xfs_sb *sbp)
> > > +{
> > > +	return xfs_sb_version_hascrc(sbp) &&
> > > +		!xfs_sb_has_incompat_feature(sbp,
> > > +			XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> > > +		!xfs_sb_has_ro_compat_feature(sbp,
> > > +			XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> > > +		!xfs_sb_has_incompat_log_feature(sbp,
> > > +			XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED);
> > > +}
> > > +
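
To spell out how the inverted masks above behave, here is a small
standalone sketch.  The bit values, struct sb_features, and the
"NEWER" flags are hypothetical and exist only for illustration; only
the shape of the test (any bit outside the known-old set means the
fixup is skipped) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define FEAT_INCOMPAT_FTYPE	(1u << 0)
#define FEAT_INCOMPAT_SPINODES	(1u << 1)
#define FEAT_INCOMPAT_NEWER	(1u << 2)	/* any post-fix feature */
#define FEAT_RO_COMPAT_FINOBT	(1u << 0)
#define FEAT_RO_COMPAT_NEWER	(1u << 1)	/* any post-fix feature */

struct sb_features {
	bool		is_v5;
	uint32_t	ro_compat;
	uint32_t	incompat;
	uint32_t	log_incompat;
};

static bool
needs_agfl_wrap_fixes(const struct sb_features *sb)
{
	const uint32_t	old_incompat = FEAT_INCOMPAT_FTYPE |
				       FEAT_INCOMPAT_SPINODES;
	const uint32_t	old_ro_compat = FEAT_RO_COMPAT_FINOBT;

	assert(sb != NULL);
	/* Any bit outside the known-old sets implies a fixed kernel. */
	return sb->is_v5 &&
	       !(sb->incompat & ~old_incompat) &&
	       !(sb->ro_compat & ~old_ro_compat) &&
	       !sb->log_incompat;	/* mask is ~0: every bit is "new" */
}
```

A v5 filesystem with only ftype and finobt set still needs the fixups;
setting any newer bit, or being v4, switches them off.
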
> > > +/*
> > > + * Fix an AGFL wrapping that falls short of the end of the block by filling the
> > > + * gap at the end of the block.
> > > + */
> > > +STATIC int
> > > +xfs_fixup_freelist_wrap_mount(
> > > +	struct xfs_trans	*tp,
> > > +	struct xfs_buf		*agfbp,
> > > +	struct xfs_perag	*pag)
> > > +{
> > > +	struct xfs_mount	*mp = tp->t_mountp;
> > > +	struct xfs_agf		*agf;
> > > +	struct xfs_buf		*agflbp;
> > > +	__be32			*agfl_bno;
> > > +	xfs_agnumber_t		agno;
> > > +	uint32_t		agfl_size;
> > > +	uint32_t		flfirst;
> > > +	uint32_t		fllast;
> > > +	int32_t			active;
> > > +	int			offset;
> > > +	int			len;
> > > +	int			error;
> > > +
> > > +	if (pag->pagf_flcount == 0)
> > > +		return 0;
> > > +
> > > +	agfl_size = xfs_agfl_size(mp);
> > > +	agf = XFS_BUF_TO_AGF(agfbp);
> > > +	agno = be32_to_cpu(agf->agf_seqno);
> > > +	flfirst = be32_to_cpu(agf->agf_flfirst);
> > > +	fllast = be32_to_cpu(agf->agf_fllast);
> > > +
> > > +	/* Make sure we're either spot on or off by 1. */
> > > +	active = fllast - flfirst + 1;
> > > +	if (active <= 0)
> > > +		active += agfl_size;
> > > +	if (active == pag->pagf_flcount)
> > > +		return 0;
> > > +	else if (active != pag->pagf_flcount + 1)
> > > +		return -EFSCORRUPTED;
> > > +
> > 
> > So we're not attempting to cover the case where the agfl has 1 more
> > block than the agfl size (i.e., the case where an fs goes back to a
> > kernel with an unpacked header)?
> 
> We don't know how the next kernel to touch this filesystem will define
> XFS_AGFL_SIZE -- it could be a 4.5+ kernel (same agfl size), a 32-bit
> pre-4.5 kernel (same agfl size), or a 64-bit pre-4.5 kernel (small agfl
> size).
> 

I don't think I was clear.. I'm envisioning whether we could come up
with a patch that would generically fix up the agfl on disk to be sane
relative to the current kernel. This patch covers the case of a packed
agfl kernel mounting an unpacked on-disk agfl. It would be nice if we
could implement something that also handled a packed on-disk agfl to an
unpacked agfl kernel (e.g., for easier backport to unpacked kernels).

> > I'm wondering whether using the unmount algorithm in both places (as
> > noted above) would also facilitate this mechanism to work both ways
> > (unpacked <-> packed), provided particular conditions are met. I suppose
> > that could be considered regardless, but the less variance the better
> > IMO.
> 
> In principle I suppose we could have a single function to handle all the
> rearranging, but I'd rather have two functions to handle the two cases.
> I'm open to refactoring the common parts out of the
> xfs_fixup_freelist_wrap_*mount functions.
> 

As noted above, I'm not really commenting on the function factoring here
as much as the use of multiple algorithms (using one would certainly
facilitate some mechanical refactoring).

> > > +	/* Would this have even passed muster on an old system? */
> > 
> > Comment doesn't really explain what is going on here..?
> 
> /*
>  * If the distance between flfirst and fllast mismatches flcount by more
>  * than 1, then there's more wrong with this agfl than just the padding
>  * problem.  Bail out completely, which will force the admin to run
>  * xfs_repair.
>  */
> 

*nod*
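
FWIW, the spot-on / off-by-one / corrupt decision can be sketched in
isolation as below.  The enum and agfl_classify() are hypothetical
names, and the sketch additionally insists that the list actually wraps
before calling it a padding problem, which is an assumption about the
intended behaviour rather than what the posted code does.

```c
#include <assert.h>
#include <stdint.h>

enum agfl_state { AGFL_OK, AGFL_SHORT_WRAP, AGFL_CORRUPT };

/* Compare the flfirst..fllast distance against flcount. */
static enum agfl_state
agfl_classify(uint32_t flfirst, uint32_t fllast, uint32_t flcount,
	      uint32_t agfl_size)
{
	int32_t		active;

	assert(agfl_size > 1);
	if (flcount == 0)
		return AGFL_OK;

	active = (int32_t)fllast - (int32_t)flfirst + 1;
	if (active <= 0)
		active += (int32_t)agfl_size;

	if ((uint32_t)active == flcount)
		return AGFL_OK;
	/* Off by exactly one on a wrapped list: the padding problem. */
	if ((uint32_t)active == flcount + 1 && flfirst > fllast)
		return AGFL_SHORT_WRAP;
	/* Anything else is more than padding; leave it to xfs_repair. */
	return AGFL_CORRUPT;
}
```

With the 10-slot example, flfirst == 6, fllast == 2 and flcount == 6
gives a distance of 7, so it is classified as a short wrap.
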

> > 
> > > +	if (flfirst >= agfl_size - 1 || fllast >= agfl_size - 1 ||
> > > +	    pag->pagf_flcount > agfl_size - 1)
> > > +		return -EFSCORRUPTED;
> > > +
> > 
> > I take it these are checks for whether the agfl was previously corrupted
> > based on the unpacked size? FWIW, it might be a bit more clear to use an
> > old/unpacked_agfl_size variable to declare intent (here and in some of
> > the comments and whatnot that follow).
> 
> Ok.
> 
> > > +	/*
> > > +	 * Convert a 40-byte-padded agfl into a 36-byte-padded AGFL.
> > > +	 * Therefore, we need to move the AGFL blocks
> > > +	 * bno[flfirst..agfl_size - 2] to bno[flfirst + 1...agfl_size - 1].
> > > +	 *
> > > +	 * Reusing the example above, if we had flfirst == 116, we need
> > > +	 * to move bno[116] and bno[117] into bno[117] and bno[118],
> > > +	 * respectively, and then increment flfirst.
> > > +	 */
> > 
> > Kind of a strange example to use given that it doesn't mention
> > wrapping.. this only triggers if the agfl was previously wrapped, right?
> 
> Right.  I used to have a check that if the list didn't wrap then we
> would just exit because no fixing is required... but it must've fallen
> out.  I agree that we don't want to rely on a subtlety here to stay out
> of here if the agfl wasn't initially wrapped.
> 
> > > +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> > > +	if (error)
> > > +		return error;
> > > +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > > +
> > > +	len = (agfl_size - flfirst - 1) * sizeof(xfs_agblock_t);
> > > +	memmove(&agfl_bno[flfirst + 1], &agfl_bno[flfirst], len);
> > > +	offset = (char *)&agfl_bno[flfirst + 1] - (char *)agflbp->b_addr;
> > > +	be32_add_cpu(&agf->agf_flfirst, 1);
> > > +
> > > +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> > > +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> > > +	xfs_trans_brelse(tp, agflbp);
> > > +	agflbp = NULL;
> > > +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST);
> > > +
> > > +	return 0;
> > > +}
> > > +
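
The memmove in the mount-time fixer is easier to see on a toy array.
A minimal sketch, assuming a plain uint32_t array in place of the AGFL
buffer; agfl_fill_gap() is a hypothetical name and none of the
transaction or logging calls are modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Shift bno[flfirst..agfl_size - 2] up one slot so the list ends at
 * bno[agfl_size - 1].  Returns the new flfirst.
 */
static uint32_t
agfl_fill_gap(uint32_t *bno, uint32_t agfl_size, uint32_t flfirst)
{
	assert(agfl_size >= 2 && flfirst < agfl_size - 1);
	/* Ranges overlap by construction, hence memmove. */
	memmove(&bno[flfirst + 1], &bno[flfirst],
		(agfl_size - flfirst - 1) * sizeof(bno[0]));
	return flfirst + 1;
}
```

On the 10-slot example with flfirst == 6 this moves A, B and C from
slots 6-8 into slots 7-9 and returns 7; slot 6 keeps a stale copy of A
but is no longer part of the list.
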
> > > +/*
> > > + * Fix an AGFL that touches the end of the block by moving the first or last
> > > + * part of the list elsewhere in the AGFL so that old kernels don't trip over
> > > + * wrapping issues.
> > > + */
> > > +STATIC int
> > > +xfs_fixup_freelist_wrap_unmount(
> > > +	struct xfs_trans	*tp,
> > > +	struct xfs_buf		*agfbp,
> > > +	struct xfs_perag	*pag)
> > > +{
> > > +	struct xfs_mount	*mp = tp->t_mountp;
> > > +	struct xfs_agf		*agf;
> > > +	struct xfs_buf		*agflbp;
> > > +	__be32			*agfl_bno;
> > > +	xfs_agnumber_t		agno;
> > > +	uint32_t		agfl_size;
> > > +	uint32_t		flfirst;
> > > +	uint32_t		fllast;
> > > +	int			offset;
> > > +	int			len;
> > > +	int			error;
> > > +
> > > +	agfl_size = xfs_agfl_size(mp);
> > > +	agf = XFS_BUF_TO_AGF(agfbp);
> > > +	agno = be32_to_cpu(agf->agf_seqno);
> > > +	flfirst = be32_to_cpu(agf->agf_flfirst);
> > > +	fllast = be32_to_cpu(agf->agf_fllast);
> > > +
> > > +	/* Empty AGFL?  Make sure we aren't pointing at the end. */
> > > +	if (pag->pagf_flcount == 0) {
> > > +		if (flfirst >= agfl_size || fllast >= agfl_size) {
> > 
> > When would either field be >= agfl_size? Isn't that where they wrap?
> 
> Oops, you're right, that should be flfirst >= agfl_size - 1.
> 
> Or really,
> 
> old_agfl_size = agfl_size - 1
> if flfirst >= old_agfl_size or fllast >= old_agfl_size:
> 	blahblahblah
> 
> > 
> > > +			agf->agf_flfirst = cpu_to_be32(1);
> > > +			agf->agf_fllast = 0;
> > > +			xfs_alloc_log_agf(tp, agfbp,
> > > +					XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> > > +		}
> > > +		return 0;
> > > +	}
> > > +
> > > +	/* If we don't hit the end, we're done. */
> > > +	if (flfirst < fllast && fllast != agfl_size - 1)
> > > +		return 0;
> > > +
> > > +	/*
> > > +	 * Move the start of a wrapped list towards the start of the agfl block.
> > 
> > FWIW, this and the subsequent comments kind of gave me the impression
> > that the agfl would be "shifted" towards the start of the block. The
> > code doesn't do that, but rather rotates the start/head of the agfl to
> > the tail and adjusts the pointers (changing the effective order). That
> > seems technically Ok, but I had to grok the code and read back to
> > understand the comment rather than the other way around.
> 
> Yeah.  Sorry about the confusion, maybe it would be better if I took out
> these confusing comments and replaced it all with the ascii art above?
> 

I'm still hoping we can do something a bit more simple and flexible in
general, but that aside, the ascii art describes the associated
algorithms much better.
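
To make the "rotate, not shift" point concrete, here is a toy model of
the unmount case.  Everything in it (struct agfl_toy, agfl_nth(),
agfl_rotate_head(), the 10-slot size) is hypothetical and only covers a
list that really wraps; a list that merely touches the last slot is
left out.  It uses memmove rather than memcpy on the assumption that
source and destination can overlap when the list is nearly full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define AGFL_SLOTS	10	/* toy size from the ASCII art */

struct agfl_toy {
	uint32_t	bno[AGFL_SLOTS];
	uint32_t	flfirst;
	uint32_t	fllast;
	uint32_t	flcount;
};

/* Return the i-th entry in list order, following the wrap if any. */
static uint32_t
agfl_nth(const struct agfl_toy *a, uint32_t i)
{
	assert(i < a->flcount);
	return a->bno[(a->flfirst + i) % AGFL_SLOTS];
}

/* Rotate the head of a wrapped list in behind the tail. */
static bool
agfl_rotate_head(struct agfl_toy *a)
{
	uint32_t	movelen;

	assert(a->flfirst < AGFL_SLOTS && a->fllast < AGFL_SLOTS);
	if (a->flcount == 0 || a->flfirst <= a->fllast)
		return false;		/* empty or not wrapped */
	movelen = AGFL_SLOTS - a->flfirst;
	if (a->fllast + 1 + movelen != a->flcount)
		return false;		/* count mismatch, don't touch */
	memmove(&a->bno[a->fllast + 1], &a->bno[a->flfirst],
		movelen * sizeof(a->bno[0]));
	a->flfirst = 0;
	a->fllast = a->flcount - 1;
	return true;
}
```

Before the call the list order is Q R S T U V; afterwards it is
T U V Q R S, i.e. the same blocks in a different effective order.
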

> > > +	 * Therefore, we need to move the AGFL blocks
> > > +	 * bno[flfirst..agfl_size - 1] to
> > > +	 * bno[fllast + 1..fllast + agfl_size - flfirst].
> > > +	 * Then we reset flfirst and fllast appropriately.
> > > +	 *
> > > +	 * Reusing the example above, if we had flfirst == 117 and fllast == 4,
> > > +	 * we need to move bno[117] and bno[118] into bno[5] and bno[6],
> > > +	 * respectively, and then reset flfirst and fllast.
> > > +	 *
> > > +	 * If it's just the last block that touches the end, only move that.
> > > +	 */
> > > +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> > > +	if (error)
> > > +		return error;
> > > +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > > +
> > > +	if (fllast == agfl_size - 1) {
> > > +		/* Back the AGFL off from the end of the block. */
> > > +		len = sizeof(xfs_agblock_t);
> > > +		agfl_bno[flfirst - 1] = agfl_bno[agfl_size - 1];
> > 
> > What if the agfl is full (flfirst == 0)?
> 
> I don't think we ever unmount with a full agfl -- we're guaranteed at
> least 118 items in the AGFL, and the maximum size the agfl will ever
> need is 8 blocks for each of bnobt, cntbt, rmapbt.
> 
> However, we could just detect a full agfl and free the end block.
> 

I suppose it's unlikely we'll ever see corner cases such as a completely
full or empty agfl. That said, I think the code needs to handle those
cases sanely. That could mean explicit checks and skipping the fixup for
all I care, just as long as we don't have memory corruption vectors and
whatnot if (when?) assumptions prove wrong.

> > > +		offset = (char *)&agfl_bno[flfirst - 1] - (char *)agflbp->b_addr;
> > > +		be32_add_cpu(&agf->agf_fllast, -1);
> > > +		be32_add_cpu(&agf->agf_flfirst, -1);
> > > +	} else {
> 
> This needs to check for flfirst > fllast.
> 
> > > +		/* Move the first part of the AGFL towards the front. */
> > > +		len = (agfl_size - flfirst) * sizeof(xfs_agblock_t);
> > > +		memcpy(&agfl_bno[fllast + 1], &agfl_bno[flfirst], len);
> > > +		offset = (char *)&agfl_bno[fllast + 1] - (char *)agflbp->b_addr;
> > > +		agf->agf_flfirst = 0;
> > > +		agf->agf_fllast = cpu_to_be32(pag->pagf_flcount - 1);
> > 
> > Similar question here as above.. it looks like the copy may be a no-op,
> > but we still reset flfirst/fllast (which may also be fine, but
> > unnecessary).
> > 
> > The more interesting case here is flcount == 1 and flfirst == fllast,
> > where it looks like we'd wrongly reset first/last (and perhaps copy more
> > than we should..?).
> > 
> > Ugh. Remind me again why we can't just detect a mismatch, fail the mount
> > and tell the user to repair? :P
> 
> Not all the distros included xfs_repair in the initrd or the logic to
> call it (as opposed to "fsck -y") if the root fs fails to mount, which
> means that if this wrapping problem hits a root filesystem then the
> administrator will have to reboot with a recovery iso and run xfs_repair
> from there.  That's going to generate a /lot/ of support work compared
> to us figuring out how to repair the filesystems automatically.
> 

Yeah, sorry. Good argument for the unpacked -> packed fixup.

> > Random thought... have we considered anything less invasive/more
> > simple for this issue than low level buffer manipulation? For example,
> > could we address the unpacked -> packed case by just allocating a
> > block to fill the gap at the end of the wrapped agfl?
> 
> The allocation could fail, in which case we'd be forced to resort to
> list manipulation.
> 

Yeah.. obvious tradeoff, but how likely is that? It seems a reasonable
tradeoff to me given there's still the repair path. Also note that we
don't have to necessarily allocate from the btrees.

(FWIW, I hacked around on this a bit and ended up with something
slightly different[1].)

> > Perhaps that could even be done more discreetly in
> > xfs_alloc_fix_freelist() when we already have a transaction set up,
> > etc.
> 
> Possible, though this will add more logic to every hot allocation call
> for something that only needs fixing at mount and at unmount.  I think
> there's a stronger argument for doing the unmount cleanup on every
> fix_freelist since that would eliminate the small chance that a new
> kernel crashes and gets recovered on an old kernel.
> 

Eh, I don't think that's anything we couldn't address if need be.  E.g.,
amortize the padding fixup against actual allocations, set/test a pag
bit to track the padding fixup, or check it on agf read and set a "needs
fixup" bit, etc.

> > That wouldn't help going the other direction, but maybe we could
> > consider similar logic to free an extraneous block and then worry
> > about protecting users who won't upgrade broken kernels separately if
> > that really continues to be a problem.
> 
> It is already a problem here.
> 

Then it should be able to stand alone as an independent patch (with
independent commit log that explains/justifies it, etc.).

Brian

[1] This is hacky and brute force, but the idea was to try and do
something that could work in either direction and reuse more existing
code. It basically just reads the on-disk entries "manually" and
reinserts them into a reinitialized (empty) agfl. Perhaps it could be
refactored into something that only rotates a given number of entries or
something of that nature. Anyways, just an experiment and probably
broken..

int
xfs_agfl_fixup(
	struct xfs_trans	*tp,
	struct xfs_buf		*agbp,
	struct xfs_perag	*pag)
{
	struct xfs_mount	*mp = tp->t_mountp;
	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
	int			agfl_size = XFS_AGFL_SIZE(mp);
	int			ofirst, olast, osize;
	int			nfirst, nlast;
	struct xfs_buf		*agflbp;
	__be32			*agfl_bno;
	int			active;
	xfs_agblock_t		bno;
	bool			wasempty = false;
	int			error;

	if (pag->pagf_flcount == 0)
		wasempty = true;

	ofirst = be32_to_cpu(agf->agf_flfirst);
	olast = be32_to_cpu(agf->agf_fllast);
	if (olast >= ofirst)
		active = olast - ofirst + 1;
	else
		active = agfl_size - ofirst + olast + 1;

	/* Infer the agfl size the on-disk list was written with. */
	if (active == pag->pagf_flcount + 1)
		osize = agfl_size - 1;
	else if ((active == pag->pagf_flcount - 1) ||
		 ofirst == agfl_size || olast == agfl_size)
		osize = agfl_size + 1;
	else if (active == pag->pagf_flcount)
		osize = agfl_size;
	else
		return -EFSCORRUPTED;

	error = xfs_alloc_read_agfl(mp, tp, be32_to_cpu(agf->agf_seqno),
				    &agflbp);
	if (error)
		return error;
	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);

	nlast = do_mod(olast, agfl_size);
	nfirst = nlast + 1;
	nfirst = do_mod(nfirst, agfl_size);
	ASSERT(nfirst != ofirst);

	agf->agf_flfirst = cpu_to_be32(nfirst);
	agf->agf_fllast = cpu_to_be32(nlast);
	agf->agf_flcount = 0;
	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
				    XFS_AGF_FLCOUNT);
	pag->pagf_flcount = 0;

	if (wasempty)
		goto out;

	while (true) {
		ofirst = do_mod(ofirst, osize);
		bno = be32_to_cpu(agfl_bno[ofirst]);
		xfs_alloc_put_freelist(tp, agbp, agflbp, bno, 0);
		if (ofirst == olast)
			break;
		ofirst++;
	}

out:
	xfs_trans_brelse(tp, agflbp);
	return 0;
}


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-02-28 22:43       ` Brian Foster
@ 2018-02-28 23:20         ` Darrick J. Wong
  2018-03-01 17:28           ` Brian Foster
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-02-28 23:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/Makefile     |    1 
> > > >  fs/xfs/xfs_fixups.c |  310 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/xfs_fixups.h |   26 ++++
> > > >  fs/xfs/xfs_mount.c  |   21 +++
> > > >  fs/xfs/xfs_super.c  |   10 ++
> > > >  5 files changed, 367 insertions(+), 1 deletion(-)
> > > >  create mode 100644 fs/xfs/xfs_fixups.c
> > > >  create mode 100644 fs/xfs/xfs_fixups.h
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > > index b03c77e..f88368a 100644
> > > > --- a/fs/xfs/Makefile
> > > > +++ b/fs/xfs/Makefile
> > > > @@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
> > > >  				   xfs_extent_busy.o \
> > > >  				   xfs_file.o \
> > > >  				   xfs_filestream.o \
> > > > +				   xfs_fixups.o \
> > > >  				   xfs_fsmap.o \
> > > >  				   xfs_fsops.o \
> > > >  				   xfs_globals.o \
> > > > diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> > > > new file mode 100644
> > > > index 0000000..0cad7bb
> > > > --- /dev/null
> > > > +++ b/fs/xfs/xfs_fixups.c
> > > > @@ -0,0 +1,310 @@
> > > > +/*
> > > > + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> > > > + *
> > > > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or
> > > > + * modify it under the terms of the GNU General Public License
> > > > + * as published by the Free Software Foundation; either version 2
> > > > + * of the License, or (at your option) any later version.
> > > > + *
> > > > + * This program is distributed in the hope that it would be useful,
> > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > + * GNU General Public License for more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License
> > > > + * along with this program; if not, write the Free Software Foundation,
> > > > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > > > + */
> > > > +#include "xfs.h"
> > > > +#include "xfs_fs.h"
> > > > +#include "xfs_shared.h"
> > > > +#include "xfs_format.h"
> > > > +#include "xfs_log_format.h"
> > > > +#include "xfs_trans_resv.h"
> > > > +#include "xfs_sb.h"
> > > > +#include "xfs_mount.h"
> > > > +#include "xfs_alloc.h"
> > > > +#include "xfs_trans.h"
> > > > +#include "xfs_fixups.h"
> > > > +
> > > > +/*
> > > > + * v5 AGFL padding defects
> > > > + *
> > > > + * When the v5 format was first introduced, there was a defect in the struct
> > > > + * xfs_agfl definition that resulted in XFS_AGFL_SIZE returning different
> > > 
> > > XFS_AGFL_SIZE() no longer exists as of the previous patch. It might be
> > > better to refer to "the agfl size" generically.
> > 
> > How about:
> > 
> > "...the agfl length calculation formula (at the time known as
> > XFS_AGFL_SIZE)..."
> > 
> 
> The formula formerly known as..? ;) Heh, sounds fine.

lol. :)

> > > > + * values depending on the compiler padding.  On a fs with 512-byte sectors,
> > > > + * this meant that XFS_AGFL_SIZE was 119 on i386, but 118 on x64.  Commit
> > > > + * 96f859d52bcb1 ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is
> > > > + * correct") changed the definition to disable padding the end of the
> > > > + * structure, and was accepted into Linux 4.5.  Since then, the AGFL has
> > > > + * always used the larger size (e.g. 119 entries on a 512b sector fs).
> > > > + *
> > > > + * Unfortunately, pre-4.5 kernels can produce filesystems with AGFLs that wrap
> > > > + * at the smaller size, and those kernels are not prepared to handle the
> > > > + * longer size.  This typically manifests itself as an AGF verifier corruption
> > > > + * error followed by a filesystem shutdown.  While we encourage admins to stay
> > > > + * current with software, we would like to avoid this intermittent breakage.
> > > > + *
> > > > + * Any v5 filesystem which has a feature bit set for a feature that was
> > > > + * introduced after Linux 4.5 will not have this problem, as such
> > > > + * filesystems cannot be mounted on older kernels.  v4 filesystems are also
> > > > + * unaffected.
> > > > + *
> > > > + * Therefore, we add two fixup functions -- the first runs at mount time to
> > > > + * detect a short-wrapped AGFL and fix it; the second runs at unmount, freeze,
> > > > + * or remount-ro time to move a wrapped AGFL to the beginning of the list.
> > > > + * This reduces the likelihood of a screwup to the scenario where you have (a)
> > > > + * a filesystem with no post-4.5 features (reflink, rmap), (b) the AGFL wraps,
> > > > + * (c) the filesystem goes down leaving a dirty log, and (d) the dirty
> > > > + * filesystem is mounted on an old kernel.
> > > > + */
> > > > +
> > > 
> > > While the mount vs. unmount time fixups serve different purposes and
> > > have different conditions, do we really need two separate fixup
> > > implementations? E.g., if we have to unwrap the AGFL at unmount time,
> > > why not just reuse that fixup at mount time (when needed) as well? I
> > > suspect the difference would only be the length of what we can consider
> > > valid in the agfl at the time.
> > 
> > They're doing two different things.  I will use ASCII art to demonstrate
> > the mount-time function:
> > 
> 
> Yes, the question was whether we need to do two different things. For
> example...
> 
> > Let's say the AGFL block has space for 10 items.  It looks like this:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header |   |   |   |   |   |   |   |   |   |   |
> > 
> > 
> > When the fs is freshly formatted or repaired, the AGFL will look like:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header | A | B | C | D | E | F |   |   |   |   |
> >          ^-- flfirst         ^-- fllast
> > 
> > Due to the padding problems prior to 4.5 ("4.5 agfl padding fix"),
> > XFS_AGFL_SIZE would return 10 on 32-bit systems and 9 on 64-bit systems.
> > Therefore, if the AGFL wrapped on a 64-bit kernel we would end up with:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header | D | E | F |   |   |   | A | B | C | ? |
> >                  ^-- fllast      ^-- flfirst
> > 
> > Note that block 9's contents are undefined because we didn't write
> > > anything there.  Now that the "4.5 agfl padding fix" has corrected
> > XFS_AGFL_SIZE to return 10 in all cases, this looks like an AGF
> > corruption because flcount is 6 but the distance between flfirst and
> > fllast is 7.  So long as the list wraps and the mismatch is exactly one
> > block we can fix this.  The mount time fixer in this patch does this by
> > moving the records between flfirst and xfs_agfl_size (A, B, and C) to
> > the end of the block to close the gap:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header | D | E | F |   |   |   |   | A | B | C |
> >                  ^-- fllast          ^-- flfirst
> > 
> > If the count is off by more than 1 then the AGF is truly corrupt and we
> > bail out.
> > 
> > Now let's say that we run for a while and want to unmount, but our AGFL
> > wraps like this:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header | T | U | V |   |   |   |   | Q | R | S |
> >                  ^-- fllast          ^-- flfirst
> > 
> > We don't know that the next kernel to mount this filesystem will have
> > the "4.5 agfl padding fix" applied to it; if it does not, it will flag
> > the AGF as corrupted because flcount is 6 but in its view the distance
> > between flfirst and fllast (which omits bno[9]) is 5.  We don't want
> > it to choke on that, so the unmount fixer moves all the records between
> > flfirst and the end (Q, R, and S) towards the start of the block and
> > resets flfirst/fllast:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header | T | U | V | Q | R | S |   |   |   |   |
> >          ^-- flfirst         ^-- fllast
> > 
> 
> Could we use this basic algorithm at mount time as well? I know it
> wouldn't be _exactly_ the same operation at mount time as it is for
> unmount since slot 9 is a gap in the former case, but afaict the only
> difference from an algorithmic perspective is the length of the shift.
> 
> IOW, if we were to parameterize the length of the shift and have the
> mount fixup call the unmount algorithm, would it not also address the
> problem?

Yes, I believe that would work.  I think it would be more efficient in
the patch below to memmove the entries instead of put_freelist'ing them
individually.

If the agfl is completely full then fix_freelist ought to trim it down
by at least one element.  The algorithm then becomes:

if mounting and flfirst < fllast:
	return 0
if flcount == agfl_size:
	assert !mounting
	fix_freelist()
	assert flcount < agfl_size
if flfirst < fllast:
	return 0
movelen = agfl_size - flfirst
if active == flcount + 1:
	movelen--
memmove(&agflbno[fllast + 1], &agflbno[flfirst], movelen)
flfirst = 0
fllast = fllast + movelen
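
Roughly, in C, on a userspace toy model.  This is a sketch and not the
patch: struct agfl_model and agfl_fix_wrap() are made-up names, the
fix_freelist() trimming step is replaced by returning -1 so the caller
can deal with a full list, and the short-wrapped case is taken to be
"distance == flcount + 1", as in the mount-time check.  Like the
pseudocode, it does nothing for a list that does not wrap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct agfl_model {
	uint32_t	*bno;		/* agfl_size slots */
	uint32_t	agfl_size;
	uint32_t	flfirst;
	uint32_t	fllast;
	uint32_t	flcount;
};

/* Returns 0 if fixed or nothing to do, -1 if the caller must step in. */
static int
agfl_fix_wrap(struct agfl_model *a)
{
	uint32_t	active;
	uint32_t	movelen;

	assert(a->flfirst < a->agfl_size && a->fllast < a->agfl_size);
	if (a->flcount == 0 || a->flfirst <= a->fllast)
		return 0;		/* empty or not wrapped */
	if (a->flcount >= a->agfl_size)
		return -1;		/* full: trim an entry first */

	active = a->agfl_size - a->flfirst + a->fllast + 1;
	movelen = a->agfl_size - a->flfirst;
	if (active == a->flcount + 1)
		movelen--;		/* last slot was never written */
	else if (active != a->flcount)
		return -1;		/* more than a padding problem */

	memmove(&a->bno[a->fllast + 1], &a->bno[a->flfirst],
		movelen * sizeof(a->bno[0]));
	a->flfirst = 0;
	a->fllast += movelen;
	return 0;
}
```

Fed the mount-time example (flfirst 6, fllast 2, flcount 6) it moves
three entries and skips the unwritten slot 9; fed the unmount example
(flfirst 7, fllast 2, flcount 6) it also moves three entries, and both
end up with flfirst == 0 and fllast == 5.
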

> > Since the AGFL no longer wraps at all, it doesn't matter if the next
> > kernel to mount this filesystem has the "4.5 agfl padding fix" applied;
> > all kernels can handle this correctly.
> > 
> > > > +/*
> > > > + * Decide if we need to have the agfl wrapping fixes applied.  This only
> > > > + * affects v5 filesystems that do not have any features enabled that did not
> > > > + * exist when the agfl padding fix went in.
> > > > + *
> > > > + * Features already present when the fix went in were finobt, ftype, spinodes.
> > > > + * If we see something new (e.g. reflink) then don't bother.
> > > > + */
> > > > +#define XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED \
> > > > +		(~(XFS_SB_FEAT_RO_COMPAT_FINOBT))
> > > > +#define XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED \
> > > > +		(~(XFS_SB_FEAT_INCOMPAT_FTYPE | \
> > > > +		   XFS_SB_FEAT_INCOMPAT_SPINODES))
> > > > +#define XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED \
> > > > +		(~0)
> > > > +static inline bool xfs_sb_version_needs_agfl_wrap_fixes(struct xfs_sb *sbp)
> > > > +{
> > > > +	return xfs_sb_version_hascrc(sbp) &&
> > > > +		!xfs_sb_has_incompat_feature(sbp,
> > > > +			XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> > > > +		!xfs_sb_has_ro_compat_feature(sbp,
> > > > +			XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> > > > +		!xfs_sb_has_incompat_log_feature(sbp,
> > > > +			XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED);
> > > > +}
> > > > +
> > > > +/*
> > > > + * Fix an AGFL wrapping that falls short of the end of the block by filling the
> > > > + * gap at the end of the block.
> > > > + */
> > > > +STATIC int
> > > > +xfs_fixup_freelist_wrap_mount(
> > > > +	struct xfs_trans	*tp,
> > > > +	struct xfs_buf		*agfbp,
> > > > +	struct xfs_perag	*pag)
> > > > +{
> > > > +	struct xfs_mount	*mp = tp->t_mountp;
> > > > +	struct xfs_agf		*agf;
> > > > +	struct xfs_buf		*agflbp;
> > > > +	__be32			*agfl_bno;
> > > > +	xfs_agnumber_t		agno;
> > > > +	uint32_t		agfl_size;
> > > > +	uint32_t		flfirst;
> > > > +	uint32_t		fllast;
> > > > +	int32_t			active;
> > > > +	int			offset;
> > > > +	int			len;
> > > > +	int			error;
> > > > +
> > > > +	if (pag->pagf_flcount == 0)
> > > > +		return 0;
> > > > +
> > > > +	agfl_size = xfs_agfl_size(mp);
> > > > +	agf = XFS_BUF_TO_AGF(agfbp);
> > > > +	agno = be32_to_cpu(agf->agf_seqno);
> > > > +	flfirst = be32_to_cpu(agf->agf_flfirst);
> > > > +	fllast = be32_to_cpu(agf->agf_fllast);
> > > > +
> > > > +	/* Make sure we're either spot on or off by 1. */
> > > > +	active = fllast - flfirst + 1;
> > > > +	if (active <= 0)
> > > > +		active += agfl_size;
> > > > +	if (active == pag->pagf_flcount)
> > > > +		return 0;
> > > > +	else if (active != pag->pagf_flcount + 1)
> > > > +		return -EFSCORRUPTED;
> > > > +
> > > 
> > > So we're not attempting to cover the case where the agfl has 1 more
> > > block than the agfl size (i.e., the case where an fs goes back to a
> > > kernel with an unpacked header)?
> > 
> > We don't know how the next kernel to touch this filesystem will define
> > XFS_AGFL_SIZE -- it could be a 4.5+ kernel (same agfl size), a 32-bit
> > pre-4.5 kernel (same agfl size), or a 64-bit pre-4.5 kernel (small agfl
> > size).
> > 
> 
> I don't think I was clear.. I'm envisioning whether we could come up
> with a patch that would generically fix up the agfl on disk to be sane
> relative to the current kernel. This patch covers the case of a packed
> agfl kernel mounting an unpacked on-disk agfl. It would be nice if we
> could implement something that also handled a packed on-disk agfl being
> mounted by an unpacked agfl kernel (e.g., for easier backport to
> unpacked kernels).

If we're going to touch an old kernel's source at all I'd rather we
backport both the packing fix and this fixer-upper.
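
For anyone following along, here is a rough userspace sketch of why the
packing fix changes the agfl size at all.  The struct and field names
below are illustrative stand-ins rather than the real definitions, and
the packed attribute is the GCC/Clang spelling.  The header fields add
up to 36 bytes; where a 64-bit field forces 8-byte alignment, the
unpacked struct is tail-padded to 40, which costs one 4-byte slot on a
512-byte sector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the v5 AGFL header (names are assumptions). */
struct agfl_hdr_unpacked {
	uint32_t	magicnum;
	uint32_t	seqno;
	uint8_t		uuid[16];
	uint64_t	lsn;
	uint32_t	crc;
};	/* 36 bytes of fields; tail padding depends on the ABI */

/* Same fields with tail padding disabled (GCC/Clang attribute). */
struct agfl_hdr_packed {
	uint32_t	magicnum;
	uint32_t	seqno;
	uint8_t		uuid[16];
	uint64_t	lsn;
	uint32_t	crc;
} __attribute__((packed));	/* 36 bytes regardless of ABI */

/* Number of 32-bit block pointers that fit after a header of hdr_size. */
static inline uint32_t
agfl_slots(uint32_t sectsize, size_t hdr_size)
{
	assert(hdr_size <= sectsize);
	return (uint32_t)((sectsize - hdr_size) / sizeof(uint32_t));
}
```

Feeding this a 512-byte sector with a 40-byte header versus a 36-byte
header gives the two sizes being argued about in this thread.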

> > > I'm wondering whether using the unmount algorithm in both places (as
> > > noted above) would also facilitate this mechanism to work both ways
> > > (unpacked <-> packed), provided particular conditions are met. I suppose
> > > that could be considered regardless, but the less variance the better
> > > IMO.
> > 
> > In principle I suppose we could have a single function to handle all the
> > rearranging, but I'd rather have two functions to handle the two cases.
> > I'm open to refactoring the common parts out of the
> > xfs_fixup_freelist_wrap_*mount functions.
> > 
> 
> As noted above, I'm not really commenting on the function factoring here
> as much as the use of multiple algorithms (using one would certainly
> facilitate some mechanical refactoring).

TBH the patch you sent is pretty close to what I was imagining.

> > > > +	/* Would this have even passed muster on an old system? */
> > > 
> > > Comment doesn't really explain what is going on here..?
> > 
> > /*
> >  * If the distance between flfirst and fllast mismatches flcount by more
> >  * than 1, then there's more wrong with this agfl than just the padding
> >  * problem.  Bail out completely, which will force the admin to run
> >  * xfs_repair.
> >  */
> > 
> 
> *nod*
> 
> > > 
> > > > +	if (flfirst >= agfl_size - 1 || fllast >= agfl_size - 1 ||
> > > > +	    pag->pagf_flcount > agfl_size - 1)
> > > > +		return -EFSCORRUPTED;
> > > > +
> > > 
> > > I take it these are checks for whether the agfl was previously corrupted
> > > based on the unpacked size? FWIW, it might be a bit more clear to use an
> > > old/unpacked_agfl_size variable to declare intent (here and in some of
> > > the comments and whatnot that follow).
> > 
> > Ok.
> > 
> > > > +	/*
> > > > +	 * Convert a 40-byte-padded agfl into a 36-byte-padded AGFL.
> > > > +	 * Therefore, we need to move the AGFL blocks
> > > > +	 * bno[flfirst..agfl_size - 2] to bno[flfirst + 1...agfl_size - 1].
> > > > +	 *
> > > > +	 * Reusing the example above, if we had flfirst == 116, we need
> > > > +	 * to move bno[116] and bno[117] into bno[117] and bno[118],
> > > > +	 * respectively, and then increment flfirst.
> > > > +	 */
> > > 
> > > Kind of a strange example to use given that it doesn't mention
> > > wrapping.. this only triggers if the agfl was previously wrapped, right?
> > 
> > Right.  I used to have a check that if the list didn't wrap then we
> > would just exit because no fixing is required... but it must've fallen
> > out.  I agree that we don't want to rely on a subtlety here to stay out
> > of here if the agfl wasn't initially wrapped.
> > 
> > > > +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> > > > +	if (error)
> > > > +		return error;
> > > > +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > > > +
> > > > +	len = (agfl_size - flfirst - 1) * sizeof(xfs_agblock_t);
> > > > +	memmove(&agfl_bno[flfirst + 1], &agfl_bno[flfirst], len);
> > > > +	offset = (char *)&agfl_bno[flfirst + 1] - (char *)agflbp->b_addr;
> > > > +	be32_add_cpu(&agf->agf_flfirst, 1);
> > > > +
> > > > +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> > > > +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> > > > +	xfs_trans_brelse(tp, agflbp);
> > > > +	agflbp = NULL;
> > > > +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Fix an AGFL that touches the end of the block by moving the first or last
> > > > + * part of the list elsewhere in the AGFL so that old kernels don't trip over
> > > > + * wrapping issues.
> > > > + */
> > > > +STATIC int
> > > > +xfs_fixup_freelist_wrap_unmount(
> > > > +	struct xfs_trans	*tp,
> > > > +	struct xfs_buf		*agfbp,
> > > > +	struct xfs_perag	*pag)
> > > > +{
> > > > +	struct xfs_mount	*mp = tp->t_mountp;
> > > > +	struct xfs_agf		*agf;
> > > > +	struct xfs_buf		*agflbp;
> > > > +	__be32			*agfl_bno;
> > > > +	xfs_agnumber_t		agno;
> > > > +	uint32_t		agfl_size;
> > > > +	uint32_t		flfirst;
> > > > +	uint32_t		fllast;
> > > > +	int			offset;
> > > > +	int			len;
> > > > +	int			error;
> > > > +
> > > > +	agfl_size = xfs_agfl_size(mp);
> > > > +	agf = XFS_BUF_TO_AGF(agfbp);
> > > > +	agno = be32_to_cpu(agf->agf_seqno);
> > > > +	flfirst = be32_to_cpu(agf->agf_flfirst);
> > > > +	fllast = be32_to_cpu(agf->agf_fllast);
> > > > +
> > > > +	/* Empty AGFL?  Make sure we aren't pointing at the end. */
> > > > +	if (pag->pagf_flcount == 0) {
> > > > +		if (flfirst >= agfl_size || fllast >= agfl_size) {
> > > 
> > > When would either field be >= agfl_size? Isn't that where they wrap?
> > 
> > Oops, you're right, that should be flfirst >= agfl_size - 1.
> > 
> > Or really,
> > 
> > old_agfl_size = agfl_size - 1
> > if flfirst >= old_agfl_size or fllast >= old_agfl_size:
> > 	blahblahblah
> > 
> > > 
> > > > +			agf->agf_flfirst = cpu_to_be32(1);
> > > > +			agf->agf_fllast = 0;
> > > > +			xfs_alloc_log_agf(tp, agfbp,
> > > > +					XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> > > > +		}
> > > > +		return 0;
> > > > +	}
> > > > +
> > > > +	/* If we don't hit the end, we're done. */
> > > > +	if (flfirst < fllast && fllast != agfl_size - 1)
> > > > +		return 0;
> > > > +
> > > > +	/*
> > > > +	 * Move a start of a wrapped list towards the start of the agfl block.
> > > 
> > > FWIW, this and the subsequent comments kind of gave me the impression
> > > that the agfl would be "shifted" towards the start of the block. The
> > > code doesn't do that, but rather rotates the start/head of the agfl to
> > > the tail and adjusts the pointers (changing the effective order). That
> > > seems technically Ok, but I had to grok the code and read back to
> > > understand the comment rather than the other way around.
> > 
> > Yeah.  Sorry about the confusion, maybe it would be better if I took out
> > these confusing comments and replaced it all with the ascii art above?
> > 
> 
> I'm still hoping we can do something a bit more simple and flexible in
> general, but that aside, the ascii art describes the associated
> algorithms much better.
> 
> > > > +	 * Therefore, we need to move the AGFL blocks
> > > > +	 * bno[flfirst..agfl_size - 1] to bno[fllast + 1...agfl_size - flfirst].
> > > > +	 * Then we reset flfirst and fllast appropriately.
> > > > +	 *
> > > > +	 * Reusing the example above, if we had flfirst == 117 and fllast == 4,
> > > > +	 * we need to move bno[117] and bno[118] into bno[5] and bno[6],
> > > > +	 * respectively, and then reset flfirst and fllast.
> > > > +	 *
> > > > +	 * If it's just the last block that touches the end, only move that.
> > > > +	 */
> > > > +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> > > > +	if (error)
> > > > +		return error;
> > > > +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > > > +
> > > > +	if (fllast == agfl_size - 1) {
> > > > +		/* Back the AGFL off from the end of the block. */
> > > > +		len = sizeof(xfs_agblock_t);
> > > > +		agfl_bno[flfirst - 1] = agfl_bno[agfl_size - 1];
> > > 
> > > What if the agfl is full (flfirst == 0)?
> > 
> > I don't think we ever unmount with a full agfl -- we're guaranteed at
> > least 118 items in the AGFL, and the maximum size the agfl will ever
> > need is 8 blocks for each of bnobt, cntbt, rmapbt.
> > 
> > However, we could just detect a full agfl and free the end block.
> > 
> 
> I suppose it's unlikely we'll ever see corner cases such as a completely
> full or empty agfl. That said, I think the code needs to handle those
> cases sanely. That could mean explicit checks and skipping the fixup for
> all I care, just as long as we don't have memory corruption vectors and
> whatnot if (when?) assumptions prove wrong.
> 
> > > > +		offset = (char *)&agfl_bno[flfirst - 1] - (char *)agflbp->b_addr;
> > > > +		be32_add_cpu(&agf->agf_fllast, -1);
> > > > +		be32_add_cpu(&agf->agf_flfirst, -1);
> > > > +	} else {
> > 
> > This needs to check for flfirst > fllast.
> > 
> > > > +		/* Move the first part of the AGFL towards the front. */
> > > > +		len = (agfl_size - flfirst) * sizeof(xfs_agblock_t);
> > > > +		memcpy(&agfl_bno[fllast + 1], &agfl_bno[flfirst], len);
> > > > +		offset = (char *)&agfl_bno[fllast + 1] - (char *)agflbp->b_addr;
> > > > +		agf->agf_flfirst = 0;
> > > > +		agf->agf_fllast = cpu_to_be32(pag->pagf_flcount - 1);
> > > 
> > > Similar question here as above.. it looks like the copy may be a no-op,
> > > but we still reset flfirst/fllast (which may also be fine, but
> > > unnecessary).
> > > 
> > > The more interesting case here is flcount == 1 and flfirst == fllast,
> > > where it looks like we'd wrongly reset first/last (and perhaps copy more
> > > than we should..?).
> > > 
> > > Ugh. Remind me again why we can't just detect a mismatch, fail the mount
> > > and tell the user to repair? :P
> > 
> > Not all the distros included xfs_repair in the initrd or the logic to
> > call it (as opposed to "fsck -y") if the root fs fails to mount, which
> > means that if this wrapping problem hits a root filesystem then the
> > administrator will have to reboot with a recovery iso and run xfs_repair
> > from there.  That's going to generate a /lot/ of support work compared
> > to us figuring out how to repair the filesystems automatically.
> > 
> 
> Yeah, sorry. Good argument for the unpacked -> packed fixup.
> 
> > > Random thought... have we considered anything less invasive/more
> > > simple for this issue than low level buffer manipulation? For example,
> > > could we address the unpacked -> packed case by just allocating a
> > > block to fill the gap at the end of the wrapped agfl?
> > 
> > The allocation could fail, in which case we'd be forced to resort to
> > list manipulation.
> > 
> 
> Yeah.. obvious tradeoff, but how likely is that? It seems a reasonable
> tradeoff to me given there's still the repair path. Also note that we
> don't have to necessarily allocate from the btrees.
> 
> (FWIW, I hacked around on this a bit and ended up with something
> slightly different[1].)
> 
> > > Perhaps that could even be done more discreetly in
> > > xfs_alloc_fix_freelist() when we already have a transaction set up,
> > > etc.
> > 
> > Possible, though this will add more logic to every hot allocation call
> > for something that only needs fixing at mount and at unmount.  I think
> > there's a stronger argument for doing the unmount cleanup on every
> > fix_freelist since that would eliminate the small chance that a new
> > kernel crashes and gets recovered on an old kernel.
> > 
> 
> Eh, I don't think that's anything we couldn't address if need be.  E.g.,
> amortize the padding fixup against actual allocations, set/test a pag
> bit to track the padding fixup, or check it on agf read and set a "needs
> fixup" bit, etc.
> 
> > > That wouldn't help going the other direction, but maybe we could
> > > consider similar logic to free an extraneous block and then worry
> > > about protecting users who won't upgrade broken kernels separately if
> > > that really continues to be a problem.
> > 
> > It is already a problem here.
> > 
> 
> Then it should be able to stand alone as an independent patch (with
> independent commit log that explains/justifies it, etc.).
> 
> Brian
> 
> [1] This is hacky and brute force, but the idea was to try and do
> something that could work in either direction and reuse more existing
> code. It basically just reads the on-disk entries "manually" and
> reinserts them into a reinitialized (empty) agfl. Perhaps it could be
> refactored into something that only rotates a given number of entries or
> something of that nature. Anyways, just an experiment and probably
> broken..

I'll give this a whirl and see where we end up.
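
Both approaches start from the same detection step: count the slots
between flfirst and fllast and compare that with flcount.  A minimal
userspace sketch of that check follows, assuming plain host-endian
integers instead of the on-disk be32 fields; the enum and function
names are made up for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum agfl_state { AGFL_OK, AGFL_OFF_BY_ONE, AGFL_CORRUPT };

/* Count slots from flfirst to fllast inclusive in a ring of agfl_size. */
static uint32_t
agfl_active(uint32_t flfirst, uint32_t fllast, uint32_t agfl_size)
{
	assert(flfirst < agfl_size && fllast < agfl_size);
	if (fllast >= flfirst)
		return fllast - flfirst + 1;
	return agfl_size - flfirst + fllast + 1;
}

/*
 * Compare the pointer distance with flcount.  One slot too many means
 * the list was wrapped by a kernel using the smaller agfl size; any
 * other mismatch is treated as corruption.
 */
static enum agfl_state
agfl_classify(uint32_t flfirst, uint32_t fllast, uint32_t flcount,
	      uint32_t agfl_size)
{
	uint32_t	active;

	/* An empty list has nothing to fix; callers skip it. */
	if (flcount == 0)
		return AGFL_OK;

	active = agfl_active(flfirst, fllast, agfl_size);
	if (active == flcount)
		return AGFL_OK;
	if (active == flcount + 1)
		return AGFL_OFF_BY_ONE;
	return AGFL_CORRUPT;
}
```

For example, four entries written at slots 116, 117, 0 and 1 by a
118-slot kernel look like five active slots to a 119-slot kernel, so
they land in the off-by-one case.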

--D

> int
> xfs_agfl_fixup(
> 	struct xfs_trans	*tp,
> 	struct xfs_buf		*agbp,
> 	struct xfs_perag	*pag)
> {
> 	struct xfs_mount	*mp = tp->t_mountp;
> 	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> 	int			agfl_size = XFS_AGFL_SIZE(mp);
> 	int			ofirst, olast, osize;
> 	int			nfirst, nlast;
> 	struct xfs_buf		*agflbp;
> 	__be32			*agfl_bno;
> 	int			active;
> 	xfs_agblock_t		bno;
> 	bool			wasempty = false;
> 
> 	if (pag->pagf_flcount == 0)
> 		wasempty = true;
> 
> 	ofirst = be32_to_cpu(agf->agf_flfirst);
> 	olast = be32_to_cpu(agf->agf_fllast);
> 	if (olast >= ofirst)
> 		active = olast - ofirst + 1;
> 	else
> 		active = agfl_size - ofirst + olast + 1;
> 
> 	if (active == pag->pagf_flcount + 1)
> 		osize = agfl_size - 1;
> 	else if ((active == pag->pagf_flcount - 1) ||
> 		 ofirst == agfl_size || olast == agfl_size)
> 		osize = agfl_size + 1;
> 	else if (active == pag->pagf_flcount)
> 		osize = agfl_size;
> 	else
> 		return -EFSCORRUPTED;
> 
> 	xfs_alloc_read_agfl(mp, tp, be32_to_cpu(agf->agf_seqno), &agflbp);
> 	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> 
> 	nlast = do_mod(olast, agfl_size);
> 	nfirst = nlast + 1;
> 	nfirst = do_mod(nfirst, agfl_size);
> 	ASSERT(nfirst != ofirst);
> 
> 	agf->agf_flfirst = cpu_to_be32(nfirst);
> 	agf->agf_fllast = cpu_to_be32(nlast);
> 	agf->agf_flcount = 0;
> 	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
> 				    XFS_AGF_FLCOUNT);
> 	pag->pagf_flcount = 0;
> 
> 	if (wasempty)
> 		goto out;
> 
> 	while (true) {
> 		ofirst = do_mod(ofirst, osize);
> 		bno = be32_to_cpu(agfl_bno[ofirst]);
> 		xfs_alloc_put_freelist(tp, agbp, agflbp, bno, 0);
> 		if (ofirst == olast)
> 			break;
> 		ofirst++;
> 	}
> 
> out:
> 	xfs_trans_brelse(tp, agflbp);
> 	return 0;
> }
> 


* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-02-27 19:35   ` Brian Foster
  2018-02-27 21:03     ` Darrick J. Wong
@ 2018-03-01  6:37     ` Darrick J. Wong
  1 sibling, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-03-01  6:37 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, djwong

On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/Makefile     |    1 
> >  fs/xfs/xfs_fixups.c |  310 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_fixups.h |   26 ++++
> >  fs/xfs/xfs_mount.c  |   21 +++
> >  fs/xfs/xfs_super.c  |   10 ++
> >  5 files changed, 367 insertions(+), 1 deletion(-)
> >  create mode 100644 fs/xfs/xfs_fixups.c
> >  create mode 100644 fs/xfs/xfs_fixups.h
> > 
> > 
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index b03c77e..f88368a 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
> >  				   xfs_extent_busy.o \
> >  				   xfs_file.o \
> >  				   xfs_filestream.o \
> > +				   xfs_fixups.o \
> >  				   xfs_fsmap.o \
> >  				   xfs_fsops.o \
> >  				   xfs_globals.o \
> > diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> > new file mode 100644
> > index 0000000..0cad7bb
> > --- /dev/null
> > +++ b/fs/xfs/xfs_fixups.c
> > @@ -0,0 +1,310 @@
> > +/*
> > + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_sb.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_alloc.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_fixups.h"
> > +
> > +/*
> > + * v5 AGFL padding defects
> > + *
> > + * When the v5 format was first introduced, there was a defect in the struct
> > + * xfs_agfl definition that resulted in XFS_AGFL_SIZE returning different
> 
> XFS_AGFL_SIZE() no longer exists as of the previous patch. It might be
> better to refer to "the agfl size" generically.
> 
> > + * values depending on the compiler padding.  On a fs with 512-byte sectors,
> > + * this meant that XFS_AGFL_SIZE was 119 on i386, but 118 on x64.  Commit
> > + * 96f859d52bcb1 ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is
> > + * correct") changed the definition to disable padding the end of the
> > + * structure, and was accepted into Linux 4.5.  Since then, the AGFL has
> > + * always used the larger size (e.g. 119 entries on a 512b sector fs).
> > + *
> > + * Unfortunately, pre-4.5 kernels can produce filesystems with AGFLs that wrap
> > + * at the smaller size, and those kernels are not prepared to handle the
> > + * longer size.  This typically manifests itself as an AGF verifier corruption
> > + * error followed by a filesystem shutdown.  While we encourage admins to stay
> > + * current with software, we would like to avoid this intermittent breakage.
> > + *
> > + * Any v5 filesystem which has a feature bit set for a feature that was
> > + * introduced after Linux 4.5 will not have this problem, as such filesystems
> > + * cannot be mounted on older kernels.  v4 filesystems are also unaffected.
> > + *
> > + * Therefore, we add two fixup functions -- the first runs at mount time to
> > + * detect a short-wrapped AGFL and fix it; the second runs at unmount, freeze,
> > + * or remount-ro time to move a wrapped AGFL to the beginning of the list.
> > + * This reduces the likelihood of a screwup to the scenario where you have (a)
> > + * a filesystem with no post-4.5 features (reflink, rmap), (b) the AGFL wraps,
> > + * (c) the filesystem goes down leaving a dirty log, and (d) the dirty
> > + * filesystem is mounted on an old kernel.
> > + */
> > +
> 
> While the mount vs. unmount time fixups serve different purposes and
> have different conditions, do we really need two separate fixup
> implementations? E.g., if we have to unwrap the AGFL at unmount time,
> why not just reuse that fixup at mount time (when needed) as well? I
> suspect the difference would only be the length of what we can consider
> valid in the agfl at the time.

TBH I rewrote the whole thing (again...) so that we move all the list
items to the front of the agfl block no matter what problem we detect.
I also added more ascii art to describe the two other things that the
unmount fixer has to detect (list hits the end of the block at unmount)
and (empty list has pointers touching the end of the block).  It seems
to have passed the quick run so I'll send it out right after this.
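
To illustrate the "move everything to the front" idea, here is a
userspace sketch only, not the code I'm about to send.  It works on a
plain host-endian array with no buffers, transactions or logging, and
the struct, function name and AGFL_MAX_SLOTS bound are assumptions
made for the example.  ring_size is whatever size the list was wrapped
at when it was written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AGFL_MAX_SLOTS	1024	/* hypothetical bound for the scratch copy */

/* Host-endian copy of the three AGF freelist fields. */
struct agfl_ring {
	uint32_t	flfirst;
	uint32_t	fllast;
	uint32_t	flcount;
};

/*
 * Rewrite bno[] so the live entries sit in bno[0..flcount - 1] in list
 * order, then reset the pointers.  Returns 0 on success, -1 if the
 * pointers and count do not describe a ring of ring_size slots.
 */
static int
agfl_move_to_front(uint32_t *bno, struct agfl_ring *r, uint32_t ring_size)
{
	uint32_t	tmp[AGFL_MAX_SLOTS];
	uint32_t	i, idx;

	if (ring_size == 0 || ring_size > AGFL_MAX_SLOTS ||
	    r->flcount > ring_size)
		return -1;

	/* Empty list: park the pointers away from the end of the block. */
	if (r->flcount == 0) {
		r->flfirst = 1;
		r->fllast = 0;
		return 0;
	}

	/* Walking flcount slots from flfirst must end at fllast. */
	if (r->flfirst >= ring_size ||
	    (r->flfirst + r->flcount - 1) % ring_size != r->fllast)
		return -1;

	/* Copy out in list order first so overlapping ranges are safe. */
	idx = r->flfirst;
	for (i = 0; i < r->flcount; i++) {
		tmp[i] = bno[idx];
		idx = (idx + 1) % ring_size;
	}
	memcpy(bno, tmp, r->flcount * sizeof(*bno));

	r->flfirst = 0;
	r->fllast = r->flcount - 1;
	return 0;
}
```

With this shape the list order is preserved, and a list that no longer
touches the end of the block reads the same at either agfl size.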

I think my email receiving is broken.

--D

> 
> > +/*
> > + * Decide if we need to have the agfl wrapping fixes applied.  This only
> > + * affects v5 filesystems that do not have any features enabled that did not
> > + * exist when the agfl padding fix went in.
> > + *
> > + * Features already present when the fix went in were finobt, ftype, spinodes.
> > + * If we see something new (e.g. reflink) then don't bother.
> > + */
> > +#define XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED \
> > +		(~(XFS_SB_FEAT_RO_COMPAT_FINOBT))
> > +#define XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED \
> > +		(~(XFS_SB_FEAT_INCOMPAT_FTYPE | \
> > +		   XFS_SB_FEAT_INCOMPAT_SPINODES))
> > +#define XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED \
> > +		(~0)
> > +static inline bool xfs_sb_version_needs_agfl_wrap_fixes(struct xfs_sb *sbp)
> > +{
> > +	return xfs_sb_version_hascrc(sbp) &&
> > +		!xfs_sb_has_incompat_feature(sbp,
> > +			XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> > +		!xfs_sb_has_ro_compat_feature(sbp,
> > +			XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> > +		!xfs_sb_has_incompat_log_feature(sbp,
> > +			XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED);
> > +}
> > +
> > +/*
> > + * Fix an AGFL wrapping that falls short of the end of the block by filling the
> > + * gap at the end of the block.
> > + */
> > +STATIC int
> > +xfs_fixup_freelist_wrap_mount(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_buf		*agfbp,
> > +	struct xfs_perag	*pag)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_agf		*agf;
> > +	struct xfs_buf		*agflbp;
> > +	__be32			*agfl_bno;
> > +	xfs_agnumber_t		agno;
> > +	uint32_t		agfl_size;
> > +	uint32_t		flfirst;
> > +	uint32_t		fllast;
> > +	int32_t			active;
> > +	int			offset;
> > +	int			len;
> > +	int			error;
> > +
> > +	if (pag->pagf_flcount == 0)
> > +		return 0;
> > +
> > +	agfl_size = xfs_agfl_size(mp);
> > +	agf = XFS_BUF_TO_AGF(agfbp);
> > +	agno = be32_to_cpu(agf->agf_seqno);
> > +	flfirst = be32_to_cpu(agf->agf_flfirst);
> > +	fllast = be32_to_cpu(agf->agf_fllast);
> > +
> > +	/* Make sure we're either spot on or off by 1. */
> > +	active = fllast - flfirst + 1;
> > +	if (active <= 0)
> > +		active += agfl_size;
> > +	if (active == pag->pagf_flcount)
> > +		return 0;
> > +	else if (active != pag->pagf_flcount + 1)
> > +		return -EFSCORRUPTED;
> > +
> 
> So we're not attempting to cover the case where the agfl has 1 more
> block than the agfl size (i.e., the case where an fs goes back to a
> kernel with an unpacked header)?
> 
> I'm wondering whether using the unmount algorithm in both places (as
> noted above) would also facilitate this mechanism to work both ways
> (unpacked <-> packed), provided particular conditions are met. I suppose
> that could be considered regardless, but the less variance the better
> IMO.
> 
> > +	/* Would this have even passed muster on an old system? */
> 
> Comment doesn't really explain what is going on here..?
> 
> > +	if (flfirst >= agfl_size - 1 || fllast >= agfl_size - 1 ||
> > +	    pag->pagf_flcount > agfl_size - 1)
> > +		return -EFSCORRUPTED;
> > +
> 
> I take it these are checks for whether the agfl was previously corrupted
> based on the unpacked size? FWIW, it might be a bit more clear to use an
> old/unpacked_agfl_size variable to declare intent (here and in some of
> the comments and whatnot that follow).
> 
> > +	/*
> > +	 * Convert a 40-byte-padded agfl into a 36-byte-padded AGFL.
> > +	 * Therefore, we need to move the AGFL blocks
> > +	 * bno[flfirst..agfl_size - 2] to bno[flfirst + 1...agfl_size - 1].
> > +	 *
> > +	 * Reusing the example above, if we had flfirst == 116, we need
> > +	 * to move bno[116] and bno[117] into bno[117] and bno[118],
> > +	 * respectively, and then increment flfirst.
> > +	 */
> 
> Kind of a strange example to use given that it doesn't mention
> wrapping.. this only triggers if the agfl was previously wrapped, right?
> 
> > +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> > +	if (error)
> > +		return error;
> > +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > +
> > +	len = (agfl_size - flfirst - 1) * sizeof(xfs_agblock_t);
> > +	memmove(&agfl_bno[flfirst + 1], &agfl_bno[flfirst], len);
> > +	offset = (char *)&agfl_bno[flfirst + 1] - (char *)agflbp->b_addr;
> > +	be32_add_cpu(&agf->agf_flfirst, 1);
> > +
> > +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> > +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> > +	xfs_trans_brelse(tp, agflbp);
> > +	agflbp = NULL;
> > +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Fix an AGFL that touches the end of the block by moving the first or last
> > + * part of the list elsewhere in the AGFL so that old kernels don't trip over
> > + * wrapping issues.
> > + */
> > +STATIC int
> > +xfs_fixup_freelist_wrap_unmount(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_buf		*agfbp,
> > +	struct xfs_perag	*pag)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_agf		*agf;
> > +	struct xfs_buf		*agflbp;
> > +	__be32			*agfl_bno;
> > +	xfs_agnumber_t		agno;
> > +	uint32_t		agfl_size;
> > +	uint32_t		flfirst;
> > +	uint32_t		fllast;
> > +	int			offset;
> > +	int			len;
> > +	int			error;
> > +
> > +	agfl_size = xfs_agfl_size(mp);
> > +	agf = XFS_BUF_TO_AGF(agfbp);
> > +	agno = be32_to_cpu(agf->agf_seqno);
> > +	flfirst = be32_to_cpu(agf->agf_flfirst);
> > +	fllast = be32_to_cpu(agf->agf_fllast);
> > +
> > +	/* Empty AGFL?  Make sure we aren't pointing at the end. */
> > +	if (pag->pagf_flcount == 0) {
> > +		if (flfirst >= agfl_size || fllast >= agfl_size) {
> 
> When would either field be >= agfl_size? Isn't that where they wrap?
> 
> > +			agf->agf_flfirst = cpu_to_be32(1);
> > +			agf->agf_fllast = 0;
> > +			xfs_alloc_log_agf(tp, agfbp,
> > +					XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> > +		}
> > +		return 0;
> > +	}
> > +
> > +	/* If we don't hit the end, we're done. */
> > +	if (flfirst < fllast && fllast != agfl_size - 1)
> > +		return 0;
> > +
> > +	/*
> > +	 * Move a start of a wrapped list towards the start of the agfl block.
> 
> FWIW, this and the subsequent comments kind of gave me the impression
> that the agfl would be "shifted" towards the start of the block. The
> code doesn't do that, but rather rotates the start/head of the agfl to
> the tail and adjusts the pointers (changing the effective order). That
> seems technically Ok, but I had to grok the code and read back to
> understand the comment rather than the other way around.
> 
> > +	 * Therefore, we need to move the AGFL blocks
> > +	 * bno[flfirst..agfl_size - 1] to bno[fllast + 1...agfl_size - flfirst].
> > +	 * Then we reset flfirst and fllast appropriately.
> > +	 *
> > +	 * Reusing the example above, if we had flfirst == 117 and fllast == 4,
> > +	 * we need to move bno[117] and bno[118] into bno[5] and bno[6],
> > +	 * respectively, and then reset flfirst and fllast.
> > +	 *
> > +	 * If it's just the last block that touches the end, only move that.
> > +	 */
> > +	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
> > +	if (error)
> > +		return error;
> > +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > +
> > +	if (fllast == agfl_size - 1) {
> > +		/* Back the AGFL off from the end of the block. */
> > +		len = sizeof(xfs_agblock_t);
> > +		agfl_bno[flfirst - 1] = agfl_bno[agfl_size - 1];
> 
> What if the agfl is full (flfirst == 0)?
> 
> > +		offset = (char *)&agfl_bno[flfirst - 1] - (char *)agflbp->b_addr;
> > +		be32_add_cpu(&agf->agf_fllast, -1);
> > +		be32_add_cpu(&agf->agf_flfirst, -1);
> > +	} else {
> > +		/* Move the first part of the AGFL towards the front. */
> > +		len = (agfl_size - flfirst) * sizeof(xfs_agblock_t);
> > +		memcpy(&agfl_bno[fllast + 1], &agfl_bno[flfirst], len);
> > +		offset = (char *)&agfl_bno[fllast + 1] - (char *)agflbp->b_addr;
> > +		agf->agf_flfirst = 0;
> > +		agf->agf_fllast = cpu_to_be32(pag->pagf_flcount - 1);
> 
> Similar question here as above.. it looks like the copy may be a no-op,
> but we still reset flfirst/fllast (which may also be fine, but
> unnecessary).
> 
> The more interesting case here is flcount == 1 and flfirst == fllast,
> where it looks like we'd wrongly reset first/last (and perhaps copy more
> than we should..?).
> 
> Ugh. Remind me again why we can't just detect a mismatch, fail the mount
> and tell the user to repair? :P Random thought... have we considered
> anything less invasive/more simple for this issue than low level buffer
> manipulation? For example, could we address the unpacked -> packed case
> by just allocating a block to fill the gap at the end of the wrapped
> agfl? Perhaps that could even be done more discreetly in
> xfs_alloc_fix_freelist() when we already have a transaction set up, etc.
> That wouldn't help going the other direction, but maybe we could
> consider similar logic to free an extraneous block and then worry about
> protecting users who won't upgrade broken kernels separately if that
> really continues to be a problem.
> 
> Brian
> 
> > +	}
> > +
> > +	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> > +	xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1);
> > +	xfs_trans_brelse(tp, agflbp);
> > +	agflbp = NULL;
> > +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
> > +
> > +	return 0;
> > +}
> > +
> > +typedef int (*xfs_agf_apply_fn_t)(struct xfs_trans *tp, struct xfs_buf *agfbp,
> > +		struct xfs_perag *pag);
> > +
> > +/* Apply something to every AGF. */
> > +STATIC int
> > +xfs_fixup_agf_apply(
> > +	struct xfs_mount	*mp,
> > +	xfs_agf_apply_fn_t	fn)
> > +{
> > +	struct xfs_trans	*tp;
> > +	struct xfs_perag	*pag;
> > +	struct xfs_buf		*agfbp;
> > +	xfs_agnumber_t		agno;
> > +	int			error;
> > +
> > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, 0, 0, 0, &tp);
> > +	if (error)
> > +		return error;
> > +
> > +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> > +		error = xfs_alloc_read_agf(mp, tp, agno, 0, &agfbp);
> > +		if (error)
> > +			goto cancel;
> > +		if (!agfbp) {
> > +			error = -ENOMEM;
> > +			goto cancel;
> > +		}
> > +		pag = xfs_perag_get(mp, agno);
> > +		error = fn(tp, agfbp, pag);
> > +		xfs_perag_put(pag);
> > +		xfs_trans_brelse(tp, agfbp);
> > +		if (error)
> > +			goto cancel;
> > +	}
> > +
> > +	return xfs_trans_commit(tp);
> > +cancel:
> > +	xfs_trans_cancel(tp);
> > +	return error;
> > +}
> > +
> > +/* Fix AGFL wrapping so we can use the filesystem. */
> > +int
> > +xfs_fixup_agfl_wrap_mount(
> > +	struct xfs_mount	*mp)
> > +{
> > +	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
> > +		return 0;
> > +
> > +	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_mount);
> > +}
> > +
> > +/* Fix AGFL wrapping so old kernels can use this filesystem. */
> > +int
> > +xfs_fixup_agfl_wrap_unmount(
> > +	struct xfs_mount	*mp)
> > +{
> > +	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
> > +		return 0;
> > +
> > +	return xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_unmount);
> > +}
> > diff --git a/fs/xfs/xfs_fixups.h b/fs/xfs/xfs_fixups.h
> > new file mode 100644
> > index 0000000..fb52a96
> > --- /dev/null
> > +++ b/fs/xfs/xfs_fixups.h
> > @@ -0,0 +1,26 @@
> > +/*
> > + * Copyright (C) 2018 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#ifndef	__XFS_FIXUPS_H__
> > +#define	__XFS_FIXUPS_H__
> > +
> > +int xfs_fixup_agfl_wrap_mount(struct xfs_mount *mp);
> > +int xfs_fixup_agfl_wrap_unmount(struct xfs_mount *mp);
> > +
> > +#endif /* __XFS_FIXUPS_H__ */
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index 98fd41c..eb284aa 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -46,7 +46,7 @@
> >  #include "xfs_refcount_btree.h"
> >  #include "xfs_reflink.h"
> >  #include "xfs_extent_busy.h"
> > -
> > +#include "xfs_fixups.h"
> >  
> >  static DEFINE_MUTEX(xfs_uuid_table_mutex);
> >  static int xfs_uuid_table_size;
> > @@ -875,6 +875,16 @@ xfs_mountfs(
> >  	}
> >  
> >  	/*
> > +	 * Make sure our AGFL counters do not wrap the end of the block
> > +	 * in a troublesome manner.
> > +	 */
> > +	error = xfs_fixup_agfl_wrap_mount(mp);
> > +	if (error) {
> > +		xfs_warn(mp, "Failed to fix agfl wrapping.  Run xfs_repair.");
> > +		goto out_log_dealloc;
> > +	}
> > +
> > +	/*
> >  	 * Get and sanity-check the root inode.
> >  	 * Save the pointer to it in the mount structure.
> >  	 */
> > @@ -1128,6 +1138,15 @@ xfs_unmountfs(
> >  	xfs_qm_unmount(mp);
> >  
> >  	/*
> > +	 * Make sure our AGFL counters do not wrap the end of the block
> > +	 * in a troublesome manner for old kernels.
> > +	 */
> > +	error = xfs_fixup_agfl_wrap_unmount(mp);
> > +	if (error)
> > +		xfs_warn(mp, "Unable to fix agfl wrapping.  "
> > +				"This may cause problems on next mount.");
> > +
> > +	/*
> >  	 * Unreserve any blocks we have so that when we unmount we don't account
> >  	 * the reserved free space as used. This is really only necessary for
> >  	 * lazy superblock counting because it trusts the incore superblock
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 624a802..d9aa39a 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -50,6 +50,7 @@
> >  #include "xfs_refcount_item.h"
> >  #include "xfs_bmap_item.h"
> >  #include "xfs_reflink.h"
> > +#include "xfs_fixups.h"
> >  
> >  #include <linux/namei.h>
> >  #include <linux/dax.h>
> > @@ -1206,6 +1207,15 @@ xfs_quiesce_attr(
> >  	xfs_reclaim_inodes(mp, 0);
> >  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> >  
> > +	/*
> > +	 * Make sure our AGFL counters do not wrap the end of the block
> > +	 * in a troublesome manner for old kernels.
> > +	 */
> > +	error = xfs_fixup_agfl_wrap_unmount(mp);
> > +	if (error)
> > +		xfs_warn(mp, "Unable to fix agfl wrapping.  "
> > +				"This may cause problems on next mount.");
> > +
> >  	/* Push the superblock and write an unmount record */
> >  	error = xfs_log_sbcount(mp);
> >  	if (error)
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 5/5] xfs: fix agfl wrapping
  2018-02-23  2:00 ` [PATCH 5/5] xfs: fix agfl wrapping Darrick J. Wong
                     ` (2 preceding siblings ...)
  2018-02-27 19:35   ` Brian Foster
@ 2018-03-01  6:42   ` Darrick J. Wong
  3 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-03-01  6:42 UTC (permalink / raw)
  To: linux-xfs; +Cc: Brian Foster

From: Darrick J. Wong <darrick.wong@oracle.com>

Fix the AGFL wrapping issues introduced with the v5 disk format and
semi-corrected later in 4.5.  In short, if we find an incorrectly
wrapped agfl at mount time, or a list that touches the last item in the
agfl block, move the entire list to the start of the agfl block to avoid
triggering wrapping problems if the next mount is on a kernel that does
not have the 4.5 wrapping fix.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
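
Note for readers (below the "---", so not part of the commit message):
what follows is a minimal userspace sketch of the "move the entire list
to the start of the agfl block" step, under stated assumptions.  It is
not the kernel code in this patch.  agfl_move_to_start() and wrap_size
are hypothetical names; wrap_size is assumed to be the slot count at
which the on-disk list actually wraps (one less than the current agfl
size in the mount-time case, the full size otherwise).

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustration only: bno[] stands in for the AGFL block number array and
 * tmp[] for a scratch buffer with room for at least flcount entries.
 * The caller must ensure flcount <= wrap_size and *flfirst < wrap_size.
 */
static void
agfl_move_to_start(
	uint32_t	*bno,
	uint32_t	*tmp,
	uint32_t	wrap_size,
	uint32_t	*flfirst,
	uint32_t	*fllast,
	uint32_t	flcount)
{
	uint32_t	i;

	/* Empty list: park the pointers at the start, as the patch does. */
	if (flcount == 0) {
		*flfirst = 1;
		*fllast = 0;
		return;
	}

	/* Gather the live entries in list order, wrapping at wrap_size. */
	for (i = 0; i < flcount; i++)
		tmp[i] = bno[(*flfirst + i) % wrap_size];

	/* Write them back starting at slot 0 and reset the pointers. */
	memcpy(bno, tmp, flcount * sizeof(*bno));
	*flfirst = 0;
	*fllast = flcount - 1;
}
```

With the Scenario A layout from the comment in xfs_fixups.c (ten slots,
wrap_size 9, flfirst 6, flcount 6) this leaves A through F in slots 0-5
with flfirst 0 and fllast 5.
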
 fs/xfs/Makefile     |    1 
 fs/xfs/xfs_fixups.c |  492 +++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_fixups.h |   26 +++
 fs/xfs/xfs_mount.c  |   21 ++
 fs/xfs/xfs_super.c  |   10 +
 fs/xfs/xfs_trace.h  |   25 +++
 6 files changed, 574 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/xfs_fixups.c
 create mode 100644 fs/xfs/xfs_fixups.h

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index b03c77e..f88368a 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
 				   xfs_extent_busy.o \
 				   xfs_file.o \
 				   xfs_filestream.o \
+				   xfs_fixups.o \
 				   xfs_fsmap.o \
 				   xfs_fsops.o \
 				   xfs_globals.o \
diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
new file mode 100644
index 0000000..05b7193
--- /dev/null
+++ b/fs/xfs/xfs_fixups.c
@@ -0,0 +1,492 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_alloc.h"
+#include "xfs_trans.h"
+#include "xfs_fixups.h"
+#include "xfs_trace.h"
+
+/*
+ * v5 AGFL padding defects
+ *
+ * When the v5 format was first introduced, there was a defect in the struct
+ * xfs_agfl definition that resulted in the agfl length calculation formula
+ * returning different values depending on the compiler padding.  On a fs with
+ * 512-byte sectors, this meant that XFS_AGFL_SIZE was 119 on i386, but 118 on
+ * x64.  Commit 96f859d52bcb1 ("libxfs: pack the agfl header structure so
+ * XFS_AGFL_SIZE is correct") changed the definition to disable padding the
+ * end of the structure, and was accepted into Linux 4.5.  Since then, the
+ * AGFL has always used the larger size (e.g. 119 entries on a 512b sector
+ * fs).
+ *
+ * Unfortunately, pre-4.5 kernels can produce filesystems with AGFLs that wrap
+ * at the smaller size, and those kernels are not prepared to handle the
+ * longer size.  This typically manifests itself as an AGF verifier corruption
+ * error followed by a filesystem shutdown.  While we encourage admins to stay
+ * current with software, we would like to avoid this intermittent breakage.
+ *
+ * Any v5 filesystem which has a feature bit set for a feature that was
+ * introduced after Linux 4.5 will not have this problem, as such filesystems
+ * cannot be mounted on older kernels.  v4 filesystems are also unaffected.
+ *
+ * Therefore, we add two fixup functions -- the first runs at mount time to
+ * detect a short-wrapped AGFL and fix it; the second runs at unmount, freeze,
+ * or remount-ro time to move a wrapped AGFL to the beginning of the list.
+ * This reduces the likelihood of a screwup to the scenario where you have (a)
+ * a filesystem with no post-4.5 features (reflink, rmap), (b) the AGFL wraps,
+ * (c) the filesystem goes down leaving a dirty log, and (d) the dirty
+ * filesystem is mounted on an old kernel.
+ *
+ * We prefer to fix this in the kernel at mount/unmount time instead of simply
+ * detecting the problem at mount time and ordering the administrator to run
+ * xfs_repair because not all the distros include xfs_repair in the initrd or
+ * the logic to call it (as opposed to "fsck -y") if the root fs fails to
+ * mount.  This means that if this wrapping problem hits a root filesystem
+ * then the administrator will have to reboot with a recovery iso and run
+ * xfs_repair from there.  That's going to generate a /lot/ of support work
+ * compared to us figuring out how to repair the filesystems automatically.
+ *
+ * To explain graphically exactly what this fixup does, let's pretend the AGFL
+ * block has space for 10 items.  It looks like this:
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header |   |   |   |   |   |   |   |   |   |   |
+ *
+ * When the fs is freshly formatted or repaired, the AGFL will look like:
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header | A | B | C | D | E | F |   |   |   |   |
+ *          ^-- flfirst         ^-- fllast
+ *
+ * Scenario A: List Wraps Badly at Mount Time
+ *
+ * Due to the padding problems prior to 4.5 ("4.5 agfl padding fix"),
+ * XFS_AGFL_SIZE would return 10 on 32-bit systems and 9 on 64-bit systems.
+ * Therefore, if the AGFL wrapped on a 64-bit kernel we would end up with:
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header | D | E | F |   |   |   | A | B | C |   |
+ *                  ^-- fllast      ^-- flfirst
+ *
+ * Note that block 9's contents are undefined because we didn't write anything
+ * there.  Because the "4.5 agfl padding fix" corrected XFS_AGFL_SIZE to
+ * return 10 in all cases, this looks like an AGF corruption because flcount
+ * is 6 but the distance between flfirst and fllast is 7.  So long as the list
+ * wraps and the mismatch is exactly one block we can fix this by moving the
+ * whole list to the start of the block:
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header | A | B | C | D | E | F |   |   |   |   |
+ *          ^-- flfirst         ^-- fllast
+ *
+ * If the count is off by more than 1 then the AGF is truly corrupt and we
+ * bail out.
+ *
+ * Scenario B: List Wraps Incompatibly at Unmount Time
+ *
+ * Now let's say that we run for a while and want to unmount, but our AGFL
+ * wraps like this:
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header | T | U | V |   |   |   |   | Q | R | S |
+ *                  ^-- fllast          ^-- flfirst
+ *
+ * We don't know that the next kernel to mount this filesystem will have the
+ * "4.5 agfl padding fix" applied to it; if that kernel is a 64-bit kernel
+ * without the padding fix applied, it will flag the AGF as corrupted because
+ * flcount is 6 but in its view the distance between flfirst and fllast (which
+ * omits bno[9]) is 5.  We don't want unpatched kernels to choke on that, so
+ * the unmount fixer moves the whole list to the start of the block:
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header | Q | R | S | T | U | V |   |   |   |   |
+ *          ^-- flfirst         ^-- fllast
+ *
+ * Since the AGFL no longer wraps at all, it doesn't matter if the next
+ * kernel to mount this filesystem has the "4.5 agfl padding fix" applied;
+ * all kernels can handle this correctly.
+ *
+ * Scenario C: List Hits the End at Unmount Time
+ *
+ * Now let's say that we run for a while and want to unmount, but our AGFL
+ * happens to stop right at the last block:
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header |   |   |   |   | Q | R | S | T | U | V |
+ *                          ^-- flfirst         ^-- fllast
+ *
+ * This has the same problem as Scenario B, and the solution is the same:
+ * Move the whole list to the start of the block.
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header | Q | R | S | T | U | V |   |   |   |   |
+ *          ^-- flfirst         ^-- fllast
+ *
+ * Scenario D: Empty List Touches the End at Unmount Time
+ *
+ * Let's say that the list is empty but the pointers touch the end:
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header |   |   |   |   |   |   |   |   |   |   |
+ *          ^ flfirst                           ^-- fllast
+ *
+ * Or:
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header |   |   |   |   |   |   |   |   |   |   |
+ *                                 fllast --^   ^-- flfirst
+ *
+ * Then we simply move the pointers to the start of the block:
+ *
+ *          0   1   2   3   4   5   6   7   8   9
+ * Header |   |   |   |   |   |   |   |   |   |   |
+ * fllast --^   ^-- flfirst
+ *
+ * As a side note, if flcount == 0 (which is only the case for AGs that are
+ * either entirely out of space or have been freshly created by growfs), we
+ * simply reset the pointers to the start of the block.
+ */
+
+/*
+ * Decide if we need to have the agfl wrapping fixes applied.  This only
+ * affects v5 filesystems that do not have any features enabled that did not
+ * exist when the agfl padding fix went in.
+ *
+ * Features already present when the fix went in were finobt, ftype, spinodes.
+ * If we see something new (e.g. reflink) then don't bother.
+ */
+#define XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED \
+		(~(XFS_SB_FEAT_RO_COMPAT_FINOBT))
+#define XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED \
+		(~(XFS_SB_FEAT_INCOMPAT_FTYPE | \
+		   XFS_SB_FEAT_INCOMPAT_SPINODES))
+#define XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED \
+		(~0)
+static inline bool xfs_sb_version_needs_agfl_wrap_fixes(struct xfs_sb *sbp)
+{
+	return xfs_sb_version_hascrc(sbp) &&
+		!xfs_sb_has_incompat_feature(sbp,
+			XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED) &&
+		!xfs_sb_has_ro_compat_feature(sbp,
+			XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED) &&
+		!xfs_sb_has_incompat_log_feature(sbp,
+			XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED);
+}
+
+/* Copy the entire AGFL list into the memory buffer. */
+STATIC void
+xfs_fixup_freelist_copyin(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agfbp,
+	struct xfs_perag	*pag,
+	struct xfs_buf		*agflbp,
+	__be32			*bnolist,
+	unsigned int		agfl_size)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agfbp);
+	__be32			*agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
+	unsigned int		flfirst;
+	unsigned int		fllast;
+
+	ASSERT(pag->pagf_flcount < agfl_size);
+
+	flfirst = be32_to_cpu(agf->agf_flfirst);
+	fllast = be32_to_cpu(agf->agf_fllast);
+
+	/* Non wrapping list, easy. */
+	if (flfirst < fllast) {
+		memcpy(bnolist, &agfl_bno[flfirst],
+				pag->pagf_flcount * sizeof(__be32));
+		return;
+	}
+
+	/* Copy the first part: 0 to fllast */
+	memcpy(bnolist, agfl_bno, (fllast + 1) * sizeof(__be32));
+
+	/* Copy the second part: flfirst to agfl_size */
+	memcpy(&bnolist[fllast + 1], &agfl_bno[flfirst],
+			(agfl_size - flfirst) * sizeof(__be32));
+}
+
+/* Copy the list into memory and move it to the head. */
+STATIC int
+xfs_fixup_freelist_reset(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	struct xfs_buf		*agfbp,
+	struct xfs_perag	*pag,
+	__be32			*bnolist,
+	unsigned int		agfl_size)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agfbp);
+	__be32			*agfl_bno;
+	struct xfs_buf		*agflbp;
+	unsigned int		copylen;
+	int			error;
+
+	trace_xfs_fixup_freelist_reset(mp, agno, be32_to_cpu(agf->agf_flfirst),
+			be32_to_cpu(agf->agf_fllast), pag->pagf_flcount);
+
+	/* Empty list, just move it to the start. */
+	if (pag->pagf_flcount == 0) {
+		agf->agf_flfirst = cpu_to_be32(1);
+		agf->agf_fllast = 0;
+		xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
+		return 0;
+	}
+
+	error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp);
+	if (error)
+		return error;
+
+	/*
+	 * Load the agfl into the buffer and write it back at the start
+	 * of the block.
+	 */
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
+	copylen = pag->pagf_flcount * sizeof(__be32);
+	xfs_fixup_freelist_copyin(tp, agfbp, pag, agflbp, bnolist,
+			agfl_size);
+	memcpy(agfl_bno, bnolist, copylen);
+	xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
+	xfs_trans_log_buf(tp, agflbp, 0, copylen);
+	xfs_trans_brelse(tp, agflbp);
+
+	agf->agf_flfirst = 0;
+	agf->agf_fllast = cpu_to_be32(pag->pagf_flcount - 1);
+	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST);
+
+	return 0;
+}
+
+/* Detect and fix Scenario A. */
+STATIC int
+xfs_fixup_freelist_wrap_mount(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agfbp,
+	struct xfs_perag	*pag,
+	void			*priv)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf;
+	xfs_agnumber_t		agno;
+	unsigned int		agfl_size;
+	unsigned int		bad_agfl_size;
+	unsigned int		flfirst;
+	unsigned int		fllast;
+	unsigned int		active;
+
+	agfl_size = xfs_agfl_size(mp);
+	bad_agfl_size = agfl_size - 1;
+	agf = XFS_BUF_TO_AGF(agfbp);
+	agno = be32_to_cpu(agf->agf_seqno);
+	flfirst = be32_to_cpu(agf->agf_flfirst);
+	fllast = be32_to_cpu(agf->agf_fllast);
+
+	/* List doesn't wrap or is empty, no correction required. */
+	if (flfirst < fllast || pag->pagf_flcount == 0)
+		return 0;
+
+	/* Compare computed active item count to flcount. */
+	active = (fllast + 1) + (agfl_size - flfirst);
+	if (active == pag->pagf_flcount) {
+		/* Exactly the right size, we're done. */
+		return 0;
+	} else if (active != pag->pagf_flcount + 1) {
+		/*
+		 * If the distance between flfirst and fllast is greater than
+		 * flcount by more than 1, then there's more wrong with this
+		 * agfl than just the padding problem.  Bail out completely,
+		 * which will force the admin to run xfs_repair.
+		 */
+		return -EFSCORRUPTED;
+	}
+
+	/*
+	 * Bail out if either pointer goes off the end of the block, the
+	 * flcount is insane, or the agfl wasn't wrapped to start with.
+	 */
+	if (flfirst >= agfl_size - 1 || fllast >= agfl_size - 1 ||
+	    pag->pagf_flcount > agfl_size)
+		return -EFSCORRUPTED;
+
+	return xfs_fixup_freelist_reset(tp, agno, agfbp, pag, priv,
+			bad_agfl_size);
+}
+
+/* Detect and fix Scenarios B through D. */
+STATIC int
+xfs_fixup_freelist_wrap_unmount(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agfbp,
+	struct xfs_perag	*pag,
+	void			*priv)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf;
+	xfs_agnumber_t		agno;
+	uint32_t		agfl_size;
+	uint32_t		flfirst;
+	uint32_t		fllast;
+	int			error;
+
+	agfl_size = xfs_agfl_size(mp);
+	agf = XFS_BUF_TO_AGF(agfbp);
+	agno = be32_to_cpu(agf->agf_seqno);
+	flfirst = be32_to_cpu(agf->agf_flfirst);
+	fllast = be32_to_cpu(agf->agf_fllast);
+
+	/* Scenario D: empty list, but pointer touches the end. */
+	if (pag->pagf_flcount == 0) {
+		if (flfirst == agfl_size - 1 || fllast == agfl_size - 1)
+			return xfs_fixup_freelist_reset(tp, agno, agfbp, pag,
+					priv, agfl_size);
+		return 0;
+	}
+
+	/*
+	 * If the AGFL is completely full, call xfs_alloc_fix_freelist() to
+	 * free up some space in the list.
+	 */
+	if (pag->pagf_flcount == agfl_size) {
+		struct xfs_alloc_arg		args = {
+			.mp = mp,
+			.agno = agno,
+			.alignment = 1,
+			.pag = pag,
+			.tp = tp,
+		};
+
+		error = xfs_alloc_fix_freelist(&args, 0);
+		if (error)
+			return error;
+
+		/* The list should not still be full! */
+		if (pag->pagf_flcount == agfl_size) {
+			ASSERT(0);
+			return -EIO;
+		}
+	}
+
+	/*
+	 * If the list doesn't wrap and doesn't touch the last element, we're
+	 * done here.
+	 */
+	if (flfirst < fllast && fllast < agfl_size - 1)
+		return 0;
+
+	/*
+	 * Either the list wraps around the end (Scenario B) or the last
+	 * element touches the end (Scenario C).
+	 */
+	return xfs_fixup_freelist_reset(tp, agno, agfbp, pag, priv, agfl_size);
+}
+
+typedef int (*xfs_agf_apply_fn_t)(struct xfs_trans *tp, struct xfs_buf *agfbp,
+		struct xfs_perag *pag, void *priv);
+
+/* Apply something to every AGF. */
+STATIC int
+xfs_fixup_agf_apply(
+	struct xfs_mount	*mp,
+	xfs_agf_apply_fn_t	fn,
+	void			*priv)
+{
+	struct xfs_trans	*tp;
+	struct xfs_perag	*pag;
+	struct xfs_buf		*agfbp;
+	xfs_agnumber_t		agno;
+	int			error;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, 0, 0,
+			XFS_TRANS_NO_WRITECOUNT, &tp);
+	if (error)
+		return error;
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		error = xfs_alloc_read_agf(mp, tp, agno, 0, &agfbp);
+		if (error)
+			goto cancel;
+		if (!agfbp) {
+			error = -ENOMEM;
+			goto cancel;
+		}
+		pag = xfs_perag_get(mp, agno);
+		error = fn(tp, agfbp, pag, priv);
+		xfs_perag_put(pag);
+		xfs_trans_brelse(tp, agfbp);
+		if (error)
+			goto cancel;
+	}
+
+	return xfs_trans_commit(tp);
+cancel:
+	xfs_trans_cancel(tp);
+	return error;
+}
+
+/* Fix AGFL wrapping so we can use the filesystem. */
+int
+xfs_fixup_agfl_wrap_mount(
+	struct xfs_mount	*mp)
+{
+	__be32			*buf;
+	int			error;
+
+	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
+		return 0;
+
+	buf = kmem_alloc(xfs_agfl_size(mp) * sizeof(__be32), KM_SLEEP);
+	if (!buf)
+		return -ENOMEM;
+
+	error = xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_mount, buf);
+	kmem_free(buf);
+	return error;
+}
+
+/* Fix AGFL wrapping so old kernels can use this filesystem. */
+int
+xfs_fixup_agfl_wrap_unmount(
+	struct xfs_mount	*mp)
+{
+	__be32			*buf;
+	int			error;
+
+	if (!xfs_sb_version_needs_agfl_wrap_fixes(&mp->m_sb))
+		return 0;
+
+	buf = kmem_alloc(xfs_agfl_size(mp) * sizeof(__be32), KM_SLEEP);
+	if (!buf)
+		return -ENOMEM;
+
+	error = xfs_fixup_agf_apply(mp, xfs_fixup_freelist_wrap_unmount, buf);
+	kmem_free(buf);
+	return error;
+}
diff --git a/fs/xfs/xfs_fixups.h b/fs/xfs/xfs_fixups.h
new file mode 100644
index 0000000..fb52a96
--- /dev/null
+++ b/fs/xfs/xfs_fixups.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef	__XFS_FIXUPS_H__
+#define	__XFS_FIXUPS_H__
+
+int xfs_fixup_agfl_wrap_mount(struct xfs_mount *mp);
+int xfs_fixup_agfl_wrap_unmount(struct xfs_mount *mp);
+
+#endif /* __XFS_FIXUPS_H__ */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 98fd41c..eb284aa 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -46,7 +46,7 @@
 #include "xfs_refcount_btree.h"
 #include "xfs_reflink.h"
 #include "xfs_extent_busy.h"
-
+#include "xfs_fixups.h"
 
 static DEFINE_MUTEX(xfs_uuid_table_mutex);
 static int xfs_uuid_table_size;
@@ -875,6 +875,16 @@ xfs_mountfs(
 	}
 
 	/*
+	 * Make sure our AGFL counters do not wrap the end of the block
+	 * in a troublesome manner.
+	 */
+	error = xfs_fixup_agfl_wrap_mount(mp);
+	if (error) {
+		xfs_warn(mp, "Failed to fix agfl wrapping.  Run xfs_repair.");
+		goto out_log_dealloc;
+	}
+
+	/*
 	 * Get and sanity-check the root inode.
 	 * Save the pointer to it in the mount structure.
 	 */
@@ -1128,6 +1138,15 @@ xfs_unmountfs(
 	xfs_qm_unmount(mp);
 
 	/*
+	 * Make sure our AGFL counters do not wrap the end of the block
+	 * in a troublesome manner for old kernels.
+	 */
+	error = xfs_fixup_agfl_wrap_unmount(mp);
+	if (error)
+		xfs_warn(mp, "Unable to fix agfl wrapping.  "
+				"This may cause problems on next mount.");
+
+	/*
 	 * Unreserve any blocks we have so that when we unmount we don't account
 	 * the reserved free space as used. This is really only necessary for
 	 * lazy superblock counting because it trusts the incore superblock
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 624a802..d9aa39a 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -50,6 +50,7 @@
 #include "xfs_refcount_item.h"
 #include "xfs_bmap_item.h"
 #include "xfs_reflink.h"
+#include "xfs_fixups.h"
 
 #include <linux/namei.h>
 #include <linux/dax.h>
@@ -1206,6 +1207,15 @@ xfs_quiesce_attr(
 	xfs_reclaim_inodes(mp, 0);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
 
+	/*
+	 * Make sure our AGFL counters do not wrap the end of the block
+	 * in a troublesome manner for old kernels.
+	 */
+	error = xfs_fixup_agfl_wrap_unmount(mp);
+	if (error)
+		xfs_warn(mp, "Unable to fix agfl wrapping.  "
+				"This may cause problems on next mount.");
+
 	/* Push the superblock and write an unmount record */
 	error = xfs_log_sbcount(mp);
 	if (error)
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 945de08..d83d079 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3339,6 +3339,31 @@ TRACE_EVENT(xfs_trans_resv_calc,
 		  __entry->logflags)
 );
 
+TRACE_EVENT(xfs_fixup_freelist_reset,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 unsigned int flfirst, unsigned int fllast,
+		 unsigned int flcount),
+	TP_ARGS(mp, agno, flfirst, fllast, flcount),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(unsigned int, flfirst)
+		__field(unsigned int, fllast)
+		__field(unsigned int, flcount)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->flfirst = flfirst;
+		__entry->fllast = fllast;
+		__entry->flcount = flcount;
+	),
+	TP_printk("dev %d:%d agno %u flfirst %u fllast %u flcount %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno, __entry->flfirst, __entry->fllast,
+		  __entry->flcount)
+);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-02-28 23:20         ` Darrick J. Wong
@ 2018-03-01 17:28           ` Brian Foster
  2018-03-01 20:55             ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Foster @ 2018-03-01 17:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Feb 28, 2018 at 03:20:32PM -0800, Darrick J. Wong wrote:
> On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
...
> > > > > diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> > > > > new file mode 100644
> > > > > index 0000000..0cad7bb
> > > > > --- /dev/null
> > > > > +++ b/fs/xfs/xfs_fixups.c
> > > > > @@ -0,0 +1,310 @@
...
> > > Now let's say that we run for a while and want to unmount, but our AGFL
> > > wraps like this:
> > > 
> > >          0   1   2   3   4   5   6   7   8   9
> > > Header | T | U | V |   |   |   |   | Q | R | S |
> > >                  ^-- fllast          ^-- flfirst
> > > 
> > > We don't know that the next kernel to mount this filesystem will have
> > > the "4.5 agfl padding fix" applied to it; if it does not, it will flag
> > > the AGF as corrupted because flcount is 6 but in its view the distance
> > > between flfirst and fllast (which omits bno[9]) is 5.  We don't want
> > > it to choke on that, so the unmount fixer moves all the records between
> > > flfirst and the end (Q, R, and S) towards the start of the block and
> > > resets flfirst/fllast:
> > > 
> > >          0   1   2   3   4   5   6   7   8   9
> > > Header | T | U | V | Q | R | S |   |   |   |   |
> > >          ^-- flfirst         ^-- fllast
> > > 
> > 
> > Could we use this basic algorithm at mount time as well? I know it
> > wouldn't be _exactly_ the same operation at mount time as it is for
> > unmount since slot 9 is a gap in the former case, but afaict the only
> > difference from an algorithmic perspective is the length of the shift.
> > 
> > IOW, if we were to parameterize the length of the shift and have the
> > mount fixup call the unmount algorithm, would it not also address the
> > problem?
> 
> Yes, I believe that would work.  I think it would be more efficient in
> the patch below to memmove the entries instead of put_freelist'ing them
> individually.
> 
> If the agfl is completely full then fix_freelist ought to trim it down
> by at least one element.  The algorithm then becomes:
> 
> if mounting and flfirst < fllast:
> 	return 0
> if flcount == agfl_size:
> 	assert !mounting
> 	fix_freelist()

Not sure I follow where the fix_freelist() came in, but I think we need
to be careful here. What if flfirst == 118 and that slot is garbage?
Don't we need to fix up the agfl before we can allow any traditional
operations to proceed?

> 	assert flcount < agfl_size
> if flfirst < fllast:
> 	return 0
> movelen = agfl_size - flfirst
> if active == flcount - 1:
> 	movelen--
> memmove(&agflbno[fllast + 1], &agflbno[flfirst], movelen)
> flfirst = 0
> fllast = fllast + movelen
> 
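
To make the quoted pseudocode concrete, here is a rough userspace sketch
of the memmove variant, parameterized on the length of the shift as
suggested above. Assumptions, all mine: agfl_unwrap(), struct agfl_ptrs
and the gap argument are made-up names for illustration; gap is 1 when
the last slot was never written (the mount-time case) and 0 otherwise;
flfirst == fllast is treated as a one-entry list that does not wrap; and
the consistency check is stricter than what the pseudocode spells out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the free list fields of the AGF. */
struct agfl_ptrs {
	uint32_t	flfirst;
	uint32_t	fllast;
	uint32_t	flcount;
};

/*
 * Slide the tail segment [flfirst, agfl_size - gap) down so that it sits
 * directly after the head segment [0, fllast].  Returns 0 on success or
 * when there is nothing to do, -1 if the pointers and count disagree.
 */
static int
agfl_unwrap(
	uint32_t		*bno,
	uint32_t		agfl_size,
	uint32_t		gap,
	struct agfl_ptrs	*p)
{
	uint32_t		movelen;

	/* Empty or non-wrapping list: leave it alone. */
	if (p->flcount == 0 || p->flfirst <= p->fllast)
		return 0;
	if (gap > 1 || p->flfirst >= agfl_size)
		return -1;

	movelen = agfl_size - gap - p->flfirst;
	if (p->fllast + 1 + movelen != p->flcount)
		return -1;

	/* Regions can overlap when the list is nearly full, so memmove. */
	memmove(&bno[p->fllast + 1], &bno[p->flfirst],
			movelen * sizeof(*bno));
	p->flfirst = 0;
	p->fllast += movelen;
	return 0;
}
```

Note that, like the T U V Q R S diagram quoted above, this does not
preserve the original list order; it only makes the list contiguous
from slot 0.
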
...
> > > > 
> > > > So we're not attempting to cover the case where the agfl has 1 more
> > > > block than the agfl size (i.e., the case where an fs goes back to a
> > > > kernel with an unpacked header)?
> > > 
> > > We don't know how the next kernel to touch this filesystem will define
> > > XFS_AGFL_SIZE -- it could be a 4.5+ kernel (same agfl size), a 32-bit
> > > pre-4.5 kernel (same agfl size), or a 64-bit pre-4.5 kernel (small agfl
> > > size).
> > > 
> > 
> > I don't think I was clear.. I'm envisioning whether we could come up
> > with a patch that would generically fix up the agfl on disk to be sane
> > relative to the current kernel. This patch covers the case of a packed
> > agfl kernel mounting an unpacked on-disk agfl. It would be nice if we
> > could implement something that also handled a packed on-disk agfl to an
> > unpacked agfl kernel (for easier backport to unpacked kernels, for
> > e.g.).
> 
> If we're going to touch an old kernel's source at all I'd rather we
> backport both the packing fix and this fixer-upper.
> 

Not sure I parse... the "old kernel" is essentially the RHEL example
where we apparently have deliberately maintained the unpacked format to
avoid this incompatibility problem. If we had a patch that generically
converted the on-disk format (packed or unpacked) to match the current
kernel (packed or unpacked), we could merge that patch into upstream,
stable, and distro kernels that might not include the agfl packing fix,
and eliminate compatibility problems between them (even if the packing
fix comes in out of order).

Otherwise, we need a separate unpacked -> packed fixup for packed
kernels (i.e., this patch) and a packed -> unpacked fixup for unpacked
kernels and to make sure they are used in the right places. Trying to
see if we could avoid this kind of dependency matrix was one of the
objectives around the hack I posted previously. I'm not married to that
particular implementation, but I'm much less concerned about
inefficiency (even if it dictates a mount time fixup over a runtime one)
in comparison to simplicity and flexibility. Perhaps we can accomplish
something similarly flexible via direct buffer manipulation..?

FWIW, I've appended another variant of the previous hack that is less
brute force, but I think is still able to convert back and forth. The
tradeoff is essentially that it no longer uses the same generic
algorithm.

Brian

--- 8< ---

static int
xfs_agfl_ondisk_size(
	struct xfs_mount	*mp,
	int			first,
	int			last,
	int			flcount)
{
	int			active;
	int			size;
	int			agfl_size = XFS_AGFL_SIZE(mp);

	if (last >= first)
		active = last - first + 1;
	else
		active = agfl_size - first + last + 1;

	if (active == flcount + 1)
		size = agfl_size - 1;
	else if ((active == flcount - 1) ||
		 first == agfl_size || last == agfl_size)
		size = agfl_size + 1;
	else if (active == flcount)
		size = agfl_size;
	else
		return -EFSCORRUPTED;

	return size;
}

int
xfs_agfl_fixup(
	struct xfs_trans	*tp,
	struct xfs_buf		*agbp,
	struct xfs_perag	*pag)
{
	struct xfs_mount	*mp = tp->t_mountp;
	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
	int			agfl_size = XFS_AGFL_SIZE(mp);
	int			ofirst, olast, osize;
	int			nfirst, nlast;
	struct xfs_buf		*agflbp;
	__be32			*agfl_bno;
	xfs_agblock_t		bno = -1;
	int			tidx = -1;
	bool			empty = false;
	int			logflags = 0;
	int			error;

	ofirst = nfirst = be32_to_cpu(agf->agf_flfirst);
	olast = nlast = be32_to_cpu(agf->agf_fllast);
	osize = xfs_agfl_ondisk_size(mp, ofirst, olast, pag->pagf_flcount);
	if (osize < 0)
		return osize;
	if (pag->pagf_flcount == 0)
		empty = true;

	/* sizes match, nothing to do */
	if (osize == agfl_size)
		return 0;

	/* size mismatch, read the agfl.. */
	error = xfs_alloc_read_agfl(mp, tp, be32_to_cpu(agf->agf_seqno),
				    &agflbp);
	if (error)
		return error;
	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);

	/*
	 * If the on-disk agfl is smaller than what the kernel expects, the last
	 * slot of the on-disk agfl is a gap with bogus data. Allocate the first
	 * valid block from the agfl, manually place it in the gap and fix up
	 * the count.
	 */
	if (osize < agfl_size) {
		ASSERT(!empty);
		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
		if (error)
			goto out_relse;

		pag->pagf_flcount++;
		be32_add_cpu(&agf->agf_flcount, 1);
		logflags |= XFS_AGF_FLCOUNT;
		tidx = agfl_size - 1;
		goto done;
	}

	/*
	 * Otherwise, the on-disk agfl is larger than what the current kernel
	 * can manage. If the agfl was empty, we just fix up the first and last
	 * pointers. If not, move the inaccessible block in the last slot to the
	 * next valid, open slot.
	 */
	nfirst = do_mod(nfirst, agfl_size);
	if (empty) {
		nlast = (nfirst == 0 ? agfl_size - 1 : nfirst - 1);
		goto done;
	}
	if (nlast != agfl_size)
		nlast++;
	nlast = do_mod(nlast, agfl_size);
	tidx = nlast;
	bno = be32_to_cpu(agfl_bno[osize - 1]);

done:
	if (nfirst != ofirst) {
		agf->agf_flfirst = cpu_to_be32(nfirst);
		logflags |= XFS_AGF_FLFIRST;
	}
	if (nlast != olast) {
		agf->agf_fllast = cpu_to_be32(nlast);
		logflags |= XFS_AGF_FLLAST;
	}
	if (bno != -1) {
		int	startoff;

		agfl_bno[tidx] = cpu_to_be32(bno);
		xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
		startoff = (char *) &agfl_bno[tidx] - (char *) agflbp->b_addr;
		xfs_trans_log_buf(tp, agflbp, startoff,
				  startoff + sizeof(xfs_agblock_t) - 1);
	}
	if (logflags)
		xfs_alloc_log_agf(tp, agbp, logflags);

out_relse:
	xfs_trans_brelse(tp, agflbp);
	return error;
}


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-03-01 17:28           ` Brian Foster
@ 2018-03-01 20:55             ` Darrick J. Wong
  2018-03-02 13:12               ` Brian Foster
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-03-01 20:55 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, djwong

On Thu, Mar 01, 2018 at 12:28:33PM -0500, Brian Foster wrote:
> On Wed, Feb 28, 2018 at 03:20:32PM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> > > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> ...
> > > > > > diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> > > > > > new file mode 100644
> > > > > > index 0000000..0cad7bb
> > > > > > --- /dev/null
> > > > > > +++ b/fs/xfs/xfs_fixups.c
> > > > > > @@ -0,0 +1,310 @@
> ...
> > > > Now let's say that we run for a while and want to unmount, but our AGFL
> > > > wraps like this:
> > > > 
> > > >          0   1   2   3   4   5   6   7   8   9
> > > > Header | T | U | V |   |   |   |   | Q | R | S |
> > > >                  ^-- fllast          ^-- flfirst
> > > > 
> > > > We don't know that the next kernel to mount this filesystem will have
> > > > the "4.5 agfl padding fix" applied to it; if it does not, it will flag
> > > > the AGF as corrupted because flcount is 6 but in its view the distance
> > > > between flfirst and fllast (which omits bno[9]) is 5.  We don't want
> > > > it to choke on that, so the unmount fixer moves all the records between
> > > > flfirst and the end (Q, R, and S) towards the start of the block and
> > > > resets flfirst/fllast:
> > > > 
> > > >          0   1   2   3   4   5   6   7   8   9
> > > > Header | T | U | V | Q | R | S |   |   |   |   |
> > > >          ^-- flfirst         ^-- fllast
> > > > 
> > > 
> > > Could we use this basic algorithm at mount time as well? I know it
> > > wouldn't be _exactly_ the same operation at mount time as it is for
> > > unmount since slot 9 is a gap in the former case, but afaict the only
> > > difference from an algorithmic perspective is the length of the shift.
> > > 
> > > IOW, if we were to parameterize the length of the shift and have the
> > > mount fixup call the unmount algorithm, would it not also address the
> > > problem?
> > 
> > Yes, I believe that would work.  I think it would be more efficient in
> > the patch below to memmove the entries instead of put_freelist'ing them
> > individually.
> > 
> > If the agfl is completely full then fix_freelist ought to trim it down
> > by at least one element.  The algorithm then becomes:
> > 
> > if mounting and flfirst < fllast:
> > 	return 0
> > if flcount == agfl_size:
> > 	assert !mounting
> > 	fix_freelist()
> 
> Not sure I follow where the fix_freelist() came in, but I think we need
> to be careful here. What if flfirst == 118 and that slot is garbage?
> Don't we need to fix up the agfl before we can allow any traditional
> operations to proceed?

This got confusing, so you might notice in the v2 patch that I separated
this out...

> > 	assert flcount < agfl_size
> > if flfirst < fllast:
> > 	return 0
> > movelen = agfl_size - flfirst
> > if active == flcount - 1:
> > 	movelen--
> > memmove(&agflbno[fllast + 1], &agflbno[flfirst], movelen)
> > flfirst = 0
> > fllast = fllast + movelen
> > 
> ...
> > > > > 
> > > > > So we're not attempting to cover the case where the agfl has 1 more
> > > > > block than the agfl size (i.e., the case where an fs goes back to a
> > > > > kernel with an unpacked header)?
> > > > 
> > > > We don't know how the next kernel to touch this filesystem will define
> > > > XFS_AGFL_SIZE -- it could be a 4.5+ kernel (same agfl size), a 32-bit
> > > > pre-4.5 kernel (same agfl size), or a 64-bit pre-4.5 kernel (small agfl
> > > > size).
> > > > 
> > > 
> > > I don't think I was clear.. I'm envisioning whether we could come up
> > > with a patch that would generically fix up the agfl on disk to be sane
> > > relative to the current kernel. This patch covers the case of a packed
> > > agfl kernel mounting an unpacked on-disk agfl. It would be nice if we
> > > could implement something that also handled a packed on-disk agfl to an
> > > unpacked agfl kernel (for easier backport to unpacked kernels, for
> > > e.g.).
> > 
> > If we're going to touch an old kernel's source at all I'd rather we
> > backport both the packing fix and this fixer-upper.
> > 
> 
> Not sure I parse... the "old kernel" is essentially the rhel example
> where we apparently have deliberately maintained the unpacked format to
> avoid this incompatibility problem. If we had a patch that generically
> converted on-disk format (packed or unpacked) to the current kernel
> (packed or unpacked), we could merge that patch to upstream, stable as
> well as distro kernels that might not include the agfl packing fix and
> eliminate compatibility problems between them (even if the packing fix
> comes in out of order).

...I wonder if we've gotten to the 'talking past each other' point, but
let me try one more time.

Going forward, I want the number of unpacked kernels to decrease as
quickly as possible.  I understand that distro kernel maintainers are
not willing to apply the packing patch to their kernel until we come up
with a smooth transition path.

I don't want to support fixing agfls to be 118 units long on 64-bit
unpacked kernels and 119 units long on 32-bit unpacked kernels, and I
only want to support the packed kernels with their 119 unit long agfls.
An AGFL that starts at 0 and ends at flcount-1 is compatible with packed
and unpacked kernels, so the v2 patch I sent last night removes the
delicate per-case surgery in favor of a new strategy where the mount
time and unmount time helpers both look for agfl configurations that
are known to cause problems, and solve them all with the same solution:
moving the agfl list towards the start of the block.

The fixup patch requires that the kernel already include the packing
fix.  Its goals are:

a) At mount time, ensure that all AGFLs which are not compatible with a
packed kernel are made compatible with a packed kernel.

b) At unmount time, ensure all AGFLs are compatible with packed kernels
and unpacked kernels.

Therefore, the first step for a kernel maintainer is to ensure that the
packing fix is applied (which it will be for any 4.5+ kernel).  The
second step is to apply this fixer patch.

--D

> Otherwise, we need a separate unpacked -> packed fixup for packed
> kernels (i.e., this patch) and a packed -> unpacked fixup for unpacked
> kernels and to make sure they are used in the right places. Trying to
> see if we could avoid this kind of dependency matrix was one of the
> objectives around the hack I posted previously. I'm not married to that
> particular implementation, but I'm much less concerned about
> inefficiency (even if it dictates a mount time fixup over a runtime one)
> in comparison to simplicity and flexibility. Perhaps we can accomplish
> something similarly flexible via direct buffer manipulation..?
> 
> FWIW, I've appended another variant of the previous hack that is less
> brute force, but I think is still able to convert back and forth. The
> tradeoff is essentially that it no longer uses the same generic
> algorithm.
> 
> Brian
> 
> --- 8< ---
> 
> static int
> xfs_agfl_ondisk_size(
> 	struct xfs_mount	*mp,
> 	int			first,
> 	int			last,
> 	int			flcount)
> {
> 	int			active;
> 	int			size;
> 	int			agfl_size = XFS_AGFL_SIZE(mp);
> 
> 	if (last >= first)
> 		active = last - first + 1;
> 	else
> 		active = agfl_size - first + last + 1;
> 
> 	if (active == flcount + 1)
> 		size = agfl_size - 1;
> 	else if ((active == flcount - 1) ||
> 		 first == agfl_size || last == agfl_size)
> 		size = agfl_size + 1;
> 	else if (active == flcount)
> 		size = agfl_size;
> 	else
> 		return -EFSCORRUPTED;
> 
> 	return size;
> }
> 
> int
> xfs_agfl_fixup(
> 	struct xfs_trans	*tp,
> 	struct xfs_buf		*agbp,
> 	struct xfs_perag	*pag)
> {
> 	struct xfs_mount	*mp = tp->t_mountp;
> 	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> 	int			agfl_size = XFS_AGFL_SIZE(mp);
> 	int			ofirst, olast, osize;
> 	int			nfirst, nlast;
> 	struct xfs_buf		*agflbp;
> 	__be32			*agfl_bno;
> 	xfs_agblock_t		bno = -1;
> 	int			tidx = -1;
> 	bool			empty = false;
> 	int			logflags = 0;
> 	int			error;
> 
> 	ofirst = nfirst = be32_to_cpu(agf->agf_flfirst);
> 	olast = nlast = be32_to_cpu(agf->agf_fllast);
> 	osize = xfs_agfl_ondisk_size(mp, ofirst, olast, pag->pagf_flcount);
> 	if (osize < 0)
> 		return osize;
> 	if (pag->pagf_flcount == 0)
> 		empty = true;
> 
> 	/* sizes match, nothing to do */
> 	if (osize == agfl_size)
> 		return 0;
> 
> 	/* size mismatch, read the agfl.. */
> 	error = xfs_alloc_read_agfl(mp, tp, be32_to_cpu(agf->agf_seqno),
> 				    &agflbp);
> 	if (error)
> 		return error;
> 	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> 
> 	/*
> 	 * If the on-disk agfl is smaller than what the kernel expects, the last
> 	 * slot of the on-disk agfl is a gap with bogus data. Allocate the first
> 	 * valid block from the agfl, manually place it in the gap and fix up
> 	 * the count.
> 	 */
> 	if (osize < agfl_size) {
> 		ASSERT(!empty);
> 		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
> 		if (error)
> 			goto out_relse;
> 
> 		pag->pagf_flcount++;
> 		be32_add_cpu(&agf->agf_flcount, 1);
> 		logflags |= XFS_AGF_FLCOUNT;
> 		tidx = agfl_size - 1;
> 		goto done;
> 	}
> 
> 	/*
> 	 * Otherwise, the on-disk agfl is larger than what the current kernel
> 	 * can manage. If the agfl was empty, we just fix up the first and last
> 	 * pointers. If not, move the inaccessible block in the last slot to the
> 	 * next valid, open slot.
> 	 */
> 	nfirst = do_mod(nfirst, agfl_size);
> 	if (empty) {
> 		nlast = (nfirst == 0 ? agfl_size - 1 : nfirst - 1);
> 		goto done;
> 	}
> 	if (nlast != agfl_size)
> 		nlast++;
> 	nlast = do_mod(nlast, agfl_size);
> 	tidx = nlast;
> 	bno = be32_to_cpu(agfl_bno[osize - 1]);
> 
> done:
> 	if (nfirst != ofirst) {
> 		agf->agf_flfirst = cpu_to_be32(nfirst);
> 		logflags |= XFS_AGF_FLFIRST;
> 	}
> 	if (nlast != olast) {
> 		agf->agf_fllast = cpu_to_be32(nlast);
> 		logflags |= XFS_AGF_FLLAST;
> 	}
> 	if (bno != -1) {
> 		int	startoff;
> 
> 		agfl_bno[tidx] = cpu_to_be32(bno);
> 		xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> 		startoff = (char *) &agfl_bno[tidx] - (char *) agflbp->b_addr;
> 		xfs_trans_log_buf(tp, agflbp, startoff,
> 				  startoff + sizeof(xfs_agblock_t) - 1);
> 	}
> 	if (logflags)
> 		xfs_alloc_log_agf(tp, agbp, logflags);
> 
> out_relse:
> 	xfs_trans_brelse(tp, agflbp);
> 	return error;
> }
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-03-01 20:55             ` Darrick J. Wong
@ 2018-03-02 13:12               ` Brian Foster
  2018-03-03 13:59                 ` Brian Foster
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Foster @ 2018-03-02 13:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, djwong

On Thu, Mar 01, 2018 at 12:55:41PM -0800, Darrick J. Wong wrote:
> On Thu, Mar 01, 2018 at 12:28:33PM -0500, Brian Foster wrote:
> > On Wed, Feb 28, 2018 at 03:20:32PM -0800, Darrick J. Wong wrote:
> > > On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> > > > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > > > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > ...
> > > > > > > diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c
> > > > > > > new file mode 100644
> > > > > > > index 0000000..0cad7bb
> > > > > > > --- /dev/null
> > > > > > > +++ b/fs/xfs/xfs_fixups.c
> > > > > > > @@ -0,0 +1,310 @@
> > ...
> > > > > Now let's say that we run for a while and want to unmount, but our AGFL
> > > > > wraps like this:
> > > > > 
> > > > >          0   1   2   3   4   5   6   7   8   9
> > > > > Header | T | U | V |   |   |   |   | Q | R | S |
> > > > >                  ^-- fllast          ^-- flfirst
> > > > > 
> > > > > We don't know that the next kernel to mount this filesystem will have
> > > > > the "4.5 agfl padding fix" applied to it; if it does not, it will flag
> > > > > the AGF as corrupted because flcount is 6 but in its view the distance
> > > > > between flfirst and fllast (which omits bno[9]) is 5.  We don't want
> > > > > it to choke on that, so the unmount fixer moves all the records between
> > > > > flfirst and the end (Q, R, and S) towards the start of the block and
> > > > > resets flfirst/fllast:
> > > > > 
> > > > >          0   1   2   3   4   5   6   7   8   9
> > > > > Header | T | U | V | Q | R | S |   |   |   |   |
> > > > >          ^-- flfirst         ^-- fllast
> > > > > 
> > > > 
> > > > Could we use this basic algorithm at mount time as well? I know it
> > > > wouldn't be _exactly_ the same operation at mount time as it is for
> > > > unmount since slot 9 is a gap in the former case, but afaict the only
> > > > difference from an algorithmic perspective is the length of the shift.
> > > > 
> > > > IOW, if we were to parameterize the length of the shift and have the
> > > > mount fixup call the unmount algorithm, would it not also address the
> > > > problem?
> > > 
> > > Yes, I believe that would work.  I think it would be more efficient in
> > > the patch below to memmove the entries instead of put_freelist'ing them
> > > individually.
> > > 
> > > If the agfl is completely full then fix_freelist ought to trim it down
> > > by at least one element.  The algorithm then becomes:
> > > 
> > > if mounting and flfirst < fllast:
> > > 	return 0
> > > if flcount == agfl_size:
> > > 	assert !mounting
> > > 	fix_freelist()
> > 
> > Not sure I follow where the fix_freelist() came in, but I think we need
> > to be careful here. What if flfirst == 118 and that slot is garbage?
> > Don't we need to fix up the agfl before we can allow any traditional
> > operations to proceed?
> 
> This got confusing, so you might notice in the v2 patch that I separated
> this out...
> 
> > > 	assert flcount < agfl_size
> > > if flfirst < fllast:
> > > 	return 0
> > > movelen = agfl_size - flfirst
> > > if active == flcount - 1:
> > > 	movelen--
> > > memmove(&agflbno[fllast + 1], &agflbno[flfirst], movelen)
> > > flfirst = 0
> > > fllast = fllast + movelen
> > > 
> > ...
> > > > > > 
> > > > > > So we're not attempting to cover the case where the agfl has 1 more
> > > > > > block than the agfl size (i.e., the case where an fs goes back to a
> > > > > > kernel with an unpacked header)?
> > > > > 
> > > > > We don't know how the next kernel to touch this filesystem will define
> > > > > XFS_AGFL_SIZE -- it could be a 4.5+ kernel (same agfl size), a 32-bit
> > > > > pre-4.5 kernel (same agfl size), or a 64-bit pre-4.5 kernel (small agfl
> > > > > size).
> > > > > 
> > > > 
> > > > I don't think I was clear.. I'm envisioning whether we could come up
> > > > with a patch that would generically fix up the agfl on disk to be sane
> > > > relative to the current kernel. This patch covers the case of a packed
> > > > agfl kernel mounting an unpacked on-disk agfl. It would be nice if we
> > > > could implement something that also handled a packed on-disk agfl to an
> > > > unpacked agfl kernel (for easier backport to unpacked kernels, for
> > > > e.g.).
> > > 
> > > If we're going to touch an old kernel's source at all I'd rather we
> > > backport both the packing fix and this fixer-upper.
> > > 
> > 
> > Not sure I parse... the "old kernel" is essentially the rhel example
> > where we apparently have deliberately maintained the unpacked format to
> > avoid this incompatibility problem. If we had a patch that generically
> > converted on-disk format (packed or unpacked) to the current kernel
> > (packed or unpacked), we could merge that patch to upstream, stable as
> > well as distro kernels that might not include the agfl packing fix and
> > eliminate compatibility problems between them (even if the packing fix
> > comes in out of order).
> 
> ...I wonder if we've gotten to the 'talking past each other' point, but
> let me try one more time.
> 
> Going forward, I want the number of unpacked kernels to decrease as
> quickly as possible.  I understand that distro kernel maintainers are
> not willing to apply the packing patch to their kernel until we come up
> with a smooth transition path.
> 

I agree wrt upstream, but note that I don't anticipate including the
padding fix downstream any time soon.

> I don't want to support fixing agfls to be 118 units long on 64-bit
> unpacked kernels and 119 units long on 32-bit unpacked kernels, and I
> only want to support the packed kernels with their 119 unit long agfls.
> An AGFL that starts at 0 and ends at flcount-1 is compatible with packed
> and unpacked kernels, so the v2 patch I sent last night removes the
> delicate per-case surgery in favor of a new strategy where the mount
> time and unmount time helpers both look for agfl configurations that
> are known to cause problems, and solve them all with the same solution:
> moving the agfl list towards the start of the block.
> 

The purpose of the patch I sent was not for upstream unpacked support
going forward. Upstream has clearly moved forward with the packed
format. The goal of the patch was to explore a single/generic patch that
could be merged upstream/downstream and handle compatibility cleanly.

IOW, I considered what I thought would be necessary to handle
compatibility downstream and attempted to incorporate that with an
upstream fix because then the patch is straightforward and safe to
backport (upstream acceptance mostly justifies inclusion downstream).
Then it is possible to create a set of downstream systems, going
forward, that are explicitly 100% upstream compatible.

So upstream is technically fine without a packed -> unpacked variant;
suggesting otherwise wasn't really the intent. Perhaps a BUILD_BUG_ON() or
some such that asserts a packed agfl would be effective to prevent blind
backporting to unpacked upstream kernels (and rhel could just drop that
bit), and it would have documented the requirement more clearly.

> The fixup patch requires that the kernel already include the packing
> fix.  Its goals are:
> 
> a) At mount time, ensure that all AGFLs which are not compatible with a
> packed kernel are made compatible with a packed kernel.
> 
> b) At unmount time, ensure all AGFLs are compatible with packed kernels
> and unpacked kernels.
> 
> Therefore, the first step for a kernel maintainer is to ensure that the
> packing fix is applied (which it will be for any 4.5+ kernel).  The
> second step is to apply this fixer patch.
> 

I'm pretty clear on what the patch does at this point. ;)

I agree on goal A in principle. It is generally sane for newer kernels
to expect weird/broken formats that could be caused by older kernels and
rectify them. There is a convenience value for users upgrading from
broken kernels. As noted above, the purpose of my patch was to try and
further genericize the implementation to make it useful for downstream
(and perhaps more simple/less code).

I don't see the value in B. The approach is not 100% effective so it
doesn't allow those who have to support unpacked kernels to support any
use case that they couldn't before. It seems to be motivated by the
existence of essentially stale[1] stable kernels, but we may have
already released more[2] kernels since v4.5 that would never get the
unmount fix than we had unpacked kernel releases in the first place.
Those newer stale kernels can continue to break the older stale kernels
regardless of the existence of B.

So in other words, the only way the B variant of problems truly
disappears is for those stale/unpacked kernel users to upgrade. As such,
B mostly seems like a placebo, requires 2-3x more code than is necessary
for A (particularly if we consider an on-demand fixup to eliminate the
need for a mount time scan) and as a result I'm having a hard time
justifying doing anything at all here so long as these fixes remain
artificially squashed together. At the end of the day it's not my
decision but that's the only remaining feedback I can offer on the v2
patch (having skimmed it).

Brian

[1] Unless I'm mistaken, anybody still on a stale/stable unpacked
upstream kernel should have a minor release update readily available
that includes the padding fix.

[2] AFAICT, the same logic around the stale unpacked kernels applies to
future stable non-unmount-fix-capable kernels were we to take this
approach.

> --D
> 
> > Otherwise, we need a separate unpacked -> packed fixup for packed
> > kernels (i.e., this patch) and a packed -> unpacked fixup for unpacked
> > kernels and to make sure they are used in the right places. Trying to
> > see if we could avoid this kind of dependency matrix was one of the
> > objectives around the hack I posted previously. I'm not married to that
> > particular implementation, but I'm much less concerned about
> > inefficiency (even if it dictates a mount time fixup over a runtime one)
> > in comparison to simplicity and flexibility. Perhaps we can accomplish
> > something similarly flexible via direct buffer manipulation..?
> > 
> > FWIW, I've appended another variant of the previous hack that is less
> > brute force, but I think is still able to convert back and forth. The
> > tradeoff is essentially that it no longer uses the same generic
> > algorithm.
> > 
> > Brian
> > 
> > --- 8< ---
> > 
> > static int
> > xfs_agfl_ondisk_size(
> > 	struct xfs_mount	*mp,
> > 	int			first,
> > 	int			last,
> > 	int			flcount)
> > {
> > 	int			active;
> > 	int			size;
> > 	int			agfl_size = XFS_AGFL_SIZE(mp);
> > 
> > 	if (last >= first)
> > 		active = last - first + 1;
> > 	else
> > 		active = agfl_size - first + last + 1;
> > 
> > 	if (active == flcount + 1)
> > 		size = agfl_size - 1;
> > 	else if ((active == flcount - 1) ||
> > 		 first == agfl_size || last == agfl_size)
> > 		size = agfl_size + 1;
> > 	else if (active == flcount)
> > 		size = agfl_size;
> > 	else
> > 		return -EFSCORRUPTED;
> > 
> > 	return size;
> > }
> > 
> > int
> > xfs_agfl_fixup(
> > 	struct xfs_trans	*tp,
> > 	struct xfs_buf		*agbp,
> > 	struct xfs_perag	*pag)
> > {
> > 	struct xfs_mount	*mp = tp->t_mountp;
> > 	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> > 	int			agfl_size = XFS_AGFL_SIZE(mp);
> > 	int			ofirst, olast, osize;
> > 	int			nfirst, nlast;
> > 	struct xfs_buf		*agflbp;
> > 	__be32			*agfl_bno;
> > 	xfs_agblock_t		bno = -1;
> > 	int			tidx = -1;
> > 	bool			empty = false;
> > 	int			logflags = 0;
> > 	int			error;
> > 
> > 	ofirst = nfirst = be32_to_cpu(agf->agf_flfirst);
> > 	olast = nlast = be32_to_cpu(agf->agf_fllast);
> > 	osize = xfs_agfl_ondisk_size(mp, ofirst, olast, pag->pagf_flcount);
> > 	if (osize < 0)
> > 		return osize;
> > 	if (pag->pagf_flcount == 0)
> > 		empty = true;
> > 
> > 	/* sizes match, nothing to do */
> > 	if (osize == agfl_size)
> > 		return 0;
> > 
> > 	/* size mismatch, read the agfl.. */
> > 	error = xfs_alloc_read_agfl(mp, tp, be32_to_cpu(agf->agf_seqno),
> > 				    &agflbp);
> > 	if (error)
> > 		return error;
> > 	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > 
> > 	/*
> > 	 * If the on-disk agfl is smaller than what the kernel expects, the last
> > 	 * slot of the on-disk agfl is a gap with bogus data. Allocate the first
> > 	 * valid block from the agfl, manually place it in the gap and fix up
> > 	 * the count.
> > 	 */
> > 	if (osize < agfl_size) {
> > 		ASSERT(!empty);
> > 		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
> > 		if (error)
> > 			goto out_relse;
> > 
> > 		pag->pagf_flcount++;
> > 		be32_add_cpu(&agf->agf_flcount, 1);
> > 		logflags |= XFS_AGF_FLCOUNT;
> > 		tidx = agfl_size - 1;
> > 		goto done;
> > 	}
> > 
> > 	/*
> > 	 * Otherwise, the on-disk agfl is larger than what the current kernel
> > 	 * can manage. If the agfl was empty, we just fix up the first and last
> > 	 * pointers. If not, move the inaccessible block in the last slot to the
> > 	 * next valid, open slot.
> > 	 */
> > 	nfirst = do_mod(nfirst, agfl_size);
> > 	if (empty) {
> > 		nlast = (nfirst == 0 ? agfl_size - 1 : nfirst - 1);
> > 		goto done;
> > 	}
> > 	if (nlast != agfl_size)
> > 		nlast++;
> > 	nlast = do_mod(nlast, agfl_size);
> > 	tidx = nlast;
> > 	bno = be32_to_cpu(agfl_bno[osize - 1]);
> > 
> > done:
> > 	if (nfirst != ofirst) {
> > 		agf->agf_flfirst = cpu_to_be32(nfirst);
> > 		logflags |= XFS_AGF_FLFIRST;
> > 	}
> > 	if (nlast != olast) {
> > 		agf->agf_fllast = cpu_to_be32(nlast);
> > 		logflags |= XFS_AGF_FLLAST;
> > 	}
> > 	if (bno != -1) {
> > 		int	startoff;
> > 
> > 		agfl_bno[tidx] = cpu_to_be32(bno);
> > 		xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> > 		startoff = (char *) &agfl_bno[tidx] - (char *) agflbp->b_addr;
> > 		xfs_trans_log_buf(tp, agflbp, startoff,
> > 				  startoff + sizeof(xfs_agblock_t) - 1);
> > 	}
> > 	if (logflags)
> > 		xfs_alloc_log_agf(tp, agbp, logflags);
> > 
> > out_relse:
> > 	xfs_trans_brelse(tp, agflbp);
> > 	return error;
> > }
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-03-02 13:12               ` Brian Foster
@ 2018-03-03 13:59                 ` Brian Foster
  2018-03-05 22:24                   ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Foster @ 2018-03-03 13:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, djwong

On Fri, Mar 02, 2018 at 08:12:07AM -0500, Brian Foster wrote:
> On Thu, Mar 01, 2018 at 12:55:41PM -0800, Darrick J. Wong wrote:
> > On Thu, Mar 01, 2018 at 12:28:33PM -0500, Brian Foster wrote:
> > > On Wed, Feb 28, 2018 at 03:20:32PM -0800, Darrick J. Wong wrote:
> > > > On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> > > > > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > > > > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > > > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
...
> > Going forward, I want the number of unpacked kernels to decrease as
> > quickly as possible.  I understand that distro kernel maintainers are
> > not willing to apply the packing patch to their kernel until we come up
> > with a smooth transition path.
> > 
> 
> I agree wrt upstream, but note that I don't anticipate including the
> padding fix downstream any time soon.
> 
> > I don't want to support fixing agfls to be 118 units long on 64-bit
> > unpacked kernels and 119 units long on 32-bit unpacked kernels, and I
> > only want to support the packed kernels with their 119 unit long agfls.
> > An AGFL that starts at 0 and ends at flcount-1 is compatible with packed
> > and unpacked kernels, so the v2 patch I sent last night removes the
> > delicate per-case surgery in favor of a new strategy where the mount
> > time and unmount time helpers both look for agfl configurations that
> > are known to cause problems, and solve them all with the same solution:
> > moving the agfl list towards the start of the block.
> > 
> 
> The purpose of the patch I sent was not for upstream unpacked support
> going forward. Upstream has clearly moved forward with the packed
> format. The goal of the patch was to explore a single/generic patch that
> could be merged upstream/downstream and handle compatibility cleanly.
> 

FWIW, here's a new variant of a bidirectional fixup. It's refactored and
polished up a bit into a patch. It basically inspects the agf when first
read for any evidence that the on-disk fields reflect a size mismatch
with the current kernel and sets a flag if so. agfl gets/puts check the
flag and thus the first transaction that attempts to modify a mismatched
agfl swaps a block into or out of the gap slot appropriately.

This avoids the need for any new transactions or mount time scan and the
(downstream motivated) packed -> unpacked case is only 10 or so lines of
additional code. Only spot tested, but I _think_ it covers all of the
cases. Hm?

Brian

--- 8< ---

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index c02781a4c091..1cbf80d8481f 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2053,6 +2053,115 @@ xfs_alloc_space_available(
 	return true;
 }
 
+static int
+xfs_agfl_ondisk_size(
+	struct xfs_mount	*mp,
+	int			first,
+	int			last,
+	int			count)
+{
+	int			active = count;
+	int			agfl_size = XFS_AGFL_SIZE(mp);
+
+	if (count && last >= first)
+		active = last - first + 1;
+	else if (count)
+		active = agfl_size - first + last + 1;
+
+	if (active == count + 1)
+		return agfl_size - 1;
+	if (active == count - 1 || first == agfl_size || last == agfl_size)
+		return agfl_size + 1;
+
+	ASSERT(active == count);
+	return agfl_size;
+}
+
+static bool
+xfs_agfl_need_padfix(
+	struct xfs_mount	*mp,
+	struct xfs_agf		*agf)
+{
+	int			f = be32_to_cpu(agf->agf_flfirst);
+	int			l = be32_to_cpu(agf->agf_fllast);
+	int			c = be32_to_cpu(agf->agf_flcount);
+
+	return xfs_agfl_ondisk_size(mp, f, l, c) != XFS_AGFL_SIZE(mp);
+}
+
+static int
+xfs_agfl_check_padfix(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	struct xfs_buf		*agflbp,
+	struct xfs_perag	*pag)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	__be32			*agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
+	int			agfl_size = XFS_AGFL_SIZE(mp);
+	int			ofirst, olast, osize;
+	int			nfirst, nlast;
+	int			logflags = 0;
+	int			startoff = 0;
+
+	if (!pag->pagf_needpadfix)
+		return 0;
+
+	ofirst = nfirst = be32_to_cpu(agf->agf_flfirst);
+	olast = nlast = be32_to_cpu(agf->agf_fllast);
+	osize = xfs_agfl_ondisk_size(mp, ofirst, olast, pag->pagf_flcount);
+
+	/*
+	 * If the on-disk agfl is smaller than what the kernel expects, the
+	 * last slot of the on-disk agfl is a gap with bogus data. Move the
+	 * first valid block into the gap and bump the pointer.
+	 */
+	if (osize < agfl_size) {
+		ASSERT(pag->pagf_flcount != 0);
+		agfl_bno[agfl_size - 1] = agfl_bno[ofirst];
+		startoff = (char *) &agfl_bno[agfl_size - 1] - (char *) agflbp->b_addr;
+		nfirst++;
+		goto done;
+	}
+
+	/*
+	 * Otherwise, the on-disk agfl is larger than what the current kernel
+	 * expects. If empty, just fix up the first and last pointers. If not,
+	 * move the inaccessible block to the end of the valid range.
+	 */
+	nfirst = do_mod(nfirst, agfl_size);
+	if (pag->pagf_flcount == 0) {
+		nlast = (nfirst == 0 ? agfl_size - 1 : nfirst - 1);
+		goto done;
+	}
+	if (nlast != agfl_size)
+		nlast++;
+	nlast = do_mod(nlast, agfl_size);
+	agfl_bno[nlast] = agfl_bno[osize - 1];
+	startoff = (char *) &agfl_bno[nlast] - (char *) agflbp->b_addr;
+
+done:
+	if (nfirst != ofirst) {
+		agf->agf_flfirst = cpu_to_be32(nfirst);
+		logflags |= XFS_AGF_FLFIRST;
+	}
+	if (nlast != olast) {
+		agf->agf_fllast = cpu_to_be32(nlast);
+		logflags |= XFS_AGF_FLLAST;
+	}
+	if (startoff) {
+		xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
+		xfs_trans_log_buf(tp, agflbp, startoff,
+				  startoff + sizeof(xfs_agblock_t) - 1);
+	}
+	if (logflags)
+		xfs_alloc_log_agf(tp, agbp, logflags);
+
+	pag->pagf_needpadfix = false;
+	return 0;
+}
+
 /*
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
@@ -2258,6 +2367,12 @@ xfs_alloc_get_freelist(
 	if (error)
 		return error;
 
+	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+	error = xfs_agfl_check_padfix(tp, agbp, agflbp, pag);
+	if (error) {
+		xfs_perag_put(pag);
+		return error;
+	}
 
 	/*
 	 * Get the block number and update the data structures.
@@ -2269,7 +2384,6 @@ xfs_alloc_get_freelist(
 	if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp))
 		agf->agf_flfirst = 0;
 
-	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
 	be32_add_cpu(&agf->agf_flcount, -1);
 	xfs_trans_agflist_delta(tp, -1);
 	pag->pagf_flcount--;
@@ -2376,11 +2490,18 @@ xfs_alloc_put_freelist(
 	if (!agflbp && (error = xfs_alloc_read_agfl(mp, tp,
 			be32_to_cpu(agf->agf_seqno), &agflbp)))
 		return error;
+
+	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
+	error = xfs_agfl_check_padfix(tp, agbp, agflbp, pag);
+	if (error) {
+		xfs_perag_put(pag);
+		return error;
+	}
+
 	be32_add_cpu(&agf->agf_fllast, 1);
 	if (be32_to_cpu(agf->agf_fllast) == XFS_AGFL_SIZE(mp))
 		agf->agf_fllast = 0;
 
-	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
 	be32_add_cpu(&agf->agf_flcount, 1);
 	xfs_trans_agflist_delta(tp, 1);
 	pag->pagf_flcount++;
@@ -2588,6 +2709,7 @@ xfs_alloc_read_agf(
 		pag->pagb_count = 0;
 		pag->pagb_tree = RB_ROOT;
 		pag->pagf_init = 1;
+		pag->pagf_needpadfix = xfs_agfl_need_padfix(mp, agf);
 	}
 #ifdef DEBUG
 	else if (!XFS_FORCED_SHUTDOWN(mp)) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e0792d036be2..78a6377a9b38 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -353,6 +353,7 @@ typedef struct xfs_perag {
 	char		pagi_inodeok;	/* The agi is ok for inodes */
 	uint8_t		pagf_levels[XFS_BTNUM_AGF];
 					/* # of levels in bno & cnt btree */
+	bool		pagf_needpadfix;
 	uint32_t	pagf_flcount;	/* count of blocks in freelist */
 	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
 	xfs_extlen_t	pagf_longest;	/* longest free space */

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-03-03 13:59                 ` Brian Foster
@ 2018-03-05 22:24                   ` Darrick J. Wong
  2018-03-05 22:53                     ` Dave Chinner
  2018-03-06  2:15                     ` Darrick J. Wong
  0 siblings, 2 replies; 26+ messages in thread
From: Darrick J. Wong @ 2018-03-05 22:24 UTC (permalink / raw)
  To: Brian Foster, b; +Cc: linux-xfs, djwong

On Sat, Mar 03, 2018 at 08:59:50AM -0500, Brian Foster wrote:
> On Fri, Mar 02, 2018 at 08:12:07AM -0500, Brian Foster wrote:
> > On Thu, Mar 01, 2018 at 12:55:41PM -0800, Darrick J. Wong wrote:
> > > On Thu, Mar 01, 2018 at 12:28:33PM -0500, Brian Foster wrote:
> > > > On Wed, Feb 28, 2018 at 03:20:32PM -0800, Darrick J. Wong wrote:
> > > > > On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> > > > > > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > > > > > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > > > > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> ...
> > > Going forward, I want the number of unpacked kernels to decrease as
> > > quickly as possible.  I understand that distro kernel maintainers are
> > > not willing to apply the packing patch to their kernel until we come up
> > > with a smooth transition path.
> > > 
> > 
> > I agree wrt upstream, but note that I don't anticipate including the
> > padding fix downstream any time soon.
> > 
> > > I don't want to support fixing agfls to be 118 units long on 64-bit
> > > unpacked kernels and 119 units long on 32-bit unpacked kernels, and I
> > > only want to support the packed kernels with their 119 unit long agfls.
> > > An AGFL that starts at 0 and ends at flcount-1 is compatible with packed
> > > and unpacked kernels, so the v2 patch I sent last night removes the
> > > delicate per-case surgery in favor of a new strategy where the mount
> > > time and unmount time helpers both look for agfl configurations that
> > > are known to cause problems, and solves them all with the same solution:
> > > moving the agfl list towards the start of the block.
> > > 
> > 
> > The purpose of the patch I sent was not for upstream unpacked support
> > going forward. Upstream has clearly moved forward with the packed
> > format. The goal of the patch was to explore a single/generic patch that
> > could be merged upstream/downstream and handle compatibility cleanly.
> > 
> 
> FWIW, here's a new variant of a bidirectional fixup. It's refactored and
> polished up a bit into a patch. It basically inspects the agf when first
> read for any evidence that the on-disk fields reflect a size mismatch
> with the current kernel and sets a flag if so. agfl gets/puts check the
> flag and thus the first transaction that attempts to modify a mismatched
> agfl swaps a block into or out of the gap slot appropriately.
> 
> This avoids the need for any new transactions or mount time scan and the
> (downstream motivated) packed -> unpacked case is only 10 or so lines of
> additional code. Only spot tested, but I _think_ it covers all of the
> cases. Hm?

Just from a quick glance this looks like a reasonable way to fix the
agfl wrapping to whatever the running kernel expects.  I tried feeding
it to the xfstest I wrote to exercise my agfl fixer[1], but even with
changing the test to fill the fs to enospc and delete everything I
couldn't get it to trigger reliably.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/commit/?h=djwong-experimental&id=f085bb09c839da69daf921da33f5d13c80c9f165

--D

> Brian
> 
> --- 8< ---
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index c02781a4c091..1cbf80d8481f 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2053,6 +2053,115 @@ xfs_alloc_space_available(
>  	return true;
>  }
>  
> +static int
> +xfs_agfl_ondisk_size(
> +	struct xfs_mount	*mp,
> +	int			first,
> +	int			last,
> +	int			count)
> +{
> +	int			active = count;
> +	int			agfl_size = XFS_AGFL_SIZE(mp);
> +
> +	if (count && last >= first)
> +		active = last - first + 1;
> +	else if (count)
> +		active = agfl_size - first + last + 1;
> +
> +	if (active == count + 1)
> +		return agfl_size - 1;
> +	if (active == count - 1 || first == agfl_size || last == agfl_size)
> +		return agfl_size + 1;
> +
> +	ASSERT(active == count);
> +	return agfl_size;
> +}
> +
> +static bool
> +xfs_agfl_need_padfix(
> +	struct xfs_mount	*mp,
> +	struct xfs_agf		*agf)
> +{
> +	int			f = be32_to_cpu(agf->agf_flfirst);
> +	int			l = be32_to_cpu(agf->agf_fllast);
> +	int			c = be32_to_cpu(agf->agf_flcount);
> +
> +	return xfs_agfl_ondisk_size(mp, f, l, c) != XFS_AGFL_SIZE(mp);
> +}
> +
> +static int
> +xfs_agfl_check_padfix(
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agbp,
> +	struct xfs_buf		*agflbp,
> +	struct xfs_perag	*pag)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> +	__be32			*agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> +	int			agfl_size = XFS_AGFL_SIZE(mp);
> +	int			ofirst, olast, osize;
> +	int			nfirst, nlast;
> +	int			logflags = 0;
> +	int			startoff = 0;
> +
> +	if (!pag->pagf_needpadfix)
> +		return 0;
> +
> +	ofirst = nfirst = be32_to_cpu(agf->agf_flfirst);
> +	olast = nlast = be32_to_cpu(agf->agf_fllast);
> +	osize = xfs_agfl_ondisk_size(mp, ofirst, olast, pag->pagf_flcount);
> +
> +	/*
> +	 * If the on-disk agfl is smaller than what the kernel expects, the
> +	 * last slot of the on-disk agfl is a gap with bogus data. Move the
> +	 * first valid block into the gap and bump the pointer.
> +	 */
> +	if (osize < agfl_size) {
> +		ASSERT(pag->pagf_flcount != 0);
> +		agfl_bno[agfl_size - 1] = agfl_bno[ofirst];
> +		startoff = (char *) &agfl_bno[agfl_size - 1] - (char *) agflbp->b_addr;
> +		nfirst++;
> +		goto done;
> +	}
> +
> +	/*
> +	 * Otherwise, the on-disk agfl is larger than what the current kernel
> +	 * expects. If empty, just fix up the first and last pointers. If not,
> +	 * move the inaccessible block to the end of the valid range.
> +	 */
> +	nfirst = do_mod(nfirst, agfl_size);
> +	if (pag->pagf_flcount == 0) {
> +		nlast = (nfirst == 0 ? agfl_size - 1 : nfirst - 1);
> +		goto done;
> +	}
> +	if (nlast != agfl_size)
> +		nlast++;
> +	nlast = do_mod(nlast, agfl_size);
> +	agfl_bno[nlast] = agfl_bno[osize - 1];
> +	startoff = (char *) &agfl_bno[nlast] - (char *) agflbp->b_addr;
> +
> +done:
> +	if (nfirst != ofirst) {
> +		agf->agf_flfirst = cpu_to_be32(nfirst);
> +		logflags |= XFS_AGF_FLFIRST;
> +	}
> +	if (nlast != olast) {
> +		agf->agf_fllast = cpu_to_be32(nlast);
> +		logflags |= XFS_AGF_FLLAST;
> +	}
> +	if (startoff) {
> +		xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> +		xfs_trans_log_buf(tp, agflbp, startoff,
> +				  startoff + sizeof(xfs_agblock_t) - 1);
> +	}
> +	if (logflags)
> +		xfs_alloc_log_agf(tp, agbp, logflags);
> +
> +	pag->pagf_needpadfix = false;
> +	return 0;
> +}
> +
>  /*
>   * Decide whether to use this allocation group for this allocation.
>   * If so, fix up the btree freelist's size.
> @@ -2258,6 +2367,12 @@ xfs_alloc_get_freelist(
>  	if (error)
>  		return error;
>  
> +	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> +	error = xfs_agfl_check_padfix(tp, agbp, agflbp, pag);
> +	if (error) {
> +		xfs_perag_put(pag);
> +		return error;
> +	}
>  
>  	/*
>  	 * Get the block number and update the data structures.
> @@ -2269,7 +2384,6 @@ xfs_alloc_get_freelist(
>  	if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp))
>  		agf->agf_flfirst = 0;
>  
> -	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
>  	be32_add_cpu(&agf->agf_flcount, -1);
>  	xfs_trans_agflist_delta(tp, -1);
>  	pag->pagf_flcount--;
> @@ -2376,11 +2490,18 @@ xfs_alloc_put_freelist(
>  	if (!agflbp && (error = xfs_alloc_read_agfl(mp, tp,
>  			be32_to_cpu(agf->agf_seqno), &agflbp)))
>  		return error;
> +
> +	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> +	error = xfs_agfl_check_padfix(tp, agbp, agflbp, pag);
> +	if (error) {
> +		xfs_perag_put(pag);
> +		return error;
> +	}
> +
>  	be32_add_cpu(&agf->agf_fllast, 1);
>  	if (be32_to_cpu(agf->agf_fllast) == XFS_AGFL_SIZE(mp))
>  		agf->agf_fllast = 0;
>  
> -	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
>  	be32_add_cpu(&agf->agf_flcount, 1);
>  	xfs_trans_agflist_delta(tp, 1);
>  	pag->pagf_flcount++;
> @@ -2588,6 +2709,7 @@ xfs_alloc_read_agf(
>  		pag->pagb_count = 0;
>  		pag->pagb_tree = RB_ROOT;
>  		pag->pagf_init = 1;
> +		pag->pagf_needpadfix = xfs_agfl_need_padfix(mp, agf);
>  	}
>  #ifdef DEBUG
>  	else if (!XFS_FORCED_SHUTDOWN(mp)) {
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index e0792d036be2..78a6377a9b38 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -353,6 +353,7 @@ typedef struct xfs_perag {
>  	char		pagi_inodeok;	/* The agi is ok for inodes */
>  	uint8_t		pagf_levels[XFS_BTNUM_AGF];
>  					/* # of levels in bno & cnt btree */
> +	bool		pagf_needpadfix;
>  	uint32_t	pagf_flcount;	/* count of blocks in freelist */
>  	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
>  	xfs_extlen_t	pagf_longest;	/* longest free space */
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-03-05 22:24                   ` Darrick J. Wong
@ 2018-03-05 22:53                     ` Dave Chinner
  2018-03-06  2:15                     ` Darrick J. Wong
  1 sibling, 0 replies; 26+ messages in thread
From: Dave Chinner @ 2018-03-05 22:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs, djwong

On Mon, Mar 05, 2018 at 02:24:30PM -0800, Darrick J. Wong wrote:
> On Sat, Mar 03, 2018 at 08:59:50AM -0500, Brian Foster wrote:
> > On Fri, Mar 02, 2018 at 08:12:07AM -0500, Brian Foster wrote:
> > > On Thu, Mar 01, 2018 at 12:55:41PM -0800, Darrick J. Wong wrote:
> > > > On Thu, Mar 01, 2018 at 12:28:33PM -0500, Brian Foster wrote:
> > > > > On Wed, Feb 28, 2018 at 03:20:32PM -0800, Darrick J. Wong wrote:
> > > > > > On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> > > > > > > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > > > > > > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > > > > > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > ...
> > > > Going forward, I want the number of unpacked kernels to decrease as
> > > > quickly as possible.  I understand that distro kernel maintainers are
> > > > not willing to apply the packing patch to their kernel until we come up
> > > > with a smooth transition path.
> > > > 
> > > 
> > > I agree wrt upstream, but note that I don't anticipate including the
> > > padding fix downstream any time soon.
> > > 
> > > > I don't want to support fixing agfls to be 118 units long on 64-bit
> > > > unpacked kernels and 119 units long on 32-bit unpacked kernels, and I
> > > > only want to support the packed kernels with their 119 unit long agfls.
> > > > An AGFL that starts at 0 and ends at flcount-1 is compatible with packed
> > > > and unpacked kernels, so the v2 patch I sent last night removes the
> > > > delicate per-case surgery in favor of a new strategy where the mount
> > > > time and unmount time helpers both look for agfl configurations that
> > > > are known to cause problems, and solves them all with the same solution:
> > > > moving the agfl list towards the start of the block.
> > > > 
> > > 
> > > The purpose of the patch I sent was not for upstream unpacked support
> > > going forward. Upstream has clearly moved forward with the packed
> > > format. The goal of the patch was to explore a single/generic patch that
> > > could be merged upstream/downstream and handle compatibility cleanly.
> > > 
> > 
> > FWIW, here's a new variant of a bidirectional fixup. It's refactored and
> > polished up a bit into a patch. It basically inspects the agf when first
> > read for any evidence that the on-disk fields reflect a size mismatch
> > with the current kernel and sets a flag if so. agfl gets/puts check the
> > flag and thus the first transaction that attempts to modify a mismatched
> > agfl swaps a block into or out of the gap slot appropriately.
> > 
> > This avoids the need for any new transactions or mount time scan and the
> > (downstream motivated) packed -> unpacked case is only 10 or so lines of
> > additional code. Only spot tested, but I _think_ it covers all of the
> > cases. Hm?
> 
> Just from a quick glance this looks like a reasonable way to fix the
> agfl wrapping to whatever the running kernel expects.  I tried feeding
> it to the xfstest I wrote to exercise my agfl fixer[1], but even with
> changing the test to fill the fs to enospc and delete everything I
> couldn't get it to trigger reliably.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/commit/?h=djwong-experimental&id=f085bb09c839da69daf921da33f5d13c80c9f165

I wrote a script that specifically wrapped the AGFL with xfs_db to
test this. I've attached it below, you'll need to adapt it to
whatever scheme is being used to correct the wrapping now....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


#!/bin/bash


do_write()
{
	mount /dev/ram0 /mnt/test 
	echo > /mnt/test/foo
	sync
	umount /mnt/test 
	#xfs_db -x -c "agf 0" -c "p" /dev/ram0
	xfs_repair -n /dev/ram0 > /dev/null 2>&1
	xfs_repair /dev/ram0 > /dev/null 2>&1
	mount /dev/ram0 /mnt/test 
	echo > /mnt/test/bar
	umount /mnt/test 
	xfs_repair /dev/ram0 > /dev/null 2>&1
}

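# Move one AGFL entry: read bno[source], null out the source slot, then
# write the saved value into bno[dest].
agfl_copy()
{
	source=$1
	dest=$2

	# xfs_db prints "bno[N] = value"; keep everything after the "="
	agbno=$(xfs_db -x -c "agfl 0" -c "p bno[$source]" /dev/ram0 | \
		cut -d "=" -f 2)
	# an unused slot prints as " null" (note the leading space)
	if [ "$agbno" == " null" ]; then
		agbno="0xffffffff"
	fi
	echo agbno "$agbno"
	xfs_db -x -c "agfl 0" -c "write -d bno[$source] 0xffffffff" /dev/ram0 > /dev/null
	xfs_db -x -c "agfl 0" -c "write -d bno[$dest] $agbno" /dev/ram0 > /dev/null
}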

run_test()
{
	fltail=$1
	flhead=$2
	flcount=$3
	urk=$4

	sleep 2

	echo "Testing fltail=$fltail flhead=$flhead flcount=$flcount...." > /dev/kmsg
	echo "Expecting $urk to occur...." > /dev/kmsg
	echo "Testing fltail=$fltail flhead=$flhead flcount=$flcount...."
	mkfs.xfs -f -s size=512 /dev/ram0 > /dev/null
	xfs_db -x -c "agf 0"			\
		-c "write -d flfirst $fltail"	\
		-c "write -d fllast $flhead"	\
		-c "write -d flcount $flcount"	\
		/dev/ram0

	# we need to write a bunch of block numbers into the new part
	# of the AGFL. So we just copy 0 -> fltail and so on.
	let i=0
	while (($flcount - $i > 0)) ; do
		dst=$((fltail + i))
		if [ $dst -ge 118 ]; then
			dst=$((dst - 118))
		fi
		agfl_copy $i $dst
		i=$((i + 1))
	done

	do_write
}

# run_test fltail flhead flcount expected-outcome
#
# The mkfs default on 512 byte sectors is "0 3 4" with an AGFL size of 118.
# Hence 118 should be the first invalid index, and 118 is also the size
# that filesystems with the agfl header packing bug use.
#
# We want to test corrections for:
#	fltail being oversize w/ matching flcount
run_test 118 3 5 correction
#	flhead being oversize w/ matching flcount
run_test 114 118 5 correction
#	fltail/flhead being in range w/ oversize flcount
run_test 117 3 6 correction

#
# We want to test corruption detection for:
# where "non-matching flcount" exercises both too small and too large
#	fltail being oversize w/ non-matching flcount
run_test 118 3 4 correction	# because tail gets fixed first
run_test 118 3 3 corruption
run_test 118 3 6 corruption
#	flhead being oversize w/ non-matching flcount
run_test 114 118 4 correction	# because head gets fixed first
run_test 114 118 3 corruption
run_test 114 118 6 corruption
#	fltail/flhead being in range w/ non-matching flcount
run_test 117 3 4 corruption
run_test 117 3 7 corruption



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-03-05 22:24                   ` Darrick J. Wong
  2018-03-05 22:53                     ` Dave Chinner
@ 2018-03-06  2:15                     ` Darrick J. Wong
  2018-03-06 12:02                       ` Brian Foster
  1 sibling, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2018-03-06  2:15 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, djwong

On Mon, Mar 05, 2018 at 02:24:30PM -0800, Darrick J. Wong wrote:
> On Sat, Mar 03, 2018 at 08:59:50AM -0500, Brian Foster wrote:
> > On Fri, Mar 02, 2018 at 08:12:07AM -0500, Brian Foster wrote:
> > > On Thu, Mar 01, 2018 at 12:55:41PM -0800, Darrick J. Wong wrote:
> > > > On Thu, Mar 01, 2018 at 12:28:33PM -0500, Brian Foster wrote:
> > > > > On Wed, Feb 28, 2018 at 03:20:32PM -0800, Darrick J. Wong wrote:
> > > > > > On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> > > > > > > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > > > > > > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > > > > > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > ...
> > > > Going forward, I want the number of unpacked kernels to decrease as
> > > > quickly as possible.  I understand that distro kernel maintainers are
> > > > not willing to apply the packing patch to their kernel until we come up
> > > > with a smooth transition path.
> > > > 
> > > 
> > > I agree wrt upstream, but note that I don't anticipate including the
> > > padding fix downstream any time soon.
> > > 
> > > > I don't want to support fixing agfls to be 118 units long on 64-bit
> > > > unpacked kernels and 119 units long on 32-bit unpacked kernels, and I
> > > > only want to support the packed kernels with their 119 unit long agfls.
> > > > An AGFL that starts at 0 and ends at flcount-1 is compatible with packed
> > > > and unpacked kernels, so the v2 patch I sent last night removes the
> > > > delicate per-case surgery in favor of a new strategy where the mount
> > > > time and unmount time helpers both look for agfl configurationss that
> > > > are known to cause problems, and solves them all with the same solution:
> > > > moving the agfl list towards the start of the block.
> > > > 
> > > 
> > > The purpose of the patch I sent was not for upstream unpacked support
> > > going forward. Upstream has clearly moved forward with the packed
> > > format. The goal of the patch was to explore a single/generic patch that
> > > could be merged upstream/downstream and handle compatibility cleanly.
> > > 
> > 
> > FWIW, here's a new variant of a bidirectional fixup. It's refactored and
> > polished up a bit into a patch. It basically inspects the agf when first
> > read for any evidence that the on-disk fields reflect a size mismatch
> > with the current kernel and sets a flag if so. agfl gets/puts check the
> > flag and thus the first transaction that attempts to modify a mismatched
> > agfl swaps a block into or out of the gap slot appropriately.
> > 
> > This avoids the need for any new transactions or mount time scan and the
> > (downstream motivated) packed -> unpacked case is only 10 or so lines of
> > additional code. Only spot tested, but I _think_ it covers all of the
> > cases. Hm?
> 
> Just from a quick glance this looks like a reasonable way to fix the
> agfl wrapping to whatever the running kernel expects.  I tried feeding
> it to the xfstest I wrote to exercise my agfl fixer[1], but even with
> changing the test to fill the fs to enospc and delete everything I
> couldn't get it to trigger reliably.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/commit/?h=djwong-experimental&id=f085bb09c839da69daf921da33f5d13c80c9f165

Ok, I got it to trigger by updating that xfstest to fragment the free
space after mounting.  According to the test results on 4.16-rc4 it
fixes the deliberately tweaked agfls to work without complaint, so could
you please write this up as a proper patch?

--D

> 
> --D
> 
> > Brian
> > 
> > --- 8< ---
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index c02781a4c091..1cbf80d8481f 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -2053,6 +2053,115 @@ xfs_alloc_space_available(
> >  	return true;
> >  }
> >  
> > +static int
> > +xfs_agfl_ondisk_size(
> > +	struct xfs_mount	*mp,
> > +	int			first,
> > +	int			last,
> > +	int			count)
> > +{
> > +	int			active = count;
> > +	int			agfl_size = XFS_AGFL_SIZE(mp);
> > +
> > +	if (count && last >= first)
> > +		active = last - first + 1;
> > +	else if (count)
> > +		active = agfl_size - first + last + 1;
> > +
> > +	if (active == count + 1)
> > +		return agfl_size - 1;
> > +	if (active == count - 1 || first == agfl_size || last == agfl_size)
> > +		return agfl_size + 1;
> > +
> > +	ASSERT(active == count);
> > +	return agfl_size;
> > +}
> > +
> > +static bool
> > +xfs_agfl_need_padfix(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_agf		*agf)
> > +{
> > +	int			f = be32_to_cpu(agf->agf_flfirst);
> > +	int			l = be32_to_cpu(agf->agf_fllast);
> > +	int			c = be32_to_cpu(agf->agf_flcount);
> > +
> > +	return xfs_agfl_ondisk_size(mp, f, l, c) != XFS_AGFL_SIZE(mp);
> > +}
> > +
> > +static int
> > +xfs_agfl_check_padfix(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_buf		*agbp,
> > +	struct xfs_buf		*agflbp,
> > +	struct xfs_perag	*pag)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> > +	__be32			*agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > +	int			agfl_size = XFS_AGFL_SIZE(mp);
> > +	int			ofirst, olast, osize;
> > +	int			nfirst, nlast;
> > +	int			logflags = 0;
> > +	int			startoff = 0;
> > +
> > +	if (!pag->pagf_needpadfix)
> > +		return 0;
> > +
> > +	ofirst = nfirst = be32_to_cpu(agf->agf_flfirst);
> > +	olast = nlast = be32_to_cpu(agf->agf_fllast);
> > +	osize = xfs_agfl_ondisk_size(mp, ofirst, olast, pag->pagf_flcount);
> > +
> > +	/*
> > +	 * If the on-disk agfl is smaller than what the kernel expects, the
> > +	 * last slot of the on-disk agfl is a gap with bogus data. Move the
> > +	 * first valid block into the gap and bump the pointer.
> > +	 */
> > +	if (osize < agfl_size) {
> > +		ASSERT(pag->pagf_flcount != 0);
> > +		agfl_bno[agfl_size - 1] = agfl_bno[ofirst];
> > +		startoff = (char *) &agfl_bno[agfl_size - 1] - (char *) agflbp->b_addr;
> > +		nfirst++;
> > +		goto done;
> > +	}
> > +
> > +	/*
> > +	 * Otherwise, the on-disk agfl is larger than what the current kernel
> > +	 * expects. If empty, just fix up the first and last pointers. If not,
> > +	 * move the inaccessible block to the end of the valid range.
> > +	 */
> > +	nfirst = do_mod(nfirst, agfl_size);
> > +	if (pag->pagf_flcount == 0) {
> > +		nlast = (nfirst == 0 ? agfl_size - 1 : nfirst - 1);
> > +		goto done;
> > +	}
> > +	if (nlast != agfl_size)
> > +		nlast++;
> > +	nlast = do_mod(nlast, agfl_size);
> > +	agfl_bno[nlast] = agfl_bno[osize - 1];
> > +	startoff = (char *) &agfl_bno[nlast] - (char *) agflbp->b_addr;
> > +
> > +done:
> > +	if (nfirst != ofirst) {
> > +		agf->agf_flfirst = cpu_to_be32(nfirst);
> > +		logflags |= XFS_AGF_FLFIRST;
> > +	}
> > +	if (nlast != olast) {
> > +		agf->agf_fllast = cpu_to_be32(nlast);
> > +		logflags |= XFS_AGF_FLLAST;
> > +	}
> > +	if (startoff) {
> > +		xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> > +		xfs_trans_log_buf(tp, agflbp, startoff,
> > +				  startoff + sizeof(xfs_agblock_t) - 1);
> > +	}
> > +	if (logflags)
> > +		xfs_alloc_log_agf(tp, agbp, logflags);
> > +
> > +	pag->pagf_needpadfix = false;
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Decide whether to use this allocation group for this allocation.
> >   * If so, fix up the btree freelist's size.
> > @@ -2258,6 +2367,12 @@ xfs_alloc_get_freelist(
> >  	if (error)
> >  		return error;
> >  
> > +	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > +	error = xfs_agfl_check_padfix(tp, agbp, agflbp, pag);
> > +	if (error) {
> > +		xfs_perag_put(pag);
> > +		return error;
> > +	}
> >  
> >  	/*
> >  	 * Get the block number and update the data structures.
> > @@ -2269,7 +2384,6 @@ xfs_alloc_get_freelist(
> >  	if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp))
> >  		agf->agf_flfirst = 0;
> >  
> > -	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> >  	be32_add_cpu(&agf->agf_flcount, -1);
> >  	xfs_trans_agflist_delta(tp, -1);
> >  	pag->pagf_flcount--;
> > @@ -2376,11 +2490,18 @@ xfs_alloc_put_freelist(
> >  	if (!agflbp && (error = xfs_alloc_read_agfl(mp, tp,
> >  			be32_to_cpu(agf->agf_seqno), &agflbp)))
> >  		return error;
> > +
> > +	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > +	error = xfs_agfl_check_padfix(tp, agbp, agflbp, pag);
> > +	if (error) {
> > +		xfs_perag_put(pag);
> > +		return error;
> > +	}
> > +
> >  	be32_add_cpu(&agf->agf_fllast, 1);
> >  	if (be32_to_cpu(agf->agf_fllast) == XFS_AGFL_SIZE(mp))
> >  		agf->agf_fllast = 0;
> >  
> > -	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> >  	be32_add_cpu(&agf->agf_flcount, 1);
> >  	xfs_trans_agflist_delta(tp, 1);
> >  	pag->pagf_flcount++;
> > @@ -2588,6 +2709,7 @@ xfs_alloc_read_agf(
> >  		pag->pagb_count = 0;
> >  		pag->pagb_tree = RB_ROOT;
> >  		pag->pagf_init = 1;
> > +		pag->pagf_needpadfix = xfs_agfl_need_padfix(mp, agf);
> >  	}
> >  #ifdef DEBUG
> >  	else if (!XFS_FORCED_SHUTDOWN(mp)) {
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index e0792d036be2..78a6377a9b38 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -353,6 +353,7 @@ typedef struct xfs_perag {
> >  	char		pagi_inodeok;	/* The agi is ok for inodes */
> >  	uint8_t		pagf_levels[XFS_BTNUM_AGF];
> >  					/* # of levels in bno & cnt btree */
> > +	bool		pagf_needpadfix;
> >  	uint32_t	pagf_flcount;	/* count of blocks in freelist */
> >  	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
> >  	xfs_extlen_t	pagf_longest;	/* longest free space */
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] xfs: fix agfl wrapping
  2018-03-06  2:15                     ` Darrick J. Wong
@ 2018-03-06 12:02                       ` Brian Foster
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Foster @ 2018-03-06 12:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, djwong

On Mon, Mar 05, 2018 at 06:15:39PM -0800, Darrick J. Wong wrote:
> On Mon, Mar 05, 2018 at 02:24:30PM -0800, Darrick J. Wong wrote:
> > On Sat, Mar 03, 2018 at 08:59:50AM -0500, Brian Foster wrote:
> > > On Fri, Mar 02, 2018 at 08:12:07AM -0500, Brian Foster wrote:
> > > > On Thu, Mar 01, 2018 at 12:55:41PM -0800, Darrick J. Wong wrote:
> > > > > On Thu, Mar 01, 2018 at 12:28:33PM -0500, Brian Foster wrote:
> > > > > > On Wed, Feb 28, 2018 at 03:20:32PM -0800, Darrick J. Wong wrote:
> > > > > > > On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> > > > > > > > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > > > > > > > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > > > > > > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > > > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > ...
> > > > > Going forward, I want the number of unpacked kernels to decrease as
> > > > > quickly as possible.  I understand that distro kernel maintainers are
> > > > > not willing to apply the packing patch to their kernel until we come up
> > > > > with a smooth transition path.
> > > > > 
> > > > 
> > > > > I agree wrt upstream, but note that I don't anticipate including the
> > > > padding fix downstream any time soon.
> > > > 
> > > > > I don't want to support fixing agfls to be 118 units long on 64-bit
> > > > > unpacked kernels and 119 units long on 32-bit unpacked kernels, and I
> > > > > only want to support the packed kernels with their 119 unit long agfls.
> > > > > An AGFL that starts at 0 and ends at flcount-1 is compatible with packed
> > > > > and unpacked kernels, so the v2 patch I sent last night removes the
> > > > > > delicate per-case surgery in favor of a new strategy where the mount
> > > > > > time and unmount time helpers both look for agfl configurations that
> > > > > > are known to cause problems, and solve them all with the same solution:
> > > > > > moving the agfl list towards the start of the block.
> > > > > 
> > > > 
> > > > The purpose of the patch I sent was not for upstream unpacked support
> > > > going forward. Upstream has clearly moved forward with the packed
> > > > format. The goal of the patch was to explore a single/generic patch that
> > > > could be merged upstream/downstream and handle compatibility cleanly.
> > > > 
> > > 
> > > FWIW, here's a new variant of a bidirectional fixup. It's refactored and
> > > polished up a bit into a patch. It basically inspects the agf when first
> > > read for any evidence that the on-disk fields reflect a size mismatch
> > > with the current kernel and sets a flag if so. agfl gets/puts check the
> > > flag and thus the first transaction that attempts to modify a mismatched
> > > agfl swaps a block into or out of the gap slot appropriately.
> > > 
> > > This avoids the need for any new transactions or mount time scan and the
> > > (downstream motivated) packed -> unpacked case is only 10 or so lines of
> > > additional code. Only spot tested, but I _think_ it covers all of the
> > > cases. Hm?
> > 
> > Just from a quick glance this looks like a reasonable way to fix the
> > agfl wrapping to whatever the running kernel expects.  I tried feeding
> > it to the xfstest I wrote to exercise my agfl fixer[1], but even with
> > changing the test to fill the fs to enospc and delete everything I
> > couldn't get it to trigger reliably.
> > 
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/commit/?h=djwong-experimental&id=f085bb09c839da69daf921da33f5d13c80c9f165
> 
> Ok, I got it to trigger by updating that xfstest to fragment the free
> space after mounting.  According to the test results on 4.16-rc4 it
> fixes the deliberately tweaked agfls to work without complaint, so could
> you please write this up as a proper patch?
> 

Yeah, I had to trigger this variant by filling a small fs (based on a
couple metadumps I had created with wrapped packed/unpacked agfls) and
then fragmenting free space.

Thanks for looking. I need to run through some more thorough testing and
then I'll post it.

Brian

> --D
> 
> > 
> > --D
> > 
> > > Brian
> > > 
> > > --- 8< ---
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index c02781a4c091..1cbf80d8481f 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -2053,6 +2053,115 @@ xfs_alloc_space_available(
> > >  	return true;
> > >  }
> > >  
> > > +static int
> > > +xfs_agfl_ondisk_size(
> > > +	struct xfs_mount	*mp,
> > > +	int			first,
> > > +	int			last,
> > > +	int			count)
> > > +{
> > > +	int			active = count;
> > > +	int			agfl_size = XFS_AGFL_SIZE(mp);
> > > +
> > > +	if (count && last >= first)
> > > +		active = last - first + 1;
> > > +	else if (count)
> > > +		active = agfl_size - first + last + 1;
> > > +
> > > +	if (active == count + 1)
> > > +		return agfl_size - 1;
> > > +	if (active == count - 1 || first == agfl_size || last == agfl_size)
> > > +		return agfl_size + 1;
> > > +
> > > +	ASSERT(active == count);
> > > +	return agfl_size;
> > > +}
> > > +
> > > +static bool
> > > +xfs_agfl_need_padfix(
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_agf		*agf)
> > > +{
> > > +	int			f = be32_to_cpu(agf->agf_flfirst);
> > > +	int			l = be32_to_cpu(agf->agf_fllast);
> > > +	int			c = be32_to_cpu(agf->agf_flcount);
> > > +
> > > +	return xfs_agfl_ondisk_size(mp, f, l, c) != XFS_AGFL_SIZE(mp);
> > > +}
> > > +
> > > +static int
> > > +xfs_agfl_check_padfix(
> > > +	struct xfs_trans	*tp,
> > > +	struct xfs_buf		*agbp,
> > > +	struct xfs_buf		*agflbp,
> > > +	struct xfs_perag	*pag)
> > > +{
> > > +	struct xfs_mount	*mp = tp->t_mountp;
> > > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> > > +	__be32			*agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > > +	int			agfl_size = XFS_AGFL_SIZE(mp);
> > > +	int			ofirst, olast, osize;
> > > +	int			nfirst, nlast;
> > > +	int			logflags = 0;
> > > +	int			startoff = 0;
> > > +
> > > +	if (!pag->pagf_needpadfix)
> > > +		return 0;
> > > +
> > > +	ofirst = nfirst = be32_to_cpu(agf->agf_flfirst);
> > > +	olast = nlast = be32_to_cpu(agf->agf_fllast);
> > > +	osize = xfs_agfl_ondisk_size(mp, ofirst, olast, pag->pagf_flcount);
> > > +
> > > +	/*
> > > +	 * If the on-disk agfl is smaller than what the kernel expects, the
> > > +	 * last slot of the on-disk agfl is a gap with bogus data. Move the
> > > +	 * first valid block into the gap and bump the pointer.
> > > +	 */
> > > +	if (osize < agfl_size) {
> > > +		ASSERT(pag->pagf_flcount != 0);
> > > +		agfl_bno[agfl_size - 1] = agfl_bno[ofirst];
> > > +		startoff = (char *) &agfl_bno[agfl_size - 1] - (char *) agflbp->b_addr;
> > > +		nfirst++;
> > > +		goto done;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Otherwise, the on-disk agfl is larger than what the current kernel
> > > +	 * expects. If empty, just fix up the first and last pointers. If not,
> > > +	 * move the inaccessible block to the end of the valid range.
> > > +	 */
> > > +	nfirst = do_mod(nfirst, agfl_size);
> > > +	if (pag->pagf_flcount == 0) {
> > > +		nlast = (nfirst == 0 ? agfl_size - 1 : nfirst - 1);
> > > +		goto done;
> > > +	}
> > > +	if (nlast != agfl_size)
> > > +		nlast++;
> > > +	nlast = do_mod(nlast, agfl_size);
> > > +	agfl_bno[nlast] = agfl_bno[osize - 1];
> > > +	startoff = (char *) &agfl_bno[nlast] - (char *) agflbp->b_addr;
> > > +
> > > +done:
> > > +	if (nfirst != ofirst) {
> > > +		agf->agf_flfirst = cpu_to_be32(nfirst);
> > > +		logflags |= XFS_AGF_FLFIRST;
> > > +	}
> > > +	if (nlast != olast) {
> > > +		agf->agf_fllast = cpu_to_be32(nlast);
> > > +		logflags |= XFS_AGF_FLLAST;
> > > +	}
> > > +	if (startoff) {
> > > +		xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> > > +		xfs_trans_log_buf(tp, agflbp, startoff,
> > > +				  startoff + sizeof(xfs_agblock_t) - 1);
> > > +	}
> > > +	if (logflags)
> > > +		xfs_alloc_log_agf(tp, agbp, logflags);
> > > +
> > > +	pag->pagf_needpadfix = false;
> > > +	return 0;
> > > +}
> > > +
> > >  /*
> > >   * Decide whether to use this allocation group for this allocation.
> > >   * If so, fix up the btree freelist's size.
> > > @@ -2258,6 +2367,12 @@ xfs_alloc_get_freelist(
> > >  	if (error)
> > >  		return error;
> > >  
> > > +	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > > +	error = xfs_agfl_check_padfix(tp, agbp, agflbp, pag);
> > > +	if (error) {
> > > +		xfs_perag_put(pag);
> > > +		return error;
> > > +	}
> > >  
> > >  	/*
> > >  	 * Get the block number and update the data structures.
> > > @@ -2269,7 +2384,6 @@ xfs_alloc_get_freelist(
> > >  	if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp))
> > >  		agf->agf_flfirst = 0;
> > >  
> > > -	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > >  	be32_add_cpu(&agf->agf_flcount, -1);
> > >  	xfs_trans_agflist_delta(tp, -1);
> > >  	pag->pagf_flcount--;
> > > @@ -2376,11 +2490,18 @@ xfs_alloc_put_freelist(
> > >  	if (!agflbp && (error = xfs_alloc_read_agfl(mp, tp,
> > >  			be32_to_cpu(agf->agf_seqno), &agflbp)))
> > >  		return error;
> > > +
> > > +	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > > +	error = xfs_agfl_check_padfix(tp, agbp, agflbp, pag);
> > > +	if (error) {
> > > +		xfs_perag_put(pag);
> > > +		return error;
> > > +	}
> > > +
> > >  	be32_add_cpu(&agf->agf_fllast, 1);
> > >  	if (be32_to_cpu(agf->agf_fllast) == XFS_AGFL_SIZE(mp))
> > >  		agf->agf_fllast = 0;
> > >  
> > > -	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > >  	be32_add_cpu(&agf->agf_flcount, 1);
> > >  	xfs_trans_agflist_delta(tp, 1);
> > >  	pag->pagf_flcount++;
> > > @@ -2588,6 +2709,7 @@ xfs_alloc_read_agf(
> > >  		pag->pagb_count = 0;
> > >  		pag->pagb_tree = RB_ROOT;
> > >  		pag->pagf_init = 1;
> > > +		pag->pagf_needpadfix = xfs_agfl_need_padfix(mp, agf);
> > >  	}
> > >  #ifdef DEBUG
> > >  	else if (!XFS_FORCED_SHUTDOWN(mp)) {
> > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > > index e0792d036be2..78a6377a9b38 100644
> > > --- a/fs/xfs/xfs_mount.h
> > > +++ b/fs/xfs/xfs_mount.h
> > > @@ -353,6 +353,7 @@ typedef struct xfs_perag {
> > >  	char		pagi_inodeok;	/* The agi is ok for inodes */
> > >  	uint8_t		pagf_levels[XFS_BTNUM_AGF];
> > >  					/* # of levels in bno & cnt btree */
> > > +	bool		pagf_needpadfix;
> > >  	uint32_t	pagf_flcount;	/* count of blocks in freelist */
> > >  	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
> > >  	xfs_extlen_t	pagf_longest;	/* longest free space */
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

end of thread, other threads:[~2018-03-06 12:02 UTC | newest]

Thread overview: 26+ messages
2018-02-23  1:59 [PATCH 0/5] xfs: fix various problems Darrick J. Wong
2018-02-23  1:59 ` [PATCH 1/5] xfs: don't iunlock the quota ip when quota block allocation fails Darrick J. Wong
2018-02-27 13:55   ` Brian Foster
2018-02-23  1:59 ` [PATCH 2/5] xfs: convert a few more directory asserts to corruption returns Darrick J. Wong
2018-02-27 13:55   ` Brian Foster
2018-02-23  2:00 ` [PATCH 3/5] xfs: check for cow blocks before trying to clear them during inode reclaim Darrick J. Wong
2018-02-27 13:55   ` Brian Foster
2018-02-23  2:00 ` [PATCH 4/5] xfs: convert XFS_AGFL_SIZE to a helper function Darrick J. Wong
2018-02-27 19:34   ` Brian Foster
2018-02-23  2:00 ` [PATCH 5/5] xfs: fix agfl wrapping Darrick J. Wong
2018-02-23  4:40   ` Darrick J. Wong
2018-02-23 20:33   ` Darrick J. Wong
2018-02-27 19:35   ` Brian Foster
2018-02-27 21:03     ` Darrick J. Wong
2018-02-28 22:43       ` Brian Foster
2018-02-28 23:20         ` Darrick J. Wong
2018-03-01 17:28           ` Brian Foster
2018-03-01 20:55             ` Darrick J. Wong
2018-03-02 13:12               ` Brian Foster
2018-03-03 13:59                 ` Brian Foster
2018-03-05 22:24                   ` Darrick J. Wong
2018-03-05 22:53                     ` Dave Chinner
2018-03-06  2:15                     ` Darrick J. Wong
2018-03-06 12:02                       ` Brian Foster
2018-03-01  6:37     ` Darrick J. Wong
2018-03-01  6:42   ` [PATCH v2 " Darrick J. Wong
