All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET v2 0/3] xfs_repair: various small fixes
@ 2022-06-28 20:50 Darrick J. Wong
  2022-06-28 20:50 ` [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Darrick J. Wong @ 2022-06-28 20:50 UTC (permalink / raw)
  To: sandeen, djwong; +Cc: Dave Chinner, Chandan Babu R, linux-xfs

Hi all,

Bug fixes for 5.16 include:

 - checking V5 feature bits in secondary superblocks against whatever we
   decide is the primary
 - consistently warning if repair can't check existing rmap and refcount
   btrees
 - checking the ftype of dot and dotdot entries

Enjoy!

v2: improve documentation

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=repair-fixes
---
 repair/agheader.c    |   23 ++++++++++++++++++++---
 repair/dino_chunks.c |    3 +--
 2 files changed, 21 insertions(+), 5 deletions(-)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set
  2022-06-28 20:50 [PATCHSET v2 0/3] xfs_repair: various small fixes Darrick J. Wong
@ 2022-06-28 20:50 ` Darrick J. Wong
  2022-06-28 23:20   ` Dave Chinner
  2022-06-28 20:50 ` [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions Darrick J. Wong
  2022-06-28 20:50 ` [PATCH 3/3] xfs_repair: Search for conflicts in inode_tree_ptrs[] when processing uncertain inodes Darrick J. Wong
  2 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2022-06-28 20:50 UTC (permalink / raw)
  To: sandeen, djwong; +Cc: Dave Chinner, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Dave Chinner complained about xfs_scrub failures coming from xfs/158.
That test induces xfs_repair to fail while upgrading a filesystem to
have the inobtcount feature, and then restarts xfs_repair to finish the
upgrade.  When the second xfs_repair run starts, it will find that the
primary super has NEEDSREPAIR set, along with whatever new feature that
we were trying to add to the filesystem.

From there, repair completes the upgrade in much the same manner as the
first repair run would have, with one big exception -- it forgets to set
features_changed to trigger rewriting of the secondary supers at the end
of repair.  This results in discrepancies between the supers:

# XFS_REPAIR_FAIL_AFTER_PHASE=2 xfs_repair -c inobtcount=1 /dev/sdf
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Adding inode btree counts to filesystem.
Killed
# xfs_repair /dev/sdf
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
bad inobt block count 0, saw 1
bad finobt block count 0, saw 1
bad inobt block count 0, saw 1
bad finobt block count 0, saw 1
bad inobt block count 0, saw 1
bad finobt block count 0, saw 1
bad inobt block count 0, saw 1
bad finobt block count 0, saw 1
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 0
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
# xfs_db -c 'sb 0' -c 'print' -c 'sb 1' -c 'print' /dev/sdf | \
	egrep '(features_ro_compat|features_incompat)'
features_ro_compat = 0xd
features_incompat = 0xb
features_ro_compat = 0x5
features_incompat = 0xb

Curiously, re-running xfs_repair will not trigger any warnings about the
featureset mismatch between the primary and secondary supers.  xfs_scrub
immediately notices, which is what causes xfs/158 to fail.

This discrepancy doesn't happen when the upgrade completes successfully
in a single repair run, so we need to teach repair to rewrite the
secondaries at the end of repair any time needsrepair was set.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/agheader.c |    8 ++++++++
 1 file changed, 8 insertions(+)


diff --git a/repair/agheader.c b/repair/agheader.c
index 36da1395..e91509d0 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -552,6 +552,14 @@ secondary_sb_whack(
 			else
 				do_warn(
 	_("would clear needsrepair flag and regenerate metadata\n"));
+			/*
+			 * If needsrepair is set on the primary super, there's
+			 * a possibility that repair crashed during an upgrade.
+			 * Set features_changed to ensure that the secondary
+			 * supers are rewritten with the new feature bits once
+			 * we've finished the upgrade.
+			 */
+			features_changed = true;
 		} else {
 			/*
 			 * Quietly clear needsrepair on the secondary supers as


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions
  2022-06-28 20:50 [PATCHSET v2 0/3] xfs_repair: various small fixes Darrick J. Wong
  2022-06-28 20:50 ` [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set Darrick J. Wong
@ 2022-06-28 20:50 ` Darrick J. Wong
  2022-06-28 23:21   ` Dave Chinner
  2022-06-28 20:50 ` [PATCH 3/3] xfs_repair: Search for conflicts in inode_tree_ptrs[] when processing uncertain inodes Darrick J. Wong
  2 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2022-06-28 20:50 UTC (permalink / raw)
  To: sandeen, djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

While testing xfs/233 and xfs/127 with LARP mode enabled, I noticed
errors such as the following:

xfs_growfs --BlockSize=4096 --Blocks=8192
data blocks changed from 8192 to 2579968
meta-data=/dev/sdf               isize=512    agcount=630, agsize=4096 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1
data     =                       bsize=4096   blocks=2579968, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=3075, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
_check_xfs_filesystem: filesystem on /dev/sdf is inconsistent (r)
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - 23:03:47: zeroing log - 3075 of 3075 blocks done
        - scan filesystem freespace and inode maps...
would fix log incompat feature mismatch in AG 30 super, 0x0 != 0x1
would fix log incompat feature mismatch in AG 8 super, 0x0 != 0x1
would fix log incompat feature mismatch in AG 12 super, 0x0 != 0x1
would fix log incompat feature mismatch in AG 24 super, 0x0 != 0x1
would fix log incompat feature mismatch in AG 18 super, 0x0 != 0x1
<snip>

0x1 corresponds to XFS_SB_FEAT_INCOMPAT_LOG_XATTRS, which is the feature
bit used to indicate that the log contains extended attribute log intent
items.  This is a mechanism to prevent older kernels from trying to
recover log items that they won't know how to recover.

I thought about this a little bit more, and realized that log_incompat
features bits are set on the primary sb prior to writing certain types
of log records, and cleared once the log has written the committed
changes back to the filesystem.  If the secondary superblocks are
updated at any point during that interval (due to things like growfs or
setting labels), the log_incompat field will now be set on the secondary
supers.

Due to the ephemeral nature of the current log_incompat feature bits,
a discrepancy between the primary and secondary supers is not a
corruption.  If we're in dry run mode, we should log the discrepancy,
but that's not a reason to end with EXIT_FAILURE.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/agheader.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)


diff --git a/repair/agheader.c b/repair/agheader.c
index e91509d0..76290158 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -285,15 +285,24 @@ check_v5_feature_mismatch(
 		}
 	}
 
+	/*
+	 * Log incompat feature bits are set and cleared from the primary super
+	 * as needed to protect against log replay on old kernels finding log
+	 * records that they cannot handle.  Secondary sb resyncs performed as
+	 * part of a geometry update to the primary sb (e.g. growfs, label/uuid
+	 * changes) will copy the log incompat feature bits, but it's not a
+	 * corruption for a secondary to have a bit set that is clear in the
+	 * primary super.
+	 */
 	if (mp->m_sb.sb_features_log_incompat != sb->sb_features_log_incompat) {
 		if (no_modify) {
-			do_warn(
-	_("would fix log incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
+			do_log(
+	_("would sync log incompat feature in AG %u super, 0x%x != 0x%x\n"),
 					agno, mp->m_sb.sb_features_log_incompat,
 					sb->sb_features_log_incompat);
 		} else {
 			do_warn(
-	_("will fix log incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
+	_("will sync log incompat feature in AG %u super, 0x%x != 0x%x\n"),
 					agno, mp->m_sb.sb_features_log_incompat,
 					sb->sb_features_log_incompat);
 			dirty = true;


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] xfs_repair: Search for conflicts in inode_tree_ptrs[] when processing uncertain inodes
  2022-06-28 20:50 [PATCHSET v2 0/3] xfs_repair: various small fixes Darrick J. Wong
  2022-06-28 20:50 ` [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set Darrick J. Wong
  2022-06-28 20:50 ` [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions Darrick J. Wong
@ 2022-06-28 20:50 ` Darrick J. Wong
  2 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2022-06-28 20:50 UTC (permalink / raw)
  To: sandeen, djwong; +Cc: Chandan Babu R, linux-xfs

From: Chandan Babu R <chandan.babu@oracle.com>

When processing an uncertain inode chunk record, if we lose 2 blocks worth of
inodes or 25% of the chunk, xfs_repair decides to ignore the chunk. Otherwise,
xfs_repair adds a new chunk record to inode_tree_ptrs[agno], marking each
inode as either free or used. However, before adding the new chunk record,
xfs_repair has to check for the existance of a conflicting record.

The existing code incorrectly checks for the conflicting record in
inode_uncertain_tree_ptrs[agno]. This check will succeed since the inode chunk
record being processed was originally obtained from
inode_uncertain_tree_ptrs[agno].

This commit fixes the bug by changing xfs_repair to search
inode_tree_ptrs[agno] for conflicts.

Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/dino_chunks.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)


diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index 11b0eb5f..80c52a43 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -229,8 +229,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 		/*
 		 * ok, put the record into the tree, if no conflict.
 		 */
-		if (find_uncertain_inode_rec(agno,
-				XFS_AGB_TO_AGINO(mp, start_agbno)))
+		if (find_inode_rec(mp, agno, XFS_AGB_TO_AGINO(mp, start_agbno)))
 			return(0);
 
 		start_agino = XFS_AGB_TO_AGINO(mp, start_agbno);


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set
  2022-06-28 20:50 ` [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set Darrick J. Wong
@ 2022-06-28 23:20   ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2022-06-28 23:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Tue, Jun 28, 2022 at 01:50:40PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Dave Chinner complained about xfs_scrub failures coming from xfs/158.
> That test induces xfs_repair to fail while upgrading a filesystem to
> have the inobtcount feature, and then restarts xfs_repair to finish the
> upgrade.  When the second xfs_repair run starts, it will find that the
> primary super has NEEDSREPAIR set, along with whatever new feature that
> we were trying to add to the filesystem.
> 
> From there, repair completes the upgrade in much the same manner as the
> first repair run would have, with one big exception -- it forgets to set
> features_changed to trigger rewriting of the secondary supers at the end
> of repair.  This results in discrepancies between the supers:
> 
> # XFS_REPAIR_FAIL_AFTER_PHASE=2 xfs_repair -c inobtcount=1 /dev/sdf
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Adding inode btree counts to filesystem.
> Killed
> # xfs_repair /dev/sdf
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
> clearing needsrepair flag and regenerating metadata
> bad inobt block count 0, saw 1
> bad finobt block count 0, saw 1
> bad inobt block count 0, saw 1
> bad finobt block count 0, saw 1
> bad inobt block count 0, saw 1
> bad finobt block count 0, saw 1
> bad inobt block count 0, saw 1
> bad finobt block count 0, saw 1
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 1
>         - agno = 2
>         - agno = 0
>         - agno = 3
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> # xfs_db -c 'sb 0' -c 'print' -c 'sb 1' -c 'print' /dev/sdf | \
> 	egrep '(features_ro_compat|features_incompat)'
> features_ro_compat = 0xd
> features_incompat = 0xb
> features_ro_compat = 0x5
> features_incompat = 0xb
> 
> Curiously, re-running xfs_repair will not trigger any warnings about the
> featureset mismatch between the primary and secondary supers.  xfs_scrub
> immediately notices, which is what causes xfs/158 to fail.
> 
> This discrepancy doesn't happen when the upgrade completes successfully
> in a single repair run, so we need to teach repair to rewrite the
> secondaries at the end of repair any time needsrepair was set.
> 
> Reported-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  repair/agheader.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> 
> diff --git a/repair/agheader.c b/repair/agheader.c
> index 36da1395..e91509d0 100644
> --- a/repair/agheader.c
> +++ b/repair/agheader.c
> @@ -552,6 +552,14 @@ secondary_sb_whack(
>  			else
>  				do_warn(
>  	_("would clear needsrepair flag and regenerate metadata\n"));
> +			/*
> +			 * If needsrepair is set on the primary super, there's
> +			 * a possibility that repair crashed during an upgrade.
> +			 * Set features_changed to ensure that the secondary
> +			 * supers are rewritten with the new feature bits once
> +			 * we've finished the upgrade.
> +			 */
> +			features_changed = true;
>  		} else {
>  			/*
>  			 * Quietly clear needsrepair on the secondary supers as
> 
> 

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions
  2022-06-28 20:50 ` [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions Darrick J. Wong
@ 2022-06-28 23:21   ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2022-06-28 23:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Tue, Jun 28, 2022 at 01:50:45PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> While testing xfs/233 and xfs/127 with LARP mode enabled, I noticed
> errors such as the following:
> 
> xfs_growfs --BlockSize=4096 --Blocks=8192
> data blocks changed from 8192 to 2579968
> meta-data=/dev/sdf               isize=512    agcount=630, agsize=4096 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=1
>          =                       reflink=1    bigtime=1 inobtcount=1
> data     =                       bsize=4096   blocks=2579968, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=3075, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> _check_xfs_filesystem: filesystem on /dev/sdf is inconsistent (r)
> *** xfs_repair -n output ***
> Phase 1 - find and verify superblock...
>         - reporting progress in intervals of 15 minutes
> Phase 2 - using internal log
>         - zero log...
>         - 23:03:47: zeroing log - 3075 of 3075 blocks done
>         - scan filesystem freespace and inode maps...
> would fix log incompat feature mismatch in AG 30 super, 0x0 != 0x1
> would fix log incompat feature mismatch in AG 8 super, 0x0 != 0x1
> would fix log incompat feature mismatch in AG 12 super, 0x0 != 0x1
> would fix log incompat feature mismatch in AG 24 super, 0x0 != 0x1
> would fix log incompat feature mismatch in AG 18 super, 0x0 != 0x1
> <snip>
> 
> 0x1 corresponds to XFS_SB_FEAT_INCOMPAT_LOG_XATTRS, which is the feature
> bit used to indicate that the log contains extended attribute log intent
> items.  This is a mechanism to prevent older kernels from trying to
> recover log items that they won't know how to recover.
> 
> I thought about this a little bit more, and realized that log_incompat
> features bits are set on the primary sb prior to writing certain types
> of log records, and cleared once the log has written the committed
> changes back to the filesystem.  If the secondary superblocks are
> updated at any point during that interval (due to things like growfs or
> setting labels), the log_incompat field will now be set on the secondary
> supers.
> 
> Due to the ephemeral nature of the current log_incompat feature bits,
> a discrepancy between the primary and secondary supers is not a
> corruption.  If we're in dry run mode, we should log the discrepancy,
> but that's not a reason to end with EXIT_FAILURE.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  repair/agheader.c |   15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/repair/agheader.c b/repair/agheader.c
> index e91509d0..76290158 100644
> --- a/repair/agheader.c
> +++ b/repair/agheader.c
> @@ -285,15 +285,24 @@ check_v5_feature_mismatch(
>  		}
>  	}
>  
> +	/*
> +	 * Log incompat feature bits are set and cleared from the primary super
> +	 * as needed to protect against log replay on old kernels finding log
> +	 * records that they cannot handle.  Secondary sb resyncs performed as
> +	 * part of a geometry update to the primary sb (e.g. growfs, label/uuid
> +	 * changes) will copy the log incompat feature bits, but it's not a
> +	 * corruption for a secondary to have a bit set that is clear in the
> +	 * primary super.
> +	 */
>  	if (mp->m_sb.sb_features_log_incompat != sb->sb_features_log_incompat) {
>  		if (no_modify) {
> -			do_warn(
> -	_("would fix log incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
> +			do_log(
> +	_("would sync log incompat feature in AG %u super, 0x%x != 0x%x\n"),
>  					agno, mp->m_sb.sb_features_log_incompat,
>  					sb->sb_features_log_incompat);
>  		} else {
>  			do_warn(
> -	_("will fix log incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"),
> +	_("will sync log incompat feature in AG %u super, 0x%x != 0x%x\n"),
>  					agno, mp->m_sb.sb_features_log_incompat,
>  					sb->sb_features_log_incompat);
>  			dirty = true;

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-06-28 23:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-28 20:50 [PATCHSET v2 0/3] xfs_repair: various small fixes Darrick J. Wong
2022-06-28 20:50 ` [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set Darrick J. Wong
2022-06-28 23:20   ` Dave Chinner
2022-06-28 20:50 ` [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions Darrick J. Wong
2022-06-28 23:21   ` Dave Chinner
2022-06-28 20:50 ` [PATCH 3/3] xfs_repair: Search for conflicts in inode_tree_ptrs[] when processing uncertain inodes Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.