All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
@ 2021-12-16  0:17 Dave Chinner
  2021-12-16  0:21 ` Eric Sandeen
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Dave Chinner @ 2021-12-16  0:17 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Got a report that a repeated crash test of a container host would
eventually fail with a log recovery error preventing the system from
mounting the root filesystem. It manifested as a directory leaf node
corruption on writeback like so:

 XFS (loop0): Mounting V5 Filesystem
 XFS (loop0): Starting recovery (logdev: internal)
 XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
 XFS (loop0): Unmount and run xfs_repair
 XFS (loop0): First 128 bytes of corrupted metadata buffer:
 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
 XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
 XFS (loop0): Please unmount the filesystem and rectify the problem(s)
 XFS (loop0): log mount/recovery failed: error -117
 XFS (loop0): log mount failed

Tracing indicated that we were recovering changes from a transaction
at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
That is, log recovery was overwriting a buffer with newer changes on
disk than was in the transaction. Tracing indicated that we were
hitting the "recovery immediately" case in
xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
buffer.

The code was extracting the LSN correctly, then ignoring it because
the UUID in the buffer did not match the superblock UUID. The
problem arises because the UUID check uses the wrong UUID - it
should be checking the sb_meta_uuid, not sb_uuid. This filesystem
has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
correct matching sb_meta_uuid in it, it's just the code checked it
against the wrong superblock uuid.

The is no corruption in the filesystem, and failing to recover the
buffer due to a write verifier failure means the recovery bug did
not propagate the corruption to disk. Hence there is no corruption
before or after this bug has manifested, the impact is limited
simply to an unmountable filesystem....

This was missed back in 2015 during an audit of incorrect sb_uuid
usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
to validate against sb_meta_uuid") that fixed the magic32 buffers to
validate against sb_meta_uuid instead of sb_uuid. It missed the
magicda buffers....

Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item_recover.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 70ca5751b13e..e484251dc9c8 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn(
 	}
 
 	if (lsn != (xfs_lsn_t)-1) {
-		if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
+		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
 			goto recover_immediately;
 		return lsn;
 	}
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
  2021-12-16  0:17 [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery Dave Chinner
@ 2021-12-16  0:21 ` Eric Sandeen
  2021-12-16  0:25   ` Dave Chinner
  2021-12-16  0:23 ` Dave Chinner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2021-12-16  0:21 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On 12/15/21 6:17 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Got a report that a repeated crash test of a container host would
> eventually fail with a log recovery error preventing the system from
> mounting the root filesystem. It manifested as a directory leaf node
> corruption on writeback like so:
> 
>   XFS (loop0): Mounting V5 Filesystem
>   XFS (loop0): Starting recovery (logdev: internal)
>   XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
>   XFS (loop0): Unmount and run xfs_repair
>   XFS (loop0): First 128 bytes of corrupted metadata buffer:
>   00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
>   00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
>   00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
>   00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
>   00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
>   00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
>   00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
>   00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
>   XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
>   XFS (loop0): Please unmount the filesystem and rectify the problem(s)
>   XFS (loop0): log mount/recovery failed: error -117
>   XFS (loop0): log mount failed
> 
> Tracing indicated that we were recovering changes from a transaction
> at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> That is, log recovery was overwriting a buffer with newer changes on
> disk than was in the transaction. Tracing indicated that we were
> hitting the "recovery immediately" case in
> xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> buffer.
> 
> The code was extracting the LSN correctly, then ignoring it because
> the UUID in the buffer did not match the superblock UUID. The
> problem arises because the UUID check uses the wrong UUID - it
> should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> correct matching sb_meta_uuid in it, it's just the code checked it
> against the wrong superblock uuid.
> 
> The is no corruption in the filesystem, and failing to recover the
> buffer due to a write verifier failure means the recovery bug did
> not propagate the corruption to disk. Hence there is no corruption
> before or after this bug has manifested, the impact is limited
> simply to an unmountable filesystem....
> 
> This was missed back in 2015 during an audit of incorrect sb_uuid
> usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> to validate against sb_meta_uuid") that fixed the magic32 buffers to
> validate against sb_meta_uuid instead of sb_uuid. It missed the
> magicda buffers....
> 
> Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Kind of amazing it took so long to turn up and get found, eh?

Thanks Dave,

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> ---
>   fs/xfs/xfs_buf_item_recover.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 70ca5751b13e..e484251dc9c8 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn(
>   	}
>   
>   	if (lsn != (xfs_lsn_t)-1) {
> -		if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
> +		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
>   			goto recover_immediately;
>   		return lsn;
>   	}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
  2021-12-16  0:17 [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery Dave Chinner
  2021-12-16  0:21 ` Eric Sandeen
@ 2021-12-16  0:23 ` Dave Chinner
  2021-12-16  1:10 ` Darrick J. Wong
  2021-12-24  7:07 ` Christoph Hellwig
  3 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2021-12-16  0:23 UTC (permalink / raw)
  To: linux-xfs

On Thu, Dec 16, 2021 at 11:17:09AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Got a report that a repeated crash test of a container host would
> eventually fail with a log recovery error preventing the system from
> mounting the root filesystem. It manifested as a directory leaf node
> corruption on writeback like so:
> 
>  XFS (loop0): Mounting V5 Filesystem
>  XFS (loop0): Starting recovery (logdev: internal)
>  XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
>  XFS (loop0): Unmount and run xfs_repair
>  XFS (loop0): First 128 bytes of corrupted metadata buffer:
>  00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
>  00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
>  00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
>  00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
>  00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
>  00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
>  00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
>  00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
>  XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
>  XFS (loop0): Please unmount the filesystem and rectify the problem(s)
>  XFS (loop0): log mount/recovery failed: error -117
>  XFS (loop0): log mount failed
> 
> Tracing indicated that we were recovering changes from a transaction
> at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> That is, log recovery was overwriting a buffer with newer changes on
> disk than was in the transaction. Tracing indicated that we were
> hitting the "recovery immediately" case in
> xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
  ^^^^^^^^^^^^^^^^^^^^^^^^^^

xlog_recover_get_buf_lsn()

> buffer.
> 
> The code was extracting the LSN correctly, then ignoring it because
> the UUID in the buffer did not match the superblock UUID. The
> problem arises because the UUID check uses the wrong UUID - it
> should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> correct matching sb_meta_uuid in it, it's just the code checked it
> against the wrong superblock uuid.
> 
> The is no corruption in the filesystem, and failing to recover the
> buffer due to a write verifier failure means the recovery bug did
> not propagate the corruption to disk. Hence there is no corruption
> before or after this bug has manifested, the impact is limited
> simply to an unmountable filesystem....
> 
> This was missed back in 2015 during an audit of incorrect sb_uuid
> usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> to validate against sb_meta_uuid") that fixed the magic32 buffers to
> validate against sb_meta_uuid instead of sb_uuid. It missed the
> magicda buffers....
> 
> Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_buf_item_recover.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 70ca5751b13e..e484251dc9c8 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn(
>  	}
>  
>  	if (lsn != (xfs_lsn_t)-1) {
> -		if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
> +		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
>  			goto recover_immediately;
>  		return lsn;
>  	}
> -- 
> 2.33.0
> 
> 

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
  2021-12-16  0:21 ` Eric Sandeen
@ 2021-12-16  0:25   ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2021-12-16  0:25 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

On Wed, Dec 15, 2021 at 06:21:16PM -0600, Eric Sandeen wrote:
> On 12/15/21 6:17 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Got a report that a repeated crash test of a container host would
> > eventually fail with a log recovery error preventing the system from
> > mounting the root filesystem. It manifested as a directory leaf node
> > corruption on writeback like so:
> > 
> >   XFS (loop0): Mounting V5 Filesystem
> >   XFS (loop0): Starting recovery (logdev: internal)
> >   XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
> >   XFS (loop0): Unmount and run xfs_repair
> >   XFS (loop0): First 128 bytes of corrupted metadata buffer:
> >   00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
> >   00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
> >   00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
> >   00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
> >   00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
> >   00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
> >   00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
> >   00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
> >   XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
> >   XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> >   XFS (loop0): log mount/recovery failed: error -117
> >   XFS (loop0): log mount failed
> > 
> > Tracing indicated that we were recovering changes from a transaction
> > at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> > That is, log recovery was overwriting a buffer with newer changes on
> > disk than was in the transaction. Tracing indicated that we were
> > hitting the "recovery immediately" case in
> > xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> > buffer.
> > 
> > The code was extracting the LSN correctly, then ignoring it because
> > the UUID in the buffer did not match the superblock UUID. The
> > problem arises because the UUID check uses the wrong UUID - it
> > should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> > has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> > correct matching sb_meta_uuid in it, it's just the code checked it
> > against the wrong superblock uuid.
> > 
> > The is no corruption in the filesystem, and failing to recover the
> > buffer due to a write verifier failure means the recovery bug did
> > not propagate the corruption to disk. Hence there is no corruption
> > before or after this bug has manifested, the impact is limited
> > simply to an unmountable filesystem....
> > 
> > This was missed back in 2015 during an audit of incorrect sb_uuid
> > usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> > to validate against sb_meta_uuid") that fixed the magic32 buffers to
> > validate against sb_meta_uuid instead of sb_uuid. It missed the
> > magicda buffers....
> > 
> > Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Kind of amazing it took so long to turn up and get found, eh?

Especially considering that I use sb_uuid != sb_meta_uuid for all
the test device filesystems I run fstests on and have for the past 5
years. I think that means we *never* run log recovery tests on test
devices in fstests.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
  2021-12-16  0:17 [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery Dave Chinner
  2021-12-16  0:21 ` Eric Sandeen
  2021-12-16  0:23 ` Dave Chinner
@ 2021-12-16  1:10 ` Darrick J. Wong
  2021-12-16  3:58   ` Darrick J. Wong
  2021-12-24  7:07 ` Christoph Hellwig
  3 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2021-12-16  1:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Dec 16, 2021 at 11:17:09AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Got a report that a repeated crash test of a container host would
> eventually fail with a log recovery error preventing the system from
> mounting the root filesystem. It manifested as a directory leaf node
> corruption on writeback like so:
> 
>  XFS (loop0): Mounting V5 Filesystem
>  XFS (loop0): Starting recovery (logdev: internal)
>  XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
>  XFS (loop0): Unmount and run xfs_repair
>  XFS (loop0): First 128 bytes of corrupted metadata buffer:
>  00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
>  00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
>  00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
>  00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
>  00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
>  00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
>  00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
>  00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
>  XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
>  XFS (loop0): Please unmount the filesystem and rectify the problem(s)
>  XFS (loop0): log mount/recovery failed: error -117
>  XFS (loop0): log mount failed
> 
> Tracing indicated that we were recovering changes from a transaction
> at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> That is, log recovery was overwriting a buffer with newer changes on
> disk than was in the transaction. Tracing indicated that we were
> hitting the "recovery immediately" case in
> xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> buffer.
> 
> The code was extracting the LSN correctly, then ignoring it because
> the UUID in the buffer did not match the superblock UUID. The
> problem arises because the UUID check uses the wrong UUID - it
> should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> correct matching sb_meta_uuid in it, it's just the code checked it
> against the wrong superblock uuid.
> 
> The is no corruption in the filesystem, and failing to recover the
> buffer due to a write verifier failure means the recovery bug did
> not propagate the corruption to disk. Hence there is no corruption
> before or after this bug has manifested, the impact is limited
> simply to an unmountable filesystem....
> 
> This was missed back in 2015 during an audit of incorrect sb_uuid
> usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> to validate against sb_meta_uuid") that fixed the magic32 buffers to
> validate against sb_meta_uuid instead of sb_uuid. It missed the
> magicda buffers....
> 
> Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Makes sense.  Please send a slimmed down version of this to fstests@.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_buf_item_recover.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 70ca5751b13e..e484251dc9c8 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn(
>  	}
>  
>  	if (lsn != (xfs_lsn_t)-1) {
> -		if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
> +		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
>  			goto recover_immediately;
>  		return lsn;
>  	}
> -- 
> 2.33.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
  2021-12-16  1:10 ` Darrick J. Wong
@ 2021-12-16  3:58   ` Darrick J. Wong
  2021-12-16  5:16     ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2021-12-16  3:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Dec 15, 2021 at 05:10:54PM -0800, Darrick J. Wong wrote:
> On Thu, Dec 16, 2021 at 11:17:09AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Got a report that a repeated crash test of a container host would
> > eventually fail with a log recovery error preventing the system from
> > mounting the root filesystem. It manifested as a directory leaf node
> > corruption on writeback like so:
> > 
> >  XFS (loop0): Mounting V5 Filesystem
> >  XFS (loop0): Starting recovery (logdev: internal)
> >  XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
> >  XFS (loop0): Unmount and run xfs_repair
> >  XFS (loop0): First 128 bytes of corrupted metadata buffer:
> >  00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
> >  00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
> >  00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
> >  00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
> >  00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
> >  00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
> >  00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
> >  00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
> >  XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
> >  XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> >  XFS (loop0): log mount/recovery failed: error -117
> >  XFS (loop0): log mount failed
> > 
> > Tracing indicated that we were recovering changes from a transaction
> > at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> > That is, log recovery was overwriting a buffer with newer changes on
> > disk than was in the transaction. Tracing indicated that we were
> > hitting the "recovery immediately" case in
> > xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> > buffer.
> > 
> > The code was extracting the LSN correctly, then ignoring it because
> > the UUID in the buffer did not match the superblock UUID. The
> > problem arises because the UUID check uses the wrong UUID - it
> > should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> > has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> > correct matching sb_meta_uuid in it, it's just the code checked it
> > against the wrong superblock uuid.
> > 
> > The is no corruption in the filesystem, and failing to recover the
> > buffer due to a write verifier failure means the recovery bug did
> > not propagate the corruption to disk. Hence there is no corruption
> > before or after this bug has manifested, the impact is limited
> > simply to an unmountable filesystem....
> > 
> > This was missed back in 2015 during an audit of incorrect sb_uuid
> > usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> > to validate against sb_meta_uuid") that fixed the magic32 buffers to
> > validate against sb_meta_uuid instead of sb_uuid. It missed the
> > magicda buffers....
> > 
> > Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Makes sense.  Please send a slimmed down version of this to fstests@.

<sigh> "...of a reproducer for this to fstests@."

--D

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> --D
> 
> > ---
> >  fs/xfs/xfs_buf_item_recover.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> > index 70ca5751b13e..e484251dc9c8 100644
> > --- a/fs/xfs/xfs_buf_item_recover.c
> > +++ b/fs/xfs/xfs_buf_item_recover.c
> > @@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn(
> >  	}
> >  
> >  	if (lsn != (xfs_lsn_t)-1) {
> > -		if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
> > +		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
> >  			goto recover_immediately;
> >  		return lsn;
> >  	}
> > -- 
> > 2.33.0
> > 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
  2021-12-16  3:58   ` Darrick J. Wong
@ 2021-12-16  5:16     ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2021-12-16  5:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Dec 15, 2021 at 07:58:58PM -0800, Darrick J. Wong wrote:
> On Wed, Dec 15, 2021 at 05:10:54PM -0800, Darrick J. Wong wrote:
> > On Thu, Dec 16, 2021 at 11:17:09AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Got a report that a repeated crash test of a container host would
> > > eventually fail with a log recovery error preventing the system from
> > > mounting the root filesystem. It manifested as a directory leaf node
> > > corruption on writeback like so:
> > > 
> > >  XFS (loop0): Mounting V5 Filesystem
> > >  XFS (loop0): Starting recovery (logdev: internal)
> > >  XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
> > >  XFS (loop0): Unmount and run xfs_repair
> > >  XFS (loop0): First 128 bytes of corrupted metadata buffer:
> > >  00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
> > >  00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
> > >  00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
> > >  00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
> > >  00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
> > >  00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
> > >  00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
> > >  00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
> > >  XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
> > >  XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> > >  XFS (loop0): log mount/recovery failed: error -117
> > >  XFS (loop0): log mount failed
> > > 
> > > Tracing indicated that we were recovering changes from a transaction
> > > at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> > > That is, log recovery was overwriting a buffer with newer changes on
> > > disk than was in the transaction. Tracing indicated that we were
> > > hitting the "recovery immediately" case in
> > > xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> > > buffer.
> > > 
> > > The code was extracting the LSN correctly, then ignoring it because
> > > the UUID in the buffer did not match the superblock UUID. The
> > > problem arises because the UUID check uses the wrong UUID - it
> > > should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> > > has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> > > correct matching sb_meta_uuid in it, it's just the code checked it
> > > against the wrong superblock uuid.
> > > 
> > > The is no corruption in the filesystem, and failing to recover the
> > > buffer due to a write verifier failure means the recovery bug did
> > > not propagate the corruption to disk. Hence there is no corruption
> > > before or after this bug has manifested, the impact is limited
> > > simply to an unmountable filesystem....
> > > 
> > > This was missed back in 2015 during an audit of incorrect sb_uuid
> > > usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> > > to validate against sb_meta_uuid") that fixed the magic32 buffers to
> > > validate against sb_meta_uuid instead of sb_uuid. It missed the
> > > magicda buffers....
> > > 
> > > Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > 
> > Makes sense.  Please send a slimmed down version of this to fstests@.
> 
> <sigh> "...of a reproducer for this to fstests@."

I don't have one. All I have a pre-recovery metadump that has the
issue already in it, and I traced the mount and looked at log print
output to find the issue.

Eric, it seems like getting generic coverage of sb_meta_uuid !=
sb_uuid for at least the recoveryloop group tests would be a goof
thing for the QE team to add to fstests? Alternatively, randomly
decide whether to set a different UUID during each scratch_mkfs
call?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
  2021-12-16  0:17 [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery Dave Chinner
                   ` (2 preceding siblings ...)
  2021-12-16  1:10 ` Darrick J. Wong
@ 2021-12-24  7:07 ` Christoph Hellwig
  3 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2021-12-24  7:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-12-24  7:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-16  0:17 [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery Dave Chinner
2021-12-16  0:21 ` Eric Sandeen
2021-12-16  0:25   ` Dave Chinner
2021-12-16  0:23 ` Dave Chinner
2021-12-16  1:10 ` Darrick J. Wong
2021-12-16  3:58   ` Darrick J. Wong
2021-12-16  5:16     ` Dave Chinner
2021-12-24  7:07 ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.