* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
2021-12-16 0:17 [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery Dave Chinner
@ 2021-12-16 0:21 ` Eric Sandeen
2021-12-16 0:25 ` Dave Chinner
2021-12-16 0:23 ` Dave Chinner
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2021-12-16 0:21 UTC (permalink / raw)
To: Dave Chinner, linux-xfs
On 12/15/21 6:17 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Got a report that a repeated crash test of a container host would
> eventually fail with a log recovery error preventing the system from
> mounting the root filesystem. It manifested as a directory leaf node
> corruption on writeback like so:
>
> XFS (loop0): Mounting V5 Filesystem
> XFS (loop0): Starting recovery (logdev: internal)
> XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
> XFS (loop0): Unmount and run xfs_repair
> XFS (loop0): First 128 bytes of corrupted metadata buffer:
> 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=.......
> 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...)....
> 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.#
> 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C......
> 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................
> 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P
> 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A......
> 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A....
> XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down.
> XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> XFS (loop0): log mount/recovery failed: error -117
> XFS (loop0): log mount failed
>
> Tracing indicated that we were recovering changes from a transaction
> at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> That is, log recovery was overwriting a buffer with newer changes on
> disk than was in the transaction. Tracing indicated that we were
> hitting the "recovery immediately" case in
> xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> buffer.
>
> The code was extracting the LSN correctly, then ignoring it because
> the UUID in the buffer did not match the superblock UUID. The
> problem arises because the UUID check uses the wrong UUID - it
> should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> correct matching sb_meta_uuid in it, it's just the code checked it
> against the wrong superblock uuid.
>
> The is no corruption in the filesystem, and failing to recover the
> buffer due to a write verifier failure means the recovery bug did
> not propagate the corruption to disk. Hence there is no corruption
> before or after this bug has manifested, the impact is limited
> simply to an unmountable filesystem....
>
> This was missed back in 2015 during an audit of incorrect sb_uuid
> usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> to validate against sb_meta_uuid") that fixed the magic32 buffers to
> validate against sb_meta_uuid instead of sb_uuid. It missed the
> magicda buffers....
>
> Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Kind of amazing it took so long to turn up and get found, eh?
Thanks Dave,
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> ---
> fs/xfs/xfs_buf_item_recover.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 70ca5751b13e..e484251dc9c8 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn(
> }
>
> if (lsn != (xfs_lsn_t)-1) {
> - if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
> + if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
> goto recover_immediately;
> return lsn;
> }
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
2021-12-16 0:21 ` Eric Sandeen
@ 2021-12-16 0:25 ` Dave Chinner
0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2021-12-16 0:25 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-xfs
On Wed, Dec 15, 2021 at 06:21:16PM -0600, Eric Sandeen wrote:
> On 12/15/21 6:17 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Got a report that a repeated crash test of a container host would
> > eventually fail with a log recovery error preventing the system from
> > mounting the root filesystem. It manifested as a directory leaf node
> > corruption on writeback like so:
> >
> > XFS (loop0): Mounting V5 Filesystem
> > XFS (loop0): Starting recovery (logdev: internal)
> > XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
> > XFS (loop0): Unmount and run xfs_repair
> > XFS (loop0): First 128 bytes of corrupted metadata buffer:
> > 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=.......
> > 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...)....
> > 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.#
> > 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C......
> > 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................
> > 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P
> > 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A......
> > 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A....
> > XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down.
> > XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> > XFS (loop0): log mount/recovery failed: error -117
> > XFS (loop0): log mount failed
> >
> > Tracing indicated that we were recovering changes from a transaction
> > at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> > That is, log recovery was overwriting a buffer with newer changes on
> > disk than was in the transaction. Tracing indicated that we were
> > hitting the "recovery immediately" case in
> > xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> > buffer.
> >
> > The code was extracting the LSN correctly, then ignoring it because
> > the UUID in the buffer did not match the superblock UUID. The
> > problem arises because the UUID check uses the wrong UUID - it
> > should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> > has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> > correct matching sb_meta_uuid in it, it's just the code checked it
> > against the wrong superblock uuid.
> >
> > The is no corruption in the filesystem, and failing to recover the
> > buffer due to a write verifier failure means the recovery bug did
> > not propagate the corruption to disk. Hence there is no corruption
> > before or after this bug has manifested, the impact is limited
> > simply to an unmountable filesystem....
> >
> > This was missed back in 2015 during an audit of incorrect sb_uuid
> > usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> > to validate against sb_meta_uuid") that fixed the magic32 buffers to
> > validate against sb_meta_uuid instead of sb_uuid. It missed the
> > magicda buffers....
> >
> > Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
> Kind of amazing it took so long to turn up and get found, eh?
Especially considering that I use sb_uuid != sb_meta_uuid for all
the test device filesystems I run fstests on and have for the past 5
years. I think that means we *never* run log recovery tests on test
devices in fstests.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
2021-12-16 0:17 [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery Dave Chinner
2021-12-16 0:21 ` Eric Sandeen
@ 2021-12-16 0:23 ` Dave Chinner
2021-12-16 1:10 ` Darrick J. Wong
2021-12-24 7:07 ` Christoph Hellwig
3 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2021-12-16 0:23 UTC (permalink / raw)
To: linux-xfs
On Thu, Dec 16, 2021 at 11:17:09AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Got a report that a repeated crash test of a container host would
> eventually fail with a log recovery error preventing the system from
> mounting the root filesystem. It manifested as a directory leaf node
> corruption on writeback like so:
>
> XFS (loop0): Mounting V5 Filesystem
> XFS (loop0): Starting recovery (logdev: internal)
> XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
> XFS (loop0): Unmount and run xfs_repair
> XFS (loop0): First 128 bytes of corrupted metadata buffer:
> 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=.......
> 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...)....
> 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.#
> 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C......
> 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................
> 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P
> 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A......
> 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A....
> XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down.
> XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> XFS (loop0): log mount/recovery failed: error -117
> XFS (loop0): log mount failed
>
> Tracing indicated that we were recovering changes from a transaction
> at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> That is, log recovery was overwriting a buffer with newer changes on
> disk than was in the transaction. Tracing indicated that we were
> hitting the "recovery immediately" case in
> xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
^^^^^^^^^^^^^^^^^^^^^^^^^^
xlog_recover_get_buf_lsn()
> buffer.
>
> The code was extracting the LSN correctly, then ignoring it because
> the UUID in the buffer did not match the superblock UUID. The
> problem arises because the UUID check uses the wrong UUID - it
> should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> correct matching sb_meta_uuid in it, it's just the code checked it
> against the wrong superblock uuid.
>
> The is no corruption in the filesystem, and failing to recover the
> buffer due to a write verifier failure means the recovery bug did
> not propagate the corruption to disk. Hence there is no corruption
> before or after this bug has manifested, the impact is limited
> simply to an unmountable filesystem....
>
> This was missed back in 2015 during an audit of incorrect sb_uuid
> usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> to validate against sb_meta_uuid") that fixed the magic32 buffers to
> validate against sb_meta_uuid instead of sb_uuid. It missed the
> magicda buffers....
>
> Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_buf_item_recover.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 70ca5751b13e..e484251dc9c8 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn(
> }
>
> if (lsn != (xfs_lsn_t)-1) {
> - if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
> + if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
> goto recover_immediately;
> return lsn;
> }
> --
> 2.33.0
>
>
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
2021-12-16 0:17 [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery Dave Chinner
2021-12-16 0:21 ` Eric Sandeen
2021-12-16 0:23 ` Dave Chinner
@ 2021-12-16 1:10 ` Darrick J. Wong
2021-12-16 3:58 ` Darrick J. Wong
2021-12-24 7:07 ` Christoph Hellwig
3 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2021-12-16 1:10 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Thu, Dec 16, 2021 at 11:17:09AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Got a report that a repeated crash test of a container host would
> eventually fail with a log recovery error preventing the system from
> mounting the root filesystem. It manifested as a directory leaf node
> corruption on writeback like so:
>
> XFS (loop0): Mounting V5 Filesystem
> XFS (loop0): Starting recovery (logdev: internal)
> XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
> XFS (loop0): Unmount and run xfs_repair
> XFS (loop0): First 128 bytes of corrupted metadata buffer:
> 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=.......
> 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...)....
> 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.#
> 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C......
> 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................
> 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P
> 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A......
> 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A....
> XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down.
> XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> XFS (loop0): log mount/recovery failed: error -117
> XFS (loop0): log mount failed
>
> Tracing indicated that we were recovering changes from a transaction
> at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> That is, log recovery was overwriting a buffer with newer changes on
> disk than was in the transaction. Tracing indicated that we were
> hitting the "recovery immediately" case in
> xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> buffer.
>
> The code was extracting the LSN correctly, then ignoring it because
> the UUID in the buffer did not match the superblock UUID. The
> problem arises because the UUID check uses the wrong UUID - it
> should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> correct matching sb_meta_uuid in it, it's just the code checked it
> against the wrong superblock uuid.
>
> The is no corruption in the filesystem, and failing to recover the
> buffer due to a write verifier failure means the recovery bug did
> not propagate the corruption to disk. Hence there is no corruption
> before or after this bug has manifested, the impact is limited
> simply to an unmountable filesystem....
>
> This was missed back in 2015 during an audit of incorrect sb_uuid
> usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> to validate against sb_meta_uuid") that fixed the magic32 buffers to
> validate against sb_meta_uuid instead of sb_uuid. It missed the
> magicda buffers....
>
> Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Makes sense. Please send a slimmed down version of this to fstests@.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf_item_recover.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 70ca5751b13e..e484251dc9c8 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn(
> }
>
> if (lsn != (xfs_lsn_t)-1) {
> - if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
> + if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
> goto recover_immediately;
> return lsn;
> }
> --
> 2.33.0
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
2021-12-16 1:10 ` Darrick J. Wong
@ 2021-12-16 3:58 ` Darrick J. Wong
2021-12-16 5:16 ` Dave Chinner
0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2021-12-16 3:58 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Wed, Dec 15, 2021 at 05:10:54PM -0800, Darrick J. Wong wrote:
> On Thu, Dec 16, 2021 at 11:17:09AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Got a report that a repeated crash test of a container host would
> > eventually fail with a log recovery error preventing the system from
> > mounting the root filesystem. It manifested as a directory leaf node
> > corruption on writeback like so:
> >
> > XFS (loop0): Mounting V5 Filesystem
> > XFS (loop0): Starting recovery (logdev: internal)
> > XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
> > XFS (loop0): Unmount and run xfs_repair
> > XFS (loop0): First 128 bytes of corrupted metadata buffer:
> > 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=.......
> > 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...)....
> > 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.#
> > 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C......
> > 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................
> > 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P
> > 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A......
> > 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A....
> > XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down.
> > XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> > XFS (loop0): log mount/recovery failed: error -117
> > XFS (loop0): log mount failed
> >
> > Tracing indicated that we were recovering changes from a transaction
> > at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> > That is, log recovery was overwriting a buffer with newer changes on
> > disk than was in the transaction. Tracing indicated that we were
> > hitting the "recovery immediately" case in
> > xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> > buffer.
> >
> > The code was extracting the LSN correctly, then ignoring it because
> > the UUID in the buffer did not match the superblock UUID. The
> > problem arises because the UUID check uses the wrong UUID - it
> > should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> > has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> > correct matching sb_meta_uuid in it, it's just the code checked it
> > against the wrong superblock uuid.
> >
> > The is no corruption in the filesystem, and failing to recover the
> > buffer due to a write verifier failure means the recovery bug did
> > not propagate the corruption to disk. Hence there is no corruption
> > before or after this bug has manifested, the impact is limited
> > simply to an unmountable filesystem....
> >
> > This was missed back in 2015 during an audit of incorrect sb_uuid
> > usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> > to validate against sb_meta_uuid") that fixed the magic32 buffers to
> > validate against sb_meta_uuid instead of sb_uuid. It missed the
> > magicda buffers....
> >
> > Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
> Makes sense. Please send a slimmed down version of this to fstests@.
<sigh> "...of a reproducer for this to fstests@."
--D
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
> --D
>
> > ---
> > fs/xfs/xfs_buf_item_recover.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> > index 70ca5751b13e..e484251dc9c8 100644
> > --- a/fs/xfs/xfs_buf_item_recover.c
> > +++ b/fs/xfs/xfs_buf_item_recover.c
> > @@ -816,7 +816,7 @@ xlog_recover_get_buf_lsn(
> > }
> >
> > if (lsn != (xfs_lsn_t)-1) {
> > - if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
> > + if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
> > goto recover_immediately;
> > return lsn;
> > }
> > --
> > 2.33.0
> >
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
2021-12-16 3:58 ` Darrick J. Wong
@ 2021-12-16 5:16 ` Dave Chinner
0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2021-12-16 5:16 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Wed, Dec 15, 2021 at 07:58:58PM -0800, Darrick J. Wong wrote:
> On Wed, Dec 15, 2021 at 05:10:54PM -0800, Darrick J. Wong wrote:
> > On Thu, Dec 16, 2021 at 11:17:09AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > Got a report that a repeated crash test of a container host would
> > > eventually fail with a log recovery error preventing the system from
> > > mounting the root filesystem. It manifested as a directory leaf node
> > > corruption on writeback like so:
> > >
> > > XFS (loop0): Mounting V5 Filesystem
> > > XFS (loop0): Starting recovery (logdev: internal)
> > > XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
> > > XFS (loop0): Unmount and run xfs_repair
> > > XFS (loop0): First 128 bytes of corrupted metadata buffer:
> > > 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=.......
> > > 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...)....
> > > 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.#
> > > 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C......
> > > 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................
> > > 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P
> > > 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A......
> > > 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A....
> > > XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down.
> > > XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> > > XFS (loop0): log mount/recovery failed: error -117
> > > XFS (loop0): log mount failed
> > >
> > > Tracing indicated that we were recovering changes from a transaction
> > > at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
> > > That is, log recovery was overwriting a buffer with newer changes on
> > > disk than was in the transaction. Tracing indicated that we were
> > > hitting the "recovery immediately" case in
> > > xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
> > > buffer.
> > >
> > > The code was extracting the LSN correctly, then ignoring it because
> > > the UUID in the buffer did not match the superblock UUID. The
> > > problem arises because the UUID check uses the wrong UUID - it
> > > should be checking the sb_meta_uuid, not sb_uuid. This filesystem
> > > has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
> > > correct matching sb_meta_uuid in it, it's just the code checked it
> > > against the wrong superblock uuid.
> > >
> > > The is no corruption in the filesystem, and failing to recover the
> > > buffer due to a write verifier failure means the recovery bug did
> > > not propagate the corruption to disk. Hence there is no corruption
> > > before or after this bug has manifested, the impact is limited
> > > simply to an unmountable filesystem....
> > >
> > > This was missed back in 2015 during an audit of incorrect sb_uuid
> > > usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs
> > > to validate against sb_meta_uuid") that fixed the magic32 buffers to
> > > validate against sb_meta_uuid instead of sb_uuid. It missed the
> > > magicda buffers....
> > >
> > > Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag")
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> >
> > Makes sense. Please send a slimmed down version of this to fstests@.
>
> <sigh> "...of a reproducer for this to fstests@."
I don't have one. All I have a pre-recovery metadump that has the
issue already in it, and I traced the mount and looked at log print
output to find the issue.
Eric, it seems like getting generic coverage of sb_meta_uuid !=
sb_uuid for at least the recoveryloop group tests would be a goof
thing for the QE team to add to fstests? Alternatively, randomly
decide whether to set a different UUID during each scratch_mkfs
call?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery
2021-12-16 0:17 [PATCH] xfs: check sb_meta_uuid for dabuf buffer recovery Dave Chinner
` (2 preceding siblings ...)
2021-12-16 1:10 ` Darrick J. Wong
@ 2021-12-24 7:07 ` Christoph Hellwig
3 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2021-12-24 7:07 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 8+ messages in thread