* [PATCH 0/6 V2] xfs: more verifications! @ 2018-06-05 6:24 Dave Chinner 2018-06-05 6:24 ` [PATCH 1/6] xfs: catch bad stripe alignment configurations Dave Chinner ` (5 more replies) 0 siblings, 6 replies; 36+ messages in thread From: Dave Chinner @ 2018-06-05 6:24 UTC (permalink / raw) To: linux-xfs Hi folks, Updated version of this patchset. This version adds COW extent size hint verification and moves the corruption error conversion to ESTALE out of xfs_imap_to_bp() and up to xfs_nfs_get_inode() where ESTALE should be returned. And, oh, I just realised I forgot the overflow checks in the btree record verification - I'll send an update for that in line. Cheers, Dave. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 1/6] xfs: catch bad stripe alignment configurations 2018-06-05 6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner @ 2018-06-05 6:24 ` Dave Chinner 2018-06-05 9:27 ` Carlos Maiolino 2018-06-05 6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner ` (4 subsequent siblings) 5 siblings, 1 reply; 36+ messages in thread From: Dave Chinner @ 2018-06-05 6:24 UTC (permalink / raw) To: linux-xfs From: Dave Chinner <dchinner@redhat.com> When stripe alignments are invalid, data alignment algorithms in the allocator may not work correctly. Ensure we catch superblocks with invalid stripe alignment setups at mount time. These data alignment mismatches are now detected at mount time like this: XFS (loop0): SB stripe unit sanity check failed XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00 XFSB............ 0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2 .27...F=...3.... 000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0 ................ 0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2 ................ 00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00 ................ 000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00 ...`.4.......... 00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19 ................ XFS (loop0): SB validate failed with error -117. And the mount fails. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --- fs/xfs/libxfs/xfs_sb.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 54c5e14fdfda..fe9448862667 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -266,6 +266,22 @@ xfs_mount_validate_sb( return -EFSCORRUPTED; } + if (sbp->sb_unit) { + if (!xfs_sb_version_hasdalign(sbp) || + sbp->sb_unit > sbp->sb_width || + (sbp->sb_width % sbp->sb_unit) != 0) { + xfs_notice(mp, "SB stripe unit sanity check failed"); + return -EFSCORRUPTED; + } + } else if (xfs_sb_version_hasdalign(sbp)) { + xfs_notice(mp, "SB stripe alignment sanity check failed"); + return -EFSCORRUPTED; + } else if (sbp->sb_width) { + xfs_notice(mp, "SB stripe width sanity check failed"); + return -EFSCORRUPTED; + } + + if (xfs_sb_version_hascrc(&mp->m_sb) && sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) { xfs_notice(mp, "v5 SB sanity check failed"); -- 2.17.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/6] xfs: catch bad stripe alignment configurations 2018-06-05 6:24 ` [PATCH 1/6] xfs: catch bad stripe alignment configurations Dave Chinner @ 2018-06-05 9:27 ` Carlos Maiolino 0 siblings, 0 replies; 36+ messages in thread From: Carlos Maiolino @ 2018-06-05 9:27 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 04:24:18PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > When stripe alignments are invalid, data alignment algorithms in the > allocator may not work correctly. Ensure we catch superblocks with > invalid stripe alignment setups at mount time. These data alignment > mismatches are now detected at mount time like this: > > XFS (loop0): SB stripe unit sanity check failed > XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff > XFS (loop0): Unmount and run xfs_repair > XFS (loop0): First 128 bytes of corrupted metadata buffer: > 0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00 XFSB............ > 0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > 00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2 .27...F=...3.... > 000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0 ................ > 0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2 ................ > 00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00 ................ > 000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00 ...`.4.......... > 00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19 ................ > XFS (loop0): SB validate failed with error -117. > > And the mount fails. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > --- > fs/xfs/libxfs/xfs_sb.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) Looks good, you can add if you want: Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > index 54c5e14fdfda..fe9448862667 100644 > --- a/fs/xfs/libxfs/xfs_sb.c > +++ b/fs/xfs/libxfs/xfs_sb.c > @@ -266,6 +266,22 @@ xfs_mount_validate_sb( > return -EFSCORRUPTED; > } > > + if (sbp->sb_unit) { > + if (!xfs_sb_version_hasdalign(sbp) || > + sbp->sb_unit > sbp->sb_width || > + (sbp->sb_width % sbp->sb_unit) != 0) { > + xfs_notice(mp, "SB stripe unit sanity check failed"); > + return -EFSCORRUPTED; > + } > + } else if (xfs_sb_version_hasdalign(sbp)) { > + xfs_notice(mp, "SB stripe alignment sanity check failed"); > + return -EFSCORRUPTED; > + } else if (sbp->sb_width) { > + xfs_notice(mp, "SB stripe width sanity check failed"); > + return -EFSCORRUPTED; > + } > + > + > if (xfs_sb_version_hascrc(&mp->m_sb) && > sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) { > xfs_notice(mp, "v5 SB sanity check failed"); > -- > 2.17.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Carlos ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-06-05 6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner 2018-06-05 6:24 ` [PATCH 1/6] xfs: catch bad stripe alignment configurations Dave Chinner @ 2018-06-05 6:24 ` Dave Chinner 2018-06-05 9:53 ` Carlos Maiolino ` (2 more replies) 2018-06-05 6:24 ` [PATCH 3/6] xfs: verify COW " Dave Chinner ` (3 subsequent siblings) 5 siblings, 3 replies; 36+ messages in thread From: Dave Chinner @ 2018-06-05 6:24 UTC (permalink / raw) To: linux-xfs From: Dave Chinner <dchinner@redhat.com> There are rules for vald extent size hints. We enforce them when applications set them, but fuzzers violate those rules and that screws us over. This results in alignment assertion failures when setting up allocations such as this in direct IO: XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 .... Call Trace: xfs_bmap_btalloc+0x415/0x910 xfs_bmapi_write+0x71c/0x12e0 xfs_iomap_write_direct+0x2a9/0x420 xfs_file_iomap_begin+0x4dc/0xa70 iomap_apply+0x43/0x100 iomap_file_buffered_write+0x62/0x90 xfs_file_buffered_aio_write+0xba/0x300 __vfs_write+0xd5/0x150 vfs_write+0xb6/0x180 ksys_write+0x45/0xa0 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe And from xfs_db: core.extsize = 10380288 Which is not an integer multiple of the block size, and so violates Rule #7 for setting extent size hints. Validate extent size hint rules in the inode verifier to catch this. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index f5fff1ccb61d..be197c91307b 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -385,6 +385,7 @@ xfs_dinode_verify( xfs_ino_t ino, struct xfs_dinode *dip) { + xfs_failaddr_t fa; uint16_t mode; uint16_t flags; uint64_t flags2; @@ -501,6 +502,12 @@ xfs_dinode_verify( return __this_address; } + /* extent size hint validation */ + fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize), + mode, be32_to_cpu(dip->di_flags)); + if (fa) + return fa; + /* only version 3 or greater inodes are extensively verified here */ if (dip->di_version < 3) return NULL; -- 2.17.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-06-05 6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner @ 2018-06-05 9:53 ` Carlos Maiolino 2018-06-05 22:56 ` Dave Chinner 2018-06-05 17:10 ` Darrick J. Wong 2018-07-24 6:39 ` Eric Sandeen 2 siblings, 1 reply; 36+ messages in thread From: Carlos Maiolino @ 2018-06-05 9:53 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > There are rules for vald extent size hints. We enforce them when > applications set them, but fuzzers violate those rules and that > screws us over. > > This results in alignment assertion failures when setting up > allocations such as this in direct IO: > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > .... > Call Trace: > xfs_bmap_btalloc+0x415/0x910 > xfs_bmapi_write+0x71c/0x12e0 > xfs_iomap_write_direct+0x2a9/0x420 > xfs_file_iomap_begin+0x4dc/0xa70 > iomap_apply+0x43/0x100 > iomap_file_buffered_write+0x62/0x90 > xfs_file_buffered_aio_write+0xba/0x300 > __vfs_write+0xd5/0x150 > vfs_write+0xb6/0x180 > ksys_write+0x45/0xa0 > do_syscall_64+0x5a/0x180 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > And from xfs_db: > > core.extsize = 10380288 > > Which is not an integer multiple of the block size, and so violates > Rule #7 for setting extent size hints. Validate extent size hint > rules in the inode verifier to catch this. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index f5fff1ccb61d..be197c91307b 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -385,6 +385,7 @@ xfs_dinode_verify( > xfs_ino_t ino, > struct xfs_dinode *dip) > { > + xfs_failaddr_t fa; Weren't we getting rid of typedefs? To be honest the typedef here gives more clarity to the code than void* directly, so, I'm ok with it anyway, I'm just curious is some typedefs are going to be kept. Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> > uint16_t mode; > uint16_t flags; > uint64_t flags2; > @@ -501,6 +502,12 @@ xfs_dinode_verify( > return __this_address; > } > > + /* extent size hint validation */ > + fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize), > + mode, be32_to_cpu(dip->di_flags)); > + if (fa) > + return fa; > + > /* only version 3 or greater inodes are extensively verified here */ > if (dip->di_version < 3) > return NULL; > -- > 2.17.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Carlos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-06-05 9:53 ` Carlos Maiolino @ 2018-06-05 22:56 ` Dave Chinner 0 siblings, 0 replies; 36+ messages in thread From: Dave Chinner @ 2018-06-05 22:56 UTC (permalink / raw) To: linux-xfs On Tue, Jun 05, 2018 at 11:53:59AM +0200, Carlos Maiolino wrote: > On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > There are rules for vald extent size hints. We enforce them when > > applications set them, but fuzzers violate those rules and that > > screws us over. > > > > This results in alignment assertion failures when setting up > > allocations such as this in direct IO: > > > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > > .... > > Call Trace: > > xfs_bmap_btalloc+0x415/0x910 > > xfs_bmapi_write+0x71c/0x12e0 > > xfs_iomap_write_direct+0x2a9/0x420 > > xfs_file_iomap_begin+0x4dc/0xa70 > > iomap_apply+0x43/0x100 > > iomap_file_buffered_write+0x62/0x90 > > xfs_file_buffered_aio_write+0xba/0x300 > > __vfs_write+0xd5/0x150 > > vfs_write+0xb6/0x180 > > ksys_write+0x45/0xa0 > > do_syscall_64+0x5a/0x180 > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > And from xfs_db: > > > > core.extsize = 10380288 > > > > Which is not an integer multiple of the block size, and so violates > > Rule #7 for setting extent size hints. Validate extent size hint > > rules in the inode verifier to catch this. > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > --- > > fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > > index f5fff1ccb61d..be197c91307b 100644 > > --- a/fs/xfs/libxfs/xfs_inode_buf.c > > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > > @@ -385,6 +385,7 @@ xfs_dinode_verify( > > xfs_ino_t ino, > > struct xfs_dinode *dip) > > { > > + xfs_failaddr_t fa; > > Weren't we getting rid of typedefs? Unneeded typedefs, yes. e.g. typedef struct foo { } foo_t; serve no useful purpose, so we get rid of them where appropriate. > To be honest the typedef here gives more > clarity to the code than void* directly, so, I'm ok with it anyway, I'm just > curious is some typedefs are going to be kept. Right, xfs_failaddr_t is a useful typedef - it tells us that this variable will hold an instruction pointer related to the failure that was detected, which is something a void * can't tell us. It's all about context :P Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-06-05 6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner 2018-06-05 9:53 ` Carlos Maiolino @ 2018-06-05 17:10 ` Darrick J. Wong 2018-06-07 16:16 ` Darrick J. Wong 2018-07-24 6:39 ` Eric Sandeen 2 siblings, 1 reply; 36+ messages in thread From: Darrick J. Wong @ 2018-06-05 17:10 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > There are rules for vald extent size hints. We enforce them when > applications set them, but fuzzers violate those rules and that > screws us over. > > This results in alignment assertion failures when setting up > allocations such as this in direct IO: > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > .... > Call Trace: > xfs_bmap_btalloc+0x415/0x910 > xfs_bmapi_write+0x71c/0x12e0 > xfs_iomap_write_direct+0x2a9/0x420 > xfs_file_iomap_begin+0x4dc/0xa70 > iomap_apply+0x43/0x100 > iomap_file_buffered_write+0x62/0x90 > xfs_file_buffered_aio_write+0xba/0x300 > __vfs_write+0xd5/0x150 > vfs_write+0xb6/0x180 > ksys_write+0x45/0xa0 > do_syscall_64+0x5a/0x180 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > And from xfs_db: > > core.extsize = 10380288 > > Which is not an integer multiple of the block size, and so violates > Rule #7 for setting extent size hints. Validate extent size hint > rules in the inode verifier to catch this. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> Looks ok modulo my comments in the next patch, Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --D > --- > fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index f5fff1ccb61d..be197c91307b 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -385,6 +385,7 @@ xfs_dinode_verify( > xfs_ino_t ino, > struct xfs_dinode *dip) > { > + xfs_failaddr_t fa; > uint16_t mode; > uint16_t flags; > uint64_t flags2; > @@ -501,6 +502,12 @@ xfs_dinode_verify( > return __this_address; > } > > + /* extent size hint validation */ > + fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize), > + mode, be32_to_cpu(dip->di_flags)); > + if (fa) > + return fa; > + > /* only version 3 or greater inodes are extensively verified here */ > if (dip->di_version < 3) > return NULL; > -- > 2.17.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-06-05 17:10 ` Darrick J. Wong @ 2018-06-07 16:16 ` Darrick J. Wong 2018-06-08 1:10 ` Dave Chinner 0 siblings, 1 reply; 36+ messages in thread From: Darrick J. Wong @ 2018-06-07 16:16 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 10:10:15AM -0700, Darrick J. Wong wrote: > On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > There are rules for vald extent size hints. We enforce them when > > applications set them, but fuzzers violate those rules and that > > screws us over. > > > > This results in alignment assertion failures when setting up > > allocations such as this in direct IO: > > > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > > .... > > Call Trace: > > xfs_bmap_btalloc+0x415/0x910 > > xfs_bmapi_write+0x71c/0x12e0 > > xfs_iomap_write_direct+0x2a9/0x420 > > xfs_file_iomap_begin+0x4dc/0xa70 > > iomap_apply+0x43/0x100 > > iomap_file_buffered_write+0x62/0x90 > > xfs_file_buffered_aio_write+0xba/0x300 > > __vfs_write+0xd5/0x150 > > vfs_write+0xb6/0x180 > > ksys_write+0x45/0xa0 > > do_syscall_64+0x5a/0x180 > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > And from xfs_db: > > > > core.extsize = 10380288 > > > > Which is not an integer multiple of the block size, and so violates > > Rule #7 for setting extent size hints. Validate extent size hint > > rules in the inode verifier to catch this. > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > Looks ok modulo my comments in the next patch, > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> FWIW when I applied this to xfsprogs I saw an xfs/033 regression: Phase 6 - check inode connectivity... reinitializing root directory Metadata corruption detected at 0x5555555c60e0, inode 0x80 dinode fatal error -- could not iget root inode -- error - 117 [Inferior 1 (process 1178) exited with code 01] (gdb) l *(0x5555555c60e0) 0x5555555c60e0 is in libxfs_inode_validate_extsize (xfs_inode_buf.c:729). We fail the inode verifier while trying to _iget the root inode so that we can reinitialize it; I suspect phase 3 is going to need to check the extent size hints and clear them. --D > --D > > > --- > > fs/xfs/libxfs/xfs_inode_buf.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > > index f5fff1ccb61d..be197c91307b 100644 > > --- a/fs/xfs/libxfs/xfs_inode_buf.c > > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > > @@ -385,6 +385,7 @@ xfs_dinode_verify( > > xfs_ino_t ino, > > struct xfs_dinode *dip) > > { > > + xfs_failaddr_t fa; > > uint16_t mode; > > uint16_t flags; > > uint64_t flags2; > > @@ -501,6 +502,12 @@ xfs_dinode_verify( > > return __this_address; > > } > > > > + /* extent size hint validation */ > > + fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize), > > + mode, be32_to_cpu(dip->di_flags)); > > + if (fa) > > + return fa; > > + > > /* only version 3 or greater inodes are extensively verified here */ > > if (dip->di_version < 3) > > return NULL; > > -- > > 2.17.0 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-06-07 16:16 ` Darrick J. Wong @ 2018-06-08 1:10 ` Dave Chinner 2018-06-08 1:23 ` Darrick J. Wong 0 siblings, 1 reply; 36+ messages in thread From: Dave Chinner @ 2018-06-08 1:10 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs On Thu, Jun 07, 2018 at 09:16:31AM -0700, Darrick J. Wong wrote: > On Tue, Jun 05, 2018 at 10:10:15AM -0700, Darrick J. Wong wrote: > > On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > There are rules for vald extent size hints. We enforce them when > > > applications set them, but fuzzers violate those rules and that > > > screws us over. > > > > > > This results in alignment assertion failures when setting up > > > allocations such as this in direct IO: > > > > > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > > > .... > > > Call Trace: > > > xfs_bmap_btalloc+0x415/0x910 > > > xfs_bmapi_write+0x71c/0x12e0 > > > xfs_iomap_write_direct+0x2a9/0x420 > > > xfs_file_iomap_begin+0x4dc/0xa70 > > > iomap_apply+0x43/0x100 > > > iomap_file_buffered_write+0x62/0x90 > > > xfs_file_buffered_aio_write+0xba/0x300 > > > __vfs_write+0xd5/0x150 > > > vfs_write+0xb6/0x180 > > > ksys_write+0x45/0xa0 > > > do_syscall_64+0x5a/0x180 > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > > > And from xfs_db: > > > > > > core.extsize = 10380288 > > > > > > Which is not an integer multiple of the block size, and so violates > > > Rule #7 for setting extent size hints. Validate extent size hint > > > rules in the inode verifier to catch this. > > > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > > > Looks ok modulo my comments in the next patch, > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > FWIW when I applied this to xfsprogs I saw an xfs/033 regression: > > Phase 6 - check inode connectivity... > reinitializing root directory > Metadata corruption detected at 0x5555555c60e0, inode 0x80 dinode > > fatal error -- could not iget root inode -- error - 117 > [Inferior 1 (process 1178) exited with code 01] > (gdb) l *(0x5555555c60e0) > 0x5555555c60e0 is in libxfs_inode_validate_extsize (xfs_inode_buf.c:729). > > We fail the inode verifier while trying to _iget the root inode so that > we can reinitialize it; I suspect phase 3 is going to need to check the > extent size hints and clear them. I'm actually quite happy to see that the continual process of hardening the kernel verifiers has got to the point where we are starting to expose deficiencies in xfs_repair. Can I wait for the xfsprogs libxfs-4.18-sync branch to pick up these verifier changes before looking at what repair needs to do to avoid it? I don't want to do a forced context switch to debugging/enhancing userspace code right at this moment.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-06-08 1:10 ` Dave Chinner @ 2018-06-08 1:23 ` Darrick J. Wong 2018-06-08 2:23 ` Eric Sandeen 0 siblings, 1 reply; 36+ messages in thread From: Darrick J. Wong @ 2018-06-08 1:23 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Fri, Jun 08, 2018 at 11:10:39AM +1000, Dave Chinner wrote: > On Thu, Jun 07, 2018 at 09:16:31AM -0700, Darrick J. Wong wrote: > > On Tue, Jun 05, 2018 at 10:10:15AM -0700, Darrick J. Wong wrote: > > > On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote: > > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > > > There are rules for vald extent size hints. We enforce them when > > > > applications set them, but fuzzers violate those rules and that > > > > screws us over. > > > > > > > > This results in alignment assertion failures when setting up > > > > allocations such as this in direct IO: > > > > > > > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > > > > .... > > > > Call Trace: > > > > xfs_bmap_btalloc+0x415/0x910 > > > > xfs_bmapi_write+0x71c/0x12e0 > > > > xfs_iomap_write_direct+0x2a9/0x420 > > > > xfs_file_iomap_begin+0x4dc/0xa70 > > > > iomap_apply+0x43/0x100 > > > > iomap_file_buffered_write+0x62/0x90 > > > > xfs_file_buffered_aio_write+0xba/0x300 > > > > __vfs_write+0xd5/0x150 > > > > vfs_write+0xb6/0x180 > > > > ksys_write+0x45/0xa0 > > > > do_syscall_64+0x5a/0x180 > > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > > > > > And from xfs_db: > > > > > > > > core.extsize = 10380288 > > > > > > > > Which is not an integer multiple of the block size, and so violates > > > > Rule #7 for setting extent size hints. Validate extent size hint > > > > rules in the inode verifier to catch this. > > > > > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > > > > > Looks ok modulo my comments in the next patch, > > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > > > FWIW when I applied this to xfsprogs I saw an xfs/033 regression: > > > > Phase 6 - check inode connectivity... > > reinitializing root directory > > Metadata corruption detected at 0x5555555c60e0, inode 0x80 dinode > > > > fatal error -- could not iget root inode -- error - 117 > > [Inferior 1 (process 1178) exited with code 01] > > (gdb) l *(0x5555555c60e0) > > 0x5555555c60e0 is in libxfs_inode_validate_extsize (xfs_inode_buf.c:729). > > > > We fail the inode verifier while trying to _iget the root inode so that > > we can reinitialize it; I suspect phase 3 is going to need to check the > > extent size hints and clear them. > > I'm actually quite happy to see that the continual process of > hardening the kernel verifiers has got to the point where we are > starting to expose deficiencies in xfs_repair. > > Can I wait for the xfsprogs libxfs-4.18-sync branch to pick up these > verifier changes before looking at what repair needs to do to avoid > it? I don't want to do a forced context switch to > debugging/enhancing userspace code right at this moment.... That's ultimately up to Eric, but since fixing it is nontrivial surgery on xfs_repair (and the verifier update patch doesn't itself break the build) I'd be fine with fixing it after the 4.18 sync goes in. --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-06-08 1:23 ` Darrick J. Wong @ 2018-06-08 2:23 ` Eric Sandeen 0 siblings, 0 replies; 36+ messages in thread From: Eric Sandeen @ 2018-06-08 2:23 UTC (permalink / raw) To: Darrick J. Wong, Dave Chinner; +Cc: linux-xfs On 6/7/18 8:23 PM, Darrick J. Wong wrote: > On Fri, Jun 08, 2018 at 11:10:39AM +1000, Dave Chinner wrote: >> On Thu, Jun 07, 2018 at 09:16:31AM -0700, Darrick J. Wong wrote: >>> On Tue, Jun 05, 2018 at 10:10:15AM -0700, Darrick J. Wong wrote: >>>> On Tue, Jun 05, 2018 at 04:24:19PM +1000, Dave Chinner wrote: >>>>> From: Dave Chinner <dchinner@redhat.com> >>>>> >>>>> There are rules for vald extent size hints. We enforce them when >>>>> applications set them, but fuzzers violate those rules and that >>>>> screws us over. >>>>> >>>>> This results in alignment assertion failures when setting up >>>>> allocations such as this in direct IO: >>>>> >>>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 >>>>> .... >>>>> Call Trace: >>>>> xfs_bmap_btalloc+0x415/0x910 >>>>> xfs_bmapi_write+0x71c/0x12e0 >>>>> xfs_iomap_write_direct+0x2a9/0x420 >>>>> xfs_file_iomap_begin+0x4dc/0xa70 >>>>> iomap_apply+0x43/0x100 >>>>> iomap_file_buffered_write+0x62/0x90 >>>>> xfs_file_buffered_aio_write+0xba/0x300 >>>>> __vfs_write+0xd5/0x150 >>>>> vfs_write+0xb6/0x180 >>>>> ksys_write+0x45/0xa0 >>>>> do_syscall_64+0x5a/0x180 >>>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>>> >>>>> And from xfs_db: >>>>> >>>>> core.extsize = 10380288 >>>>> >>>>> Which is not an integer multiple of the block size, and so violates >>>>> Rule #7 for setting extent size hints. Validate extent size hint >>>>> rules in the inode verifier to catch this. >>>>> >>>>> Signed-off-by: Dave Chinner <dchinner@redhat.com> >>>> >>>> Looks ok modulo my comments in the next patch, >>>> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> >>> >>> FWIW when I applied this to xfsprogs I saw an xfs/033 regression: >>> >>> Phase 6 - check inode connectivity... >>> reinitializing root directory >>> Metadata corruption detected at 0x5555555c60e0, inode 0x80 dinode >>> >>> fatal error -- could not iget root inode -- error - 117 >>> [Inferior 1 (process 1178) exited with code 01] >>> (gdb) l *(0x5555555c60e0) >>> 0x5555555c60e0 is in libxfs_inode_validate_extsize (xfs_inode_buf.c:729). >>> >>> We fail the inode verifier while trying to _iget the root inode so that >>> we can reinitialize it; I suspect phase 3 is going to need to check the >>> extent size hints and clear them. >> >> I'm actually quite happy to see that the continual process of >> hardening the kernel verifiers has got to the point where we are >> starting to expose deficiencies in xfs_repair. >> >> Can I wait for the xfsprogs libxfs-4.18-sync branch to pick up these >> verifier changes before looking at what repair needs to do to avoid >> it? I don't want to do a forced context switch to >> debugging/enhancing userspace code right at this moment.... > > That's ultimately up to Eric, but since fixing it is nontrivial surgery > on xfs_repair (and the verifier update patch doesn't itself break the > build) I'd be fine with fixing it after the 4.18 sync goes in. > > --D I think that getting it into the kernel and even into the xfsprogs/libxfs tree for 4.18 is fine as long as we are sure a repair fix will be forthcoming before 4.18 is done as long as it doesn't blow up regression testing /too/ much... This kernel<->libxfs<->application coordination can get a bit chicken-and-eggy sometimes. I guess this kernel change means that only a latest xfs_repair will make a latest kernel happy; I guess that's fairly normal. -Eric ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-06-05 6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner 2018-06-05 9:53 ` Carlos Maiolino 2018-06-05 17:10 ` Darrick J. Wong @ 2018-07-24 6:39 ` Eric Sandeen 2018-07-24 16:43 ` Darrick J. Wong 2 siblings, 1 reply; 36+ messages in thread From: Eric Sandeen @ 2018-07-24 6:39 UTC (permalink / raw) To: Dave Chinner, linux-xfs On 6/4/18 11:24 PM, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > There are rules for vald extent size hints. We enforce them when > applications set them, but fuzzers violate those rules and that > screws us over. > > This results in alignment assertion failures when setting up > allocations such as this in direct IO: > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > .... > Call Trace: > xfs_bmap_btalloc+0x415/0x910 > xfs_bmapi_write+0x71c/0x12e0 > xfs_iomap_write_direct+0x2a9/0x420 > xfs_file_iomap_begin+0x4dc/0xa70 > iomap_apply+0x43/0x100 > iomap_file_buffered_write+0x62/0x90 > xfs_file_buffered_aio_write+0xba/0x300 > __vfs_write+0xd5/0x150 > vfs_write+0xb6/0x180 > ksys_write+0x45/0xa0 > do_syscall_64+0x5a/0x180 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > And from xfs_db: > > core.extsize = 10380288 > > Which is not an integer multiple of the block size, and so violates > Rule #7 for setting extent size hints. Validate extent size hint > rules in the inode verifier to catch this. So, I think that if I do: # mkfs.xfs -f -m crc=0 $TEST_DEV # ./check xfs/229 # ./check xfs/229 I trip the verifier, because I end up with freed inodes on disk with an extent size hints but zeroed flags. xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize says if extsize !=0 and the hint flag is set, it fails Anyone else see this? (crc=0 needed because that causes us to actually reread the inode chunks in xfs_iread vs. /* shortcut IO on inode allocation if possible */ -Eric ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-07-24 6:39 ` Eric Sandeen @ 2018-07-24 16:43 ` Darrick J. Wong 2018-08-20 15:06 ` Brian Foster 0 siblings, 1 reply; 36+ messages in thread From: Darrick J. Wong @ 2018-07-24 16:43 UTC (permalink / raw) To: Eric Sandeen; +Cc: Dave Chinner, linux-xfs On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote: > On 6/4/18 11:24 PM, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > There are rules for vald extent size hints. We enforce them when > > applications set them, but fuzzers violate those rules and that > > screws us over. > > > > This results in alignment assertion failures when setting up > > allocations such as this in direct IO: > > > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > > .... > > Call Trace: > > xfs_bmap_btalloc+0x415/0x910 > > xfs_bmapi_write+0x71c/0x12e0 > > xfs_iomap_write_direct+0x2a9/0x420 > > xfs_file_iomap_begin+0x4dc/0xa70 > > iomap_apply+0x43/0x100 > > iomap_file_buffered_write+0x62/0x90 > > xfs_file_buffered_aio_write+0xba/0x300 > > __vfs_write+0xd5/0x150 > > vfs_write+0xb6/0x180 > > ksys_write+0x45/0xa0 > > do_syscall_64+0x5a/0x180 > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > And from xfs_db: > > > > core.extsize = 10380288 > > > > Which is not an integer multiple of the block size, and so violates > > Rule #7 for setting extent size hints. Validate extent size hint > > rules in the inode verifier to catch this. > > So, I think that if I do: > > # mkfs.xfs -f -m crc=0 $TEST_DEV > # ./check xfs/229 > # ./check xfs/229 > > I trip the verifier, because I end up with freed inodes on disk with an > extent size hints but zeroed flags. > > xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize > says if extsize !=0 and the hint flag is set, it fails > > Anyone else see this? Yeah, I think I just hit this on the TEST_DEV in xfs/242. git blame says I lifted the code from the scrub code, and I probably wrote the code having read the ioctl code (which clears the extsize field if the iflag isn't set). > (crc=0 needed because that causes us to actually reread the inode chunks > in xfs_iread vs. /* shortcut IO on inode allocation if possible */ Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when creating an inode. It looks like we do that (instead of zeroing the incore inode and setting a random i_generation) to preserve the existing generation number? In any case, it's pretty clear that kernels have been writing out freed inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some number) so we clearly can't have that in the verifier. It looks like we only examine di_extsize if either EXTSZ flag are set, so it's not causing incorrect behavior. Maybe it can be a preening fix in scrub/repair. --D > -Eric > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-07-24 16:43 ` Darrick J. Wong @ 2018-08-20 15:06 ` Brian Foster 2018-08-20 15:27 ` Eric Sandeen 0 siblings, 1 reply; 36+ messages in thread From: Brian Foster @ 2018-08-20 15:06 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Eric Sandeen, Dave Chinner, linux-xfs On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote: > On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote: > > On 6/4/18 11:24 PM, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > There are rules for vald extent size hints. We enforce them when > > > applications set them, but fuzzers violate those rules and that > > > screws us over. > > > > > > This results in alignment assertion failures when setting up > > > allocations such as this in direct IO: > > > > > > XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > > > .... > > > Call Trace: > > > xfs_bmap_btalloc+0x415/0x910 > > > xfs_bmapi_write+0x71c/0x12e0 > > > xfs_iomap_write_direct+0x2a9/0x420 > > > xfs_file_iomap_begin+0x4dc/0xa70 > > > iomap_apply+0x43/0x100 > > > iomap_file_buffered_write+0x62/0x90 > > > xfs_file_buffered_aio_write+0xba/0x300 > > > __vfs_write+0xd5/0x150 > > > vfs_write+0xb6/0x180 > > > ksys_write+0x45/0xa0 > > > do_syscall_64+0x5a/0x180 > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > > > And from xfs_db: > > > > > > core.extsize = 10380288 > > > > > > Which is not an integer multiple of the block size, and so violates > > > Rule #7 for setting extent size hints. Validate extent size hint > > > rules in the inode verifier to catch this. > > > > So, I think that if I do: > > > > # mkfs.xfs -f -m crc=0 $TEST_DEV > > # ./check xfs/229 > > # ./check xfs/229 > > > > I trip the verifier, because I end up with freed inodes on disk with an > > extent size hints but zeroed flags. > > > > xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize > > says if extsize !=0 and the hint flag is set, it fails > > > > Anyone else see this? > > Yeah, I think I just hit this on the TEST_DEV in xfs/242. > > git blame says I lifted the code from the scrub code, and I probably > wrote the code having read the ioctl code (which clears the extsize > field if the iflag isn't set). > > > (crc=0 needed because that causes us to actually reread the inode chunks > > in xfs_iread vs. /* shortcut IO on inode allocation if possible */ > > Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when > creating an inode. It looks like we do that (instead of zeroing the > incore inode and setting a random i_generation) to preserve the existing > generation number? > > In any case, it's pretty clear that kernels have been writing out freed > inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some > number) so we clearly can't have that in the verifier. It looks like we > only examine di_extsize if either EXTSZ flag are set, so it's not > causing incorrect behavior. Maybe it can be a preening fix in > scrub/repair. > I just stumbled on this problem with xfs/229 that Eric reported. I'm confused by the comment above regarding this not causing incorrect behavior. This patch adds extsize verification to the inode verifier, which trips up the inode allocation code if the (free) on-disk inode still happens to have an old extsize hint set. This causes a filesystem shutdown where we'd otherwise subsequently reset di_extsize to zero as part of the inode allocation path (xfs_ialloc()). Shouldn't we filter out this particular (!hint && extsize != 0) check for free inodes (or something of that nature)? FWIW, I reproduce this with xfs/229 with crc=0 or crc=1+ikeep. Brian > --D > > > -Eric > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-08-20 15:06 ` Brian Foster @ 2018-08-20 15:27 ` Eric Sandeen 2018-08-20 15:36 ` Darrick J. Wong 0 siblings, 1 reply; 36+ messages in thread From: Eric Sandeen @ 2018-08-20 15:27 UTC (permalink / raw) To: Brian Foster, Darrick J. Wong; +Cc: Dave Chinner, linux-xfs On 8/20/18 10:06 AM, Brian Foster wrote: > On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote: >> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote: >>> On 6/4/18 11:24 PM, Dave Chinner wrote: >>>> From: Dave Chinner <dchinner@redhat.com> >>>> >>>> There are rules for vald extent size hints. We enforce them when >>>> applications set them, but fuzzers violate those rules and that >>>> screws us over. >>>> >>>> This results in alignment assertion failures when setting up >>>> allocations such as this in direct IO: >>>> >>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 >>>> .... >>>> Call Trace: >>>> xfs_bmap_btalloc+0x415/0x910 >>>> xfs_bmapi_write+0x71c/0x12e0 >>>> xfs_iomap_write_direct+0x2a9/0x420 >>>> xfs_file_iomap_begin+0x4dc/0xa70 >>>> iomap_apply+0x43/0x100 >>>> iomap_file_buffered_write+0x62/0x90 >>>> xfs_file_buffered_aio_write+0xba/0x300 >>>> __vfs_write+0xd5/0x150 >>>> vfs_write+0xb6/0x180 >>>> ksys_write+0x45/0xa0 >>>> do_syscall_64+0x5a/0x180 >>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>> >>>> And from xfs_db: >>>> >>>> core.extsize = 10380288 >>>> >>>> Which is not an integer multiple of the block size, and so violates >>>> Rule #7 for setting extent size hints. Validate extent size hint >>>> rules in the inode verifier to catch this. >>> >>> So, I think that if I do: >>> >>> # mkfs.xfs -f -m crc=0 $TEST_DEV >>> # ./check xfs/229 >>> # ./check xfs/229 >>> >>> I trip the verifier, because I end up with freed inodes on disk with an >>> extent size hints but zeroed flags. >>> >>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize >>> says if extsize !=0 and the hint flag is set, it fails >>> >>> Anyone else see this? >> >> Yeah, I think I just hit this on the TEST_DEV in xfs/242. >> >> git blame says I lifted the code from the scrub code, and I probably >> wrote the code having read the ioctl code (which clears the extsize >> field if the iflag isn't set). >> >>> (crc=0 needed because that causes us to actually reread the inode chunks >>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */ >> >> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when >> creating an inode. It looks like we do that (instead of zeroing the >> incore inode and setting a random i_generation) to preserve the existing >> generation number? >> >> In any case, it's pretty clear that kernels have been writing out freed >> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some >> number) so we clearly can't have that in the verifier. It looks like we >> only examine di_extsize if either EXTSZ flag are set, so it's not >> causing incorrect behavior. Maybe it can be a preening fix in >> scrub/repair. >> > > I just stumbled on this problem with xfs/229 that Eric reported. I'm > confused by the comment above regarding this not causing incorrect > behavior. I think Darrick meant that having a nonzero extent size hint on disk won't cause incorrect behavior because "we only examine di_extsize if either EXTSZ flag are set" -Eric ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-08-20 15:27 ` Eric Sandeen @ 2018-08-20 15:36 ` Darrick J. Wong 2018-08-20 15:59 ` Brian Foster 0 siblings, 1 reply; 36+ messages in thread From: Darrick J. Wong @ 2018-08-20 15:36 UTC (permalink / raw) To: Eric Sandeen; +Cc: Brian Foster, Dave Chinner, linux-xfs On Mon, Aug 20, 2018 at 10:27:42AM -0500, Eric Sandeen wrote: > > > On 8/20/18 10:06 AM, Brian Foster wrote: > > On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote: > >> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote: > >>> On 6/4/18 11:24 PM, Dave Chinner wrote: > >>>> From: Dave Chinner <dchinner@redhat.com> > >>>> > >>>> There are rules for vald extent size hints. We enforce them when > >>>> applications set them, but fuzzers violate those rules and that > >>>> screws us over. > >>>> > >>>> This results in alignment assertion failures when setting up > >>>> allocations such as this in direct IO: > >>>> > >>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > >>>> .... > >>>> Call Trace: > >>>> xfs_bmap_btalloc+0x415/0x910 > >>>> xfs_bmapi_write+0x71c/0x12e0 > >>>> xfs_iomap_write_direct+0x2a9/0x420 > >>>> xfs_file_iomap_begin+0x4dc/0xa70 > >>>> iomap_apply+0x43/0x100 > >>>> iomap_file_buffered_write+0x62/0x90 > >>>> xfs_file_buffered_aio_write+0xba/0x300 > >>>> __vfs_write+0xd5/0x150 > >>>> vfs_write+0xb6/0x180 > >>>> ksys_write+0x45/0xa0 > >>>> do_syscall_64+0x5a/0x180 > >>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe > >>>> > >>>> And from xfs_db: > >>>> > >>>> core.extsize = 10380288 > >>>> > >>>> Which is not an integer multiple of the block size, and so violates > >>>> Rule #7 for setting extent size hints. Validate extent size hint > >>>> rules in the inode verifier to catch this. > >>> > >>> So, I think that if I do: > >>> > >>> # mkfs.xfs -f -m crc=0 $TEST_DEV > >>> # ./check xfs/229 > >>> # ./check xfs/229 > >>> > >>> I trip the verifier, because I end up with freed inodes on disk with an > >>> extent size hints but zeroed flags. > >>> > >>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize > >>> says if extsize !=0 and the hint flag is set, it fails > >>> > >>> Anyone else see this? > >> > >> Yeah, I think I just hit this on the TEST_DEV in xfs/242. > >> > >> git blame says I lifted the code from the scrub code, and I probably > >> wrote the code having read the ioctl code (which clears the extsize > >> field if the iflag isn't set). > >> > >>> (crc=0 needed because that causes us to actually reread the inode chunks > >>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */ > >> > >> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when > >> creating an inode. It looks like we do that (instead of zeroing the > >> incore inode and setting a random i_generation) to preserve the existing > >> generation number? > >> > >> In any case, it's pretty clear that kernels have been writing out freed > >> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some > >> number) so we clearly can't have that in the verifier. It looks like we > >> only examine di_extsize if either EXTSZ flag are set, so it's not > >> causing incorrect behavior. Maybe it can be a preening fix in > >> scrub/repair. > >> > > > > I just stumbled on this problem with xfs/229 that Eric reported. I'm > > confused by the comment above regarding this not causing incorrect > > behavior. > > I think Darrick meant that having a nonzero extent size hint on disk > won't cause incorrect behavior because "we only examine di_extsize if > either EXTSZ flag are set" Yeah, he probably did. :) I think Brian's suggestion of if (i_mode != 0 && !hint && extsize != 0) barf_error(); sounds reasonable (having not tested that at all). --D > -Eric ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-08-20 15:36 ` Darrick J. Wong @ 2018-08-20 15:59 ` Brian Foster 2018-08-20 22:15 ` Dave Chinner 0 siblings, 1 reply; 36+ messages in thread From: Brian Foster @ 2018-08-20 15:59 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Eric Sandeen, Dave Chinner, linux-xfs On Mon, Aug 20, 2018 at 08:36:26AM -0700, Darrick J. Wong wrote: > On Mon, Aug 20, 2018 at 10:27:42AM -0500, Eric Sandeen wrote: > > > > > > On 8/20/18 10:06 AM, Brian Foster wrote: > > > On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote: > > >> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote: > > >>> On 6/4/18 11:24 PM, Dave Chinner wrote: > > >>>> From: Dave Chinner <dchinner@redhat.com> > > >>>> > > >>>> There are rules for vald extent size hints. We enforce them when > > >>>> applications set them, but fuzzers violate those rules and that > > >>>> screws us over. > > >>>> > > >>>> This results in alignment assertion failures when setting up > > >>>> allocations such as this in direct IO: > > >>>> > > >>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > > >>>> .... > > >>>> Call Trace: > > >>>> xfs_bmap_btalloc+0x415/0x910 > > >>>> xfs_bmapi_write+0x71c/0x12e0 > > >>>> xfs_iomap_write_direct+0x2a9/0x420 > > >>>> xfs_file_iomap_begin+0x4dc/0xa70 > > >>>> iomap_apply+0x43/0x100 > > >>>> iomap_file_buffered_write+0x62/0x90 > > >>>> xfs_file_buffered_aio_write+0xba/0x300 > > >>>> __vfs_write+0xd5/0x150 > > >>>> vfs_write+0xb6/0x180 > > >>>> ksys_write+0x45/0xa0 > > >>>> do_syscall_64+0x5a/0x180 > > >>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe > > >>>> > > >>>> And from xfs_db: > > >>>> > > >>>> core.extsize = 10380288 > > >>>> > > >>>> Which is not an integer multiple of the block size, and so violates > > >>>> Rule #7 for setting extent size hints. Validate extent size hint > > >>>> rules in the inode verifier to catch this. > > >>> > > >>> So, I think that if I do: > > >>> > > >>> # mkfs.xfs -f -m crc=0 $TEST_DEV > > >>> # ./check xfs/229 > > >>> # ./check xfs/229 > > >>> > > >>> I trip the verifier, because I end up with freed inodes on disk with an > > >>> extent size hints but zeroed flags. > > >>> > > >>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize > > >>> says if extsize !=0 and the hint flag is set, it fails > > >>> > > >>> Anyone else see this? > > >> > > >> Yeah, I think I just hit this on the TEST_DEV in xfs/242. > > >> > > >> git blame says I lifted the code from the scrub code, and I probably > > >> wrote the code having read the ioctl code (which clears the extsize > > >> field if the iflag isn't set). > > >> > > >>> (crc=0 needed because that causes us to actually reread the inode chunks > > >>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */ > > >> > > >> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when > > >> creating an inode. It looks like we do that (instead of zeroing the > > >> incore inode and setting a random i_generation) to preserve the existing > > >> generation number? > > >> > > >> In any case, it's pretty clear that kernels have been writing out freed > > >> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some > > >> number) so we clearly can't have that in the verifier. It looks like we > > >> only examine di_extsize if either EXTSZ flag are set, so it's not > > >> causing incorrect behavior. Maybe it can be a preening fix in > > >> scrub/repair. > > >> > > > > > > I just stumbled on this problem with xfs/229 that Eric reported. I'm > > > confused by the comment above regarding this not causing incorrect > > > behavior. > > > > I think Darrick meant that having a nonzero extent size hint on disk > > won't cause incorrect behavior because "we only examine di_extsize if > > either EXTSZ flag are set" > > Yeah, he probably did. :) > Got it, thanks. > I think Brian's suggestion of > > if (i_mode != 0 && !hint && extsize != 0) > barf_error(); > > sounds reasonable (having not tested that at all). > I'll run it through xfstests and get it posted if nothing else fails. BTW, do we have a similar issue with the cowextsize hint (assuming v5+ikeep)? It looks like it's cleared similarly in xfs_ialloc(), but I'm not sure if it's cleared somewhere else on free... Brian > --D > > > -Eric ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-08-20 15:59 ` Brian Foster @ 2018-08-20 22:15 ` Dave Chinner 2018-08-21 10:56 ` Brian Foster 0 siblings, 1 reply; 36+ messages in thread From: Dave Chinner @ 2018-08-20 22:15 UTC (permalink / raw) To: Brian Foster; +Cc: Darrick J. Wong, Eric Sandeen, linux-xfs On Mon, Aug 20, 2018 at 11:59:18AM -0400, Brian Foster wrote: > On Mon, Aug 20, 2018 at 08:36:26AM -0700, Darrick J. Wong wrote: > > On Mon, Aug 20, 2018 at 10:27:42AM -0500, Eric Sandeen wrote: > > > On 8/20/18 10:06 AM, Brian Foster wrote: > > > > On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote: > > > >> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote: > > > >>> On 6/4/18 11:24 PM, Dave Chinner wrote: > > > >>>> From: Dave Chinner <dchinner@redhat.com> > > > >>>> > > > >>>> There are rules for vald extent size hints. We enforce them when > > > >>>> applications set them, but fuzzers violate those rules and that > > > >>>> screws us over. > > > >>>> > > > >>>> This results in alignment assertion failures when setting up > > > >>>> allocations such as this in direct IO: > > > >>>> > > > >>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > > > >>>> .... > > > >>>> Call Trace: > > > >>>> xfs_bmap_btalloc+0x415/0x910 > > > >>>> xfs_bmapi_write+0x71c/0x12e0 > > > >>>> xfs_iomap_write_direct+0x2a9/0x420 > > > >>>> xfs_file_iomap_begin+0x4dc/0xa70 > > > >>>> iomap_apply+0x43/0x100 > > > >>>> iomap_file_buffered_write+0x62/0x90 > > > >>>> xfs_file_buffered_aio_write+0xba/0x300 > > > >>>> __vfs_write+0xd5/0x150 > > > >>>> vfs_write+0xb6/0x180 > > > >>>> ksys_write+0x45/0xa0 > > > >>>> do_syscall_64+0x5a/0x180 > > > >>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > >>>> > > > >>>> And from xfs_db: > > > >>>> > > > >>>> core.extsize = 10380288 > > > >>>> > > > >>>> Which is not an integer multiple of the block size, and so violates > > > >>>> Rule #7 for setting extent size hints. Validate extent size hint > > > >>>> rules in the inode verifier to catch this. > > > >>> > > > >>> So, I think that if I do: > > > >>> > > > >>> # mkfs.xfs -f -m crc=0 $TEST_DEV > > > >>> # ./check xfs/229 > > > >>> # ./check xfs/229 > > > >>> > > > >>> I trip the verifier, because I end up with freed inodes on disk with an > > > >>> extent size hints but zeroed flags. > > > >>> > > > >>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize > > > >>> says if extsize !=0 and the hint flag is set, it fails > > > >>> > > > >>> Anyone else see this? > > > >> > > > >> Yeah, I think I just hit this on the TEST_DEV in xfs/242. > > > >> > > > >> git blame says I lifted the code from the scrub code, and I probably > > > >> wrote the code having read the ioctl code (which clears the extsize > > > >> field if the iflag isn't set). > > > >> > > > >>> (crc=0 needed because that causes us to actually reread the inode chunks > > > >>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */ > > > >> > > > >> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when > > > >> creating an inode. It looks like we do that (instead of zeroing the > > > >> incore inode and setting a random i_generation) to preserve the existing > > > >> generation number? > > > >> > > > >> In any case, it's pretty clear that kernels have been writing out freed > > > >> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some > > > >> number) so we clearly can't have that in the verifier. It looks like we > > > >> only examine di_extsize if either EXTSZ flag are set, so it's not > > > >> causing incorrect behavior. Maybe it can be a preening fix in > > > >> scrub/repair. > > > >> > > > > > > > > I just stumbled on this problem with xfs/229 that Eric reported. I'm > > > > confused by the comment above regarding this not causing incorrect > > > > behavior. > > > > > > I think Darrick meant that having a nonzero extent size hint on disk > > > won't cause incorrect behavior because "we only examine di_extsize if > > > either EXTSZ flag are set" > > > > Yeah, he probably did. :) > > > > Got it, thanks. > > > I think Brian's suggestion of > > > > if (i_mode != 0 && !hint && extsize != 0) > > barf_error(); > > > > sounds reasonable (having not tested that at all). > > > > I'll run it through xfstests and get it posted if nothing else fails. > > BTW, do we have a similar issue with the cowextsize hint (assuming > v5+ikeep)? It looks like it's cleared similarly in xfs_ialloc(), but I'm > not sure if it's cleared somewhere else on free... We should clear them on free now, so that we can draw a line in the sand for when we can have verifiers check it. e.g. when the next feature bit gets introduced, filesystems with that feature bit set can also verify the extent size hints are zero on freed inodes because we know that kernels supporting that feature always zero them on free.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-08-20 22:15 ` Dave Chinner @ 2018-08-21 10:56 ` Brian Foster 2018-08-22 0:41 ` Dave Chinner 0 siblings, 1 reply; 36+ messages in thread From: Brian Foster @ 2018-08-21 10:56 UTC (permalink / raw) To: Dave Chinner; +Cc: Darrick J. Wong, Eric Sandeen, linux-xfs On Tue, Aug 21, 2018 at 08:15:06AM +1000, Dave Chinner wrote: > On Mon, Aug 20, 2018 at 11:59:18AM -0400, Brian Foster wrote: > > On Mon, Aug 20, 2018 at 08:36:26AM -0700, Darrick J. Wong wrote: > > > On Mon, Aug 20, 2018 at 10:27:42AM -0500, Eric Sandeen wrote: > > > > On 8/20/18 10:06 AM, Brian Foster wrote: > > > > > On Tue, Jul 24, 2018 at 09:43:46AM -0700, Darrick J. Wong wrote: > > > > >> On Mon, Jul 23, 2018 at 11:39:53PM -0700, Eric Sandeen wrote: > > > > >>> On 6/4/18 11:24 PM, Dave Chinner wrote: > > > > >>>> From: Dave Chinner <dchinner@redhat.com> > > > > >>>> > > > > >>>> There are rules for vald extent size hints. We enforce them when > > > > >>>> applications set them, but fuzzers violate those rules and that > > > > >>>> screws us over. > > > > >>>> > > > > >>>> This results in alignment assertion failures when setting up > > > > >>>> allocations such as this in direct IO: > > > > >>>> > > > > >>>> XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 > > > > >>>> .... > > > > >>>> Call Trace: > > > > >>>> xfs_bmap_btalloc+0x415/0x910 > > > > >>>> xfs_bmapi_write+0x71c/0x12e0 > > > > >>>> xfs_iomap_write_direct+0x2a9/0x420 > > > > >>>> xfs_file_iomap_begin+0x4dc/0xa70 > > > > >>>> iomap_apply+0x43/0x100 > > > > >>>> iomap_file_buffered_write+0x62/0x90 > > > > >>>> xfs_file_buffered_aio_write+0xba/0x300 > > > > >>>> __vfs_write+0xd5/0x150 > > > > >>>> vfs_write+0xb6/0x180 > > > > >>>> ksys_write+0x45/0xa0 > > > > >>>> do_syscall_64+0x5a/0x180 > > > > >>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > >>>> > > > > >>>> And from xfs_db: > > > > >>>> > > > > >>>> core.extsize = 10380288 > > > > >>>> > > > > >>>> Which is not an integer multiple of the block size, and so violates > > > > >>>> Rule #7 for setting extent size hints. Validate extent size hint > > > > >>>> rules in the inode verifier to catch this. > > > > >>> > > > > >>> So, I think that if I do: > > > > >>> > > > > >>> # mkfs.xfs -f -m crc=0 $TEST_DEV > > > > >>> # ./check xfs/229 > > > > >>> # ./check xfs/229 > > > > >>> > > > > >>> I trip the verifier, because I end up with freed inodes on disk with an > > > > >>> extent size hints but zeroed flags. > > > > >>> > > > > >>> xfs_ifree sets di_flags = 0 but doesn't clear di_extsize; xfs_inode_validate_extsize > > > > >>> says if extsize !=0 and the hint flag is set, it fails > > > > >>> > > > > >>> Anyone else see this? > > > > >> > > > > >> Yeah, I think I just hit this on the TEST_DEV in xfs/242. > > > > >> > > > > >> git blame says I lifted the code from the scrub code, and I probably > > > > >> wrote the code having read the ioctl code (which clears the extsize > > > > >> field if the iflag isn't set). > > > > >> > > > > >>> (crc=0 needed because that causes us to actually reread the inode chunks > > > > >>> in xfs_iread vs. /* shortcut IO on inode allocation if possible */ > > > > >> > > > > >> Hmmm, so a v5 fs mounted with ikeep will also read an inode chunk when > > > > >> creating an inode. It looks like we do that (instead of zeroing the > > > > >> incore inode and setting a random i_generation) to preserve the existing > > > > >> generation number? > > > > >> > > > > >> In any case, it's pretty clear that kernels have been writing out freed > > > > >> inode cores with di_mode == 0, di_flags == 0, and di_extsize == (some > > > > >> number) so we clearly can't have that in the verifier. It looks like we > > > > >> only examine di_extsize if either EXTSZ flag are set, so it's not > > > > >> causing incorrect behavior. Maybe it can be a preening fix in > > > > >> scrub/repair. > > > > >> > > > > > > > > > > I just stumbled on this problem with xfs/229 that Eric reported. I'm > > > > > confused by the comment above regarding this not causing incorrect > > > > > behavior. > > > > > > > > I think Darrick meant that having a nonzero extent size hint on disk > > > > won't cause incorrect behavior because "we only examine di_extsize if > > > > either EXTSZ flag are set" > > > > > > Yeah, he probably did. :) > > > > > > > Got it, thanks. > > > > > I think Brian's suggestion of > > > > > > if (i_mode != 0 && !hint && extsize != 0) > > > barf_error(); > > > > > > sounds reasonable (having not tested that at all). > > > > > > > I'll run it through xfstests and get it posted if nothing else fails. > > > > BTW, do we have a similar issue with the cowextsize hint (assuming > > v5+ikeep)? It looks like it's cleared similarly in xfs_ialloc(), but I'm > > not sure if it's cleared somewhere else on free... > I should note for the list that we've since determined this was already fixed in v4.18 [1]. The patch ended up in a common base branch between what is used for upstream pull requests and XFS' for-next, being left out of the latter just by accident. [1] d4a34e1655 ("xfs: properly handle free inodes in extent hint validators") > We should clear them on free now, so that we can draw a line in the > sand for when we can have verifiers check it. e.g. when the next > feature bit gets introduced, filesystems with that feature bit set > can also verify the extent size hints are zero on freed inodes > because we know that kernels supporting that feature always zero > them on free.... > That seems fine (and harmless) to me if the goal is ultimately to have this content clear on-disk. It keeps things consistent for verifiers, scrub, repair, etc. to not have some bits with required initialized values and others where we need to accommodate stale data. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier 2018-08-21 10:56 ` Brian Foster @ 2018-08-22 0:41 ` Dave Chinner 0 siblings, 0 replies; 36+ messages in thread From: Dave Chinner @ 2018-08-22 0:41 UTC (permalink / raw) To: Brian Foster; +Cc: Darrick J. Wong, Eric Sandeen, linux-xfs On Tue, Aug 21, 2018 at 06:56:45AM -0400, Brian Foster wrote: > On Tue, Aug 21, 2018 at 08:15:06AM +1000, Dave Chinner wrote: > > We should clear them on free now, so that we can draw a line in the > > sand for when we can have verifiers check it. e.g. when the next > > feature bit gets introduced, filesystems with that feature bit set > > can also verify the extent size hints are zero on freed inodes > > because we know that kernels supporting that feature always zero > > them on free.... > > > > That seems fine (and harmless) to me if the goal is ultimately to have > this content clear on-disk. It keeps things consistent for verifiers, > scrub, repair, etc. to not have some bits with required initialized > values and others where we need to accommodate stale data. *nod* One of my original goals for all the online repair work was to ensure that everything on disk was clean, sanitised and not ambiguous in any way. The verifiers are a mechanism that helps us identify things that aren't clean or sanitised, and working out how to verify those things identifies ambiguities we need to fix. Such as this one with freed inodes. As I said to Eric on #xfs: <dchinner_> We're well beyond the point where we trust a single field in an on-disk structure to be the One True Source of Truth <dchinner_> we want everything on disk to be correct and clean <dchinner_> so we can make decisions like this correctly via multi-field verifications -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 3/6] xfs: verify COW extent size hint is valid in inode verifier 2018-06-05 6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner 2018-06-05 6:24 ` [PATCH 1/6] xfs: catch bad stripe alignment configurations Dave Chinner 2018-06-05 6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner @ 2018-06-05 6:24 ` Dave Chinner 2018-06-05 10:00 ` Carlos Maiolino 2018-06-05 17:09 ` Darrick J. Wong 2018-06-05 6:24 ` [PATCH 4/6] xfs: validate btree records on retreival Dave Chinner ` (2 subsequent siblings) 5 siblings, 2 replies; 36+ messages in thread From: Dave Chinner @ 2018-06-05 6:24 UTC (permalink / raw) To: linux-xfs From: Dave Chinner <dchinner@redhat.com> There are rules for vald extent size hints. We enforce them when applications set them, but fuzzers violate those rules and that screws us over. Validate COW extent size hint rules in the inode verifier to catch this. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/libxfs/xfs_inode_buf.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index be197c91307b..ea64be7cbd98 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -504,7 +504,7 @@ xfs_dinode_verify( /* extent size hint validation */ fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize), - mode, be32_to_cpu(dip->di_flags)); + mode, flags); if (fa) return fa; @@ -516,7 +516,7 @@ xfs_dinode_verify( /* don't allow reflink/cowextsize if we don't have reflink */ if ((flags2 & (XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE)) && - !xfs_sb_version_hasreflink(&mp->m_sb)) + !xfs_sb_version_hasreflink(&mp->m_sb)) return __this_address; /* only regular files get reflink */ @@ -531,6 +531,12 @@ xfs_dinode_verify( if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags2 & XFS_DIFLAG2_DAX)) return __this_address; + /* COW extent size hint validation */ + fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize), + mode, flags, flags2); + if (fa) + return fa; + return NULL; } -- 2.17.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 3/6] xfs: verify COW extent size hint is valid in inode verifier 2018-06-05 6:24 ` [PATCH 3/6] xfs: verify COW " Dave Chinner @ 2018-06-05 10:00 ` Carlos Maiolino 2018-06-05 17:09 ` Darrick J. Wong 1 sibling, 0 replies; 36+ messages in thread From: Carlos Maiolino @ 2018-06-05 10:00 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 04:24:20PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > There are rules for vald extent size hints. We enforce them when > applications set them, but fuzzers violate those rules and that > screws us over. Validate COW extent size hint rules in the inode > verifier to catch this. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/libxfs/xfs_inode_buf.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index be197c91307b..ea64be7cbd98 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -504,7 +504,7 @@ xfs_dinode_verify( > > /* extent size hint validation */ > fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize), > - mode, be32_to_cpu(dip->di_flags)); > + mode, flags); > if (fa) > return fa; > > @@ -516,7 +516,7 @@ xfs_dinode_verify( > > /* don't allow reflink/cowextsize if we don't have reflink */ > if ((flags2 & (XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE)) && > - !xfs_sb_version_hasreflink(&mp->m_sb)) > + !xfs_sb_version_hasreflink(&mp->m_sb)) > return __this_address; > > /* only regular files get reflink */ > @@ -531,6 +531,12 @@ xfs_dinode_verify( > if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags2 & XFS_DIFLAG2_DAX)) > return __this_address; > > + /* COW extent size hint validation */ > + fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize), > + mode, flags, flags2); > + if (fa) > + return fa; > + > return NULL; > } > > -- > 2.17.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Carlos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 3/6] xfs: verify COW extent size hint is valid in inode verifier 2018-06-05 6:24 ` [PATCH 3/6] xfs: verify COW " Dave Chinner 2018-06-05 10:00 ` Carlos Maiolino @ 2018-06-05 17:09 ` Darrick J. Wong 1 sibling, 0 replies; 36+ messages in thread From: Darrick J. Wong @ 2018-06-05 17:09 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 04:24:20PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > There are rules for vald extent size hints. We enforce them when > applications set them, but fuzzers violate those rules and that > screws us over. Validate COW extent size hint rules in the inode > verifier to catch this. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/libxfs/xfs_inode_buf.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index be197c91307b..ea64be7cbd98 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -504,7 +504,7 @@ xfs_dinode_verify( > > /* extent size hint validation */ > fa = xfs_inode_validate_extsize(mp, be32_to_cpu(dip->di_extsize), > - mode, be32_to_cpu(dip->di_flags)); > + mode, flags); > if (fa) > return fa; > > @@ -516,7 +516,7 @@ xfs_dinode_verify( > > /* don't allow reflink/cowextsize if we don't have reflink */ > if ((flags2 & (XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE)) && > - !xfs_sb_version_hasreflink(&mp->m_sb)) > + !xfs_sb_version_hasreflink(&mp->m_sb)) These two bits belong in the previous patch, but I'll fix them on the way in. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --D > return __this_address; > > /* only regular files get reflink */ > @@ -531,6 +531,12 @@ xfs_dinode_verify( > if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags2 & XFS_DIFLAG2_DAX)) > return __this_address; > > + /* COW extent size hint validation */ > + fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize), > + mode, flags, flags2); > + if (fa) > + return fa; > + > return NULL; > } > > -- > 2.17.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 4/6] xfs: validate btree records on retreival 2018-06-05 6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner ` (2 preceding siblings ...) 2018-06-05 6:24 ` [PATCH 3/6] xfs: verify COW " Dave Chinner @ 2018-06-05 6:24 ` Dave Chinner 2018-06-05 6:40 ` [PATCH 4/6 v2] " Dave Chinner 2018-06-05 6:24 ` [PATCH 5/6] xfs: verify root inode more thoroughly Dave Chinner 2018-06-05 6:24 ` [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() Dave Chinner 5 siblings, 1 reply; 36+ messages in thread From: Dave Chinner @ 2018-06-05 6:24 UTC (permalink / raw) To: linux-xfs From: Dave Chinner <dchinner@redhat.com> So we don't check the validity of records as we walk the btree. When there are corrupt records in the free space btree (e.g. zero startblock/length or beyond EOAG) we just blindly use it and things go bad from there. That leads to assert failures on debug kernels like this: XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450 .... Call Trace: xfs_alloc_fixup_trees+0x368/0x5c0 xfs_alloc_ag_vextent_near+0x79a/0xe20 xfs_alloc_ag_vextent+0x1d3/0x330 xfs_alloc_vextent+0x5e9/0x870 Or crashes like this: XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000 ..... BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 .... Call Trace: xfs_bmap_add_extent_hole_real+0x67d/0x930 xfs_bmapi_write+0x934/0xc90 xfs_da_grow_inode_int+0x27e/0x2f0 xfs_dir2_grow_inode+0x55/0x130 xfs_dir2_sf_to_block+0x94/0x5d0 xfs_dir2_sf_addname+0xd0/0x590 xfs_dir_createname+0x168/0x1a0 xfs_rename+0x658/0x9b0 By checking that free space records pulled from the trees are within the valid range, we catch many of these corruptions before they can do damage. This is a generic btree record checking deficiency. We need to validate the records we fetch from all the different btrees before we use them to catch corruptions like this. This patch results in a corrupt record emitting an error message and returning -EFSCORRUPTED, and the higher layers catch that and abort: XFS (loop0): Size Freespace BTree record corruption in AG 0 detected! XFS (loop0): start block 0x0 block count 0x0 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x42a/0x670 ..... Call Trace: dump_stack+0x85/0xcb xfs_trans_cancel+0x19f/0x1c0 xfs_create+0x42a/0x670 xfs_generic_create+0x1f6/0x2c0 vfs_create+0xf9/0x180 do_mknodat+0x1f9/0x210 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe ..... XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c. Return address = ffffffff81500868 XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/libxfs/xfs_alloc.c | 18 ++++++++++++++-- fs/xfs/libxfs/xfs_ialloc.c | 31 ++++++++++++++++++++++++++- fs/xfs/libxfs/xfs_refcount.c | 41 +++++++++++++++++++++++++++++++----- fs/xfs/libxfs/xfs_rmap.c | 34 +++++++++++++++++++++++++++++- 4 files changed, 115 insertions(+), 9 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index a52ffb835b16..e9caa3d18f89 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -215,6 +215,8 @@ xfs_alloc_get_rec( xfs_extlen_t *len, /* output: length of extent */ int *stat) /* output: success/failure */ { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; @@ -222,12 +224,24 @@ xfs_alloc_get_rec( if (error || !(*stat)) return error; if (rec->alloc.ar_blockcount == 0) - return -EFSCORRUPTED; + goto out_bad_rec; *bno = be32_to_cpu(rec->alloc.ar_startblock); *len = be32_to_cpu(rec->alloc.ar_blockcount); - return error; + if (!xfs_verify_agbno(mp, agno, *bno) || + !xfs_verify_agbno(mp, agno, *bno + *len - 1)) + goto out_bad_rec; + + return 0; + +out_bad_rec: + xfs_warn(mp, + "%s Freespace BTree record corruption in AG %d detected!", + cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno); + xfs_warn(mp, + "start block 0x%x block count 0x%x", *bno, *len); + return -EFSCORRUPTED; } /* diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index ec5ea02b5553..3f551eb29157 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -121,16 +121,45 @@ xfs_inobt_get_rec( struct xfs_inobt_rec_incore *irec, int *stat) { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; + uint64_t realfree; error = xfs_btree_get_rec(cur, &rec, stat); if (error || *stat == 0) return error; - xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec); + xfs_inobt_btrec_to_irec(mp, rec, irec); + + if (!xfs_verify_agino(mp, agno, irec->ir_startino)) + goto out_bad_rec; + if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT || + irec->ir_count > XFS_INODES_PER_CHUNK) + goto out_bad_rec; + if (irec->ir_freecount > XFS_INODES_PER_CHUNK) + goto out_bad_rec; + + /* if there are no holes, return the first available offset */ + if (!xfs_inobt_issparse(irec->ir_holemask)) + realfree = irec->ir_free; + else + realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec); + if (hweight64(realfree) != irec->ir_freecount) + goto out_bad_rec; return 0; + +out_bad_rec: + xfs_warn(mp, + "%s Inode BTree record corruption in AG %d detected!", + cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno); + xfs_warn(mp, +"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x", + irec->ir_startino, irec->ir_count, irec->ir_freecount, + irec->ir_free, irec->ir_holemask); + return -EFSCORRUPTED; } /* diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index ed5704c7dcf5..bd778e8b809f 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -111,16 +111,47 @@ xfs_refcount_get_rec( struct xfs_refcount_irec *irec, int *stat) { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; + xfs_agblock_t realstart; error = xfs_btree_get_rec(cur, &rec, stat); - if (!error && *stat == 1) { - xfs_refcount_btrec_to_irec(rec, irec); - trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, - irec); + if (error || !*stat) + return error; + + xfs_refcount_btrec_to_irec(rec, irec); + + agno = cur->bc_private.a.agno; + if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN) + goto out_bad_rec; + + /* handle special COW-staging state */ + realstart = irec->rc_startblock; + if (realstart & XFS_REFC_COW_START) { + if (irec->rc_refcount != 1) + goto out_bad_rec; + realstart &= ~XFS_REFC_COW_START; } - return error; + + if (!xfs_verify_agbno(mp, agno, realstart)) + goto out_bad_rec; + if (!xfs_verify_agbno(mp, agno, realstart + irec->rc_blockcount - 1)) + goto out_bad_rec; + if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT) + goto out_bad_rec; + + trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, irec); + return 0; + +out_bad_rec: + xfs_warn(mp, + "Refcount BTree record corruption in AG %d detected!", agno); + xfs_warn(mp, + "Start block 0x%x, block count 0x%x, references 0x%x", + irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount); + return -EFSCORRUPTED; } /* diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 87711f9af625..69cfc92039e7 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -191,6 +191,8 @@ xfs_rmap_get_rec( struct xfs_rmap_irec *irec, int *stat) { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; @@ -198,7 +200,37 @@ xfs_rmap_get_rec( if (error || !*stat) return error; - return xfs_rmap_btrec_to_irec(rec, irec); + if (xfs_rmap_btrec_to_irec(rec, irec)) + goto out_bad_rec; + + if (irec->rm_blockcount == 0) + goto out_bad_rec; + if (irec->rm_startblock <= XFS_AGFL_BLOCK(mp)) { + if (irec->rm_owner != XFS_RMAP_OWN_FS) + goto out_bad_rec; + } else { + if (!xfs_verify_agbno(mp, agno, irec->rm_startblock)) + goto out_bad_rec; + if (!xfs_verify_agbno(mp, agno, + irec->rm_startblock + irec->rm_blockcount - 1)) + goto out_bad_rec; + } + + if (irec->rm_owner == 0) + goto out_bad_rec; + if (irec->rm_owner > XFS_MAXINUMBER && + irec->rm_owner <= XFS_RMAP_OWN_MIN) + goto out_bad_rec; + + return 0; +out_bad_rec: + xfs_warn(mp, + "RMAP BTree record corruption in AG %d detected!", agno); + xfs_warn(mp, + "Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x", + irec->rm_owner, irec->rm_flags, irec->rm_startblock, + irec->rm_blockcount); + return -EFSCORRUPTED; } struct xfs_find_left_neighbor_info { -- 2.17.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 4/6 v2] xfs: validate btree records on retreival 2018-06-05 6:24 ` [PATCH 4/6] xfs: validate btree records on retreival Dave Chinner @ 2018-06-05 6:40 ` Dave Chinner 2018-06-05 10:42 ` Carlos Maiolino ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Dave Chinner @ 2018-06-05 6:40 UTC (permalink / raw) To: linux-xfs From: Dave Chinner <dchinner@redhat.com> So we don't check the validity of records as we walk the btree. When there are corrupt records in the free space btree (e.g. zero startblock/length or beyond EOAG) we just blindly use it and things go bad from there. That leads to assert failures on debug kernels like this: XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450 .... Call Trace: xfs_alloc_fixup_trees+0x368/0x5c0 xfs_alloc_ag_vextent_near+0x79a/0xe20 xfs_alloc_ag_vextent+0x1d3/0x330 xfs_alloc_vextent+0x5e9/0x870 Or crashes like this: XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000 ..... BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 .... Call Trace: xfs_bmap_add_extent_hole_real+0x67d/0x930 xfs_bmapi_write+0x934/0xc90 xfs_da_grow_inode_int+0x27e/0x2f0 xfs_dir2_grow_inode+0x55/0x130 xfs_dir2_sf_to_block+0x94/0x5d0 xfs_dir2_sf_addname+0xd0/0x590 xfs_dir_createname+0x168/0x1a0 xfs_rename+0x658/0x9b0 By checking that free space records pulled from the trees are within the valid range, we catch many of these corruptions before they can do damage. This is a generic btree record checking deficiency. We need to validate the records we fetch from all the different btrees before we use them to catch corruptions like this. This patch results in a corrupt record emitting an error message and returning -EFSCORRUPTED, and the higher layers catch that and abort: XFS (loop0): Size Freespace BTree record corruption in AG 0 detected! XFS (loop0): start block 0x0 block count 0x0 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x42a/0x670 ..... Call Trace: dump_stack+0x85/0xcb xfs_trans_cancel+0x19f/0x1c0 xfs_create+0x42a/0x670 xfs_generic_create+0x1f6/0x2c0 vfs_create+0xf9/0x180 do_mknodat+0x1f9/0x210 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe ..... XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c. Return address = ffffffff81500868 XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem Signed-off-by: Dave Chinner <dchinner@redhat.com> --- V2: - now with overflow checks - slightly more robust rmap checks that validate AG header region properly. fs/xfs/libxfs/xfs_alloc.c | 22 ++++++++++++++++++++-- fs/xfs/libxfs/xfs_ialloc.c | 31 +++++++++++++++++++++++++++++- fs/xfs/libxfs/xfs_refcount.c | 45 +++++++++++++++++++++++++++++++++++++++----- fs/xfs/libxfs/xfs_rmap.c | 40 ++++++++++++++++++++++++++++++++++++++- 4 files changed, 129 insertions(+), 9 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index a52ffb835b16..1db50cfc0212 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -215,6 +215,8 @@ xfs_alloc_get_rec( xfs_extlen_t *len, /* output: length of extent */ int *stat) /* output: success/failure */ { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; @@ -222,12 +224,28 @@ xfs_alloc_get_rec( if (error || !(*stat)) return error; if (rec->alloc.ar_blockcount == 0) - return -EFSCORRUPTED; + goto out_bad_rec; *bno = be32_to_cpu(rec->alloc.ar_startblock); *len = be32_to_cpu(rec->alloc.ar_blockcount); - return error; + /* check for valid extent range, including overflow */ + if (!xfs_verify_agbno(mp, agno, *bno)) + goto out_bad_rec; + if (*bno > *bno + *len) + goto out_bad_rec; + if (!xfs_verify_agbno(mp, agno, *bno + *len - 1)) + goto out_bad_rec; + + return 0; + +out_bad_rec: + xfs_warn(mp, + "%s Freespace BTree record corruption in AG %d detected!", + cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno); + xfs_warn(mp, + "start block 0x%x block count 0x%x", *bno, *len); + return -EFSCORRUPTED; } /* diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index ec5ea02b5553..3f551eb29157 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -121,16 +121,45 @@ xfs_inobt_get_rec( struct xfs_inobt_rec_incore *irec, int *stat) { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; + uint64_t realfree; error = xfs_btree_get_rec(cur, &rec, stat); if (error || *stat == 0) return error; - xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec); + xfs_inobt_btrec_to_irec(mp, rec, irec); + + if (!xfs_verify_agino(mp, agno, irec->ir_startino)) + goto out_bad_rec; + if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT || + irec->ir_count > XFS_INODES_PER_CHUNK) + goto out_bad_rec; + if (irec->ir_freecount > XFS_INODES_PER_CHUNK) + goto out_bad_rec; + + /* if there are no holes, return the first available offset */ + if (!xfs_inobt_issparse(irec->ir_holemask)) + realfree = irec->ir_free; + else + realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec); + if (hweight64(realfree) != irec->ir_freecount) + goto out_bad_rec; return 0; + +out_bad_rec: + xfs_warn(mp, + "%s Inode BTree record corruption in AG %d detected!", + cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno); + xfs_warn(mp, +"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x", + irec->ir_startino, irec->ir_count, irec->ir_freecount, + irec->ir_free, irec->ir_holemask); + return -EFSCORRUPTED; } /* diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index ed5704c7dcf5..9a2a2004af24 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -111,16 +111,51 @@ xfs_refcount_get_rec( struct xfs_refcount_irec *irec, int *stat) { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; + xfs_agblock_t realstart; error = xfs_btree_get_rec(cur, &rec, stat); - if (!error && *stat == 1) { - xfs_refcount_btrec_to_irec(rec, irec); - trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, - irec); + if (error || !*stat) + return error; + + xfs_refcount_btrec_to_irec(rec, irec); + + agno = cur->bc_private.a.agno; + if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN) + goto out_bad_rec; + + /* handle special COW-staging state */ + realstart = irec->rc_startblock; + if (realstart & XFS_REFC_COW_START) { + if (irec->rc_refcount != 1) + goto out_bad_rec; + realstart &= ~XFS_REFC_COW_START; } - return error; + + /* check for valid extent range, including overflow */ + if (!xfs_verify_agbno(mp, agno, realstart)) + goto out_bad_rec; + if (realstart > realstart + irec->rc_blockcount) + goto out_bad_rec; + if (!xfs_verify_agbno(mp, agno, realstart + irec->rc_blockcount - 1)) + goto out_bad_rec; + + if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT) + goto out_bad_rec; + + trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, irec); + return 0; + +out_bad_rec: + xfs_warn(mp, + "Refcount BTree record corruption in AG %d detected!", agno); + xfs_warn(mp, + "Start block 0x%x, block count 0x%x, references 0x%x", + irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount); + return -EFSCORRUPTED; } /* diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 87711f9af625..e794bcd6591a 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -191,6 +191,8 @@ xfs_rmap_get_rec( struct xfs_rmap_irec *irec, int *stat) { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; @@ -198,7 +200,43 @@ xfs_rmap_get_rec( if (error || !*stat) return error; - return xfs_rmap_btrec_to_irec(rec, irec); + if (xfs_rmap_btrec_to_irec(rec, irec)) + goto out_bad_rec; + + if (irec->rm_blockcount == 0) + goto out_bad_rec; + if (irec->rm_startblock <= XFS_AGFL_BLOCK(mp)) { + if (irec->rm_owner != XFS_RMAP_OWN_FS) + goto out_bad_rec; + if (irec->rm_blockcount != XFS_AGFL_BLOCK(mp) + 1) + goto out_bad_rec; + } else { + /* check for valid extent range, including overflow */ + if (!xfs_verify_agbno(mp, agno, irec->rm_startblock)) + goto out_bad_rec; + if (irec->rm_startblock > + irec->rm_startblock + irec->rm_blockcount) + goto out_bad_rec; + if (!xfs_verify_agbno(mp, agno, + irec->rm_startblock + irec->rm_blockcount - 1)) + goto out_bad_rec; + } + + if (irec->rm_owner == 0) + goto out_bad_rec; + if (irec->rm_owner > XFS_MAXINUMBER && + irec->rm_owner <= XFS_RMAP_OWN_MIN) + goto out_bad_rec; + + return 0; +out_bad_rec: + xfs_warn(mp, + "RMAP BTree record corruption in AG %d detected!", agno); + xfs_warn(mp, + "Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x", + irec->rm_owner, irec->rm_flags, irec->rm_startblock, + irec->rm_blockcount); + return -EFSCORRUPTED; } struct xfs_find_left_neighbor_info { ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 4/6 v2] xfs: validate btree records on retreival 2018-06-05 6:40 ` [PATCH 4/6 v2] " Dave Chinner @ 2018-06-05 10:42 ` Carlos Maiolino 2018-06-05 23:00 ` Dave Chinner 2018-06-05 17:47 ` Darrick J. Wong 2018-06-06 1:21 ` [PATCH 4/6 v3] " Dave Chinner 2 siblings, 1 reply; 36+ messages in thread From: Carlos Maiolino @ 2018-06-05 10:42 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs > union xfs_btree_rec *rec; > int error; > > @@ -222,12 +224,28 @@ xfs_alloc_get_rec( > if (error || !(*stat)) > return error; > if (rec->alloc.ar_blockcount == 0) > - return -EFSCORRUPTED; > + goto out_bad_rec; > > *bno = be32_to_cpu(rec->alloc.ar_startblock); > *len = be32_to_cpu(rec->alloc.ar_blockcount); > > - return error; > + /* check for valid extent range, including overflow */ > + if (!xfs_verify_agbno(mp, agno, *bno)) > + goto out_bad_rec; > + if (*bno > *bno + *len) > + goto out_bad_rec; > + if (!xfs_verify_agbno(mp, agno, *bno + *len - 1)) > + goto out_bad_rec; > + > + return 0; > + > +out_bad_rec: > + xfs_warn(mp, > + "%s Freespace BTree record corruption in AG %d detected!", > + cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno); > + xfs_warn(mp, > + "start block 0x%x block count 0x%x", *bno, *len); > + return -EFSCORRUPTED; > } > > /* > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c > index ec5ea02b5553..3f551eb29157 100644 > --- a/fs/xfs/libxfs/xfs_ialloc.c > +++ b/fs/xfs/libxfs/xfs_ialloc.c > @@ -121,16 +121,45 @@ xfs_inobt_get_rec( > struct xfs_inobt_rec_incore *irec, > int *stat) > { > + struct xfs_mount *mp = cur->bc_mp; > + xfs_agnumber_t agno = cur->bc_private.a.agno; > union xfs_btree_rec *rec; > int error; > + uint64_t realfree; > > error = xfs_btree_get_rec(cur, &rec, stat); > if (error || *stat == 0) > return error; > > - xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec); > + xfs_inobt_btrec_to_irec(mp, rec, irec); > + > + if (!xfs_verify_agino(mp, agno, irec->ir_startino)) > + goto out_bad_rec; > + if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT || > + irec->ir_count > XFS_INODES_PER_CHUNK) > + goto out_bad_rec; > + if (irec->ir_freecount > XFS_INODES_PER_CHUNK) > + goto out_bad_rec; > + > + /* if there are no holes, return the first available offset */ > + if (!xfs_inobt_issparse(irec->ir_holemask)) > + realfree = irec->ir_free; > + else > + realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec); > + if (hweight64(realfree) != irec->ir_freecount) > + goto out_bad_rec; > > return 0; > + > +out_bad_rec: > + xfs_warn(mp, > + "%s Inode BTree record corruption in AG %d detected!", > + cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno); > + xfs_warn(mp, > +"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x", > + irec->ir_startino, irec->ir_count, irec->ir_freecount, > + irec->ir_free, irec->ir_holemask); > + return -EFSCORRUPTED; > } > > /* > diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c > index ed5704c7dcf5..9a2a2004af24 100644 > --- a/fs/xfs/libxfs/xfs_refcount.c > +++ b/fs/xfs/libxfs/xfs_refcount.c > @@ -111,16 +111,51 @@ xfs_refcount_get_rec( > struct xfs_refcount_irec *irec, > int *stat) > { > + struct xfs_mount *mp = cur->bc_mp; > + xfs_agnumber_t agno = cur->bc_private.a.agno; > union xfs_btree_rec *rec; > int error; > + xfs_agblock_t realstart; > > error = xfs_btree_get_rec(cur, &rec, stat); > - if (!error && *stat == 1) { > - xfs_refcount_btrec_to_irec(rec, irec); > - trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, > - irec); > + if (error || !*stat) > + return error; > + > + xfs_refcount_btrec_to_irec(rec, irec); > + > + agno = cur->bc_private.a.agno; > + if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN) > + goto out_bad_rec; > + > + /* handle special COW-staging state */ > + realstart = irec->rc_startblock; > + if (realstart & XFS_REFC_COW_START) { > + if (irec->rc_refcount != 1) > + goto out_bad_rec; > + realstart &= ~XFS_REFC_COW_START; > } > - return error; > + > + /* check for valid extent range, including overflow */ > + if (!xfs_verify_agbno(mp, agno, realstart)) > + goto out_bad_rec; > + if (realstart > realstart + irec->rc_blockcount) I am not sure if I'm right, but I thought this ought to be ">="? Other than that, you can add: Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> -- Carlos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/6 v2] xfs: validate btree records on retreival 2018-06-05 10:42 ` Carlos Maiolino @ 2018-06-05 23:00 ` Dave Chinner 0 siblings, 0 replies; 36+ messages in thread From: Dave Chinner @ 2018-06-05 23:00 UTC (permalink / raw) To: linux-xfs On Tue, Jun 05, 2018 at 12:42:07PM +0200, Carlos Maiolino wrote: > > + if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN) > > + goto out_bad_rec; > > + > > + /* handle special COW-staging state */ > > + realstart = irec->rc_startblock; > > + if (realstart & XFS_REFC_COW_START) { > > + if (irec->rc_refcount != 1) > > + goto out_bad_rec; > > + realstart &= ~XFS_REFC_COW_START; > > } > > - return error; > > + > > + /* check for valid extent range, including overflow */ > > + if (!xfs_verify_agbno(mp, agno, realstart)) > > + goto out_bad_rec; > > + if (realstart > realstart + irec->rc_blockcount) > > I am not sure if I'm right, but I thought this ought to be ">="? We've already caught zero length and block count greater than 2^32-1, so if the above is true we've wrapped through zero during the addition. But we can never add 0 or 2^32 to realstart here, so the "==" condition will not occur and we don't need to check for it.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/6 v2] xfs: validate btree records on retreival 2018-06-05 6:40 ` [PATCH 4/6 v2] " Dave Chinner 2018-06-05 10:42 ` Carlos Maiolino @ 2018-06-05 17:47 ` Darrick J. Wong 2018-06-05 23:02 ` Dave Chinner 2018-06-06 1:21 ` [PATCH 4/6 v3] " Dave Chinner 2 siblings, 1 reply; 36+ messages in thread From: Darrick J. Wong @ 2018-06-05 17:47 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 04:40:43PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > So we don't check the validity of records as we walk the btree. When > there are corrupt records in the free space btree (e.g. zero > startblock/length or beyond EOAG) we just blindly use it and things > go bad from there. That leads to assert failures on debug kernels > like this: > > XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450 > .... > Call Trace: > xfs_alloc_fixup_trees+0x368/0x5c0 > xfs_alloc_ag_vextent_near+0x79a/0xe20 > xfs_alloc_ag_vextent+0x1d3/0x330 > xfs_alloc_vextent+0x5e9/0x870 > > Or crashes like this: > > XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000 > ..... > BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 > .... > Call Trace: > xfs_bmap_add_extent_hole_real+0x67d/0x930 > xfs_bmapi_write+0x934/0xc90 > xfs_da_grow_inode_int+0x27e/0x2f0 > xfs_dir2_grow_inode+0x55/0x130 > xfs_dir2_sf_to_block+0x94/0x5d0 > xfs_dir2_sf_addname+0xd0/0x590 > xfs_dir_createname+0x168/0x1a0 > xfs_rename+0x658/0x9b0 > > By checking that free space records pulled from the trees are > within the valid range, we catch many of these corruptions before > they can do damage. > > This is a generic btree record checking deficiency. We need to > validate the records we fetch from all the different btrees before > we use them to catch corruptions like this. > > This patch results in a corrupt record emitting an error message and > returning -EFSCORRUPTED, and the higher layers catch that and abort: > > XFS (loop0): Size Freespace BTree record corruption in AG 0 detected! > XFS (loop0): start block 0x0 block count 0x0 > XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x42a/0x670 > ..... > Call Trace: > dump_stack+0x85/0xcb > xfs_trans_cancel+0x19f/0x1c0 > xfs_create+0x42a/0x670 > xfs_generic_create+0x1f6/0x2c0 > vfs_create+0xf9/0x180 > do_mknodat+0x1f9/0x210 > do_syscall_64+0x5a/0x180 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > ..... > XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c. Return address = ffffffff81500868 > XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > --- > V2: > - now with overflow checks > - slightly more robust rmap checks that validate AG header region > properly. > > fs/xfs/libxfs/xfs_alloc.c | 22 ++++++++++++++++++++-- > fs/xfs/libxfs/xfs_ialloc.c | 31 +++++++++++++++++++++++++++++- > fs/xfs/libxfs/xfs_refcount.c | 45 +++++++++++++++++++++++++++++++++++++++----- > fs/xfs/libxfs/xfs_rmap.c | 40 ++++++++++++++++++++++++++++++++++++++- > 4 files changed, 129 insertions(+), 9 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index a52ffb835b16..1db50cfc0212 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -215,6 +215,8 @@ xfs_alloc_get_rec( > xfs_extlen_t *len, /* output: length of extent */ > int *stat) /* output: success/failure */ > { > + struct xfs_mount *mp = cur->bc_mp; > + xfs_agnumber_t agno = cur->bc_private.a.agno; > union xfs_btree_rec *rec; > int error; > > @@ -222,12 +224,28 @@ xfs_alloc_get_rec( > if (error || !(*stat)) > return error; > if (rec->alloc.ar_blockcount == 0) > - return -EFSCORRUPTED; > + goto out_bad_rec; > > *bno = be32_to_cpu(rec->alloc.ar_startblock); > *len = be32_to_cpu(rec->alloc.ar_blockcount); > > - return error; > + /* check for valid extent range, including overflow */ > + if (!xfs_verify_agbno(mp, agno, *bno)) > + goto out_bad_rec; > + if (*bno > *bno + *len) > + goto out_bad_rec; > + if (!xfs_verify_agbno(mp, agno, *bno + *len - 1)) > + goto out_bad_rec; > + > + return 0; > + > +out_bad_rec: > + xfs_warn(mp, > + "%s Freespace BTree record corruption in AG %d detected!", > + cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno); > + xfs_warn(mp, > + "start block 0x%x block count 0x%x", *bno, *len); > + return -EFSCORRUPTED; > } > > /* > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c > index ec5ea02b5553..3f551eb29157 100644 > --- a/fs/xfs/libxfs/xfs_ialloc.c > +++ b/fs/xfs/libxfs/xfs_ialloc.c > @@ -121,16 +121,45 @@ xfs_inobt_get_rec( > struct xfs_inobt_rec_incore *irec, > int *stat) > { > + struct xfs_mount *mp = cur->bc_mp; > + xfs_agnumber_t agno = cur->bc_private.a.agno; > union xfs_btree_rec *rec; > int error; > + uint64_t realfree; > > error = xfs_btree_get_rec(cur, &rec, stat); > if (error || *stat == 0) > return error; > > - xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec); > + xfs_inobt_btrec_to_irec(mp, rec, irec); > + > + if (!xfs_verify_agino(mp, agno, irec->ir_startino)) > + goto out_bad_rec; > + if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT || > + irec->ir_count > XFS_INODES_PER_CHUNK) > + goto out_bad_rec; > + if (irec->ir_freecount > XFS_INODES_PER_CHUNK) > + goto out_bad_rec; > + > + /* if there are no holes, return the first available offset */ > + if (!xfs_inobt_issparse(irec->ir_holemask)) > + realfree = irec->ir_free; > + else > + realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec); > + if (hweight64(realfree) != irec->ir_freecount) > + goto out_bad_rec; > > return 0; > + > +out_bad_rec: > + xfs_warn(mp, > + "%s Inode BTree record corruption in AG %d detected!", > + cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno); > + xfs_warn(mp, > +"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x", > + irec->ir_startino, irec->ir_count, irec->ir_freecount, > + irec->ir_free, irec->ir_holemask); > + return -EFSCORRUPTED; > } > > /* > diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c > index ed5704c7dcf5..9a2a2004af24 100644 > --- a/fs/xfs/libxfs/xfs_refcount.c > +++ b/fs/xfs/libxfs/xfs_refcount.c > @@ -111,16 +111,51 @@ xfs_refcount_get_rec( > struct xfs_refcount_irec *irec, > int *stat) > { > + struct xfs_mount *mp = cur->bc_mp; > + xfs_agnumber_t agno = cur->bc_private.a.agno; > union xfs_btree_rec *rec; > int error; > + xfs_agblock_t realstart; > > error = xfs_btree_get_rec(cur, &rec, stat); > - if (!error && *stat == 1) { > - xfs_refcount_btrec_to_irec(rec, irec); > - trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, > - irec); > + if (error || !*stat) > + return error; > + > + xfs_refcount_btrec_to_irec(rec, irec); > + > + agno = cur->bc_private.a.agno; > + if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN) > + goto out_bad_rec; > + > + /* handle special COW-staging state */ > + realstart = irec->rc_startblock; > + if (realstart & XFS_REFC_COW_START) { > + if (irec->rc_refcount != 1) > + goto out_bad_rec; > + realstart &= ~XFS_REFC_COW_START; > } If the record does not have the cow "flag" set then the refcount cannot be 1, so this needs an else clause: } else { if (irec->rc_refcount == 1) goto out_bad_rec; } > - return error; > + > + /* check for valid extent range, including overflow */ > + if (!xfs_verify_agbno(mp, agno, realstart)) > + goto out_bad_rec; > + if (realstart > realstart + irec->rc_blockcount) > + goto out_bad_rec; > + if (!xfs_verify_agbno(mp, agno, realstart + irec->rc_blockcount - 1)) > + goto out_bad_rec; > + > + if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT) > + goto out_bad_rec; > + > + trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, irec); > + return 0; > + > +out_bad_rec: > + xfs_warn(mp, > + "Refcount BTree record corruption in AG %d detected!", agno); > + xfs_warn(mp, > + "Start block 0x%x, block count 0x%x, references 0x%x", > + irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount); > + return -EFSCORRUPTED; > } > > /* > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c > index 87711f9af625..e794bcd6591a 100644 > --- a/fs/xfs/libxfs/xfs_rmap.c > +++ b/fs/xfs/libxfs/xfs_rmap.c > @@ -191,6 +191,8 @@ xfs_rmap_get_rec( > struct xfs_rmap_irec *irec, > int *stat) > { > + struct xfs_mount *mp = cur->bc_mp; > + xfs_agnumber_t agno = cur->bc_private.a.agno; > union xfs_btree_rec *rec; > int error; > > @@ -198,7 +200,43 @@ xfs_rmap_get_rec( > if (error || !*stat) > return error; > > - return xfs_rmap_btrec_to_irec(rec, irec); > + if (xfs_rmap_btrec_to_irec(rec, irec)) > + goto out_bad_rec; > + > + if (irec->rm_blockcount == 0) > + goto out_bad_rec; > + if (irec->rm_startblock <= XFS_AGFL_BLOCK(mp)) { > + if (irec->rm_owner != XFS_RMAP_OWN_FS) > + goto out_bad_rec; > + if (irec->rm_blockcount != XFS_AGFL_BLOCK(mp) + 1) > + goto out_bad_rec; > + } else { > + /* check for valid extent range, including overflow */ > + if (!xfs_verify_agbno(mp, agno, irec->rm_startblock)) > + goto out_bad_rec; > + if (irec->rm_startblock > > + irec->rm_startblock + irec->rm_blockcount) > + goto out_bad_rec; > + if (!xfs_verify_agbno(mp, agno, > + irec->rm_startblock + irec->rm_blockcount - 1)) > + goto out_bad_rec; > + } > + > + if (irec->rm_owner == 0) > + goto out_bad_rec; > + if (irec->rm_owner > XFS_MAXINUMBER && > + irec->rm_owner <= XFS_RMAP_OWN_MIN) rm_owner should be (between XFS_RMAP_OWN_FS and XFS_RMAP_OWN_MIN) or (pass xfs_verify_fsino()) since we cannot have inodes between EOFS and MAXINUMBER. > + goto out_bad_rec; > + Should we check the rmap flags here? > + return 0; > +out_bad_rec: > + xfs_warn(mp, > + "RMAP BTree record corruption in AG %d detected!", agno); "RMap" ? Or "Reverse Mapping"? (Consistent capitalization blah blah...) --D > + xfs_warn(mp, > + "Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x", > + irec->rm_owner, irec->rm_flags, irec->rm_startblock, > + irec->rm_blockcount); > + return -EFSCORRUPTED; > } > > struct xfs_find_left_neighbor_info { > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/6 v2] xfs: validate btree records on retreival 2018-06-05 17:47 ` Darrick J. Wong @ 2018-06-05 23:02 ` Dave Chinner 0 siblings, 0 replies; 36+ messages in thread From: Dave Chinner @ 2018-06-05 23:02 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs On Tue, Jun 05, 2018 at 10:47:16AM -0700, Darrick J. Wong wrote: > On Tue, Jun 05, 2018 at 04:40:43PM +1000, Dave Chinner wrote: > > @@ -111,16 +111,51 @@ xfs_refcount_get_rec( > > struct xfs_refcount_irec *irec, > > int *stat) > > { > > + struct xfs_mount *mp = cur->bc_mp; > > + xfs_agnumber_t agno = cur->bc_private.a.agno; > > union xfs_btree_rec *rec; > > int error; > > + xfs_agblock_t realstart; > > > > error = xfs_btree_get_rec(cur, &rec, stat); > > - if (!error && *stat == 1) { > > - xfs_refcount_btrec_to_irec(rec, irec); > > - trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, > > - irec); > > + if (error || !*stat) > > + return error; > > + > > + xfs_refcount_btrec_to_irec(rec, irec); > > + > > + agno = cur->bc_private.a.agno; > > + if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN) > > + goto out_bad_rec; > > + > > + /* handle special COW-staging state */ > > + realstart = irec->rc_startblock; > > + if (realstart & XFS_REFC_COW_START) { > > + if (irec->rc_refcount != 1) > > + goto out_bad_rec; > > + realstart &= ~XFS_REFC_COW_START; > > } > > If the record does not have the cow "flag" set then the refcount cannot > be 1, so this needs an else clause: > > } else { > if (irec->rc_refcount == 1) > goto out_bad_rec; > } Ah, yes, that is true, though it should be refcount < 2 because a zero value is invalid, too. > > + if (irec->rm_owner == 0) > > + goto out_bad_rec; > > + if (irec->rm_owner > XFS_MAXINUMBER && > > + irec->rm_owner <= XFS_RMAP_OWN_MIN) > > rm_owner should be (between XFS_RMAP_OWN_FS and XFS_RMAP_OWN_MIN) or > (pass xfs_verify_fsino()) since we cannot have inodes between EOFS and > MAXINUMBER. Yes, that is better. I should have noticed xfs_verify_fsino() - my bad. > > > + goto out_bad_rec; > > + > > Should we check the rmap flags here? They are checked in xfs_refcount_btrec_to_irec(), and it returns -EFSCORRUPTED if invalid flags are set. > > + return 0; > > +out_bad_rec: > > + xfs_warn(mp, > > + "RMAP BTree record corruption in AG %d detected!", agno); > > "RMap" ? Or "Reverse Mapping"? > > (Consistent capitalization blah blah...) I'll use the latter. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 4/6 v3] xfs: validate btree records on retreival 2018-06-05 6:40 ` [PATCH 4/6 v2] " Dave Chinner 2018-06-05 10:42 ` Carlos Maiolino 2018-06-05 17:47 ` Darrick J. Wong @ 2018-06-06 1:21 ` Dave Chinner 2 siblings, 0 replies; 36+ messages in thread From: Dave Chinner @ 2018-06-06 1:21 UTC (permalink / raw) To: linux-xfs From: Dave Chinner <dchinner@redhat.com> So we don't check the validity of records as we walk the btree. When there are corrupt records in the free space btree (e.g. zero startblock/length or beyond EOAG) we just blindly use it and things go bad from there. That leads to assert failures on debug kernels like this: XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450 .... Call Trace: xfs_alloc_fixup_trees+0x368/0x5c0 xfs_alloc_ag_vextent_near+0x79a/0xe20 xfs_alloc_ag_vextent+0x1d3/0x330 xfs_alloc_vextent+0x5e9/0x870 Or crashes like this: XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000 ..... BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 .... Call Trace: xfs_bmap_add_extent_hole_real+0x67d/0x930 xfs_bmapi_write+0x934/0xc90 xfs_da_grow_inode_int+0x27e/0x2f0 xfs_dir2_grow_inode+0x55/0x130 xfs_dir2_sf_to_block+0x94/0x5d0 xfs_dir2_sf_addname+0xd0/0x590 xfs_dir_createname+0x168/0x1a0 xfs_rename+0x658/0x9b0 By checking that free space records pulled from the trees are within the valid range, we catch many of these corruptions before they can do damage. This is a generic btree record checking deficiency. We need to validate the records we fetch from all the different btrees before we use them to catch corruptions like this. This patch results in a corrupt record emitting an error message and returning -EFSCORRUPTED, and the higher layers catch that and abort: XFS (loop0): Size Freespace BTree record corruption in AG 0 detected! XFS (loop0): start block 0x0 block count 0x0 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x42a/0x670 ..... Call Trace: dump_stack+0x85/0xcb xfs_trans_cancel+0x19f/0x1c0 xfs_create+0x42a/0x670 xfs_generic_create+0x1f6/0x2c0 vfs_create+0xf9/0x180 do_mknodat+0x1f9/0x210 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe ..... XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c. Return address = ffffffff81500868 XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem Signed-off-by: Dave Chinner <dchinner@redhat.com> --- V3: - add refcount range check for non-COW extents - fixed rmap owner range check - fixed rmap error message format "Reverse Mapping BTree..." fs/xfs/libxfs/xfs_alloc.c | 22 +++++++++++++++++++-- fs/xfs/libxfs/xfs_ialloc.c | 31 ++++++++++++++++++++++++++++- fs/xfs/libxfs/xfs_refcount.c | 47 +++++++++++++++++++++++++++++++++++++++----- fs/xfs/libxfs/xfs_rmap.c | 41 +++++++++++++++++++++++++++++++++++++- 4 files changed, 132 insertions(+), 9 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index a52ffb835b16..1db50cfc0212 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -215,6 +215,8 @@ xfs_alloc_get_rec( xfs_extlen_t *len, /* output: length of extent */ int *stat) /* output: success/failure */ { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; @@ -222,12 +224,28 @@ xfs_alloc_get_rec( if (error || !(*stat)) return error; if (rec->alloc.ar_blockcount == 0) - return -EFSCORRUPTED; + goto out_bad_rec; *bno = be32_to_cpu(rec->alloc.ar_startblock); *len = be32_to_cpu(rec->alloc.ar_blockcount); - return error; + /* check for valid extent range, including overflow */ + if (!xfs_verify_agbno(mp, agno, *bno)) + goto out_bad_rec; + if (*bno > *bno + *len) + goto out_bad_rec; + if (!xfs_verify_agbno(mp, agno, *bno + *len - 1)) + goto out_bad_rec; + + return 0; + +out_bad_rec: + xfs_warn(mp, + "%s Freespace BTree record corruption in AG %d detected!", + cur->bc_btnum == XFS_BTNUM_BNO ? "Block" : "Size", agno); + xfs_warn(mp, + "start block 0x%x block count 0x%x", *bno, *len); + return -EFSCORRUPTED; } /* diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index ec5ea02b5553..3f551eb29157 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -121,16 +121,45 @@ xfs_inobt_get_rec( struct xfs_inobt_rec_incore *irec, int *stat) { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; + uint64_t realfree; error = xfs_btree_get_rec(cur, &rec, stat); if (error || *stat == 0) return error; - xfs_inobt_btrec_to_irec(cur->bc_mp, rec, irec); + xfs_inobt_btrec_to_irec(mp, rec, irec); + + if (!xfs_verify_agino(mp, agno, irec->ir_startino)) + goto out_bad_rec; + if (irec->ir_count < XFS_INODES_PER_HOLEMASK_BIT || + irec->ir_count > XFS_INODES_PER_CHUNK) + goto out_bad_rec; + if (irec->ir_freecount > XFS_INODES_PER_CHUNK) + goto out_bad_rec; + + /* if there are no holes, return the first available offset */ + if (!xfs_inobt_issparse(irec->ir_holemask)) + realfree = irec->ir_free; + else + realfree = irec->ir_free & xfs_inobt_irec_to_allocmask(irec); + if (hweight64(realfree) != irec->ir_freecount) + goto out_bad_rec; return 0; + +out_bad_rec: + xfs_warn(mp, + "%s Inode BTree record corruption in AG %d detected!", + cur->bc_btnum == XFS_BTNUM_INO ? "Used" : "Free", agno); + xfs_warn(mp, +"start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x", + irec->ir_startino, irec->ir_count, irec->ir_freecount, + irec->ir_free, irec->ir_holemask); + return -EFSCORRUPTED; } /* diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index ed5704c7dcf5..9dda6fd0bb13 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -111,16 +111,53 @@ xfs_refcount_get_rec( struct xfs_refcount_irec *irec, int *stat) { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; + xfs_agblock_t realstart; error = xfs_btree_get_rec(cur, &rec, stat); - if (!error && *stat == 1) { - xfs_refcount_btrec_to_irec(rec, irec); - trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, - irec); + if (error || !*stat) + return error; + + xfs_refcount_btrec_to_irec(rec, irec); + + agno = cur->bc_private.a.agno; + if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN) + goto out_bad_rec; + + /* handle special COW-staging state */ + realstart = irec->rc_startblock; + if (realstart & XFS_REFC_COW_START) { + if (irec->rc_refcount != 1) + goto out_bad_rec; + realstart &= ~XFS_REFC_COW_START; + } else if (irec->rc_refcount < 2) { + goto out_bad_rec; } - return error; + + /* check for valid extent range, including overflow */ + if (!xfs_verify_agbno(mp, agno, realstart)) + goto out_bad_rec; + if (realstart > realstart + irec->rc_blockcount) + goto out_bad_rec; + if (!xfs_verify_agbno(mp, agno, realstart + irec->rc_blockcount - 1)) + goto out_bad_rec; + + if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT) + goto out_bad_rec; + + trace_xfs_refcount_get(cur->bc_mp, cur->bc_private.a.agno, irec); + return 0; + +out_bad_rec: + xfs_warn(mp, + "Refcount BTree record corruption in AG %d detected!", agno); + xfs_warn(mp, + "Start block 0x%x, block count 0x%x, references 0x%x", + irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount); + return -EFSCORRUPTED; } /* diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 87711f9af625..d4460b0d2d81 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -27,6 +27,7 @@ #include "xfs_extent_busy.h" #include "xfs_bmap.h" #include "xfs_inode.h" +#include "xfs_ialloc.h" /* * Lookup the first record less than or equal to [bno, len, owner, offset] @@ -191,6 +192,8 @@ xfs_rmap_get_rec( struct xfs_rmap_irec *irec, int *stat) { + struct xfs_mount *mp = cur->bc_mp; + xfs_agnumber_t agno = cur->bc_private.a.agno; union xfs_btree_rec *rec; int error; @@ -198,7 +201,43 @@ xfs_rmap_get_rec( if (error || !*stat) return error; - return xfs_rmap_btrec_to_irec(rec, irec); + if (xfs_rmap_btrec_to_irec(rec, irec)) + goto out_bad_rec; + + if (irec->rm_blockcount == 0) + goto out_bad_rec; + if (irec->rm_startblock <= XFS_AGFL_BLOCK(mp)) { + if (irec->rm_owner != XFS_RMAP_OWN_FS) + goto out_bad_rec; + if (irec->rm_blockcount != XFS_AGFL_BLOCK(mp) + 1) + goto out_bad_rec; + } else { + /* check for valid extent range, including overflow */ + if (!xfs_verify_agbno(mp, agno, irec->rm_startblock)) + goto out_bad_rec; + if (irec->rm_startblock > + irec->rm_startblock + irec->rm_blockcount) + goto out_bad_rec; + if (!xfs_verify_agbno(mp, agno, + irec->rm_startblock + irec->rm_blockcount - 1)) + goto out_bad_rec; + } + + if (!(xfs_verify_ino(mp, irec->rm_owner) || + (irec->rm_owner <= XFS_RMAP_OWN_FS && + irec->rm_owner >= XFS_RMAP_OWN_MIN))) + goto out_bad_rec; + + return 0; +out_bad_rec: + xfs_warn(mp, + "Reverse Mapping BTree record corruption in AG %d detected!", + agno); + xfs_warn(mp, + "Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x", + irec->rm_owner, irec->rm_flags, irec->rm_startblock, + irec->rm_blockcount); + return -EFSCORRUPTED; } struct xfs_find_left_neighbor_info { -- Dave Chinner david@fromorbit.com ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 5/6] xfs: verify root inode more thoroughly 2018-06-05 6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner ` (3 preceding siblings ...) 2018-06-05 6:24 ` [PATCH 4/6] xfs: validate btree records on retreival Dave Chinner @ 2018-06-05 6:24 ` Dave Chinner 2018-06-05 10:50 ` Carlos Maiolino 2018-06-05 17:10 ` Darrick J. Wong 2018-06-05 6:24 ` [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() Dave Chinner 5 siblings, 2 replies; 36+ messages in thread From: Dave Chinner @ 2018-06-05 6:24 UTC (permalink / raw) To: linux-xfs From: Dave Chinner <dchinner@redhat.com> When looking up the root inode at mount time, we don't actually do any verification to check that the inode is allocated and accounted for correctly in the INOBT. Make the checks on the root inode more robust by making it an untrusted lookup. This forces the inode lookup to use the inode btree to verify the inode is allocated and mapped correctly to disk. This will also have the effect of catching a significant number of AGI/INOBT related corruptions in AG 0 at mount time. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_mount.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 189fa7b615d3..a3378252baa1 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -862,9 +862,12 @@ xfs_mountfs( * Get and sanity-check the root inode. * Save the pointer to it in the mount structure. */ - error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip); + error = xfs_iget(mp, NULL, sbp->sb_rootino, XFS_IGET_UNTRUSTED, + XFS_ILOCK_EXCL, &rip); if (error) { - xfs_warn(mp, "failed to read root inode"); + xfs_warn(mp, + "Failed to read root inode 0x%llx, error %d", + sbp->sb_rootino, -error); goto out_log_dealloc; } -- 2.17.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 5/6] xfs: verify root inode more thoroughly 2018-06-05 6:24 ` [PATCH 5/6] xfs: verify root inode more thoroughly Dave Chinner @ 2018-06-05 10:50 ` Carlos Maiolino 2018-06-05 17:10 ` Darrick J. Wong 1 sibling, 0 replies; 36+ messages in thread From: Carlos Maiolino @ 2018-06-05 10:50 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 04:24:22PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > When looking up the root inode at mount time, we don't actually do > any verification to check that the inode is allocated and accounted > for correctly in the INOBT. Make the checks on the root inode more > robust by making it an untrusted lookup. This forces the inode > lookup to use the inode btree to verify the inode is allocated > and mapped correctly to disk. This will also have the effect of > catching a significant number of AGI/INOBT related corruptions in > AG 0 at mount time. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> > --- > fs/xfs/xfs_mount.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > index 189fa7b615d3..a3378252baa1 100644 > --- a/fs/xfs/xfs_mount.c > +++ b/fs/xfs/xfs_mount.c > @@ -862,9 +862,12 @@ xfs_mountfs( > * Get and sanity-check the root inode. > * Save the pointer to it in the mount structure. > */ > - error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip); > + error = xfs_iget(mp, NULL, sbp->sb_rootino, XFS_IGET_UNTRUSTED, > + XFS_ILOCK_EXCL, &rip); > if (error) { > - xfs_warn(mp, "failed to read root inode"); > + xfs_warn(mp, > + "Failed to read root inode 0x%llx, error %d", > + sbp->sb_rootino, -error); > goto out_log_dealloc; > } > > -- > 2.17.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Carlos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 5/6] xfs: verify root inode more thoroughly 2018-06-05 6:24 ` [PATCH 5/6] xfs: verify root inode more thoroughly Dave Chinner 2018-06-05 10:50 ` Carlos Maiolino @ 2018-06-05 17:10 ` Darrick J. Wong 1 sibling, 0 replies; 36+ messages in thread From: Darrick J. Wong @ 2018-06-05 17:10 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 04:24:22PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > When looking up the root inode at mount time, we don't actually do > any verification to check that the inode is allocated and accounted > for correctly in the INOBT. Make the checks on the root inode more > robust by making it an untrusted lookup. This forces the inode > lookup to use the inode btree to verify the inode is allocated > and mapped correctly to disk. This will also have the effect of > catching a significant number of AGI/INOBT related corruptions in > AG 0 at mount time. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> Looks ok, Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --D > --- > fs/xfs/xfs_mount.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > index 189fa7b615d3..a3378252baa1 100644 > --- a/fs/xfs/xfs_mount.c > +++ b/fs/xfs/xfs_mount.c > @@ -862,9 +862,12 @@ xfs_mountfs( > * Get and sanity-check the root inode. > * Save the pointer to it in the mount structure. > */ > - error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip); > + error = xfs_iget(mp, NULL, sbp->sb_rootino, XFS_IGET_UNTRUSTED, > + XFS_ILOCK_EXCL, &rip); > if (error) { > - xfs_warn(mp, "failed to read root inode"); > + xfs_warn(mp, > + "Failed to read root inode 0x%llx, error %d", > + sbp->sb_rootino, -error); > goto out_log_dealloc; > } > > -- > 2.17.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() 2018-06-05 6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner ` (4 preceding siblings ...) 2018-06-05 6:24 ` [PATCH 5/6] xfs: verify root inode more thoroughly Dave Chinner @ 2018-06-05 6:24 ` Dave Chinner 2018-06-05 11:12 ` Carlos Maiolino 2018-06-05 17:11 ` Darrick J. Wong 5 siblings, 2 replies; 36+ messages in thread From: Dave Chinner @ 2018-06-05 6:24 UTC (permalink / raw) To: linux-xfs From: Dave Chinner <dchinner@redhat.com> In xfs_imap_to_bp(), we convert a -EFSCORRUPTED error to -EINVAL if we are doing an untrusted lookup. This is done because we need failed filehandle lookups to report -ESTALE to the caller, and it does this by converting -EINVAL and -ENOENT errors to -ESTALE. The squashing of EFSCORRUPTED in imap_to_bp makes it impossible for for xfs_iget(UNTRUSTED) callers to determine the difference between "inode does not exist" and "corruption detected during lookup". We realy need that distinction in places calling xfS_iget(UNTRUSTED), so move the filehandle error case handling all the way out to xfs_nfs_get_inode() where it is needed. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/libxfs/xfs_inode_buf.c | 5 ----- fs/xfs/xfs_export.c | 15 ++++++++++++--- 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index ea64be7cbd98..0a22c39325cd 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -189,11 +189,6 @@ xfs_imap_to_bp( ASSERT(buf_flags & XBF_TRYLOCK); return error; } - - if (error == -EFSCORRUPTED && - (iget_flags & XFS_IGET_UNTRUSTED)) - return -EINVAL; - xfs_warn(mp, "%s: xfs_trans_read_buf() returned error %d.", __func__, error); return error; diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c index 7af7d6faa709..3cf4682e2510 100644 --- a/fs/xfs/xfs_export.c +++ b/fs/xfs/xfs_export.c @@ -128,15 +128,24 @@ xfs_nfs_get_inode( */ error = xfs_iget(mp, NULL, ino, XFS_IGET_UNTRUSTED, 0, &ip); if (error) { + /* * EINVAL means the inode cluster doesn't exist anymore. - * This implies the filehandle is stale, so we should - * translate it here. + * EFSCORRUPTED means the metadata pointing to the inode cluster + * or the inode cluster itself is corrupt. This implies the + * filehandle is stale, so we should translate it here. * We don't use ESTALE directly down the chain to not * confuse applications using bulkstat that expect EINVAL. */ - if (error == -EINVAL || error == -ENOENT) + switch (error) { + case -EINVAL: + case -ENOENT: + case -EFSCORRUPTED: error = -ESTALE; + break; + default: + break; + } return ERR_PTR(error); } -- 2.17.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() 2018-06-05 6:24 ` [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() Dave Chinner @ 2018-06-05 11:12 ` Carlos Maiolino 2018-06-05 17:11 ` Darrick J. Wong 1 sibling, 0 replies; 36+ messages in thread From: Carlos Maiolino @ 2018-06-05 11:12 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 04:24:23PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > In xfs_imap_to_bp(), we convert a -EFSCORRUPTED error to -EINVAL if > we are doing an untrusted lookup. This is done because we need > failed filehandle lookups to report -ESTALE to the caller, and it > does this by converting -EINVAL and -ENOENT errors to -ESTALE. > > The squashing of EFSCORRUPTED in imap_to_bp makes it impossible for > for xfs_iget(UNTRUSTED) callers to determine the difference between > "inode does not exist" and "corruption detected during lookup". We > realy need that distinction in places calling xfS_iget(UNTRUSTED), > so move the filehandle error case handling all the way out to > xfs_nfs_get_inode() where it is needed. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> > fs/xfs/libxfs/xfs_inode_buf.c | 5 ----- > fs/xfs/xfs_export.c | 15 ++++++++++++--- > 2 files changed, 12 insertions(+), 8 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index ea64be7cbd98..0a22c39325cd 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -189,11 +189,6 @@ xfs_imap_to_bp( > ASSERT(buf_flags & XBF_TRYLOCK); > return error; > } > - > - if (error == -EFSCORRUPTED && > - (iget_flags & XFS_IGET_UNTRUSTED)) > - return -EINVAL; > - > xfs_warn(mp, "%s: xfs_trans_read_buf() returned error %d.", > __func__, error); > return error; > diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c > index 7af7d6faa709..3cf4682e2510 100644 > --- a/fs/xfs/xfs_export.c > +++ b/fs/xfs/xfs_export.c > @@ -128,15 +128,24 @@ xfs_nfs_get_inode( > */ > error = xfs_iget(mp, NULL, ino, XFS_IGET_UNTRUSTED, 0, &ip); > if (error) { > + > /* > * EINVAL means the inode cluster doesn't exist anymore. > - * This implies the filehandle is stale, so we should > - * translate it here. > + * EFSCORRUPTED means the metadata pointing to the inode cluster > + * or the inode cluster itself is corrupt. This implies the > + * filehandle is stale, so we should translate it here. > * We don't use ESTALE directly down the chain to not > * confuse applications using bulkstat that expect EINVAL. > */ > - if (error == -EINVAL || error == -ENOENT) > + switch (error) { > + case -EINVAL: > + case -ENOENT: > + case -EFSCORRUPTED: > error = -ESTALE; > + break; > + default: > + break; > + } > return ERR_PTR(error); > } > > -- > 2.17.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Carlos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() 2018-06-05 6:24 ` [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() Dave Chinner 2018-06-05 11:12 ` Carlos Maiolino @ 2018-06-05 17:11 ` Darrick J. Wong 1 sibling, 0 replies; 36+ messages in thread From: Darrick J. Wong @ 2018-06-05 17:11 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Tue, Jun 05, 2018 at 04:24:23PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > In xfs_imap_to_bp(), we convert a -EFSCORRUPTED error to -EINVAL if > we are doing an untrusted lookup. This is done because we need > failed filehandle lookups to report -ESTALE to the caller, and it > does this by converting -EINVAL and -ENOENT errors to -ESTALE. > > The squashing of EFSCORRUPTED in imap_to_bp makes it impossible for > for xfs_iget(UNTRUSTED) callers to determine the difference between > "inode does not exist" and "corruption detected during lookup". We > realy need that distinction in places calling xfS_iget(UNTRUSTED), > so move the filehandle error case handling all the way out to > xfs_nfs_get_inode() where it is needed. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> Woot! Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --D > --- > fs/xfs/libxfs/xfs_inode_buf.c | 5 ----- > fs/xfs/xfs_export.c | 15 ++++++++++++--- > 2 files changed, 12 insertions(+), 8 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index ea64be7cbd98..0a22c39325cd 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -189,11 +189,6 @@ xfs_imap_to_bp( > ASSERT(buf_flags & XBF_TRYLOCK); > return error; > } > - > - if (error == -EFSCORRUPTED && > - (iget_flags & XFS_IGET_UNTRUSTED)) > - return -EINVAL; > - > xfs_warn(mp, "%s: xfs_trans_read_buf() returned error %d.", > __func__, error); > return error; > diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c > index 7af7d6faa709..3cf4682e2510 100644 > --- a/fs/xfs/xfs_export.c > +++ b/fs/xfs/xfs_export.c > @@ -128,15 +128,24 @@ xfs_nfs_get_inode( > */ > error = xfs_iget(mp, NULL, ino, XFS_IGET_UNTRUSTED, 0, &ip); > if (error) { > + > /* > * EINVAL means the inode cluster doesn't exist anymore. > - * This implies the filehandle is stale, so we should > - * translate it here. > + * EFSCORRUPTED means the metadata pointing to the inode cluster > + * or the inode cluster itself is corrupt. This implies the > + * filehandle is stale, so we should translate it here. > * We don't use ESTALE directly down the chain to not > * confuse applications using bulkstat that expect EINVAL. > */ > - if (error == -EINVAL || error == -ENOENT) > + switch (error) { > + case -EINVAL: > + case -ENOENT: > + case -EFSCORRUPTED: > error = -ESTALE; > + break; > + default: > + break; > + } > return ERR_PTR(error); > } > > -- > 2.17.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2018-08-22 4:03 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-06-05 6:24 [PATCH 0/6 V2] xfs: more verifications! Dave Chinner 2018-06-05 6:24 ` [PATCH 1/6] xfs: catch bad stripe alignment configurations Dave Chinner 2018-06-05 9:27 ` Carlos Maiolino 2018-06-05 6:24 ` [PATCH 2/6] xfs: verify extent size hint is valid in inode verifier Dave Chinner 2018-06-05 9:53 ` Carlos Maiolino 2018-06-05 22:56 ` Dave Chinner 2018-06-05 17:10 ` Darrick J. Wong 2018-06-07 16:16 ` Darrick J. Wong 2018-06-08 1:10 ` Dave Chinner 2018-06-08 1:23 ` Darrick J. Wong 2018-06-08 2:23 ` Eric Sandeen 2018-07-24 6:39 ` Eric Sandeen 2018-07-24 16:43 ` Darrick J. Wong 2018-08-20 15:06 ` Brian Foster 2018-08-20 15:27 ` Eric Sandeen 2018-08-20 15:36 ` Darrick J. Wong 2018-08-20 15:59 ` Brian Foster 2018-08-20 22:15 ` Dave Chinner 2018-08-21 10:56 ` Brian Foster 2018-08-22 0:41 ` Dave Chinner 2018-06-05 6:24 ` [PATCH 3/6] xfs: verify COW " Dave Chinner 2018-06-05 10:00 ` Carlos Maiolino 2018-06-05 17:09 ` Darrick J. Wong 2018-06-05 6:24 ` [PATCH 4/6] xfs: validate btree records on retreival Dave Chinner 2018-06-05 6:40 ` [PATCH 4/6 v2] " Dave Chinner 2018-06-05 10:42 ` Carlos Maiolino 2018-06-05 23:00 ` Dave Chinner 2018-06-05 17:47 ` Darrick J. Wong 2018-06-05 23:02 ` Dave Chinner 2018-06-06 1:21 ` [PATCH 4/6 v3] " Dave Chinner 2018-06-05 6:24 ` [PATCH 5/6] xfs: verify root inode more thoroughly Dave Chinner 2018-06-05 10:50 ` Carlos Maiolino 2018-06-05 17:10 ` Darrick J. Wong 2018-06-05 6:24 ` [PATCH 6/6] xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() Dave Chinner 2018-06-05 11:12 ` Carlos Maiolino 2018-06-05 17:11 ` Darrick J. Wong
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.