* [PATCH v2 0/3] xfs: fix maxbytes problems on 32-bit systems @ 2020-01-08 4:17 Darrick J. Wong 2020-01-08 4:17 ` [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF Darrick J. Wong ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Darrick J. Wong @ 2020-01-08 4:17 UTC (permalink / raw) To: darrick.wong; +Cc: linux-xfs Hi all, Here's a series of patches to fix some problems related to s_maxbytes that I found by running fstests in a 32-bit VM. The first step is to fix bunmapi during inactivation so that it cleans out /all/ the blocks of an unlinked file instead of leaking them; and the second is to fix the s_maxbytes computation to avoid setting a limit larger than what the pagecache supports. This has been lightly tested with fstests. Enjoy! Comments and questions are, as always, welcome. --D ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF 2020-01-08 4:17 [PATCH v2 0/3] xfs: fix maxbytes problems on 32-bit systems Darrick J. Wong @ 2020-01-08 4:17 ` Darrick J. Wong 2020-01-08 8:09 ` Christoph Hellwig 2020-01-08 20:40 ` Dave Chinner 2020-01-08 4:17 ` [PATCH 2/3] xfs: truncate should remove all blocks, not just to the end of the page cache Darrick J. Wong 2020-01-08 4:17 ` [PATCH 3/3] xfs: fix s_maxbytes computation on 32-bit kernels Darrick J. Wong 2 siblings, 2 replies; 14+ messages in thread From: Darrick J. Wong @ 2020-01-08 4:17 UTC (permalink / raw) To: darrick.wong; +Cc: linux-xfs From: Darrick J. Wong <darrick.wong@oracle.com> Introduce a new #define for the maximum supported file block offset. We'll use this in the next patch to make it more obvious that we're doing some operation for all possible inode fork mappings after a given offset. We can't use ULLONG_MAX here because bunmapi uses that to detect when it's done. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- fs/xfs/libxfs/xfs_format.h | 1 + fs/xfs/xfs_reflink.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 1b7dcbae051c..c2976e441d43 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1540,6 +1540,7 @@ typedef struct xfs_bmdr_block { #define BMBT_BLOCKCOUNT_BITLEN 21 #define BMBT_STARTOFF_MASK ((1ULL << BMBT_STARTOFF_BITLEN) - 1) +#define XFS_MAX_FILEOFF (BMBT_STARTOFF_MASK) typedef struct xfs_bmbt_rec { __be64 l0, l1; diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index de451235c4ee..7a6c94295b8a 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1457,7 +1457,8 @@ xfs_reflink_clear_inode_flag( * We didn't find any shared blocks so turn off the reflink flag. * First, get rid of any leftover CoW mappings. */ - error = xfs_reflink_cancel_cow_blocks(ip, tpp, 0, NULLFILEOFF, true); + error = xfs_reflink_cancel_cow_blocks(ip, tpp, 0, XFS_MAX_FILEOFF, + true); if (error) return error; ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF 2020-01-08 4:17 ` [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF Darrick J. Wong @ 2020-01-08 8:09 ` Christoph Hellwig 2020-01-08 16:37 ` Darrick J. Wong 2020-01-08 20:40 ` Dave Chinner 1 sibling, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2020-01-08 8:09 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs > --- a/fs/xfs/libxfs/xfs_format.h > +++ b/fs/xfs/libxfs/xfs_format.h > @@ -1540,6 +1540,7 @@ typedef struct xfs_bmdr_block { > #define BMBT_BLOCKCOUNT_BITLEN 21 > > #define BMBT_STARTOFF_MASK ((1ULL << BMBT_STARTOFF_BITLEN) - 1) > +#define XFS_MAX_FILEOFF (BMBT_STARTOFF_MASK) I think throwing in a comment here would be useful. Otherwise the patch looks good to me. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF 2020-01-08 8:09 ` Christoph Hellwig @ 2020-01-08 16:37 ` Darrick J. Wong 0 siblings, 0 replies; 14+ messages in thread From: Darrick J. Wong @ 2020-01-08 16:37 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-xfs On Wed, Jan 08, 2020 at 12:09:27AM -0800, Christoph Hellwig wrote: > > --- a/fs/xfs/libxfs/xfs_format.h > > +++ b/fs/xfs/libxfs/xfs_format.h > > @@ -1540,6 +1540,7 @@ typedef struct xfs_bmdr_block { > > #define BMBT_BLOCKCOUNT_BITLEN 21 > > > > #define BMBT_STARTOFF_MASK ((1ULL << BMBT_STARTOFF_BITLEN) - 1) > > +#define XFS_MAX_FILEOFF (BMBT_STARTOFF_MASK) > > I think throwing in a comment here would be useful. > > Otherwise the patch looks good to me. Ok, done. --D ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF 2020-01-08 4:17 ` [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF Darrick J. Wong 2020-01-08 8:09 ` Christoph Hellwig @ 2020-01-08 20:40 ` Dave Chinner 2020-01-08 22:32 ` Darrick J. Wong 1 sibling, 1 reply; 14+ messages in thread From: Dave Chinner @ 2020-01-08 20:40 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs On Tue, Jan 07, 2020 at 08:17:38PM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > Introduce a new #define for the maximum supported file block offset. > We'll use this in the next patch to make it more obvious that we're > doing some operation for all possible inode fork mappings after a given > offset. We can't use ULLONG_MAX here because bunmapi uses that to > detect when it's done. > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > --- > fs/xfs/libxfs/xfs_format.h | 1 + > fs/xfs/xfs_reflink.c | 3 ++- > 2 files changed, 3 insertions(+), 1 deletion(-) > > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > index 1b7dcbae051c..c2976e441d43 100644 > --- a/fs/xfs/libxfs/xfs_format.h > +++ b/fs/xfs/libxfs/xfs_format.h > @@ -1540,6 +1540,7 @@ typedef struct xfs_bmdr_block { > #define BMBT_BLOCKCOUNT_BITLEN 21 > > #define BMBT_STARTOFF_MASK ((1ULL << BMBT_STARTOFF_BITLEN) - 1) > +#define XFS_MAX_FILEOFF (BMBT_STARTOFF_MASK) Isn't the maximum file offset in the BMBT the max start offset + the max length of the extent that is located at BMBT_STARTOFF_MASK? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF 2020-01-08 20:40 ` Dave Chinner @ 2020-01-08 22:32 ` Darrick J. Wong 2020-01-08 22:42 ` Dave Chinner 0 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2020-01-08 22:32 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Thu, Jan 09, 2020 at 07:40:41AM +1100, Dave Chinner wrote: > On Tue, Jan 07, 2020 at 08:17:38PM -0800, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > Introduce a new #define for the maximum supported file block offset. > > We'll use this in the next patch to make it more obvious that we're > > doing some operation for all possible inode fork mappings after a given > > offset. We can't use ULLONG_MAX here because bunmapi uses that to > > detect when it's done. > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > --- > > fs/xfs/libxfs/xfs_format.h | 1 + > > fs/xfs/xfs_reflink.c | 3 ++- > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > > index 1b7dcbae051c..c2976e441d43 100644 > > --- a/fs/xfs/libxfs/xfs_format.h > > +++ b/fs/xfs/libxfs/xfs_format.h > > @@ -1540,6 +1540,7 @@ typedef struct xfs_bmdr_block { > > #define BMBT_BLOCKCOUNT_BITLEN 21 > > > > #define BMBT_STARTOFF_MASK ((1ULL << BMBT_STARTOFF_BITLEN) - 1) > > +#define XFS_MAX_FILEOFF (BMBT_STARTOFF_MASK) > > Isn't the maximum file offset in the BMBT the max start offset + the > max length of the extent that is located at BMBT_STARTOFF_MASK? Apologies for responding to a question with another question, but has there ever been an XFS that supported an inode size of more than 8EB? Linux supports at most a file offset of 8EB, which is 2^63-1, or 0x7FFF,FFFF,FFFF,FFFF. On a filesystem with 512-byte blocks, the very last byte in the file would be in block 2^54-1, or 0x3F,FFFF,FFFF,FFFF. Larger blocksizes decrease that even further (e.g. 2^47-1, or 0x7FFF,FFFF,FFFF on 64k block filesystems). Therefore, on Linux I conclude that the largest file offset (block) possible is 2^54-1, which is BMBT_STARTOFF_MASK. Unless there's an XFS port that actually supports 16EB files, BMBT_STARTOFF_MASK will suffice here. --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF 2020-01-08 22:32 ` Darrick J. Wong @ 2020-01-08 22:42 ` Dave Chinner 2020-01-08 23:04 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Dave Chinner @ 2020-01-08 22:42 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs On Wed, Jan 08, 2020 at 02:32:38PM -0800, Darrick J. Wong wrote: > On Thu, Jan 09, 2020 at 07:40:41AM +1100, Dave Chinner wrote: > > On Tue, Jan 07, 2020 at 08:17:38PM -0800, Darrick J. Wong wrote: > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > > > Introduce a new #define for the maximum supported file block offset. > > > We'll use this in the next patch to make it more obvious that we're > > > doing some operation for all possible inode fork mappings after a given > > > offset. We can't use ULLONG_MAX here because bunmapi uses that to > > > detect when it's done. > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > > --- > > > fs/xfs/libxfs/xfs_format.h | 1 + > > > fs/xfs/xfs_reflink.c | 3 ++- > > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > > > index 1b7dcbae051c..c2976e441d43 100644 > > > --- a/fs/xfs/libxfs/xfs_format.h > > > +++ b/fs/xfs/libxfs/xfs_format.h > > > @@ -1540,6 +1540,7 @@ typedef struct xfs_bmdr_block { > > > #define BMBT_BLOCKCOUNT_BITLEN 21 > > > > > > #define BMBT_STARTOFF_MASK ((1ULL << BMBT_STARTOFF_BITLEN) - 1) > > > +#define XFS_MAX_FILEOFF (BMBT_STARTOFF_MASK) > > > > Isn't the maximum file offset in the BMBT the max start offset + the > > max length of the extent that is located at BMBT_STARTOFF_MASK? > > Apologies for responding to a question with another question, but has > there ever been an XFS that supported an inode size of more than 8EB? Doubt it. > Linux supports at most a file offset of 8EB, which is 2^63-1, or > 0x7FFF,FFFF,FFFF,FFFF. On a filesystem with 512-byte blocks, the very > last byte in the file would be in block 2^54-1, or 0x3F,FFFF,FFFF,FFFF. > Larger blocksizes decrease that even further (e.g. 2^47-1, or > 0x7FFF,FFFF,FFFF on 64k block filesystems). > > Therefore, on Linux I conclude that the largest file offset (block) > possible is 2^54-1, which is BMBT_STARTOFF_MASK. Unless there's an > XFS port that actually supports 16EB files, BMBT_STARTOFF_MASK will > suffice here. Sure, but my point was that checks against the max file offset as a block count are applied to the startoff field, not the startoff + blockcount value, so we can potentially get extents on disk beyond the above definition of XFS_MAX_FILEOFF... i.e. startoff can be < XFS_MAX_FILEOFF, but startoff + blockcount can be > XFS_MAX_FILEOFF, and there's nothing in the code that prevents that from occurring... e.g. what's preventing speculative delalloc from going beyond XFS_MAX_FILEOFF, even though the actual file offset that is being written is within XFS_MAX_FILEOFF? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF 2020-01-08 22:42 ` Dave Chinner @ 2020-01-08 23:04 ` Darrick J. Wong 0 siblings, 0 replies; 14+ messages in thread From: Darrick J. Wong @ 2020-01-08 23:04 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Thu, Jan 09, 2020 at 09:42:16AM +1100, Dave Chinner wrote: > On Wed, Jan 08, 2020 at 02:32:38PM -0800, Darrick J. Wong wrote: > > On Thu, Jan 09, 2020 at 07:40:41AM +1100, Dave Chinner wrote: > > > On Tue, Jan 07, 2020 at 08:17:38PM -0800, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > > > > > Introduce a new #define for the maximum supported file block offset. > > > > We'll use this in the next patch to make it more obvious that we're > > > > doing some operation for all possible inode fork mappings after a given > > > > offset. We can't use ULLONG_MAX here because bunmapi uses that to > > > > detect when it's done. > > > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > > > --- > > > > fs/xfs/libxfs/xfs_format.h | 1 + > > > > fs/xfs/xfs_reflink.c | 3 ++- > > > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > > > > index 1b7dcbae051c..c2976e441d43 100644 > > > > --- a/fs/xfs/libxfs/xfs_format.h > > > > +++ b/fs/xfs/libxfs/xfs_format.h > > > > @@ -1540,6 +1540,7 @@ typedef struct xfs_bmdr_block { > > > > #define BMBT_BLOCKCOUNT_BITLEN 21 > > > > > > > > #define BMBT_STARTOFF_MASK ((1ULL << BMBT_STARTOFF_BITLEN) - 1) > > > > +#define XFS_MAX_FILEOFF (BMBT_STARTOFF_MASK) > > > > > > Isn't the maximum file offset in the BMBT the max start offset + the > > > max length of the extent that is located at BMBT_STARTOFF_MASK? > > > > Apologies for responding to a question with another question, but has > > there ever been an XFS that supported an inode size of more than 8EB? > > Doubt it. > > > Linux supports at most a file offset of 8EB, which is 2^63-1, or > > 0x7FFF,FFFF,FFFF,FFFF. On a filesystem with 512-byte blocks, the very > > last byte in the file would be in block 2^54-1, or 0x3F,FFFF,FFFF,FFFF. > > Larger blocksizes decrease that even further (e.g. 2^47-1, or > > 0x7FFF,FFFF,FFFF on 64k block filesystems). > > > > Therefore, on Linux I conclude that the largest file offset (block) > > possible is 2^54-1, which is BMBT_STARTOFF_MASK. Unless there's an > > XFS port that actually supports 16EB files, BMBT_STARTOFF_MASK will > > suffice here. > > Sure, but my point was that checks against the max file offset > as a block count are applied to the startoff field, not the > startoff + blockcount value, so we can potentially get extents on > disk beyond the above definition of XFS_MAX_FILEOFF... > > i.e. startoff can be < XFS_MAX_FILEOFF, but startoff + blockcount > can be > XFS_MAX_FILEOFF, and there's nothing in the code that > prevents that from occurring... > > e.g. what's preventing speculative delalloc from going beyond > XFS_MAX_FILEOFF, even though the actual file offset that is being > written is within XFS_MAX_FILEOFF? I thought we constrained the @prealloc_blocks argument to xfs_bmapi_reserve_delalloc based on s_maxbytes: p_end_fsb = min(p_end_fsb, XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes)); But as there's nothing in the verifiers or anywhere else prohibiting this, I'll change it to (BMBT_STARTOFF_MASK + MAXEXTLEN) for now. --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/3] xfs: truncate should remove all blocks, not just to the end of the page cache 2020-01-08 4:17 [PATCH v2 0/3] xfs: fix maxbytes problems on 32-bit systems Darrick J. Wong 2020-01-08 4:17 ` [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF Darrick J. Wong @ 2020-01-08 4:17 ` Darrick J. Wong 2020-01-08 8:11 ` Christoph Hellwig 2020-01-08 4:17 ` [PATCH 3/3] xfs: fix s_maxbytes computation on 32-bit kernels Darrick J. Wong 2 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2020-01-08 4:17 UTC (permalink / raw) To: darrick.wong; +Cc: linux-xfs From: Darrick J. Wong <darrick.wong@oracle.com> xfs_itruncate_extents_flags() is supposed to unmap every block in a file from EOF onwards. Oddly, it uses s_maxbytes as the upper limit to the bunmapi range, even though s_maxbytes reflects the highest offset the pagecache can support, not the highest offset that XFS supports. The result of this confusion is that if you create a 20T file on a 64-bit machine, mount the filesystem on a 32-bit machine, and remove the file, we leak everything above 16T. Fix this by capping the bunmapi request at the maximum possible block offset, not s_maxbytes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- fs/xfs/xfs_inode.c | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index fc3aec26ef87..79799ab30c93 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1518,7 +1518,6 @@ xfs_itruncate_extents_flags( struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp = *tpp; xfs_fileoff_t first_unmap_block; - xfs_fileoff_t last_block; xfs_filblks_t unmap_len; int error = 0; @@ -1540,21 +1539,21 @@ xfs_itruncate_extents_flags( * the end of the file (in a crash where the space is allocated * but the inode size is not yet updated), simply remove any * blocks which show up between the new EOF and the maximum - * possible file size. If the first block to be removed is - * beyond the maximum file size (ie it is the same as last_block), - * then there is nothing to do. + * possible file size. + * + * We have to free all the blocks to the bmbt maximum offset, even if + * the page cache can't scale that far. */ first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size); - last_block = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes); - if (first_unmap_block == last_block) + if (first_unmap_block == XFS_MAX_FILEOFF) return 0; - ASSERT(first_unmap_block < last_block); - unmap_len = last_block - first_unmap_block + 1; - while (!done) { + ASSERT(first_unmap_block < XFS_MAX_FILEOFF); + unmap_len = XFS_MAX_FILEOFF - first_unmap_block + 1; + while (unmap_len > 0) { ASSERT(tp->t_firstblock == NULLFSBLOCK); - error = xfs_bunmapi(tp, ip, first_unmap_block, unmap_len, flags, - XFS_ITRUNC_MAX_EXTENTS, &done); + error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len, + flags, XFS_ITRUNC_MAX_EXTENTS); if (error) goto out; @@ -1574,7 +1573,7 @@ xfs_itruncate_extents_flags( if (whichfork == XFS_DATA_FORK) { /* Remove all pending CoW reservations. */ error = xfs_reflink_cancel_cow_blocks(ip, &tp, - first_unmap_block, last_block, true); + first_unmap_block, XFS_MAX_FILEOFF, true); if (error) goto out; ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] xfs: truncate should remove all blocks, not just to the end of the page cache 2020-01-08 4:17 ` [PATCH 2/3] xfs: truncate should remove all blocks, not just to the end of the page cache Darrick J. Wong @ 2020-01-08 8:11 ` Christoph Hellwig 2020-01-08 16:37 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2020-01-08 8:11 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs On Tue, Jan 07, 2020 at 08:17:45PM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > xfs_itruncate_extents_flags() is supposed to unmap every block in a file > from EOF onwards. Oddly, it uses s_maxbytes as the upper limit to the > bunmapi range, even though s_maxbytes reflects the highest offset the > pagecache can support, not the highest offset that XFS supports. > > The result of this confusion is that if you create a 20T file on a > 64-bit machine, mount the filesystem on a 32-bit machine, and remove the > file, we leak everything above 16T. Fix this by capping the bunmapi > request at the maximum possible block offset, not s_maxbytes. > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > --- > fs/xfs/xfs_inode.c | 23 +++++++++++------------ > 1 file changed, 11 insertions(+), 12 deletions(-) > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c > index fc3aec26ef87..79799ab30c93 100644 > --- a/fs/xfs/xfs_inode.c > +++ b/fs/xfs/xfs_inode.c > @@ -1518,7 +1518,6 @@ xfs_itruncate_extents_flags( > struct xfs_mount *mp = ip->i_mount; > struct xfs_trans *tp = *tpp; > xfs_fileoff_t first_unmap_block; > - xfs_fileoff_t last_block; > xfs_filblks_t unmap_len; > int error = 0; > > @@ -1540,21 +1539,21 @@ xfs_itruncate_extents_flags( > * the end of the file (in a crash where the space is allocated > * but the inode size is not yet updated), simply remove any > * blocks which show up between the new EOF and the maximum > - * possible file size. If the first block to be removed is > - * beyond the maximum file size (ie it is the same as last_block), > - * then there is nothing to do. > + * possible file size. > + * > + * We have to free all the blocks to the bmbt maximum offset, even if > + * the page cache can't scale that far. > */ > first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size); > - last_block = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes); > - if (first_unmap_block == last_block) > + if (first_unmap_block == XFS_MAX_FILEOFF) > return 0; > > - ASSERT(first_unmap_block < last_block); > - unmap_len = last_block - first_unmap_block + 1; > - while (!done) { > + ASSERT(first_unmap_block < XFS_MAX_FILEOFF); Instead of the assert we could just do the early return for first_unmap_block >= XFS_MAX_FILEOFF and throw in a WARN_ON_ONCE, as that condition really should be nothing but a sanity check. Otherwise this looks good to me. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] xfs: truncate should remove all blocks, not just to the end of the page cache 2020-01-08 8:11 ` Christoph Hellwig @ 2020-01-08 16:37 ` Darrick J. Wong 0 siblings, 0 replies; 14+ messages in thread From: Darrick J. Wong @ 2020-01-08 16:37 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-xfs On Wed, Jan 08, 2020 at 12:11:57AM -0800, Christoph Hellwig wrote: > On Tue, Jan 07, 2020 at 08:17:45PM -0800, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > xfs_itruncate_extents_flags() is supposed to unmap every block in a file > > from EOF onwards. Oddly, it uses s_maxbytes as the upper limit to the > > bunmapi range, even though s_maxbytes reflects the highest offset the > > pagecache can support, not the highest offset that XFS supports. > > > > The result of this confusion is that if you create a 20T file on a > > 64-bit machine, mount the filesystem on a 32-bit machine, and remove the > > file, we leak everything above 16T. Fix this by capping the bunmapi > > request at the maximum possible block offset, not s_maxbytes. > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > --- > > fs/xfs/xfs_inode.c | 23 +++++++++++------------ > > 1 file changed, 11 insertions(+), 12 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c > > index fc3aec26ef87..79799ab30c93 100644 > > --- a/fs/xfs/xfs_inode.c > > +++ b/fs/xfs/xfs_inode.c > > @@ -1518,7 +1518,6 @@ xfs_itruncate_extents_flags( > > struct xfs_mount *mp = ip->i_mount; > > struct xfs_trans *tp = *tpp; > > xfs_fileoff_t first_unmap_block; > > - xfs_fileoff_t last_block; > > xfs_filblks_t unmap_len; > > int error = 0; > > > > @@ -1540,21 +1539,21 @@ xfs_itruncate_extents_flags( > > * the end of the file (in a crash where the space is allocated > > * but the inode size is not yet updated), simply remove any > > * blocks which show up between the new EOF and the maximum > > - * possible file size. If the first block to be removed is > > - * beyond the maximum file size (ie it is the same as last_block), > > - * then there is nothing to do. > > + * possible file size. > > + * > > + * We have to free all the blocks to the bmbt maximum offset, even if > > + * the page cache can't scale that far. > > */ > > first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size); > > - last_block = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes); > > - if (first_unmap_block == last_block) > > + if (first_unmap_block == XFS_MAX_FILEOFF) > > return 0; > > > > - ASSERT(first_unmap_block < last_block); > > - unmap_len = last_block - first_unmap_block + 1; > > - while (!done) { > > + ASSERT(first_unmap_block < XFS_MAX_FILEOFF); > > Instead of the assert we could just do the early return for > > first_unmap_block >= XFS_MAX_FILEOFF > > and throw in a WARN_ON_ONCE, as that condition really should be nothing > but a sanity check. > > Otherwise this looks good to me. Ok, done. --D ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] xfs: fix s_maxbytes computation on 32-bit kernels 2020-01-08 4:17 [PATCH v2 0/3] xfs: fix maxbytes problems on 32-bit systems Darrick J. Wong 2020-01-08 4:17 ` [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF Darrick J. Wong 2020-01-08 4:17 ` [PATCH 2/3] xfs: truncate should remove all blocks, not just to the end of the page cache Darrick J. Wong @ 2020-01-08 4:17 ` Darrick J. Wong 2020-01-08 8:12 ` Christoph Hellwig 2 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2020-01-08 4:17 UTC (permalink / raw) To: darrick.wong; +Cc: linux-xfs From: Darrick J. Wong <darrick.wong@oracle.com> I observed a hang in generic/308 while running fstests on a i686 kernel. The hang occurred when trying to purge the pagecache on a large sparse file that had a page created past MAX_LFS_FILESIZE, which caused an integer overflow in the pagecache xarray and resulted in an infinite loop. I then noticed that Linus changed the definition of MAX_LFS_FILESIZE in commit 0cc3b0ec23ce ("Clarify (and fix) MAX_LFS_FILESIZE macros") so that it is now one page short of the maximum page index on 32-bit kernels. Because the XFS function to compute max offset open-codes the 2005-era MAX_LFS_FILESIZE computation and neither the vfs nor mm perform any sanity checking of s_maxbytes, the code in generic/308 can create a page above the pagecache's limit and kaboom. Fix all this by setting s_maxbytes to MAX_LFS_FILESIZE directly and aborting the mount with a warning if our assumptions ever break. I have no answer for why this seems to have been broken for years and nobody noticed. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- fs/xfs/xfs_super.c | 48 +++++++++++++++++++++--------------------------- 1 file changed, 21 insertions(+), 27 deletions(-) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index d9ae27ddf253..05ca2a7bbe55 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -193,32 +193,6 @@ xfs_fs_show_options( return 0; } -static uint64_t -xfs_max_file_offset( - unsigned int blockshift) -{ - unsigned int pagefactor = 1; - unsigned int bitshift = BITS_PER_LONG - 1; - - /* Figure out maximum filesize, on Linux this can depend on - * the filesystem blocksize (on 32 bit platforms). - * __block_write_begin does this in an [unsigned] long long... - * page->index << (PAGE_SHIFT - bbits) - * So, for page sized blocks (4K on 32 bit platforms), - * this wraps at around 8Tb (hence MAX_LFS_FILESIZE which is - * (((u64)PAGE_SIZE << (BITS_PER_LONG-1))-1) - * but for smaller blocksizes it is less (bbits = log2 bsize). - */ - -#if BITS_PER_LONG == 32 - ASSERT(sizeof(sector_t) == 8); - pagefactor = PAGE_SIZE; - bitshift = BITS_PER_LONG; -#endif - - return (((uint64_t)pagefactor) << bitshift) - 1; -} - /* * Set parameters for inode allocation heuristics, taking into account * filesystem size and inode32/inode64 mount options; i.e. specifically @@ -1424,6 +1398,26 @@ xfs_fc_fill_super( if (error) goto out_free_sb; + /* + * XFS block mappings use 54 bits to store the logical block offset. + * This should suffice to handle the maximum file size that the VFS + * supports (currently 2^63 bytes on 64-bit and ULONG_MAX << PAGE_SHIFT + * bytes on 32-bit), but as XFS and VFS have gotten the s_maxbytes + * calculation wrong on 32-bit kernels in the past, we'll add a WARN_ON + * to check this assertion. + * + * Avoid integer overflow by comparing the maximum bmbt offset to the + * maximum pagecache offset in units of fs blocks. + */ + if (XFS_MAX_FILEOFF < XFS_B_TO_FSBT(mp, MAX_LFS_FILESIZE)) { + xfs_warn(mp, +"MAX_LFS_FILESIZE block offset (%llu) exceeds extent map maximum (%llu)!", + XFS_B_TO_FSBT(mp, MAX_LFS_FILESIZE), + XFS_MAX_FILEOFF); + error = -EINVAL; + goto out_free_sb; + } + error = xfs_filestream_mount(mp); if (error) goto out_free_sb; @@ -1435,7 +1429,7 @@ xfs_fc_fill_super( sb->s_magic = XFS_SUPER_MAGIC; sb->s_blocksize = mp->m_sb.sb_blocksize; sb->s_blocksize_bits = ffs(sb->s_blocksize) - 1; - sb->s_maxbytes = xfs_max_file_offset(sb->s_blocksize_bits); + sb->s_maxbytes = MAX_LFS_FILESIZE; sb->s_max_links = XFS_MAXLINK; sb->s_time_gran = 1; sb->s_time_min = S32_MIN; ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] xfs: fix s_maxbytes computation on 32-bit kernels 2020-01-08 4:17 ` [PATCH 3/3] xfs: fix s_maxbytes computation on 32-bit kernels Darrick J. Wong @ 2020-01-08 8:12 ` Christoph Hellwig 0 siblings, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2020-01-08 8:12 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v3 0/3] xfs: fix maxbytes problems on 32-bit systems @ 2020-01-09 18:44 Darrick J. Wong 2020-01-09 18:44 ` [PATCH 3/3] xfs: fix s_maxbytes computation on 32-bit kernels Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2020-01-09 18:44 UTC (permalink / raw) To: darrick.wong; +Cc: linux-xfs Hi all, Here's a series of patches to fix some problems related to s_maxbytes that I found by running fstests in a 32-bit VM. The first step is to fix bunmapi during inactivation so that it cleans out /all/ the blocks of an unlinked file instead of leaking them; and the second is to fix the s_maxbytes computation to avoid setting a limit larger than what the pagecache supports. This has been lightly tested with fstests. Enjoy! Comments and questions are, as always, welcome. --D ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] xfs: fix s_maxbytes computation on 32-bit kernels 2020-01-09 18:44 [PATCH v3 0/3] xfs: fix maxbytes problems on 32-bit systems Darrick J. Wong @ 2020-01-09 18:44 ` Darrick J. Wong 0 siblings, 0 replies; 14+ messages in thread From: Darrick J. Wong @ 2020-01-09 18:44 UTC (permalink / raw) To: darrick.wong; +Cc: linux-xfs, Christoph Hellwig From: Darrick J. Wong <darrick.wong@oracle.com> I observed a hang in generic/308 while running fstests on a i686 kernel. The hang occurred when trying to purge the pagecache on a large sparse file that had a page created past MAX_LFS_FILESIZE, which caused an integer overflow in the pagecache xarray and resulted in an infinite loop. I then noticed that Linus changed the definition of MAX_LFS_FILESIZE in commit 0cc3b0ec23ce ("Clarify (and fix) MAX_LFS_FILESIZE macros") so that it is now one page short of the maximum page index on 32-bit kernels. Because the XFS function to compute max offset open-codes the 2005-era MAX_LFS_FILESIZE computation and neither the vfs nor mm perform any sanity checking of s_maxbytes, the code in generic/308 can create a page above the pagecache's limit and kaboom. Fix all this by setting s_maxbytes to MAX_LFS_FILESIZE directly and aborting the mount with a warning if our assumptions ever break. I have no answer for why this seems to have been broken for years and nobody noticed. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_super.c | 48 +++++++++++++++++++++--------------------------- 1 file changed, 21 insertions(+), 27 deletions(-) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index d9ae27ddf253..760901783944 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -193,32 +193,6 @@ xfs_fs_show_options( return 0; } -static uint64_t -xfs_max_file_offset( - unsigned int blockshift) -{ - unsigned int pagefactor = 1; - unsigned int bitshift = BITS_PER_LONG - 1; - - /* Figure out maximum filesize, on Linux this can depend on - * the filesystem blocksize (on 32 bit platforms). - * __block_write_begin does this in an [unsigned] long long... - * page->index << (PAGE_SHIFT - bbits) - * So, for page sized blocks (4K on 32 bit platforms), - * this wraps at around 8Tb (hence MAX_LFS_FILESIZE which is - * (((u64)PAGE_SIZE << (BITS_PER_LONG-1))-1) - * but for smaller blocksizes it is less (bbits = log2 bsize). - */ - -#if BITS_PER_LONG == 32 - ASSERT(sizeof(sector_t) == 8); - pagefactor = PAGE_SIZE; - bitshift = BITS_PER_LONG; -#endif - - return (((uint64_t)pagefactor) << bitshift) - 1; -} - /* * Set parameters for inode allocation heuristics, taking into account * filesystem size and inode32/inode64 mount options; i.e. specifically @@ -1424,6 +1398,26 @@ xfs_fc_fill_super( if (error) goto out_free_sb; + /* + * XFS block mappings use 54 bits to store the logical block offset. + * This should suffice to handle the maximum file size that the VFS + * supports (currently 2^63 bytes on 64-bit and ULONG_MAX << PAGE_SHIFT + * bytes on 32-bit), but as XFS and VFS have gotten the s_maxbytes + * calculation wrong on 32-bit kernels in the past, we'll add a WARN_ON + * to check this assertion. + * + * Avoid integer overflow by comparing the maximum bmbt offset to the + * maximum pagecache offset in units of fs blocks. + */ + if (XFS_B_TO_FSBT(mp, MAX_LFS_FILESIZE) > XFS_MAX_FILEOFF) { + xfs_warn(mp, +"MAX_LFS_FILESIZE block offset (%llu) exceeds extent map maximum (%llu)!", + XFS_B_TO_FSBT(mp, MAX_LFS_FILESIZE), + XFS_MAX_FILEOFF); + error = -EINVAL; + goto out_free_sb; + } + error = xfs_filestream_mount(mp); if (error) goto out_free_sb; @@ -1435,7 +1429,7 @@ xfs_fc_fill_super( sb->s_magic = XFS_SUPER_MAGIC; sb->s_blocksize = mp->m_sb.sb_blocksize; sb->s_blocksize_bits = ffs(sb->s_blocksize) - 1; - sb->s_maxbytes = xfs_max_file_offset(sb->s_blocksize_bits); + sb->s_maxbytes = MAX_LFS_FILESIZE; sb->s_max_links = XFS_MAXLINK; sb->s_time_gran = 1; sb->s_time_min = S32_MIN; ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-01-09 18:46 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-01-08 4:17 [PATCH v2 0/3] xfs: fix maxbytes problems on 32-bit systems Darrick J. Wong 2020-01-08 4:17 ` [PATCH 1/3] xfs: introduce XFS_MAX_FILEOFF Darrick J. Wong 2020-01-08 8:09 ` Christoph Hellwig 2020-01-08 16:37 ` Darrick J. Wong 2020-01-08 20:40 ` Dave Chinner 2020-01-08 22:32 ` Darrick J. Wong 2020-01-08 22:42 ` Dave Chinner 2020-01-08 23:04 ` Darrick J. Wong 2020-01-08 4:17 ` [PATCH 2/3] xfs: truncate should remove all blocks, not just to the end of the page cache Darrick J. Wong 2020-01-08 8:11 ` Christoph Hellwig 2020-01-08 16:37 ` Darrick J. Wong 2020-01-08 4:17 ` [PATCH 3/3] xfs: fix s_maxbytes computation on 32-bit kernels Darrick J. Wong 2020-01-08 8:12 ` Christoph Hellwig 2020-01-09 18:44 [PATCH v3 0/3] xfs: fix maxbytes problems on 32-bit systems Darrick J. Wong 2020-01-09 18:44 ` [PATCH 3/3] xfs: fix s_maxbytes computation on 32-bit kernels Darrick J. Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).