All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH V8 15/19] xfs: Directory's data fork extent counter can never overflow
Date: Tue, 29 Mar 2022 20:43:33 -0700	[thread overview]
Message-ID: <20220330034333.GG27690@magnolia> (raw)
In-Reply-To: <20220329062340.GY1544202@dread.disaster.area>

On Tue, Mar 29, 2022 at 05:23:40PM +1100, Dave Chinner wrote:
> On Tue, Mar 29, 2022 at 10:52:04AM +0530, Chandan Babu R wrote:
> > On 25 Mar 2022 at 03:44, Dave Chinner wrote:
> > > On Mon, Mar 21, 2022 at 10:47:46AM +0530, Chandan Babu R wrote:
> > >> The maximum file size that can be represented by the data fork extent counter
> > >> in the worst case occurs when all extents are 1 block in length and each block
> > >> is 1KB in size.
> > >> 
> > >> With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
> > >> 1KB sized blocks, a file can reach upto,
> > >> (2^31) * 1KB = 2TB
> > >> 
> > >> This is much larger than the theoretical maximum size of a directory
> > >> i.e. 32GB * 3 = 96GB.
> > >> 
> > >> Since a directory's inode can never overflow its data fork extent counter,
> > >> this commit replaces checking the return value of
> > >> xfs_iext_count_may_overflow() with calls to ASSERT(error == 0).
> > >
> > > I'd really prefer that we don't add noise like this to a bunch of
> > > call sites.  If directories can't overflow the extent count in
> > > normal operation, then why are we even calling
> > > xfs_iext_count_may_overflow() in these paths? i.e. an overflow would
> > > be a sign of an inode corruption, and we should have flagged that
> > > long before we do an operation that might overflow the extent count.
> > >
> > > So, really, I think you should document the directory size
> > > constraints at the site where we define all the large extent count
> > > values in xfs_format.h, remove the xfs_iext_count_may_overflow()
> > > checks from the directory code and replace them with a simple inode
> > > verifier check that we haven't got more than 100GB worth of
> > > individual extents in the data fork for directory inodes....
> > 
> > I don't think that we could trivially verify if the extents in a directory's
> > data fork add up to more than 96GB.
> 
> dip->di_nextents tells us how many extents there are in the data
> fork, we know what the block size of the filesystem is, so it should
> be pretty easy to calculate a maximum extent count for 96GB of
> space. i.e. absolute maximum valid dir data fork extent count
> is (96GB / blocksize).
> 
> > 
> > xfs_dinode->di_size tracks the size of XFS_DIR2_DATA_SPACE. This also includes
> > holes that could be created by freeing directory entries in a single directory
> > block. Also, there is no easy method to determine the space occupied by
> > XFS_DIR2_LEAF_SPACE and XFS_DIR2_FREE_SPACE segments of a directory.
> 
> Sure there is. We do this sort of calc for things like transaction
> reservations via definitions like XFS_DA_NODE_MAXDEPTH. That tells us

Hmmm.  Seeing as I just replaced XFS_BTREE_MAXLEVELS with dynamic limits
set for each filesytem, is XFS_DA_NODE_MAXDEPTH even appropriate for
modern filesystems?  We're about to start allowing far more extended
attributes in the form of parent pointers, so we should be careful about
this.

For a directory, there can be at most 32GB of directory entries, so the
maximum number of directory entries is...

32GB / (directory block size) * (max entries per dir block)

The dabtree stores (u32 hash, u32 offset) records, so I guess it
wouldn't be so hard to compute the number of blocks needed for each node
level until we only need one block, and that's our real
XFS_DA_NODE_MAXEPTH.

But then the second question is: what's the maximum height of a dabtree
that indexes an xattr structure?  I don't think there's any maximum
limit within XFS on the number of attrs you can set on a file, is there?
At least until you hit the iext_max_count check.  I think the VFS
institutes its own limit of 64k for the llistxattr buffer, but that's
about all I can think of.

I suppose right now the xattr structure can't grow larger than 2^(16+21)
blocks in size, which is 2^49 bytes, but that's a mix of attr leaves and
dabtree blocks, unlike directories, right?

> immediately how many blocks can be in the XFS_DIR2_LEAF_SPACE
> segement....
> 
> We also know the maximum number of individual directory blocks in
> the 32GB segment (fixed at 32GB / dir block size), so the free space
> array is also a fixed size at (32GB / dir block size / free space
> entries per block).
> 
> It's easy to just use (96GB / block size) and that will catch most
> corruptions with no risk of a false positive detection, but we could
> quite easily refine this to something like:
> 
> data	(32GB +				
> leaf	 btree blocks(XFS_DA_NODE_MAXDEPTH) +
> freesp	 (32GB / free space records per block))
> frags					/ filesystem block size

I think we ought to do a more careful study of XFS_DA_NODE_MAXDEPTH, but
it could become more involved than we think.  In the interest of keeping
this series moving, can we start with a new verifier check that
(di_nextents < that formula from above) and then refine that based on
whatever improvements we may or may not come up with for
XFS_DA_NODE_MAXDEPTH?

> 
> > May be the following can be added to xfs_dinode_verify(),
> > 
> > 	if (S_ISDIR(mode) && ((xfs_dinode->di_size + 2 * 32GB) > 96GB))
> >     		return __this_address
> 
> That doesn't validate that the on disk or in-memory di_nextents
> value is withing the known valid range or not. We can do that
> directly (as per above), so we shouldn't need a hueristic like this.

Indeed, inode size is not a good proxy variable for extent count.

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

  reply	other threads:[~2022-03-30  3:43 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-21  5:17 [PATCH V8 00/19] xfs: Extend per-inode extent counters Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 01/19] xfs: Move extent count limits to xfs_format.h Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 02/19] xfs: Define max extent length based on on-disk format definition Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 03/19] xfs: Introduce xfs_iext_max_nextents() helper Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 04/19] xfs: Use xfs_extnum_t instead of basic data types Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 05/19] xfs: Introduce xfs_dfork_nextents() helper Chandan Babu R
2022-03-24 21:31   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 06/19] xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextents Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 07/19] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively Chandan Babu R
2022-03-24 21:33   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 08/19] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit Chandan Babu R
2022-03-24 21:37   ` Dave Chinner
2022-03-24 21:40     ` Darrick J. Wong
2022-03-25  6:01       ` Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 09/19] xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64 Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 10/19] xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers Chandan Babu R
2022-03-24 21:38   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 11/19] xfs: Use uint64_t to count maximum blocks that can be used by BMBT Chandan Babu R
2022-03-24 21:42   ` Dave Chinner
2022-03-25  6:01     ` Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 13/19] xfs: Replace numbered inode recovery error messages with descriptive ones Chandan Babu R
2022-03-24 21:47   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 14/19] xfs: Introduce per-inode 64-bit extent counters Chandan Babu R
2022-03-24 21:53   ` Dave Chinner
2022-03-25  6:02     ` Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 15/19] xfs: Directory's data fork extent counter can never overflow Chandan Babu R
2022-03-24 22:14   ` Dave Chinner
2022-03-25  6:02     ` Chandan Babu R
2022-03-25  6:37       ` Chandan Babu R
2022-03-29  5:22     ` Chandan Babu R
2022-03-29  6:23       ` Dave Chinner
2022-03-30  3:43         ` Darrick J. Wong [this message]
2022-03-30 15:39           ` Chandan Babu R
2022-03-30 15:52             ` Darrick J. Wong
2022-04-01  1:27           ` Dave Chinner
2022-04-01  7:46             ` Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters Chandan Babu R
2022-03-24 22:28   ` Dave Chinner
2022-03-25  6:02     ` Chandan Babu R
2022-03-21  5:17 ` [PATCH V8 17/19] xfs: Decouple XFS_IBULK flags from XFS_IWALK flags Chandan Babu R
2022-03-24 22:28   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters Chandan Babu R
2022-03-24 22:33   ` Dave Chinner
2022-03-21  5:17 ` [PATCH V8 19/19] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Chandan Babu R

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220330034333.GG27690@magnolia \
    --to=djwong@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.