From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FB23C433F5 for ; Tue, 12 Apr 2022 17:04:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229972AbiDLRHD (ORCPT ); Tue, 12 Apr 2022 13:07:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47258 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346743AbiDLRHC (ORCPT ); Tue, 12 Apr 2022 13:07:02 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94F8C13E8C for ; Tue, 12 Apr 2022 10:04:43 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 216C4B81F15 for ; Tue, 12 Apr 2022 17:04:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B70CAC385A5; Tue, 12 Apr 2022 17:04:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1649783080; bh=UvsivTtyguqqtrVkOTh6Fd/dc5ArNqyBEEr9znZBwNY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=NKWpQ+UB1LcTH2Xmry7pjLFBxDuww2SlPZgV+IgTjyyHAr2KkbgTajlweBld21NMP lwhrjMa/bHqDyDcjWA/8vDcJ8XGjrC2pHmfETDJJ8O+ITo8ApMHbMA6zN0TTHgdHBz vY3UmaKPyLUYXvxGCbefznv9aHt2UHscSca7PSa710RSu9vVQBfRCQGIfL1IBOyg8U jiKvKSFjqN+QBKglS/BFN5DREa5GV09HQ3gB8WofuhmiN70PRGxIaWyQ6qCEOZmhMO QZJUcaT5flsPYQZ5MhXD101RE/4m6Ibg0F3pmi1j7VXt7mYHqk7ggM2mukuHzT8Kx0 U/sdTpuZBVPPA== Date: Tue, 12 Apr 2022 10:04:40 -0700 From: "Darrick J. Wong" To: Chandan Babu R Cc: linux-xfs@vger.kernel.org, david@fromorbit.com Subject: Re: [PATCH V9.2] xfs: Directory's data fork extent counter can never overflow Message-ID: <20220412170440.GD16799@magnolia> References: <20220406061904.595597-16-chandan.babu@oracle.com> <20220412140237.1759506-1-chandan.babu@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220412140237.1759506-1-chandan.babu@oracle.com> Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Tue, Apr 12, 2022 at 07:32:37PM +0530, Chandan Babu R wrote: > The maximum file size that can be represented by the data fork extent counter > in the worst case occurs when all extents are 1 block in length and each block > is 1KB in size. > > With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with > 1KB sized blocks, a file can reach upto, > (2^31) * 1KB = 2TB > > This is much larger than the theoretical maximum size of a directory > i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB. > > Since a directory's inode can never overflow its data fork extent counter, > this commit removes all the overflow checks associated with > it. xfs_dinode_verify() now performs a rough check to verify if a diretory's > data fork is larger than 96GB. > > Reviewed-by: Dave Chinner > Signed-off-by: Chandan Babu R Neato! Reviewed-by: Darrick J. Wong --D > --- > fs/xfs/libxfs/xfs_bmap.c | 20 ------------- > fs/xfs/libxfs/xfs_da_btree.h | 1 + > fs/xfs/libxfs/xfs_da_format.h | 1 + > fs/xfs/libxfs/xfs_dir2.c | 8 +++++ > fs/xfs/libxfs/xfs_format.h | 13 ++++++++ > fs/xfs/libxfs/xfs_inode_buf.c | 3 ++ > fs/xfs/libxfs/xfs_inode_fork.h | 13 -------- > fs/xfs/xfs_inode.c | 55 ++-------------------------------- > fs/xfs/xfs_symlink.c | 5 ---- > 9 files changed, 28 insertions(+), 91 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > index 1254d4d4821e..4fab0c92ab70 100644 > --- a/fs/xfs/libxfs/xfs_bmap.c > +++ b/fs/xfs/libxfs/xfs_bmap.c > @@ -5147,26 +5147,6 @@ xfs_bmap_del_extent_real( > * Deleting the middle of the extent. > */ > > - /* > - * For directories, -ENOSPC is returned since a directory entry > - * remove operation must not fail due to low extent count > - * availability. -ENOSPC will be handled by higher layers of XFS > - * by letting the corresponding empty Data/Free blocks to linger > - * until a future remove operation. Dabtree blocks would be > - * swapped with the last block in the leaf space and then the > - * new last block will be unmapped. > - * > - * The above logic also applies to the source directory entry of > - * a rename operation. > - */ > - error = xfs_iext_count_may_overflow(ip, whichfork, 1); > - if (error) { > - ASSERT(S_ISDIR(VFS_I(ip)->i_mode) && > - whichfork == XFS_DATA_FORK); > - error = -ENOSPC; > - goto done; > - } > - > old = got; > > got.br_blockcount = del->br_startoff - got.br_startoff; > diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h > index 0faf7d9ac241..7f08f6de48bf 100644 > --- a/fs/xfs/libxfs/xfs_da_btree.h > +++ b/fs/xfs/libxfs/xfs_da_btree.h > @@ -30,6 +30,7 @@ struct xfs_da_geometry { > unsigned int free_hdr_size; /* dir2 free header size */ > unsigned int free_max_bests; /* # of bests entries in dir2 free */ > xfs_dablk_t freeblk; /* blockno of free data v2 */ > + xfs_extnum_t max_extents; /* Max. extents in corresponding fork */ > > xfs_dir2_data_aoff_t data_first_offset; > size_t data_entry_offset; > diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h > index 5a49caa5c9df..95354b7ab7f5 100644 > --- a/fs/xfs/libxfs/xfs_da_format.h > +++ b/fs/xfs/libxfs/xfs_da_format.h > @@ -277,6 +277,7 @@ xfs_dir2_sf_firstentry(struct xfs_dir2_sf_hdr *hdr) > * Directory address space divided into sections, > * spaces separated by 32GB. > */ > +#define XFS_DIR2_MAX_SPACES 3 > #define XFS_DIR2_SPACE_SIZE (1ULL << (32 + XFS_DIR2_DATA_ALIGN_LOG)) > #define XFS_DIR2_DATA_SPACE 0 > #define XFS_DIR2_DATA_OFFSET (XFS_DIR2_DATA_SPACE * XFS_DIR2_SPACE_SIZE) > diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c > index 5f1e4799e8fa..3cd51fa3837b 100644 > --- a/fs/xfs/libxfs/xfs_dir2.c > +++ b/fs/xfs/libxfs/xfs_dir2.c > @@ -150,6 +150,8 @@ xfs_da_mount( > dageo->freeblk = xfs_dir2_byte_to_da(dageo, XFS_DIR2_FREE_OFFSET); > dageo->node_ents = (dageo->blksize - dageo->node_hdr_size) / > (uint)sizeof(xfs_da_node_entry_t); > + dageo->max_extents = (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >> > + mp->m_sb.sb_blocklog; > dageo->magicpct = (dageo->blksize * 37) / 100; > > /* set up attribute geometry - single fsb only */ > @@ -161,6 +163,12 @@ xfs_da_mount( > dageo->node_hdr_size = mp->m_dir_geo->node_hdr_size; > dageo->node_ents = (dageo->blksize - dageo->node_hdr_size) / > (uint)sizeof(xfs_da_node_entry_t); > + > + if (xfs_has_large_extent_counts(mp)) > + dageo->max_extents = XFS_MAX_EXTCNT_ATTR_FORK_LARGE; > + else > + dageo->max_extents = XFS_MAX_EXTCNT_ATTR_FORK_SMALL; > + > dageo->magicpct = (dageo->blksize * 37) / 100; > return 0; > } > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > index 82b404c99b80..43de892d0305 100644 > --- a/fs/xfs/libxfs/xfs_format.h > +++ b/fs/xfs/libxfs/xfs_format.h > @@ -915,6 +915,19 @@ enum xfs_dinode_fmt { > * > * Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence > * 2^48 was chosen as the maximum data fork extent count. > + * > + * The maximum file size that can be represented by the data fork extent counter > + * in the worst case occurs when all extents are 1 block in length and each > + * block is 1KB in size. > + * > + * With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and > + * with 1KB sized blocks, a file can reach upto, > + * 1KB * (2^31) = 2TB > + * > + * This is much larger than the theoretical maximum size of a directory > + * i.e. XFS_DIR2_SPACE_SIZE * XFS_DIR2_MAX_SPACES = ~96GB. > + * > + * Hence, a directory inode can never overflow its data fork extent counter. > */ > #define XFS_MAX_EXTCNT_DATA_FORK_LARGE ((xfs_extnum_t)((1ULL << 48) - 1)) > #define XFS_MAX_EXTCNT_ATTR_FORK_LARGE ((xfs_extnum_t)((1ULL << 32) - 1)) > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index ee8d4eb7d048..74b82ec80f8e 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -491,6 +491,9 @@ xfs_dinode_verify( > if (mode && nextents + naextents > nblocks) > return __this_address; > > + if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents) > + return __this_address; > + > if (mode && XFS_DFORK_BOFF(dip) > mp->m_sb.sb_inodesize) > return __this_address; > > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h > index fd5c3c2d77e0..6f9d69f8896e 100644 > --- a/fs/xfs/libxfs/xfs_inode_fork.h > +++ b/fs/xfs/libxfs/xfs_inode_fork.h > @@ -39,19 +39,6 @@ struct xfs_ifork { > */ > #define XFS_IEXT_PUNCH_HOLE_CNT (1) > > -/* > - * Directory entry addition can cause the following, > - * 1. Data block can be added/removed. > - * A new extent can cause extent count to increase by 1. > - * 2. Free disk block can be added/removed. > - * Same behaviour as described above for Data block. > - * 3. Dabtree blocks. > - * XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new > - * extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH. > - */ > -#define XFS_IEXT_DIR_MANIP_CNT(mp) \ > - ((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount) > - > /* > * Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to > * be added. One extra extent for dabtree in case a local attr is > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c > index adc1355ce853..20f15a0393e1 100644 > --- a/fs/xfs/xfs_inode.c > +++ b/fs/xfs/xfs_inode.c > @@ -1024,11 +1024,6 @@ xfs_create( > xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT); > unlock_dp_on_error = true; > > - error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK, > - XFS_IEXT_DIR_MANIP_CNT(mp)); > - if (error) > - goto out_trans_cancel; > - > /* > * A newly created regular or special file just has one directory > * entry pointing to them, but a directory also the "." entry > @@ -1242,11 +1237,6 @@ xfs_link( > if (error) > goto std_return; > > - error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK, > - XFS_IEXT_DIR_MANIP_CNT(mp)); > - if (error) > - goto error_return; > - > /* > * If we are using project inheritance, we only allow hard link > * creation in our tree when the project IDs are the same; else > @@ -3210,35 +3200,6 @@ xfs_rename( > /* > * Check for expected errors before we dirty the transaction > * so we can return an error without a transaction abort. > - * > - * Extent count overflow check: > - * > - * From the perspective of src_dp, a rename operation is essentially a > - * directory entry remove operation. Hence the only place where we check > - * for extent count overflow for src_dp is in > - * xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns > - * -ENOSPC when it detects a possible extent count overflow and in > - * response, the higher layers of directory handling code do the > - * following: > - * 1. Data/Free blocks: XFS lets these blocks linger until a > - * future remove operation removes them. > - * 2. Dabtree blocks: XFS swaps the blocks with the last block in the > - * Leaf space and unmaps the last block. > - * > - * For target_dp, there are two cases depending on whether the > - * destination directory entry exists or not. > - * > - * When destination directory entry does not exist (i.e. target_ip == > - * NULL), extent count overflow check is performed only when transaction > - * has a non-zero sized space reservation associated with it. With a > - * zero-sized space reservation, XFS allows a rename operation to > - * continue only when the directory has sufficient free space in its > - * data/leaf/free space blocks to hold the new entry. > - * > - * When destination directory entry exists (i.e. target_ip != NULL), all > - * we need to do is change the inode number associated with the already > - * existing entry. Hence there is no need to perform an extent count > - * overflow check. > */ > if (target_ip == NULL) { > /* > @@ -3249,12 +3210,6 @@ xfs_rename( > error = xfs_dir_canenter(tp, target_dp, target_name); > if (error) > goto out_trans_cancel; > - } else { > - error = xfs_iext_count_may_overflow(target_dp, > - XFS_DATA_FORK, > - XFS_IEXT_DIR_MANIP_CNT(mp)); > - if (error) > - goto out_trans_cancel; > } > } else { > /* > @@ -3422,18 +3377,12 @@ xfs_rename( > * inode number of the whiteout inode rather than removing it > * altogether. > */ > - if (wip) { > + if (wip) > error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino, > spaceres); > - } else { > - /* > - * NOTE: We don't need to check for extent count overflow here > - * because the dir remove name code will leave the dir block in > - * place if the extent count would overflow. > - */ > + else > error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino, > spaceres); > - } > > if (error) > goto out_trans_cancel; > diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c > index affbedf78160..4145ba872547 100644 > --- a/fs/xfs/xfs_symlink.c > +++ b/fs/xfs/xfs_symlink.c > @@ -226,11 +226,6 @@ xfs_symlink( > goto out_trans_cancel; > } > > - error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK, > - XFS_IEXT_DIR_MANIP_CNT(mp)); > - if (error) > - goto out_trans_cancel; > - > /* > * Allocate an inode for the symlink. > */ > -- > 2.30.2 >