* [PATCH V9 00/19] xfs: Extend per-inode extent counters
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david
The commit "xfs: fix inode fork extent count overflow"
(3f8a4f1d876d3e3e49e50b0396eaffcc4ba71b08) mentions that it should be
possible to create 10 billion data fork extents. However, the
corresponding on-disk field has a signed 32-bit type, which limits the
data fork to 2^31 - 1 (about 2.1 billion) extents. Hence this patchset
extends the per-inode data fork extent counter to 64 bits (of which 48
bits are used to store the extent count).
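As a rough illustration of the arithmetic (a sketch only: the macro and
function names below are hypothetical and do not exist in the kernel),
10 billion extents do not fit in a signed 32-bit counter but fit
comfortably in 48 bits:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical names for this sketch. OLD_DATA_FORK_MAX models the
 * signed 32-bit on-disk field; NEW_DATA_FORK_MAX models the 48 bits
 * of the new 64-bit field that hold the extent count.
 */
#define OLD_DATA_FORK_MAX ((uint64_t)INT32_MAX)     /* 2^31 - 1 */
#define NEW_DATA_FORK_MAX ((UINT64_C(1) << 48) - 1) /* 2^48 - 1 */

/* 10 billion extents, the figure quoted from commit 3f8a4f1d876d. */
#define TEN_BILLION UINT64_C(10000000000)

static_assert(TEN_BILLION > OLD_DATA_FORK_MAX,
	      "does not fit in a signed 32-bit counter");
static_assert(TEN_BILLION <= NEW_DATA_FORK_MAX,
	      "fits in a 48-bit counter");

/* Return non-zero if @nextents can be stored in a counter capped at @max. */
static inline int extcnt_fits(uint64_t nextents, uint64_t max)
{
	return nextents <= max;
}
```

2^48 - 1 is roughly 2.8 * 10^14, so the new counter leaves several
orders of magnitude of headroom above the 10 billion figure.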
Also, XFS has an attribute fork extent counter which is 16 bits
wide. A workload that,
1. Creates 1 million 255-byte sized xattrs,
2. Deletes 50% of these xattrs in an alternating manner,
3. Tries to insert 400,000 new 255-byte sized xattrs
causes the xattr extent counter to overflow.
Dave tells me that there are instances where a single file has more
than 100 million hardlinks. With parent pointers being stored in
xattrs, we will overflow the signed 16-bit attribute extent
counter when a large number of hardlinks are created. Hence this
patchset extends the on-disk field to 32 bits.
The following changes are made to accomplish this,
1. A 64-bit inode field is carved out of existing di_pad and
di_flushiter fields to hold the 64-bit data fork extent counter.
2. The existing 32-bit inode data fork extent counter will be used to
hold the attribute fork extent counter.
3. A new incompat superblock flag is added to prevent older kernels from
mounting the filesystem.
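The following toy model sketches how a reader might pick the counters
depending on whether an inode uses the new layout. It is not the real
struct xfs_dinode: the struct, its field names (loosely modelled on the
names used in the changelog) and the large_extcnt member standing in for
the per-inode flag are all assumptions, and endian conversion and every
other inode field are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, host-endian model of the extent counter fields. */
struct toy_dinode {
	uint64_t di_big_nextents; /* new 64-bit field (from di_pad/di_flushiter) */
	uint32_t di_nextents;     /* old layout: data extents; new: attr extents */
	uint16_t di_anextents;    /* old layout: attr extents; unused in new */
	int      large_extcnt;    /* stand-in for the per-inode "large" flag */
};

static_assert(sizeof(((struct toy_dinode *)0)->di_big_nextents) == 8,
	      "data fork counter is 64 bits wide in the new layout");

/* Data fork extent count, honouring the layout in use. */
static inline uint64_t toy_data_extents(const struct toy_dinode *dip)
{
	if (dip->large_extcnt)
		return dip->di_big_nextents;
	return dip->di_nextents;
}

/* Attr fork extent count: the old 32-bit data counter is reused. */
static inline uint32_t toy_attr_extents(const struct toy_dinode *dip)
{
	if (dip->large_extcnt)
		return dip->di_nextents;
	return dip->di_anextents;
}
```

Reusing the old 32-bit data counter for the attr fork is what lets the
attribute counter grow from 16 to 32 bits without consuming more inode
space.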
The patchset has been tested by executing xfstests with the following
mkfs.xfs options,
1. -m crc=0 -b size=1k
2. -m crc=0 -b size=4k
3. -m crc=0 -b size=512
4. -m rmapbt=1,reflink=1 -b size=1k
5. -m rmapbt=1,reflink=1 -b size=4k
Each of the above test scenarios was executed on the following
combinations (for the V4 filesystem scenarios, i.e. those using
-m crc=0, the last combination was omitted).
|---------------------------+---------|
| Xfsprogs                  | Kernel  |
|---------------------------+---------|
| Unpatched                 | Patched |
| Patched (disable nrext64) | Patched |
| Patched (enable nrext64)  | Patched |
|---------------------------+---------|
I have also written tests to check whether the correct extent counter
fields are updated with and without the new incompat flag, and to
verify upgrading older filesystem instances to support large extent
counters. I have also fixed the xfs/270 test to work with the new code base.
These patches can also be obtained from
https://github.com/chandanr/linux.git at branch
xfs-incompat-extend-extcnt-v9.
Changelog:
V8 -> V9:
1. Rebase patchset on Linux v5.18-rc1.
2. Replace directory extent count overflow checks with a simple check
added to xfs_dinode_verify().
3. Warn users about "Large extent counters" being an experimental feature.
4. Address other trivial review comments provided for v8 of the patchset.
V7 -> V8:
1. Do not roll a transaction after upgrading an inode to the "Large extent
counter" feature. Any transaction that can cause an inode's extent counter
to change will already have included the space required to log the inode
in its transaction reservation calculation.
This means that the patch "xfs: xfs_growfs_rt_alloc: Unlock inode
explicitly rather than through iop_committing()" is no longer required.
2. Use XFS_MAX_EXTCNT_DATA_FORK_LARGE & XFS_MAX_EXTCNT_ATTR_FORK_LARGE to
represent large extent counter limits. Similarly, use
XFS_MAX_EXTCNT_DATA_FORK_SMALL & XFS_MAX_EXTCNT_ATTR_FORK_SMALL to
represent previously defined extent counter limits.
3. Decouple XFS_IBULK flags from XFS_IWALK flags in a separate patch.
4. Bulkstat operation now returns XFS_MAX_EXTCNT_DATA_FORK_SMALL as the extent
count if the data fork extent count exceeds XFS_MAX_EXTCNT_DATA_FORK_SMALL and
the userspace program isn't aware of large extent counters.
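A minimal sketch of the clamping described in item 4, assuming
XFS_MAX_EXTCNT_DATA_FORK_SMALL equals the previous MAXEXTNUM
(0x7fffffff). The macro and function names here are hypothetical and do
not reflect the actual xfs_bulkstat code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to equal the previous MAXEXTNUM; the name is hypothetical. */
#define TOY_MAX_EXTCNT_DATA_FORK_SMALL ((uint64_t)0x7fffffff)

static_assert(TOY_MAX_EXTCNT_DATA_FORK_SMALL == INT32_MAX,
	      "old limit is the largest signed 32-bit value");

/*
 * Data fork extent count to report to a bulkstat caller. Callers that
 * have not declared support for large extent counters get the value
 * clamped to the old limit rather than a count they cannot represent.
 */
static inline uint64_t toy_bulkstat_nextents(uint64_t nextents,
					     bool caller_knows_large_extcnt)
{
	if (!caller_knows_large_extcnt &&
	    nextents > TOY_MAX_EXTCNT_DATA_FORK_SMALL)
		return TOY_MAX_EXTCNT_DATA_FORK_SMALL;
	return nextents;
}
```

Clamping replaces the earlier approach (see the V3 -> V4 entry) of
returning -EOVERFLOW for such inodes.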
V6 -> V7:
1. Address the following review comments from V6,
- Revert xfs_ibulk->flags to "unsigned int" type.
- Fix definition of XFS_IBULK_NREXT64 to be independent of IWALK flags.
- Fix possible double free of transaction handle in xfs_growfs_rt_alloc().
V5 -> V6:
1. Rebase on Linux-v5.17-rc4.
2. Upgrade inodes to use large extent counters from within a
transaction context.
V4 -> V5:
1. Rebase on xfs-linux/for-next.
2. Use howmany_64() to compute height of maximum bmbt tree.
3. Rename disk and log inode's di_big_dextcnt to di_big_nextents.
4. Rename disk and log inode's di_big_aextcnt to di_big_anextents.
5. Since XFS_IBULK_NREXT64 is not associated with inode walking
functionality, define it as the 32nd bit and mask it when passing
xfs_ibulk->flags to xfs_iwalk() function.
V3 -> V4:
1. Rebase patchset on xfs-linux/for-next branch.
2. Carve a 64-bit inode field out of the existing di_pad and
di_flushiter fields to hold the 64-bit data fork extent counter.
3. Use the existing 32-bit inode data fork extent counter to hold the
attr fork extent counter.
4. Verify the contents of newly introduced inode fields immediately
after the inode has been read from the disk.
5. Upgrade inodes to be able to hold large extent counters when
reading them from disk.
6. Use XFS_BULK_IREQ_NREXT64 as the flag that userspace can use to
indicate that it can read 64-bit data fork extent counter.
7. Bulkstat ioctl returns -EOVERFLOW when userspace is not capable of
working with large extent counters and inode's data fork extent
count is larger than INT32_MAX.
V2 -> V3:
1. Define maximum extent length as a function of
BMBT_BLOCKCOUNT_BITLEN.
2. Introduce xfs_iext_max_nextents() function in the patch series
before renaming MAXEXTNUM/MAXAEXTNUM. This is done to reduce
proliferation of macros indicating maximum extent count for data
and attribute forks.
3. Define xfs_dfork_nextents() as an inline function.
4. Use xfs_rfsblock_t as the data type for variables that hold block
count.
5. xfs_dfork_nextents() now returns -EFSCORRUPTED when an invalid fork
is passed as an argument.
6. The following changes are done to enable bulkstat ioctl to report
64-bit extent counters,
- Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
xfs_bulkstat->bs_pad[].
- Carve out a new 64-bit field xfs_bulk_ireq->bulkstat_flags from
xfs_bulk_ireq->reserved[] to hold bulkstat specific operational
flags. Introduce XFS_IBULK_NREXT64 flag to indicate that
userspace has the necessary infrastructure to receive 64-bit
extent counters.
- Define the new flag XFS_BULK_IREQ_BULKSTAT for userspace to
indicate that xfs_bulk_ireq->bulkstat_flags has valid flags set.
7. Rename the incompat flag from XFS_SB_FEAT_INCOMPAT_EXTCOUNT_64BIT
to XFS_SB_FEAT_INCOMPAT_NREXT64.
8. Add a new helper function xfs_inode_to_disk_iext_counters() to
convert from incore inode extent counters to ondisk inode extent
counters.
9. Reuse XFS_ERRTAG_REDUCE_MAX_IEXTENTS error tag to skip reporting
inodes with more than 10 extents when bulkstat ioctl is invoked by
userspace.
10. Introduce the new per-inode XFS_DIFLAG2_NREXT64 flag to indicate
that the inode uses a 64-bit extent counter. This is used to allow
administrators to upgrade existing filesystems.
11. Export presence of XFS_SB_FEAT_INCOMPAT_NREXT64 feature to
userspace via XFS_IOC_FSGEOMETRY ioctl.
V1 -> V2:
1. Rebase patches on top of Darrick's btree-dynamic-depth branch.
2. Add new bulkstat ioctl version to support 64-bit data fork extent
counter field.
3. Introduce new error tag to verify if the old bulkstat ioctls skip
reporting inodes with large data fork extent counters.
Chandan Babu R (19):
xfs: Move extent count limits to xfs_format.h
xfs: Define max extent length based on on-disk format definition
xfs: Introduce xfs_iext_max_nextents() helper
xfs: Use xfs_extnum_t instead of basic data types
xfs: Introduce xfs_dfork_nextents() helper
xfs: Use basic types to define xfs_log_dinode's di_nextents and
di_anextents
xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits
respectively
xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs
feature bit
xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64
xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers
xfs: Use uint64_t to count maximum blocks that can be used by BMBT
xfs: Introduce macros to represent new maximum extent counts for
data/attr forks
xfs: Replace numbered inode recovery error messages with descriptive
ones
xfs: Introduce per-inode 64-bit extent counters
xfs: Directory's data fork extent counter can never overflow
xfs: Conditionally upgrade existing inodes to use large extent
counters
xfs: Decouple XFS_IBULK flags from XFS_IWALK flags
xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags
fs/xfs/libxfs/xfs_alloc.c | 2 +-
fs/xfs/libxfs/xfs_attr.c | 10 +++
fs/xfs/libxfs/xfs_bmap.c | 109 +++++++++++--------------
fs/xfs/libxfs/xfs_bmap_btree.c | 3 +-
fs/xfs/libxfs/xfs_da_format.h | 1 +
fs/xfs/libxfs/xfs_format.h | 101 ++++++++++++++++++++---
fs/xfs/libxfs/xfs_fs.h | 21 ++++-
fs/xfs/libxfs/xfs_ialloc.c | 2 +
fs/xfs/libxfs/xfs_inode_buf.c | 89 ++++++++++++++++----
fs/xfs/libxfs/xfs_inode_fork.c | 34 ++++++--
fs/xfs/libxfs/xfs_inode_fork.h | 76 +++++++++++++----
fs/xfs/libxfs/xfs_log_format.h | 33 +++++++-
fs/xfs/libxfs/xfs_sb.c | 5 ++
fs/xfs/libxfs/xfs_trans_resv.c | 11 +--
fs/xfs/libxfs/xfs_types.h | 11 +--
fs/xfs/scrub/bmap.c | 2 +-
fs/xfs/scrub/inode.c | 20 ++---
fs/xfs/xfs_bmap_item.c | 2 +
fs/xfs/xfs_bmap_util.c | 27 +++++--
fs/xfs/xfs_dquot.c | 3 +
fs/xfs/xfs_inode.c | 59 +-------------
fs/xfs/xfs_inode.h | 5 ++
fs/xfs/xfs_inode_item.c | 23 +++++-
fs/xfs/xfs_inode_item_recover.c | 139 +++++++++++++++++++++++---------
fs/xfs/xfs_ioctl.c | 3 +
fs/xfs/xfs_iomap.c | 33 ++++----
fs/xfs/xfs_itable.c | 19 ++++-
fs/xfs/xfs_itable.h | 4 +-
fs/xfs/xfs_iwalk.h | 2 +-
fs/xfs/xfs_mount.h | 2 +
fs/xfs/xfs_reflink.c | 5 ++
fs/xfs/xfs_rtalloc.c | 3 +
fs/xfs/xfs_super.c | 5 ++
fs/xfs/xfs_symlink.c | 5 --
fs/xfs/xfs_trace.h | 4 +-
35 files changed, 599 insertions(+), 274 deletions(-)
--
2.30.2
* [PATCH V9 01/19] xfs: Move extent count limits to xfs_format.h
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
Maximum values associated with extent counters, i.e. maximum extent length,
maximum data extents and maximum xattr extents, are dictated by the on-disk
format. Hence move these definitions over to xfs_format.h.
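For reference, the three limits moved by this patch map directly onto
field widths. The following standalone check restates the values from
the hunk below, with the xfs_* typedefs replaced by <stdint.h> types (an
assumption made only so that it compiles outside the kernel):

```c
#include <assert.h>
#include <stdint.h>

/* Values as in this patch; xfs_* types replaced by <stdint.h> types. */
#define MAXEXTLEN  ((uint32_t)0x001fffff) /* 21 bits */
#define MAXEXTNUM  ((int32_t)0x7fffffff)  /* signed int */
#define MAXAEXTNUM ((int16_t)0x7fff)      /* signed short */

/* Each limit is the largest value its field can hold. */
static_assert(MAXEXTLEN == (UINT32_C(1) << 21) - 1, "21-bit extent length");
static_assert(MAXEXTNUM == INT32_MAX, "signed 32-bit data fork counter");
static_assert(MAXAEXTNUM == INT16_MAX, "signed 16-bit attr fork counter");
```

Because these are compile-time checks, a mismatch between a limit and
its field width fails the build rather than surfacing at run time.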
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_format.h | 7 +++++++
fs/xfs/libxfs/xfs_types.h | 7 -------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index d665c04e69dd..d75e5b16da7e 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -869,6 +869,13 @@ enum xfs_dinode_fmt {
{ XFS_DINODE_FMT_BTREE, "btree" }, \
{ XFS_DINODE_FMT_UUID, "uuid" }
+/*
+ * Max values for extlen, extnum, aextnum.
+ */
+#define MAXEXTLEN ((xfs_extlen_t)0x001fffff) /* 21 bits */
+#define MAXEXTNUM ((xfs_extnum_t)0x7fffffff) /* signed int */
+#define MAXAEXTNUM ((xfs_aextnum_t)0x7fff) /* signed short */
+
/*
* Inode minimum and maximum sizes.
*/
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index b6da06b40989..794a54cbd0de 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -56,13 +56,6 @@ typedef void * xfs_failaddr_t;
#define NULLFSINO ((xfs_ino_t)-1)
#define NULLAGINO ((xfs_agino_t)-1)
-/*
- * Max values for extlen, extnum, aextnum.
- */
-#define MAXEXTLEN ((xfs_extlen_t)0x001fffff) /* 21 bits */
-#define MAXEXTNUM ((xfs_extnum_t)0x7fffffff) /* signed int */
-#define MAXAEXTNUM ((xfs_aextnum_t)0x7fff) /* signed short */
-
/*
* Minimum and maximum blocksize and sectorsize.
* The blocksize upper limit is pretty much arbitrary.
--
2.30.2
* [PATCH V9 02/19] xfs: Define max extent length based on on-disk format definition
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
The maximum extent length depends on the maximum block count that can be
stored in a BMBT record. Hence this commit defines MAXEXTLEN based on
BMBT_BLOCKCOUNT_BITLEN.
While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN.
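The derivation can be sketched as follows. BMBT_BLOCKCOUNT_BITLEN (21)
is restated so the example compiles on its own; struct toy_irec and
toy_can_merge() are hypothetical and only mirror the length check
performed by xfs_bmse_can_merge() in the diff below (the unwritten-state
comparison is left out):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Restated from xfs_format.h so the sketch is standalone. */
#define BMBT_BLOCKCOUNT_BITLEN 21
#define BMBT_BLOCKCOUNT_MASK   ((1ULL << BMBT_BLOCKCOUNT_BITLEN) - 1)
#define XFS_MAX_BMBT_EXTLEN    ((uint32_t)(BMBT_BLOCKCOUNT_MASK))

static_assert(XFS_MAX_BMBT_EXTLEN == 0x001fffff,
	      "same value as the old hard-coded MAXEXTLEN");

/* Hypothetical, simplified file mapping record. */
struct toy_irec {
	uint64_t startoff;   /* file offset, in blocks */
	uint64_t startblock; /* disk address, in blocks */
	uint64_t blockcount; /* length, in blocks */
};

/*
 * Two mappings can be merged only if they are contiguous in both file
 * and disk space and the combined length fits in one bmbt record.
 */
static inline bool toy_can_merge(const struct toy_irec *left,
				 const struct toy_irec *right)
{
	return left->startoff + left->blockcount == right->startoff &&
	       left->startblock + left->blockcount == right->startblock &&
	       left->blockcount + right->blockcount <= XFS_MAX_BMBT_EXTLEN;
}
```

Deriving the limit from the bit width means a future change to the
record format only needs to touch BMBT_BLOCKCOUNT_BITLEN.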
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_alloc.c | 2 +-
fs/xfs/libxfs/xfs_bmap.c | 57 +++++++++++++++++-----------------
fs/xfs/libxfs/xfs_format.h | 5 +--
fs/xfs/libxfs/xfs_inode_buf.c | 4 +--
fs/xfs/libxfs/xfs_trans_resv.c | 11 ++++---
fs/xfs/scrub/bmap.c | 2 +-
fs/xfs/xfs_bmap_util.c | 14 +++++----
fs/xfs/xfs_iomap.c | 28 ++++++++---------
8 files changed, 64 insertions(+), 59 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index b52ed339727f..f2a918ed7b8a 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2511,7 +2511,7 @@ __xfs_free_extent_later(
ASSERT(bno != NULLFSBLOCK);
ASSERT(len > 0);
- ASSERT(len <= MAXEXTLEN);
+ ASSERT(len <= XFS_MAX_BMBT_EXTLEN);
ASSERT(!isnullstartblock(bno));
agno = XFS_FSB_TO_AGNO(mp, bno);
agbno = XFS_FSB_TO_AGBNO(mp, bno);
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 74198dd82b03..00b8e6e1c404 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1452,7 +1452,7 @@ xfs_bmap_add_extent_delay_real(
LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff &&
LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock &&
LEFT.br_state == new->br_state &&
- LEFT.br_blockcount + new->br_blockcount <= MAXEXTLEN)
+ LEFT.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
state |= BMAP_LEFT_CONTIG;
/*
@@ -1470,13 +1470,13 @@ xfs_bmap_add_extent_delay_real(
new_endoff == RIGHT.br_startoff &&
new->br_startblock + new->br_blockcount == RIGHT.br_startblock &&
new->br_state == RIGHT.br_state &&
- new->br_blockcount + RIGHT.br_blockcount <= MAXEXTLEN &&
+ new->br_blockcount + RIGHT.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
((state & (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
BMAP_RIGHT_FILLING)) !=
(BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
BMAP_RIGHT_FILLING) ||
LEFT.br_blockcount + new->br_blockcount + RIGHT.br_blockcount
- <= MAXEXTLEN))
+ <= XFS_MAX_BMBT_EXTLEN))
state |= BMAP_RIGHT_CONTIG;
error = 0;
@@ -2000,7 +2000,7 @@ xfs_bmap_add_extent_unwritten_real(
LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff &&
LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock &&
LEFT.br_state == new->br_state &&
- LEFT.br_blockcount + new->br_blockcount <= MAXEXTLEN)
+ LEFT.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
state |= BMAP_LEFT_CONTIG;
/*
@@ -2018,13 +2018,13 @@ xfs_bmap_add_extent_unwritten_real(
new_endoff == RIGHT.br_startoff &&
new->br_startblock + new->br_blockcount == RIGHT.br_startblock &&
new->br_state == RIGHT.br_state &&
- new->br_blockcount + RIGHT.br_blockcount <= MAXEXTLEN &&
+ new->br_blockcount + RIGHT.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
((state & (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
BMAP_RIGHT_FILLING)) !=
(BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
BMAP_RIGHT_FILLING) ||
LEFT.br_blockcount + new->br_blockcount + RIGHT.br_blockcount
- <= MAXEXTLEN))
+ <= XFS_MAX_BMBT_EXTLEN))
state |= BMAP_RIGHT_CONTIG;
/*
@@ -2510,15 +2510,15 @@ xfs_bmap_add_extent_hole_delay(
*/
if ((state & BMAP_LEFT_VALID) && (state & BMAP_LEFT_DELAY) &&
left.br_startoff + left.br_blockcount == new->br_startoff &&
- left.br_blockcount + new->br_blockcount <= MAXEXTLEN)
+ left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
state |= BMAP_LEFT_CONTIG;
if ((state & BMAP_RIGHT_VALID) && (state & BMAP_RIGHT_DELAY) &&
new->br_startoff + new->br_blockcount == right.br_startoff &&
- new->br_blockcount + right.br_blockcount <= MAXEXTLEN &&
+ new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
(!(state & BMAP_LEFT_CONTIG) ||
(left.br_blockcount + new->br_blockcount +
- right.br_blockcount <= MAXEXTLEN)))
+ right.br_blockcount <= XFS_MAX_BMBT_EXTLEN)))
state |= BMAP_RIGHT_CONTIG;
/*
@@ -2661,17 +2661,17 @@ xfs_bmap_add_extent_hole_real(
left.br_startoff + left.br_blockcount == new->br_startoff &&
left.br_startblock + left.br_blockcount == new->br_startblock &&
left.br_state == new->br_state &&
- left.br_blockcount + new->br_blockcount <= MAXEXTLEN)
+ left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
state |= BMAP_LEFT_CONTIG;
if ((state & BMAP_RIGHT_VALID) && !(state & BMAP_RIGHT_DELAY) &&
new->br_startoff + new->br_blockcount == right.br_startoff &&
new->br_startblock + new->br_blockcount == right.br_startblock &&
new->br_state == right.br_state &&
- new->br_blockcount + right.br_blockcount <= MAXEXTLEN &&
+ new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
(!(state & BMAP_LEFT_CONTIG) ||
left.br_blockcount + new->br_blockcount +
- right.br_blockcount <= MAXEXTLEN))
+ right.br_blockcount <= XFS_MAX_BMBT_EXTLEN))
state |= BMAP_RIGHT_CONTIG;
error = 0;
@@ -2906,15 +2906,15 @@ xfs_bmap_extsize_align(
/*
* For large extent hint sizes, the aligned extent might be larger than
- * MAXEXTLEN. In that case, reduce the size by an extsz so that it pulls
- * the length back under MAXEXTLEN. The outer allocation loops handle
- * short allocation just fine, so it is safe to do this. We only want to
- * do it when we are forced to, though, because it means more allocation
- * operations are required.
+ * XFS_MAX_BMBT_EXTLEN. In that case, reduce the size by an extsz so
+ * that it pulls the length back under XFS_MAX_BMBT_EXTLEN. The outer
+ * allocation loops handle short allocation just fine, so it is safe to
+ * do this. We only want to do it when we are forced to, though, because
+ * it means more allocation operations are required.
*/
- while (align_alen > MAXEXTLEN)
+ while (align_alen > XFS_MAX_BMBT_EXTLEN)
align_alen -= extsz;
- ASSERT(align_alen <= MAXEXTLEN);
+ ASSERT(align_alen <= XFS_MAX_BMBT_EXTLEN);
/*
* If the previous block overlaps with this proposed allocation
@@ -3004,9 +3004,9 @@ xfs_bmap_extsize_align(
return -EINVAL;
} else {
ASSERT(orig_off >= align_off);
- /* see MAXEXTLEN handling above */
+ /* see XFS_MAX_BMBT_EXTLEN handling above */
ASSERT(orig_end <= align_off + align_alen ||
- align_alen + extsz > MAXEXTLEN);
+ align_alen + extsz > XFS_MAX_BMBT_EXTLEN);
}
#ifdef DEBUG
@@ -3971,7 +3971,7 @@ xfs_bmapi_reserve_delalloc(
* Cap the alloc length. Keep track of prealloc so we know whether to
* tag the inode before we return.
*/
- alen = XFS_FILBLKS_MIN(len + prealloc, MAXEXTLEN);
+ alen = XFS_FILBLKS_MIN(len + prealloc, XFS_MAX_BMBT_EXTLEN);
if (!eof)
alen = XFS_FILBLKS_MIN(alen, got->br_startoff - aoff);
if (prealloc && alen >= len)
@@ -4104,7 +4104,7 @@ xfs_bmapi_allocate(
if (!xfs_iext_peek_prev_extent(ifp, &bma->icur, &bma->prev))
bma->prev.br_startoff = NULLFILEOFF;
} else {
- bma->length = XFS_FILBLKS_MIN(bma->length, MAXEXTLEN);
+ bma->length = XFS_FILBLKS_MIN(bma->length, XFS_MAX_BMBT_EXTLEN);
if (!bma->eof)
bma->length = XFS_FILBLKS_MIN(bma->length,
bma->got.br_startoff - bma->offset);
@@ -4424,8 +4424,8 @@ xfs_bmapi_write(
* xfs_extlen_t and therefore 32 bits. Hence we have to
* check for 32-bit overflows and handle them here.
*/
- if (len > (xfs_filblks_t)MAXEXTLEN)
- bma.length = MAXEXTLEN;
+ if (len > (xfs_filblks_t)XFS_MAX_BMBT_EXTLEN)
+ bma.length = XFS_MAX_BMBT_EXTLEN;
else
bma.length = len;
@@ -4560,7 +4560,8 @@ xfs_bmapi_convert_delalloc(
bma.ip = ip;
bma.wasdel = true;
bma.offset = bma.got.br_startoff;
- bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
+ bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount,
+ XFS_MAX_BMBT_EXTLEN);
bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
/*
@@ -4641,7 +4642,7 @@ xfs_bmapi_remap(
ifp = XFS_IFORK_PTR(ip, whichfork);
ASSERT(len > 0);
- ASSERT(len <= (xfs_filblks_t)MAXEXTLEN);
+ ASSERT(len <= (xfs_filblks_t)XFS_MAX_BMBT_EXTLEN);
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK | XFS_BMAPI_PREALLOC |
XFS_BMAPI_NORMAP)));
@@ -5641,7 +5642,7 @@ xfs_bmse_can_merge(
if ((left->br_startoff + left->br_blockcount != startoff) ||
(left->br_startblock + left->br_blockcount != got->br_startblock) ||
(left->br_state != got->br_state) ||
- (left->br_blockcount + got->br_blockcount > MAXEXTLEN))
+ (left->br_blockcount + got->br_blockcount > XFS_MAX_BMBT_EXTLEN))
return false;
return true;
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index d75e5b16da7e..66594853a88b 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -870,9 +870,8 @@ enum xfs_dinode_fmt {
{ XFS_DINODE_FMT_UUID, "uuid" }
/*
- * Max values for extlen, extnum, aextnum.
+ * Max values for extnum and aextnum.
*/
-#define MAXEXTLEN ((xfs_extlen_t)0x001fffff) /* 21 bits */
#define MAXEXTNUM ((xfs_extnum_t)0x7fffffff) /* signed int */
#define MAXAEXTNUM ((xfs_aextnum_t)0x7fff) /* signed short */
@@ -1603,6 +1602,8 @@ typedef struct xfs_bmdr_block {
#define BMBT_STARTOFF_MASK ((1ULL << BMBT_STARTOFF_BITLEN) - 1)
#define BMBT_BLOCKCOUNT_MASK ((1ULL << BMBT_BLOCKCOUNT_BITLEN) - 1)
+#define XFS_MAX_BMBT_EXTLEN ((xfs_extlen_t)(BMBT_BLOCKCOUNT_MASK))
+
/*
* bmbt records have a file offset (block) field that is 54 bits wide, so this
* is the largest xfs_fileoff_t that we ever expect to see.
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index cae9708c8587..87781a5d5a45 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -639,7 +639,7 @@ xfs_inode_validate_extsize(
if (extsize_bytes % blocksize_bytes)
return __this_address;
- if (extsize > MAXEXTLEN)
+ if (extsize > XFS_MAX_BMBT_EXTLEN)
return __this_address;
if (!rt_flag && extsize > mp->m_sb.sb_agblocks / 2)
@@ -696,7 +696,7 @@ xfs_inode_validate_cowextsize(
if (cowextsize_bytes % mp->m_sb.sb_blocksize)
return __this_address;
- if (cowextsize > MAXEXTLEN)
+ if (cowextsize > XFS_MAX_BMBT_EXTLEN)
return __this_address;
if (cowextsize > mp->m_sb.sb_agblocks / 2)
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 6f83d9b306ee..8e1d09e8cc9a 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -199,8 +199,8 @@ xfs_calc_inode_chunk_res(
/*
* Per-extent log reservation for the btree changes involved in freeing or
* allocating a realtime extent. We have to be able to log as many rtbitmap
- * blocks as needed to mark inuse MAXEXTLEN blocks' worth of realtime extents,
- * as well as the realtime summary block.
+ * blocks as needed to mark inuse XFS_MAX_BMBT_EXTLEN blocks' worth of realtime
+ * extents, as well as the realtime summary block.
*/
static unsigned int
xfs_rtalloc_log_count(
@@ -210,7 +210,7 @@ xfs_rtalloc_log_count(
unsigned int blksz = XFS_FSB_TO_B(mp, 1);
unsigned int rtbmp_bytes;
- rtbmp_bytes = (MAXEXTLEN / mp->m_sb.sb_rextsize) / NBBY;
+ rtbmp_bytes = (XFS_MAX_BMBT_EXTLEN / mp->m_sb.sb_rextsize) / NBBY;
return (howmany(rtbmp_bytes, blksz) + 1) * num_ops;
}
@@ -247,7 +247,7 @@ xfs_rtalloc_log_count(
* the inode's bmap btree: max depth * block size
* the agfs of the ags from which the extents are allocated: 2 * sector
* the superblock free block counter: sector size
- * the realtime bitmap: ((MAXEXTLEN / rtextsize) / NBBY) bytes
+ * the realtime bitmap: ((XFS_MAX_BMBT_EXTLEN / rtextsize) / NBBY) bytes
* the realtime summary: 1 block
* the allocation btrees: 2 trees * (2 * max depth - 1) * block size
* And the bmap_finish transaction can free bmap blocks in a join (t3):
@@ -299,7 +299,8 @@ xfs_calc_write_reservation(
* the agf for each of the ags: 2 * sector size
* the agfl for each of the ags: 2 * sector size
* the super block to reflect the freed blocks: sector size
- * the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
+ * the realtime bitmap:
+ * 2 exts * ((XFS_MAX_BMBT_EXTLEN / rtextsize) / NBBY) bytes
* the realtime summary: 2 exts * 1 block
* worst case split in allocation btrees per extent assuming 2 extents:
* 2 exts * 2 trees * (2 * max depth - 1) * block size
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index a4cbbc346f60..c357593e0a02 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -350,7 +350,7 @@ xchk_bmap_iextent(
irec->br_startoff);
/* Make sure the extent points to a valid place. */
- if (irec->br_blockcount > MAXEXTLEN)
+ if (irec->br_blockcount > XFS_MAX_BMBT_EXTLEN)
xchk_fblock_set_corrupt(info->sc, info->whichfork,
irec->br_startoff);
if (info->is_rt &&
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index eb2e387ba528..18c1b99311a8 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -119,14 +119,14 @@ xfs_bmap_rtalloc(
*/
ralen = ap->length / mp->m_sb.sb_rextsize;
/*
- * If the old value was close enough to MAXEXTLEN that
+ * If the old value was close enough to XFS_MAX_BMBT_EXTLEN that
* we rounded up to it, cut it back so it's valid again.
* Note that if it's a really large request (bigger than
- * MAXEXTLEN), we don't hear about that number, and can't
+ * XFS_MAX_BMBT_EXTLEN), we don't hear about that number, and can't
* adjust the starting point to match it.
*/
- if (ralen * mp->m_sb.sb_rextsize >= MAXEXTLEN)
- ralen = MAXEXTLEN / mp->m_sb.sb_rextsize;
+ if (ralen * mp->m_sb.sb_rextsize >= XFS_MAX_BMBT_EXTLEN)
+ ralen = XFS_MAX_BMBT_EXTLEN / mp->m_sb.sb_rextsize;
/*
* Lock out modifications to both the RT bitmap and summary inodes
@@ -839,9 +839,11 @@ xfs_alloc_file_space(
* count, hence we need to limit the number of blocks we are
* trying to reserve to avoid an overflow. We can't allocate
* more than @nimaps extents, and an extent is limited on disk
- * to MAXEXTLEN (21 bits), so use that to enforce the limit.
+ * to XFS_MAX_BMBT_EXTLEN (21 bits), so use that to enforce the
+ * limit.
*/
- resblks = min_t(xfs_fileoff_t, (e - s), (MAXEXTLEN * nimaps));
+ resblks = min_t(xfs_fileoff_t, (e - s),
+ (XFS_MAX_BMBT_EXTLEN * nimaps));
if (unlikely(rt)) {
dblocks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
rblocks = resblks;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e552ce541ec2..87e1cf5060bd 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -402,7 +402,7 @@ xfs_iomap_prealloc_size(
*/
plen = prev.br_blockcount;
while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
- if (plen > MAXEXTLEN / 2 ||
+ if (plen > XFS_MAX_BMBT_EXTLEN / 2 ||
isnullstartblock(got.br_startblock) ||
got.br_startoff + got.br_blockcount != prev.br_startoff ||
got.br_startblock + got.br_blockcount != prev.br_startblock)
@@ -414,23 +414,23 @@ xfs_iomap_prealloc_size(
/*
* If the size of the extents is greater than half the maximum extent
* length, then use the current offset as the basis. This ensures that
- * for large files the preallocation size always extends to MAXEXTLEN
- * rather than falling short due to things like stripe unit/width
- * alignment of real extents.
+ * for large files the preallocation size always extends to
+ * XFS_MAX_BMBT_EXTLEN rather than falling short due to things like stripe
+ * unit/width alignment of real extents.
*/
alloc_blocks = plen * 2;
- if (alloc_blocks > MAXEXTLEN)
+ if (alloc_blocks > XFS_MAX_BMBT_EXTLEN)
alloc_blocks = XFS_B_TO_FSB(mp, offset);
qblocks = alloc_blocks;
/*
- * MAXEXTLEN is not a power of two value but we round the prealloc down
- * to the nearest power of two value after throttling. To prevent the
- * round down from unconditionally reducing the maximum supported
- * prealloc size, we round up first, apply appropriate throttling,
- * round down and cap the value to MAXEXTLEN.
+ * XFS_MAX_BMBT_EXTLEN is not a power of two value but we round the prealloc
+ * down to the nearest power of two value after throttling. To prevent
+ * the round down from unconditionally reducing the maximum supported
+ * prealloc size, we round up first, apply appropriate throttling, round
+ * down and cap the value to XFS_MAX_BMBT_EXTLEN.
*/
- alloc_blocks = XFS_FILEOFF_MIN(roundup_pow_of_two(MAXEXTLEN),
+ alloc_blocks = XFS_FILEOFF_MIN(roundup_pow_of_two(XFS_MAX_BMBT_EXTLEN),
alloc_blocks);
freesp = percpu_counter_read_positive(&mp->m_fdblocks);
@@ -478,14 +478,14 @@ xfs_iomap_prealloc_size(
*/
if (alloc_blocks)
alloc_blocks = rounddown_pow_of_two(alloc_blocks);
- if (alloc_blocks > MAXEXTLEN)
- alloc_blocks = MAXEXTLEN;
+ if (alloc_blocks > XFS_MAX_BMBT_EXTLEN)
+ alloc_blocks = XFS_MAX_BMBT_EXTLEN;
/*
* If we are still trying to allocate more space than is
* available, squash the prealloc hard. This can happen if we
* have a large file on a small filesystem and the above
- * lowspace thresholds are smaller than MAXEXTLEN.
+ * lowspace thresholds are smaller than XFS_MAX_BMBT_EXTLEN.
*/
while (alloc_blocks && alloc_blocks >= freesp)
alloc_blocks >>= 4;
--
2.30.2
* [PATCH V9 03/19] xfs: Introduce xfs_iext_max_nextents() helper
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
xfs_iext_max_nextents() returns the maximum number of extents possible for
the data, CoW or attribute fork. This helper will be extended further in a
future commit when the maximum extent counts associated with the
data/attribute forks are increased.
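As an illustration of how callers can use the helper, here is a
standalone sketch in the spirit of xfs_iext_count_may_overflow(). The
toy_* names are hypothetical, the fork constants and limits are restated
so it compiles outside the kernel, and returning -EFBIG on overflow is an
assumption of this sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Restated for a standalone build; values as in the kernel headers. */
enum { XFS_DATA_FORK = 0, XFS_ATTR_FORK = 1, XFS_COW_FORK = 2 };
#define MAXEXTNUM  ((uint64_t)0x7fffffff)
#define MAXAEXTNUM ((uint64_t)0x7fff)

typedef uint64_t toy_extnum_t; /* hypothetical */

static_assert(MAXAEXTNUM < MAXEXTNUM, "attr fork limit is the smaller one");

/* Same shape as the helper added by this patch. */
static inline toy_extnum_t toy_iext_max_nextents(int whichfork)
{
	if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK)
		return MAXEXTNUM;
	return MAXAEXTNUM;
}

/* Return -EFBIG if adding @nr_to_add extents would exceed the fork's limit. */
static inline int toy_iext_count_may_overflow(int whichfork,
					      toy_extnum_t nextents,
					      toy_extnum_t nr_to_add)
{
	toy_extnum_t max_exts, nr_exts;

	/* As in the kernel function, the CoW fork is not checked. */
	if (whichfork == XFS_COW_FORK)
		return 0;

	max_exts = toy_iext_max_nextents(whichfork);
	nr_exts = nextents + nr_to_add;

	/* Reject both unsigned wrap-around and exceeding the limit. */
	if (nr_exts < nextents || nr_exts > max_exts)
		return -EFBIG;
	return 0;
}
```

Centralising the per-fork limit in one helper means a later change to
the limits only has to touch toy_iext_max_nextents(), not every caller.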
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_bmap.c | 9 ++++-----
fs/xfs/libxfs/xfs_inode_buf.c | 8 +++-----
fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
fs/xfs/libxfs/xfs_inode_fork.h | 8 ++++++++
4 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 00b8e6e1c404..a713bc7242a4 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -74,13 +74,12 @@ xfs_bmap_compute_maxlevels(
* ATTR2 we have to assume the worst case scenario of a minimum size
* available.
*/
- if (whichfork == XFS_DATA_FORK) {
- maxleafents = MAXEXTNUM;
+ maxleafents = xfs_iext_max_nextents(whichfork);
+ if (whichfork == XFS_DATA_FORK)
sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
- } else {
- maxleafents = MAXAEXTNUM;
+ else
sz = XFS_BMDR_SPACE_CALC(MINABTPTRS);
- }
+
maxrootrecs = xfs_bmdr_maxrecs(sz, 0);
minleafrecs = mp->m_bmap_dmnr[0];
minnoderecs = mp->m_bmap_dmnr[1];
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 87781a5d5a45..b1c37a82ddce 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -337,6 +337,7 @@ xfs_dinode_verify_fork(
int whichfork)
{
uint32_t di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
+ xfs_extnum_t max_extents;
switch (XFS_DFORK_FORMAT(dip, whichfork)) {
case XFS_DINODE_FMT_LOCAL:
@@ -358,12 +359,9 @@ xfs_dinode_verify_fork(
return __this_address;
break;
case XFS_DINODE_FMT_BTREE:
- if (whichfork == XFS_ATTR_FORK) {
- if (di_nextents > MAXAEXTNUM)
- return __this_address;
- } else if (di_nextents > MAXEXTNUM) {
+ max_extents = xfs_iext_max_nextents(whichfork);
+ if (di_nextents > max_extents)
return __this_address;
- }
break;
default:
return __this_address;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 9149f4f796fc..e136c29a0ec1 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -744,7 +744,7 @@ xfs_iext_count_may_overflow(
if (whichfork == XFS_COW_FORK)
return 0;
- max_exts = (whichfork == XFS_ATTR_FORK) ? MAXAEXTNUM : MAXEXTNUM;
+ max_exts = xfs_iext_max_nextents(whichfork);
if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
max_exts = 10;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 3d64a3acb0ed..2605f7ff8fc1 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -133,6 +133,14 @@ static inline int8_t xfs_ifork_format(struct xfs_ifork *ifp)
return ifp->if_format;
}
+static inline xfs_extnum_t xfs_iext_max_nextents(int whichfork)
+{
+ if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK)
+ return MAXEXTNUM;
+
+ return MAXAEXTNUM;
+}
+
struct xfs_ifork *xfs_ifork_alloc(enum xfs_dinode_fmt format,
xfs_extnum_t nextents);
struct xfs_ifork *xfs_iext_state_to_fork(struct xfs_inode *ip, int state);
--
2.30.2
* [PATCH V9 04/19] xfs: Use xfs_extnum_t instead of basic data types
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
xfs_extnum_t is the type to use when declaring variables that hold values
obtained from xfs_dinode->di_[a]nextents. This commit replaces basic types
(e.g. uint32_t) with xfs_extnum_t for such variables.
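A small userspace illustration of why the typedef matters. Once xfs_extnum_t is widened later in this series, a count kept in a plain uint32_t silently loses its high bits. The 64-bit typedef below anticipates that later change and is an assumption at this point in the series. demo_truncated_count() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed future width of the in-core extent counter type. */
typedef uint64_t xfs_extnum_t;

/* Hypothetical: what a caller that still stored the count in uint32_t sees. */
static uint32_t demo_truncated_count(xfs_extnum_t nextents)
{
	/* Conversion to a narrower unsigned type keeps the value modulo 2^32. */
	return (uint32_t)nextents;
}
```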
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_bmap.c | 2 +-
fs/xfs/libxfs/xfs_inode_buf.c | 2 +-
fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
fs/xfs/scrub/inode.c | 2 +-
fs/xfs/xfs_trace.h | 2 +-
5 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a713bc7242a4..cc15981b1793 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -54,7 +54,7 @@ xfs_bmap_compute_maxlevels(
{
int level; /* btree level */
uint maxblocks; /* max blocks at this level */
- uint maxleafents; /* max leaf entries possible */
+ xfs_extnum_t maxleafents; /* max leaf entries possible */
int maxrootrecs; /* max records in root block */
int minleafrecs; /* min records in leaf block */
int minnoderecs; /* min records in node block */
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index b1c37a82ddce..7cad307840b3 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -336,7 +336,7 @@ xfs_dinode_verify_fork(
struct xfs_mount *mp,
int whichfork)
{
- uint32_t di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
+ xfs_extnum_t di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
xfs_extnum_t max_extents;
switch (XFS_DFORK_FORMAT(dip, whichfork)) {
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index e136c29a0ec1..a17c4d87520a 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -105,7 +105,7 @@ xfs_iformat_extents(
struct xfs_mount *mp = ip->i_mount;
struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
int state = xfs_bmap_fork_to_state(whichfork);
- int nex = XFS_DFORK_NEXTENTS(dip, whichfork);
+ xfs_extnum_t nex = XFS_DFORK_NEXTENTS(dip, whichfork);
int size = nex * sizeof(xfs_bmbt_rec_t);
struct xfs_iext_cursor icur;
struct xfs_bmbt_rec *dp;
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index eac15af7b08c..87925761e174 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -232,7 +232,7 @@ xchk_dinode(
size_t fork_recs;
unsigned long long isize;
uint64_t flags2;
- uint32_t nextents;
+ xfs_extnum_t nextents;
prid_t prid;
uint16_t flags;
uint16_t mode;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index b141ef78c755..16a91b4f97bd 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2169,7 +2169,7 @@ DECLARE_EVENT_CLASS(xfs_swap_extent_class,
__field(int, which)
__field(xfs_ino_t, ino)
__field(int, format)
- __field(int, nex)
+ __field(xfs_extnum_t, nex)
__field(int, broot_size)
__field(int, fork_off)
),
--
2.30.2
* [PATCH V9 05/19] xfs: Introduce xfs_dfork_nextents() helper
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
This commit replaces the macro XFS_DFORK_NEXTENTS() with the helper function
xfs_dfork_nextents(). As of this commit, xfs_dfork_nextents() returns the same
value as XFS_DFORK_NEXTENTS(). A future commit that widens the inode's extent
counter fields will add more logic to this helper.
This commit also replaces direct accesses to xfs_dinode->di_[a]nextents
with calls to xfs_dfork_nextents() or to the new xfs_dfork_data_extents() and
xfs_dfork_attr_extents() helpers.
No functional changes have been made.
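Here is a userspace sketch of the dispatch performed by the new helpers. It uses a hypothetical mock that holds only the two on-disk counters as raw big-endian bytes. The kernel helpers apply be32_to_cpu()/be16_to_cpu() to __be32/__be16 fields; the byte-wise decoders below stand in for them so the example builds anywhere. xfs_extnum_t is declared 64-bit here, matching its width after patch 07.

```c
#include <assert.h>
#include <stdint.h>

#define XFS_DATA_FORK	0
#define XFS_ATTR_FORK	1

typedef uint64_t xfs_extnum_t;	/* width after patch 07 of this series */

/* Hypothetical mock: only the two counters, stored big-endian as on disk. */
struct demo_dinode {
	uint8_t	di_nextents[4];		/* __be32 in struct xfs_dinode */
	uint8_t	di_anextents[2];	/* __be16 in struct xfs_dinode */
};

static uint32_t demo_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t demo_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static xfs_extnum_t demo_dfork_data_extents(const struct demo_dinode *dip)
{
	return demo_be32(dip->di_nextents);
}

static xfs_extnum_t demo_dfork_attr_extents(const struct demo_dinode *dip)
{
	return demo_be16(dip->di_anextents);
}

/* Mirrors xfs_dfork_nextents(): dispatch on the fork, 0 for anything else. */
static xfs_extnum_t demo_dfork_nextents(const struct demo_dinode *dip,
					int whichfork)
{
	switch (whichfork) {
	case XFS_DATA_FORK:
		return demo_dfork_data_extents(dip);
	case XFS_ATTR_FORK:
		return demo_dfork_attr_extents(dip);
	default:
		assert(0);	/* the kernel helper uses ASSERT(0) here */
		break;
	}

	return 0;
}
```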
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_format.h | 4 ----
fs/xfs/libxfs/xfs_inode_buf.c | 17 ++++++++++++-----
fs/xfs/libxfs/xfs_inode_fork.c | 8 ++++----
fs/xfs/libxfs/xfs_inode_fork.h | 32 ++++++++++++++++++++++++++++++++
fs/xfs/scrub/inode.c | 18 ++++++++++--------
5 files changed, 58 insertions(+), 21 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 66594853a88b..b5e9256d6d32 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -924,10 +924,6 @@ enum xfs_dinode_fmt {
((w) == XFS_DATA_FORK ? \
(dip)->di_format : \
(dip)->di_aformat)
-#define XFS_DFORK_NEXTENTS(dip,w) \
- ((w) == XFS_DATA_FORK ? \
- be32_to_cpu((dip)->di_nextents) : \
- be16_to_cpu((dip)->di_anextents))
/*
* For block and character special files the 32bit dev_t is stored at the
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 7cad307840b3..f0e063835318 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -336,9 +336,11 @@ xfs_dinode_verify_fork(
struct xfs_mount *mp,
int whichfork)
{
- xfs_extnum_t di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
+ xfs_extnum_t di_nextents;
xfs_extnum_t max_extents;
+ di_nextents = xfs_dfork_nextents(dip, whichfork);
+
switch (XFS_DFORK_FORMAT(dip, whichfork)) {
case XFS_DINODE_FMT_LOCAL:
/*
@@ -405,6 +407,9 @@ xfs_dinode_verify(
uint16_t flags;
uint64_t flags2;
uint64_t di_size;
+ xfs_extnum_t nextents;
+ xfs_extnum_t naextents;
+ xfs_filblks_t nblocks;
if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))
return __this_address;
@@ -435,10 +440,12 @@ xfs_dinode_verify(
if ((S_ISLNK(mode) || S_ISDIR(mode)) && di_size == 0)
return __this_address;
+ nextents = xfs_dfork_data_extents(dip);
+ naextents = xfs_dfork_attr_extents(dip);
+ nblocks = be64_to_cpu(dip->di_nblocks);
+
/* Fork checks carried over from xfs_iformat_fork */
- if (mode &&
- be32_to_cpu(dip->di_nextents) + be16_to_cpu(dip->di_anextents) >
- be64_to_cpu(dip->di_nblocks))
+ if (mode && nextents + naextents > nblocks)
return __this_address;
if (mode && XFS_DFORK_BOFF(dip) > mp->m_sb.sb_inodesize)
@@ -495,7 +502,7 @@ xfs_dinode_verify(
default:
return __this_address;
}
- if (dip->di_anextents)
+ if (naextents)
return __this_address;
}
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index a17c4d87520a..1cf48cee45e3 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -105,7 +105,7 @@ xfs_iformat_extents(
struct xfs_mount *mp = ip->i_mount;
struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
int state = xfs_bmap_fork_to_state(whichfork);
- xfs_extnum_t nex = XFS_DFORK_NEXTENTS(dip, whichfork);
+ xfs_extnum_t nex = xfs_dfork_nextents(dip, whichfork);
int size = nex * sizeof(xfs_bmbt_rec_t);
struct xfs_iext_cursor icur;
struct xfs_bmbt_rec *dp;
@@ -230,7 +230,7 @@ xfs_iformat_data_fork(
* depend on it.
*/
ip->i_df.if_format = dip->di_format;
- ip->i_df.if_nextents = be32_to_cpu(dip->di_nextents);
+ ip->i_df.if_nextents = xfs_dfork_data_extents(dip);
switch (inode->i_mode & S_IFMT) {
case S_IFIFO:
@@ -295,14 +295,14 @@ xfs_iformat_attr_fork(
struct xfs_inode *ip,
struct xfs_dinode *dip)
{
+ xfs_extnum_t naextents = xfs_dfork_attr_extents(dip);
int error = 0;
/*
* Initialize the extent count early, as the per-format routines may
* depend on it.
*/
- ip->i_afp = xfs_ifork_alloc(dip->di_aformat,
- be16_to_cpu(dip->di_anextents));
+ ip->i_afp = xfs_ifork_alloc(dip->di_aformat, naextents);
switch (ip->i_afp->if_format) {
case XFS_DINODE_FMT_LOCAL:
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 2605f7ff8fc1..7ed2ecb51bca 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -141,6 +141,38 @@ static inline xfs_extnum_t xfs_iext_max_nextents(int whichfork)
return MAXAEXTNUM;
}
+static inline xfs_extnum_t
+xfs_dfork_data_extents(
+ struct xfs_dinode *dip)
+{
+ return be32_to_cpu(dip->di_nextents);
+}
+
+static inline xfs_extnum_t
+xfs_dfork_attr_extents(
+ struct xfs_dinode *dip)
+{
+ return be16_to_cpu(dip->di_anextents);
+}
+
+static inline xfs_extnum_t
+xfs_dfork_nextents(
+ struct xfs_dinode *dip,
+ int whichfork)
+{
+ switch (whichfork) {
+ case XFS_DATA_FORK:
+ return xfs_dfork_data_extents(dip);
+ case XFS_ATTR_FORK:
+ return xfs_dfork_attr_extents(dip);
+ default:
+ ASSERT(0);
+ break;
+ }
+
+ return 0;
+}
+
struct xfs_ifork *xfs_ifork_alloc(enum xfs_dinode_fmt format,
xfs_extnum_t nextents);
struct xfs_ifork *xfs_iext_state_to_fork(struct xfs_inode *ip, int state);
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index 87925761e174..51820b40ab1c 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -233,6 +233,7 @@ xchk_dinode(
unsigned long long isize;
uint64_t flags2;
xfs_extnum_t nextents;
+ xfs_extnum_t naextents;
prid_t prid;
uint16_t flags;
uint16_t mode;
@@ -390,8 +391,10 @@ xchk_dinode(
xchk_inode_extsize(sc, dip, ino, mode, flags);
+ nextents = xfs_dfork_data_extents(dip);
+ naextents = xfs_dfork_attr_extents(dip);
+
/* di_nextents */
- nextents = be32_to_cpu(dip->di_nextents);
fork_recs = XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
switch (dip->di_format) {
case XFS_DINODE_FMT_EXTENTS:
@@ -411,7 +414,7 @@ xchk_dinode(
/* di_forkoff */
if (XFS_DFORK_APTR(dip) >= (char *)dip + mp->m_sb.sb_inodesize)
xchk_ino_set_corrupt(sc, ino);
- if (dip->di_anextents != 0 && dip->di_forkoff == 0)
+ if (naextents != 0 && dip->di_forkoff == 0)
xchk_ino_set_corrupt(sc, ino);
if (dip->di_forkoff == 0 && dip->di_aformat != XFS_DINODE_FMT_EXTENTS)
xchk_ino_set_corrupt(sc, ino);
@@ -423,19 +426,18 @@ xchk_dinode(
xchk_ino_set_corrupt(sc, ino);
/* di_anextents */
- nextents = be16_to_cpu(dip->di_anextents);
fork_recs = XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
switch (dip->di_aformat) {
case XFS_DINODE_FMT_EXTENTS:
- if (nextents > fork_recs)
+ if (naextents > fork_recs)
xchk_ino_set_corrupt(sc, ino);
break;
case XFS_DINODE_FMT_BTREE:
- if (nextents <= fork_recs)
+ if (naextents <= fork_recs)
xchk_ino_set_corrupt(sc, ino);
break;
default:
- if (nextents != 0)
+ if (naextents != 0)
xchk_ino_set_corrupt(sc, ino);
}
@@ -513,14 +515,14 @@ xchk_inode_xref_bmap(
&nextents, &count);
if (!xchk_should_check_xref(sc, &error, NULL))
return;
- if (nextents < be32_to_cpu(dip->di_nextents))
+ if (nextents < xfs_dfork_data_extents(dip))
xchk_ino_xref_set_corrupt(sc, sc->ip->i_ino);
error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK,
&nextents, &acount);
if (!xchk_should_check_xref(sc, &error, NULL))
return;
- if (nextents != be16_to_cpu(dip->di_anextents))
+ if (nextents != xfs_dfork_attr_extents(dip))
xchk_ino_xref_set_corrupt(sc, sc->ip->i_ino);
/* Check nblocks against the inode. */
--
2.30.2
* [PATCH V9 06/19] xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextents
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
A future commit will increase the width of xfs_extnum_t in order to facilitate
larger per-inode extent counters. Hence this patch uses basic types to define
xfs_log_dinode->[di_nextents|di_anextents], so that the layout of the logged
inode does not change when xfs_extnum_t and xfs_aextnum_t are widened.
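The following userspace illustration shows the layout argument. It uses hypothetical, trimmed-down structs that keep only the two counters and the two byte-sized fields that follow them in xfs_log_dinode. With fixed-width members the offsets stay put. If the members instead followed the widened typedefs, the later fields would move and the struct would grow. The offsets checked below assume a common ABI such as x86-64 or arm64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width members, as after this patch (trimmed to four fields). */
struct demo_log_dinode_fixed {
	uint32_t	di_nextents;
	uint16_t	di_anextents;
	uint8_t		di_forkoff;
	int8_t		di_aformat;
};

/* The same fields if they tracked the widened typedefs instead. */
struct demo_log_dinode_widened {
	uint64_t	di_nextents;	/* future xfs_extnum_t */
	uint32_t	di_anextents;	/* future xfs_aextnum_t */
	uint8_t		di_forkoff;
	int8_t		di_aformat;
};
```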
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_log_format.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index b322db523d65..fd66e70248f7 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -396,8 +396,8 @@ struct xfs_log_dinode {
xfs_fsize_t di_size; /* number of bytes in file */
xfs_rfsblock_t di_nblocks; /* # of direct & btree blocks used */
xfs_extlen_t di_extsize; /* basic/minimum extent size for file */
- xfs_extnum_t di_nextents; /* number of extents in data fork */
- xfs_aextnum_t di_anextents; /* number of extents in attribute fork*/
+ uint32_t di_nextents; /* number of extents in data fork */
+ uint16_t di_anextents; /* number of extents in attribute fork*/
uint8_t di_forkoff; /* attr fork offs, <<3 for 64b align */
int8_t di_aformat; /* format of attr fork's data */
uint32_t di_dmevmask; /* DMIG event mask */
--
2.30.2
* [PATCH V9 07/19] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
A future commit will introduce a 64-bit on-disk data extent counter and a
32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and
xfs_aextnum_t to unsigned 64-bit and 32-bit types respectively, in order to
correctly handle in-core versions of these quantities. Format strings that
print these values are updated to match.
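The xfs_bmap.c hunk below replaces the open-coded round-up division with howmany_64(), because maxleafents is now a 64-bit value. Below is a userspace sketch of that round-up, written as demo_howmany_64() (a stand-in name). In userspace a plain / is enough. The kernel helper goes through do_div(), presumably so that the 64-bit division also builds on 32-bit architectures. For contrast, demo_howmany_32() (also hypothetical) shows the same round-up wrapping when the count is kept in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division of a 64-bit count by a 32-bit divisor (y must be > 0). */
static inline uint64_t demo_howmany_64(uint64_t x, uint32_t y)
{
	x += y - 1;	/* round up before dividing */
	return x / y;
}

/* Same round-up in 32 bits: x + y - 1 wraps modulo 2^32 for large x. */
static inline uint32_t demo_howmany_32(uint32_t x, uint32_t y)
{
	x += y - 1;
	return x / y;
}
```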
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_bmap.c | 4 ++--
fs/xfs/libxfs/xfs_inode_fork.c | 4 ++--
fs/xfs/libxfs/xfs_inode_fork.h | 2 +-
fs/xfs/libxfs/xfs_types.h | 4 ++--
fs/xfs/xfs_inode.c | 4 ++--
fs/xfs/xfs_trace.h | 2 +-
6 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index cc15981b1793..9f38e33d6ce2 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -83,7 +83,7 @@ xfs_bmap_compute_maxlevels(
maxrootrecs = xfs_bmdr_maxrecs(sz, 0);
minleafrecs = mp->m_bmap_dmnr[0];
minnoderecs = mp->m_bmap_dmnr[1];
- maxblocks = (maxleafents + minleafrecs - 1) / minleafrecs;
+ maxblocks = howmany_64(maxleafents, minleafrecs);
for (level = 1; maxblocks > 1; level++) {
if (maxblocks <= maxrootrecs)
maxblocks = 1;
@@ -467,7 +467,7 @@ xfs_bmap_check_leaf_extents(
if (bp_release)
xfs_trans_brelse(NULL, bp);
error_norelse:
- xfs_warn(mp, "%s: BAD after btree leaves for %d extents",
+ xfs_warn(mp, "%s: BAD after btree leaves for %llu extents",
__func__, i);
xfs_err(mp, "%s: CORRUPTED BTREE OR SOMETHING", __func__);
xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 1cf48cee45e3..004b205d87b8 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -117,8 +117,8 @@ xfs_iformat_extents(
* we just bail out rather than crash in kmem_alloc() or memcpy() below.
*/
if (unlikely(size < 0 || size > XFS_DFORK_SIZE(dip, mp, whichfork))) {
- xfs_warn(ip->i_mount, "corrupt inode %Lu ((a)extents = %d).",
- (unsigned long long) ip->i_ino, nex);
+ xfs_warn(ip->i_mount, "corrupt inode %llu ((a)extents = %llu).",
+ ip->i_ino, nex);
xfs_inode_verifier_error(ip, -EFSCORRUPTED,
"xfs_iformat_extents(1)", dip, sizeof(*dip),
__this_address);
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 7ed2ecb51bca..4a8b77d425df 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -21,9 +21,9 @@ struct xfs_ifork {
void *if_root; /* extent tree root */
char *if_data; /* inline file data */
} if_u1;
+ xfs_extnum_t if_nextents; /* # of extents in this fork */
short if_broot_bytes; /* bytes allocated for root */
int8_t if_format; /* format of this fork */
- xfs_extnum_t if_nextents; /* # of extents in this fork */
};
/*
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 794a54cbd0de..373f64a492a4 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -12,8 +12,8 @@ typedef uint32_t xfs_agblock_t; /* blockno in alloc. group */
typedef uint32_t xfs_agino_t; /* inode # within allocation grp */
typedef uint32_t xfs_extlen_t; /* extent length in blocks */
typedef uint32_t xfs_agnumber_t; /* allocation group number */
-typedef int32_t xfs_extnum_t; /* # of extents in a file */
-typedef int16_t xfs_aextnum_t; /* # extents in an attribute fork */
+typedef uint64_t xfs_extnum_t; /* # of extents in a file */
+typedef uint32_t xfs_aextnum_t; /* # extents in an attribute fork */
typedef int64_t xfs_fsize_t; /* bytes in a file */
typedef uint64_t xfs_ufsize_t; /* unsigned bytes in a file */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 9de6205fe134..adc1355ce853 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3515,8 +3515,8 @@ xfs_iflush(
if (XFS_TEST_ERROR(ip->i_df.if_nextents + xfs_ifork_nextents(ip->i_afp) >
ip->i_nblocks, mp, XFS_ERRTAG_IFLUSH_5)) {
xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
- "%s: detected corrupt incore inode %Lu, "
- "total extents = %d, nblocks = %Ld, ptr "PTR_FMT,
+ "%s: detected corrupt incore inode %llu, "
+ "total extents = %llu nblocks = %lld, ptr "PTR_FMT,
__func__, ip->i_ino,
ip->i_df.if_nextents + xfs_ifork_nextents(ip->i_afp),
ip->i_nblocks, ip);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 16a91b4f97bd..fe6cb2951233 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2182,7 +2182,7 @@ DECLARE_EVENT_CLASS(xfs_swap_extent_class,
__entry->broot_size = ip->i_df.if_broot_bytes;
__entry->fork_off = XFS_IFORK_BOFF(ip);
),
- TP_printk("dev %d:%d ino 0x%llx (%s), %s format, num_extents %d, "
+ TP_printk("dev %d:%d ino 0x%llx (%s), %s format, num_extents %llu, "
"broot size %d, forkoff 0x%x",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino,
--
2.30.2
* [PATCH V9 08/19] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david
The XFS_SB_FEAT_INCOMPAT_NREXT64 incompat feature bit will be set on
filesystems that support large per-inode extent counters. This commit defines
the new incompat feature bit and the corresponding per-fs feature bit (along
with the xfs_has_large_extent_counts() helper to test it).
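Here is a userspace sketch of how an incompat bit gates mounting. A kernel refuses a superblock that carries any incompat bit outside the set it knows, and this is what keeps kernels without this series from mounting an NREXT64 filesystem. The bit values are taken from the hunks below and the existing definitions. The "old" mask is assumed to be the five incompat bits defined before this patch. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFS_SB_FEAT_INCOMPAT_FTYPE		(1 << 0)
#define XFS_SB_FEAT_INCOMPAT_SPINODES		(1 << 1)
#define XFS_SB_FEAT_INCOMPAT_META_UUID		(1 << 2)
#define XFS_SB_FEAT_INCOMPAT_BIGTIME		(1 << 3)
#define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR	(1 << 4)
#define XFS_SB_FEAT_INCOMPAT_NREXT64		(1 << 5)

#define XFS_FEAT_NREXT64			(1ULL << 26)

/* Assumed incompat mask of a kernel that predates this series. */
#define DEMO_OLD_INCOMPAT_ALL \
	(XFS_SB_FEAT_INCOMPAT_FTYPE | XFS_SB_FEAT_INCOMPAT_SPINODES | \
	 XFS_SB_FEAT_INCOMPAT_META_UUID | XFS_SB_FEAT_INCOMPAT_BIGTIME | \
	 XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR)

/* Incompat mask of a kernel that also understands NREXT64. */
#define DEMO_NEW_INCOMPAT_ALL \
	(DEMO_OLD_INCOMPAT_ALL | XFS_SB_FEAT_INCOMPAT_NREXT64)

/* Mountable only if every incompat bit on disk is in the known set. */
static bool demo_can_mount(uint32_t sb_incompat, uint32_t known)
{
	return (sb_incompat & ~known) == 0;
}

/* Superblock bit -> in-core feature flag, as in xfs_sb_version_to_features(). */
static uint64_t demo_sb_to_features(uint32_t sb_incompat)
{
	uint64_t features = 0;

	if (sb_incompat & XFS_SB_FEAT_INCOMPAT_NREXT64)
		features |= XFS_FEAT_NREXT64;
	return features;
}
```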
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_format.h | 1 +
fs/xfs/libxfs/xfs_sb.c | 3 +++
fs/xfs/xfs_mount.h | 2 ++
3 files changed, 6 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index b5e9256d6d32..64ff0c310696 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -372,6 +372,7 @@ xfs_sb_has_ro_compat_feature(
#define XFS_SB_FEAT_INCOMPAT_META_UUID (1 << 2) /* metadata UUID */
#define XFS_SB_FEAT_INCOMPAT_BIGTIME (1 << 3) /* large timestamps */
#define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4) /* needs xfs_repair */
+#define XFS_SB_FEAT_INCOMPAT_NREXT64 (1 << 5) /* large extent counters */
#define XFS_SB_FEAT_INCOMPAT_ALL \
(XFS_SB_FEAT_INCOMPAT_FTYPE| \
XFS_SB_FEAT_INCOMPAT_SPINODES| \
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index f4e84aa1d50a..bd632389ae92 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -124,6 +124,9 @@ xfs_sb_version_to_features(
features |= XFS_FEAT_BIGTIME;
if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR)
features |= XFS_FEAT_NEEDSREPAIR;
+ if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NREXT64)
+ features |= XFS_FEAT_NREXT64;
+
return features;
}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index f6dc19de8322..98ceccdbcf51 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -276,6 +276,7 @@ typedef struct xfs_mount {
#define XFS_FEAT_INOBTCNT (1ULL << 23) /* inobt block counts */
#define XFS_FEAT_BIGTIME (1ULL << 24) /* large timestamps */
#define XFS_FEAT_NEEDSREPAIR (1ULL << 25) /* needs xfs_repair */
+#define XFS_FEAT_NREXT64 (1ULL << 26) /* large extent counters */
/* Mount features */
#define XFS_FEAT_NOATTR2 (1ULL << 48) /* disable attr2 creation */
@@ -338,6 +339,7 @@ __XFS_HAS_FEAT(realtime, REALTIME)
__XFS_HAS_FEAT(inobtcounts, INOBTCNT)
__XFS_HAS_FEAT(bigtime, BIGTIME)
__XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
+__XFS_HAS_FEAT(large_extent_counts, NREXT64)
/*
* Mount features
--
2.30.2
* [PATCH V9 09/19] xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
XFS_FSOP_GEOM_FLAGS_NREXT64 indicates that the current filesystem instance
supports 64-bit per-inode extent counters.
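This minimal sketch shows how a userspace tool might test the new flag once it has the geometry flags word, for example from the XFS_IOC_FSGEOMETRY ioctl. The flag values are copied from the hunk below. demo_geom_has_nrext64() is a hypothetical helper, and the ioctl call itself is left out so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFS_FSOP_GEOM_FLAGS_BIGTIME	(1 << 21)
#define XFS_FSOP_GEOM_FLAGS_INOBTCNT	(1 << 22)
#define XFS_FSOP_GEOM_FLAGS_NREXT64	(1 << 23)

/* True if the filesystem reports support for large extent counters. */
static bool demo_geom_has_nrext64(uint32_t geom_flags)
{
	return (geom_flags & XFS_FSOP_GEOM_FLAGS_NREXT64) != 0;
}
```

On the kernel side, xfs_fs_geometry() sets the bit only when xfs_has_large_extent_counts(mp) is true, so the check is false on filesystems without the feature.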
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_fs.h | 1 +
fs/xfs/libxfs/xfs_sb.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 505533c43a92..1f7238db35cc 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -236,6 +236,7 @@ typedef struct xfs_fsop_resblks {
#define XFS_FSOP_GEOM_FLAGS_REFLINK (1 << 20) /* files can share blocks */
#define XFS_FSOP_GEOM_FLAGS_BIGTIME (1 << 21) /* 64-bit nsec timestamps */
#define XFS_FSOP_GEOM_FLAGS_INOBTCNT (1 << 22) /* inobt btree counter */
+#define XFS_FSOP_GEOM_FLAGS_NREXT64 (1 << 23) /* large extent counters */
/*
* Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index bd632389ae92..e292a1914a5b 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1138,6 +1138,8 @@ xfs_fs_geometry(
} else {
geo->logsectsize = BBSIZE;
}
+ if (xfs_has_large_extent_counts(mp))
+ geo->flags |= XFS_FSOP_GEOM_FLAGS_NREXT64;
geo->rtsectsize = sbp->sb_blocksize;
geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
--
2.30.2
* [PATCH V9 10/19] xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
This commit adds the new per-inode flag XFS_DIFLAG2_NREXT64 to indicate that
an inode supports 64-bit extent counters. This flag is also enabled by default
on newly created inodes when the corresponding filesystem has the large extent
counter feature bit (i.e. XFS_FEAT_NREXT64) set.
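This userspace sketch covers the two behaviours described above. New inodes inherit XFS_DIFLAG2_NREXT64 when the filesystem feature is present, and the on-disk check trusts di_flags2 only on v3 inodes. For simplicity, the mock inode keeps di_flags2 in host byte order. The real xfs_dinode field is __be64, which is why the hunk below uses cpu_to_be64(). The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFS_DIFLAG2_BIGTIME_BIT	3
#define XFS_DIFLAG2_NREXT64_BIT	4
#define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
#define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)

/* Hypothetical mock holding only the fields the check reads. */
struct demo_dinode {
	uint8_t		di_version;
	uint64_t	di_flags2;	/* host-endian here; __be64 on disk */
};

/* Mirrors the new_diflags2 setup in xfs_ialloc_setup_geometry(). */
static uint64_t demo_new_diflags2(bool has_bigtime, bool has_large_extent_counts)
{
	uint64_t flags2 = 0;

	if (has_bigtime)
		flags2 |= XFS_DIFLAG2_BIGTIME;
	if (has_large_extent_counts)
		flags2 |= XFS_DIFLAG2_NREXT64;
	return flags2;
}

/* Mirrors xfs_dinode_has_large_extent_counts(): di_flags2 is a v3 field. */
static bool demo_dinode_has_large_extent_counts(const struct demo_dinode *dip)
{
	return dip->di_version >= 3 && (dip->di_flags2 & XFS_DIFLAG2_NREXT64);
}
```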
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_format.h | 11 ++++++++++-
fs/xfs/libxfs/xfs_ialloc.c | 2 ++
fs/xfs/xfs_inode.h | 5 +++++
fs/xfs/xfs_inode_item_recover.c | 7 +++++++
4 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 64ff0c310696..57b24744a7c2 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -991,15 +991,17 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
#define XFS_DIFLAG2_REFLINK_BIT 1 /* file's blocks may be shared */
#define XFS_DIFLAG2_COWEXTSIZE_BIT 2 /* copy on write extent size hint */
#define XFS_DIFLAG2_BIGTIME_BIT 3 /* big timestamps */
+#define XFS_DIFLAG2_NREXT64_BIT 4 /* large extent counters */
#define XFS_DIFLAG2_DAX (1 << XFS_DIFLAG2_DAX_BIT)
#define XFS_DIFLAG2_REFLINK (1 << XFS_DIFLAG2_REFLINK_BIT)
#define XFS_DIFLAG2_COWEXTSIZE (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
#define XFS_DIFLAG2_BIGTIME (1 << XFS_DIFLAG2_BIGTIME_BIT)
+#define XFS_DIFLAG2_NREXT64 (1 << XFS_DIFLAG2_NREXT64_BIT)
#define XFS_DIFLAG2_ANY \
(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
- XFS_DIFLAG2_BIGTIME)
+ XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
{
@@ -1007,6 +1009,13 @@ static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_BIGTIME));
}
+static inline bool xfs_dinode_has_large_extent_counts(
+ const struct xfs_dinode *dip)
+{
+ return dip->di_version >= 3 &&
+ (dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_NREXT64));
+}
+
/*
* Inode number format:
* low inopblog bits - offset in block
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index b418fe0c0679..cdf8b63fcb22 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2772,6 +2772,8 @@ xfs_ialloc_setup_geometry(
igeo->new_diflags2 = 0;
if (xfs_has_bigtime(mp))
igeo->new_diflags2 |= XFS_DIFLAG2_BIGTIME;
+ if (xfs_has_large_extent_counts(mp))
+ igeo->new_diflags2 |= XFS_DIFLAG2_NREXT64;
/* Compute inode btree geometry. */
igeo->agino_log = sbp->sb_inopblog + sbp->sb_agblklog;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 740ab13d1aa2..aeab09882702 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -218,6 +218,11 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
return ip->i_diflags2 & XFS_DIFLAG2_BIGTIME;
}
+static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
+{
+ return ip->i_diflags2 & XFS_DIFLAG2_NREXT64;
+}
+
/*
* Return the buftarg used for data allocations on a given inode.
*/
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
index 239dd2e3384e..44b90614859e 100644
--- a/fs/xfs/xfs_inode_item_recover.c
+++ b/fs/xfs/xfs_inode_item_recover.c
@@ -142,6 +142,13 @@ xfs_log_dinode_to_disk_ts(
return ts;
}
+static inline bool xfs_log_dinode_has_large_extent_counts(
+ const struct xfs_log_dinode *ld)
+{
+ return ld->di_version >= 3 &&
+ (ld->di_flags2 & XFS_DIFLAG2_NREXT64);
+}
+
STATIC void
xfs_log_dinode_to_disk(
struct xfs_log_dinode *from,
--
2.30.2
* [PATCH V9 11/19] xfs: Use uint64_t to count maximum blocks that can be used by BMBT
From: Chandan Babu R @ 2022-04-06 6:18 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david
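This rough userspace sketch shows why maxblocks needs 64 bits. It replays the level computation from xfs_bmap_compute_maxlevels() with a uint64_t block count. The record counts in the usage below are made-up inputs, not real btree geometry, and the demo_* names are hypothetical. Once later patches in the series allow up to 2^48 - 1 data fork extents, even the first-level block count can exceed UINT32_MAX when leaves hold only a few records.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division, as howmany_64() computes it. */
static inline uint64_t demo_howmany_64(uint64_t x, uint32_t y)
{
	x += y - 1;
	return x / y;
}

/*
 * Replays the loop in xfs_bmap_compute_maxlevels(): start from the number of
 * leaf blocks needed for maxleafents records, then add levels until the
 * remaining block pointers fit in the root stored in the inode.
 */
static int demo_bmap_maxlevels(uint64_t maxleafents, uint32_t minleafrecs,
			       uint32_t minnoderecs, uint64_t maxrootrecs)
{
	uint64_t maxblocks = demo_howmany_64(maxleafents, minleafrecs);
	int level;

	for (level = 1; maxblocks > 1; level++) {
		if (maxblocks <= maxrootrecs)
			maxblocks = 1;
		else
			maxblocks = demo_howmany_64(maxblocks, minnoderecs);
	}
	return level;
}
```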
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_bmap.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 9f38e33d6ce2..b317226fb4ba 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -52,9 +52,9 @@ xfs_bmap_compute_maxlevels(
xfs_mount_t *mp, /* file system mount structure */
int whichfork) /* data or attr fork */
{
- int level; /* btree level */
- uint maxblocks; /* max blocks at this level */
+ uint64_t maxblocks; /* max blocks at this level */
xfs_extnum_t maxleafents; /* max leaf entries possible */
+ int level; /* btree level */
int maxrootrecs; /* max records in root block */
int minleafrecs; /* min records in leaf block */
int minnoderecs; /* min records in node block */
@@ -88,7 +88,7 @@ xfs_bmap_compute_maxlevels(
if (maxblocks <= maxrootrecs)
maxblocks = 1;
else
- maxblocks = (maxblocks + minnoderecs - 1) / minnoderecs;
+ maxblocks = howmany_64(maxblocks, minnoderecs);
}
mp->m_bm_maxlevels[whichfork] = level;
ASSERT(mp->m_bm_maxlevels[whichfork] <= xfs_bmbt_maxlevels_ondisk());
--
2.30.2
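A minimal standalone sketch of why maxblocks needs 64 bits: once the data fork limit grows to 2^48 - 1 extents (patch 12 of this series), the number of leaf blocks alone exceeds UINT32_MAX. The loop below follows the shape of the level loop in xfs_bmap_compute_maxlevels(), but it is not kernel code: howmany_u64() is a local stand-in for the kernel's howmany_64(), leaf_blocks_truncated() is a hypothetical helper that models the old 32-bit variable, and all record counts are caller-supplied assumptions rather than values derived from a real filesystem geometry.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division on 64-bit values; stand-in for the kernel's howmany_64(). */
static uint64_t howmany_u64(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return (x + y - 1) / y;
}

/*
 * Number of BMBT levels (including the inode root) needed to map
 * maxleafents records, using the same loop structure as
 * xfs_bmap_compute_maxlevels(). The record counts are hypothetical inputs.
 */
static int bmbt_levels(uint64_t maxleafents, uint64_t minleafrecs,
		       uint64_t minnoderecs, uint64_t maxrootrecs)
{
	uint64_t maxblocks = howmany_u64(maxleafents, minleafrecs);
	int level;

	for (level = 1; maxblocks > 1; level++) {
		if (maxblocks <= maxrootrecs)
			maxblocks = 1;	/* fits in the inode root */
		else
			maxblocks = howmany_u64(maxblocks, minnoderecs);
	}
	return level;
}

/* What a 32-bit 'uint maxblocks' would have kept of the leaf block count. */
static uint32_t leaf_blocks_truncated(uint64_t maxleafents,
				      uint64_t minleafrecs)
{
	return (uint32_t)howmany_u64(maxleafents, minleafrecs);
}
```

For example, with a hypothetical minimum of 29 records per leaf block, howmany_u64((1ULL << 48) - 1, 29) is roughly 9.7 trillion, which a 32-bit uint silently truncates.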
* [PATCH V9 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
2022-04-06 6:18 [PATCH V9 00/19] xfs: Extend per-inode extent counters Chandan Babu R
` (10 preceding siblings ...)
2022-04-06 6:18 ` [PATCH V9 11/19] xfs: Use uint64_t to count maximum blocks that can be used by BMBT Chandan Babu R
@ 2022-04-06 6:18 ` Chandan Babu R
2022-04-07 1:05 ` Dave Chinner
2022-04-06 6:18 ` [PATCH V9 13/19] xfs: Replace numbered inode recovery error messages with descriptive ones Chandan Babu R
` (7 subsequent siblings)
19 siblings, 1 reply; 62+ messages in thread
From: Chandan Babu R @ 2022-04-06 6:18 UTC (permalink / raw)
To: linux-xfs; +Cc: Chandan Babu R, djwong, david
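This commit defines new macros representing the maximum data and attr fork
extent counts, both for filesystems that support large per-inode extent
counters and for those that do not. The _SMALL variants replace the existing
MAXEXTNUM and MAXAEXTNUM macros.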
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_bmap.c | 9 ++++-----
fs/xfs/libxfs/xfs_bmap_btree.c | 3 ++-
fs/xfs/libxfs/xfs_format.h | 24 ++++++++++++++++++++++--
fs/xfs/libxfs/xfs_inode_buf.c | 4 +++-
fs/xfs/libxfs/xfs_inode_fork.c | 3 ++-
fs/xfs/libxfs/xfs_inode_fork.h | 21 +++++++++++++++++----
6 files changed, 50 insertions(+), 14 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index b317226fb4ba..1254d4d4821e 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -61,10 +61,8 @@ xfs_bmap_compute_maxlevels(
int sz; /* root block size */
/*
- * The maximum number of extents in a file, hence the maximum number of
- * leaf entries, is controlled by the size of the on-disk extent count,
- * either a signed 32-bit number for the data fork, or a signed 16-bit
- * number for the attr fork.
+ * The maximum number of extents in a fork, hence the maximum number of
+ * leaf entries, is controlled by the size of the on-disk extent count.
*
* Note that we can no longer assume that if we are in ATTR1 that the
* fork offset of all the inodes will be
@@ -74,7 +72,8 @@ xfs_bmap_compute_maxlevels(
* ATTR2 we have to assume the worst case scenario of a minimum size
* available.
*/
- maxleafents = xfs_iext_max_nextents(whichfork);
+ maxleafents = xfs_iext_max_nextents(xfs_has_large_extent_counts(mp),
+ whichfork);
if (whichfork == XFS_DATA_FORK)
sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
else
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 453309fc85f2..7aabeccea9ab 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -611,7 +611,8 @@ xfs_bmbt_maxlevels_ondisk(void)
minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
/* One extra level for the inode root. */
- return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
+ return xfs_btree_compute_maxlevels(minrecs,
+ XFS_MAX_EXTCNT_DATA_FORK_LARGE) + 1;
}
/*
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 57b24744a7c2..eb85bc9b229b 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -872,9 +872,29 @@ enum xfs_dinode_fmt {
/*
* Max values for extnum and aextnum.
+ *
+ * The original on-disk extent counts were held in signed fields, resulting in
+ * maximum extent counts of 2^31 and 2^15 for the data and attr forks
+ * respectively. Similarly the maximum extent length is limited to 2^21 blocks
+ * by the 21-bit wide blockcount field of a BMBT extent record.
+ *
+ * The newly introduced data fork extent counter can hold a 64-bit value,
+ * however the maximum number of extents in a file is also limited to 2^54
+ * extents by the 54-bit wide startoff field of a BMBT extent record.
+ *
+ * It is further limited by the maximum supported file size of 2^63
+ * *bytes*. This leads to a maximum extent count for maximally sized filesystem
+ * blocks (64kB) of:
+ *
+ * 2^63 bytes / 2^16 bytes per block = 2^47 blocks
+ *
+ * Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
+ * 2^48 was chosen as the maximum data fork extent count.
*/
-#define MAXEXTNUM ((xfs_extnum_t)0x7fffffff) /* signed int */
-#define MAXAEXTNUM ((xfs_aextnum_t)0x7fff) /* signed short */
+#define XFS_MAX_EXTCNT_DATA_FORK_LARGE ((xfs_extnum_t)((1ULL << 48) - 1))
+#define XFS_MAX_EXTCNT_ATTR_FORK_LARGE ((xfs_extnum_t)((1ULL << 32) - 1))
+#define XFS_MAX_EXTCNT_DATA_FORK_SMALL ((xfs_extnum_t)((1ULL << 31) - 1))
+#define XFS_MAX_EXTCNT_ATTR_FORK_SMALL ((xfs_extnum_t)((1ULL << 15) - 1))
/*
* Inode minimum and maximum sizes.
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index f0e063835318..e0d3140c3622 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -361,7 +361,9 @@ xfs_dinode_verify_fork(
return __this_address;
break;
case XFS_DINODE_FMT_BTREE:
- max_extents = xfs_iext_max_nextents(whichfork);
+ max_extents = xfs_iext_max_nextents(
+ xfs_dinode_has_large_extent_counts(dip),
+ whichfork);
if (di_nextents > max_extents)
return __this_address;
break;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 004b205d87b8..bb5d841aac58 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -744,7 +744,8 @@ xfs_iext_count_may_overflow(
if (whichfork == XFS_COW_FORK)
return 0;
- max_exts = xfs_iext_max_nextents(whichfork);
+ max_exts = xfs_iext_max_nextents(xfs_inode_has_large_extent_counts(ip),
+ whichfork);
if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
max_exts = 10;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 4a8b77d425df..967837a88860 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -133,12 +133,25 @@ static inline int8_t xfs_ifork_format(struct xfs_ifork *ifp)
return ifp->if_format;
}
-static inline xfs_extnum_t xfs_iext_max_nextents(int whichfork)
+static inline xfs_extnum_t xfs_iext_max_nextents(bool has_large_extent_counts,
+ int whichfork)
{
- if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK)
- return MAXEXTNUM;
+ switch (whichfork) {
+ case XFS_DATA_FORK:
+ case XFS_COW_FORK:
+ if (has_large_extent_counts)
+ return XFS_MAX_EXTCNT_DATA_FORK_LARGE;
+ return XFS_MAX_EXTCNT_DATA_FORK_SMALL;
+
+ case XFS_ATTR_FORK:
+ if (has_large_extent_counts)
+ return XFS_MAX_EXTCNT_ATTR_FORK_LARGE;
+ return XFS_MAX_EXTCNT_ATTR_FORK_SMALL;
- return MAXAEXTNUM;
+ default:
+ ASSERT(0);
+ return 0;
+ }
}
static inline xfs_extnum_t
--
2.30.2
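The new limits, the selection logic of xfs_iext_max_nextents() and the 2^47 arithmetic from the xfs_format.h comment can be checked in a few lines of user-space C. This is a sketch, not kernel code: extnum_t, the toy_* function and the TOY_*_FORK enum are hypothetical stand-ins, and only the macro values are taken from the patch. Note that the _SMALL limits equal the old MAXEXTNUM (0x7fffffff) and MAXAEXTNUM (0x7fff).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t extnum_t;	/* stand-in for xfs_extnum_t */

/* Same values as the XFS_MAX_EXTCNT_* macros introduced by this patch. */
#define MAX_EXTCNT_DATA_FORK_LARGE	((extnum_t)((1ULL << 48) - 1))
#define MAX_EXTCNT_ATTR_FORK_LARGE	((extnum_t)((1ULL << 32) - 1))
#define MAX_EXTCNT_DATA_FORK_SMALL	((extnum_t)((1ULL << 31) - 1))
#define MAX_EXTCNT_ATTR_FORK_SMALL	((extnum_t)((1ULL << 15) - 1))

/* Hypothetical fork identifiers; the kernel uses XFS_{DATA,ATTR,COW}_FORK. */
enum toy_fork { TOY_DATA_FORK, TOY_ATTR_FORK, TOY_COW_FORK };

/* Mirrors the decision made by the new xfs_iext_max_nextents(). */
static extnum_t toy_max_nextents(bool has_large_extent_counts,
				 enum toy_fork whichfork)
{
	switch (whichfork) {
	case TOY_DATA_FORK:
	case TOY_COW_FORK:
		return has_large_extent_counts ? MAX_EXTCNT_DATA_FORK_LARGE
					       : MAX_EXTCNT_DATA_FORK_SMALL;
	case TOY_ATTR_FORK:
		return has_large_extent_counts ? MAX_EXTCNT_ATTR_FORK_LARGE
					       : MAX_EXTCNT_ATTR_FORK_SMALL;
	}
	assert(!"unknown fork");
	return 0;
}

/* The comment's arithmetic: a 2^63-byte file made of 64kB (2^16-byte) blocks. */
static_assert(((1ULL << 63) >> 16) == (1ULL << 47), "2^63 / 2^16 == 2^47");
static_assert((1ULL << 47) <= MAX_EXTCNT_DATA_FORK_LARGE,
	      "2^47 single-block extents fit under the 48-bit limit");
/* The 48-bit limit also stays within the 54-bit BMBT startoff range. */
static_assert(MAX_EXTCNT_DATA_FORK_LARGE < (1ULL << 54), "fits startoff");
```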
* [PATCH V9 13/19] xfs: Replace numbered inode recovery error messages with descriptive ones
2022-04-06 6:18 [PATCH V9 00/19] xfs: Extend per-inode extent counters Chandan Babu R
` (11 preceding siblings ...)
2022-04-06 6:18 ` [PATCH V9 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
@ 2022-04-06 6:18 ` Chandan Babu R
2022-04-07 1:50 ` Darrick J. Wong
2022-04-06 6:18 ` [PATCH V9 14/19] xfs: Introduce per-inode 64-bit extent counters Chandan Babu R
` (6 subsequent siblings)
19 siblings, 1 reply; 62+ messages in thread
From: Chandan Babu R @ 2022-04-06 6:18 UTC (permalink / raw)
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
This commit also prints inode fields with invalid values instead of printing
addresses of inode and buffer instances.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Suggested-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode_item_recover.c | 52 ++++++++++++++-------------------
1 file changed, 22 insertions(+), 30 deletions(-)
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
index 44b90614859e..96b222e18b0f 100644
--- a/fs/xfs/xfs_inode_item_recover.c
+++ b/fs/xfs/xfs_inode_item_recover.c
@@ -324,13 +324,12 @@ xlog_recover_inode_commit_pass2(
if (unlikely(S_ISREG(ldip->di_mode))) {
if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
(ldip->di_format != XFS_DINODE_FMT_BTREE)) {
- XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(3)",
- XFS_ERRLEVEL_LOW, mp, ldip,
- sizeof(*ldip));
+ XFS_CORRUPTION_ERROR(
+ "Bad log dinode data fork format for regular file",
+ XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
xfs_alert(mp,
- "%s: Bad regular inode log record, rec ptr "PTR_FMT", "
- "ino ptr = "PTR_FMT", ino bp = "PTR_FMT", ino %Ld",
- __func__, item, dip, bp, in_f->ilf_ino);
+ "Bad inode 0x%llx, data fork format 0x%x",
+ in_f->ilf_ino, ldip->di_format);
error = -EFSCORRUPTED;
goto out_release;
}
@@ -338,49 +337,42 @@ xlog_recover_inode_commit_pass2(
if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
(ldip->di_format != XFS_DINODE_FMT_BTREE) &&
(ldip->di_format != XFS_DINODE_FMT_LOCAL)) {
- XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(4)",
- XFS_ERRLEVEL_LOW, mp, ldip,
- sizeof(*ldip));
+ XFS_CORRUPTION_ERROR(
+ "Bad log dinode data fork format for directory",
+ XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
xfs_alert(mp,
- "%s: Bad dir inode log record, rec ptr "PTR_FMT", "
- "ino ptr = "PTR_FMT", ino bp = "PTR_FMT", ino %Ld",
- __func__, item, dip, bp, in_f->ilf_ino);
+ "Bad inode 0x%llx, data fork format 0x%x",
+ in_f->ilf_ino, ldip->di_format);
error = -EFSCORRUPTED;
goto out_release;
}
}
if (unlikely(ldip->di_nextents + ldip->di_anextents > ldip->di_nblocks)){
- XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(5)",
- XFS_ERRLEVEL_LOW, mp, ldip,
- sizeof(*ldip));
+ XFS_CORRUPTION_ERROR("Bad log dinode extent counts",
+ XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
xfs_alert(mp,
- "%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
- "dino bp "PTR_FMT", ino %Ld, total extents = %d, nblocks = %Ld",
- __func__, item, dip, bp, in_f->ilf_ino,
- ldip->di_nextents + ldip->di_anextents,
+ "Bad inode 0x%llx, nextents 0x%x, anextents 0x%x, nblocks 0x%llx",
+ in_f->ilf_ino, ldip->di_nextents, ldip->di_anextents,
ldip->di_nblocks);
error = -EFSCORRUPTED;
goto out_release;
}
if (unlikely(ldip->di_forkoff > mp->m_sb.sb_inodesize)) {
- XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(6)",
- XFS_ERRLEVEL_LOW, mp, ldip,
- sizeof(*ldip));
+ XFS_CORRUPTION_ERROR("Bad log dinode fork offset",
+ XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
xfs_alert(mp,
- "%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
- "dino bp "PTR_FMT", ino %Ld, forkoff 0x%x", __func__,
- item, dip, bp, in_f->ilf_ino, ldip->di_forkoff);
+ "Bad inode 0x%llx, di_forkoff 0x%x",
+ in_f->ilf_ino, ldip->di_forkoff);
error = -EFSCORRUPTED;
goto out_release;
}
isize = xfs_log_dinode_size(mp);
if (unlikely(item->ri_buf[1].i_len > isize)) {
- XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(7)",
- XFS_ERRLEVEL_LOW, mp, ldip,
- sizeof(*ldip));
+ XFS_CORRUPTION_ERROR("Bad log dinode size", XFS_ERRLEVEL_LOW,
+ mp, ldip, sizeof(*ldip));
xfs_alert(mp,
- "%s: Bad inode log record length %d, rec ptr "PTR_FMT,
- __func__, item->ri_buf[1].i_len, item);
+ "Bad inode 0x%llx log dinode size 0x%x",
+ in_f->ilf_ino, item->ri_buf[1].i_len);
error = -EFSCORRUPTED;
goto out_release;
}
--
2.30.2
* [PATCH V9 14/19] xfs: Introduce per-inode 64-bit extent counters
2022-04-06 6:18 [PATCH V9 00/19] xfs: Extend per-inode extent counters Chandan Babu R
` (12 preceding siblings ...)
2022-04-06 6:18 ` [PATCH V9 13/19] xfs: Replace numbered inode recovery error messages with descriptive ones Chandan Babu R
@ 2022-04-06 6:18 ` Chandan Babu R
2022-04-06 19:03 ` kernel test robot
2022-04-06 6:18 ` [PATCH V9 15/19] xfs: Directory's data fork extent counter can never overflow Chandan Babu R
` (5 subsequent siblings)
19 siblings, 1 reply; 62+ messages in thread
From: Chandan Babu R @ 2022-04-06 6:18 UTC (permalink / raw)
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
This commit introduces new fields in the on-disk inode format to support
64-bit data fork extent counters and 32-bit attribute fork extent
counters. The new fields are used only when an inode has the
XFS_DIFLAG2_NREXT64 flag set. Otherwise, we continue to use the regular
32-bit data fork and 16-bit attribute fork extent counters.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Suggested-by: Dave Chinner <dchinner@redhat.com>
---
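A user-space model of the two new unions may help when reviewing the layout. It is only a sketch: the toy_* types and accessors are hypothetical, fields are host-endian instead of __be*, and __packed is spelled as the GCC/Clang __attribute__((packed)). The size checks confirm that each union occupies exactly the bytes of the fields it replaces (8 bytes for di_pad + di_flushiter, 6 bytes for di_nextents + di_anextents), so the surrounding inode core fields keep their offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Replaces di_pad[6] + di_flushiter: 8 bytes in every layout. */
union toy_nextents_or_pad {
	uint64_t di_big_nextents;	/* data fork extents, NREXT64 set */
	uint64_t di_v3_pad;		/* V3 without NREXT64: must be zero */
	struct {
		uint8_t  di_v2_pad[6];	/* V2: zeroed space */
		uint16_t di_flushiter;	/* V2: incremented on flush */
	};
};

/* Replaces di_nextents + di_anextents: 6 bytes in every layout. */
union toy_small_or_big_anextents {
	struct {
		uint32_t di_nextents;		/* data fork extents, no NREXT64 */
		uint16_t di_anextents;		/* attr fork extents, no NREXT64 */
	} __attribute__((packed));
	struct {
		uint32_t di_big_anextents;	/* attr fork extents, NREXT64 set */
		uint16_t di_nrext64_pad;	/* must be zero */
	} __attribute__((packed));
} __attribute__((packed));

static_assert(sizeof(union toy_nextents_or_pad) == 8, "8-byte union");
static_assert(sizeof(union toy_small_or_big_anextents) == 6, "6-byte union");

/* Hypothetical inode holding only what the accessors below need. */
struct toy_dinode {
	bool nrext64;	/* stands in for di_flags2 & XFS_DIFLAG2_NREXT64 */
	union toy_nextents_or_pad u1;
	union toy_small_or_big_anextents u2;
};

/* Same decision as xfs_dfork_data_extents(). */
static uint64_t toy_data_extents(const struct toy_dinode *dip)
{
	return dip->nrext64 ? dip->u1.di_big_nextents : dip->u2.di_nextents;
}

/* Same decision as xfs_dfork_attr_extents(). */
static uint32_t toy_attr_extents(const struct toy_dinode *dip)
{
	return dip->nrext64 ? dip->u2.di_big_anextents : dip->u2.di_anextents;
}
```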
fs/xfs/libxfs/xfs_format.h | 33 +++++++++++--
fs/xfs/libxfs/xfs_inode_buf.c | 49 ++++++++++++++++--
fs/xfs/libxfs/xfs_inode_fork.h | 6 +++
fs/xfs/libxfs/xfs_log_format.h | 33 +++++++++++--
fs/xfs/xfs_inode_item.c | 23 +++++++--
fs/xfs/xfs_inode_item_recover.c | 88 ++++++++++++++++++++++++++++-----
6 files changed, 203 insertions(+), 29 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index eb85bc9b229b..82b404c99b80 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -792,16 +792,41 @@ struct xfs_dinode {
__be32 di_nlink; /* number of links to file */
__be16 di_projid_lo; /* lower part of owner's project id */
__be16 di_projid_hi; /* higher part owner's project id */
- __u8 di_pad[6]; /* unused, zeroed space */
- __be16 di_flushiter; /* incremented on flush */
+ union {
+ /* Number of data fork extents if NREXT64 is set */
+ __be64 di_big_nextents;
+
+ /* Padding for V3 inodes without NREXT64 set. */
+ __be64 di_v3_pad;
+
+ /* Padding and inode flush counter for V2 inodes. */
+ struct {
+ __u8 di_v2_pad[6];
+ __be16 di_flushiter;
+ };
+ };
xfs_timestamp_t di_atime; /* time last accessed */
xfs_timestamp_t di_mtime; /* time last modified */
xfs_timestamp_t di_ctime; /* time created/inode modified */
__be64 di_size; /* number of bytes in file */
__be64 di_nblocks; /* # of direct & btree blocks used */
__be32 di_extsize; /* basic/minimum extent size for file */
- __be32 di_nextents; /* number of extents in data fork */
- __be16 di_anextents; /* number of extents in attribute fork*/
+ union {
+ /*
+ * For V2 inodes and V3 inodes without NREXT64 set, this
+ * is the number of data and attr fork extents.
+ */
+ struct {
+ __be32 di_nextents;
+ __be16 di_anextents;
+ } __packed;
+
+ /* Number of attr fork extents if NREXT64 is set. */
+ struct {
+ __be32 di_big_anextents;
+ __be16 di_nrext64_pad;
+ } __packed;
+ } __packed;
__u8 di_forkoff; /* attr fork offs, <<3 for 64b align */
__s8 di_aformat; /* format of attr fork's data */
__be32 di_dmevmask; /* DMIG event mask */
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index e0d3140c3622..ee8d4eb7d048 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -279,6 +279,25 @@ xfs_inode_to_disk_ts(
return ts;
}
+static inline void
+xfs_inode_to_disk_iext_counters(
+ struct xfs_inode *ip,
+ struct xfs_dinode *to)
+{
+ if (xfs_inode_has_large_extent_counts(ip)) {
+ to->di_big_nextents = cpu_to_be64(xfs_ifork_nextents(&ip->i_df));
+ to->di_big_anextents = cpu_to_be32(xfs_ifork_nextents(ip->i_afp));
+ /*
+ * We might be upgrading the inode to use larger extent counters
+ * than was previously used. Hence zero the unused field.
+ */
+ to->di_nrext64_pad = cpu_to_be16(0);
+ } else {
+ to->di_nextents = cpu_to_be32(xfs_ifork_nextents(&ip->i_df));
+ to->di_anextents = cpu_to_be16(xfs_ifork_nextents(ip->i_afp));
+ }
+}
+
void
xfs_inode_to_disk(
struct xfs_inode *ip,
@@ -296,7 +315,6 @@ xfs_inode_to_disk(
to->di_projid_lo = cpu_to_be16(ip->i_projid & 0xffff);
to->di_projid_hi = cpu_to_be16(ip->i_projid >> 16);
- memset(to->di_pad, 0, sizeof(to->di_pad));
to->di_atime = xfs_inode_to_disk_ts(ip, inode->i_atime);
to->di_mtime = xfs_inode_to_disk_ts(ip, inode->i_mtime);
to->di_ctime = xfs_inode_to_disk_ts(ip, inode->i_ctime);
@@ -307,8 +325,6 @@ xfs_inode_to_disk(
to->di_size = cpu_to_be64(ip->i_disk_size);
to->di_nblocks = cpu_to_be64(ip->i_nblocks);
to->di_extsize = cpu_to_be32(ip->i_extsize);
- to->di_nextents = cpu_to_be32(xfs_ifork_nextents(&ip->i_df));
- to->di_anextents = cpu_to_be16(xfs_ifork_nextents(ip->i_afp));
to->di_forkoff = ip->i_forkoff;
to->di_aformat = xfs_ifork_format(ip->i_afp);
to->di_flags = cpu_to_be16(ip->i_diflags);
@@ -323,11 +339,14 @@ xfs_inode_to_disk(
to->di_lsn = cpu_to_be64(lsn);
memset(to->di_pad2, 0, sizeof(to->di_pad2));
uuid_copy(&to->di_uuid, &ip->i_mount->m_sb.sb_meta_uuid);
- to->di_flushiter = 0;
+ to->di_v3_pad = 0;
} else {
to->di_version = 2;
to->di_flushiter = cpu_to_be16(ip->i_flushiter);
+ memset(to->di_v2_pad, 0, sizeof(to->di_v2_pad));
}
+
+ xfs_inode_to_disk_iext_counters(ip, to);
}
static xfs_failaddr_t
@@ -398,6 +417,24 @@ xfs_dinode_verify_forkoff(
return NULL;
}
+static xfs_failaddr_t
+xfs_dinode_verify_nrext64(
+ struct xfs_mount *mp,
+ struct xfs_dinode *dip)
+{
+ if (xfs_dinode_has_large_extent_counts(dip)) {
+ if (!xfs_has_large_extent_counts(mp))
+ return __this_address;
+ if (dip->di_nrext64_pad != 0)
+ return __this_address;
+ } else if (dip->di_version >= 3) {
+ if (dip->di_v3_pad != 0)
+ return __this_address;
+ }
+
+ return NULL;
+}
+
xfs_failaddr_t
xfs_dinode_verify(
struct xfs_mount *mp,
@@ -442,6 +479,10 @@ xfs_dinode_verify(
if ((S_ISLNK(mode) || S_ISDIR(mode)) && di_size == 0)
return __this_address;
+ fa = xfs_dinode_verify_nrext64(mp, dip);
+ if (fa)
+ return fa;
+
nextents = xfs_dfork_data_extents(dip);
naextents = xfs_dfork_attr_extents(dip);
nblocks = be64_to_cpu(dip->di_nblocks);
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 967837a88860..fd5c3c2d77e0 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -158,6 +158,9 @@ static inline xfs_extnum_t
xfs_dfork_data_extents(
struct xfs_dinode *dip)
{
+ if (xfs_dinode_has_large_extent_counts(dip))
+ return be64_to_cpu(dip->di_big_nextents);
+
return be32_to_cpu(dip->di_nextents);
}
@@ -165,6 +168,9 @@ static inline xfs_extnum_t
xfs_dfork_attr_extents(
struct xfs_dinode *dip)
{
+ if (xfs_dinode_has_large_extent_counts(dip))
+ return be32_to_cpu(dip->di_big_anextents);
+
return be16_to_cpu(dip->di_anextents);
}
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index fd66e70248f7..12234a880e94 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -388,16 +388,41 @@ struct xfs_log_dinode {
uint32_t di_nlink; /* number of links to file */
uint16_t di_projid_lo; /* lower part of owner's project id */
uint16_t di_projid_hi; /* higher part of owner's project id */
- uint8_t di_pad[6]; /* unused, zeroed space */
- uint16_t di_flushiter; /* incremented on flush */
+ union {
+ /* Number of data fork extents if NREXT64 is set */
+ uint64_t di_big_nextents;
+
+ /* Padding for V3 inodes without NREXT64 set. */
+ uint64_t di_v3_pad;
+
+ /* Padding and inode flush counter for V2 inodes. */
+ struct {
+ uint8_t di_v2_pad[6]; /* V2 inode zeroed space */
+ uint16_t di_flushiter; /* V2 inode incremented on flush */
+ };
+ };
xfs_log_timestamp_t di_atime; /* time last accessed */
xfs_log_timestamp_t di_mtime; /* time last modified */
xfs_log_timestamp_t di_ctime; /* time created/inode modified */
xfs_fsize_t di_size; /* number of bytes in file */
xfs_rfsblock_t di_nblocks; /* # of direct & btree blocks used */
xfs_extlen_t di_extsize; /* basic/minimum extent size for file */
- uint32_t di_nextents; /* number of extents in data fork */
- uint16_t di_anextents; /* number of extents in attribute fork*/
+ union {
+ /*
+ * For V2 inodes and V3 inodes without NREXT64 set, this
+ * is the number of data and attr fork extents.
+ */
+ struct {
+ uint32_t di_nextents;
+ uint16_t di_anextents;
+ } __packed;
+
+ /* Number of attr fork extents if NREXT64 is set. */
+ struct {
+ uint32_t di_big_anextents;
+ uint16_t di_nrext64_pad;
+ } __packed;
+ } __packed;
uint8_t di_forkoff; /* attr fork offs, <<3 for 64b align */
int8_t di_aformat; /* format of attr fork's data */
uint32_t di_dmevmask; /* DMIG event mask */
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 9e6ef55cf29e..00733a18ccdc 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -359,6 +359,21 @@ xfs_copy_dm_fields_to_log_dinode(
}
}
+static inline void
+xfs_inode_to_log_dinode_iext_counters(
+ struct xfs_inode *ip,
+ struct xfs_log_dinode *to)
+{
+ if (xfs_inode_has_large_extent_counts(ip)) {
+ to->di_big_nextents = xfs_ifork_nextents(&ip->i_df);
+ to->di_big_anextents = xfs_ifork_nextents(ip->i_afp);
+ to->di_nrext64_pad = 0;
+ } else {
+ to->di_nextents = xfs_ifork_nextents(&ip->i_df);
+ to->di_anextents = xfs_ifork_nextents(ip->i_afp);
+ }
+}
+
static void
xfs_inode_to_log_dinode(
struct xfs_inode *ip,
@@ -374,7 +389,6 @@ xfs_inode_to_log_dinode(
to->di_projid_lo = ip->i_projid & 0xffff;
to->di_projid_hi = ip->i_projid >> 16;
- memset(to->di_pad, 0, sizeof(to->di_pad));
memset(to->di_pad3, 0, sizeof(to->di_pad3));
to->di_atime = xfs_inode_to_log_dinode_ts(ip, inode->i_atime);
to->di_mtime = xfs_inode_to_log_dinode_ts(ip, inode->i_mtime);
@@ -386,8 +400,6 @@ xfs_inode_to_log_dinode(
to->di_size = ip->i_disk_size;
to->di_nblocks = ip->i_nblocks;
to->di_extsize = ip->i_extsize;
- to->di_nextents = xfs_ifork_nextents(&ip->i_df);
- to->di_anextents = xfs_ifork_nextents(ip->i_afp);
to->di_forkoff = ip->i_forkoff;
to->di_aformat = xfs_ifork_format(ip->i_afp);
to->di_flags = ip->i_diflags;
@@ -407,11 +419,14 @@ xfs_inode_to_log_dinode(
to->di_lsn = lsn;
memset(to->di_pad2, 0, sizeof(to->di_pad2));
uuid_copy(&to->di_uuid, &ip->i_mount->m_sb.sb_meta_uuid);
- to->di_flushiter = 0;
+ to->di_v3_pad = 0;
} else {
to->di_version = 2;
to->di_flushiter = ip->i_flushiter;
+ memset(to->di_v2_pad, 0, sizeof(to->di_v2_pad));
}
+
+ xfs_inode_to_log_dinode_iext_counters(ip, to);
}
/*
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
index 96b222e18b0f..2d8f9cfa7116 100644
--- a/fs/xfs/xfs_inode_item_recover.c
+++ b/fs/xfs/xfs_inode_item_recover.c
@@ -149,6 +149,22 @@ static inline bool xfs_log_dinode_has_large_extent_counts(
(ld->di_flags2 & XFS_DIFLAG2_NREXT64);
}
+static inline void
+xfs_log_dinode_to_disk_iext_counters(
+ struct xfs_log_dinode *from,
+ struct xfs_dinode *to)
+{
+ if (xfs_log_dinode_has_large_extent_counts(from)) {
+ to->di_big_nextents = cpu_to_be64(from->di_big_nextents);
+ to->di_big_anextents = cpu_to_be32(from->di_big_anextents);
+ to->di_nrext64_pad = cpu_to_be16(from->di_nrext64_pad);
+ } else {
+ to->di_nextents = cpu_to_be32(from->di_nextents);
+ to->di_anextents = cpu_to_be16(from->di_anextents);
+ }
+
+}
+
STATIC void
xfs_log_dinode_to_disk(
struct xfs_log_dinode *from,
@@ -165,7 +181,6 @@ xfs_log_dinode_to_disk(
to->di_nlink = cpu_to_be32(from->di_nlink);
to->di_projid_lo = cpu_to_be16(from->di_projid_lo);
to->di_projid_hi = cpu_to_be16(from->di_projid_hi);
- memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
to->di_atime = xfs_log_dinode_to_disk_ts(from, from->di_atime);
to->di_mtime = xfs_log_dinode_to_disk_ts(from, from->di_mtime);
@@ -174,8 +189,6 @@ xfs_log_dinode_to_disk(
to->di_size = cpu_to_be64(from->di_size);
to->di_nblocks = cpu_to_be64(from->di_nblocks);
to->di_extsize = cpu_to_be32(from->di_extsize);
- to->di_nextents = cpu_to_be32(from->di_nextents);
- to->di_anextents = cpu_to_be16(from->di_anextents);
to->di_forkoff = from->di_forkoff;
to->di_aformat = from->di_aformat;
to->di_dmevmask = cpu_to_be32(from->di_dmevmask);
@@ -193,10 +206,64 @@ xfs_log_dinode_to_disk(
to->di_lsn = cpu_to_be64(lsn);
memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
uuid_copy(&to->di_uuid, &from->di_uuid);
- to->di_flushiter = 0;
+ to->di_v3_pad = from->di_v3_pad;
} else {
to->di_flushiter = cpu_to_be16(from->di_flushiter);
+ memcpy(to->di_v2_pad, from->di_v2_pad, sizeof(to->di_v2_pad));
+ }
+
+ xfs_log_dinode_to_disk_iext_counters(from, to);
+}
+
+STATIC int
+xlog_dinode_verify_extent_counts(
+ struct xfs_mount *mp,
+ struct xfs_log_dinode *ldip)
+{
+ xfs_extnum_t nextents;
+ xfs_aextnum_t anextents;
+
+ if (xfs_log_dinode_has_large_extent_counts(ldip)) {
+ if (!xfs_has_large_extent_counts(mp) ||
+ (ldip->di_nrext64_pad != 0)) {
+ XFS_CORRUPTION_ERROR(
+ "Bad log dinode large extent count format",
+ XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
+ xfs_alert(mp,
+ "Bad inode 0x%llx, large extent counts %d, padding 0x%x",
+ ldip->di_ino, xfs_has_large_extent_counts(mp),
+ ldip->di_nrext64_pad);
+ return -EFSCORRUPTED;
+ }
+
+ nextents = ldip->di_big_nextents;
+ anextents = ldip->di_big_anextents;
+ } else {
+ if (ldip->di_version == 3 && ldip->di_v3_pad != 0) {
+ XFS_CORRUPTION_ERROR(
+ "Bad log dinode di_v3_pad",
+ XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
+ xfs_alert(mp,
+ "Bad inode 0x%llx, di_v3_pad 0x%llx",
+ ldip->di_ino, ldip->di_v3_pad);
+ return -EFSCORRUPTED;
+ }
+
+ nextents = ldip->di_nextents;
+ anextents = ldip->di_anextents;
+ }
+
+ if (unlikely(nextents + anextents > ldip->di_nblocks)) {
+ XFS_CORRUPTION_ERROR("Bad log dinode extent counts",
+ XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
+ xfs_alert(mp,
+ "Bad inode 0x%llx, large extent counts %d, nextents 0x%llx, anextents 0x%x, nblocks 0x%llx",
+ ldip->di_ino, xfs_has_large_extent_counts(mp), nextents,
+ anextents, ldip->di_nblocks);
+ return -EFSCORRUPTED;
}
+
+ return 0;
}
STATIC int
@@ -347,16 +414,11 @@ xlog_recover_inode_commit_pass2(
goto out_release;
}
}
- if (unlikely(ldip->di_nextents + ldip->di_anextents > ldip->di_nblocks)){
- XFS_CORRUPTION_ERROR("Bad log dinode extent counts",
- XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
- xfs_alert(mp,
- "Bad inode 0x%llx, nextents 0x%x, anextents 0x%x, nblocks 0x%llx",
- in_f->ilf_ino, ldip->di_nextents, ldip->di_anextents,
- ldip->di_nblocks);
- error = -EFSCORRUPTED;
+
+ error = xlog_dinode_verify_extent_counts(mp, ldip);
+ if (error)
goto out_release;
- }
+
if (unlikely(ldip->di_forkoff > mp->m_sb.sb_inodesize)) {
XFS_CORRUPTION_ERROR("Bad log dinode fork offset",
XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
--
2.30.2
* [PATCH V9 15/19] xfs: Directory's data fork extent counter can never overflow
2022-04-06 6:18 [PATCH V9 00/19] xfs: Extend per-inode extent counters Chandan Babu R
` (13 preceding siblings ...)
2022-04-06 6:18 ` [PATCH V9 14/19] xfs: Introduce per-inode 64-bit extent counters Chandan Babu R
@ 2022-04-06 6:18 ` Chandan Babu R
2022-04-07 1:13 ` Dave Chinner
` (2 more replies)
2022-04-06 6:19 ` [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters Chandan Babu R
` (4 subsequent siblings)
19 siblings, 3 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-06 6:18 UTC (permalink / raw)
To: linux-xfs; +Cc: Chandan Babu R, djwong, david
The maximum file size that can be represented by the data fork extent counter
in the worst case occurs when all extents are 1 block in length and each block
is 1KB in size.
With XFS_MAX_EXTCNT_DATA_FORK_SMALL as the maximum extent count and with
1KB sized blocks, a file can reach up to
(2^31) * 1KB = 2TB
This is much larger than the theoretical maximum size of a directory,
i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
Since a directory's inode can never overflow its data fork extent counter,
this commit removes all the overflow checks associated with
it. xfs_dinode_verify() now performs a rough check that a directory's
data fork extent count does not exceed the number of filesystem blocks in
96GB.
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
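The numbers in the commit message can be reproduced with a short calculation. This sketch assumes XFS_DIR2_DATA_ALIGN_LOG is 3 (8-byte alignment), which matches the "spaces separated by 32GB" comment in xfs_da_format.h; the DIR2_* macros and dir_max_dfork_nexts() are local stand-ins for the kernel macros and for the new directory check in xfs_dinode_verify().

```c
#include <assert.h>
#include <stdint.h>

#define DIR2_DATA_ALIGN_LOG	3	/* assumed value of XFS_DIR2_DATA_ALIGN_LOG */
#define DIR2_SPACE_SIZE		(1ULL << (32 + DIR2_DATA_ALIGN_LOG))	/* 32GB */
#define DIR2_MAX_SPACES		3	/* data, leaf and free spaces */
#define MAX_EXTCNT_DATA_FORK_SMALL	((1ULL << 31) - 1)

/* Upper bound on a directory's data fork extents for a given block size. */
static uint64_t dir_max_dfork_nexts(unsigned int blocklog)
{
	return (DIR2_MAX_SPACES * DIR2_SPACE_SIZE) >> blocklog;
}

/* Even with 1KB blocks (blocklog 10), ~96GB of directory is far below ~2TB. */
static_assert(DIR2_MAX_SPACES * DIR2_SPACE_SIZE <
	      (MAX_EXTCNT_DATA_FORK_SMALL << 10),
	      "directory size fits the small data fork counter");
```

With 1KB blocks the bound is 3 * 2^25 = 100,663,296 extents, and with 4KB blocks it drops to 3 * 2^23 = 25,165,824, both well below 2^31 - 1.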
fs/xfs/libxfs/xfs_bmap.c | 20 -------------
fs/xfs/libxfs/xfs_da_format.h | 1 +
fs/xfs/libxfs/xfs_format.h | 13 ++++++++
fs/xfs/libxfs/xfs_inode_buf.c | 9 ++++++
fs/xfs/libxfs/xfs_inode_fork.h | 13 --------
fs/xfs/xfs_inode.c | 55 ++--------------------------------
fs/xfs/xfs_symlink.c | 5 ----
7 files changed, 25 insertions(+), 91 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 1254d4d4821e..4fab0c92ab70 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5147,26 +5147,6 @@ xfs_bmap_del_extent_real(
* Deleting the middle of the extent.
*/
- /*
- * For directories, -ENOSPC is returned since a directory entry
- * remove operation must not fail due to low extent count
- * availability. -ENOSPC will be handled by higher layers of XFS
- * by letting the corresponding empty Data/Free blocks to linger
- * until a future remove operation. Dabtree blocks would be
- * swapped with the last block in the leaf space and then the
- * new last block will be unmapped.
- *
- * The above logic also applies to the source directory entry of
- * a rename operation.
- */
- error = xfs_iext_count_may_overflow(ip, whichfork, 1);
- if (error) {
- ASSERT(S_ISDIR(VFS_I(ip)->i_mode) &&
- whichfork == XFS_DATA_FORK);
- error = -ENOSPC;
- goto done;
- }
-
old = got;
got.br_blockcount = del->br_startoff - got.br_startoff;
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 5a49caa5c9df..95354b7ab7f5 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -277,6 +277,7 @@ xfs_dir2_sf_firstentry(struct xfs_dir2_sf_hdr *hdr)
* Directory address space divided into sections,
* spaces separated by 32GB.
*/
+#define XFS_DIR2_MAX_SPACES 3
#define XFS_DIR2_SPACE_SIZE (1ULL << (32 + XFS_DIR2_DATA_ALIGN_LOG))
#define XFS_DIR2_DATA_SPACE 0
#define XFS_DIR2_DATA_OFFSET (XFS_DIR2_DATA_SPACE * XFS_DIR2_SPACE_SIZE)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 82b404c99b80..43de892d0305 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -915,6 +915,19 @@ enum xfs_dinode_fmt {
*
* Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
* 2^48 was chosen as the maximum data fork extent count.
+ *
+ * The maximum file size that can be represented by the data fork extent counter
+ * in the worst case occurs when all extents are 1 block in length and each
+ * block is 1KB in size.
+ *
+ * With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and
+ * with 1KB sized blocks, a file can reach up to,
+ * 1KB * (2^31) = 2TB
+ *
+ * This is much larger than the theoretical maximum size of a directory
+ * i.e. XFS_DIR2_SPACE_SIZE * XFS_DIR2_MAX_SPACES = ~96GB.
+ *
+ * Hence, a directory inode can never overflow its data fork extent counter.
*/
#define XFS_MAX_EXTCNT_DATA_FORK_LARGE ((xfs_extnum_t)((1ULL << 48) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_LARGE ((xfs_extnum_t)((1ULL << 32) - 1))
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index ee8d4eb7d048..54b106ae77e1 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -491,6 +491,15 @@ xfs_dinode_verify(
if (mode && nextents + naextents > nblocks)
return __this_address;
+ if (S_ISDIR(mode)) {
+ uint64_t max_dfork_nexts;
+
+ max_dfork_nexts = (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >>
+ mp->m_sb.sb_blocklog;
+ if (nextents > max_dfork_nexts)
+ return __this_address;
+ }
+
if (mode && XFS_DFORK_BOFF(dip) > mp->m_sb.sb_inodesize)
return __this_address;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index fd5c3c2d77e0..6f9d69f8896e 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -39,19 +39,6 @@ struct xfs_ifork {
*/
#define XFS_IEXT_PUNCH_HOLE_CNT (1)
-/*
- * Directory entry addition can cause the following,
- * 1. Data block can be added/removed.
- * A new extent can cause extent count to increase by 1.
- * 2. Free disk block can be added/removed.
- * Same behaviour as described above for Data block.
- * 3. Dabtree blocks.
- * XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
- * extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
- */
-#define XFS_IEXT_DIR_MANIP_CNT(mp) \
- ((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
-
/*
* Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to
* be added. One extra extent for dabtree in case a local attr is
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index adc1355ce853..20f15a0393e1 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1024,11 +1024,6 @@ xfs_create(
xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
unlock_dp_on_error = true;
- error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto out_trans_cancel;
-
/*
* A newly created regular or special file just has one directory
* entry pointing to them, but a directory also the "." entry
@@ -1242,11 +1237,6 @@ xfs_link(
if (error)
goto std_return;
- error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto error_return;
-
/*
* If we are using project inheritance, we only allow hard link
* creation in our tree when the project IDs are the same; else
@@ -3210,35 +3200,6 @@ xfs_rename(
/*
* Check for expected errors before we dirty the transaction
* so we can return an error without a transaction abort.
- *
- * Extent count overflow check:
- *
- * From the perspective of src_dp, a rename operation is essentially a
- * directory entry remove operation. Hence the only place where we check
- * for extent count overflow for src_dp is in
- * xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns
- * -ENOSPC when it detects a possible extent count overflow and in
- * response, the higher layers of directory handling code do the
- * following:
- * 1. Data/Free blocks: XFS lets these blocks linger until a
- * future remove operation removes them.
- * 2. Dabtree blocks: XFS swaps the blocks with the last block in the
- * Leaf space and unmaps the last block.
- *
- * For target_dp, there are two cases depending on whether the
- * destination directory entry exists or not.
- *
- * When destination directory entry does not exist (i.e. target_ip ==
- * NULL), extent count overflow check is performed only when transaction
- * has a non-zero sized space reservation associated with it. With a
- * zero-sized space reservation, XFS allows a rename operation to
- * continue only when the directory has sufficient free space in its
- * data/leaf/free space blocks to hold the new entry.
- *
- * When destination directory entry exists (i.e. target_ip != NULL), all
- * we need to do is change the inode number associated with the already
- * existing entry. Hence there is no need to perform an extent count
- * overflow check.
*/
if (target_ip == NULL) {
/*
@@ -3249,12 +3210,6 @@ xfs_rename(
error = xfs_dir_canenter(tp, target_dp, target_name);
if (error)
goto out_trans_cancel;
- } else {
- error = xfs_iext_count_may_overflow(target_dp,
- XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto out_trans_cancel;
}
} else {
/*
@@ -3422,18 +3377,12 @@ xfs_rename(
* inode number of the whiteout inode rather than removing it
* altogether.
*/
- if (wip) {
+ if (wip)
error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
spaceres);
- } else {
- /*
- * NOTE: We don't need to check for extent count overflow here
- * because the dir remove name code will leave the dir block in
- * place if the extent count would overflow.
- */
+ else
error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
spaceres);
- }
if (error)
goto out_trans_cancel;
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index affbedf78160..4145ba872547 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -226,11 +226,6 @@ xfs_symlink(
goto out_trans_cancel;
}
- error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto out_trans_cancel;
-
/*
* Allocate an inode for the symlink.
*/
--
2.30.2
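Patch 15 rests on the claim that a directory's data fork can never hold more extents than its maximal address space has blocks. A minimal userspace sketch of that arithmetic follows. It assumes the directory geometry described in the xfs_format.h comment above (three 32 GiB spaces for data, leaf and free blocks, i.e. 1 << (32 + 3) bytes each). The constants are restated by hand rather than taken from the XFS headers, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values mirroring the XFS directory geometry (hand-copied). */
#define DIR2_DATA_ALIGN_LOG 3
#define DIR2_SPACE_SIZE (1ULL << (32 + DIR2_DATA_ALIGN_LOG)) /* 32 GiB per space */
#define DIR2_MAX_SPACES 3ULL /* data, leaf and free spaces */
#define MAX_EXTCNT_DATA_FORK_SMALL ((1ULL << 31) - 1)

/*
 * Largest data fork extent count a directory can legitimately have:
 * every block of the whole directory address space mapped by its own
 * extent. Same expression as the new check in xfs_dinode_verify().
 */
static uint64_t dir_max_dfork_extents(unsigned int blocklog)
{
	/* XFS block sizes range from 512 bytes to 64 KiB. */
	assert(blocklog >= 9 && blocklog <= 16);
	return (DIR2_MAX_SPACES * DIR2_SPACE_SIZE) >> blocklog;
}

/* Verifier-style predicate: nonzero when the on-disk count is impossible. */
static int dir_nextents_corrupt(uint64_t nextents, unsigned int blocklog)
{
	return nextents > dir_max_dfork_extents(blocklog);
}
```

With 1 KiB blocks (blocklog 10) the bound works out to 96 << 20 = 100,663,296 extents. With 512-byte blocks, the smallest size in the cover letter's test matrix, it is 201,326,592. Both values are well below the 2^31 - 1 limit of the small data fork counter. That margin is what allows the per-operation XFS_IEXT_DIR_MANIP_CNT checks to be replaced by a single verifier check.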
* [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters
2022-04-06 6:18 [PATCH V9 00/19] xfs: Extend per-inode extent counters Chandan Babu R
` (14 preceding siblings ...)
2022-04-06 6:18 ` [PATCH V9 15/19] xfs: Directory's data fork extent counter can never overflow Chandan Babu R
@ 2022-04-06 6:19 ` Chandan Babu R
2022-04-07 1:22 ` Dave Chinner
` (2 more replies)
2022-04-06 6:19 ` [PATCH V9 17/19] xfs: Decouple XFS_IBULK flags from XFS_IWALK flags Chandan Babu R
` (3 subsequent siblings)
19 siblings, 3 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-06 6:19 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david
This commit enables upgrading existing inodes to use large extent counters
provided that the underlying filesystem's superblock has the large extent
counter feature enabled.
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_attr.c | 10 ++++++++++
fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
fs/xfs/libxfs/xfs_format.h | 8 ++++++++
fs/xfs/libxfs/xfs_inode_fork.c | 19 +++++++++++++++++++
fs/xfs/libxfs/xfs_inode_fork.h | 2 ++
fs/xfs/xfs_bmap_item.c | 2 ++
fs/xfs/xfs_bmap_util.c | 13 +++++++++++++
fs/xfs/xfs_dquot.c | 3 +++
fs/xfs/xfs_iomap.c | 5 +++++
fs/xfs/xfs_reflink.c | 5 +++++
fs/xfs/xfs_rtalloc.c | 3 +++
11 files changed, 74 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 23523b802539..66c4fc55c9d7 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -776,8 +776,18 @@ xfs_attr_set(
if (args->value || xfs_inode_hasattr(dp)) {
error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(args->trans, dp,
+ XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
if (error)
goto out_trans_cancel;
+
+ if (error == -EFBIG) {
+ error = xfs_iext_count_upgrade(args->trans, dp,
+ XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
+ if (error)
+ goto out_trans_cancel;
+ }
}
error = xfs_attr_lookup(args);
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 4fab0c92ab70..82d5467ddf2c 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4524,14 +4524,16 @@ xfs_bmapi_convert_delalloc(
return error;
xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, 0);
error = xfs_iext_count_may_overflow(ip, whichfork,
XFS_IEXT_ADD_NOSPLIT_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_ADD_NOSPLIT_CNT);
if (error)
goto out_trans_cancel;
- xfs_trans_ijoin(tp, ip, 0);
-
if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) ||
bma.got.br_startoff > offset_fsb) {
/*
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 43de892d0305..bb327ea43ca1 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -934,6 +934,14 @@ enum xfs_dinode_fmt {
#define XFS_MAX_EXTCNT_DATA_FORK_SMALL ((xfs_extnum_t)((1ULL << 31) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_SMALL ((xfs_extnum_t)((1ULL << 15) - 1))
+/*
+ * This macro represents the maximum value by which a filesystem operation can
+ * increase the value of an inode's data/attr fork extent count.
+ */
+#define XFS_MAX_EXTCNT_UPGRADE_NR \
+ min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
+ XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
+
/*
* Inode minimum and maximum sizes.
*/
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index bb5d841aac58..1245e9f1ca81 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -756,3 +756,22 @@ xfs_iext_count_may_overflow(
return 0;
}
+
+int
+xfs_iext_count_upgrade(
+ struct xfs_trans *tp,
+ struct xfs_inode *ip,
+ uint nr_to_add)
+{
+ ASSERT(nr_to_add <= XFS_MAX_EXTCNT_UPGRADE_NR);
+
+ if (!xfs_has_large_extent_counts(ip->i_mount) ||
+ (ip->i_diflags2 & XFS_DIFLAG2_NREXT64) ||
+ XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
+ return -EFBIG;
+
+ ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+ return 0;
+}
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 6f9d69f8896e..4f68c1f20beb 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -275,6 +275,8 @@ int xfs_ifork_verify_local_data(struct xfs_inode *ip);
int xfs_ifork_verify_local_attr(struct xfs_inode *ip);
int xfs_iext_count_may_overflow(struct xfs_inode *ip, int whichfork,
int nr_to_add);
+int xfs_iext_count_upgrade(struct xfs_trans *tp, struct xfs_inode *ip,
+ uint nr_to_add);
/* returns true if the fork has extents but they are not read in yet. */
static inline bool xfs_need_iread_extents(struct xfs_ifork *ifp)
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 761dde155099..593ac29cffc7 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -506,6 +506,8 @@ xfs_bui_item_recover(
iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
error = xfs_iext_count_may_overflow(ip, whichfork, iext_delta);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip, iext_delta);
if (error)
goto err_cancel;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 18c1b99311a8..52be58372c63 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -859,6 +859,9 @@ xfs_alloc_file_space(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_ADD_NOSPLIT_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_ADD_NOSPLIT_CNT);
if (error)
goto error;
@@ -914,6 +917,8 @@ xfs_unmap_extent(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_PUNCH_HOLE_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
if (error)
goto out_trans_cancel;
@@ -1195,6 +1200,8 @@ xfs_insert_file_space(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_PUNCH_HOLE_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
if (error)
goto out_trans_cancel;
@@ -1423,6 +1430,9 @@ xfs_swap_extent_rmap(
error = xfs_iext_count_may_overflow(ip,
XFS_DATA_FORK,
XFS_IEXT_SWAP_RMAP_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_SWAP_RMAP_CNT);
if (error)
goto out;
}
@@ -1431,6 +1441,9 @@ xfs_swap_extent_rmap(
error = xfs_iext_count_may_overflow(tip,
XFS_DATA_FORK,
XFS_IEXT_SWAP_RMAP_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, tip,
+ XFS_IEXT_SWAP_RMAP_CNT);
if (error)
goto out;
}
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 5afedcbc78c7..eb211e0ede5d 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -322,6 +322,9 @@ xfs_dquot_disk_alloc(
error = xfs_iext_count_may_overflow(quotip, XFS_DATA_FORK,
XFS_IEXT_ADD_NOSPLIT_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, quotip,
+ XFS_IEXT_ADD_NOSPLIT_CNT);
if (error)
goto err_cancel;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 87e1cf5060bd..5a393259a3a3 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -251,6 +251,8 @@ xfs_iomap_write_direct(
return error;
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, nr_exts);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip, nr_exts);
if (error)
goto out_trans_cancel;
@@ -555,6 +557,9 @@ xfs_iomap_write_unwritten(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_WRITE_UNWRITTEN_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_WRITE_UNWRITTEN_CNT);
if (error)
goto error_on_bmapi_transaction;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 54e68e5693fd..1ae6d3434ad2 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -620,6 +620,9 @@ xfs_reflink_end_cow_extent(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_REFLINK_END_COW_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_REFLINK_END_COW_CNT);
if (error)
goto out_cancel;
@@ -1121,6 +1124,8 @@ xfs_reflink_remap_extent(
++iext_delta;
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, iext_delta);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip, iext_delta);
if (error)
goto out_cancel;
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index b8c79ee791af..3e587e85d5bf 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -806,6 +806,9 @@ xfs_growfs_rt_alloc(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_ADD_NOSPLIT_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_ADD_NOSPLIT_CNT);
if (error)
goto out_trans_cancel;
--
2.30.2
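Patch 16 repeats one pattern at every call site. First it checks whether the operation may overflow the fork's current limit. On -EFBIG, it tries to switch the inode to large extent counters before giving up. The sketch below models that flow in userspace. struct toy_inode and the helper functions are hypothetical stand-ins for struct xfs_inode, xfs_iext_count_may_overflow() and xfs_iext_count_upgrade(), and logging the inode core is reduced to a counter. The limit values are the ones defined by the XFS_MAX_EXTCNT_* macros in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits restated from the XFS_MAX_EXTCNT_* macros in this series. */
#define MAX_DATA_SMALL ((1ULL << 31) - 1)
#define MAX_ATTR_SMALL ((1ULL << 15) - 1)
#define MAX_DATA_LARGE ((1ULL << 48) - 1)
#define MAX_ATTR_LARGE ((1ULL << 32) - 1)

#define MIN_U64(a, b) ((a) < (b) ? (a) : (b))
/* Same formula as XFS_MAX_EXTCNT_UPGRADE_NR. */
#define UPGRADE_NR \
	MIN_U64(MAX_ATTR_LARGE - MAX_ATTR_SMALL, MAX_DATA_LARGE - MAX_DATA_SMALL)

enum fork { DATA_FORK, ATTR_FORK };

/* Hypothetical stand-in for the relevant parts of struct xfs_inode. */
struct toy_inode {
	bool fs_has_nrext64;  /* superblock has the large extent counter feature */
	bool inode_nrext64;   /* analogue of XFS_DIFLAG2_NREXT64 */
	uint64_t nextents[2]; /* per-fork extent counts */
	int logged;           /* times the inode core was "logged" */
};

static uint64_t max_nextents(const struct toy_inode *ip, enum fork f)
{
	if (f == DATA_FORK)
		return ip->inode_nrext64 ? MAX_DATA_LARGE : MAX_DATA_SMALL;
	return ip->inode_nrext64 ? MAX_ATTR_LARGE : MAX_ATTR_SMALL;
}

/* Analogue of xfs_iext_count_may_overflow(): -EFBIG if the add could overflow. */
static int count_may_overflow(const struct toy_inode *ip, enum fork f,
			      uint64_t nr)
{
	return ip->nextents[f] + nr > max_nextents(ip, f) ? -EFBIG : 0;
}

/* Analogue of xfs_iext_count_upgrade(): switch the inode to large counters. */
static int count_upgrade(struct toy_inode *ip, uint64_t nr)
{
	assert(nr <= UPGRADE_NR);
	if (!ip->fs_has_nrext64 || ip->inode_nrext64)
		return -EFBIG;
	ip->inode_nrext64 = true;
	ip->logged++;
	return 0;
}

/* The caller pattern added throughout patch 16. */
static int reserve_extents(struct toy_inode *ip, enum fork f, uint64_t nr)
{
	int error = count_may_overflow(ip, f, nr);

	if (error == -EFBIG)
		error = count_upgrade(ip, nr);
	return error;
}
```

UPGRADE_NR is defined so that the small limit plus any permitted increment still fits under the large limit. For the attr fork, (2^15 - 1) + (2^32 - 2^15) is exactly 2^32 - 1. As a result, the caller does not need to repeat the overflow check after a successful upgrade. If the filesystem lacks the feature, or the inode already uses large counters, the original -EFBIG is returned.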
* [PATCH V9 17/19] xfs: Decouple XFS_IBULK flags from XFS_IWALK flags
2022-04-06 6:18 [PATCH V9 00/19] xfs: Extend per-inode extent counters Chandan Babu R
` (15 preceding siblings ...)
2022-04-06 6:19 ` [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters Chandan Babu R
@ 2022-04-06 6:19 ` Chandan Babu R
2022-04-07 1:29 ` Darrick J. Wong
2022-04-06 6:19 ` [PATCH V9 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters Chandan Babu R
` (2 subsequent siblings)
19 siblings, 1 reply; 62+ messages in thread
From: Chandan Babu R @ 2022-04-06 6:19 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
A future commit will add a new XFS_IBULK flag which will not have a
corresponding XFS_IWALK flag. In preparation for the change, this commit
separates XFS_IBULK_* flags from XFS_IWALK_* flags.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/xfs_itable.c | 6 +++++-
fs/xfs/xfs_itable.h | 2 +-
fs/xfs/xfs_iwalk.h | 2 +-
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index c08c79d9e311..71ed4905f206 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -256,6 +256,7 @@ xfs_bulkstat(
.breq = breq,
};
struct xfs_trans *tp;
+ unsigned int iwalk_flags = 0;
int error;
if (breq->mnt_userns != &init_user_ns) {
@@ -279,7 +280,10 @@ xfs_bulkstat(
if (error)
goto out;
- error = xfs_iwalk(breq->mp, tp, breq->startino, breq->flags,
+ if (breq->flags & XFS_IBULK_SAME_AG)
+ iwalk_flags |= XFS_IWALK_SAME_AG;
+
+ error = xfs_iwalk(breq->mp, tp, breq->startino, iwalk_flags,
xfs_bulkstat_iwalk, breq->icount, &bc);
xfs_trans_cancel(tp);
out:
diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
index 7078d10c9b12..2cf3872fcd2f 100644
--- a/fs/xfs/xfs_itable.h
+++ b/fs/xfs/xfs_itable.h
@@ -17,7 +17,7 @@ struct xfs_ibulk {
};
/* Only iterate within the same AG as startino */
-#define XFS_IBULK_SAME_AG (XFS_IWALK_SAME_AG)
+#define XFS_IBULK_SAME_AG (1 << 0)
/*
* Advance the user buffer pointer by one record of the given size. If the
diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
index 37a795f03267..3a68766fd909 100644
--- a/fs/xfs/xfs_iwalk.h
+++ b/fs/xfs/xfs_iwalk.h
@@ -26,7 +26,7 @@ int xfs_iwalk_threaded(struct xfs_mount *mp, xfs_ino_t startino,
unsigned int inode_records, bool poll, void *data);
/* Only iterate inodes within the same AG as @startino. */
-#define XFS_IWALK_SAME_AG (0x1)
+#define XFS_IWALK_SAME_AG (1 << 0)
#define XFS_IWALK_FLAGS_ALL (XFS_IWALK_SAME_AG)
--
2.30.2
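After this patch, xfs_bulkstat() builds the walk flags explicitly instead of passing breq->flags straight through. Below is a small sketch of that translation. The flag values are restated locally, the NREXT64 bulk flag is the one patch 18 adds, and the function name is hypothetical.

```c
#include <assert.h>

/* Local copies of the two now-independent flag namespaces. */
#define IWALK_SAME_AG   (1U << 0)
#define IWALK_FLAGS_ALL (IWALK_SAME_AG)

#define IBULK_SAME_AG   (1U << 0)
#define IBULK_NREXT64   (1U << 1) /* added in patch 18; no IWALK equivalent */

/*
 * Translate bulkstat request flags into inode-walk flags, the way
 * xfs_bulkstat() builds iwalk_flags: only flags with a walk-level
 * meaning are forwarded.
 */
static unsigned int ibulk_to_iwalk_flags(unsigned int ibulk_flags)
{
	unsigned int iwalk_flags = 0;

	if (ibulk_flags & IBULK_SAME_AG)
		iwalk_flags |= IWALK_SAME_AG;

	/* The walker never sees bits outside its own namespace. */
	assert((iwalk_flags & ~IWALK_FLAGS_ALL) == 0);
	return iwalk_flags;
}
```

If breq->flags were still passed through unchanged, the NREXT64 bit (value 2) would reach the inode walker, which has no flag with that meaning.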
* [PATCH V9 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
2022-04-06 6:18 [PATCH V9 00/19] xfs: Extend per-inode extent counters Chandan Babu R
` (16 preceding siblings ...)
2022-04-06 6:19 ` [PATCH V9 17/19] xfs: Decouple XFS_IBULK flags from XFS_IWALK flags Chandan Babu R
@ 2022-04-06 6:19 ` Chandan Babu R
2022-04-07 1:29 ` Darrick J. Wong
2022-04-09 13:57 ` [PATCH V9.1] " Chandan Babu R
2022-04-06 6:19 ` [PATCH V9 19/19] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Chandan Babu R
2022-04-09 13:23 ` [PATCH V9.1] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
19 siblings, 2 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-06 6:19 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
The following changes are made to enable userspace to obtain 64-bit extent
counters,
1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
xfs_bulkstat->bs_pad[] to hold 64-bit extent counter.
2. Define the new flag XFS_BULK_IREQ_NREXT64 for userspace to indicate that
it is capable of receiving 64-bit extent counters.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_fs.h | 20 ++++++++++++++++----
fs/xfs/xfs_ioctl.c | 3 +++
fs/xfs/xfs_itable.c | 13 ++++++++++++-
fs/xfs/xfs_itable.h | 2 ++
4 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 1f7238db35cc..2a42bfb85c3b 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -378,7 +378,7 @@ struct xfs_bulkstat {
uint32_t bs_extsize_blks; /* extent size hint, blocks */
uint32_t bs_nlink; /* number of links */
- uint32_t bs_extents; /* number of extents */
+ uint32_t bs_extents; /* 32-bit data fork extent counter */
uint32_t bs_aextents; /* attribute number of extents */
uint16_t bs_version; /* structure version */
uint16_t bs_forkoff; /* inode fork offset in bytes */
@@ -387,8 +387,9 @@ struct xfs_bulkstat {
uint16_t bs_checked; /* checked inode metadata */
uint16_t bs_mode; /* type and mode */
uint16_t bs_pad2; /* zeroed */
+ uint64_t bs_extents64; /* 64-bit data fork extent counter */
- uint64_t bs_pad[7]; /* zeroed */
+ uint64_t bs_pad[6]; /* zeroed */
};
#define XFS_BULKSTAT_VERSION_V1 (1)
@@ -469,8 +470,19 @@ struct xfs_bulk_ireq {
*/
#define XFS_BULK_IREQ_SPECIAL (1 << 1)
-#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
- XFS_BULK_IREQ_SPECIAL)
+/*
+ * Return data fork extent count via xfs_bulkstat->bs_extents64 field and assign
+ * 0 to xfs_bulkstat->bs_extents when the flag is set. Otherwise, use
+ * xfs_bulkstat->bs_extents for returning data fork extent count and set
+ * xfs_bulkstat->bs_extents64 to 0. In the second case, if the data fork extent
+ * count is larger than XFS_MAX_EXTCNT_DATA_FORK_SMALL, the value returned in
+ * xfs_bulkstat->bs_extents is clamped to XFS_MAX_EXTCNT_DATA_FORK_SMALL.
+ */
+#define XFS_BULK_IREQ_NREXT64 (1 << 2)
+
+#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
+ XFS_BULK_IREQ_SPECIAL | \
+ XFS_BULK_IREQ_NREXT64)
/* Operate on the root directory inode. */
#define XFS_BULK_IREQ_SPECIAL_ROOT (1)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 83481005317a..e9eadc7337ce 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -813,6 +813,9 @@ xfs_bulk_ireq_setup(
if (XFS_INO_TO_AGNO(mp, breq->startino) >= mp->m_sb.sb_agcount)
return -ECANCELED;
+ if (hdr->flags & XFS_BULK_IREQ_NREXT64)
+ breq->flags |= XFS_IBULK_NREXT64;
+
return 0;
}
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 71ed4905f206..847f03f75a38 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -64,6 +64,7 @@ xfs_bulkstat_one_int(
struct xfs_inode *ip; /* incore inode pointer */
struct inode *inode;
struct xfs_bulkstat *buf = bc->buf;
+ xfs_extnum_t nextents;
int error = -EINVAL;
if (xfs_internal_inum(mp, ino))
@@ -102,7 +103,17 @@ xfs_bulkstat_one_int(
buf->bs_xflags = xfs_ip2xflags(ip);
buf->bs_extsize_blks = ip->i_extsize;
- buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
+
+ nextents = xfs_ifork_nextents(&ip->i_df);
+ if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
+ if (nextents > XFS_MAX_EXTCNT_DATA_FORK_SMALL)
+ buf->bs_extents = XFS_MAX_EXTCNT_DATA_FORK_SMALL;
+ else
+ buf->bs_extents = nextents;
+ } else {
+ buf->bs_extents64 = nextents;
+ }
+
xfs_bulkstat_health(ip, buf);
buf->bs_aextents = xfs_ifork_nextents(ip->i_afp);
buf->bs_forkoff = XFS_IFORK_BOFF(ip);
diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
index 2cf3872fcd2f..0150fd53d18e 100644
--- a/fs/xfs/xfs_itable.h
+++ b/fs/xfs/xfs_itable.h
@@ -19,6 +19,8 @@ struct xfs_ibulk {
/* Only iterate within the same AG as startino */
#define XFS_IBULK_SAME_AG (1 << 0)
+#define XFS_IBULK_NREXT64 (1 << 1)
+
/*
* Advance the user buffer pointer by one record of the given size. If the
* buffer is now full, return the appropriate error code.
--
2.30.2
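Patch 18 changes what bulkstat reports depending on whether the caller set XFS_BULK_IREQ_NREXT64. The sketch below mirrors the kernel-side logic added to xfs_bulkstat_one_int(), along with the matching choice a userspace consumer would make. struct toy_bulkstat is a hypothetical two-field subset of struct xfs_bulkstat. The record is assumed to start zeroed, so the field that is not written stays 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_EXTCNT_DATA_FORK_SMALL ((1ULL << 31) - 1)

/* Hypothetical subset of struct xfs_bulkstat relevant to extent counts. */
struct toy_bulkstat {
	uint32_t bs_extents;   /* 32-bit data fork extent counter */
	uint64_t bs_extents64; /* 64-bit data fork extent counter */
};

/*
 * Kernel side: with the NREXT64 request flag the full count goes into
 * bs_extents64; without it bs_extents receives the count clamped to the
 * old maximum of 2^31 - 1.
 */
static void fill_extents(struct toy_bulkstat *buf, uint64_t nextents,
			 bool nrext64)
{
	if (!nrext64)
		buf->bs_extents = nextents > MAX_EXTCNT_DATA_FORK_SMALL ?
				  (uint32_t)MAX_EXTCNT_DATA_FORK_SMALL :
				  (uint32_t)nextents;
	else
		buf->bs_extents64 = nextents;
}

/* Userspace side: read the field that matches the flag that was requested. */
static uint64_t data_extents(const struct toy_bulkstat *buf,
			     bool requested_nrext64)
{
	return requested_nrext64 ? buf->bs_extents64 : buf->bs_extents;
}
```

A caller that does not set the flag sees a saturated value rather than a silently truncated one. A result of exactly 2^31 - 1 means the real count may be larger, and the request can be repeated with XFS_BULK_IREQ_NREXT64 set.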
* [PATCH V9 19/19] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags
2022-04-06 6:18 [PATCH V9 00/19] xfs: Extend per-inode extent counters Chandan Babu R
` (17 preceding siblings ...)
2022-04-06 6:19 ` [PATCH V9 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters Chandan Babu R
@ 2022-04-06 6:19 ` Chandan Babu R
2022-04-07 1:23 ` Dave Chinner
2022-04-07 1:26 ` Darrick J. Wong
2022-04-09 13:23 ` [PATCH V9.1] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
19 siblings, 2 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-06 6:19 UTC
To: linux-xfs; +Cc: Chandan Babu R, djwong, david
This commit enables the XFS module to work with filesystem instances having
64-bit per-inode extent counters by adding the XFS_SB_FEAT_INCOMPAT_NREXT64
flag to the list of supported incompat feature flags.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_format.h | 3 ++-
fs/xfs/xfs_super.c | 5 +++++
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index bb327ea43ca1..b3f4a33b986c 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -378,7 +378,8 @@ xfs_sb_has_ro_compat_feature(
XFS_SB_FEAT_INCOMPAT_SPINODES| \
XFS_SB_FEAT_INCOMPAT_META_UUID| \
XFS_SB_FEAT_INCOMPAT_BIGTIME| \
- XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR)
+ XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \
+ XFS_SB_FEAT_INCOMPAT_NREXT64)
#define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
static inline bool
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 54be9d64093e..14591492c384 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1639,6 +1639,11 @@ xfs_fs_fill_super(
goto out_filestream_unmount;
}
+ if (xfs_has_large_extent_counts(mp)) {
+ xfs_warn(mp,
+ "EXPERIMENTAL Large extent counts feature in use. Use at your own risk!");
+ }
+
error = xfs_mountfs(mp);
if (error)
goto out_filestream_unmount;
--
2.30.2
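Listing XFS_SB_FEAT_INCOMPAT_NREXT64 in XFS_SB_FEAT_INCOMPAT_ALL matters for a specific reason. Any incompat bit outside that mask falls into XFS_SB_FEAT_INCOMPAT_UNKNOWN, and the filesystem is refused at mount time. The same check keeps older kernels away from such filesystems (item 3 of the cover letter). A minimal sketch of the mask check follows. The bit values are assumptions for illustration, with NREXT64 taken as the next bit after NEEDSREPAIR, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative incompat bits (assumed values, see lead-in). */
#define INCOMPAT_FTYPE       (1U << 0)
#define INCOMPAT_SPINODES    (1U << 1)
#define INCOMPAT_META_UUID   (1U << 2)
#define INCOMPAT_BIGTIME     (1U << 3)
#define INCOMPAT_NEEDSREPAIR (1U << 4)
#define INCOMPAT_NREXT64     (1U << 5)

/* Supported masks before and after this patch. */
#define INCOMPAT_ALL_OLD (INCOMPAT_FTYPE | INCOMPAT_SPINODES | \
			  INCOMPAT_META_UUID | INCOMPAT_BIGTIME | \
			  INCOMPAT_NEEDSREPAIR)
#define INCOMPAT_ALL_NEW (INCOMPAT_ALL_OLD | INCOMPAT_NREXT64)

/* Mount is allowed only if no incompat bit lies outside the supported mask. */
static bool can_mount(uint32_t sb_incompat, uint32_t supported)
{
	return (sb_incompat & ~supported) == 0;
}
```

A superblock carrying FTYPE | NREXT64 fails the check against the old mask and passes against the new one. A filesystem without the new bit mounts on either kernel.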
* Re: [PATCH V9 14/19] xfs: Introduce per-inode 64-bit extent counters
2022-04-06 6:18 ` [PATCH V9 14/19] xfs: Introduce per-inode 64-bit extent counters Chandan Babu R
@ 2022-04-06 19:03 ` kernel test robot
2022-04-07 1:07 ` Dave Chinner
0 siblings, 1 reply; 62+ messages in thread
From: kernel test robot @ 2022-04-06 19:03 UTC
To: Chandan Babu R, linux-xfs
Cc: kbuild-all, Chandan Babu R, djwong, david, Dave Chinner
Hi Chandan,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on v5.18-rc1 next-20220406]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/intel-lab-lkp/linux/commits/Chandan-Babu-R/xfs-Extend-per-inode-extent-counters/20220406-174647
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: i386-randconfig-s002 (https://download.01.org/0day-ci/archive/20220407/202204070218.QyD2PQPx-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-19) 11.2.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/intel-lab-lkp/linux/commit/28be4fd3f13d4ba2bcedceb8951cd3bfe852cba2
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Chandan-Babu-R/xfs-Extend-per-inode-extent-counters/20220406-174647
git checkout 28be4fd3f13d4ba2bcedceb8951cd3bfe852cba2
# save the config file to linux build tree
mkdir build_dir
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash fs/xfs/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
>> fs/xfs/xfs_inode_item_recover.c:209:31: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __be64 [usertype] di_v3_pad @@ got unsigned long long [usertype] di_v3_pad @@
fs/xfs/xfs_inode_item_recover.c:209:31: sparse: expected restricted __be64 [usertype] di_v3_pad
fs/xfs/xfs_inode_item_recover.c:209:31: sparse: got unsigned long long [usertype] di_v3_pad
vim +209 fs/xfs/xfs_inode_item_recover.c
167
168 STATIC void
169 xfs_log_dinode_to_disk(
170 struct xfs_log_dinode *from,
171 struct xfs_dinode *to,
172 xfs_lsn_t lsn)
173 {
174 to->di_magic = cpu_to_be16(from->di_magic);
175 to->di_mode = cpu_to_be16(from->di_mode);
176 to->di_version = from->di_version;
177 to->di_format = from->di_format;
178 to->di_onlink = 0;
179 to->di_uid = cpu_to_be32(from->di_uid);
180 to->di_gid = cpu_to_be32(from->di_gid);
181 to->di_nlink = cpu_to_be32(from->di_nlink);
182 to->di_projid_lo = cpu_to_be16(from->di_projid_lo);
183 to->di_projid_hi = cpu_to_be16(from->di_projid_hi);
184
185 to->di_atime = xfs_log_dinode_to_disk_ts(from, from->di_atime);
186 to->di_mtime = xfs_log_dinode_to_disk_ts(from, from->di_mtime);
187 to->di_ctime = xfs_log_dinode_to_disk_ts(from, from->di_ctime);
188
189 to->di_size = cpu_to_be64(from->di_size);
190 to->di_nblocks = cpu_to_be64(from->di_nblocks);
191 to->di_extsize = cpu_to_be32(from->di_extsize);
192 to->di_forkoff = from->di_forkoff;
193 to->di_aformat = from->di_aformat;
194 to->di_dmevmask = cpu_to_be32(from->di_dmevmask);
195 to->di_dmstate = cpu_to_be16(from->di_dmstate);
196 to->di_flags = cpu_to_be16(from->di_flags);
197 to->di_gen = cpu_to_be32(from->di_gen);
198
199 if (from->di_version == 3) {
200 to->di_changecount = cpu_to_be64(from->di_changecount);
201 to->di_crtime = xfs_log_dinode_to_disk_ts(from,
202 from->di_crtime);
203 to->di_flags2 = cpu_to_be64(from->di_flags2);
204 to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
205 to->di_ino = cpu_to_be64(from->di_ino);
206 to->di_lsn = cpu_to_be64(lsn);
207 memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
208 uuid_copy(&to->di_uuid, &from->di_uuid);
> 209 to->di_v3_pad = from->di_v3_pad;
210 } else {
211 to->di_flushiter = cpu_to_be16(from->di_flushiter);
212 memcpy(to->di_v2_pad, from->di_v2_pad, sizeof(to->di_v2_pad));
213 }
214
215 xfs_log_dinode_to_disk_iext_counters(from, to);
216 }
217
--
0-DAY CI Kernel Test Service
https://01.org/lkp
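The sparse complaint is a byte-order annotation issue. di_v3_pad is declared __be64 in struct xfs_dinode, but the value copied from the log dinode is host-endian, so assigning it directly mixes the two. A userspace sketch of the distinction follows. put_be64() and get_be64() are hypothetical stand-ins for cpu_to_be64()/be64_to_cpu(). Whether the fix should convert the value or simply store zero in the pad is left to the patch author.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_to_be64(): store v most-significant byte first. */
static void put_be64(unsigned char out[8], uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		out[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Hypothetical stand-in for be64_to_cpu(): read a big-endian field back. */
static uint64_t get_be64(const unsigned char in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}
```

If the pad is always zero, as its name suggests, both choices put the same bytes on disk. In that case the warning flags a type mismatch rather than a runtime corruption.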
* Re: [PATCH V9 08/19] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit
2022-04-06 6:18 ` [PATCH V9 08/19] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit Chandan Babu R
@ 2022-04-07 0:50 ` Dave Chinner
0 siblings, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2022-04-07 0:50 UTC
To: Chandan Babu R; +Cc: linux-xfs, djwong
On Wed, Apr 06, 2022 at 11:48:52AM +0530, Chandan Babu R wrote:
> XFS_SB_FEAT_INCOMPAT_NREXT64 incompat feature bit will be set on filesystems
> which support large per-inode extent counters. This commit defines the new
> incompat feature bit and the corresponding per-fs feature bit (along with
> inline functions to work on it).
>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
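The commit message quoted above describes two bits that work as a pair: the on-disk incompat flag in the superblock, and an in-core per-fs feature bit with inline helpers. Below is a hedged sketch of how such a pair is typically wired up. The bit values and the toy_ names are hypothetical, and the translation function is only loosely modelled on xfs_sb_version_to_features().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, chosen only for illustration. */
#define TOY_SB_FEAT_INCOMPAT_NREXT64 (1U << 5)   /* on-disk incompat bit */
#define TOY_FEAT_NREXT64             (1ULL << 0) /* in-core per-fs feature bit */

struct toy_sb {
	uint32_t sb_features_incompat;
};

struct toy_mount {
	uint64_t m_features;
};

/* Translate on-disk superblock bits into in-core feature bits at mount time. */
static uint64_t toy_sb_to_features(const struct toy_sb *sb)
{
	uint64_t features = 0;

	if (sb->sb_features_incompat & TOY_SB_FEAT_INCOMPAT_NREXT64)
		features |= TOY_FEAT_NREXT64;
	return features;
}

/* Cheap runtime query in the style of xfs_has_large_extent_counts(). */
static inline bool toy_has_large_extent_counts(const struct toy_mount *mp)
{
	return (mp->m_features & TOY_FEAT_NREXT64) != 0;
}
```

Callers such as xfs_iext_count_upgrade() in patch 16 then test a single in-core bit via xfs_has_large_extent_counts() instead of decoding the superblock each time.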
* Re: [PATCH V9 11/19] xfs: Use uint64_t to count maximum blocks that can be used by BMBT
2022-04-06 6:18 ` [PATCH V9 11/19] xfs: Use uint64_t to count maximum blocks that can be used by BMBT Chandan Babu R
@ 2022-04-07 0:52 ` Dave Chinner
0 siblings, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2022-04-07 0:52 UTC
To: Chandan Babu R; +Cc: linux-xfs, djwong
On Wed, Apr 06, 2022 at 11:48:55AM +0530, Chandan Babu R wrote:
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
I still think this should be merged with the earlier type conversion
patch, but I'm not going to hold up merging this on such a small
technicality.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH V9 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
2022-04-06 6:18 ` [PATCH V9 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
@ 2022-04-07 1:05 ` Dave Chinner
2022-04-07 1:58 ` Darrick J. Wong
0 siblings, 1 reply; 62+ messages in thread
From: Dave Chinner @ 2022-04-07 1:05 UTC
To: Chandan Babu R; +Cc: linux-xfs, djwong
On Wed, Apr 06, 2022 at 11:48:56AM +0530, Chandan Babu R wrote:
> This commit defines new macros to represent maximum extent counts allowed by
> filesystems which have support for large per-inode extent counters.
>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 9 ++++-----
> fs/xfs/libxfs/xfs_bmap_btree.c | 3 ++-
> fs/xfs/libxfs/xfs_format.h | 24 ++++++++++++++++++++++--
> fs/xfs/libxfs/xfs_inode_buf.c | 4 +++-
> fs/xfs/libxfs/xfs_inode_fork.c | 3 ++-
> fs/xfs/libxfs/xfs_inode_fork.h | 21 +++++++++++++++++----
> 6 files changed, 50 insertions(+), 14 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index b317226fb4ba..1254d4d4821e 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -61,10 +61,8 @@ xfs_bmap_compute_maxlevels(
> int sz; /* root block size */
>
> /*
> - * The maximum number of extents in a file, hence the maximum number of
> - * leaf entries, is controlled by the size of the on-disk extent count,
> - * either a signed 32-bit number for the data fork, or a signed 16-bit
> - * number for the attr fork.
> + * The maximum number of extents in a fork, hence the maximum number of
> + * leaf entries, is controlled by the size of the on-disk extent count.
> *
> * Note that we can no longer assume that if we are in ATTR1 that the
> * fork offset of all the inodes will be
> @@ -74,7 +72,8 @@ xfs_bmap_compute_maxlevels(
> * ATTR2 we have to assume the worst case scenario of a minimum size
> * available.
> */
> - maxleafents = xfs_iext_max_nextents(whichfork);
> + maxleafents = xfs_iext_max_nextents(xfs_has_large_extent_counts(mp),
> + whichfork);
> if (whichfork == XFS_DATA_FORK)
> sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
> else
Just to confirm, the large extent count feature bit can only be
added when the filesystem is unmounted?
> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> index 453309fc85f2..7aabeccea9ab 100644
> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -611,7 +611,8 @@ xfs_bmbt_maxlevels_ondisk(void)
> minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
>
> /* One extra level for the inode root. */
> - return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
> + return xfs_btree_compute_maxlevels(minrecs,
> + XFS_MAX_EXTCNT_DATA_FORK_LARGE) + 1;
> }
Why is this set to XFS_MAX_EXTCNT_DATA_FORK_LARGE rather than being
conditional on xfs_has_large_extent_counts(mp)? i.e. if the feature bit
is not set, the maximum on-disk levels in the bmbt is determined by
XFS_MAX_EXTCNT_DATA_FORK_SMALL, not XFS_MAX_EXTCNT_DATA_FORK_LARGE.
The "_ondisk" suffix implies that it has something to do with the
on-disk format of the filesystem, but AFAICT what we are calculating
here is a constant used for in-memory structure allocation? There
needs to be something explained/changed here, because this is
confusing...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
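To put numbers on the question above, the sketch below restates the level computation done by xfs_btree_compute_maxlevels(). It ceiling-divides the record count by the minimum leaf occupancy, then by the minimum node fan-out, until one block remains. It assumes a minimum of 15 records per block at both levels, roughly what a 512-byte non-CRC block gives at half occupancy. Treat that value as illustrative rather than the kernel's exact figure.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division for 64-bit counts (inputs here stay far below overflow). */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * limits[0]: minimum records per leaf block; limits[1]: minimum pointers
 * per node block. Returns the worst-case (least full) tree height needed
 * to index @records, not counting any extra level for an inode root.
 */
static unsigned int compute_maxlevels(const unsigned int limits[2],
				      uint64_t records)
{
	uint64_t level_blocks = div_round_up_u64(records, limits[0]);
	unsigned int height = 1;

	while (level_blocks > 1) {
		level_blocks = div_round_up_u64(level_blocks, limits[1]);
		height++;
	}
	return height;
}
```

Under that assumption, 2^31 - 1 records need 8 levels and 2^48 - 1 records need 13. Both figures exclude the extra level for the inode root that xfs_bmbt_maxlevels_ondisk() adds. Choosing SMALL or LARGE therefore changes the worst-case height by five levels.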
* Re: [PATCH V9 14/19] xfs: Introduce per-inode 64-bit extent counters
2022-04-06 19:03 ` kernel test robot
@ 2022-04-07 1:07 ` Dave Chinner
0 siblings, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2022-04-07 1:07 UTC (permalink / raw)
To: kernel test robot
Cc: Chandan Babu R, linux-xfs, kbuild-all, djwong, Dave Chinner
On Thu, Apr 07, 2022 at 03:03:32AM +0800, kernel test robot wrote:
> Hi Chandan,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on xfs-linux/for-next]
> [also build test WARNING on v5.18-rc1 next-20220406]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Chandan-Babu-R/xfs-Extend-per-inode-extent-counters/20220406-174647
> base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
> config: i386-randconfig-s002 (https://download.01.org/0day-ci/archive/20220407/202204070218.QyD2PQPx-lkp@intel.com/config)
> compiler: gcc-11 (Debian 11.2.0-19) 11.2.0
> reproduce:
> # apt-get install sparse
> # sparse version: v0.6.4-dirty
> # https://github.com/intel-lab-lkp/linux/commit/28be4fd3f13d4ba2bcedceb8951cd3bfe852cba2
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review Chandan-Babu-R/xfs-Extend-per-inode-extent-counters/20220406-174647
> git checkout 28be4fd3f13d4ba2bcedceb8951cd3bfe852cba2
> # save the config file to linux build tree
> mkdir build_dir
> make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash fs/xfs/
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
>
> sparse warnings: (new ones prefixed by >>)
> >> fs/xfs/xfs_inode_item_recover.c:209:31: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __be64 [usertype] di_v3_pad @@ got unsigned long long [usertype] di_v3_pad @@
> fs/xfs/xfs_inode_item_recover.c:209:31: sparse: expected restricted __be64 [usertype] di_v3_pad
> fs/xfs/xfs_inode_item_recover.c:209:31: sparse: got unsigned long long [usertype] di_v3_pad
>
> vim +209 fs/xfs/xfs_inode_item_recover.c
>
> 167
> 168 STATIC void
> 169 xfs_log_dinode_to_disk(
> 170 struct xfs_log_dinode *from,
> 171 struct xfs_dinode *to,
> 172 xfs_lsn_t lsn)
> 173 {
> 174 to->di_magic = cpu_to_be16(from->di_magic);
> 175 to->di_mode = cpu_to_be16(from->di_mode);
> 176 to->di_version = from->di_version;
> 177 to->di_format = from->di_format;
> 178 to->di_onlink = 0;
> 179 to->di_uid = cpu_to_be32(from->di_uid);
> 180 to->di_gid = cpu_to_be32(from->di_gid);
> 181 to->di_nlink = cpu_to_be32(from->di_nlink);
> 182 to->di_projid_lo = cpu_to_be16(from->di_projid_lo);
> 183 to->di_projid_hi = cpu_to_be16(from->di_projid_hi);
> 184
> 185 to->di_atime = xfs_log_dinode_to_disk_ts(from, from->di_atime);
> 186 to->di_mtime = xfs_log_dinode_to_disk_ts(from, from->di_mtime);
> 187 to->di_ctime = xfs_log_dinode_to_disk_ts(from, from->di_ctime);
> 188
> 189 to->di_size = cpu_to_be64(from->di_size);
> 190 to->di_nblocks = cpu_to_be64(from->di_nblocks);
> 191 to->di_extsize = cpu_to_be32(from->di_extsize);
> 192 to->di_forkoff = from->di_forkoff;
> 193 to->di_aformat = from->di_aformat;
> 194 to->di_dmevmask = cpu_to_be32(from->di_dmevmask);
> 195 to->di_dmstate = cpu_to_be16(from->di_dmstate);
> 196 to->di_flags = cpu_to_be16(from->di_flags);
> 197 to->di_gen = cpu_to_be32(from->di_gen);
> 198
> 199 if (from->di_version == 3) {
> 200 to->di_changecount = cpu_to_be64(from->di_changecount);
> 201 to->di_crtime = xfs_log_dinode_to_disk_ts(from,
> 202 from->di_crtime);
> 203 to->di_flags2 = cpu_to_be64(from->di_flags2);
> 204 to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
> 205 to->di_ino = cpu_to_be64(from->di_ino);
> 206 to->di_lsn = cpu_to_be64(lsn);
> 207 memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
> 208 uuid_copy(&to->di_uuid, &from->di_uuid);
> > 209 to->di_v3_pad = from->di_v3_pad;
Why not just explicitly write zero to the di_v3_pad field?
> 210 } else {
> 211 to->di_flushiter = cpu_to_be16(from->di_flushiter);
> 212 memcpy(to->di_v2_pad, from->di_v2_pad, sizeof(to->di_v2_pad));
Same here?
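A simplified sketch of Dave's suggestion follows. It uses hypothetical cut-down structures in place of struct xfs_log_dinode and struct xfs_dinode, and plain uint64_t in place of __be64. Zero has the same representation in either byte order, so writing a literal 0 needs no cpu_to_be64(). Sparse also accepts a constant 0 for restricted (__bitwise) types. Writing zero also avoids propagating whatever happened to be in the logged padding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down stand-in for the log inode core. */
struct sketch_log_dinode {
	uint8_t  di_version;
	uint16_t di_flushiter;
	uint8_t  di_v2_pad[6];
	uint64_t di_v3_pad;
};

/* Hypothetical cut-down stand-in for the on-disk inode core. */
struct sketch_dinode {
	uint8_t  di_version;
	uint16_t di_flushiter;	/* __be16 in real XFS */
	uint8_t  di_v2_pad[6];
	uint64_t di_v3_pad;	/* __be64 in real XFS */
};

static void sketch_log_dinode_to_disk(const struct sketch_log_dinode *from,
				      struct sketch_dinode *to)
{
	to->di_version = from->di_version;
	if (from->di_version == 3) {
		/* Padding: write zero explicitly instead of copying it. */
		to->di_v3_pad = 0;
	} else {
		/* Real code byte-swaps this; omitted in the sketch. */
		to->di_flushiter = from->di_flushiter;
		memset(to->di_v2_pad, 0, sizeof(to->di_v2_pad));
	}
}
```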
Cheers,
Dave.
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 15/19] xfs: Directory's data fork extent counter can never overflow
2022-04-06 6:18 ` [PATCH V9 15/19] xfs: Directory's data fork extent counter can never overflow Chandan Babu R
@ 2022-04-07 1:13 ` Dave Chinner
2022-04-07 1:48 ` Darrick J. Wong
2022-04-09 13:47 ` [PATCH V9.1] " Chandan Babu R
2022-04-12 14:02 ` [PATCH V9.2] " Chandan Babu R
2 siblings, 1 reply; 62+ messages in thread
From: Dave Chinner @ 2022-04-07 1:13 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, djwong
On Wed, Apr 06, 2022 at 11:48:59AM +0530, Chandan Babu R wrote:
> The maximum file size that can be represented by the data fork extent counter
> in the worst case occurs when all extents are 1 block in length and each block
> is 1KB in size.
>
> With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
> 1KB sized blocks, a file can reach up to,
> (2^31) * 1KB = 2TB
>
> This is much larger than the theoretical maximum size of a directory
> i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
>
> Since a directory's inode can never overflow its data fork extent counter,
> this commit removes all the overflow checks associated with
> it. xfs_dinode_verify() now performs a rough check to verify if a directory's
> data fork is larger than 96GB.
>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Mostly OK, just a simple cleanup needed.
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index ee8d4eb7d048..54b106ae77e1 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -491,6 +491,15 @@ xfs_dinode_verify(
> if (mode && nextents + naextents > nblocks)
> return __this_address;
>
> + if (S_ISDIR(mode)) {
> + uint64_t max_dfork_nexts;
> +
> + max_dfork_nexts = (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >>
> + mp->m_sb.sb_blocklog;
> + if (nextents > max_dfork_nexts)
> + return __this_address;
> + }
max_dfork_nexts for a directory is a constant that should be
calculated at mount time via xfs_da_mount() and stored in the
mp->m_dir_geo structure. Then this code simply becomes:
if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents)
return __this_address;
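Here is a minimal sketch of Dave's proposal. It assumes the directory constants match the ones the patch relies on: three 32GB directory spaces, i.e. the ~96GB quoted above. sketch_da_geometry and its max_extents field are hypothetical stand-ins for the field Dave suggests adding to mp->m_dir_geo. With 1KB blocks the bound works out to 100,663,296 extents. That is far below the 2^31 - 1 small-counter limit, which is the arithmetic behind the patch's claim that a directory cannot overflow its data fork counter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors of the xfs_da_format.h values this check depends on:
 * each directory "space" (data, leaf, freeindex) spans 32GB.
 */
#define SKETCH_DIR2_DATA_ALIGN_LOG	3
#define SKETCH_DIR2_SPACE_SIZE	(1ULL << (32 + SKETCH_DIR2_DATA_ALIGN_LOG))
#define SKETCH_DIR2_MAX_SPACES	3

/* Hypothetical stand-in for the proposed m_dir_geo->max_extents. */
struct sketch_da_geometry {
	uint64_t max_extents;
};

/* Computed once at mount time (xfs_da_mount() in Dave's proposal). */
static void sketch_da_mount(struct sketch_da_geometry *geo,
			    unsigned int blocklog)
{
	/* XFS block sizes range from 512 bytes to 64KB. */
	assert(blocklog >= 9 && blocklog <= 16);
	geo->max_extents = (SKETCH_DIR2_MAX_SPACES * SKETCH_DIR2_SPACE_SIZE) >>
			   blocklog;
}

/* Verifier-style check: non-zero means the data fork is too large. */
static int sketch_dir_nextents_bad(const struct sketch_da_geometry *geo,
				   uint64_t nextents)
{
	return nextents > geo->max_extents;
}
```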
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters
2022-04-06 6:19 ` [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters Chandan Babu R
@ 2022-04-07 1:22 ` Dave Chinner
2022-04-07 1:46 ` Darrick J. Wong
2022-04-07 8:19 ` Chandan Babu R
2022-04-07 1:46 ` Darrick J. Wong
2022-04-09 13:52 ` [PATCH V9.1] " Chandan Babu R
2 siblings, 2 replies; 62+ messages in thread
From: Dave Chinner @ 2022-04-07 1:22 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, djwong
On Wed, Apr 06, 2022 at 11:49:00AM +0530, Chandan Babu R wrote:
> This commit enables upgrading existing inodes to use large extent counters
> provided that underlying filesystem's superblock has large extent counter
> feature enabled.
>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
> fs/xfs/libxfs/xfs_attr.c | 10 ++++++++++
> fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
> fs/xfs/libxfs/xfs_format.h | 8 ++++++++
> fs/xfs/libxfs/xfs_inode_fork.c | 19 +++++++++++++++++++
> fs/xfs/libxfs/xfs_inode_fork.h | 2 ++
> fs/xfs/xfs_bmap_item.c | 2 ++
> fs/xfs/xfs_bmap_util.c | 13 +++++++++++++
> fs/xfs/xfs_dquot.c | 3 +++
> fs/xfs/xfs_iomap.c | 5 +++++
> fs/xfs/xfs_reflink.c | 5 +++++
> fs/xfs/xfs_rtalloc.c | 3 +++
> 11 files changed, 74 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 23523b802539..66c4fc55c9d7 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -776,8 +776,18 @@ xfs_attr_set(
> if (args->value || xfs_inode_hasattr(dp)) {
> error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
> XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(args->trans, dp,
> + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> if (error)
> goto out_trans_cancel;
> +
> + if (error == -EFBIG) {
> + error = xfs_iext_count_upgrade(args->trans, dp,
> + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> + if (error)
> + goto out_trans_cancel;
> + }
> }
Did you forget to remove the original xfs_iext_count_upgrade() call?
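In the hunk as posted, the second `if (error == -EFBIG)` block is unreachable. Any non-zero error, -EFBIG included, has already jumped to out_trans_cancel. The intended pattern is presumably: check, upgrade once on -EFBIG, then test the error once. Below is a sketch with hypothetical stub functions (not the real XFS helpers) that counts how many times the upgrade runs:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stubs standing in for the real XFS helpers. */
static int sketch_overflow_ret;		/* what the overflow check returns */
static int sketch_upgrade_calls;	/* how many times the upgrade ran */

static int sketch_check_overflow_stub(int nr_to_add)
{
	(void)nr_to_add;
	return sketch_overflow_ret;
}

static int sketch_upgrade_stub(int nr_to_add)
{
	(void)nr_to_add;
	sketch_upgrade_calls++;
	return 0;	/* pretend the inode was upgraded */
}

/* Check once, upgrade once on -EFBIG, then test the error once. */
static int sketch_attr_set_reserve(int nr_to_add)
{
	int error = sketch_check_overflow_stub(nr_to_add);

	if (error == -EFBIG)
		error = sketch_upgrade_stub(nr_to_add);
	if (error)
		return error;	/* "goto out_trans_cancel" in the real code */
	return 0;
}
```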
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 43de892d0305..bb327ea43ca1 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -934,6 +934,14 @@ enum xfs_dinode_fmt {
> #define XFS_MAX_EXTCNT_DATA_FORK_SMALL ((xfs_extnum_t)((1ULL << 31) - 1))
> #define XFS_MAX_EXTCNT_ATTR_FORK_SMALL ((xfs_extnum_t)((1ULL << 15) - 1))
>
> +/*
> + * This macro represents the maximum value by which a filesystem operation can
> + * increase the value of an inode's data/attr fork extent count.
> + */
> +#define XFS_MAX_EXTCNT_UPGRADE_NR \
> + min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
> + XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
You don't need to write "This macro represents" in a comment above
the macro that the comment is describing. If you need to refer
to the actual macro, use its name directly.
As it is, the comment could be improved:
/*
* When we upgrade an inode to the large extent counts, the maximum
* value by which the extent count can increase is bound by the
* change in size of the on-disk field. No upgrade operation should
* ever be adding more than a few tens of extents, so if we get a really
* large value it is a sign of a code bug or corruption.
*/
#define XFS_MAX_EXTCNT_UPGRADE_NR \
min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
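For scale, the bound works out as follows. The _SMALL limits are as quoted above. The _LARGE values (32-bit attr and 48-bit data counters) are an assumption based on the cover letter, since their definitions are not shown here. The attr fork term dominates the min(), giving 2^32 - 2^15 = 4,294,934,528, so an operation adding a few tens of extents is nowhere near it.

```c
#include <assert.h>
#include <stdint.h>

/* _SMALL as quoted above; _LARGE assumed 32-bit attr, 48-bit data. */
#define SKETCH_ATTR_SMALL ((uint64_t)((1ULL << 15) - 1))
#define SKETCH_DATA_SMALL ((uint64_t)((1ULL << 31) - 1))
#define SKETCH_ATTR_LARGE ((uint64_t)((1ULL << 32) - 1))
#define SKETCH_DATA_LARGE ((uint64_t)((1ULL << 48) - 1))

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Same shape as XFS_MAX_EXTCNT_UPGRADE_NR. */
#define SKETCH_MAX_EXTCNT_UPGRADE_NR \
	SKETCH_MIN(SKETCH_ATTR_LARGE - SKETCH_ATTR_SMALL, \
		   SKETCH_DATA_LARGE - SKETCH_DATA_SMALL)

static uint64_t sketch_upgrade_nr(void)
{
	return SKETCH_MAX_EXTCNT_UPGRADE_NR;
}
```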
Otherwise it looks OK.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 19/19] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags
2022-04-06 6:19 ` [PATCH V9 19/19] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Chandan Babu R
@ 2022-04-07 1:23 ` Dave Chinner
2022-04-07 1:26 ` Darrick J. Wong
1 sibling, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2022-04-07 1:23 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, djwong
On Wed, Apr 06, 2022 at 11:49:03AM +0530, Chandan Babu R wrote:
> This commit enables XFS module to work with fs instances having 64-bit
> per-inode extent counters by adding XFS_SB_FEAT_INCOMPAT_NREXT64 flag to the
> list of supported incompat feature flags.
>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
> fs/xfs/libxfs/xfs_format.h | 3 ++-
> fs/xfs/xfs_super.c | 5 +++++
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index bb327ea43ca1..b3f4a33b986c 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -378,7 +378,8 @@ xfs_sb_has_ro_compat_feature(
> XFS_SB_FEAT_INCOMPAT_SPINODES| \
> XFS_SB_FEAT_INCOMPAT_META_UUID| \
> XFS_SB_FEAT_INCOMPAT_BIGTIME| \
> - XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR)
> + XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \
> + XFS_SB_FEAT_INCOMPAT_NREXT64)
>
> #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
> static inline bool
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 54be9d64093e..14591492c384 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1639,6 +1639,11 @@ xfs_fs_fill_super(
> goto out_filestream_unmount;
> }
>
> + if (xfs_has_large_extent_counts(mp)) {
> + xfs_warn(mp,
> + "EXPERIMENTAL Large extent counts feature in use. Use at your own risk!");
> + }
Thanks for adding this. :)
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 19/19] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags
2022-04-06 6:19 ` [PATCH V9 19/19] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Chandan Babu R
2022-04-07 1:23 ` Dave Chinner
@ 2022-04-07 1:26 ` Darrick J. Wong
1 sibling, 0 replies; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-07 1:26 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, david
On Wed, Apr 06, 2022 at 11:49:03AM +0530, Chandan Babu R wrote:
> This commit enables XFS module to work with fs instances having 64-bit
> per-inode extent counters by adding XFS_SB_FEAT_INCOMPAT_NREXT64 flag to the
> list of supported incompat feature flags.
>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
> fs/xfs/libxfs/xfs_format.h | 3 ++-
> fs/xfs/xfs_super.c | 5 +++++
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index bb327ea43ca1..b3f4a33b986c 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -378,7 +378,8 @@ xfs_sb_has_ro_compat_feature(
> XFS_SB_FEAT_INCOMPAT_SPINODES| \
> XFS_SB_FEAT_INCOMPAT_META_UUID| \
> XFS_SB_FEAT_INCOMPAT_BIGTIME| \
> - XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR)
> + XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \
> + XFS_SB_FEAT_INCOMPAT_NREXT64)
>
> #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
> static inline bool
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 54be9d64093e..14591492c384 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1639,6 +1639,11 @@ xfs_fs_fill_super(
> goto out_filestream_unmount;
> }
>
> + if (xfs_has_large_extent_counts(mp)) {
> + xfs_warn(mp,
> + "EXPERIMENTAL Large extent counts feature in use. Use at your own risk!");
> + }
Style nit: no need for braces here.
But thanks for putting on the EXPERIMENTAL tag.
--D
> +
> error = xfs_mountfs(mp);
> if (error)
> goto out_filestream_unmount;
> --
> 2.30.2
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
2022-04-06 6:19 ` [PATCH V9 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters Chandan Babu R
@ 2022-04-07 1:29 ` Darrick J. Wong
2022-04-07 1:42 ` Dave Chinner
2022-04-07 8:20 ` Chandan Babu R
2022-04-09 13:57 ` [PATCH V9.1] " Chandan Babu R
1 sibling, 2 replies; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-07 1:29 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, david, Dave Chinner
On Wed, Apr 06, 2022 at 11:49:02AM +0530, Chandan Babu R wrote:
> The following changes are made to enable userspace to obtain 64-bit extent
> counters,
> 1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
> xfs_bulkstat->bs_pad[] to hold 64-bit extent counter.
> 2. Define the new flag XFS_BULK_IREQ_BULKSTAT for userspace to indicate that
> it is capable of receiving 64-bit extent counters.
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Suggested-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
> fs/xfs/libxfs/xfs_fs.h | 20 ++++++++++++++++----
> fs/xfs/xfs_ioctl.c | 3 +++
> fs/xfs/xfs_itable.c | 13 ++++++++++++-
> fs/xfs/xfs_itable.h | 2 ++
> 4 files changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 1f7238db35cc..2a42bfb85c3b 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -378,7 +378,7 @@ struct xfs_bulkstat {
> uint32_t bs_extsize_blks; /* extent size hint, blocks */
>
> uint32_t bs_nlink; /* number of links */
> - uint32_t bs_extents; /* number of extents */
> + uint32_t bs_extents; /* 32-bit data fork extent counter */
> uint32_t bs_aextents; /* attribute number of extents */
> uint16_t bs_version; /* structure version */
> uint16_t bs_forkoff; /* inode fork offset in bytes */
> @@ -387,8 +387,9 @@ struct xfs_bulkstat {
> uint16_t bs_checked; /* checked inode metadata */
> uint16_t bs_mode; /* type and mode */
> uint16_t bs_pad2; /* zeroed */
> + uint64_t bs_extents64; /* 64-bit data fork extent counter */
>
> - uint64_t bs_pad[7]; /* zeroed */
> + uint64_t bs_pad[6]; /* zeroed */
> };
>
> #define XFS_BULKSTAT_VERSION_V1 (1)
> @@ -469,8 +470,19 @@ struct xfs_bulk_ireq {
> */
> #define XFS_BULK_IREQ_SPECIAL (1 << 1)
>
> -#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
> - XFS_BULK_IREQ_SPECIAL)
> +/*
> + * Return data fork extent count via xfs_bulkstat->bs_extents64 field and assign
> + * 0 to xfs_bulkstat->bs_extents when the flag is set. Otherwise, use
> + * xfs_bulkstat->bs_extents for returning data fork extent count and set
> + * xfs_bulkstat->bs_extents64 to 0. In the second case, return -EOVERFLOW and
> + * assign 0 to xfs_bulkstat->bs_extents if data fork extent count is larger than
> + * XFS_MAX_EXTCNT_DATA_FORK_OLD.
> + */
> +#define XFS_BULK_IREQ_NREXT64 (1 << 2)
> +
> +#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
> + XFS_BULK_IREQ_SPECIAL | \
> + XFS_BULK_IREQ_NREXT64)
>
> /* Operate on the root directory inode. */
> #define XFS_BULK_IREQ_SPECIAL_ROOT (1)
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 83481005317a..e9eadc7337ce 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -813,6 +813,9 @@ xfs_bulk_ireq_setup(
> if (XFS_INO_TO_AGNO(mp, breq->startino) >= mp->m_sb.sb_agcount)
> return -ECANCELED;
>
> + if (hdr->flags & XFS_BULK_IREQ_NREXT64)
> + breq->flags |= XFS_IBULK_NREXT64;
> +
> return 0;
> }
>
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index 71ed4905f206..847f03f75a38 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -64,6 +64,7 @@ xfs_bulkstat_one_int(
> struct xfs_inode *ip; /* incore inode pointer */
> struct inode *inode;
> struct xfs_bulkstat *buf = bc->buf;
> + xfs_extnum_t nextents;
> int error = -EINVAL;
>
> if (xfs_internal_inum(mp, ino))
> @@ -102,7 +103,17 @@ xfs_bulkstat_one_int(
>
> buf->bs_xflags = xfs_ip2xflags(ip);
> buf->bs_extsize_blks = ip->i_extsize;
> - buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
> +
> + nextents = xfs_ifork_nextents(&ip->i_df);
> + if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
> + if (nextents > XFS_MAX_EXTCNT_DATA_FORK_SMALL)
> + buf->bs_extents = XFS_MAX_EXTCNT_DATA_FORK_SMALL;
> + else
> + buf->bs_extents = nextents;
buf->bs_extents = min(nextents, XFS_MAX_EXTCNT_DATA_FORK_SMALL); ?
> + } else {
> + buf->bs_extents64 = nextents;
> + }
> +
> xfs_bulkstat_health(ip, buf);
> buf->bs_aextents = xfs_ifork_nextents(ip->i_afp);
> buf->bs_forkoff = XFS_IFORK_BOFF(ip);
> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> index 2cf3872fcd2f..0150fd53d18e 100644
> --- a/fs/xfs/xfs_itable.h
> +++ b/fs/xfs/xfs_itable.h
> @@ -19,6 +19,8 @@ struct xfs_ibulk {
> /* Only iterate within the same AG as startino */
> #define XFS_IBULK_SAME_AG (1 << 0)
>
> +#define XFS_IBULK_NREXT64 (1 << 1)
Needs a comment here.
/* Fill out the bs_extents64 field if set. */
#define XFS_IBULK_NREXT64 (1U << 1)
(Are we supposed to do "1U" now?)
--D
> +
> /*
> * Advance the user buffer pointer by one record of the given size. If the
> * buffer is now full, return the appropriate error code.
> --
> 2.30.2
>
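Here is a sketch of the data fork extent fill with Darrick's min() suggestion applied, using a hypothetical two-field subset of struct xfs_bulkstat. It assumes the caller has zeroed the record, so the field that is not filled stays 0. It follows the posted code, which clamps bs_extents, rather than the proposed XFS_BULK_IREQ_NREXT64 comment, which describes returning -EOVERFLOW.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_EXTCNT_DATA_FORK_SMALL ((uint64_t)((1ULL << 31) - 1))
/* Fill out the bs_extents64 field if set. */
#define SKETCH_IBULK_NREXT64 (1U << 1)

/* Hypothetical subset of struct xfs_bulkstat. */
struct sketch_bulkstat {
	uint32_t bs_extents;
	uint64_t bs_extents64;
};

static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Assumes @buf was zeroed by the caller. */
static void sketch_fill_extents(struct sketch_bulkstat *buf,
				unsigned int breq_flags, uint64_t nextents)
{
	if (!(breq_flags & SKETCH_IBULK_NREXT64)) {
		/* Old callers: clamp to the small (31-bit) limit. */
		buf->bs_extents = (uint32_t)sketch_min_u64(nextents,
				SKETCH_MAX_EXTCNT_DATA_FORK_SMALL);
	} else {
		buf->bs_extents64 = nextents;
	}
}
```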
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 17/19] xfs: Decouple XFS_IBULK flags from XFS_IWALK flags
2022-04-06 6:19 ` [PATCH V9 17/19] xfs: Decouple XFS_IBULK flags from XFS_IWALK flags Chandan Babu R
@ 2022-04-07 1:29 ` Darrick J. Wong
0 siblings, 0 replies; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-07 1:29 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, david, Dave Chinner
On Wed, Apr 06, 2022 at 11:49:01AM +0530, Chandan Babu R wrote:
> A future commit will add a new XFS_IBULK flag which will not have a
> corresponding XFS_IWALK flag. In preparation for the change, this commit
> separates XFS_IBULK_* flags from XFS_IWALK_* flags.
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Looks good to me now,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_itable.c | 6 +++++-
> fs/xfs/xfs_itable.h | 2 +-
> fs/xfs/xfs_iwalk.h | 2 +-
> 3 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index c08c79d9e311..71ed4905f206 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -256,6 +256,7 @@ xfs_bulkstat(
> .breq = breq,
> };
> struct xfs_trans *tp;
> + unsigned int iwalk_flags = 0;
> int error;
>
> if (breq->mnt_userns != &init_user_ns) {
> @@ -279,7 +280,10 @@ xfs_bulkstat(
> if (error)
> goto out;
>
> - error = xfs_iwalk(breq->mp, tp, breq->startino, breq->flags,
> + if (breq->flags & XFS_IBULK_SAME_AG)
> + iwalk_flags |= XFS_IWALK_SAME_AG;
> +
> + error = xfs_iwalk(breq->mp, tp, breq->startino, iwalk_flags,
> xfs_bulkstat_iwalk, breq->icount, &bc);
> xfs_trans_cancel(tp);
> out:
> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> index 7078d10c9b12..2cf3872fcd2f 100644
> --- a/fs/xfs/xfs_itable.h
> +++ b/fs/xfs/xfs_itable.h
> @@ -17,7 +17,7 @@ struct xfs_ibulk {
> };
>
> /* Only iterate within the same AG as startino */
> -#define XFS_IBULK_SAME_AG (XFS_IWALK_SAME_AG)
> +#define XFS_IBULK_SAME_AG (1 << 0)
>
> /*
> * Advance the user buffer pointer by one record of the given size. If the
> diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
> index 37a795f03267..3a68766fd909 100644
> --- a/fs/xfs/xfs_iwalk.h
> +++ b/fs/xfs/xfs_iwalk.h
> @@ -26,7 +26,7 @@ int xfs_iwalk_threaded(struct xfs_mount *mp, xfs_ino_t startino,
> unsigned int inode_records, bool poll, void *data);
>
> /* Only iterate inodes within the same AG as @startino. */
> -#define XFS_IWALK_SAME_AG (0x1)
> +#define XFS_IWALK_SAME_AG (1 << 0)
>
> #define XFS_IWALK_FLAGS_ALL (XFS_IWALK_SAME_AG)
>
> --
> 2.30.2
>
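Here is a sketch of the translation this patch introduces, using hypothetical SKETCH_* flag names written with 1U as discussed in this thread. Only flags that have an XFS_IWALK equivalent are carried across. A bulkstat-only flag such as the upcoming NREXT64 bit therefore never reaches xfs_iwalk(), whose XFS_IWALK_FLAGS_ALL contains only XFS_IWALK_SAME_AG.

```c
#include <assert.h>

/* Hypothetical mirrors of the decoupled flags. */
#define SKETCH_IWALK_SAME_AG	(1U << 0)
#define SKETCH_IBULK_SAME_AG	(1U << 0)
#define SKETCH_IBULK_NREXT64	(1U << 1)	/* no XFS_IWALK equivalent */

/* Translate only the bulkstat flags that the inode walker understands. */
static unsigned int sketch_ibulk_to_iwalk_flags(unsigned int ibulk_flags)
{
	unsigned int iwalk_flags = 0;

	if (ibulk_flags & SKETCH_IBULK_SAME_AG)
		iwalk_flags |= SKETCH_IWALK_SAME_AG;
	return iwalk_flags;
}
```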
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
2022-04-07 1:29 ` Darrick J. Wong
@ 2022-04-07 1:42 ` Dave Chinner
2022-04-07 8:20 ` Chandan Babu R
1 sibling, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2022-04-07 1:42 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Chandan Babu R, linux-xfs, Dave Chinner
On Wed, Apr 06, 2022 at 06:29:12PM -0700, Darrick J. Wong wrote:
> > diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> > index 2cf3872fcd2f..0150fd53d18e 100644
> > --- a/fs/xfs/xfs_itable.h
> > +++ b/fs/xfs/xfs_itable.h
> > @@ -19,6 +19,8 @@ struct xfs_ibulk {
> > /* Only iterate within the same AG as startino */
> > #define XFS_IBULK_SAME_AG (1 << 0)
> >
> > +#define XFS_IBULK_NREXT64 (1 << 1)
>
> Needs a comment here.
>
> /* Fill out the bs_extents64 field if set. */
> #define XFS_IBULK_NREXT64 (1U << 1)
>
> (Are we supposed to do "1U" now?)
Apparently so. I'm not concerned by this specific patchset right now
because we've got so many unsigned bit fields that need bulk
updates.
I'm slowly working through the ones that are used in __print_flags
macros right now (that'll be 16-17 patches by itself), and once
those are done we can worry about the rest as ongoing individual
cleanups...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters
2022-04-06 6:19 ` [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters Chandan Babu R
2022-04-07 1:22 ` Dave Chinner
@ 2022-04-07 1:46 ` Darrick J. Wong
2022-04-07 2:00 ` Darrick J. Wong
2022-04-07 8:19 ` Chandan Babu R
2022-04-09 13:52 ` [PATCH V9.1] " Chandan Babu R
2 siblings, 2 replies; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-07 1:46 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, david
On Wed, Apr 06, 2022 at 11:49:00AM +0530, Chandan Babu R wrote:
> This commit enables upgrading existing inodes to use large extent counters
> provided that underlying filesystem's superblock has large extent counter
> feature enabled.
>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
> fs/xfs/libxfs/xfs_attr.c | 10 ++++++++++
> fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
> fs/xfs/libxfs/xfs_format.h | 8 ++++++++
> fs/xfs/libxfs/xfs_inode_fork.c | 19 +++++++++++++++++++
> fs/xfs/libxfs/xfs_inode_fork.h | 2 ++
> fs/xfs/xfs_bmap_item.c | 2 ++
> fs/xfs/xfs_bmap_util.c | 13 +++++++++++++
> fs/xfs/xfs_dquot.c | 3 +++
> fs/xfs/xfs_iomap.c | 5 +++++
> fs/xfs/xfs_reflink.c | 5 +++++
> fs/xfs/xfs_rtalloc.c | 3 +++
> 11 files changed, 74 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 23523b802539..66c4fc55c9d7 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -776,8 +776,18 @@ xfs_attr_set(
> if (args->value || xfs_inode_hasattr(dp)) {
> error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
> XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(args->trans, dp,
> + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> if (error)
> goto out_trans_cancel;
> +
> + if (error == -EFBIG) {
> + error = xfs_iext_count_upgrade(args->trans, dp,
> + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> + if (error)
> + goto out_trans_cancel;
> + }
> }
>
> error = xfs_attr_lookup(args);
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 4fab0c92ab70..82d5467ddf2c 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4524,14 +4524,16 @@ xfs_bmapi_convert_delalloc(
> return error;
>
> xfs_ilock(ip, XFS_ILOCK_EXCL);
> + xfs_trans_ijoin(tp, ip, 0);
>
> error = xfs_iext_count_may_overflow(ip, whichfork,
> XFS_IEXT_ADD_NOSPLIT_CNT);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip,
> + XFS_IEXT_ADD_NOSPLIT_CNT);
> if (error)
> goto out_trans_cancel;
>
> - xfs_trans_ijoin(tp, ip, 0);
> -
> if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) ||
> bma.got.br_startoff > offset_fsb) {
> /*
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 43de892d0305..bb327ea43ca1 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -934,6 +934,14 @@ enum xfs_dinode_fmt {
> #define XFS_MAX_EXTCNT_DATA_FORK_SMALL ((xfs_extnum_t)((1ULL << 31) - 1))
> #define XFS_MAX_EXTCNT_ATTR_FORK_SMALL ((xfs_extnum_t)((1ULL << 15) - 1))
>
> +/*
> + * This macro represents the maximum value by which a filesystem operation can
> + * increase the value of an inode's data/attr fork extent count.
> + */
> +#define XFS_MAX_EXTCNT_UPGRADE_NR \
> + min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
> + XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
> +
> /*
> * Inode minimum and maximum sizes.
> */
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index bb5d841aac58..1245e9f1ca81 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -756,3 +756,22 @@ xfs_iext_count_may_overflow(
>
> return 0;
> }
> +
> +int
> +xfs_iext_count_upgrade(
Hmm. I think the @nr_to_add parameter is supposed to be the one
that caused xfs_iext_count_may_overflow to return -EFBIG, right?
I was about to comment that it would be really helpful to have a comment
above this function dropping a hint that this is the case:
/*
* Upgrade this inode's extent counter fields to be able to handle a
* potential increase in the extent count by this number. Normally
* this is the same quantity that caused xfs_iext_count_may_overflow to
* return -EFBIG.
*/
int
xfs_iext_count_upgrade(...
...though I worry that this will cause fatal warnings about the
otherwise unused parameter on non-debug kernels? I'm not sure why it
matters that nr_to_add is constrained to a small value? Is it just to
prevent obviously huge values? AFAICT all the current callers pass in
small #defined integer values.
That said, if the assert here is something Dave asked for in a previous
review, then I won't stand in the way:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> + struct xfs_trans *tp,
> + struct xfs_inode *ip,
> + uint nr_to_add)
> +{
> + ASSERT(nr_to_add <= XFS_MAX_EXTCNT_UPGRADE_NR);
> +
> + if (!xfs_has_large_extent_counts(ip->i_mount) ||
> + (ip->i_diflags2 & XFS_DIFLAG2_NREXT64) ||
> + XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
> + return -EFBIG;
> +
> + ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
> + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> +
> + return 0;
> +}
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> index 6f9d69f8896e..4f68c1f20beb 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.h
> +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> @@ -275,6 +275,8 @@ int xfs_ifork_verify_local_data(struct xfs_inode *ip);
> int xfs_ifork_verify_local_attr(struct xfs_inode *ip);
> int xfs_iext_count_may_overflow(struct xfs_inode *ip, int whichfork,
> int nr_to_add);
> +int xfs_iext_count_upgrade(struct xfs_trans *tp, struct xfs_inode *ip,
> + uint nr_to_add);
>
> /* returns true if the fork has extents but they are not read in yet. */
> static inline bool xfs_need_iread_extents(struct xfs_ifork *ifp)
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 761dde155099..593ac29cffc7 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -506,6 +506,8 @@ xfs_bui_item_recover(
> iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
>
> error = xfs_iext_count_may_overflow(ip, whichfork, iext_delta);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip, iext_delta);
> if (error)
> goto err_cancel;
>
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 18c1b99311a8..52be58372c63 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -859,6 +859,9 @@ xfs_alloc_file_space(
>
> error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> XFS_IEXT_ADD_NOSPLIT_CNT);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip,
> + XFS_IEXT_ADD_NOSPLIT_CNT);
> if (error)
> goto error;
>
> @@ -914,6 +917,8 @@ xfs_unmap_extent(
>
> error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> XFS_IEXT_PUNCH_HOLE_CNT);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
> if (error)
> goto out_trans_cancel;
>
> @@ -1195,6 +1200,8 @@ xfs_insert_file_space(
>
> error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> XFS_IEXT_PUNCH_HOLE_CNT);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
> if (error)
> goto out_trans_cancel;
>
> @@ -1423,6 +1430,9 @@ xfs_swap_extent_rmap(
> error = xfs_iext_count_may_overflow(ip,
> XFS_DATA_FORK,
> XFS_IEXT_SWAP_RMAP_CNT);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip,
> + XFS_IEXT_SWAP_RMAP_CNT);
> if (error)
> goto out;
> }
> @@ -1431,6 +1441,9 @@ xfs_swap_extent_rmap(
> error = xfs_iext_count_may_overflow(tip,
> XFS_DATA_FORK,
> XFS_IEXT_SWAP_RMAP_CNT);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip,
> + XFS_IEXT_SWAP_RMAP_CNT);
> if (error)
> goto out;
> }
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index 5afedcbc78c7..eb211e0ede5d 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -322,6 +322,9 @@ xfs_dquot_disk_alloc(
>
> error = xfs_iext_count_may_overflow(quotip, XFS_DATA_FORK,
> XFS_IEXT_ADD_NOSPLIT_CNT);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, quotip,
> + XFS_IEXT_ADD_NOSPLIT_CNT);
> if (error)
> goto err_cancel;
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 87e1cf5060bd..5a393259a3a3 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -251,6 +251,8 @@ xfs_iomap_write_direct(
> return error;
>
> error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, nr_exts);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip, nr_exts);
> if (error)
> goto out_trans_cancel;
>
> @@ -555,6 +557,9 @@ xfs_iomap_write_unwritten(
>
> error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> XFS_IEXT_WRITE_UNWRITTEN_CNT);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip,
> + XFS_IEXT_WRITE_UNWRITTEN_CNT);
> if (error)
> goto error_on_bmapi_transaction;
>
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 54e68e5693fd..1ae6d3434ad2 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -620,6 +620,9 @@ xfs_reflink_end_cow_extent(
>
> error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> XFS_IEXT_REFLINK_END_COW_CNT);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip,
> + XFS_IEXT_REFLINK_END_COW_CNT);
> if (error)
> goto out_cancel;
>
> @@ -1121,6 +1124,8 @@ xfs_reflink_remap_extent(
> ++iext_delta;
>
> error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, iext_delta);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip, iext_delta);
> if (error)
> goto out_cancel;
>
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index b8c79ee791af..3e587e85d5bf 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -806,6 +806,9 @@ xfs_growfs_rt_alloc(
>
> error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> XFS_IEXT_ADD_NOSPLIT_CNT);
> + if (error == -EFBIG)
> + error = xfs_iext_count_upgrade(tp, ip,
> + XFS_IEXT_ADD_NOSPLIT_CNT);
> if (error)
> goto out_trans_cancel;
>
> --
> 2.30.2
>
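The caller-side pattern this patch repeats can be sketched in userspace C: check with xfs_iext_count_may_overflow(), and only on -EFBIG try xfs_iext_count_upgrade(). Everything prefixed sketch_ is hypothetical. The 2^48 - 1 large data fork limit is assumed from the cover letter's 48-bit counter. The joined/logged flags only model why xfs_bmapi_convert_delalloc() presumably now calls xfs_trans_ijoin() before the check: the upgrade logs the inode core, so the inode is expected to be in the transaction already. The sketch also checks and upgrades the same inode. In the quoted xfs_swap_extent_rmap() hunk, by contrast, the second check is on tip while the upgrade is passed ip.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; names are illustrative only. */
#define SKETCH_DIFLAG2_NREXT64	(1u << 0)

struct sketch_inode {
	uint64_t	nextents;	/* current data fork extent count */
	unsigned int	diflags2;	/* per-inode flags */
	bool		joined;		/* joined to the transaction? */
	bool		logged;		/* inode core logged in the transaction? */
};

struct sketch_mount {
	bool		has_large_extent_counts;	/* superblock feature bit */
};

/* 2^31 - 1 for the small counter (quoted); 2^48 - 1 for the large one (assumed). */
static uint64_t sketch_max_nextents(const struct sketch_inode *ip)
{
	return (ip->diflags2 & SKETCH_DIFLAG2_NREXT64) ?
		(1ULL << 48) - 1 : (1ULL << 31) - 1;
}

static int sketch_count_may_overflow(const struct sketch_inode *ip,
				     unsigned int nr_to_add)
{
	if (ip->nextents + nr_to_add > sketch_max_nextents(ip))
		return -EFBIG;
	return 0;
}

static int sketch_count_upgrade(const struct sketch_mount *mp,
				struct sketch_inode *ip)
{
	if (!mp->has_large_extent_counts ||
	    (ip->diflags2 & SKETCH_DIFLAG2_NREXT64))
		return -EFBIG;
	/* Logging the inode core needs the inode joined to the transaction. */
	assert(ip->joined);
	ip->diflags2 |= SKETCH_DIFLAG2_NREXT64;
	ip->logged = true;
	return 0;
}

/* Join first, then check, then upgrade the *same* inode on -EFBIG. */
static int sketch_prepare_add(const struct sketch_mount *mp,
			      struct sketch_inode *ip, unsigned int nr_to_add)
{
	int error;

	ip->joined = true;
	error = sketch_count_may_overflow(ip, nr_to_add);
	if (error == -EFBIG)
		error = sketch_count_upgrade(mp, ip);
	return error;
}
```

An -EFBIG that survives the upgrade attempt is still returned to the caller, which then cancels the transaction as before. This happens when the feature bit is clear or the inode has already been upgraded.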
* Re: [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters
2022-04-07 1:22 ` Dave Chinner
@ 2022-04-07 1:46 ` Darrick J. Wong
2022-04-07 8:19 ` Chandan Babu R
1 sibling, 0 replies; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-07 1:46 UTC (permalink / raw)
To: Dave Chinner; +Cc: Chandan Babu R, linux-xfs
On Thu, Apr 07, 2022 at 11:22:25AM +1000, Dave Chinner wrote:
> On Wed, Apr 06, 2022 at 11:49:00AM +0530, Chandan Babu R wrote:
> > This commit enables upgrading existing inodes to use large extent counters
> > provided that the underlying filesystem's superblock has the large extent
> > counter feature enabled.
> >
> > Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> > ---
> > fs/xfs/libxfs/xfs_attr.c | 10 ++++++++++
> > fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
> > fs/xfs/libxfs/xfs_format.h | 8 ++++++++
> > fs/xfs/libxfs/xfs_inode_fork.c | 19 +++++++++++++++++++
> > fs/xfs/libxfs/xfs_inode_fork.h | 2 ++
> > fs/xfs/xfs_bmap_item.c | 2 ++
> > fs/xfs/xfs_bmap_util.c | 13 +++++++++++++
> > fs/xfs/xfs_dquot.c | 3 +++
> > fs/xfs/xfs_iomap.c | 5 +++++
> > fs/xfs/xfs_reflink.c | 5 +++++
> > fs/xfs/xfs_rtalloc.c | 3 +++
> > 11 files changed, 74 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 23523b802539..66c4fc55c9d7 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -776,8 +776,18 @@ xfs_attr_set(
> > if (args->value || xfs_inode_hasattr(dp)) {
> > error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
> > XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(args->trans, dp,
> > + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> > if (error)
> > goto out_trans_cancel;
> > +
> > + if (error == -EFBIG) {
> > + error = xfs_iext_count_upgrade(args->trans, dp,
> > + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> > + if (error)
> > + goto out_trans_cancel;
> > + }
> > }
>
> Did you forget to remove the original xfs_iext_count_upgrade() call?
>
> > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > index 43de892d0305..bb327ea43ca1 100644
> > --- a/fs/xfs/libxfs/xfs_format.h
> > +++ b/fs/xfs/libxfs/xfs_format.h
> > @@ -934,6 +934,14 @@ enum xfs_dinode_fmt {
> > #define XFS_MAX_EXTCNT_DATA_FORK_SMALL ((xfs_extnum_t)((1ULL << 31) - 1))
> > #define XFS_MAX_EXTCNT_ATTR_FORK_SMALL ((xfs_extnum_t)((1ULL << 15) - 1))
> >
> > +/*
> > + * This macro represents the maximum value by which a filesystem operation can
> > + * increase the value of an inode's data/attr fork extent count.
> > + */
> > +#define XFS_MAX_EXTCNT_UPGRADE_NR \
> > + min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
> > + XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
>
> You don't need to write "This macro represents" in a comment above
> the macro that the comment is describing. If you need to refer
> to the actual macro, use its name directly.
>
> As it is, the comment could be improved:
>
> /*
> * When we upgrade an inode to the large extent counts, the maximum
> * value by which the extent count can increase is bound by the
> * change in size of the on-disk field. No upgrade operation should
> * ever be adding more than a few tens of, so if we get a really
Nit: missing object?
"...more than a few tens of extents, so if we get..."
--D
> * large value it is a sign of a code bug or corruption.
> */
> #define XFS_MAX_EXTCNT_UPGRADE_NR \
> min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
> XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
>
> Otherwise it looks OK.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
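Dave's suggested comment says the upgrade headroom is bounded by the change in size of the on-disk field. The sketch below works that through in C. It assumes the large limits are 2^48 - 1 (data) and 2^32 - 1 (attr), per the cover letter's 48-bit and 32-bit counters. Under that assumption the attr fork's increase is the smaller one, so XFS_MAX_EXTCNT_UPGRADE_NR would come out at 2^32 - 2^15 = 4294934528. The sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SMALL limits as quoted from xfs_format.h. */
#define SKETCH_DATA_SMALL	((uint64_t)((1ULL << 31) - 1))
#define SKETCH_ATTR_SMALL	((uint64_t)((1ULL << 15) - 1))
/* LARGE limits assumed: 48-bit data fork and 32-bit attr fork counters. */
#define SKETCH_DATA_LARGE	((uint64_t)((1ULL << 48) - 1))
#define SKETCH_ATTR_LARGE	((uint64_t)((1ULL << 32) - 1))

static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Headroom gained by an upgrade: the smaller of the two per-fork increases. */
static uint64_t sketch_upgrade_nr(void)
{
	return sketch_min_u64(SKETCH_ATTR_LARGE - SKETCH_ATTR_SMALL,
			      SKETCH_DATA_LARGE - SKETCH_DATA_SMALL);
}

/* In the spirit of the ASSERT: a delta above the headroom is bogus. */
static bool sketch_nr_to_add_is_sane(uint64_t nr_to_add)
{
	return nr_to_add <= sketch_upgrade_nr();
}
```

Real callers pass a few tens of extents at most. That is many orders of magnitude below this bound, so a larger value would point at a code bug or corruption rather than at a legitimate operation.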
* Re: [PATCH V9 15/19] xfs: Directory's data fork extent counter can never overflow
2022-04-07 1:13 ` Dave Chinner
@ 2022-04-07 1:48 ` Darrick J. Wong
2022-04-07 8:19 ` Chandan Babu R
0 siblings, 1 reply; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-07 1:48 UTC (permalink / raw)
To: Dave Chinner; +Cc: Chandan Babu R, linux-xfs
On Thu, Apr 07, 2022 at 11:13:11AM +1000, Dave Chinner wrote:
> On Wed, Apr 06, 2022 at 11:48:59AM +0530, Chandan Babu R wrote:
> > The maximum file size that can be represented by the data fork extent counter
> > in the worst case occurs when all extents are 1 block in length and each block
> > is 1KB in size.
> >
> > With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
> > 1KB sized blocks, a file can reach up to,
> > (2^31) * 1KB = 2TB
> >
> > This is much larger than the theoretical maximum size of a directory
> > i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
> >
> > Since a directory's inode can never overflow its data fork extent counter,
> > this commit removes all the overflow checks associated with
> > it. xfs_dinode_verify() now performs a rough check to verify if a directory's
> > data fork is larger than 96GB.
> >
> > Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>
> Mostly OK, just a simple cleanup needed.
>
> > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> > index ee8d4eb7d048..54b106ae77e1 100644
> > --- a/fs/xfs/libxfs/xfs_inode_buf.c
> > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> > @@ -491,6 +491,15 @@ xfs_dinode_verify(
> > if (mode && nextents + naextents > nblocks)
> > return __this_address;
> >
> > + if (S_ISDIR(mode)) {
> > + uint64_t max_dfork_nexts;
> > +
> > + max_dfork_nexts = (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >>
> > + mp->m_sb.sb_blocklog;
> > + if (nextents > max_dfork_nexts)
> > + return __this_address;
> > + }
>
> max_dfork_nexts for a directory is a constant that should be
> calculated at mount time via xfs_da_mount() and stored in the
> mp->m_dir_geo structure. Then this code simply becomes:
>
> if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents)
> return __this_address;
I have the same comment as Dave, FWIW. :)
--D
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
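The sketch below covers two things: the arithmetic in the commit message, and Dave's suggestion of computing the directory limit once at mount time. The constants assume XFS_DIR2_SPACE_SIZE is 32GB (2^35 bytes) and XFS_DIR2_MAX_SPACES is 3, which matches the ~96GB figure quoted. sketch_dir_geo stands in for mp->m_dir_geo, and its max_extents field is the hypothetical new member. With 1KB blocks (sb_blocklog = 10) the limit is 3 * 2^25 = 100663296 extents. That is far below the 2^31 - 1 small data fork limit, which corresponds to 2^31 blocks * 1KB = 2TB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 32GB per directory space, three spaces in total. */
#define SKETCH_DIR2_SPACE_SIZE	(1ULL << 35)
#define SKETCH_DIR2_MAX_SPACES	3ULL

/* Hypothetical stand-in for mp->m_dir_geo. */
struct sketch_dir_geo {
	uint64_t	max_extents;	/* worst case: one extent per block */
};

/* Computed once at mount time, as suggested for xfs_da_mount(). */
static void sketch_dir_geo_init(struct sketch_dir_geo *geo,
				unsigned int sb_blocklog)
{
	geo->max_extents = (SKETCH_DIR2_MAX_SPACES * SKETCH_DIR2_SPACE_SIZE) >>
			   sb_blocklog;
}

/* The verifier-side check then reduces to a single comparison. */
static bool sketch_dir_nextents_ok(const struct sketch_dir_geo *geo,
				   uint64_t nextents)
{
	return nextents <= geo->max_extents;
}
```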
* Re: [PATCH V9 13/19] xfs: Replace numbered inode recovery error messages with descriptive ones
2022-04-06 6:18 ` [PATCH V9 13/19] xfs: Replace numbered inode recovery error messages with descriptive ones Chandan Babu R
@ 2022-04-07 1:50 ` Darrick J. Wong
0 siblings, 0 replies; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-07 1:50 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, david, Dave Chinner
On Wed, Apr 06, 2022 at 11:48:57AM +0530, Chandan Babu R wrote:
> This commit also prints inode fields with invalid values instead of printing
> addresses of inode and buffer instances.
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> Suggested-by: Dave Chinner <dchinner@redhat.com>
Much better for diagnosing recovery problems!!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_inode_item_recover.c | 52 ++++++++++++++-------------------
> 1 file changed, 22 insertions(+), 30 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
> index 44b90614859e..96b222e18b0f 100644
> --- a/fs/xfs/xfs_inode_item_recover.c
> +++ b/fs/xfs/xfs_inode_item_recover.c
> @@ -324,13 +324,12 @@ xlog_recover_inode_commit_pass2(
> if (unlikely(S_ISREG(ldip->di_mode))) {
> if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
> (ldip->di_format != XFS_DINODE_FMT_BTREE)) {
> - XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(3)",
> - XFS_ERRLEVEL_LOW, mp, ldip,
> - sizeof(*ldip));
> + XFS_CORRUPTION_ERROR(
> + "Bad log dinode data fork format for regular file",
> + XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
> xfs_alert(mp,
> - "%s: Bad regular inode log record, rec ptr "PTR_FMT", "
> - "ino ptr = "PTR_FMT", ino bp = "PTR_FMT", ino %Ld",
> - __func__, item, dip, bp, in_f->ilf_ino);
> + "Bad inode 0x%llx, data fork format 0x%x",
> + in_f->ilf_ino, ldip->di_format);
> error = -EFSCORRUPTED;
> goto out_release;
> }
> @@ -338,49 +337,42 @@ xlog_recover_inode_commit_pass2(
> if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
> (ldip->di_format != XFS_DINODE_FMT_BTREE) &&
> (ldip->di_format != XFS_DINODE_FMT_LOCAL)) {
> - XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(4)",
> - XFS_ERRLEVEL_LOW, mp, ldip,
> - sizeof(*ldip));
> + XFS_CORRUPTION_ERROR(
> + "Bad log dinode data fork format for directory",
> + XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
> xfs_alert(mp,
> - "%s: Bad dir inode log record, rec ptr "PTR_FMT", "
> - "ino ptr = "PTR_FMT", ino bp = "PTR_FMT", ino %Ld",
> - __func__, item, dip, bp, in_f->ilf_ino);
> + "Bad inode 0x%llx, data fork format 0x%x",
> + in_f->ilf_ino, ldip->di_format);
> error = -EFSCORRUPTED;
> goto out_release;
> }
> }
> if (unlikely(ldip->di_nextents + ldip->di_anextents > ldip->di_nblocks)){
> - XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(5)",
> - XFS_ERRLEVEL_LOW, mp, ldip,
> - sizeof(*ldip));
> + XFS_CORRUPTION_ERROR("Bad log dinode extent counts",
> + XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
> xfs_alert(mp,
> - "%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
> - "dino bp "PTR_FMT", ino %Ld, total extents = %d, nblocks = %Ld",
> - __func__, item, dip, bp, in_f->ilf_ino,
> - ldip->di_nextents + ldip->di_anextents,
> + "Bad inode 0x%llx, nextents 0x%x, anextents 0x%x, nblocks 0x%llx",
> + in_f->ilf_ino, ldip->di_nextents, ldip->di_anextents,
> ldip->di_nblocks);
> error = -EFSCORRUPTED;
> goto out_release;
> }
> if (unlikely(ldip->di_forkoff > mp->m_sb.sb_inodesize)) {
> - XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(6)",
> - XFS_ERRLEVEL_LOW, mp, ldip,
> - sizeof(*ldip));
> + XFS_CORRUPTION_ERROR("Bad log dinode fork offset",
> + XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip));
> xfs_alert(mp,
> - "%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
> - "dino bp "PTR_FMT", ino %Ld, forkoff 0x%x", __func__,
> - item, dip, bp, in_f->ilf_ino, ldip->di_forkoff);
> + "Bad inode 0x%llx, di_forkoff 0x%x",
> + in_f->ilf_ino, ldip->di_forkoff);
> error = -EFSCORRUPTED;
> goto out_release;
> }
> isize = xfs_log_dinode_size(mp);
> if (unlikely(item->ri_buf[1].i_len > isize)) {
> - XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(7)",
> - XFS_ERRLEVEL_LOW, mp, ldip,
> - sizeof(*ldip));
> + XFS_CORRUPTION_ERROR("Bad log dinode size", XFS_ERRLEVEL_LOW,
> + mp, ldip, sizeof(*ldip));
> xfs_alert(mp,
> - "%s: Bad inode log record length %d, rec ptr "PTR_FMT,
> - __func__, item->ri_buf[1].i_len, item);
> + "Bad inode 0x%llx log dinode size 0x%x",
> + in_f->ilf_ino, item->ri_buf[1].i_len);
> error = -EFSCORRUPTED;
> goto out_release;
> }
> --
> 2.30.2
>
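The patch moves to checks that name the inode and the offending field values rather than pointer addresses. For the extent count test, that style looks roughly like the sketch below. The struct is a hypothetical subset of the log dinode, snprintf() stands in for xfs_alert(), and -1 stands in for -EFSCORRUPTED. The format string is the one from the patch, with explicit casts so it is correct for the fixed-width types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the log dinode fields checked during recovery. */
struct sketch_log_dinode {
	uint32_t	di_nextents;
	uint16_t	di_anextents;
	uint64_t	di_nblocks;
};

/*
 * Return 0 if the extent counts fit in the block count; otherwise write a
 * descriptive message into @msg (len must be > 0) and return -1.
 */
static int sketch_check_extent_counts(uint64_t ino,
				      const struct sketch_log_dinode *ldip,
				      char *msg, size_t len)
{
	if ((uint64_t)ldip->di_nextents + ldip->di_anextents >
	    ldip->di_nblocks) {
		snprintf(msg, len,
			 "Bad inode 0x%llx, nextents 0x%x, anextents 0x%x, nblocks 0x%llx",
			 (unsigned long long)ino,
			 (unsigned int)ldip->di_nextents,
			 (unsigned int)ldip->di_anextents,
			 (unsigned long long)ldip->di_nblocks);
		return -1;
	}
	msg[0] = '\0';
	return 0;
}
```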
* Re: [PATCH V9 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
2022-04-07 1:05 ` Dave Chinner
@ 2022-04-07 1:58 ` Darrick J. Wong
2022-04-07 2:44 ` Dave Chinner
2022-04-07 8:18 ` Chandan Babu R
0 siblings, 2 replies; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-07 1:58 UTC (permalink / raw)
To: Dave Chinner; +Cc: Chandan Babu R, linux-xfs
On Thu, Apr 07, 2022 at 11:05:44AM +1000, Dave Chinner wrote:
> On Wed, Apr 06, 2022 at 11:48:56AM +0530, Chandan Babu R wrote:
> > This commit defines new macros to represent maximum extent counts allowed by
> > filesystems which have support for large per-inode extent counters.
> >
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> > ---
> > fs/xfs/libxfs/xfs_bmap.c | 9 ++++-----
> > fs/xfs/libxfs/xfs_bmap_btree.c | 3 ++-
> > fs/xfs/libxfs/xfs_format.h | 24 ++++++++++++++++++++++--
> > fs/xfs/libxfs/xfs_inode_buf.c | 4 +++-
> > fs/xfs/libxfs/xfs_inode_fork.c | 3 ++-
> > fs/xfs/libxfs/xfs_inode_fork.h | 21 +++++++++++++++++----
> > 6 files changed, 50 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index b317226fb4ba..1254d4d4821e 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -61,10 +61,8 @@ xfs_bmap_compute_maxlevels(
> > int sz; /* root block size */
> >
> > /*
> > - * The maximum number of extents in a file, hence the maximum number of
> > - * leaf entries, is controlled by the size of the on-disk extent count,
> > - * either a signed 32-bit number for the data fork, or a signed 16-bit
> > - * number for the attr fork.
> > + * The maximum number of extents in a fork, hence the maximum number of
> > + * leaf entries, is controlled by the size of the on-disk extent count.
> > *
> > * Note that we can no longer assume that if we are in ATTR1 that the
> > * fork offset of all the inodes will be
> > @@ -74,7 +72,8 @@ xfs_bmap_compute_maxlevels(
> > * ATTR2 we have to assume the worst case scenario of a minimum size
> > * available.
> > */
> > - maxleafents = xfs_iext_max_nextents(whichfork);
> > + maxleafents = xfs_iext_max_nextents(xfs_has_large_extent_counts(mp),
> > + whichfork);
> > if (whichfork == XFS_DATA_FORK)
> > sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
> > else
>
> Just to confirm, the large extent count feature bit can only be
> added when the filesystem is unmounted?
Yes, because we (currently) don't support /any/ online feature upgrades.
IIRC Chandan said that you'd have to be careful about validating that the min
log size requirements are still met because the tx reservation sizes can
change with the taller bmbts.
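The feature bit can only be set while the filesystem is unmounted. A limit derived from xfs_has_large_extent_counts(mp) at mount time therefore cannot go stale while mounted. The two-argument xfs_iext_max_nextents() used in the hunk above could look something like the sketch below. The small limits are the ones quoted from xfs_format.h. The large ones (2^48 - 1 and 2^32 - 1) and the treatment of the CoW fork like the data fork are assumptions, and the sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_fork { SKETCH_DATA_FORK, SKETCH_ATTR_FORK, SKETCH_COW_FORK };

static uint64_t sketch_iext_max_nextents(bool has_large_extent_counts,
					 enum sketch_fork whichfork)
{
	switch (whichfork) {
	case SKETCH_DATA_FORK:
	case SKETCH_COW_FORK:
		/* 2^48 - 1 (assumed) vs 2^31 - 1 (quoted) */
		return has_large_extent_counts ? (1ULL << 48) - 1
					       : (1ULL << 31) - 1;
	case SKETCH_ATTR_FORK:
		/* 2^32 - 1 (assumed) vs 2^15 - 1 (quoted) */
		return has_large_extent_counts ? (1ULL << 32) - 1
					       : (1ULL << 15) - 1;
	}
	assert(0 && "unknown fork");
	return 0;
}
```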
> > diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> > index 453309fc85f2..7aabeccea9ab 100644
> > --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> > +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> > @@ -611,7 +611,8 @@ xfs_bmbt_maxlevels_ondisk(void)
> > minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
> >
> > /* One extra level for the inode root. */
> > - return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
> > + return xfs_btree_compute_maxlevels(minrecs,
> > + XFS_MAX_EXTCNT_DATA_FORK_LARGE) + 1;
> > }
>
> Why is this set to XFS_MAX_EXTCNT_DATA_FORK_LARGE rather than being
> conditional on xfs_has_large_extent_counts(mp)? i.e. if the feature bit
> is not set, the maximum on-disk levels in the bmbt is determined by
> XFS_MAX_EXTCNT_DATA_FORK_SMALL, not XFS_MAX_EXTCNT_DATA_FORK_LARGE.
This function (and all the other _maxlevels_ondisk functions) compute
the maximum possible btree height for any filesystem that we'd care to
mount. This value is then passed to the functions that create the btree
cursor caches, which is why this is independent of any xfs_mount.
That said ... depending on how much this inflates the size of the bmbt
cursor cache, I think we could create multiple slabs.
> The "_ondisk" suffix implies that it has something to do with the
> on-disk format of the filesystem, but AFAICT what we are calculating
> here is a constant used for in-memory structure allocation? There
> needs to be something explained/changed here, because this is
> confusing...
You suggested it. ;)
https://lore.kernel.org/linux-xfs/20211013075743.GG2361455@dread.disaster.area/
--D
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
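The sketch below estimates how much the larger limit grows the worst-case bmbt height, and so the size of the cursor cache objects. It is a minimum-occupancy height calculation in the spirit of xfs_btree_compute_maxlevels(). The value of 15 records/pointers per minimally full block is an assumed illustration, not the real bmbt geometry. With it, 2^31 - 1 extents need 8 levels and 2^48 - 1 extents need 13, before adding the extra level for the inode root (9 vs 14).

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division. */
static uint64_t sketch_howmany(uint64_t x, uint64_t y)
{
	return (x + y - 1) / y;
}

/*
 * Height of a btree holding @records records when every block is only
 * minimally full: limits[0] records per leaf, limits[1] pointers per node.
 */
static unsigned int sketch_btree_compute_maxlevels(const unsigned int limits[2],
						   uint64_t records)
{
	uint64_t level_blocks = sketch_howmany(records, limits[0]);
	unsigned int height = 1;

	while (level_blocks > 1) {
		level_blocks = sketch_howmany(level_blocks, limits[1]);
		height++;
	}
	return height;
}
```

Each extra level presumably adds one per-level slot to every cursor. Splitting the cursors across more than one slab, as Darrick suggests, would contain that cost.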
* Re: [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters
2022-04-07 1:46 ` Darrick J. Wong
@ 2022-04-07 2:00 ` Darrick J. Wong
2022-04-07 8:19 ` Chandan Babu R
1 sibling, 0 replies; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-07 2:00 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, david
On Wed, Apr 06, 2022 at 06:46:27PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 06, 2022 at 11:49:00AM +0530, Chandan Babu R wrote:
> > This commit enables upgrading existing inodes to use large extent counters
> > provided that the underlying filesystem's superblock has the large extent
> > counter feature enabled.
> >
> > Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> > ---
> > fs/xfs/libxfs/xfs_attr.c | 10 ++++++++++
> > fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
> > fs/xfs/libxfs/xfs_format.h | 8 ++++++++
> > fs/xfs/libxfs/xfs_inode_fork.c | 19 +++++++++++++++++++
> > fs/xfs/libxfs/xfs_inode_fork.h | 2 ++
> > fs/xfs/xfs_bmap_item.c | 2 ++
> > fs/xfs/xfs_bmap_util.c | 13 +++++++++++++
> > fs/xfs/xfs_dquot.c | 3 +++
> > fs/xfs/xfs_iomap.c | 5 +++++
> > fs/xfs/xfs_reflink.c | 5 +++++
> > fs/xfs/xfs_rtalloc.c | 3 +++
> > 11 files changed, 74 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 23523b802539..66c4fc55c9d7 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -776,8 +776,18 @@ xfs_attr_set(
> > if (args->value || xfs_inode_hasattr(dp)) {
> > error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
> > XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(args->trans, dp,
> > + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> > if (error)
> > goto out_trans_cancel;
> > +
> > + if (error == -EFBIG) {
> > + error = xfs_iext_count_upgrade(args->trans, dp,
> > + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> > + if (error)
> > + goto out_trans_cancel;
> > + }
> > }
> >
> > error = xfs_attr_lookup(args);
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 4fab0c92ab70..82d5467ddf2c 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -4524,14 +4524,16 @@ xfs_bmapi_convert_delalloc(
> > return error;
> >
> > xfs_ilock(ip, XFS_ILOCK_EXCL);
> > + xfs_trans_ijoin(tp, ip, 0);
> >
> > error = xfs_iext_count_may_overflow(ip, whichfork,
> > XFS_IEXT_ADD_NOSPLIT_CNT);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip,
> > + XFS_IEXT_ADD_NOSPLIT_CNT);
> > if (error)
> > goto out_trans_cancel;
> >
> > - xfs_trans_ijoin(tp, ip, 0);
> > -
> > if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) ||
> > bma.got.br_startoff > offset_fsb) {
> > /*
> > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > index 43de892d0305..bb327ea43ca1 100644
> > --- a/fs/xfs/libxfs/xfs_format.h
> > +++ b/fs/xfs/libxfs/xfs_format.h
> > @@ -934,6 +934,14 @@ enum xfs_dinode_fmt {
> > #define XFS_MAX_EXTCNT_DATA_FORK_SMALL ((xfs_extnum_t)((1ULL << 31) - 1))
> > #define XFS_MAX_EXTCNT_ATTR_FORK_SMALL ((xfs_extnum_t)((1ULL << 15) - 1))
> >
> > +/*
> > + * This macro represents the maximum value by which a filesystem operation can
> > + * increase the value of an inode's data/attr fork extent count.
> > + */
> > +#define XFS_MAX_EXTCNT_UPGRADE_NR \
> > + min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
> > + XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
> > +
> > /*
> > * Inode minimum and maximum sizes.
> > */
> > diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> > index bb5d841aac58..1245e9f1ca81 100644
> > --- a/fs/xfs/libxfs/xfs_inode_fork.c
> > +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> > @@ -756,3 +756,22 @@ xfs_iext_count_may_overflow(
> >
> > return 0;
> > }
> > +
> > +int
> > +xfs_iext_count_upgrade(
>
> Hmm. I think the @nr_to_add parameter is supposed to be the one
> that caused xfs_iext_count_may_overflow to return -EFBIG, right?
>
> I was about to comment that it would be really helpful to have a comment
> above this function dropping a hint that this is the case:
>
> /*
> * Upgrade this inode's extent counter fields to be able to handle a
> * potential increase in the extent count by this number. Normally
> * this is the same quantity that caused xfs_iext_count_may_overflow to
> * return -EFBIG.
> */
> int
> xfs_iext_count_upgrade(...
>
> ...though I worry that this will cause fatal warnings about the
> otherwise unused parameter on non-debug kernels? I'm not sure why it
> matters that nr_to_add is constrained to a small value? Is it just to
> prevent obviously huge values? AFAICT all the current callers pass in
> small #defined integer values.
>
> That said, if the assert here is something Dave asked for in a previous
> review, then I won't stand in the way:
>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
> --D
>
> > + struct xfs_trans *tp,
> > + struct xfs_inode *ip,
> > + uint nr_to_add)
> > +{
> > + ASSERT(nr_to_add <= XFS_MAX_EXTCNT_UPGRADE_NR);
> > +
> > + if (!xfs_has_large_extent_counts(ip->i_mount) ||
> > + (ip->i_diflags2 & XFS_DIFLAG2_NREXT64) ||
xfs_inode_has_large_extent_counts(ip) || ?
--D
> > + XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
> > + return -EFBIG;
> > +
> > + ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
> > + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> > +
> > + return 0;
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> > index 6f9d69f8896e..4f68c1f20beb 100644
> > --- a/fs/xfs/libxfs/xfs_inode_fork.h
> > +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> > @@ -275,6 +275,8 @@ int xfs_ifork_verify_local_data(struct xfs_inode *ip);
> > int xfs_ifork_verify_local_attr(struct xfs_inode *ip);
> > int xfs_iext_count_may_overflow(struct xfs_inode *ip, int whichfork,
> > int nr_to_add);
> > +int xfs_iext_count_upgrade(struct xfs_trans *tp, struct xfs_inode *ip,
> > + uint nr_to_add);
> >
> > /* returns true if the fork has extents but they are not read in yet. */
> > static inline bool xfs_need_iread_extents(struct xfs_ifork *ifp)
> > diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> > index 761dde155099..593ac29cffc7 100644
> > --- a/fs/xfs/xfs_bmap_item.c
> > +++ b/fs/xfs/xfs_bmap_item.c
> > @@ -506,6 +506,8 @@ xfs_bui_item_recover(
> > iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
> >
> > error = xfs_iext_count_may_overflow(ip, whichfork, iext_delta);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip, iext_delta);
> > if (error)
> > goto err_cancel;
> >
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 18c1b99311a8..52be58372c63 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -859,6 +859,9 @@ xfs_alloc_file_space(
> >
> > error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> > XFS_IEXT_ADD_NOSPLIT_CNT);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip,
> > + XFS_IEXT_ADD_NOSPLIT_CNT);
> > if (error)
> > goto error;
> >
> > @@ -914,6 +917,8 @@ xfs_unmap_extent(
> >
> > error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> > XFS_IEXT_PUNCH_HOLE_CNT);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
> > if (error)
> > goto out_trans_cancel;
> >
> > @@ -1195,6 +1200,8 @@ xfs_insert_file_space(
> >
> > error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> > XFS_IEXT_PUNCH_HOLE_CNT);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
> > if (error)
> > goto out_trans_cancel;
> >
> > @@ -1423,6 +1430,9 @@ xfs_swap_extent_rmap(
> > error = xfs_iext_count_may_overflow(ip,
> > XFS_DATA_FORK,
> > XFS_IEXT_SWAP_RMAP_CNT);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip,
> > + XFS_IEXT_SWAP_RMAP_CNT);
> > if (error)
> > goto out;
> > }
> > @@ -1431,6 +1441,9 @@ xfs_swap_extent_rmap(
> > error = xfs_iext_count_may_overflow(tip,
> > XFS_DATA_FORK,
> > XFS_IEXT_SWAP_RMAP_CNT);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip,
> > + XFS_IEXT_SWAP_RMAP_CNT);
> > if (error)
> > goto out;
> > }
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index 5afedcbc78c7..eb211e0ede5d 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -322,6 +322,9 @@ xfs_dquot_disk_alloc(
> >
> > error = xfs_iext_count_may_overflow(quotip, XFS_DATA_FORK,
> > XFS_IEXT_ADD_NOSPLIT_CNT);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, quotip,
> > + XFS_IEXT_ADD_NOSPLIT_CNT);
> > if (error)
> > goto err_cancel;
> >
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 87e1cf5060bd..5a393259a3a3 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -251,6 +251,8 @@ xfs_iomap_write_direct(
> > return error;
> >
> > error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, nr_exts);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip, nr_exts);
> > if (error)
> > goto out_trans_cancel;
> >
> > @@ -555,6 +557,9 @@ xfs_iomap_write_unwritten(
> >
> > error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> > XFS_IEXT_WRITE_UNWRITTEN_CNT);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip,
> > + XFS_IEXT_WRITE_UNWRITTEN_CNT);
> > if (error)
> > goto error_on_bmapi_transaction;
> >
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 54e68e5693fd..1ae6d3434ad2 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -620,6 +620,9 @@ xfs_reflink_end_cow_extent(
> >
> > error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> > XFS_IEXT_REFLINK_END_COW_CNT);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip,
> > + XFS_IEXT_REFLINK_END_COW_CNT);
> > if (error)
> > goto out_cancel;
> >
> > @@ -1121,6 +1124,8 @@ xfs_reflink_remap_extent(
> > ++iext_delta;
> >
> > error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, iext_delta);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip, iext_delta);
> > if (error)
> > goto out_cancel;
> >
> > diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> > index b8c79ee791af..3e587e85d5bf 100644
> > --- a/fs/xfs/xfs_rtalloc.c
> > +++ b/fs/xfs/xfs_rtalloc.c
> > @@ -806,6 +806,9 @@ xfs_growfs_rt_alloc(
> >
> > error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> > XFS_IEXT_ADD_NOSPLIT_CNT);
> > + if (error == -EFBIG)
> > + error = xfs_iext_count_upgrade(tp, ip,
> > + XFS_IEXT_ADD_NOSPLIT_CNT);
> > if (error)
> > goto out_trans_cancel;
> >
> > --
> > 2.30.2
> >
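The sketch below combines Darrick's suggested xfs_inode_has_large_extent_counts() helper with the nr_to_add sanity check raised in his earlier reply. The flag bit, the struct layout and the sketch_ names are hypothetical. The XFS_TEST_ERROR injection branch is left out, and the bound assumes the 2^32 - 2^15 headroom. Because assert() compiles away under NDEBUG, the (void)nr_to_add cast keeps a -Wunused-parameter warning away in that configuration. That is the userspace analogue of the non-debug concern raised about ASSERT().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DIFLAG2_NREXT64		(1ULL << 4)	/* illustrative bit */
/* Assumed headroom: (2^32 - 1) - (2^15 - 1). */
#define SKETCH_MAX_EXTCNT_UPGRADE_NR	((1ULL << 32) - (1ULL << 15))

struct sketch_inode {
	uint64_t	diflags2;
	bool		mount_has_large_extent_counts;	/* stands in for ip->i_mount */
	bool		core_logged;
};

/* Named helper in place of the open-coded flag test. */
static inline bool
sketch_inode_has_large_extent_counts(const struct sketch_inode *ip)
{
	return ip->diflags2 & SKETCH_DIFLAG2_NREXT64;
}

static int sketch_iext_count_upgrade(struct sketch_inode *ip,
				     unsigned int nr_to_add)
{
	assert(nr_to_add <= SKETCH_MAX_EXTCNT_UPGRADE_NR);
	(void)nr_to_add;	/* quiet -Wunused-parameter under NDEBUG */

	if (!ip->mount_has_large_extent_counts ||
	    sketch_inode_has_large_extent_counts(ip))
		return -EFBIG;

	ip->diflags2 |= SKETCH_DIFLAG2_NREXT64;
	ip->core_logged = true;	/* stands in for xfs_trans_log_inode() */
	return 0;
}
```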
* Re: [PATCH V9 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
2022-04-07 1:58 ` Darrick J. Wong
@ 2022-04-07 2:44 ` Dave Chinner
2022-04-07 8:18 ` Chandan Babu R
2022-04-07 8:18 ` Chandan Babu R
1 sibling, 1 reply; 62+ messages in thread
From: Dave Chinner @ 2022-04-07 2:44 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Chandan Babu R, linux-xfs
On Wed, Apr 06, 2022 at 06:58:55PM -0700, Darrick J. Wong wrote:
> On Thu, Apr 07, 2022 at 11:05:44AM +1000, Dave Chinner wrote:
> > On Wed, Apr 06, 2022 at 11:48:56AM +0530, Chandan Babu R wrote:
> > > diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> > > index 453309fc85f2..7aabeccea9ab 100644
> > > --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> > > @@ -611,7 +611,8 @@ xfs_bmbt_maxlevels_ondisk(void)
> > > minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
> > >
> > > /* One extra level for the inode root. */
> > > - return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
> > > + return xfs_btree_compute_maxlevels(minrecs,
> > > + XFS_MAX_EXTCNT_DATA_FORK_LARGE) + 1;
> > > }
> >
> > Why is this set to XFS_MAX_EXTCNT_DATA_FORK_LARGE rather than being
> > conditional xfs_has_large_extent_counts(mp)? i.e. if the feature bit
> > is not set, the maximum on-disk levels in the bmbt is determined by
> > XFS_MAX_EXTCNT_DATA_FORK_SMALL, not XFS_MAX_EXTCNT_DATA_FORK_LARGE.
>
> This function (and all the other _maxlevels_ondisk functions) compute
> the maximum possible btree height for any filesystem that we'd care to
> mount. This value is then passed to the functions that create the btree
> cursor caches, which is why this is independent of any xfs_mount.
>
> That said ... depending on how much this inflates the size of the bmbt
> cursor cache, I think we could create multiple slabs.
>
> > The "_ondisk" suffix implies that it has something to do with the
> > on-disk format of the filesystem, but AFAICT what we are calculating
> > here is a constant used for in-memory structure allocation? There
> > needs to be something explained/changed here, because this is
> > confusing...
>
> You suggested it. ;)
>
> https://lore.kernel.org/linux-xfs/20211013075743.GG2361455@dread.disaster.area/
That doesn't mean it's perfect and can't be changed, nor that I
remember the exact details of something that happened 6 months ago.
Indeed, if I'm confused by it 6 months later, that tends to say it
wasn't a very good name... :)
.... or that the missing context needs explaining so the reader is
reminded what the _ondisk() name means.
i.e. the problem goes away with a simple comment:
/*
* Calculate the maximum possible height of the btree that the
* on-disk format supports. This is used for sizing structures large
* enough to support every possible configuration of a filesystem
* that might get mounted.
*/
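The `_ondisk` helpers size in-memory cursor caches for the tallest tree that any mountable filesystem could need. For that reason they use the largest extent count the format allows. The minimal sketch below shows why moving from 2^31 - 1 to 2^48 - 1 extents makes the bmbt taller. It assumes the usual worst-case approach, in which every block is only minimally full. `toy_compute_maxlevels()` is a hypothetical re-implementation, not the kernel function. The `minrecs` values in the note are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case height of a btree holding `len` records when leaf blocks hold
 * at least minrecs[0] records and node blocks at least minrecs[1] pointers.
 */
static unsigned int toy_compute_maxlevels(const unsigned int minrecs[2],
					  uint64_t len)
{
	uint64_t blocks = (len + minrecs[0] - 1) / minrecs[0]; /* leaf level */
	unsigned int level;

	/* Add node levels until a single block (the root) is reached. */
	for (level = 1; blocks > 1; level++)
		blocks = (blocks + minrecs[1] - 1) / minrecs[1];
	return level;
}

/* bmbt variant: one extra level for the root stored in the inode fork. */
static unsigned int toy_bmbt_maxlevels(const unsigned int minrecs[2],
				       uint64_t max_extents)
{
	return toy_compute_maxlevels(minrecs, max_extents) + 1;
}
```

With an illustrative minrecs of {15, 15}, this gives 9 levels for 2^31 - 1 extents and 14 levels for 2^48 - 1. That growth is what inflates the cursor cache Darrick mentions.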
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
2022-04-07 2:44 ` Dave Chinner
@ 2022-04-07 8:18 ` Chandan Babu R
2022-04-07 8:56 ` Dave Chinner
0 siblings, 1 reply; 62+ messages in thread
From: Chandan Babu R @ 2022-04-07 8:18 UTC (permalink / raw)
To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs
On 07 Apr 2022 at 08:14, Dave Chinner wrote:
> On Wed, Apr 06, 2022 at 06:58:55PM -0700, Darrick J. Wong wrote:
>> On Thu, Apr 07, 2022 at 11:05:44AM +1000, Dave Chinner wrote:
>> > On Wed, Apr 06, 2022 at 11:48:56AM +0530, Chandan Babu R wrote:
>> > > diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
>> > > index 453309fc85f2..7aabeccea9ab 100644
>> > > --- a/fs/xfs/libxfs/xfs_bmap_btree.c
>> > > +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
>> > > @@ -611,7 +611,8 @@ xfs_bmbt_maxlevels_ondisk(void)
>> > > minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
>> > >
>> > > /* One extra level for the inode root. */
>> > > - return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
>> > > + return xfs_btree_compute_maxlevels(minrecs,
>> > > + XFS_MAX_EXTCNT_DATA_FORK_LARGE) + 1;
>> > > }
>> >
>> > Why is this set to XFS_MAX_EXTCNT_DATA_FORK_LARGE rather than being
>> > conditional xfs_has_large_extent_counts(mp)? i.e. if the feature bit
>> > is not set, the maximum on-disk levels in the bmbt is determined by
>> > XFS_MAX_EXTCNT_DATA_FORK_SMALL, not XFS_MAX_EXTCNT_DATA_FORK_LARGE.
>>
>> This function (and all the other _maxlevels_ondisk functions) compute
>> the maximum possible btree height for any filesystem that we'd care to
>> mount. This value is then passed to the functions that create the btree
>> cursor caches, which is why this is independent of any xfs_mount.
>>
>> That said ... depending on how much this inflates the size of the bmbt
>> cursor cache, I think we could create multiple slabs.
>>
>> > The "_ondisk" suffix implies that it has something to do with the
>> > on-disk format of the filesystem, but AFAICT what we are calculating
>> > here is a constant used for in-memory structure allocation? There
>> > needs to be something explained/changed here, because this is
>> > confusing...
>>
>> You suggested it. ;)
>>
>> https://lore.kernel.org/linux-xfs/20211013075743.GG2361455@dread.disaster.area/
>
> That doesn't mean it's perfect and can't be changed, nor that I
> remember the exact details of something that happened 6 months ago.
> Indeed, if I'm confused by it 6 months later, that tends to say it
> wasn't a very good name... :)
>
> .... or that the missing context needs explaining so the reader is
> reminded what the _ondisk() name means.
>
> i.e. the problem goes away with a simple comment:
>
> /*
> * Calculate the maximum possible height of the btree that the
> * on-disk format supports. This is used for sizing structures large
> * enough to support every possible configuration of a filesystem
> * that might get mounted.
> */
>
If there are no objections, I will include the above comment as part of this
patch.
--
chandan
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
2022-04-07 1:58 ` Darrick J. Wong
2022-04-07 2:44 ` Dave Chinner
@ 2022-04-07 8:18 ` Chandan Babu R
1 sibling, 0 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-07 8:18 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs
On 07 Apr 2022 at 07:28, Darrick J. Wong wrote:
> On Thu, Apr 07, 2022 at 11:05:44AM +1000, Dave Chinner wrote:
>> On Wed, Apr 06, 2022 at 11:48:56AM +0530, Chandan Babu R wrote:
>> > This commit defines new macros to represent maximum extent counts allowed by
>> > filesystems which have support for large per-inode extent counters.
>> >
>> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> > Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> > ---
>> > fs/xfs/libxfs/xfs_bmap.c | 9 ++++-----
>> > fs/xfs/libxfs/xfs_bmap_btree.c | 3 ++-
>> > fs/xfs/libxfs/xfs_format.h | 24 ++++++++++++++++++++++--
>> > fs/xfs/libxfs/xfs_inode_buf.c | 4 +++-
>> > fs/xfs/libxfs/xfs_inode_fork.c | 3 ++-
>> > fs/xfs/libxfs/xfs_inode_fork.h | 21 +++++++++++++++++----
>> > 6 files changed, 50 insertions(+), 14 deletions(-)
>> >
>> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> > index b317226fb4ba..1254d4d4821e 100644
>> > --- a/fs/xfs/libxfs/xfs_bmap.c
>> > +++ b/fs/xfs/libxfs/xfs_bmap.c
>> > @@ -61,10 +61,8 @@ xfs_bmap_compute_maxlevels(
>> > int sz; /* root block size */
>> >
>> > /*
>> > - * The maximum number of extents in a file, hence the maximum number of
>> > - * leaf entries, is controlled by the size of the on-disk extent count,
>> > - * either a signed 32-bit number for the data fork, or a signed 16-bit
>> > - * number for the attr fork.
>> > + * The maximum number of extents in a fork, hence the maximum number of
>> > + * leaf entries, is controlled by the size of the on-disk extent count.
>> > *
>> > * Note that we can no longer assume that if we are in ATTR1 that the
>> > * fork offset of all the inodes will be
>> > @@ -74,7 +72,8 @@ xfs_bmap_compute_maxlevels(
>> > * ATTR2 we have to assume the worst case scenario of a minimum size
>> > * available.
>> > */
>> > - maxleafents = xfs_iext_max_nextents(whichfork);
>> > + maxleafents = xfs_iext_max_nextents(xfs_has_large_extent_counts(mp),
>> > + whichfork);
>> > if (whichfork == XFS_DATA_FORK)
>> > sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
>> > else
>>
>> Just to confirm, the large extent count feature bit can only be
>> added when the filesystem is unmounted?
>
> Yes, because we (currently) don't support /any/ online feature upgrades.
> IIRC Chandan said that you'd have to be careful about validating the min
> log size requirements are still met because the tx reservation sizes can
> change with the taller bmbts.
>
Yes, taller BMBT trees cause transaction reservation values to change. This
in turn causes a change in log size calculations.
--
chandan
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 14/19] xfs: Introduce per-inode 64-bit extent counters
2022-04-07 1:07 ` Dave Chinner
@ 2022-04-07 8:18 ` Chandan Babu R
-1 siblings, 0 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-07 8:18 UTC (permalink / raw)
To: Dave Chinner
Cc: kernel test robot, linux-xfs, kbuild-all, djwong, Dave Chinner
On 07 Apr 2022 at 06:37, Dave Chinner wrote:
> On Thu, Apr 07, 2022 at 03:03:32AM +0800, kernel test robot wrote:
>> Hi Chandan,
>>
>> Thank you for the patch! Perhaps something to improve:
>>
>> [auto build test WARNING on xfs-linux/for-next]
>> [also build test WARNING on v5.18-rc1 next-20220406]
>> [If your patch is applied to the wrong git tree, kindly drop us a note.
>> And when submitting patch, we suggest to use '--base' as documented in
>> https://git-scm.com/docs/git-format-patch]
>>
>> url: https://github.com/intel-lab-lkp/linux/commits/Chandan-Babu-R/xfs-Extend-per-inode-extent-counters/20220406-174647
>> base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
>> config: i386-randconfig-s002 (https://download.01.org/0day-ci/archive/20220407/202204070218.QyD2PQPx-lkp@intel.com/config)
>> compiler: gcc-11 (Debian 11.2.0-19) 11.2.0
>> reproduce:
>> # apt-get install sparse
>> # sparse version: v0.6.4-dirty
>> # https://github.com/intel-lab-lkp/linux/commit/28be4fd3f13d4ba2bcedceb8951cd3bfe852cba2
>> git remote add linux-review https://github.com/intel-lab-lkp/linux
>> git fetch --no-tags linux-review Chandan-Babu-R/xfs-Extend-per-inode-extent-counters/20220406-174647
>> git checkout 28be4fd3f13d4ba2bcedceb8951cd3bfe852cba2
>> # save the config file to linux build tree
>> mkdir build_dir
>> make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash fs/xfs/
>>
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kernel test robot <lkp@intel.com>
>>
>>
>> sparse warnings: (new ones prefixed by >>)
>> >> fs/xfs/xfs_inode_item_recover.c:209:31: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __be64 [usertype] di_v3_pad @@ got unsigned long long [usertype] di_v3_pad @@
>> fs/xfs/xfs_inode_item_recover.c:209:31: sparse: expected restricted __be64 [usertype] di_v3_pad
>> fs/xfs/xfs_inode_item_recover.c:209:31: sparse: got unsigned long long [usertype] di_v3_pad
>>
>> vim +209 fs/xfs/xfs_inode_item_recover.c
>>
>> 167
>> 168 STATIC void
>> 169 xfs_log_dinode_to_disk(
>> 170 struct xfs_log_dinode *from,
>> 171 struct xfs_dinode *to,
>> 172 xfs_lsn_t lsn)
>> 173 {
>> 174 to->di_magic = cpu_to_be16(from->di_magic);
>> 175 to->di_mode = cpu_to_be16(from->di_mode);
>> 176 to->di_version = from->di_version;
>> 177 to->di_format = from->di_format;
>> 178 to->di_onlink = 0;
>> 179 to->di_uid = cpu_to_be32(from->di_uid);
>> 180 to->di_gid = cpu_to_be32(from->di_gid);
>> 181 to->di_nlink = cpu_to_be32(from->di_nlink);
>> 182 to->di_projid_lo = cpu_to_be16(from->di_projid_lo);
>> 183 to->di_projid_hi = cpu_to_be16(from->di_projid_hi);
>> 184
>> 185 to->di_atime = xfs_log_dinode_to_disk_ts(from, from->di_atime);
>> 186 to->di_mtime = xfs_log_dinode_to_disk_ts(from, from->di_mtime);
>> 187 to->di_ctime = xfs_log_dinode_to_disk_ts(from, from->di_ctime);
>> 188
>> 189 to->di_size = cpu_to_be64(from->di_size);
>> 190 to->di_nblocks = cpu_to_be64(from->di_nblocks);
>> 191 to->di_extsize = cpu_to_be32(from->di_extsize);
>> 192 to->di_forkoff = from->di_forkoff;
>> 193 to->di_aformat = from->di_aformat;
>> 194 to->di_dmevmask = cpu_to_be32(from->di_dmevmask);
>> 195 to->di_dmstate = cpu_to_be16(from->di_dmstate);
>> 196 to->di_flags = cpu_to_be16(from->di_flags);
>> 197 to->di_gen = cpu_to_be32(from->di_gen);
>> 198
>> 199 if (from->di_version == 3) {
>> 200 to->di_changecount = cpu_to_be64(from->di_changecount);
>> 201 to->di_crtime = xfs_log_dinode_to_disk_ts(from,
>> 202 from->di_crtime);
>> 203 to->di_flags2 = cpu_to_be64(from->di_flags2);
>> 204 to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
>> 205 to->di_ino = cpu_to_be64(from->di_ino);
>> 206 to->di_lsn = cpu_to_be64(lsn);
>> 207 memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
>> 208 uuid_copy(&to->di_uuid, &from->di_uuid);
>> > 209 to->di_v3_pad = from->di_v3_pad;
>
> Why not just explicitly write zero to the di_v3_pad field?
>
Yes, we can do that, since the call to xfs_log_dinode_to_disk_iext_counters()
will update the values of union members correctly for v3 inodes with large
extent count feature enabled.
>> 210 } else {
>> 211 to->di_flushiter = cpu_to_be16(from->di_flushiter);
>> 212 memcpy(to->di_v2_pad, from->di_v2_pad, sizeof(to->di_v2_pad));
>
> Same here?
>
This field too can be set to zeroes explicitly.
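Zeroing is safe here because of the order of operations. The large extent counter shares storage with the pad fields, and it is written afterwards by xfs_log_dinode_to_disk_iext_counters(). Below is a minimal sketch of that ordering. It assumes a union in which `di_v3_pad` and the v2 pad/flushiter fields overlay the 64-bit counter, as the reply implies. The `toy_*` names are hypothetical, and byte-order conversion (cpu_to_be*) is omitted. A literal zero has the same representation in either byte order, so no conversion is needed for the pad fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the union: pad fields overlay the counter. */
union toy_dinode_tail {
	uint64_t big_nextents;  /* inodes using large extent counters */
	uint64_t v3_pad;        /* other v3 inodes: must be zero */
	struct {
		uint8_t  v2_pad[6];
		uint16_t flushiter;
	} v2;
};

static void toy_tail_to_disk(union toy_dinode_tail *to, int version,
			     bool nrext64, uint64_t nextents, uint16_t flushiter)
{
	if (version == 3) {
		to->v3_pad = 0;               /* explicit zero, nothing copied from the log */
	} else {
		memset(to->v2.v2_pad, 0, sizeof(to->v2.v2_pad));
		to->v2.flushiter = flushiter; /* real code: cpu_to_be16() */
	}
	/* Done last, like xfs_log_dinode_to_disk_iext_counters(), so the
	 * zeroing above cannot clobber the counter. */
	if (version == 3 && nrext64)
		to->big_nextents = nextents;  /* real code: cpu_to_be64() */
}
```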
--
chandan
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 15/19] xfs: Directory's data fork extent counter can never overflow
2022-04-07 1:48 ` Darrick J. Wong
@ 2022-04-07 8:19 ` Chandan Babu R
0 siblings, 0 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-07 8:19 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs
On 07 Apr 2022 at 07:18, Darrick J. Wong wrote:
> On Thu, Apr 07, 2022 at 11:13:11AM +1000, Dave Chinner wrote:
>> On Wed, Apr 06, 2022 at 11:48:59AM +0530, Chandan Babu R wrote:
>> > The maximum file size that can be represented by the data fork extent counter
>> > in the worst case occurs when all extents are 1 block in length and each block
>> > is 1KB in size.
>> >
>> > With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
>> > 1KB sized blocks, a file can reach up to,
>> > (2^31) * 1KB = 2TB
>> >
>> > This is much larger than the theoretical maximum size of a directory
>> > i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
>> >
>> > Since a directory's inode can never overflow its data fork extent counter,
>> > this commit removes all the overflow checks associated with
>> > it. xfs_dinode_verify() now performs a rough check to verify if a directory's
>> > data fork is larger than 96GB.
>> >
>> > Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>>
>> Mostly OK, just a simple cleanup needed.
>>
>> > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
>> > index ee8d4eb7d048..54b106ae77e1 100644
>> > --- a/fs/xfs/libxfs/xfs_inode_buf.c
>> > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
>> > @@ -491,6 +491,15 @@ xfs_dinode_verify(
>> > if (mode && nextents + naextents > nblocks)
>> > return __this_address;
>> >
>> > + if (S_ISDIR(mode)) {
>> > + uint64_t max_dfork_nexts;
>> > +
>> > + max_dfork_nexts = (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >>
>> > + mp->m_sb.sb_blocklog;
>> > + if (nextents > max_dfork_nexts)
>> > + return __this_address;
>> > + }
>>
>> max_dfork_nexts for a directory is a constant that should be
>> calculated at mount time via xfs_da_mount() and stored in the
>> mp->m_dir_geo structure. Then this code simply becomes:
>>
>> if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents)
>> return __this_address;
>
> I have the same comment as Dave, FWIW. :)
>
Ok. I will apply the above suggestion.
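Dave's suggestion amounts to computing the directory bound once per mount and comparing against it in the verifier. The minimal sketch below assumes the constants stated in the commit message: three directory address spaces (data, leaf, free) of XFS_DIR2_SPACE_SIZE = 32GB each. The worst case is one filesystem block per extent. `struct toy_da_geometry` and `toy_da_mount()` are hypothetical stand-ins for mp->m_dir_geo and xfs_da_mount().

```c
#include <assert.h>
#include <stdint.h>

/* Directory address space: three 32GB segments (data, leaf, free). */
#define TOY_DIR2_SPACE_SIZE (1ULL << 35)
#define TOY_DIR2_MAX_SPACES 3ULL

/* Hypothetical slice of the per-mount directory geometry. */
struct toy_da_geometry {
	uint64_t max_extents;
};

/* Computed once at mount time: worst case is one block per extent. */
static void toy_da_mount(struct toy_da_geometry *geo, unsigned int blocklog)
{
	geo->max_extents = (TOY_DIR2_MAX_SPACES * TOY_DIR2_SPACE_SIZE) >> blocklog;
}

/* The verifier check then reduces to a single comparison. */
static int toy_dir_nextents_ok(const struct toy_da_geometry *geo,
			       uint64_t nextents)
{
	return nextents <= geo->max_extents;
}
```

With 1KB blocks (blocklog 10), the bound is 3 * 2^25 = 100,663,296 extents. That is far below the 2^31 - 1 limit of the small data fork counter, which is the commit's argument that a directory cannot overflow it.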
--
chandan
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters
2022-04-07 1:22 ` Dave Chinner
2022-04-07 1:46 ` Darrick J. Wong
@ 2022-04-07 8:19 ` Chandan Babu R
1 sibling, 0 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-07 8:19 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, djwong
On 07 Apr 2022 at 06:52, Dave Chinner wrote:
> On Wed, Apr 06, 2022 at 11:49:00AM +0530, Chandan Babu R wrote:
>> This commit enables upgrading existing inodes to use large extent counters
>> provided that underlying filesystem's superblock has large extent counter
>> feature enabled.
>>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>> fs/xfs/libxfs/xfs_attr.c | 10 ++++++++++
>> fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
>> fs/xfs/libxfs/xfs_format.h | 8 ++++++++
>> fs/xfs/libxfs/xfs_inode_fork.c | 19 +++++++++++++++++++
>> fs/xfs/libxfs/xfs_inode_fork.h | 2 ++
>> fs/xfs/xfs_bmap_item.c | 2 ++
>> fs/xfs/xfs_bmap_util.c | 13 +++++++++++++
>> fs/xfs/xfs_dquot.c | 3 +++
>> fs/xfs/xfs_iomap.c | 5 +++++
>> fs/xfs/xfs_reflink.c | 5 +++++
>> fs/xfs/xfs_rtalloc.c | 3 +++
>> 11 files changed, 74 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 23523b802539..66c4fc55c9d7 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -776,8 +776,18 @@ xfs_attr_set(
>> if (args->value || xfs_inode_hasattr(dp)) {
>> error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
>> XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
>> + if (error == -EFBIG)
>> + error = xfs_iext_count_upgrade(args->trans, dp,
>> + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
>> if (error)
>> goto out_trans_cancel;
>> +
>> + if (error == -EFBIG) {
>> + error = xfs_iext_count_upgrade(args->trans, dp,
>> + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
>> + if (error)
>> + goto out_trans_cancel;
>> + }
>> }
>
> Did you forget to remove the original xfs_iext_count_upgrade() call?
>
Sorry, I thought I had removed it before testing the changes. Thanks for
catching this.
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index 43de892d0305..bb327ea43ca1 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -934,6 +934,14 @@ enum xfs_dinode_fmt {
>> #define XFS_MAX_EXTCNT_DATA_FORK_SMALL ((xfs_extnum_t)((1ULL << 31) - 1))
>> #define XFS_MAX_EXTCNT_ATTR_FORK_SMALL ((xfs_extnum_t)((1ULL << 15) - 1))
>>
>> +/*
>> + * This macro represents the maximum value by which a filesystem operation can
>> + * increase the value of an inode's data/attr fork extent count.
>> + */
>> +#define XFS_MAX_EXTCNT_UPGRADE_NR \
>> + min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
>> + XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
>
> You don't need to write "This macro represents" in a comment above
> the macro that the comment is describing. If you need to refer
> to the actual macro, use its name directly.
>
> As it is, the comment could be improved:
>
> /*
> * When we upgrade an inode to the large extent counts, the maximum
> * value by which the extent count can increase is bound by the
> * change in size of the on-disk field. No upgrade operation should
> * ever be adding more than a few tens of extents, so if we get a really
> * large value it is a sign of a code bug or corruption.
> */
> #define XFS_MAX_EXTCNT_UPGRADE_NR \
> min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
> XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
>
> Otherwise it looks OK.
>
Ok. I will include this change.
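It helps to work out what XFS_MAX_EXTCNT_UPGRADE_NR evaluates to. The sketch below assumes a 48-bit data fork counter and a 32-bit attr fork counter in the large format, as the cover letter describes. The small limits are taken from the xfs_format.h hunk. The `TOY_*` macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extent count limits used in this series. */
#define TOY_DATA_SMALL ((uint64_t)((1ULL << 31) - 1))
#define TOY_ATTR_SMALL ((uint64_t)((1ULL << 15) - 1))
#define TOY_DATA_LARGE ((uint64_t)((1ULL << 48) - 1))
#define TOY_ATTR_LARGE ((uint64_t)((1ULL << 32) - 1))

#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Same expression as XFS_MAX_EXTCNT_UPGRADE_NR. */
#define TOY_UPGRADE_NR \
	TOY_MIN(TOY_ATTR_LARGE - TOY_ATTR_SMALL, TOY_DATA_LARGE - TOY_DATA_SMALL)
```

The attr fork term is the smaller one, so the bound is 2^32 - 2^15 = 4,294,934,528. Any nr_to_add anywhere near that value indicates a bug rather than a real operation.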
--
chandan
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters
2022-04-07 1:46 ` Darrick J. Wong
2022-04-07 2:00 ` Darrick J. Wong
@ 2022-04-07 8:19 ` Chandan Babu R
1 sibling, 0 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-07 8:19 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, david
On 07 Apr 2022 at 07:16, Darrick J. Wong wrote:
> On Wed, Apr 06, 2022 at 11:49:00AM +0530, Chandan Babu R wrote:
>> This commit enables upgrading existing inodes to use large extent counters
>> provided that underlying filesystem's superblock has large extent counter
>> feature enabled.
>>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>> fs/xfs/libxfs/xfs_attr.c | 10 ++++++++++
>> fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
>> fs/xfs/libxfs/xfs_format.h | 8 ++++++++
>> fs/xfs/libxfs/xfs_inode_fork.c | 19 +++++++++++++++++++
>> fs/xfs/libxfs/xfs_inode_fork.h | 2 ++
>> fs/xfs/xfs_bmap_item.c | 2 ++
>> fs/xfs/xfs_bmap_util.c | 13 +++++++++++++
>> fs/xfs/xfs_dquot.c | 3 +++
>> fs/xfs/xfs_iomap.c | 5 +++++
>> fs/xfs/xfs_reflink.c | 5 +++++
>> fs/xfs/xfs_rtalloc.c | 3 +++
>> 11 files changed, 74 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 23523b802539..66c4fc55c9d7 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -776,8 +776,18 @@ xfs_attr_set(
>> if (args->value || xfs_inode_hasattr(dp)) {
>> error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
>> XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
>> + if (error == -EFBIG)
>> + error = xfs_iext_count_upgrade(args->trans, dp,
>> + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
>> if (error)
>> goto out_trans_cancel;
>> +
>> + if (error == -EFBIG) {
>> + error = xfs_iext_count_upgrade(args->trans, dp,
>> + XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
>> + if (error)
>> + goto out_trans_cancel;
>> + }
>> }
>>
>> error = xfs_attr_lookup(args);
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index 4fab0c92ab70..82d5467ddf2c 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -4524,14 +4524,16 @@ xfs_bmapi_convert_delalloc(
>> return error;
>>
>> xfs_ilock(ip, XFS_ILOCK_EXCL);
>> + xfs_trans_ijoin(tp, ip, 0);
>>
>> error = xfs_iext_count_may_overflow(ip, whichfork,
>> XFS_IEXT_ADD_NOSPLIT_CNT);
>> + if (error == -EFBIG)
>> + error = xfs_iext_count_upgrade(tp, ip,
>> + XFS_IEXT_ADD_NOSPLIT_CNT);
>> if (error)
>> goto out_trans_cancel;
>>
>> - xfs_trans_ijoin(tp, ip, 0);
>> -
>> if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) ||
>> bma.got.br_startoff > offset_fsb) {
>> /*
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index 43de892d0305..bb327ea43ca1 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -934,6 +934,14 @@ enum xfs_dinode_fmt {
>> #define XFS_MAX_EXTCNT_DATA_FORK_SMALL ((xfs_extnum_t)((1ULL << 31) - 1))
>> #define XFS_MAX_EXTCNT_ATTR_FORK_SMALL ((xfs_extnum_t)((1ULL << 15) - 1))
>>
>> +/*
>> + * This macro represents the maximum value by which a filesystem operation can
>> + * increase the value of an inode's data/attr fork extent count.
>> + */
>> +#define XFS_MAX_EXTCNT_UPGRADE_NR \
>> + min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
>> + XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
>> +
>> /*
>> * Inode minimum and maximum sizes.
>> */
>> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
>> index bb5d841aac58..1245e9f1ca81 100644
>> --- a/fs/xfs/libxfs/xfs_inode_fork.c
>> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
>> @@ -756,3 +756,22 @@ xfs_iext_count_may_overflow(
>>
>> return 0;
>> }
>> +
>> +int
>> +xfs_iext_count_upgrade(
>
> Hmm. I think the @nr_to_add parameter is supposed to be the one
> that caused xfs_iext_count_may_overflow to return -EFBIG, right?
>
> I was about to comment that it would be really helpful to have a comment
> above this function dropping a hint that this is the case:
>
> /*
> * Upgrade this inode's extent counter fields to be able to handle a
> * potential increase in the extent count by this number. Normally
> * this is the same quantity that caused xfs_iext_count_may_overflow to
> * return -EFBIG.
> */
> int
> xfs_iext_count_upgrade(...
>
> ...though I worry that this will cause fatal warnings about the
> otherwise unused parameter on non-debug kernels?
> I'm not sure why it
> matters that nr_to_add is constrained to a small value? Is it just to
> prevent obviously huge values? AFAICT all the current callers pass in
> small #defined integer values.
>
> That said, if the assert here is something Dave asked for in a previous
> review, then I won't stand in the way:
It was me who added the call to ASSERT() in V7 of the patchset. This was done
to catch any unintentional programming errors.
Also, I found out today that the Linux kernel build does not pass either
-Wunused-parameter or -Wextra options to the compiler
(https://www.spinics.net/lists/newbies/msg63816.html). Passing either of those
compiler options causes several "unused parameter" warnings across the kernel
source code. So we should never see warnings about nr_to_add being unused on
non-debug kernels.
>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
Thanks for the review! I will include the comment for xfs_iext_count_upgrade()
that you have suggested above and I will also replace the open code that you
have pointed out (in your next mail) with a call to
xfs_inode_has_large_extent_counts().
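Below is a sketch of the upgrade helper with Darrick's suggested comment and the debug-only bound check. Standard assert() stands in for XFS's ASSERT(). Like ASSERT() on non-debug builds, assert() compiles away under NDEBUG and leaves nr_to_add unused. Without -Wunused-parameter, that produces no warning. The flag value, `struct toy_inode` and the `toy_*` helpers are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DIFLAG2_NREXT64 (1ULL << 4)                 /* hypothetical flag bit */
#define TOY_UPGRADE_NR ((1ULL << 32) - (1ULL << 15))    /* value of XFS_MAX_EXTCNT_UPGRADE_NR */

struct toy_inode {
	uint64_t flags2;
	bool     fs_has_large_extent_counts;
};

static bool toy_inode_has_large_extent_counts(const struct toy_inode *ip)
{
	return (ip->flags2 & TOY_DIFLAG2_NREXT64) != 0;
}

/*
 * Upgrade this inode's extent counter fields to be able to handle a
 * potential increase in the extent count by this number. Normally
 * this is the same quantity that caused the may_overflow check to
 * return -EFBIG.
 */
static int toy_iext_count_upgrade(struct toy_inode *ip, uint64_t nr_to_add)
{
	assert(nr_to_add <= TOY_UPGRADE_NR); /* catches programming errors only */
	(void)nr_to_add;

	if (!ip->fs_has_large_extent_counts ||
	    toy_inode_has_large_extent_counts(ip))
		return -EFBIG;

	ip->flags2 |= TOY_DIFLAG2_NREXT64;
	return 0;
}
```

Calling the helper on an inode that is already upgraded returns -EFBIG. In that case the limit really has been reached, and the caller cancels the transaction.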
--
chandan
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
2022-04-07 1:29 ` Darrick J. Wong
2022-04-07 1:42 ` Dave Chinner
@ 2022-04-07 8:20 ` Chandan Babu R
1 sibling, 0 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-07 8:20 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, david, Dave Chinner
On 07 Apr 2022 at 06:59, Darrick J. Wong wrote:
> On Wed, Apr 06, 2022 at 11:49:02AM +0530, Chandan Babu R wrote:
>> The following changes are made to enable userspace to obtain 64-bit extent
>> counters,
>> 1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
>> xfs_bulkstat->bs_pad[] to hold 64-bit extent counter.
>> 2. Define the new flag XFS_BULK_IREQ_NREXT64 for userspace to indicate that
>> it is capable of receiving 64-bit extent counters.
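The patch's comment on the new XFS_BULK_IREQ_NREXT64 flag specifies how the two bulkstat counters are filled. Below is a minimal sketch of those rules. `struct toy_bulkstat` and `toy_fill_extents()` are hypothetical. The overflow threshold is the 31-bit small data fork limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DATA_SMALL ((uint64_t)((1ULL << 31) - 1))

/* Hypothetical cut-down bulkstat record: only the two counters. */
struct toy_bulkstat {
	uint32_t bs_extents;
	uint64_t bs_extents64;
};

static int toy_fill_extents(struct toy_bulkstat *buf, uint64_t nextents,
			    bool want_nrext64)
{
	if (want_nrext64) {
		/* New-style caller: 64-bit field only. */
		buf->bs_extents = 0;
		buf->bs_extents64 = nextents;
		return 0;
	}
	/* Old-style caller: 32-bit field only, and fail if it cannot fit. */
	buf->bs_extents64 = 0;
	if (nextents > TOY_DATA_SMALL) {
		buf->bs_extents = 0;
		return -EOVERFLOW;
	}
	buf->bs_extents = (uint32_t)nextents;
	return 0;
}
```

Returning -EOVERFLOW keeps an old program from silently seeing a truncated count. A new program opts in by setting the flag.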
>>
>> Reviewed-by: Dave Chinner <dchinner@redhat.com>
>> Suggested-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>> fs/xfs/libxfs/xfs_fs.h | 20 ++++++++++++++++----
>> fs/xfs/xfs_ioctl.c | 3 +++
>> fs/xfs/xfs_itable.c | 13 ++++++++++++-
>> fs/xfs/xfs_itable.h | 2 ++
>> 4 files changed, 33 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
>> index 1f7238db35cc..2a42bfb85c3b 100644
>> --- a/fs/xfs/libxfs/xfs_fs.h
>> +++ b/fs/xfs/libxfs/xfs_fs.h
>> @@ -378,7 +378,7 @@ struct xfs_bulkstat {
>> uint32_t bs_extsize_blks; /* extent size hint, blocks */
>>
>> uint32_t bs_nlink; /* number of links */
>> - uint32_t bs_extents; /* number of extents */
>> + uint32_t bs_extents; /* 32-bit data fork extent counter */
>> uint32_t bs_aextents; /* attribute number of extents */
>> uint16_t bs_version; /* structure version */
>> uint16_t bs_forkoff; /* inode fork offset in bytes */
>> @@ -387,8 +387,9 @@ struct xfs_bulkstat {
>> uint16_t bs_checked; /* checked inode metadata */
>> uint16_t bs_mode; /* type and mode */
>> uint16_t bs_pad2; /* zeroed */
>> + uint64_t bs_extents64; /* 64-bit data fork extent counter */
>>
>> - uint64_t bs_pad[7]; /* zeroed */
>> + uint64_t bs_pad[6]; /* zeroed */
>> };
>>
>> #define XFS_BULKSTAT_VERSION_V1 (1)
>> @@ -469,8 +470,19 @@ struct xfs_bulk_ireq {
>> */
>> #define XFS_BULK_IREQ_SPECIAL (1 << 1)
>>
>> -#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
>> - XFS_BULK_IREQ_SPECIAL)
>> +/*
>> + * Return data fork extent count via xfs_bulkstat->bs_extents64 field and assign
>> + * 0 to xfs_bulkstat->bs_extents when the flag is set. Otherwise, use
>> + * xfs_bulkstat->bs_extents for returning data fork extent count and set
>> + * xfs_bulkstat->bs_extents64 to 0. In the second case, return -EOVERFLOW and
>> + * assign 0 to xfs_bulkstat->bs_extents if data fork extent count is larger than
>> + * XFS_MAX_EXTCNT_DATA_FORK_OLD.
>> + */
>> +#define XFS_BULK_IREQ_NREXT64 (1 << 2)
>> +
>> +#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
>> + XFS_BULK_IREQ_SPECIAL | \
>> + XFS_BULK_IREQ_NREXT64)
>>
>> /* Operate on the root directory inode. */
>> #define XFS_BULK_IREQ_SPECIAL_ROOT (1)
>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>> index 83481005317a..e9eadc7337ce 100644
>> --- a/fs/xfs/xfs_ioctl.c
>> +++ b/fs/xfs/xfs_ioctl.c
>> @@ -813,6 +813,9 @@ xfs_bulk_ireq_setup(
>> if (XFS_INO_TO_AGNO(mp, breq->startino) >= mp->m_sb.sb_agcount)
>> return -ECANCELED;
>>
>> + if (hdr->flags & XFS_BULK_IREQ_NREXT64)
>> + breq->flags |= XFS_IBULK_NREXT64;
>> +
>> return 0;
>> }
>>
>> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
>> index 71ed4905f206..847f03f75a38 100644
>> --- a/fs/xfs/xfs_itable.c
>> +++ b/fs/xfs/xfs_itable.c
>> @@ -64,6 +64,7 @@ xfs_bulkstat_one_int(
>> struct xfs_inode *ip; /* incore inode pointer */
>> struct inode *inode;
>> struct xfs_bulkstat *buf = bc->buf;
>> + xfs_extnum_t nextents;
>> int error = -EINVAL;
>>
>> if (xfs_internal_inum(mp, ino))
>> @@ -102,7 +103,17 @@ xfs_bulkstat_one_int(
>>
>> buf->bs_xflags = xfs_ip2xflags(ip);
>> buf->bs_extsize_blks = ip->i_extsize;
>> - buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
>> +
>> + nextents = xfs_ifork_nextents(&ip->i_df);
>> + if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
>> + if (nextents > XFS_MAX_EXTCNT_DATA_FORK_SMALL)
>> + buf->bs_extents = XFS_MAX_EXTCNT_DATA_FORK_SMALL;
>> + else
>> + buf->bs_extents = nextents;
>
> buf->bs_extents = min(nextents, XFS_MAX_EXTCNT_DATA_FORK_SMALL); ?
>
This is much cleaner. I will apply the above suggestion.
>> + } else {
>> + buf->bs_extents64 = nextents;
>> + }
>> +
>> xfs_bulkstat_health(ip, buf);
>> buf->bs_aextents = xfs_ifork_nextents(ip->i_afp);
>> buf->bs_forkoff = XFS_IFORK_BOFF(ip);
>> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
>> index 2cf3872fcd2f..0150fd53d18e 100644
>> --- a/fs/xfs/xfs_itable.h
>> +++ b/fs/xfs/xfs_itable.h
>> @@ -19,6 +19,8 @@ struct xfs_ibulk {
>> /* Only iterate within the same AG as startino */
>> #define XFS_IBULK_SAME_AG (1 << 0)
>>
>> +#define XFS_IBULK_NREXT64 (1 << 1)
>
> Needs a comment here.
>
> /* Fill out the bs_extents64 field if set. */
> #define XFS_IBULK_NREXT64 (1U << 1)
>
> (Are we supposed to do "1U" now?)
>
I will apply the above suggestion.
--
chandan
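A minimal userspace sketch of the bs_extents/bs_extents64 selection
discussed above, with Darrick's min() suggestion applied. struct
bulkstat_sketch, min_extnum() and fill_extents() are hypothetical stand-ins
written for illustration; only the two field names, XFS_IBULK_NREXT64 and
XFS_MAX_EXTCNT_DATA_FORK_SMALL come from the patch. The sketch zeroes both
fields itself instead of assuming the caller passes in a zeroed buffer.

```c
#include <stdint.h>

typedef uint64_t xfs_extnum_t;

/* Values copied from the patch series. */
#define XFS_MAX_EXTCNT_DATA_FORK_SMALL	((xfs_extnum_t)((1ULL << 31) - 1))
#define XFS_IBULK_NREXT64		(1U << 1)

/* Hypothetical mirror of the two xfs_bulkstat fields involved. */
struct bulkstat_sketch {
	uint32_t bs_extents;	/* 32-bit data fork extent counter */
	uint64_t bs_extents64;	/* 64-bit data fork extent counter */
};

/* Local stand-in for the kernel's type-checked min(). */
static inline xfs_extnum_t min_extnum(xfs_extnum_t a, xfs_extnum_t b)
{
	return a < b ? a : b;
}

/*
 * Callers that set XFS_IBULK_NREXT64 get the full count in bs_extents64;
 * older callers get bs_extents, clamped to the old 31-bit limit.
 */
static void fill_extents(struct bulkstat_sketch *buf, unsigned int flags,
			 xfs_extnum_t nextents)
{
	buf->bs_extents = 0;
	buf->bs_extents64 = 0;

	if (!(flags & XFS_IBULK_NREXT64))
		buf->bs_extents = (uint32_t)min_extnum(nextents,
				XFS_MAX_EXTCNT_DATA_FORK_SMALL);
	else
		buf->bs_extents64 = nextents;
}
```

The clamp keeps the value representable in the 32-bit field, so old callers
see a saturated count rather than a silently truncated one.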
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
2022-04-07 8:18 ` Chandan Babu R
@ 2022-04-07 8:56 ` Dave Chinner
0 siblings, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2022-04-07 8:56 UTC (permalink / raw)
To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs
On Thu, Apr 07, 2022 at 01:48:17PM +0530, Chandan Babu R wrote:
> On 07 Apr 2022 at 08:14, Dave Chinner wrote:
> > On Wed, Apr 06, 2022 at 06:58:55PM -0700, Darrick J. Wong wrote:
> >> On Thu, Apr 07, 2022 at 11:05:44AM +1000, Dave Chinner wrote:
> >> > On Wed, Apr 06, 2022 at 11:48:56AM +0530, Chandan Babu R wrote:
> >> > > diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> >> > > index 453309fc85f2..7aabeccea9ab 100644
> >> > > --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> >> > > +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> >> > > @@ -611,7 +611,8 @@ xfs_bmbt_maxlevels_ondisk(void)
> >> > > minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
> >> > >
> >> > > /* One extra level for the inode root. */
> >> > > - return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
> >> > > + return xfs_btree_compute_maxlevels(minrecs,
> >> > > + XFS_MAX_EXTCNT_DATA_FORK_LARGE) + 1;
> >> > > }
> >> >
> >> > Why is this set to XFS_MAX_EXTCNT_DATA_FORK_LARGE rather than being
> >> > conditional xfs_has_large_extent_counts(mp)? i.e. if the feature bit
> >> > is not set, the maximum on-disk levels in the bmbt is determined by
> >> > XFS_MAX_EXTCNT_DATA_FORK_SMALL, not XFS_MAX_EXTCNT_DATA_FORK_LARGE.
> >>
> >> This function (and all the other _maxlevels_ondisk functions) compute
> >> the maximum possible btree height for any filesystem that we'd care to
> >> mount. This value is then passed to the functions that create the btree
> >> cursor caches, which is why this is independent of any xfs_mount.
> >>
> >> That said ... depending on how much this inflates the size of the bmbt
> >> cursor cache, I think we could create multiple slabs.
> >>
> >> > The "_ondisk" suffix implies that it has something to do with the
> >> > on-disk format of the filesystem, but AFAICT what we are calculating
> >> > here is a constant used for in-memory structure allocation? There
> >> > needs to be something explained/changed here, because this is
> >> > confusing...
> >>
> >> You suggested it. ;)
> >>
> >> https://lore.kernel.org/linux-xfs/20211013075743.GG2361455@dread.disaster.area/
> >
> > That doesn't mean it's perfect and can't be changed, nor that I
> > remember the exact details of something that happened 6 months ago.
> > Indeed, if I'm confused by it 6 months later, that tends to say it
> > wasn't a very good name... :)
> >
> > .... or that the missing context needs explaining so the reader is
> > reminded what the _ondisk() name means.
> >
> > i.e. the problem goes away with a simple comment:
> >
> > /*
> > * Calculate the maximum possible height of the btree that the
> > * on-disk format supports. This is used for sizing structures large
> > * enough to support every possible configuration of a filesystem
> > * that might get mounted.
> > */
> >
>
> If there are no objections, I will include the above comment as part of this
> patch.
Yes, that's fine, and with that added, you can add:
Reviewed-by: Dave Chinner <dchinner@redhat.com>
as well.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
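To illustrate what "the maximum possible height of the btree that the on-disk
format supports" means, here is a hedged standalone sketch of a worst-case
height calculation: every block is assumed to hold only its minimum number of
records, and each level divides (rounding up) by that minimum until a single
block remains. compute_maxlevels() and div_round_up() are hypothetical names;
this is a re-implementation of the idea for illustration, not the kernel's
xfs_btree_compute_maxlevels() source. Because the height only grows with the
record count, sizing caches with XFS_MAX_EXTCNT_DATA_FORK_LARGE also covers
filesystems without large extent counters.

```c
#include <stdint.h>

/* Round-up integer division for 64-bit counts. */
static inline uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Hypothetical worst-case height calculation. limits[0] is the minimum
 * number of records in a leaf block and limits[1] the minimum number of
 * keys/pointers in a node block; assuming every block is only minimally
 * full gives the tallest tree that can hold 'records' entries.
 */
static unsigned int compute_maxlevels(const unsigned int limits[2],
				      uint64_t records)
{
	uint64_t level_blocks = div_round_up(records, limits[0]);
	unsigned int height = 1;

	while (level_blocks > 1) {
		level_blocks = div_round_up(level_blocks, limits[1]);
		height++;
	}
	return height;
}
```

The bmbt caller in the patch then adds one more level for the btree root
that lives in the inode.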
^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH V9.1] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
2022-04-06 6:18 [PATCH V9 00/19] xfs: Extend per-inode extent counters Chandan Babu R
` (18 preceding siblings ...)
2022-04-06 6:19 ` [PATCH V9 19/19] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Chandan Babu R
@ 2022-04-09 13:23 ` Chandan Babu R
19 siblings, 0 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-09 13:23 UTC (permalink / raw)
To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner
This commit defines new macros to represent the maximum extent counts allowed
by filesystems with and without support for large per-inode extent counters.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_bmap.c | 9 ++++-----
fs/xfs/libxfs/xfs_bmap_btree.c | 9 +++++++--
fs/xfs/libxfs/xfs_format.h | 24 ++++++++++++++++++++++--
fs/xfs/libxfs/xfs_inode_buf.c | 4 +++-
fs/xfs/libxfs/xfs_inode_fork.c | 3 ++-
fs/xfs/libxfs/xfs_inode_fork.h | 21 +++++++++++++++++----
6 files changed, 55 insertions(+), 15 deletions(-)
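A hedged sketch of the limits defined below and of how
xfs_iext_max_nextents() picks between them. enum sketch_fork and
sketch_max_nextents() are hypothetical simplifications (the kernel uses
XFS_DATA_FORK/XFS_ATTR_FORK/XFS_COW_FORK and an ASSERT in the default case);
the four XFS_MAX_EXTCNT_* values are copied from the patch. DATA_EXTCNT_BITS
only restates the xfs_format.h comment's arithmetic: 2^63 bytes / 2^16 bytes
per block = 2^47 blocks, and 47 rounded up to a multiple of 8 is 48.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t xfs_extnum_t;

/* Limits copied from the patch. */
#define XFS_MAX_EXTCNT_DATA_FORK_LARGE	((xfs_extnum_t)((1ULL << 48) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_LARGE	((xfs_extnum_t)((1ULL << 32) - 1))
#define XFS_MAX_EXTCNT_DATA_FORK_SMALL	((xfs_extnum_t)((1ULL << 31) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_SMALL	((xfs_extnum_t)((1ULL << 15) - 1))

/* The comment's derivation: 2^63 bytes / 2^16 bytes per block = 2^47. */
#define MAX_FILE_SIZE_LOG	63
#define MAX_BLOCK_SIZE_LOG	16
#define MAX_BLOCKS_LOG		(MAX_FILE_SIZE_LOG - MAX_BLOCK_SIZE_LOG)
/* Round 47 up to the next multiple of 8 bits: 48. */
#define DATA_EXTCNT_BITS	(((MAX_BLOCKS_LOG + 7) / 8) * 8)

/* Hypothetical fork identifiers for this sketch. */
enum sketch_fork { SKETCH_DATA_FORK, SKETCH_ATTR_FORK, SKETCH_COW_FORK };

static xfs_extnum_t sketch_max_nextents(bool has_large_extent_counts,
					enum sketch_fork whichfork)
{
	switch (whichfork) {
	case SKETCH_DATA_FORK:
	case SKETCH_COW_FORK:
		if (has_large_extent_counts)
			return XFS_MAX_EXTCNT_DATA_FORK_LARGE;
		return XFS_MAX_EXTCNT_DATA_FORK_SMALL;
	case SKETCH_ATTR_FORK:
		if (has_large_extent_counts)
			return XFS_MAX_EXTCNT_ATTR_FORK_LARGE;
		return XFS_MAX_EXTCNT_ATTR_FORK_SMALL;
	}
	return 0;	/* unreachable for valid forks; the kernel ASSERTs */
}
```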
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index b317226fb4ba..1254d4d4821e 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -61,10 +61,8 @@ xfs_bmap_compute_maxlevels(
int sz; /* root block size */
/*
- * The maximum number of extents in a file, hence the maximum number of
- * leaf entries, is controlled by the size of the on-disk extent count,
- * either a signed 32-bit number for the data fork, or a signed 16-bit
- * number for the attr fork.
+ * The maximum number of extents in a fork, hence the maximum number of
+ * leaf entries, is controlled by the size of the on-disk extent count.
*
* Note that we can no longer assume that if we are in ATTR1 that the
* fork offset of all the inodes will be
@@ -74,7 +72,8 @@ xfs_bmap_compute_maxlevels(
* ATTR2 we have to assume the worst case scenario of a minimum size
* available.
*/
- maxleafents = xfs_iext_max_nextents(whichfork);
+ maxleafents = xfs_iext_max_nextents(xfs_has_large_extent_counts(mp),
+ whichfork);
if (whichfork == XFS_DATA_FORK)
sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
else
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 453309fc85f2..2b77d45c215f 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -597,7 +597,11 @@ xfs_bmbt_maxrecs(
return xfs_bmbt_block_maxrecs(blocklen, leaf);
}
-/* Compute the max possible height for block mapping btrees. */
+/*
+ * Calculate the maximum possible height of the btree that the on-disk format
+ * supports. This is used for sizing structures large enough to support every
+ * possible configuration of a filesystem that might get mounted.
+ */
unsigned int
xfs_bmbt_maxlevels_ondisk(void)
{
@@ -611,7 +615,8 @@ xfs_bmbt_maxlevels_ondisk(void)
minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
/* One extra level for the inode root. */
- return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
+ return xfs_btree_compute_maxlevels(minrecs,
+ XFS_MAX_EXTCNT_DATA_FORK_LARGE) + 1;
}
/*
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 57b24744a7c2..eb85bc9b229b 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -872,9 +872,29 @@ enum xfs_dinode_fmt {
/*
* Max values for extnum and aextnum.
+ *
+ * The original on-disk extent counts were held in signed fields, resulting in
+ * maximum extent counts of 2^31 and 2^15 for the data and attr forks
+ * respectively. Similarly the maximum extent length is limited to 2^21 blocks
+ * by the 21-bit wide blockcount field of a BMBT extent record.
+ *
+ * The newly introduced data fork extent counter can hold a 64-bit value,
+ * however the maximum number of extents in a file is also limited to 2^54
+ * extents by the 54-bit wide startoff field of a BMBT extent record.
+ *
+ * It is further limited by the maximum supported file size of 2^63
+ * *bytes*. This leads to a maximum extent count for maximally sized filesystem
+ * blocks (64kB) of:
+ *
+ * 2^63 bytes / 2^16 bytes per block = 2^47 blocks
+ *
+ * Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
+ * 2^48 was chosen as the maximum data fork extent count.
*/
-#define MAXEXTNUM ((xfs_extnum_t)0x7fffffff) /* signed int */
-#define MAXAEXTNUM ((xfs_aextnum_t)0x7fff) /* signed short */
+#define XFS_MAX_EXTCNT_DATA_FORK_LARGE ((xfs_extnum_t)((1ULL << 48) - 1))
+#define XFS_MAX_EXTCNT_ATTR_FORK_LARGE ((xfs_extnum_t)((1ULL << 32) - 1))
+#define XFS_MAX_EXTCNT_DATA_FORK_SMALL ((xfs_extnum_t)((1ULL << 31) - 1))
+#define XFS_MAX_EXTCNT_ATTR_FORK_SMALL ((xfs_extnum_t)((1ULL << 15) - 1))
/*
* Inode minimum and maximum sizes.
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index f0e063835318..e0d3140c3622 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -361,7 +361,9 @@ xfs_dinode_verify_fork(
return __this_address;
break;
case XFS_DINODE_FMT_BTREE:
- max_extents = xfs_iext_max_nextents(whichfork);
+ max_extents = xfs_iext_max_nextents(
+ xfs_dinode_has_large_extent_counts(dip),
+ whichfork);
if (di_nextents > max_extents)
return __this_address;
break;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 004b205d87b8..bb5d841aac58 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -744,7 +744,8 @@ xfs_iext_count_may_overflow(
if (whichfork == XFS_COW_FORK)
return 0;
- max_exts = xfs_iext_max_nextents(whichfork);
+ max_exts = xfs_iext_max_nextents(xfs_inode_has_large_extent_counts(ip),
+ whichfork);
if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
max_exts = 10;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 4a8b77d425df..967837a88860 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -133,12 +133,25 @@ static inline int8_t xfs_ifork_format(struct xfs_ifork *ifp)
return ifp->if_format;
}
-static inline xfs_extnum_t xfs_iext_max_nextents(int whichfork)
+static inline xfs_extnum_t xfs_iext_max_nextents(bool has_large_extent_counts,
+ int whichfork)
{
- if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK)
- return MAXEXTNUM;
+ switch (whichfork) {
+ case XFS_DATA_FORK:
+ case XFS_COW_FORK:
+ if (has_large_extent_counts)
+ return XFS_MAX_EXTCNT_DATA_FORK_LARGE;
+ return XFS_MAX_EXTCNT_DATA_FORK_SMALL;
+
+ case XFS_ATTR_FORK:
+ if (has_large_extent_counts)
+ return XFS_MAX_EXTCNT_ATTR_FORK_LARGE;
+ return XFS_MAX_EXTCNT_ATTR_FORK_SMALL;
- return MAXAEXTNUM;
+ default:
+ ASSERT(0);
+ return 0;
+ }
}
static inline xfs_extnum_t
--
2.30.2
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH V9.1] xfs: Directory's data fork extent counter can never overflow
2022-04-06 6:18 ` [PATCH V9 15/19] xfs: Directory's data fork extent counter can never overflow Chandan Babu R
2022-04-07 1:13 ` Dave Chinner
@ 2022-04-09 13:47 ` Chandan Babu R
2022-04-11 1:33 ` Dave Chinner
2022-04-11 22:07 ` Darrick J. Wong
2022-04-12 14:02 ` [PATCH V9.2] " Chandan Babu R
2 siblings, 2 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-09 13:47 UTC (permalink / raw)
To: linux-xfs; +Cc: djwong, david, chandan.babu
The maximum file size that can be represented by the data fork extent counter
in the worst case occurs when all extents are 1 block in length and each block
is 1KB in size.
With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing the maximum extent count and
with 1KB sized blocks, a file can reach up to,
(2^31) * 1KB = 2TB
This is much larger than the theoretical maximum size of a directory
i.e. XFS_DIR2_SPACE_SIZE * XFS_DIR2_MAX_SPACES = ~96GB.
Since a directory's inode can never overflow its data fork extent counter,
this commit removes all the overflow checks associated with
it. xfs_dinode_verify() now performs a rough sanity check: a directory's
data fork extent count must not exceed the number of filesystem blocks in
the ~96GB directory address space.
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_bmap.c | 20 -------------
fs/xfs/libxfs/xfs_da_btree.h | 1 +
fs/xfs/libxfs/xfs_da_format.h | 1 +
fs/xfs/libxfs/xfs_dir2.c | 2 ++
fs/xfs/libxfs/xfs_format.h | 13 ++++++++
fs/xfs/libxfs/xfs_inode_buf.c | 3 ++
fs/xfs/libxfs/xfs_inode_fork.h | 13 --------
fs/xfs/xfs_inode.c | 55 ++--------------------------------
fs/xfs/xfs_symlink.c | 5 ----
9 files changed, 22 insertions(+), 91 deletions(-)
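A sketch of the arithmetic in the commit message and of the new
xfs_dinode_verify() test, as standalone C. XFS_DIR2_DATA_ALIGN_LOG is assumed
to be 3 here (that is what makes each directory space 32GB, as the
xfs_da_format.h comment says); dir_max_extents(), worst_case_file_bytes() and
dir_nextents_invalid() are hypothetical helpers mirroring dageo->max_extents
and the verifier check.

```c
#include <stdint.h>

typedef uint64_t xfs_extnum_t;

/* Assumed value: gives 2^35 = 32GB per directory space. */
#define XFS_DIR2_DATA_ALIGN_LOG	3
#define XFS_DIR2_SPACE_SIZE	(1ULL << (32 + XFS_DIR2_DATA_ALIGN_LOG))
#define XFS_DIR2_MAX_SPACES	3
#define XFS_MAX_EXTCNT_DATA_FORK_SMALL	((xfs_extnum_t)((1ULL << 31) - 1))

/* Mirrors dageo->max_extents: blocks in the whole directory address space. */
static xfs_extnum_t dir_max_extents(unsigned int blocklog)
{
	return (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >> blocklog;
}

/* Largest file reachable when every extent is a single 2^blocklog block. */
static uint64_t worst_case_file_bytes(unsigned int blocklog)
{
	return (uint64_t)XFS_MAX_EXTCNT_DATA_FORK_SMALL << blocklog;
}

/* Shape of the new xfs_dinode_verify() check; nonzero means "reject". */
static int dir_nextents_invalid(xfs_extnum_t nextents, unsigned int blocklog)
{
	return nextents > dir_max_extents(blocklog);
}
```

With 1KB blocks the directory limit works out to 100663296 extents, well
below XFS_MAX_EXTCNT_DATA_FORK_SMALL (2147483647), which is why the overflow
checks removed below can never fire for directories.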
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 1254d4d4821e..4fab0c92ab70 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5147,26 +5147,6 @@ xfs_bmap_del_extent_real(
* Deleting the middle of the extent.
*/
- /*
- * For directories, -ENOSPC is returned since a directory entry
- * remove operation must not fail due to low extent count
- * availability. -ENOSPC will be handled by higher layers of XFS
- * by letting the corresponding empty Data/Free blocks to linger
- * until a future remove operation. Dabtree blocks would be
- * swapped with the last block in the leaf space and then the
- * new last block will be unmapped.
- *
- * The above logic also applies to the source directory entry of
- * a rename operation.
- */
- error = xfs_iext_count_may_overflow(ip, whichfork, 1);
- if (error) {
- ASSERT(S_ISDIR(VFS_I(ip)->i_mode) &&
- whichfork == XFS_DATA_FORK);
- error = -ENOSPC;
- goto done;
- }
-
old = got;
got.br_blockcount = del->br_startoff - got.br_startoff;
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 0faf7d9ac241..7f08f6de48bf 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -30,6 +30,7 @@ struct xfs_da_geometry {
unsigned int free_hdr_size; /* dir2 free header size */
unsigned int free_max_bests; /* # of bests entries in dir2 free */
xfs_dablk_t freeblk; /* blockno of free data v2 */
+ xfs_extnum_t max_extents; /* Max. extents in corresponding fork */
xfs_dir2_data_aoff_t data_first_offset;
size_t data_entry_offset;
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 5a49caa5c9df..95354b7ab7f5 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -277,6 +277,7 @@ xfs_dir2_sf_firstentry(struct xfs_dir2_sf_hdr *hdr)
* Directory address space divided into sections,
* spaces separated by 32GB.
*/
+#define XFS_DIR2_MAX_SPACES 3
#define XFS_DIR2_SPACE_SIZE (1ULL << (32 + XFS_DIR2_DATA_ALIGN_LOG))
#define XFS_DIR2_DATA_SPACE 0
#define XFS_DIR2_DATA_OFFSET (XFS_DIR2_DATA_SPACE * XFS_DIR2_SPACE_SIZE)
diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 5f1e4799e8fa..52c764ecc015 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -150,6 +150,8 @@ xfs_da_mount(
dageo->freeblk = xfs_dir2_byte_to_da(dageo, XFS_DIR2_FREE_OFFSET);
dageo->node_ents = (dageo->blksize - dageo->node_hdr_size) /
(uint)sizeof(xfs_da_node_entry_t);
+ dageo->max_extents = (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >>
+ mp->m_sb.sb_blocklog;
dageo->magicpct = (dageo->blksize * 37) / 100;
/* set up attribute geometry - single fsb only */
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 82b404c99b80..43de892d0305 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -915,6 +915,19 @@ enum xfs_dinode_fmt {
*
* Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
* 2^48 was chosen as the maximum data fork extent count.
+ *
+ * The maximum file size that can be represented by the data fork extent counter
+ * in the worst case occurs when all extents are 1 block in length and each
+ * block is 1KB in size.
+ *
+ * With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and
+ * with 1KB sized blocks, a file can reach up to,
+ * 1KB * (2^31) = 2TB
+ *
+ * This is much larger than the theoretical maximum size of a directory
+ * i.e. XFS_DIR2_SPACE_SIZE * XFS_DIR2_MAX_SPACES = ~96GB.
+ *
+ * Hence, a directory inode can never overflow its data fork extent counter.
*/
#define XFS_MAX_EXTCNT_DATA_FORK_LARGE ((xfs_extnum_t)((1ULL << 48) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_LARGE ((xfs_extnum_t)((1ULL << 32) - 1))
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index ee8d4eb7d048..74b82ec80f8e 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -491,6 +491,9 @@ xfs_dinode_verify(
if (mode && nextents + naextents > nblocks)
return __this_address;
+ if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents)
+ return __this_address;
+
if (mode && XFS_DFORK_BOFF(dip) > mp->m_sb.sb_inodesize)
return __this_address;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index fd5c3c2d77e0..6f9d69f8896e 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -39,19 +39,6 @@ struct xfs_ifork {
*/
#define XFS_IEXT_PUNCH_HOLE_CNT (1)
-/*
- * Directory entry addition can cause the following,
- * 1. Data block can be added/removed.
- * A new extent can cause extent count to increase by 1.
- * 2. Free disk block can be added/removed.
- * Same behaviour as described above for Data block.
- * 3. Dabtree blocks.
- * XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
- * extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
- */
-#define XFS_IEXT_DIR_MANIP_CNT(mp) \
- ((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
-
/*
* Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to
* be added. One extra extent for dabtree in case a local attr is
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index adc1355ce853..20f15a0393e1 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1024,11 +1024,6 @@ xfs_create(
xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
unlock_dp_on_error = true;
- error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto out_trans_cancel;
-
/*
* A newly created regular or special file just has one directory
* entry pointing to them, but a directory also the "." entry
@@ -1242,11 +1237,6 @@ xfs_link(
if (error)
goto std_return;
- error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto error_return;
-
/*
* If we are using project inheritance, we only allow hard link
* creation in our tree when the project IDs are the same; else
@@ -3210,35 +3200,6 @@ xfs_rename(
/*
* Check for expected errors before we dirty the transaction
* so we can return an error without a transaction abort.
- *
- * Extent count overflow check:
- *
- * From the perspective of src_dp, a rename operation is essentially a
- * directory entry remove operation. Hence the only place where we check
- * for extent count overflow for src_dp is in
- * xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns
- * -ENOSPC when it detects a possible extent count overflow and in
- * response, the higher layers of directory handling code do the
- * following:
- * 1. Data/Free blocks: XFS lets these blocks linger until a
- * future remove operation removes them.
- * 2. Dabtree blocks: XFS swaps the blocks with the last block in the
- * Leaf space and unmaps the last block.
- *
- * For target_dp, there are two cases depending on whether the
- * destination directory entry exists or not.
- *
- * When destination directory entry does not exist (i.e. target_ip ==
- * NULL), extent count overflow check is performed only when transaction
- * has a non-zero sized space reservation associated with it. With a
- * zero-sized space reservation, XFS allows a rename operation to
- * continue only when the directory has sufficient free space in its
- * data/leaf/free space blocks to hold the new entry.
- *
- * When destination directory entry exists (i.e. target_ip != NULL), all
- * we need to do is change the inode number associated with the already
- * existing entry. Hence there is no need to perform an extent count
- * overflow check.
*/
if (target_ip == NULL) {
/*
@@ -3249,12 +3210,6 @@ xfs_rename(
error = xfs_dir_canenter(tp, target_dp, target_name);
if (error)
goto out_trans_cancel;
- } else {
- error = xfs_iext_count_may_overflow(target_dp,
- XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto out_trans_cancel;
}
} else {
/*
@@ -3422,18 +3377,12 @@ xfs_rename(
* inode number of the whiteout inode rather than removing it
* altogether.
*/
- if (wip) {
+ if (wip)
error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
spaceres);
- } else {
- /*
- * NOTE: We don't need to check for extent count overflow here
- * because the dir remove name code will leave the dir block in
- * place if the extent count would overflow.
- */
+ else
error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
spaceres);
- }
if (error)
goto out_trans_cancel;
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index affbedf78160..4145ba872547 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -226,11 +226,6 @@ xfs_symlink(
goto out_trans_cancel;
}
- error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto out_trans_cancel;
-
/*
* Allocate an inode for the symlink.
*/
--
2.30.2
^ permalink raw reply related [flat|nested] 62+ messages in thread
* [PATCH V9.1] xfs: Conditionally upgrade existing inodes to use large extent counters
2022-04-06 6:19 ` [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters Chandan Babu R
2022-04-07 1:22 ` Dave Chinner
2022-04-07 1:46 ` Darrick J. Wong
@ 2022-04-09 13:52 ` Chandan Babu R
2022-04-11 1:34 ` Dave Chinner
2 siblings, 1 reply; 62+ messages in thread
From: Chandan Babu R @ 2022-04-09 13:52 UTC (permalink / raw)
To: linux-xfs; +Cc: djwong, david, chandan.babu
This commit enables upgrading existing inodes to use large extent counters
provided that the underlying filesystem's superblock has the large extent
counter feature enabled.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_attr.c | 3 +++
fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
fs/xfs/libxfs/xfs_format.h | 11 +++++++++++
fs/xfs/libxfs/xfs_inode_fork.c | 24 ++++++++++++++++++++++++
fs/xfs/libxfs/xfs_inode_fork.h | 2 ++
fs/xfs/xfs_bmap_item.c | 2 ++
fs/xfs/xfs_bmap_util.c | 13 +++++++++++++
fs/xfs/xfs_dquot.c | 3 +++
fs/xfs/xfs_iomap.c | 5 +++++
fs/xfs/xfs_reflink.c | 5 +++++
fs/xfs/xfs_rtalloc.c | 3 +++
11 files changed, 75 insertions(+), 2 deletions(-)
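The calling convention repeated in every hunk below (call
xfs_iext_count_may_overflow(), and on -EFBIG try xfs_iext_count_upgrade())
can be modelled in userspace as follows. struct mock_inode and the mock_*
and reserve_extents() helpers are hypothetical, cover only the data fork, and
leave out the XFS_ERRTAG_REDUCE_MAX_IEXTENTS error injection; the limit
macros and XFS_MAX_EXTCNT_UPGRADE_NR are copied from the patch, with a local
MIN() standing in for the kernel's min().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t xfs_extnum_t;

/* Limits copied from the patch series. */
#define XFS_MAX_EXTCNT_DATA_FORK_LARGE	((xfs_extnum_t)((1ULL << 48) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_LARGE	((xfs_extnum_t)((1ULL << 32) - 1))
#define XFS_MAX_EXTCNT_DATA_FORK_SMALL	((xfs_extnum_t)((1ULL << 31) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_SMALL	((xfs_extnum_t)((1ULL << 15) - 1))

/* Local MIN() standing in for the kernel's min(). */
#define MIN(a, b)	((a) < (b) ? (a) : (b))
#define XFS_MAX_EXTCNT_UPGRADE_NR \
	MIN(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
	    XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)

/* Hypothetical, data-fork-only stand-in for the mount + inode state. */
struct mock_inode {
	bool fs_has_nrext64;	/* superblock feature bit is set */
	bool inode_nrext64;	/* analogue of XFS_DIFLAG2_NREXT64 */
	xfs_extnum_t nextents;	/* current data fork extent count */
	bool core_logged;	/* analogue of xfs_trans_log_inode() */
};

static int mock_count_may_overflow(const struct mock_inode *ip,
				   unsigned int nr_to_add)
{
	xfs_extnum_t max = ip->inode_nrext64 ? XFS_MAX_EXTCNT_DATA_FORK_LARGE
					     : XFS_MAX_EXTCNT_DATA_FORK_SMALL;

	return ip->nextents + nr_to_add > max ? -EFBIG : 0;
}

static int mock_count_upgrade(struct mock_inode *ip, unsigned int nr_to_add)
{
	assert(nr_to_add <= XFS_MAX_EXTCNT_UPGRADE_NR);

	/* Feature unavailable, or inode already upgraded: keep -EFBIG. */
	if (!ip->fs_has_nrext64 || ip->inode_nrext64)
		return -EFBIG;

	ip->inode_nrext64 = true;
	ip->core_logged = true;
	return 0;
}

/* The check-then-upgrade pattern used by each caller in the patch. */
static int reserve_extents(struct mock_inode *ip, unsigned int nr_to_add)
{
	int error = mock_count_may_overflow(ip, nr_to_add);

	if (error == -EFBIG)
		error = mock_count_upgrade(ip, nr_to_add);
	return error;
}
```

XFS_MAX_EXTCNT_UPGRADE_NR evaluates to 2^32 - 2^15 = 4294934528, so the
assertion only trips on absurd nr_to_add values, which the xfs_format.h
comment treats as a sign of a code bug or corruption.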
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 23523b802539..2815cfbbae70 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -776,6 +776,9 @@ xfs_attr_set(
if (args->value || xfs_inode_hasattr(dp)) {
error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(args->trans, dp,
+ XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
if (error)
goto out_trans_cancel;
}
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 4fab0c92ab70..82d5467ddf2c 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4524,14 +4524,16 @@ xfs_bmapi_convert_delalloc(
return error;
xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, 0);
error = xfs_iext_count_may_overflow(ip, whichfork,
XFS_IEXT_ADD_NOSPLIT_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_ADD_NOSPLIT_CNT);
if (error)
goto out_trans_cancel;
- xfs_trans_ijoin(tp, ip, 0);
-
if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) ||
bma.got.br_startoff > offset_fsb) {
/*
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 43de892d0305..3beaa819b790 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -934,6 +934,17 @@ enum xfs_dinode_fmt {
#define XFS_MAX_EXTCNT_DATA_FORK_SMALL ((xfs_extnum_t)((1ULL << 31) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_SMALL ((xfs_extnum_t)((1ULL << 15) - 1))
+/*
+ * When we upgrade an inode to the large extent counts, the maximum value by
+ * which the extent count can increase is bound by the change in size of the
+ * on-disk field. No upgrade operation should ever be adding more than a few
+ * tens of extents, so if we get a really large value it is a sign of a code bug
+ * or corruption.
+ */
+#define XFS_MAX_EXTCNT_UPGRADE_NR \
+ min(XFS_MAX_EXTCNT_ATTR_FORK_LARGE - XFS_MAX_EXTCNT_ATTR_FORK_SMALL, \
+ XFS_MAX_EXTCNT_DATA_FORK_LARGE - XFS_MAX_EXTCNT_DATA_FORK_SMALL)
+
/*
* Inode minimum and maximum sizes.
*/
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index bb5d841aac58..9aee4a1e2fe9 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -756,3 +756,27 @@ xfs_iext_count_may_overflow(
return 0;
}
+
+/*
+ * Upgrade this inode's extent counter fields to be able to handle a potential
+ * increase in the extent count by nr_to_add. Normally this is the same
+ * quantity that caused xfs_iext_count_may_overflow() to return -EFBIG.
+ */
+int
+xfs_iext_count_upgrade(
+ struct xfs_trans *tp,
+ struct xfs_inode *ip,
+ uint nr_to_add)
+{
+ ASSERT(nr_to_add <= XFS_MAX_EXTCNT_UPGRADE_NR);
+
+ if (!xfs_has_large_extent_counts(ip->i_mount) ||
+ xfs_inode_has_large_extent_counts(ip) ||
+ XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
+ return -EFBIG;
+
+ ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+ return 0;
+}
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 6f9d69f8896e..4f68c1f20beb 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -275,6 +275,8 @@ int xfs_ifork_verify_local_data(struct xfs_inode *ip);
int xfs_ifork_verify_local_attr(struct xfs_inode *ip);
int xfs_iext_count_may_overflow(struct xfs_inode *ip, int whichfork,
int nr_to_add);
+int xfs_iext_count_upgrade(struct xfs_trans *tp, struct xfs_inode *ip,
+ uint nr_to_add);
/* returns true if the fork has extents but they are not read in yet. */
static inline bool xfs_need_iread_extents(struct xfs_ifork *ifp)
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 761dde155099..593ac29cffc7 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -506,6 +506,8 @@ xfs_bui_item_recover(
iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
error = xfs_iext_count_may_overflow(ip, whichfork, iext_delta);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip, iext_delta);
if (error)
goto err_cancel;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 18c1b99311a8..52be58372c63 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -859,6 +859,9 @@ xfs_alloc_file_space(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_ADD_NOSPLIT_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_ADD_NOSPLIT_CNT);
if (error)
goto error;
@@ -914,6 +917,8 @@ xfs_unmap_extent(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_PUNCH_HOLE_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
if (error)
goto out_trans_cancel;
@@ -1195,6 +1200,8 @@ xfs_insert_file_space(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_PUNCH_HOLE_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
if (error)
goto out_trans_cancel;
@@ -1423,6 +1430,9 @@ xfs_swap_extent_rmap(
error = xfs_iext_count_may_overflow(ip,
XFS_DATA_FORK,
XFS_IEXT_SWAP_RMAP_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_SWAP_RMAP_CNT);
if (error)
goto out;
}
@@ -1431,6 +1441,9 @@ xfs_swap_extent_rmap(
error = xfs_iext_count_may_overflow(tip,
XFS_DATA_FORK,
XFS_IEXT_SWAP_RMAP_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, tip,
+ XFS_IEXT_SWAP_RMAP_CNT);
if (error)
goto out;
}
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 5afedcbc78c7..eb211e0ede5d 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -322,6 +322,9 @@ xfs_dquot_disk_alloc(
error = xfs_iext_count_may_overflow(quotip, XFS_DATA_FORK,
XFS_IEXT_ADD_NOSPLIT_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, quotip,
+ XFS_IEXT_ADD_NOSPLIT_CNT);
if (error)
goto err_cancel;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 87e1cf5060bd..5a393259a3a3 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -251,6 +251,8 @@ xfs_iomap_write_direct(
return error;
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, nr_exts);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip, nr_exts);
if (error)
goto out_trans_cancel;
@@ -555,6 +557,9 @@ xfs_iomap_write_unwritten(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_WRITE_UNWRITTEN_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_WRITE_UNWRITTEN_CNT);
if (error)
goto error_on_bmapi_transaction;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 54e68e5693fd..1ae6d3434ad2 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -620,6 +620,9 @@ xfs_reflink_end_cow_extent(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_REFLINK_END_COW_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_REFLINK_END_COW_CNT);
if (error)
goto out_cancel;
@@ -1121,6 +1124,8 @@ xfs_reflink_remap_extent(
++iext_delta;
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, iext_delta);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip, iext_delta);
if (error)
goto out_cancel;
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index b8c79ee791af..3e587e85d5bf 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -806,6 +806,9 @@ xfs_growfs_rt_alloc(
error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
XFS_IEXT_ADD_NOSPLIT_CNT);
+ if (error == -EFBIG)
+ error = xfs_iext_count_upgrade(tp, ip,
+ XFS_IEXT_ADD_NOSPLIT_CNT);
if (error)
goto out_trans_cancel;
--
2.30.2
^ permalink raw reply related [flat|nested] 62+ messages in thread
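Every hunk above uses the same two-step pattern. xfs_iext_count_may_overflow() returns -EFBIG when the extra extents might not fit, and only then is xfs_iext_count_upgrade() asked to switch the inode to large extent counters. The sketch below is a simplified user-space model of that control flow, not the kernel code. The model_* types and helpers are hypothetical, the limits mirror XFS_MAX_EXTCNT_DATA_FORK_{SMALL,LARGE}, and transactions, error injection and the nr_to_add assertion in the upgrade helper are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits mirroring XFS_MAX_EXTCNT_DATA_FORK_{SMALL,LARGE}. */
#define MODEL_MAX_EXTCNT_SMALL ((uint64_t)((1ULL << 31) - 1))
#define MODEL_MAX_EXTCNT_LARGE ((uint64_t)((1ULL << 48) - 1))

/* Hypothetical stand-ins for the mount and incore inode state. */
struct model_mount {
	bool has_large_extent_counts;	/* superblock feature bit */
};

struct model_inode {
	struct model_mount *mp;
	bool nrext64;			/* per-inode large counter flag */
	uint64_t nextents;		/* current data fork extent count */
};

static uint64_t model_max_extents(const struct model_inode *ip)
{
	return ip->nrext64 ? MODEL_MAX_EXTCNT_LARGE : MODEL_MAX_EXTCNT_SMALL;
}

/* Like xfs_iext_count_may_overflow(): -EFBIG if nr_to_add may not fit. */
static int model_count_may_overflow(const struct model_inode *ip,
				    unsigned int nr_to_add)
{
	uint64_t nr = ip->nextents + nr_to_add;

	if (nr < ip->nextents || nr > model_max_extents(ip))
		return -EFBIG;
	return 0;
}

/* Like xfs_iext_count_upgrade(): switch to large counters if allowed. */
static int model_count_upgrade(struct model_inode *ip)
{
	if (!ip->mp->has_large_extent_counts || ip->nrext64)
		return -EFBIG;
	ip->nrext64 = true;	/* the kernel also logs the inode core here */
	return 0;
}

/* The caller pattern repeated in every hunk above. */
static int model_reserve_extents(struct model_inode *ip, unsigned int nr)
{
	int error = model_count_may_overflow(ip, nr);

	if (error == -EFBIG)
		error = model_count_upgrade(ip);
	return error;
}
```

In this model an inode at the small limit is upgraded only when the filesystem has the large extent counter feature; otherwise the caller still sees -EFBIG and cancels its transaction.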
* [PATCH V9.1] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
2022-04-06 6:19 ` [PATCH V9 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters Chandan Babu R
2022-04-07 1:29 ` Darrick J. Wong
@ 2022-04-09 13:57 ` Chandan Babu R
2022-04-11 2:56 ` Dave Chinner
2022-04-13 2:57 ` Darrick J. Wong
1 sibling, 2 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-09 13:57 UTC (permalink / raw)
To: linux-xfs; +Cc: djwong, david, chandan.babu
The following changes are made to enable userspace to obtain 64-bit extent
counters:
1. Carve out a new 64-bit field, xfs_bulkstat->bs_extents64, from
xfs_bulkstat->bs_pad[] to hold the 64-bit extent counter.
2. Define the new flag XFS_BULK_IREQ_NREXT64 for userspace to indicate that
it is capable of receiving 64-bit extent counters.
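The selection rule implemented by the xfs_itable.c hunk below can be modelled in a few lines. This is a hypothetical sketch: struct bstat_counts carries only the two counter fields of struct xfs_bulkstat, and fill_extent_counts() zeroes both fields explicitly, an assumption made for clarity. As in the patch code, a caller that does not pass the flag gets the count capped at XFS_MAX_EXTCNT_DATA_FORK_SMALL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors XFS_MAX_EXTCNT_DATA_FORK_SMALL, i.e. 2^31 - 1. */
#define MODEL_MAX_EXTCNT_SMALL ((uint64_t)((1ULL << 31) - 1))

/* Hypothetical subset of struct xfs_bulkstat: only the two counters. */
struct bstat_counts {
	uint32_t bs_extents;	/* 32-bit data fork extent counter */
	uint64_t bs_extents64;	/* 64-bit data fork extent counter */
};

/*
 * want_nrext64 models XFS_IBULK_NREXT64 being set in breq->flags.
 * Only one of the two fields receives the count; the other stays 0.
 */
static void fill_extent_counts(struct bstat_counts *buf, uint64_t nextents,
			       bool want_nrext64)
{
	buf->bs_extents = 0;
	buf->bs_extents64 = 0;

	if (want_nrext64)
		buf->bs_extents64 = nextents;
	else if (nextents > MODEL_MAX_EXTCNT_SMALL)
		buf->bs_extents = (uint32_t)MODEL_MAX_EXTCNT_SMALL;	/* cap */
	else
		buf->bs_extents = (uint32_t)nextents;
}
```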
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_fs.h | 20 ++++++++++++++++----
fs/xfs/xfs_ioctl.c | 3 +++
fs/xfs/xfs_itable.c | 9 ++++++++-
fs/xfs/xfs_itable.h | 3 +++
4 files changed, 30 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 1f7238db35cc..2a42bfb85c3b 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -378,7 +378,7 @@ struct xfs_bulkstat {
uint32_t bs_extsize_blks; /* extent size hint, blocks */
uint32_t bs_nlink; /* number of links */
- uint32_t bs_extents; /* number of extents */
+ uint32_t bs_extents; /* 32-bit data fork extent counter */
uint32_t bs_aextents; /* attribute number of extents */
uint16_t bs_version; /* structure version */
uint16_t bs_forkoff; /* inode fork offset in bytes */
@@ -387,8 +387,9 @@ struct xfs_bulkstat {
uint16_t bs_checked; /* checked inode metadata */
uint16_t bs_mode; /* type and mode */
uint16_t bs_pad2; /* zeroed */
+ uint64_t bs_extents64; /* 64-bit data fork extent counter */
- uint64_t bs_pad[7]; /* zeroed */
+ uint64_t bs_pad[6]; /* zeroed */
};
#define XFS_BULKSTAT_VERSION_V1 (1)
@@ -469,8 +470,19 @@ struct xfs_bulk_ireq {
*/
#define XFS_BULK_IREQ_SPECIAL (1 << 1)
-#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
- XFS_BULK_IREQ_SPECIAL)
+/*
+ * Return data fork extent count via xfs_bulkstat->bs_extents64 field and assign
+ * 0 to xfs_bulkstat->bs_extents when the flag is set. Otherwise, use
+ * xfs_bulkstat->bs_extents for returning data fork extent count and set
+ * xfs_bulkstat->bs_extents64 to 0. In the second case, the value stored in
+ * xfs_bulkstat->bs_extents is capped at XFS_MAX_EXTCNT_DATA_FORK_SMALL if the
+ * data fork extent count is larger than that.
+ */
+#define XFS_BULK_IREQ_NREXT64 (1 << 2)
+
+#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
+ XFS_BULK_IREQ_SPECIAL | \
+ XFS_BULK_IREQ_NREXT64)
/* Operate on the root directory inode. */
#define XFS_BULK_IREQ_SPECIAL_ROOT (1)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 83481005317a..e9eadc7337ce 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -813,6 +813,9 @@ xfs_bulk_ireq_setup(
if (XFS_INO_TO_AGNO(mp, breq->startino) >= mp->m_sb.sb_agcount)
return -ECANCELED;
+ if (hdr->flags & XFS_BULK_IREQ_NREXT64)
+ breq->flags |= XFS_IBULK_NREXT64;
+
return 0;
}
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 71ed4905f206..f74c9fff72bb 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -64,6 +64,7 @@ xfs_bulkstat_one_int(
struct xfs_inode *ip; /* incore inode pointer */
struct inode *inode;
struct xfs_bulkstat *buf = bc->buf;
+ xfs_extnum_t nextents;
int error = -EINVAL;
if (xfs_internal_inum(mp, ino))
@@ -102,7 +103,13 @@ xfs_bulkstat_one_int(
buf->bs_xflags = xfs_ip2xflags(ip);
buf->bs_extsize_blks = ip->i_extsize;
- buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
+
+ nextents = xfs_ifork_nextents(&ip->i_df);
+ if (!(bc->breq->flags & XFS_IBULK_NREXT64))
+ buf->bs_extents = min(nextents, XFS_MAX_EXTCNT_DATA_FORK_SMALL);
+ else
+ buf->bs_extents64 = nextents;
+
xfs_bulkstat_health(ip, buf);
buf->bs_aextents = xfs_ifork_nextents(ip->i_afp);
buf->bs_forkoff = XFS_IFORK_BOFF(ip);
diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
index 5ee1d3f44ce9..e2d0eba43f35 100644
--- a/fs/xfs/xfs_itable.h
+++ b/fs/xfs/xfs_itable.h
@@ -19,6 +19,9 @@ struct xfs_ibulk {
/* Only iterate within the same AG as startino */
#define XFS_IBULK_SAME_AG (1U << 0)
+/* Fill out the bs_extents64 field if set. */
+#define XFS_IBULK_NREXT64 (1U << 1)
+
/*
* Advance the user buffer pointer by one record of the given size. If the
* buffer is now full, return the appropriate error code.
--
2.30.2
^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH V9.1] xfs: Directory's data fork extent counter can never overflow
2022-04-09 13:47 ` [PATCH V9.1] " Chandan Babu R
@ 2022-04-11 1:33 ` Dave Chinner
2022-04-11 22:07 ` Darrick J. Wong
1 sibling, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2022-04-11 1:33 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, djwong
On Sat, Apr 09, 2022 at 07:17:21PM +0530, Chandan Babu R wrote:
> The maximum file size that can be represented by the data fork extent counter
> in the worst case occurs when all extents are 1 block in length and each block
> is 1KB in size.
>
> With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
> 1KB sized blocks, a file can reach up to,
> (2^31) * 1KB = 2TB
>
> This is much larger than the theoretical maximum size of a directory
> i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
>
> Since a directory's inode can never overflow its data fork extent counter,
> this commit removes all the overflow checks associated with
> it. xfs_dinode_verify() now performs a rough check to verify if a directory's
> data fork is larger than 96GB.
>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Looks good now.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9.1] xfs: Conditionally upgrade existing inodes to use large extent counters
2022-04-09 13:52 ` [PATCH V9.1] " Chandan Babu R
@ 2022-04-11 1:34 ` Dave Chinner
0 siblings, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2022-04-11 1:34 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, djwong
On Sat, Apr 09, 2022 at 07:22:51PM +0530, Chandan Babu R wrote:
> This commit enables upgrading existing inodes to use large extent counters
> provided that the underlying filesystem's superblock has the large extent
> counter feature enabled.
>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
LGTM.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9.1] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
2022-04-09 13:57 ` [PATCH V9.1] " Chandan Babu R
@ 2022-04-11 2:56 ` Dave Chinner
2022-04-13 2:57 ` Darrick J. Wong
1 sibling, 0 replies; 62+ messages in thread
From: Dave Chinner @ 2022-04-11 2:56 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, djwong
On Sat, Apr 09, 2022 at 07:27:09PM +0530, Chandan Babu R wrote:
> The following changes are made to enable userspace to obtain 64-bit extent
> counters,
> 1. Carve out a new 64-bit field, xfs_bulkstat->bs_extents64, from
> xfs_bulkstat->bs_pad[] to hold the 64-bit extent counter.
> 2. Define the new flag XFS_BULK_IREQ_NREXT64 for userspace to indicate that
> it is capable of receiving 64-bit extent counters.
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Suggested-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
....
> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> index 5ee1d3f44ce9..e2d0eba43f35 100644
> --- a/fs/xfs/xfs_itable.h
> +++ b/fs/xfs/xfs_itable.h
> @@ -19,6 +19,9 @@ struct xfs_ibulk {
> /* Only iterate within the same AG as startino */
> #define XFS_IBULK_SAME_AG (1U << 0)
This doesn't apply - I guess you modified patch 17 to make this 1U
instead of 1 and then didn't resend it.
I'll clean it up....
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9.1] xfs: Directory's data fork extent counter can never overflow
2022-04-09 13:47 ` [PATCH V9.1] " Chandan Babu R
2022-04-11 1:33 ` Dave Chinner
@ 2022-04-11 22:07 ` Darrick J. Wong
2022-04-12 3:39 ` Chandan Babu R
1 sibling, 1 reply; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-11 22:07 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, david
On Sat, Apr 09, 2022 at 07:17:21PM +0530, Chandan Babu R wrote:
> The maximum file size that can be represented by the data fork extent counter
> in the worst case occurs when all extents are 1 block in length and each block
> is 1KB in size.
>
> With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
> 1KB sized blocks, a file can reach up to,
> (2^31) * 1KB = 2TB
>
> This is much larger than the theoretical maximum size of a directory
> i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
>
> Since a directory's inode can never overflow its data fork extent counter,
> this commit removes all the overflow checks associated with
> it. xfs_dinode_verify() now performs a rough check to verify if a directory's
> data fork is larger than 96GB.
>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 20 -------------
> fs/xfs/libxfs/xfs_da_btree.h | 1 +
> fs/xfs/libxfs/xfs_da_format.h | 1 +
> fs/xfs/libxfs/xfs_dir2.c | 2 ++
> fs/xfs/libxfs/xfs_format.h | 13 ++++++++
> fs/xfs/libxfs/xfs_inode_buf.c | 3 ++
> fs/xfs/libxfs/xfs_inode_fork.h | 13 --------
> fs/xfs/xfs_inode.c | 55 ++--------------------------------
> fs/xfs/xfs_symlink.c | 5 ----
> 9 files changed, 22 insertions(+), 91 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 1254d4d4821e..4fab0c92ab70 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5147,26 +5147,6 @@ xfs_bmap_del_extent_real(
> * Deleting the middle of the extent.
> */
>
> - /*
> - * For directories, -ENOSPC is returned since a directory entry
> - * remove operation must not fail due to low extent count
> - * availability. -ENOSPC will be handled by higher layers of XFS
> - * by letting the corresponding empty Data/Free blocks to linger
> - * until a future remove operation. Dabtree blocks would be
> - * swapped with the last block in the leaf space and then the
> - * new last block will be unmapped.
> - *
> - * The above logic also applies to the source directory entry of
> - * a rename operation.
> - */
> - error = xfs_iext_count_may_overflow(ip, whichfork, 1);
> - if (error) {
> - ASSERT(S_ISDIR(VFS_I(ip)->i_mode) &&
> - whichfork == XFS_DATA_FORK);
> - error = -ENOSPC;
> - goto done;
> - }
> -
> old = got;
>
> got.br_blockcount = del->br_startoff - got.br_startoff;
> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> index 0faf7d9ac241..7f08f6de48bf 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.h
> +++ b/fs/xfs/libxfs/xfs_da_btree.h
> @@ -30,6 +30,7 @@ struct xfs_da_geometry {
> unsigned int free_hdr_size; /* dir2 free header size */
> unsigned int free_max_bests; /* # of bests entries in dir2 free */
> xfs_dablk_t freeblk; /* blockno of free data v2 */
> + xfs_extnum_t max_extents; /* Max. extents in corresponding fork */
>
> xfs_dir2_data_aoff_t data_first_offset;
> size_t data_entry_offset;
> diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
> index 5a49caa5c9df..95354b7ab7f5 100644
> --- a/fs/xfs/libxfs/xfs_da_format.h
> +++ b/fs/xfs/libxfs/xfs_da_format.h
> @@ -277,6 +277,7 @@ xfs_dir2_sf_firstentry(struct xfs_dir2_sf_hdr *hdr)
> * Directory address space divided into sections,
> * spaces separated by 32GB.
> */
> +#define XFS_DIR2_MAX_SPACES 3
> #define XFS_DIR2_SPACE_SIZE (1ULL << (32 + XFS_DIR2_DATA_ALIGN_LOG))
> #define XFS_DIR2_DATA_SPACE 0
> #define XFS_DIR2_DATA_OFFSET (XFS_DIR2_DATA_SPACE * XFS_DIR2_SPACE_SIZE)
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index 5f1e4799e8fa..52c764ecc015 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -150,6 +150,8 @@ xfs_da_mount(
> dageo->freeblk = xfs_dir2_byte_to_da(dageo, XFS_DIR2_FREE_OFFSET);
> dageo->node_ents = (dageo->blksize - dageo->node_hdr_size) /
> (uint)sizeof(xfs_da_node_entry_t);
> + dageo->max_extents = (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >>
> + mp->m_sb.sb_blocklog;
> dageo->magicpct = (dageo->blksize * 37) / 100;
>
> /* set up attribute geometry - single fsb only */
Shouldn't we set up mp->m_attr_geo.max_extents too? Even if all we do
is set it to XFS_MAX_EXTCNT_ATTR_FORK_{SMALL,LARGE}? I get that nothing
will use it anywhere, but we shouldn't leave uninitialized geometry
structure variables around.
--D
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 82b404c99b80..43de892d0305 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -915,6 +915,19 @@ enum xfs_dinode_fmt {
> *
> * Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
> * 2^48 was chosen as the maximum data fork extent count.
> + *
> + * The maximum file size that can be represented by the data fork extent counter
> + * in the worst case occurs when all extents are 1 block in length and each
> + * block is 1KB in size.
> + *
> + * With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and
> + * with 1KB sized blocks, a file can reach up to,
> + * 1KB * (2^31) = 2TB
> + *
> + * This is much larger than the theoretical maximum size of a directory
> + * i.e. XFS_DIR2_SPACE_SIZE * XFS_DIR2_MAX_SPACES = ~96GB.
> + *
> + * Hence, a directory inode can never overflow its data fork extent counter.
> */
> #define XFS_MAX_EXTCNT_DATA_FORK_LARGE ((xfs_extnum_t)((1ULL << 48) - 1))
> #define XFS_MAX_EXTCNT_ATTR_FORK_LARGE ((xfs_extnum_t)((1ULL << 32) - 1))
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index ee8d4eb7d048..74b82ec80f8e 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -491,6 +491,9 @@ xfs_dinode_verify(
> if (mode && nextents + naextents > nblocks)
> return __this_address;
>
> + if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents)
> + return __this_address;
> +
> if (mode && XFS_DFORK_BOFF(dip) > mp->m_sb.sb_inodesize)
> return __this_address;
>
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> index fd5c3c2d77e0..6f9d69f8896e 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.h
> +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> @@ -39,19 +39,6 @@ struct xfs_ifork {
> */
> #define XFS_IEXT_PUNCH_HOLE_CNT (1)
>
> -/*
> - * Directory entry addition can cause the following,
> - * 1. Data block can be added/removed.
> - * A new extent can cause extent count to increase by 1.
> - * 2. Free disk block can be added/removed.
> - * Same behaviour as described above for Data block.
> - * 3. Dabtree blocks.
> - * XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
> - * extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
> - */
> -#define XFS_IEXT_DIR_MANIP_CNT(mp) \
> - ((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
> -
> /*
> * Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to
> * be added. One extra extent for dabtree in case a local attr is
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index adc1355ce853..20f15a0393e1 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1024,11 +1024,6 @@ xfs_create(
> xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
> unlock_dp_on_error = true;
>
> - error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> - XFS_IEXT_DIR_MANIP_CNT(mp));
> - if (error)
> - goto out_trans_cancel;
> -
> /*
> * A newly created regular or special file just has one directory
> * entry pointing to them, but a directory also the "." entry
> @@ -1242,11 +1237,6 @@ xfs_link(
> if (error)
> goto std_return;
>
> - error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
> - XFS_IEXT_DIR_MANIP_CNT(mp));
> - if (error)
> - goto error_return;
> -
> /*
> * If we are using project inheritance, we only allow hard link
> * creation in our tree when the project IDs are the same; else
> @@ -3210,35 +3200,6 @@ xfs_rename(
> /*
> * Check for expected errors before we dirty the transaction
> * so we can return an error without a transaction abort.
> - *
> - * Extent count overflow check:
> - *
> - * From the perspective of src_dp, a rename operation is essentially a
> - * directory entry remove operation. Hence the only place where we check
> - * for extent count overflow for src_dp is in
> - * xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns
> - * -ENOSPC when it detects a possible extent count overflow and in
> - * response, the higher layers of directory handling code do the
> - * following:
> - * 1. Data/Free blocks: XFS lets these blocks linger until a
> - * future remove operation removes them.
> - * 2. Dabtree blocks: XFS swaps the blocks with the last block in the
> - * Leaf space and unmaps the last block.
> - *
> - * For target_dp, there are two cases depending on whether the
> - * destination directory entry exists or not.
> - *
> - * When destination directory entry does not exist (i.e. target_ip ==
> - * NULL), extent count overflow check is performed only when transaction
> - * has a non-zero sized space reservation associated with it. With a
> - * zero-sized space reservation, XFS allows a rename operation to
> - * continue only when the directory has sufficient free space in its
> - * data/leaf/free space blocks to hold the new entry.
> - *
> - * When destination directory entry exists (i.e. target_ip != NULL), all
> - * we need to do is change the inode number associated with the already
> - * existing entry. Hence there is no need to perform an extent count
> - * overflow check.
> */
> if (target_ip == NULL) {
> /*
> @@ -3249,12 +3210,6 @@ xfs_rename(
> error = xfs_dir_canenter(tp, target_dp, target_name);
> if (error)
> goto out_trans_cancel;
> - } else {
> - error = xfs_iext_count_may_overflow(target_dp,
> - XFS_DATA_FORK,
> - XFS_IEXT_DIR_MANIP_CNT(mp));
> - if (error)
> - goto out_trans_cancel;
> }
> } else {
> /*
> @@ -3422,18 +3377,12 @@ xfs_rename(
> * inode number of the whiteout inode rather than removing it
> * altogether.
> */
> - if (wip) {
> + if (wip)
> error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
> spaceres);
> - } else {
> - /*
> - * NOTE: We don't need to check for extent count overflow here
> - * because the dir remove name code will leave the dir block in
> - * place if the extent count would overflow.
> - */
> + else
> error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
> spaceres);
> - }
>
> if (error)
> goto out_trans_cancel;
> diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> index affbedf78160..4145ba872547 100644
> --- a/fs/xfs/xfs_symlink.c
> +++ b/fs/xfs/xfs_symlink.c
> @@ -226,11 +226,6 @@ xfs_symlink(
> goto out_trans_cancel;
> }
>
> - error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> - XFS_IEXT_DIR_MANIP_CNT(mp));
> - if (error)
> - goto out_trans_cancel;
> -
> /*
> * Allocate an inode for the symlink.
> */
> --
> 2.30.2
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH V9.1] xfs: Directory's data fork extent counter can never overflow
2022-04-11 22:07 ` Darrick J. Wong
@ 2022-04-12 3:39 ` Chandan Babu R
0 siblings, 0 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-12 3:39 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, david
On 12 Apr 2022 at 03:37, Darrick J. Wong wrote:
> On Sat, Apr 09, 2022 at 07:17:21PM +0530, Chandan Babu R wrote:
>> The maximum file size that can be represented by the data fork extent counter
>> in the worst case occurs when all extents are 1 block in length and each block
>> is 1KB in size.
>>
>> With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
>> 1KB sized blocks, a file can reach up to,
>> (2^31) * 1KB = 2TB
>>
>> This is much larger than the theoretical maximum size of a directory
>> i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
>>
>> Since a directory's inode can never overflow its data fork extent counter,
>> this commit removes all the overflow checks associated with
>> it. xfs_dinode_verify() now performs a rough check to verify if a directory's
>> data fork is larger than 96GB.
>>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>> fs/xfs/libxfs/xfs_bmap.c | 20 -------------
>> fs/xfs/libxfs/xfs_da_btree.h | 1 +
>> fs/xfs/libxfs/xfs_da_format.h | 1 +
>> fs/xfs/libxfs/xfs_dir2.c | 2 ++
>> fs/xfs/libxfs/xfs_format.h | 13 ++++++++
>> fs/xfs/libxfs/xfs_inode_buf.c | 3 ++
>> fs/xfs/libxfs/xfs_inode_fork.h | 13 --------
>> fs/xfs/xfs_inode.c | 55 ++--------------------------------
>> fs/xfs/xfs_symlink.c | 5 ----
>> 9 files changed, 22 insertions(+), 91 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index 1254d4d4821e..4fab0c92ab70 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -5147,26 +5147,6 @@ xfs_bmap_del_extent_real(
>> * Deleting the middle of the extent.
>> */
>>
>> - /*
>> - * For directories, -ENOSPC is returned since a directory entry
>> - * remove operation must not fail due to low extent count
>> - * availability. -ENOSPC will be handled by higher layers of XFS
>> - * by letting the corresponding empty Data/Free blocks to linger
>> - * until a future remove operation. Dabtree blocks would be
>> - * swapped with the last block in the leaf space and then the
>> - * new last block will be unmapped.
>> - *
>> - * The above logic also applies to the source directory entry of
>> - * a rename operation.
>> - */
>> - error = xfs_iext_count_may_overflow(ip, whichfork, 1);
>> - if (error) {
>> - ASSERT(S_ISDIR(VFS_I(ip)->i_mode) &&
>> - whichfork == XFS_DATA_FORK);
>> - error = -ENOSPC;
>> - goto done;
>> - }
>> -
>> old = got;
>>
>> got.br_blockcount = del->br_startoff - got.br_startoff;
>> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
>> index 0faf7d9ac241..7f08f6de48bf 100644
>> --- a/fs/xfs/libxfs/xfs_da_btree.h
>> +++ b/fs/xfs/libxfs/xfs_da_btree.h
>> @@ -30,6 +30,7 @@ struct xfs_da_geometry {
>> unsigned int free_hdr_size; /* dir2 free header size */
>> unsigned int free_max_bests; /* # of bests entries in dir2 free */
>> xfs_dablk_t freeblk; /* blockno of free data v2 */
>> + xfs_extnum_t max_extents; /* Max. extents in corresponding fork */
>>
>> xfs_dir2_data_aoff_t data_first_offset;
>> size_t data_entry_offset;
>> diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
>> index 5a49caa5c9df..95354b7ab7f5 100644
>> --- a/fs/xfs/libxfs/xfs_da_format.h
>> +++ b/fs/xfs/libxfs/xfs_da_format.h
>> @@ -277,6 +277,7 @@ xfs_dir2_sf_firstentry(struct xfs_dir2_sf_hdr *hdr)
>> * Directory address space divided into sections,
>> * spaces separated by 32GB.
>> */
>> +#define XFS_DIR2_MAX_SPACES 3
>> #define XFS_DIR2_SPACE_SIZE (1ULL << (32 + XFS_DIR2_DATA_ALIGN_LOG))
>> #define XFS_DIR2_DATA_SPACE 0
>> #define XFS_DIR2_DATA_OFFSET (XFS_DIR2_DATA_SPACE * XFS_DIR2_SPACE_SIZE)
>> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
>> index 5f1e4799e8fa..52c764ecc015 100644
>> --- a/fs/xfs/libxfs/xfs_dir2.c
>> +++ b/fs/xfs/libxfs/xfs_dir2.c
>> @@ -150,6 +150,8 @@ xfs_da_mount(
>> dageo->freeblk = xfs_dir2_byte_to_da(dageo, XFS_DIR2_FREE_OFFSET);
>> dageo->node_ents = (dageo->blksize - dageo->node_hdr_size) /
>> (uint)sizeof(xfs_da_node_entry_t);
>> + dageo->max_extents = (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >>
>> + mp->m_sb.sb_blocklog;
>> dageo->magicpct = (dageo->blksize * 37) / 100;
>>
>> /* set up attribute geometry - single fsb only */
>
> Shouldn't we set up mp->m_attr_geo.max_extents too? Even if all we do
> is set it to XFS_MAX_EXTCNT_ATTR_FORK_{SMALL,LARGE}? I get that nothing
> will use it anywhere, but we shouldn't leave uninitialized geometry
> structure variables around.
>
I had left it to be initialized to the value of zero as an indicator that the
field has an invalid value. But I think your suggestion is indeed correct
since we can assign the field with either XFS_MAX_EXTCNT_ATTR_FORK_SMALL or
XFS_MAX_EXTCNT_ATTR_FORK_LARGE. I will post a v9.2 patch soon.
--
chandan
^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH V9.2] xfs: Directory's data fork extent counter can never overflow
2022-04-06 6:18 ` [PATCH V9 15/19] xfs: Directory's data fork extent counter can never overflow Chandan Babu R
2022-04-07 1:13 ` Dave Chinner
2022-04-09 13:47 ` [PATCH V9.1] " Chandan Babu R
@ 2022-04-12 14:02 ` Chandan Babu R
2022-04-12 17:04 ` Darrick J. Wong
2 siblings, 1 reply; 62+ messages in thread
From: Chandan Babu R @ 2022-04-12 14:02 UTC (permalink / raw)
To: linux-xfs; +Cc: djwong, david, chandan.babu
The maximum file size that can be represented by the data fork extent counter
in the worst case occurs when all extents are 1 block in length and each block
is 1KB in size.
With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing the maximum extent count and with
1KB sized blocks, a file can reach up to,
(2^31) * 1KB = 2TB
This is much larger than the theoretical maximum size of a directory
i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
Since a directory's inode can never overflow its data fork extent counter,
this commit removes all the overflow checks associated with
it. xfs_dinode_verify() now performs a rough check to verify if a directory's
data fork is larger than 96GB.
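The size argument above can be checked with a few lines of integer arithmetic. This sketch copies the constants from the patch (XFS_DIR2_DATA_ALIGN_LOG is assumed to be 3, which gives the 32GB XFS_DIR2_SPACE_SIZE) and reproduces the max_extents expression added to xfs_da_mount(); the MODEL_* macros and helper names are hypothetical. 1KB blocks correspond to a blocklog of 10.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the patch; XFS_DIR2_DATA_ALIGN_LOG assumed to be 3. */
#define MODEL_DIR2_DATA_ALIGN_LOG	3
#define MODEL_DIR2_SPACE_SIZE	(1ULL << (32 + MODEL_DIR2_DATA_ALIGN_LOG))
#define MODEL_DIR2_MAX_SPACES	3
#define MODEL_MAX_EXTCNT_SMALL	((1ULL << 31) - 1)

/* Worst case for the small counter: every extent is one block long. */
static uint64_t max_mappable_bytes(unsigned int blocklog)
{
	return MODEL_MAX_EXTCNT_SMALL << blocklog;
}

/* Same expression as the dageo->max_extents assignment in xfs_da_mount(). */
static uint64_t dir_max_extents(unsigned int blocklog)
{
	return (MODEL_DIR2_MAX_SPACES * MODEL_DIR2_SPACE_SIZE) >> blocklog;
}
```

With 1KB blocks a directory can map at most 3 * 2^25 (96M) single-block extents, well below the 2^31 - 1 limit of the small data fork counter.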
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
fs/xfs/libxfs/xfs_bmap.c | 20 -------------
fs/xfs/libxfs/xfs_da_btree.h | 1 +
fs/xfs/libxfs/xfs_da_format.h | 1 +
fs/xfs/libxfs/xfs_dir2.c | 8 +++++
fs/xfs/libxfs/xfs_format.h | 13 ++++++++
fs/xfs/libxfs/xfs_inode_buf.c | 3 ++
fs/xfs/libxfs/xfs_inode_fork.h | 13 --------
fs/xfs/xfs_inode.c | 55 ++--------------------------------
fs/xfs/xfs_symlink.c | 5 ----
9 files changed, 28 insertions(+), 91 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 1254d4d4821e..4fab0c92ab70 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5147,26 +5147,6 @@ xfs_bmap_del_extent_real(
* Deleting the middle of the extent.
*/
- /*
- * For directories, -ENOSPC is returned since a directory entry
- * remove operation must not fail due to low extent count
- * availability. -ENOSPC will be handled by higher layers of XFS
- * by letting the corresponding empty Data/Free blocks to linger
- * until a future remove operation. Dabtree blocks would be
- * swapped with the last block in the leaf space and then the
- * new last block will be unmapped.
- *
- * The above logic also applies to the source directory entry of
- * a rename operation.
- */
- error = xfs_iext_count_may_overflow(ip, whichfork, 1);
- if (error) {
- ASSERT(S_ISDIR(VFS_I(ip)->i_mode) &&
- whichfork == XFS_DATA_FORK);
- error = -ENOSPC;
- goto done;
- }
-
old = got;
got.br_blockcount = del->br_startoff - got.br_startoff;
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 0faf7d9ac241..7f08f6de48bf 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -30,6 +30,7 @@ struct xfs_da_geometry {
unsigned int free_hdr_size; /* dir2 free header size */
unsigned int free_max_bests; /* # of bests entries in dir2 free */
xfs_dablk_t freeblk; /* blockno of free data v2 */
+ xfs_extnum_t max_extents; /* Max. extents in corresponding fork */
xfs_dir2_data_aoff_t data_first_offset;
size_t data_entry_offset;
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 5a49caa5c9df..95354b7ab7f5 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -277,6 +277,7 @@ xfs_dir2_sf_firstentry(struct xfs_dir2_sf_hdr *hdr)
* Directory address space divided into sections,
* spaces separated by 32GB.
*/
+#define XFS_DIR2_MAX_SPACES 3
#define XFS_DIR2_SPACE_SIZE (1ULL << (32 + XFS_DIR2_DATA_ALIGN_LOG))
#define XFS_DIR2_DATA_SPACE 0
#define XFS_DIR2_DATA_OFFSET (XFS_DIR2_DATA_SPACE * XFS_DIR2_SPACE_SIZE)
diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 5f1e4799e8fa..3cd51fa3837b 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -150,6 +150,8 @@ xfs_da_mount(
dageo->freeblk = xfs_dir2_byte_to_da(dageo, XFS_DIR2_FREE_OFFSET);
dageo->node_ents = (dageo->blksize - dageo->node_hdr_size) /
(uint)sizeof(xfs_da_node_entry_t);
+ dageo->max_extents = (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >>
+ mp->m_sb.sb_blocklog;
dageo->magicpct = (dageo->blksize * 37) / 100;
/* set up attribute geometry - single fsb only */
@@ -161,6 +163,12 @@ xfs_da_mount(
dageo->node_hdr_size = mp->m_dir_geo->node_hdr_size;
dageo->node_ents = (dageo->blksize - dageo->node_hdr_size) /
(uint)sizeof(xfs_da_node_entry_t);
+
+ if (xfs_has_large_extent_counts(mp))
+ dageo->max_extents = XFS_MAX_EXTCNT_ATTR_FORK_LARGE;
+ else
+ dageo->max_extents = XFS_MAX_EXTCNT_ATTR_FORK_SMALL;
+
dageo->magicpct = (dageo->blksize * 37) / 100;
return 0;
}
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 82b404c99b80..43de892d0305 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -915,6 +915,19 @@ enum xfs_dinode_fmt {
*
* Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
* 2^48 was chosen as the maximum data fork extent count.
+ *
+ * The maximum file size that can be represented by the data fork extent counter
+ * in the worst case occurs when all extents are 1 block in length and each
+ * block is 1KB in size.
+ *
+ * With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and
+ * with 1KB sized blocks, a file can reach up to,
+ * 1KB * (2^31) = 2TB
+ *
+ * This is much larger than the theoretical maximum size of a directory
+ * i.e. XFS_DIR2_SPACE_SIZE * XFS_DIR2_MAX_SPACES = ~96GB.
+ *
+ * Hence, a directory inode can never overflow its data fork extent counter.
*/
#define XFS_MAX_EXTCNT_DATA_FORK_LARGE ((xfs_extnum_t)((1ULL << 48) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_LARGE ((xfs_extnum_t)((1ULL << 32) - 1))
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index ee8d4eb7d048..74b82ec80f8e 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -491,6 +491,9 @@ xfs_dinode_verify(
if (mode && nextents + naextents > nblocks)
return __this_address;
+ if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents)
+ return __this_address;
+
if (mode && XFS_DFORK_BOFF(dip) > mp->m_sb.sb_inodesize)
return __this_address;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index fd5c3c2d77e0..6f9d69f8896e 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -39,19 +39,6 @@ struct xfs_ifork {
*/
#define XFS_IEXT_PUNCH_HOLE_CNT (1)
-/*
- * Directory entry addition can cause the following,
- * 1. Data block can be added/removed.
- * A new extent can cause extent count to increase by 1.
- * 2. Free disk block can be added/removed.
- * Same behaviour as described above for Data block.
- * 3. Dabtree blocks.
- * XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
- * extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
- */
-#define XFS_IEXT_DIR_MANIP_CNT(mp) \
- ((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
-
/*
* Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to
* be added. One extra extent for dabtree in case a local attr is
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index adc1355ce853..20f15a0393e1 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1024,11 +1024,6 @@ xfs_create(
xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
unlock_dp_on_error = true;
- error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto out_trans_cancel;
-
/*
* A newly created regular or special file just has one directory
* entry pointing to them, but a directory also the "." entry
@@ -1242,11 +1237,6 @@ xfs_link(
if (error)
goto std_return;
- error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto error_return;
-
/*
* If we are using project inheritance, we only allow hard link
* creation in our tree when the project IDs are the same; else
@@ -3210,35 +3200,6 @@ xfs_rename(
/*
* Check for expected errors before we dirty the transaction
* so we can return an error without a transaction abort.
- *
- * Extent count overflow check:
- *
- * From the perspective of src_dp, a rename operation is essentially a
- * directory entry remove operation. Hence the only place where we check
- * for extent count overflow for src_dp is in
- * xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns
- * -ENOSPC when it detects a possible extent count overflow and in
- * response, the higher layers of directory handling code do the
- * following:
- * 1. Data/Free blocks: XFS lets these blocks linger until a
- * future remove operation removes them.
- * 2. Dabtree blocks: XFS swaps the blocks with the last block in the
- * Leaf space and unmaps the last block.
- *
- * For target_dp, there are two cases depending on whether the
- * destination directory entry exists or not.
- *
- * When destination directory entry does not exist (i.e. target_ip ==
- * NULL), extent count overflow check is performed only when transaction
- * has a non-zero sized space reservation associated with it. With a
- * zero-sized space reservation, XFS allows a rename operation to
- * continue only when the directory has sufficient free space in its
- * data/leaf/free space blocks to hold the new entry.
- *
- * When destination directory entry exists (i.e. target_ip != NULL), all
- * we need to do is change the inode number associated with the already
- * existing entry. Hence there is no need to perform an extent count
- * overflow check.
*/
if (target_ip == NULL) {
/*
@@ -3249,12 +3210,6 @@ xfs_rename(
error = xfs_dir_canenter(tp, target_dp, target_name);
if (error)
goto out_trans_cancel;
- } else {
- error = xfs_iext_count_may_overflow(target_dp,
- XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto out_trans_cancel;
}
} else {
/*
@@ -3422,18 +3377,12 @@ xfs_rename(
* inode number of the whiteout inode rather than removing it
* altogether.
*/
- if (wip) {
+ if (wip)
error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
spaceres);
- } else {
- /*
- * NOTE: We don't need to check for extent count overflow here
- * because the dir remove name code will leave the dir block in
- * place if the extent count would overflow.
- */
+ else
error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
spaceres);
- }
if (error)
goto out_trans_cancel;
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index affbedf78160..4145ba872547 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -226,11 +226,6 @@ xfs_symlink(
goto out_trans_cancel;
}
- error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
- XFS_IEXT_DIR_MANIP_CNT(mp));
- if (error)
- goto out_trans_cancel;
-
/*
* Allocate an inode for the symlink.
*/
--
2.30.2
^ permalink raw reply related [flat|nested] 62+ messages in thread
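The worst-case argument in the xfs_format.h comment above (every extent is a single 1KB block) can be checked numerically. The following is a standalone C sketch, not kernel code. It re-declares the constants locally under assumed values: XFS_DIR2_DATA_ALIGN_LOG = 3, so XFS_DIR2_SPACE_SIZE is 32GB, and XFS_MAX_EXTCNT_DATA_FORK_SMALL = 2^31 - 1. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copies of the libxfs constants, for illustration only.
 * Values are assumed to match xfs_da_format.h / xfs_format.h.
 */
#define DIR2_DATA_ALIGN_LOG   3
#define DIR2_SPACE_SIZE       (1ULL << (32 + DIR2_DATA_ALIGN_LOG)) /* 32 GiB */
#define DIR2_MAX_SPACES       3ULL  /* data, leaf and free spaces */
#define MAX_EXTCNT_DATA_SMALL ((1ULL << 31) - 1)

/* Largest byte range a directory can span across its three spaces. */
static uint64_t dir_max_bytes(void)
{
	return DIR2_MAX_SPACES * DIR2_SPACE_SIZE;
}

/*
 * Hypothetical helper mirroring the dageo->max_extents computation:
 * in the worst case every block of the directory is its own extent.
 */
static uint64_t dir_max_extents(unsigned int blocklog)
{
	return dir_max_bytes() >> blocklog;
}

/* Bytes a small (31-bit) data fork can map if every extent is one block. */
static uint64_t small_fork_max_bytes(unsigned int blocklog)
{
	return MAX_EXTCNT_DATA_SMALL << blocklog;
}

/* True when a directory can never exceed the small data fork counter. */
static bool dir_fits_small_counter(unsigned int blocklog)
{
	return dir_max_extents(blocklog) <= MAX_EXTCNT_DATA_SMALL;
}
```

With 1KB blocks (blocklog 10), dir_max_extents(10) evaluates to 100,663,296 (3 * 2^25), roughly 21 times below the 2^31 - 1 limit, while small_fork_max_bytes(10) is just under 2TB. That gap is the margin the commit message relies on.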
* Re: [PATCH V9.2] xfs: Directory's data fork extent counter can never overflow
2022-04-12 14:02 ` [PATCH V9.2] " Chandan Babu R
@ 2022-04-12 17:04 ` Darrick J. Wong
0 siblings, 0 replies; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-12 17:04 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, david
On Tue, Apr 12, 2022 at 07:32:37PM +0530, Chandan Babu R wrote:
> The maximum file size that can be represented by the data fork extent counter
> in the worst case occurs when all extents are 1 block in length and each block
> is 1KB in size.
>
> With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
> 1KB sized blocks, a file can reach up to,
> (2^31) * 1KB = 2TB
>
> This is much larger than the theoretical maximum size of a directory
> i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
>
> Since a directory's inode can never overflow its data fork extent counter,
> this commit removes all the overflow checks associated with
> it. xfs_dinode_verify() now performs a rough check to verify if a directory's
> data fork is larger than 96GB.
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Neato!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/libxfs/xfs_bmap.c | 20 -------------
> fs/xfs/libxfs/xfs_da_btree.h | 1 +
> fs/xfs/libxfs/xfs_da_format.h | 1 +
> fs/xfs/libxfs/xfs_dir2.c | 8 +++++
> fs/xfs/libxfs/xfs_format.h | 13 ++++++++
> fs/xfs/libxfs/xfs_inode_buf.c | 3 ++
> fs/xfs/libxfs/xfs_inode_fork.h | 13 --------
> fs/xfs/xfs_inode.c | 55 ++--------------------------------
> fs/xfs/xfs_symlink.c | 5 ----
> 9 files changed, 28 insertions(+), 91 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 1254d4d4821e..4fab0c92ab70 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5147,26 +5147,6 @@ xfs_bmap_del_extent_real(
> * Deleting the middle of the extent.
> */
>
> - /*
> - * For directories, -ENOSPC is returned since a directory entry
> - * remove operation must not fail due to low extent count
> - * availability. -ENOSPC will be handled by higher layers of XFS
> - * by letting the corresponding empty Data/Free blocks to linger
> - * until a future remove operation. Dabtree blocks would be
> - * swapped with the last block in the leaf space and then the
> - * new last block will be unmapped.
> - *
> - * The above logic also applies to the source directory entry of
> - * a rename operation.
> - */
> - error = xfs_iext_count_may_overflow(ip, whichfork, 1);
> - if (error) {
> - ASSERT(S_ISDIR(VFS_I(ip)->i_mode) &&
> - whichfork == XFS_DATA_FORK);
> - error = -ENOSPC;
> - goto done;
> - }
> -
> old = got;
>
> got.br_blockcount = del->br_startoff - got.br_startoff;
> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> index 0faf7d9ac241..7f08f6de48bf 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.h
> +++ b/fs/xfs/libxfs/xfs_da_btree.h
> @@ -30,6 +30,7 @@ struct xfs_da_geometry {
> unsigned int free_hdr_size; /* dir2 free header size */
> unsigned int free_max_bests; /* # of bests entries in dir2 free */
> xfs_dablk_t freeblk; /* blockno of free data v2 */
> + xfs_extnum_t max_extents; /* Max. extents in corresponding fork */
>
> xfs_dir2_data_aoff_t data_first_offset;
> size_t data_entry_offset;
> diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
> index 5a49caa5c9df..95354b7ab7f5 100644
> --- a/fs/xfs/libxfs/xfs_da_format.h
> +++ b/fs/xfs/libxfs/xfs_da_format.h
> @@ -277,6 +277,7 @@ xfs_dir2_sf_firstentry(struct xfs_dir2_sf_hdr *hdr)
> * Directory address space divided into sections,
> * spaces separated by 32GB.
> */
> +#define XFS_DIR2_MAX_SPACES 3
> #define XFS_DIR2_SPACE_SIZE (1ULL << (32 + XFS_DIR2_DATA_ALIGN_LOG))
> #define XFS_DIR2_DATA_SPACE 0
> #define XFS_DIR2_DATA_OFFSET (XFS_DIR2_DATA_SPACE * XFS_DIR2_SPACE_SIZE)
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index 5f1e4799e8fa..3cd51fa3837b 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -150,6 +150,8 @@ xfs_da_mount(
> dageo->freeblk = xfs_dir2_byte_to_da(dageo, XFS_DIR2_FREE_OFFSET);
> dageo->node_ents = (dageo->blksize - dageo->node_hdr_size) /
> (uint)sizeof(xfs_da_node_entry_t);
> + dageo->max_extents = (XFS_DIR2_MAX_SPACES * XFS_DIR2_SPACE_SIZE) >>
> + mp->m_sb.sb_blocklog;
> dageo->magicpct = (dageo->blksize * 37) / 100;
>
> /* set up attribute geometry - single fsb only */
> @@ -161,6 +163,12 @@ xfs_da_mount(
> dageo->node_hdr_size = mp->m_dir_geo->node_hdr_size;
> dageo->node_ents = (dageo->blksize - dageo->node_hdr_size) /
> (uint)sizeof(xfs_da_node_entry_t);
> +
> + if (xfs_has_large_extent_counts(mp))
> + dageo->max_extents = XFS_MAX_EXTCNT_ATTR_FORK_LARGE;
> + else
> + dageo->max_extents = XFS_MAX_EXTCNT_ATTR_FORK_SMALL;
> +
> dageo->magicpct = (dageo->blksize * 37) / 100;
> return 0;
> }
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 82b404c99b80..43de892d0305 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -915,6 +915,19 @@ enum xfs_dinode_fmt {
> *
> * Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
> * 2^48 was chosen as the maximum data fork extent count.
> + *
> + * The maximum file size that can be represented by the data fork extent counter
> + * in the worst case occurs when all extents are 1 block in length and each
> + * block is 1KB in size.
> + *
> + * With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and
> + * with 1KB sized blocks, a file can reach up to,
> + * 1KB * (2^31) = 2TB
> + *
> + * This is much larger than the theoretical maximum size of a directory
> + * i.e. XFS_DIR2_SPACE_SIZE * XFS_DIR2_MAX_SPACES = ~96GB.
> + *
> + * Hence, a directory inode can never overflow its data fork extent counter.
> */
> #define XFS_MAX_EXTCNT_DATA_FORK_LARGE ((xfs_extnum_t)((1ULL << 48) - 1))
> #define XFS_MAX_EXTCNT_ATTR_FORK_LARGE ((xfs_extnum_t)((1ULL << 32) - 1))
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index ee8d4eb7d048..74b82ec80f8e 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -491,6 +491,9 @@ xfs_dinode_verify(
> if (mode && nextents + naextents > nblocks)
> return __this_address;
>
> + if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents)
> + return __this_address;
> +
> if (mode && XFS_DFORK_BOFF(dip) > mp->m_sb.sb_inodesize)
> return __this_address;
>
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> index fd5c3c2d77e0..6f9d69f8896e 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.h
> +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> @@ -39,19 +39,6 @@ struct xfs_ifork {
> */
> #define XFS_IEXT_PUNCH_HOLE_CNT (1)
>
> -/*
> - * Directory entry addition can cause the following,
> - * 1. Data block can be added/removed.
> - * A new extent can cause extent count to increase by 1.
> - * 2. Free disk block can be added/removed.
> - * Same behaviour as described above for Data block.
> - * 3. Dabtree blocks.
> - * XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
> - * extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
> - */
> -#define XFS_IEXT_DIR_MANIP_CNT(mp) \
> - ((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
> -
> /*
> * Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to
> * be added. One extra extent for dabtree in case a local attr is
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index adc1355ce853..20f15a0393e1 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1024,11 +1024,6 @@ xfs_create(
> xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
> unlock_dp_on_error = true;
>
> - error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> - XFS_IEXT_DIR_MANIP_CNT(mp));
> - if (error)
> - goto out_trans_cancel;
> -
> /*
> * A newly created regular or special file just has one directory
> * entry pointing to them, but a directory also the "." entry
> @@ -1242,11 +1237,6 @@ xfs_link(
> if (error)
> goto std_return;
>
> - error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
> - XFS_IEXT_DIR_MANIP_CNT(mp));
> - if (error)
> - goto error_return;
> -
> /*
> * If we are using project inheritance, we only allow hard link
> * creation in our tree when the project IDs are the same; else
> @@ -3210,35 +3200,6 @@ xfs_rename(
> /*
> * Check for expected errors before we dirty the transaction
> * so we can return an error without a transaction abort.
> - *
> - * Extent count overflow check:
> - *
> - * From the perspective of src_dp, a rename operation is essentially a
> - * directory entry remove operation. Hence the only place where we check
> - * for extent count overflow for src_dp is in
> - * xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns
> - * -ENOSPC when it detects a possible extent count overflow and in
> - * response, the higher layers of directory handling code do the
> - * following:
> - * 1. Data/Free blocks: XFS lets these blocks linger until a
> - * future remove operation removes them.
> - * 2. Dabtree blocks: XFS swaps the blocks with the last block in the
> - * Leaf space and unmaps the last block.
> - *
> - * For target_dp, there are two cases depending on whether the
> - * destination directory entry exists or not.
> - *
> - * When destination directory entry does not exist (i.e. target_ip ==
> - * NULL), extent count overflow check is performed only when transaction
> - * has a non-zero sized space reservation associated with it. With a
> - * zero-sized space reservation, XFS allows a rename operation to
> - * continue only when the directory has sufficient free space in its
> - * data/leaf/free space blocks to hold the new entry.
> - *
> - * When destination directory entry exists (i.e. target_ip != NULL), all
> - * we need to do is change the inode number associated with the already
> - * existing entry. Hence there is no need to perform an extent count
> - * overflow check.
> */
> if (target_ip == NULL) {
> /*
> @@ -3249,12 +3210,6 @@ xfs_rename(
> error = xfs_dir_canenter(tp, target_dp, target_name);
> if (error)
> goto out_trans_cancel;
> - } else {
> - error = xfs_iext_count_may_overflow(target_dp,
> - XFS_DATA_FORK,
> - XFS_IEXT_DIR_MANIP_CNT(mp));
> - if (error)
> - goto out_trans_cancel;
> }
> } else {
> /*
> @@ -3422,18 +3377,12 @@ xfs_rename(
> * inode number of the whiteout inode rather than removing it
> * altogether.
> */
> - if (wip) {
> + if (wip)
> error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
> spaceres);
> - } else {
> - /*
> - * NOTE: We don't need to check for extent count overflow here
> - * because the dir remove name code will leave the dir block in
> - * place if the extent count would overflow.
> - */
> + else
> error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
> spaceres);
> - }
>
> if (error)
> goto out_trans_cancel;
> diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> index affbedf78160..4145ba872547 100644
> --- a/fs/xfs/xfs_symlink.c
> +++ b/fs/xfs/xfs_symlink.c
> @@ -226,11 +226,6 @@ xfs_symlink(
> goto out_trans_cancel;
> }
>
> - error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> - XFS_IEXT_DIR_MANIP_CNT(mp));
> - if (error)
> - goto out_trans_cancel;
> -
> /*
> * Allocate an inode for the symlink.
> */
> --
> 2.30.2
>
^ permalink raw reply [flat|nested] 62+ messages in thread
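The two xfs_da_mount() hunks and the new xfs_dinode_verify() test in this patch can be modelled in isolation. This is a hedged C sketch, not the kernel code. struct da_geo, dir_geo_init(), attr_geo_init() and dir_nextents_ok() are hypothetical stand-ins for struct xfs_da_geometry and the real code paths. XFS_MAX_EXTCNT_ATTR_FORK_SMALL is assumed to be 2^15 - 1, and XFS_MAX_EXTCNT_ATTR_FORK_LARGE is taken from the xfs_format.h hunk.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t extnum_t; /* stand-in for xfs_extnum_t */

/* Local copies for illustration; the SMALL value is an assumption. */
#define MAX_EXTCNT_ATTR_SMALL ((extnum_t)((1ULL << 15) - 1))
#define MAX_EXTCNT_ATTR_LARGE ((extnum_t)((1ULL << 32) - 1))
#define DIR2_SPACE_SIZE       (1ULL << 35) /* assumed 32 GiB per space */
#define DIR2_MAX_SPACES       3ULL

/* Hypothetical, trimmed-down stand-in for struct xfs_da_geometry. */
struct da_geo {
	extnum_t max_extents;
};

/* Directory geometry: worst case of one extent per filesystem block. */
static void dir_geo_init(struct da_geo *geo, unsigned int blocklog)
{
	geo->max_extents = (DIR2_MAX_SPACES * DIR2_SPACE_SIZE) >> blocklog;
}

/* Attr geometry: the limit depends on whether large extent counts are on. */
static void attr_geo_init(struct da_geo *geo, bool large_extent_counts)
{
	geo->max_extents = large_extent_counts ? MAX_EXTCNT_ATTR_LARGE
					       : MAX_EXTCNT_ATTR_SMALL;
}

/*
 * Sketch of the new verifier test: a directory whose on-disk data fork
 * extent count exceeds the directory geometry limit is treated as corrupt.
 * Non-directories are not constrained by this check.
 */
static bool dir_nextents_ok(bool is_dir, extnum_t nextents,
			    const struct da_geo *dir_geo)
{
	return !(is_dir && nextents > dir_geo->max_extents);
}
```

With 4KB blocks (blocklog 12) the directory limit works out to 25,165,824 extents, so any directory inode claiming more than that would fail verification.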
* Re: [PATCH V9.1] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
2022-04-09 13:57 ` [PATCH V9.1] " Chandan Babu R
2022-04-11 2:56 ` Dave Chinner
@ 2022-04-13 2:57 ` Darrick J. Wong
2022-04-13 7:48 ` Chandan Babu R
1 sibling, 1 reply; 62+ messages in thread
From: Darrick J. Wong @ 2022-04-13 2:57 UTC (permalink / raw)
To: Chandan Babu R; +Cc: linux-xfs, david
On Sat, Apr 09, 2022 at 07:27:09PM +0530, Chandan Babu R wrote:
> The following changes are made to enable userspace to obtain 64-bit extent
> counters,
> 1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
> xfs_bulkstat->bs_pad[] to hold 64-bit extent counter.
> 2. Define the new flag XFS_BULK_IREQ_NREXT64 for userspace to indicate that
> it is capable of receiving 64-bit extent counters.
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Suggested-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
> fs/xfs/libxfs/xfs_fs.h | 20 ++++++++++++++++----
> fs/xfs/xfs_ioctl.c | 3 +++
> fs/xfs/xfs_itable.c | 9 ++++++++-
> fs/xfs/xfs_itable.h | 3 +++
> 4 files changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 1f7238db35cc..2a42bfb85c3b 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -378,7 +378,7 @@ struct xfs_bulkstat {
> uint32_t bs_extsize_blks; /* extent size hint, blocks */
>
> uint32_t bs_nlink; /* number of links */
> - uint32_t bs_extents; /* number of extents */
> + uint32_t bs_extents; /* 32-bit data fork extent counter */
> uint32_t bs_aextents; /* attribute number of extents */
> uint16_t bs_version; /* structure version */
> uint16_t bs_forkoff; /* inode fork offset in bytes */
> @@ -387,8 +387,9 @@ struct xfs_bulkstat {
> uint16_t bs_checked; /* checked inode metadata */
> uint16_t bs_mode; /* type and mode */
> uint16_t bs_pad2; /* zeroed */
> + uint64_t bs_extents64; /* 64-bit data fork extent counter */
>
> - uint64_t bs_pad[7]; /* zeroed */
> + uint64_t bs_pad[6]; /* zeroed */
> };
>
> #define XFS_BULKSTAT_VERSION_V1 (1)
> @@ -469,8 +470,19 @@ struct xfs_bulk_ireq {
> */
> #define XFS_BULK_IREQ_SPECIAL (1 << 1)
>
> -#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
> - XFS_BULK_IREQ_SPECIAL)
> +/*
> + * Return data fork extent count via xfs_bulkstat->bs_extents64 field and assign
> + * 0 to xfs_bulkstat->bs_extents when the flag is set. Otherwise, use
> + * xfs_bulkstat->bs_extents for returning data fork extent count and set
> + * xfs_bulkstat->bs_extents64 to 0. In the second case, return -EOVERFLOW and
> + * assign 0 to xfs_bulkstat->bs_extents if data fork extent count is larger than
> + * XFS_MAX_EXTCNT_DATA_FORK_OLD.
> + */
> +#define XFS_BULK_IREQ_NREXT64 (1 << 2)
This /probably/ ought to be (1U << 2) but ... fmeh, I don't have gcc 5
and don't care to install it, so because the logic looks ok to me:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> +
> +#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
> + XFS_BULK_IREQ_SPECIAL | \
> + XFS_BULK_IREQ_NREXT64)
>
> /* Operate on the root directory inode. */
> #define XFS_BULK_IREQ_SPECIAL_ROOT (1)
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 83481005317a..e9eadc7337ce 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -813,6 +813,9 @@ xfs_bulk_ireq_setup(
> if (XFS_INO_TO_AGNO(mp, breq->startino) >= mp->m_sb.sb_agcount)
> return -ECANCELED;
>
> + if (hdr->flags & XFS_BULK_IREQ_NREXT64)
> + breq->flags |= XFS_IBULK_NREXT64;
> +
> return 0;
> }
>
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index 71ed4905f206..f74c9fff72bb 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -64,6 +64,7 @@ xfs_bulkstat_one_int(
> struct xfs_inode *ip; /* incore inode pointer */
> struct inode *inode;
> struct xfs_bulkstat *buf = bc->buf;
> + xfs_extnum_t nextents;
> int error = -EINVAL;
>
> if (xfs_internal_inum(mp, ino))
> @@ -102,7 +103,13 @@ xfs_bulkstat_one_int(
>
> buf->bs_xflags = xfs_ip2xflags(ip);
> buf->bs_extsize_blks = ip->i_extsize;
> - buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
> +
> + nextents = xfs_ifork_nextents(&ip->i_df);
> + if (!(bc->breq->flags & XFS_IBULK_NREXT64))
> + buf->bs_extents = min(nextents, XFS_MAX_EXTCNT_DATA_FORK_SMALL);
> + else
> + buf->bs_extents64 = nextents;
> +
> xfs_bulkstat_health(ip, buf);
> buf->bs_aextents = xfs_ifork_nextents(ip->i_afp);
> buf->bs_forkoff = XFS_IFORK_BOFF(ip);
> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> index 5ee1d3f44ce9..e2d0eba43f35 100644
> --- a/fs/xfs/xfs_itable.h
> +++ b/fs/xfs/xfs_itable.h
> @@ -19,6 +19,9 @@ struct xfs_ibulk {
> /* Only iterate within the same AG as startino */
> #define XFS_IBULK_SAME_AG (1U << 0)
>
> +/* Fill out the bs_extents64 field if set. */
> +#define XFS_IBULK_NREXT64 (1U << 1)
> +
> /*
> * Advance the user buffer pointer by one record of the given size. If the
> * buffer is now full, return the appropriate error code.
> --
> 2.30.2
>
^ permalink raw reply [flat|nested] 62+ messages in thread
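The extent count reporting in the xfs_itable.c hunk reduces to a small branch. The sketch below is a userspace model. The struct bstat_counts holding only the two counters and the fill_extent_counts() helper are hypothetical. It follows the posted code, which clamps the count with min() when XFS_IBULK_NREXT64 is not set. The xfs_fs.h comment instead describes returning -EOVERFLOW in that case, so the comment and the code appear to disagree as posted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copy for illustration: 2^31 - 1, the small data fork limit. */
#define MAX_EXTCNT_DATA_SMALL ((uint64_t)((1ULL << 31) - 1))

/* Hypothetical subset of struct xfs_bulkstat with only the counters. */
struct bstat_counts {
	uint32_t bs_extents;   /* 32-bit data fork extent counter */
	uint64_t bs_extents64; /* 64-bit data fork extent counter */
};

/*
 * Models the xfs_bulkstat_one_int() change: with NREXT64 requested, the
 * full count goes into bs_extents64; otherwise it is clamped into
 * bs_extents. Both fields are zeroed first so the unused one reads as 0.
 */
static void fill_extent_counts(struct bstat_counts *buf, uint64_t nextents,
			       bool want_nrext64)
{
	buf->bs_extents = 0;
	buf->bs_extents64 = 0;
	if (!want_nrext64)
		buf->bs_extents = (uint32_t)(nextents < MAX_EXTCNT_DATA_SMALL ?
					     nextents : MAX_EXTCNT_DATA_SMALL);
	else
		buf->bs_extents64 = nextents;
}
```

An old caller that does not set the flag therefore sees at most 2^31 - 1 extents rather than a truncated 32-bit value.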
* Re: [PATCH V9.1] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
2022-04-13 2:57 ` Darrick J. Wong
@ 2022-04-13 7:48 ` Chandan Babu R
0 siblings, 0 replies; 62+ messages in thread
From: Chandan Babu R @ 2022-04-13 7:48 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, david
On 13 Apr 2022 at 08:27, Darrick J. Wong wrote:
> On Sat, Apr 09, 2022 at 07:27:09PM +0530, Chandan Babu R wrote:
>> The following changes are made to enable userspace to obtain 64-bit extent
>> counters,
>> 1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
>> xfs_bulkstat->bs_pad[] to hold 64-bit extent counter.
>> 2. Define the new flag XFS_BULK_IREQ_NREXT64 for userspace to indicate that
>> it is capable of receiving 64-bit extent counters.
>>
>> Reviewed-by: Dave Chinner <dchinner@redhat.com>
>> Suggested-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>> fs/xfs/libxfs/xfs_fs.h | 20 ++++++++++++++++----
>> fs/xfs/xfs_ioctl.c | 3 +++
>> fs/xfs/xfs_itable.c | 9 ++++++++-
>> fs/xfs/xfs_itable.h | 3 +++
>> 4 files changed, 30 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
>> index 1f7238db35cc..2a42bfb85c3b 100644
>> --- a/fs/xfs/libxfs/xfs_fs.h
>> +++ b/fs/xfs/libxfs/xfs_fs.h
>> @@ -378,7 +378,7 @@ struct xfs_bulkstat {
>> uint32_t bs_extsize_blks; /* extent size hint, blocks */
>>
>> uint32_t bs_nlink; /* number of links */
>> - uint32_t bs_extents; /* number of extents */
>> + uint32_t bs_extents; /* 32-bit data fork extent counter */
>> uint32_t bs_aextents; /* attribute number of extents */
>> uint16_t bs_version; /* structure version */
>> uint16_t bs_forkoff; /* inode fork offset in bytes */
>> @@ -387,8 +387,9 @@ struct xfs_bulkstat {
>> uint16_t bs_checked; /* checked inode metadata */
>> uint16_t bs_mode; /* type and mode */
>> uint16_t bs_pad2; /* zeroed */
>> + uint64_t bs_extents64; /* 64-bit data fork extent counter */
>>
>> - uint64_t bs_pad[7]; /* zeroed */
>> + uint64_t bs_pad[6]; /* zeroed */
>> };
>>
>> #define XFS_BULKSTAT_VERSION_V1 (1)
>> @@ -469,8 +470,19 @@ struct xfs_bulk_ireq {
>> */
>> #define XFS_BULK_IREQ_SPECIAL (1 << 1)
>>
>> -#define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \
>> - XFS_BULK_IREQ_SPECIAL)
>> +/*
>> + * Return data fork extent count via xfs_bulkstat->bs_extents64 field and assign
>> + * 0 to xfs_bulkstat->bs_extents when the flag is set. Otherwise, use
>> + * xfs_bulkstat->bs_extents for returning data fork extent count and set
>> + * xfs_bulkstat->bs_extents64 to 0. In the second case, return -EOVERFLOW and
>> + * assign 0 to xfs_bulkstat->bs_extents if data fork extent count is larger than
>> + * XFS_MAX_EXTCNT_DATA_FORK_OLD.
>> + */
>> +#define XFS_BULK_IREQ_NREXT64 (1 << 2)
>
> This /probably/ ought to be (1U << 2) but ... fmeh, I don't have gcc 5
> and don't care to install it, so because the logic looks ok to me:
>
I have changed XFS_BULK_IREQ_* flags to have unsigned values.
So with this patchset, XFS_IWALK_*, XFS_BULK_IREQ_* and XFS_IBULK_* flags will
have unsigned values.
I will execute fstests before sending a pull request.
>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
Thanks to Dave and you for taking time to review various versions of this
patchset.
--
chandan
^ permalink raw reply [flat|nested] 62+ messages in thread
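The switch to unsigned flag values can be illustrated with a sketch of how a request's flags might be validated and translated. The BULK_IREQ_* and IBULK_NREXT64 names below are local copies; ireq_to_ibulk_flags() is hypothetical. Rejecting bits outside the FLAGS_ALL mask is an assumption suggested by the existence of XFS_BULK_IREQ_FLAGS_ALL. The translation step mirrors the xfs_bulk_ireq_setup() hunk quoted above.

```c
#include <errno.h>
#include <stdint.h>

/* Local unsigned copies of the request flags, for illustration. */
#define BULK_IREQ_AGNO      (1U << 0)
#define BULK_IREQ_SPECIAL   (1U << 1)
#define BULK_IREQ_NREXT64   (1U << 2)
#define BULK_IREQ_FLAGS_ALL (BULK_IREQ_AGNO | BULK_IREQ_SPECIAL | \
			     BULK_IREQ_NREXT64)

/* Local copy of the internal walk flag XFS_IBULK_NREXT64. */
#define IBULK_NREXT64       (1U << 1)

/*
 * Hypothetical helper: reject unknown request bits, then translate the
 * user-visible NREXT64 bit into the internal bulkstat flag.
 */
static int ireq_to_ibulk_flags(uint32_t hdr_flags, unsigned int *ibulk_flags)
{
	if (hdr_flags & ~BULK_IREQ_FLAGS_ALL)
		return -EINVAL;
	*ibulk_flags = 0;
	if (hdr_flags & BULK_IREQ_NREXT64)
		*ibulk_flags |= IBULK_NREXT64;
	return 0;
}
```

Using 1U keeps each mask and its complement in unsigned arithmetic, matching a 32-bit unsigned flags field. A signed 1 << 31 would also be undefined behaviour if the flag space ever grew that far.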
end of thread, other threads:[~2022-04-13 7:48 UTC | newest]
Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
2022-04-06 6:18 [PATCH V9 00/19] xfs: Extend per-inode extent counters Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 01/19] xfs: Move extent count limits to xfs_format.h Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 02/19] xfs: Define max extent length based on on-disk format definition Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 03/19] xfs: Introduce xfs_iext_max_nextents() helper Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 04/19] xfs: Use xfs_extnum_t instead of basic data types Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 05/19] xfs: Introduce xfs_dfork_nextents() helper Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 06/19] xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextents Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 07/19] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 08/19] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit Chandan Babu R
2022-04-07 0:50 ` Dave Chinner
2022-04-06 6:18 ` [PATCH V9 09/19] xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64 Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 10/19] xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 11/19] xfs: Use uint64_t to count maximum blocks that can be used by BMBT Chandan Babu R
2022-04-07 0:52 ` Dave Chinner
2022-04-06 6:18 ` [PATCH V9 12/19] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
2022-04-07 1:05 ` Dave Chinner
2022-04-07 1:58 ` Darrick J. Wong
2022-04-07 2:44 ` Dave Chinner
2022-04-07 8:18 ` Chandan Babu R
2022-04-07 8:56 ` Dave Chinner
2022-04-07 8:18 ` Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 13/19] xfs: Replace numbered inode recovery error messages with descriptive ones Chandan Babu R
2022-04-07 1:50 ` Darrick J. Wong
2022-04-06 6:18 ` [PATCH V9 14/19] xfs: Introduce per-inode 64-bit extent counters Chandan Babu R
2022-04-06 19:03 ` kernel test robot
2022-04-07 1:07 ` Dave Chinner
2022-04-07 1:07 ` Dave Chinner
2022-04-07 8:18 ` Chandan Babu R
2022-04-07 8:18 ` Chandan Babu R
2022-04-06 6:18 ` [PATCH V9 15/19] xfs: Directory's data fork extent counter can never overflow Chandan Babu R
2022-04-07 1:13 ` Dave Chinner
2022-04-07 1:48 ` Darrick J. Wong
2022-04-07 8:19 ` Chandan Babu R
2022-04-09 13:47 ` [PATCH V9.1] " Chandan Babu R
2022-04-11 1:33 ` Dave Chinner
2022-04-11 22:07 ` Darrick J. Wong
2022-04-12 3:39 ` Chandan Babu R
2022-04-12 14:02 ` [PATCH V9.2] " Chandan Babu R
2022-04-12 17:04 ` Darrick J. Wong
2022-04-06 6:19 ` [PATCH V9 16/19] xfs: Conditionally upgrade existing inodes to use large extent counters Chandan Babu R
2022-04-07 1:22 ` Dave Chinner
2022-04-07 1:46 ` Darrick J. Wong
2022-04-07 8:19 ` Chandan Babu R
2022-04-07 1:46 ` Darrick J. Wong
2022-04-07 2:00 ` Darrick J. Wong
2022-04-07 8:19 ` Chandan Babu R
2022-04-09 13:52 ` [PATCH V9.1] " Chandan Babu R
2022-04-11 1:34 ` Dave Chinner
2022-04-06 6:19 ` [PATCH V9 17/19] xfs: Decouple XFS_IBULK flags from XFS_IWALK flags Chandan Babu R
2022-04-07 1:29 ` Darrick J. Wong
2022-04-06 6:19 ` [PATCH V9 18/19] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters Chandan Babu R
2022-04-07 1:29 ` Darrick J. Wong
2022-04-07 1:42 ` Dave Chinner
2022-04-07 8:20 ` Chandan Babu R
2022-04-09 13:57 ` [PATCH V9.1] " Chandan Babu R
2022-04-11 2:56 ` Dave Chinner
2022-04-13 2:57 ` Darrick J. Wong
2022-04-13 7:48 ` Chandan Babu R
2022-04-06 6:19 ` [PATCH V9 19/19] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Chandan Babu R
2022-04-07 1:23 ` Dave Chinner
2022-04-07 1:26 ` Darrick J. Wong
2022-04-09 13:23 ` [PATCH V9.1] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R