* [PATCH V7 00/17] xfs: Extend per-inode extent counters
@ 2022-03-01 10:39 Chandan Babu R
  2022-03-01 10:39 ` [PATCH V7 01/17] xfs: Move extent count limits to xfs_format.h Chandan Babu R
                   ` (16 more replies)
  0 siblings, 17 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

The commit "xfs: fix inode fork extent count overflow"
(3f8a4f1d876d3e3e49e50b0396eaffcc4ba71b08) mentions that it should be
possible to create 10 billion data fork extents. However, the
corresponding on-disk field has a signed 32-bit type. Hence this
patchset extends the per-inode data fork extent counter to 64 bits
(out of which 48 bits are used to store the extent count).

Also, XFS has an attribute fork extent counter which is 16 bits
wide. A workload that
1. creates 1 million 255-byte sized xattrs,
2. deletes 50% of these xattrs in an alternating manner, and
3. tries to insert 400,000 new 255-byte sized xattrs
causes the xattr extent counter to overflow.

Dave tells me that there are instances where a single file has more
than 100 million hardlinks. With parent pointers being stored in
xattrs, we will overflow the signed 16-bit wide attribute extent
counter when a large number of hardlinks are created. Hence this
patchset extends the on-disk field to 32 bits.
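
The limits discussed above can be checked with a little arithmetic. The
following is a minimal sketch, not code from this series: the helper and
macro names are hypothetical, and it assumes that the new attribute fork
counter can use all 32 bits while the data fork counter uses 48 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value representable in an unsigned field that is `bits` wide. */
static inline uint64_t max_count_for_bits(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Existing limits: signed 32-bit data fork and signed 16-bit attr fork. */
#define OLD_MAX_DATA_EXTENTS	((uint64_t)INT32_MAX)	/* 0x7fffffff */
#define OLD_MAX_ATTR_EXTENTS	((uint64_t)INT16_MAX)	/* 0x7fff */

/* Assumed new limits: 48 usable bits for data, 32 bits for attr. */
static inline uint64_t new_max_data_extents(void)
{
	return max_count_for_bits(48);
}

static inline uint64_t new_max_attr_extents(void)
{
	return max_count_for_bits(32);
}
```

With these definitions, the old data fork limit (about 2.1 billion) is
below the 10 billion extents mentioned above, while the 48-bit limit is
well above it.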

The following changes are made to accomplish this:
1. A 64-bit inode field is carved out of the existing di_pad and
   di_flushiter fields to hold the 64-bit data fork extent counter.
2. The existing 32-bit inode data fork extent counter will be used to
   hold the attribute fork extent counter.
3. A new incompat superblock flag is added to prevent older kernels
   from mounting the filesystem.
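
The field reuse described in the list above can be sketched as follows.
This is only an illustration under stated assumptions: struct toy_dinode
is a simplified, host-endian stand-in and not the real struct xfs_dinode,
the boolean stands in for a per-inode "large counters" flag, and every
name below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the on-disk inode; NOT the real struct xfs_dinode. */
struct toy_dinode {
	bool		large_counters;	/* hypothetical per-inode flag */
	uint64_t	big_nextents;	/* new 64-bit field: data fork count */
	uint32_t	nextents;	/* old: data fork, new: attr fork */
	uint16_t	anextents;	/* old: attr fork, not read when new */
};

enum toy_fork { TOY_DATA_FORK, TOY_ATTR_FORK };

/* Pick the field that holds the extent count for the requested fork. */
static inline uint64_t
toy_fork_nextents(const struct toy_dinode *dip, enum toy_fork fork)
{
	assert(dip != NULL);

	if (fork == TOY_DATA_FORK)
		return dip->large_counters ? dip->big_nextents : dip->nextents;

	return dip->large_counters ? dip->nextents : dip->anextents;
}
```

Funnelling every read through one accessor like this is what lets the
counter location change without touching each caller.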

The patchset has been tested by executing xfstests with the following
mkfs.xfs options:
1. -m crc=0 -b size=1k
2. -m crc=0 -b size=4k
3. -m crc=0 -b size=512
4. -m rmapbt=1,reflink=1 -b size=1k
5. -m rmapbt=1,reflink=1 -b size=4k

Each of the above test scenarios was executed on the following
combinations (for the V4 FS test scenarios, the last combination was
omitted).
|-------------------------------+-----------|
| Xfsprogs                      | Kernel    |
|-------------------------------+-----------|
| Unpatched                     | Patched   |
| Patched (disable nrext64)     | Unpatched |
| Patched (disable nrext64)     | Patched   |
| Patched (enable nrext64)      | Patched   |
|-------------------------------+-----------|

I have also written tests to check if the correct extent counter
fields are updated with/without the new incompat flag and to verify
upgrading older fs instances to support large extent counters. I have
also fixed the xfs/270 test to work with the new code base.

These patches can also be obtained from
https://github.com/chandanr/linux.git at branch
xfs-incompat-extend-extcnt-v7.

Changelog:
V6 -> V7:
1. Address the following review comments from V6,
   - Revert xfs_ibulk->flags to "unsigned int" type.
   - Fix definition of XFS_IBULK_NREXT64 to be independent of IWALK flags.
   - Fix possible double free of transaction handle in xfs_growfs_rt_alloc().

V5 -> V6:
1. Rebase on Linux-v5.17-rc4.
2. Upgrade inodes to use large extent counters from within a
   transaction context.

V4 -> V5:
1. Rebase on xfs-linux/for-next.
2. Use howmany_64() to compute height of maximum bmbt tree.
3. Rename disk and log inode's di_big_dextcnt to di_big_nextents.
4. Rename disk and log inode's di_big_aextcnt to di_big_anextents.
5. Since XFS_IBULK_NREXT64 is not associated with inode walking
   functionality, define it as the 32nd bit and mask it when passing
   xfs_ibulk->flags to xfs_iwalk() function. 
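
To illustrate item 2 of the list above, here is a minimal sketch of a
64-bit ceiling division feeding a tree height computation. The function
names are hypothetical, and the else branch of the loop is an assumption
about how the number of blocks shrinks from one level to the next; it is
not copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling of x / y without the overflow risk of (x + y - 1) / y. */
static inline uint64_t ceil_div_u64(uint64_t x, uint32_t y)
{
	assert(y != 0);
	return x / y + (x % y != 0);
}

/*
 * Height of a btree that must index maxleafents records when every
 * leaf holds minleafrecs records, every node holds minnoderecs
 * pointers, and the root can hold up to maxrootrecs pointers.
 */
static inline int
toy_bmbt_height(uint64_t maxleafents, uint32_t minleafrecs,
		uint32_t minnoderecs, uint32_t maxrootrecs)
{
	uint64_t	blocks;
	int		level;

	assert(minnoderecs >= 2);	/* otherwise the loop cannot shrink */

	blocks = ceil_div_u64(maxleafents, minleafrecs);
	for (level = 1; blocks > 1; level++) {
		if (blocks <= maxrootrecs)
			blocks = 1;
		else
			blocks = ceil_div_u64(blocks, minnoderecs);
	}
	return level;
}
```

Keeping the block count in a 64-bit variable matters once the maximum
extent count no longer fits in 32 bits.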

V3 -> V4:
1. Rebase patchset on xfs-linux/for-next branch.
2. Carve out a 64-bit inode field out of the existing di_pad and
   di_flushiter fields to hold the 64-bit data fork extent counter.
3. Use the existing 32-bit inode data fork extent counter to hold the
   attr fork extent counter.
4. Verify the contents of newly introduced inode fields immediately
   after the inode has been read from the disk.
5. Upgrade inodes to be able to hold large extent counters when
   reading them from disk.
6. Use XFS_BULK_IREQ_NREXT64 as the flag that userspace can use to
   indicate that it can read 64-bit data fork extent counter.
7. Bulkstat ioctl returns -EOVERFLOW when userspace is not capable of
   working with large extent counters and inode's data fork extent
   count is larger than INT32_MAX.
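
The -EOVERFLOW behaviour in item 7 above can be sketched as below. This
is an assumption-laden illustration rather than the kernel code: struct
toy_bulkstat and toy_fill_extents() are hypothetical, and the boolean
stands in for userspace having set XFS_BULK_IREQ_NREXT64.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy reply structure; NOT the real struct xfs_bulkstat. */
struct toy_bulkstat {
	uint32_t	extents;	/* legacy 32-bit data fork count */
	uint64_t	extents64;	/* wide data fork count */
};

/*
 * Report nextents to userspace. Callers that did not ask for 64-bit
 * counters get -EOVERFLOW when the value exceeds INT32_MAX.
 */
static inline int
toy_fill_extents(struct toy_bulkstat *bs, uint64_t nextents, bool nrext64)
{
	assert(bs != NULL);

	if (nrext64) {
		bs->extents64 = nextents;
		return 0;
	}

	if (nextents > INT32_MAX)
		return -EOVERFLOW;

	bs->extents = (uint32_t)nextents;
	return 0;
}
```

Returning an error instead of a truncated count keeps older userspace
from silently acting on a wrong value.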

V2 -> V3:
1. Define maximum extent length as a function of
   BMBT_BLOCKCOUNT_BITLEN.
2. Introduce xfs_iext_max_nextents() function in the patch series
   before renaming MAXEXTNUM/MAXAEXTNUM. This is done to reduce
   proliferation of macros indicating maximum extent count for data
   and attribute forks.
3. Define xfs_dfork_nextents() as an inline function.
4. Use xfs_rfsblock_t as the data type for variables that hold block
   count.
5. xfs_dfork_nextents() now returns -EFSCORRUPTED when an invalid fork
   is passed as an argument.
6. The following changes are done to enable bulkstat ioctl to report
   64-bit extent counters,
   - Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
     xfs_bulkstat->bs_pad[].
   - Carve out a new 64-bit field xfs_bulk_ireq->bulkstat_flags from
     xfs_bulk_ireq->reserved[] to hold bulkstat specific operational
     flags. Introduce XFS_IBULK_NREXT64 flag to indicate that
     userspace has the necessary infrastructure to receive 64-bit
     extent counters.
   - Define the new flag XFS_BULK_IREQ_BULKSTAT for userspace to
     indicate that xfs_bulk_ireq->bulkstat_flags has valid flags set.
7. Rename the incompat flag from XFS_SB_FEAT_INCOMPAT_EXTCOUNT_64BIT
   to XFS_SB_FEAT_INCOMPAT_NREXT64.
8. Add a new helper function xfs_inode_to_disk_iext_counters() to
   convert from incore inode extent counters to ondisk inode extent
   counters.
9. Reuse XFS_ERRTAG_REDUCE_MAX_IEXTENTS error tag to skip reporting
   inodes with more than 10 extents when bulkstat ioctl is invoked by
   userspace.
10. Introduce the new per-inode XFS_DIFLAG2_NREXT64 flag to indicate
    that the inode uses 64-bit extent counter. This is used to allow
    administrators to upgrade existing filesystems.
11. Export presence of XFS_SB_FEAT_INCOMPAT_NREXT64 feature to
    userspace via XFS_IOC_FSGEOMETRY ioctl.

V1 -> V2:
1. Rebase patches on top of Darrick's btree-dynamic-depth branch.
2. Add new bulkstat ioctl version to support 64-bit data fork extent
   counter field.
3. Introduce new error tag to verify if the old bulkstat ioctls skip
   reporting inodes with large data fork extent counters.

Chandan Babu R (17):
  xfs: Move extent count limits to xfs_format.h
  xfs: Introduce xfs_iext_max_nextents() helper
  xfs: Use xfs_extnum_t instead of basic data types
  xfs: Introduce xfs_dfork_nextents() helper
  xfs: Use basic types to define xfs_log_dinode's di_nextents and
    di_anextents
  xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits
    respectively
  xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs
    feature bit
  xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64
  xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers
  xfs: Use xfs_rfsblock_t to count maximum blocks that can be used by
    BMBT
  xfs: Introduce macros to represent new maximum extent counts for
    data/attr forks
  xfs: Introduce per-inode 64-bit extent counters
  xfs: xfs_growfs_rt_alloc: Unlock inode explicitly rather than through
    iop_committing()
  xfs: Conditionally upgrade existing inodes to use 64-bit extent
    counters
  xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
  xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags
  xfs: Define max extent length based on on-disk format definition

 fs/xfs/libxfs/xfs_alloc.c       |  2 +-
 fs/xfs/libxfs/xfs_attr.c        |  3 +-
 fs/xfs/libxfs/xfs_bmap.c        | 87 ++++++++++++++++-----------------
 fs/xfs/libxfs/xfs_bmap_btree.c  |  2 +-
 fs/xfs/libxfs/xfs_format.h      | 71 +++++++++++++++++++++++----
 fs/xfs/libxfs/xfs_fs.h          | 21 ++++++--
 fs/xfs/libxfs/xfs_ialloc.c      |  2 +
 fs/xfs/libxfs/xfs_inode_buf.c   | 78 +++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_inode_fork.c  | 51 ++++++++++++++++---
 fs/xfs/libxfs/xfs_inode_fork.h  | 61 ++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_log_format.h  | 33 +++++++++++--
 fs/xfs/libxfs/xfs_sb.c          |  5 ++
 fs/xfs/libxfs/xfs_trans_resv.c  | 11 +++--
 fs/xfs/libxfs/xfs_types.h       | 11 +----
 fs/xfs/scrub/bmap.c             |  2 +-
 fs/xfs/scrub/inode.c            | 20 ++++----
 fs/xfs/xfs_bmap_item.c          |  3 +-
 fs/xfs/xfs_bmap_util.c          | 24 ++++-----
 fs/xfs/xfs_dquot.c              |  2 +-
 fs/xfs/xfs_inode.c              |  4 +-
 fs/xfs/xfs_inode.h              |  5 ++
 fs/xfs/xfs_inode_item.c         | 23 +++++++--
 fs/xfs/xfs_inode_item_recover.c | 85 +++++++++++++++++++++++++++-----
 fs/xfs/xfs_ioctl.c              |  3 ++
 fs/xfs/xfs_iomap.c              | 33 +++++++------
 fs/xfs/xfs_itable.c             | 30 +++++++++++-
 fs/xfs/xfs_itable.h             |  4 +-
 fs/xfs/xfs_iwalk.h              |  2 +-
 fs/xfs/xfs_mount.h              |  2 +
 fs/xfs/xfs_reflink.c            |  5 +-
 fs/xfs/xfs_rtalloc.c            | 13 +++--
 fs/xfs/xfs_trace.h              |  4 +-
 32 files changed, 532 insertions(+), 170 deletions(-)

-- 
2.30.2



* [PATCH V7 01/17] xfs: Move extent count limits to xfs_format.h
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  0:55   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 02/17] xfs: Introduce xfs_iext_max_nextents() helper Chandan Babu R
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

Maximum values associated with extent counters, i.e. maximum extent length,
maximum data extents and maximum xattr extents, are dictated by the on-disk
format. Hence move these definitions over to xfs_format.h.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h | 7 +++++++
 fs/xfs/libxfs/xfs_types.h  | 7 -------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index d665c04e69dd..d75e5b16da7e 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -869,6 +869,13 @@ enum xfs_dinode_fmt {
 	{ XFS_DINODE_FMT_BTREE,		"btree" }, \
 	{ XFS_DINODE_FMT_UUID,		"uuid" }
 
+/*
+ * Max values for extlen, extnum, aextnum.
+ */
+#define	MAXEXTLEN	((xfs_extlen_t)0x001fffff)	/* 21 bits */
+#define	MAXEXTNUM	((xfs_extnum_t)0x7fffffff)	/* signed int */
+#define	MAXAEXTNUM	((xfs_aextnum_t)0x7fff)		/* signed short */
+
 /*
  * Inode minimum and maximum sizes.
  */
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index b6da06b40989..794a54cbd0de 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -56,13 +56,6 @@ typedef void *		xfs_failaddr_t;
 #define	NULLFSINO	((xfs_ino_t)-1)
 #define	NULLAGINO	((xfs_agino_t)-1)
 
-/*
- * Max values for extlen, extnum, aextnum.
- */
-#define	MAXEXTLEN	((xfs_extlen_t)0x001fffff)	/* 21 bits */
-#define	MAXEXTNUM	((xfs_extnum_t)0x7fffffff)	/* signed int */
-#define	MAXAEXTNUM	((xfs_aextnum_t)0x7fff)		/* signed short */
-
 /*
  * Minimum and maximum blocksize and sectorsize.
  * The blocksize upper limit is pretty much arbitrary.
-- 
2.30.2



* [PATCH V7 02/17] xfs: Introduce xfs_iext_max_nextents() helper
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
  2022-03-01 10:39 ` [PATCH V7 01/17] xfs: Move extent count limits to xfs_format.h Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  0:56   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 03/17] xfs: Use xfs_extnum_t instead of basic data types Chandan Babu R
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

xfs_iext_max_nextents() returns the maximum number of extents possible for
any one of the data, CoW or attribute forks. This helper will be extended
further in a future commit when the maximum extent counts associated with
data/attribute forks are increased.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c       | 9 ++++-----
 fs/xfs/libxfs/xfs_inode_buf.c  | 8 +++-----
 fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
 fs/xfs/libxfs/xfs_inode_fork.h | 8 ++++++++
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 74198dd82b03..703ab9a84530 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -74,13 +74,12 @@ xfs_bmap_compute_maxlevels(
 	 * ATTR2 we have to assume the worst case scenario of a minimum size
 	 * available.
 	 */
-	if (whichfork == XFS_DATA_FORK) {
-		maxleafents = MAXEXTNUM;
+	maxleafents = xfs_iext_max_nextents(whichfork);
+	if (whichfork == XFS_DATA_FORK)
 		sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
-	} else {
-		maxleafents = MAXAEXTNUM;
+	else
 		sz = XFS_BMDR_SPACE_CALC(MINABTPTRS);
-	}
+
 	maxrootrecs = xfs_bmdr_maxrecs(sz, 0);
 	minleafrecs = mp->m_bmap_dmnr[0];
 	minnoderecs = mp->m_bmap_dmnr[1];
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index cae9708c8587..e6f9bdc4558f 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -337,6 +337,7 @@ xfs_dinode_verify_fork(
 	int			whichfork)
 {
 	uint32_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
+	xfs_extnum_t		max_extents;
 
 	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
 	case XFS_DINODE_FMT_LOCAL:
@@ -358,12 +359,9 @@ xfs_dinode_verify_fork(
 			return __this_address;
 		break;
 	case XFS_DINODE_FMT_BTREE:
-		if (whichfork == XFS_ATTR_FORK) {
-			if (di_nextents > MAXAEXTNUM)
-				return __this_address;
-		} else if (di_nextents > MAXEXTNUM) {
+		max_extents = xfs_iext_max_nextents(whichfork);
+		if (di_nextents > max_extents)
 			return __this_address;
-		}
 		break;
 	default:
 		return __this_address;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 9149f4f796fc..e136c29a0ec1 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -744,7 +744,7 @@ xfs_iext_count_may_overflow(
 	if (whichfork == XFS_COW_FORK)
 		return 0;
 
-	max_exts = (whichfork == XFS_ATTR_FORK) ? MAXAEXTNUM : MAXEXTNUM;
+	max_exts = xfs_iext_max_nextents(whichfork);
 
 	if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
 		max_exts = 10;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 3d64a3acb0ed..2605f7ff8fc1 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -133,6 +133,14 @@ static inline int8_t xfs_ifork_format(struct xfs_ifork *ifp)
 	return ifp->if_format;
 }
 
+static inline xfs_extnum_t xfs_iext_max_nextents(int whichfork)
+{
+	if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK)
+		return MAXEXTNUM;
+
+	return MAXAEXTNUM;
+}
+
 struct xfs_ifork *xfs_ifork_alloc(enum xfs_dinode_fmt format,
 				xfs_extnum_t nextents);
 struct xfs_ifork *xfs_iext_state_to_fork(struct xfs_inode *ip, int state);
-- 
2.30.2



* [PATCH V7 03/17] xfs: Use xfs_extnum_t instead of basic data types
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
  2022-03-01 10:39 ` [PATCH V7 01/17] xfs: Move extent count limits to xfs_format.h Chandan Babu R
  2022-03-01 10:39 ` [PATCH V7 02/17] xfs: Introduce xfs_iext_max_nextents() helper Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  0:59   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 04/17] xfs: Introduce xfs_dfork_nextents() helper Chandan Babu R
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

xfs_extnum_t is the type to use to declare variables which have values
obtained from xfs_dinode->di_[a]nextents. This commit replaces basic
types (e.g. uint32_t) with xfs_extnum_t for such variables.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c       | 2 +-
 fs/xfs/libxfs/xfs_inode_buf.c  | 2 +-
 fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
 fs/xfs/scrub/inode.c           | 2 +-
 fs/xfs/xfs_trace.h             | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 703ab9a84530..98541be873d8 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -54,7 +54,7 @@ xfs_bmap_compute_maxlevels(
 {
 	int		level;		/* btree level */
 	uint		maxblocks;	/* max blocks at this level */
-	uint		maxleafents;	/* max leaf entries possible */
+	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
 	int		maxrootrecs;	/* max records in root block */
 	int		minleafrecs;	/* min records in leaf block */
 	int		minnoderecs;	/* min records in node block */
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index e6f9bdc4558f..5c95a5428fc7 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -336,7 +336,7 @@ xfs_dinode_verify_fork(
 	struct xfs_mount	*mp,
 	int			whichfork)
 {
-	uint32_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
+	xfs_extnum_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
 	xfs_extnum_t		max_extents;
 
 	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index e136c29a0ec1..a17c4d87520a 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -105,7 +105,7 @@ xfs_iformat_extents(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	int			state = xfs_bmap_fork_to_state(whichfork);
-	int			nex = XFS_DFORK_NEXTENTS(dip, whichfork);
+	xfs_extnum_t		nex = XFS_DFORK_NEXTENTS(dip, whichfork);
 	int			size = nex * sizeof(xfs_bmbt_rec_t);
 	struct xfs_iext_cursor	icur;
 	struct xfs_bmbt_rec	*dp;
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index eac15af7b08c..87925761e174 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -232,7 +232,7 @@ xchk_dinode(
 	size_t			fork_recs;
 	unsigned long long	isize;
 	uint64_t		flags2;
-	uint32_t		nextents;
+	xfs_extnum_t		nextents;
 	prid_t			prid;
 	uint16_t		flags;
 	uint16_t		mode;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 4a8076ef8cb4..3153db29de40 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2169,7 +2169,7 @@ DECLARE_EVENT_CLASS(xfs_swap_extent_class,
 		__field(int, which)
 		__field(xfs_ino_t, ino)
 		__field(int, format)
-		__field(int, nex)
+		__field(xfs_extnum_t, nex)
 		__field(int, broot_size)
 		__field(int, fork_off)
 	),
-- 
2.30.2



* [PATCH V7 04/17] xfs: Introduce xfs_dfork_nextents() helper
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (2 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 03/17] xfs: Use xfs_extnum_t instead of basic data types Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  1:43   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 05/17] xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextents Chandan Babu R
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

This commit replaces the macro XFS_DFORK_NEXTENTS() with the helper function
xfs_dfork_nextents(). As of this commit, xfs_dfork_nextents() returns the same
value as XFS_DFORK_NEXTENTS(). A future commit which extends inode's extent
counter fields will add more logic to this helper.

This commit also replaces direct accesses to xfs_dinode->di_[a]nextents
with calls to xfs_dfork_nextents().

No functional changes have been made.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h     |  4 ----
 fs/xfs/libxfs/xfs_inode_buf.c  | 16 +++++++++++-----
 fs/xfs/libxfs/xfs_inode_fork.c | 10 ++++++----
 fs/xfs/libxfs/xfs_inode_fork.h | 32 ++++++++++++++++++++++++++++++++
 fs/xfs/scrub/inode.c           | 18 ++++++++++--------
 5 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index d75e5b16da7e..e5654b578ec0 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -925,10 +925,6 @@ enum xfs_dinode_fmt {
 	((w) == XFS_DATA_FORK ? \
 		(dip)->di_format : \
 		(dip)->di_aformat)
-#define XFS_DFORK_NEXTENTS(dip,w) \
-	((w) == XFS_DATA_FORK ? \
-		be32_to_cpu((dip)->di_nextents) : \
-		be16_to_cpu((dip)->di_anextents))
 
 /*
  * For block and character special files the 32bit dev_t is stored at the
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 5c95a5428fc7..860d32816909 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -336,9 +336,11 @@ xfs_dinode_verify_fork(
 	struct xfs_mount	*mp,
 	int			whichfork)
 {
-	xfs_extnum_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
+	xfs_extnum_t		di_nextents;
 	xfs_extnum_t		max_extents;
 
+	di_nextents = xfs_dfork_nextents(dip, whichfork);
+
 	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
 	case XFS_DINODE_FMT_LOCAL:
 		/*
@@ -405,6 +407,8 @@ xfs_dinode_verify(
 	uint16_t		flags;
 	uint64_t		flags2;
 	uint64_t		di_size;
+	xfs_extnum_t            nextents;
+	xfs_filblks_t		nblocks;
 
 	if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))
 		return __this_address;
@@ -435,10 +439,12 @@ xfs_dinode_verify(
 	if ((S_ISLNK(mode) || S_ISDIR(mode)) && di_size == 0)
 		return __this_address;
 
+	nextents = xfs_dfork_data_extents(dip);
+	nextents += xfs_dfork_attr_extents(dip);
+	nblocks = be64_to_cpu(dip->di_nblocks);
+
 	/* Fork checks carried over from xfs_iformat_fork */
-	if (mode &&
-	    be32_to_cpu(dip->di_nextents) + be16_to_cpu(dip->di_anextents) >
-			be64_to_cpu(dip->di_nblocks))
+	if (mode && nextents > nblocks)
 		return __this_address;
 
 	if (mode && XFS_DFORK_BOFF(dip) > mp->m_sb.sb_inodesize)
@@ -495,7 +501,7 @@ xfs_dinode_verify(
 		default:
 			return __this_address;
 		}
-		if (dip->di_anextents)
+		if (xfs_dfork_attr_extents(dip))
 			return __this_address;
 	}
 
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index a17c4d87520a..829739e249b6 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -105,7 +105,7 @@ xfs_iformat_extents(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	int			state = xfs_bmap_fork_to_state(whichfork);
-	xfs_extnum_t		nex = XFS_DFORK_NEXTENTS(dip, whichfork);
+	xfs_extnum_t		nex = xfs_dfork_nextents(dip, whichfork);
 	int			size = nex * sizeof(xfs_bmbt_rec_t);
 	struct xfs_iext_cursor	icur;
 	struct xfs_bmbt_rec	*dp;
@@ -230,7 +230,7 @@ xfs_iformat_data_fork(
 	 * depend on it.
 	 */
 	ip->i_df.if_format = dip->di_format;
-	ip->i_df.if_nextents = be32_to_cpu(dip->di_nextents);
+	ip->i_df.if_nextents = xfs_dfork_data_extents(dip);
 
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFIFO:
@@ -295,14 +295,16 @@ xfs_iformat_attr_fork(
 	struct xfs_inode	*ip,
 	struct xfs_dinode	*dip)
 {
+	xfs_extnum_t		naextents;
 	int			error = 0;
 
+	naextents = xfs_dfork_attr_extents(dip);
+
 	/*
 	 * Initialize the extent count early, as the per-format routines may
 	 * depend on it.
 	 */
-	ip->i_afp = xfs_ifork_alloc(dip->di_aformat,
-				be16_to_cpu(dip->di_anextents));
+	ip->i_afp = xfs_ifork_alloc(dip->di_aformat, naextents);
 
 	switch (ip->i_afp->if_format) {
 	case XFS_DINODE_FMT_LOCAL:
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 2605f7ff8fc1..7ed2ecb51bca 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -141,6 +141,38 @@ static inline xfs_extnum_t xfs_iext_max_nextents(int whichfork)
 	return MAXAEXTNUM;
 }
 
+static inline xfs_extnum_t
+xfs_dfork_data_extents(
+	struct xfs_dinode	*dip)
+{
+	return be32_to_cpu(dip->di_nextents);
+}
+
+static inline xfs_extnum_t
+xfs_dfork_attr_extents(
+	struct xfs_dinode	*dip)
+{
+	return be16_to_cpu(dip->di_anextents);
+}
+
+static inline xfs_extnum_t
+xfs_dfork_nextents(
+	struct xfs_dinode	*dip,
+	int			whichfork)
+{
+	switch (whichfork) {
+	case XFS_DATA_FORK:
+		return xfs_dfork_data_extents(dip);
+	case XFS_ATTR_FORK:
+		return xfs_dfork_attr_extents(dip);
+	default:
+		ASSERT(0);
+		break;
+	}
+
+	return 0;
+}
+
 struct xfs_ifork *xfs_ifork_alloc(enum xfs_dinode_fmt format,
 				xfs_extnum_t nextents);
 struct xfs_ifork *xfs_iext_state_to_fork(struct xfs_inode *ip, int state);
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index 87925761e174..edad5307e430 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -233,6 +233,7 @@ xchk_dinode(
 	unsigned long long	isize;
 	uint64_t		flags2;
 	xfs_extnum_t		nextents;
+	xfs_extnum_t		naextents;
 	prid_t			prid;
 	uint16_t		flags;
 	uint16_t		mode;
@@ -391,7 +392,7 @@ xchk_dinode(
 	xchk_inode_extsize(sc, dip, ino, mode, flags);
 
 	/* di_nextents */
-	nextents = be32_to_cpu(dip->di_nextents);
+	nextents = xfs_dfork_data_extents(dip);
 	fork_recs =  XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
 	switch (dip->di_format) {
 	case XFS_DINODE_FMT_EXTENTS:
@@ -408,10 +409,12 @@ xchk_dinode(
 		break;
 	}
 
+	naextents = xfs_dfork_attr_extents(dip);
+
 	/* di_forkoff */
 	if (XFS_DFORK_APTR(dip) >= (char *)dip + mp->m_sb.sb_inodesize)
 		xchk_ino_set_corrupt(sc, ino);
-	if (dip->di_anextents != 0 && dip->di_forkoff == 0)
+	if (naextents != 0 && dip->di_forkoff == 0)
 		xchk_ino_set_corrupt(sc, ino);
 	if (dip->di_forkoff == 0 && dip->di_aformat != XFS_DINODE_FMT_EXTENTS)
 		xchk_ino_set_corrupt(sc, ino);
@@ -423,19 +426,18 @@ xchk_dinode(
 		xchk_ino_set_corrupt(sc, ino);
 
 	/* di_anextents */
-	nextents = be16_to_cpu(dip->di_anextents);
 	fork_recs =  XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
 	switch (dip->di_aformat) {
 	case XFS_DINODE_FMT_EXTENTS:
-		if (nextents > fork_recs)
+		if (naextents > fork_recs)
 			xchk_ino_set_corrupt(sc, ino);
 		break;
 	case XFS_DINODE_FMT_BTREE:
-		if (nextents <= fork_recs)
+		if (naextents <= fork_recs)
 			xchk_ino_set_corrupt(sc, ino);
 		break;
 	default:
-		if (nextents != 0)
+		if (naextents != 0)
 			xchk_ino_set_corrupt(sc, ino);
 	}
 
@@ -513,14 +515,14 @@ xchk_inode_xref_bmap(
 			&nextents, &count);
 	if (!xchk_should_check_xref(sc, &error, NULL))
 		return;
-	if (nextents < be32_to_cpu(dip->di_nextents))
+	if (nextents < xfs_dfork_data_extents(dip))
 		xchk_ino_xref_set_corrupt(sc, sc->ip->i_ino);
 
 	error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK,
 			&nextents, &acount);
 	if (!xchk_should_check_xref(sc, &error, NULL))
 		return;
-	if (nextents != be16_to_cpu(dip->di_anextents))
+	if (nextents != xfs_dfork_attr_extents(dip))
 		xchk_ino_xref_set_corrupt(sc, sc->ip->i_ino);
 
 	/* Check nblocks against the inode. */
-- 
2.30.2



* [PATCH V7 05/17] xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextents
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (3 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 04/17] xfs: Introduce xfs_dfork_nextents() helper Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  1:44   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 06/17] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively Chandan Babu R
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

A future commit will increase the width of xfs_extnum_t in order to facilitate
larger per-inode extent counters. Hence this patch now uses basic types to
define xfs_log_dinode->[di_nextents|di_anextents].

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_log_format.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index b322db523d65..fd66e70248f7 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -396,8 +396,8 @@ struct xfs_log_dinode {
 	xfs_fsize_t	di_size;	/* number of bytes in file */
 	xfs_rfsblock_t	di_nblocks;	/* # of direct & btree blocks used */
 	xfs_extlen_t	di_extsize;	/* basic/minimum extent size for file */
-	xfs_extnum_t	di_nextents;	/* number of extents in data fork */
-	xfs_aextnum_t	di_anextents;	/* number of extents in attribute fork*/
+	uint32_t	di_nextents;	/* number of extents in data fork */
+	uint16_t	di_anextents;	/* number of extents in attribute fork*/
 	uint8_t		di_forkoff;	/* attr fork offs, <<3 for 64b align */
 	int8_t		di_aformat;	/* format of attr fork's data */
 	uint32_t	di_dmevmask;	/* DMIG event mask */
-- 
2.30.2



* [PATCH V7 06/17] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (4 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 05/17] xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextents Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  1:29   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 07/17] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit Chandan Babu R
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david, kernel test robot

A future commit will introduce a 64-bit on-disk data extent counter and a
32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and
xfs_aextnum_t to 64 and 32-bits in order to correctly handle in-core versions
of these quantities.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c       | 6 +++---
 fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
 fs/xfs/libxfs/xfs_inode_fork.h | 2 +-
 fs/xfs/libxfs/xfs_types.h      | 4 ++--
 fs/xfs/xfs_inode.c             | 4 ++--
 fs/xfs/xfs_trace.h             | 2 +-
 6 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 98541be873d8..9df98339a43a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -52,9 +52,9 @@ xfs_bmap_compute_maxlevels(
 	xfs_mount_t	*mp,		/* file system mount structure */
 	int		whichfork)	/* data or attr fork */
 {
+	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
 	int		level;		/* btree level */
 	uint		maxblocks;	/* max blocks at this level */
-	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
 	int		maxrootrecs;	/* max records in root block */
 	int		minleafrecs;	/* min records in leaf block */
 	int		minnoderecs;	/* min records in node block */
@@ -83,7 +83,7 @@ xfs_bmap_compute_maxlevels(
 	maxrootrecs = xfs_bmdr_maxrecs(sz, 0);
 	minleafrecs = mp->m_bmap_dmnr[0];
 	minnoderecs = mp->m_bmap_dmnr[1];
-	maxblocks = (maxleafents + minleafrecs - 1) / minleafrecs;
+	maxblocks = howmany_64(maxleafents, minleafrecs);
 	for (level = 1; maxblocks > 1; level++) {
 		if (maxblocks <= maxrootrecs)
 			maxblocks = 1;
@@ -467,7 +467,7 @@ xfs_bmap_check_leaf_extents(
 	if (bp_release)
 		xfs_trans_brelse(NULL, bp);
 error_norelse:
-	xfs_warn(mp, "%s: BAD after btree leaves for %d extents",
+	xfs_warn(mp, "%s: BAD after btree leaves for %llu extents",
 		__func__, i);
 	xfs_err(mp, "%s: CORRUPTED BTREE OR SOMETHING", __func__);
 	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 829739e249b6..ce690abe5dce 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -117,7 +117,7 @@ xfs_iformat_extents(
 	 * we just bail out rather than crash in kmem_alloc() or memcpy() below.
 	 */
 	if (unlikely(size < 0 || size > XFS_DFORK_SIZE(dip, mp, whichfork))) {
-		xfs_warn(ip->i_mount, "corrupt inode %Lu ((a)extents = %d).",
+		xfs_warn(ip->i_mount, "corrupt inode %llu ((a)extents = %llu).",
 			(unsigned long long) ip->i_ino, nex);
 		xfs_inode_verifier_error(ip, -EFSCORRUPTED,
 				"xfs_iformat_extents(1)", dip, sizeof(*dip),
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 7ed2ecb51bca..4a8b77d425df 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -21,9 +21,9 @@ struct xfs_ifork {
 		void		*if_root;	/* extent tree root */
 		char		*if_data;	/* inline file data */
 	} if_u1;
+	xfs_extnum_t		if_nextents;	/* # of extents in this fork */
 	short			if_broot_bytes;	/* bytes allocated for root */
 	int8_t			if_format;	/* format of this fork */
-	xfs_extnum_t		if_nextents;	/* # of extents in this fork */
 };
 
 /*
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 794a54cbd0de..373f64a492a4 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -12,8 +12,8 @@ typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
 typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
 typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
 typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
-typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
-typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
+typedef uint64_t	xfs_extnum_t;	/* # of extents in a file */
+typedef uint32_t	xfs_aextnum_t;	/* # extents in an attribute fork */
 typedef int64_t		xfs_fsize_t;	/* bytes in a file */
 typedef uint64_t	xfs_ufsize_t;	/* unsigned bytes in a file */
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 04bf467b1090..6810c4feaa45 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3495,8 +3495,8 @@ xfs_iflush(
 	if (XFS_TEST_ERROR(ip->i_df.if_nextents + xfs_ifork_nextents(ip->i_afp) >
 				ip->i_nblocks, mp, XFS_ERRTAG_IFLUSH_5)) {
 		xfs_alert_tag(mp, XFS_PTAG_IFLUSH,
-			"%s: detected corrupt incore inode %Lu, "
-			"total extents = %d, nblocks = %Ld, ptr "PTR_FMT,
+			"%s: detected corrupt incore inode %llu, "
+			"total extents = %llu, nblocks = %lld, ptr "PTR_FMT,
 			__func__, ip->i_ino,
 			ip->i_df.if_nextents + xfs_ifork_nextents(ip->i_afp),
 			ip->i_nblocks, ip);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 3153db29de40..6b4a7f197308 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2182,7 +2182,7 @@ DECLARE_EVENT_CLASS(xfs_swap_extent_class,
 		__entry->broot_size = ip->i_df.if_broot_bytes;
 		__entry->fork_off = XFS_IFORK_BOFF(ip);
 	),
-	TP_printk("dev %d:%d ino 0x%llx (%s), %s format, num_extents %d, "
+	TP_printk("dev %d:%d ino 0x%llx (%s), %s format, num_extents %llu, "
 		  "broot size %d, forkoff 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
-- 
2.30.2



* [PATCH V7 07/17] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (5 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 06/17] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  1:57   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 08/17] xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64 Chandan Babu R
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

The XFS_SB_FEAT_INCOMPAT_NREXT64 incompat feature bit will be set on
filesystems which support large per-inode extent counters. This commit
defines the new incompat feature bit and the corresponding per-fs feature
bit, XFS_FEAT_NREXT64 (along with the inline helper that tests it).

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h | 1 +
 fs/xfs/libxfs/xfs_sb.c     | 3 +++
 fs/xfs/xfs_mount.h         | 2 ++
 3 files changed, 6 insertions(+)
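
For illustration only (not part of the patch): a minimal userspace sketch of
how the on-disk incompat bit is translated into the in-core feature bit and
then tested. The bit positions are copied from the diff below; the struct and
every sketch_* name are hypothetical stand-ins for the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in the patch; the names are hypothetical. */
#define SKETCH_SB_FEAT_INCOMPAT_NREXT64	(1u << 5)	/* on-disk superblock bit */
#define SKETCH_FEAT_NREXT64		(1ULL << 26)	/* in-core per-fs bit */

/* Only the superblock field relevant here. */
struct sketch_sb {
	uint32_t	sb_features_incompat;
};

/* Mirrors the hunk added to xfs_sb_version_to_features(). */
static inline uint64_t sketch_sb_to_features(const struct sketch_sb *sbp)
{
	uint64_t	features = 0;

	if (sbp->sb_features_incompat & SKETCH_SB_FEAT_INCOMPAT_NREXT64)
		features |= SKETCH_FEAT_NREXT64;
	return features;
}

/* Roughly what a helper such as xfs_has_nrext64() tests. */
static inline bool sketch_has_nrext64(uint64_t features)
{
	return (features & SKETCH_FEAT_NREXT64) != 0;
}
```

The on-disk and in-core bits deliberately live in different bit positions, so
callers should test the in-core word rather than the superblock field.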

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index e5654b578ec0..7972cbc22608 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -372,6 +372,7 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
 #define XFS_SB_FEAT_INCOMPAT_BIGTIME	(1 << 3)	/* large timestamps */
 #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4)	/* needs xfs_repair */
+#define XFS_SB_FEAT_INCOMPAT_NREXT64	(1 << 5)	/* 64-bit data fork extent counter */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
 		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index f4e84aa1d50a..bd632389ae92 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -124,6 +124,9 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_BIGTIME;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR)
 		features |= XFS_FEAT_NEEDSREPAIR;
+	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NREXT64)
+		features |= XFS_FEAT_NREXT64;
+
 	return features;
 }
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 00720a02e761..10941481f7e6 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -276,6 +276,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_INOBTCNT	(1ULL << 23)	/* inobt block counts */
 #define XFS_FEAT_BIGTIME	(1ULL << 24)	/* large timestamps */
 #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
+#define XFS_FEAT_NREXT64	(1ULL << 26)	/* 64-bit inode extent counters */
 
 /* Mount features */
 #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
@@ -338,6 +339,7 @@ __XFS_HAS_FEAT(realtime, REALTIME)
 __XFS_HAS_FEAT(inobtcounts, INOBTCNT)
 __XFS_HAS_FEAT(bigtime, BIGTIME)
 __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
+__XFS_HAS_FEAT(nrext64, NREXT64)
 
 /*
  * Mount features
-- 
2.30.2



* [PATCH V7 08/17] xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (6 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 07/17] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  1:58   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 09/17] xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers Chandan Babu R
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

XFS_FSOP_GEOM_FLAGS_NREXT64 indicates that the current filesystem instance
supports 64-bit per-inode extent counters.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h | 1 +
 fs/xfs/libxfs/xfs_sb.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 505533c43a92..2204d49d0c3a 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -236,6 +236,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_REFLINK	(1 << 20) /* files can share blocks */
 #define XFS_FSOP_GEOM_FLAGS_BIGTIME	(1 << 21) /* 64-bit nsec timestamps */
 #define XFS_FSOP_GEOM_FLAGS_INOBTCNT	(1 << 22) /* inobt btree counter */
+#define XFS_FSOP_GEOM_FLAGS_NREXT64	(1 << 23) /* 64-bit extent counter */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index bd632389ae92..0c1add39177f 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1138,6 +1138,8 @@ xfs_fs_geometry(
 	} else {
 		geo->logsectsize = BBSIZE;
 	}
+	if (xfs_has_nrext64(mp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_NREXT64;
 	geo->rtsectsize = sbp->sb_blocksize;
 	geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
 
-- 
2.30.2



* [PATCH V7 09/17] xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (7 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 08/17] xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64 Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-01 10:39 ` [PATCH V7 10/17] xfs: Use xfs_rfsblock_t to count maximum blocks that can be used by BMBT Chandan Babu R
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

This commit adds the new per-inode flag XFS_DIFLAG2_NREXT64 to indicate that
an inode supports 64-bit extent counters. This flag is also enabled by default
on newly created inodes when the corresponding filesystem has the large extent
counter feature bit (i.e. XFS_FEAT_NREXT64) set.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h      | 10 +++++++++-
 fs/xfs/libxfs/xfs_ialloc.c      |  2 ++
 fs/xfs/xfs_inode.h              |  5 +++++
 fs/xfs/xfs_inode_item_recover.c |  6 ++++++
 4 files changed, 22 insertions(+), 1 deletion(-)
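
For illustration only (not part of the patch): a minimal userspace sketch of
why the on-disk helper compares against cpu_to_be64(flag) while the log
dinode helper tests the flag directly. On-disk di_flags2 is big-endian, the
log dinode copy is host-endian. All sketch_* names and both structs are
hypothetical; sketch_cpu_to_be64() is a portable stand-in for cpu_to_be64().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_DIFLAG2_NREXT64_BIT	4
#define SKETCH_DIFLAG2_NREXT64		(1ULL << SKETCH_DIFLAG2_NREXT64_BIT)

/* Portable stand-in for cpu_to_be64(): lay the bytes out MSB first. */
static inline uint64_t sketch_cpu_to_be64(uint64_t v)
{
	uint8_t		b[8];
	uint64_t	out;

	for (int i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (56 - 8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

struct sketch_dinode {		/* on-disk form */
	uint8_t		di_version;
	uint64_t	di_flags2;	/* big-endian */
};

struct sketch_log_dinode {	/* logged form */
	uint8_t		di_version;
	uint64_t	di_flags2;	/* host-endian */
};

/* Convert the constant once instead of converting the field. */
static inline bool sketch_dinode_has_nrext64(const struct sketch_dinode *dip)
{
	return dip->di_version >= 3 &&
	       (dip->di_flags2 & sketch_cpu_to_be64(SKETCH_DIFLAG2_NREXT64)) != 0;
}

static inline bool
sketch_log_dinode_has_nrext64(const struct sketch_log_dinode *ld)
{
	return ld->di_version >= 3 &&
	       (ld->di_flags2 & SKETCH_DIFLAG2_NREXT64) != 0;
}
```

Both helpers also require di_version >= 3, since di_flags2 is only meaningful
on V3 inodes.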

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 7972cbc22608..9934c320bf01 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -992,15 +992,17 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_REFLINK_BIT	1	/* file's blocks may be shared */
 #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
 #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
+#define XFS_DIFLAG2_NREXT64_BIT 4	/* 64-bit extent counter enabled */
 
 #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
 #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
 #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
 #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
+#define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
 
 #define XFS_DIFLAG2_ANY \
 	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
-	 XFS_DIFLAG2_BIGTIME)
+	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
 
 static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
 {
@@ -1008,6 +1010,12 @@ static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
 	       (dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_BIGTIME));
 }
 
+static inline bool xfs_dinode_has_nrext64(const struct xfs_dinode *dip)
+{
+	return dip->di_version >= 3 &&
+	       (dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_NREXT64));
+}
+
 /*
  * Inode number format:
  * low inopblog bits - offset in block
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index b418fe0c0679..1d2ba51483ec 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2772,6 +2772,8 @@ xfs_ialloc_setup_geometry(
 	igeo->new_diflags2 = 0;
 	if (xfs_has_bigtime(mp))
 		igeo->new_diflags2 |= XFS_DIFLAG2_BIGTIME;
+	if (xfs_has_nrext64(mp))
+		igeo->new_diflags2 |= XFS_DIFLAG2_NREXT64;
 
 	/* Compute inode btree geometry. */
 	igeo->agino_log = sbp->sb_inopblog + sbp->sb_agblklog;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index b7e8f14d9fca..ee54a775a340 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -218,6 +218,11 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
 	return ip->i_diflags2 & XFS_DIFLAG2_BIGTIME;
 }
 
+static inline bool xfs_inode_has_nrext64(struct xfs_inode *ip)
+{
+	return ip->i_diflags2 & XFS_DIFLAG2_NREXT64;
+}
+
 /*
  * Return the buftarg used for data allocations on a given inode.
  */
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
index 239dd2e3384e..767a551816a0 100644
--- a/fs/xfs/xfs_inode_item_recover.c
+++ b/fs/xfs/xfs_inode_item_recover.c
@@ -142,6 +142,12 @@ xfs_log_dinode_to_disk_ts(
 	return ts;
 }
 
+static inline bool xfs_log_dinode_has_nrext64(const struct xfs_log_dinode *ld)
+{
+	return ld->di_version >= 3 &&
+	       (ld->di_flags2 & XFS_DIFLAG2_NREXT64);
+}
+
 STATIC void
 xfs_log_dinode_to_disk(
 	struct xfs_log_dinode	*from,
-- 
2.30.2



* [PATCH V7 10/17] xfs: Use xfs_rfsblock_t to count maximum blocks that can be used by BMBT
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (8 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 09/17] xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  2:09   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 11/17] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david, kernel test robot

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
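
For illustration only (not part of the patch): a minimal userspace sketch of
the level computation in xfs_bmap_compute_maxlevels(), with maxblocks held in
a 64-bit variable as this patch does. sketch_bmap_maxlevels() and
sketch_ceil_div() are hypothetical names, and the record counts used in the
usage note (9 root records, 125 records per block) are assumed figures rather
than values taken from any real geometry.

```c
#include <stdint.h>

/* 64-bit ceiling division, standing in for howmany_64(). */
static inline uint64_t sketch_ceil_div(uint64_t x, uint32_t y)
{
	return (x + y - 1) / y;
}

/*
 * Count btree levels needed to index maxleafents extent records.
 * Assumes minleafrecs >= 1 and minnoderecs >= 2 so the loop terminates.
 */
static inline int sketch_bmap_maxlevels(uint64_t maxleafents,
					uint32_t maxrootrecs,
					uint32_t minleafrecs,
					uint32_t minnoderecs)
{
	uint64_t	maxblocks;	/* max blocks at this level */
	int		level;

	maxblocks = sketch_ceil_div(maxleafents, minleafrecs);
	for (level = 1; maxblocks > 1; level++) {
		if (maxblocks <= maxrootrecs)
			maxblocks = 1;	/* remaining pointers fit in the root */
		else
			maxblocks = sketch_ceil_div(maxblocks, minnoderecs);
	}
	return level;
}
```

With a 48-bit extent count and 125 records per leaf, the first value of
maxblocks is already larger than UINT32_MAX, which is why a plain uint no
longer suffices. Under the assumed figures the tree also needs more levels
than it does for the old 2^31 - 1 limit.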

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 9df98339a43a..a01d9a9225ae 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -53,8 +53,8 @@ xfs_bmap_compute_maxlevels(
 	int		whichfork)	/* data or attr fork */
 {
 	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
+	xfs_rfsblock_t	maxblocks;	/* max blocks at this level */
 	int		level;		/* btree level */
-	uint		maxblocks;	/* max blocks at this level */
 	int		maxrootrecs;	/* max records in root block */
 	int		minleafrecs;	/* min records in leaf block */
 	int		minnoderecs;	/* min records in node block */
@@ -88,7 +88,7 @@ xfs_bmap_compute_maxlevels(
 		if (maxblocks <= maxrootrecs)
 			maxblocks = 1;
 		else
-			maxblocks = (maxblocks + minnoderecs - 1) / minnoderecs;
+			maxblocks = howmany_64(maxblocks, minnoderecs);
 	}
 	mp->m_bm_maxlevels[whichfork] = level;
 	ASSERT(mp->m_bm_maxlevels[whichfork] <= xfs_bmbt_maxlevels_ondisk());
-- 
2.30.2



* [PATCH V7 11/17] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (9 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 10/17] xfs: Use xfs_rfsblock_t to count maximum blocks that can be used by BMBT Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  2:32   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 12/17] xfs: Introduce per-inode 64-bit extent counters Chandan Babu R
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

This commit defines new macros to represent the maximum extent counts allowed
by filesystems which support large per-inode extent counters, and makes
xfs_iext_max_nextents() choose between the new and the old limits.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c       |  8 +++-----
 fs/xfs/libxfs/xfs_bmap_btree.c |  2 +-
 fs/xfs/libxfs/xfs_format.h     | 20 ++++++++++++++++----
 fs/xfs/libxfs/xfs_inode_buf.c  |  3 ++-
 fs/xfs/libxfs/xfs_inode_fork.c |  2 +-
 fs/xfs/libxfs/xfs_inode_fork.h | 19 +++++++++++++++----
 6 files changed, 38 insertions(+), 16 deletions(-)
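
For illustration only (not part of the patch): a minimal userspace sketch of
the limit selection that xfs_iext_max_nextents() performs after this change,
together with the arithmetic behind the 48-bit data fork limit. The limit
values are copied from the diff below; the enum and all SKETCH_/sketch_ names
are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_fork { SKETCH_DATA_FORK, SKETCH_ATTR_FORK, SKETCH_COW_FORK };

#define SKETCH_MAX_EXTCNT_DATA_FORK	((1ULL << 48) - 1)	/* unsigned 48 bits */
#define SKETCH_MAX_EXTCNT_ATTR_FORK	((1ULL << 32) - 1)	/* unsigned 32 bits */
#define SKETCH_MAX_EXTCNT_DATA_FORK_OLD	((1ULL << 31) - 1)	/* signed 32 bits */
#define SKETCH_MAX_EXTCNT_ATTR_FORK_OLD	((1ULL << 15) - 1)	/* signed 16 bits */

/* Pick the limit from the fork type and the large-counter setting. */
static inline uint64_t sketch_iext_max_nextents(bool has_nrext64,
						enum sketch_fork whichfork)
{
	switch (whichfork) {
	case SKETCH_DATA_FORK:
	case SKETCH_COW_FORK:
		return has_nrext64 ? SKETCH_MAX_EXTCNT_DATA_FORK
				   : SKETCH_MAX_EXTCNT_DATA_FORK_OLD;
	case SKETCH_ATTR_FORK:
		return has_nrext64 ? SKETCH_MAX_EXTCNT_ATTR_FORK
				   : SKETCH_MAX_EXTCNT_ATTR_FORK_OLD;
	}
	return 0;	/* unknown fork */
}

/* 2^63 (max file size) / 64k (max block size) = 2^47 single-block extents. */
static inline uint64_t sketch_max_single_block_extents(void)
{
	return (1ULL << 63) / (64 * 1024);
}
```

Note that the callers in the diff pass different sources for has_nrext64: the
mount-wide feature when sizing the btree, and the per-inode flag when
verifying or growing an individual inode.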

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a01d9a9225ae..be7f8ebe3cd5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -61,10 +61,8 @@ xfs_bmap_compute_maxlevels(
 	int		sz;		/* root block size */
 
 	/*
-	 * The maximum number of extents in a file, hence the maximum number of
-	 * leaf entries, is controlled by the size of the on-disk extent count,
-	 * either a signed 32-bit number for the data fork, or a signed 16-bit
-	 * number for the attr fork.
+	 * The maximum number of extents in a fork, hence the maximum number of
+	 * leaf entries, is controlled by the size of the on-disk extent count.
 	 *
 	 * Note that we can no longer assume that if we are in ATTR1 that the
 	 * fork offset of all the inodes will be
@@ -74,7 +72,7 @@ xfs_bmap_compute_maxlevels(
 	 * ATTR2 we have to assume the worst case scenario of a minimum size
 	 * available.
 	 */
-	maxleafents = xfs_iext_max_nextents(whichfork);
+	maxleafents = xfs_iext_max_nextents(xfs_has_nrext64(mp), whichfork);
 	if (whichfork == XFS_DATA_FORK)
 		sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
 	else
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 453309fc85f2..e8d21d69b9ff 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -611,7 +611,7 @@ xfs_bmbt_maxlevels_ondisk(void)
 	minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
 
 	/* One extra level for the inode root. */
-	return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
+	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_EXTCNT_DATA_FORK) + 1;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 9934c320bf01..d3dfd45c39e0 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -872,10 +872,22 @@ enum xfs_dinode_fmt {
 
 /*
  * Max values for extlen, extnum, aextnum.
- */
-#define	MAXEXTLEN	((xfs_extlen_t)0x001fffff)	/* 21 bits */
-#define	MAXEXTNUM	((xfs_extnum_t)0x7fffffff)	/* signed int */
-#define	MAXAEXTNUM	((xfs_aextnum_t)0x7fff)		/* signed short */
+ *
+ * The newly introduced data fork extent counter is a 64-bit field. However, the
+ * maximum number of extents in a file is limited to 2^54 extents (assuming one
+ * block per extent) by the 54-bit wide startoff field of an extent record.
+ *
+ * A further limitation applies as shown below,
+ * 2^63 (max file size) / 64k (max block size) = 2^47
+ *
+ * Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
+ * the data fork extent counter is 48 bits wide, i.e. at most 2^48 - 1 extents.
+ */
+#define	MAXEXTLEN			((xfs_extlen_t)((1ULL << 21) - 1)) /* 21 bits */
+#define XFS_MAX_EXTCNT_DATA_FORK	((xfs_extnum_t)((1ULL << 48) - 1)) /* Unsigned 48-bits */
+#define XFS_MAX_EXTCNT_ATTR_FORK	((xfs_extnum_t)((1ULL << 32) - 1)) /* Unsigned 32-bits */
+#define XFS_MAX_EXTCNT_DATA_FORK_OLD	((xfs_extnum_t)((1ULL << 31) - 1)) /* Signed 32-bits */
+#define XFS_MAX_EXTCNT_ATTR_FORK_OLD	((xfs_extnum_t)((1ULL << 15) - 1)) /* Signed 16-bits */
 
 /*
  * Inode minimum and maximum sizes.
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 860d32816909..34f360a38603 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -361,7 +361,8 @@ xfs_dinode_verify_fork(
 			return __this_address;
 		break;
 	case XFS_DINODE_FMT_BTREE:
-		max_extents = xfs_iext_max_nextents(whichfork);
+		max_extents = xfs_iext_max_nextents(xfs_dinode_has_nrext64(dip),
+					whichfork);
 		if (di_nextents > max_extents)
 			return __this_address;
 		break;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index ce690abe5dce..a3a3b54f9c55 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -746,7 +746,7 @@ xfs_iext_count_may_overflow(
 	if (whichfork == XFS_COW_FORK)
 		return 0;
 
-	max_exts = xfs_iext_max_nextents(whichfork);
+	max_exts = xfs_iext_max_nextents(xfs_inode_has_nrext64(ip), whichfork);
 
 	if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
 		max_exts = 10;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 4a8b77d425df..e56803436c61 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -133,12 +133,23 @@ static inline int8_t xfs_ifork_format(struct xfs_ifork *ifp)
 	return ifp->if_format;
 }
 
-static inline xfs_extnum_t xfs_iext_max_nextents(int whichfork)
+static inline xfs_extnum_t xfs_iext_max_nextents(bool has_nrext64,
+				int whichfork)
 {
-	if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK)
-		return MAXEXTNUM;
+	switch (whichfork) {
+	case XFS_DATA_FORK:
+	case XFS_COW_FORK:
+		return has_nrext64 ? XFS_MAX_EXTCNT_DATA_FORK
+			: XFS_MAX_EXTCNT_DATA_FORK_OLD;
+
+	case XFS_ATTR_FORK:
+		return has_nrext64 ? XFS_MAX_EXTCNT_ATTR_FORK
+			: XFS_MAX_EXTCNT_ATTR_FORK_OLD;
 
-	return MAXAEXTNUM;
+	default:
+		ASSERT(0);
+		return 0;
+	}
 }
 
 static inline xfs_extnum_t
-- 
2.30.2



* [PATCH V7 12/17] xfs: Introduce per-inode 64-bit extent counters
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (10 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 11/17] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  7:14   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 13/17] xfs: xfs_growfs_rt_alloc: Unlock inode explicitly rather than through iop_committing() Chandan Babu R
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david, Dave Chinner

This commit introduces new fields in the on-disk inode format to support
64-bit data fork extent counters and 32-bit attribute fork extent
counters. The new fields are used only when an inode has the
XFS_DIFLAG2_NREXT64 flag set. Otherwise we continue to use the existing 32-bit
data fork extent counters and 16-bit attribute fork extent counters.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Suggested-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h      | 33 ++++++++++++--
 fs/xfs/libxfs/xfs_inode_buf.c   | 49 ++++++++++++++++++--
 fs/xfs/libxfs/xfs_inode_fork.h  |  6 +++
 fs/xfs/libxfs/xfs_log_format.h  | 33 ++++++++++++--
 fs/xfs/xfs_inode_item.c         | 23 ++++++++--
 fs/xfs/xfs_inode_item_recover.c | 79 ++++++++++++++++++++++++++++-----
 6 files changed, 196 insertions(+), 27 deletions(-)
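
For illustration only (not part of the patch): a minimal userspace sketch of
the two overlays this patch adds to the inode. The 8 bytes that used to hold
di_pad and di_flushiter now double as the 64-bit data fork counter, and the
old 32-bit data fork counter slot doubles as the attr fork counter. Field
names follow the diff below, but the sketch keeps values host-endian (as the
log dinode does), models the NREXT64 flag as a plain bool, and uses
hypothetical sketch_* type and function names. It relies on C11 anonymous
structs and the GCC/Clang packed attribute.

```c
#include <stdbool.h>
#include <stdint.h>

/* Overlay 1: former di_pad[6] + di_flushiter. */
union sketch_pad_or_big_nextents {
	uint64_t	di_big_nextents;	/* NREXT64: data fork extents */
	uint64_t	di_v3_pad;		/* V3 without NREXT64 */
	struct {				/* V2 inodes */
		uint8_t		di_v2_pad[6];
		uint16_t	di_flushiter;
	};
};

/* Overlay 2: former di_nextents + di_anextents. */
union sketch_small_counters {
	struct {				/* without NREXT64 */
		uint32_t	di_nextents;
		uint16_t	di_anextents;
	} __attribute__((packed));
	struct {				/* with NREXT64 */
		uint32_t	di_big_anextents;
		uint16_t	di_nrext64_pad;
	} __attribute__((packed));
} __attribute__((packed));

/* The overlays must not change the size of the fields they replace. */
_Static_assert(sizeof(union sketch_pad_or_big_nextents) == 8, "8-byte slot");
_Static_assert(sizeof(union sketch_small_counters) == 6, "6-byte slot");

struct sketch_dinode {
	bool					has_nrext64;
	union sketch_pad_or_big_nextents	hi;
	union sketch_small_counters		lo;
};

static inline uint64_t sketch_data_extents(const struct sketch_dinode *dip)
{
	return dip->has_nrext64 ? dip->hi.di_big_nextents
				: dip->lo.di_nextents;
}

static inline uint64_t sketch_attr_extents(const struct sketch_dinode *dip)
{
	return dip->has_nrext64 ? dip->lo.di_big_anextents
				: dip->lo.di_anextents;
}
```

Because the same bytes carry different meanings depending on the flag, every
reader has to check the flag first, which is what the xfs_dfork_data_extents()
and xfs_dfork_attr_extents() hunks below do.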

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index d3dfd45c39e0..1a5b194da191 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -792,16 +792,41 @@ struct xfs_dinode {
 	__be32		di_nlink;	/* number of links to file */
 	__be16		di_projid_lo;	/* lower part of owner's project id */
 	__be16		di_projid_hi;	/* higher part owner's project id */
-	__u8		di_pad[6];	/* unused, zeroed space */
-	__be16		di_flushiter;	/* incremented on flush */
+	union {
+		/* Number of data fork extents if NREXT64 is set */
+		__be64	di_big_nextents;
+
+		/* Padding for V3 inodes without NREXT64 set. */
+		__be64	di_v3_pad;
+
+		/* Padding and inode flush counter for V2 inodes. */
+		struct {
+			__u8	di_v2_pad[6];
+			__be16	di_flushiter;
+		};
+	};
 	xfs_timestamp_t	di_atime;	/* time last accessed */
 	xfs_timestamp_t	di_mtime;	/* time last modified */
 	xfs_timestamp_t	di_ctime;	/* time created/inode modified */
 	__be64		di_size;	/* number of bytes in file */
 	__be64		di_nblocks;	/* # of direct & btree blocks used */
 	__be32		di_extsize;	/* basic/minimum extent size for file */
-	__be32		di_nextents;	/* number of extents in data fork */
-	__be16		di_anextents;	/* number of extents in attribute fork*/
+	union {
+		/*
+		 * For V2 inodes and V3 inodes without NREXT64 set, this
+		 * is the number of data and attr fork extents.
+		 */
+		struct {
+			__be32	di_nextents;
+			__be16	di_anextents;
+		} __packed;
+
+		/* Number of attr fork extents if NREXT64 is set. */
+		struct {
+			__be32	di_big_anextents;
+			__be16	di_nrext64_pad;
+		} __packed;
+	} __packed;
 	__u8		di_forkoff;	/* attr fork offs, <<3 for 64b align */
 	__s8		di_aformat;	/* format of attr fork's data */
 	__be32		di_dmevmask;	/* DMIG event mask */
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 34f360a38603..a11d3ea5ebfe 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -279,6 +279,25 @@ xfs_inode_to_disk_ts(
 	return ts;
 }
 
+static inline void
+xfs_inode_to_disk_iext_counters(
+	struct xfs_inode	*ip,
+	struct xfs_dinode	*to)
+{
+	if (xfs_inode_has_nrext64(ip)) {
+		to->di_big_nextents = cpu_to_be64(xfs_ifork_nextents(&ip->i_df));
+		to->di_big_anextents = cpu_to_be32(xfs_ifork_nextents(ip->i_afp));
+		/*
+		 * We might be upgrading the inode to use larger extent counters
+		 * than was previously used. Hence zero the unused field.
+		 */
+		to->di_nrext64_pad = cpu_to_be16(0);
+	} else {
+		to->di_nextents = cpu_to_be32(xfs_ifork_nextents(&ip->i_df));
+		to->di_anextents = cpu_to_be16(xfs_ifork_nextents(ip->i_afp));
+	}
+}
+
 void
 xfs_inode_to_disk(
 	struct xfs_inode	*ip,
@@ -296,7 +315,6 @@ xfs_inode_to_disk(
 	to->di_projid_lo = cpu_to_be16(ip->i_projid & 0xffff);
 	to->di_projid_hi = cpu_to_be16(ip->i_projid >> 16);
 
-	memset(to->di_pad, 0, sizeof(to->di_pad));
 	to->di_atime = xfs_inode_to_disk_ts(ip, inode->i_atime);
 	to->di_mtime = xfs_inode_to_disk_ts(ip, inode->i_mtime);
 	to->di_ctime = xfs_inode_to_disk_ts(ip, inode->i_ctime);
@@ -307,8 +325,6 @@ xfs_inode_to_disk(
 	to->di_size = cpu_to_be64(ip->i_disk_size);
 	to->di_nblocks = cpu_to_be64(ip->i_nblocks);
 	to->di_extsize = cpu_to_be32(ip->i_extsize);
-	to->di_nextents = cpu_to_be32(xfs_ifork_nextents(&ip->i_df));
-	to->di_anextents = cpu_to_be16(xfs_ifork_nextents(ip->i_afp));
 	to->di_forkoff = ip->i_forkoff;
 	to->di_aformat = xfs_ifork_format(ip->i_afp);
 	to->di_flags = cpu_to_be16(ip->i_diflags);
@@ -323,11 +339,14 @@ xfs_inode_to_disk(
 		to->di_lsn = cpu_to_be64(lsn);
 		memset(to->di_pad2, 0, sizeof(to->di_pad2));
 		uuid_copy(&to->di_uuid, &ip->i_mount->m_sb.sb_meta_uuid);
-		to->di_flushiter = 0;
+		to->di_v3_pad = 0;
 	} else {
 		to->di_version = 2;
 		to->di_flushiter = cpu_to_be16(ip->i_flushiter);
+		memset(to->di_v2_pad, 0, sizeof(to->di_v2_pad));
 	}
+
+	xfs_inode_to_disk_iext_counters(ip, to);
 }
 
 static xfs_failaddr_t
@@ -397,6 +416,24 @@ xfs_dinode_verify_forkoff(
 	return NULL;
 }
 
+static xfs_failaddr_t
+xfs_dinode_verify_nrext64(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dip)
+{
+	if (xfs_dinode_has_nrext64(dip)) {
+		if (!xfs_has_nrext64(mp))
+			return __this_address;
+		if (dip->di_nrext64_pad != 0)
+			return __this_address;
+	} else if (dip->di_version >= 3) {
+		if (dip->di_v3_pad != 0)
+			return __this_address;
+	}
+
+	return NULL;
+}
+
 xfs_failaddr_t
 xfs_dinode_verify(
 	struct xfs_mount	*mp,
@@ -440,6 +477,10 @@ xfs_dinode_verify(
 	if ((S_ISLNK(mode) || S_ISDIR(mode)) && di_size == 0)
 		return __this_address;
 
+	fa = xfs_dinode_verify_nrext64(mp, dip);
+	if (fa)
+		return fa;
+
 	nextents = xfs_dfork_data_extents(dip);
 	nextents += xfs_dfork_attr_extents(dip);
 	nblocks = be64_to_cpu(dip->di_nblocks);
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index e56803436c61..8e6221e32660 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -156,6 +156,9 @@ static inline xfs_extnum_t
 xfs_dfork_data_extents(
 	struct xfs_dinode	*dip)
 {
+	if (xfs_dinode_has_nrext64(dip))
+		return be64_to_cpu(dip->di_big_nextents);
+
 	return be32_to_cpu(dip->di_nextents);
 }
 
@@ -163,6 +166,9 @@ static inline xfs_extnum_t
 xfs_dfork_attr_extents(
 	struct xfs_dinode	*dip)
 {
+	if (xfs_dinode_has_nrext64(dip))
+		return be32_to_cpu(dip->di_big_anextents);
+
 	return be16_to_cpu(dip->di_anextents);
 }
 
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index fd66e70248f7..12234a880e94 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -388,16 +388,41 @@ struct xfs_log_dinode {
 	uint32_t	di_nlink;	/* number of links to file */
 	uint16_t	di_projid_lo;	/* lower part of owner's project id */
 	uint16_t	di_projid_hi;	/* higher part of owner's project id */
-	uint8_t		di_pad[6];	/* unused, zeroed space */
-	uint16_t	di_flushiter;	/* incremented on flush */
+	union {
+		/* Number of data fork extents if NREXT64 is set */
+		uint64_t	di_big_nextents;
+
+		/* Padding for V3 inodes without NREXT64 set. */
+		uint64_t	di_v3_pad;
+
+		/* Padding and inode flush counter for V2 inodes. */
+		struct {
+			uint8_t	di_v2_pad[6];	/* V2 inode zeroed space */
+			uint16_t di_flushiter;	/* V2 inode incremented on flush */
+		};
+	};
 	xfs_log_timestamp_t di_atime;	/* time last accessed */
 	xfs_log_timestamp_t di_mtime;	/* time last modified */
 	xfs_log_timestamp_t di_ctime;	/* time created/inode modified */
 	xfs_fsize_t	di_size;	/* number of bytes in file */
 	xfs_rfsblock_t	di_nblocks;	/* # of direct & btree blocks used */
 	xfs_extlen_t	di_extsize;	/* basic/minimum extent size for file */
-	uint32_t	di_nextents;	/* number of extents in data fork */
-	uint16_t	di_anextents;	/* number of extents in attribute fork*/
+	union {
+		/*
+		 * For V2 inodes and V3 inodes without NREXT64 set, this
+		 * is the number of data and attr fork extents.
+		 */
+		struct {
+			uint32_t  di_nextents;
+			uint16_t  di_anextents;
+		} __packed;
+
+		/* Number of attr fork extents if NREXT64 is set. */
+		struct {
+			uint32_t  di_big_anextents;
+			uint16_t  di_nrext64_pad;
+		} __packed;
+	} __packed;
 	uint8_t		di_forkoff;	/* attr fork offs, <<3 for 64b align */
 	int8_t		di_aformat;	/* format of attr fork's data */
 	uint32_t	di_dmevmask;	/* DMIG event mask */
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 90d8e591baf8..0d2fe38dc6e5 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -358,6 +358,21 @@ xfs_copy_dm_fields_to_log_dinode(
 	}
 }
 
+static inline void
+xfs_inode_to_log_dinode_iext_counters(
+	struct xfs_inode	*ip,
+	struct xfs_log_dinode	*to)
+{
+	if (xfs_inode_has_nrext64(ip)) {
+		to->di_big_nextents = xfs_ifork_nextents(&ip->i_df);
+		to->di_big_anextents = xfs_ifork_nextents(ip->i_afp);
+		to->di_nrext64_pad = 0;
+	} else {
+		to->di_nextents = xfs_ifork_nextents(&ip->i_df);
+		to->di_anextents = xfs_ifork_nextents(ip->i_afp);
+	}
+}
+
 static void
 xfs_inode_to_log_dinode(
 	struct xfs_inode	*ip,
@@ -373,7 +388,6 @@ xfs_inode_to_log_dinode(
 	to->di_projid_lo = ip->i_projid & 0xffff;
 	to->di_projid_hi = ip->i_projid >> 16;
 
-	memset(to->di_pad, 0, sizeof(to->di_pad));
 	memset(to->di_pad3, 0, sizeof(to->di_pad3));
 	to->di_atime = xfs_inode_to_log_dinode_ts(ip, inode->i_atime);
 	to->di_mtime = xfs_inode_to_log_dinode_ts(ip, inode->i_mtime);
@@ -385,8 +399,6 @@ xfs_inode_to_log_dinode(
 	to->di_size = ip->i_disk_size;
 	to->di_nblocks = ip->i_nblocks;
 	to->di_extsize = ip->i_extsize;
-	to->di_nextents = xfs_ifork_nextents(&ip->i_df);
-	to->di_anextents = xfs_ifork_nextents(ip->i_afp);
 	to->di_forkoff = ip->i_forkoff;
 	to->di_aformat = xfs_ifork_format(ip->i_afp);
 	to->di_flags = ip->i_diflags;
@@ -406,11 +418,14 @@ xfs_inode_to_log_dinode(
 		to->di_lsn = lsn;
 		memset(to->di_pad2, 0, sizeof(to->di_pad2));
 		uuid_copy(&to->di_uuid, &ip->i_mount->m_sb.sb_meta_uuid);
-		to->di_flushiter = 0;
+		to->di_v3_pad = 0;
 	} else {
 		to->di_version = 2;
 		to->di_flushiter = ip->i_flushiter;
+		memset(to->di_v2_pad, 0, sizeof(to->di_v2_pad));
 	}
+
+	xfs_inode_to_log_dinode_iext_counters(ip, to);
 }
 
 /*
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
index 767a551816a0..c35796a4e9c5 100644
--- a/fs/xfs/xfs_inode_item_recover.c
+++ b/fs/xfs/xfs_inode_item_recover.c
@@ -148,6 +148,22 @@ static inline bool xfs_log_dinode_has_nrext64(const struct xfs_log_dinode *ld)
 	       (ld->di_flags2 & XFS_DIFLAG2_NREXT64);
 }
 
+static inline void
+xfs_log_dinode_to_disk_iext_counters(
+	struct xfs_log_dinode	*from,
+	struct xfs_dinode	*to)
+{
+	if (xfs_log_dinode_has_nrext64(from)) {
+		to->di_big_nextents = cpu_to_be64(from->di_big_nextents);
+		to->di_big_anextents = cpu_to_be32(from->di_big_anextents);
+		to->di_nrext64_pad = cpu_to_be16(from->di_nrext64_pad);
+	} else {
+		to->di_nextents = cpu_to_be32(from->di_nextents);
+		to->di_anextents = cpu_to_be16(from->di_anextents);
+	}
+
+}
+
 STATIC void
 xfs_log_dinode_to_disk(
 	struct xfs_log_dinode	*from,
@@ -164,7 +180,6 @@ xfs_log_dinode_to_disk(
 	to->di_nlink = cpu_to_be32(from->di_nlink);
 	to->di_projid_lo = cpu_to_be16(from->di_projid_lo);
 	to->di_projid_hi = cpu_to_be16(from->di_projid_hi);
-	memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
 
 	to->di_atime = xfs_log_dinode_to_disk_ts(from, from->di_atime);
 	to->di_mtime = xfs_log_dinode_to_disk_ts(from, from->di_mtime);
@@ -173,8 +188,6 @@ xfs_log_dinode_to_disk(
 	to->di_size = cpu_to_be64(from->di_size);
 	to->di_nblocks = cpu_to_be64(from->di_nblocks);
 	to->di_extsize = cpu_to_be32(from->di_extsize);
-	to->di_nextents = cpu_to_be32(from->di_nextents);
-	to->di_anextents = cpu_to_be16(from->di_anextents);
 	to->di_forkoff = from->di_forkoff;
 	to->di_aformat = from->di_aformat;
 	to->di_dmevmask = cpu_to_be32(from->di_dmevmask);
@@ -192,10 +205,13 @@ xfs_log_dinode_to_disk(
 		to->di_lsn = cpu_to_be64(lsn);
 		memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
 		uuid_copy(&to->di_uuid, &from->di_uuid);
-		to->di_flushiter = 0;
+		to->di_v3_pad = from->di_v3_pad;
 	} else {
 		to->di_flushiter = cpu_to_be16(from->di_flushiter);
+		memcpy(to->di_v2_pad, from->di_v2_pad, sizeof(to->di_v2_pad));
 	}
+
+	xfs_log_dinode_to_disk_iext_counters(from, to);
 }
 
 STATIC int
@@ -209,6 +225,8 @@ xlog_recover_inode_commit_pass2(
 	struct xfs_mount		*mp = log->l_mp;
 	struct xfs_buf			*bp;
 	struct xfs_dinode		*dip;
+	xfs_extnum_t			nextents;
+	xfs_aextnum_t			anextents;
 	int				len;
 	char				*src;
 	char				*dest;
@@ -348,21 +366,60 @@ xlog_recover_inode_commit_pass2(
 			goto out_release;
 		}
 	}
-	if (unlikely(ldip->di_nextents + ldip->di_anextents > ldip->di_nblocks)){
-		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(5)",
+
+	if (xfs_log_dinode_has_nrext64(ldip)) {
+		if (!xfs_has_nrext64(mp) || (ldip->di_nrext64_pad != 0)) {
+			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(5)",
+				     XFS_ERRLEVEL_LOW, mp, ldip,
+				     sizeof(*ldip));
+			xfs_alert(mp,
+				"%s: Bad inode log record, rec ptr "PTR_FMT", "
+				"dino ptr "PTR_FMT", dino bp "PTR_FMT", "
+				"ino %Ld, xfs_has_nrext64(mp) = %d, "
+				"ldip->di_nrext64_pad = %u",
+				__func__, item, dip, bp, in_f->ilf_ino,
+				xfs_has_nrext64(mp), ldip->di_nrext64_pad);
+			error = -EFSCORRUPTED;
+			goto out_release;
+		}
+	} else {
+		if (ldip->di_version == 3 && ldip->di_big_nextents != 0) {
+			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(6)",
+				     XFS_ERRLEVEL_LOW, mp, ldip,
+				     sizeof(*ldip));
+			xfs_alert(mp,
+				"%s: Bad inode log record, rec ptr "PTR_FMT", "
+				"dino ptr "PTR_FMT", dino bp "PTR_FMT", "
+				"ino %Ld, ldip->di_big_nextents = %llu",
+				__func__, item, dip, bp, in_f->ilf_ino,
+				ldip->di_big_nextents);
+			error = -EFSCORRUPTED;
+			goto out_release;
+		}
+	}
+
+	if (xfs_log_dinode_has_nrext64(ldip)) {
+		nextents = ldip->di_big_nextents;
+		anextents = ldip->di_big_anextents;
+	} else {
+		nextents = ldip->di_nextents;
+		anextents = ldip->di_anextents;
+	}
+
+	if (unlikely(nextents + anextents > ldip->di_nblocks)) {
+		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(7)",
 				     XFS_ERRLEVEL_LOW, mp, ldip,
 				     sizeof(*ldip));
 		xfs_alert(mp,
 	"%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
-	"dino bp "PTR_FMT", ino %Ld, total extents = %d, nblocks = %Ld",
+	"dino bp "PTR_FMT", ino %Ld, total extents = %llu, nblocks = %Ld",
 			__func__, item, dip, bp, in_f->ilf_ino,
-			ldip->di_nextents + ldip->di_anextents,
-			ldip->di_nblocks);
+			nextents + anextents, ldip->di_nblocks);
 		error = -EFSCORRUPTED;
 		goto out_release;
 	}
 	if (unlikely(ldip->di_forkoff > mp->m_sb.sb_inodesize)) {
-		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(6)",
+		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(8)",
 				     XFS_ERRLEVEL_LOW, mp, ldip,
 				     sizeof(*ldip));
 		xfs_alert(mp,
@@ -374,7 +431,7 @@ xlog_recover_inode_commit_pass2(
 	}
 	isize = xfs_log_dinode_size(mp);
 	if (unlikely(item->ri_buf[1].i_len > isize)) {
-		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(7)",
+		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(9)",
 				     XFS_ERRLEVEL_LOW, mp, ldip,
 				     sizeof(*ldip));
 		xfs_alert(mp,
-- 
2.30.2
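
As a rough illustration of the recovery-time validation added above, here is a user-space sketch of the counter selection and the sanity checks. It is only a model under stated assumptions: struct model_log_dinode, model_recover_check() and MODEL_CORRUPT are hypothetical names, the struct keeps only the fields the checks look at, and the real code reports -EFSCORRUPTED through XFS_CORRUPTION_ERROR() and xfs_alert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CORRUPT	(-1)	/* hypothetical stand-in for -EFSCORRUPTED */

/* Hypothetical, trimmed-down model of the counter fields in xfs_log_dinode. */
struct model_log_dinode {
	uint8_t		version;	/* models di_version */
	bool		nrext64;	/* models xfs_log_dinode_has_nrext64() */
	union {
		uint64_t	big_nextents;	/* NREXT64: data fork extents */
		uint64_t	v3_pad;		/* V3 without NREXT64: zero */
	};
	union {
		struct {			/* without NREXT64 */
			uint32_t	nextents;
			uint16_t	anextents;
		};
		struct {			/* with NREXT64 */
			uint32_t	big_anextents;
			uint16_t	nrext64_pad;
		};
	};
	uint64_t	nblocks;	/* models di_nblocks */
};

/* Returns 0 if the logged counters look sane, MODEL_CORRUPT otherwise. */
int model_recover_check(const struct model_log_dinode *ld, bool fs_has_nrext64)
{
	uint64_t nextents, anextents;

	assert(ld != NULL);

	if (ld->nrext64) {
		/* Flag set on a filesystem without the feature, or dirty pad. */
		if (!fs_has_nrext64 || ld->nrext64_pad != 0)
			return MODEL_CORRUPT;
		nextents = ld->big_nextents;
		anextents = ld->big_anextents;
	} else {
		/* The 64-bit slot is padding here and must stay zero on V3. */
		if (ld->version == 3 && ld->v3_pad != 0)
			return MODEL_CORRUPT;
		nextents = ld->nextents;
		anextents = ld->anextents;
	}

	/* An inode cannot have more extents than it has blocks. */
	if (nextents + anextents > ld->nblocks)
		return MODEL_CORRUPT;

	return 0;
}
```

The unions mirror the aliasing in the patch: the same bytes are read as di_nextents/di_anextents or as di_big_anextents/di_nrext64_pad depending on the flag, which is why the pad checks matter.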



* [PATCH V7 13/17] xfs: xfs_growfs_rt_alloc: Unlock inode explicitly rather than through iop_committing()
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (11 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 12/17] xfs: Introduce per-inode 64-bit extent counters Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-02  0:26   ` Darrick J. Wong
  2022-03-04  7:25   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 14/17] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters Chandan Babu R
                   ` (3 subsequent siblings)
  16 siblings, 2 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

In order to be able to upgrade inodes to XFS_DIFLAG2_NREXT64, a future commit
will perform such an upgrade in a transaction context. This requires the
transaction to be rolled once. An inode that has been joined to the
transaction (via xfs_trans_ijoin()) with a non-zero lock_flags argument is
unlocked when the transaction is rolled.

To prevent this from happening in the case of realtime bitmap/summary inodes,
this commit now unlocks the inode explicitly rather than through the
iop_committing() callback.

Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/xfs_rtalloc.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index b8c79ee791af..a70140b35e8b 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -780,6 +780,7 @@ xfs_growfs_rt_alloc(
 	int			resblks;	/* space reservation */
 	enum xfs_blft		buf_type;
 	struct xfs_trans	*tp;
+	bool			unlock_inode;
 
 	if (ip == mp->m_rsumip)
 		buf_type = XFS_BLFT_RTSUMMARY_BUF;
@@ -802,7 +803,8 @@ xfs_growfs_rt_alloc(
 		 * Lock the inode.
 		 */
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, ip, 0);
+		unlock_inode = true;
 
 		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
 				XFS_IEXT_ADD_NOSPLIT_CNT);
@@ -823,8 +825,11 @@ xfs_growfs_rt_alloc(
 		 * Free any blocks freed up in the transaction, then commit.
 		 */
 		error = xfs_trans_commit(tp);
-		if (error)
+		unlock_inode = false;
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		if (error)
 			return error;
+
 		/*
 		 * Now we need to clear the allocated blocks.
 		 * Do this one block per transaction, to keep it simple.
@@ -874,6 +879,8 @@ xfs_growfs_rt_alloc(
 
 out_trans_cancel:
 	xfs_trans_cancel(tp);
+	if (unlock_inode)
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }
 
-- 
2.30.2
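
The lock-ownership change in the patch above can be sketched in user space as follows. This is a minimal model, not the kernel code: struct model_inode, its lock_depth counter and model_alloc_step() are hypothetical, and the two parameters merely simulate a failure before commit and the return value of the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode model: lock_depth counts outstanding ILOCK holds. */
struct model_inode {
	int	lock_depth;
};

/*
 * One iteration of the allocation loop. The inode is joined to the
 * transaction with lock_flags == 0, so neither commit nor cancel drops
 * the lock; the caller tracks ownership in unlock_inode instead.
 */
int model_alloc_step(struct model_inode *ip, bool fail_before_commit,
		     int commit_error)
{
	bool	unlock_inode;
	int	error;

	assert(ip != NULL);

	ip->lock_depth++;		/* models xfs_ilock(ip, XFS_ILOCK_EXCL) */
	unlock_inode = true;		/* we own the lock from here on */

	if (fail_before_commit) {
		error = -1;
		goto out_trans_cancel;
	}

	error = commit_error;		/* models xfs_trans_commit(tp) */
	unlock_inode = false;
	ip->lock_depth--;		/* explicit unlock after commit */
	return error;

out_trans_cancel:
	/* Cancelling does not unlock either, so drop the lock by hand. */
	if (unlock_inode)
		ip->lock_depth--;
	return error;
}
```

Whichever path is taken, the lock is released exactly once, which is the property the patch needs once a later transaction roll can no longer be allowed to drop the ILOCK implicitly.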



* [PATCH V7 14/17] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (12 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 13/17] xfs: xfs_growfs_rt_alloc: Unlock inode explicitly rather than through iop_committing() Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  7:51   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode " Chandan Babu R
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

This commit upgrades an inode to use 64-bit extent counters when an operation
would otherwise overflow the inode's existing extent counters. Inodes are
upgraded only when the filesystem instance has the
XFS_SB_FEAT_INCOMPAT_NREXT64 incompat flag set.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c       |  3 ++-
 fs/xfs/libxfs/xfs_bmap.c       |  5 ++---
 fs/xfs/libxfs/xfs_inode_fork.c | 37 ++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_fork.h |  2 ++
 fs/xfs/xfs_bmap_item.c         |  3 ++-
 fs/xfs/xfs_bmap_util.c         | 10 ++++-----
 fs/xfs/xfs_dquot.c             |  2 +-
 fs/xfs/xfs_iomap.c             |  5 +++--
 fs/xfs/xfs_reflink.c           |  5 +++--
 fs/xfs/xfs_rtalloc.c           |  2 +-
 10 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 23523b802539..03a358930d74 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -774,7 +774,8 @@ xfs_attr_set(
 		return error;
 
 	if (args->value || xfs_inode_hasattr(dp)) {
-		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
+		error = xfs_trans_inode_ensure_nextents(&args->trans, dp,
+				XFS_ATTR_FORK,
 				XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
 		if (error)
 			goto out_trans_cancel;
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index be7f8ebe3cd5..3a3c99ef7f13 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4523,14 +4523,13 @@ xfs_bmapi_convert_delalloc(
 		return error;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
 
-	error = xfs_iext_count_may_overflow(ip, whichfork,
+	error = xfs_trans_inode_ensure_nextents(&tp, ip, whichfork,
 			XFS_IEXT_ADD_NOSPLIT_CNT);
 	if (error)
 		goto out_trans_cancel;
 
-	xfs_trans_ijoin(tp, ip, 0);
-
 	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) ||
 	    bma.got.br_startoff > offset_fsb) {
 		/*
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index a3a3b54f9c55..d1d065abeac3 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -757,3 +757,40 @@ xfs_iext_count_may_overflow(
 
 	return 0;
 }
+
+/*
+ * Ensure that the inode has the ability to add the specified number of
+ * extents.  Caller must hold ILOCK_EXCL and have joined the inode to
+ * the transaction.  Upon return, the inode will still be in this state
+ * upon return and the transaction will be clean.
+ */
+int
+xfs_trans_inode_ensure_nextents(
+	struct xfs_trans	**tpp,
+	struct xfs_inode	*ip,
+	int			whichfork,
+	int			nr_to_add)
+{
+	int			error;
+
+	error = xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
+	if (!error)
+		return 0;
+
+	/*
+	 * Try to upgrade if the extent count fields aren't large
+	 * enough.
+	 */
+	if (!xfs_has_nrext64(ip->i_mount) ||
+	    (ip->i_diflags2 & XFS_DIFLAG2_NREXT64))
+		return error;
+
+	ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
+	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
+
+	error = xfs_trans_roll(tpp);
+	if (error)
+		return error;
+
+	return xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
+}
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 8e6221e32660..65265ca51b0d 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -286,6 +286,8 @@ int xfs_ifork_verify_local_data(struct xfs_inode *ip);
 int xfs_ifork_verify_local_attr(struct xfs_inode *ip);
 int xfs_iext_count_may_overflow(struct xfs_inode *ip, int whichfork,
 		int nr_to_add);
+int xfs_trans_inode_ensure_nextents(struct xfs_trans **tpp,
+		struct xfs_inode *ip, int whichfork, int nr_to_add);
 
 /* returns true if the fork has extents but they are not read in yet. */
 static inline bool xfs_need_iread_extents(struct xfs_ifork *ifp)
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index e1f4d7d5a011..27bc16a2b09b 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -505,7 +505,8 @@ xfs_bui_item_recover(
 	else
 		iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
 
-	error = xfs_iext_count_may_overflow(ip, whichfork, iext_delta);
+	error = xfs_trans_inode_ensure_nextents(&tp, ip, whichfork,
+			iext_delta);
 	if (error)
 		goto err_cancel;
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index eb2e387ba528..8d86d8d5ad88 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -855,7 +855,7 @@ xfs_alloc_file_space(
 		if (error)
 			break;
 
-		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+		error = xfs_trans_inode_ensure_nextents(&tp, ip, XFS_DATA_FORK,
 				XFS_IEXT_ADD_NOSPLIT_CNT);
 		if (error)
 			goto error;
@@ -910,7 +910,7 @@ xfs_unmap_extent(
 	if (error)
 		return error;
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+	error = xfs_trans_inode_ensure_nextents(&tp, ip, XFS_DATA_FORK,
 			XFS_IEXT_PUNCH_HOLE_CNT);
 	if (error)
 		goto out_trans_cancel;
@@ -1191,7 +1191,7 @@ xfs_insert_file_space(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+	error = xfs_trans_inode_ensure_nextents(&tp, ip, XFS_DATA_FORK,
 			XFS_IEXT_PUNCH_HOLE_CNT);
 	if (error)
 		goto out_trans_cancel;
@@ -1418,7 +1418,7 @@ xfs_swap_extent_rmap(
 			trace_xfs_swap_extent_rmap_remap_piece(tip, &uirec);
 
 			if (xfs_bmap_is_real_extent(&uirec)) {
-				error = xfs_iext_count_may_overflow(ip,
+				error = xfs_trans_inode_ensure_nextents(&tp, ip,
 						XFS_DATA_FORK,
 						XFS_IEXT_SWAP_RMAP_CNT);
 				if (error)
@@ -1426,7 +1426,7 @@ xfs_swap_extent_rmap(
 			}
 
 			if (xfs_bmap_is_real_extent(&irec)) {
-				error = xfs_iext_count_may_overflow(tip,
+				error = xfs_trans_inode_ensure_nextents(&tp, tip,
 						XFS_DATA_FORK,
 						XFS_IEXT_SWAP_RMAP_CNT);
 				if (error)
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 5afedcbc78c7..193a2e66efc7 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -320,7 +320,7 @@ xfs_dquot_disk_alloc(
 		goto err_cancel;
 	}
 
-	error = xfs_iext_count_may_overflow(quotip, XFS_DATA_FORK,
+	error = xfs_trans_inode_ensure_nextents(&tp, quotip, XFS_DATA_FORK,
 			XFS_IEXT_ADD_NOSPLIT_CNT);
 	if (error)
 		goto err_cancel;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e552ce541ec2..4078d5324090 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -250,7 +250,8 @@ xfs_iomap_write_direct(
 	if (error)
 		return error;
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, nr_exts);
+	error = xfs_trans_inode_ensure_nextents(&tp, ip, XFS_DATA_FORK,
+			nr_exts);
 	if (error)
 		goto out_trans_cancel;
 
@@ -553,7 +554,7 @@ xfs_iomap_write_unwritten(
 		if (error)
 			return error;
 
-		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+		error = xfs_trans_inode_ensure_nextents(&tp, ip, XFS_DATA_FORK,
 				XFS_IEXT_WRITE_UNWRITTEN_CNT);
 		if (error)
 			goto error_on_bmapi_transaction;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index db70060e7bf6..9d4fd2b160ff 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -615,7 +615,7 @@ xfs_reflink_end_cow_extent(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+	error = xfs_trans_inode_ensure_nextents(&tp, ip, XFS_DATA_FORK,
 			XFS_IEXT_REFLINK_END_COW_CNT);
 	if (error)
 		goto out_cancel;
@@ -1117,7 +1117,8 @@ xfs_reflink_remap_extent(
 	if (dmap_written)
 		++iext_delta;
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, iext_delta);
+	error = xfs_trans_inode_ensure_nextents(&tp, ip, XFS_DATA_FORK,
+			iext_delta);
 	if (error)
 		goto out_cancel;
 
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index a70140b35e8b..6d4a16534b1f 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -806,7 +806,7 @@ xfs_growfs_rt_alloc(
 		xfs_trans_ijoin(tp, ip, 0);
 		unlock_inode = true;
 
-		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+		error = xfs_trans_inode_ensure_nextents(&tp, ip, XFS_DATA_FORK,
 				XFS_IEXT_ADD_NOSPLIT_CNT);
 		if (error)
 			goto out_trans_cancel;
-- 
2.30.2
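
The control flow of xfs_trans_inode_ensure_nextents() in the patch above can be modelled in user space like this. It is a sketch under assumptions: the model_* names are hypothetical, the limits are derived from the field widths in the cover letter (signed 32-bit and signed 16-bit before, 48 and 32 bits after), and the logging of the inode plus the transaction roll are reduced to a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limits, taken from the counter widths in the cover letter. */
#define MODEL_MAX_DATA_OLD	((uint64_t)INT32_MAX)
#define MODEL_MAX_ATTR_OLD	((uint64_t)INT16_MAX)
#define MODEL_MAX_DATA_LARGE	((UINT64_C(1) << 48) - 1)
#define MODEL_MAX_ATTR_LARGE	((UINT64_C(1) << 32) - 1)

/* Hypothetical inode model. */
struct model_inode {
	bool		nrext64;	/* models XFS_DIFLAG2_NREXT64 */
	uint64_t	data_nextents;
	uint64_t	attr_nextents;
};

/* Models xfs_iext_count_may_overflow(): -EFBIG if the add would not fit. */
int model_may_overflow(const struct model_inode *ip, bool attr_fork,
		       uint32_t nr_to_add)
{
	uint64_t cur = attr_fork ? ip->attr_nextents : ip->data_nextents;
	uint64_t max;

	if (ip->nrext64)
		max = attr_fork ? MODEL_MAX_ATTR_LARGE : MODEL_MAX_DATA_LARGE;
	else
		max = attr_fork ? MODEL_MAX_ATTR_OLD : MODEL_MAX_DATA_OLD;

	return (cur + nr_to_add > max) ? -EFBIG : 0;
}

/* Models the upgrade-on-demand logic of the new helper. */
int model_ensure_nextents(struct model_inode *ip, bool fs_has_nrext64,
			  bool attr_fork, uint32_t nr_to_add)
{
	int error;

	assert(ip != NULL);

	error = model_may_overflow(ip, attr_fork, nr_to_add);
	if (!error)
		return 0;

	/* No upgrade possible: feature missing or inode already upgraded. */
	if (!fs_has_nrext64 || ip->nrext64)
		return error;

	/* The patch also logs the inode core and rolls the transaction here. */
	ip->nrext64 = true;

	return model_may_overflow(ip, attr_fork, nr_to_add);
}
```

The upgrade is attempted only after the first check fails, so inodes that never approach the old limits keep their original on-disk layout.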



* [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (13 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 14/17] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-02  0:31   ` Darrick J. Wong
  2022-03-04  8:09   ` Dave Chinner
  2022-03-01 10:39 ` [PATCH V7 16/17] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Chandan Babu R
  2022-03-01 10:39 ` [PATCH V7 17/17] xfs: Define max extent length based on on-disk format definition Chandan Babu R
  16 siblings, 2 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

The following changes are made to enable userspace to obtain 64-bit extent
counters,
1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
   xfs_bulkstat->bs_pad[] to hold the 64-bit data fork extent counter.
2. Define the new flag XFS_BULK_IREQ_NREXT64 for userspace to indicate that
   it is capable of receiving 64-bit extent counters.

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h | 20 ++++++++++++++++----
 fs/xfs/xfs_ioctl.c     |  3 +++
 fs/xfs/xfs_itable.c    | 30 ++++++++++++++++++++++++++++--
 fs/xfs/xfs_itable.h    |  4 +++-
 fs/xfs/xfs_iwalk.h     |  2 +-
 5 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 2204d49d0c3a..31ccbff2f16c 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -378,7 +378,7 @@ struct xfs_bulkstat {
 	uint32_t	bs_extsize_blks; /* extent size hint, blocks	*/
 
 	uint32_t	bs_nlink;	/* number of links		*/
-	uint32_t	bs_extents;	/* number of extents		*/
+	uint32_t	bs_extents;	/* 32-bit data fork extent counter */
 	uint32_t	bs_aextents;	/* attribute number of extents	*/
 	uint16_t	bs_version;	/* structure version		*/
 	uint16_t	bs_forkoff;	/* inode fork offset in bytes	*/
@@ -387,8 +387,9 @@ struct xfs_bulkstat {
 	uint16_t	bs_checked;	/* checked inode metadata	*/
 	uint16_t	bs_mode;	/* type and mode		*/
 	uint16_t	bs_pad2;	/* zeroed			*/
+	uint64_t	bs_extents64;	/* 64-bit data fork extent counter */
 
-	uint64_t	bs_pad[7];	/* zeroed			*/
+	uint64_t	bs_pad[6];	/* zeroed			*/
 };
 
 #define XFS_BULKSTAT_VERSION_V1	(1)
@@ -469,8 +470,19 @@ struct xfs_bulk_ireq {
  */
 #define XFS_BULK_IREQ_SPECIAL	(1 << 1)
 
-#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO | \
-				 XFS_BULK_IREQ_SPECIAL)
+/*
+ * Return data fork extent count via xfs_bulkstat->bs_extents64 field and assign
+ * 0 to xfs_bulkstat->bs_extents when the flag is set.  Otherwise, use
+ * xfs_bulkstat->bs_extents for returning data fork extent count and set
+ * xfs_bulkstat->bs_extents64 to 0. In the second case, return -EOVERFLOW and
+ * assign 0 to xfs_bulkstat->bs_extents if data fork extent count is larger than
+ * XFS_MAX_EXTCNT_DATA_FORK_OLD.
+ */
+#define XFS_BULK_IREQ_NREXT64	(1 << 2)
+
+#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO |	 \
+				 XFS_BULK_IREQ_SPECIAL | \
+				 XFS_BULK_IREQ_NREXT64)
 
 /* Operate on the root directory inode. */
 #define XFS_BULK_IREQ_SPECIAL_ROOT	(1)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 2515fe8299e1..22947c5ffd34 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -813,6 +813,9 @@ xfs_bulk_ireq_setup(
 	if (XFS_INO_TO_AGNO(mp, breq->startino) >= mp->m_sb.sb_agcount)
 		return -ECANCELED;
 
+	if (hdr->flags & XFS_BULK_IREQ_NREXT64)
+		breq->flags |= XFS_IBULK_NREXT64;
+
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index c08c79d9e311..0272a3c9d8b1 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -20,6 +20,7 @@
 #include "xfs_icache.h"
 #include "xfs_health.h"
 #include "xfs_trans.h"
+#include "xfs_errortag.h"
 
 /*
  * Bulk Stat
@@ -64,6 +65,7 @@ xfs_bulkstat_one_int(
 	struct xfs_inode	*ip;		/* incore inode pointer */
 	struct inode		*inode;
 	struct xfs_bulkstat	*buf = bc->buf;
+	xfs_extnum_t		nextents;
 	int			error = -EINVAL;
 
 	if (xfs_internal_inum(mp, ino))
@@ -102,7 +104,27 @@ xfs_bulkstat_one_int(
 
 	buf->bs_xflags = xfs_ip2xflags(ip);
 	buf->bs_extsize_blks = ip->i_extsize;
-	buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
+
+	nextents = xfs_ifork_nextents(&ip->i_df);
+	if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
+		xfs_extnum_t	max_nextents = XFS_MAX_EXTCNT_DATA_FORK_OLD;
+
+		if (unlikely(XFS_TEST_ERROR(false, mp,
+				XFS_ERRTAG_REDUCE_MAX_IEXTENTS)))
+			max_nextents = 10;
+
+		if (nextents > max_nextents) {
+			xfs_iunlock(ip, XFS_ILOCK_SHARED);
+			xfs_irele(ip);
+			error = -EOVERFLOW;
+			goto out;
+		}
+
+		buf->bs_extents = nextents;
+	} else {
+		buf->bs_extents64 = nextents;
+	}
+
 	xfs_bulkstat_health(ip, buf);
 	buf->bs_aextents = xfs_ifork_nextents(ip->i_afp);
 	buf->bs_forkoff = XFS_IFORK_BOFF(ip);
@@ -256,6 +278,7 @@ xfs_bulkstat(
 		.breq		= breq,
 	};
 	struct xfs_trans	*tp;
+	unsigned int		iwalk_flags = 0;
 	int			error;
 
 	if (breq->mnt_userns != &init_user_ns) {
@@ -279,7 +302,10 @@ xfs_bulkstat(
 	if (error)
 		goto out;
 
-	error = xfs_iwalk(breq->mp, tp, breq->startino, breq->flags,
+	if (breq->flags & XFS_IBULK_SAME_AG)
+		iwalk_flags |= XFS_IWALK_SAME_AG;
+
+	error = xfs_iwalk(breq->mp, tp, breq->startino, iwalk_flags,
 			xfs_bulkstat_iwalk, breq->icount, &bc);
 	xfs_trans_cancel(tp);
 out:
diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
index 7078d10c9b12..9223529cd7bd 100644
--- a/fs/xfs/xfs_itable.h
+++ b/fs/xfs/xfs_itable.h
@@ -17,7 +17,9 @@ struct xfs_ibulk {
 };
 
 /* Only iterate within the same AG as startino */
-#define XFS_IBULK_SAME_AG	(XFS_IWALK_SAME_AG)
+#define XFS_IBULK_SAME_AG	(1ULL << 0)
+
+#define XFS_IBULK_NREXT64	(1ULL << 1)
 
 /*
  * Advance the user buffer pointer by one record of the given size.  If the
diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
index 37a795f03267..3a68766fd909 100644
--- a/fs/xfs/xfs_iwalk.h
+++ b/fs/xfs/xfs_iwalk.h
@@ -26,7 +26,7 @@ int xfs_iwalk_threaded(struct xfs_mount *mp, xfs_ino_t startino,
 		unsigned int inode_records, bool poll, void *data);
 
 /* Only iterate inodes within the same AG as @startino. */
-#define XFS_IWALK_SAME_AG	(0x1)
+#define XFS_IWALK_SAME_AG	(1 << 0)
 
 #define XFS_IWALK_FLAGS_ALL	(XFS_IWALK_SAME_AG)
 
-- 
2.30.2
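
The reporting rule described in the XFS_BULK_IREQ_NREXT64 comment above can be sketched as a small user-space function. This is a model only: struct model_bulkstat and model_fill_extents() are hypothetical, and the value used for the old limit (2^31 - 1) is an assumption based on the signed 32-bit field mentioned in the cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of XFS_MAX_EXTCNT_DATA_FORK_OLD (signed 32-bit counter). */
#define MODEL_MAX_EXTCNT_DATA_FORK_OLD	((uint64_t)INT32_MAX)

/* Hypothetical subset of struct xfs_bulkstat. */
struct model_bulkstat {
	uint32_t	bs_extents;	/* 32-bit data fork extent counter */
	uint64_t	bs_extents64;	/* 64-bit data fork extent counter */
};

/*
 * Fill in the data fork extent count. Callers that passed the NREXT64
 * request flag get the count in bs_extents64 and 0 in bs_extents; other
 * callers get it in bs_extents, or -EOVERFLOW if it does not fit.
 */
int model_fill_extents(struct model_bulkstat *buf, uint64_t nextents,
		       bool want_nrext64)
{
	assert(buf != NULL);

	buf->bs_extents = 0;
	buf->bs_extents64 = 0;

	if (want_nrext64) {
		buf->bs_extents64 = nextents;
		return 0;
	}

	if (nextents > MODEL_MAX_EXTCNT_DATA_FORK_OLD)
		return -EOVERFLOW;

	buf->bs_extents = (uint32_t)nextents;
	return 0;
}
```

Returning an error instead of a truncated count keeps older userspace from silently acting on a wrong extent count.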



* [PATCH V7 16/17] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (14 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode " Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-01 10:39 ` [PATCH V7 17/17] xfs: Define max extent length based on on-disk format definition Chandan Babu R
  16 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

This commit enables the XFS module to work with fs instances having 64-bit
per-inode extent counters by adding the XFS_SB_FEAT_INCOMPAT_NREXT64 flag to
the list of supported incompat feature flags.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 1a5b194da191..76bd5181f7d3 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -378,7 +378,8 @@ xfs_sb_has_ro_compat_feature(
 		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
 		 XFS_SB_FEAT_INCOMPAT_META_UUID| \
 		 XFS_SB_FEAT_INCOMPAT_BIGTIME| \
-		 XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR)
+		 XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \
+		 XFS_SB_FEAT_INCOMPAT_NREXT64)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
-- 
2.30.2
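
To illustrate why adding the flag to XFS_SB_FEAT_INCOMPAT_ALL is what lets a kernel mount such a filesystem, here is a small user-space sketch of an unknown-incompat-bit check. The model_* names are hypothetical and the bit positions are illustrative assumptions, not copied from xfs_format.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit assignments (assumed, see the note above). */
#define MODEL_INCOMPAT_FTYPE		(1u << 0)
#define MODEL_INCOMPAT_SPINODES		(1u << 1)
#define MODEL_INCOMPAT_META_UUID	(1u << 2)
#define MODEL_INCOMPAT_BIGTIME		(1u << 3)
#define MODEL_INCOMPAT_NEEDSREPAIR	(1u << 4)
#define MODEL_INCOMPAT_NREXT64		(1u << 5)

/* Supported mask before the patch: NREXT64 is not listed. */
#define MODEL_INCOMPAT_ALL_OLD \
	(MODEL_INCOMPAT_FTYPE | MODEL_INCOMPAT_SPINODES | \
	 MODEL_INCOMPAT_META_UUID | MODEL_INCOMPAT_BIGTIME | \
	 MODEL_INCOMPAT_NEEDSREPAIR)

/* Supported mask after the patch. */
#define MODEL_INCOMPAT_ALL_NEW \
	(MODEL_INCOMPAT_ALL_OLD | MODEL_INCOMPAT_NREXT64)

/* True if the superblock carries an incompat bit the kernel does not know. */
bool model_has_unknown_incompat(uint32_t sb_features, uint32_t supported)
{
	return (sb_features & ~supported) != 0;
}
```

A superblock with the NREXT64 bit set is rejected against the old mask and accepted against the new one, which is why this patch comes near the end of the series, after the on-disk format support is in place.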



* [PATCH V7 17/17] xfs: Define max extent length based on on-disk format definition
  2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
                   ` (15 preceding siblings ...)
  2022-03-01 10:39 ` [PATCH V7 16/17] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Chandan Babu R
@ 2022-03-01 10:39 ` Chandan Babu R
  2022-03-04  8:15   ` Dave Chinner
  16 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-01 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, djwong, david

The maximum extent length depends on the maximum block count that can be
stored in a BMBT record. Hence this commit defines MAXEXTLEN based on
BMBT_BLOCKCOUNT_BITLEN.

While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN.
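
A minimal user-space sketch of the idea, assuming the block count field of a BMBT record is 21 bits wide (that width and all model_*/MODEL_* names are assumptions made for illustration, not quoted from this patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed width of the block count field in a BMBT record. */
#define MODEL_BMBT_BLOCKCOUNT_BITLEN	21

/* Largest extent length representable in that many bits. */
#define MODEL_MAX_BMBT_EXTLEN \
	((UINT64_C(1) << MODEL_BMBT_BLOCKCOUNT_BITLEN) - 1)

/* Two adjacent extents may merge only if the result is still encodable. */
bool model_can_merge(uint64_t left_len, uint64_t new_len)
{
	return left_len + new_len <= MODEL_MAX_BMBT_EXTLEN;
}

/*
 * Models the loop in xfs_bmap_extsize_align(): shrink an aligned length
 * by whole extent size hints until it fits. extsz must be non-zero.
 */
uint64_t model_cap_aligned_len(uint64_t align_alen, uint64_t extsz)
{
	assert(extsz > 0);

	while (align_alen > MODEL_MAX_BMBT_EXTLEN)
		align_alen -= extsz;
	return align_alen;
}
```

Deriving the limit from the bit width keeps the constant and the record layout from drifting apart if the on-disk format ever changes.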

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c      |  2 +-
 fs/xfs/libxfs/xfs_bmap.c       | 57 +++++++++++++++++-----------------
 fs/xfs/libxfs/xfs_format.h     |  5 +--
 fs/xfs/libxfs/xfs_inode_buf.c  |  4 +--
 fs/xfs/libxfs/xfs_trans_resv.c | 11 ++++---
 fs/xfs/scrub/bmap.c            |  2 +-
 fs/xfs/xfs_bmap_util.c         | 14 +++++----
 fs/xfs/xfs_iomap.c             | 28 ++++++++---------
 8 files changed, 64 insertions(+), 59 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 353e53b892e6..3f9b9cbfef43 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2493,7 +2493,7 @@ __xfs_free_extent_later(
 
 	ASSERT(bno != NULLFSBLOCK);
 	ASSERT(len > 0);
-	ASSERT(len <= MAXEXTLEN);
+	ASSERT(len <= XFS_MAX_BMBT_EXTLEN);
 	ASSERT(!isnullstartblock(bno));
 	agno = XFS_FSB_TO_AGNO(mp, bno);
 	agbno = XFS_FSB_TO_AGBNO(mp, bno);
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 3a3c99ef7f13..f604d45e1712 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1449,7 +1449,7 @@ xfs_bmap_add_extent_delay_real(
 	    LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff &&
 	    LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock &&
 	    LEFT.br_state == new->br_state &&
-	    LEFT.br_blockcount + new->br_blockcount <= MAXEXTLEN)
+	    LEFT.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
 		state |= BMAP_LEFT_CONTIG;
 
 	/*
@@ -1467,13 +1467,13 @@ xfs_bmap_add_extent_delay_real(
 	    new_endoff == RIGHT.br_startoff &&
 	    new->br_startblock + new->br_blockcount == RIGHT.br_startblock &&
 	    new->br_state == RIGHT.br_state &&
-	    new->br_blockcount + RIGHT.br_blockcount <= MAXEXTLEN &&
+	    new->br_blockcount + RIGHT.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
 	    ((state & (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
 		       BMAP_RIGHT_FILLING)) !=
 		      (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
 		       BMAP_RIGHT_FILLING) ||
 	     LEFT.br_blockcount + new->br_blockcount + RIGHT.br_blockcount
-			<= MAXEXTLEN))
+			<= XFS_MAX_BMBT_EXTLEN))
 		state |= BMAP_RIGHT_CONTIG;
 
 	error = 0;
@@ -1997,7 +1997,7 @@ xfs_bmap_add_extent_unwritten_real(
 	    LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff &&
 	    LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock &&
 	    LEFT.br_state == new->br_state &&
-	    LEFT.br_blockcount + new->br_blockcount <= MAXEXTLEN)
+	    LEFT.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
 		state |= BMAP_LEFT_CONTIG;
 
 	/*
@@ -2015,13 +2015,13 @@ xfs_bmap_add_extent_unwritten_real(
 	    new_endoff == RIGHT.br_startoff &&
 	    new->br_startblock + new->br_blockcount == RIGHT.br_startblock &&
 	    new->br_state == RIGHT.br_state &&
-	    new->br_blockcount + RIGHT.br_blockcount <= MAXEXTLEN &&
+	    new->br_blockcount + RIGHT.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
 	    ((state & (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
 		       BMAP_RIGHT_FILLING)) !=
 		      (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
 		       BMAP_RIGHT_FILLING) ||
 	     LEFT.br_blockcount + new->br_blockcount + RIGHT.br_blockcount
-			<= MAXEXTLEN))
+			<= XFS_MAX_BMBT_EXTLEN))
 		state |= BMAP_RIGHT_CONTIG;
 
 	/*
@@ -2507,15 +2507,15 @@ xfs_bmap_add_extent_hole_delay(
 	 */
 	if ((state & BMAP_LEFT_VALID) && (state & BMAP_LEFT_DELAY) &&
 	    left.br_startoff + left.br_blockcount == new->br_startoff &&
-	    left.br_blockcount + new->br_blockcount <= MAXEXTLEN)
+	    left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
 		state |= BMAP_LEFT_CONTIG;
 
 	if ((state & BMAP_RIGHT_VALID) && (state & BMAP_RIGHT_DELAY) &&
 	    new->br_startoff + new->br_blockcount == right.br_startoff &&
-	    new->br_blockcount + right.br_blockcount <= MAXEXTLEN &&
+	    new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
 	    (!(state & BMAP_LEFT_CONTIG) ||
 	     (left.br_blockcount + new->br_blockcount +
-	      right.br_blockcount <= MAXEXTLEN)))
+	      right.br_blockcount <= XFS_MAX_BMBT_EXTLEN)))
 		state |= BMAP_RIGHT_CONTIG;
 
 	/*
@@ -2658,17 +2658,17 @@ xfs_bmap_add_extent_hole_real(
 	    left.br_startoff + left.br_blockcount == new->br_startoff &&
 	    left.br_startblock + left.br_blockcount == new->br_startblock &&
 	    left.br_state == new->br_state &&
-	    left.br_blockcount + new->br_blockcount <= MAXEXTLEN)
+	    left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
 		state |= BMAP_LEFT_CONTIG;
 
 	if ((state & BMAP_RIGHT_VALID) && !(state & BMAP_RIGHT_DELAY) &&
 	    new->br_startoff + new->br_blockcount == right.br_startoff &&
 	    new->br_startblock + new->br_blockcount == right.br_startblock &&
 	    new->br_state == right.br_state &&
-	    new->br_blockcount + right.br_blockcount <= MAXEXTLEN &&
+	    new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
 	    (!(state & BMAP_LEFT_CONTIG) ||
 	     left.br_blockcount + new->br_blockcount +
-	     right.br_blockcount <= MAXEXTLEN))
+	     right.br_blockcount <= XFS_MAX_BMBT_EXTLEN))
 		state |= BMAP_RIGHT_CONTIG;
 
 	error = 0;
@@ -2903,15 +2903,15 @@ xfs_bmap_extsize_align(
 
 	/*
 	 * For large extent hint sizes, the aligned extent might be larger than
-	 * MAXEXTLEN. In that case, reduce the size by an extsz so that it pulls
-	 * the length back under MAXEXTLEN. The outer allocation loops handle
-	 * short allocation just fine, so it is safe to do this. We only want to
-	 * do it when we are forced to, though, because it means more allocation
-	 * operations are required.
+	 * XFS_MAX_BMBT_EXTLEN. In that case, reduce the size by an extsz so
+	 * that it pulls the length back under XFS_MAX_BMBT_EXTLEN. The outer
+	 * allocation loops handle short allocation just fine, so it is safe to
+	 * do this. We only want to do it when we are forced to, though, because
+	 * it means more allocation operations are required.
 	 */
-	while (align_alen > MAXEXTLEN)
+	while (align_alen > XFS_MAX_BMBT_EXTLEN)
 		align_alen -= extsz;
-	ASSERT(align_alen <= MAXEXTLEN);
+	ASSERT(align_alen <= XFS_MAX_BMBT_EXTLEN);
 
 	/*
 	 * If the previous block overlaps with this proposed allocation
@@ -3001,9 +3001,9 @@ xfs_bmap_extsize_align(
 			return -EINVAL;
 	} else {
 		ASSERT(orig_off >= align_off);
-		/* see MAXEXTLEN handling above */
+		/* see XFS_MAX_BMBT_EXTLEN handling above */
 		ASSERT(orig_end <= align_off + align_alen ||
-		       align_alen + extsz > MAXEXTLEN);
+		       align_alen + extsz > XFS_MAX_BMBT_EXTLEN);
 	}
 
 #ifdef DEBUG
@@ -3968,7 +3968,7 @@ xfs_bmapi_reserve_delalloc(
 	 * Cap the alloc length. Keep track of prealloc so we know whether to
 	 * tag the inode before we return.
 	 */
-	alen = XFS_FILBLKS_MIN(len + prealloc, MAXEXTLEN);
+	alen = XFS_FILBLKS_MIN(len + prealloc, XFS_MAX_BMBT_EXTLEN);
 	if (!eof)
 		alen = XFS_FILBLKS_MIN(alen, got->br_startoff - aoff);
 	if (prealloc && alen >= len)
@@ -4101,7 +4101,7 @@ xfs_bmapi_allocate(
 		if (!xfs_iext_peek_prev_extent(ifp, &bma->icur, &bma->prev))
 			bma->prev.br_startoff = NULLFILEOFF;
 	} else {
-		bma->length = XFS_FILBLKS_MIN(bma->length, MAXEXTLEN);
+		bma->length = XFS_FILBLKS_MIN(bma->length, XFS_MAX_BMBT_EXTLEN);
 		if (!bma->eof)
 			bma->length = XFS_FILBLKS_MIN(bma->length,
 					bma->got.br_startoff - bma->offset);
@@ -4421,8 +4421,8 @@ xfs_bmapi_write(
 			 * xfs_extlen_t and therefore 32 bits. Hence we have to
 			 * check for 32-bit overflows and handle them here.
 			 */
-			if (len > (xfs_filblks_t)MAXEXTLEN)
-				bma.length = MAXEXTLEN;
+			if (len > (xfs_filblks_t)XFS_MAX_BMBT_EXTLEN)
+				bma.length = XFS_MAX_BMBT_EXTLEN;
 			else
 				bma.length = len;
 
@@ -4556,7 +4556,8 @@ xfs_bmapi_convert_delalloc(
 	bma.ip = ip;
 	bma.wasdel = true;
 	bma.offset = bma.got.br_startoff;
-	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
+	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount,
+			XFS_MAX_BMBT_EXTLEN);
 	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
 
 	/*
@@ -4637,7 +4638,7 @@ xfs_bmapi_remap(
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	ASSERT(len > 0);
-	ASSERT(len <= (xfs_filblks_t)MAXEXTLEN);
+	ASSERT(len <= (xfs_filblks_t)XFS_MAX_BMBT_EXTLEN);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK | XFS_BMAPI_PREALLOC |
 			   XFS_BMAPI_NORMAP)));
@@ -5637,7 +5638,7 @@ xfs_bmse_can_merge(
 	if ((left->br_startoff + left->br_blockcount != startoff) ||
 	    (left->br_startblock + left->br_blockcount != got->br_startblock) ||
 	    (left->br_state != got->br_state) ||
-	    (left->br_blockcount + got->br_blockcount > MAXEXTLEN))
+	    (left->br_blockcount + got->br_blockcount > XFS_MAX_BMBT_EXTLEN))
 		return false;
 
 	return true;
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 76bd5181f7d3..b2228558f798 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -897,7 +897,7 @@ enum xfs_dinode_fmt {
 	{ XFS_DINODE_FMT_UUID,		"uuid" }
 
 /*
- * Max values for extlen, extnum, aextnum.
+ * Max values for ondisk inode's extent counters.
  *
  * The newly introduced data fork extent counter is a 64-bit field. However, the
  * maximum number of extents in a file is limited to 2^54 extents (assuming one
@@ -909,7 +909,6 @@ enum xfs_dinode_fmt {
  * Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
  * 2^48 was chosen as the maximum data fork extent count.
  */
-#define	MAXEXTLEN			((xfs_extlen_t)((1ULL << 21) - 1)) /* 21 bits */
 #define XFS_MAX_EXTCNT_DATA_FORK	((xfs_extnum_t)((1ULL << 48) - 1)) /* Unsigned 48-bits */
 #define XFS_MAX_EXTCNT_ATTR_FORK	((xfs_extnum_t)((1ULL << 32) - 1)) /* Unsigned 32-bits */
 #define XFS_MAX_EXTCNT_DATA_FORK_OLD	((xfs_extnum_t)((1ULL << 31) - 1)) /* Signed 32-bits */
@@ -1646,6 +1645,8 @@ typedef struct xfs_bmdr_block {
 #define BMBT_STARTOFF_MASK	((1ULL << BMBT_STARTOFF_BITLEN) - 1)
 #define BMBT_BLOCKCOUNT_MASK	((1ULL << BMBT_BLOCKCOUNT_BITLEN) - 1)
 
+#define XFS_MAX_BMBT_EXTLEN	((xfs_extlen_t)(BMBT_BLOCKCOUNT_MASK))
+
 /*
  * bmbt records have a file offset (block) field that is 54 bits wide, so this
  * is the largest xfs_fileoff_t that we ever expect to see.
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index a11d3ea5ebfe..b8e2a542643f 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -685,7 +685,7 @@ xfs_inode_validate_extsize(
 	if (extsize_bytes % blocksize_bytes)
 		return __this_address;
 
-	if (extsize > MAXEXTLEN)
+	if (extsize > XFS_MAX_BMBT_EXTLEN)
 		return __this_address;
 
 	if (!rt_flag && extsize > mp->m_sb.sb_agblocks / 2)
@@ -742,7 +742,7 @@ xfs_inode_validate_cowextsize(
 	if (cowextsize_bytes % mp->m_sb.sb_blocksize)
 		return __this_address;
 
-	if (cowextsize > MAXEXTLEN)
+	if (cowextsize > XFS_MAX_BMBT_EXTLEN)
 		return __this_address;
 
 	if (cowextsize > mp->m_sb.sb_agblocks / 2)
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 6f83d9b306ee..19313021fb99 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -199,8 +199,8 @@ xfs_calc_inode_chunk_res(
 /*
  * Per-extent log reservation for the btree changes involved in freeing or
  * allocating a realtime extent.  We have to be able to log as many rtbitmap
- * blocks as needed to mark inuse MAXEXTLEN blocks' worth of realtime extents,
- * as well as the realtime summary block.
+ * blocks as needed to mark inuse XFS_MAX_BMBT_EXTLEN blocks' worth of realtime
+ * extents, as well as the realtime summary block.
  */
 static unsigned int
 xfs_rtalloc_log_count(
@@ -210,7 +210,7 @@ xfs_rtalloc_log_count(
 	unsigned int		blksz = XFS_FSB_TO_B(mp, 1);
 	unsigned int		rtbmp_bytes;
 
-	rtbmp_bytes = (MAXEXTLEN / mp->m_sb.sb_rextsize) / NBBY;
+	rtbmp_bytes = (XFS_MAX_BMBT_EXTLEN / mp->m_sb.sb_rextsize) / NBBY;
 	return (howmany(rtbmp_bytes, blksz) + 1) * num_ops;
 }
 
@@ -247,7 +247,7 @@ xfs_rtalloc_log_count(
  *    the inode's bmap btree: max depth * block size
  *    the agfs of the ags from which the extents are allocated: 2 * sector
  *    the superblock free block counter: sector size
- *    the realtime bitmap: ((MAXEXTLEN / rtextsize) / NBBY) bytes
+ *    the realtime bitmap: ((XFS_MAX_BMBT_EXTLEN / rtextsize) / NBBY) bytes
  *    the realtime summary: 1 block
  *    the allocation btrees: 2 trees * (2 * max depth - 1) * block size
  * And the bmap_finish transaction can free bmap blocks in a join (t3):
@@ -299,7 +299,8 @@ xfs_calc_write_reservation(
  *    the agf for each of the ags: 2 * sector size
  *    the agfl for each of the ags: 2 * sector size
  *    the super block to reflect the freed blocks: sector size
- *    the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
+ *    the realtime bitmap: 2 exts * ((XFS_MAX_BMBT_EXTLEN / rtextsize) / NBBY)
+ *    bytes
  *    the realtime summary: 2 exts * 1 block
  *    worst case split in allocation btrees per extent assuming 2 extents:
  *		2 exts * 2 trees * (2 * max depth - 1) * block size
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index a4cbbc346f60..c357593e0a02 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -350,7 +350,7 @@ xchk_bmap_iextent(
 				irec->br_startoff);
 
 	/* Make sure the extent points to a valid place. */
-	if (irec->br_blockcount > MAXEXTLEN)
+	if (irec->br_blockcount > XFS_MAX_BMBT_EXTLEN)
 		xchk_fblock_set_corrupt(info->sc, info->whichfork,
 				irec->br_startoff);
 	if (info->is_rt &&
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 8d86d8d5ad88..fc110fe7cb51 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -119,14 +119,14 @@ xfs_bmap_rtalloc(
 	 */
 	ralen = ap->length / mp->m_sb.sb_rextsize;
 	/*
-	 * If the old value was close enough to MAXEXTLEN that
+	 * If the old value was close enough to XFS_MAX_BMBT_EXTLEN that
 	 * we rounded up to it, cut it back so it's valid again.
 	 * Note that if it's a really large request (bigger than
-	 * MAXEXTLEN), we don't hear about that number, and can't
+	 * XFS_MAX_BMBT_EXTLEN), we don't hear about that number, and can't
 	 * adjust the starting point to match it.
 	 */
-	if (ralen * mp->m_sb.sb_rextsize >= MAXEXTLEN)
-		ralen = MAXEXTLEN / mp->m_sb.sb_rextsize;
+	if (ralen * mp->m_sb.sb_rextsize >= XFS_MAX_BMBT_EXTLEN)
+		ralen = XFS_MAX_BMBT_EXTLEN / mp->m_sb.sb_rextsize;
 
 	/*
 	 * Lock out modifications to both the RT bitmap and summary inodes
@@ -839,9 +839,11 @@ xfs_alloc_file_space(
 		 * count, hence we need to limit the number of blocks we are
 		 * trying to reserve to avoid an overflow. We can't allocate
 		 * more than @nimaps extents, and an extent is limited on disk
-		 * to MAXEXTLEN (21 bits), so use that to enforce the limit.
+		 * to XFS_MAX_BMBT_EXTLEN (21 bits), so use that to enforce the
+		 * limit.
 		 */
-		resblks = min_t(xfs_fileoff_t, (e - s), (MAXEXTLEN * nimaps));
+		resblks = min_t(xfs_fileoff_t, (e - s),
+				(XFS_MAX_BMBT_EXTLEN * nimaps));
 		if (unlikely(rt)) {
 			dblocks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
 			rblocks = resblks;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 4078d5324090..8bbf4a2cca9f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -403,7 +403,7 @@ xfs_iomap_prealloc_size(
 	 */
 	plen = prev.br_blockcount;
 	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
-		if (plen > MAXEXTLEN / 2 ||
+		if (plen > XFS_MAX_BMBT_EXTLEN / 2 ||
 		    isnullstartblock(got.br_startblock) ||
 		    got.br_startoff + got.br_blockcount != prev.br_startoff ||
 		    got.br_startblock + got.br_blockcount != prev.br_startblock)
@@ -415,23 +415,23 @@ xfs_iomap_prealloc_size(
 	/*
 	 * If the size of the extents is greater than half the maximum extent
 	 * length, then use the current offset as the basis.  This ensures that
-	 * for large files the preallocation size always extends to MAXEXTLEN
-	 * rather than falling short due to things like stripe unit/width
-	 * alignment of real extents.
+	 * for large files the preallocation size always extends to
+	 * XFS_MAX_BMBT_EXTLEN rather than falling short due to things like stripe
+	 * unit/width alignment of real extents.
 	 */
 	alloc_blocks = plen * 2;
-	if (alloc_blocks > MAXEXTLEN)
+	if (alloc_blocks > XFS_MAX_BMBT_EXTLEN)
 		alloc_blocks = XFS_B_TO_FSB(mp, offset);
 	qblocks = alloc_blocks;
 
 	/*
-	 * MAXEXTLEN is not a power of two value but we round the prealloc down
-	 * to the nearest power of two value after throttling. To prevent the
-	 * round down from unconditionally reducing the maximum supported
-	 * prealloc size, we round up first, apply appropriate throttling,
-	 * round down and cap the value to MAXEXTLEN.
+	 * XFS_MAX_BMBT_EXTLEN is not a power of two value but we round the prealloc
+	 * down to the nearest power of two value after throttling. To prevent
+	 * the round down from unconditionally reducing the maximum supported
+	 * prealloc size, we round up first, apply appropriate throttling, round
+	 * down and cap the value to XFS_MAX_BMBT_EXTLEN.
 	 */
-	alloc_blocks = XFS_FILEOFF_MIN(roundup_pow_of_two(MAXEXTLEN),
+	alloc_blocks = XFS_FILEOFF_MIN(roundup_pow_of_two(XFS_MAX_BMBT_EXTLEN),
 				       alloc_blocks);
 
 	freesp = percpu_counter_read_positive(&mp->m_fdblocks);
@@ -479,14 +479,14 @@ xfs_iomap_prealloc_size(
 	 */
 	if (alloc_blocks)
 		alloc_blocks = rounddown_pow_of_two(alloc_blocks);
-	if (alloc_blocks > MAXEXTLEN)
-		alloc_blocks = MAXEXTLEN;
+	if (alloc_blocks > XFS_MAX_BMBT_EXTLEN)
+		alloc_blocks = XFS_MAX_BMBT_EXTLEN;
 
 	/*
 	 * If we are still trying to allocate more space than is
 	 * available, squash the prealloc hard. This can happen if we
 	 * have a large file on a small filesystem and the above
-	 * lowspace thresholds are smaller than MAXEXTLEN.
+	 * lowspace thresholds are smaller than XFS_MAX_BMBT_EXTLEN.
 	 */
 	while (alloc_blocks && alloc_blocks >= freesp)
 		alloc_blocks >>= 4;
-- 
2.30.2
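
For readers following the rename, here is a rough standalone sketch (not
kernel code) of the limit the patch is about. It assumes the bmbt block
count field is 21 bits wide, as the removed MAXEXTLEN comment states, and
that xfs_extlen_t is 32 bits. The toy_* helpers are hypothetical names that
only mirror the length checks visible in the hunks above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit extent length type, as the patch comments describe. */
typedef uint32_t xfs_extlen_t;

/* Assumption: 21-bit block count field, per the old "21 bits" comment. */
#define BMBT_BLOCKCOUNT_BITLEN	21
#define BMBT_BLOCKCOUNT_MASK	((1ULL << BMBT_BLOCKCOUNT_BITLEN) - 1)
#define XFS_MAX_BMBT_EXTLEN	((xfs_extlen_t)(BMBT_BLOCKCOUNT_MASK))

/* Hypothetical: may the new extent be merged into its left neighbour? */
static bool toy_left_contig_ok(uint64_t left, uint64_t cur)
{
	return left + cur <= XFS_MAX_BMBT_EXTLEN;
}

/*
 * Hypothetical: may the new extent be merged into its right neighbour?
 * If the left neighbour is merged as well, all three lengths must fit
 * into one record.
 */
static bool toy_right_contig_ok(bool left_contig, uint64_t left,
				uint64_t cur, uint64_t right)
{
	if (cur + right > XFS_MAX_BMBT_EXTLEN)
		return false;
	return !left_contig || left + cur + right <= XFS_MAX_BMBT_EXTLEN;
}

/* Hypothetical: pull an aligned length back under the limit, one extsz at a time. */
static uint64_t toy_trim_aligned_len(uint64_t align_alen, uint64_t extsz)
{
	/* The hint itself must be a valid extent length, or the loop could underflow. */
	assert(extsz > 0 && extsz <= XFS_MAX_BMBT_EXTLEN);
	while (align_alen > XFS_MAX_BMBT_EXTLEN)
		align_alen -= extsz;
	return align_alen;
}
```

Deriving the limit from BMBT_BLOCKCOUNT_MASK rather than repeating the
literal 21 keeps the maximum length tied to the record layout, which appears
to be the point of moving the definition next to the mask.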



* Re: [PATCH V7 13/17] xfs: xfs_growfs_rt_alloc: Unlock inode explicitly rather than through iop_committing()
  2022-03-01 10:39 ` [PATCH V7 13/17] xfs: xfs_growfs_rt_alloc: Unlock inode explicitly rather than through iop_committing() Chandan Babu R
@ 2022-03-02  0:26   ` Darrick J. Wong
  2022-03-04  7:25   ` Dave Chinner
  1 sibling, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2022-03-02  0:26 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, david

On Tue, Mar 01, 2022 at 04:09:34PM +0530, Chandan Babu R wrote:
> In order to be able to upgrade inodes to XFS_DIFLAG2_NREXT64, a future commit
> will perform such an upgrade in a transaction context. This requires the
> transaction to be rolled once. Hence inodes which have been added to the
> transaction (via xfs_trans_ijoin()) with non-zero value for lock_flags
> argument would cause the inode to be unlocked when the transaction is rolled.
> 
> To prevent this from happening in the case of realtime bitmap/summary inodes,
> this commit now unlocks the inode explicitly rather than through
> iop_committing() callback.
> 
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/xfs_rtalloc.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index b8c79ee791af..a70140b35e8b 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -780,6 +780,7 @@ xfs_growfs_rt_alloc(
>  	int			resblks;	/* space reservation */
>  	enum xfs_blft		buf_type;
>  	struct xfs_trans	*tp;
> +	bool			unlock_inode;
>  
>  	if (ip == mp->m_rsumip)
>  		buf_type = XFS_BLFT_RTSUMMARY_BUF;
> @@ -802,7 +803,8 @@ xfs_growfs_rt_alloc(
>  		 * Lock the inode.
>  		 */
>  		xfs_ilock(ip, XFS_ILOCK_EXCL);
> -		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> +		xfs_trans_ijoin(tp, ip, 0);
> +		unlock_inode = true;
>  
>  		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
>  				XFS_IEXT_ADD_NOSPLIT_CNT);
> @@ -823,8 +825,11 @@ xfs_growfs_rt_alloc(
>  		 * Free any blocks freed up in the transaction, then commit.
>  		 */
>  		error = xfs_trans_commit(tp);
> -		if (error)
> +                unlock_inode = false;
> +                xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +                if (error)
>  			return error;

Whitespace corruption here?

Other than that ... let's see, the ILOCK/ijoin in the inner loop that
zeroes the new bitmap/summary blocks doesn't require an explicit unlock,
so I think this looks fine now.

So with that fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +
>  		/*
>  		 * Now we need to clear the allocated blocks.
>  		 * Do this one block per transaction, to keep it simple.
> @@ -874,6 +879,8 @@ xfs_growfs_rt_alloc(
>  
>  out_trans_cancel:
>  	xfs_trans_cancel(tp);
> +	if (unlock_inode)
> +		xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;
>  }
>  
> -- 
> 2.30.2
> 
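
The lock ownership rule described in the commit message can be modelled with
a few lines of plain C. This is only a toy model under stated assumptions:
the toy_* types and functions are hypothetical stand-ins, not the kernel's
xfs_trans or xfs_inode, and commit and roll are treated as the same event.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, not the kernel structures. */
struct toy_inode {
	bool locked;
};

struct toy_trans {
	struct toy_inode *ip;
	bool unlock_on_commit;
};

static void toy_ilock(struct toy_inode *ip)
{
	ip->locked = true;
}

static void toy_iunlock(struct toy_inode *ip)
{
	ip->locked = false;
}

/* A non-zero lock_flags hands ownership of the lock to the transaction. */
static void toy_ijoin(struct toy_trans *tp, struct toy_inode *ip, int lock_flags)
{
	tp->ip = ip;
	tp->unlock_on_commit = (lock_flags != 0);
}

/* Commit (or roll) drops the lock only if the transaction owns it. */
static void toy_commit(struct toy_trans *tp)
{
	if (tp->ip && tp->unlock_on_commit)
		toy_iunlock(tp->ip);
	tp->ip = NULL;
}
```

With lock_flags of 0 the caller keeps the lock across the commit and has to
drop it on every exit path, which is what the unlock_inode flag in the patch
tracks.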


* Re: [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
  2022-03-01 10:39 ` [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode " Chandan Babu R
@ 2022-03-02  0:31   ` Darrick J. Wong
  2022-03-04  8:09   ` Dave Chinner
  1 sibling, 0 replies; 53+ messages in thread
From: Darrick J. Wong @ 2022-03-02  0:31 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, david

On Tue, Mar 01, 2022 at 04:09:36PM +0530, Chandan Babu R wrote:
> The following changes are made to enable userspace to obtain 64-bit extent
> counters,
> 1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
>    xfs_bulkstat->bs_pad[] to hold 64-bit extent counter.
> 2. Define the new flag XFS_BULK_IREQ_NREXT64 for userspace to indicate that
>    it is capable of receiving 64-bit extent counters.
> 
> Suggested-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Hm.  So I've fully reviewed this now:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

But since this is an ondisk format change, I think we ought to let
others chime in on the review parts.

In any case, it's past -rc6, which means it's too late for adding
anything other than bug fixes to 5.18-merge.  This is particularly
unfortunate since I am so strapped for time now that I barely had time
to review these, didn't have time to help Allison figure out where the
memory leak is in her patchset, and completely failed to get even a
single patch ready from my own development tree. <grumble>

--D

> ---
>  fs/xfs/libxfs/xfs_fs.h | 20 ++++++++++++++++----
>  fs/xfs/xfs_ioctl.c     |  3 +++
>  fs/xfs/xfs_itable.c    | 30 ++++++++++++++++++++++++++++--
>  fs/xfs/xfs_itable.h    |  4 +++-
>  fs/xfs/xfs_iwalk.h     |  2 +-
>  5 files changed, 51 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 2204d49d0c3a..31ccbff2f16c 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -378,7 +378,7 @@ struct xfs_bulkstat {
>  	uint32_t	bs_extsize_blks; /* extent size hint, blocks	*/
>  
>  	uint32_t	bs_nlink;	/* number of links		*/
> -	uint32_t	bs_extents;	/* number of extents		*/
> +	uint32_t	bs_extents;	/* 32-bit data fork extent counter */
>  	uint32_t	bs_aextents;	/* attribute number of extents	*/
>  	uint16_t	bs_version;	/* structure version		*/
>  	uint16_t	bs_forkoff;	/* inode fork offset in bytes	*/
> @@ -387,8 +387,9 @@ struct xfs_bulkstat {
>  	uint16_t	bs_checked;	/* checked inode metadata	*/
>  	uint16_t	bs_mode;	/* type and mode		*/
>  	uint16_t	bs_pad2;	/* zeroed			*/
> +	uint64_t	bs_extents64;	/* 64-bit data fork extent counter */
>  
> -	uint64_t	bs_pad[7];	/* zeroed			*/
> +	uint64_t	bs_pad[6];	/* zeroed			*/
>  };
>  
>  #define XFS_BULKSTAT_VERSION_V1	(1)
> @@ -469,8 +470,19 @@ struct xfs_bulk_ireq {
>   */
>  #define XFS_BULK_IREQ_SPECIAL	(1 << 1)
>  
> -#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO | \
> -				 XFS_BULK_IREQ_SPECIAL)
> +/*
> + * Return data fork extent count via xfs_bulkstat->bs_extents64 field and assign
> + * 0 to xfs_bulkstat->bs_extents when the flag is set.  Otherwise, use
> + * xfs_bulkstat->bs_extents for returning data fork extent count and set
> + * xfs_bulkstat->bs_extents64 to 0. In the second case, return -EOVERFLOW and
> + * assign 0 to xfs_bulkstat->bs_extents if data fork extent count is larger than
> + * XFS_MAX_EXTCNT_DATA_FORK_OLD.
> + */
> +#define XFS_BULK_IREQ_NREXT64	(1 << 2)
> +
> +#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO |	 \
> +				 XFS_BULK_IREQ_SPECIAL | \
> +				 XFS_BULK_IREQ_NREXT64)
>  
>  /* Operate on the root directory inode. */
>  #define XFS_BULK_IREQ_SPECIAL_ROOT	(1)
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 2515fe8299e1..22947c5ffd34 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -813,6 +813,9 @@ xfs_bulk_ireq_setup(
>  	if (XFS_INO_TO_AGNO(mp, breq->startino) >= mp->m_sb.sb_agcount)
>  		return -ECANCELED;
>  
> +	if (hdr->flags & XFS_BULK_IREQ_NREXT64)
> +		breq->flags |= XFS_IBULK_NREXT64;
> +
>  	return 0;
>  }
>  
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index c08c79d9e311..0272a3c9d8b1 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -20,6 +20,7 @@
>  #include "xfs_icache.h"
>  #include "xfs_health.h"
>  #include "xfs_trans.h"
> +#include "xfs_errortag.h"
>  
>  /*
>   * Bulk Stat
> @@ -64,6 +65,7 @@ xfs_bulkstat_one_int(
>  	struct xfs_inode	*ip;		/* incore inode pointer */
>  	struct inode		*inode;
>  	struct xfs_bulkstat	*buf = bc->buf;
> +	xfs_extnum_t		nextents;
>  	int			error = -EINVAL;
>  
>  	if (xfs_internal_inum(mp, ino))
> @@ -102,7 +104,27 @@ xfs_bulkstat_one_int(
>  
>  	buf->bs_xflags = xfs_ip2xflags(ip);
>  	buf->bs_extsize_blks = ip->i_extsize;
> -	buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
> +
> +	nextents = xfs_ifork_nextents(&ip->i_df);
> +	if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
> +		xfs_extnum_t	max_nextents = XFS_MAX_EXTCNT_DATA_FORK_OLD;
> +
> +		if (unlikely(XFS_TEST_ERROR(false, mp,
> +				XFS_ERRTAG_REDUCE_MAX_IEXTENTS)))
> +			max_nextents = 10;
> +
> +		if (nextents > max_nextents) {
> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +			xfs_irele(ip);
> +			error = -EOVERFLOW;
> +			goto out;
> +		}
> +
> +		buf->bs_extents = nextents;
> +	} else {
> +		buf->bs_extents64 = nextents;
> +	}
> +
>  	xfs_bulkstat_health(ip, buf);
>  	buf->bs_aextents = xfs_ifork_nextents(ip->i_afp);
>  	buf->bs_forkoff = XFS_IFORK_BOFF(ip);
> @@ -256,6 +278,7 @@ xfs_bulkstat(
>  		.breq		= breq,
>  	};
>  	struct xfs_trans	*tp;
> +	unsigned int		iwalk_flags = 0;
>  	int			error;
>  
>  	if (breq->mnt_userns != &init_user_ns) {
> @@ -279,7 +302,10 @@ xfs_bulkstat(
>  	if (error)
>  		goto out;
>  
> -	error = xfs_iwalk(breq->mp, tp, breq->startino, breq->flags,
> +	if (breq->flags & XFS_IBULK_SAME_AG)
> +		iwalk_flags |= XFS_IWALK_SAME_AG;
> +
> +	error = xfs_iwalk(breq->mp, tp, breq->startino, iwalk_flags,
>  			xfs_bulkstat_iwalk, breq->icount, &bc);
>  	xfs_trans_cancel(tp);
>  out:
> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> index 7078d10c9b12..9223529cd7bd 100644
> --- a/fs/xfs/xfs_itable.h
> +++ b/fs/xfs/xfs_itable.h
> @@ -17,7 +17,9 @@ struct xfs_ibulk {
>  };
>  
>  /* Only iterate within the same AG as startino */
> -#define XFS_IBULK_SAME_AG	(XFS_IWALK_SAME_AG)
> +#define XFS_IBULK_SAME_AG	(1ULL << 0)
> +
> +#define XFS_IBULK_NREXT64	(1ULL << 1)
>  
>  /*
>   * Advance the user buffer pointer by one record of the given size.  If the
> diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
> index 37a795f03267..3a68766fd909 100644
> --- a/fs/xfs/xfs_iwalk.h
> +++ b/fs/xfs/xfs_iwalk.h
> @@ -26,7 +26,7 @@ int xfs_iwalk_threaded(struct xfs_mount *mp, xfs_ino_t startino,
>  		unsigned int inode_records, bool poll, void *data);
>  
>  /* Only iterate inodes within the same AG as @startino. */
> -#define XFS_IWALK_SAME_AG	(0x1)
> +#define XFS_IWALK_SAME_AG	(1 << 0)
>  
>  #define XFS_IWALK_FLAGS_ALL	(XFS_IWALK_SAME_AG)
>  
> -- 
> 2.30.2
> 
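
The reporting rule from the XFS_BULK_IREQ_NREXT64 comment can be sketched
outside the kernel as follows. The struct and function names prefixed with
toy_ are hypothetical, and unlike the patch (which relies on the bulkstat
buffer having been zeroed earlier) this sketch zeroes both fields itself so
that it is self-contained.

```c
#include <errno.h>
#include <stdint.h>

/* Largest count that fits the old signed 32-bit on-disk counter. */
#define XFS_MAX_EXTCNT_DATA_FORK_OLD	((uint64_t)((1ULL << 31) - 1))

/* Hypothetical flag, mirroring XFS_IBULK_NREXT64 from the patch. */
#define TOY_IBULK_NREXT64		(1U << 1)

/* Hypothetical cut-down version of struct xfs_bulkstat. */
struct toy_bulkstat {
	uint32_t bs_extents;	/* 32-bit data fork extent counter */
	uint64_t bs_extents64;	/* 64-bit data fork extent counter */
};

/*
 * Report the data fork extent count. Callers that set the flag get the
 * value in bs_extents64; everyone else gets bs_extents, or -EOVERFLOW
 * when the count does not fit the old limit.
 */
static int toy_report_extents(struct toy_bulkstat *buf, uint64_t nextents,
			      unsigned int flags)
{
	buf->bs_extents = 0;
	buf->bs_extents64 = 0;

	if (flags & TOY_IBULK_NREXT64) {
		buf->bs_extents64 = nextents;
		return 0;
	}

	if (nextents > XFS_MAX_EXTCNT_DATA_FORK_OLD)
		return -EOVERFLOW;

	buf->bs_extents = (uint32_t)nextents;
	return 0;
}
```

Failing with -EOVERFLOW rather than truncating means older userspace never
sees a silently wrong extent count.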


* Re: [PATCH V7 01/17] xfs: Move extent count limits to xfs_format.h
  2022-03-01 10:39 ` [PATCH V7 01/17] xfs: Move extent count limits to xfs_format.h Chandan Babu R
@ 2022-03-04  0:55   ` Dave Chinner
  0 siblings, 0 replies; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  0:55 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:22PM +0530, Chandan Babu R wrote:
> Maximum values associated with extent counters i.e. Maximum extent length,
> Maximum data extents and Maximum xattr extents are dictated by the on-disk
> format. Hence move these definitions over to xfs_format.h.
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_format.h | 7 +++++++
>  fs/xfs/libxfs/xfs_types.h  | 7 -------
>  2 files changed, 7 insertions(+), 7 deletions(-)

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH V7 02/17] xfs: Introduce xfs_iext_max_nextents() helper
  2022-03-01 10:39 ` [PATCH V7 02/17] xfs: Introduce xfs_iext_max_nextents() helper Chandan Babu R
@ 2022-03-04  0:56   ` Dave Chinner
  0 siblings, 0 replies; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  0:56 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:23PM +0530, Chandan Babu R wrote:
> xfs_iext_max_nextents() returns the maximum number of extents possible for one
> of data, cow or attribute fork. This helper will be extended further in a
> future commit when maximum extent counts associated with data/attribute forks
> are increased.
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c       | 9 ++++-----
>  fs/xfs/libxfs/xfs_inode_buf.c  | 8 +++-----
>  fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
>  fs/xfs/libxfs/xfs_inode_fork.h | 8 ++++++++
>  4 files changed, 16 insertions(+), 11 deletions(-)

LGTM.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH V7 03/17] xfs: Use xfs_extnum_t instead of basic data types
  2022-03-01 10:39 ` [PATCH V7 03/17] xfs: Use xfs_extnum_t instead of basic data types Chandan Babu R
@ 2022-03-04  0:59   ` Dave Chinner
  2022-03-04  1:30     ` Dave Chinner
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  0:59 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:24PM +0530, Chandan Babu R wrote:
> xfs_extnum_t is the type to use to declare variables which have values
> obtained from xfs_dinode->di_[a]nextents. This commit replaces basic
> types (e.g. uint32_t) with xfs_extnum_t for such variables.
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c       | 2 +-
>  fs/xfs/libxfs/xfs_inode_buf.c  | 2 +-
>  fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
>  fs/xfs/scrub/inode.c           | 2 +-
>  fs/xfs/xfs_trace.h             | 2 +-
>  5 files changed, 5 insertions(+), 5 deletions(-)

Nice little cleanup.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Something to think about for a followup - how do we ensure we catch
this sort of type mismatch in future as it could end up with overflow
bugs?

> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 703ab9a84530..98541be873d8 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -54,7 +54,7 @@ xfs_bmap_compute_maxlevels(
>  {
>  	int		level;		/* btree level */
>  	uint		maxblocks;	/* max blocks at this level */
> -	uint		maxleafents;	/* max leaf entries possible */
> +	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
>  	int		maxrootrecs;	/* max records in root block */
>  	int		minleafrecs;	/* min records in leaf block */
>  	int		minnoderecs;	/* min records in node block */
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index e6f9bdc4558f..5c95a5428fc7 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -336,7 +336,7 @@ xfs_dinode_verify_fork(
>  	struct xfs_mount	*mp,
>  	int			whichfork)
>  {
> -	uint32_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
> +	xfs_extnum_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);

e.g. should we convert macros like this to static inline functions
so that we get type checking of the returned value?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
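
To make the macro versus static inline question concrete, here is a small
sketch. Everything prefixed with toy_ or TOY_ is hypothetical; the struct is
a cut-down stand-in for the on-disk inode, endian conversion is left out, and
xfs_extnum_t is assumed to be 64 bits wide, as it becomes later in the
series.

```c
#include <stdint.h>

/* Assumption: the promoted, 64-bit in-core extent count type. */
typedef uint64_t xfs_extnum_t;

/* Hypothetical stand-in for the on-disk inode counters (no endian handling). */
struct toy_dinode {
	uint32_t di_nextents;
	uint16_t di_anextents;
};

#define TOY_DATA_FORK	0
#define TOY_ATTR_FORK	1

/*
 * Macro form: the result type falls out of the usual arithmetic
 * conversions on the two fields, and the argument is not type checked.
 */
#define TOY_DFORK_NEXTENTS(dip, w) \
	((w) == TOY_DATA_FORK ? (dip)->di_nextents : (dip)->di_anextents)

/*
 * Function form: the argument must be a struct toy_dinode pointer and the
 * result is always an xfs_extnum_t, whatever the field widths are.
 */
static inline xfs_extnum_t
toy_dfork_nextents(const struct toy_dinode *dip, int whichfork)
{
	if (whichfork == TOY_DATA_FORK)
		return dip->di_nextents;
	return dip->di_anextents;
}
```

The function form gives the compiler a fixed return type to check callers
against, so a caller that stores the result in a narrower variable becomes
easier to spot with conversion warnings enabled.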


* Re: [PATCH V7 06/17] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively
  2022-03-01 10:39 ` [PATCH V7 06/17] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively Chandan Babu R
@ 2022-03-04  1:29   ` Dave Chinner
  2022-03-05 12:43     ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  1:29 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong, kernel test robot

On Tue, Mar 01, 2022 at 04:09:27PM +0530, Chandan Babu R wrote:
> A future commit will introduce a 64-bit on-disk data extent counter and a
> 32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and
> xfs_aextnum_t to 64 and 32-bits in order to correctly handle in-core versions
> of these quantities.
> 
> Reported-by: kernel test robot <lkp@intel.com>

What was reported by the test robot? This change isn't a bug that
needed fixing, it's a core part of the patchset...

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c       | 6 +++---
>  fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
>  fs/xfs/libxfs/xfs_inode_fork.h | 2 +-
>  fs/xfs/libxfs/xfs_types.h      | 4 ++--
>  fs/xfs/xfs_inode.c             | 4 ++--
>  fs/xfs/xfs_trace.h             | 2 +-
>  6 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 98541be873d8..9df98339a43a 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -52,9 +52,9 @@ xfs_bmap_compute_maxlevels(
>  	xfs_mount_t	*mp,		/* file system mount structure */
>  	int		whichfork)	/* data or attr fork */
>  {
> +	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
>  	int		level;		/* btree level */
>  	uint		maxblocks;	/* max blocks at this level */
> -	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
>  	int		maxrootrecs;	/* max records in root block */
>  	int		minleafrecs;	/* min records in leaf block */
>  	int		minnoderecs;	/* min records in node block */

Unnecessary.

> @@ -83,7 +83,7 @@ xfs_bmap_compute_maxlevels(
>  	maxrootrecs = xfs_bmdr_maxrecs(sz, 0);
>  	minleafrecs = mp->m_bmap_dmnr[0];
>  	minnoderecs = mp->m_bmap_dmnr[1];
> -	maxblocks = (maxleafents + minleafrecs - 1) / minleafrecs;
> +	maxblocks = howmany_64(maxleafents, minleafrecs);
>  	for (level = 1; maxblocks > 1; level++) {
>  		if (maxblocks <= maxrootrecs)
>  			maxblocks = 1;
> @@ -467,7 +467,7 @@ xfs_bmap_check_leaf_extents(
>  	if (bp_release)
>  		xfs_trans_brelse(NULL, bp);
>  error_norelse:
> -	xfs_warn(mp, "%s: BAD after btree leaves for %d extents",
> +	xfs_warn(mp, "%s: BAD after btree leaves for %llu extents",
>  		__func__, i);
>  	xfs_err(mp, "%s: CORRUPTED BTREE OR SOMETHING", __func__);
>  	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index 829739e249b6..ce690abe5dce 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -117,7 +117,7 @@ xfs_iformat_extents(
>  	 * we just bail out rather than crash in kmem_alloc() or memcpy() below.
>  	 */
>  	if (unlikely(size < 0 || size > XFS_DFORK_SIZE(dip, mp, whichfork))) {
> -		xfs_warn(ip->i_mount, "corrupt inode %Lu ((a)extents = %d).",
> +		xfs_warn(ip->i_mount, "corrupt inode %llu ((a)extents = %llu).",
>  			(unsigned long long) ip->i_ino, nex);

Isn't ip->i_ino explicitly defined as an unsigned long long? If you are going
to fix one part of the printk formatting for ip->i_ino, you should
probably get rid of the unnecessary cast, too.

Otherwise looks OK.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
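
The switch to howmany_64() in xfs_bmap_compute_maxlevels() is a 64-bit
ceiling division. A minimal userspace sketch follows; toy_howmany_64() is a
hypothetical stand-in written for illustration, not the kernel helper, and
the divisor of 125 used in the note below is an arbitrary example value.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a 64-bit "how many y-sized groups cover x"
 * helper. Computing quotient and remainder separately avoids the
 * x + y - 1 addition, so it cannot wrap for large x.
 */
static uint64_t toy_howmany_64(uint64_t x, uint32_t y)
{
	return x / y + (x % y != 0);
}
```

For a 48-bit maximum extent count, toy_howmany_64((1ULL << 48) - 1, 125)
does not fit in 32 bits, so the width of the variable receiving the result
matters as much as the width of the division itself.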


* Re: [PATCH V7 03/17] xfs: Use xfs_extnum_t instead of basic data types
  2022-03-04  0:59   ` Dave Chinner
@ 2022-03-04  1:30     ` Dave Chinner
  0 siblings, 0 replies; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  1:30 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Fri, Mar 04, 2022 at 11:59:34AM +1100, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:24PM +0530, Chandan Babu R wrote:
> > xfs_extnum_t is the type to use to declare variables which have values
> > obtained from xfs_dinode->di_[a]nextents. This commit replaces basic
> > types (e.g. uint32_t) with xfs_extnum_t for such variables.
> > 
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c       | 2 +-
> >  fs/xfs/libxfs/xfs_inode_buf.c  | 2 +-
> >  fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
> >  fs/xfs/scrub/inode.c           | 2 +-
> >  fs/xfs/xfs_trace.h             | 2 +-
> >  5 files changed, 5 insertions(+), 5 deletions(-)
> 
> Nice little cleanup.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> Something to think about for a followup - how do we ensure we catch
> this sort of type mismatch in future, as it could end up causing
> overflow bugs?

Ah, never mind, later patches in the series look to address this...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH V7 04/17] xfs: Introduce xfs_dfork_nextents() helper
  2022-03-01 10:39 ` [PATCH V7 04/17] xfs: Introduce xfs_dfork_nextents() helper Chandan Babu R
@ 2022-03-04  1:43   ` Dave Chinner
  2022-03-05 12:42     ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  1:43 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:25PM +0530, Chandan Babu R wrote:
> This commit replaces the macro XFS_DFORK_NEXTENTS() with the helper function
> xfs_dfork_nextents(). As of this commit, xfs_dfork_nextents() returns the same
> value as XFS_DFORK_NEXTENTS(). A future commit which extends inode's extent
> counter fields will add more logic to this helper.
> 
> This commit also replaces direct accesses to xfs_dinode->di_[a]nextents
> with calls to xfs_dfork_nextents().
> 
> No functional changes have been made.
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_format.h     |  4 ----
>  fs/xfs/libxfs/xfs_inode_buf.c  | 16 +++++++++++-----
>  fs/xfs/libxfs/xfs_inode_fork.c | 10 ++++++----
>  fs/xfs/libxfs/xfs_inode_fork.h | 32 ++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/inode.c           | 18 ++++++++++--------
>  5 files changed, 59 insertions(+), 21 deletions(-)

Mostly good - a few consistency nits below.

> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index d75e5b16da7e..e5654b578ec0 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -925,10 +925,6 @@ enum xfs_dinode_fmt {
>  	((w) == XFS_DATA_FORK ? \
>  		(dip)->di_format : \
>  		(dip)->di_aformat)
> -#define XFS_DFORK_NEXTENTS(dip,w) \
> -	((w) == XFS_DATA_FORK ? \
> -		be32_to_cpu((dip)->di_nextents) : \
> -		be16_to_cpu((dip)->di_anextents))
>  
>  /*
>   * For block and character special files the 32bit dev_t is stored at the
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 5c95a5428fc7..860d32816909 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -336,9 +336,11 @@ xfs_dinode_verify_fork(
>  	struct xfs_mount	*mp,
>  	int			whichfork)
>  {
> -	xfs_extnum_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
> +	xfs_extnum_t		di_nextents;
>  	xfs_extnum_t		max_extents;
>  
> +	di_nextents = xfs_dfork_nextents(dip, whichfork);

Why separate the declaration and init? We normally move the init
up to the declaration, not demote it like this....

>  	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
>  	case XFS_DINODE_FMT_LOCAL:
>  		/*
> @@ -405,6 +407,8 @@ xfs_dinode_verify(
>  	uint16_t		flags;
>  	uint64_t		flags2;
>  	uint64_t		di_size;
> +	xfs_extnum_t            nextents;
> +	xfs_filblks_t		nblocks;
>  
>  	if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))
>  		return __this_address;
> @@ -435,10 +439,12 @@ xfs_dinode_verify(
>  	if ((S_ISLNK(mode) || S_ISDIR(mode)) && di_size == 0)
>  		return __this_address;
>  
> +	nextents = xfs_dfork_data_extents(dip);
> +	nextents += xfs_dfork_attr_extents(dip);
> +	nblocks = be64_to_cpu(dip->di_nblocks);
> +
>  	/* Fork checks carried over from xfs_iformat_fork */
> -	if (mode &&
> -	    be32_to_cpu(dip->di_nextents) + be16_to_cpu(dip->di_anextents) >
> -			be64_to_cpu(dip->di_nblocks))
> +	if (mode && nextents > nblocks)
>  		return __this_address;

The naextents count is needed later in this function. Rather than
calculate it twice, I find the code reads a lot better if it is
structured like this:

	nextents = xfs_dfork_data_extents(dip);
	naextents = xfs_dfork_attr_extents(dip);
	nblocks = be64_to_cpu(dip->di_nblocks);

	if (mode && nextents + naextents > nblocks)
		return __this_address;
	.....
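
Spelled out as a standalone sketch (userspace C, with a hypothetical
struct dinode_counts holding already-decoded values instead of the
real big-endian struct xfs_dinode, and a bool standing in for the
attr fork offset test), the shape I have in mind is roughly:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, already-decoded view of the on-disk inode fields. */
struct dinode_counts {
	uint16_t	mode;		/* zero for a free inode */
	uint64_t	nextents;	/* data fork extent count */
	uint64_t	naextents;	/* attr fork extent count */
	uint64_t	nblocks;	/* blocks owned by the inode */
	bool		has_attr_fork;	/* stand-in for a non-zero fork offset */
};

/* Returns true when the extent counters are mutually consistent. */
static bool dinode_counts_ok(const struct dinode_counts *d)
{
	/* Each counter is read exactly once, up front. */
	uint64_t	nextents = d->nextents;
	uint64_t	naextents = d->naextents;
	uint64_t	nblocks = d->nblocks;

	/* An in-use inode cannot have more extents than blocks. */
	if (d->mode && nextents + naextents > nblocks)
		return false;

	/* Without an attr fork there must be no attr extents. */
	if (!d->has_attr_fork && naextents)
		return false;

	return true;
}
```

Both checks then use the same local variables, which is the point of
computing naextents separately.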

>  
>  	if (mode && XFS_DFORK_BOFF(dip) > mp->m_sb.sb_inodesize)
> @@ -495,7 +501,7 @@ xfs_dinode_verify(
>  		default:
>  			return __this_address;
>  		}
> -		if (dip->di_anextents)
> +		if (xfs_dfork_attr_extents(dip))
>  			return __this_address;
>  	}

And then just check naextents here, too?

> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index a17c4d87520a..829739e249b6 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -105,7 +105,7 @@ xfs_iformat_extents(
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
>  	int			state = xfs_bmap_fork_to_state(whichfork);
> -	xfs_extnum_t		nex = XFS_DFORK_NEXTENTS(dip, whichfork);
> +	xfs_extnum_t		nex = xfs_dfork_nextents(dip, whichfork);

I'll point out declaration with init as I mentioned earlier...

>  	int			size = nex * sizeof(xfs_bmbt_rec_t);
>  	struct xfs_iext_cursor	icur;
>  	struct xfs_bmbt_rec	*dp;
> @@ -230,7 +230,7 @@ xfs_iformat_data_fork(
>  	 * depend on it.
>  	 */
>  	ip->i_df.if_format = dip->di_format;
> -	ip->i_df.if_nextents = be32_to_cpu(dip->di_nextents);
> +	ip->i_df.if_nextents = xfs_dfork_data_extents(dip);
>  
>  	switch (inode->i_mode & S_IFMT) {
>  	case S_IFIFO:
> @@ -295,14 +295,16 @@ xfs_iformat_attr_fork(
>  	struct xfs_inode	*ip,
>  	struct xfs_dinode	*dip)
>  {
> +	xfs_extnum_t		naextents;
>  	int			error = 0;
>  
> +	naextents = xfs_dfork_attr_extents(dip);
> +

.... and point it out again because otherwise this looks
inconsistent.

>  struct xfs_ifork *xfs_iext_state_to_fork(struct xfs_inode *ip, int state);
> diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
> index 87925761e174..edad5307e430 100644
> --- a/fs/xfs/scrub/inode.c
> +++ b/fs/xfs/scrub/inode.c
> @@ -233,6 +233,7 @@ xchk_dinode(
>  	unsigned long long	isize;
>  	uint64_t		flags2;
>  	xfs_extnum_t		nextents;
> +	xfs_extnum_t		naextents;
>  	prid_t			prid;
>  	uint16_t		flags;
>  	uint16_t		mode;
> @@ -391,7 +392,7 @@ xchk_dinode(
>  	xchk_inode_extsize(sc, dip, ino, mode, flags);
>  
>  	/* di_nextents */
> -	nextents = be32_to_cpu(dip->di_nextents);
> +	nextents = xfs_dfork_data_extents(dip);
>  	fork_recs =  XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
>  	switch (dip->di_format) {
>  	case XFS_DINODE_FMT_EXTENTS:
> @@ -408,10 +409,12 @@ xchk_dinode(
>  		break;
>  	}
>  
> +	naextents = xfs_dfork_attr_extents(dip);

Initialise the two extent counts in the same place - they are both
first used only a handful of lines apart.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH V7 05/17] xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextents
  2022-03-01 10:39 ` [PATCH V7 05/17] xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextents Chandan Babu R
@ 2022-03-04  1:44   ` Dave Chinner
  0 siblings, 0 replies; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  1:44 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:26PM +0530, Chandan Babu R wrote:
> A future commit will increase the width of xfs_extnum_t in order to facilitate
> larger per-inode extent counters. Hence this patch now uses basic types to
> define xfs_log_dinode->[di_nextents|di_anextents].
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_log_format.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH V7 07/17] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit
  2022-03-01 10:39 ` [PATCH V7 07/17] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit Chandan Babu R
@ 2022-03-04  1:57   ` Dave Chinner
  2022-03-05 12:43     ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  1:57 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:28PM +0530, Chandan Babu R wrote:
> XFS_SB_FEAT_INCOMPAT_NREXT64 incompat feature bit will be set on filesystems
> which support large per-inode extent counters. This commit defines the new
> incompat feature bit and the corresponding per-fs feature bit (along with
> inline functions to work on it).
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_format.h | 1 +
>  fs/xfs/libxfs/xfs_sb.c     | 3 +++
>  fs/xfs/xfs_mount.h         | 2 ++
>  3 files changed, 6 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index e5654b578ec0..7972cbc22608 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -372,6 +372,7 @@ xfs_sb_has_ro_compat_feature(
>  #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
>  #define XFS_SB_FEAT_INCOMPAT_BIGTIME	(1 << 3)	/* large timestamps */
>  #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4)	/* needs xfs_repair */
> +#define XFS_SB_FEAT_INCOMPAT_NREXT64	(1 << 5)	/* 64-bit data fork extent counter */
>  #define XFS_SB_FEAT_INCOMPAT_ALL \
>  		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
>  		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index f4e84aa1d50a..bd632389ae92 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -124,6 +124,9 @@ xfs_sb_version_to_features(
>  		features |= XFS_FEAT_BIGTIME;
>  	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR)
>  		features |= XFS_FEAT_NEEDSREPAIR;
> +	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NREXT64)
> +		features |= XFS_FEAT_NREXT64;
> +
>  	return features;
>  }
>  
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 00720a02e761..10941481f7e6 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -276,6 +276,7 @@ typedef struct xfs_mount {
>  #define XFS_FEAT_INOBTCNT	(1ULL << 23)	/* inobt block counts */
>  #define XFS_FEAT_BIGTIME	(1ULL << 24)	/* large timestamps */
>  #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
> +#define XFS_FEAT_NREXT64	(1ULL << 26)	/* 64-bit inode extent counters */
>  
>  /* Mount features */
>  #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
> @@ -338,6 +339,7 @@ __XFS_HAS_FEAT(realtime, REALTIME)
>  __XFS_HAS_FEAT(inobtcounts, INOBTCNT)
>  __XFS_HAS_FEAT(bigtime, BIGTIME)
>  __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
> +__XFS_HAS_FEAT(nrext64, NREXT64)

Not a big fan of "nrext64" naming.

I'd really like the feature macro to be human readable such as:

__XFS_HAS_FEAT(large_extent_counts, NREXT64)

So that it reads like this:

	if (xfs_has_large_extent_counts(mp)) {
		.....
	}

because then the code is much easier to read and is largely self
documenting. In this case, I don't really care about the flag names
(they can remain NREXT64) because they are only seen deep down in
the code.  But for (potentially complex) conditional logic, the
clarity of human readable names makes a big difference.
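
To make the naming suggestion concrete, here is a minimal userspace
sketch. The struct is a simplified stand-in for struct xfs_mount, and
the macro body is an assumed shape for the generator, not a copy of
the kernel's:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct xfs_mount; only the feature mask is kept. */
struct xfs_mount {
	uint64_t	m_features;
};

#define XFS_FEAT_NREXT64	(1ULL << 26)	/* bit value taken from the patch */

/* Assumed shape of the generator macro: one predicate per feature flag. */
#define __XFS_HAS_FEAT(name, NAME) \
static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
{ \
	return mp->m_features & XFS_FEAT_ ## NAME; \
}

/* The flag keeps its terse name; only the predicate is human readable. */
__XFS_HAS_FEAT(large_extent_counts, NREXT64)
```

Callers then only ever see xfs_has_large_extent_counts(mp), while the
NREXT64 flag name stays confined to the definitions.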

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH V7 08/17] xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64
  2022-03-01 10:39 ` [PATCH V7 08/17] xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64 Chandan Babu R
@ 2022-03-04  1:58   ` Dave Chinner
  0 siblings, 0 replies; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  1:58 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:29PM +0530, Chandan Babu R wrote:
> XFS_FSOP_GEOM_FLAGS_NREXT64 indicates that the current filesystem instance
> supports 64-bit per-inode extent counters.
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_fs.h | 1 +
>  fs/xfs/libxfs/xfs_sb.c | 2 ++
>  2 files changed, 3 insertions(+)

Looks fine, modulo xfs_has_nrext64....

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH V7 10/17] xfs: Use xfs_rfsblock_t to count maximum blocks that can be used by BMBT
  2022-03-01 10:39 ` [PATCH V7 10/17] xfs: Use xfs_rfsblock_t to count maximum blocks that can be used by BMBT Chandan Babu R
@ 2022-03-04  2:09   ` Dave Chinner
  2022-03-05 12:44     ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  2:09 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong, kernel test robot

On Tue, Mar 01, 2022 at 04:09:31PM +0530, Chandan Babu R wrote:
> Reported-by: kernel test robot <lkp@intel.com>

What was reported by the robot? I don't quite see the relevance of
this change to the overall patchset just from the change being made.

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 9df98339a43a..a01d9a9225ae 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -53,8 +53,8 @@ xfs_bmap_compute_maxlevels(
>  	int		whichfork)	/* data or attr fork */
>  {
>  	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
> +	xfs_rfsblock_t	maxblocks;	/* max blocks at this level */

typedef uint64_t        xfs_rfsblock_t; /* blockno in filesystem (raw) */

Usage of the type doesn't seem to match its definition. This
function is calculating a block count, not a block number. If you
must use a xfs type, then:

typedef uint64_t        xfs_filblks_t;  /* number of blocks in a file */

is a better match, but I think this should just use uint64_t because
the count has nothing to do with block addresses or files..
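
For illustration, a userspace sketch of the level calculation with a
plain uint64_t block count. The function name and flat parameter list
are made up for the example, howmany_64() is reimplemented as simple
ceiling division, and the else branch is my assumption about the part
of the loop the hunk above does not show:

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division, standing in for the kernel's howmany_64(). */
static uint64_t howmany_64(uint64_t x, uint32_t y)
{
	return (x + y - 1) / y;
}

/*
 * Number of btree levels needed to index maxleafents extent records.
 * maxblocks is a count of blocks at the current level, so it is a
 * plain uint64_t rather than a block number or file block type.
 */
static unsigned int bmap_maxlevels(uint64_t maxleafents, uint32_t minleafrecs,
				   uint32_t minnoderecs, uint32_t maxrootrecs)
{
	uint64_t	maxblocks;
	unsigned int	level;

	/* minnoderecs must exceed 1 or the loop could not converge. */
	assert(minleafrecs > 0 && minnoderecs > 1);

	maxblocks = howmany_64(maxleafents, minleafrecs);
	for (level = 1; maxblocks > 1; level++) {
		if (maxblocks <= maxrootrecs)
			maxblocks = 1;
		else
			maxblocks = howmany_64(maxblocks, minnoderecs);
	}
	return level;
}
```

Nothing in the loop treats maxblocks as an address, which is why a
basic integer type seems the most honest choice.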

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH V7 11/17] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
  2022-03-01 10:39 ` [PATCH V7 11/17] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
@ 2022-03-04  2:32   ` Dave Chinner
  2022-03-05 12:44     ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  2:32 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:32PM +0530, Chandan Babu R wrote:
> This commit defines new macros to represent maximum extent counts allowed by
> filesystems which have support for large per-inode extent counters.
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c       |  8 +++-----
>  fs/xfs/libxfs/xfs_bmap_btree.c |  2 +-
>  fs/xfs/libxfs/xfs_format.h     | 20 ++++++++++++++++----
>  fs/xfs/libxfs/xfs_inode_buf.c  |  3 ++-
>  fs/xfs/libxfs/xfs_inode_fork.c |  2 +-
>  fs/xfs/libxfs/xfs_inode_fork.h | 19 +++++++++++++++----
>  6 files changed, 38 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index a01d9a9225ae..be7f8ebe3cd5 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -61,10 +61,8 @@ xfs_bmap_compute_maxlevels(
>  	int		sz;		/* root block size */
>  
>  	/*
> -	 * The maximum number of extents in a file, hence the maximum number of
> -	 * leaf entries, is controlled by the size of the on-disk extent count,
> -	 * either a signed 32-bit number for the data fork, or a signed 16-bit
> -	 * number for the attr fork.
> +	 * The maximum number of extents in a fork, hence the maximum number of
> +	 * leaf entries, is controlled by the size of the on-disk extent count.
>  	 *
>  	 * Note that we can no longer assume that if we are in ATTR1 that the
>  	 * fork offset of all the inodes will be
> @@ -74,7 +72,7 @@ xfs_bmap_compute_maxlevels(
>  	 * ATTR2 we have to assume the worst case scenario of a minimum size
>  	 * available.
>  	 */
> -	maxleafents = xfs_iext_max_nextents(whichfork);
> +	maxleafents = xfs_iext_max_nextents(xfs_has_nrext64(mp), whichfork);
>  	if (whichfork == XFS_DATA_FORK)
>  		sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
>  	else
> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> index 453309fc85f2..e8d21d69b9ff 100644
> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -611,7 +611,7 @@ xfs_bmbt_maxlevels_ondisk(void)
>  	minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
>  
>  	/* One extra level for the inode root. */
> -	return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
> +	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_EXTCNT_DATA_FORK) + 1;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 9934c320bf01..d3dfd45c39e0 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -872,10 +872,22 @@ enum xfs_dinode_fmt {
>  
>  /*
>   * Max values for extlen, extnum, aextnum.
> - */
> -#define	MAXEXTLEN	((xfs_extlen_t)0x001fffff)	/* 21 bits */
> -#define	MAXEXTNUM	((xfs_extnum_t)0x7fffffff)	/* signed int */
> -#define	MAXAEXTNUM	((xfs_aextnum_t)0x7fff)		/* signed short */
> + *
> + * The newly introduced data fork extent counter is a 64-bit field. However, the
> + * maximum number of extents in a file is limited to 2^54 extents (assuming one
> + * blocks per extent) by the 54-bit wide startoff field of an extent record.
> + *
> + * A further limitation applies as shown below,
> + * 2^63 (max file size) / 64k (max block size) = 2^47
> + *
> + * Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
> + * 2^48 was chosen as the maximum data fork extent count.
> + */
> +#define	MAXEXTLEN			((xfs_extlen_t)((1ULL << 21) - 1)) /* 21 bits */
> +#define XFS_MAX_EXTCNT_DATA_FORK	((xfs_extnum_t)((1ULL << 48) - 1)) /* Unsigned 48-bits */
> +#define XFS_MAX_EXTCNT_ATTR_FORK	((xfs_extnum_t)((1ULL << 32) - 1)) /* Unsigned 32-bits */
> +#define XFS_MAX_EXTCNT_DATA_FORK_OLD	((xfs_extnum_t)((1ULL << 31) - 1)) /* Signed 32-bits */
> +#define XFS_MAX_EXTCNT_ATTR_FORK_OLD	((xfs_extnum_t)((1ULL << 15) - 1)) /* Signed 16-bits */

These go way beyond 80 columns. You do not need the trailing comment
saying how many bits are supported - that's obvious from the numbers.
If you need to describe the actual supported limits, then do it
in the head comment:

/*
 * Max values for extent sizes and counts
 *
 * The original on-disk extent counts were held in signed fields,
 * resulting in maximum extent counts of 2^31 and 2^15 for the data
 * and attr forks respectively. Similarly the maximum extent length
 * is limited to 2^21 blocks by the 21-bit wide blockcount field of
 * a BMBT extent record.
 *
 * The newly introduced data fork extent counter can hold a 64-bit
 * value, however the maximum number of extents in a file is also
 * limited to 2^54 extents by the 54-bit wide startoff field of a BMBT
 * extent record.
 *
 * It is further limited by the maximum supported file size
 * of 2^63 *bytes*. This leads to a maximum extent count for maximally sized
 * filesystem blocks (64kB) of:
 *
 * 2^63 bytes / 2^16 bytes per block = 2^47 blocks
 *
 * Rounding up 47 to the nearest multiple of bits-per-byte
 * results in 48. Hence 2^48 was chosen as the maximum data fork
 * extent count.
 */
#define	MAXEXTLEN			((xfs_extlen_t)((1ULL << 21) - 1))
#define XFS_MAX_EXTCNT_DATA_FORK	((xfs_extnum_t)((1ULL << 48) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK	((xfs_extnum_t)((1ULL << 32) - 1))
#define XFS_MAX_EXTCNT_DATA_FORK_OLD	((xfs_extnum_t)((1ULL << 31) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_OLD	((xfs_extnum_t)((1ULL << 15) - 1))


Hmmm. On reading that back and looking at the code below, maybe the
names should be _LARGE and _SMALL, not (blank) and _OLD....
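
The arithmetic in that head comment can be checked with a few lines
of userspace C. The helper and macro names below are invented for the
sketch; only the numbers come from the comment:

```c
#include <stdint.h>

#define MAX_FILE_SIZE_LOG2	63	/* 2^63 bytes */
#define MAX_BLOCK_SIZE_LOG2	16	/* 64kB filesystem blocks */
#define BITS_PER_BYTE		8

/* log2 of the block count of a maximally sized file with 64kB blocks */
static unsigned int max_extents_log2(void)
{
	return MAX_FILE_SIZE_LOG2 - MAX_BLOCK_SIZE_LOG2;
}

/* Round a bit width up to a whole number of bytes. */
static unsigned int round_up_to_byte(unsigned int bits)
{
	return (bits + BITS_PER_BYTE - 1) / BITS_PER_BYTE * BITS_PER_BYTE;
}

/* Largest value representable in the given number of bits (bits < 64). */
static uint64_t max_count(unsigned int bits)
{
	return (1ULL << bits) - 1;
}
```

So max_extents_log2() gives 47, rounding that up gives 48, and
max_count() of 48, 32, 31 and 15 reproduces the four extent count
limits defined above.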

>  /*
>   * Inode minimum and maximum sizes.
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 860d32816909..34f360a38603 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -361,7 +361,8 @@ xfs_dinode_verify_fork(
>  			return __this_address;
>  		break;
>  	case XFS_DINODE_FMT_BTREE:
> -		max_extents = xfs_iext_max_nextents(whichfork);
> +		max_extents = xfs_iext_max_nextents(xfs_dinode_has_nrext64(dip),
> +					whichfork);

>  		if (di_nextents > max_extents)
>  			return __this_address;
>  		break;
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index ce690abe5dce..a3a3b54f9c55 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -746,7 +746,7 @@ xfs_iext_count_may_overflow(
>  	if (whichfork == XFS_COW_FORK)
>  		return 0;
>  
> -	max_exts = xfs_iext_max_nextents(whichfork);
> +	max_exts = xfs_iext_max_nextents(xfs_inode_has_nrext64(ip), whichfork);
>  
>  	if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
>  		max_exts = 10;
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> index 4a8b77d425df..e56803436c61 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.h
> +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> @@ -133,12 +133,23 @@ static inline int8_t xfs_ifork_format(struct xfs_ifork *ifp)
>  	return ifp->if_format;
>  }
>  
> -static inline xfs_extnum_t xfs_iext_max_nextents(int whichfork)
> +static inline xfs_extnum_t xfs_iext_max_nextents(bool has_nrext64,
							has_large_extent_counts
> +				int whichfork)
>  {
> -	if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK)
> -		return MAXEXTNUM;
> +	switch (whichfork) {
> +	case XFS_DATA_FORK:
> +	case XFS_COW_FORK:
> +		return has_nrext64 ? XFS_MAX_EXTCNT_DATA_FORK
> +			: XFS_MAX_EXTCNT_DATA_FORK_OLD;

		if (has_large_extent_counts)
			return XFS_MAX_EXTCNT_DATA_FORK_LARGE;
		return XFS_MAX_EXTCNT_DATA_FORK_SMALL;

That reads much better to me...
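
Put together, the helper might look something like the userspace
sketch below. The _LARGE/_SMALL names are the ones proposed in this
review rather than the ones in the patch, and the fork identifier
values, the attr fork case and the default case are assumptions,
since the quoted hunk stops before them:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t xfs_extnum_t;

/* Fork identifiers; the numeric values are assumptions for this sketch. */
#define XFS_DATA_FORK	0
#define XFS_ATTR_FORK	1
#define XFS_COW_FORK	2

#define XFS_MAX_EXTCNT_DATA_FORK_LARGE	((xfs_extnum_t)((1ULL << 48) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_LARGE	((xfs_extnum_t)((1ULL << 32) - 1))
#define XFS_MAX_EXTCNT_DATA_FORK_SMALL	((xfs_extnum_t)((1ULL << 31) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_SMALL	((xfs_extnum_t)((1ULL << 15) - 1))

static inline xfs_extnum_t
xfs_iext_max_nextents(bool has_large_extent_counts, int whichfork)
{
	switch (whichfork) {
	case XFS_DATA_FORK:
	case XFS_COW_FORK:
		if (has_large_extent_counts)
			return XFS_MAX_EXTCNT_DATA_FORK_LARGE;
		return XFS_MAX_EXTCNT_DATA_FORK_SMALL;
	case XFS_ATTR_FORK:
		if (has_large_extent_counts)
			return XFS_MAX_EXTCNT_ATTR_FORK_LARGE;
		return XFS_MAX_EXTCNT_ATTR_FORK_SMALL;
	default:
		/* Assumption: an unknown fork is allowed no extents. */
		return 0;
	}
}
```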

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH V7 12/17] xfs: Introduce per-inode 64-bit extent counters
  2022-03-01 10:39 ` [PATCH V7 12/17] xfs: Introduce per-inode 64-bit extent counters Chandan Babu R
@ 2022-03-04  7:14   ` Dave Chinner
  2022-03-05 12:44     ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  7:14 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong, Dave Chinner

On Tue, Mar 01, 2022 at 04:09:33PM +0530, Chandan Babu R wrote:
> This commit introduces new fields in the on-disk inode format to support
> 64-bit data fork extent counters and 32-bit attribute fork extent
> counters. The new fields will be used only when an inode has
> XFS_DIFLAG2_NREXT64 flag set. Otherwise we continue to use the regular 32-bit
> data fork extent counters and 16-bit attribute fork extent counters.
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> Suggested-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_format.h      | 33 ++++++++++++--
>  fs/xfs/libxfs/xfs_inode_buf.c   | 49 ++++++++++++++++++--
>  fs/xfs/libxfs/xfs_inode_fork.h  |  6 +++
>  fs/xfs/libxfs/xfs_log_format.h  | 33 ++++++++++++--
>  fs/xfs/xfs_inode_item.c         | 23 ++++++++--
>  fs/xfs/xfs_inode_item_recover.c | 79 ++++++++++++++++++++++++++++-----
>  6 files changed, 196 insertions(+), 27 deletions(-)

.....

> +static xfs_failaddr_t
> +xfs_dinode_verify_nrext64(
> +	struct xfs_mount	*mp,
> +	struct xfs_dinode	*dip)
> +{
> +	if (xfs_dinode_has_nrext64(dip)) {
> +		if (!xfs_has_nrext64(mp))
> +			return __this_address;
> +		if (dip->di_nrext64_pad != 0)
> +			return __this_address;
> +	} else if (dip->di_version >= 3) {
> +		if (dip->di_v3_pad != 0)
> +			return __this_address;
> +	}
> +
> +	return NULL;
> +}

Shouldn't this also check that di_v2_pad is zero if it's a v2 inode?

Also, this isn't verifying the actual extent count range. Maybe
that's done somewhere else now, and if so, shouldn't we move all the
extent count verification checks into a single function called,
say, xfs_dinode_verify_extent_counts()?

> @@ -348,21 +366,60 @@ xlog_recover_inode_commit_pass2(
>  			goto out_release;
>  		}
>  	}
> -	if (unlikely(ldip->di_nextents + ldip->di_anextents > ldip->di_nblocks)){
> -		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(5)",
> +
> +	if (xfs_log_dinode_has_nrext64(ldip)) {
> +		if (!xfs_has_nrext64(mp) || (ldip->di_nrext64_pad != 0)) {
> +			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(5)",

Can we have a meaningful error like "Bad log dinode large extent
count format" rather than something we have to go look up the source
code to understand when someone reports a problem?

> +				     XFS_ERRLEVEL_LOW, mp, ldip,
> +				     sizeof(*ldip));
> +			xfs_alert(mp,
> +				"%s: Bad inode log record, rec ptr "PTR_FMT", "
> +				"dino ptr "PTR_FMT", dino bp "PTR_FMT", "
> +				"ino %Ld, xfs_has_nrext64(mp) = %d, "
> +				"ldip->di_nrext64_pad = %u",

What's the point of printing pointers here? Just print the inode
number and the bad values - we log the pointers in the
log recovery tracepoints so there's no need to print them in
user facing errors because we can't do anything with them without a
debugger attached.

Hence we really only need to dump the inode number and the bad extent
format information - we already have the error context/location from
the corruption error report above. Hence all we need here is:

			xfs_alert(mp,
				"Bad inode 0x%llx, nrext64 %d, padding 0x%x",
				in_f->ilf_ino, xfs_has_nrext64(mp),
				ldip->di_nrext64_pad);

The other new alerts can be cleaned up like this, too.

> +				__func__, item, dip, bp, in_f->ilf_ino,
> +				xfs_has_nrext64(mp), ldip->di_nrext64_pad);
> +			error = -EFSCORRUPTED;
> +			goto out_release;
> +		}
> +	} else {
> +		if (ldip->di_version == 3 && ldip->di_big_nextents != 0) {
> +			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(6)",
> +				     XFS_ERRLEVEL_LOW, mp, ldip,
> +				     sizeof(*ldip));
> +			xfs_alert(mp,
> +				"%s: Bad inode log record, rec ptr "PTR_FMT", "
> +				"dino ptr "PTR_FMT", dino bp "PTR_FMT", "
> +				"ino %Ld, ldip->di_big_dextcnt = %llu",
> +				__func__, item, dip, bp, in_f->ilf_ino,
> +				ldip->di_big_nextents);
> +			error = -EFSCORRUPTED;
> +			goto out_release;
> +		}
> +	}
> +
> +	if (xfs_log_dinode_has_nrext64(ldip)) {
> +		nextents = ldip->di_big_nextents;
> +		anextents = ldip->di_big_anextents;
> +	} else {
> +		nextents = ldip->di_nextents;
> +		anextents = ldip->di_anextents;
> +	}

Also, this can be put in the above if statements, it does not need
a separate identical if clause.
> +
> +	if (unlikely(nextents + anextents > ldip->di_nblocks)) {
> +		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(7)",
>  				     XFS_ERRLEVEL_LOW, mp, ldip,
>  				     sizeof(*ldip));
>  		xfs_alert(mp,
>  	"%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
> -	"dino bp "PTR_FMT", ino %Ld, total extents = %d, nblocks = %Ld",
> +	"dino bp "PTR_FMT", ino %Ld, total extents = %llu, nblocks = %Ld",
>  			__func__, item, dip, bp, in_f->ilf_ino,
> -			ldip->di_nextents + ldip->di_anextents,
> -			ldip->di_nblocks);
> +			nextents + anextents, ldip->di_nblocks);
>  		error = -EFSCORRUPTED;
>  		goto out_release;
>  	}

Also, I think that xlog_recover_inode_commit_pass2() is already too
big without adding this new verification to it. Can we factor this
into a separate function (say xlog_dinode_verify_extent_counts())?
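
Something along these lines, as a userspace sketch only. The struct
is a hypothetical flattened subset of the log dinode (the real fields
overlap in the on-disk layout), has_nrext64 stands in for
xfs_log_dinode_has_nrext64(), -1 stands in for -EFSCORRUPTED, and the
alerts are left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the log dinode, reduced to the fields checked. */
struct log_dinode {
	uint8_t		di_version;
	bool		has_nrext64;
	uint64_t	di_big_nextents;
	uint32_t	di_big_anextents;
	uint32_t	di_nextents;
	uint16_t	di_anextents;
	uint16_t	di_nrext64_pad;
	uint64_t	di_nblocks;
};

/* Returns 0 if the extent counters are sane, -1 otherwise. */
static int
xlog_dinode_verify_extent_counts(bool fs_has_nrext64,
				 const struct log_dinode *ldip)
{
	uint64_t	nextents;
	uint64_t	anextents;

	/* Format check and counter selection share one if/else. */
	if (ldip->has_nrext64) {
		if (!fs_has_nrext64 || ldip->di_nrext64_pad != 0)
			return -1;
		nextents = ldip->di_big_nextents;
		anextents = ldip->di_big_anextents;
	} else {
		if (ldip->di_version == 3 && ldip->di_big_nextents != 0)
			return -1;
		nextents = ldip->di_nextents;
		anextents = ldip->di_anextents;
	}

	/* Same test as nextents + anextents > nblocks, minus the wraparound. */
	if (nextents > ldip->di_nblocks ||
	    anextents > ldip->di_nblocks - nextents)
		return -1;

	return 0;
}
```

The caller in xlog_recover_inode_commit_pass2() would then shrink to
a single call plus the usual error handling.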


>  	if (unlikely(ldip->di_forkoff > mp->m_sb.sb_inodesize)) {
> -		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(6)",
> +		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(8)",
>  				     XFS_ERRLEVEL_LOW, mp, ldip,
>  				     sizeof(*ldip));
>  		xfs_alert(mp,
> @@ -374,7 +431,7 @@ xlog_recover_inode_commit_pass2(
>  	}
>  	isize = xfs_log_dinode_size(mp);
>  	if (unlikely(item->ri_buf[1].i_len > isize)) {
> -		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(7)",
> +		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(9)",
>  				     XFS_ERRLEVEL_LOW, mp, ldip,
>  				     sizeof(*ldip));
>  		xfs_alert(mp,

And this is exactly why I don't like these numbered warnings. Make
the warning descriptive rather than numbered -
changing/adding/removing a warning shouldn't force us to change a
bunch of unrelated warnings...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH V7 13/17] xfs: xfs_growfs_rt_alloc: Unlock inode explicitly rather than through iop_committing()
  2022-03-01 10:39 ` [PATCH V7 13/17] xfs: xfs_growfs_rt_alloc: Unlock inode explicitly rather than through iop_committing() Chandan Babu R
  2022-03-02  0:26   ` Darrick J. Wong
@ 2022-03-04  7:25   ` Dave Chinner
  2022-03-05 12:44     ` Chandan Babu R
  1 sibling, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  7:25 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:34PM +0530, Chandan Babu R wrote:
> In order to be able to upgrade inodes to XFS_DIFLAG2_NREXT64, a future commit
> will perform such an upgrade in a transaction context. This requires the
> transaction to be rolled once. Hence inodes which have been added to the
> transaction (via xfs_trans_ijoin()) with non-zero value for lock_flags
> argument would cause the inode to be unlocked when the transaction is rolled.
> 
> To prevent this from happening in the case of realtime bitmap/summary inodes,
> this commit now unlocks the inode explicitly rather than through
> iop_committing() callback.
> 
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/xfs_rtalloc.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index b8c79ee791af..a70140b35e8b 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -780,6 +780,7 @@ xfs_growfs_rt_alloc(
>  	int			resblks;	/* space reservation */
>  	enum xfs_blft		buf_type;
>  	struct xfs_trans	*tp;
> +	bool			unlock_inode;
>  
>  	if (ip == mp->m_rsumip)
>  		buf_type = XFS_BLFT_RTSUMMARY_BUF;
> @@ -802,7 +803,8 @@ xfs_growfs_rt_alloc(
>  		 * Lock the inode.
>  		 */
>  		xfs_ilock(ip, XFS_ILOCK_EXCL);
> -		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> +		xfs_trans_ijoin(tp, ip, 0);
> +		unlock_inode = true;
>  
>  		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
>  				XFS_IEXT_ADD_NOSPLIT_CNT);
> @@ -823,8 +825,11 @@ xfs_growfs_rt_alloc(
>  		 * Free any blocks freed up in the transaction, then commit.
>  		 */
>  		error = xfs_trans_commit(tp);
> -		if (error)
> +                unlock_inode = false;
> +                xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +                if (error)
>  			return error;
> +

whitespace damage.

>  		/*
>  		 * Now we need to clear the allocated blocks.
>  		 * Do this one block per transaction, to keep it simple.
> @@ -874,6 +879,8 @@ xfs_growfs_rt_alloc(
>  
>  out_trans_cancel:
>  	xfs_trans_cancel(tp);
> +	if (unlock_inode)
> +		xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;

That's kinda messy, IMO. If you create a new error stack like:

out_trans_cancel:
	xfs_trans_cancel(tp);
	return error;

out_cancel_unlock:
	xfs_trans_cancel(tp);
	xfs_iunlock(ip, XFS_ILOCK_EXCL);
	return error;

Then you can get rid of the unlock_inode variable and just change
the if (error) goto ... jumps in the appropriate places where
unlock on cancel is needed. That seems much cleaner and easier to
verify.
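
For illustration, a minimal userspace sketch of the two-label unwind
described above. The mock_* types and helpers and the fail_* injection
parameters are hypothetical stand-ins, not XFS code; only the label
names are taken from the suggestion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the inode lock and the transaction. */
struct mock_inode { bool locked; };
struct mock_trans { bool cancelled; };

static void mock_ilock(struct mock_inode *ip)
{
	ip->locked = true;
}

static void mock_iunlock(struct mock_inode *ip)
{
	assert(ip->locked);	/* unlocking an unlocked inode is a bug */
	ip->locked = false;
}

static void mock_trans_cancel(struct mock_trans *tp)
{
	tp->cancelled = true;
}

/*
 * fail_before_lock and fail_after_lock inject an error at the two
 * points where the function can bail out, so each label is reachable.
 */
static int
grow_step(struct mock_inode *ip, struct mock_trans *tp,
	  int fail_before_lock, int fail_after_lock)
{
	int error;

	error = fail_before_lock;
	if (error)
		goto out_trans_cancel;

	mock_ilock(ip);

	error = fail_after_lock;
	if (error)
		goto out_cancel_unlock;

	/* Success: the "commit" happens here, then unlock explicitly. */
	mock_iunlock(ip);
	return 0;

out_trans_cancel:
	mock_trans_cancel(tp);
	return error;

out_cancel_unlock:
	mock_trans_cancel(tp);
	mock_iunlock(ip);
	return error;
}
```

Each jump site states which cleanup it needs, so no flag has to be
tracked across the function body.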

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH V7 14/17] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters
  2022-03-01 10:39 ` [PATCH V7 14/17] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters Chandan Babu R
@ 2022-03-04  7:51   ` Dave Chinner
  2022-03-05 12:45     ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  7:51 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:35PM +0530, Chandan Babu R wrote:
> This commit upgrades inodes to use 64-bit extent counters when they are read
> from disk. Inodes are upgraded only when the filesystem instance has
> XFS_SB_FEAT_INCOMPAT_NREXT64 incompat flag set.
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c       |  3 ++-
>  fs/xfs/libxfs/xfs_bmap.c       |  5 ++---
>  fs/xfs/libxfs/xfs_inode_fork.c | 37 ++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_inode_fork.h |  2 ++
>  fs/xfs/xfs_bmap_item.c         |  3 ++-
>  fs/xfs/xfs_bmap_util.c         | 10 ++++-----
>  fs/xfs/xfs_dquot.c             |  2 +-
>  fs/xfs/xfs_iomap.c             |  5 +++--
>  fs/xfs/xfs_reflink.c           |  5 +++--
>  fs/xfs/xfs_rtalloc.c           |  2 +-
>  10 files changed, 58 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 23523b802539..03a358930d74 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -774,7 +774,8 @@ xfs_attr_set(
>  		return error;
>  
>  	if (args->value || xfs_inode_hasattr(dp)) {
> -		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
> +		error = xfs_trans_inode_ensure_nextents(&args->trans, dp,
> +				XFS_ATTR_FORK,
>  				XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));

hmmmm.

>  		if (error)
>  			goto out_trans_cancel;
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index be7f8ebe3cd5..3a3c99ef7f13 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4523,14 +4523,13 @@ xfs_bmapi_convert_delalloc(
>  		return error;
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +	xfs_trans_ijoin(tp, ip, 0);
>  
> -	error = xfs_iext_count_may_overflow(ip, whichfork,
> +	error = xfs_trans_inode_ensure_nextents(&tp, ip, whichfork,
>  			XFS_IEXT_ADD_NOSPLIT_CNT);
>  	if (error)
>  		goto out_trans_cancel;
>  
> -	xfs_trans_ijoin(tp, ip, 0);
> -
>  	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) ||
>  	    bma.got.br_startoff > offset_fsb) {
>  		/*
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index a3a3b54f9c55..d1d065abeac3 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -757,3 +757,40 @@ xfs_iext_count_may_overflow(
>  
>  	return 0;
>  }
> +
> +/*
> + * Ensure that the inode has the ability to add the specified number of
> + * extents.  Caller must hold ILOCK_EXCL and have joined the inode to
> + * the transaction.  Upon return, the inode will still be in this state
> + * upon return and the transaction will be clean.
> + */
> +int
> +xfs_trans_inode_ensure_nextents(
> +	struct xfs_trans	**tpp,
> +	struct xfs_inode	*ip,
> +	int			whichfork,
> +	int			nr_to_add)

Ok, xfs_trans_inode* is a namespace that belongs to
fs/xfs/xfs_trans_inode.c, not fs/xfs/libxfs/xfs_inode_fork.c. So my
second observation is that the function needs to be either renamed or
moved.

My first observation was that the function name didn't really make
any sense to me when read in context. xfs_iext_count_may_overflow()
makes sense because it's telling me that it's checking that the
extent count hasn't overflowed. xfs_trans_inode_ensure_nextents()
conveys none of that certainty.

What does it ensure? "ensure" doesn't imply we are going to change
anything - it could just mean "check and abort if wrong" when read
as "ensure we haven't overflowed". And if we already have nrext64
and we've overflowed that then it will still fail, meaning we
haven't "ensured" anything.

This would make much more sense if written as:

	error = xfs_iext_count_may_overflow();
	if (error && error != -EOVERFLOW)
		goto out_trans_cancel;

	if (error == -EOVERFLOW) {
		error = xfs_inode_upgrade_extent_counts();
		if (error)
			goto out_trans_cancel;
	}

Because it splits the logic into a "do we need to do something"
part and a "do an explicit modification" part.
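
A self-contained userspace sketch of that split, for illustration only.
Everything prefixed mock_ or MOCK_ is hypothetical, the limits are
assumed to be 2^31 - 1 and 2^48 - 1, and -EOVERFLOW simply mirrors the
snippet above rather than any confirmed return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits, for illustration only. */
#define MOCK_MAX_EXTCNT_SMALL	((uint64_t)INT32_MAX)		/* 2^31 - 1 */
#define MOCK_MAX_EXTCNT_LARGE	((UINT64_C(1) << 48) - 1)	/* 2^48 - 1 */

struct mock_inode {
	uint64_t	nextents;
	bool		large_counts;		/* per-inode flag */
	bool		fs_supports_large;	/* per-fs feature bit */
};

/* "Do we need to do something?": a pure check with no side effects. */
static int
mock_iext_count_may_overflow(const struct mock_inode *ip, uint64_t nr_to_add)
{
	uint64_t max = ip->large_counts ? MOCK_MAX_EXTCNT_LARGE :
					  MOCK_MAX_EXTCNT_SMALL;

	if (nr_to_add > max || ip->nextents > max - nr_to_add)
		return -EOVERFLOW;
	return 0;
}

/* "Do an explicit modification": fails if no upgrade is possible. */
static int
mock_inode_upgrade_extent_counts(struct mock_inode *ip)
{
	if (!ip->fs_supports_large || ip->large_counts)
		return -EOVERFLOW;
	ip->large_counts = true;
	return 0;
}

/* Caller: the check and the upgrade are visibly separate steps. */
static int
mock_add_extents(struct mock_inode *ip, uint64_t nr_to_add)
{
	int error;

	error = mock_iext_count_may_overflow(ip, nr_to_add);
	if (error && error != -EOVERFLOW)
		return error;

	if (error == -EOVERFLOW) {
		error = mock_inode_upgrade_extent_counts(ip);
		if (error)
			return error;
	}

	ip->nextents += nr_to_add;
	return 0;
}
```

The sketch does not re-check the limit after the upgrade; it assumes
only a handful of extents are added per call.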


> +{
> +	int			error;
> +
> +	error = xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
> +	if (!error)
> +		return 0;
> +
> +	/*
> +	 * Try to upgrade if the extent count fields aren't large
> +	 * enough.
> +	 */
> +	if (!xfs_has_nrext64(ip->i_mount) ||
> +	    (ip->i_diflags2 & XFS_DIFLAG2_NREXT64))
> +		return error;

Oh, that's tricky, too. The first check returns if there's no error,
the second check returns the error of the first function. Keeping
the initial overflow check in the caller gets rid of this, too.

> +
> +	ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
> +	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
> +
> +	error = xfs_trans_roll(tpp);
> +	if (error)
> +		return error;

Why does this need to roll the transaction? We can just log the
inode core and return to the caller which will then commit the
change.

> +	return xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);

If the answer is so we don't cancel a dirty transaction here, then
I think this check needs to be more explicit - don't even try to do
the upgrade if the number of extents we are adding will cause an
overflow anyway.

As it is, wouldn't adding 2^47 - 2^31 extents in a single hit be
indicative of a bug? We can only modify the extent count by a
handful of extents (10, maybe 20?) at most in a single transaction,
so why do we even need this check?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
  2022-03-01 10:39 ` [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode " Chandan Babu R
  2022-03-02  0:31   ` Darrick J. Wong
@ 2022-03-04  8:09   ` Dave Chinner
  2022-03-05 12:45     ` Chandan Babu R
  1 sibling, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  8:09 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:36PM +0530, Chandan Babu R wrote:
> The following changes are made to enable userspace to obtain 64-bit extent
> counters,
> 1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
>    xfs_bulkstat->bs_pad[] to hold 64-bit extent counter.
> 2. Define the new flag XFS_BULK_IREQ_BULKSTAT for userspace to indicate that
>    it is capable of receiving 64-bit extent counters.
> 
> Suggested-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_fs.h | 20 ++++++++++++++++----
>  fs/xfs/xfs_ioctl.c     |  3 +++
>  fs/xfs/xfs_itable.c    | 30 ++++++++++++++++++++++++++++--
>  fs/xfs/xfs_itable.h    |  4 +++-
>  fs/xfs/xfs_iwalk.h     |  2 +-
>  5 files changed, 51 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 2204d49d0c3a..31ccbff2f16c 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -378,7 +378,7 @@ struct xfs_bulkstat {
>  	uint32_t	bs_extsize_blks; /* extent size hint, blocks	*/
>  
>  	uint32_t	bs_nlink;	/* number of links		*/
> -	uint32_t	bs_extents;	/* number of extents		*/
> +	uint32_t	bs_extents;	/* 32-bit data fork extent counter */
>  	uint32_t	bs_aextents;	/* attribute number of extents	*/
>  	uint16_t	bs_version;	/* structure version		*/
>  	uint16_t	bs_forkoff;	/* inode fork offset in bytes	*/
> @@ -387,8 +387,9 @@ struct xfs_bulkstat {
>  	uint16_t	bs_checked;	/* checked inode metadata	*/
>  	uint16_t	bs_mode;	/* type and mode		*/
>  	uint16_t	bs_pad2;	/* zeroed			*/
> +	uint64_t	bs_extents64;	/* 64-bit data fork extent counter */
>  
> -	uint64_t	bs_pad[7];	/* zeroed			*/
> +	uint64_t	bs_pad[6];	/* zeroed			*/
>  };
>  
>  #define XFS_BULKSTAT_VERSION_V1	(1)
> @@ -469,8 +470,19 @@ struct xfs_bulk_ireq {
>   */
>  #define XFS_BULK_IREQ_SPECIAL	(1 << 1)
>  
> -#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO | \
> -				 XFS_BULK_IREQ_SPECIAL)
> +/*
> + * Return data fork extent count via xfs_bulkstat->bs_extents64 field and assign
> + * 0 to xfs_bulkstat->bs_extents when the flag is set.  Otherwise, use
> + * xfs_bulkstat->bs_extents for returning data fork extent count and set
> + * xfs_bulkstat->bs_extents64 to 0. In the second case, return -EOVERFLOW and
> + * assign 0 to xfs_bulkstat->bs_extents if data fork extent count is larger than
> + * XFS_MAX_EXTCNT_DATA_FORK_OLD.
> + */
> +#define XFS_BULK_IREQ_NREXT64	(1 << 2)
> +
> +#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO |	 \
> +				 XFS_BULK_IREQ_SPECIAL | \
> +				 XFS_BULK_IREQ_NREXT64)
>  
>  /* Operate on the root directory inode. */
>  #define XFS_BULK_IREQ_SPECIAL_ROOT	(1)
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 2515fe8299e1..22947c5ffd34 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -813,6 +813,9 @@ xfs_bulk_ireq_setup(
>  	if (XFS_INO_TO_AGNO(mp, breq->startino) >= mp->m_sb.sb_agcount)
>  		return -ECANCELED;
>  
> +	if (hdr->flags & XFS_BULK_IREQ_NREXT64)
> +		breq->flags |= XFS_IBULK_NREXT64;
> +
>  	return 0;
>  }
>  
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index c08c79d9e311..0272a3c9d8b1 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -20,6 +20,7 @@
>  #include "xfs_icache.h"
>  #include "xfs_health.h"
>  #include "xfs_trans.h"
> +#include "xfs_errortag.h"
>  
>  /*
>   * Bulk Stat
> @@ -64,6 +65,7 @@ xfs_bulkstat_one_int(
>  	struct xfs_inode	*ip;		/* incore inode pointer */
>  	struct inode		*inode;
>  	struct xfs_bulkstat	*buf = bc->buf;
> +	xfs_extnum_t		nextents;
>  	int			error = -EINVAL;
>  
>  	if (xfs_internal_inum(mp, ino))
> @@ -102,7 +104,27 @@ xfs_bulkstat_one_int(
>  
>  	buf->bs_xflags = xfs_ip2xflags(ip);
>  	buf->bs_extsize_blks = ip->i_extsize;
> -	buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
> +
> +	nextents = xfs_ifork_nextents(&ip->i_df);
> +	if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
> +		xfs_extnum_t	max_nextents = XFS_MAX_EXTCNT_DATA_FORK_OLD;
> +
> +		if (unlikely(XFS_TEST_ERROR(false, mp,
> +				XFS_ERRTAG_REDUCE_MAX_IEXTENTS)))
> +			max_nextents = 10;
> +
> +		if (nextents > max_nextents) {
> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +			xfs_irele(ip);
> +			error = -EOVERFLOW;
> +			goto out;
> +		}

This just seems wrong. This will cause a total abort of the bulkstat
pass which will just be completely unexpected by any application
that does not know about 64 bit extent counts. Most of them likely
don't even care about the extent count in the data being returned.

Really, I think this should just set the extent count to the MAX
number and just continue onwards, otherwise existing applications
will not be able to bulkstat a filesystem with large extent counts
in it at all.

> @@ -256,6 +278,7 @@ xfs_bulkstat(
>  		.breq		= breq,
>  	};
>  	struct xfs_trans	*tp;
> +	unsigned int		iwalk_flags = 0;
>  	int			error;
>  
>  	if (breq->mnt_userns != &init_user_ns) {
> @@ -279,7 +302,10 @@ xfs_bulkstat(
>  	if (error)
>  		goto out;
>  
> -	error = xfs_iwalk(breq->mp, tp, breq->startino, breq->flags,
> +	if (breq->flags & XFS_IBULK_SAME_AG)
> +		iwalk_flags |= XFS_IWALK_SAME_AG;
> +
> +	error = xfs_iwalk(breq->mp, tp, breq->startino, iwalk_flags,
>  			xfs_bulkstat_iwalk, breq->icount, &bc);
>  	xfs_trans_cancel(tp);
>  out:

This looks like an unrelated bug fix and doesn't make any sense in
the context of the change being made in this patch.

> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
> index 7078d10c9b12..9223529cd7bd 100644
> --- a/fs/xfs/xfs_itable.h
> +++ b/fs/xfs/xfs_itable.h
> @@ -17,7 +17,9 @@ struct xfs_ibulk {
>  };
>  
>  /* Only iterate within the same AG as startino */
> -#define XFS_IBULK_SAME_AG	(XFS_IWALK_SAME_AG)
> +#define XFS_IBULK_SAME_AG	(1ULL << 0)
> +
> +#define XFS_IBULK_NREXT64	(1ULL << 1)

Why are these defined as ULL? AFAICT they are only ever stored in an
unsigned int.

>  
>  /*
>   * Advance the user buffer pointer by one record of the given size.  If the
> diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
> index 37a795f03267..3a68766fd909 100644
> --- a/fs/xfs/xfs_iwalk.h
> +++ b/fs/xfs/xfs_iwalk.h
> @@ -26,7 +26,7 @@ int xfs_iwalk_threaded(struct xfs_mount *mp, xfs_ino_t startino,
>  		unsigned int inode_records, bool poll, void *data);
>  
>  /* Only iterate inodes within the same AG as @startino. */
> -#define XFS_IWALK_SAME_AG	(0x1)
> +#define XFS_IWALK_SAME_AG	(1 << 0)

This also seems unrelated. If these flags need changing, can you
pull them out into a separate patch explaining what needs changing
and why, because I'm getting lost in the 3-layer-deep (or is
it 4?) iwalk/ibulk/ibulkreq flag munging that is all intertwined in
this patch....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH V7 17/17] xfs: Define max extent length based on on-disk format definition
  2022-03-01 10:39 ` [PATCH V7 17/17] xfs: Define max extent length based on on-disk format definition Chandan Babu R
@ 2022-03-04  8:15   ` Dave Chinner
  2022-03-05 12:45     ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-04  8:15 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Tue, Mar 01, 2022 at 04:09:38PM +0530, Chandan Babu R wrote:
> The maximum extent length depends on maximum block count that can be stored in
> a BMBT record. Hence this commit defines MAXEXTLEN based on
> BMBT_BLOCKCOUNT_BITLEN.
> 
> While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN.
> 
> Suggested-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>

Looks fine, but this should be up near the top of the series where
all the extent count definitions are being changed. Also, minor
formatting nit below.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

> @@ -299,7 +299,8 @@ xfs_calc_write_reservation(
>   *    the agf for each of the ags: 2 * sector size
>   *    the agfl for each of the ags: 2 * sector size
>   *    the super block to reflect the freed blocks: sector size
> - *    the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
> + *    the realtime bitmap: 2 exts * ((XFS_BMBT_MAX_EXTLEN / rtextsize) / NBBY)
> + *    bytes

Break the line at the ":"

 *    the realtime bitmap:
 *		2 exts * ((XFS_BMBT_MAX_EXTLEN / rtextsize) / NBBY) bytes

Which makes it consistent with the rest of the comment:

>   *    the realtime summary: 2 exts * 1 block
>   *    worst case split in allocation btrees per extent assuming 2 extents:
>   *		2 exts * 2 trees * (2 * max depth - 1) * block size

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH V7 04/17] xfs: Introduce xfs_dfork_nextents() helper
  2022-03-04  1:43   ` Dave Chinner
@ 2022-03-05 12:42     ` Chandan Babu R
  0 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-05 12:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong

On 04 Mar 2022 at 07:13, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:25PM +0530, Chandan Babu R wrote:
>> This commit replaces the macro XFS_DFORK_NEXTENTS() with the helper function
>> xfs_dfork_nextents(). As of this commit, xfs_dfork_nextents() returns the same
>> value as XFS_DFORK_NEXTENTS(). A future commit which extends inode's extent
>> counter fields will add more logic to this helper.
>> 
>> This commit also replaces direct accesses to xfs_dinode->di_[a]nextents
>> with calls to xfs_dfork_nextents().
>> 
>> No functional changes have been made.
>> 
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>>  fs/xfs/libxfs/xfs_format.h     |  4 ----
>>  fs/xfs/libxfs/xfs_inode_buf.c  | 16 +++++++++++-----
>>  fs/xfs/libxfs/xfs_inode_fork.c | 10 ++++++----
>>  fs/xfs/libxfs/xfs_inode_fork.h | 32 ++++++++++++++++++++++++++++++++
>>  fs/xfs/scrub/inode.c           | 18 ++++++++++--------
>>  5 files changed, 59 insertions(+), 21 deletions(-)
>
> Mostly good - a few consistency nits below.
>
>> 
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index d75e5b16da7e..e5654b578ec0 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -925,10 +925,6 @@ enum xfs_dinode_fmt {
>>  	((w) == XFS_DATA_FORK ? \
>>  		(dip)->di_format : \
>>  		(dip)->di_aformat)
>> -#define XFS_DFORK_NEXTENTS(dip,w) \
>> -	((w) == XFS_DATA_FORK ? \
>> -		be32_to_cpu((dip)->di_nextents) : \
>> -		be16_to_cpu((dip)->di_anextents))
>>  
>>  /*
>>   * For block and character special files the 32bit dev_t is stored at the
>> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
>> index 5c95a5428fc7..860d32816909 100644
>> --- a/fs/xfs/libxfs/xfs_inode_buf.c
>> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
>> @@ -336,9 +336,11 @@ xfs_dinode_verify_fork(
>>  	struct xfs_mount	*mp,
>>  	int			whichfork)
>>  {
>> -	xfs_extnum_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
>> +	xfs_extnum_t		di_nextents;
>>  	xfs_extnum_t		max_extents;
>>  
>> +	di_nextents = xfs_dfork_nextents(dip, whichfork);
>
> Why separate the declaration and init? We normally move the init
> up to the declaration, not demote it like this....
>

Having init on the same line as the declaration would cause the line to cross
80 columns. Hence, I had moved init to occur after all the declaration
statements.

>>  	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
>>  	case XFS_DINODE_FMT_LOCAL:
>>  		/*
>> @@ -405,6 +407,8 @@ xfs_dinode_verify(
>>  	uint16_t		flags;
>>  	uint64_t		flags2;
>>  	uint64_t		di_size;
>> +	xfs_extnum_t            nextents;
>> +	xfs_filblks_t		nblocks;
>>  
>>  	if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))
>>  		return __this_address;
>> @@ -435,10 +439,12 @@ xfs_dinode_verify(
>>  	if ((S_ISLNK(mode) || S_ISDIR(mode)) && di_size == 0)
>>  		return __this_address;
>>  
>> +	nextents = xfs_dfork_data_extents(dip);
>> +	nextents += xfs_dfork_attr_extents(dip);
>> +	nblocks = be64_to_cpu(dip->di_nblocks);
>> +
>>  	/* Fork checks carried over from xfs_iformat_fork */
>> -	if (mode &&
>> -	    be32_to_cpu(dip->di_nextents) + be16_to_cpu(dip->di_anextents) >
>> -			be64_to_cpu(dip->di_nblocks))
>> +	if (mode && nextents > nblocks)
>>  		return __this_address;
>
> The naextents count is needed later in this function. Rather than
> calculate it twice, I find the code reads a lot better if it is
> structured like this:
>
> 	nextents = xfs_dfork_data_extents(dip);
> 	naextents = xfs_dfork_attr_extents(dip);
> 	nblocks = be64_to_cpu(dip->di_nblocks);
>
> 	if (mode && nextents + naextents > nblocks)
> 		return __this_address;
> 	.....
>
>>  
>>  	if (mode && XFS_DFORK_BOFF(dip) > mp->m_sb.sb_inodesize)
>> @@ -495,7 +501,7 @@ xfs_dinode_verify(
>>  		default:
>>  			return __this_address;
>>  		}
>> -		if (dip->di_anextents)
>> +		if (xfs_dfork_attr_extents(dip))
>>  			return __this_address;
>>  	}
>
> And then just check naextents here, too?
>

Ok. I will apply this suggestion.

>> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
>> index a17c4d87520a..829739e249b6 100644
>> --- a/fs/xfs/libxfs/xfs_inode_fork.c
>> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
>> @@ -105,7 +105,7 @@ xfs_iformat_extents(
>>  	struct xfs_mount	*mp = ip->i_mount;
>>  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
>>  	int			state = xfs_bmap_fork_to_state(whichfork);
>> -	xfs_extnum_t		nex = XFS_DFORK_NEXTENTS(dip, whichfork);
>> +	xfs_extnum_t		nex = xfs_dfork_nextents(dip, whichfork);
>
> I'll point out declaration with init as I mentioned earlier...
>
>>  	int			size = nex * sizeof(xfs_bmbt_rec_t);
>>  	struct xfs_iext_cursor	icur;
>>  	struct xfs_bmbt_rec	*dp;
>> @@ -230,7 +230,7 @@ xfs_iformat_data_fork(
>>  	 * depend on it.
>>  	 */
>>  	ip->i_df.if_format = dip->di_format;
>> -	ip->i_df.if_nextents = be32_to_cpu(dip->di_nextents);
>> +	ip->i_df.if_nextents = xfs_dfork_data_extents(dip);
>>  
>>  	switch (inode->i_mode & S_IFMT) {
>>  	case S_IFIFO:
>> @@ -295,14 +295,16 @@ xfs_iformat_attr_fork(
>>  	struct xfs_inode	*ip,
>>  	struct xfs_dinode	*dip)
>>  {
>> +	xfs_extnum_t		naextents;
>>  	int			error = 0;
>>  
>> +	naextents = xfs_dfork_attr_extents(dip);
>> +
>
> .... and point it out again because otherwise this looks
> inconsistent.
>

Yes, this initialization should have been included as part of the declaration
since it won't violate the 80-column guideline.

>>  struct xfs_ifork *xfs_iext_state_to_fork(struct xfs_inode *ip, int state);
>> diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
>> index 87925761e174..edad5307e430 100644
>> --- a/fs/xfs/scrub/inode.c
>> +++ b/fs/xfs/scrub/inode.c
>> @@ -233,6 +233,7 @@ xchk_dinode(
>>  	unsigned long long	isize;
>>  	uint64_t		flags2;
>>  	xfs_extnum_t		nextents;
>> +	xfs_extnum_t		naextents;
>>  	prid_t			prid;
>>  	uint16_t		flags;
>>  	uint16_t		mode;
>> @@ -391,7 +392,7 @@ xchk_dinode(
>>  	xchk_inode_extsize(sc, dip, ino, mode, flags);
>>  
>>  	/* di_nextents */
>> -	nextents = be32_to_cpu(dip->di_nextents);
>> +	nextents = xfs_dfork_data_extents(dip);
>>  	fork_recs =  XFS_DFORK_DSIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
>>  	switch (dip->di_format) {
>>  	case XFS_DINODE_FMT_EXTENTS:
>> @@ -408,10 +409,12 @@ xchk_dinode(
>>  		break;
>>  	}
>>  
>> +	naextents = xfs_dfork_attr_extents(dip);
>
> Initialise the two extent counts in the same place - they are both
> first used only a handful of lines apart.
>

Ok.

-- 
chandan

* Re: [PATCH V7 06/17] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively
  2022-03-04  1:29   ` Dave Chinner
@ 2022-03-05 12:43     ` Chandan Babu R
  2022-03-07  4:55       ` Dave Chinner
  0 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-05 12:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong, kernel test robot

On 04 Mar 2022 at 06:59, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:27PM +0530, Chandan Babu R wrote:
>> A future commit will introduce a 64-bit on-disk data extent counter and a
>> 32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and
>> xfs_aextnum_t to 64 and 32-bits in order to correctly handle in-core versions
>> of these quantities.
>> 
>> Reported-by: kernel test robot <lkp@intel.com>
>
> What was reported by the test robot? This change isn't a bug that
> needed fixing, it's a core part of the patchset...
>

Kernel test robot had complained about the following,

  ld.lld: error: undefined symbol: __udivdi3
  >>> referenced by xfs_bmap.c
  >>>               xfs/libxfs/xfs_bmap.o:(xfs_bmap_compute_maxlevels) in archive fs/built-in.a

I had solved the linker error by replacing the division operation with the
following statement,

  maxblocks = howmany_64(maxleafents, minleafrecs);

Sorry, I will include this description in the commit message.
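
For context, __udivdi3 is the compiler runtime helper that a plain
64-bit '/' turns into on 32-bit targets, and the kernel does not link
against it. A userspace sketch of the round-up division follows;
mock_howmany_64() is a hypothetical stand-in, and as far as I recall
the in-kernel helper performs the division with do_div() rather than
the '/' used here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round-up division of a 64-bit count by a 32-bit divisor.
 * Assumes x + y - 1 does not wrap around.
 */
static uint64_t
mock_howmany_64(uint64_t x, uint32_t y)
{
	assert(y != 0);

	x += y - 1;
	return x / y;	/* userspace only: a kernel helper would avoid '/' */
}
```

With maxleafents now 64 bits wide, the old open-coded expression
(maxleafents + minleafrecs - 1) / minleafrecs becomes a 64-bit
division, which is what pulled in the missing symbol on 32-bit builds.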

>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>>  fs/xfs/libxfs/xfs_bmap.c       | 6 +++---
>>  fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
>>  fs/xfs/libxfs/xfs_inode_fork.h | 2 +-
>>  fs/xfs/libxfs/xfs_types.h      | 4 ++--
>>  fs/xfs/xfs_inode.c             | 4 ++--
>>  fs/xfs/xfs_trace.h             | 2 +-
>>  6 files changed, 10 insertions(+), 10 deletions(-)
>> 
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index 98541be873d8..9df98339a43a 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -52,9 +52,9 @@ xfs_bmap_compute_maxlevels(
>>  	xfs_mount_t	*mp,		/* file system mount structure */
>>  	int		whichfork)	/* data or attr fork */
>>  {
>> +	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
>>  	int		level;		/* btree level */
>>  	uint		maxblocks;	/* max blocks at this level */
>> -	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
>>  	int		maxrootrecs;	/* max records in root block */
>>  	int		minleafrecs;	/* min records in leaf block */
>>  	int		minnoderecs;	/* min records in node block */
>
> Unnecessary.
>

I agree. I will revert the above change.

>> @@ -83,7 +83,7 @@ xfs_bmap_compute_maxlevels(
>>  	maxrootrecs = xfs_bmdr_maxrecs(sz, 0);
>>  	minleafrecs = mp->m_bmap_dmnr[0];
>>  	minnoderecs = mp->m_bmap_dmnr[1];
>> -	maxblocks = (maxleafents + minleafrecs - 1) / minleafrecs;
>> +	maxblocks = howmany_64(maxleafents, minleafrecs);
>>  	for (level = 1; maxblocks > 1; level++) {
>>  		if (maxblocks <= maxrootrecs)
>>  			maxblocks = 1;
>> @@ -467,7 +467,7 @@ xfs_bmap_check_leaf_extents(
>>  	if (bp_release)
>>  		xfs_trans_brelse(NULL, bp);
>>  error_norelse:
>> -	xfs_warn(mp, "%s: BAD after btree leaves for %d extents",
>> +	xfs_warn(mp, "%s: BAD after btree leaves for %llu extents",
>>  		__func__, i);
>>  	xfs_err(mp, "%s: CORRUPTED BTREE OR SOMETHING", __func__);
>>  	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
>> index 829739e249b6..ce690abe5dce 100644
>> --- a/fs/xfs/libxfs/xfs_inode_fork.c
>> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
>> @@ -117,7 +117,7 @@ xfs_iformat_extents(
>>  	 * we just bail out rather than crash in kmem_alloc() or memcpy() below.
>>  	 */
>>  	if (unlikely(size < 0 || size > XFS_DFORK_SIZE(dip, mp, whichfork))) {
>> -		xfs_warn(ip->i_mount, "corrupt inode %Lu ((a)extents = %d).",
>> +		xfs_warn(ip->i_mount, "corrupt inode %llu ((a)extents = %llu).",
>>  			(unsigned long long) ip->i_ino, nex);
>
> Isn't ip->i_ino explicitly defined as an unsigned long long? If you are going
> to fix one part of the printk formatting for ip->i_ino, you should
> probably get rid of the unnecessary cast, too.

Yes, xfs_ino_t is an alias for "unsigned long long". I will remove the
typecast.

>
> Otherwise looks OK.

-- 
chandan

* Re: [PATCH V7 07/17] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit
  2022-03-04  1:57   ` Dave Chinner
@ 2022-03-05 12:43     ` Chandan Babu R
  0 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-05 12:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong

On 04 Mar 2022 at 07:27, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:28PM +0530, Chandan Babu R wrote:
>> XFS_SB_FEAT_INCOMPAT_NREXT64 incompat feature bit will be set on filesystems
>> which support large per-inode extent counters. This commit defines the new
>> incompat feature bit and the corresponding per-fs feature bit (along with
>> inline functions to work on it).
>> 
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>>  fs/xfs/libxfs/xfs_format.h | 1 +
>>  fs/xfs/libxfs/xfs_sb.c     | 3 +++
>>  fs/xfs/xfs_mount.h         | 2 ++
>>  3 files changed, 6 insertions(+)
>> 
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index e5654b578ec0..7972cbc22608 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -372,6 +372,7 @@ xfs_sb_has_ro_compat_feature(
>>  #define XFS_SB_FEAT_INCOMPAT_META_UUID	(1 << 2)	/* metadata UUID */
>>  #define XFS_SB_FEAT_INCOMPAT_BIGTIME	(1 << 3)	/* large timestamps */
>>  #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4)	/* needs xfs_repair */
>> +#define XFS_SB_FEAT_INCOMPAT_NREXT64	(1 << 5)	/* 64-bit data fork extent counter */
>>  #define XFS_SB_FEAT_INCOMPAT_ALL \
>>  		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
>>  		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
>> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
>> index f4e84aa1d50a..bd632389ae92 100644
>> --- a/fs/xfs/libxfs/xfs_sb.c
>> +++ b/fs/xfs/libxfs/xfs_sb.c
>> @@ -124,6 +124,9 @@ xfs_sb_version_to_features(
>>  		features |= XFS_FEAT_BIGTIME;
>>  	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR)
>>  		features |= XFS_FEAT_NEEDSREPAIR;
>> +	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NREXT64)
>> +		features |= XFS_FEAT_NREXT64;
>> +
>>  	return features;
>>  }
>>  
>> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
>> index 00720a02e761..10941481f7e6 100644
>> --- a/fs/xfs/xfs_mount.h
>> +++ b/fs/xfs/xfs_mount.h
>> @@ -276,6 +276,7 @@ typedef struct xfs_mount {
>>  #define XFS_FEAT_INOBTCNT	(1ULL << 23)	/* inobt block counts */
>>  #define XFS_FEAT_BIGTIME	(1ULL << 24)	/* large timestamps */
>>  #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
>> +#define XFS_FEAT_NREXT64	(1ULL << 26)	/* 64-bit inode extent counters */
>>  
>>  /* Mount features */
>>  #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
>> @@ -338,6 +339,7 @@ __XFS_HAS_FEAT(realtime, REALTIME)
>>  __XFS_HAS_FEAT(inobtcounts, INOBTCNT)
>>  __XFS_HAS_FEAT(bigtime, BIGTIME)
>>  __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
>> +__XFS_HAS_FEAT(nrext64, NREXT64)
>
> Not a big fan of "nrext64" naming.
>
> I'd really like the feature macro to be human readable such as:
>
> __XFS_HAS_FEAT(large_extent_counts, NREXT64)
>
> So that it reads like this:
>
> 	if (xfs_has_large_extent_counts(mp)) {
> 		.....
> 	}
>
> because then the code is much easier to read and is largely self
> documenting. In this case, I don't really care about the flag names
> (they can remain NREXT64) because they are only seen deep down in
> the code.  But for (potentially complex) conditional logic, the
> clarity of human readable names makes a big difference.
>

Ok. I will rename the feature.

-- 
chandan

* Re: [PATCH V7 10/17] xfs: Use xfs_rfsblock_t to count maximum blocks that can be used by BMBT
  2022-03-04  2:09   ` Dave Chinner
@ 2022-03-05 12:44     ` Chandan Babu R
  0 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-05 12:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong, kernel test robot

On 04 Mar 2022 at 07:39, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:31PM +0530, Chandan Babu R wrote:
>> Reported-by: kernel test robot <lkp@intel.com>
>
> What was reported by the robot? I don't quite see the relevance of
> this change to the overall patchset just from the change being made.
>

Kernel test robot had complained about the following,

  microblaze-linux-ld: fs/xfs/libxfs/xfs_bmap.o: in function `xfs_bmap_compute_maxlevels':
  (.text+0x10cbc): undefined reference to `__udivdi3'
  >> microblaze-linux-ld: (.text+0x10dc0): undefined reference to `__udivdi3'

I had solved the linker error by replacing the division operation with the
following statement,

  maxblocks = howmany_64(maxblocks, minnoderecs);

Sorry, I will include this description in the commit message.

>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>>  fs/xfs/libxfs/xfs_bmap.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index 9df98339a43a..a01d9a9225ae 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -53,8 +53,8 @@ xfs_bmap_compute_maxlevels(
>>  	int		whichfork)	/* data or attr fork */
>>  {
>>  	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
>> +	xfs_rfsblock_t	maxblocks;	/* max blocks at this level */
>
> typedef uint64_t        xfs_rfsblock_t; /* blockno in filesystem (raw) */
>
> Usage of the type doesn't seem to match its definition. This
> function is calculating a block count, not a block number. If you
> must use a xfs type, then:
>
> typedef uint64_t        xfs_filblks_t;  /* number of blocks in a file */
>
> is a better match, but I think this should just use uint64_t because
> the count has nothing to do with block addresses or files.
>

True. I will replace xfs_rfsblock_t with uint64_t.

-- 
chandan


* Re: [PATCH V7 11/17] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
  2022-03-04  2:32   ` Dave Chinner
@ 2022-03-05 12:44     ` Chandan Babu R
  0 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-05 12:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong

On 04 Mar 2022 at 08:02, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:32PM +0530, Chandan Babu R wrote:
>> This commit defines new macros to represent maximum extent counts allowed by
>> filesystems which have support for large per-inode extent counters.
>> 
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>>  fs/xfs/libxfs/xfs_bmap.c       |  8 +++-----
>>  fs/xfs/libxfs/xfs_bmap_btree.c |  2 +-
>>  fs/xfs/libxfs/xfs_format.h     | 20 ++++++++++++++++----
>>  fs/xfs/libxfs/xfs_inode_buf.c  |  3 ++-
>>  fs/xfs/libxfs/xfs_inode_fork.c |  2 +-
>>  fs/xfs/libxfs/xfs_inode_fork.h | 19 +++++++++++++++----
>>  6 files changed, 38 insertions(+), 16 deletions(-)
>> 
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index a01d9a9225ae..be7f8ebe3cd5 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -61,10 +61,8 @@ xfs_bmap_compute_maxlevels(
>>  	int		sz;		/* root block size */
>>  
>>  	/*
>> -	 * The maximum number of extents in a file, hence the maximum number of
>> -	 * leaf entries, is controlled by the size of the on-disk extent count,
>> -	 * either a signed 32-bit number for the data fork, or a signed 16-bit
>> -	 * number for the attr fork.
>> +	 * The maximum number of extents in a fork, hence the maximum number of
>> +	 * leaf entries, is controlled by the size of the on-disk extent count.
>>  	 *
>>  	 * Note that we can no longer assume that if we are in ATTR1 that the
>>  	 * fork offset of all the inodes will be
>> @@ -74,7 +72,7 @@ xfs_bmap_compute_maxlevels(
>>  	 * ATTR2 we have to assume the worst case scenario of a minimum size
>>  	 * available.
>>  	 */
>> -	maxleafents = xfs_iext_max_nextents(whichfork);
>> +	maxleafents = xfs_iext_max_nextents(xfs_has_nrext64(mp), whichfork);
>>  	if (whichfork == XFS_DATA_FORK)
>>  		sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
>>  	else
>> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
>> index 453309fc85f2..e8d21d69b9ff 100644
>> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
>> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
>> @@ -611,7 +611,7 @@ xfs_bmbt_maxlevels_ondisk(void)
>>  	minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
>>  
>>  	/* One extra level for the inode root. */
>> -	return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
>> +	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_EXTCNT_DATA_FORK) + 1;
>>  }
>>  
>>  /*
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index 9934c320bf01..d3dfd45c39e0 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -872,10 +872,22 @@ enum xfs_dinode_fmt {
>>  
>>  /*
>>   * Max values for extlen, extnum, aextnum.
>> - */
>> -#define	MAXEXTLEN	((xfs_extlen_t)0x001fffff)	/* 21 bits */
>> -#define	MAXEXTNUM	((xfs_extnum_t)0x7fffffff)	/* signed int */
>> -#define	MAXAEXTNUM	((xfs_aextnum_t)0x7fff)		/* signed short */
>> + *
>> + * The newly introduced data fork extent counter is a 64-bit field. However, the
>> + * maximum number of extents in a file is limited to 2^54 extents (assuming one
>> + * block per extent) by the 54-bit wide startoff field of an extent record.
>> + *
>> + * A further limitation applies as shown below,
>> + * 2^63 (max file size) / 64k (max block size) = 2^47
>> + *
>> + * Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
>> + * 2^48 was chosen as the maximum data fork extent count.
>> + */
>> +#define	MAXEXTLEN			((xfs_extlen_t)((1ULL << 21) - 1)) /* 21 bits */
>> +#define XFS_MAX_EXTCNT_DATA_FORK	((xfs_extnum_t)((1ULL << 48) - 1)) /* Unsigned 48-bits */
>> +#define XFS_MAX_EXTCNT_ATTR_FORK	((xfs_extnum_t)((1ULL << 32) - 1)) /* Unsigned 32-bits */
>> +#define XFS_MAX_EXTCNT_DATA_FORK_OLD	((xfs_extnum_t)((1ULL << 31) - 1)) /* Signed 32-bits */
>> +#define XFS_MAX_EXTCNT_ATTR_FORK_OLD	((xfs_extnum_t)((1ULL << 15) - 1)) /* Signed 16-bits */
>
> These go way beyond 80 columns. You do not need the trailing comment
> saying how many bits are supported - that's obvious from numbers.
> If you need to describe the actual supported limits, then do it
> in the head comment:
>
> /*
>  * Max values for extent sizes and counts
>  *
>  * The original on-disk extent counts were held in signed fields,
>  * resulting in maximum extent counts of 2^31 and 2^15 for the data
>  * and attr forks respectively. Similarly the maximum extent length
>  * is limited to 2^21 blocks by the 21-bit wide blockcount field of
>  * a BMBT extent record.
>  *
>  * The newly introduced data fork extent counter can hold a 64-bit
>  * value, however the maximum number of extents in a file is also
>  * limited to 2^54 extents by the 54-bit wide startoff field of a BMBT
>  * extent record.
>  *
>  * It is further limited by the maximum supported file size
>  * of 2^63 *bytes*. This leads to a maximum extent count for maximally sized
>  * filesystem blocks (64kB) of:
>  *
>  * 2^63 bytes / 2^16 bytes per block = 2^47 blocks
>  *
>  * Rounding up 47 to the nearest multiple of bits-per-byte
>  * results in 48. Hence 2^48 was chosen as the maximum data fork
>  * extent count.
>  */
> #define	MAXEXTLEN			((xfs_extlen_t)((1ULL << 21) - 1))
> #define XFS_MAX_EXTCNT_DATA_FORK	((xfs_extnum_t)((1ULL << 48) - 1))
> #define XFS_MAX_EXTCNT_ATTR_FORK	((xfs_extnum_t)((1ULL << 32) - 1))
> #define XFS_MAX_EXTCNT_DATA_FORK_OLD	((xfs_extnum_t)((1ULL << 31) - 1))
> #define XFS_MAX_EXTCNT_ATTR_FORK_OLD	((xfs_extnum_t)((1ULL << 15) - 1))
>

Ok. I will make the change suggested above.

>
> Hmmm. On reading that back and looking at the code below, maybe the
> names should be _LARGE and _SMALL, not (blank) and _OLD....
>

Ok. I will make this change.
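
As a sanity check of the arithmetic in the proposed head comment, the limits can be written down with compile-time assertions. This is only a sketch: the DEMO_* names and demo_extnum_t are illustrative assumptions, not the final macro names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_extnum_t;	/* hypothetical stand-in for xfs_extnum_t */

#define DEMO_MAX_EXTCNT_DATA_FORK_LARGE	((demo_extnum_t)((1ULL << 48) - 1))
#define DEMO_MAX_EXTCNT_ATTR_FORK_LARGE	((demo_extnum_t)((1ULL << 32) - 1))
#define DEMO_MAX_EXTCNT_DATA_FORK_SMALL	((demo_extnum_t)((1ULL << 31) - 1))
#define DEMO_MAX_EXTCNT_ATTR_FORK_SMALL	((demo_extnum_t)((1ULL << 15) - 1))

#define DEMO_MAX_FILE_SIZE_LOG2		63	/* 2^63 bytes */
#define DEMO_MAX_BLOCK_SIZE_LOG2	16	/* 64kB filesystem blocks */

/* 2^63 bytes / 2^16 bytes per block = 2^47 blocks */
#define DEMO_MAX_BLOCKS_LOG2 \
	(DEMO_MAX_FILE_SIZE_LOG2 - DEMO_MAX_BLOCK_SIZE_LOG2)

/* Round a bit count up to the nearest multiple of bits-per-byte. */
#define DEMO_ROUND_UP_BITS(bits)	((((bits) + 7) / 8) * 8)

static_assert(DEMO_MAX_BLOCKS_LOG2 == 47, "2^63 / 2^16 is 2^47");
static_assert(DEMO_ROUND_UP_BITS(DEMO_MAX_BLOCKS_LOG2) == 48,
	      "47 bits round up to 48");

/* The small limits are the maxima of the old signed on-disk fields. */
static_assert(DEMO_MAX_EXTCNT_DATA_FORK_SMALL == INT32_MAX, "signed 32-bit");
static_assert(DEMO_MAX_EXTCNT_ATTR_FORK_SMALL == INT16_MAX, "signed 16-bit");
static_assert(DEMO_MAX_EXTCNT_ATTR_FORK_LARGE == UINT32_MAX, "unsigned 32-bit");
```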

>>  /*
>>   * Inode minimum and maximum sizes.
>> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
>> index 860d32816909..34f360a38603 100644
>> --- a/fs/xfs/libxfs/xfs_inode_buf.c
>> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
>> @@ -361,7 +361,8 @@ xfs_dinode_verify_fork(
>>  			return __this_address;
>>  		break;
>>  	case XFS_DINODE_FMT_BTREE:
>> -		max_extents = xfs_iext_max_nextents(whichfork);
>> +		max_extents = xfs_iext_max_nextents(xfs_dinode_has_nrext64(dip),
>> +					whichfork);
>
>>  		if (di_nextents > max_extents)
>>  			return __this_address;
>>  		break;
>> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
>> index ce690abe5dce..a3a3b54f9c55 100644
>> --- a/fs/xfs/libxfs/xfs_inode_fork.c
>> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
>> @@ -746,7 +746,7 @@ xfs_iext_count_may_overflow(
>>  	if (whichfork == XFS_COW_FORK)
>>  		return 0;
>>  
>> -	max_exts = xfs_iext_max_nextents(whichfork);
>> +	max_exts = xfs_iext_max_nextents(xfs_inode_has_nrext64(ip), whichfork);
>>  
>>  	if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
>>  		max_exts = 10;
>> diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
>> index 4a8b77d425df..e56803436c61 100644
>> --- a/fs/xfs/libxfs/xfs_inode_fork.h
>> +++ b/fs/xfs/libxfs/xfs_inode_fork.h
>> @@ -133,12 +133,23 @@ static inline int8_t xfs_ifork_format(struct xfs_ifork *ifp)
>>  	return ifp->if_format;
>>  }
>>  
>> -static inline xfs_extnum_t xfs_iext_max_nextents(int whichfork)
>> +static inline xfs_extnum_t xfs_iext_max_nextents(bool has_nrext64,
> 							has_large_extent_counts
>> +				int whichfork)
>>  {
>> -	if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK)
>> -		return MAXEXTNUM;
>> +	switch (whichfork) {
>> +	case XFS_DATA_FORK:
>> +	case XFS_COW_FORK:
>> +		return has_nrext64 ? XFS_MAX_EXTCNT_DATA_FORK
>> +			: XFS_MAX_EXTCNT_DATA_FORK_OLD;
>
> 		if (has_large_extent_counts)
> 			return XFS_MAX_EXTCNT_DATA_FORK_LARGE;
> 		return XFS_MAX_EXTCNT_DATA_FORK_SMALL;
>
> That reads much better to me...
>

Ok. 
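
A self-contained sketch of how the helper might read with the suggested naming. The demo_* identifiers, the enum and the constants are assumptions for illustration; the attr fork branch is written by analogy with the data fork branch quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_extnum_t;	/* hypothetical stand-in for xfs_extnum_t */

enum demo_fork { DEMO_DATA_FORK, DEMO_ATTR_FORK, DEMO_COW_FORK };

#define DEMO_MAX_EXTCNT_DATA_FORK_LARGE	((demo_extnum_t)((1ULL << 48) - 1))
#define DEMO_MAX_EXTCNT_ATTR_FORK_LARGE	((demo_extnum_t)((1ULL << 32) - 1))
#define DEMO_MAX_EXTCNT_DATA_FORK_SMALL	((demo_extnum_t)((1ULL << 31) - 1))
#define DEMO_MAX_EXTCNT_ATTR_FORK_SMALL	((demo_extnum_t)((1ULL << 15) - 1))

/* Return the maximum extent count for a fork, given the feature state. */
static demo_extnum_t demo_iext_max_nextents(bool has_large_extent_counts,
					    enum demo_fork whichfork)
{
	switch (whichfork) {
	case DEMO_DATA_FORK:
	case DEMO_COW_FORK:
		if (has_large_extent_counts)
			return DEMO_MAX_EXTCNT_DATA_FORK_LARGE;
		return DEMO_MAX_EXTCNT_DATA_FORK_SMALL;

	case DEMO_ATTR_FORK:
		if (has_large_extent_counts)
			return DEMO_MAX_EXTCNT_ATTR_FORK_LARGE;
		return DEMO_MAX_EXTCNT_ATTR_FORK_SMALL;
	}

	/* Not reached for valid fork values. */
	assert(0 && "unknown fork");
	return 0;
}
```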

-- 
chandan


* Re: [PATCH V7 12/17] xfs: Introduce per-inode 64-bit extent counters
  2022-03-04  7:14   ` Dave Chinner
@ 2022-03-05 12:44     ` Chandan Babu R
  0 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-05 12:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong, Dave Chinner

On 04 Mar 2022 at 12:44, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:33PM +0530, Chandan Babu R wrote:
>> This commit introduces new fields in the on-disk inode format to support
>> 64-bit data fork extent counters and 32-bit attribute fork extent
>> counters. The new fields will be used only when an inode has
>> XFS_DIFLAG2_NREXT64 flag set. Otherwise we continue to use the regular 32-bit
>> data fork extent counters and 16-bit attribute fork extent counters.
>> 
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> Suggested-by: Dave Chinner <dchinner@redhat.com>
>> ---
>>  fs/xfs/libxfs/xfs_format.h      | 33 ++++++++++++--
>>  fs/xfs/libxfs/xfs_inode_buf.c   | 49 ++++++++++++++++++--
>>  fs/xfs/libxfs/xfs_inode_fork.h  |  6 +++
>>  fs/xfs/libxfs/xfs_log_format.h  | 33 ++++++++++++--
>>  fs/xfs/xfs_inode_item.c         | 23 ++++++++--
>>  fs/xfs/xfs_inode_item_recover.c | 79 ++++++++++++++++++++++++++++-----
>>  6 files changed, 196 insertions(+), 27 deletions(-)
>
> .....
>
>> +static xfs_failaddr_t
>> +xfs_dinode_verify_nrext64(
>> +	struct xfs_mount	*mp,
>> +	struct xfs_dinode	*dip)
>> +{
>> +	if (xfs_dinode_has_nrext64(dip)) {
>> +		if (!xfs_has_nrext64(mp))
>> +			return __this_address;
>> +		if (dip->di_nrext64_pad != 0)
>> +			return __this_address;
>> +	} else if (dip->di_version >= 3) {
>> +		if (dip->di_v3_pad != 0)
>> +			return __this_address;
>> +	}
>> +
>> +	return NULL;
>> +}
>
> Shouldn't this also check that di_v2_pad is zero if it's a v2 inode?
>

xfs_dinode_verify_nrext64() is meant for checking only those parts of an inode
that are influenced by the "large extent counters" feature. Hence, I don't think
we should check the di_v2_pad field in this function.

> Also, this isn't verifying the actual extent count range. Maybe
> that's done somewhere else now, and if so, shouldn't we move all the
> extent count verification checks into a single function called,
> say, xfs_dinode_verify_extent_counts()?
>

Extent count validation is already performed by xfs_dinode_verify_fork(). I
think it continues to be the right place for it, since extent counts are
per-fork attributes.

>> @@ -348,21 +366,60 @@ xlog_recover_inode_commit_pass2(
>>  			goto out_release;
>>  		}
>>  	}
>> -	if (unlikely(ldip->di_nextents + ldip->di_anextents > ldip->di_nblocks)){
>> -		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(5)",
>> +
>> +	if (xfs_log_dinode_has_nrext64(ldip)) {
>> +		if (!xfs_has_nrext64(mp) || (ldip->di_nrext64_pad != 0)) {
>> +			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(5)",
>
> Can we have a meaningful error like "Bad log dinode large extent
> count format" rather than something we have to go look up the source
> code to understand when someone reports a problem?
>

Ok. I will change the error message as suggested above.

>> +				     XFS_ERRLEVEL_LOW, mp, ldip,
>> +				     sizeof(*ldip));
>> +			xfs_alert(mp,
>> +				"%s: Bad inode log record, rec ptr "PTR_FMT", "
>> +				"dino ptr "PTR_FMT", dino bp "PTR_FMT", "
>> +				"ino %Ld, xfs_has_nrext64(mp) = %d, "
>> +				"ldip->di_nrext64_pad = %u",
>
> What's the point of printing pointers here? Just print the inode
> number and the bad values - we log the pointers in the
> log recovery tracepoints so there's no need to print them in
> user facing errors because we can't do anything with them without a
> debugger attached.
>
> Hence we really only need to dump the inode number and the bad extent
> format information - we already have the error context/location from
> the corruption error report above. Hence all we need here is:
>
> 			xfs_alert(mp,
> 				"Bad inode 0x%llx, nrext64 %d, padding 0x%x",
> 				in_f->ilf_ino, xfs_has_nrext64(mp),
> 				ldip->di_nrext64_pad);
>
> The other new alerts can be cleaned up like this, too.
>

Ok. I will clean it up.

>> +				__func__, item, dip, bp, in_f->ilf_ino,
>> +				xfs_has_nrext64(mp), ldip->di_nrext64_pad);
>> +			error = -EFSCORRUPTED;
>> +			goto out_release;
>> +		}
>> +	} else {
>> +		if (ldip->di_version == 3 && ldip->di_big_nextents != 0) {
>> +			XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(6)",
>> +				     XFS_ERRLEVEL_LOW, mp, ldip,
>> +				     sizeof(*ldip));
>> +			xfs_alert(mp,
>> +				"%s: Bad inode log record, rec ptr "PTR_FMT", "
>> +				"dino ptr "PTR_FMT", dino bp "PTR_FMT", "
>> +				"ino %Ld, ldip->di_big_dextcnt = %llu",
>> +				__func__, item, dip, bp, in_f->ilf_ino,
>> +				ldip->di_big_nextents);
>> +			error = -EFSCORRUPTED;
>> +			goto out_release;
>> +		}
>> +	}
>> +
>> +	if (xfs_log_dinode_has_nrext64(ldip)) {
>> +		nextents = ldip->di_big_nextents;
>> +		anextents = ldip->di_big_anextents;
>> +	} else {
>> +		nextents = ldip->di_nextents;
>> +		anextents = ldip->di_anextents;
>> +	}
>
> Also, this can be put in the above if statements, it does not need
> a separate identical if clause.

I agree. I will move these assignments to the previous if/else statement.

>> +
>> +	if (unlikely(nextents + anextents > ldip->di_nblocks)) {
>> +		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(7)",
>>  				     XFS_ERRLEVEL_LOW, mp, ldip,
>>  				     sizeof(*ldip));
>>  		xfs_alert(mp,
>>  	"%s: Bad inode log record, rec ptr "PTR_FMT", dino ptr "PTR_FMT", "
>> -	"dino bp "PTR_FMT", ino %Ld, total extents = %d, nblocks = %Ld",
>> +	"dino bp "PTR_FMT", ino %Ld, total extents = %llu, nblocks = %Ld",
>>  			__func__, item, dip, bp, in_f->ilf_ino,
>> -			ldip->di_nextents + ldip->di_anextents,
>> -			ldip->di_nblocks);
>> +			nextents + anextents, ldip->di_nblocks);
>>  		error = -EFSCORRUPTED;
>>  		goto out_release;
>>  	}
>
> Also, I think that xlog_recover_inode_commit_pass2() is already too
> big without adding this new verification to it. Can we factor this
> into a separate function (say xlog_dinode_verify_extent_counts())?
>

Sure. I will move extent count validation into a new function.
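
Roughly, the factored-out check could take the following shape. This is a userspace sketch under stated assumptions: struct demo_log_dinode is a heavily simplified, hypothetical stand-in for the log dinode, DEMO_EFSCORRUPTED is a placeholder error value, and fprintf() stands in for xfs_alert(). It folds the count selection into the same if/else as the format checks and uses descriptive messages instead of numbered ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_EFSCORRUPTED	117	/* hypothetical stand-in for EFSCORRUPTED */

/* Hypothetical, heavily simplified stand-in for struct xfs_log_dinode. */
struct demo_log_dinode {
	bool		has_nrext64;	/* per-inode large extent count flag */
	uint8_t		di_version;
	uint64_t	di_nrext64_pad;
	uint64_t	di_big_nextents;
	uint32_t	di_big_anextents;
	uint32_t	di_nextents;
	uint16_t	di_anextents;
	uint64_t	di_nblocks;
};

static int demo_dinode_verify_extent_counts(bool fs_has_nrext64, uint64_t ino,
		const struct demo_log_dinode *ldip)
{
	uint64_t	nextents;
	uint64_t	anextents;

	assert(ldip != NULL);

	if (ldip->has_nrext64) {
		if (!fs_has_nrext64 || ldip->di_nrext64_pad != 0) {
			fprintf(stderr,
	"Bad log dinode large extent count format, inode 0x%llx, nrext64 %d, padding 0x%llx\n",
				(unsigned long long)ino, fs_has_nrext64,
				(unsigned long long)ldip->di_nrext64_pad);
			return -DEMO_EFSCORRUPTED;
		}
		nextents = ldip->di_big_nextents;
		anextents = ldip->di_big_anextents;
	} else {
		if (ldip->di_version == 3 && ldip->di_big_nextents != 0) {
			fprintf(stderr,
	"Bad log dinode extent count format, inode 0x%llx, large extent count 0x%llx\n",
				(unsigned long long)ino,
				(unsigned long long)ldip->di_big_nextents);
			return -DEMO_EFSCORRUPTED;
		}
		nextents = ldip->di_nextents;
		anextents = ldip->di_anextents;
	}

	/* Every extent covers at least one block. */
	if (nextents + anextents > ldip->di_nblocks) {
		fprintf(stderr,
	"Bad log dinode extent counts, inode 0x%llx, total extents 0x%llx, nblocks 0x%llx\n",
			(unsigned long long)ino,
			(unsigned long long)(nextents + anextents),
			(unsigned long long)ldip->di_nblocks);
		return -DEMO_EFSCORRUPTED;
	}
	return 0;
}
```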

>>  	if (unlikely(ldip->di_forkoff > mp->m_sb.sb_inodesize)) {
>> -		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(6)",
>> +		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(8)",
>>  				     XFS_ERRLEVEL_LOW, mp, ldip,
>>  				     sizeof(*ldip));
>>  		xfs_alert(mp,
>> @@ -374,7 +431,7 @@ xlog_recover_inode_commit_pass2(
>>  	}
>>  	isize = xfs_log_dinode_size(mp);
>>  	if (unlikely(item->ri_buf[1].i_len > isize)) {
>> -		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(7)",
>> +		XFS_CORRUPTION_ERROR("xlog_recover_inode_pass2(9)",
>>  				     XFS_ERRLEVEL_LOW, mp, ldip,
>>  				     sizeof(*ldip));
>>  		xfs_alert(mp,
>
> And this is exactly why I don't like these numbered warnings. Make
> the warning descriptive rather than numbered -
> changing/adding/removing a warning shouldn't force us to change a
> bunch of unrelated warnings...

I will write a new patch to replace these numbered warnings with descriptive
ones.

-- 
chandan


* Re: [PATCH V7 13/17] xfs: xfs_growfs_rt_alloc: Unlock inode explicitly rather than through iop_committing()
  2022-03-04  7:25   ` Dave Chinner
@ 2022-03-05 12:44     ` Chandan Babu R
  0 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-05 12:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong

On 04 Mar 2022 at 12:55, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:34PM +0530, Chandan Babu R wrote:
>> In order to be able to upgrade inodes to XFS_DIFLAG2_NREXT64, a future commit
>> will perform such an upgrade in a transaction context. This requires the
>> transaction to be rolled once. Hence inodes which have been added to the
>> transaction (via xfs_trans_ijoin()) with non-zero value for lock_flags
>> argument would cause the inode to be unlocked when the transaction is rolled.
>> 
>> To prevent this from happening in the case of realtime bitmap/summary inodes,
>> this commit now unlocks the inode explicitly rather than through
>> the iop_committing() callback.
>> 
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>>  fs/xfs/xfs_rtalloc.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
>> index b8c79ee791af..a70140b35e8b 100644
>> --- a/fs/xfs/xfs_rtalloc.c
>> +++ b/fs/xfs/xfs_rtalloc.c
>> @@ -780,6 +780,7 @@ xfs_growfs_rt_alloc(
>>  	int			resblks;	/* space reservation */
>>  	enum xfs_blft		buf_type;
>>  	struct xfs_trans	*tp;
>> +	bool			unlock_inode;
>>  
>>  	if (ip == mp->m_rsumip)
>>  		buf_type = XFS_BLFT_RTSUMMARY_BUF;
>> @@ -802,7 +803,8 @@ xfs_growfs_rt_alloc(
>>  		 * Lock the inode.
>>  		 */
>>  		xfs_ilock(ip, XFS_ILOCK_EXCL);
>> -		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
>> +		xfs_trans_ijoin(tp, ip, 0);
>> +		unlock_inode = true;
>>  
>>  		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
>>  				XFS_IEXT_ADD_NOSPLIT_CNT);
>> @@ -823,8 +825,11 @@ xfs_growfs_rt_alloc(
>>  		 * Free any blocks freed up in the transaction, then commit.
>>  		 */
>>  		error = xfs_trans_commit(tp);
>> -		if (error)
>> +                unlock_inode = false;
>> +                xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> +                if (error)
>>  			return error;
>> +
>
> whitespace damage.
>
>>  		/*
>>  		 * Now we need to clear the allocated blocks.
>>  		 * Do this one block per transaction, to keep it simple.
>> @@ -874,6 +879,8 @@ xfs_growfs_rt_alloc(
>>  
>>  out_trans_cancel:
>>  	xfs_trans_cancel(tp);
>> +	if (unlock_inode)
>> +		xfs_iunlock(ip, XFS_ILOCK_EXCL);
>>  	return error;
>
> That's kinda messy, IMO. If you create a new error stack like:
>
> out_trans_cancel:
> 	xfs_trans_cancel(tp);
> 	return error;
>
> out_cancel_unlock:
> 	xfs_trans_cancel(tp);
> 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> 	return error;
>
> Then you can get rid of the unlock_inode variable and just change
> the if (error) goto ... jumps in the appropriate places where
> unlock on cancel is needed. That seems much cleaner and easier to
> verify.

The above suggestion is correct. I will include this change in the next
version.
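
A small userspace sketch of the two-label error stack, to show that the flag variable is no longer needed. All demo_* names are hypothetical, and the two error parameters only simulate failures before and after the inode is locked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins that only track lock and transaction state. */
struct demo_inode { bool locked; };
struct demo_trans { bool committed; bool cancelled; };

static void demo_ilock(struct demo_inode *ip)
{
	assert(!ip->locked);
	ip->locked = true;
}

static void demo_iunlock(struct demo_inode *ip)
{
	assert(ip->locked);
	ip->locked = false;
}

/*
 * err_before_lock and err_after_lock simulate a failure at the
 * corresponding step; zero means the step succeeds.
 */
static int demo_grow_step(struct demo_inode *ip, struct demo_trans *tp,
			  int err_before_lock, int err_after_lock)
{
	int	error;

	error = err_before_lock;
	if (error)
		goto out_trans_cancel;	/* inode not locked yet */

	demo_ilock(ip);

	error = err_after_lock;
	if (error)
		goto out_cancel_unlock;	/* lock is held, must drop it */

	tp->committed = true;
	demo_iunlock(ip);
	return 0;

out_trans_cancel:
	tp->cancelled = true;
	return error;

out_cancel_unlock:
	tp->cancelled = true;
	demo_iunlock(ip);
	return error;
}
```

Each jump site states which cleanup it needs, so the lock state can be verified by reading the label rather than by tracking a variable.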

-- 
chandan


* Re: [PATCH V7 14/17] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters
  2022-03-04  7:51   ` Dave Chinner
@ 2022-03-05 12:45     ` Chandan Babu R
  2022-03-07  5:02       ` Dave Chinner
  0 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-05 12:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong

On 04 Mar 2022 at 13:21, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:35PM +0530, Chandan Babu R wrote:
>> This commit upgrades inodes to use 64-bit extent counters when they are read
>> from disk. Inodes are upgraded only when the filesystem instance has
>> XFS_SB_FEAT_INCOMPAT_NREXT64 incompat flag set.
>> 
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>>  fs/xfs/libxfs/xfs_attr.c       |  3 ++-
>>  fs/xfs/libxfs/xfs_bmap.c       |  5 ++---
>>  fs/xfs/libxfs/xfs_inode_fork.c | 37 ++++++++++++++++++++++++++++++++++
>>  fs/xfs/libxfs/xfs_inode_fork.h |  2 ++
>>  fs/xfs/xfs_bmap_item.c         |  3 ++-
>>  fs/xfs/xfs_bmap_util.c         | 10 ++++-----
>>  fs/xfs/xfs_dquot.c             |  2 +-
>>  fs/xfs/xfs_iomap.c             |  5 +++--
>>  fs/xfs/xfs_reflink.c           |  5 +++--
>>  fs/xfs/xfs_rtalloc.c           |  2 +-
>>  10 files changed, 58 insertions(+), 16 deletions(-)
>> 
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 23523b802539..03a358930d74 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -774,7 +774,8 @@ xfs_attr_set(
>>  		return error;
>>  
>>  	if (args->value || xfs_inode_hasattr(dp)) {
>> -		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
>> +		error = xfs_trans_inode_ensure_nextents(&args->trans, dp,
>> +				XFS_ATTR_FORK,
>>  				XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
>
> hmmmm.
>
>>  		if (error)
>>  			goto out_trans_cancel;
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index be7f8ebe3cd5..3a3c99ef7f13 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -4523,14 +4523,13 @@ xfs_bmapi_convert_delalloc(
>>  		return error;
>>  
>>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>> +	xfs_trans_ijoin(tp, ip, 0);
>>  
>> -	error = xfs_iext_count_may_overflow(ip, whichfork,
>> +	error = xfs_trans_inode_ensure_nextents(&tp, ip, whichfork,
>>  			XFS_IEXT_ADD_NOSPLIT_CNT);
>>  	if (error)
>>  		goto out_trans_cancel;
>>  
>> -	xfs_trans_ijoin(tp, ip, 0);
>> -
>>  	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) ||
>>  	    bma.got.br_startoff > offset_fsb) {
>>  		/*
>> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
>> index a3a3b54f9c55..d1d065abeac3 100644
>> --- a/fs/xfs/libxfs/xfs_inode_fork.c
>> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
>> @@ -757,3 +757,40 @@ xfs_iext_count_may_overflow(
>>  
>>  	return 0;
>>  }
>> +
>> +/*
>> + * Ensure that the inode has the ability to add the specified number of
>> + * extents.  Caller must hold ILOCK_EXCL and have joined the inode to
>> + * the transaction.  Upon return, the inode will still be in this state
>> + * and the transaction will be clean.
>> + */
>> +int
>> +xfs_trans_inode_ensure_nextents(
>> +	struct xfs_trans	**tpp,
>> +	struct xfs_inode	*ip,
>> +	int			whichfork,
>> +	int			nr_to_add)
>
> Ok, xfs_trans_inode* is a namespace that belongs to
> fs/xfs/xfs_trans_inode.c, not fs/xfs/libxfs/xfs_inode_fork.c. So my
> second observation is that the function needs either be renamed or
> moved.
>
> My first observation was that the function name didn't really make
> any sense to me when read in context. xfs_iext_count_may_overflow()
> makes sense because it's telling me that it's checking that the
> extent count hasn't overflowed. xfs_trans_inode_ensure_nextents()
> conveys none of that certainty.
>
> What does it ensure? "ensure" doesn't imply we are going to change
> anything - it could just mean "check and abort if wrong" when read
> as "ensure we haven't overflowed". And if we already have nrext64
> and we've overflowed that then it will still fail, meaning we
> haven't "ensured" anything.
>
> This would make much more sense if written as:
>
> 	error = xfs_iext_count_may_overflow();
> 	if (error && error != -EOVERFLOW)
> 		goto out_trans_cancel;
>
> 	if (error == -EOVERFLOW) {
> 		error = xfs_inode_upgrade_extent_counts();
> 		if (error)
> 			goto out_trans_cancel;
> 	}
>
> Because it splits the logic into a "do we need to do something"
> part and a "do an explicit modification" part.
>

Ok. The above logic is much better than xfs_trans_inode_ensure_nextents().
Also, I will define xfs_inode_upgrade_extent_counts() in
libxfs/xfs_inode_fork.c since the function is supposed to operate on inode
extent counts.

>> +{
>> +	int			error;
>> +
>> +	error = xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
>> +	if (!error)
>> +		return 0;
>> +
>> +	/*
>> +	 * Try to upgrade if the extent count fields aren't large
>> +	 * enough.
>> +	 */
>> +	if (!xfs_has_nrext64(ip->i_mount) ||
>> +	    (ip->i_diflags2 & XFS_DIFLAG2_NREXT64))
>> +		return error;
>
> Oh, that's tricky, too. The first check returns if there's no error,
> the second check returns the error of the first function. Keeping
> the initial overflow check in the caller gets rid of this, too.
>
>> +
>> +	ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
>> +	xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE);
>> +
>> +	error = xfs_trans_roll(tpp);
>> +	if (error)
>> +		return error;
>
> Why does this need to roll the transaction? We can just log the
> inode core and return to the caller which will then commit the
> change.

The transaction was rolled to make sure that we don't overflow the log
reservations (computed in libxfs/xfs_trans_resv.c). But now I see that any
transaction which causes an inode's extent count to change would have considered
the space required to log an inode in its reservation calculation. Hence, I
will remove the above call to xfs_trans_roll().

>> +	return xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
>
> If the answer is so we don't cancel a dirty transaction here, then
> I think this check needs to be more explicit - don't even try to do
> the upgrade if the number of extents we are adding will cause an
> overflow anyway.
>
> As it is, wouldn't adding 2^47 - 2^31 extents in a single hit be
> indicative of a bug? We can only modify the extent count by a
> handful of extents (10, maybe 20?) at most in a single transaction,
> so why do we even need this check?

Yes, the above call to xfs_iext_count_may_overflow() is not correct. The value
of nr_to_add has to be larger than 2^17 (2^32 - 2^15 for attr fork and 2^48 -
2^31 for data fork) for extent count to overflow. Hence, I will remove this
call to xfs_iext_count_may_overflow().

-- 
chandan


* Re: [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
  2022-03-04  8:09   ` Dave Chinner
@ 2022-03-05 12:45     ` Chandan Babu R
  2022-03-07  5:13       ` Dave Chinner
  0 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-05 12:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong

On 04 Mar 2022 at 13:39, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:36PM +0530, Chandan Babu R wrote:
>> The following changes are made to enable userspace to obtain 64-bit extent
>> counters,
>> 1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from
>>    xfs_bulkstat->bs_pad[] to hold 64-bit extent counter.
>> 2. Define the new flag XFS_BULK_IREQ_NREXT64 for userspace to indicate that
>>    it is capable of receiving 64-bit extent counters.
>> 
>> Suggested-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>> ---
>>  fs/xfs/libxfs/xfs_fs.h | 20 ++++++++++++++++----
>>  fs/xfs/xfs_ioctl.c     |  3 +++
>>  fs/xfs/xfs_itable.c    | 30 ++++++++++++++++++++++++++++--
>>  fs/xfs/xfs_itable.h    |  4 +++-
>>  fs/xfs/xfs_iwalk.h     |  2 +-
>>  5 files changed, 51 insertions(+), 8 deletions(-)
>> 
>> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
>> index 2204d49d0c3a..31ccbff2f16c 100644
>> --- a/fs/xfs/libxfs/xfs_fs.h
>> +++ b/fs/xfs/libxfs/xfs_fs.h
>> @@ -378,7 +378,7 @@ struct xfs_bulkstat {
>>  	uint32_t	bs_extsize_blks; /* extent size hint, blocks	*/
>>  
>>  	uint32_t	bs_nlink;	/* number of links		*/
>> -	uint32_t	bs_extents;	/* number of extents		*/
>> +	uint32_t	bs_extents;	/* 32-bit data fork extent counter */
>>  	uint32_t	bs_aextents;	/* attribute number of extents	*/
>>  	uint16_t	bs_version;	/* structure version		*/
>>  	uint16_t	bs_forkoff;	/* inode fork offset in bytes	*/
>> @@ -387,8 +387,9 @@ struct xfs_bulkstat {
>>  	uint16_t	bs_checked;	/* checked inode metadata	*/
>>  	uint16_t	bs_mode;	/* type and mode		*/
>>  	uint16_t	bs_pad2;	/* zeroed			*/
>> +	uint64_t	bs_extents64;	/* 64-bit data fork extent counter */
>>  
>> -	uint64_t	bs_pad[7];	/* zeroed			*/
>> +	uint64_t	bs_pad[6];	/* zeroed			*/
>>  };
>>  
>>  #define XFS_BULKSTAT_VERSION_V1	(1)
>> @@ -469,8 +470,19 @@ struct xfs_bulk_ireq {
>>   */
>>  #define XFS_BULK_IREQ_SPECIAL	(1 << 1)
>>  
>> -#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO | \
>> -				 XFS_BULK_IREQ_SPECIAL)
>> +/*
>> + * Return data fork extent count via xfs_bulkstat->bs_extents64 field and assign
>> + * 0 to xfs_bulkstat->bs_extents when the flag is set.  Otherwise, use
>> + * xfs_bulkstat->bs_extents for returning data fork extent count and set
>> + * xfs_bulkstat->bs_extents64 to 0. In the second case, return -EOVERFLOW and
>> + * assign 0 to xfs_bulkstat->bs_extents if data fork extent count is larger than
>> + * XFS_MAX_EXTCNT_DATA_FORK_OLD.
>> + */
>> +#define XFS_BULK_IREQ_NREXT64	(1 << 2)
>> +
>> +#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO |	 \
>> +				 XFS_BULK_IREQ_SPECIAL | \
>> +				 XFS_BULK_IREQ_NREXT64)
>>  
>>  /* Operate on the root directory inode. */
>>  #define XFS_BULK_IREQ_SPECIAL_ROOT	(1)
>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>> index 2515fe8299e1..22947c5ffd34 100644
>> --- a/fs/xfs/xfs_ioctl.c
>> +++ b/fs/xfs/xfs_ioctl.c
>> @@ -813,6 +813,9 @@ xfs_bulk_ireq_setup(
>>  	if (XFS_INO_TO_AGNO(mp, breq->startino) >= mp->m_sb.sb_agcount)
>>  		return -ECANCELED;
>>  
>> +	if (hdr->flags & XFS_BULK_IREQ_NREXT64)
>> +		breq->flags |= XFS_IBULK_NREXT64;
>> +
>>  	return 0;
>>  }
>>  
>> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
>> index c08c79d9e311..0272a3c9d8b1 100644
>> --- a/fs/xfs/xfs_itable.c
>> +++ b/fs/xfs/xfs_itable.c
>> @@ -20,6 +20,7 @@
>>  #include "xfs_icache.h"
>>  #include "xfs_health.h"
>>  #include "xfs_trans.h"
>> +#include "xfs_errortag.h"
>>  
>>  /*
>>   * Bulk Stat
>> @@ -64,6 +65,7 @@ xfs_bulkstat_one_int(
>>  	struct xfs_inode	*ip;		/* incore inode pointer */
>>  	struct inode		*inode;
>>  	struct xfs_bulkstat	*buf = bc->buf;
>> +	xfs_extnum_t		nextents;
>>  	int			error = -EINVAL;
>>  
>>  	if (xfs_internal_inum(mp, ino))
>> @@ -102,7 +104,27 @@ xfs_bulkstat_one_int(
>>  
>>  	buf->bs_xflags = xfs_ip2xflags(ip);
>>  	buf->bs_extsize_blks = ip->i_extsize;
>> -	buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
>> +
>> +	nextents = xfs_ifork_nextents(&ip->i_df);
>> +	if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
>> +		xfs_extnum_t	max_nextents = XFS_MAX_EXTCNT_DATA_FORK_OLD;
>> +
>> +		if (unlikely(XFS_TEST_ERROR(false, mp,
>> +				XFS_ERRTAG_REDUCE_MAX_IEXTENTS)))
>> +			max_nextents = 10;
>> +
>> +		if (nextents > max_nextents) {
>> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
>> +			xfs_irele(ip);
>> +			error = -EOVERFLOW;
>> +			goto out;
>> +		}
>
> This just seems wrong. This will cause a total abort of the bulkstat
> pass which will just be completely unexpected by any application
> that does not know about 64 bit extent counts. Most of them likely
> don't even care about the extent count in the data being returned.
>
> Really, I think this should just set the extent count to the MAX
> number and just continue onwards, otherwise existing applications
> will not be able to bulkstat a filesystem with large extent counts
> in it at all.
>

I don't know much about how applications use bulkstat, so I depend on
guidance from developers who are well versed in this topic. I will change
the code to return the maximum extent count if the value overflows the
older extent count limit.
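
A minimal sketch of that clamping, written as standalone C rather than as
the actual kernel change. The helper bulkstat_report_extents() and the
constant OLD_MAX_DATA_EXTENTS are hypothetical names, and the value
2^31 - 1 assumes the old limit is the largest value a signed 32-bit
counter can hold:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed old data fork limit: largest value of a signed 32-bit counter. */
#define OLD_MAX_DATA_EXTENTS	((uint64_t)INT32_MAX)

/*
 * Hypothetical helper: pick the extent count to report to a bulkstat
 * caller. Callers that did not ask for 64-bit counters get the count
 * clamped to the old maximum instead of an error.
 */
static uint64_t
bulkstat_report_extents(uint64_t nextents, bool wants_nrext64)
{
	if (!wants_nrext64 && nextents > OLD_MAX_DATA_EXTENTS)
		return OLD_MAX_DATA_EXTENTS;
	return nextents;
}
```

With this shape the inode is still reported, so a caller that never looks
at the extent count is unaffected.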

>> @@ -256,6 +278,7 @@ xfs_bulkstat(
>>  		.breq		= breq,
>>  	};
>>  	struct xfs_trans	*tp;
>> +	unsigned int		iwalk_flags = 0;
>>  	int			error;
>>  
>>  	if (breq->mnt_userns != &init_user_ns) {
>> @@ -279,7 +302,10 @@ xfs_bulkstat(
>>  	if (error)
>>  		goto out;
>>  
>> -	error = xfs_iwalk(breq->mp, tp, breq->startino, breq->flags,
>> +	if (breq->flags & XFS_IBULK_SAME_AG)
>> +		iwalk_flags |= XFS_IWALK_SAME_AG;
>> +
>> +	error = xfs_iwalk(breq->mp, tp, breq->startino, iwalk_flags,
>>  			xfs_bulkstat_iwalk, breq->icount, &bc);
>>  	xfs_trans_cancel(tp);
>>  out:
>
> This looks like an unrelated bug fix and doesn't make any sense in
> the context of the change being made in this patch.
>

You are right. This is about removing the dependency of XFS_IBULK_* flags on
XFS_IWALK_* flags. I will include this change in a separate patch.
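
To illustrate the idea, here is a rough standalone sketch of keeping the
two flag namespaces independent and translating between them explicitly.
The macro names and ibulk_to_iwalk_flags() are hypothetical stand-ins for
illustration, not the kernel definitions:

```c
#include <assert.h>

/* Hypothetical stand-ins for the two independent flag namespaces. */
#define IWALK_SAME_AG	(1U << 0)

#define IBULK_SAME_AG	(1U << 0)
#define IBULK_NREXT64	(1U << 1)

/*
 * Translate bulkstat request flags into inode walk flags. Only flags
 * the walker understands are passed on; IBULK_NREXT64 stays private
 * to the bulkstat layer.
 */
static unsigned int
ibulk_to_iwalk_flags(unsigned int ibulk_flags)
{
	unsigned int	iwalk_flags = 0;

	if (ibulk_flags & IBULK_SAME_AG)
		iwalk_flags |= IWALK_SAME_AG;

	return iwalk_flags;
}
```

Because the walker only ever sees flags it defines, a new bulkstat-only
flag cannot collide with a walk flag that happens to share the same bit.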

>> diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h
>> index 7078d10c9b12..9223529cd7bd 100644
>> --- a/fs/xfs/xfs_itable.h
>> +++ b/fs/xfs/xfs_itable.h
>> @@ -17,7 +17,9 @@ struct xfs_ibulk {
>>  };
>>  
>>  /* Only iterate within the same AG as startino */
>> -#define XFS_IBULK_SAME_AG	(XFS_IWALK_SAME_AG)
>> +#define XFS_IBULK_SAME_AG	(1ULL << 0)
>> +
>> +#define XFS_IBULK_NREXT64	(1ULL << 1)
>
> Why are these defined as ULL? AFAICT they are only ever stored in an
> unsigned int.
>

In one of the older versions of the patchset, I had extended xfs_ibulk->flags
to an "unsigned long long" field. These changes are remnants from that older
version. I will remove the ULL suffix.

>>  
>>  /*
>>   * Advance the user buffer pointer by one record of the given size.  If the
>> diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
>> index 37a795f03267..3a68766fd909 100644
>> --- a/fs/xfs/xfs_iwalk.h
>> +++ b/fs/xfs/xfs_iwalk.h
>> @@ -26,7 +26,7 @@ int xfs_iwalk_threaded(struct xfs_mount *mp, xfs_ino_t startino,
>>  		unsigned int inode_records, bool poll, void *data);
>>  
>>  /* Only iterate inodes within the same AG as @startino. */
>> -#define XFS_IWALK_SAME_AG	(0x1)
>> +#define XFS_IWALK_SAME_AG	(1 << 0)
>
> This also seems unrelated. If these flags need changing, can you
> pull it out into a separate patch explaining the what and why it
> needs changing because I'm getting lost in the 3-layer-deep (or is
> it 4?) iwalk/ibulk/ibulkreq flag munging that is all intertwined in
> this patch....
>

Sorry about that. As I mentioned earlier, this is about removing the
dependency of XFS_IBULK_* flags on XFS_IWALK_* flags. I will include this
change in a separate patch.

-- 
chandan

* Re: [PATCH V7 17/17] xfs: Define max extent length based on on-disk format definition
  2022-03-04  8:15   ` Dave Chinner
@ 2022-03-05 12:45     ` Chandan Babu R
  0 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-05 12:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong

On 04 Mar 2022 at 13:45, Dave Chinner wrote:
> On Tue, Mar 01, 2022 at 04:09:38PM +0530, Chandan Babu R wrote:
>> The maximum extent length depends on maximum block count that can be stored in
>> a BMBT record. Hence this commit defines MAXEXTLEN based on
>> BMBT_BLOCKCOUNT_BITLEN.
>> 
>> While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN.
>> 
>> Suggested-by: Darrick J. Wong <djwong@kernel.org>
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
>
> Looks fine, but this should be up near the top of the series where
> all the extent count definitions are being changed. Also, minor
> formatting nit below.
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
>
>> @@ -299,7 +299,8 @@ xfs_calc_write_reservation(
>>   *    the agf for each of the ags: 2 * sector size
>>   *    the agfl for each of the ags: 2 * sector size
>>   *    the super block to reflect the freed blocks: sector size
>> - *    the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
>> + *    the realtime bitmap: 2 exts * ((XFS_BMBT_MAX_EXTLEN / rtextsize) / NBBY)
>> + *    bytes
>
> Break the line at the ":"
>
>  *    the realtime bitmap:
>  *		2 exts * ((XFS_BMBT_MAX_EXTLEN / rtextsize) / NBBY) bytes
>
> Which makes it consistent with the rest of the comment:
>
>>   *    the realtime summary: 2 exts * 1 block
>>   *    worst case split in allocation btrees per extent assuming 2 extents:
>>   *		2 exts * 2 trees * (2 * max depth - 1) * block size
>

Ok. I will include this in the next version of the patchset.

Thanks a lot for reviewing the entire patchset.

-- 
chandan

* Re: [PATCH V7 06/17] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively
  2022-03-05 12:43     ` Chandan Babu R
@ 2022-03-07  4:55       ` Dave Chinner
  0 siblings, 0 replies; 53+ messages in thread
From: Dave Chinner @ 2022-03-07  4:55 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong, kernel test robot

On Sat, Mar 05, 2022 at 06:13:21PM +0530, Chandan Babu R wrote:
> On 04 Mar 2022 at 06:59, Dave Chinner wrote:
> > On Tue, Mar 01, 2022 at 04:09:27PM +0530, Chandan Babu R wrote:
> >> A future commit will introduce a 64-bit on-disk data extent counter and a
> >> 32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and
> >> xfs_aextnum_t to 64 and 32-bits in order to correctly handle in-core versions
> >> of these quantities.
> >> 
> >> Reported-by: kernel test robot <lkp@intel.com>
> >
> > What was reported by the test robot? This change isn't a bug that
> > needed fixing, it's a core part of the patchset...
> >
> 
> Kernel test robot had complained about the following,
> 
>   ld.lld: error: undefined symbol: __udivdi3
>   >>> referenced by xfs_bmap.c
>   >>>               xfs/libxfs/xfs_bmap.o:(xfs_bmap_compute_maxlevels) in archive fs/built-in.a
> 
> I had solved the linker error by replacing the division operation with the
> following statement,
> 
>   maxblocks = howmany_64(maxleafents, minleafrecs);
> 
> Sorry, I will include this description in the commit message.

Oh, I wouldn't even bother with a Reported-by tag then. It's just
like a reviewer pointing out that there was an issue with the patch
- you don't add "reported-by" for every little thing that someone
points out that you fix, right? You might mention who noticed it
in the changelog for the patch, but this sort of information does
not belong in the commit message for a new feature.

IOWs, reported-by is really only useful for referencing the bug
report for a regression or bug that was found in a released kernel
- it's not useful or meaningful for patches that are being developed
and have not yet been merged...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH V7 14/17] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters
  2022-03-05 12:45     ` Chandan Babu R
@ 2022-03-07  5:02       ` Dave Chinner
  2022-03-07 10:20         ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-07  5:02 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Sat, Mar 05, 2022 at 06:15:15PM +0530, Chandan Babu R wrote:
> On 04 Mar 2022 at 13:21, Dave Chinner wrote:
> > On Tue, Mar 01, 2022 at 04:09:35PM +0530, Chandan Babu R wrote:
> >> This commit upgrades inodes to use 64-bit extent counters when they are read
> >> from disk. Inodes are upgraded only when the filesystem instance has
> >> XFS_SB_FEAT_INCOMPAT_NREXT64 incompat flag set.
> >> 
> >> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> >> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
.....
> >> +	return xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
> >
> > If the answer is so we don't cancel a dirty transaction here, then
> > I think this check needs to be more explicit - don't even try to do
> > the upgrade if the number of extents we are adding will cause an
> > overflow anyway.
> >
> > As it is, wouldn't adding 2^47 - 2^31 extents in a single hit be
> > indicative of a bug? We can only modify the extent count by a
> > handful of extents (10, maybe 20?) at most in a single transaction,
> > so why do we even need this check?
> 
> Yes, the above call to xfs_iext_count_may_overflow() is not correct. The value
> of nr_to_add has to be larger than 2^17 (2^32 - 2^15 for attr fork and 2^48 -
> 2^31 for data fork) for extent count to overflow. Hence, I will remove this
> call to xfs_iext_count_may_overflow().

Would it be worth putting an assert somewhere with this logic in it?
That way we at least capture such bugs in debug settings and protect
ourselves from unintentional mistakes.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
  2022-03-05 12:45     ` Chandan Babu R
@ 2022-03-07  5:13       ` Dave Chinner
  2022-03-07 13:46         ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-07  5:13 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Sat, Mar 05, 2022 at 06:15:37PM +0530, Chandan Babu R wrote:
> On 04 Mar 2022 at 13:39, Dave Chinner wrote:
> > On Tue, Mar 01, 2022 at 04:09:36PM +0530, Chandan Babu R wrote:
> >> @@ -102,7 +104,27 @@ xfs_bulkstat_one_int(
> >>  
> >>  	buf->bs_xflags = xfs_ip2xflags(ip);
> >>  	buf->bs_extsize_blks = ip->i_extsize;
> >> -	buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
> >> +
> >> +	nextents = xfs_ifork_nextents(&ip->i_df);
> >> +	if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
> >> +		xfs_extnum_t	max_nextents = XFS_MAX_EXTCNT_DATA_FORK_OLD;
> >> +
> >> +		if (unlikely(XFS_TEST_ERROR(false, mp,
> >> +				XFS_ERRTAG_REDUCE_MAX_IEXTENTS)))
> >> +			max_nextents = 10;
> >> +
> >> +		if (nextents > max_nextents) {
> >> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
> >> +			xfs_irele(ip);
> >> +			error = -EOVERFLOW;
> >> +			goto out;
> >> +		}
> >
> > This just seems wrong. This will cause a total abort of the bulkstat
> > pass which will just be completely unexpected by any application
> > that does not know about 64 bit extent counts. Most of them likely
> > don't even care about the extent count in the data being returned.
> >
> > Really, I think this should just set the extent count to the MAX
> > number and just continue onwards, otherwise existing application
> > will not be able to bulkstat a filesystem with large extents counts
> > in it at all.
> >
> 
> Actually, I don't know much about how applications use bulkstat. I am
> dependent on guidance from other developers who are well versed on this
> topic. I will change the code to return maximum extent count if the value
> overflows older extent count limits.

They tend to just run in a loop until either no more inodes are to
be found or an error occurs. bulkstat loops don't expect errors to
be reported - it's hard to do something based on all inodes if you
get errors reading the inodes part way through. There's no way for
the application to tell where it should restart scanning - the
bulkstat iteration cookie is controlled by the kernel, and I don't
think we update it on error.

e.g. see fstests src/bstat.c and src/bulkstat_unlink_test*.c - they
simply abort if bulkstat fails. Same goes for xfsdump common/util.c
and dump/content.c - they just error out and return and don't try to
continue further.

Hence returning -EOVERFLOW because the extent count is greater than
what can be held in the struct bstat will stop those programs from
running properly to completion.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH V7 14/17] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters
  2022-03-07  5:02       ` Dave Chinner
@ 2022-03-07 10:20         ` Chandan Babu R
  0 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-07 10:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong

On 07 Mar 2022 at 10:32, Dave Chinner wrote:
> On Sat, Mar 05, 2022 at 06:15:15PM +0530, Chandan Babu R wrote:
>> On 04 Mar 2022 at 13:21, Dave Chinner wrote:
>> > On Tue, Mar 01, 2022 at 04:09:35PM +0530, Chandan Babu R wrote:
>> >> This commit upgrades inodes to use 64-bit extent counters when they are read
>> >> from disk. Inodes are upgraded only when the filesystem instance has
>> >> XFS_SB_FEAT_INCOMPAT_NREXT64 incompat flag set.
>> >> 
>> >> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> >> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> .....
>> >> +	return xfs_iext_count_may_overflow(ip, whichfork, nr_to_add);
>> >
>> > If the answer is so we don't cancel a dirty transaction here, then
>> > I think this check needs to be more explicit - don't even try to do
>> > the upgrade if the number of extents we are adding will cause an
>> > overflow anyway.
>> >
>> > As it is, wouldn't adding 2^47 - 2^31 extents in a single hit be
>> > indicative of a bug? We can only modify the extent count by a
>> > handful of extents (10, maybe 20?) at most in a single transaction,
>> > so why do we even need this check?
>> 
>> Yes, the above call to xfs_iext_count_may_overflow() is not correct. The value
>> of nr_to_add has to be larger than 2^17 (2^32 - 2^15 for attr fork and 2^48 -
>> 2^31 for data fork) for extent count to overflow. Hence, I will remove this
>> call to xfs_iext_count_may_overflow().
>
> Would it be worth putting an assert somewhere with this logic in it?
> That way we at least capture such bugs in debug settings and protect
> ourselves from unintentional mistakes.
>

Sure. I will add an ASSERT() call to check if we ever add more than 2^17
extents in a single modification of an inode's data/attr fork extent count.
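
A rough standalone sketch of the check I have in mind. The macro and
function names are hypothetical, and the limits assume the old counters
top out at 2^31 - 1 (data) and 2^15 - 1 (attr) while the new ones top out
at 2^48 - 1 and 2^32 - 1:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits: old counters are signed 32/16 bit, new ones 48/32 bit. */
#define OLD_MAX_DATA	((1ULL << 31) - 1)
#define NEW_MAX_DATA	((1ULL << 48) - 1)
#define OLD_MAX_ATTR	((1ULL << 15) - 1)
#define NEW_MAX_ATTR	((1ULL << 32) - 1)

/* Hypothetical bound on extents added by one fork modification. */
#define MAX_NR_TO_ADD	(1ULL << 17)

/* Extents that can be added after an upgrade before the new limit is hit. */
static uint64_t
upgrade_headroom(bool attr_fork)
{
	return attr_fork ? NEW_MAX_ATTR - OLD_MAX_ATTR
			 : NEW_MAX_DATA - OLD_MAX_DATA;
}

/* Debug-only sanity check on a single extent count modification. */
static void
check_nr_to_add(uint64_t nr_to_add, bool attr_fork)
{
	assert(nr_to_add <= MAX_NR_TO_ADD);
	assert(nr_to_add <= upgrade_headroom(attr_fork));
}
```

The headroom works out to 2^32 - 2^15 for the attr fork and 2^48 - 2^31
for the data fork, both far above the handful of extents a single
transaction can add.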

-- 
chandan

* Re: [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
  2022-03-07  5:13       ` Dave Chinner
@ 2022-03-07 13:46         ` Chandan Babu R
  2022-03-07 21:41           ` Dave Chinner
  0 siblings, 1 reply; 53+ messages in thread
From: Chandan Babu R @ 2022-03-07 13:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong

On 07 Mar 2022 at 10:43, Dave Chinner wrote:
> On Sat, Mar 05, 2022 at 06:15:37PM +0530, Chandan Babu R wrote:
>> On 04 Mar 2022 at 13:39, Dave Chinner wrote:
>> > On Tue, Mar 01, 2022 at 04:09:36PM +0530, Chandan Babu R wrote:
>> >> @@ -102,7 +104,27 @@ xfs_bulkstat_one_int(
>> >>  
>> >>  	buf->bs_xflags = xfs_ip2xflags(ip);
>> >>  	buf->bs_extsize_blks = ip->i_extsize;
>> >> -	buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
>> >> +
>> >> +	nextents = xfs_ifork_nextents(&ip->i_df);
>> >> +	if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
>> >> +		xfs_extnum_t	max_nextents = XFS_MAX_EXTCNT_DATA_FORK_OLD;
>> >> +
>> >> +		if (unlikely(XFS_TEST_ERROR(false, mp,
>> >> +				XFS_ERRTAG_REDUCE_MAX_IEXTENTS)))
>> >> +			max_nextents = 10;
>> >> +
>> >> +		if (nextents > max_nextents) {
>> >> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
>> >> +			xfs_irele(ip);
>> >> +			error = -EOVERFLOW;
>> >> +			goto out;
>> >> +		}
>> >
>> > This just seems wrong. This will cause a total abort of the bulkstat
>> > pass which will just be completely unexpected by any application
>> > that does not know about 64 bit extent counts. Most of them likely
>> > don't even care about the extent count in the data being returned.
>> >
>> > Really, I think this should just set the extent count to the MAX
>> > number and just continue onwards, otherwise existing application
>> > will not be able to bulkstat a filesystem with large extents counts
>> > in it at all.
>> >
>> 
>> Actually, I don't know much about how applications use bulkstat. I am
>> dependent on guidance from other developers who are well versed on this
>> topic. I will change the code to return maximum extent count if the value
>> overflows older extent count limits.
>
> They tend to just run in a loop until either no more inodes are to
> be found or an error occurs. bulkstat loops don't expect errors to
> be reported - it's hard to do something based on all inodes if you
> get errors reading the inodes part way through. There's no way for
> the application to tell where it should restart scanning - the
> bulkstat iteration cookie is controlled by the kernel, and I don't
> think we update it on error.

xfs_bulkstat() has the following,

        kmem_free(bc.buf);

        /*
         * We found some inodes, so clear the error status and return them.
         * The lastino pointer will point directly at the inode that triggered
         * any error that occurred, so on the next call the error will be
         * triggered again and propagated to userspace as there will be no
         * formatted inodes in the buffer.
         */
        if (breq->ocount > 0)
                error = 0;

        return error;

The above lets the userspace process issue another bulkstat call which
begins from the inode that caused the error.

>
> e.g. see fstests src/bstat.c and src/bulkstat_unlink_test*.c - they
> simply abort if bulkstat fails. Same goes for xfsdump common/util.c
> and dump/content.c - they just error out and return and don't try to
> continue further.

I made the following changes to src/bstat.c,

diff --git a/src/bstat.c b/src/bstat.c
index 3f3dc2c6..0e72190e 100644
--- a/src/bstat.c
+++ b/src/bstat.c
@@ -143,7 +143,19 @@ main(int argc, char **argv)
 	bulkreq.ubuffer = t;
 	bulkreq.ocount  = &count;
 
-	while ((ret = xfsctl(name, fsfd, XFS_IOC_FSBULKSTAT, &bulkreq)) == 0) {
+	while (1) {
+		ret = xfsctl(name, fsfd, XFS_IOC_FSBULKSTAT, &bulkreq);
+		if (ret == -1) {
+			if (errno == EOVERFLOW) {
+				printf("Skipping inode %llu.\n",  last+1);
+				++last;
+				continue;
+			}
+
+			perror("xfsctl");
+			exit(1);
+		}
+
 		total += count;
 

Executing the script at
https://gist.github.com/chandanr/f2d147fa20a681e1508e182b5b7cdb00 provides the
following output,

...

ino 128 mode 040755 nlink 3 uid 0 gid 0 rdev 0
blksize 4096 size 37 blocks 0 xflags 0 extsize 0
atime Thu Jan  1 00:00:00.000000000 1970
mtime Mon Mar  7 13:06:30.051339892 2022
ctime Mon Mar  7 13:06:30.051339892 2022
extents 0 0 gen 0
DMI: event mask 0x00000000 state 0x0000

Skipping inode 131.

ino 132 mode 040755 nlink 2 uid 0 gid 0 rdev 0
blksize 4096 size 97 blocks 0 xflags 0 extsize 0
atime Mon Mar  7 13:06:30.051339892 2022
mtime Mon Mar  7 13:06:30.083339892 2022
ctime Mon Mar  7 13:06:30.083339892 2022
extents 0 0 gen 548703887
DMI: event mask 0x00000000 state 0x0000

...

The above illustrates that userspace programs can be modified to use lastip to
skip inodes which cause the bulkstat ioctl to return with an error.

>
> Hence returning -EOVERFLOW because the extent count is greater than
> what can be held in the struct bstat will stop those programs from
> running properly to completion.
>

-- 
chandan

* Re: [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
  2022-03-07 13:46         ` Chandan Babu R
@ 2022-03-07 21:41           ` Dave Chinner
  2022-03-08  2:52             ` Chandan Babu R
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Chinner @ 2022-03-07 21:41 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs, djwong

On Mon, Mar 07, 2022 at 07:16:57PM +0530, Chandan Babu R wrote:
> On 07 Mar 2022 at 10:43, Dave Chinner wrote:
> > On Sat, Mar 05, 2022 at 06:15:37PM +0530, Chandan Babu R wrote:
> >> On 04 Mar 2022 at 13:39, Dave Chinner wrote:
> >> > On Tue, Mar 01, 2022 at 04:09:36PM +0530, Chandan Babu R wrote:
> >> >> @@ -102,7 +104,27 @@ xfs_bulkstat_one_int(
> >> >>  
> >> >>  	buf->bs_xflags = xfs_ip2xflags(ip);
> >> >>  	buf->bs_extsize_blks = ip->i_extsize;
> >> >> -	buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
> >> >> +
> >> >> +	nextents = xfs_ifork_nextents(&ip->i_df);
> >> >> +	if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
> >> >> +		xfs_extnum_t	max_nextents = XFS_MAX_EXTCNT_DATA_FORK_OLD;
> >> >> +
> >> >> +		if (unlikely(XFS_TEST_ERROR(false, mp,
> >> >> +				XFS_ERRTAG_REDUCE_MAX_IEXTENTS)))
> >> >> +			max_nextents = 10;
> >> >> +
> >> >> +		if (nextents > max_nextents) {
> >> >> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
> >> >> +			xfs_irele(ip);
> >> >> +			error = -EOVERFLOW;
> >> >> +			goto out;
> >> >> +		}
> >> >
> >> > This just seems wrong. This will cause a total abort of the bulkstat
> >> > pass which will just be completely unexpected by any application
> >> > that does not know about 64 bit extent counts. Most of them likely
> >> > don't even care about the extent count in the data being returned.
> >> >
> >> > Really, I think this should just set the extent count to the MAX
> >> > number and just continue onwards, otherwise existing application
> >> > will not be able to bulkstat a filesystem with large extents counts
> >> > in it at all.
> >> >
> >> 
> >> Actually, I don't know much about how applications use bulkstat. I am
> >> dependent on guidance from other developers who are well versed on this
> >> topic. I will change the code to return maximum extent count if the value
> >> overflows older extent count limits.
> >
> > They tend to just run in a loop until either no more inodes are to
> > be found or an error occurs. bulkstat loops don't expect errors to
> > be reported - it's hard to do something based on all inodes if you
> > get errors reading the inodes part way through. There's no way for
> > the application to tell where it should restart scanning - the
> > bulkstat iteration cookie is controlled by the kernel, and I don't
> > think we update it on error.
> 
> xfs_bulkstat() has the following,
> 
>         kmem_free(bc.buf);
> 
>         /*
>          * We found some inodes, so clear the error status and return them.
>          * The lastino pointer will point directly at the inode that triggered
>          * any error that occurred, so on the next call the error will be
>          * triggered again and propagated to userspace as there will be no
>          * formatted inodes in the buffer.
>          */
>         if (breq->ocount > 0)
>                 error = 0;
> 
>         return error;
> 
> The above will help the userspace process to issue another bulkstat call which
> begins from the inode causing an error.

And then it returns with a cookie pointing at the overflowed inode,
and we try that one first on the next loop, triggering -EOVERFLOW
with breq->ocount == 0.

Or maybe we have two inodes in a row that trigger EOVERFLOW, so even
if we skip the first and return to userspace, we trip the second on
the next call and boom...

> > e.g. see fstests src/bstat.c and src/bulkstat_unlink_test*.c - they
> > simply abort if bulkstat fails. Same goes for xfsdump common/util.c
> > and dump/content.c - they just error out and return and don't try to
> > continue further.
> 
> I made the following changes to src/bstat.c,
> 
> diff --git a/src/bstat.c b/src/bstat.c
> index 3f3dc2c6..0e72190e 100644
> --- a/src/bstat.c
> +++ b/src/bstat.c
> @@ -143,7 +143,19 @@ main(int argc, char **argv)
>  	bulkreq.ubuffer = t;
>  	bulkreq.ocount  = &count;
>  
> -	while ((ret = xfsctl(name, fsfd, XFS_IOC_FSBULKSTAT, &bulkreq)) == 0) {
> +	while (1) {
> +		ret = xfsctl(name, fsfd, XFS_IOC_FSBULKSTAT, &bulkreq);
> +		if (ret == -1) {
> +			if (errno == EOVERFLOW) {
> +				printf("Skipping inode %llu.\n",  last+1);
> +				++last;
> +				continue;
> +			}
> +
> +			perror("xfsctl");
> +			exit(1);
> +		}
> +
>  		total += count;
>  
> 
> Executing the script at
> https://gist.github.com/chandanr/f2d147fa20a681e1508e182b5b7cdb00 provides the
> following output,
> 
> ...
> 
> ino 128 mode 040755 nlink 3 uid 0 gid 0 rdev 0
> blksize 4096 size 37 blocks 0 xflags 0 extsize 0
> atime Thu Jan  1 00:00:00.000000000 1970
> mtime Mon Mar  7 13:06:30.051339892 2022
> ctime Mon Mar  7 13:06:30.051339892 2022
> extents 0 0 gen 0
> DMI: event mask 0x00000000 state 0x0000
> 
> Skipping inode 131.
> 
> ino 132 mode 040755 nlink 2 uid 0 gid 0 rdev 0
> blksize 4096 size 97 blocks 0 xflags 0 extsize 0
> atime Mon Mar  7 13:06:30.051339892 2022
> mtime Mon Mar  7 13:06:30.083339892 2022
> ctime Mon Mar  7 13:06:30.083339892 2022
> extents 0 0 gen 548703887
> DMI: event mask 0x00000000 state 0x0000
> 
> ...
> 
> The above illustrates that userspace programs can be modified to use lastip to
> skip inodes which cause bulkstat ioctl to return with an error.

Yes, I know they can be modified to handle it - that is not the
concern here. The concern is that this new error can potentially
break the *unmodified* applications already out there. e.g. xfsdump
may just stop dumping a filesystem half way through because it
doesn't handle unexpected errors like this sanely. But we can't tie
a version of xfsdump to a specific kernel feature, so we have to
make sure that bulkstat from older builds of xfsdump will still
iterate through the entire filesystem without explicit EOVERFLOW
support...
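
To make the failure mode concrete, here is a toy userspace model, not
kernel or xfsdump code. Every name in it (toy_fs, toy_bulkstat(),
scan_all()) is made up for illustration; it only assumes the behaviour
discussed above, i.e. that an error is reported once no inodes were
formatted and that the cursor stays on the offending inode:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy filesystem: inode i needs 64-bit counters if big[i] is nonzero. */
struct toy_fs {
	const int	*big;
	size_t		ninodes;
};

/*
 * Toy bulkstat: format up to 'want' inodes starting at *cursor.
 * If 'clamp' is zero, an oversized inode ends the batch; the error is
 * returned only when no inodes were formatted, and the cursor is left
 * pointing at the offending inode.
 */
static int
toy_bulkstat(const struct toy_fs *fs, size_t *cursor, size_t want,
	     int clamp, size_t *ocount)
{
	*ocount = 0;
	while (*cursor < fs->ninodes && *ocount < want) {
		if (fs->big[*cursor] && !clamp)
			return *ocount > 0 ? 0 : -EOVERFLOW;
		(*cursor)++;
		(*ocount)++;
	}
	return 0;
}

/* Unmodified caller: loop until no more inodes or any error. */
static size_t
scan_all(const struct toy_fs *fs, int clamp)
{
	size_t	cursor = 0, seen = 0, count;

	while (toy_bulkstat(fs, &cursor, 4, clamp, &count) == 0 && count > 0)
		seen += count;
	return seen;
}
```

With one oversized inode at index 2 of 6, the error-returning variant
lets the unmodified loop see only the first two inodes, while the
clamping variant lets it see all six.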

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters
  2022-03-07 21:41           ` Dave Chinner
@ 2022-03-08  2:52             ` Chandan Babu R
  0 siblings, 0 replies; 53+ messages in thread
From: Chandan Babu R @ 2022-03-08  2:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, djwong

On 08 Mar 2022 at 03:11, Dave Chinner wrote:
> On Mon, Mar 07, 2022 at 07:16:57PM +0530, Chandan Babu R wrote:
>> On 07 Mar 2022 at 10:43, Dave Chinner wrote:
>> > On Sat, Mar 05, 2022 at 06:15:37PM +0530, Chandan Babu R wrote:
>> >> On 04 Mar 2022 at 13:39, Dave Chinner wrote:
>> >> > On Tue, Mar 01, 2022 at 04:09:36PM +0530, Chandan Babu R wrote:
>> >> >> @@ -102,7 +104,27 @@ xfs_bulkstat_one_int(
>> >> >>  
>> >> >>  	buf->bs_xflags = xfs_ip2xflags(ip);
>> >> >>  	buf->bs_extsize_blks = ip->i_extsize;
>> >> >> -	buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
>> >> >> +
>> >> >> +	nextents = xfs_ifork_nextents(&ip->i_df);
>> >> >> +	if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
>> >> >> +		xfs_extnum_t	max_nextents = XFS_MAX_EXTCNT_DATA_FORK_OLD;
>> >> >> +
>> >> >> +		if (unlikely(XFS_TEST_ERROR(false, mp,
>> >> >> +				XFS_ERRTAG_REDUCE_MAX_IEXTENTS)))
>> >> >> +			max_nextents = 10;
>> >> >> +
>> >> >> +		if (nextents > max_nextents) {
>> >> >> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
>> >> >> +			xfs_irele(ip);
>> >> >> +			error = -EOVERFLOW;
>> >> >> +			goto out;
>> >> >> +		}
>> >> >
>> >> > This just seems wrong. This will cause a total abort of the bulkstat
>> >> > pass which will just be completely unexpected by any application
>> >> > that does not know about 64 bit extent counts. Most of them likely
>> >> > don't even care about the extent count in the data being returned.
>> >> >
>> >> > Really, I think this should just set the extent count to the MAX
>> >> > number and just continue onwards, otherwise existing application
>> >> > will not be able to bulkstat a filesystem with large extents counts
>> >> > in it at all.
>> >> >
>> >> 
>> >> Actually, I don't know much about how applications use bulkstat. I am
>> >> dependent on guidance from other developers who are well versed on this
>> >> topic. I will change the code to return maximum extent count if the value
>> >> overflows older extent count limits.
>> >
>> > They tend to just run in a loop until either no more inodes are to
>> > be found or an error occurs. bulkstat loops don't expect errors to
>> > be reported - it's hard to do something based on all inodes if you
>> > get errors reading the inodes part way through. There's no way for
>> > the application to tell where it should restart scanning - the
>> > bulkstat iteration cookie is controlled by the kernel, and I don't
>> > think we update it on error.
>> 
>> xfs_bulkstat() has the following,
>> 
>>         kmem_free(bc.buf);
>> 
>>         /*
>>          * We found some inodes, so clear the error status and return them.
>>          * The lastino pointer will point directly at the inode that triggered
>>          * any error that occurred, so on the next call the error will be
>>          * triggered again and propagated to userspace as there will be no
>>          * formatted inodes in the buffer.
>>          */
>>         if (breq->ocount > 0)
>>                 error = 0;
>> 
>>         return error;
>> 
>> The above will help the userspace process to issue another bulkstat call which
>> begins from the inode causing an error.
>
> And then it returns with a cookie pointing at the overflowed inode,
> and we try that one first on the next loop, triggering -EOVERFLOW
> with breq->ocount == 0.
>
> Or maybe we have two inodes in a row that trigger EOVERFLOW, so even
> if we skip the first and return to userspace, we trip the second on
> the next call and boom...
>
>> > e.g. see fstests src/bstat.c and src/bulkstat_unlink_test*.c - they
>> > simply abort if bulkstat fails. Same goes for xfsdump common/util.c
>> > and dump/content.c - they just error out and return and don't try to
>> > continue further.
>> 
>> I made the following changes to src/bstat.c,
>> 
>> diff --git a/src/bstat.c b/src/bstat.c
>> index 3f3dc2c6..0e72190e 100644
>> --- a/src/bstat.c
>> +++ b/src/bstat.c
>> @@ -143,7 +143,19 @@ main(int argc, char **argv)
>>  	bulkreq.ubuffer = t;
>>  	bulkreq.ocount  = &count;
>>  
>> -	while ((ret = xfsctl(name, fsfd, XFS_IOC_FSBULKSTAT, &bulkreq)) == 0) {
>> +	while (1) {
>> +		ret = xfsctl(name, fsfd, XFS_IOC_FSBULKSTAT, &bulkreq);
>> +		if (ret == -1) {
>> +			if (errno == EOVERFLOW) {
>> +				printf("Skipping inode %llu.\n",  last+1);
>> +				++last;
>> +				continue;
>> +			}
>> +
>> +			perror("xfsctl");
>> +			exit(1);
>> +		}
>> +
>>  		total += count;
>>  
>> 
>> Executing the script at
>> https://gist.github.com/chandanr/f2d147fa20a681e1508e182b5b7cdb00 provides the
>> following output,
>> 
>> ...
>> 
>> ino 128 mode 040755 nlink 3 uid 0 gid 0 rdev 0
>> blksize 4096 size 37 blocks 0 xflags 0 extsize 0
>> atime Thu Jan  1 00:00:00.000000000 1970
>> mtime Mon Mar  7 13:06:30.051339892 2022
>> ctime Mon Mar  7 13:06:30.051339892 2022
>> extents 0 0 gen 0
>> DMI: event mask 0x00000000 state 0x0000
>> 
>> Skipping inode 131.
>> 
>> ino 132 mode 040755 nlink 2 uid 0 gid 0 rdev 0
>> blksize 4096 size 97 blocks 0 xflags 0 extsize 0
>> atime Mon Mar  7 13:06:30.051339892 2022
>> mtime Mon Mar  7 13:06:30.083339892 2022
>> ctime Mon Mar  7 13:06:30.083339892 2022
>> extents 0 0 gen 548703887
>> DMI: event mask 0x00000000 state 0x0000
>> 
>> ...
>> 
>> The above illustrates that userspace programs can be modified to use lastip to
>> skip inodes which cause bulkstat ioctl to return with an error.
>
> Yes, I know they can be modified to handle it - that is not the
> concern here. The concern is that this new error can potentially
> break the *unmodified* applications already out there. e.g. xfsdump
> may just stop dumping a filesystem half way through because it
> doesn't handle unexpected errors like this sanely. But we can't tie
> a version of xfsdump to a specific kernel feature, so we have to
> make sure that bulkstat from older builds of xfsdump will still
> iterate through the entire filesystem without explicit EOVERFLOW
> support...

Ok. Thanks for the clarification.

-- 
chandan

end of thread, other threads:[~2022-03-08  2:53 UTC | newest]

Thread overview: 53+ messages
2022-03-01 10:39 [PATCH V7 00/17] xfs: Extend per-inode extent counters Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 01/17] xfs: Move extent count limits to xfs_format.h Chandan Babu R
2022-03-04  0:55   ` Dave Chinner
2022-03-01 10:39 ` [PATCH V7 02/17] xfs: Introduce xfs_iext_max_nextents() helper Chandan Babu R
2022-03-04  0:56   ` Dave Chinner
2022-03-01 10:39 ` [PATCH V7 03/17] xfs: Use xfs_extnum_t instead of basic data types Chandan Babu R
2022-03-04  0:59   ` Dave Chinner
2022-03-04  1:30     ` Dave Chinner
2022-03-01 10:39 ` [PATCH V7 04/17] xfs: Introduce xfs_dfork_nextents() helper Chandan Babu R
2022-03-04  1:43   ` Dave Chinner
2022-03-05 12:42     ` Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 05/17] xfs: Use basic types to define xfs_log_dinode's di_nextents and di_anextents Chandan Babu R
2022-03-04  1:44   ` Dave Chinner
2022-03-01 10:39 ` [PATCH V7 06/17] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively Chandan Babu R
2022-03-04  1:29   ` Dave Chinner
2022-03-05 12:43     ` Chandan Babu R
2022-03-07  4:55       ` Dave Chinner
2022-03-01 10:39 ` [PATCH V7 07/17] xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit Chandan Babu R
2022-03-04  1:57   ` Dave Chinner
2022-03-05 12:43     ` Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 08/17] xfs: Introduce XFS_FSOP_GEOM_FLAGS_NREXT64 Chandan Babu R
2022-03-04  1:58   ` Dave Chinner
2022-03-01 10:39 ` [PATCH V7 09/17] xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 10/17] xfs: Use xfs_rfsblock_t to count maximum blocks that can be used by BMBT Chandan Babu R
2022-03-04  2:09   ` Dave Chinner
2022-03-05 12:44     ` Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 11/17] xfs: Introduce macros to represent new maximum extent counts for data/attr forks Chandan Babu R
2022-03-04  2:32   ` Dave Chinner
2022-03-05 12:44     ` Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 12/17] xfs: Introduce per-inode 64-bit extent counters Chandan Babu R
2022-03-04  7:14   ` Dave Chinner
2022-03-05 12:44     ` Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 13/17] xfs: xfs_growfs_rt_alloc: Unlock inode explicitly rather than through iop_committing() Chandan Babu R
2022-03-02  0:26   ` Darrick J. Wong
2022-03-04  7:25   ` Dave Chinner
2022-03-05 12:44     ` Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 14/17] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters Chandan Babu R
2022-03-04  7:51   ` Dave Chinner
2022-03-05 12:45     ` Chandan Babu R
2022-03-07  5:02       ` Dave Chinner
2022-03-07 10:20         ` Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 15/17] xfs: Enable bulkstat ioctl to support 64-bit per-inode " Chandan Babu R
2022-03-02  0:31   ` Darrick J. Wong
2022-03-04  8:09   ` Dave Chinner
2022-03-05 12:45     ` Chandan Babu R
2022-03-07  5:13       ` Dave Chinner
2022-03-07 13:46         ` Chandan Babu R
2022-03-07 21:41           ` Dave Chinner
2022-03-08  2:52             ` Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 16/17] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to the list of supported flags Chandan Babu R
2022-03-01 10:39 ` [PATCH V7 17/17] xfs: Define max extent length based on on-disk format definition Chandan Babu R
2022-03-04  8:15   ` Dave Chinner
2022-03-05 12:45     ` Chandan Babu R