linux-xfs.vger.kernel.org archive mirror
* [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow
@ 2020-11-17 13:44 Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 01/14] xfs: Add helper for checking per-inode extent count overflow Chandan Babu R
                   ` (13 more replies)
  0 siblings, 14 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC
  To: linux-xfs; +Cc: Chandan Babu R, darrick.wong

XFS does not check for possible overflow of the per-inode extent counter
fields when adding extents to either the data or attr fork.

For example:
1. Insert 5 million xattrs (each having a value size of 255 bytes) and
   then delete 50% of them in an alternating manner.

2. On a 4k block sized XFS filesystem instance, the above causes 98511
   extents to be created in the attr fork of the inode.

   xfsaild/loop0  2008 [003]  1475.127209: probe:xfs_inode_to_disk: (ffffffffa43fb6b0) if_nextents=98511 i_ino=131

3. The incore inode fork extent counter is a signed 32-bit
   quantity. However, the on-disk extent counter is an unsigned 16-bit
   quantity and hence cannot hold 98511 extents.

4. The following incorrect value is stored in the xattr extent counter:
   # xfs_db -f -c 'inode 131' -c 'print core.naextents' /dev/loop0
   core.naextents = -32561
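
The arithmetic behind that value can be reproduced in userspace. The
sketch below is an illustration only, not kernel code, and the helper
names are made up for this example. It keeps the low 16 bits of the
incore count and then reads those bits back as a two's-complement
signed 16-bit value, which matches what xfs_db prints above.

```c
#include <stdint.h>

/* Keep only the low 16 bits, as a 16-bit on-disk field would. */
static uint16_t truncate_to_16(int32_t incore_count)
{
	return (uint16_t)((uint32_t)incore_count & 0xffffu);
}

/* Reinterpret those 16 bits as a two's-complement signed value. */
static int32_t as_signed_16(uint16_t raw)
{
	return raw >= 0x8000u ? (int32_t)raw - 0x10000 : (int32_t)raw;
}
```

For 98511 (0x180cf) the low 16 bits are 0x80cf, i.e. 32975 unsigned,
and 32975 - 65536 = -32561 when read as a signed quantity.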

This patchset adds a new helper function
(i.e. xfs_iext_count_may_overflow()) to check for overflow of the
per-inode data and xattr extent counters and invokes it before
starting an fs operation (e.g. creating a new directory entry). With
this patchset applied, XFS detects counter overflows and returns with
an error rather than causing a silent corruption.

The patchset has been tested by executing xfstests with the following
mkfs.xfs options:
1. -m crc=0 -b size=1k
2. -m crc=0 -b size=4k
3. -m crc=0 -b size=512
4. -m rmapbt=1,reflink=1 -b size=1k
5. -m rmapbt=1,reflink=1 -b size=4k

The patches can also be obtained from
https://github.com/chandanr/linux.git at branch xfs-reserve-extent-count-v11.

I have two patches that define the newly introduced error injection
tags in xfsprogs
(https://lore.kernel.org/linux-xfs/20201104114900.172147-1-chandanrlinux@gmail.com/).

I have also written tests
(https://github.com/chandanr/xfstests/commits/extent-overflow-tests)
for verifying the checks introduced in the kernel.

Changelog:
V10 -> V11:
  1. For directory/xattr insert operations we now reserve enough
     extent count to guarantee that a future directory/xattr remove
     operation can succeed.
  2. The pseudo max extent count value has been increased to 35.

V9 -> V10:
  1. Pull back changes which cause xfs_bmap_compute_alignments() to
     return "stripe alignment" into 12th patch i.e. "xfs: Compute bmap
     extent alignments in a separate function".

V8 -> V9:
  1. Enabling XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag will
     always allocate single block sized free extents (if
     available).
  2. xfs_bmap_compute_alignments() now returns stripe alignment as its
     return value.
  3. Dropped Allison's RVB tag for "xfs: Compute bmap extent
     alignments in a separate function" and "xfs: Introduce error
     injection to allocate only minlen size extents for files".

V7 -> V8:
  1. Rename local variable in xfs_alloc_fix_freelist() from "i" to "stat".

V6 -> V7:
  1. Create new function xfs_bmap_exact_minlen_extent_alloc() (enabled
     only when CONFIG_XFS_DEBUG is set to y) which issues allocation
     requests for minlen sized extents only. In order to achieve this,
     common code from xfs_bmap_btalloc() has been refactored into new
     functions.
  2. All major functions implementing logic associated with
     XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag are compiled only
     when CONFIG_XFS_DEBUG is set to y.
  3. Remove XFS_IEXT_REFLINK_REMAP_CNT macro and replace it with an
     integer which holds the number of new extents to be
     added to the data fork.

V5 -> V6:
  1. Rebased the patchset on xfs-linux/for-next branch.
  2. Drop "xfs: Set tp->t_firstblock only once during a transaction's
     lifetime" patch from the patchset.
  3. Add a comment to xfs_bmap_btalloc() describing why it was chosen
     to start "free space extent search" from AG 0 when
     XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT is enabled and when the
     transaction is allocating its first extent.
  4. Fix review comments associated with coding style.

V4 -> V5:
  1. Introduce new error tag XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT to
     let user space programs guarantee that free space requests for
     files are satisfied by allocating minlen sized extents.
  2. Change xfs_bmap_btalloc() and xfs_alloc_vextent() to allocate
     minlen sized extents when XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT is
     enabled.
  3. Introduce a new patch that causes tp->t_firstblock to be assigned
     to a value only when its previous value is NULLFSBLOCK.
  4. Replace the previously introduced MAXERRTAGEXTNUM (maximum inode
     fork extent count) with the hardcoded value of 10.
  5. xfs_bui_item_recover(): Use XFS_IEXT_ADD_NOSPLIT_CNT when mapping
     an extent.
  6. xfs_swap_extent_rmap(): Use xfs_bmap_is_real_extent() instead of
     xfs_bmap_is_update_needed() to assess if the extent really needs
     to be swapped.

V3 -> V4:
  1. Introduce new patch which lets userspace programs test "extent
     count overflow detection" by injecting an error tag. The new
     error tag reduces the maximum allowed extent count to 10.
  2. Injecting the newly defined error tag prevents
     xfs_bmap_add_extent_hole_real() from merging a new extent with
     its neighbours to allow writing deterministic tests for testing
     extent count overflow for Directories, Xattr and growing realtime
     devices. This is required because the new extent being allocated
     can be contiguous with its neighbours (w.r.t both file and disk
     offsets).
  3. Injecting the newly defined error tag forces block sized extents
     to be allocated for summary/bitmap files when growing a realtime
     device. This is required because xfs_growfs_rt_alloc() allocates
     as large an extent as possible for summary/bitmap files and hence
     it would be impossible to write deterministic tests.
  4. Rename XFS_IEXT_REMOVE_CNT to XFS_IEXT_PUNCH_HOLE_CNT to reflect
     the actual meaning of the fs operation.
  5. Fold XFS_IEXT_INSERT_HOLE_CNT code into that associated with
     XFS_IEXT_PUNCH_HOLE_CNT since both perform the same job.
  6. xfs_swap_extent_rmap(): Check for extent overflow should be made
     on the source file only if the donor file extent has a valid
     on-disk mapping and vice versa.

V2 -> V3:
  1. Move the definition of xfs_iext_count_may_overflow() from
     libxfs/xfs_trans_resv.c to libxfs/xfs_inode_fork.c. Also, I tried
     to make xfs_iext_count_may_overflow() an inline function by
     placing the definition in libxfs/xfs_inode_fork.h. However this
     required that the definition of 'struct xfs_inode' be available,
     since xfs_iext_count_may_overflow() uses a 'struct xfs_inode *'
     type variable.
  2. Handle XFS_COW_FORK within xfs_iext_count_may_overflow() by
     returning a success value.
  3. Rename XFS_IEXT_ADD_CNT to XFS_IEXT_ADD_NOSPLIT_CNT. Thanks to
     Darrick for suggesting the new name.
  4. Expand comments to make use of 80 columns.

V1 -> V2:
  1. Rename helper function from xfs_trans_resv_ext_cnt() to
     xfs_iext_count_may_overflow().
  2. Define and use macros to represent fs operations and the
     corresponding increase in extent count.
  3. Split the patches based on the fs operation being performed.

Chandan Babu R (14):
  xfs: Add helper for checking per-inode extent count overflow
  xfs: Check for extent overflow when trivally adding a new extent
  xfs: Check for extent overflow when punching a hole
  xfs: Check for extent overflow when adding/removing xattrs
  xfs: Check for extent overflow when adding/removing dir entries
  xfs: Check for extent overflow when writing to unwritten extent
  xfs: Check for extent overflow when moving extent from cow to data
    fork
  xfs: Check for extent overflow when remapping an extent
  xfs: Check for extent overflow when swapping extents
  xfs: Introduce error injection to reduce maximum inode fork extent
    count
  xfs: Remove duplicate assert statement in xfs_bmap_btalloc()
  xfs: Compute bmap extent alignments in a separate function
  xfs: Process allocated extent in a separate function
  xfs: Introduce error injection to allocate only minlen size extents
    for files

 fs/xfs/libxfs/xfs_alloc.c      |  50 +++++++
 fs/xfs/libxfs/xfs_alloc.h      |   3 +
 fs/xfs/libxfs/xfs_attr.c       |  20 +++
 fs/xfs/libxfs/xfs_bmap.c       | 264 +++++++++++++++++++++++----------
 fs/xfs/libxfs/xfs_errortag.h   |   6 +-
 fs/xfs/libxfs/xfs_inode_fork.c |  27 ++++
 fs/xfs/libxfs/xfs_inode_fork.h |  62 ++++++++
 fs/xfs/xfs_bmap_item.c         |  10 ++
 fs/xfs/xfs_bmap_util.c         |  31 ++++
 fs/xfs/xfs_dquot.c             |   8 +-
 fs/xfs/xfs_error.c             |   6 +
 fs/xfs/xfs_inode.c             |  27 ++++
 fs/xfs/xfs_iomap.c             |  10 ++
 fs/xfs/xfs_reflink.c           |  16 ++
 fs/xfs/xfs_rtalloc.c           |   5 +
 fs/xfs/xfs_symlink.c           |   5 +
 16 files changed, 472 insertions(+), 78 deletions(-)

-- 
2.28.0


* [PATCH V11 01/14] xfs: Add helper for checking per-inode extent count overflow
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 02/14] xfs: Check for extent overflow when trivally adding a new extent Chandan Babu R
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC
  To: linux-xfs
  Cc: Chandan Babu R, darrick.wong, Allison Henderson, Christoph Hellwig

XFS does not check for possible overflow of the per-inode extent counter
fields when adding extents to either the data or attr fork.

For example:
1. Insert 5 million xattrs (each having a value size of 255 bytes) and
   then delete 50% of them in an alternating manner.

2. On a 4k block sized XFS filesystem instance, the above causes 98511
   extents to be created in the attr fork of the inode.

   xfsaild/loop0  2008 [003]  1475.127209: probe:xfs_inode_to_disk: (ffffffffa43fb6b0) if_nextents=98511 i_ino=131

3. The incore inode fork extent counter is a signed 32-bit
   quantity. However, the on-disk extent counter is an unsigned 16-bit
   quantity and hence cannot hold 98511 extents.

4. The following incorrect value is stored in the attr extent counter:
   # xfs_db -f -c 'inode 131' -c 'print core.naextents' /dev/loop0
   core.naextents = -32561

This commit adds a new helper function (i.e.
xfs_iext_count_may_overflow()) to check for overflow of the per-inode
data and xattr extent counters. Future patches will use this function to
make sure that an FS operation won't cause the extent counter to
overflow.

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_inode_fork.c | 23 +++++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_fork.h |  2 ++
 2 files changed, 25 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 7575de5cecb1..8d48716547e5 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -23,6 +23,7 @@
 #include "xfs_da_btree.h"
 #include "xfs_dir2_priv.h"
 #include "xfs_attr_leaf.h"
+#include "xfs_types.h"
 
 kmem_zone_t *xfs_ifork_zone;
 
@@ -728,3 +729,25 @@ xfs_ifork_verify_local_attr(
 
 	return 0;
 }
+
+int
+xfs_iext_count_may_overflow(
+	struct xfs_inode	*ip,
+	int			whichfork,
+	int			nr_to_add)
+{
+	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
+	uint64_t		max_exts;
+	uint64_t		nr_exts;
+
+	if (whichfork == XFS_COW_FORK)
+		return 0;
+
+	max_exts = (whichfork == XFS_ATTR_FORK) ? MAXAEXTNUM : MAXEXTNUM;
+
+	nr_exts = ifp->if_nextents + nr_to_add;
+	if (nr_exts < ifp->if_nextents || nr_exts > max_exts)
+		return -EFBIG;
+
+	return 0;
+}
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index a4953e95c4f3..0beb8e2a00be 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -172,5 +172,7 @@ extern void xfs_ifork_init_cow(struct xfs_inode *ip);
 
 int xfs_ifork_verify_local_data(struct xfs_inode *ip);
 int xfs_ifork_verify_local_attr(struct xfs_inode *ip);
+int xfs_iext_count_may_overflow(struct xfs_inode *ip, int whichfork,
+		int nr_to_add);
 
 #endif	/* __XFS_INODE_FORK_H__ */
-- 
2.28.0


* [PATCH V11 02/14] xfs: Check for extent overflow when trivally adding a new extent
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 01/14] xfs: Add helper for checking per-inode extent count overflow Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 03/14] xfs: Check for extent overflow when punching a hole Chandan Babu R
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC
  To: linux-xfs
  Cc: Chandan Babu R, darrick.wong, Christoph Hellwig, Allison Henderson

When adding a new data extent (without modifying an inode's existing
extents) the extent count increases only by 1. This commit checks for
extent count overflow in such cases.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_bmap.c       | 6 ++++++
 fs/xfs/libxfs/xfs_inode_fork.h | 6 ++++++
 fs/xfs/xfs_bmap_item.c         | 7 +++++++
 fs/xfs/xfs_bmap_util.c         | 5 +++++
 fs/xfs/xfs_dquot.c             | 8 +++++++-
 fs/xfs/xfs_iomap.c             | 5 +++++
 fs/xfs/xfs_rtalloc.c           | 5 +++++
 7 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index d9a692484eae..505358839d2f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4527,6 +4527,12 @@ xfs_bmapi_convert_delalloc(
 		return error;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	error = xfs_iext_count_may_overflow(ip, whichfork,
+			XFS_IEXT_ADD_NOSPLIT_CNT);
+	if (error)
+		goto out_trans_cancel;
+
 	xfs_trans_ijoin(tp, ip, 0);
 
 	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) ||
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 0beb8e2a00be..7fc2b129a2e7 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -34,6 +34,12 @@ struct xfs_ifork {
 #define	XFS_IFEXTENTS	0x02	/* All extent pointers are read in */
 #define	XFS_IFBROOT	0x04	/* i_broot points to the bmap b-tree root */
 
+/*
+ * Worst-case increase in the fork extent count when we're adding a single
+ * extent to a fork and there's no possibility of splitting an existing mapping.
+ */
+#define XFS_IEXT_ADD_NOSPLIT_CNT	(1)
+
 /*
  * Fork handling.
  */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 9e16a4d0f97c..1610d6ad089b 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -497,6 +497,13 @@ xfs_bui_item_recover(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
+	if (bui_type == XFS_BMAP_MAP) {
+		error = xfs_iext_count_may_overflow(ip, whichfork,
+				XFS_IEXT_ADD_NOSPLIT_CNT);
+		if (error)
+			goto err_cancel;
+	}
+
 	count = bmap->me_len;
 	error = xfs_trans_log_finish_bmap_update(tp, budp, bui_type, ip,
 			whichfork, bmap->me_startoff, bmap->me_startblock,
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index f2a8a0e75e1f..dcd6e61df711 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -822,6 +822,11 @@ xfs_alloc_file_space(
 		if (error)
 			goto error1;
 
+		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+				XFS_IEXT_ADD_NOSPLIT_CNT);
+		if (error)
+			goto error0;
+
 		xfs_trans_ijoin(tp, ip, 0);
 
 		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 1d95ed387d66..175f544f7c45 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -314,8 +314,14 @@ xfs_dquot_disk_alloc(
 		return -ESRCH;
 	}
 
-	/* Create the block mapping. */
 	xfs_trans_ijoin(tp, quotip, XFS_ILOCK_EXCL);
+
+	error = xfs_iext_count_may_overflow(quotip, XFS_DATA_FORK,
+			XFS_IEXT_ADD_NOSPLIT_CNT);
+	if (error)
+		return error;
+
+	/* Create the block mapping. */
 	error = xfs_bmapi_write(tp, quotip, dqp->q_fileoffset,
 			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA, 0, &map,
 			&nmaps);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 3abb8b9d6f4c..a302a96823b8 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -250,6 +250,11 @@ xfs_iomap_write_direct(
 	if (error)
 		goto out_trans_cancel;
 
+	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+			XFS_IEXT_ADD_NOSPLIT_CNT);
+	if (error)
+		goto out_trans_cancel;
+
 	xfs_trans_ijoin(tp, ip, 0);
 
 	/*
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 1c3969807fb9..45ef7fa69e1d 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -804,6 +804,11 @@ xfs_growfs_rt_alloc(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
+		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+				XFS_IEXT_ADD_NOSPLIT_CNT);
+		if (error)
+			goto out_trans_cancel;
+
 		/*
 		 * Allocate blocks to the bitmap file.
 		 */
-- 
2.28.0


* [PATCH V11 03/14] xfs: Check for extent overflow when punching a hole
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 01/14] xfs: Add helper for checking per-inode extent count overflow Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 02/14] xfs: Check for extent overflow when trivally adding a new extent Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 04/14] xfs: Check for extent overflow when adding/removing xattrs Chandan Babu R
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC
  To: linux-xfs
  Cc: Chandan Babu R, darrick.wong, Christoph Hellwig, Allison Henderson

The extent mapping the file offset at which a hole has to be
inserted will be split into two extents, causing the extent count to
increase by 1.
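
As an illustration only (a userspace model, not the kernel's bmap
code), the split can be sketched as below. The struct and function
names are hypothetical. Punching a range out of the middle of one
extent leaves a left piece and a right piece, so one extent becomes
two.

```c
#include <stdint.h>

struct extent {
	uint64_t off;	/* starting file offset, in blocks */
	uint64_t len;	/* length, in blocks */
};

/*
 * Punch [hole_off, hole_off + hole_len) out of a single extent.
 * Writes the surviving pieces to out[] and returns how many there are.
 */
static int punch_hole(struct extent ext, uint64_t hole_off,
		      uint64_t hole_len, struct extent out[2])
{
	uint64_t ext_end = ext.off + ext.len;
	uint64_t hole_end = hole_off + hole_len;
	int n = 0;

	/* No overlap: the extent survives untouched. */
	if (hole_len == 0 || hole_end <= ext.off || hole_off >= ext_end) {
		out[n++] = ext;
		return n;
	}

	/* Piece to the left of the hole, if any. */
	if (hole_off > ext.off) {
		out[n].off = ext.off;
		out[n].len = hole_off - ext.off;
		n++;
	}

	/* Piece to the right of the hole, if any. */
	if (hole_end < ext_end) {
		out[n].off = hole_end;
		out[n].len = ext_end - hole_end;
		n++;
	}

	return n;
}
```

Only a hole strictly inside the extent yields two pieces; a hole that
touches either end leaves at most one, so +1 is the worst case.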

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_inode_fork.h |  7 +++++++
 fs/xfs/xfs_bmap_item.c         | 15 +++++++++------
 fs/xfs/xfs_bmap_util.c         | 10 ++++++++++
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 7fc2b129a2e7..bcac769a7df6 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -40,6 +40,13 @@ struct xfs_ifork {
  */
 #define XFS_IEXT_ADD_NOSPLIT_CNT	(1)
 
+/*
+ * Punching out an extent from the middle of an existing extent can cause the
+ * extent count to increase by 1.
+ * i.e. | Old extent | Hole | Old extent |
+ */
+#define XFS_IEXT_PUNCH_HOLE_CNT		(1)
+
 /*
  * Fork handling.
  */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 1610d6ad089b..80d828394158 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -439,6 +439,7 @@ xfs_bui_item_recover(
 	xfs_exntst_t			state;
 	unsigned int			bui_type;
 	int				whichfork;
+	int				iext_delta;
 	int				error = 0;
 
 	/* Only one mapping operation per BUI... */
@@ -497,12 +498,14 @@ xfs_bui_item_recover(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	if (bui_type == XFS_BMAP_MAP) {
-		error = xfs_iext_count_may_overflow(ip, whichfork,
-				XFS_IEXT_ADD_NOSPLIT_CNT);
-		if (error)
-			goto err_cancel;
-	}
+	if (bui_type == XFS_BMAP_MAP)
+		iext_delta = XFS_IEXT_ADD_NOSPLIT_CNT;
+	else
+		iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
+
+	error = xfs_iext_count_may_overflow(ip, whichfork, iext_delta);
+	if (error)
+		goto err_cancel;
 
 	count = bmap->me_len;
 	error = xfs_trans_log_finish_bmap_update(tp, budp, bui_type, ip,
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index dcd6e61df711..0776abd0103c 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -891,6 +891,11 @@ xfs_unmap_extent(
 
 	xfs_trans_ijoin(tp, ip, 0);
 
+	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+			XFS_IEXT_PUNCH_HOLE_CNT);
+	if (error)
+		goto out_trans_cancel;
+
 	error = xfs_bunmapi(tp, ip, startoffset_fsb, len_fsb, 0, 2, done);
 	if (error)
 		goto out_trans_cancel;
@@ -1176,6 +1181,11 @@ xfs_insert_file_space(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
+	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+			XFS_IEXT_PUNCH_HOLE_CNT);
+	if (error)
+		goto out_trans_cancel;
+
 	/*
 	 * The extent shifting code works on extent granularity. So, if stop_fsb
 	 * is not the starting block of extent, we need to split the extent at
-- 
2.28.0


* [PATCH V11 04/14] xfs: Check for extent overflow when adding/removing xattrs
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (2 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 03/14] xfs: Check for extent overflow when punching a hole Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-12-03 18:45   ` Darrick J. Wong
  2020-11-17 13:44 ` [PATCH V11 05/14] xfs: Check for extent overflow when adding/removing dir entries Chandan Babu R
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC
  To: linux-xfs; +Cc: Chandan Babu R, darrick.wong

Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to be
added. One extra extent may be needed for the dabtree in case a local
attr is large enough to cause a double split. It can also cause the
extent count to increase in proportion to the size of a remote xattr's
value.

To be able to always remove an existing xattr, when adding an xattr we
make sure to reserve the inode fork extent count required for removing
a max-sized xattr in addition to that required by the xattr add
operation.
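
A rough userspace sketch of how the reservation adds up follows. It is
not the kernel code: the SKETCH_* constants are assumed values for
XFS_DA_NODE_MAXDEPTH (5) and XFS_XATTR_SIZE_MAX (65536), and
sketch_rmt_blocks() is a simplification of xfs_attr3_rmt_blocks() that
just rounds the value length up to whole blocks and ignores any
per-block remote attr header.

```c
#include <stdint.h>

#define SKETCH_DA_NODE_MAXDEPTH	5	/* assumed XFS_DA_NODE_MAXDEPTH */
#define SKETCH_XATTR_SIZE_MAX	65536u	/* assumed XFS_XATTR_SIZE_MAX */

/* Simplified stand-in: blocks needed to hold a remote value. */
static int sketch_rmt_blocks(uint32_t blocksize, uint32_t valuelen)
{
	return (int)((valuelen + blocksize - 1) / blocksize);
}

/* Mirrors the shape of XFS_IEXT_ATTR_MANIP_CNT(rmt_blks). */
static int sketch_attr_manip_cnt(int rmt_blks)
{
	return SKETCH_DA_NODE_MAXDEPTH + (rmt_blks > 1 ? rmt_blks : 1);
}

/* Removal: reserve for removing a max-sized xattr. */
static int sketch_attr_remove_cnt(uint32_t blocksize)
{
	return sketch_attr_manip_cnt(
			sketch_rmt_blocks(blocksize, SKETCH_XATTR_SIZE_MAX));
}

/* Add: the removal reservation plus the cost of the add itself. */
static int sketch_attr_add_cnt(uint32_t blocksize, uint32_t valuelen,
			       int local)
{
	int rmt_blks = local ? 0 : sketch_rmt_blocks(blocksize, valuelen);

	return sketch_attr_remove_cnt(blocksize) +
	       sketch_attr_manip_cnt(rmt_blks);
}
```

Under these assumptions, with 4k blocks a removal checks for 5 + 16 =
21 extents, and adding a small local xattr checks for 21 + 6 = 27.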

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_attr.c       | 20 ++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_fork.h | 10 ++++++++++
 2 files changed, 30 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index fd8e6418a0d3..d53b3867b308 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -396,6 +396,8 @@ xfs_attr_set(
 	struct xfs_trans_res	tres;
 	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
 	int			error, local;
+	int			iext_cnt;
+	int			rmt_blks;
 	unsigned int		total;
 
 	if (XFS_FORCED_SHUTDOWN(dp->i_mount))
@@ -416,6 +418,9 @@ xfs_attr_set(
 	 */
 	args->op_flags = XFS_DA_OP_OKNOENT;
 
+	rmt_blks = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
+	iext_cnt = XFS_IEXT_ATTR_MANIP_CNT(rmt_blks);
+
 	if (args->value) {
 		XFS_STATS_INC(mp, xs_attr_set);
 
@@ -442,6 +447,13 @@ xfs_attr_set(
 		tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
 		tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
 		total = args->total;
+
+		if (local)
+			rmt_blks = 0;
+		else
+			rmt_blks = xfs_attr3_rmt_blocks(mp, args->valuelen);
+
+		iext_cnt += XFS_IEXT_ATTR_MANIP_CNT(rmt_blks);
 	} else {
 		XFS_STATS_INC(mp, xs_attr_remove);
 
@@ -460,6 +472,14 @@ xfs_attr_set(
 
 	xfs_ilock(dp, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(args->trans, dp, 0);
+
+	if (args->value || xfs_inode_hasattr(dp)) {
+		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
+				iext_cnt);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	if (args->value) {
 		unsigned int	quota_flags = XFS_QMOPT_RES_REGBLKS;
 
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index bcac769a7df6..5de2f07d0dd5 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -47,6 +47,16 @@ struct xfs_ifork {
  */
 #define XFS_IEXT_PUNCH_HOLE_CNT		(1)
 
+/*
+ * Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to
+ * be added. One extra extent for dabtree in case a local attr is
+ * large enough to cause a double split.  It can also cause extent
+ * count to increase proportional to the size of a remote xattr's
+ * value.
+ */
+#define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
+	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
+
 /*
  * Fork handling.
  */
-- 
2.28.0


* [PATCH V11 05/14] xfs: Check for extent overflow when adding/removing dir entries
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (3 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 04/14] xfs: Check for extent overflow when adding/removing xattrs Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-12-03 19:04   ` Darrick J. Wong
  2020-11-17 13:44 ` [PATCH V11 06/14] xfs: Check for extent overflow when writing to unwritten extent Chandan Babu R
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC
  To: linux-xfs; +Cc: Chandan Babu R, darrick.wong

Directory entry addition/removal can cause the following:
1. Data block can be added/removed.
   A new extent can cause extent count to increase by 1.
2. Free disk block can be added/removed.
   Same behaviour as described above for Data block.
3. Dabtree blocks.
   XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these
   can be new extents. Hence extent count can increase by
   XFS_DA_NODE_MAXDEPTH.

To be able to always remove an existing directory entry, when adding a
new directory entry we make sure to reserve the inode fork extent count
required for removing a directory entry in addition to that required for
the directory entry add operation.
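
The resulting numbers can be worked out with a small userspace sketch.
This is an illustration, not the kernel macro itself: the function
names are hypothetical, SKETCH_DA_NODE_MAXDEPTH is an assumed value
for XFS_DA_NODE_MAXDEPTH, and fsbcount stands for the number of
filesystem blocks per directory block.

```c
#define SKETCH_DA_NODE_MAXDEPTH	5	/* assumed XFS_DA_NODE_MAXDEPTH */

/*
 * Worst case for one directory add or remove: dabtree blocks, plus one
 * data block, plus one free block, each possibly spanning fsbcount
 * filesystem blocks that all land in separate extents.
 */
static int sketch_dir_manip_cnt(int fsbcount)
{
	return (SKETCH_DA_NODE_MAXDEPTH + 1 + 1) * fsbcount;
}

/* An add also reserves for a later remove, hence twice the count. */
static int sketch_dir_add_cnt(int fsbcount)
{
	return sketch_dir_manip_cnt(fsbcount) << 1;
}
```

With one filesystem block per directory block this gives 7 for a
remove and 14 for an add, under the assumed depth of 5.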

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_inode_fork.h | 13 +++++++++++++
 fs/xfs/xfs_inode.c             | 27 +++++++++++++++++++++++++++
 fs/xfs/xfs_symlink.c           |  5 +++++
 3 files changed, 45 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 5de2f07d0dd5..fd93fdc67ee4 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -57,6 +57,19 @@ struct xfs_ifork {
 #define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
 	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
 
+/*
+ * Directory entry addition/removal can cause the following,
+ * 1. Data block can be added/removed.
+ *    A new extent can cause extent count to increase by 1.
+ * 2. Free disk block can be added/removed.
+ *    Same behaviour as described above for Data block.
+ * 3. Dabtree blocks.
+ *    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
+ *    extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
+ */
+#define XFS_IEXT_DIR_MANIP_CNT(mp) \
+	((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
+
 /*
  * Fork handling.
  */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2bfbcf28b1bd..f7b0b7fce940 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1177,6 +1177,11 @@ xfs_create(
 	if (error)
 		goto out_trans_cancel;
 
+	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
+			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
+	if (error)
+		goto out_trans_cancel;
+
 	/*
 	 * A newly created regular or special file just has one directory
 	 * entry pointing to them, but a directory also the "." entry
@@ -1393,6 +1398,11 @@ xfs_link(
 	xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
 
+	error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
+			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
+	if (error)
+		goto error_return;
+
 	/*
 	 * If we are using project inheritance, we only allow hard link
 	 * creation in our tree when the project IDs are the same; else
@@ -2861,6 +2871,11 @@ xfs_remove(
 	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
+	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
+			XFS_IEXT_DIR_MANIP_CNT(mp));
+	if (error)
+		goto out_trans_cancel;
+
 	/*
 	 * If we're removing a directory perform some additional validation.
 	 */
@@ -3221,6 +3236,18 @@ xfs_rename(
 	if (wip)
 		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
 
+	error = xfs_iext_count_may_overflow(src_dp, XFS_DATA_FORK,
+			XFS_IEXT_DIR_MANIP_CNT(mp));
+	if (error)
+		goto out_trans_cancel;
+
+	if (target_ip == NULL) {
+		error = xfs_iext_count_may_overflow(target_dp, XFS_DATA_FORK,
+				XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	/*
 	 * If we are using project inheritance, we only allow renames
 	 * into our tree when the project IDs are the same; else the
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 8e88a7ca387e..08aa808fe290 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -220,6 +220,11 @@ xfs_symlink(
 	if (error)
 		goto out_trans_cancel;
 
+	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
+			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
+	if (error)
+		goto out_trans_cancel;
+
 	/*
 	 * Allocate an inode for the symlink.
 	 */
-- 
2.28.0


* [PATCH V11 06/14] xfs: Check for extent overflow when writing to unwritten extent
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (4 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 05/14] xfs: Check for extent overflow when adding/removing dir entries Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 07/14] xfs: Check for extent overflow when moving extent from cow to data fork Chandan Babu R
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC
  To: linux-xfs
  Cc: Chandan Babu R, darrick.wong, Christoph Hellwig, Allison Henderson

A write to a sub-interval of an existing unwritten extent causes
the original extent to be split into 3 extents
i.e. | Unwritten | Real | Unwritten |
Hence extent count can increase by 2.
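
A minimal userspace model of that split is sketched below. It is an
illustration only: the function name is hypothetical, and it does not
account for the converted range merging with neighbouring extents.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the extents that result from converting the range
 * [wr_off, wr_off + wr_len) of one unwritten extent to real.
 * The written range must lie inside the extent.
 */
static int extents_after_unwritten_write(uint64_t ext_off, uint64_t ext_len,
					 uint64_t wr_off, uint64_t wr_len)
{
	uint64_t ext_end = ext_off + ext_len;
	uint64_t wr_end = wr_off + wr_len;
	int n = 1;	/* the converted (real) range itself */

	assert(wr_len > 0 && wr_off >= ext_off && wr_end <= ext_end);

	if (wr_off > ext_off)
		n++;	/* unwritten piece left of the write */
	if (wr_end < ext_end)
		n++;	/* unwritten piece right of the write */

	return n;
}
```

Only a write strictly inside the extent produces three pieces, so the
worst-case increase from the original single extent is 2.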

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_inode_fork.h | 8 ++++++++
 fs/xfs/xfs_iomap.c             | 5 +++++
 2 files changed, 13 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index fd93fdc67ee4..afb647e1e3fa 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -70,6 +70,14 @@ struct xfs_ifork {
 #define XFS_IEXT_DIR_MANIP_CNT(mp) \
 	((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
 
+/*
+ * A write to a sub-interval of an existing unwritten extent causes the original
+ * extent to be split into 3 extents
+ * i.e. | Unwritten | Real | Unwritten |
+ * Hence extent count can increase by 2.
+ */
+#define XFS_IEXT_WRITE_UNWRITTEN_CNT	(2)
+
 /*
  * Fork handling.
  */
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index a302a96823b8..2aa788379611 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -566,6 +566,11 @@ xfs_iomap_write_unwritten(
 		if (error)
 			goto error_on_bmapi_transaction;
 
+		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+				XFS_IEXT_WRITE_UNWRITTEN_CNT);
+		if (error)
+			goto error_on_bmapi_transaction;
+
 		/*
 		 * Modify the unwritten extent state of the buffer.
 		 */
-- 
2.28.0



* [PATCH V11 07/14] xfs: Check for extent overflow when moving extent from cow to data fork
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (5 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 06/14] xfs: Check for extent overflow when writing to unwritten extent Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 08/14] xfs: Check for extent overflow when remapping an extent Chandan Babu R
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC (permalink / raw)
  To: linux-xfs
  Cc: Chandan Babu R, darrick.wong, Christoph Hellwig, Allison Henderson

Moving an extent to data fork can cause a sub-interval of an existing
extent to be unmapped. This will increase extent count by 1. Mapping in
the new extent can increase the extent count by 1 again i.e.
 | Old extent | New extent | Old extent |
Hence number of extents increases by 2.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_inode_fork.h | 9 +++++++++
 fs/xfs/xfs_reflink.c           | 5 +++++
 2 files changed, 14 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index afb647e1e3fa..b99e67e7b59b 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -78,6 +78,15 @@ struct xfs_ifork {
  */
 #define XFS_IEXT_WRITE_UNWRITTEN_CNT	(2)
 
+/*
+ * Moving an extent to data fork can cause a sub-interval of an existing extent
+ * to be unmapped. This will increase extent count by 1. Mapping in the new
+ * extent can increase the extent count by 1 again i.e.
+ * | Old extent | New extent | Old extent |
+ * Hence number of extents increases by 2.
+ */
+#define XFS_IEXT_REFLINK_END_COW_CNT	(2)
+
 /*
  * Fork handling.
  */
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 16098dc42add..4f0198f636ad 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -628,6 +628,11 @@ xfs_reflink_end_cow_extent(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
+	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+			XFS_IEXT_REFLINK_END_COW_CNT);
+	if (error)
+		goto out_cancel;
+
 	/*
 	 * In case of racing, overlapping AIO writes no COW extents might be
 	 * left by the time I/O completes for the loser of the race.  In that
-- 
2.28.0



* [PATCH V11 08/14] xfs: Check for extent overflow when remapping an extent
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (6 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 07/14] xfs: Check for extent overflow when moving extent from cow to data fork Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 09/14] xfs: Check for extent overflow when swapping extents Chandan Babu R
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, darrick.wong, Allison Henderson

Remapping an extent involves unmapping the existing extent and mapping
in the new extent. When unmapping, an extent containing the entire unmap
range can be split into two extents,
i.e. | Old extent | hole | Old extent |
Hence extent count increases by 1.

Mapping in the new extent into the destination file can increase the
extent count by 1.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/xfs_reflink.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 4f0198f636ad..856fe755a5e9 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1006,6 +1006,7 @@ xfs_reflink_remap_extent(
 	unsigned int		resblks;
 	bool			smap_real;
 	bool			dmap_written = xfs_bmap_is_written_extent(dmap);
+	int			iext_delta = 0;
 	int			nimaps;
 	int			error;
 
@@ -1099,6 +1100,16 @@ xfs_reflink_remap_extent(
 			goto out_cancel;
 	}
 
+	if (smap_real)
+		++iext_delta;
+
+	if (dmap_written)
+		++iext_delta;
+
+	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, iext_delta);
+	if (error)
+		goto out_cancel;
+
 	if (smap_real) {
 		/*
 		 * If the extent we're unmapping is backed by storage (written
-- 
2.28.0



* [PATCH V11 09/14] xfs: Check for extent overflow when swapping extents
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (7 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 08/14] xfs: Check for extent overflow when remapping an extent Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 10/14] xfs: Introduce error injection to reduce maximum inode fork extent count Chandan Babu R
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, darrick.wong, Allison Henderson

Removing an initial range of source/donor file's extent and adding a new
extent (from donor/source file) in its place will cause extent count to
increase by 1.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_inode_fork.h |  7 +++++++
 fs/xfs/xfs_bmap_util.c         | 16 ++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index b99e67e7b59b..969b06160d44 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -87,6 +87,13 @@ struct xfs_ifork {
  */
 #define XFS_IEXT_REFLINK_END_COW_CNT	(2)
 
+/*
+ * Removing an initial range of source/donor file's extent and adding a new
+ * extent (from donor/source file) in its place will cause extent count to
+ * increase by 1.
+ */
+#define XFS_IEXT_SWAP_RMAP_CNT		(1)
+
 /*
  * Fork handling.
  */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 0776abd0103c..b6728fdf50ae 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1407,6 +1407,22 @@ xfs_swap_extent_rmap(
 					irec.br_blockcount);
 			trace_xfs_swap_extent_rmap_remap_piece(tip, &uirec);
 
+			if (xfs_bmap_is_real_extent(&uirec)) {
+				error = xfs_iext_count_may_overflow(ip,
+						XFS_DATA_FORK,
+						XFS_IEXT_SWAP_RMAP_CNT);
+				if (error)
+					goto out;
+			}
+
+			if (xfs_bmap_is_real_extent(&irec)) {
+				error = xfs_iext_count_may_overflow(tip,
+						XFS_DATA_FORK,
+						XFS_IEXT_SWAP_RMAP_CNT);
+				if (error)
+					goto out;
+			}
+
 			/* Remove the mapping from the donor file. */
 			xfs_bmap_unmap_extent(tp, tip, &uirec);
 
-- 
2.28.0



* [PATCH V11 10/14] xfs: Introduce error injection to reduce maximum inode fork extent count
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (8 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 09/14] xfs: Check for extent overflow when swapping extents Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-12-03 19:06   ` Darrick J. Wong
  2020-11-17 13:44 ` [PATCH V11 11/14] xfs: Remove duplicate assert statement in xfs_bmap_btalloc() Chandan Babu R
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, darrick.wong

This commit adds the XFS_ERRTAG_REDUCE_MAX_IEXTENTS error tag, which
enables userspace programs to test "Inode fork extent count overflow
detection" by reducing the maximum possible inode fork extent count to 35.

With a block size of 4k, an xattr (with local value) insert operation
would require, in the worst case, "XFS_DA_NODE_MAXDEPTH + 1" plus
"XFS_DA_NODE_MAXDEPTH + (64k / 4k)" extents (the latter being required to
guarantee removal of a maximum sized xattr). This evaluates to ~28
extents. To allow for the addition of two or more xattrs during extent
overflow testing, the pseudo max extent count is set to 35.
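
The effect of the error tag on the overflow check can be sketched in
userspace as below. This is a simplified model of the check in the diff
that follows, not the kernel function itself: the function name, the
MODEL_* constants and the errtag_enabled parameter are hypothetical, and
the values 0x7fffffff and 0x7fff are assumed here for the data and attr
fork limits.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits for this model; not taken from the patch. */
#define MODEL_MAXEXTNUM		0x7fffffffULL	/* data fork limit */
#define MODEL_MAXAEXTNUM	0x7fffULL	/* attr fork limit */
#define MODEL_REDUCED_MAX	35ULL		/* pseudo max from this patch */

/*
 * Illustrative model: return -EFBIG if adding nr_to_add extents to a
 * fork currently holding cur_exts extents would exceed the limit.
 */
static int
model_iext_count_may_overflow(
	uint64_t	cur_exts,
	bool		attr_fork,
	uint32_t	nr_to_add,
	bool		errtag_enabled)
{
	uint64_t	max_exts;
	uint64_t	nr_exts;

	max_exts = attr_fork ? MODEL_MAXAEXTNUM : MODEL_MAXEXTNUM;

	/* With the error tag set, every fork is limited to 35 extents. */
	if (errtag_enabled)
		max_exts = MODEL_REDUCED_MAX;

	nr_exts = cur_exts + nr_to_add;
	if (nr_exts < cur_exts || nr_exts > max_exts)
		return -EFBIG;

	return 0;
}
```

With the tag enabled, a fork holding 30 extents can still take 5 more
but not 6, which lets a test reach the limit after only a few
operations.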

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_errortag.h   | 4 +++-
 fs/xfs/libxfs/xfs_inode_fork.c | 4 ++++
 fs/xfs/xfs_error.c             | 3 +++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index 53b305dea381..1c56fcceeea6 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -56,7 +56,8 @@
 #define XFS_ERRTAG_FORCE_SUMMARY_RECALC			33
 #define XFS_ERRTAG_IUNLINK_FALLBACK			34
 #define XFS_ERRTAG_BUF_IOERROR				35
-#define XFS_ERRTAG_MAX					36
+#define XFS_ERRTAG_REDUCE_MAX_IEXTENTS			36
+#define XFS_ERRTAG_MAX					37
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -97,5 +98,6 @@
 #define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
 #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
 #define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
+#define XFS_RANDOM_REDUCE_MAX_IEXTENTS			1
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 8d48716547e5..989b20977654 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -24,6 +24,7 @@
 #include "xfs_dir2_priv.h"
 #include "xfs_attr_leaf.h"
 #include "xfs_types.h"
+#include "xfs_errortag.h"
 
 kmem_zone_t *xfs_ifork_zone;
 
@@ -745,6 +746,9 @@ xfs_iext_count_may_overflow(
 
 	max_exts = (whichfork == XFS_ATTR_FORK) ? MAXAEXTNUM : MAXEXTNUM;
 
+	if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
+		max_exts = 35;
+
 	nr_exts = ifp->if_nextents + nr_to_add;
 	if (nr_exts < ifp->if_nextents || nr_exts > max_exts)
 		return -EFBIG;
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index 7f6e20899473..3780b118cc47 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -54,6 +54,7 @@ static unsigned int xfs_errortag_random_default[] = {
 	XFS_RANDOM_FORCE_SUMMARY_RECALC,
 	XFS_RANDOM_IUNLINK_FALLBACK,
 	XFS_RANDOM_BUF_IOERROR,
+	XFS_RANDOM_REDUCE_MAX_IEXTENTS,
 };
 
 struct xfs_errortag_attr {
@@ -164,6 +165,7 @@ XFS_ERRORTAG_ATTR_RW(force_repair,	XFS_ERRTAG_FORCE_SCRUB_REPAIR);
 XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
 XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
 XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
+XFS_ERRORTAG_ATTR_RW(reduce_max_iextents,	XFS_ERRTAG_REDUCE_MAX_IEXTENTS);
 
 static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -202,6 +204,7 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(bad_summary),
 	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
 	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
+	XFS_ERRORTAG_ATTR_LIST(reduce_max_iextents),
 	NULL,
 };
 
-- 
2.28.0



* [PATCH V11 11/14] xfs: Remove duplicate assert statement in xfs_bmap_btalloc()
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (9 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 10/14] xfs: Introduce error injection to reduce maximum inode fork extent count Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 12/14] xfs: Compute bmap extent alignments in a separate function Chandan Babu R
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, darrick.wong, Allison Henderson

The check for verifying if the allocated extent is from an AG whose
index is greater than or equal to that of tp->t_firstblock is already
done a couple of statements earlier in the same function. Hence this
commit removes the redundant assert statement.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 505358839d2f..64c4d0e384a5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3699,7 +3699,6 @@ xfs_bmap_btalloc(
 		ap->blkno = args.fsbno;
 		if (ap->tp->t_firstblock == NULLFSBLOCK)
 			ap->tp->t_firstblock = args.fsbno;
-		ASSERT(nullfb || fb_agno <= args.agno);
 		ap->length = args.len;
 		/*
 		 * If the extent size hint is active, we tried to round the
-- 
2.28.0



* [PATCH V11 12/14] xfs: Compute bmap extent alignments in a separate function
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (10 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 11/14] xfs: Remove duplicate assert statement in xfs_bmap_btalloc() Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 13/14] xfs: Process allocated extent " Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 14/14] xfs: Introduce error injection to allocate only minlen size extents for files Chandan Babu R
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, darrick.wong

This commit moves over the code which computes stripe alignment and
extent size hint alignment into a separate function. Apart from
xfs_bmap_btalloc(), the new function will be used by another function
introduced in a future commit.
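
As a rough illustration of the prod/mod arithmetic that moves into the
new function, here is a userspace sketch. It is not the kernel code:
struct align_result and model_prod_mod() are hypothetical names, the
page size is passed in as a parameter rather than taken from PAGE_SIZE,
and block sizes are assumed to be powers of two so that a division can
stand in for the shift.

```c
#include <stdint.h>

/* Hypothetical result type for this model. */
struct align_result {
	uint32_t	prod;	/* alignment unit, in fs blocks */
	uint32_t	mod;	/* blocks from offset up to the next
				 * multiple of prod, 0 if aligned */
};

static struct align_result
model_prod_mod(
	uint64_t	offset,		/* file offset, in fs blocks */
	uint32_t	extsz_hint,	/* extent size hint, 0 if none */
	uint32_t	blocksize,	/* fs block size, in bytes */
	uint32_t	page_size)	/* page size, in bytes */
{
	struct align_result	r;

	if (extsz_hint)
		r.prod = extsz_hint;		/* hint takes precedence */
	else if (blocksize >= page_size)
		r.prod = 1;			/* no sub-page blocks */
	else
		r.prod = page_size / blocksize;	/* blocks per page */

	r.mod = (uint32_t)(offset % r.prod);
	if (r.mod)
		r.mod = r.prod - r.mod;

	return r;
}
```

For example, with a hint of 4 blocks and an offset of 10, prod is 4 and
mod is 2, i.e. two more blocks are needed to reach the next multiple
of 4.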

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 89 +++++++++++++++++++++++-----------------
 1 file changed, 52 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 64c4d0e384a5..5032539d5e85 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3463,13 +3463,59 @@ xfs_bmap_btalloc_accounting(
 		args->len);
 }
 
+static int
+xfs_bmap_compute_alignments(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args)
+{
+	struct xfs_mount	*mp = args->mp;
+	xfs_extlen_t		align = 0; /* minimum allocation alignment */
+	int			stripe_align = 0;
+	int			error;
+
+	/* stripe alignment for allocation is determined by mount parameters */
+	if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
+		stripe_align = mp->m_swidth;
+	else if (mp->m_dalign)
+		stripe_align = mp->m_dalign;
+
+	if (ap->flags & XFS_BMAPI_COWFORK)
+		align = xfs_get_cowextsz_hint(ap->ip);
+	else if (ap->datatype & XFS_ALLOC_USERDATA)
+		align = xfs_get_extsz_hint(ap->ip);
+	if (align) {
+		error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
+						align, 0, ap->eof, 0, ap->conv,
+						&ap->offset, &ap->length);
+		ASSERT(!error);
+		ASSERT(ap->length);
+	}
+
+	/* apply extent size hints if obtained earlier */
+	if (align) {
+		args->prod = align;
+		div_u64_rem(ap->offset, args->prod, &args->mod);
+		if (args->mod)
+			args->mod = args->prod - args->mod;
+	} else if (mp->m_sb.sb_blocksize >= PAGE_SIZE) {
+		args->prod = 1;
+		args->mod = 0;
+	} else {
+		args->prod = PAGE_SIZE >> mp->m_sb.sb_blocklog;
+		div_u64_rem(ap->offset, args->prod, &args->mod);
+		if (args->mod)
+			args->mod = args->prod - args->mod;
+	}
+
+	return stripe_align;
+}
+
 STATIC int
 xfs_bmap_btalloc(
 	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
 {
 	xfs_mount_t	*mp;		/* mount point structure */
 	xfs_alloctype_t	atype = 0;	/* type for allocation routines */
-	xfs_extlen_t	align = 0;	/* minimum allocation alignment */
 	xfs_agnumber_t	fb_agno;	/* ag number of ap->firstblock */
 	xfs_agnumber_t	ag;
 	xfs_alloc_arg_t	args;
@@ -3489,25 +3535,11 @@ xfs_bmap_btalloc(
 
 	mp = ap->ip->i_mount;
 
-	/* stripe alignment for allocation is determined by mount parameters */
-	stripe_align = 0;
-	if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
-		stripe_align = mp->m_swidth;
-	else if (mp->m_dalign)
-		stripe_align = mp->m_dalign;
-
-	if (ap->flags & XFS_BMAPI_COWFORK)
-		align = xfs_get_cowextsz_hint(ap->ip);
-	else if (ap->datatype & XFS_ALLOC_USERDATA)
-		align = xfs_get_extsz_hint(ap->ip);
-	if (align) {
-		error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
-						align, 0, ap->eof, 0, ap->conv,
-						&ap->offset, &ap->length);
-		ASSERT(!error);
-		ASSERT(ap->length);
-	}
+	memset(&args, 0, sizeof(args));
+	args.tp = ap->tp;
+	args.mp = mp;
 
+	stripe_align = xfs_bmap_compute_alignments(ap, &args);
 
 	nullfb = ap->tp->t_firstblock == NULLFSBLOCK;
 	fb_agno = nullfb ? NULLAGNUMBER : XFS_FSB_TO_AGNO(mp,
@@ -3538,9 +3570,6 @@ xfs_bmap_btalloc(
 	 * Normal allocation, done through xfs_alloc_vextent.
 	 */
 	tryagain = isaligned = 0;
-	memset(&args, 0, sizeof(args));
-	args.tp = ap->tp;
-	args.mp = mp;
 	args.fsbno = ap->blkno;
 	args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
 
@@ -3571,21 +3600,7 @@ xfs_bmap_btalloc(
 		args.total = ap->total;
 		args.minlen = ap->minlen;
 	}
-	/* apply extent size hints if obtained earlier */
-	if (align) {
-		args.prod = align;
-		div_u64_rem(ap->offset, args.prod, &args.mod);
-		if (args.mod)
-			args.mod = args.prod - args.mod;
-	} else if (mp->m_sb.sb_blocksize >= PAGE_SIZE) {
-		args.prod = 1;
-		args.mod = 0;
-	} else {
-		args.prod = PAGE_SIZE >> mp->m_sb.sb_blocklog;
-		div_u64_rem(ap->offset, args.prod, &args.mod);
-		if (args.mod)
-			args.mod = args.prod - args.mod;
-	}
+
 	/*
 	 * If we are not low on available data blocks, and the underlying
 	 * logical volume manager is a stripe, and the file offset is zero then
-- 
2.28.0



* [PATCH V11 13/14] xfs: Process allocated extent in a separate function
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (11 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 12/14] xfs: Compute bmap extent alignments in a separate function Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  2020-11-17 13:44 ` [PATCH V11 14/14] xfs: Introduce error injection to allocate only minlen size extents for files Chandan Babu R
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, darrick.wong, Allison Henderson

This commit moves over the code in xfs_bmap_btalloc() which is
responsible for processing an allocated extent to a new function. Apart
from xfs_bmap_btalloc(), the new function will be invoked by another
function introduced in a future commit.
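
The offset fix-up that the moved code performs can be sketched in
isolation as follows. This is a userspace model under the assumption
that offsets and lengths are plain block counts; model_adjust_offset()
is a hypothetical name and does not exist in the kernel.

```c
#include <stdint.h>

/*
 * Illustrative model: given the (possibly rounded-down) offset at which
 * an extent of "length" blocks was mapped, return the offset that covers
 * as much of the originally requested range as possible.
 */
static uint64_t
model_adjust_offset(
	uint64_t	offset,		/* offset after extsz rounding */
	uint64_t	length,		/* blocks actually allocated */
	uint64_t	orig_offset,	/* offset the caller asked for */
	uint64_t	orig_length)	/* length the caller asked for */
{
	/* Short allocation: start exactly where the caller asked. */
	if (length <= orig_length)
		return orig_offset;

	/* Mapping ends before the request does: slide it upwards. */
	if (offset + length < orig_offset + orig_length)
		return orig_offset + orig_length - length;

	/* Mapping already covers the end of the request. */
	return offset;
}
```

Take a request for blocks 10..13 that was rounded to start at block 8.
If only 3 blocks are allocated the mapping moves to offset 10; if 5 are
allocated it moves to offset 9 so that it ends at block 13; if 8 are
allocated it stays at offset 8.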

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 74 ++++++++++++++++++++++++----------------
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 5032539d5e85..f6cd33684571 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3510,6 +3510,48 @@ xfs_bmap_compute_alignments(
 	return stripe_align;
 }
 
+static void
+xfs_bmap_process_allocated_extent(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args,
+	xfs_fileoff_t		orig_offset,
+	xfs_extlen_t		orig_length)
+{
+	int			nullfb;
+
+	nullfb = ap->tp->t_firstblock == NULLFSBLOCK;
+
+	/*
+	 * check the allocation happened at the same or higher AG than
+	 * the first block that was allocated.
+	 */
+	ASSERT(nullfb ||
+		XFS_FSB_TO_AGNO(args->mp, ap->tp->t_firstblock) <=
+		XFS_FSB_TO_AGNO(args->mp, args->fsbno));
+
+	ap->blkno = args->fsbno;
+	if (nullfb)
+		ap->tp->t_firstblock = args->fsbno;
+	ap->length = args->len;
+	/*
+	 * If the extent size hint is active, we tried to round the
+	 * caller's allocation request offset down to extsz and the
+	 * length up to another extsz boundary.  If we found a free
+	 * extent we mapped it in starting at this new offset.  If the
+	 * newly mapped space isn't long enough to cover any of the
+	 * range of offsets that was originally requested, move the
+	 * mapping up so that we can fill as much of the caller's
+	 * original request as possible.  Free space is apparently
+	 * very fragmented so we're unlikely to be able to satisfy the
+	 * hints anyway.
+	 */
+	if (ap->length <= orig_length)
+		ap->offset = orig_offset;
+	else if (ap->offset + ap->length < orig_offset + orig_length)
+		ap->offset = orig_offset + orig_length - ap->length;
+	xfs_bmap_btalloc_accounting(ap, args);
+}
+
 STATIC int
 xfs_bmap_btalloc(
 	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
@@ -3702,36 +3744,10 @@ xfs_bmap_btalloc(
 			return error;
 		ap->tp->t_flags |= XFS_TRANS_LOWMODE;
 	}
+
 	if (args.fsbno != NULLFSBLOCK) {
-		/*
-		 * check the allocation happened at the same or higher AG than
-		 * the first block that was allocated.
-		 */
-		ASSERT(ap->tp->t_firstblock == NULLFSBLOCK ||
-		       XFS_FSB_TO_AGNO(mp, ap->tp->t_firstblock) <=
-		       XFS_FSB_TO_AGNO(mp, args.fsbno));
-
-		ap->blkno = args.fsbno;
-		if (ap->tp->t_firstblock == NULLFSBLOCK)
-			ap->tp->t_firstblock = args.fsbno;
-		ap->length = args.len;
-		/*
-		 * If the extent size hint is active, we tried to round the
-		 * caller's allocation request offset down to extsz and the
-		 * length up to another extsz boundary.  If we found a free
-		 * extent we mapped it in starting at this new offset.  If the
-		 * newly mapped space isn't long enough to cover any of the
-		 * range of offsets that was originally requested, move the
-		 * mapping up so that we can fill as much of the caller's
-		 * original request as possible.  Free space is apparently
-		 * very fragmented so we're unlikely to be able to satisfy the
-		 * hints anyway.
-		 */
-		if (ap->length <= orig_length)
-			ap->offset = orig_offset;
-		else if (ap->offset + ap->length < orig_offset + orig_length)
-			ap->offset = orig_offset + orig_length - ap->length;
-		xfs_bmap_btalloc_accounting(ap, &args);
+		xfs_bmap_process_allocated_extent(ap, &args, orig_offset,
+			orig_length);
 	} else {
 		ap->blkno = NULLFSBLOCK;
 		ap->length = 0;
-- 
2.28.0



* [PATCH V11 14/14] xfs: Introduce error injection to allocate only minlen size extents for files
  2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
                   ` (12 preceding siblings ...)
  2020-11-17 13:44 ` [PATCH V11 13/14] xfs: Process allocated extent " Chandan Babu R
@ 2020-11-17 13:44 ` Chandan Babu R
  13 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-11-17 13:44 UTC (permalink / raw)
  To: linux-xfs; +Cc: Chandan Babu R, darrick.wong

This commit adds the XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag, which
helps userspace test programs to get xfs_bmap_btalloc() to always
allocate minlen sized extents.

This is required for test programs which need a guarantee that minlen
extents allocated for a file do not get merged with their existing
neighbours in the inode's BMBT. The "Inode fork extent overflow check"
tests for directories, xattrs and extension of realtime inodes need this
since the file offset at which the extents are being allocated cannot be
explicitly controlled from userspace.

One way to use this error tag is to:
1. Consume all of the free space by sequentially writing to a file.
2. Punch alternate blocks of the file. This causes the CNTBT to contain
   a sufficient number of one block sized extent records.
3. Inject the XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag.
After step 3, xfs_bmap_btalloc() will issue space allocation
requests for minlen sized extents only.

The ENOSPC error code is returned to userspace when there aren't any "one
block sized" extents left in any of the AGs.
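
The per-AG availability check can be pictured with the userspace sketch
below. It is a simplified model and not the kernel implementation: the
CNTBT is represented by a plain array of free extent lengths sorted in
ascending order, and model_exact_minlen_available() is a hypothetical
name. Unlike the kernel helper in this patch, which returns
-EFSCORRUPTED when the lookup finds no record at all, the model simply
reports "not available" in that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative model: find the first free extent whose length is
 * >= minlen and report whether its length equals minlen exactly.
 */
static bool
model_exact_minlen_available(
	const uint32_t	*free_lens,	/* sorted ascending by length */
	size_t		nr,		/* number of records */
	uint32_t	minlen)
{
	size_t		lo = 0;
	size_t		hi = nr;

	/* Binary search for the first record with length >= minlen. */
	while (lo < hi) {
		size_t	mid = lo + (hi - lo) / 2;

		if (free_lens[mid] < minlen)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Usable only if a record exists and matches minlen exactly. */
	return lo < nr && free_lens[lo] == minlen;
}
```

After punching alternate blocks as in step 2, such an array would be
dominated by entries of length 1, so a minlen of 1 keeps succeeding
until those records are used up.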

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_alloc.c    |  50 ++++++++++++++
 fs/xfs/libxfs/xfs_alloc.h    |   3 +
 fs/xfs/libxfs/xfs_bmap.c     | 124 ++++++++++++++++++++++++++++-------
 fs/xfs/libxfs/xfs_errortag.h |   4 +-
 fs/xfs/xfs_error.c           |   3 +
 5 files changed, 159 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 852b536551b5..a7c4eb1d71d5 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2473,6 +2473,47 @@ xfs_defer_agfl_block(
 	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &new->xefi_list);
 }
 
+#ifdef DEBUG
+/*
+ * Check if an AGF has a free extent record whose length is equal to
+ * args->minlen.
+ */
+STATIC int
+xfs_exact_minlen_extent_available(
+	struct xfs_alloc_arg	*args,
+	struct xfs_buf		*agbp,
+	int			*stat)
+{
+	struct xfs_btree_cur	*cnt_cur;
+	xfs_agblock_t		fbno;
+	xfs_extlen_t		flen;
+	int			error = 0;
+
+	cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, agbp,
+			args->agno, XFS_BTNUM_CNT);
+	error = xfs_alloc_lookup_ge(cnt_cur, 0, args->minlen, stat);
+	if (error)
+		goto out;
+
+	if (*stat == 0) {
+		error = -EFSCORRUPTED;
+		goto out;
+	}
+
+	error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, stat);
+	if (error)
+		goto out;
+
+	if (*stat == 1 && flen != args->minlen)
+		*stat = 0;
+
+out:
+	xfs_btree_del_cursor(cnt_cur, error);
+
+	return error;
+}
+#endif
+
 /*
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
@@ -2544,6 +2585,15 @@ xfs_alloc_fix_freelist(
 	if (!xfs_alloc_space_available(args, need, flags))
 		goto out_agbp_relse;
 
+#ifdef DEBUG
+	if (args->alloc_minlen_only) {
+		int stat;
+
+		error = xfs_exact_minlen_extent_available(args, agbp, &stat);
+		if (error || !stat)
+			goto out_agbp_relse;
+	}
+#endif
 	/*
 	 * Make the freelist shorter if it's too long.
 	 *
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 6c22b12176b8..a4427c5775c2 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -75,6 +75,9 @@ typedef struct xfs_alloc_arg {
 	char		wasfromfl;	/* set if allocation is from freelist */
 	struct xfs_owner_info	oinfo;	/* owner of blocks being allocated */
 	enum xfs_ag_resv_type	resv;	/* block reservation to use */
+#ifdef DEBUG
+	bool		alloc_minlen_only; /* allocate exact minlen extent */
+#endif
 } xfs_alloc_arg_t;
 
 /*
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index f6cd33684571..c57dcd3f46bc 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3552,34 +3552,101 @@ xfs_bmap_process_allocated_extent(
 	xfs_bmap_btalloc_accounting(ap, args);
 }
 
-STATIC int
-xfs_bmap_btalloc(
-	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
+#ifdef DEBUG
+static int
+xfs_bmap_exact_minlen_extent_alloc(
+	struct xfs_bmalloca	*ap)
 {
-	xfs_mount_t	*mp;		/* mount point structure */
-	xfs_alloctype_t	atype = 0;	/* type for allocation routines */
-	xfs_agnumber_t	fb_agno;	/* ag number of ap->firstblock */
-	xfs_agnumber_t	ag;
-	xfs_alloc_arg_t	args;
-	xfs_fileoff_t	orig_offset;
-	xfs_extlen_t	orig_length;
-	xfs_extlen_t	blen;
-	xfs_extlen_t	nextminlen = 0;
-	int		nullfb;		/* true if ap->firstblock isn't set */
-	int		isaligned;
-	int		tryagain;
-	int		error;
-	int		stripe_align;
+	struct xfs_mount	*mp = ap->ip->i_mount;
+	struct xfs_alloc_arg	args = { .tp = ap->tp, .mp = mp };
+	xfs_fileoff_t		orig_offset;
+	xfs_extlen_t		orig_length;
+	int			error;
 
 	ASSERT(ap->length);
+
+	if (ap->minlen != 1) {
+		ap->blkno = NULLFSBLOCK;
+		ap->length = 0;
+		return 0;
+	}
+
 	orig_offset = ap->offset;
 	orig_length = ap->length;
 
-	mp = ap->ip->i_mount;
+	args.alloc_minlen_only = 1;
 
-	memset(&args, 0, sizeof(args));
-	args.tp = ap->tp;
-	args.mp = mp;
+	xfs_bmap_compute_alignments(ap, &args);
+
+	if (ap->tp->t_firstblock == NULLFSBLOCK) {
+		/*
+		 * Unlike the longest extent available in an AG, we don't track
+		 * the length of an AG's shortest extent.
+		 * XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT is a debug only knob and
+		 * hence we can afford to start traversing from the 0th AG since
+		 * we need not be concerned about a drop in performance in
+		 * "debug only" code paths.
+		 */
+		ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
+	} else {
+		ap->blkno = ap->tp->t_firstblock;
+	}
+
+	args.fsbno = ap->blkno;
+	args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
+	args.type = XFS_ALLOCTYPE_FIRST_AG;
+	args.total = args.minlen = args.maxlen = ap->minlen;
+
+	args.alignment = 1;
+	args.minalignslop = 0;
+
+	args.minleft = ap->minleft;
+	args.wasdel = ap->wasdel;
+	args.resv = XFS_AG_RESV_NONE;
+	args.datatype = ap->datatype;
+
+	error = xfs_alloc_vextent(&args);
+	if (error)
+		return error;
+
+	if (args.fsbno != NULLFSBLOCK) {
+		xfs_bmap_process_allocated_extent(ap, &args, orig_offset,
+			orig_length);
+	} else {
+		ap->blkno = NULLFSBLOCK;
+		ap->length = 0;
+	}
+
+	return 0;
+}
+#else
+
+#define xfs_bmap_exact_minlen_extent_alloc(bma) (-EFSCORRUPTED)
+
+#endif
+
+STATIC int
+xfs_bmap_btalloc(
+	struct xfs_bmalloca	*ap)
+{
+	struct xfs_mount	*mp = ap->ip->i_mount;
+	struct xfs_alloc_arg	args = { .tp = ap->tp, .mp = mp };
+	xfs_alloctype_t		atype = 0;
+	xfs_agnumber_t		fb_agno;	/* ag number of ap->firstblock */
+	xfs_agnumber_t		ag;
+	xfs_fileoff_t		orig_offset;
+	xfs_extlen_t		orig_length;
+	xfs_extlen_t		blen;
+	xfs_extlen_t		nextminlen = 0;
+	int			nullfb; /* true if ap->firstblock isn't set */
+	int			isaligned;
+	int			tryagain;
+	int			error;
+	int			stripe_align;
+
+	ASSERT(ap->length);
+	orig_offset = ap->offset;
+	orig_length = ap->length;
 
 	stripe_align = xfs_bmap_compute_alignments(ap, &args);
 
@@ -4113,6 +4180,10 @@ xfs_bmap_alloc_userdata(
 			return xfs_bmap_rtalloc(bma);
 	}
 
+	if (unlikely(XFS_TEST_ERROR(false, mp,
+			XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT)))
+		return xfs_bmap_exact_minlen_extent_alloc(bma);
+
 	return xfs_bmap_btalloc(bma);
 }
 
@@ -4149,10 +4220,15 @@ xfs_bmapi_allocate(
 	else
 		bma->minlen = 1;
 
-	if (bma->flags & XFS_BMAPI_METADATA)
-		error = xfs_bmap_btalloc(bma);
-	else
+	if (bma->flags & XFS_BMAPI_METADATA) {
+		if (unlikely(XFS_TEST_ERROR(false, mp,
+				XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT)))
+			error = xfs_bmap_exact_minlen_extent_alloc(bma);
+		else
+			error = xfs_bmap_btalloc(bma);
+	} else {
 		error = xfs_bmap_alloc_userdata(bma);
+	}
 	if (error || bma->blkno == NULLFSBLOCK)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index 1c56fcceeea6..6ca9084b6934 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -57,7 +57,8 @@
 #define XFS_ERRTAG_IUNLINK_FALLBACK			34
 #define XFS_ERRTAG_BUF_IOERROR				35
 #define XFS_ERRTAG_REDUCE_MAX_IEXTENTS			36
-#define XFS_ERRTAG_MAX					37
+#define XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT		37
+#define XFS_ERRTAG_MAX					38
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -99,5 +100,6 @@
 #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
 #define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
 #define XFS_RANDOM_REDUCE_MAX_IEXTENTS			1
+#define XFS_RANDOM_BMAP_ALLOC_MINLEN_EXTENT		1
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index 3780b118cc47..185b4915b7bf 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -55,6 +55,7 @@ static unsigned int xfs_errortag_random_default[] = {
 	XFS_RANDOM_IUNLINK_FALLBACK,
 	XFS_RANDOM_BUF_IOERROR,
 	XFS_RANDOM_REDUCE_MAX_IEXTENTS,
+	XFS_RANDOM_BMAP_ALLOC_MINLEN_EXTENT,
 };
 
 struct xfs_errortag_attr {
@@ -166,6 +167,7 @@ XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
 XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
 XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
 XFS_ERRORTAG_ATTR_RW(reduce_max_iextents,	XFS_ERRTAG_REDUCE_MAX_IEXTENTS);
+XFS_ERRORTAG_ATTR_RW(bmap_alloc_minlen_extent,	XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT);
 
 static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -205,6 +207,7 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
 	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
 	XFS_ERRORTAG_ATTR_LIST(reduce_max_iextents),
+	XFS_ERRORTAG_ATTR_LIST(bmap_alloc_minlen_extent),
 	NULL,
 };
 
-- 
2.28.0



* Re: [PATCH V11 04/14] xfs: Check for extent overflow when adding/removing xattrs
  2020-11-17 13:44 ` [PATCH V11 04/14] xfs: Check for extent overflow when adding/removing xattrs Chandan Babu R
@ 2020-12-03 18:45   ` Darrick J. Wong
  2020-12-04  9:04     ` Chandan Babu R
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2020-12-03 18:45 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs

On Tue, Nov 17, 2020 at 07:14:06PM +0530, Chandan Babu R wrote:
> Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to be
> added. One extra extent for dabtree in case a local attr is large enough
> to cause a double split.  It can also cause extent count to increase
> proportional to the size of a remote xattr's value.
> 
> To be able to always remove an existing xattr, when adding an xattr we
> make sure to reserve inode fork extent count required for removing max
> sized xattr in addition to that required by the xattr add operation.
> 
> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c       | 20 ++++++++++++++++++++
>  fs/xfs/libxfs/xfs_inode_fork.h | 10 ++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index fd8e6418a0d3..d53b3867b308 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -396,6 +396,8 @@ xfs_attr_set(
>  	struct xfs_trans_res	tres;
>  	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
>  	int			error, local;
> +	int			iext_cnt;
> +	int			rmt_blks;
>  	unsigned int		total;
>  
>  	if (XFS_FORCED_SHUTDOWN(dp->i_mount))
> @@ -416,6 +418,9 @@ xfs_attr_set(
>  	 */
>  	args->op_flags = XFS_DA_OP_OKNOENT;
>  
> +	rmt_blks = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
> +	iext_cnt = XFS_IEXT_ATTR_MANIP_CNT(rmt_blks);

These values are only relevant for the xattr removal case, right?
AFAICT the args->value != NULL case immediately after will set new
values, so why not just move this into...

> +
>  	if (args->value) {
>  		XFS_STATS_INC(mp, xs_attr_set);
>  
> @@ -442,6 +447,13 @@ xfs_attr_set(
>  		tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
>  		tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
>  		total = args->total;
> +
> +		if (local)
> +			rmt_blks = 0;
> +		else
> +			rmt_blks = xfs_attr3_rmt_blocks(mp, args->valuelen);
> +
> +		iext_cnt += XFS_IEXT_ATTR_MANIP_CNT(rmt_blks);
>  	} else {
>  		XFS_STATS_INC(mp, xs_attr_remove);

...the bottom of this clause here.

>  
> @@ -460,6 +472,14 @@ xfs_attr_set(
>  
>  	xfs_ilock(dp, XFS_ILOCK_EXCL);
>  	xfs_trans_ijoin(args->trans, dp, 0);
> +
> +	if (args->value || xfs_inode_hasattr(dp)) {

Can this simply be "if (iext_cnt != 0)" ?

--D

> +		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
> +				iext_cnt);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +
>  	if (args->value) {
>  		unsigned int	quota_flags = XFS_QMOPT_RES_REGBLKS;
>  
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> index bcac769a7df6..5de2f07d0dd5 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.h
> +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> @@ -47,6 +47,16 @@ struct xfs_ifork {
>   */
>  #define XFS_IEXT_PUNCH_HOLE_CNT		(1)
>  
> +/*
> + * Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to
> + * be added. One extra extent for dabtree in case a local attr is
> + * large enough to cause a double split.  It can also cause extent
> + * count to increase proportional to the size of a remote xattr's
> + * value.
> + */
> +#define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
> +	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
> +
>  /*
>   * Fork handling.
>   */
> -- 
> 2.28.0
> 


* Re: [PATCH V11 05/14] xfs: Check for extent overflow when adding/removing dir entries
  2020-11-17 13:44 ` [PATCH V11 05/14] xfs: Check for extent overflow when adding/removing dir entries Chandan Babu R
@ 2020-12-03 19:04   ` Darrick J. Wong
  2020-12-04  9:04     ` Chandan Babu R
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2020-12-03 19:04 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs

On Tue, Nov 17, 2020 at 07:14:07PM +0530, Chandan Babu R wrote:
> Directory entry addition/removal can cause the following,
> 1. Data block can be added/removed.
>    A new extent can cause extent count to increase by 1.
> 2. Free disk block can be added/removed.
>    Same behaviour as described above for Data block.
> 3. Dabtree blocks.
>    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these
>    can be new extents. Hence extent count can increase by
>    XFS_DA_NODE_MAXDEPTH.
> 
> To be able to always remove an existing directory entry, when adding a
> new directory entry we make sure to reserve inode fork extent count
> required for removing a directory entry in addition to that required for
> the directory entry add operation.
> 
> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
> ---
>  fs/xfs/libxfs/xfs_inode_fork.h | 13 +++++++++++++
>  fs/xfs/xfs_inode.c             | 27 +++++++++++++++++++++++++++
>  fs/xfs/xfs_symlink.c           |  5 +++++
>  3 files changed, 45 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> index 5de2f07d0dd5..fd93fdc67ee4 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.h
> +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> @@ -57,6 +57,19 @@ struct xfs_ifork {
>  #define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
>  	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
>  
> +/*
> + * Directory entry addition/removal can cause the following,
> + * 1. Data block can be added/removed.
> + *    A new extent can cause extent count to increase by 1.
> + * 2. Free disk block can be added/removed.
> + *    Same behaviour as described above for Data block.
> + * 3. Dabtree blocks.
> + *    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
> + *    extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
> + */
> +#define XFS_IEXT_DIR_MANIP_CNT(mp) \
> +	((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
> +
>  /*
>   * Fork handling.
>   */
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 2bfbcf28b1bd..f7b0b7fce940 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1177,6 +1177,11 @@ xfs_create(
>  	if (error)
>  		goto out_trans_cancel;
>  
> +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);

Er, why did these double since V10?  We're only adding one entry, right?

> +	if (error)
> +		goto out_trans_cancel;
> +
>  	/*
>  	 * A newly created regular or special file just has one directory
>  	 * entry pointing to them, but a directory also the "." entry
> @@ -1393,6 +1398,11 @@ xfs_link(
>  	xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
>  	xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
>  
> +	error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
> +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);

Same question here.

> +	if (error)
> +		goto error_return;
> +
>  	/*
>  	 * If we are using project inheritance, we only allow hard link
>  	 * creation in our tree when the project IDs are the same; else
> @@ -2861,6 +2871,11 @@ xfs_remove(
>  	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
>  	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
>  
> +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> +			XFS_IEXT_DIR_MANIP_CNT(mp));
> +	if (error)
> +		goto out_trans_cancel;
> +
>  	/*
>  	 * If we're removing a directory perform some additional validation.
>  	 */
> @@ -3221,6 +3236,18 @@ xfs_rename(
>  	if (wip)
>  		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
>  
> +	error = xfs_iext_count_may_overflow(src_dp, XFS_DATA_FORK,
> +			XFS_IEXT_DIR_MANIP_CNT(mp));
> +	if (error)
> +		goto out_trans_cancel;
> +
> +	if (target_ip == NULL) {
> +		error = xfs_iext_count_may_overflow(target_dp, XFS_DATA_FORK,
> +				XFS_IEXT_DIR_MANIP_CNT(mp) << 1);

Why did this change to "<< 1" since V10?

I'm sorry, but I've lost my recollection on how the accounting works
here.  This seems (to me anyway ;)) a good candidate for a comment:

For a rename within the same dir where target_name doesn't yet exist, we
are removing a name and then adding a name.  We therefore check for iext
overflow with (DIR_MANIP_CNT * 2), right?  And I think that "target name
does not exist" is synonymous with target_ip == NULL?

For a rename between dirs where the target name doesn't exist, we're
removing src_name from src_dp and adding target_name to target_dp.
Therefore we have to check for DIR_MANIP_CNT overflow on each of src_dp
and target_dp, right?

For a rename where target_name /does/ exist, we're only removing the
src_name, so we have to check for DIR_MANIP_CNT on src_dp, right?

For a RENAME_EXCHANGE we're not removing either name, so we don't need
to check for iext overflow of src_dp or target_dp, right?

> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +
>  	/*
>  	 * If we are using project inheritance, we only allow renames
>  	 * into our tree when the project IDs are the same; else the
> diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> index 8e88a7ca387e..08aa808fe290 100644
> --- a/fs/xfs/xfs_symlink.c
> +++ b/fs/xfs/xfs_symlink.c
> @@ -220,6 +220,11 @@ xfs_symlink(
>  	if (error)
>  		goto out_trans_cancel;
>  
> +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);

Same question as xfs_create.

--D

> +	if (error)
> +		goto out_trans_cancel;
> +
>  	/*
>  	 * Allocate an inode for the symlink.
>  	 */
> -- 
> 2.28.0
> 


* Re: [PATCH V11 10/14] xfs: Introduce error injection to reduce maximum inode fork extent count
  2020-11-17 13:44 ` [PATCH V11 10/14] xfs: Introduce error injection to reduce maximum inode fork extent count Chandan Babu R
@ 2020-12-03 19:06   ` Darrick J. Wong
  2020-12-04  9:05     ` Chandan Babu R
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2020-12-03 19:06 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs

On Tue, Nov 17, 2020 at 07:14:12PM +0530, Chandan Babu R wrote:
> This commit adds XFS_ERRTAG_REDUCE_MAX_IEXTENTS error tag which enables
> userspace programs to test "Inode fork extent count overflow detection"
> by reducing maximum possible inode fork extent count to 35.
> 
> With block size of 4k, xattr (with local value) insert operation would
> require in the worst case "XFS_DA_NODE_MAXDEPTH + 1" plus
> "XFS_DA_NODE_MAXDEPTH + (64k / 4k)" (required for guaranteeing removal
> of a maximum sized xattr) number of extents. This evaluates to ~28
> extents. To allow for additions of two or more xattrs during extent
> overflow testing, the pseudo max extent count is set to 35.
> 
> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
> ---
>  fs/xfs/libxfs/xfs_errortag.h   | 4 +++-
>  fs/xfs/libxfs/xfs_inode_fork.c | 4 ++++
>  fs/xfs/xfs_error.c             | 3 +++
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
> index 53b305dea381..1c56fcceeea6 100644
> --- a/fs/xfs/libxfs/xfs_errortag.h
> +++ b/fs/xfs/libxfs/xfs_errortag.h
> @@ -56,7 +56,8 @@
>  #define XFS_ERRTAG_FORCE_SUMMARY_RECALC			33
>  #define XFS_ERRTAG_IUNLINK_FALLBACK			34
>  #define XFS_ERRTAG_BUF_IOERROR				35
> -#define XFS_ERRTAG_MAX					36
> +#define XFS_ERRTAG_REDUCE_MAX_IEXTENTS			36
> +#define XFS_ERRTAG_MAX					37
>  
>  /*
>   * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
> @@ -97,5 +98,6 @@
>  #define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
>  #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
>  #define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
> +#define XFS_RANDOM_REDUCE_MAX_IEXTENTS			1
>  
>  #endif /* __XFS_ERRORTAG_H_ */
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index 8d48716547e5..989b20977654 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -24,6 +24,7 @@
>  #include "xfs_dir2_priv.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_types.h"
> +#include "xfs_errortag.h"
>  
>  kmem_zone_t *xfs_ifork_zone;
>  
> @@ -745,6 +746,9 @@ xfs_iext_count_may_overflow(
>  
>  	max_exts = (whichfork == XFS_ATTR_FORK) ? MAXAEXTNUM : MAXEXTNUM;
>  
> +	if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
> +		max_exts = 35;

Please add a comment here explaining why 35.

Sorry about the longish review delay, last week was a US holiday and
this week I have eye problems again. :(

--D

> +
>  	nr_exts = ifp->if_nextents + nr_to_add;
>  	if (nr_exts < ifp->if_nextents || nr_exts > max_exts)
>  		return -EFBIG;
> diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
> index 7f6e20899473..3780b118cc47 100644
> --- a/fs/xfs/xfs_error.c
> +++ b/fs/xfs/xfs_error.c
> @@ -54,6 +54,7 @@ static unsigned int xfs_errortag_random_default[] = {
>  	XFS_RANDOM_FORCE_SUMMARY_RECALC,
>  	XFS_RANDOM_IUNLINK_FALLBACK,
>  	XFS_RANDOM_BUF_IOERROR,
> +	XFS_RANDOM_REDUCE_MAX_IEXTENTS,
>  };
>  
>  struct xfs_errortag_attr {
> @@ -164,6 +165,7 @@ XFS_ERRORTAG_ATTR_RW(force_repair,	XFS_ERRTAG_FORCE_SCRUB_REPAIR);
>  XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
>  XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
>  XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
> +XFS_ERRORTAG_ATTR_RW(reduce_max_iextents,	XFS_ERRTAG_REDUCE_MAX_IEXTENTS);
>  
>  static struct attribute *xfs_errortag_attrs[] = {
>  	XFS_ERRORTAG_ATTR_LIST(noerror),
> @@ -202,6 +204,7 @@ static struct attribute *xfs_errortag_attrs[] = {
>  	XFS_ERRORTAG_ATTR_LIST(bad_summary),
>  	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
>  	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
> +	XFS_ERRORTAG_ATTR_LIST(reduce_max_iextents),
>  	NULL,
>  };
>  
> -- 
> 2.28.0
> 


* Re: [PATCH V11 04/14] xfs: Check for extent overflow when adding/removing xattrs
  2020-12-03 18:45   ` Darrick J. Wong
@ 2020-12-04  9:04     ` Chandan Babu R
  2020-12-09 18:51       ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Chandan Babu R @ 2020-12-04  9:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, 03 Dec 2020 10:45:59 -0800, Darrick J. Wong wrote:
> On Tue, Nov 17, 2020 at 07:14:06PM +0530, Chandan Babu R wrote:
> > Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to be
> > added. One extra extent for dabtree in case a local attr is large enough
> > to cause a double split.  It can also cause extent count to increase
> > proportional to the size of a remote xattr's value.
> > 
> > To be able to always remove an existing xattr, when adding an xattr we
> > make sure to reserve inode fork extent count required for removing max
> > sized xattr in addition to that required by the xattr add operation.
> > 
> > Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c       | 20 ++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_inode_fork.h | 10 ++++++++++
> >  2 files changed, 30 insertions(+)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index fd8e6418a0d3..d53b3867b308 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -396,6 +396,8 @@ xfs_attr_set(
> >  	struct xfs_trans_res	tres;
> >  	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
> >  	int			error, local;
> > +	int			iext_cnt;
> > +	int			rmt_blks;
> >  	unsigned int		total;
> >  
> >  	if (XFS_FORCED_SHUTDOWN(dp->i_mount))
> > @@ -416,6 +418,9 @@ xfs_attr_set(
> >  	 */
> >  	args->op_flags = XFS_DA_OP_OKNOENT;
> >  
> > +	rmt_blks = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
> > +	iext_cnt = XFS_IEXT_ATTR_MANIP_CNT(rmt_blks);
> 
> These values are only relevant for the xattr removal case, right?
> AFAICT the args->value != NULL case immediately after will set new
> values, so why not just move this into...

The above statements compute the extent count required to remove a maximum
sized remote xattr.

To guarantee that a user can always remove an xattr, the "args->value != NULL"
case adds to the value of iext_cnt that has been computed above. I hoisted
these statements out of the two branches because they are now common to both
the "insert" and "remove" xattr operations.
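
The arithmetic can be sketched as standalone C. This is only an illustration,
not the kernel code: attr_manip_cnt() and attr_set_iext_cnt() are hypothetical
names, XFS_DA_NODE_MAXDEPTH is assumed to be 5, and the block counts that
xfs_attr3_rmt_blocks() would return are passed in as plain integers.

```c
#include <stdbool.h>

/* Assumption: XFS_DA_NODE_MAXDEPTH is taken to be 5 for this illustration. */
#define DA_NODE_MAXDEPTH	5

/* Hypothetical stand-in for XFS_IEXT_ATTR_MANIP_CNT(rmt_blks). */
static int attr_manip_cnt(int rmt_blks)
{
	return DA_NODE_MAXDEPTH + (rmt_blks > 1 ? rmt_blks : 1);
}

/*
 * max_rmt_blks: blocks needed by a maximum sized remote value.
 * new_rmt_blks: blocks needed by the value being set (0 for a local value).
 */
static int attr_set_iext_cnt(bool setting_value, int max_rmt_blks,
			     int new_rmt_blks)
{
	/* Common part: enough extent count to remove a max sized xattr. */
	int iext_cnt = attr_manip_cnt(max_rmt_blks);

	/* The "insert" case adds its own requirement on top of that. */
	if (setting_value)
		iext_cnt += attr_manip_cnt(new_rmt_blks);

	return iext_cnt;
}
```

With 16 blocks for a maximum sized value (the 64k / 4k figure used in the
commit message of patch 10), a remove checks for 21 extents and an insert of a
local value checks for 27.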

> 
> > +
> >  	if (args->value) {
> >  		XFS_STATS_INC(mp, xs_attr_set);
> >  
> > @@ -442,6 +447,13 @@ xfs_attr_set(
> >  		tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
> >  		tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> >  		total = args->total;
> > +
> > +		if (local)
> > +			rmt_blks = 0;
> > +		else
> > +			rmt_blks = xfs_attr3_rmt_blocks(mp, args->valuelen);
> > +
> > +		iext_cnt += XFS_IEXT_ATTR_MANIP_CNT(rmt_blks);
> >  	} else {
> >  		XFS_STATS_INC(mp, xs_attr_remove);
> 
> ...the bottom of this clause here.
> 
> >  
> > @@ -460,6 +472,14 @@ xfs_attr_set(
> >  
> >  	xfs_ilock(dp, XFS_ILOCK_EXCL);
> >  	xfs_trans_ijoin(args->trans, dp, 0);
> > +
> > +	if (args->value || xfs_inode_hasattr(dp)) {
> 
> Can this simply be "if (iext_cnt != 0)" ?

That would lead to a bug, since iext_cnt is computed unconditionally at the
beginning of the function and is therefore never zero. An extent count
reservation would then be attempted when an xattr delete operation is executed
against an inode which does not have an associated attr fork, causing
xfs_iext_count_may_overflow() to dereference the NULL pointer at
xfs_inode->i_afp->if_nextents.
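
A minimal model of why the guard has to look at the fork rather than at
iext_cnt. The struct and function names here are hypothetical simplifications,
and the sketch assumes that the "set" path has already created the attr fork
by the time the check runs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fork_sketch {
	uint32_t nextents;		/* models if_nextents */
};

struct inode_sketch {
	struct fork_sketch *afp;	/* NULL when there is no attr fork */
};

/* Simplified overflow check; dereferences ifp, so ifp must not be NULL. */
static int count_may_overflow(const struct fork_sketch *ifp,
			      uint32_t nr_to_add, uint64_t max_exts)
{
	uint64_t nr_exts = (uint64_t)ifp->nextents + nr_to_add;

	return nr_exts > max_exts ? -EFBIG : 0;
}

/*
 * Mirrors the "args->value || xfs_inode_hasattr(dp)" condition: a remove on
 * an inode without an attr fork skips the check instead of touching afp.
 * Precondition (assumed): afp is non-NULL whenever setting_value is true.
 */
static int check_attr_extents(const struct inode_sketch *ip,
			      bool setting_value, uint32_t iext_cnt,
			      uint64_t max_exts)
{
	if (setting_value || ip->afp != NULL)
		return count_may_overflow(ip->afp, iext_cnt, max_exts);

	return 0;
}
```

Testing "iext_cnt != 0" instead would send the remove-without-fork case into
count_may_overflow() with a NULL fork pointer.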

> 
> --D
> 
> > +		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
> > +				iext_cnt);
> > +		if (error)
> > +			goto out_trans_cancel;
> > +	}
> > +
> >  	if (args->value) {
> >  		unsigned int	quota_flags = XFS_QMOPT_RES_REGBLKS;
> >  
> > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> > index bcac769a7df6..5de2f07d0dd5 100644
> > --- a/fs/xfs/libxfs/xfs_inode_fork.h
> > +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> > @@ -47,6 +47,16 @@ struct xfs_ifork {
> >   */
> >  #define XFS_IEXT_PUNCH_HOLE_CNT		(1)
> >  
> > +/*
> > + * Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to
> > + * be added. One extra extent for dabtree in case a local attr is
> > + * large enough to cause a double split.  It can also cause extent
> > + * count to increase proportional to the size of a remote xattr's
> > + * value.
> > + */
> > +#define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
> > +	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
> > +
> >  /*
> >   * Fork handling.
> >   */
> 


-- 
chandan





* Re: [PATCH V11 05/14] xfs: Check for extent overflow when adding/removing dir entries
  2020-12-03 19:04   ` Darrick J. Wong
@ 2020-12-04  9:04     ` Chandan Babu R
  2020-12-07  8:18       ` Chandan Babu R
  0 siblings, 1 reply; 25+ messages in thread
From: Chandan Babu R @ 2020-12-04  9:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, 03 Dec 2020 11:04:22 -0800, Darrick J. Wong wrote:
> On Tue, Nov 17, 2020 at 07:14:07PM +0530, Chandan Babu R wrote:
> > Directory entry addition/removal can cause the following,
> > 1. Data block can be added/removed.
> >    A new extent can cause extent count to increase by 1.
> > 2. Free disk block can be added/removed.
> >    Same behaviour as described above for Data block.
> > 3. Dabtree blocks.
> >    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these
> >    can be new extents. Hence extent count can increase by
> >    XFS_DA_NODE_MAXDEPTH.
> > 
> > To be able to always remove an existing directory entry, when adding a
> > new directory entry we make sure to reserve inode fork extent count
> > required for removing a directory entry in addition to that required for
> > the directory entry add operation.
> > 
> > Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
> > ---
> >  fs/xfs/libxfs/xfs_inode_fork.h | 13 +++++++++++++
> >  fs/xfs/xfs_inode.c             | 27 +++++++++++++++++++++++++++
> >  fs/xfs/xfs_symlink.c           |  5 +++++
> >  3 files changed, 45 insertions(+)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> > index 5de2f07d0dd5..fd93fdc67ee4 100644
> > --- a/fs/xfs/libxfs/xfs_inode_fork.h
> > +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> > @@ -57,6 +57,19 @@ struct xfs_ifork {
> >  #define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
> >  	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
> >  
> > +/*
> > + * Directory entry addition/removal can cause the following,
> > + * 1. Data block can be added/removed.
> > + *    A new extent can cause extent count to increase by 1.
> > + * 2. Free disk block can be added/removed.
> > + *    Same behaviour as described above for Data block.
> > + * 3. Dabtree blocks.
> > + *    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
> > + *    extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
> > + */
> > +#define XFS_IEXT_DIR_MANIP_CNT(mp) \
> > +	((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
> > +
> >  /*
> >   * Fork handling.
> >   */
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 2bfbcf28b1bd..f7b0b7fce940 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -1177,6 +1177,11 @@ xfs_create(
> >  	if (error)
> >  		goto out_trans_cancel;
> >  
> > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> 
> Er, why did these double since V10?  We're only adding one entry, right?

To be able to always guarantee the removal of an existing directory entry, we
reserve inode fork extent count required for removing a directory entry in
addition to that required for the directory entry add operation.

A bug was discovered when executing the following sequence of
operations,
1. Keep inserting directory entries until the pseudo max extent count limit is
   reached.
2. At this stage, a directory entry remove operation will fail because it
   tries to reserve XFS_IEXT_DIR_MANIP_CNT(mp) worth of extent count. This
   reservation fails since the extent count would go over the pseudo max
   extent count limit as it did in step 1.

We would end up with a directory which can never be deleted.

Hence V11 doubles the extent count reservation for "directory entry insert"
operations. The first XFS_IEXT_DIR_MANIP_CNT(mp) instance is for the "insert"
operation itself, while the second XFS_IEXT_DIR_MANIP_CNT(mp) instance
guarantees that a possible future "remove" operation can succeed.
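
The failure and the fix can be reproduced with a small standalone model. It is
a sketch under stated assumptions rather than the kernel logic: the pseudo max
extent count is 35 (as set by the error tag in patch 10),
XFS_IEXT_DIR_MANIP_CNT(mp) is taken to be 7 (XFS_DA_NODE_MAXDEPTH assumed to
be 5, fsbcount assumed to be 1), and every insert is assumed to consume the
worst case number of extents. The function names are hypothetical.

```c
#include <stdbool.h>

#define PSEUDO_MAX_EXTS	35	/* limit imposed by the error tag */
#define DIR_MANIP_CNT	7	/* assumed: (5 + 1 + 1) * 1 */

/* True when adding nr_to_add extents would exceed the pseudo limit. */
static bool may_overflow(int nextents, int nr_to_add)
{
	return nextents + nr_to_add > PSEUDO_MAX_EXTS;
}

/*
 * Keep inserting entries until the reservation check refuses, with each
 * insert consuming DIR_MANIP_CNT extents. Returns the final extent count.
 */
static int fill_directory(int insert_resv)
{
	int nextents = 0;

	while (!may_overflow(nextents, insert_resv))
		nextents += DIR_MANIP_CNT;

	return nextents;
}

/* A remove always checks for a single DIR_MANIP_CNT instance. */
static bool can_remove(int nextents)
{
	return !may_overflow(nextents, DIR_MANIP_CNT);
}
```

With the V10 reservation, fill_directory(DIR_MANIP_CNT) stops at 35 extents
and can_remove(35) is false. With the V11 reservation,
fill_directory(2 * DIR_MANIP_CNT) stops at 28 extents and can_remove(28) is
true.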

> 
> > +	if (error)
> > +		goto out_trans_cancel;
> > +
> >  	/*
> >  	 * A newly created regular or special file just has one directory
> >  	 * entry pointing to them, but a directory also the "." entry
> > @@ -1393,6 +1398,11 @@ xfs_link(
> >  	xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
> >  	xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
> >  
> > +	error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
> > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> 
> Same question here.

Creating a new hard link involves adding a new directory entry. Hence apart
from reserving extent count for directory entry addition we will have to
reserve extent count for a future directory entry removal as well.

> 
> > +	if (error)
> > +		goto error_return;
> > +
> >  	/*
> >  	 * If we are using project inheritance, we only allow hard link
> >  	 * creation in our tree when the project IDs are the same; else
> > @@ -2861,6 +2871,11 @@ xfs_remove(
> >  	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
> >  	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> >  
> > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > +			XFS_IEXT_DIR_MANIP_CNT(mp));
> > +	if (error)
> > +		goto out_trans_cancel;
> > +
> >  	/*
> >  	 * If we're removing a directory perform some additional validation.
> >  	 */
> > @@ -3221,6 +3236,18 @@ xfs_rename(
> >  	if (wip)
> >  		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
> >  
> > +	error = xfs_iext_count_may_overflow(src_dp, XFS_DATA_FORK,
> > +			XFS_IEXT_DIR_MANIP_CNT(mp));
> > +	if (error)
> > +		goto out_trans_cancel;
> > +
> > +	if (target_ip == NULL) {
> > +		error = xfs_iext_count_may_overflow(target_dp, XFS_DATA_FORK,
> > +				XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> 
> Why did this change to "<< 1" since V10?

The extent count is doubled since this is essentially a directory insert
operation w.r.t. the target_dp directory. One instance of
XFS_IEXT_DIR_MANIP_CNT(mp) is for the directory entry being added to
target_dp, and the other instance guarantees that a future directory entry
removal from target_dp can succeed.

> 
> I'm sorry, but I've lost my recollection on how the accounting works
> here.  This seems (to me anyway ;)) a good candidate for a comment:
> 
> For a rename between dirs where the target name doesn't exist, we're
> removing src_name from src_dp and adding target_name to target_dp.
> Therefore we have to check for DIR_MANIP_CNT overflow on each of src_dp
> and target_dp, right?

As above, the extent count check on target_dp is doubled since this is a
directory insert operation w.r.t. the target_dp directory: one instance of
XFS_IEXT_DIR_MANIP_CNT(mp) is for the directory entry being added to
target_dp, and another instance guarantees that a future directory entry
removal from target_dp can succeed.

Since a directory entry is being removed from src_dp, reserving only a single
instance of XFS_IEXT_DIR_MANIP_CNT(mp) would suffice.

> 
> For a rename within the same dir where target_name doesn't yet exist, we
> are removing a name and then adding a name.  We therefore check for iext
> overflow with (DIR_MANIP_CNT * 2), right?  And I think that "target name
> does not exist" is synonymous with target_ip == NULL?

Here again we have to reserve two instances of XFS_IEXT_DIR_MANIP_CNT(mp) for
target_name insertion and one instance of XFS_IEXT_DIR_MANIP_CNT(mp) for
src_name removal. This is because the insertion of target_name and the removal
of src_name may each end up consuming XFS_IEXT_DIR_MANIP_CNT(mp) extent counts
in the worst case, and a future directory entry remove operation will require
a further XFS_IEXT_DIR_MANIP_CNT(mp) extent counts to be reserved.

Also, you are right about "target name does not exist" being synonymous with
target_ip == NULL.

> 
> For a rename where target_name /does/ exist, we're only removing the
> src_name, so we have to check for DIR_MANIP_CNT on src_dp, right?

Yes, you are right.

> 
> For a RENAME_EXCHANGE we're not removing either name, so we don't need
> to check for iext overflow of src_dp or target_dp, right?

You are right. Sorry, I missed this. I will move the extent count reservation
logic to come after the invocation of xfs_cross_rename().

I will also add appropriate comments into xfs_rename() describing the
scenarios that have been discussed above.
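
As a rough summary of those scenarios, the per-directory check amounts could
be modelled as below. This is a sketch of the accounting as discussed in this
thread, not the code that will go into xfs_rename(); struct rename_resv and
rename_iext_resv() are hypothetical names.

```c
#include <stdbool.h>

/* Extent counts to check for on each directory involved in a rename. */
struct rename_resv {
	int src_dp;
	int target_dp;
};

/*
 * exchange:      RENAME_EXCHANGE, no name is added or removed.
 * target_exists: target_ip != NULL, only src_name is removed.
 * manip_cnt:     value of XFS_IEXT_DIR_MANIP_CNT(mp).
 *
 * When src_dp and target_dp are the same directory, both amounts apply to
 * that one directory (three instances in total).
 */
static struct rename_resv rename_iext_resv(bool exchange, bool target_exists,
					   int manip_cnt)
{
	struct rename_resv resv = { 0, 0 };

	if (exchange)
		return resv;

	/* Removing src_name from src_dp. */
	resv.src_dp = manip_cnt;

	/* Adding target_name, plus room for a future removal of it. */
	if (!target_exists)
		resv.target_dp = 2 * manip_cnt;

	return resv;
}
```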

PS: I have swapped the order of two comments from your original reply since I
think it is easier to explain the scenarios with the order of
comments/questions swapped.

> 
> > +		if (error)
> > +			goto out_trans_cancel;
> > +	}
> > +
> >  	/*
> >  	 * If we are using project inheritance, we only allow renames
> >  	 * into our tree when the project IDs are the same; else the
> > diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> > index 8e88a7ca387e..08aa808fe290 100644
> > --- a/fs/xfs/xfs_symlink.c
> > +++ b/fs/xfs/xfs_symlink.c
> > @@ -220,6 +220,11 @@ xfs_symlink(
> >  	if (error)
> >  		goto out_trans_cancel;
> >  
> > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> 
> Same question as xfs_create.

This is again similar to adding a new directory entry. Hence, apart from
reserving extent count for directory entry addition we will have to reserve
extent count for a future directory entry removal as well.

> 
> --D
> 
> > +	if (error)
> > +		goto out_trans_cancel;
> > +
> >  	/*
> >  	 * Allocate an inode for the symlink.
> >  	 */
> 

-- 
chandan





* Re: [PATCH V11 10/14] xfs: Introduce error injection to reduce maximum inode fork extent count
  2020-12-03 19:06   ` Darrick J. Wong
@ 2020-12-04  9:05     ` Chandan Babu R
  0 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-12-04  9:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, 03 Dec 2020 11:06:16 -0800, Darrick J. Wong wrote:
> On Tue, Nov 17, 2020 at 07:14:12PM +0530, Chandan Babu R wrote:
> > This commit adds XFS_ERRTAG_REDUCE_MAX_IEXTENTS error tag which enables
> > userspace programs to test "Inode fork extent count overflow detection"
> > by reducing maximum possible inode fork extent count to 35.
> > 
> > With block size of 4k, xattr (with local value) insert operation would
> > require in the worst case "XFS_DA_NODE_MAXDEPTH + 1" plus
> > "XFS_DA_NODE_MAXDEPTH + (64k / 4k)" (required for guaranteeing removal
> > of a maximum sized xattr) number of extents. This evaluates to ~28
> > extents. To allow for additions of two or more xattrs during extent
> > overflow testing, the pseudo max extent count is set to 35.
> > 
> > Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
> > ---
> >  fs/xfs/libxfs/xfs_errortag.h   | 4 +++-
> >  fs/xfs/libxfs/xfs_inode_fork.c | 4 ++++
> >  fs/xfs/xfs_error.c             | 3 +++
> >  3 files changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
> > index 53b305dea381..1c56fcceeea6 100644
> > --- a/fs/xfs/libxfs/xfs_errortag.h
> > +++ b/fs/xfs/libxfs/xfs_errortag.h
> > @@ -56,7 +56,8 @@
> >  #define XFS_ERRTAG_FORCE_SUMMARY_RECALC			33
> >  #define XFS_ERRTAG_IUNLINK_FALLBACK			34
> >  #define XFS_ERRTAG_BUF_IOERROR				35
> > -#define XFS_ERRTAG_MAX					36
> > +#define XFS_ERRTAG_REDUCE_MAX_IEXTENTS			36
> > +#define XFS_ERRTAG_MAX					37
> >  
> >  /*
> >   * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
> > @@ -97,5 +98,6 @@
> >  #define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
> >  #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
> >  #define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
> > +#define XFS_RANDOM_REDUCE_MAX_IEXTENTS			1
> >  
> >  #endif /* __XFS_ERRORTAG_H_ */
> > diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> > index 8d48716547e5..989b20977654 100644
> > --- a/fs/xfs/libxfs/xfs_inode_fork.c
> > +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> > @@ -24,6 +24,7 @@
> >  #include "xfs_dir2_priv.h"
> >  #include "xfs_attr_leaf.h"
> >  #include "xfs_types.h"
> > +#include "xfs_errortag.h"
> >  
> >  kmem_zone_t *xfs_ifork_zone;
> >  
> > @@ -745,6 +746,9 @@ xfs_iext_count_may_overflow(
> >  
> >  	max_exts = (whichfork == XFS_ATTR_FORK) ? MAXAEXTNUM : MAXEXTNUM;
> >  
> > +	if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
> > +		max_exts = 35;
> 
> Please add a comment here explaining why 35.

Sure. I will do that.
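
As a rough illustration of where 35 comes from, the arithmetic in the commit
message can be sketched in C as below. The helper names are hypothetical, and
the value 5 for XFS_DA_NODE_MAXDEPTH is assumed from the kernel headers. The
plain 64k / 4k division gives 27, slightly below the ~28 quoted in the commit
message; the difference may come from per-block remote xattr headers, which
this sketch ignores.

```c
/* Assumed value of XFS_DA_NODE_MAXDEPTH in the kernel headers. */
#define DA_NODE_MAXDEPTH	5
/* Pseudo max extent count used by the error tag in this patch. */
#define PSEUDO_MAX_EXTENTS	35

/* Worst case for inserting an xattr with a local value. */
static int local_xattr_insert_exts(void)
{
	return DA_NODE_MAXDEPTH + 1;
}

/* Worst case reserved to guarantee removal of a maximum sized xattr. */
static int max_xattr_remove_exts(int max_value_bytes, int block_bytes)
{
	return DA_NODE_MAXDEPTH + max_value_bytes / block_bytes;
}

/* Total extents one local xattr insert may need to reserve. */
static int worst_case_exts(int block_bytes)
{
	return local_xattr_insert_exts() +
	       max_xattr_remove_exts(64 * 1024, block_bytes);
}

/* Extents left below the pseudo limit after one worst-case insert. */
static int headroom(int block_bytes)
{
	return PSEUDO_MAX_EXTENTS - worst_case_exts(block_bytes);
}
```

Under these assumptions a 4k block size leaves a headroom of 8 extents below
the pseudo limit, which is what lets a test add more than one xattr before
hitting it.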

> 
> Sorry about the longish review delay, last week was a US holiday and
> this week I have eye problems again. :(

Np. Please take care.

> 
> --D
> 
> > +
> >  	nr_exts = ifp->if_nextents + nr_to_add;
> >  	if (nr_exts < ifp->if_nextents || nr_exts > max_exts)
> >  		return -EFBIG;
> > diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
> > index 7f6e20899473..3780b118cc47 100644
> > --- a/fs/xfs/xfs_error.c
> > +++ b/fs/xfs/xfs_error.c
> > @@ -54,6 +54,7 @@ static unsigned int xfs_errortag_random_default[] = {
> >  	XFS_RANDOM_FORCE_SUMMARY_RECALC,
> >  	XFS_RANDOM_IUNLINK_FALLBACK,
> >  	XFS_RANDOM_BUF_IOERROR,
> > +	XFS_RANDOM_REDUCE_MAX_IEXTENTS,
> >  };
> >  
> >  struct xfs_errortag_attr {
> > @@ -164,6 +165,7 @@ XFS_ERRORTAG_ATTR_RW(force_repair,	XFS_ERRTAG_FORCE_SCRUB_REPAIR);
> >  XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
> >  XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
> >  XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
> > +XFS_ERRORTAG_ATTR_RW(reduce_max_iextents,	XFS_ERRTAG_REDUCE_MAX_IEXTENTS);
> >  
> >  static struct attribute *xfs_errortag_attrs[] = {
> >  	XFS_ERRORTAG_ATTR_LIST(noerror),
> > @@ -202,6 +204,7 @@ static struct attribute *xfs_errortag_attrs[] = {
> >  	XFS_ERRORTAG_ATTR_LIST(bad_summary),
> >  	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
> >  	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
> > +	XFS_ERRORTAG_ATTR_LIST(reduce_max_iextents),
> >  	NULL,
> >  };
> >  
> 


-- 
chandan





* Re: [PATCH V11 05/14] xfs: Check for extent overflow when adding/removing dir entries
  2020-12-04  9:04     ` Chandan Babu R
@ 2020-12-07  8:18       ` Chandan Babu R
  2020-12-09 19:24         ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Chandan Babu R @ 2020-12-07  8:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, 04 Dec 2020 14:34:32 +0530, Chandan Babu R wrote:
> On Thu, 03 Dec 2020 11:04:22 -0800, Darrick J. Wong wrote:
> > On Tue, Nov 17, 2020 at 07:14:07PM +0530, Chandan Babu R wrote:
> > > Directory entry addition/removal can cause the following,
> > > 1. Data block can be added/removed.
> > >    A new extent can cause extent count to increase by 1.
> > > 2. Free disk block can be added/removed.
> > >    Same behaviour as described above for Data block.
> > > 3. Dabtree blocks.
> > >    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these
> > >    can be new extents. Hence extent count can increase by
> > >    XFS_DA_NODE_MAXDEPTH.
> > > 
> > > To be able to always remove an existing directory entry, when adding a
> > > new directory entry we make sure to reserve inode fork extent count
> > > required for removing a directory entry in addition to that required for
> > > the directory entry add operation.
> > > 
> > > Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_inode_fork.h | 13 +++++++++++++
> > >  fs/xfs/xfs_inode.c             | 27 +++++++++++++++++++++++++++
> > >  fs/xfs/xfs_symlink.c           |  5 +++++
> > >  3 files changed, 45 insertions(+)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> > > index 5de2f07d0dd5..fd93fdc67ee4 100644
> > > --- a/fs/xfs/libxfs/xfs_inode_fork.h
> > > +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> > > @@ -57,6 +57,19 @@ struct xfs_ifork {
> > >  #define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
> > >  	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
> > >  
> > > +/*
> > > + * Directory entry addition/removal can cause the following,
> > > + * 1. Data block can be added/removed.
> > > + *    A new extent can cause extent count to increase by 1.
> > > + * 2. Free disk block can be added/removed.
> > > + *    Same behaviour as described above for Data block.
> > > + * 3. Dabtree blocks.
> > > + *    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
> > > + *    extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
> > > + */
> > > +#define XFS_IEXT_DIR_MANIP_CNT(mp) \
> > > +	((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
> > > +
> > >  /*
> > >   * Fork handling.
> > >   */
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index 2bfbcf28b1bd..f7b0b7fce940 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -1177,6 +1177,11 @@ xfs_create(
> > >  	if (error)
> > >  		goto out_trans_cancel;
> > >  
> > > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > 
> > Er, why did these double since V10?  We're only adding one entry, right?
> 
> To be able to always guarantee the removal of an existing directory entry, we
> reserve inode fork extent count required for removing a directory entry in
> addition to that required for the directory entry add operation.
> 
> A bug was discovered when executing the following sequence of
> operations,
> 1. Keep inserting directory entries until the pseudo max extent count limit is
>    reached.
> 2. At this stage, a directory entry remove operation will fail because it
>    tries to reserve XFS_IEXT_DIR_MANIP_CNT(mp) worth of extent count. This
>    reservation fails since the extent count would go over the pseudo max
>    extent count limit as it did in step 1.
> 
> We would end up with a directory which can never be deleted.

I just found that reserving an extra XFS_IEXT_DIR_MANIP_CNT(mp) extent count,
when performing a directory insert operation, would not prevent us from ending
up with a directory which can never be deleted.

Let x be a directory's data fork extent count and let's assume its value to be,

x = MAX_EXT_COUNT - XFS_IEXT_DIR_MANIP_CNT(mp)

So in this case we do have sufficient "extent count" to be able to perform a
directory entry remove operation. But the directory remove operation itself
can cause extent count to increase by XFS_IEXT_DIR_MANIP_CNT(mp) units in the
worst case. This happens when freeing each of the 5 dabtree blocks, the data
block and the free block splits an existing file extent.

If on the other hand, the current value of 'x' were,

x = MAX_EXT_COUNT - (2 * XFS_IEXT_DIR_MANIP_CNT(mp))

'x' can still reach MAX_EXT_COUNT if two consecutive directory remove
operations can each cause extent count to increase by
XFS_IEXT_DIR_MANIP_CNT(mp).

IMHO there is no way to prevent a directory from becoming un-deletable
once its data fork extent count reaches close to MAX_EXT_COUNT. The other
choice of not checking for extent overflow would mean silent data
corruption. Hence the former result is probably the better one to go with.

W.r.t. xattrs, not reserving an extra XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) worth
of extent count would prevent the user from removing xattrs when the inode's
attr fork extent count is close to MAX_EXT_COUNT. However, the file and its
associated extents will still be removed when the file itself is deleted.
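
The argument above can be modelled with a small C sketch. It mirrors the
check in xfs_iext_count_may_overflow() as quoted in this series; the function
names, and the assumption that every remove grows the extent count by the
full worst case, are illustrative only.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Model of the overflow test: fail when the new count would exceed the
 * limit. The sum is done in 64 bits, so it cannot wrap around.
 */
static int iext_count_may_overflow(uint32_t cur, uint32_t nr_to_add,
				   uint32_t max_exts)
{
	uint64_t nr_exts = (uint64_t)cur + nr_to_add;

	if (nr_exts > max_exts)
		return -EFBIG;
	return 0;
}

/*
 * Count how many worst-case removes succeed, starting from extent count x,
 * before the reservation check starts failing. Each successful remove is
 * assumed to grow the extent count by manip_cnt.
 */
static unsigned int removes_until_stuck(uint32_t x, uint32_t max_exts,
					uint32_t manip_cnt)
{
	unsigned int done = 0;

	if (manip_cnt == 0)
		return 0;	/* nothing to reserve, avoid looping forever */

	while (iext_count_may_overflow(x, manip_cnt, max_exts) == 0) {
		x += manip_cnt;
		done++;
	}
	return done;
}
```

With a limit of 35 and a per-operation count of 7 (fsbcount of 1), starting
at x = 35 - 2 * 7 allows exactly two worst-case removes, after which every
further remove is refused. Reserving more at insert time only moves the
starting point; it does not remove the dead end.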

> 
> Hence V11 doubles the extent count reservation for "directory entry insert"
> operations. The first XFS_IEXT_DIR_MANIP_CNT(mp) instance is for "insert"
> operation while the second XFS_IEXT_DIR_MANIP_CNT(mp) instance is for
> guaranteeing a possible future "remove" operation to succeed.
> 
> > 
> > > +	if (error)
> > > +		goto out_trans_cancel;
> > > +
> > >  	/*
> > >  	 * A newly created regular or special file just has one directory
> > >  	 * entry pointing to them, but a directory also the "." entry
> > > @@ -1393,6 +1398,11 @@ xfs_link(
> > >  	xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
> > >  	xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
> > >  
> > > +	error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
> > > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > 
> > Same question here.
> 
> Creating a new hard link involves adding a new directory entry. Hence apart
> from reserving extent count for directory entry addition we will have to
> reserve extent count for a future directory entry removal as well.
> 
> > 
> > > +	if (error)
> > > +		goto error_return;
> > > +
> > >  	/*
> > >  	 * If we are using project inheritance, we only allow hard link
> > >  	 * creation in our tree when the project IDs are the same; else
> > > @@ -2861,6 +2871,11 @@ xfs_remove(
> > >  	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
> > >  	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > >  
> > > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > > +			XFS_IEXT_DIR_MANIP_CNT(mp));
> > > +	if (error)
> > > +		goto out_trans_cancel;
> > > +
> > >  	/*
> > >  	 * If we're removing a directory perform some additional validation.
> > >  	 */
> > > @@ -3221,6 +3236,18 @@ xfs_rename(
> > >  	if (wip)
> > >  		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
> > >  
> > > +	error = xfs_iext_count_may_overflow(src_dp, XFS_DATA_FORK,
> > > +			XFS_IEXT_DIR_MANIP_CNT(mp));
> > > +	if (error)
> > > +		goto out_trans_cancel;
> > > +
> > > +	if (target_ip == NULL) {
> > > +		error = xfs_iext_count_may_overflow(target_dp, XFS_DATA_FORK,
> > > +				XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > 
> > Why did this change to "<< 1" since V10?
> 
> Extent count is doubled since this is essentially a directory insert operation
> w.r.t target_dp directory. One instance of XFS_IEXT_DIR_MANIP_CNT(mp) is for
> the directory entry being added to target_dp directory and another instance of
> XFS_IEXT_DIR_MANIP_CNT(mp) is for guaranteeing a future directory entry
> removal from target_dp directory to succeed.
> 
> > 
> > I'm sorry, but I've lost my recollection on how the accounting works
> > here.  This seems (to me anyway ;)) a good candidate for a comment:
> > 
> > For a rename between dirs where the target name doesn't exist, we're
> > removing src_name from src_dp and adding target_name to target_dp.
> > Therefore we have to check for DIR_MANIP_CNT overflow on each of src_dp
> > and target_dp, right?
> 
> Extent count check is doubled since this is a directory insert operation w.r.t
> target_dp directory ... One instance of XFS_IEXT_DIR_MANIP_CNT(mp) is for the
> directory entry being added to target_dp directory and another instance of
> XFS_IEXT_DIR_MANIP_CNT(mp) is for guaranteeing a future directory entry
> removal from target_dp directory to succeed.
> 
> Since a directory entry is being removed from src_dp, reserving only a single
> instance of XFS_IEXT_DIR_MANIP_CNT(mp) would suffice.
> 
> > 
> > For a rename within the same dir where target_name doesn't yet exist, we
> > are removing a name and then adding a name.  We therefore check for iext
> > overflow with (DIR_MANIP_CNT * 2), right?  And I think that "target name
> > does not exist" is synonymous with target_ip == NULL?
> 
> Here again we have to reserve two instances of XFS_IEXT_DIR_MANIP_CNT(mp) for
> target_name insertion and one instance of XFS_IEXT_DIR_MANIP_CNT(mp) for
> src_name removal. This is because insertion and removal of src_name may each
> end up consuming XFS_IEXT_DIR_MANIP_CNT(mp) extent counts in the worst case. A
> future directory entry remove operation will require
> XFS_IEXT_DIR_MANIP_CNT(mp) extent counts to be reserved.
> 
> Also, you are right about "target name does not exist" being synonymous with
> target_ip == NULL.
> 
> > 
> > For a rename where target_name /does/ exist, we're only removing the
> > src_name, so we have to check for DIR_MANIP_CNT on src_dp, right?
> 
> Yes, you are right.
> 
> > 
> > For a RENAME_EXCHANGE we're not removing either name, so we don't need
> > to check for iext overflow of src_dp or target_dp, right?
> 
> You are right. Sorry, I missed this. I will move the extent count reservation
> logic to come after the invocation of xfs_cross_rename().
> 
> I will also add appropriate comments into xfs_rename() describing the
> scenarios that have been discussed above.
> 
> PS: I have swapped the order of two comments from your original reply since I
> think it is easier to explain the scenarios with the order of
> comments/questions swapped.
> 
> > 
> > > +		if (error)
> > > +			goto out_trans_cancel;
> > > +	}
> > > +
> > >  	/*
> > >  	 * If we are using project inheritance, we only allow renames
> > >  	 * into our tree when the project IDs are the same; else the
> > > diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> > > index 8e88a7ca387e..08aa808fe290 100644
> > > --- a/fs/xfs/xfs_symlink.c
> > > +++ b/fs/xfs/xfs_symlink.c
> > > @@ -220,6 +220,11 @@ xfs_symlink(
> > >  	if (error)
> > >  		goto out_trans_cancel;
> > >  
> > > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > 
> > Same question as xfs_create.
> 
> This is again similar to adding a new directory entry. Hence, apart from
> reserving extent count for directory entry addition we will have to reserve
> extent count for a future directory entry removal as well.
> 
> > 
> > --D
> > 
> > > +	if (error)
> > > +		goto out_trans_cancel;
> > > +
> > >  	/*
> > >  	 * Allocate an inode for the symlink.
> > >  	 */
> > 
> 
> 


-- 
chandan





* Re: [PATCH V11 04/14] xfs: Check for extent overflow when adding/removing xattrs
  2020-12-04  9:04     ` Chandan Babu R
@ 2020-12-09 18:51       ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2020-12-09 18:51 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs

On Fri, Dec 04, 2020 at 02:34:17PM +0530, Chandan Babu R wrote:
> On Thu, 03 Dec 2020 10:45:59 -0800, Darrick J. Wong wrote:
> > On Tue, Nov 17, 2020 at 07:14:06PM +0530, Chandan Babu R wrote:
> > > Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to be
> > > added. One extra extent for dabtree in case a local attr is large enough
> > > to cause a double split.  It can also cause extent count to increase
> > > proportional to the size of a remote xattr's value.
> > > 
> > > To be able to always remove an existing xattr, when adding an xattr we
> > > make sure to reserve inode fork extent count required for removing max
> > > sized xattr in addition to that required by the xattr add operation.
> > > 
> > > Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_attr.c       | 20 ++++++++++++++++++++
> > >  fs/xfs/libxfs/xfs_inode_fork.h | 10 ++++++++++
> > >  2 files changed, 30 insertions(+)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index fd8e6418a0d3..d53b3867b308 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > @@ -396,6 +396,8 @@ xfs_attr_set(
> > >  	struct xfs_trans_res	tres;
> > >  	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
> > >  	int			error, local;
> > > +	int			iext_cnt;
> > > +	int			rmt_blks;
> > >  	unsigned int		total;
> > >  
> > >  	if (XFS_FORCED_SHUTDOWN(dp->i_mount))
> > > @@ -416,6 +418,9 @@ xfs_attr_set(
> > >  	 */
> > >  	args->op_flags = XFS_DA_OP_OKNOENT;
> > >  
> > > +	rmt_blks = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
> > > +	iext_cnt = XFS_IEXT_ATTR_MANIP_CNT(rmt_blks);
> > 
> > These values are only relevant for the xattr removal case, right?
> > AFAICT the args->value != NULL case immediately after will set new
> > values, so why not just move this into...
> 
> The above statements compute the extent count required to remove a maximum
> sized remote xattr.
> 
> To guarantee that a user can always remove an xattr, the "args->value != NULL"
> case adds to the value of iext_cnt that has been computed above. I had
> extracted and placed the above set of statements since they were now common to
> both "insert" and "remove" xattr operations.

D'oh, you're right.

> > 
> > > +
> > >  	if (args->value) {
> > >  		XFS_STATS_INC(mp, xs_attr_set);
> > >  
> > > @@ -442,6 +447,13 @@ xfs_attr_set(
> > >  		tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
> > >  		tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> > >  		total = args->total;
> > > +
> > > +		if (local)
> > > +			rmt_blks = 0;
> > > +		else
> > > +			rmt_blks = xfs_attr3_rmt_blocks(mp, args->valuelen);
> > > +
> > > +		iext_cnt += XFS_IEXT_ATTR_MANIP_CNT(rmt_blks);
> > >  	} else {
> > >  		XFS_STATS_INC(mp, xs_attr_remove);
> > 
> > ...the bottom of this clause here.
> > 
> > >  
> > > @@ -460,6 +472,14 @@ xfs_attr_set(
> > >  
> > >  	xfs_ilock(dp, XFS_ILOCK_EXCL);
> > >  	xfs_trans_ijoin(args->trans, dp, 0);
> > > +
> > > +	if (args->value || xfs_inode_hasattr(dp)) {
> > 
> > Can this simply be "if (iext_cnt != 0)" ?
> 
> That would lead to a bug since iext_cnt is computed unconditionally at the
> beginning of the function. An extent count reservation will be attempted when
> xattr delete operation is executed against an inode which does not have an
> associated attr fork. This will cause xfs_iext_count_may_overflow() to
> dereference the NULL pointer at xfs_inode->i_afp->if_nextents.

Ah, right, got it.  This looks fine to me then...
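
For readers following along, the point about the guard can be sketched as
below. This is a simplified model, not the kernel code: the struct layouts
and helper names are hypothetical, xfs_inode_hasattr() is approximated by a
non-NULL attr fork pointer, and max_rmt_blks stands in for the result of
xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX).

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value of XFS_DA_NODE_MAXDEPTH in the kernel headers. */
#define DA_NODE_MAXDEPTH	5

struct ifork { int if_nextents; };
struct inode { struct ifork *i_afp; };	/* NULL when there is no attr fork */

/* Same shape as XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) in the patch. */
static int attr_manip_cnt(int rmt_blks)
{
	return DA_NODE_MAXDEPTH + (rmt_blks > 1 ? rmt_blks : 1);
}

/*
 * iext_cnt as computed in the patch: the removal reservation is always
 * present, and a set operation adds its own requirement on top.
 */
static int xattr_iext_cnt(bool setting, bool local, int new_rmt_blks,
			  int max_rmt_blks)
{
	int cnt = attr_manip_cnt(max_rmt_blks);

	if (setting)
		cnt += attr_manip_cnt(local ? 0 : new_rmt_blks);
	return cnt;
}

/*
 * The guard from the patch: only check when setting a value or when an
 * attr fork exists, so the check never touches a missing fork.
 */
static bool must_check_overflow(const struct inode *ip, bool setting)
{
	return setting || ip->i_afp != NULL;
}
```

Because the removal reservation is computed unconditionally, iext_cnt is
never zero, so "iext_cnt != 0" could not distinguish a remove on an inode
without an attr fork from any other case.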

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> > 
> > --D
> > 
> > > +		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
> > > +				iext_cnt);
> > > +		if (error)
> > > +			goto out_trans_cancel;
> > > +	}
> > > +
> > >  	if (args->value) {
> > >  		unsigned int	quota_flags = XFS_QMOPT_RES_REGBLKS;
> > >  
> > > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> > > index bcac769a7df6..5de2f07d0dd5 100644
> > > --- a/fs/xfs/libxfs/xfs_inode_fork.h
> > > +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> > > @@ -47,6 +47,16 @@ struct xfs_ifork {
> > >   */
> > >  #define XFS_IEXT_PUNCH_HOLE_CNT		(1)
> > >  
> > > +/*
> > > + * Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to
> > > + * be added. One extra extent for dabtree in case a local attr is
> > > + * large enough to cause a double split.  It can also cause extent
> > > + * count to increase proportional to the size of a remote xattr's
> > > + * value.
> > > + */
> > > +#define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
> > > +	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
> > > +
> > >  /*
> > >   * Fork handling.
> > >   */
> > 
> 
> 
> -- 
> chandan
> 
> 
> 


* Re: [PATCH V11 05/14] xfs: Check for extent overflow when adding/removing dir entries
  2020-12-07  8:18       ` Chandan Babu R
@ 2020-12-09 19:24         ` Darrick J. Wong
  2020-12-11  5:49           ` Chandan Babu R
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2020-12-09 19:24 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs

On Mon, Dec 07, 2020 at 01:48:50PM +0530, Chandan Babu R wrote:
> On Fri, 04 Dec 2020 14:34:32 +0530, Chandan Babu R wrote:
> > On Thu, 03 Dec 2020 11:04:22 -0800, Darrick J. Wong wrote:
> > > On Tue, Nov 17, 2020 at 07:14:07PM +0530, Chandan Babu R wrote:
> > > > Directory entry addition/removal can cause the following,
> > > > 1. Data block can be added/removed.
> > > >    A new extent can cause extent count to increase by 1.
> > > > 2. Free disk block can be added/removed.
> > > >    Same behaviour as described above for Data block.
> > > > 3. Dabtree blocks.
> > > >    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these
> > > >    can be new extents. Hence extent count can increase by
> > > >    XFS_DA_NODE_MAXDEPTH.
> > > > 
> > > > To be able to always remove an existing directory entry, when adding a
> > > > new directory entry we make sure to reserve inode fork extent count
> > > > required for removing a directory entry in addition to that required for
> > > > the directory entry add operation.
> > > > 
> > > > Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_inode_fork.h | 13 +++++++++++++
> > > >  fs/xfs/xfs_inode.c             | 27 +++++++++++++++++++++++++++
> > > >  fs/xfs/xfs_symlink.c           |  5 +++++
> > > >  3 files changed, 45 insertions(+)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> > > > index 5de2f07d0dd5..fd93fdc67ee4 100644
> > > > --- a/fs/xfs/libxfs/xfs_inode_fork.h
> > > > +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> > > > @@ -57,6 +57,19 @@ struct xfs_ifork {
> > > >  #define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
> > > >  	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
> > > >  
> > > > +/*
> > > > + * Directory entry addition/removal can cause the following,
> > > > + * 1. Data block can be added/removed.
> > > > + *    A new extent can cause extent count to increase by 1.
> > > > + * 2. Free disk block can be added/removed.
> > > > + *    Same behaviour as described above for Data block.
> > > > + * 3. Dabtree blocks.
> > > > + *    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
> > > > + *    extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
> > > > + */
> > > > +#define XFS_IEXT_DIR_MANIP_CNT(mp) \
> > > > +	((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
> > > > +
> > > >  /*
> > > >   * Fork handling.
> > > >   */
> > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > index 2bfbcf28b1bd..f7b0b7fce940 100644
> > > > --- a/fs/xfs/xfs_inode.c
> > > > +++ b/fs/xfs/xfs_inode.c
> > > > @@ -1177,6 +1177,11 @@ xfs_create(
> > > >  	if (error)
> > > >  		goto out_trans_cancel;
> > > >  
> > > > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > > > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > > 
> > > Er, why did these double since V10?  We're only adding one entry, right?
> > 
> > To be able to always guarantee the removal of an existing directory entry, we
> > reserve inode fork extent count required for removing a directory entry in
> > addition to that required for the directory entry add operation.
> > 
> > A bug was discovered when executing the following sequence of
> > operations,
> > 1. Keep inserting directory entries until the pseudo max extent count limit is
> >    reached.
> > 2. At this stage, a directory entry remove operation will fail because it
> >    tries to reserve XFS_IEXT_DIR_MANIP_CNT(mp) worth of extent count. This
> >    reservation fails since the extent count would go over the pseudo max
> >    extent count limit as it did in step 1.
> > 
> > We would end up with a directory which can never be deleted.
> 
> I just found that reserving an extra XFS_IEXT_DIR_MANIP_CNT(mp) extent count,
> when performing a directory insert operation, would not prevent us from ending
> up with a directory which can never be deleted.
> 
> Let x be a directory's data fork extent count and let's assume its value to be,
> 
> x = MAX_EXT_COUNT - XFS_IEXT_DIR_MANIP_CNT(mp)
> 
> So in this case we do have sufficient "extent count" to be able to perform a
> directory entry remove operation. But the directory remove operation itself
> can cause extent count to increase by XFS_IEXT_DIR_MANIP_CNT(mp) units in the
> worst case. This happens when freeing 5 dabtree blocks, one data block and one
> free block causes file extents to be split for each of the above mentioned
> blocks.
> 
> If on the other hand, the current value of 'x' were,
> 
> x = MAX_EXT_COUNT - (2 * XFS_IEXT_DIR_MANIP_CNT(mp))
> 
> 'x' can still reach MAX_EXT_COUNT if two consecutive directory remove
> operations can each cause extent count to increase by
> XFS_IEXT_DIR_MANIP_CNT(mp).
> 
> IMHO there is no way to prevent a directory from becoming un-deletable
> once its data fork extent count reaches close to MAX_EXT_COUNT. The other
> choice of not checking for extent overflow would mean silent data
> corruption. Hence the former result is probably the better one to go with.

So in other words you're doubling the amount you pass into the overflow
check so that we can guarantee that a future dirent removal will work.

That is, the doubling is to preserve future functionality, and is
not required by the create() call itself.  This should be captured in
a comment above the call to xfs_iext_count_may_overflow.

Or I guess you could create an XFS_IEXT_DIRENT_CREATE macro that wraps
all that (along with that comment explaining why).
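
A minimal sketch of what such a wrapper might look like follows. Only the
XFS_IEXT_DIR_MANIP_CNT definition is taken from the patch; the stub structs,
the value 5 for XFS_DA_NODE_MAXDEPTH, the exact form of
XFS_IEXT_DIRENT_CREATE and the dirent_create_cnt() helper are assumptions
made so that the example stands alone.

```c
/* Assumed value of XFS_DA_NODE_MAXDEPTH in the kernel headers. */
#define XFS_DA_NODE_MAXDEPTH	5

/* Stub types standing in for the kernel structures (illustration only). */
struct xfs_da_geometry { unsigned int fsbcount; };
struct xfs_mount { struct xfs_da_geometry *m_dir_geo; };

/* As defined in the patch under review. */
#define XFS_IEXT_DIR_MANIP_CNT(mp) \
	((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)

/*
 * Hypothetical wrapper: adding a directory entry reserves one
 * XFS_IEXT_DIR_MANIP_CNT for the add itself and a second one intended to
 * leave room for a future removal of a directory entry.
 */
#define XFS_IEXT_DIRENT_CREATE(mp) \
	(XFS_IEXT_DIR_MANIP_CNT(mp) << 1)

/* Helper that evaluates the wrapper for a given mount. */
static unsigned int dirent_create_cnt(const struct xfs_mount *mp)
{
	return XFS_IEXT_DIRENT_CREATE(mp);
}
```

Callers such as xfs_create(), xfs_link() and xfs_symlink() could then pass
the wrapper instead of an open-coded "<< 1", keeping the explanation in one
place.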

> W.r.t xattrs, not reserving an extra XFS_IEXT_ATTR_MANIP_CNT(mp) extent count
> units would prevent the user from removing xattrs when the inode's attr fork
> extent count value is close to MAX_EXT_COUNT. However, the file and the
> associated extents will be removed during file deletion operation.

<shrug> I doubt xattr trees often get close to 64k extents, so you might
as well apply the same logic to them.  Better to cut off the user early
than to force them to delete the whole file just to wipe out the xattrs.

> > 
> > Hence V11 doubles the extent count reservation for "directory entry insert"
> > operations. The first XFS_IEXT_DIR_MANIP_CNT(mp) instance is for "insert"
> > operation while the second XFS_IEXT_DIR_MANIP_CNT(mp) instance is for
> > guaranteeing a possible future "remove" operation to succeed.
> > 
> > > 
> > > > +	if (error)
> > > > +		goto out_trans_cancel;
> > > > +
> > > >  	/*
> > > >  	 * A newly created regular or special file just has one directory
> > > >  	 * entry pointing to them, but a directory also the "." entry
> > > > @@ -1393,6 +1398,11 @@ xfs_link(
> > > >  	xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
> > > >  	xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
> > > >  
> > > > +	error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
> > > > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > > 
> > > Same question here.
> > 
> > Creating a new hard link involves adding a new directory entry. Hence apart
> > from reserving extent count for directory entry addition we will have to
> > reserve extent count for a future directory entry removal as well.

In other words, we also want XFS_IEXT_DIRENT_CREATE here?

> > 
> > > 
> > > > +	if (error)
> > > > +		goto error_return;
> > > > +
> > > >  	/*
> > > >  	 * If we are using project inheritance, we only allow hard link
> > > >  	 * creation in our tree when the project IDs are the same; else
> > > > @@ -2861,6 +2871,11 @@ xfs_remove(
> > > >  	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
> > > >  	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > > >  
> > > > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > > > +			XFS_IEXT_DIR_MANIP_CNT(mp));
> > > > +	if (error)
> > > > +		goto out_trans_cancel;
> > > > +
> > > >  	/*
> > > >  	 * If we're removing a directory perform some additional validation.
> > > >  	 */
> > > > @@ -3221,6 +3236,18 @@ xfs_rename(
> > > >  	if (wip)
> > > >  		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
> > > >  
> > > > +	error = xfs_iext_count_may_overflow(src_dp, XFS_DATA_FORK,
> > > > +			XFS_IEXT_DIR_MANIP_CNT(mp));
> > > > +	if (error)
> > > > +		goto out_trans_cancel;
> > > > +
> > > > +	if (target_ip == NULL) {
> > > > +		error = xfs_iext_count_may_overflow(target_dp, XFS_DATA_FORK,
> > > > +				XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > > 
> > > Why did this change to "<< 1" since V10?
> > 
> > Extent count is doubled since this is essentially a directory insert operation
> > w.r.t target_dp directory. One instance of XFS_IEXT_DIR_MANIP_CNT(mp) is for
> > the directory entry being added to target_dp directory and another instance of
> > XFS_IEXT_DIR_MANIP_CNT(mp) is for guaranteeing a future directory entry
> > removal from target_dp directory to succeed.

...and here too?

> > 
> > > 
> > > I'm sorry, but I've lost my recollection on how the accounting works
> > > here.  This seems (to me anyway ;)) a good candidate for a comment:
> > > 
> > > For a rename between dirs where the target name doesn't exist, we're
> > > removing src_name from src_dp and adding target_name to target_dp.
> > > Therefore we have to check for DIR_MANIP_CNT overflow on each of src_dp
> > > and target_dp, right?
> > 
> > Extent count check is doubled since this is a directory insert operation w.r.t
> > target_dp directory ... One instance of XFS_IEXT_DIR_MANIP_CNT(mp) is for the
> > directory entry being added to target_dp directory and another instance of
> > XFS_IEXT_DIR_MANIP_CNT(mp) is for guaranteeing a future directory entry
> > removal from target_dp directory to succeed.

Or in other words, another place for XFS_IEXT_DIRENT_CREATE...

> > Since a directory entry is being removed from src_dp, reserving only a single
> > instance of XFS_IEXT_DIR_MANIP_CNT(mp) would suffice.

<nod>

> > > 
> > > For a rename within the same dir where target_name doesn't yet exist, we
> > > are removing a name and then adding a name.  We therefore check for iext
> > > overflow with (DIR_MANIP_CNT * 2), right?  And I think that "target name
> > > does not exist" is synonymous with target_ip == NULL?
> > 
> > Here again we have to reserve two instances of XFS_IEXT_DIR_MANIP_CNT(mp) for
> > target_name insertion and one instance of XFS_IEXT_DIR_MANIP_CNT(mp) for
> > src_name removal. This is because insertion and removal of src_name may each
> > end up consuming XFS_IEXT_DIR_MANIP_CNT(mp) extent counts in the worst case. A
> > future directory entry remove operation will require
> > XFS_IEXT_DIR_MANIP_CNT(mp) extent counts to be reserved.

...and another place for DIRENT_CREATE...

> > 
> > Also, you are right about "target name does not exist" being synonymous with
> > target_ip == NULL.
> > 
> > > 
> > > For a rename where target_name /does/ exist, we're only removing the
> > > src_name, so we have to check for DIR_MANIP_CNT on src_dp, right?
> > 
> > Yes, you are right.
> > 
> > > 
> > > For a RENAME_EXCHANGE we're not removing either name, so we don't need
> > > to check for iext overflow of src_dp or target_dp, right?
> > 
> > You are right. Sorry, I missed this. I will move the extent count reservation
> > logic to come after the invocation of xfs_cross_rename().

Ok.

> > I will also add appropriate comments into xfs_rename() describing the
> > scenarios that have been discussed above.

Thanks.
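
As a summary of the rename cases agreed on above, the reservations could be
tabulated as in the sketch below. It covers only what the posted hunk and
this discussion state: a single count on src_dp, a doubled count on
target_dp when the target name does not exist, and nothing for
RENAME_EXCHANGE. The struct, the function name and the value 5 for
XFS_DA_NODE_MAXDEPTH are assumptions, and a rename within one directory is
not treated specially here.

```c
#include <stdbool.h>

/* Assumed value of XFS_DA_NODE_MAXDEPTH in the kernel headers. */
#define DA_NODE_MAXDEPTH	5

/* Extent counts to check against each directory's data fork. */
struct rename_resv {
	unsigned int src_dp;
	unsigned int target_dp;
};

/* Same shape as XFS_IEXT_DIR_MANIP_CNT(mp), with fsbcount passed in. */
static unsigned int dir_manip_cnt(unsigned int fsbcount)
{
	return (DA_NODE_MAXDEPTH + 1 + 1) * fsbcount;
}

static struct rename_resv rename_iext_resv(unsigned int fsbcount,
					   bool exchange, bool target_exists)
{
	struct rename_resv r = { 0, 0 };

	/* RENAME_EXCHANGE neither adds nor removes a name. */
	if (exchange)
		return r;

	/* src_name is always removed from src_dp. */
	r.src_dp = dir_manip_cnt(fsbcount);

	/*
	 * A new name in target_dp is an insert: one count for the add and
	 * one for a future removal.
	 */
	if (!target_exists)
		r.target_dp = dir_manip_cnt(fsbcount) << 1;

	return r;
}
```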

> > PS: I have swapped the order of two comments from your original reply since I
> > think it is easier to explain the scenarios with the order of
> > comments/questions swapped.

Ok.

> > 
> > > 
> > > > +		if (error)
> > > > +			goto out_trans_cancel;
> > > > +	}
> > > > +
> > > >  	/*
> > > >  	 * If we are using project inheritance, we only allow renames
> > > >  	 * into our tree when the project IDs are the same; else the
> > > > diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> > > > index 8e88a7ca387e..08aa808fe290 100644
> > > > --- a/fs/xfs/xfs_symlink.c
> > > > +++ b/fs/xfs/xfs_symlink.c
> > > > @@ -220,6 +220,11 @@ xfs_symlink(
> > > >  	if (error)
> > > >  		goto out_trans_cancel;
> > > >  
> > > > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > > > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > > 
> > > Same question as xfs_create.
> > 
> > This is again similar to adding a new directory entry. Hence, apart from
> > reserving extent count for directory entry addition we will have to reserve
> > extent count for a future directory entry removal as well.

...and here yet another place to use XFS_IEXT_DIRENT_CREATE?

--D

> > 
> > > 
> > > --D
> > > 
> > > > +	if (error)
> > > > +		goto out_trans_cancel;
> > > > +
> > > >  	/*
> > > >  	 * Allocate an inode for the symlink.
> > > >  	 */
> > > 
> > 
> > 
> 
> 
> -- 
> chandan
> 
> 
> 


* Re: [PATCH V11 05/14] xfs: Check for extent overflow when adding/removing dir entries
  2020-12-09 19:24         ` Darrick J. Wong
@ 2020-12-11  5:49           ` Chandan Babu R
  0 siblings, 0 replies; 25+ messages in thread
From: Chandan Babu R @ 2020-12-11  5:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, 09 Dec 2020 11:24:04 -0800, Darrick J. Wong wrote:
> On Mon, Dec 07, 2020 at 01:48:50PM +0530, Chandan Babu R wrote:
> > On Fri, 04 Dec 2020 14:34:32 +0530, Chandan Babu R wrote:
> > > On Thu, 03 Dec 2020 11:04:22 -0800, Darrick J. Wong wrote:
> > > > On Tue, Nov 17, 2020 at 07:14:07PM +0530, Chandan Babu R wrote:
> > > > > Directory entry addition/removal can cause the following,
> > > > > 1. Data block can be added/removed.
> > > > >    A new extent can cause extent count to increase by 1.
> > > > > 2. Free disk block can be added/removed.
> > > > >    Same behaviour as described above for Data block.
> > > > > 3. Dabtree blocks.
> > > > >    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these
> > > > >    can be new extents. Hence extent count can increase by
> > > > >    XFS_DA_NODE_MAXDEPTH.
> > > > > 
> > > > > To be able to always remove an existing directory entry, when adding a
> > > > > new directory entry we make sure to reserve inode fork extent count
> > > > > required for removing a directory entry in addition to that required for
> > > > > the directory entry add operation.
> > > > > 
> > > > > Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
> > > > > ---
> > > > >  fs/xfs/libxfs/xfs_inode_fork.h | 13 +++++++++++++
> > > > >  fs/xfs/xfs_inode.c             | 27 +++++++++++++++++++++++++++
> > > > >  fs/xfs/xfs_symlink.c           |  5 +++++
> > > > >  3 files changed, 45 insertions(+)
> > > > > 
> > > > > diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> > > > > index 5de2f07d0dd5..fd93fdc67ee4 100644
> > > > > --- a/fs/xfs/libxfs/xfs_inode_fork.h
> > > > > +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> > > > > @@ -57,6 +57,19 @@ struct xfs_ifork {
> > > > >  #define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
> > > > >  	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
> > > > >  
> > > > > +/*
> > > > > + * Directory entry addition/removal can cause the following,
> > > > > + * 1. Data block can be added/removed.
> > > > > + *    A new extent can cause extent count to increase by 1.
> > > > > + * 2. Free disk block can be added/removed.
> > > > > + *    Same behaviour as described above for Data block.
> > > > > + * 3. Dabtree blocks.
> > > > > + *    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
> > > > > + *    extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
> > > > > + */
> > > > > +#define XFS_IEXT_DIR_MANIP_CNT(mp) \
> > > > > +	((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
> > > > > +
> > > > >  /*
> > > > >   * Fork handling.
> > > > >   */
> > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > > index 2bfbcf28b1bd..f7b0b7fce940 100644
> > > > > --- a/fs/xfs/xfs_inode.c
> > > > > +++ b/fs/xfs/xfs_inode.c
> > > > > @@ -1177,6 +1177,11 @@ xfs_create(
> > > > >  	if (error)
> > > > >  		goto out_trans_cancel;
> > > > >  
> > > > > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > > > > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > > > 
> > > > Er, why did these double since V10?  We're only adding one entry, right?
> > > 
> > > To be able to always guarantee the removal of an existing directory entry, we
> > > reserve inode fork extent count required for removing a directory entry in
> > > addition to that required for the directory entry add operation.
> > > 
> > > A bug was discovered when executing the following sequence of
> > > operations,
> > > 1. Keep inserting directory entries until the pseudo max extent count limit is
> > >    reached.
> > > 2. At this stage, a directory entry remove operation will fail because it
> > >    tries to reserve XFS_IEXT_DIR_MANIP_CNT(mp) worth of extent count. This
> > >    reservation fails since the extent count would go over the pseudo max
> > >    extent count limit as it did in step 1.
> > > 
> > > We would end up with a directory which can never be deleted.
> > 
> > I just found that reserving an extra XFS_IEXT_DIR_MANIP_CNT(mp) extent count,
> > when performing a directory insert operation, would not prevent us from ending
> > up with a directory which can never be deleted.
> > 
> > Let x be a directory's data fork extent count and lets assume its value to be,
> > 
> > x = MAX_EXT_COUNT - XFS_IEXT_DIR_MANIP_CNT(mp)
> > 
> > So in this case we do have sufficient "extent count" to be able to perform a
> > directory entry remove operation. But the directory remove operation itself
> > can cause extent count to increase by XFS_IEXT_DIR_MANIP_CNT(mp) units in the
> > worst case. This happens when freeing 5 dabtree blocks, one data block and one
> > free block causes file extents to be split for each of the above mentioned
> > blocks.
> > 
> > If on the other hand, the current value of 'x' were,
> > 
> > x = MAX_EXT_COUNT - (2 * XFS_IEXT_DIR_MANIP_CNT(mp))
> > 
> > 'x' can still reach MAX_EXT_COUNT if two consecutive directory remove
> > operations can each cause extent count to increase by
> > XFS_IEXT_DIR_MANIP_CNT(mp).
> > 
> > IMHO there is no way to prevent a directory from becoming un-deletable
> > once its data fork extent count reaches close to MAX_EXT_COUNT. The other
> > choice of not checking for extent overflow would mean silent data
> > corruption. Hence maybe the former result is better one to go with.
> 
> So in other words you're doubling the amount you pass into the overflow
> check so that we can guarantee that a future dirent removal will work.
> 
> In other words, the doubling is to preserve future functionality, and is
> not required by the create() call itself.  This should be captured in
> a comment above the call to xfs_iext_count_may_overflow.
> 
> Or I guess you could create an XFS_IEXT_DIRENT_CREATE macro that wraps
> all that (along with that comment explaining why).

Sorry, the explanation I provided earlier was not clear enough. I hope the
following adds more clarity.

The bottom line is that extent count reservation cannot guarantee that a
future "directory entry remove" operation will always find enough free
"extent count" left in the directory inode's data fork.

Doubling the extent count that gets reserved during a directory entry insert
operation leaves at least XFS_IEXT_DIR_MANIP_CNT extent count units free
after the insert completes. But this headroom can be consumed by a single
directory entry remove operation that ends up freeing directory blocks, since
such an operation can increase the directory inode's extent count by
XFS_IEXT_DIR_MANIP_CNT units.

For example, assume that a directory's data fork extent count is X and it has
the following extent record in its bmbt,

 | Y | Y+1 | Y+2 |

Here Y is the offset within the directory. The directory blocks are within the
same extent record since they are contiguous in terms of both file offset and
disk offset. Now, if a directory entry remove operation is executed, it can
free the block Y+1. This splits the record into two (one mapping Y and one
mapping Y+2), increasing the extent count to X+1.

So a single directory entry remove operation has the potential to increase
extent count by XFS_IEXT_DIR_MANIP_CNT units in the worst case (one data
block, one free disk block and 5 dabtree blocks). Hence reserving an extra
XFS_IEXT_DIR_MANIP_CNT units of extent count during directory entry insertion
would not help solve the problem.
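
The effect of unmapping one block on the extent count can be sketched in
userspace as follows. This is only an illustrative model, not kernel code:
struct model_extent and unmap_delta() are hypothetical names, and the model
considers a single extent record in isolation.

```c
#include <assert.h>

/* Hypothetical model of one extent record: a run of contiguous blocks. */
struct model_extent {
	unsigned long startoff;	/* first file offset covered */
	unsigned long len;	/* number of blocks covered */
};

/*
 * Change in extent count when the single block at 'off' is unmapped
 * from 'e':
 *   +1  block lies strictly inside the extent (record is split in two)
 *    0  block is at either end (record is only trimmed)
 *   -1  extent consists of just that block (record disappears)
 */
int unmap_delta(const struct model_extent *e, unsigned long off)
{
	unsigned long end = e->startoff + e->len; /* one past the last block */

	assert(e->len > 0 && off >= e->startoff && off < end);

	if (e->len == 1)
		return -1;
	if (off > e->startoff && off + 1 < end)
		return 1;
	return 0;
}
```

For the record | Y | Y+1 | Y+2 |, only freeing Y+1 grows the extent count;
freeing Y or Y+2 merely trims the record.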

The above mentioned scenario can be further extended:
If a directory has (2 * XFS_IEXT_DIR_MANIP_CNT) units of free extent count
left, two directory entry remove operations can potentially increase extent
count by XFS_IEXT_DIR_MANIP_CNT units each and hence the execution of a third
consecutive directory entry remove operation fails due to lack of availability
of free extent count.
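
The arithmetic behind this can be sketched with a small userspace model. It
assumes that a remove must first pass a reservation check for manip_cnt units
and then, in the worst case, really consumes all of them. The function name
and parameters are hypothetical and only stand in for the quantities
discussed above.

```c
#include <assert.h>

/*
 * Count how many consecutive worst-case "directory entry remove"
 * operations succeed before the extent count reservation check fails.
 *
 * max_ext   - model of the maximum extent count
 * nextents  - current extent count of the directory's data fork
 * manip_cnt - model of XFS_IEXT_DIR_MANIP_CNT
 */
unsigned int removes_before_failure(unsigned int max_ext,
				    unsigned int nextents,
				    unsigned int manip_cnt)
{
	unsigned int done = 0;

	assert(manip_cnt > 0 && nextents <= max_ext);

	/* The reservation passes while manip_cnt units are still free. */
	while (max_ext - nextents >= manip_cnt) {
		nextents += manip_cnt;	/* worst case: all units consumed */
		done++;
	}
	return done;
}
```

With 2 * manip_cnt units of headroom the model allows exactly two removes
and rejects the third, so no fixed multiple reserved at insert time can
guarantee that every later remove succeeds.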

A related problem has already been solved in XFS.

File deletion can fail in the case of low disk space scenarios. This happens
because of failure to successfully reserve disk blocks for "directory entry
remove" transaction. In such a case, xfs_remove() allocates a transaction with
tp->t_blk_res set to 0. During the execution of the operation, if we end up
having to remove a block (say a "directory data block") from the directory,
the following events could occur,

1. xfs_bmap_del_extent_real(): Extent count increases because the block that
   is being unmapped from bmbt occurs in the middle of an extent record.
2. We truncate the length of the existing extent record and try to insert a
   new extent record which maps the blocks of the original extent record that
   occurs after the block being freed.
3. The following sequence of functions is invoked if inserting a new record
   requires a btree block to be split,
   xfs_btree_insert() => xfs_btree_insrec() => xfs_btree_make_block_unfull()
   => __xfs_btree_split() => xfs_bmbt_alloc_block()
4. xfs_bmbt_alloc_block() returns -ENOSPC when it notices tp->t_blk_res having
   a value of 0.
5. Upon receiving -ENOSPC return value, xfs_bmap_del_extent_real() restores
   the original extent record.
6. Invokers of xfs_dir2_shrink_inode() (e.g. xfs_dir3_data_block_free()) would
   ignore the -ENOSPC error code and hence the corresponding directory block
   is not freed. It is left to a future user of the block to be able to free
   it.
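
The error handling pattern in steps 4 to 6 can be modelled in userspace as
below. All names (struct model_dir, model_shrink(), model_remove_entry()) are
hypothetical; the sketch only mirrors the control flow described above and
does not reproduce the kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state relevant to one remove operation. */
struct model_dir {
	unsigned int blk_res;	/* blocks reserved for the transaction */
	bool block_freed;	/* was the empty directory block unmapped? */
};

/*
 * Model of shrinking the directory by one block. If unmapping needs a
 * bmbt block split but nothing is reserved, fail with -ENOSPC and leave
 * the original mapping untouched.
 */
int model_shrink(struct model_dir *d, bool needs_btree_split)
{
	assert(d != NULL);

	if (needs_btree_split && d->blk_res == 0)
		return -ENOSPC;
	d->block_freed = true;
	return 0;
}

/*
 * Model of the caller: -ENOSPC from the shrink step is not fatal. The
 * entry is still removed; the empty block simply stays mapped.
 */
int model_remove_entry(struct model_dir *d, bool needs_btree_split)
{
	int error = model_shrink(d, needs_btree_split);

	if (error == -ENOSPC)
		return 0;
	return error;
}
```

The point of the pattern is that the remove itself succeeds even when the
now-empty block cannot be freed.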

I think we have to use a similar approach to solve the "undeletable directory"
problem. To that end, I have written a patch which implements the following
logic,

W.r.t. the directory entry remove operation, we check for extent count overflow
in xfs_bmap_del_extent_real() only when the block being unmapped could cause
the extent count to increase. If unmapping can cause the extent count to
overflow, xfs_bmap_del_extent_real() returns -ENOSPC; the invokers of
xfs_dir2_shrink_inode() ignore -ENOSPC, leaving the responsibility of freeing
the directory block to a future operation.
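
As a rough userspace model of that condition (the actual change is in the
diff further down), the check combines "is this a directory's data fork",
"does the unmapped range lie strictly inside the extent" and "would one more
extent exceed the limit". The struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a block range [startoff, endoff). */
struct model_range {
	unsigned long startoff;
	unsigned long endoff;	/* one past the last block */
};

/*
 * Return -ENOSPC only when unmapping 'del' from 'got' would split the
 * record (adding one extent) and the fork is already at its limit.
 */
int model_del_extent_check(bool is_dir_data_fork,
			   const struct model_range *del,
			   const struct model_range *got,
			   unsigned long nextents,
			   unsigned long max_exts)
{
	bool splits;

	assert(del->startoff < del->endoff && got->startoff < got->endoff);

	splits = del->startoff > got->startoff && del->endoff < got->endoff;

	if (is_dir_data_fork && splits && nextents >= max_exts)
		return -ENOSPC;
	return 0;
}
```

Unmapping at either end of a record never trips the check in this model,
because trimming a record does not add an extent.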

For "directory rename operation", the explanation provided for "directory
entry remove" operation holds for the "source directory entry". For rename's
destination directory entry, we now check for extent overflow only when we
have successfully reserved non-zero blocks for the transaction. This is
because with zero block-sized reservation, the rename either
1. Fails due to non-availability of space in existing directory blocks for
   holding the new directory entry.
2. Succeeds since the directory has enough space in existing blocks to hold
   the new directory entry. In this case, XFS wouldn't add new blocks to the
   directory.

For the remaining directory operations (e.g. create, link and symlink) we
continue to reserve XFS_IEXT_DIR_MANIP_CNT units of extent count before the
corresponding transaction starts dirtying metadata.
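
For a sense of scale, the reservation can be worked out in a small userspace
sketch. It assumes XFS_DA_NODE_MAXDEPTH is 5 (matching the "5 dabtree blocks"
mentioned above) and takes fsbcount, the number of filesystem blocks per
directory block, as a plain parameter. The names are hypothetical stand-ins
for the kernel macro.

```c
#include <assert.h>

/* Assumed value of XFS_DA_NODE_MAXDEPTH; see the note above. */
#define MODEL_DA_NODE_MAXDEPTH	5

/*
 * Model of XFS_IEXT_DIR_MANIP_CNT: dabtree blocks, plus one data block,
 * plus one free block, each of which spans 'fsbcount' filesystem blocks.
 */
unsigned int model_dir_manip_cnt(unsigned int fsbcount)
{
	assert(fsbcount > 0);

	return (MODEL_DA_NODE_MAXDEPTH + 1 + 1) * fsbcount;
}
```

Under these assumptions, a filesystem whose directory block is one
filesystem block reserves 7 extents per operation, and one whose directory
block spans four filesystem blocks reserves 28.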

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 505358839d2f..b388b7d55cb9 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5050,6 +5050,12 @@ xfs_bmap_del_extent_real(
 	    del->br_startoff > got.br_startoff && del_endoff < got_endoff)
 		return -ENOSPC;
 
+	if (S_ISDIR(VFS_I(ip)->i_mode) &&
+	    whichfork == XFS_DATA_FORK &&
+	    del->br_startoff > got.br_startoff && del_endoff < got_endoff &&
+	    xfs_iext_count_may_overflow(ip, whichfork, 1))
+		return -ENOSPC;
+
 	flags = XFS_ILOG_CORE;
 	if (whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip)) {
 		xfs_filblks_t	len;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 5de2f07d0dd5..fd93fdc67ee4 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -57,6 +57,19 @@ struct xfs_ifork {
 #define XFS_IEXT_ATTR_MANIP_CNT(rmt_blks) \
 	(XFS_DA_NODE_MAXDEPTH + max(1, rmt_blks))
 
+/*
+ * Directory entry addition/removal can cause the following,
+ * 1. Data block can be added/removed.
+ *    A new extent can cause extent count to increase by 1.
+ * 2. Free disk block can be added/removed.
+ *    Same behaviour as described above for Data block.
+ * 3. Dabtree blocks.
+ *    XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these can be new
+ *    extents. Hence extent count can increase by XFS_DA_NODE_MAXDEPTH.
+ */
+#define XFS_IEXT_DIR_MANIP_CNT(mp) \
+	((XFS_DA_NODE_MAXDEPTH + 1 + 1) * (mp)->m_dir_geo->fsbcount)
+
 /*
  * Fork handling.
  */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2bfbcf28b1bd..c4f3a42d5733 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1177,6 +1177,11 @@ xfs_create(
 	if (error)
 		goto out_trans_cancel;
 
+	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
+			XFS_IEXT_DIR_MANIP_CNT(mp));
+	if (error)
+		goto out_trans_cancel;
+
 	/*
 	 * A newly created regular or special file just has one directory
 	 * entry pointing to them, but a directory also the "." entry
@@ -1393,6 +1398,11 @@ xfs_link(
 	xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
 
+	error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
+			XFS_IEXT_DIR_MANIP_CNT(mp));
+	if (error)
+		goto error_return;
+
 	/*
 	 * If we are using project inheritance, we only allow hard link
 	 * creation in our tree when the project IDs are the same; else
@@ -3246,12 +3256,27 @@ xfs_rename(
 		/*
 		 * If there's no space reservation, check the entry will
 		 * fit before actually inserting it.
+		 *
+		 * If the entry does fit in, then there is no need to check for
+		 * extent count overflow since no new extents will be added to
+		 * the directory's data fork.
 		 */
 		if (!spaceres) {
 			error = xfs_dir_canenter(tp, target_dp, target_name);
 			if (error)
 				goto out_trans_cancel;
 		}
+		/*
+		 * Otherwise, Check if inserting the new entry can cause extent
+		 * count to overflow.
+		 */
+		else {
+			error = xfs_iext_count_may_overflow(target_dp,
+					XFS_DATA_FORK,
+					XFS_IEXT_DIR_MANIP_CNT(mp));
+			if (error)
+				goto out_trans_cancel;
+		}
 	} else {
 		/*
 		 * If target exists and it's a directory, check that whether
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 8e88a7ca387e..581a4032a817 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -220,6 +220,11 @@ xfs_symlink(
 	if (error)
 		goto out_trans_cancel;
 
+	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
+			XFS_IEXT_DIR_MANIP_CNT(mp));
+	if (error)
+		goto out_trans_cancel;
+
 	/*
 	 * Allocate an inode for the symlink.
 	 */

The ideal way to test this patch would be to have a directory whose extent
count has reached the maximum limit and which also has an extent record with
at least three blocks. Freeing directory entries occupying the middle block in
such an extent record would trigger the -ENOSPC error code handling described
above.

However, I don't think it is possible to deterministically create such a
directory layout by executing commands from userspace. Hence my testing was
limited to,
1. Fill a directory with entries until the maximum extent count limit is
   reached. Remove the corresponding directory. The remove operation should
   succeed.
2. Regular run of fstests with various mount options.

> 
> > W.r.t xattrs, not reserving an extra XFS_IEXT_ATTR_MANIP_CNT(mp) extent count
> > units would prevent the user from removing xattrs when the inode's attr fork
> > extent count value is close to MAX_EXT_COUNT. However, the file and the
> > associated extents will be removed during file deletion operation.
> 
> <shrug> I doubt xattr trees often get close to 64k extents, so you might
> as well apply the same logic to them.  Better to cut off the user early
> than to force them to delete the whole file just to wipe out the xattrs.

As described above, reserving twice the amount of extent count units during
xattr insertion would not guarantee that a future xattr remove operation
obtains its extent count reservation. Hence, to always allow the xattr remove
operation, we have to implement some of the logic associated with the
"directory remove operation". This includes adding the ability to swap a
dabtree block (whose removal can cause -ENOSPC to be returned from
xfs_bunmapi()) with the last block of the xattr dabtree (i.e. the logic
implemented by xfs_da3_swap_lastblock()). Please let me know if you prefer this
approach to be implemented instead of file deletion.

> 
> > > 
> > > Hence V11 doubles the extent count reservation for "directory entry insert"
> > > operations. The first XFS_IEXT_DIR_MANIP_CNT(mp) instance is for "insert"
> > > operation while the second XFS_IEXT_DIR_MANIP_CNT(mp) instance is for
> > > guaranteeing a possible future "remove" operation to succeed.
> > > 
> > > > 
> > > > > +	if (error)
> > > > > +		goto out_trans_cancel;
> > > > > +
> > > > >  	/*
> > > > >  	 * A newly created regular or special file just has one directory
> > > > >  	 * entry pointing to them, but a directory also the "." entry
> > > > > @@ -1393,6 +1398,11 @@ xfs_link(
> > > > >  	xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
> > > > >  	xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
> > > > >  
> > > > > +	error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
> > > > > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > > > 
> > > > Same question here.
> > > 
> > > Creating a new hard link involves adding a new directory entry. Hence apart
> > > from reserving extent count for directory entry addition we will have to
> > > reserve extent count for a future directory entry removal as well.
> 
> In other words, we also want XFS_IEXT_DIRENT_CREATE here?
> 
> > > 
> > > > 
> > > > > +	if (error)
> > > > > +		goto error_return;
> > > > > +
> > > > >  	/*
> > > > >  	 * If we are using project inheritance, we only allow hard link
> > > > >  	 * creation in our tree when the project IDs are the same; else
> > > > > @@ -2861,6 +2871,11 @@ xfs_remove(
> > > > >  	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
> > > > >  	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > > > >  
> > > > > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > > > > +			XFS_IEXT_DIR_MANIP_CNT(mp));
> > > > > +	if (error)
> > > > > +		goto out_trans_cancel;
> > > > > +
> > > > >  	/*
> > > > >  	 * If we're removing a directory perform some additional validation.
> > > > >  	 */
> > > > > @@ -3221,6 +3236,18 @@ xfs_rename(
> > > > >  	if (wip)
> > > > >  		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
> > > > >  
> > > > > +	error = xfs_iext_count_may_overflow(src_dp, XFS_DATA_FORK,
> > > > > +			XFS_IEXT_DIR_MANIP_CNT(mp));
> > > > > +	if (error)
> > > > > +		goto out_trans_cancel;
> > > > > +
> > > > > +	if (target_ip == NULL) {
> > > > > +		error = xfs_iext_count_may_overflow(target_dp, XFS_DATA_FORK,
> > > > > +				XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > > > 
> > > > Why did this change to "<< 1" since V10?
> > > 
> > > Extent count is doubled since this is essentially a directory insert operation
> > > w.r.t target_dp directory. One instance of XFS_IEXT_DIR_MANIP_CNT(mp) is for
> > > the directory entry being added to target_dp directory and another instance of
> > > XFS_IEXT_DIR_MANIP_CNT(mp) is for guaranteeing a future directory entry
> > > removal from target_dp directory to succeed.
> 
> ...and here too?
> 
> > > 
> > > > 
> > > > I'm sorry, but I've lost my recollection on how the accounting works
> > > > here.  This seems (to me anyway ;)) a good candidate for a comment:
> > > > 
> > > > For a rename between dirs where the target name doesn't exist, we're
> > > > removing src_name from src_dp and adding target_name to target_dp.
> > > > Therefore we have to check for DIR_MANIP_CNT overflow on each of src_dp
> > > > and target_dp, right?
> > > 
> > > Extent count check is doubled since this is a directory insert operation w.r.t
> > > target_dp directory ... One instance of XFS_IEXT_DIR_MANIP_CNT(mp) is for the
> > > directory entry being added to target_dp directory and another instance of
> > > XFS_IEXT_DIR_MANIP_CNT(mp) is for guaranteeing a future directory entry
> > > removal from target_dp directory to succeed.
> 
> Or in other words, another place for XFS_IEXT_DIRENT_CREATE...
> 
> > > Since a directory entry is being removed from src_dp, reserving only a single
> > > instance of XFS_IEXT_DIR_MANIP_CNT(mp) would suffice.
> 
> <nod>
> 
> > > > 
> > > > For a rename within the same dir where target_name doesn't yet exist, we
> > > > are removing a name and then adding a name.  We therefore check for iext
> > > > overflow with (DIR_MANIP_CNT * 2), right?  And I think that "target name
> > > > does not exist" is synonymous with target_ip == NULL?
> > > 
> > > Here again we have to reserve two instances of XFS_IEXT_DIR_MANIP_CNT(mp) for
> > > target_name insertion and one instance of XFS_IEXT_DIR_MANIP_CNT(mp) for
> > > src_name removal. This is because insertion and removal of src_name may each
> > > end up consuming XFS_IEXT_DIR_MANIP_CNT(mp) extent counts in the worst case. A
> > > future directory entry remove operation will require
> > > XFS_IEXT_DIR_MANIP_CNT(mp) extent counts to be reserved.
> 
> ...and another place for DIRENT_CREATE...
> 
> > > 
> > > Also, You are right about "target name does not exist" being synonymous with
> > > target_ip == NULL.
> > > 
> > > > 
> > > > For a rename where target_name /does/ exist, we're only removing the
> > > > src_name, so we have to check for DIR_MANIP_CNT on src_dp, right?
> > > 
> > > Yes, you are right.
> > > 
> > > > 
> > > > For a RENAME_EXCHANGE we're not removing either name, so we don't need
> > > > to check for iext overflow of src_dp or target_dp, right?
> > > 
> > > You are right. Sorry, I missed this. I will move the extent count reservation
> > > logic to come after the invocation of xfs_cross_rename().
> 
> Ok.
> 
> > > I will also add appropriate comments into xfs_rename() describing the
> > > scenarios that have been discussed above.
> 
> Thanks.
> 
> > > PS: I have swapped the order of two comments from your original reply since I
> > > think it is easier to explain the scenarios with the order of
> > > comments/questions swapped.
> 
> Ok.
> 
> > > 
> > > > 
> > > > > +		if (error)
> > > > > +			goto out_trans_cancel;
> > > > > +	}
> > > > > +
> > > > >  	/*
> > > > >  	 * If we are using project inheritance, we only allow renames
> > > > >  	 * into our tree when the project IDs are the same; else the
> > > > > diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> > > > > index 8e88a7ca387e..08aa808fe290 100644
> > > > > --- a/fs/xfs/xfs_symlink.c
> > > > > +++ b/fs/xfs/xfs_symlink.c
> > > > > @@ -220,6 +220,11 @@ xfs_symlink(
> > > > >  	if (error)
> > > > >  		goto out_trans_cancel;
> > > > >  
> > > > > +	error = xfs_iext_count_may_overflow(dp, XFS_DATA_FORK,
> > > > > +			XFS_IEXT_DIR_MANIP_CNT(mp) << 1);
> > > > 
> > > > Same question as xfs_create.
> > > 
> > > This is again similar to adding a new directory entry. Hence, apart from
> > > reserving extent count for directory entry addition we will have to reserve
> > > extent count for a future directory entry removal as well.
> 
> ...and here yet another place to use XFS_IEXT_DIRENT_CREATE?
> 
> --D
> 
> > > 
> > > > 
> > > > --D
> > > > 
> > > > > +	if (error)
> > > > > +		goto out_trans_cancel;
> > > > > +
> > > > >  	/*
> > > > >  	 * Allocate an inode for the symlink.
> > > > >  	 */
> > > > 
> > > 
> > > 
> > 
> > 
> 


-- 
chandan





end of thread, other threads:[~2020-12-11  5:51 UTC | newest]

Thread overview: 25+ messages
2020-11-17 13:44 [PATCH V11 00/14] Bail out if transaction can cause extent count to overflow Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 01/14] xfs: Add helper for checking per-inode extent count overflow Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 02/14] xfs: Check for extent overflow when trivally adding a new extent Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 03/14] xfs: Check for extent overflow when punching a hole Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 04/14] xfs: Check for extent overflow when adding/removing xattrs Chandan Babu R
2020-12-03 18:45   ` Darrick J. Wong
2020-12-04  9:04     ` Chandan Babu R
2020-12-09 18:51       ` Darrick J. Wong
2020-11-17 13:44 ` [PATCH V11 05/14] xfs: Check for extent overflow when adding/removing dir entries Chandan Babu R
2020-12-03 19:04   ` Darrick J. Wong
2020-12-04  9:04     ` Chandan Babu R
2020-12-07  8:18       ` Chandan Babu R
2020-12-09 19:24         ` Darrick J. Wong
2020-12-11  5:49           ` Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 06/14] xfs: Check for extent overflow when writing to unwritten extent Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 07/14] xfs: Check for extent overflow when moving extent from cow to data fork Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 08/14] xfs: Check for extent overflow when remapping an extent Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 09/14] xfs: Check for extent overflow when swapping extents Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 10/14] xfs: Introduce error injection to reduce maximum inode fork extent count Chandan Babu R
2020-12-03 19:06   ` Darrick J. Wong
2020-12-04  9:05     ` Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 11/14] xfs: Remove duplicate assert statement in xfs_bmap_btalloc() Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 12/14] xfs: Compute bmap extent alignments in a separate function Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 13/14] xfs: Process allocated extent " Chandan Babu R
2020-11-17 13:44 ` [PATCH V11 14/14] xfs: Introduce error injection to allocate only minlen size extents for files Chandan Babu R
