linux-xfs.vger.kernel.org archive mirror
* [PATCHSET RFC v2 chandan 0/4] xfs: separate btree cursor slab zones
@ 2021-09-24  1:27 Darrick J. Wong
  2021-09-24  1:27 ` [PATCH 1/4] xfs: remove xfs_btree_cur.bc_blocklog Darrick J. Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Darrick J. Wong @ 2021-09-24  1:27 UTC (permalink / raw)
  To: djwong, chandan.babu, chandanrlinux, david; +Cc: linux-xfs

Hi all,

Following the review discussions about the dynamic btree cursor height
patches, I've thrown together another series to reduce the size of the
btree cursor, compute the absolute maximum possible btree height for
each btree type, and give each btree cursor type its own slab zone:

$ grep xfs.*cur /proc/slabinfo
xfs_refc_btree_cur      0      0    192   21    1 : tunables    0    0    0 : slabdata      0      0      0
xfs_ialloc_btree_cur      0      0    208   19    1 : tunables    0    0    0 : slabdata      0      0      0
xfs_bmap_btree_cur      0      0    240   17    1 : tunables    0    0    0 : slabdata      0      0      0
xfs_rmap_btree_cur      0      0    240   17    1 : tunables    0    0    0 : slabdata      0      0      0
xfs_alloc_btree_cur      0      0    208   19    1 : tunables    0    0    0 : slabdata      0      0      0

I've also rigged up the debugger to make it easier to extract the actual
height information:

$ xfs_db /dev/sda -c 'btheight -w absmax all'
bnobt: 7
cntbt: 7
inobt: 7
finobt: 7
bmapbt: 9
refcountbt: 6
rmapbt: 9

As you can see from the slabinfo output, we no longer allocate 224-byte
cursors for all five btree types.  Even with the extra overhead of
supporting dynamic cursor sizes, we still come out ahead on cursor size
for three of the five btree types.

This series also includes a couple of patches to reduce holes and
unnecessary fields in the btree cursor.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=btree-cursor-zones-5.16
---
 fs/xfs/libxfs/xfs_alloc_btree.c    |   12 +++++++
 fs/xfs/libxfs/xfs_alloc_btree.h    |    1 +
 fs/xfs/libxfs/xfs_bmap_btree.c     |   13 ++++++++
 fs/xfs/libxfs/xfs_bmap_btree.h     |    1 +
 fs/xfs/libxfs/xfs_btree.c          |   61 +++++++++++++++++++++++++++++++----
 fs/xfs/libxfs/xfs_btree.h          |   28 +++++++++-------
 fs/xfs/libxfs/xfs_fs.h             |    3 ++
 fs/xfs/libxfs/xfs_ialloc_btree.c   |   11 ++++++
 fs/xfs/libxfs/xfs_ialloc_btree.h   |    1 +
 fs/xfs/libxfs/xfs_refcount_btree.c |   13 ++++++++
 fs/xfs/libxfs/xfs_refcount_btree.h |    1 +
 fs/xfs/libxfs/xfs_rmap_btree.c     |   25 +++++++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.h     |    1 +
 fs/xfs/libxfs/xfs_types.h          |    3 ++
 fs/xfs/xfs_super.c                 |   62 +++++++++++++++++++++++++++++++++---
 15 files changed, 210 insertions(+), 26 deletions(-)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] xfs: remove xfs_btree_cur.bc_blocklog
  2021-09-24  1:27 [PATCHSET RFC v2 chandan 0/4] xfs: separate btree cursor slab zones Darrick J. Wong
@ 2021-09-24  1:27 ` Darrick J. Wong
  2021-09-24  1:27 ` [PATCH 2/4] xfs: reduce the size of nr_ops for refcount btree cursors Darrick J. Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2021-09-24  1:27 UTC (permalink / raw)
  To: djwong, chandan.babu, chandanrlinux, david; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

This field isn't used by anyone, so get rid of it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree.c |    1 -
 fs/xfs/libxfs/xfs_btree.h |    1 -
 2 files changed, 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 64b47e81f6f3..619319ff41e5 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -5023,7 +5023,6 @@ xfs_btree_alloc_cursor(
 	cur->bc_tp = tp;
 	cur->bc_mp = mp;
 	cur->bc_btnum = btnum;
-	cur->bc_blocklog = mp->m_sb.sb_blocklog;
 	cur->bc_maxlevels = maxlevels;
 
 	return cur;
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 89239b2be096..527b90aa085b 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -241,7 +241,6 @@ struct xfs_btree_cur
 	uint			bc_flags; /* btree features - below */
 	uint8_t		bc_maxlevels;	/* maximum levels for this btree type */
 	uint8_t		bc_nlevels;	/* number of levels in the tree */
-	uint8_t		bc_blocklog;	/* log2(blocksize) of btree blocks */
 	union xfs_btree_irec	bc_rec;	/* current insert/search record value */
 	xfs_btnum_t	bc_btnum;	/* identifies which btree type */
 	int		bc_statoff;	/* offset of btre stats array */



* [PATCH 2/4] xfs: reduce the size of nr_ops for refcount btree cursors
  2021-09-24  1:27 [PATCHSET RFC v2 chandan 0/4] xfs: separate btree cursor slab zones Darrick J. Wong
  2021-09-24  1:27 ` [PATCH 1/4] xfs: remove xfs_btree_cur.bc_blocklog Darrick J. Wong
@ 2021-09-24  1:27 ` Darrick J. Wong
  2021-09-24  1:27 ` [PATCH 3/4] xfs: check absolute maximum nlevels for each btree type Darrick J. Wong
  2021-09-24  1:27 ` [PATCH 4/4] xfs: use separate btree cursor slab " Darrick J. Wong
  3 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2021-09-24  1:27 UTC (permalink / raw)
  To: djwong, chandan.babu, chandanrlinux, david; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We're never going to run more than 4 billion btree operations on a
refcount cursor, so shrink the field to an unsigned int to reduce the
structure size.  Fix whitespace alignment too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree.h |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 527b90aa085b..57b7aa3f6366 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -185,18 +185,18 @@ union xfs_btree_irec {
 
 /* Per-AG btree information. */
 struct xfs_btree_cur_ag {
-	struct xfs_perag	*pag;
+	struct xfs_perag		*pag;
 	union {
 		struct xfs_buf		*agbp;
 		struct xbtree_afakeroot	*afake;	/* for staging cursor */
 	};
 	union {
 		struct {
-			unsigned long nr_ops;	/* # record updates */
-			int	shape_changes;	/* # of extent splits */
+			unsigned int	nr_ops;	/* # record updates */
+			unsigned int	shape_changes;	/* # of extent splits */
 		} refc;
 		struct {
-			bool	active;		/* allocation cursor state */
+			bool		active;	/* allocation cursor state */
 		} abt;
 	};
 };



* [PATCH 3/4] xfs: check absolute maximum nlevels for each btree type
  2021-09-24  1:27 [PATCHSET RFC v2 chandan 0/4] xfs: separate btree cursor slab zones Darrick J. Wong
  2021-09-24  1:27 ` [PATCH 1/4] xfs: remove xfs_btree_cur.bc_blocklog Darrick J. Wong
  2021-09-24  1:27 ` [PATCH 2/4] xfs: reduce the size of nr_ops for refcount btree cursors Darrick J. Wong
@ 2021-09-24  1:27 ` Darrick J. Wong
  2021-09-26  0:43   ` Dave Chinner
  2021-09-24  1:27 ` [PATCH 4/4] xfs: use separate btree cursor slab " Darrick J. Wong
  3 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2021-09-24  1:27 UTC (permalink / raw)
  To: djwong, chandan.babu, chandanrlinux, david; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add code for all five btree types so that we can compute the absolute
maximum possible btree height for each btree type, and then check that
none of them exceed XFS_BTREE_CUR_ZONE_MAXLEVELS.  The code to do the
actual checking is a little excessive, but it sets us up for per-type
cursor zones in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_alloc_btree.c    |   12 +++++++++
 fs/xfs/libxfs/xfs_alloc_btree.h    |    1 +
 fs/xfs/libxfs/xfs_bmap_btree.c     |   13 ++++++++++
 fs/xfs/libxfs/xfs_bmap_btree.h     |    1 +
 fs/xfs/libxfs/xfs_btree.c          |   48 ++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_btree.h          |   10 ++++++++
 fs/xfs/libxfs/xfs_fs.h             |    3 ++
 fs/xfs/libxfs/xfs_ialloc_btree.c   |   11 ++++++++
 fs/xfs/libxfs/xfs_ialloc_btree.h   |    1 +
 fs/xfs/libxfs/xfs_refcount_btree.c |   13 ++++++++++
 fs/xfs/libxfs/xfs_refcount_btree.h |    1 +
 fs/xfs/libxfs/xfs_rmap_btree.c     |   25 +++++++++++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.h     |    1 +
 fs/xfs/libxfs/xfs_types.h          |    3 ++
 fs/xfs/xfs_super.c                 |   35 ++++++++++++++++++++++++++
 15 files changed, 178 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index c644b11132f6..7f8612e383ed 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -582,6 +582,18 @@ xfs_allocbt_maxrecs(
 	return blocklen / (sizeof(xfs_alloc_key_t) + sizeof(xfs_alloc_ptr_t));
 }
 
+unsigned int
+xfs_allocbt_absolute_maxlevels(void)
+{
+	unsigned int		minrecs[2];
+
+	xfs_btree_absolute_minrecs(minrecs, 0, sizeof(xfs_alloc_rec_t),
+			sizeof(xfs_alloc_key_t) + sizeof(xfs_alloc_ptr_t));
+
+	return xfs_btree_compute_maxlevels(minrecs,
+			(XFS_MAX_AG_BLOCKS + 1) / 2);
+}
+
 /* Calculate the freespace btree size for some records. */
 xfs_extlen_t
 xfs_allocbt_calc_size(
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.h b/fs/xfs/libxfs/xfs_alloc_btree.h
index 2f6b816aaf9f..0d5b9aeb78b2 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.h
+++ b/fs/xfs/libxfs/xfs_alloc_btree.h
@@ -59,5 +59,6 @@ extern xfs_extlen_t xfs_allocbt_calc_size(struct xfs_mount *mp,
 
 void xfs_allocbt_commit_staged_btree(struct xfs_btree_cur *cur,
 		struct xfs_trans *tp, struct xfs_buf *agbp);
+unsigned int xfs_allocbt_absolute_maxlevels(void);
 
 #endif	/* __XFS_ALLOC_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index a06987e36db5..976a324460b4 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -586,6 +586,19 @@ xfs_bmbt_maxrecs(
 	return blocklen / (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t));
 }
 
+unsigned int
+xfs_bmbt_absolute_maxlevels(void)
+{
+	unsigned int		minrecs[2];
+
+	xfs_btree_absolute_minrecs(minrecs, XFS_BTREE_LONG_PTRS,
+			sizeof(struct xfs_bmbt_rec),
+			sizeof(struct xfs_bmbt_key) +
+				sizeof(xfs_bmbt_ptr_t));
+
+	return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
+}
+
 /*
  * Calculate number of records in a bmap btree inode root.
  */
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.h b/fs/xfs/libxfs/xfs_bmap_btree.h
index 729e3bc569be..1c5168947b96 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.h
+++ b/fs/xfs/libxfs/xfs_bmap_btree.h
@@ -109,5 +109,6 @@ extern struct xfs_btree_cur *xfs_bmbt_init_cursor(struct xfs_mount *,
 
 extern unsigned long long xfs_bmbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
+unsigned int xfs_bmbt_absolute_maxlevels(void);
 
 #endif	/* __XFS_BMAP_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 619319ff41e5..120280c998f8 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -27,6 +27,13 @@
  * Cursor allocation zone.
  */
 kmem_zone_t	*xfs_btree_cur_zone;
+struct xfs_btree_cur_zone xfs_btree_cur_zones[XFS_BTNUM_MAX] = {
+	[XFS_BTNUM_BNO]		= { .name = "xfs_alloc_btree_cur" },
+	[XFS_BTNUM_INO]		= { .name = "xfs_ialloc_btree_cur" },
+	[XFS_BTNUM_RMAP]	= { .name = "xfs_rmap_btree_cur" },
+	[XFS_BTNUM_REFC]	= { .name = "xfs_refc_btree_cur" },
+	[XFS_BTNUM_BMAP]	= { .name = "xfs_bmap_btree_cur" },
+};
 
 /*
  * Btree magic numbers.
@@ -5027,3 +5034,44 @@ xfs_btree_alloc_cursor(
 
 	return cur;
 }
+
+/*
+ * Compute absolute minrecs for leaf and node btree blocks.  Callers should set
+ * BTREE_LONG_PTRS and BTREE_OVERLAPPING as they would for regular cursors.
+ * Set BTREE_CRC_BLOCKS if the btree type is supported on V5 or newer
+ * filesystems.
+ */
+void
+xfs_btree_absolute_minrecs(
+	unsigned int		*minrecs,
+	unsigned int		bc_flags,
+	unsigned int		leaf_recbytes,
+	unsigned int		node_recbytes)
+{
+	unsigned int		min_recbytes;
+
+	/*
+	 * If this btree type is supported on V4, we use the smaller V4 min
+	 * block size along with the V4 header size.  If the btree type is only
+	 * supported on V5, use the (twice as large) V5 min block size along
+	 * with the V5 header size.
+	 */
+	if (!(bc_flags & XFS_BTREE_CRC_BLOCKS)) {
+		if (bc_flags & XFS_BTREE_LONG_PTRS)
+			min_recbytes = XFS_MIN_BLOCKSIZE -
+							XFS_BTREE_LBLOCK_LEN;
+		else
+			min_recbytes = XFS_MIN_BLOCKSIZE -
+							XFS_BTREE_SBLOCK_LEN;
+	} else if (bc_flags & XFS_BTREE_LONG_PTRS) {
+		min_recbytes = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_LBLOCK_CRC_LEN;
+	} else {
+		min_recbytes = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_SBLOCK_CRC_LEN;
+	}
+
+	if (bc_flags & XFS_BTREE_OVERLAPPING)
+		node_recbytes <<= 1;
+
+	minrecs[0] = min_recbytes / (2 * leaf_recbytes);
+	minrecs[1] = min_recbytes / (2 * node_recbytes);
+}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 57b7aa3f6366..5bebd26f8b2c 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -583,4 +583,14 @@ struct xfs_btree_cur *xfs_btree_alloc_cursor(struct xfs_mount *mp,
 		struct xfs_trans *tp, xfs_btnum_t btnum);
 unsigned int xfs_btree_maxlevels(struct xfs_mount *mp, xfs_btnum_t btnum);
 
+void xfs_btree_absolute_minrecs(unsigned int *minrecs, unsigned int bc_flags,
+		unsigned int leaf_recbytes, unsigned int node_recbytes);
+
+struct xfs_btree_cur_zone {
+	const char		*name;
+	unsigned int		maxlevels;
+};
+
+extern struct xfs_btree_cur_zone xfs_btree_cur_zones[XFS_BTNUM_MAX];
+
 #endif	/* __XFS_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index bde2b4c64dbe..33d323ddf6da 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -268,6 +268,9 @@ typedef struct xfs_fsop_resblks {
  */
 #define XFS_MIN_AG_BYTES	(1ULL << 24)	/* 16 MB */
 #define XFS_MAX_AG_BYTES	(1ULL << 40)	/* 1 TB */
+#define XFS_MAX_AG_BLOCKS	(XFS_MAX_AG_BYTES / XFS_MIN_BLOCKSIZE)
+#define XFS_MAX_CRC_AG_BLOCKS	(XFS_MAX_AG_BYTES / XFS_MIN_CRC_BLOCKSIZE)
+#define XFS_MAX_AG_INODES	(XFS_MAX_AG_BYTES / XFS_DINODE_MIN_SIZE)
 
 /* keep the maximum size under 2^31 by a small amount */
 #define XFS_MAX_LOG_BYTES \
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index c8fea6a464d5..ce428c98e7c4 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -541,6 +541,17 @@ xfs_inobt_maxrecs(
 	return blocklen / (sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
 }
 
+unsigned int
+xfs_inobt_absolute_maxlevels(void)
+{
+	unsigned int		minrecs[2];
+
+	xfs_btree_absolute_minrecs(minrecs, 0, sizeof(xfs_inobt_rec_t),
+			sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
+
+	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_AG_INODES);
+}
+
 /*
  * Convert the inode record holemask to an inode allocation bitmap. The inode
  * allocation bitmap is inode granularity and specifies whether an inode is
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
index 8a322d402e61..e29570b4d3b6 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -74,5 +74,6 @@ int xfs_inobt_cur(struct xfs_mount *mp, struct xfs_trans *tp,
 
 void xfs_inobt_commit_staged_btree(struct xfs_btree_cur *cur,
 		struct xfs_trans *tp, struct xfs_buf *agbp);
+unsigned int xfs_inobt_absolute_maxlevels(void);
 
 #endif	/* __XFS_IALLOC_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index 48c45e31d897..9d428c3d4d90 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -408,6 +408,19 @@ xfs_refcountbt_maxrecs(
 			   sizeof(xfs_refcount_ptr_t));
 }
 
+unsigned int
+xfs_refcountbt_absolute_maxlevels(void)
+{
+	unsigned int		minrecs[2];
+
+	xfs_btree_absolute_minrecs(minrecs, XFS_BTREE_CRC_BLOCKS,
+			sizeof(struct xfs_refcount_rec),
+			sizeof(struct xfs_refcount_key) +
+						sizeof(xfs_refcount_ptr_t));
+
+	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_CRC_AG_BLOCKS);
+}
+
 /* Compute the maximum height of a refcount btree. */
 void
 xfs_refcountbt_compute_maxlevels(
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h
index bd9ed9e1e41f..d7dad2dbac09 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.h
+++ b/fs/xfs/libxfs/xfs_refcount_btree.h
@@ -64,5 +64,6 @@ extern int xfs_refcountbt_calc_reserves(struct xfs_mount *mp,
 
 void xfs_refcountbt_commit_staged_btree(struct xfs_btree_cur *cur,
 		struct xfs_trans *tp, struct xfs_buf *agbp);
+unsigned int xfs_refcountbt_absolute_maxlevels(void);
 
 #endif	/* __XFS_REFCOUNT_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 85caeb14e4db..d73768c7546f 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -534,6 +534,31 @@ xfs_rmapbt_maxrecs(
 		(2 * sizeof(struct xfs_rmap_key) + sizeof(xfs_rmap_ptr_t));
 }
 
+unsigned int
+xfs_rmapbt_absolute_maxlevels(void)
+{
+	unsigned int		minrecs[2];
+
+	xfs_btree_absolute_minrecs(minrecs,
+			XFS_BTREE_CRC_BLOCKS | XFS_BTREE_OVERLAPPING,
+			sizeof(struct xfs_rmap_rec),
+			sizeof(struct xfs_rmap_key) + sizeof(xfs_rmap_ptr_t));
+
+	/*
+	 * Compute the asymptotic maxlevels for an rmapbt on any reflink fs.
+	 *
+	 * On a reflink filesystem, each AG block can have up to 2^32 (per the
+	 * refcount record format) owners, which means that theoretically we
+	 * could face up to 2^64 rmap records.  However, we're likely to run
+	 * out of blocks in the AG long before that happens, which means that
+	 * we must compute the max height based on what the btree will look
+	 * like if it consumes almost all the blocks in the AG due to maximal
+	 * sharing factor.
+	 */
+	return xfs_btree_compute_maxlevels_size(XFS_MAX_CRC_AG_BLOCKS,
+			minrecs[1]);
+}
+
 /* Compute the maximum height of an rmap btree. */
 unsigned int
 xfs_rmapbt_compute_maxlevels(
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index 5aaecf755abd..6d5001e92966 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -58,5 +58,6 @@ extern xfs_extlen_t xfs_rmapbt_max_size(struct xfs_mount *mp,
 
 extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp, struct xfs_trans *tp,
 		struct xfs_perag *pag, xfs_extlen_t *ask, xfs_extlen_t *used);
+unsigned int xfs_rmapbt_absolute_maxlevels(void);
 
 #endif /* __XFS_RMAP_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index b6da06b40989..6ce15a4f9f47 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -126,6 +126,9 @@ typedef enum {
 	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_MAX
 } xfs_btnum_t;
 
+#define for_each_xfs_btnum(btnum) \
+	for((btnum) = XFS_BTNUM_BNOi; (btnum) < XFS_BTNUM_MAX; (btnum)++)
+
 #define XFS_BTNUM_STRINGS \
 	{ XFS_BTNUM_BNOi,	"bnobt" }, \
 	{ XFS_BTNUM_CNTi,	"cntbt" }, \
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 90c92a6a49e0..83abe9bfe3ef 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -37,6 +37,12 @@
 #include "xfs_reflink.h"
 #include "xfs_pwork.h"
 #include "xfs_ag.h"
+#include "xfs_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_refcount_btree.h"
 
 #include <linux/magic.h>
 #include <linux/fs_context.h>
@@ -1950,9 +1956,34 @@ static struct file_system_type xfs_fs_type = {
 };
 MODULE_ALIAS_FS("xfs");
 
+STATIC int __init
+xfs_init_btree_zones(void)
+{
+	struct xfs_btree_cur_zone	*z = xfs_btree_cur_zones;
+	xfs_btnum_t			btnum;
+
+	z[XFS_BTNUM_BNO].maxlevels	= xfs_allocbt_absolute_maxlevels();
+	z[XFS_BTNUM_RMAP].maxlevels	= xfs_rmapbt_absolute_maxlevels();
+	z[XFS_BTNUM_BMAP].maxlevels	= xfs_bmbt_absolute_maxlevels();
+	z[XFS_BTNUM_INO].maxlevels	= xfs_inobt_absolute_maxlevels();
+	z[XFS_BTNUM_REFC].maxlevels	= xfs_refcountbt_absolute_maxlevels();
+
+	for_each_xfs_btnum(btnum) {
+		ASSERT(z[btnum].maxlevels <= XFS_BTREE_CUR_ZONE_MAXLEVELS);
+	}
+
+	/* struct copies for btree types that share zones */
+	z[XFS_BTNUM_CNT] = z[XFS_BTNUM_BNO];
+	z[XFS_BTNUM_FINO] = z[XFS_BTNUM_INO];
+
+	return 0;
+}
+
 STATIC int __init
 xfs_init_zones(void)
 {
+	int			error;
+
 	xfs_log_ticket_zone = kmem_cache_create("xfs_log_ticket",
 						sizeof(struct xlog_ticket),
 						0, 0, NULL);
@@ -1965,6 +1996,10 @@ xfs_init_zones(void)
 	if (!xfs_bmap_free_item_zone)
 		goto out_destroy_log_ticket_zone;
 
+	error = xfs_init_btree_zones();
+	if (error)
+		goto out_destroy_bmap_free_item_zone;
+
 	xfs_btree_cur_zone = kmem_cache_create("xfs_btree_cur",
 			xfs_btree_cur_sizeof(XFS_BTREE_CUR_ZONE_MAXLEVELS),
 			0, 0, NULL);



* [PATCH 4/4] xfs: use separate btree cursor slab for each btree type
  2021-09-24  1:27 [PATCHSET RFC v2 chandan 0/4] xfs: separate btree cursor slab zones Darrick J. Wong
                   ` (2 preceding siblings ...)
  2021-09-24  1:27 ` [PATCH 3/4] xfs: check absolute maximum nlevels for each btree type Darrick J. Wong
@ 2021-09-24  1:27 ` Darrick J. Wong
  2021-09-26  0:47   ` Dave Chinner
  3 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2021-09-24  1:27 UTC (permalink / raw)
  To: djwong, chandan.babu, chandanrlinux, david; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we have the infrastructure to track the max possible height of
each btree type, we can create a separate slab zone for cursors of each
type of btree.  For smaller indices like the free space btrees, this
means that we can pack more cursors into a slab page, improving slab
utilization.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree.c |   12 ++++++------
 fs/xfs/libxfs/xfs_btree.h |    9 +--------
 fs/xfs/xfs_super.c        |   33 ++++++++++++++++++++++++---------
 3 files changed, 31 insertions(+), 23 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 120280c998f8..3131de9ae631 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -26,7 +26,6 @@
 /*
  * Cursor allocation zone.
  */
-kmem_zone_t	*xfs_btree_cur_zone;
 struct xfs_btree_cur_zone xfs_btree_cur_zones[XFS_BTNUM_MAX] = {
 	[XFS_BTNUM_BNO]		= { .name = "xfs_alloc_btree_cur" },
 	[XFS_BTNUM_INO]		= { .name = "xfs_ialloc_btree_cur" },
@@ -364,6 +363,7 @@ xfs_btree_del_cursor(
 	struct xfs_btree_cur	*cur,		/* btree cursor */
 	int			error)		/* del because of error */
 {
+	struct xfs_btree_cur_zone *bczone = &xfs_btree_cur_zones[cur->bc_btnum];
 	int			i;		/* btree level */
 
 	/*
@@ -386,10 +386,10 @@ xfs_btree_del_cursor(
 		kmem_free(cur->bc_ops);
 	if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && cur->bc_ag.pag)
 		xfs_perag_put(cur->bc_ag.pag);
-	if (cur->bc_maxlevels > XFS_BTREE_CUR_ZONE_MAXLEVELS)
+	if (cur->bc_maxlevels > bczone->maxlevels)
 		kmem_free(cur);
 	else
-		kmem_cache_free(xfs_btree_cur_zone, cur);
+		kmem_cache_free(bczone->zone, cur);
 }
 
 /*
@@ -5021,12 +5021,12 @@ xfs_btree_alloc_cursor(
 {
 	struct xfs_btree_cur	*cur;
 	unsigned int		maxlevels = xfs_btree_maxlevels(mp, btnum);
+	struct xfs_btree_cur_zone *bczone = &xfs_btree_cur_zones[btnum];
 
-	if (maxlevels > XFS_BTREE_CUR_ZONE_MAXLEVELS)
+	if (maxlevels > bczone->maxlevels)
 		cur = kmem_zalloc(xfs_btree_cur_sizeof(maxlevels), KM_NOFS);
 	else
-		cur = kmem_cache_zalloc(xfs_btree_cur_zone,
-				GFP_NOFS | __GFP_NOFAIL);
+		cur = kmem_cache_zalloc(bczone->zone, GFP_NOFS | __GFP_NOFAIL);
 	cur->bc_tp = tp;
 	cur->bc_mp = mp;
 	cur->bc_btnum = btnum;
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 5bebd26f8b2c..854235a8adac 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -13,8 +13,6 @@ struct xfs_trans;
 struct xfs_ifork;
 struct xfs_perag;
 
-extern kmem_zone_t	*xfs_btree_cur_zone;
-
 /*
  * Generic key, ptr and record wrapper structures.
  *
@@ -92,12 +90,6 @@ uint32_t xfs_btree_magic(int crc, xfs_btnum_t btnum);
 #define XFS_BTREE_STATS_ADD(cur, stat, val)	\
 	XFS_STATS_ADD_OFF((cur)->bc_mp, (cur)->bc_statoff + __XBTS_ ## stat, val)
 
-/*
- * The btree cursor zone hands out cursors that can handle up to this many
- * levels.  This is the known maximum for all btree types.
- */
-#define XFS_BTREE_CUR_ZONE_MAXLEVELS	(9)
-
 struct xfs_btree_ops {
 	/* size of the key and record structures */
 	size_t	key_len;
@@ -588,6 +580,7 @@ void xfs_btree_absolute_minrecs(unsigned int *minrecs, unsigned int bc_flags,
 
 struct xfs_btree_cur_zone {
 	const char		*name;
+	kmem_zone_t		*zone;
 	unsigned int		maxlevels;
 };
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 83abe9bfe3ef..426dcba213ac 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1969,7 +1969,14 @@ xfs_init_btree_zones(void)
 	z[XFS_BTNUM_REFC].maxlevels	= xfs_refcountbt_absolute_maxlevels();
 
 	for_each_xfs_btnum(btnum) {
-		ASSERT(z[btnum].maxlevels <= XFS_BTREE_CUR_ZONE_MAXLEVELS);
+		if (z[btnum].name == NULL)
+			continue;
+		ASSERT(z[btnum].maxlevels >= 1);
+		z[btnum].zone = kmem_cache_create(z[btnum].name,
+				xfs_btree_cur_sizeof(z[btnum].maxlevels),
+				0, 0, NULL);
+		if (!z[btnum].zone)
+			return -ENOMEM;
 	}
 
 	/* struct copies for btree types that share zones */
@@ -1979,6 +1986,20 @@ xfs_init_btree_zones(void)
 	return 0;
 }
 
+STATIC void
+xfs_destroy_btree_zones(void)
+{
+	struct xfs_btree_cur_zone	*z = xfs_btree_cur_zones;
+	xfs_btnum_t			btnum;
+
+	/* don't free shared zones */
+	z[XFS_BTNUM_CNT].zone = NULL;
+	z[XFS_BTNUM_FINO].zone = NULL;
+
+	for_each_xfs_btnum(btnum)
+		kmem_cache_destroy(z[btnum].zone);
+}
+
 STATIC int __init
 xfs_init_zones(void)
 {
@@ -2000,12 +2021,6 @@ xfs_init_zones(void)
 	if (error)
 		goto out_destroy_bmap_free_item_zone;
 
-	xfs_btree_cur_zone = kmem_cache_create("xfs_btree_cur",
-			xfs_btree_cur_sizeof(XFS_BTREE_CUR_ZONE_MAXLEVELS),
-			0, 0, NULL);
-	if (!xfs_btree_cur_zone)
-		goto out_destroy_bmap_free_item_zone;
-
 	xfs_da_state_zone = kmem_cache_create("xfs_da_state",
 					      sizeof(struct xfs_da_state),
 					      0, 0, NULL);
@@ -2141,7 +2156,7 @@ xfs_init_zones(void)
  out_destroy_da_state_zone:
 	kmem_cache_destroy(xfs_da_state_zone);
  out_destroy_btree_cur_zone:
-	kmem_cache_destroy(xfs_btree_cur_zone);
+	xfs_destroy_btree_zones();
  out_destroy_bmap_free_item_zone:
 	kmem_cache_destroy(xfs_bmap_free_item_zone);
  out_destroy_log_ticket_zone:
@@ -2173,7 +2188,7 @@ xfs_destroy_zones(void)
 	kmem_cache_destroy(xfs_trans_zone);
 	kmem_cache_destroy(xfs_ifork_zone);
 	kmem_cache_destroy(xfs_da_state_zone);
-	kmem_cache_destroy(xfs_btree_cur_zone);
+	xfs_destroy_btree_zones();
 	kmem_cache_destroy(xfs_bmap_free_item_zone);
 	kmem_cache_destroy(xfs_log_ticket_zone);
 }



* Re: [PATCH 3/4] xfs: check absolute maximum nlevels for each btree type
  2021-09-24  1:27 ` [PATCH 3/4] xfs: check absolute maximum nlevels for each btree type Darrick J. Wong
@ 2021-09-26  0:43   ` Dave Chinner
  2021-09-27 18:17     ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2021-09-26  0:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: chandan.babu, chandanrlinux, linux-xfs

On Thu, Sep 23, 2021 at 06:27:54PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Add code for all five btree types so that we can compute the absolute
> maximum possible btree height for each btree type, and then check that
> none of them exceed XFS_BTREE_CUR_ZONE_MAXLEVELS.  The code to do the
> actual checking is a little excessive, but it sets us up for per-type
> cursor zones in the next patch.

Ok, I think the cursor "zone" array is the wrong approach here.

First of all - can we stop using the term "zone" for new code?
That's the old Irix terminology for slab caches, and we have been
moving away from that to the Linux "kmem_cache" terminology and
types for quite some time.

AFAICT, the only reason for having the zone array is so that
xfs_btree_alloc_cursor() can do a lookup via btnum into the array to
get the maxlevels and kmem cache pointer to allocate from.

Given that we've just called into xfs_btree_alloc_cursor() from the
specific btree type we are allocating the cursor for (that's where
we got btnum from!), we should just be passing these type specific
variables directly from the caller like we do for btnum. That gets
rid of the need for the zone array completely....

i.e. I don't see why the per-type cache information needs to be
global information. The individual max-level calculations could just
be individual kmem_cache_alloc() calls to set locally defined (i.e.
static global) cache pointers and max size variables.

> diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> index c8fea6a464d5..ce428c98e7c4 100644
> --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> @@ -541,6 +541,17 @@ xfs_inobt_maxrecs(
>  	return blocklen / (sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
>  }
>  
> +unsigned int
> +xfs_inobt_absolute_maxlevels(void)
> +{
> +	unsigned int		minrecs[2];
> +
> +	xfs_btree_absolute_minrecs(minrecs, 0, sizeof(xfs_inobt_rec_t),
> +			sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
> +
> +	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_AG_INODES);
> +}

i.e. rather than returning the size here, we do:

static int xfs_inobt_maxlevels;
static struct kmem_cache *xfs_inobt_cursor_cache;

int __init
xfs_inobt_create_cursor_cache(void)
{
	unsigned int		minrecs[2];

	xfs_btree_absolute_minrecs(minrecs, 0, sizeof(xfs_inobt_rec_t),
			sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
	xfs_inobt_maxlevels = xfs_btree_compute_maxlevels(minrecs,
			XFS_MAX_AG_INODES);
	xfs_inobt_cursor_cache = kmem_cache_create("xfs_inobt_cur",
			xfs_btree_cur_sizeof(xfs_inobt_maxlevels),
			0, 0, NULL);
	if (!xfs_inobt_cursor_cache)
		return -ENOMEM;
	return 0;
}

void
xfs_inobt_destroy_cursor_cache(void)
{
	kmem_cache_destroy(xfs_inobt_cursor_cache);
}

and nothing outside fs/xfs/libxfs/xfs_ialloc_btree.c ever needs to
know about these variables as they only ever feed into
xfs_btree_alloc_cursor() from xfs_inobt_init_common().

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 4/4] xfs: use separate btree cursor slab for each btree type
  2021-09-24  1:27 ` [PATCH 4/4] xfs: use separate btree cursor slab " Darrick J. Wong
@ 2021-09-26  0:47   ` Dave Chinner
  2021-09-27 18:21     ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2021-09-26  0:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: chandan.babu, chandanrlinux, linux-xfs

On Thu, Sep 23, 2021 at 06:27:59PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Now that we have the infrastructure to track the max possible height of
> each btree type, we can create a separate slab zone for cursors of each
> type of btree.  For smaller indices like the free space btrees, this
> means that we can pack more cursors into a slab page, improving slab
> utilization.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_btree.c |   12 ++++++------
>  fs/xfs/libxfs/xfs_btree.h |    9 +--------
>  fs/xfs/xfs_super.c        |   33 ++++++++++++++++++++++++---------
>  3 files changed, 31 insertions(+), 23 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 120280c998f8..3131de9ae631 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -26,7 +26,6 @@
>  /*
>   * Cursor allocation zone.
>   */
> -kmem_zone_t	*xfs_btree_cur_zone;
>  struct xfs_btree_cur_zone xfs_btree_cur_zones[XFS_BTNUM_MAX] = {
>  	[XFS_BTNUM_BNO]		= { .name = "xfs_alloc_btree_cur" },
>  	[XFS_BTNUM_INO]		= { .name = "xfs_ialloc_btree_cur" },
> @@ -364,6 +363,7 @@ xfs_btree_del_cursor(
>  	struct xfs_btree_cur	*cur,		/* btree cursor */
>  	int			error)		/* del because of error */
>  {
> +	struct xfs_btree_cur_zone *bczone = &xfs_btree_cur_zones[cur->bc_btnum];
>  	int			i;		/* btree level */
>  
>  	/*
> @@ -386,10 +386,10 @@ xfs_btree_del_cursor(
>  		kmem_free(cur->bc_ops);
>  	if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && cur->bc_ag.pag)
>  		xfs_perag_put(cur->bc_ag.pag);
> -	if (cur->bc_maxlevels > XFS_BTREE_CUR_ZONE_MAXLEVELS)
> +	if (cur->bc_maxlevels > bczone->maxlevels)
>  		kmem_free(cur);
>  	else
> -		kmem_cache_free(xfs_btree_cur_zone, cur);
> +		kmem_cache_free(bczone->zone, cur);
>  }
>  
>  /*
> @@ -5021,12 +5021,12 @@ xfs_btree_alloc_cursor(
>  {
>  	struct xfs_btree_cur	*cur;
>  	unsigned int		maxlevels = xfs_btree_maxlevels(mp, btnum);
> +	struct xfs_btree_cur_zone *bczone = &xfs_btree_cur_zones[btnum];
>  
> -	if (maxlevels > XFS_BTREE_CUR_ZONE_MAXLEVELS)
> +	if (maxlevels > bczone->maxlevels)
>  		cur = kmem_zalloc(xfs_btree_cur_sizeof(maxlevels), KM_NOFS);
>  	else
> -		cur = kmem_cache_zalloc(xfs_btree_cur_zone,
> -				GFP_NOFS | __GFP_NOFAIL);
> +		cur = kmem_cache_zalloc(bczone->zone, GFP_NOFS | __GFP_NOFAIL);

When will maxlevels ever be greater than bczone->maxlevels? Isn't
the bczone->maxlevels case always supposed to be the tallest
possible height for that btree?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 3/4] xfs: check absolute maximum nlevels for each btree type
  2021-09-26  0:43   ` Dave Chinner
@ 2021-09-27 18:17     ` Darrick J. Wong
  2021-09-27 20:29       ` Darrick J. Wong
  2021-09-27 21:57       ` Dave Chinner
  0 siblings, 2 replies; 13+ messages in thread
From: Darrick J. Wong @ 2021-09-27 18:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chandan.babu, chandanrlinux, linux-xfs

On Sun, Sep 26, 2021 at 10:43:43AM +1000, Dave Chinner wrote:
> On Thu, Sep 23, 2021 at 06:27:54PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Add code for all five btree types so that we can compute the absolute
> > maximum possible btree height for each btree type, and then check that
> > none of them exceed XFS_BTREE_CUR_ZONE_MAXLEVELS.  The code to do the
> > actual checking is a little excessive, but it sets us up for per-type
> > cursor zones in the next patch.
> 
> Ok, I think the cursor "zone" array is the wrong approach here.
> 
> First of all - can we stop using the term "zone" for new code?
> That's the old irix terminolgy for slab caches, and we have been
> moving away from that to the Linux "kmem_cache" terminology and
> types for quite some time.
> 
> AFAICT, the only reason for having the zone array is so that
> xfs_btree_alloc_cursor() can do a lookup via btnum into the array to
> get the maxlevels and kmem cache pointer to allocate from.
> 
> Given that we've just called into xfs_btree_alloc_cursor() from the
> specific btree type we are allocating the cursor for (that's where
> we got btnum from!), we should just be passing these type specific
> variables directly from the caller like we do for btnum. That gets
> rid of the need for the zone array completely....
> 
> i.e. I don't see why the per-type cache information needs to be
> global information. The individual max-level calculations could just
> be individual kmem_cache_alloc() calls to set locally defined (i.e.
> static global) cache pointers and max size variables.

If the cache is a static variable inside xfs_fubar_btree.c, how do you
know which cache to pass to kmem_cache_free in xfs_btree_del_cursor?
Does this imply adding per-btree del_cursor functions and refactoring
the entire codebase to use them?

I was /trying/ to get a dependent patchset ready so that Chandan could
submit the extent counters patchset for 5.16, not trigger a refactoring
of a whole ton of btree code.  If you want to hide the information that
badly, please take over this patchset and solve both the above problem
and the one below.

> > diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > index c8fea6a464d5..ce428c98e7c4 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> > +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > @@ -541,6 +541,17 @@ xfs_inobt_maxrecs(
> >  	return blocklen / (sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
> >  }
> >  
> > +unsigned int
> > +xfs_inobt_absolute_maxlevels(void)
> > +{
> > +	unsigned int		minrecs[2];
> > +
> > +	xfs_btree_absolute_minrecs(minrecs, 0, sizeof(xfs_inobt_rec_t),
> > +			sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
> > +
> > +	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_AG_INODES);
> > +}
> 
> i.e. rather than returning the size here, we do:
> 
> static int xfs_inobt_maxlevels;
> static struct kmem_cache *xfs_inobt_cursor_cache;
> 
> int __init
> xfs_inobt_create_cursor_cache(void)
> {
> 	unsigned int		minrecs[2];
> 
> 	xfs_btree_absolute_minrecs(minrecs, 0, sizeof(xfs_inobt_rec_t),
> 			sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
> 	xfs_inobt_maxlevels = xfs_btree_compute_maxlevels(minrecs,
> 			XFS_MAX_AG_INODES);

Something you couldn't have seen here is that the xfsprogs port contains
an addition to the xfs_db btheight switch to print these absolute maxima
so that we won't have to compute them by hand anymore.

Maybe I should have noted both of these points in the commit message?
Though I've also been chided for submitting excessive comments in the
past, which is why I didn't.

--D

> 	xfs_inobt_cursor_cache = kmem_cache_create("xfs_inobt_cur",
> 			xfs_btree_cur_sizeof(xfs_inobt_maxlevels),
> 			0, 0, NULL);
> 	if (!xfs_inobt_cursor_cache)
> 		return -ENOMEM;
> 	return 0;
> }
> 
> void
> xfs_inobt_destroy_cursor_cache(void)
> {
> 	kmem_cache_destroy(xfs_inobt_cursor_cache);
> }
> 
> and nothing outside fs/xfs/libxfs/xfs_ialloc_btree.c ever needs to
> know about these variables as they only ever feed into
> xfs_btree_alloc_cursor() from xfs_inobt_init_common().
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: [PATCH 4/4] xfs: use separate btree cursor slab for each btree type
  2021-09-26  0:47   ` Dave Chinner
@ 2021-09-27 18:21     ` Darrick J. Wong
  2021-09-27 22:01       ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2021-09-27 18:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chandan.babu, chandanrlinux, linux-xfs

On Sun, Sep 26, 2021 at 10:47:21AM +1000, Dave Chinner wrote:
> On Thu, Sep 23, 2021 at 06:27:59PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Now that we have the infrastructure to track the max possible height of
> > each btree type, we can create a separate slab zone for cursors of each
> > type of btree.  For smaller indices like the free space btrees, this
> > means that we can pack more cursors into a slab page, improving slab
> > utilization.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_btree.c |   12 ++++++------
> >  fs/xfs/libxfs/xfs_btree.h |    9 +--------
> >  fs/xfs/xfs_super.c        |   33 ++++++++++++++++++++++++---------
> >  3 files changed, 31 insertions(+), 23 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 120280c998f8..3131de9ae631 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -26,7 +26,6 @@
> >  /*
> >   * Cursor allocation zone.
> >   */
> > -kmem_zone_t	*xfs_btree_cur_zone;
> >  struct xfs_btree_cur_zone xfs_btree_cur_zones[XFS_BTNUM_MAX] = {
> >  	[XFS_BTNUM_BNO]		= { .name = "xfs_alloc_btree_cur" },
> >  	[XFS_BTNUM_INO]		= { .name = "xfs_ialloc_btree_cur" },
> > @@ -364,6 +363,7 @@ xfs_btree_del_cursor(
> >  	struct xfs_btree_cur	*cur,		/* btree cursor */
> >  	int			error)		/* del because of error */
> >  {
> > +	struct xfs_btree_cur_zone *bczone = &xfs_btree_cur_zones[cur->bc_btnum];
> >  	int			i;		/* btree level */
> >  
> >  	/*
> > @@ -386,10 +386,10 @@ xfs_btree_del_cursor(
> >  		kmem_free(cur->bc_ops);
> >  	if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && cur->bc_ag.pag)
> >  		xfs_perag_put(cur->bc_ag.pag);
> > -	if (cur->bc_maxlevels > XFS_BTREE_CUR_ZONE_MAXLEVELS)
> > +	if (cur->bc_maxlevels > bczone->maxlevels)
> >  		kmem_free(cur);
> >  	else
> > -		kmem_cache_free(xfs_btree_cur_zone, cur);
> > +		kmem_cache_free(bczone->zone, cur);
> >  }
> >  
> >  /*
> > @@ -5021,12 +5021,12 @@ xfs_btree_alloc_cursor(
> >  {
> >  	struct xfs_btree_cur	*cur;
> >  	unsigned int		maxlevels = xfs_btree_maxlevels(mp, btnum);
> > +	struct xfs_btree_cur_zone *bczone = &xfs_btree_cur_zones[btnum];
> >  
> > -	if (maxlevels > XFS_BTREE_CUR_ZONE_MAXLEVELS)
> > +	if (maxlevels > bczone->maxlevels)
> >  		cur = kmem_zalloc(xfs_btree_cur_sizeof(maxlevels), KM_NOFS);
> >  	else
> > -		cur = kmem_cache_zalloc(xfs_btree_cur_zone,
> > -				GFP_NOFS | __GFP_NOFAIL);
> > +		cur = kmem_cache_zalloc(bczone->zone, GFP_NOFS | __GFP_NOFAIL);
> 
> When will maxlevels ever be greater than bczone->maxlevels? Isn't
> the bczone->maxlevels case always supposed to be the tallest
> possible height for that btree?

It should never happen, provided that the maxlevels computation and
verification are all correct.  I thought it was important to leave the
heap allocation in here as a fallback, since the consequence for getting
the size calculations wrong is corrupt kernel memory.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: [PATCH 3/4] xfs: check absolute maximum nlevels for each btree type
  2021-09-27 18:17     ` Darrick J. Wong
@ 2021-09-27 20:29       ` Darrick J. Wong
  2021-09-27 21:57       ` Dave Chinner
  1 sibling, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2021-09-27 20:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chandan.babu, chandanrlinux, linux-xfs

On Mon, Sep 27, 2021 at 11:17:51AM -0700, Darrick J. Wong wrote:
> On Sun, Sep 26, 2021 at 10:43:43AM +1000, Dave Chinner wrote:
> > On Thu, Sep 23, 2021 at 06:27:54PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Add code for all five btree types so that we can compute the absolute
> > > maximum possible btree height for each btree type, and then check that
> > > none of them exceed XFS_BTREE_CUR_ZONE_MAXLEVELS.  The code to do the
> > > actual checking is a little excessive, but it sets us up for per-type
> > > cursor zones in the next patch.
> > 
> > Ok, I think the cursor "zone" array is the wrong approach here.
> > 
> > First of all - can we stop using the term "zone" for new code?
> > That's the old irix terminology for slab caches, and we have been
> > moving away from that to the Linux "kmem_cache" terminology and
> > types for quite some time.
> > 
> > AFAICT, the only reason for having the zone array is so that
> > xfs_btree_alloc_cursor() can do a lookup via btnum into the array to
> > get the maxlevels and kmem cache pointer to allocate from.
> > 
> > Given that we've just called into xfs_btree_alloc_cursor() from the
> > specific btree type we are allocating the cursor for (that's where
> > we got btnum from!), we should just be passing these type specific
> > variables directly from the caller like we do for btnum. That gets
> > rid of the need for the zone array completely....
> > 
> > i.e. I don't see why the per-type cache information needs to be
> > global information. The individual max-level calculations could just
> > be individual kmem_cache_alloc() calls to set locally defined (i.e.
> > static global) cache pointers and max size variables.
> 
> If the cache is a static variable inside xfs_fubar_btree.c, how do you
> know which cache to pass to kmem_cache_free in xfs_btree_del_cursor?
> Does this imply adding per-btree del_cursor functions and refactoring
> the entire codebase to use them?
> 
> I was /trying/ to get a dependent patchset ready so that Chandan could
> submit the extent counters patchset for 5.16, not trigger a refactoring
> of a whole ton of btree code.  If you want to hide the information that
> badly, please take over this patchset and solve both the above problem
> and the one below.

So of course 5 minutes after sending this grouchy message I notice that
everything you asked for can be done pretty easily by having each btree
type call a generic btree function that does:

int __init
xfs_btree_create_cursor_cache(xfs_btnum_t btnum, const char *name,
		unsigned int maxlevels)
{
	struct xfs_btree_cur_cache		*cc;

	cc = &xfs_btree_cur_caches[btnum];
	cc->name = name;
	cc->maxlevels = maxlevels;
	cc->alias = false;

	return 0;
}

and a similar destructor function to null all that out.

Then the xfs_db callout becomes:

unsigned int
xfs_btree_absolute_maxlevels(xfs_btnum_t btnum)
{
	return xfs_btree_cur_caches[btnum].maxlevels;
}

printf("rmap maxlevels %u\n",
		libxfs_btree_absolute_maxlevels(XFS_BTNUM_RMAP));

So yeah, I'll do that.

--D

> > > diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > index c8fea6a464d5..ce428c98e7c4 100644
> > > --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > @@ -541,6 +541,17 @@ xfs_inobt_maxrecs(
> > >  	return blocklen / (sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
> > >  }
> > >  
> > > +unsigned int
> > > +xfs_inobt_absolute_maxlevels(void)
> > > +{
> > > +	unsigned int		minrecs[2];
> > > +
> > > +	xfs_btree_absolute_minrecs(minrecs, 0, sizeof(xfs_inobt_rec_t),
> > > +			sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
> > > +
> > > +	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_AG_INODES);
> > > +}
> > 
> > i.e. rather than returning the size here, we do:
> > 
> > static int xfs_inobt_maxlevels;
> > static struct kmem_cache *xfs_inobt_cursor_cache;
> > 
> > int __init
> > xfs_inobt_create_cursor_cache(void)
> > {
> > 	unsigned int		minrecs[2];
> > 
> > 	xfs_btree_absolute_minrecs(minrecs, 0, sizeof(xfs_inobt_rec_t),
> > 			sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
> > 	xfs_inobt_maxlevels = xfs_btree_compute_maxlevels(minrecs,
> > 			XFS_MAX_AG_INODES);
> 
> Something you couldn't have seen here is that the xfsprogs port contains
> an addition to the xfs_db btheight switch to print these absolute maxima
> so that we won't have to compute them by hand anymore.
> 
> Maybe I should have noted both of these points in the commit message?
> Though I've also been chided for submitting excessive comments in the
> past, which is why I didn't.
> 
> --D
> 
> > 	xfs_inobt_cursor_cache = kmem_cache_create("xfs_inobt_cur",
> > 			xfs_btree_cur_sizeof(xfs_inobt_maxlevels),
> > 			0, 0, NULL);
> > 	if (!xfs_inobt_cursor_cache)
> > 		return -ENOMEM;
> > 	return 0;
> > }
> > 
> > void
> > xfs_inobt_destroy_cursor_cache(void)
> > {
> > 	kmem_cache_destroy(xfs_inobt_cursor_cache);
> > }
> > 
> > and nothing outside fs/xfs/libxfs/xfs_ialloc_btree.c ever needs to
> > know about these variables as they only ever feed into
> > xfs_btree_alloc_cursor() from xfs_inobt_init_common().
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com

* Re: [PATCH 3/4] xfs: check absolute maximum nlevels for each btree type
  2021-09-27 18:17     ` Darrick J. Wong
  2021-09-27 20:29       ` Darrick J. Wong
@ 2021-09-27 21:57       ` Dave Chinner
  1 sibling, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2021-09-27 21:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: chandan.babu, chandanrlinux, linux-xfs

On Mon, Sep 27, 2021 at 11:17:51AM -0700, Darrick J. Wong wrote:
> On Sun, Sep 26, 2021 at 10:43:43AM +1000, Dave Chinner wrote:
> > On Thu, Sep 23, 2021 at 06:27:54PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Add code for all five btree types so that we can compute the absolute
> > > maximum possible btree height for each btree type, and then check that
> > > none of them exceed XFS_BTREE_CUR_ZONE_MAXLEVELS.  The code to do the
> > > actual checking is a little excessive, but it sets us up for per-type
> > > cursor zones in the next patch.
> > 
> > Ok, I think the cursor "zone" array is the wrong approach here.
> > 
> > First of all - can we stop using the term "zone" for new code?
> > That's the old irix terminology for slab caches, and we have been
> > moving away from that to the Linux "kmem_cache" terminology and
> > types for quite some time.
> > 
> > AFAICT, the only reason for having the zone array is so that
> > xfs_btree_alloc_cursor() can do a lookup via btnum into the array to
> > get the maxlevels and kmem cache pointer to allocate from.
> > 
> > Given that we've just called into xfs_btree_alloc_cursor() from the
> > specific btree type we are allocating the cursor for (that's where
> > we got btnum from!), we should just be passing these type specific
> > variables directly from the caller like we do for btnum. That gets
> > rid of the need for the zone array completely....
> > 
> > i.e. I don't see why the per-type cache information needs to be
> > global information. The individual max-level calculations could just
> > be individual kmem_cache_alloc() calls to set locally defined (i.e.
> > static global) cache pointers and max size variables.
> 
> If the cache is a static variable inside xfs_fubar_btree.c, how do you
> know which cache to pass to kmem_cache_free in xfs_btree_del_cursor?
> Does this imply adding per-btree del_cursor functions and refactoring
> the entire codebase to use them?

Use a switch statement on cur->bc_btnum and a function call to free
the cursor? That's about the same CPU cost as a cacheline miss
when looking up the global array for the cache pointer.

The other obvious way of doing it is putting the cache pointer in
the cursor itself. Cost is 8 bytes per cursor, but you've just
reduced the size of most cursors by more than that so I see no big
deal in using some of that space. It also means we don't take a
cacheline miss just to look up the array that contains the pointer -
the cursor will be hot at this point in time....

<looks at kmem cache code>

Huh.

Both slab and slub keep the cache pointer in page->slab_cache. They
don't actually need (or use!) the cache pointer to be supplied -
kfree() would work just fine on kmem_cache allocated objects.

The problem is SLOB - it doesn't do this, and the kmem_cache
implementation is subtly different to the kmalloc implementation
and it doesn't store the slab cache on page->slab_cache. So to
support SLOB, we have to use kmem_cache_free().

How's this for a triple negative that makes you scratch your head,
but demonstrates that SLAB/SLUB can handle null cache pointers
correctly:

/*
 * Caller must not use kfree_bulk() on memory not originally allocated
 * by kmalloc(), because the SLOB allocator cannot handle this.
 */
static __always_inline void kfree_bulk(size_t size, void **p)
{
        kmem_cache_free_bulk(NULL, size, p);
}

Ok, so we can't completely optimise the cache pointer way (which
would be the best solution), so I'd be leaning towards a pointer
within the cursor..

> I was /trying/ to get a dependent patchset ready so that Chandan could
> submit the extent counters patchset for 5.16, not trigger a refactoring
> of a whole ton of btree code.  If you want to hide the information that
> badly, please take over this patchset and solve both the above problem
> and then one below.

I'm not trying to hide anything, just trying to keep the
implementation modular and not reliant on an explosion of one-off
global constructs that have to be repeatedly looked up on every
operation. Just keep a pointer to the zone in the cursor....

> > > diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > index c8fea6a464d5..ce428c98e7c4 100644
> > > --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > @@ -541,6 +541,17 @@ xfs_inobt_maxrecs(
> > >  	return blocklen / (sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
> > >  }
> > >  
> > > +unsigned int
> > > +xfs_inobt_absolute_maxlevels(void)
> > > +{
> > > +	unsigned int		minrecs[2];
> > > +
> > > +	xfs_btree_absolute_minrecs(minrecs, 0, sizeof(xfs_inobt_rec_t),
> > > +			sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
> > > +
> > > +	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_AG_INODES);
> > > +}
> > 
> > i.e. rather than returning the size here, we do:
> > 
> > static int xfs_inobt_maxlevels;
> > static struct kmem_cache *xfs_inobt_cursor_cache;
> > 
> > int __init
> > xfs_inobt_create_cursor_cache(void)
> > {
> > 	unsigned int		minrecs[2];
> > 
> > 	xfs_btree_absolute_minrecs(minrecs, 0, sizeof(xfs_inobt_rec_t),
> > 			sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
> > 	xfs_inobt_maxlevels = xfs_btree_compute_maxlevels(minrecs,
> > 			XFS_MAX_AG_INODES);
> 
> Something you couldn't have seen here is that the xfsprogs port contains
> an addition to the xfs_db btheight switch to print these absolute maxima
> so that we won't have to compute them by hand anymore.
>
> Maybe I should have noted both of these points in the commit message?
> Though I've also been chided for submitting excessive comments in the
> past, which is why I didn't.

Please err on the side of verbosity in commit messages - it's better
to document the obvious than it is to omit important design
criteria.  Anyone reading the code is not going to know what your
design constraints were if you don't document them. If you've
created lots of little functions to build a global table in the
kernel code "because userspace" needs it, make sure everyone knows
of that "because userspace" requirement...

I still don't like global tables like this we have to look up on
every cursor del operation. I'd much prefer the object carries the
state needed for generic code to perform operations on it
efficiently...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 4/4] xfs: use separate btree cursor slab for each btree type
  2021-09-27 18:21     ` Darrick J. Wong
@ 2021-09-27 22:01       ` Dave Chinner
  2021-10-12 18:49         ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2021-09-27 22:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: chandan.babu, chandanrlinux, linux-xfs

On Mon, Sep 27, 2021 at 11:21:22AM -0700, Darrick J. Wong wrote:
> On Sun, Sep 26, 2021 at 10:47:21AM +1000, Dave Chinner wrote:
> > On Thu, Sep 23, 2021 at 06:27:59PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Now that we have the infrastructure to track the max possible height of
> > > each btree type, we can create a separate slab zone for cursors of each
> > > type of btree.  For smaller indices like the free space btrees, this
> > > means that we can pack more cursors into a slab page, improving slab
> > > utilization.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  fs/xfs/libxfs/xfs_btree.c |   12 ++++++------
> > >  fs/xfs/libxfs/xfs_btree.h |    9 +--------
> > >  fs/xfs/xfs_super.c        |   33 ++++++++++++++++++++++++---------
> > >  3 files changed, 31 insertions(+), 23 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > > index 120280c998f8..3131de9ae631 100644
> > > --- a/fs/xfs/libxfs/xfs_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_btree.c
> > > @@ -26,7 +26,6 @@
> > >  /*
> > >   * Cursor allocation zone.
> > >   */
> > > -kmem_zone_t	*xfs_btree_cur_zone;
> > >  struct xfs_btree_cur_zone xfs_btree_cur_zones[XFS_BTNUM_MAX] = {
> > >  	[XFS_BTNUM_BNO]		= { .name = "xfs_alloc_btree_cur" },
> > >  	[XFS_BTNUM_INO]		= { .name = "xfs_ialloc_btree_cur" },
> > > @@ -364,6 +363,7 @@ xfs_btree_del_cursor(
> > >  	struct xfs_btree_cur	*cur,		/* btree cursor */
> > >  	int			error)		/* del because of error */
> > >  {
> > > +	struct xfs_btree_cur_zone *bczone = &xfs_btree_cur_zones[cur->bc_btnum];
> > >  	int			i;		/* btree level */
> > >  
> > >  	/*
> > > @@ -386,10 +386,10 @@ xfs_btree_del_cursor(
> > >  		kmem_free(cur->bc_ops);
> > >  	if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && cur->bc_ag.pag)
> > >  		xfs_perag_put(cur->bc_ag.pag);
> > > -	if (cur->bc_maxlevels > XFS_BTREE_CUR_ZONE_MAXLEVELS)
> > > +	if (cur->bc_maxlevels > bczone->maxlevels)
> > >  		kmem_free(cur);
> > >  	else
> > > -		kmem_cache_free(xfs_btree_cur_zone, cur);
> > > +		kmem_cache_free(bczone->zone, cur);
> > >  }
> > >  
> > >  /*
> > > @@ -5021,12 +5021,12 @@ xfs_btree_alloc_cursor(
> > >  {
> > >  	struct xfs_btree_cur	*cur;
> > >  	unsigned int		maxlevels = xfs_btree_maxlevels(mp, btnum);
> > > +	struct xfs_btree_cur_zone *bczone = &xfs_btree_cur_zones[btnum];
> > >  
> > > -	if (maxlevels > XFS_BTREE_CUR_ZONE_MAXLEVELS)
> > > +	if (maxlevels > bczone->maxlevels)
> > >  		cur = kmem_zalloc(xfs_btree_cur_sizeof(maxlevels), KM_NOFS);
> > >  	else
> > > -		cur = kmem_cache_zalloc(xfs_btree_cur_zone,
> > > -				GFP_NOFS | __GFP_NOFAIL);
> > > +		cur = kmem_cache_zalloc(bczone->zone, GFP_NOFS | __GFP_NOFAIL);
> > 
> > When will maxlevels ever be greater than bczone->maxlevels? Isn't
> > the bczone->maxlevels case always supposed to be the tallest
> > possible height for that btree?
> 
> It should never happen, provided that the maxlevels computation and
> verification are all correct.  I thought it was important to leave the
> heap allocation in here as a fallback, since the consequence for getting
> the size calculations wrong is corrupt kernel memory.

I think that this is the wrong approach. Static debug-only testing
of btree size calculations at init time is needed here, not runtime
fallbacks that hide the fact that we got fundamental calculations
wrong. A mistake here should be loud and obvious, not hidden away in
a fallback path that might never, ever be hit in the real world.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 4/4] xfs: use separate btree cursor slab for each btree type
  2021-09-27 22:01       ` Dave Chinner
@ 2021-10-12 18:49         ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2021-10-12 18:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chandan.babu, chandanrlinux, linux-xfs

On Tue, Sep 28, 2021 at 08:01:36AM +1000, Dave Chinner wrote:
> On Mon, Sep 27, 2021 at 11:21:22AM -0700, Darrick J. Wong wrote:
> > On Sun, Sep 26, 2021 at 10:47:21AM +1000, Dave Chinner wrote:
> > > On Thu, Sep 23, 2021 at 06:27:59PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Now that we have the infrastructure to track the max possible height of
> > > > each btree type, we can create a separate slab zone for cursors of each
> > > > type of btree.  For smaller indices like the free space btrees, this
> > > > means that we can pack more cursors into a slab page, improving slab
> > > > utilization.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_btree.c |   12 ++++++------
> > > >  fs/xfs/libxfs/xfs_btree.h |    9 +--------
> > > >  fs/xfs/xfs_super.c        |   33 ++++++++++++++++++++++++---------
> > > >  3 files changed, 31 insertions(+), 23 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > > > index 120280c998f8..3131de9ae631 100644
> > > > --- a/fs/xfs/libxfs/xfs_btree.c
> > > > +++ b/fs/xfs/libxfs/xfs_btree.c
> > > > @@ -26,7 +26,6 @@
> > > >  /*
> > > >   * Cursor allocation zone.
> > > >   */
> > > > -kmem_zone_t	*xfs_btree_cur_zone;
> > > >  struct xfs_btree_cur_zone xfs_btree_cur_zones[XFS_BTNUM_MAX] = {
> > > >  	[XFS_BTNUM_BNO]		= { .name = "xfs_alloc_btree_cur" },
> > > >  	[XFS_BTNUM_INO]		= { .name = "xfs_ialloc_btree_cur" },
> > > > @@ -364,6 +363,7 @@ xfs_btree_del_cursor(
> > > >  	struct xfs_btree_cur	*cur,		/* btree cursor */
> > > >  	int			error)		/* del because of error */
> > > >  {
> > > > +	struct xfs_btree_cur_zone *bczone = &xfs_btree_cur_zones[cur->bc_btnum];
> > > >  	int			i;		/* btree level */
> > > >  
> > > >  	/*
> > > > @@ -386,10 +386,10 @@ xfs_btree_del_cursor(
> > > >  		kmem_free(cur->bc_ops);
> > > >  	if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && cur->bc_ag.pag)
> > > >  		xfs_perag_put(cur->bc_ag.pag);
> > > > -	if (cur->bc_maxlevels > XFS_BTREE_CUR_ZONE_MAXLEVELS)
> > > > +	if (cur->bc_maxlevels > bczone->maxlevels)
> > > >  		kmem_free(cur);
> > > >  	else
> > > > -		kmem_cache_free(xfs_btree_cur_zone, cur);
> > > > +		kmem_cache_free(bczone->zone, cur);
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -5021,12 +5021,12 @@ xfs_btree_alloc_cursor(
> > > >  {
> > > >  	struct xfs_btree_cur	*cur;
> > > >  	unsigned int		maxlevels = xfs_btree_maxlevels(mp, btnum);
> > > > +	struct xfs_btree_cur_zone *bczone = &xfs_btree_cur_zones[btnum];
> > > >  
> > > > -	if (maxlevels > XFS_BTREE_CUR_ZONE_MAXLEVELS)
> > > > +	if (maxlevels > bczone->maxlevels)
> > > >  		cur = kmem_zalloc(xfs_btree_cur_sizeof(maxlevels), KM_NOFS);
> > > >  	else
> > > > -		cur = kmem_cache_zalloc(xfs_btree_cur_zone,
> > > > -				GFP_NOFS | __GFP_NOFAIL);
> > > > +		cur = kmem_cache_zalloc(bczone->zone, GFP_NOFS | __GFP_NOFAIL);
> > > 
> > > When will maxlevels ever be greater than bczone->maxlevels? Isn't
> > > the bczone->maxlevels case always supposed to be the tallest
> > > possible height for that btree?
> > 
> > It should never happen, provided that the maxlevels computation and
> > verification are all correct.  I thought it was important to leave the
> > heap allocation in here as a fallback, since the consequence for getting
> > the size calculations wrong is corrupt kernel memory.
> 
> I think that this is the wrong approach. Static debug-only testing
> of btree size calculations at init time is needed here, not runtime
> fallbacks that hide the fact that we got fundamental calculations
> wrong. A mistake here should be loud and obvious, not hidden away in
> a fallback path that might never, ever be hit in the real world.

Agree.  I'll add an assert to the functions that compute the per-mount
maxlevels.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

end of thread, other threads:[~2021-10-12 18:49 UTC | newest]

Thread overview: 13+ messages
2021-09-24  1:27 [PATCHSET RFC v2 chandan 0/4] xfs: separate btree cursor slab zones Darrick J. Wong
2021-09-24  1:27 ` [PATCH 1/4] xfs: remove xfs_btree_cur.bc_blocklog Darrick J. Wong
2021-09-24  1:27 ` [PATCH 2/4] xfs: reduce the size of nr_ops for refcount btree cursors Darrick J. Wong
2021-09-24  1:27 ` [PATCH 3/4] xfs: check absolute maximum nlevels for each btree type Darrick J. Wong
2021-09-26  0:43   ` Dave Chinner
2021-09-27 18:17     ` Darrick J. Wong
2021-09-27 20:29       ` Darrick J. Wong
2021-09-27 21:57       ` Dave Chinner
2021-09-24  1:27 ` [PATCH 4/4] xfs: use separate btree cursor slab " Darrick J. Wong
2021-09-26  0:47   ` Dave Chinner
2021-09-27 18:21     ` Darrick J. Wong
2021-09-27 22:01       ` Dave Chinner
2021-10-12 18:49         ` Darrick J. Wong
