* lockless and cleaned up buffer lookup
@ 2022-04-03 12:01 Christoph Hellwig
  2022-04-03 12:01 ` [PATCH 1/5] xfs: add a flags argument to xfs_buf_get Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-03 12:01 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

Hi all,

this series cleans up the slight mess that the buffer cache hash
lookup and allocation path has become before applying a now much
simplified (and split up) version of "xfs: lockless buffer lookup"
from Dave.

Let me know what you think.

Diffstat:
 libxfs/xfs_attr_remote.c |    8 -
 libxfs/xfs_sb.c          |    2 
 scrub/repair.c           |    6 -
 xfs_buf.c                |  253 ++++++++++++++++++++++-------------------------
 xfs_buf.h                |    9 -
 xfs_qm.c                 |    6 -
 6 files changed, 134 insertions(+), 150 deletions(-)


* [PATCH 1/5] xfs: add a flags argument to xfs_buf_get
  2022-04-03 12:01 lockless and cleaned up buffer lookup Christoph Hellwig
@ 2022-04-03 12:01 ` Christoph Hellwig
  2022-04-03 12:01 ` [PATCH 2/5] xfs: replace xfs_buf_incore with an XBF_NOALLOC flag to xfs_buf_get* Christoph Hellwig
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-03 12:01 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

Add a flags argument to xfs_buf_get that gets passed on to
xfs_buf_get_map.  This matches the rest of the buffer cache interfaces
and will be needed for upcoming cleanups.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_attr_remote.c | 2 +-
 fs/xfs/libxfs/xfs_sb.c          | 2 +-
 fs/xfs/xfs_buf.h                | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 83b95be9ded8a4..3fc62ff92441d5 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -512,7 +512,7 @@ xfs_attr_rmtval_set_value(
 		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
 		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
 
-		error = xfs_buf_get(mp->m_ddev_targp, dblkno, dblkcnt, &bp);
+		error = xfs_buf_get(mp->m_ddev_targp, dblkno, dblkcnt, 0, &bp);
 		if (error)
 			return error;
 		bp->b_ops = &xfs_attr3_rmt_buf_ops;
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index f4e84aa1d50a4b..6849c53d27970e 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -979,7 +979,7 @@ xfs_update_secondary_sbs(
 
 		error = xfs_buf_get(mp->m_ddev_targp,
 				 XFS_AG_DADDR(mp, pag->pag_agno, XFS_SB_DADDR),
-				 XFS_FSS_TO_BB(mp, 1), &bp);
+				 XFS_FSS_TO_BB(mp, 1), 0, &bp);
 		/*
 		 * If we get an error reading or writing alternate superblocks,
 		 * continue.  xfs_repair chooses the "best" superblock based
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index edcb6254fa6a87..ed7ee674216037 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -214,11 +214,12 @@ xfs_buf_get(
 	struct xfs_buftarg	*target,
 	xfs_daddr_t		blkno,
 	size_t			numblks,
+	xfs_buf_flags_t		flags,
 	struct xfs_buf		**bpp)
 {
 	DEFINE_SINGLE_BUF_MAP(map, blkno, numblks);
 
-	return xfs_buf_get_map(target, &map, 1, 0, bpp);
+	return xfs_buf_get_map(target, &map, 1, flags, bpp);
 }
 
 static inline int
-- 
2.30.2



* [PATCH 2/5] xfs: replace xfs_buf_incore with an XBF_NOALLOC flag to xfs_buf_get*
  2022-04-03 12:01 lockless and cleaned up buffer lookup Christoph Hellwig
  2022-04-03 12:01 ` [PATCH 1/5] xfs: add a flags argument to xfs_buf_get Christoph Hellwig
@ 2022-04-03 12:01 ` Christoph Hellwig
  2022-04-03 21:54   ` Dave Chinner
  2022-04-03 12:01 ` [PATCH 3/5] xfs: remove a superfluous hash lookup when inserting new buffers Christoph Hellwig
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-03 12:01 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

Replace the special purpose xfs_buf_incore interface with a new
XBF_NOALLOC flag for the xfs_buf_get* routines.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_attr_remote.c |  6 +++---
 fs/xfs/scrub/repair.c           |  6 +++---
 fs/xfs/xfs_buf.c                | 22 +++-------------------
 fs/xfs/xfs_buf.h                |  5 +----
 fs/xfs/xfs_qm.c                 |  6 +++---
 5 files changed, 13 insertions(+), 32 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 3fc62ff92441d5..9aff2ce203c9b6 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -550,10 +550,10 @@ xfs_attr_rmtval_stale(
 	    XFS_IS_CORRUPT(mp, map->br_startblock == HOLESTARTBLOCK))
 		return -EFSCORRUPTED;
 
-	bp = xfs_buf_incore(mp->m_ddev_targp,
+	if (!xfs_buf_get(mp->m_ddev_targp,
 			XFS_FSB_TO_DADDR(mp, map->br_startblock),
-			XFS_FSB_TO_BB(mp, map->br_blockcount), incore_flags);
-	if (bp) {
+			XFS_FSB_TO_BB(mp, map->br_blockcount),
+			incore_flags | XBF_NOALLOC, &bp)) {
 		xfs_buf_stale(bp);
 		xfs_buf_relse(bp);
 	}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 1e7b6b209ee89a..d4e0fcbc1487a1 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -460,10 +460,10 @@ xrep_invalidate_blocks(
 		/* Skip AG headers and post-EOFS blocks */
 		if (!xfs_verify_fsbno(sc->mp, fsbno))
 			continue;
-		bp = xfs_buf_incore(sc->mp->m_ddev_targp,
+		if (!xfs_buf_get(sc->mp->m_ddev_targp,
 				XFS_FSB_TO_DADDR(sc->mp, fsbno),
-				XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK);
-		if (bp) {
+				XFS_FSB_TO_BB(sc->mp, 1),
+				XBF_NOALLOC | XBF_TRYLOCK, &bp)) {
 			xfs_trans_bjoin(sc->tp, bp);
 			xfs_trans_binval(sc->tp, bp);
 		}
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e1afb9e503e167..0951b7cbd79675 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -227,7 +227,8 @@ _xfs_buf_alloc(
 	 * We don't want certain flags to appear in b_flags unless they are
 	 * specifically set by later operations on the buffer.
 	 */
-	flags &= ~(XBF_UNMAPPED | XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD);
+	flags &= ~(XBF_NOALLOC | XBF_UNMAPPED | XBF_TRYLOCK | XBF_ASYNC |
+		   XBF_READ_AHEAD);
 
 	atomic_set(&bp->b_hold, 1);
 	atomic_set(&bp->b_lru_ref, 1);
@@ -616,23 +617,6 @@ xfs_buf_find(
 	return 0;
 }
 
-struct xfs_buf *
-xfs_buf_incore(
-	struct xfs_buftarg	*target,
-	xfs_daddr_t		blkno,
-	size_t			numblks,
-	xfs_buf_flags_t		flags)
-{
-	struct xfs_buf		*bp;
-	int			error;
-	DEFINE_SINGLE_BUF_MAP(map, blkno, numblks);
-
-	error = xfs_buf_find(target, &map, 1, flags, NULL, &bp);
-	if (error)
-		return NULL;
-	return bp;
-}
-
 /*
  * Assembles a buffer covering the specified range. The code is optimised for
  * cache hits, as metadata intensive workloads will see 3 orders of magnitude
@@ -654,7 +638,7 @@ xfs_buf_get_map(
 	error = xfs_buf_find(target, map, nmaps, flags, NULL, &bp);
 	if (!error)
 		goto found;
-	if (error != -ENOENT)
+	if (error != -ENOENT || (flags & XBF_NOALLOC))
 		return error;
 
 	error = _xfs_buf_alloc(target, map, nmaps, flags, &new_bp);
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index ed7ee674216037..a28a2c5a496ab5 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -42,6 +42,7 @@ struct xfs_buf;
 #define _XBF_DELWRI_Q	 (1 << 22)/* buffer on a delwri queue */
 
 /* flags used only as arguments to access routines */
+#define XBF_NOALLOC	 (1 << 29)/* do not allocate buffer if not found */
 #define XBF_TRYLOCK	 (1 << 30)/* lock requested, but do not wait */
 #define XBF_UNMAPPED	 (1 << 31)/* do not map the buffer */
 
@@ -196,10 +197,6 @@ struct xfs_buf {
 };
 
 /* Finding and Reading Buffers */
-struct xfs_buf *xfs_buf_incore(struct xfs_buftarg *target,
-			   xfs_daddr_t blkno, size_t numblks,
-			   xfs_buf_flags_t flags);
-
 int xfs_buf_get_map(struct xfs_buftarg *target, struct xfs_buf_map *map,
 		int nmaps, xfs_buf_flags_t flags, struct xfs_buf **bpp);
 int xfs_buf_read_map(struct xfs_buftarg *target, struct xfs_buf_map *map,
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index f165d1a3de1d1d..9325170a3e18b3 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1238,9 +1238,9 @@ xfs_qm_flush_one(
 	 */
 	if (!xfs_dqflock_nowait(dqp)) {
 		/* buf is pinned in-core by delwri list */
-		bp = xfs_buf_incore(mp->m_ddev_targp, dqp->q_blkno,
-				mp->m_quotainfo->qi_dqchunklen, 0);
-		if (!bp) {
+		if (xfs_buf_get(mp->m_ddev_targp, dqp->q_blkno,
+				mp->m_quotainfo->qi_dqchunklen,
+				XBF_NOALLOC, &bp)) {
 			error = -EINVAL;
 			goto out_unlock;
 		}
-- 
2.30.2



* [PATCH 3/5] xfs: remove a superfluous hash lookup when inserting new buffers
  2022-04-03 12:01 lockless and cleaned up buffer lookup Christoph Hellwig
  2022-04-03 12:01 ` [PATCH 1/5] xfs: add a flags argument to xfs_buf_get Christoph Hellwig
  2022-04-03 12:01 ` [PATCH 2/5] xfs: replace xfs_buf_incore with an XBF_NOALLOC flag to xfs_buf_get* Christoph Hellwig
@ 2022-04-03 12:01 ` Christoph Hellwig
  2022-04-03 23:04   ` Dave Chinner
  2022-04-03 12:01 ` [PATCH 4/5] xfs: reduce the number of atomics when locking a buffer after lookup Christoph Hellwig
  2022-04-03 12:01 ` [PATCH 5/5] xfs: lockless buffer lookup Christoph Hellwig
  4 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-03 12:01 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

xfs_buf_get_map has a bit of a strange structure where the xfs_buf_find
helper is called twice before we actually insert a new buffer on a cache
miss.  Given that the rhashtable has an interface to insert a new entry
and return the found one on a conflict we can easily get rid of the
double lookup by using that.  To get a straight code path this requires
folding the xfs_buf_find helper into xfs_buf_get_map, but in return we
can now easily move the logic for the slow path cache insert into a new
helper.
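
As a self-contained illustration of the rhashtable return convention
this relies on (the struct and function names below are made up for
the example and are not part of this patch):
rhashtable_lookup_get_insert_fast() returns NULL when the new entry
was inserted, the already-cached conflicting entry when the insert
lost a race, and an ERR_PTR() when the insert itself failed.

#include <linux/types.h>
#include <linux/err.h>
#include <linux/rhashtable.h>

struct demo_obj {
	u32			key;
	struct rhash_head	node;
};

static const struct rhashtable_params demo_params = {
	.key_len	= sizeof(u32),
	.key_offset	= offsetof(struct demo_obj, key),
	.head_offset	= offsetof(struct demo_obj, node),
};

/*
 * Return the object that ends up in @ht for @new->key; assumes the
 * caller initialised @ht with rhashtable_init(ht, &demo_params).
 */
static struct demo_obj *
demo_insert_or_get(struct rhashtable *ht, struct demo_obj *new)
{
	struct demo_obj		*old;

	old = rhashtable_lookup_get_insert_fast(ht, &new->node, demo_params);
	if (IS_ERR(old))
		return old;	/* insertion failed, e.g. -ENOMEM */
	if (old)
		return old;	/* lost the race, reuse the cached entry */
	return new;		/* @new is now in the hash */
}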

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c | 207 +++++++++++++++++++++++------------------------
 1 file changed, 99 insertions(+), 108 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 0951b7cbd79675..ef645e15935369 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -504,12 +504,66 @@ xfs_buf_hash_destroy(
 	rhashtable_destroy(&pag->pag_buf_hash);
 }
 
+static int
+xfs_buf_insert(
+	struct xfs_buftarg	*btp,
+	struct xfs_buf_map	*map,
+	int			nmaps,
+	xfs_buf_flags_t		flags,
+	struct xfs_perag	*pag,
+	struct xfs_buf		**bpp)
+{
+	int			error;
+	struct xfs_buf		*bp;
+
+	error = _xfs_buf_alloc(btp, map, nmaps, flags, &bp);
+	if (error)
+		return error;
+
+	/*
+	 * For buffers that fit entirely within a single page, first attempt to
+	 * allocate the memory from the heap to minimise memory usage. If we
+	 * can't get heap memory for these small buffers, we fall back to using
+	 * the page allocator.
+	 */
+	if (BBTOB(bp->b_length) >= PAGE_SIZE ||
+	    xfs_buf_alloc_kmem(bp, flags) < 0) {
+		error = xfs_buf_alloc_pages(bp, flags);
+		if (error)
+			goto out_free_buf;
+	}
+
+	/* the buffer keeps the perag reference until it is freed */
+	bp->b_pag = pag;
+
+	spin_lock(&pag->pag_buf_lock);
+	*bpp = rhashtable_lookup_get_insert_fast(&pag->pag_buf_hash,
+			&bp->b_rhash_head, xfs_buf_hash_params);
+	if (IS_ERR(*bpp)) {
+		error = PTR_ERR(*bpp);
+		goto out_unlock;
+	}
+	if (*bpp) {
+		/* found an existing buffer */
+		atomic_inc(&(*bpp)->b_hold);
+		error = -EEXIST;
+		goto out_unlock;
+	}
+	spin_unlock(&pag->pag_buf_lock);
+	*bpp = bp;
+	return 0;
+
+out_unlock:
+	spin_unlock(&pag->pag_buf_lock);
+out_free_buf:
+	xfs_buf_free(bp);
+	return error;
+}
+
 /*
- * Look up a buffer in the buffer cache and return it referenced and locked
- * in @found_bp.
- *
- * If @new_bp is supplied and we have a lookup miss, insert @new_bp into the
- * cache.
+ * Assemble a buffer covering the specified range. The code is optimised for
+ * cache hits, as metadata intensive workloads will see 3 orders of magnitude
+ * more hits than misses.
  *
  * If XBF_TRYLOCK is set in @flags, only try to lock the buffer and return
  * -EAGAIN if we fail to lock it.
@@ -517,27 +571,26 @@ xfs_buf_hash_destroy(
  * Return values are:
  *	-EFSCORRUPTED if have been supplied with an invalid address
  *	-EAGAIN on trylock failure
- *	-ENOENT if we fail to find a match and @new_bp was NULL
- *	0, with @found_bp:
- *		- @new_bp if we inserted it into the cache
- *		- the buffer we found and locked.
+ *	-ENOENT if we fail to find a match and alloc was %false
+ *	0 if a buffer was found or newly created
  */
-static int
-xfs_buf_find(
+int
+xfs_buf_get_map(
 	struct xfs_buftarg	*btp,
 	struct xfs_buf_map	*map,
 	int			nmaps,
 	xfs_buf_flags_t		flags,
-	struct xfs_buf		*new_bp,
-	struct xfs_buf		**found_bp)
+	struct xfs_buf		**bpp)
 {
+	struct xfs_mount	*mp = btp->bt_mount;
 	struct xfs_perag	*pag;
 	struct xfs_buf		*bp;
 	struct xfs_buf_map	cmap = { .bm_bn = map[0].bm_bn };
+	int			error = 0;
 	xfs_daddr_t		eofs;
 	int			i;
 
-	*found_bp = NULL;
+	*bpp = NULL;
 
 	for (i = 0; i < nmaps; i++)
 		cmap.bm_len += map[i].bm_len;
@@ -550,54 +603,47 @@ xfs_buf_find(
 	 * Corrupted block numbers can get through to here, unfortunately, so we
 	 * have to check that the buffer falls within the filesystem bounds.
 	 */
-	eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
-	if (cmap.bm_bn < 0 || cmap.bm_bn >= eofs) {
-		xfs_alert(btp->bt_mount,
-			  "%s: daddr 0x%llx out of range, EOFS 0x%llx",
+	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
+	if (WARN_ON_ONCE(cmap.bm_bn < 0) || WARN_ON_ONCE(cmap.bm_bn >= eofs)) {
+		xfs_alert(mp, "%s: daddr 0x%llx out of range, EOFS 0x%llx",
 			  __func__, cmap.bm_bn, eofs);
-		WARN_ON(1);
 		return -EFSCORRUPTED;
 	}
 
-	pag = xfs_perag_get(btp->bt_mount,
-			    xfs_daddr_to_agno(btp->bt_mount, cmap.bm_bn));
+	pag = xfs_perag_get(mp, xfs_daddr_to_agno(mp, cmap.bm_bn));
 
 	spin_lock(&pag->pag_buf_lock);
 	bp = rhashtable_lookup_fast(&pag->pag_buf_hash, &cmap,
 				    xfs_buf_hash_params);
-	if (bp) {
+	if (bp)
 		atomic_inc(&bp->b_hold);
-		goto found;
-	}
-
-	/* No match found */
-	if (!new_bp) {
-		XFS_STATS_INC(btp->bt_mount, xb_miss_locked);
-		spin_unlock(&pag->pag_buf_lock);
-		xfs_perag_put(pag);
-		return -ENOENT;
-	}
-
-	/* the buffer keeps the perag reference until it is freed */
-	new_bp->b_pag = pag;
-	rhashtable_insert_fast(&pag->pag_buf_hash, &new_bp->b_rhash_head,
-			       xfs_buf_hash_params);
 	spin_unlock(&pag->pag_buf_lock);
-	*found_bp = new_bp;
-	return 0;
 
-found:
-	spin_unlock(&pag->pag_buf_lock);
+	if (unlikely(!bp)) {
+		if (flags & XBF_NOALLOC) {
+			XFS_STATS_INC(mp, xb_miss_locked);
+			xfs_perag_put(pag);
+			return -ENOENT;
+		}
+
+		error = xfs_buf_insert(btp, map, nmaps, flags, pag, &bp);
+		if (!error)
+			goto finish;
+		if (error != -EEXIST) {
+			xfs_perag_put(pag);
+			return error;
+		}
+	}
 	xfs_perag_put(pag);
 
 	if (!xfs_buf_trylock(bp)) {
 		if (flags & XBF_TRYLOCK) {
 			xfs_buf_rele(bp);
-			XFS_STATS_INC(btp->bt_mount, xb_busy_locked);
+			XFS_STATS_INC(mp, xb_busy_locked);
 			return -EAGAIN;
 		}
 		xfs_buf_lock(bp);
-		XFS_STATS_INC(btp->bt_mount, xb_get_locked_waited);
+		XFS_STATS_INC(mp, xb_get_locked_waited);
 	}
 
 	/*
@@ -612,64 +658,12 @@ xfs_buf_find(
 	}
 
 	trace_xfs_buf_find(bp, flags, _RET_IP_);
-	XFS_STATS_INC(btp->bt_mount, xb_get_locked);
-	*found_bp = bp;
-	return 0;
-}
-
-/*
- * Assembles a buffer covering the specified range. The code is optimised for
- * cache hits, as metadata intensive workloads will see 3 orders of magnitude
- * more hits than misses.
- */
-int
-xfs_buf_get_map(
-	struct xfs_buftarg	*target,
-	struct xfs_buf_map	*map,
-	int			nmaps,
-	xfs_buf_flags_t		flags,
-	struct xfs_buf		**bpp)
-{
-	struct xfs_buf		*bp;
-	struct xfs_buf		*new_bp;
-	int			error;
-
-	*bpp = NULL;
-	error = xfs_buf_find(target, map, nmaps, flags, NULL, &bp);
-	if (!error)
-		goto found;
-	if (error != -ENOENT || (flags & XBF_NOALLOC))
-		return error;
-
-	error = _xfs_buf_alloc(target, map, nmaps, flags, &new_bp);
-	if (error)
-		return error;
-
-	/*
-	 * For buffers that fit entirely within a single page, first attempt to
-	 * allocate the memory from the heap to minimise memory usage. If we
-	 * can't get heap memory for these small buffers, we fall back to using
-	 * the page allocator.
-	 */
-	if (BBTOB(new_bp->b_length) >= PAGE_SIZE ||
-	    xfs_buf_alloc_kmem(new_bp, flags) < 0) {
-		error = xfs_buf_alloc_pages(new_bp, flags);
-		if (error)
-			goto out_free_buf;
-	}
-
-	error = xfs_buf_find(target, map, nmaps, flags, new_bp, &bp);
-	if (error)
-		goto out_free_buf;
-
-	if (bp != new_bp)
-		xfs_buf_free(new_bp);
-
-found:
+	XFS_STATS_INC(mp, xb_get_locked);
+finish:
 	if (!bp->b_addr) {
 		error = _xfs_buf_map_pages(bp, flags);
 		if (unlikely(error)) {
-			xfs_warn_ratelimited(target->bt_mount,
+			xfs_warn_ratelimited(mp,
 				"%s: failed to map %u pages", __func__,
 				bp->b_page_count);
 			xfs_buf_relse(bp);
@@ -684,13 +678,10 @@ xfs_buf_get_map(
 	if (!(flags & XBF_READ))
 		xfs_buf_ioerror(bp, 0);
 
-	XFS_STATS_INC(target->bt_mount, xb_get);
+	XFS_STATS_INC(mp, xb_get);
 	trace_xfs_buf_get(bp, flags, _RET_IP_);
 	*bpp = bp;
 	return 0;
-out_free_buf:
-	xfs_buf_free(new_bp);
-	return error;
 }
 
 int
@@ -962,12 +953,12 @@ xfs_buf_rele(
 	/*
 	 * We grab the b_lock here first to serialise racing xfs_buf_rele()
 	 * calls. The pag_buf_lock being taken on the last reference only
-	 * serialises against racing lookups in xfs_buf_find(). IOWs, the second
-	 * to last reference we drop here is not serialised against the last
-	 * reference until we take bp->b_lock. Hence if we don't grab b_lock
-	 * first, the last "release" reference can win the race to the lock and
-	 * free the buffer before the second-to-last reference is processed,
-	 * leading to a use-after-free scenario.
+	 * serialises against racing lookups in xfs_buf_get_map(). IOWs, the
+	 * second to last reference we drop here is not serialised against the
+	 * last reference until we take bp->b_lock. Hence if we don't grab
+	 * b_lock first, the last "release" reference can win the race to the
+	 * lock and free the buffer before the second-to-last reference is
+	 * processed, leading to a use-after-free scenario.
 	 */
 	spin_lock(&bp->b_lock);
 	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
-- 
2.30.2



* [PATCH 4/5] xfs: reduce the number of atomics when locking a buffer after lookup
  2022-04-03 12:01 lockless and cleaned up buffer lookup Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-04-03 12:01 ` [PATCH 3/5] xfs: remove a superfluous hash lookup when inserting new buffers Christoph Hellwig
@ 2022-04-03 12:01 ` Christoph Hellwig
  2022-04-03 12:01 ` [PATCH 5/5] xfs: lockless buffer lookup Christoph Hellwig
  4 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-03 12:01 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner, Dave Chinner

From: Dave Chinner <dchinner@redhat.com>

Avoid an extra atomic operation in the non-trylock case by only doing a
trylock if the XBF_TRYLOCK flag is set. This follows the pattern in the
IO path with NOWAIT semantics where the "trylock-fail-lock" path showed
5-10% reduced throughput compared to just using a single lock call when not
under NOWAIT conditions. So make that same change here, too.

See commit 942491c9e6d6 ("xfs: fix AIM7 regression") for details.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index ef645e15935369..dd68aee52118c2 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -636,12 +636,13 @@ xfs_buf_get_map(
 	}
 	xfs_perag_put(pag);
 
-	if (!xfs_buf_trylock(bp)) {
-		if (flags & XBF_TRYLOCK) {
+	if (flags & XBF_TRYLOCK) {
+		if (!xfs_buf_trylock(bp)) {
 			xfs_buf_rele(bp);
 			XFS_STATS_INC(mp, xb_busy_locked);
 			return -EAGAIN;
 		}
+	} else {
 		xfs_buf_lock(bp);
 		XFS_STATS_INC(mp, xb_get_locked_waited);
 	}
-- 
2.30.2



* [PATCH 5/5] xfs: lockless buffer lookup
  2022-04-03 12:01 lockless and cleaned up buffer lookup Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-04-03 12:01 ` [PATCH 4/5] xfs: reduce the number of atomics when locking a buffer after lookup Christoph Hellwig
@ 2022-04-03 12:01 ` Christoph Hellwig
  4 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-03 12:01 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner, Dave Chinner

From: Dave Chinner <dchinner@redhat.com>

Current work to merge the XFS inode life cycle with the VFS inode
life cycle is finding some interesting issues. If we have a path
that hits buffer trylocks fairly hard (e.g. a non-blocking
background inode freeing function), we end up hitting massive
contention on the buffer cache hash locks:

-   92.71%     0.05%  [kernel]                  [k] xfs_inodegc_worker
   - 92.67% xfs_inodegc_worker
      - 92.13% xfs_inode_unlink
         - 91.52% xfs_inactive_ifree
            - 85.63% xfs_read_agi
               - 85.61% xfs_trans_read_buf_map
                  - 85.59% xfs_buf_read_map
                     - xfs_buf_get_map
                        - 85.55% xfs_buf_find
                           - 72.87% _raw_spin_lock
                              - do_raw_spin_lock
                                   71.86% __pv_queued_spin_lock_slowpath
                           - 8.74% xfs_buf_rele
                              - 7.88% _raw_spin_lock
                                 - 7.88% do_raw_spin_lock
                                      7.63% __pv_queued_spin_lock_slowpath
                           - 1.70% xfs_buf_trylock
                              - 1.68% down_trylock
                                 - 1.41% _raw_spin_lock_irqsave
                                    - 1.39% do_raw_spin_lock
                                         __pv_queued_spin_lock_slowpath
                           - 0.76% _raw_spin_unlock
                                0.75% do_raw_spin_unlock

This is basically hammering the pag->pag_buf_lock from lots of CPUs
doing trylocks at the same time. Most of the buffer trylock
operations ultimately fail after we've done the lookup, so we're
really hammering the buf hash lock whilst making no progress.

We can also see significant spinlock traffic on the same lock just
under normal operation when lots of tasks are accessing metadata
from the same AG, so let's avoid all this by creating a lookup fast
path which leverages the rhashtable's ability to do rcu protected
lookups.

We avoid races with the buffer release path by using
atomic_inc_not_zero() on the buffer hold count. Any buffer that is
in the LRU will have a non-zero count, thereby allowing the lockless
fast path to be taken in most cache hit situations. If the buffer
hold count is zero, then it is likely going through the release path
so in that case we fall back to the existing lookup miss slow path.
i.e. we simply use the fallback path where the caller allocates a
new buf and tries to insert it, but this time holding the
pag->pag_buf_lock.

The use of rcu protected lookups means that buffer handles now need
to be freed by RCU callbacks (same as inodes). We still free the
buffer pages before the RCU callback - we won't be trying to access
them at all on a buffer that has zero references - but we need the
buffer handle itself to be present for the entire rcu protected read
side to detect a zero hold count correctly.
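
In code, the cache hit fast path this results in (see the diff below,
comments added here for clarity) boils down to:

	rcu_read_lock();
	bp = rhashtable_lookup(&pag->pag_buf_hash, &cmap, xfs_buf_hash_params);
	if (bp && !atomic_inc_not_zero(&bp->b_hold))
		bp = NULL;	/* hold count hit zero: buffer is being released */
	rcu_read_unlock();

	if (unlikely(!bp))
		<fall back to the locked lookup miss slow path>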

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: rebased and split]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c | 25 +++++++++++++++++--------
 fs/xfs/xfs_buf.h |  1 +
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index dd68aee52118c2..2d6d57e80f56d5 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -295,6 +295,16 @@ xfs_buf_free_pages(
 	bp->b_flags &= ~_XBF_PAGES;
 }
 
+static void
+xfs_buf_free_callback(
+	struct callback_head	*cb)
+{
+	struct xfs_buf		*bp = container_of(cb, struct xfs_buf, b_rcu);
+
+	xfs_buf_free_maps(bp);
+	kmem_cache_free(xfs_buf_cache, bp);
+}
+
 static void
 xfs_buf_free(
 	struct xfs_buf		*bp)
@@ -308,10 +318,10 @@ xfs_buf_free(
 	else if (bp->b_flags & _XBF_KMEM)
 		kmem_free(bp->b_addr);
 
-	xfs_buf_free_maps(bp);
-	kmem_cache_free(xfs_buf_cache, bp);
+	call_rcu(&bp->b_rcu, xfs_buf_free_callback);
 }
 
+
 static int
 xfs_buf_alloc_kmem(
 	struct xfs_buf	*bp,
@@ -612,12 +622,11 @@ xfs_buf_get_map(
 
 	pag = xfs_perag_get(mp, xfs_daddr_to_agno(mp, cmap.bm_bn));
 
-	spin_lock(&pag->pag_buf_lock);
-	bp = rhashtable_lookup_fast(&pag->pag_buf_hash, &cmap,
-				    xfs_buf_hash_params);
-	if (bp)
-		atomic_inc(&bp->b_hold);
-	spin_unlock(&pag->pag_buf_lock);
+	rcu_read_lock();
+	bp = rhashtable_lookup(&pag->pag_buf_hash, &cmap, xfs_buf_hash_params);
+	if (bp && !atomic_inc_not_zero(&bp->b_hold))
+		bp = NULL;
+	rcu_read_unlock();
 
 	if (unlikely(!bp)) {
 		if (flags & XBF_NOALLOC) {
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index a28a2c5a496ab5..b74761a4e83025 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -194,6 +194,7 @@ struct xfs_buf {
 	int			b_last_error;
 
 	const struct xfs_buf_ops	*b_ops;
+	struct rcu_head		b_rcu;
 };
 
 /* Finding and Reading Buffers */
-- 
2.30.2



* Re: [PATCH 2/5] xfs: replace xfs_buf_incore with an XBF_NOALLOC flag to xfs_buf_get*
  2022-04-03 12:01 ` [PATCH 2/5] xfs: replace xfs_buf_incore with an XBF_NOALLOC flag to xfs_buf_get* Christoph Hellwig
@ 2022-04-03 21:54   ` Dave Chinner
  2022-04-05 14:55     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2022-04-03 21:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Sun, Apr 03, 2022 at 02:01:16PM +0200, Christoph Hellwig wrote:
> Replace the special purpose xfs_buf_incore interface with a new
> XBF_NOALLOC flag for the xfs_buf_get* routines.

I think this is the wrong direction to go in the greater scheme of
things. _XBF_NOALLOC needs to be an internal implementation detail
similar to _XBF_PAGES, not exposed as part of the caller API.

That is, xfs_buf_incore() clearly documents the operation "return me
the locked buffer if it is currently cached in memory" that the callers
want, while XBF_NOALLOC doesn't clearly mean anything as obvious as
this at the caller level.  Hence I'd prefer this to end up as:

/*
 * Lock and return the buffer that matches the requested range if
 * and only if it is present in the cache already.
 */
static inline struct xfs_buf *
xfs_buf_incore(
	struct xfs_buftarg	*target,
	xfs_daddr_t		blkno,
	size_t			numblks,
	xfs_buf_flags_t		flags)
{
	struct xfs_buf		*bp;
	int			error;
	DEFINE_SINGLE_BUF_MAP(map, blkno, numblks);

	error = xfs_buf_get_map(target, &map, 1, _XBF_NOALLOC | flags, &bp);
	if (error)
		return NULL;
	return bp;
}

Then none of the external callers need to be changed, and we don't
introduce new external xfs_buf_get() callers.


> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_attr_remote.c |  6 +++---
>  fs/xfs/scrub/repair.c           |  6 +++---
>  fs/xfs/xfs_buf.c                | 22 +++-------------------
>  fs/xfs/xfs_buf.h                |  5 +----
>  fs/xfs/xfs_qm.c                 |  6 +++---
>  5 files changed, 13 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index 3fc62ff92441d5..9aff2ce203c9b6 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -550,10 +550,10 @@ xfs_attr_rmtval_stale(
>  	    XFS_IS_CORRUPT(mp, map->br_startblock == HOLESTARTBLOCK))
>  		return -EFSCORRUPTED;
>  
> -	bp = xfs_buf_incore(mp->m_ddev_targp,
> +	if (!xfs_buf_get(mp->m_ddev_targp,
>  			XFS_FSB_TO_DADDR(mp, map->br_startblock),
> -			XFS_FSB_TO_BB(mp, map->br_blockcount), incore_flags);
> -	if (bp) {
> +			XFS_FSB_TO_BB(mp, map->br_blockcount),
> +			incore_flags | XBF_NOALLOC, &bp)) {
>  		xfs_buf_stale(bp);
>  		xfs_buf_relse(bp);
>  	}

FWIW, I also think that this pattern change is a regression.
We've spent the past decade+ moving function calls that return
objects and errors out of if() scope to clean up the code. Reversing
that pattern here doesn't make the code cleaner...
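
To make that concrete (call arguments trimmed for illustration), the
preferred shape is:

	error = xfs_buf_get(target, blkno, numblks, 0, &bp);
	if (error)
		return error;
	/* ... use bp ... */

rather than burying the call and its result handling inside the if()
condition:

	if (!xfs_buf_get(target, blkno, numblks, 0, &bp)) {
		/* ... use bp ... */
	}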

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/5] xfs: remove a superfluous hash lookup when inserting new buffers
  2022-04-03 12:01 ` [PATCH 3/5] xfs: remove a superfluous hash lookup when inserting new buffers Christoph Hellwig
@ 2022-04-03 23:04   ` Dave Chinner
  2022-04-05 15:00     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2022-04-03 23:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Sun, Apr 03, 2022 at 02:01:17PM +0200, Christoph Hellwig wrote:
> xfs_buf_get_map has a bit of a strange structure where the xfs_buf_find
> helper is called twice before we actually insert a new buffer on a cache
> miss.  Given that the rhashtable has an interface to insert a new entry
> and return the found one on a conflict we can easily get rid of the
> double lookup by using that.

We can do that without completely rewriting this code.

        spin_lock(&pag->pag_buf_lock);
	if (!new_bp) {
		bp = rhashtable_lookup_fast(&pag->pag_buf_hash, &cmap,
					    xfs_buf_hash_params);
		if (!bp) {
			error = -ENOENT;
			goto not_found;
		}
	} else {
		bp = rhashtable_lookup_get_insert_fast(&pag->pag_buf_hash,
				&new_bp->b_rhash_head, xfs_buf_hash_params);
		if (IS_ERR(bp)) {
			error = PTR_ERR(bp);
			goto not_found;
		}
		if (!bp) {
			/*
			 * Inserted new buffer, it keeps the perag reference until
			 * it is freed.
			 */
			new_bp->b_pag = pag;
			spin_unlock(&pag->pag_buf_lock);
			*found_bp = new_bp;
			return 0;
		}
	}

	atomic_inc(&bp->b_hold);
	spin_unlock(&pag->pag_buf_lock);
	xfs_perag_put(pag);

	<lock buffer>

	*found_bp = bp;
	return 0;

not_found:
	spin_unlock(&pag->pag_buf_lock);
	xfs_perag_put(pag);
	return error;
}

And now we have the existing code with the optimised rhashtable
insertion. I'd much prefer it be separated out like this so that the
rhashtable usage change is a separate bisectable commit from all the
other changes.

I'm also not sold on moving where we do the buffer allocation
as done in this patch because:

> +static int
> +xfs_buf_insert(
> +	struct xfs_buftarg	*btp,
> +	struct xfs_buf_map	*map,
> +	int			nmaps,
> +	xfs_buf_flags_t		flags,
> +	struct xfs_perag	*pag,
> +	struct xfs_buf		**bpp)
> +{
> +	int			error;
> +	struct xfs_buf		*bp;
> +
> +	error = _xfs_buf_alloc(btp, map, nmaps, flags, &bp);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * For buffers that fit entirely within a single page, first attempt to
> +	 * allocate the memory from the heap to minimise memory usage. If we
> +	 * can't get heap memory for these small buffers, we fall back to using
> +	 * the page allocator.
> +	 */
> +	if (BBTOB(bp->b_length) >= PAGE_SIZE ||
> +	    xfs_buf_alloc_kmem(bp, flags) < 0) {
> +		error = xfs_buf_alloc_pages(bp, flags);
> +		if (error)
> +			goto out_free_buf;
> +	}
> +
> +	/* the buffer keeps the perag reference until it is freed */
> +	bp->b_pag = pag;
> +
> +	spin_lock(&pag->pag_buf_lock);
> +	*bpp = rhashtable_lookup_get_insert_fast(&pag->pag_buf_hash,
> +			&bp->b_rhash_head, xfs_buf_hash_params);
> +	if (IS_ERR(*bpp)) {
> +		error = PTR_ERR(*bpp);
> +		goto out_unlock;
> +	}
> +	if (*bpp) {
> +		/* found an existing buffer */
> +		atomic_inc(&(*bpp)->b_hold);
> +		error = -EEXIST;
> +		goto out_unlock;
> +	}
> +	spin_unlock(&pag->pag_buf_lock);
> +	*bpp = bp;
> +	return 0;

The return cases of this function end up being a bit of a mess. We can return:

 - error = 0 and a locked buffer in *bpp
 - error = -EEXIST and an unlocked buffer in *bpp
 - error != 0 and a modified *bpp pointer
 - error != 0 and an unmodified *bpp pointer

So we end up with a bunch of logic here to separate out the return
cases into different error values, then have to add logic to the
caller to handle the different return cases.

And if we look at the new caller:

> +	if (unlikely(!bp)) {
> +		if (flags & XBF_NOALLOC) {
> +			XFS_STATS_INC(mp, xb_miss_locked);
> +			xfs_perag_put(pag);
> +			return -ENOENT;
> +		}
> +
> +		error = xfs_buf_insert(btp, map, nmaps, flags, pag, &bp);
> +		if (!error)
> +			goto finish;
> +		if (error != -EEXIST) {
> +			xfs_perag_put(pag);
> +			return error;
> +		}
> +	}
>  	xfs_perag_put(pag);

	<lock the buffer>

It took me quite some time to realise this wasn't buggy - it looks
for all the world like the "!error" case here fails to lock the
buffer and leaks the perag reference. It isn't at all clear that the
new buffer is inserted locked into the hash and that it steals the
callers perag reference, whilst the -EEXIST case does neither yet
still returns a buffer.

Hence while I like the idea of changing up how we allocate the
buffer on lookup failure, I'm not sold on this interface yet. Hence
I think splitting the rhashtable insert optimisation from the
allocation rework might help clean this up more.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/5] xfs: replace xfs_buf_incore with an XBF_NOALLOC flag to xfs_buf_get*
  2022-04-03 21:54   ` Dave Chinner
@ 2022-04-05 14:55     ` Christoph Hellwig
  2022-04-05 21:21       ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-05 14:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Mon, Apr 04, 2022 at 07:54:43AM +1000, Dave Chinner wrote:
> /*
>  * Lock and return the buffer that matches the requested range if
>  * and only if it is present in the cache already.
>  */
> static inline struct xfs_buf *
> xfs_buf_incore(
> 	struct xfs_buftarg	*target,
> 	xfs_daddr_t		blkno,
> 	size_t			numblks,
> 	xfs_buf_flags_t		flags)
> {
> 	struct xfs_buf		*bp;
> 	int			error;
> 	DEFINE_SINGLE_BUF_MAP(map, blkno, numblks);
> 
> 	error = xfs_buf_get_map(target, &map, 1, _XBF_NOALLOC | flags, &bp);
> 	if (error)
> 		return NULL;
> 	return bp;
> }
> 
> Then none of the external callers need to be changed, and we don't
> introduce new external xfs_buf_get() callers.

I had that earlier, but having xfs_buf_incore as the odd one out that
still returns a buffer (like most XFS buffer cache routines did back
a long time ago) just did seem pretty odd compared to the rest.


* Re: [PATCH 3/5] xfs: remove a superfluous hash lookup when inserting new buffers
  2022-04-03 23:04   ` Dave Chinner
@ 2022-04-05 15:00     ` Christoph Hellwig
  2022-04-05 22:01       ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-05 15:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Mon, Apr 04, 2022 at 09:04:52AM +1000, Dave Chinner wrote:
> On Sun, Apr 03, 2022 at 02:01:17PM +0200, Christoph Hellwig wrote:
> > xfs_buf_get_map has a bit of a strange structure where the xfs_buf_find
> > helper is called twice before we actually insert a new buffer on a cache
> > miss.  Given that the rhashtable has an interface to insert a new entry
> > and return the found one on a conflict we can easily get rid of the
> > double lookup by using that.
> 
> We can do that without completely rewriting this code.

We could.  And I had something similar earlier.  But I actually think
the structure of the code after this patch makes much more sense.  All
the logic for the fast path buffer lookup is clearly laid out in one
function, which then just calls a helper to perform the lookup.
The new scheme also is slightly less code overall.  Even more so once
the lockless lookup comes into play which requires different locking
and refcount increments.

> The return cases of this function end up being a bit of a mess. We can return:
> 
>  - error = 0 and a locked buffer in *bpp
>  - error = -EEXIST and an unlocked buffer in *bpp
>  - error != 0 and a modified *bpp pointer
>  - error != 0 and an unmodified *bpp pointer

The last two are the same  - the *bpp pointer simply is not valid on a
"real" error return.  So the return really is a tristate, similar
to many other places in xfs.


* Re: [PATCH 2/5] xfs: replace xfs_buf_incore with an XBF_NOALLOC flag to xfs_buf_get*
  2022-04-05 14:55     ` Christoph Hellwig
@ 2022-04-05 21:21       ` Dave Chinner
  2022-04-06 16:24         ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2022-04-05 21:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Tue, Apr 05, 2022 at 04:55:09PM +0200, Christoph Hellwig wrote:
> On Mon, Apr 04, 2022 at 07:54:43AM +1000, Dave Chinner wrote:
> > /*
> >  * Lock and return the buffer that matches the requested range if
> >  * and only if it is present in the cache already.
> >  */
> > static inline struct xfs_buf *
> > xfs_buf_incore(
> > 	struct xfs_buftarg	*target,
> > 	xfs_daddr_t		blkno,
> > 	size_t			numblks,
> > 	xfs_buf_flags_t		flags)
> > {
> > 	struct xfs_buf		*bp;
> > 	int			error;
> > 	DEFINE_SINGLE_BUF_MAP(map, blkno, numblks);
> > 
> > 	error = xfs_buf_get_map(target, &map, 1, _XBF_NOALLOC | flags, &bp);
> > 	if (error)
> > 		return NULL;
> > 	return bp;
> > }
> > 
> > Then none of the external callers need to be changed, and we don't
> > introduce new external xfs_buf_get() callers.
> 
> I had that earlier, but having xfs_buf_incore as the odd one out that
> still returns a buffer (like most XFS buffer cache routines did back
> a long time ago) just did seem pretty odd compared to the rest.

Then let's fix that to use the same interface as everything else,
and that simplifies the implementation down to just:

static inline int
xfs_buf_incore(
	struct xfs_buftarg	*target,
	xfs_daddr_t		blkno,
	size_t			numblks,
	xfs_buf_flags_t		flags,
	struct xfs_buf		**bpp)
{
	DEFINE_SINGLE_BUF_MAP(map, blkno, numblks);

	return xfs_buf_get_map(target, &map, 1, _XBF_INCORE | flags, bpp);
}

And, FWIW, the _XBF_NOALLOC flag really wants to be _XBF_INCORE - we
need it to describe the lookup behaviour the flag provides, not the
internal implementation detail that achieves the desired
behaviour....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/5] xfs: remove a superfluous hash lookup when inserting new buffers
  2022-04-05 15:00     ` Christoph Hellwig
@ 2022-04-05 22:01       ` Dave Chinner
  2022-04-06 16:26         ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2022-04-05 22:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Tue, Apr 05, 2022 at 05:00:27PM +0200, Christoph Hellwig wrote:
> On Mon, Apr 04, 2022 at 09:04:52AM +1000, Dave Chinner wrote:
> > On Sun, Apr 03, 2022 at 02:01:17PM +0200, Christoph Hellwig wrote:
> > > xfs_buf_get_map has a bit of a strange structure where the xfs_buf_find
> > > helper is called twice before we actually insert a new buffer on a cache
> > > miss.  Given that the rhashtable has an interface to insert a new entry
> > > and return the found one on a conflict we can easily get rid of the
> > > double lookup by using that.
> > 
> > We can do that without completely rewriting this code.
> 
> We could.  And I had something similar earlier.  But I actually think
> the structure of the code after this patch makes much more sense.  All
> the logic for the fast path buffer lookup is clearly laid out in one
> function, which then just calls a helper to perform the lookup.
> The new scheme also is slightly less code overall.  Even more so once
> the lockless lookup comes into play which requires different locking
> and refcount increments.

Agreed, but you're making two distinct, significant modifications in
the one patchset. One is changing the way we use a generic library
functionality, the other is changing the entire structure of the
lookup path.

IOWs, I was not saying the end result was bad, I was (clumsily)
trying to suggest that you should split these two modifications into
separate patches because they are largely separate changes.

Once I thought about it that way, looking at them that way made me
want to structure the code quite differently.

> > The return cases of this function end up being a bit of a mess. We can return:
> > 
> >  - error = 0 and a locked buffer in *bpp
> >  - error = -EEXIST and an unlocked buffer in *bpp
> >  - error != 0 and a modified *bpp pointer
> >  - error != 0 and an unmodified *bpp pointer
> 
> The last two are the same  - the *bpp pointer simply is not valid on a
> "real" error return.  So the return really is a tristate, similar
> to many other places in xfs.

I think you missed the point I was making. I'm not complaining about
whether it's a tristate return or not, it's the fact that it can
return buffers in different states and the caller has to handle that
inconsistency itself whilst still maintaining an efficient fast
path.

That's what makes the code difficult to follow - xfs_buf_insert() is
the slow path, so all the complexity and twisted logic should be
inside that function rather than directly impacting the fast path
code.

e.g. Most of the complexity goes away if we factor out the buffer
trylock/locking code into a helper (like we have in the iomap code)
and then have xfs_buf_insert() call it when it finds an existing
buffer. Then the -EEXIST return value can go away, and
xfs_buf_insert can return a locked buffer exactly the same as if it
inserted a new buffer. Have the newly allocated buffer take a new
perag reference, too, instead of stealing the caller's reference,
and then all the differences between insert and -EEXIST cases go
away.

Then you can move all the conditional lookup failure cases into
xfs_buf_insert(), too, and we end up with high level fast path code
that is clear and concise:

	/* cache hits generally outnumber misses by at least 10:1 */
	bp = xfs_buf_lookup_fast(pag, &cmap);
	if (likely(bp))
		error = xfs_buf_lookup_lock(bp, flags);
	else
		error = xfs_buf_lookup_insert(pag, &cmap, flags, &bp);

	xfs_perag_put(pag);
	if (!error)
		*bpp = bp;
	return error;
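
where the locking helper is just the existing lock/trylock logic
lifted out of xfs_buf_get_map(). As a sketch only (the helper name
comes from the example above, and using bp->b_mount for the stats
macros is an assumption):

static int
xfs_buf_lookup_lock(
	struct xfs_buf		*bp,
	xfs_buf_flags_t		flags)
{
	if (flags & XBF_TRYLOCK) {
		if (!xfs_buf_trylock(bp)) {
			xfs_buf_rele(bp);
			XFS_STATS_INC(bp->b_mount, xb_busy_locked);
			return -EAGAIN;
		}
	} else {
		xfs_buf_lock(bp);
		XFS_STATS_INC(bp->b_mount, xb_get_locked_waited);
	}
	return 0;
}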

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/5] xfs: replace xfs_buf_incore with an XBF_NOALLOC flag to xfs_buf_get*
  2022-04-05 21:21       ` Dave Chinner
@ 2022-04-06 16:24         ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-06 16:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Wed, Apr 06, 2022 at 07:21:33AM +1000, Dave Chinner wrote:
> > I had that earlier, but having xfs_buf_incore as the odd one out that
> > still returns a buffer (like most XFS buffer cache routines did back
> > a long time ago) just did seem pretty odd compared tothe rest.
> 
> Then let's fix that to use the same interface as everything else,
> and that simplifies the implementation down to just:
> 
> static inline int
> xfs_buf_incore(
> 	struct xfs_buftarg	*target,
> 	xfs_daddr_t		blkno,
> 	size_t			numblks,
> 	xfs_buf_flags_t		flags,
> 	struct xfs_buf		**bpp)
> {
> 	DEFINE_SINGLE_BUF_MAP(map, blkno, numblks);
> 
> 	return xfs_buf_get_map(target, &map, 1, _XBF_INCORE | flags, bpp);
> }
> 
> And, FWIW, the _XBF_NOALLOC flag really wants to be _XBF_INCORE - we
> need it to describe the lookup behaviour the flag provides, not the
> internal implementation detail that achieves the desired
> behaviour....

At least in my mental model, a 'find but do not allocate' flag matches
the lookup behavior more than the somewhat odd 'incore' name.  I know
it is something traditional Unix, including IRIX, has used forever,
but it is a bit of an odd choice with no history in Linux.

That being said, the flag and the wrapper should match, so IFF we keep
xfs_buf_incore the flag should also be _XBF_INCORE.  Still not my
preference, though.


* Re: [PATCH 3/5] xfs: remove a superfluous hash lookup when inserting new buffers
  2022-04-05 22:01       ` Dave Chinner
@ 2022-04-06 16:26         ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-06 16:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Wed, Apr 06, 2022 at 08:01:21AM +1000, Dave Chinner wrote:
> Agreed, but you're making two distinct, significant modifications in
> the one patchset. One is changing the way we use a generic library
> functionality, the other is changing the entire structure of the
> lookup path.
> 
> IOWs, I was not saying the end result was bad, I was (clumsily)
> trying to suggest that you should split these two modifications into
> separate patches because they are largely separate changes.
> 
> Once I thought about it that way, looking at them that way made me
> want to structure the code quite differently.

Ok, I'll see if I can split things up a bit better.

> 
> e.g. Most of the complexity goes away if we factor out the buffer
> trylock/locking code into a helper (like we have in the iomap code)
> and then have xfs_buf_insert() call it when it finds an existing
> buffer. Then the -EEXIST return value can go away, and
> xfs_buf_insert can return a locked buffer exactly the same as if it
> inserted a new buffer. Have the newly allocated buffer take a new
> perag reference, too, instead of stealing the caller's reference,
> and then all the differences between insert and -EEXIST cases go
> away.

I actually had that earlier as well and really like the flow of
the single function.  So it certainly is doable.
