All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/4] xfs: defer agfl block frees
@ 2017-12-07 18:58 Brian Foster
  2017-12-07 18:58 ` [PATCH RFC 1/4] xfs: create agfl block free helper function Brian Foster
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Brian Foster @ 2017-12-07 18:58 UTC (permalink / raw)
  To: linux-xfs

Hi all,

This is an experiment I've been hacking on based on some of the recent
discussions around log reservation deficiences due to unpredictable AGFL
fixup behavior[1] and a previous attempt to address the problem[2].

This RFC series introduces the ability to defer AGFL block frees much
like the way we already defer frees of file-mapped extents. The intent
is to disconnect freeing of surplus AGFL blocks from making those free
AGFL slots available for an impending allocation. In turn, this creates
conditions where AGFL fixups have more consistent and predictable log
reservation consumption. AGFL block allocation behavior does not change
since most AGFL deficiences can be satisfied by a single allocation
request (whereas AGFL blocks are generally freed one at a time,
requiring multiple cycles through the allocator).

This series survives the overrun reproducer associated with [1], even
without the transaction fixups in [3] (that also work around the
overrun). It also survives an xfstests run without any obvious problems.

This is RFC for a few reasons. First, this is obviously a sensitive area
of code that requires thorough analysis. This also requires more
thorough stress testing than it has seen so far. Finally, the behavior
changes are intentionally limited to target the reproducer workload. A
proper series should probably include updates to consistently defer AGFL
block frees from all possible inobt contexts, for example. I'm not so
sure that is necessary/appropriate for other allocator callers, however,
but is worth consideration.

Thoughts, reviews, flames appreciated.

Brian

[1] https://marc.info/?l=linux-xfs&m=151127676203410&w=2
[2] https://marc.info/?l=linux-xfs&m=151181428131889&w=2
[3] https://marc.info/?l=linux-xfs&m=151206831911206&w=2

Brian Foster (4):
  xfs: create agfl block free helper function
  xfs: defer agfl block frees when dfops is available
  xfs: defer agfl block frees on extent frees
  xfs: defer agfl frees on inobt allocs during chunk removal

 fs/xfs/libxfs/xfs_alloc.c          | 91 ++++++++++++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_alloc.h          |  8 +++-
 fs/xfs/libxfs/xfs_defer.h          |  1 +
 fs/xfs/libxfs/xfs_ialloc.c         | 16 +++----
 fs/xfs/libxfs/xfs_ialloc_btree.c   | 11 +++--
 fs/xfs/libxfs/xfs_ialloc_btree.h   |  4 +-
 fs/xfs/libxfs/xfs_refcount_btree.c |  2 +-
 fs/xfs/libxfs/xfs_rmap.c           |  2 +-
 fs/xfs/scrub/common.c              |  8 ++--
 fs/xfs/xfs_extfree_item.c          |  2 +-
 fs/xfs/xfs_fsops.c                 |  2 +-
 fs/xfs/xfs_itable.c                |  4 +-
 fs/xfs/xfs_trace.h                 |  2 +
 fs/xfs/xfs_trans.h                 |  3 +-
 fs/xfs/xfs_trans_extfree.c         | 77 ++++++++++++++++++++++++++++++--
 15 files changed, 189 insertions(+), 44 deletions(-)

-- 
2.13.6


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH RFC 1/4] xfs: create agfl block free helper function
  2017-12-07 18:58 [PATCH RFC 0/4] xfs: defer agfl block frees Brian Foster
@ 2017-12-07 18:58 ` Brian Foster
  2017-12-07 22:24   ` Dave Chinner
  2017-12-07 18:58 ` [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available Brian Foster
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2017-12-07 18:58 UTC (permalink / raw)
  To: linux-xfs

Refactor the AGFL block free code into a new helper such that it can
be invoked from deferred context. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 37 +++++++++++++++++++++++++++----------
 fs/xfs/libxfs/xfs_alloc.h |  2 ++
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 0da80019a917..8a2235ebca08 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2040,6 +2040,30 @@ xfs_alloc_space_available(
 	return true;
 }
 
+int
+xfs_free_agfl_block(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	struct xfs_buf		*agbp,
+	struct xfs_owner_info	*oinfo)
+{
+	int			error;
+	struct xfs_buf		*bp;
+
+	error = xfs_free_ag_extent(tp, agbp, agno, agbno, 1, oinfo,
+				   XFS_AG_RESV_AGFL);
+	if (error)
+		return error;
+
+	bp = xfs_btree_get_bufs(tp->t_mountp, tp, agno, agbno, 0);
+	if (!bp)
+		return -EFSCORRUPTED;
+	xfs_trans_binval(tp, bp);
+
+	return 0;
+}
+
 /*
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
@@ -2136,21 +2160,14 @@ xfs_alloc_fix_freelist(
 	else
 		xfs_rmap_ag_owner(&targs.oinfo, XFS_RMAP_OWN_AG);
 	while (!(flags & XFS_ALLOC_FLAG_NOSHRINK) && pag->pagf_flcount > need) {
-		struct xfs_buf	*bp;
-
 		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
 		if (error)
 			goto out_agbp_relse;
-		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
-					   &targs.oinfo, XFS_AG_RESV_AGFL);
+
+		error = xfs_free_agfl_block(tp, args->agno, bno, agbp,
+					    &targs.oinfo);
 		if (error)
 			goto out_agbp_relse;
-		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
-		if (!bp) {
-			error = -EFSCORRUPTED;
-			goto out_agbp_relse;
-		}
-		xfs_trans_binval(tp, bp);
 	}
 
 	targs.tp = tp;
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 7ba2d129d504..d3a150180b1d 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -215,6 +215,8 @@ int xfs_read_agf(struct xfs_mount *mp, struct xfs_trans *tp,
 			xfs_agnumber_t agno, int flags, struct xfs_buf **bpp);
 int xfs_alloc_read_agfl(struct xfs_mount *mp, struct xfs_trans *tp,
 			xfs_agnumber_t agno, struct xfs_buf **bpp);
+int xfs_free_agfl_block(struct xfs_trans *, xfs_agnumber_t, xfs_agblock_t,
+			struct xfs_buf *, struct xfs_owner_info *);
 int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, int flags);
 int xfs_free_extent_fix_freelist(struct xfs_trans *tp, xfs_agnumber_t agno,
 		struct xfs_buf **agbp);
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available
  2017-12-07 18:58 [PATCH RFC 0/4] xfs: defer agfl block frees Brian Foster
  2017-12-07 18:58 ` [PATCH RFC 1/4] xfs: create agfl block free helper function Brian Foster
@ 2017-12-07 18:58 ` Brian Foster
  2017-12-07 22:41   ` Dave Chinner
  2017-12-07 18:58 ` [PATCH RFC 3/4] xfs: defer agfl block frees on extent frees Brian Foster
  2017-12-07 18:58 ` [PATCH RFC 4/4] xfs: defer agfl frees on inobt allocs during chunk removal Brian Foster
  3 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2017-12-07 18:58 UTC (permalink / raw)
  To: linux-xfs

The AGFL fixup code executes before every block allocation/free and
rectifies the AGFL based on the current, dynamic allocation
requirements of the fs. The AGFL must hold a minimum number of
blocks to satisfy a worst case split of the free space btrees caused
by the impending allocation operation. The AGFL is also updated to
maintain the implicit requirement for a minimum number of free slots
to satisfy a worst case join of the free space btrees.

Since the AGFL caches individual blocks, AGFL reduction typically
involves multiple, single block frees. We've had reports of
transaction overrun problems during certain workloads that boil down
to AGFL reduction freeing multiple blocks and consuming more space
in the log than was reserved for the transaction.

Since the objective of freeing AGFL blocks is to ensure free AGFL
free slots are available for the upcoming allocation, one way to
address this problem is to release surplus blocks from the AGFL
immediately but defer the free of those blocks (similar to how
file-mapped blocks are unmapped from the file in one transaction and
freed via a deferred operation) until the transaction is rolled.
This turns AGFL reduction into an operation with predictable log
reservation consumption.

Add the capability to defer AGFL block frees when a deferred ops
list is handed to the AGFL fixup code. Deferring AGFL frees is a
conditional behavior based on whether the caller has populated the
new dfops field of the xfs_alloc_arg structure. A bit of
customization is required to handle deferred completion processing
because AGFL blocks are accounted against a separate reservation
pool and AGFL are not inserted into the extent busy list when freed
(they are inserted when used and released back to the AGFL). Reuse
the majority of the existing deferred extent free infrastructure and
customize it appropriately to handle AGFL blocks.

Note that this patch only adds infrastructure. It does not change
behavior because no callers have been updated to pass dfops into the
allocation code.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c  | 51 ++++++++++++++++++++++++++++++---
 fs/xfs/libxfs/xfs_alloc.h  |  1 +
 fs/xfs/libxfs/xfs_defer.h  |  1 +
 fs/xfs/xfs_trace.h         |  2 ++
 fs/xfs/xfs_trans_extfree.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 121 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 8a2235ebca08..ab636181471c 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -39,6 +39,9 @@
 #include "xfs_buf_item.h"
 #include "xfs_log.h"
 #include "xfs_ag_resv.h"
+#include "xfs_bmap.h"
+
+extern kmem_zone_t	*xfs_bmap_free_item_zone;
 
 struct workqueue_struct *xfs_alloc_wq;
 
@@ -2065,6 +2068,40 @@ xfs_free_agfl_block(
 }
 
 /*
+ * Defer an AGFL block free. This is effectively equivalent to
+ * xfs_bmap_add_free() with some special handling particular to AGFL blocks.
+ *
+ * Deferring AGFL frees helps prevent log reservation overruns due to too many
+ * allocation operations in a transaction. AGFL frees are prone to this problem
+ * because for one they are always freed one at a time. Further, an immediate
+ * AGFL block free can cause a btree join and require another block free before
+ * the real allocation can proceed. Deferring the free disconnects freeing up
+ * the AGFL slot from freeing the block.
+ */
+STATIC void
+xfs_defer_agfl_block(
+	struct xfs_mount		*mp,
+	struct xfs_defer_ops		*dfops,
+	xfs_agnumber_t			agno,
+	xfs_fsblock_t			agbno,
+	struct xfs_owner_info		*oinfo)
+{
+	struct xfs_extent_free_item	*new;		/* new element */
+
+	ASSERT(xfs_bmap_free_item_zone != NULL);
+	ASSERT(oinfo != NULL);
+
+	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
+	new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno);
+	new->xefi_blockcount = 1;
+	new->xefi_oinfo = *oinfo;
+
+	trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
+
+	xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_AGFL_FREE, &new->xefi_list);
+}
+
+/*
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
  */
@@ -2164,10 +2201,16 @@ xfs_alloc_fix_freelist(
 		if (error)
 			goto out_agbp_relse;
 
-		error = xfs_free_agfl_block(tp, args->agno, bno, agbp,
-					    &targs.oinfo);
-		if (error)
-			goto out_agbp_relse;
+		/* defer agfl frees if dfops is provided */
+		if (args->dfops) {
+			xfs_defer_agfl_block(mp, args->dfops, args->agno, bno,
+						     &targs.oinfo);
+		} else {
+			error = xfs_free_agfl_block(tp, args->agno, bno, agbp,
+						    &targs.oinfo);
+			if (error)
+				goto out_agbp_relse;
+		}
 	}
 
 	targs.tp = tp;
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index d3a150180b1d..559568806265 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -61,6 +61,7 @@ typedef unsigned int xfs_alloctype_t;
  */
 typedef struct xfs_alloc_arg {
 	struct xfs_trans *tp;		/* transaction pointer */
+	struct xfs_defer_ops	*dfops;	/* deferred ops (for agfl) */
 	struct xfs_mount *mp;		/* file system mount point */
 	struct xfs_buf	*agbp;		/* buffer for a.g. freelist header */
 	struct xfs_perag *pag;		/* per-ag struct for this agno */
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index d4f046dd44bd..29c6b550f49b 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -55,6 +55,7 @@ enum xfs_defer_ops_type {
 	XFS_DEFER_OPS_TYPE_REFCOUNT,
 	XFS_DEFER_OPS_TYPE_RMAP,
 	XFS_DEFER_OPS_TYPE_FREE,
+	XFS_DEFER_OPS_TYPE_AGFL_FREE,
 	XFS_DEFER_OPS_TYPE_MAX,
 };
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d718a10c2271..3c5d9f8cbb9d 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2426,6 +2426,8 @@ DEFINE_DEFER_PENDING_EVENT(xfs_defer_pending_abort);
 #define DEFINE_BMAP_FREE_DEFERRED_EVENT DEFINE_PHYS_EXTENT_DEFERRED_EVENT
 DEFINE_BMAP_FREE_DEFERRED_EVENT(xfs_bmap_free_defer);
 DEFINE_BMAP_FREE_DEFERRED_EVENT(xfs_bmap_free_deferred);
+DEFINE_BMAP_FREE_DEFERRED_EVENT(xfs_agfl_free_defer);
+DEFINE_BMAP_FREE_DEFERRED_EVENT(xfs_agfl_free_deferred);
 
 /* rmap tracepoints */
 DECLARE_EVENT_CLASS(xfs_rmap_class,
diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c
index ab438647592a..f5620796ae25 100644
--- a/fs/xfs/xfs_trans_extfree.c
+++ b/fs/xfs/xfs_trans_extfree.c
@@ -231,9 +231,79 @@ static const struct xfs_defer_op_type xfs_extent_free_defer_type = {
 	.cancel_item	= xfs_extent_free_cancel_item,
 };
 
+/*
+ * AGFL blocks are accounted differently in the reserve pools and are not
+ * inserted into the busy extent list.
+ */
+STATIC int
+xfs_agfl_free_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_defer_ops		*dop,
+	struct list_head		*item,
+	void				*done_item,
+	void				**state)
+{
+	struct xfs_mount		*mp = tp->t_mountp;
+	struct xfs_efd_log_item		*efdp = done_item;
+	struct xfs_extent_free_item	*free;
+	struct xfs_extent		*extp;
+	struct xfs_buf			*agbp;
+	int				error;
+	xfs_agnumber_t			agno;
+	xfs_agblock_t			agbno;
+	uint				next_extent;
+
+	free = container_of(item, struct xfs_extent_free_item, xefi_list);
+	ASSERT(free->xefi_blockcount == 1);
+	agno = XFS_FSB_TO_AGNO(mp, free->xefi_startblock);
+	agbno = XFS_FSB_TO_AGBNO(mp, free->xefi_startblock);
+
+	trace_xfs_agfl_free_deferred(mp, agno, 0, agbno, free->xefi_blockcount);
+
+	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
+	if (!error)
+		error = xfs_free_agfl_block(tp, agno, agbno, agbp,
+					    &free->xefi_oinfo);
+
+	/*
+	 * Mark the transaction dirty, even on error. This ensures the
+	 * transaction is aborted, which:
+	 *
+	 * 1.) releases the EFI and frees the EFD
+	 * 2.) shuts down the filesystem
+	 */
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	efdp->efd_item.li_desc->lid_flags |= XFS_LID_DIRTY;
+
+	next_extent = efdp->efd_next_extent;
+	ASSERT(next_extent < efdp->efd_format.efd_nextents);
+	extp = &(efdp->efd_format.efd_extents[next_extent]);
+	extp->ext_start = free->xefi_startblock;
+	extp->ext_len = free->xefi_blockcount;
+	efdp->efd_next_extent++;
+
+	kmem_free(free);
+	return error;
+}
+
+
+/* sub-type with special handling for AGFL deferred frees */
+static const struct xfs_defer_op_type xfs_agfl_free_defer_type = {
+	.type		= XFS_DEFER_OPS_TYPE_AGFL_FREE,
+	.max_items	= XFS_EFI_MAX_FAST_EXTENTS,
+	.diff_items	= xfs_extent_free_diff_items,
+	.create_intent	= xfs_extent_free_create_intent,
+	.abort_intent	= xfs_extent_free_abort_intent,
+	.log_item	= xfs_extent_free_log_item,
+	.create_done	= xfs_extent_free_create_done,
+	.finish_item	= xfs_agfl_free_finish_item,
+	.cancel_item	= xfs_extent_free_cancel_item,
+};
+
 /* Register the deferred op type. */
 void
 xfs_extent_free_init_defer_op(void)
 {
 	xfs_defer_init_op_type(&xfs_extent_free_defer_type);
+	xfs_defer_init_op_type(&xfs_agfl_free_defer_type);
 }
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 3/4] xfs: defer agfl block frees on extent frees
  2017-12-07 18:58 [PATCH RFC 0/4] xfs: defer agfl block frees Brian Foster
  2017-12-07 18:58 ` [PATCH RFC 1/4] xfs: create agfl block free helper function Brian Foster
  2017-12-07 18:58 ` [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available Brian Foster
@ 2017-12-07 18:58 ` Brian Foster
  2017-12-07 22:49   ` Dave Chinner
  2017-12-07 18:58 ` [PATCH RFC 4/4] xfs: defer agfl frees on inobt allocs during chunk removal Brian Foster
  3 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2017-12-07 18:58 UTC (permalink / raw)
  To: linux-xfs

Defer AGFL block frees from deferred extent free context. This means
that extents that are deferred freed via xfs_bmap_add_free() will
add additional deferred items for AGFL block frees during completion
processing. All such items complete before xfs_defer_finish()
returns.

Update xfs_trans_free_extent() and xfs_free_extent() to receive an
optional dfops pointer and pass it down to the AGFL fixup code via
the allocation arguments structure. Update
xfs_extent_free_finish_item() to pass along the dfops list currently
being processed by xfs_defer_finish(). All other callers pass a NULL
dfops and so do not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c          | 9 ++++++---
 fs/xfs/libxfs/xfs_alloc.h          | 5 +++--
 fs/xfs/libxfs/xfs_ialloc_btree.c   | 2 +-
 fs/xfs/libxfs/xfs_refcount_btree.c | 2 +-
 fs/xfs/libxfs/xfs_rmap.c           | 2 +-
 fs/xfs/xfs_extfree_item.c          | 2 +-
 fs/xfs/xfs_fsops.c                 | 2 +-
 fs/xfs/xfs_trans.h                 | 3 ++-
 fs/xfs/xfs_trans_extfree.c         | 7 ++++---
 9 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index ab636181471c..eafcef839526 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2857,7 +2857,8 @@ int
 xfs_free_extent_fix_freelist(
 	struct xfs_trans	*tp,
 	xfs_agnumber_t		agno,
-	struct xfs_buf		**agbp)
+	struct xfs_buf		**agbp,
+	struct xfs_defer_ops	*dfops)
 {
 	struct xfs_alloc_arg	args;
 	int			error;
@@ -2866,6 +2867,7 @@ xfs_free_extent_fix_freelist(
 	args.tp = tp;
 	args.mp = tp->t_mountp;
 	args.agno = agno;
+	args.dfops = dfops;
 
 	/*
 	 * validate that the block number is legal - the enables us to detect
@@ -2898,7 +2900,8 @@ xfs_free_extent(
 	xfs_fsblock_t		bno,	/* starting block number of extent */
 	xfs_extlen_t		len,	/* length of extent */
 	struct xfs_owner_info	*oinfo,	/* extent owner */
-	enum xfs_ag_resv_type	type)	/* block reservation type */
+	enum xfs_ag_resv_type	type,	/* block reservation type */
+	struct xfs_defer_ops	*dfops)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 	struct xfs_buf		*agbp;
@@ -2913,7 +2916,7 @@ xfs_free_extent(
 			XFS_ERRTAG_FREE_EXTENT))
 		return -EIO;
 
-	error = xfs_free_extent_fix_freelist(tp, agno, &agbp);
+	error = xfs_free_extent_fix_freelist(tp, agno, &agbp, dfops);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 559568806265..17fc0265fbfa 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -196,7 +196,8 @@ xfs_free_extent(
 	xfs_fsblock_t		bno,	/* starting block number of extent */
 	xfs_extlen_t		len,	/* length of extent */
 	struct xfs_owner_info	*oinfo,	/* extent owner */
-	enum xfs_ag_resv_type	type);	/* block reservation type */
+	enum xfs_ag_resv_type	type,	/* block reservation type */
+	struct xfs_defer_ops	*dfops);/* dfops for agfl frees */
 
 int				/* error */
 xfs_alloc_lookup_ge(
@@ -220,7 +221,7 @@ int xfs_free_agfl_block(struct xfs_trans *, xfs_agnumber_t, xfs_agblock_t,
 			struct xfs_buf *, struct xfs_owner_info *);
 int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, int flags);
 int xfs_free_extent_fix_freelist(struct xfs_trans *tp, xfs_agnumber_t agno,
-		struct xfs_buf **agbp);
+		struct xfs_buf **agbp, struct xfs_defer_ops *);
 
 xfs_extlen_t xfs_prealloc_blocks(struct xfs_mount *mp);
 
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 317caba9faa6..b899f02e5c5e 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -155,7 +155,7 @@ xfs_inobt_free_block(
 	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
 	return xfs_free_extent(cur->bc_tp,
 			XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1,
-			&oinfo, XFS_AG_RESV_NONE);
+			&oinfo, XFS_AG_RESV_NONE, NULL);
 }
 
 STATIC int
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index 3c59dd3d58d7..0f1ff83d0fb8 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -136,7 +136,7 @@ xfs_refcountbt_free_block(
 	be32_add_cpu(&agf->agf_refcount_blocks, -1);
 	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS);
 	error = xfs_free_extent(cur->bc_tp, fsbno, 1, &oinfo,
-			XFS_AG_RESV_METADATA);
+			XFS_AG_RESV_METADATA, NULL);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index dd019cee1b3b..439f90d70850 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -2107,7 +2107,7 @@ xfs_rmap_finish_one(
 		 * rmapbt, because a shape change could cause us to
 		 * allocate blocks.
 		 */
-		error = xfs_free_extent_fix_freelist(tp, agno, &agbp);
+		error = xfs_free_extent_fix_freelist(tp, agno, &agbp, NULL);
 		if (error)
 			return error;
 		if (!agbp)
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 44f8c5451210..100dce7f6f17 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -542,7 +542,7 @@ xfs_efi_recover(
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
 		extp = &efip->efi_format.efi_extents[i];
 		error = xfs_trans_free_extent(tp, efdp, extp->ext_start,
-					      extp->ext_len, &oinfo);
+					      extp->ext_len, &oinfo, NULL);
 		if (error)
 			goto abort_error;
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8f22fc579dbb..c50ecf684938 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -574,7 +574,7 @@ xfs_growfs_data_private(
 		error = xfs_free_extent(tp,
 				XFS_AGB_TO_FSB(mp, agno,
 					be32_to_cpu(agf->agf_length) - new),
-				new, &oinfo, XFS_AG_RESV_NONE);
+				new, &oinfo, XFS_AG_RESV_NONE, NULL);
 		if (error)
 			goto error0;
 	}
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 815b53d20e26..88c5357d558c 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -228,7 +228,8 @@ struct xfs_efd_log_item	*xfs_trans_get_efd(struct xfs_trans *,
 				  uint);
 int		xfs_trans_free_extent(struct xfs_trans *,
 				      struct xfs_efd_log_item *, xfs_fsblock_t,
-				      xfs_extlen_t, struct xfs_owner_info *);
+				      xfs_extlen_t, struct xfs_owner_info *,
+				      struct xfs_defer_ops *);
 int		xfs_trans_commit(struct xfs_trans *);
 int		xfs_trans_roll(struct xfs_trans **);
 int		xfs_trans_roll_inode(struct xfs_trans **, struct xfs_inode *);
diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c
index f5620796ae25..04cd3f7e3c02 100644
--- a/fs/xfs/xfs_trans_extfree.c
+++ b/fs/xfs/xfs_trans_extfree.c
@@ -68,7 +68,8 @@ xfs_trans_free_extent(
 	struct xfs_efd_log_item	*efdp,
 	xfs_fsblock_t		start_block,
 	xfs_extlen_t		ext_len,
-	struct xfs_owner_info	*oinfo)
+	struct xfs_owner_info	*oinfo,
+	struct xfs_defer_ops	*dfops)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 	uint			next_extent;
@@ -80,7 +81,7 @@ xfs_trans_free_extent(
 	trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno, ext_len);
 
 	error = xfs_free_extent(tp, start_block, ext_len, oinfo,
-			XFS_AG_RESV_NONE);
+			XFS_AG_RESV_NONE, dfops);
 
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
@@ -195,7 +196,7 @@ xfs_extent_free_finish_item(
 	error = xfs_trans_free_extent(tp, done_item,
 			free->xefi_startblock,
 			free->xefi_blockcount,
-			&free->xefi_oinfo);
+			&free->xefi_oinfo, dop);
 	kmem_free(free);
 	return error;
 }
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH RFC 4/4] xfs: defer agfl frees on inobt allocs during chunk removal
  2017-12-07 18:58 [PATCH RFC 0/4] xfs: defer agfl block frees Brian Foster
                   ` (2 preceding siblings ...)
  2017-12-07 18:58 ` [PATCH RFC 3/4] xfs: defer agfl block frees on extent frees Brian Foster
@ 2017-12-07 18:58 ` Brian Foster
  3 siblings, 0 replies; 17+ messages in thread
From: Brian Foster @ 2017-12-07 18:58 UTC (permalink / raw)
  To: linux-xfs

Defer AGFL frees during inode btree ([f]inobt) block allocations and
frees that occur when removing inode chunks. The inode btrees do not
use the AGFL and instead make allocator calls directly from the
context of the tree update. This can chew up log reservation in the
transaction if multiple AGFL block frees are required.

Update the inobt cursor initialization function to receive an
optional dfops pointer and carry it to ->[alloc|free]_block()
context. From there, pass the dfops along to the allocator
appropriately. Update xfs_difree_inobt() to pass dfops from the
context where an inode chunk is freed and associated record removed
from the inobt. All other inobt cursor users pass a NULL dfops and
thus do not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c       | 16 ++++++++--------
 fs/xfs/libxfs/xfs_ialloc_btree.c | 11 +++++++----
 fs/xfs/libxfs/xfs_ialloc_btree.h |  4 ++--
 fs/xfs/scrub/common.c            |  8 ++++----
 fs/xfs/xfs_itable.c              |  4 ++--
 5 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index de3f04a98656..4345ba9b691a 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -183,7 +183,7 @@ xfs_inobt_insert(
 	int			i;
 	int			error;
 
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, btnum);
+	cur = xfs_inobt_init_cursor(mp, tp, NULL, agbp, agno, btnum);
 
 	for (thisino = newino;
 	     thisino < newino + newlen;
@@ -532,7 +532,7 @@ xfs_inobt_insert_sprec(
 	int				i;
 	struct xfs_inobt_rec_incore	rec;
 
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, btnum);
+	cur = xfs_inobt_init_cursor(mp, tp, NULL, agbp, agno, btnum);
 
 	/* the new record is pre-aligned so we know where to look */
 	error = xfs_inobt_lookup(cur, nrec->ir_startino, XFS_LOOKUP_EQ, &i);
@@ -1141,7 +1141,7 @@ xfs_dialloc_ag_inobt(
 	ASSERT(pag->pagi_freecount > 0);
 
  restart_pagno:
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_INO);
+	cur = xfs_inobt_init_cursor(mp, tp, NULL, agbp, agno, XFS_BTNUM_INO);
 	/*
 	 * If pagino is 0 (this is the root inode allocation) use newino.
 	 * This must work because we've just allocated some.
@@ -1570,7 +1570,7 @@ xfs_dialloc_ag(
 	if (!pagino)
 		pagino = be32_to_cpu(agi->agi_newino);
 
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_FINO);
+	cur = xfs_inobt_init_cursor(mp, tp, NULL, agbp, agno, XFS_BTNUM_FINO);
 
 	error = xfs_check_agi_freecount(cur, agi);
 	if (error)
@@ -1613,7 +1613,7 @@ xfs_dialloc_ag(
 	 * the original freecount. If all is well, make the equivalent update to
 	 * the inobt using the finobt record and offset information.
 	 */
-	icur = xfs_inobt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_INO);
+	icur = xfs_inobt_init_cursor(mp, tp, NULL, agbp, agno, XFS_BTNUM_INO);
 
 	error = xfs_check_agi_freecount(icur, agi);
 	if (error)
@@ -1921,7 +1921,7 @@ xfs_difree_inobt(
 	/*
 	 * Initialize the cursor.
 	 */
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_INO);
+	cur = xfs_inobt_init_cursor(mp, tp, dfops, agbp, agno, XFS_BTNUM_INO);
 
 	error = xfs_check_agi_freecount(cur, agi);
 	if (error)
@@ -2042,7 +2042,7 @@ xfs_difree_finobt(
 	int				error;
 	int				i;
 
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_FINO);
+	cur = xfs_inobt_init_cursor(mp, tp, NULL, agbp, agno, XFS_BTNUM_FINO);
 
 	error = xfs_inobt_lookup(cur, ibtrec->ir_startino, XFS_LOOKUP_EQ, &i);
 	if (error)
@@ -2235,7 +2235,7 @@ xfs_imap_lookup(
 	 * we have a record, we need to ensure it contains the inode number
 	 * we are looking up.
 	 */
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_INO);
+	cur = xfs_inobt_init_cursor(mp, tp, NULL, agbp, agno, XFS_BTNUM_INO);
 	error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &i);
 	if (!error) {
 		if (i)
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index b899f02e5c5e..4c996521b031 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -48,8 +48,8 @@ xfs_inobt_dup_cursor(
 	struct xfs_btree_cur	*cur)
 {
 	return xfs_inobt_init_cursor(cur->bc_mp, cur->bc_tp,
-			cur->bc_private.a.agbp, cur->bc_private.a.agno,
-			cur->bc_btnum);
+			cur->bc_private.a.dfops, cur->bc_private.a.agbp,
+			cur->bc_private.a.agno, cur->bc_btnum);
 }
 
 STATIC void
@@ -105,6 +105,7 @@ __xfs_inobt_alloc_block(
 	args.prod = 1;
 	args.type = XFS_ALLOCTYPE_NEAR_BNO;
 	args.resv = resv;
+	args.dfops = cur->bc_private.a.dfops;
 
 	error = xfs_alloc_vextent(&args);
 	if (error) {
@@ -155,7 +156,7 @@ xfs_inobt_free_block(
 	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
 	return xfs_free_extent(cur->bc_tp,
 			XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1,
-			&oinfo, XFS_AG_RESV_NONE, NULL);
+			&oinfo, XFS_AG_RESV_NONE, cur->bc_private.a.dfops);
 }
 
 STATIC int
@@ -393,6 +394,7 @@ struct xfs_btree_cur *				/* new inode btree cursor */
 xfs_inobt_init_cursor(
 	struct xfs_mount	*mp,		/* file system mount point */
 	struct xfs_trans	*tp,		/* transaction pointer */
+	struct xfs_defer_ops	*dfops,		/* deferred ops */
 	struct xfs_buf		*agbp,		/* buffer for agi structure */
 	xfs_agnumber_t		agno,		/* allocation group number */
 	xfs_btnum_t		btnum)		/* ialloc or free ino btree */
@@ -420,6 +422,7 @@ xfs_inobt_init_cursor(
 	if (xfs_sb_version_hascrc(&mp->m_sb))
 		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
 
+	cur->bc_private.a.dfops = dfops;
 	cur->bc_private.a.agbp = agbp;
 	cur->bc_private.a.agno = agno;
 
@@ -552,7 +555,7 @@ xfs_inobt_count_blocks(
 	if (error)
 		return error;
 
-	cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno, btnum);
+	cur = xfs_inobt_init_cursor(mp, NULL, NULL, agbp, agno, btnum);
 	error = xfs_btree_count_blocks(cur, tree_blocks);
 	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
 	xfs_buf_relse(agbp);
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
index aa81e2e63f3f..0194351c09c8 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -58,8 +58,8 @@ struct xfs_mount;
 		 ((index) - 1) * sizeof(xfs_inobt_ptr_t)))
 
 extern struct xfs_btree_cur *xfs_inobt_init_cursor(struct xfs_mount *,
-		struct xfs_trans *, struct xfs_buf *, xfs_agnumber_t,
-		xfs_btnum_t);
+		struct xfs_trans *, struct xfs_defer_ops *, struct xfs_buf *,
+		xfs_agnumber_t, xfs_btnum_t);
 extern int xfs_inobt_maxrecs(struct xfs_mount *, int, int);
 
 /* ir_holemask to inode allocation bitmap conversion */
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index ac95fe911d96..44f96c835836 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -358,16 +358,16 @@ xfs_scrub_ag_btcur_init(
 
 	/* Set up a inobt cursor for cross-referencing. */
 	if (sa->agi_bp) {
-		sa->ino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
-					agno, XFS_BTNUM_INO);
+		sa->ino_cur = xfs_inobt_init_cursor(mp, sc->tp, NULL,
+					sa->agi_bp, agno, XFS_BTNUM_INO);
 		if (!sa->ino_cur)
 			goto err;
 	}
 
 	/* Set up a finobt cursor for cross-referencing. */
 	if (sa->agi_bp && xfs_sb_version_hasfinobt(&mp->m_sb)) {
-		sa->fino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
-				agno, XFS_BTNUM_FINO);
+		sa->fino_cur = xfs_inobt_init_cursor(mp, sc->tp, NULL,
+				sa->agi_bp, agno, XFS_BTNUM_FINO);
 		if (!sa->fino_cur)
 			goto err;
 	}
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index d58310514423..39b16eb84e51 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -398,7 +398,7 @@ xfs_bulkstat(
 		/*
 		 * Allocate and initialize a btree cursor for ialloc btree.
 		 */
-		cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
+		cur = xfs_inobt_init_cursor(mp, NULL, NULL, agbp, agno,
 					    XFS_BTNUM_INO);
 		if (agino > 0) {
 			/*
@@ -582,7 +582,7 @@ xfs_inumbers(
 			if (error)
 				break;
 
-			cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
+			cur = xfs_inobt_init_cursor(mp, NULL, NULL, agbp, agno,
 						    XFS_BTNUM_INO);
 			error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_GE,
 						 &stat);
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 1/4] xfs: create agfl block free helper function
  2017-12-07 18:58 ` [PATCH RFC 1/4] xfs: create agfl block free helper function Brian Foster
@ 2017-12-07 22:24   ` Dave Chinner
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2017-12-07 22:24 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Dec 07, 2017 at 01:58:07PM -0500, Brian Foster wrote:
> Refactor the AGFL block free code into a new helper such that it can
> be invoked from deferred context. No functional changes.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

That's a good cleanup by itself - I think I have a patch hanging
around somewhere that does this, too :P

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available
  2017-12-07 18:58 ` [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available Brian Foster
@ 2017-12-07 22:41   ` Dave Chinner
  2017-12-07 22:54     ` Dave Chinner
  2017-12-08 14:16     ` Brian Foster
  0 siblings, 2 replies; 17+ messages in thread
From: Dave Chinner @ 2017-12-07 22:41 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Dec 07, 2017 at 01:58:08PM -0500, Brian Foster wrote:
> The AGFL fixup code executes before every block allocation/free and
> rectifies the AGFL based on the current, dynamic allocation
> requirements of the fs. The AGFL must hold a minimum number of
> blocks to satisfy a worst case split of the free space btrees caused
> by the impending allocation operation. The AGFL is also updated to
> maintain the implicit requirement for a minimum number of free slots
> to satisfy a worst case join of the free space btrees.
> 
> Since the AGFL caches individual blocks, AGFL reduction typically
> involves multiple, single block frees. We've had reports of
> transaction overrun problems during certain workloads that boil down
> to AGFL reduction freeing multiple blocks and consuming more space
> in the log than was reserved for the transaction.
> 
> Since the objective of freeing AGFL blocks is to ensure free AGFL
> free slots are available for the upcoming allocation, one way to
> address this problem is to release surplus blocks from the AGFL
> immediately but defer the free of those blocks (similar to how
> file-mapped blocks are unmapped from the file in one transaction and
> freed via a deferred operation) until the transaction is rolled.
> This turns AGFL reduction into an operation with predictable log
> reservation consumption.
> 
> Add the capability to defer AGFL block frees when a deferred ops
> list is handed to the AGFL fixup code. Deferring AGFL frees is a
> conditional behavior based on whether the caller has populated the
> new dfops field of the xfs_alloc_arg structure. A bit of
> customization is required to handle deferred completion processing
> because AGFL blocks are accounted against a separate reservation
> pool and AGFL are not inserted into the extent busy list when freed
> (they are inserted when used and released back to the AGFL). Reuse
> the majority of the existing deferred extent free infrastructure and
> customize it appropriately to handle AGFL blocks.

Ok, so it uses the EFI/EFD to make sure that the block freeing is
logged and replayed. So my question is:

> +/*
> + * AGFL blocks are accounted differently in the reserve pools and are not
> + * inserted into the busy extent list.
> + */
> +STATIC int
> +xfs_agfl_free_finish_item(
> +	struct xfs_trans		*tp,
> +	struct xfs_defer_ops		*dop,
> +	struct list_head		*item,
> +	void				*done_item,
> +	void				**state)
> +{

How does this function get called by log recovery when processing
the EFI as there is no flag in the EFI that says this was a AGFL
block?

That said, I haven't traced through whether this matters or not,
but I suspect it does because freelist frees use XFS_AG_RESV_AGFL
and that avoids accounting the free to the superblock counters
because the block is already accounted as free space....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 3/4] xfs: defer agfl block frees on extent frees
  2017-12-07 18:58 ` [PATCH RFC 3/4] xfs: defer agfl block frees on extent frees Brian Foster
@ 2017-12-07 22:49   ` Dave Chinner
  2017-12-08 14:20     ` Brian Foster
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2017-12-07 22:49 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Dec 07, 2017 at 01:58:09PM -0500, Brian Foster wrote:
> Defer AGFL block frees from deferred extent free context. This means
> that extents that are deferred freed via xfs_bmap_add_free() will
> add additional deferred items for AGFL block frees during completion
> processing. All such items complete before xfs_defer_finish()
> returns.
> 
> Update xfs_trans_free_extent() and xfs_free_extent() to receive an
> optional dfops pointer and pass it down to the AGFL fixup code via
> the allocation arguments structure. Update
> xfs_extent_free_finish_item() to pass along the dfops list currently
> being processed by xfs_defer_finish(). All other callers pass a NULL
> dfops and so do not change behavior.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c          | 9 ++++++---
>  fs/xfs/libxfs/xfs_alloc.h          | 5 +++--
>  fs/xfs/libxfs/xfs_ialloc_btree.c   | 2 +-
>  fs/xfs/libxfs/xfs_refcount_btree.c | 2 +-
>  fs/xfs/libxfs/xfs_rmap.c           | 2 +-
>  fs/xfs/xfs_extfree_item.c          | 2 +-
>  fs/xfs/xfs_fsops.c                 | 2 +-
>  fs/xfs/xfs_trans.h                 | 3 ++-
>  fs/xfs/xfs_trans_extfree.c         | 7 ++++---
>  9 files changed, 20 insertions(+), 14 deletions(-)

Rather than passing the dfops structure around, I'm starting to
wonder it it makes more sense to attach it to the struct xfs_trans
we pass around to all these functions? That would mean it doesn't
need to be manually plumbed into any of this code - it would be
directly available in any transaction context that has a dfops
associated with it.

That would mean all agfl fixups would be able to be deferred without
modifying any of the intermediate code paths as all allocation/free
transactions require a dfops structure....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available
  2017-12-07 22:41   ` Dave Chinner
@ 2017-12-07 22:54     ` Dave Chinner
  2017-12-08 14:17       ` Brian Foster
  2017-12-08 14:16     ` Brian Foster
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2017-12-07 22:54 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Dec 08, 2017 at 09:41:26AM +1100, Dave Chinner wrote:
> On Thu, Dec 07, 2017 at 01:58:08PM -0500, Brian Foster wrote:
> > The AGFL fixup code executes before every block allocation/free and
> > rectifies the AGFL based on the current, dynamic allocation
> > requirements of the fs. The AGFL must hold a minimum number of
> > blocks to satisfy a worst case split of the free space btrees caused
> > by the impending allocation operation. The AGFL is also updated to
> > maintain the implicit requirement for a minimum number of free slots
> > to satisfy a worst case join of the free space btrees.
> > 
> > Since the AGFL caches individual blocks, AGFL reduction typically
> > involves multiple, single block frees. We've had reports of
> > transaction overrun problems during certain workloads that boil down
> > to AGFL reduction freeing multiple blocks and consuming more space
> > in the log than was reserved for the transaction.
> > 
> > Since the objective of freeing AGFL blocks is to ensure free AGFL
> > free slots are available for the upcoming allocation, one way to
> > address this problem is to release surplus blocks from the AGFL
> > immediately but defer the free of those blocks (similar to how
> > file-mapped blocks are unmapped from the file in one transaction and
> > freed via a deferred operation) until the transaction is rolled.
> > This turns AGFL reduction into an operation with predictable log
> > reservation consumption.
> > 
> > Add the capability to defer AGFL block frees when a deferred ops
> > list is handed to the AGFL fixup code. Deferring AGFL frees is a
> > conditional behavior based on whether the caller has populated the
> > new dfops field of the xfs_alloc_arg structure. A bit of
> > customization is required to handle deferred completion processing
> > because AGFL blocks are accounted against a separate reservation
> > pool and AGFL are not inserted into the extent busy list when freed
> > (they are inserted when used and released back to the AGFL). Reuse
> > the majority of the existing deferred extent free infrastructure and
> > customize it appropriately to handle AGFL blocks.
> 
> Ok, so it uses the EFI/EFD to make sure that the block freeing is
> logged and replayed. So my question is:
> 
> > +/*
> > + * AGFL blocks are accounted differently in the reserve pools and are not
> > + * inserted into the busy extent list.
> > + */
> > +STATIC int
> > +xfs_agfl_free_finish_item(
> > +	struct xfs_trans		*tp,
> > +	struct xfs_defer_ops		*dop,
> > +	struct list_head		*item,
> > +	void				*done_item,
> > +	void				**state)
> > +{
> 
> How does this function get called by log recovery when processing
> the EFI as there is no flag in the EFI that says this was a AGFL
> block?
> 
> That said, I haven't traced through whether this matters or not,
> but I suspect it does because freelist frees use XFS_AG_RESV_AGFL
> and that avoids accounting the free to the superblock counters
> because the block is already accounted as free space....

Just had another thought on this - this is going to cause a large
number of alloc/free transactions to roll the transaction at least
one more time. That means the logcount in the alloc/free transaction
reservation should be bumped by one. i.e. so that the common case
doesn't need to block and re-reserve grant space in the log to
complete the transaction because it has rolled the more times than
the reservation log count accounts for.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available
  2017-12-07 22:41   ` Dave Chinner
  2017-12-07 22:54     ` Dave Chinner
@ 2017-12-08 14:16     ` Brian Foster
  2018-01-08 21:56       ` Brian Foster
  1 sibling, 1 reply; 17+ messages in thread
From: Brian Foster @ 2017-12-08 14:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Dec 08, 2017 at 09:41:26AM +1100, Dave Chinner wrote:
> On Thu, Dec 07, 2017 at 01:58:08PM -0500, Brian Foster wrote:
> > The AGFL fixup code executes before every block allocation/free and
> > rectifies the AGFL based on the current, dynamic allocation
> > requirements of the fs. The AGFL must hold a minimum number of
> > blocks to satisfy a worst case split of the free space btrees caused
> > by the impending allocation operation. The AGFL is also updated to
> > maintain the implicit requirement for a minimum number of free slots
> > to satisfy a worst case join of the free space btrees.
> > 
> > Since the AGFL caches individual blocks, AGFL reduction typically
> > involves multiple, single block frees. We've had reports of
> > transaction overrun problems during certain workloads that boil down
> > to AGFL reduction freeing multiple blocks and consuming more space
> > in the log than was reserved for the transaction.
> > 
> > Since the objective of freeing AGFL blocks is to ensure free AGFL
> > free slots are available for the upcoming allocation, one way to
> > address this problem is to release surplus blocks from the AGFL
> > immediately but defer the free of those blocks (similar to how
> > file-mapped blocks are unmapped from the file in one transaction and
> > freed via a deferred operation) until the transaction is rolled.
> > This turns AGFL reduction into an operation with predictable log
> > reservation consumption.
> > 
> > Add the capability to defer AGFL block frees when a deferred ops
> > list is handed to the AGFL fixup code. Deferring AGFL frees is a
> > conditional behavior based on whether the caller has populated the
> > new dfops field of the xfs_alloc_arg structure. A bit of
> > customization is required to handle deferred completion processing
> > because AGFL blocks are accounted against a separate reservation
> > pool and AGFL are not inserted into the extent busy list when freed
> > (they are inserted when used and released back to the AGFL). Reuse
> > the majority of the existing deferred extent free infrastructure and
> > customize it appropriately to handle AGFL blocks.
> 
> Ok, so it uses the EFI/EFD to make sure that the block freeing is
> logged and replayed. So my question is:
> 
> > +/*
> > + * AGFL blocks are accounted differently in the reserve pools and are not
> > + * inserted into the busy extent list.
> > + */
> > +STATIC int
> > +xfs_agfl_free_finish_item(
> > +	struct xfs_trans		*tp,
> > +	struct xfs_defer_ops		*dop,
> > +	struct list_head		*item,
> > +	void				*done_item,
> > +	void				**state)
> > +{
> 
> How does this function get called by log recovery when processing
> the EFI as there is no flag in the EFI that says this was a AGFL
> block?
> 

It doesn't...

> That said, I haven't traced through whether this matters or not,
> but I suspect it does because freelist frees use XFS_AG_RESV_AGFL
> and that avoids accounting the free to the superblock counters
> because the block is already accounted as free space....
> 

I don't think it does matter. I actually tested log recovery precisely
for this question, to see whether the traditional EFI recovery path
would disrupt accounting or anything and I didn't reproduce any problems
(well, except for that rmap record cleanup failure thing).

However, I do still need to trace through and understand why that is, to
know for sure that there aren't any problems lurking here (and if not, I
should probably document it), but I suspect the reason is that the
differences between how agfl and regular blocks are handled here only
affect in-core state of the AG reservation pools. These are all
reinitialized from zero on a subsequent mount based on the on-disk state
(... but good point, and I will try to confirm that before posting a
non-RFC variant).

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available
  2017-12-07 22:54     ` Dave Chinner
@ 2017-12-08 14:17       ` Brian Foster
  0 siblings, 0 replies; 17+ messages in thread
From: Brian Foster @ 2017-12-08 14:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Dec 08, 2017 at 09:54:59AM +1100, Dave Chinner wrote:
> On Fri, Dec 08, 2017 at 09:41:26AM +1100, Dave Chinner wrote:
> > On Thu, Dec 07, 2017 at 01:58:08PM -0500, Brian Foster wrote:
> > > The AGFL fixup code executes before every block allocation/free and
> > > rectifies the AGFL based on the current, dynamic allocation
> > > requirements of the fs. The AGFL must hold a minimum number of
> > > blocks to satisfy a worst case split of the free space btrees caused
> > > by the impending allocation operation. The AGFL is also updated to
> > > maintain the implicit requirement for a minimum number of free slots
> > > to satisfy a worst case join of the free space btrees.
> > > 
> > > Since the AGFL caches individual blocks, AGFL reduction typically
> > > involves multiple, single block frees. We've had reports of
> > > transaction overrun problems during certain workloads that boil down
> > > to AGFL reduction freeing multiple blocks and consuming more space
> > > in the log than was reserved for the transaction.
> > > 
> > > Since the objective of freeing AGFL blocks is to ensure free AGFL
> > > free slots are available for the upcoming allocation, one way to
> > > address this problem is to release surplus blocks from the AGFL
> > > immediately but defer the free of those blocks (similar to how
> > > file-mapped blocks are unmapped from the file in one transaction and
> > > freed via a deferred operation) until the transaction is rolled.
> > > This turns AGFL reduction into an operation with predictable log
> > > reservation consumption.
> > > 
> > > Add the capability to defer AGFL block frees when a deferred ops
> > > list is handed to the AGFL fixup code. Deferring AGFL frees is a
> > > conditional behavior based on whether the caller has populated the
> > > new dfops field of the xfs_alloc_arg structure. A bit of
> > > customization is required to handle deferred completion processing
> > > because AGFL blocks are accounted against a separate reservation
> > > pool and AGFL are not inserted into the extent busy list when freed
> > > (they are inserted when used and released back to the AGFL). Reuse
> > > the majority of the existing deferred extent free infrastructure and
> > > customize it appropriately to handle AGFL blocks.
> > 
> > Ok, so it uses the EFI/EFD to make sure that the block freeing is
> > logged and replayed. So my question is:
> > 
> > > +/*
> > > + * AGFL blocks are accounted differently in the reserve pools and are not
> > > + * inserted into the busy extent list.
> > > + */
> > > +STATIC int
> > > +xfs_agfl_free_finish_item(
> > > +	struct xfs_trans		*tp,
> > > +	struct xfs_defer_ops		*dop,
> > > +	struct list_head		*item,
> > > +	void				*done_item,
> > > +	void				**state)
> > > +{
> > 
> > How does this function get called by log recovery when processing
> > the EFI as there is no flag in the EFI that says this was a AGFL
> > block?
> > 
> > That said, I haven't traced through whether this matters or not,
> > but I suspect it does because freelist frees use XFS_AG_RESV_AGFL
> > and that avoids accounting the free to the superblock counters
> > because the block is already accounted as free space....
> 
> Just had another thought on this - this is going to cause a large
> number of alloc/free transactions to roll the transaction at least
> one more time. That means the logcount in the alloc/free transaction
> reservation should be bumped by one. i.e. so that the common case
> doesn't need to block and re-reserve grant space in the log to
> complete the transaction because it has rolled the more times than
> the reservation log count accounts for.
> 

Yeah, that is something else we need to consider. One thing that stands
out is that we don't seem to currently break down log count values into
operational units as we do for log reservation itself. We could just
bump them all, or at least whatever ones are used in contexts that are
now able to defer AGFL block frees. There aren't that many separate
values defined, but I wonder if something like:

#define XFS_BMAPFREE_LOG_COUNT 2	/* 1 deferred free + 1 AGFL free */
...
#define XFS_INACTIVE_LOG_COUNT (XFS_ALLOCFREE_LOG_COUNT + 1)
...

... would help make this more self-documenting. Hm?

I was also a little concerned about increasing the size of the
transactions again since I believe that a bump in log count increases
the initial reservation requirement (so we don't end up blocking on the
roll, as you point out). But now that I take a quick look with the above
in mind, it's probably the appropriate thing to do so long it can be
applied selectively/accurately. I'll look more into it..

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 3/4] xfs: defer agfl block frees on extent frees
  2017-12-07 22:49   ` Dave Chinner
@ 2017-12-08 14:20     ` Brian Foster
  0 siblings, 0 replies; 17+ messages in thread
From: Brian Foster @ 2017-12-08 14:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Dec 08, 2017 at 09:49:41AM +1100, Dave Chinner wrote:
> On Thu, Dec 07, 2017 at 01:58:09PM -0500, Brian Foster wrote:
> > Defer AGFL block frees from deferred extent free context. This means
> > that extents that are deferred freed via xfs_bmap_add_free() will
> > add additional deferred items for AGFL block frees during completion
> > processing. All such items complete before xfs_defer_finish()
> > returns.
> > 
> > Update xfs_trans_free_extent() and xfs_free_extent() to receive an
> > optional dfops pointer and pass it down to the AGFL fixup code via
> > the allocation arguments structure. Update
> > xfs_extent_free_finish_item() to pass along the dfops list currently
> > being processed by xfs_defer_finish(). All other callers pass a NULL
> > dfops and so do not change behavior.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c          | 9 ++++++---
> >  fs/xfs/libxfs/xfs_alloc.h          | 5 +++--
> >  fs/xfs/libxfs/xfs_ialloc_btree.c   | 2 +-
> >  fs/xfs/libxfs/xfs_refcount_btree.c | 2 +-
> >  fs/xfs/libxfs/xfs_rmap.c           | 2 +-
> >  fs/xfs/xfs_extfree_item.c          | 2 +-
> >  fs/xfs/xfs_fsops.c                 | 2 +-
> >  fs/xfs/xfs_trans.h                 | 3 ++-
> >  fs/xfs/xfs_trans_extfree.c         | 7 ++++---
> >  9 files changed, 20 insertions(+), 14 deletions(-)
> 
> Rather than passing the dfops structure around, I'm starting to
> wonder it it makes more sense to attach it to the struct xfs_trans
> we pass around to all these functions? That would mean it doesn't
> need to be manually plumbed into any of this code - it would be
> directly available in any transaction context that has a dfops
> associated with it.
> 

Yeah, that sounds like a nice cleanup.

> That would mean all agfl fixups would be able to be deferred without
> modifying any of the intermediate code paths as all allocation/free
> transactions require a dfops structure....
> 

Ok.. so separate from the refactoring, it sounds like you're suggesting
that we should try to always defer AGFL frees rather than doing so very
selectively (as the current code does). I may still try to implement
that incrementally, but I suppose that may depend on what falls out from
the above cleanup. Otherwise that sounds reasonable to me.

I'm heading into some time off, but I'll look more into the above and
other fixups once I get back to this. Thanks for the review!

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available
  2017-12-08 14:16     ` Brian Foster
@ 2018-01-08 21:56       ` Brian Foster
  2018-01-09 20:43         ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2018-01-08 21:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Darrick J. Wong

cc Darrick

On Fri, Dec 08, 2017 at 09:16:30AM -0500, Brian Foster wrote:
> On Fri, Dec 08, 2017 at 09:41:26AM +1100, Dave Chinner wrote:
> > On Thu, Dec 07, 2017 at 01:58:08PM -0500, Brian Foster wrote:
...
> > 
> > Ok, so it uses the EFI/EFD to make sure that the block freeing is
> > logged and replayed. So my question is:
> > 
> > > +/*
> > > + * AGFL blocks are accounted differently in the reserve pools and are not
> > > + * inserted into the busy extent list.
> > > + */
> > > +STATIC int
> > > +xfs_agfl_free_finish_item(
> > > +	struct xfs_trans		*tp,
> > > +	struct xfs_defer_ops		*dop,
> > > +	struct list_head		*item,
> > > +	void				*done_item,
> > > +	void				**state)
> > > +{
> > 
> > How does this function get called by log recovery when processing
> > the EFI as there is no flag in the EFI that says this was a AGFL
> > block?
> > 
> 
> It doesn't...
> 
> > That said, I haven't traced through whether this matters or not,
> > but I suspect it does because freelist frees use XFS_AG_RESV_AGFL
> > and that avoids accounting the free to the superblock counters
> > because the block is already accounted as free space....
> > 
> 
> I don't think it does matter. I actually tested log recovery precisely
> for this question, to see whether the traditional EFI recovery path
> would disrupt accounting or anything and I didn't reproduce any problems
> (well, except for that rmap record cleanup failure thing).
> 
> However, I do still need to trace through and understand why that is, to
> know for sure that there aren't any problems lurking here (and if not, I
> should probably document it), but I suspect the reason is that the
> differences between how agfl and regular blocks are handled here only
> affect in-core state of the AG reservation pools. These are all
> reinitialized from zero on a subsequent mount based on the on-disk state
> (... but good point, and I will try to confirm that before posting a
> non-RFC variant).
> 

After catching back up with this and taking a closer look at the code, I
can confirm that generic EFI recovery works fine for deferred AGFL block
frees. What happens is essentially that the slot is freed and the block
free is deferred in a particular tx. If we crash before that tx commits,
then obviously nothing changes and we're fine. If we crash after that tx
commits, EFI recovery frees the block and the AGFL reserve pool
adjustment is irrelevant as the in-core res counters are initialized
from the current state of the fs after log recovery has completed (so
even if we knew this was an agfl block, attempting reservation
adjustments at recovery time would probably be wrong).

That aside, looking through the perag res code had me a bit curious
about why we reserve all agfl blocks in the first place. IIUC, the AGFL
reserve pool actually serves the rmapbt, since that (and that alone) is
what the mount time reservation is based on. AGFL blocks can be used for
other purposes, however, and the current runtime reservation is adjusted
based on all AGFL activity. Is there a reason this reserve pool does not
specifically target rmapbt allocations? Doesn't not doing so allow some
percentage of the rmapbt reserved blocks to be consumed by other
structures (alloc btrees) until/unless the fs is remounted? I'm
wondering if specifically there's a risk of something like the
following:

- mount fs with some number N of AGFL reserved blocks based on current
  rmapbt state. Suppose the size of the rmapbt is R.
- A bunch of agfl blocks are used over time (U). Suppose 50% of those go
  to the rmapbt and the other 50% to alloc btrees and whatnot.
  ar_reserved is reduced by U, but R only increases by U/2.
- a bunch more unrelated physical allocations occur and consume all
  non-reserved space
- the fs unmounts/mounts and the perag code looks for the remaining U/2
  blocks to reserve for the rmapbt, but not all of those blocks are
  available because we depleted the reserved pool faster than the rmapbt
  grew.

Darrick, hm? FWIW, a quick test to allocate 100% of an AG to a file,
punch out every other block for a few minutes and then remount kind of
shows what I'm wondering about:

 mount-3215  [002] ...1  1642.401846: xfs_ag_resv_init: dev 253:3 agno 0 resv 2 freeblks 259564 flcount 6 resv 3193 ask 3194 len 3194
...
 <...>-28260 [000] ...1  1974.946866: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36317 flcount 9 resv 2936 ask 3194 len 1                                                                                            
 <...>-28428 [002] ...1  1976.371830: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36483 flcount 9 resv 2935 ask 3194 len 1                                                                                            
 <...>-28490 [002] ...1  1976.898147: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36544 flcount 9 resv 2934 ask 3194 len 1                                                                                            
 <...>-28491 [002] ...1  1976.907967: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36544 flcount 9 resv 2933 ask 3194 len 1                                                                                            
umount-28664 [002] ...1  1983.335444: xfs_ag_resv_free: dev 253:3 agno 0 resv 2 freeblks 36643 flcount 10 resv 2932 ask 3194 len 0                                                                                                   
 mount-28671 [000] ...1  1991.396640: xfs_ag_resv_init: dev 253:3 agno 0 resv 2 freeblks 36643 flcount 10 resv 3038 ask 3194 len 3194                    

We consume the res as we go, unmount with some held reservation value,
immediately remount and the associated reservation has jumped by 100
blocks or so. (Granted, whether this can manifest into a tangible
problem may be another story altogether.).

Brian

> Brian
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available
  2018-01-08 21:56       ` Brian Foster
@ 2018-01-09 20:43         ` Darrick J. Wong
  2018-01-10 12:58           ` Brian Foster
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2018-01-09 20:43 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Mon, Jan 08, 2018 at 04:56:03PM -0500, Brian Foster wrote:
> cc Darrick
> 
> On Fri, Dec 08, 2017 at 09:16:30AM -0500, Brian Foster wrote:
> > On Fri, Dec 08, 2017 at 09:41:26AM +1100, Dave Chinner wrote:
> > > On Thu, Dec 07, 2017 at 01:58:08PM -0500, Brian Foster wrote:
> ...
> > > 
> > > Ok, so it uses the EFI/EFD to make sure that the block freeing is
> > > logged and replayed. So my question is:
> > > 
> > > > +/*
> > > > + * AGFL blocks are accounted differently in the reserve pools and are not
> > > > + * inserted into the busy extent list.
> > > > + */
> > > > +STATIC int
> > > > +xfs_agfl_free_finish_item(
> > > > +	struct xfs_trans		*tp,
> > > > +	struct xfs_defer_ops		*dop,
> > > > +	struct list_head		*item,
> > > > +	void				*done_item,
> > > > +	void				**state)
> > > > +{
> > > 
> > > How does this function get called by log recovery when processing
> > > the EFI as there is no flag in the EFI that says this was a AGFL
> > > block?
> > > 
> > 
> > It doesn't...
> > 
> > > That said, I haven't traced through whether this matters or not,
> > > but I suspect it does because freelist frees use XFS_AG_RESV_AGFL
> > > and that avoids accounting the free to the superblock counters
> > > because the block is already accounted as free space....
> > > 
> > 
> > I don't think it does matter. I actually tested log recovery precisely
> > for this question, to see whether the traditional EFI recovery path
> > would disrupt accounting or anything and I didn't reproduce any problems
> > (well, except for that rmap record cleanup failure thing).
> > 
> > However, I do still need to trace through and understand why that is, to
> > know for sure that there aren't any problems lurking here (and if not, I
> > should probably document it), but I suspect the reason is that the
> > differences between how agfl and regular blocks are handled here only
> > affect in-core state of the AG reservation pools. These are all
> > reinitialized from zero on a subsequent mount based on the on-disk state
> > (... but good point, and I will try to confirm that before posting a
> > non-RFC variant).
> > 
> 
> After catching back up with this and taking a closer look at the code, I
> can confirm that generic EFI recovery works fine for deferred AGFL block
> frees. What happens is essentially that the slot is freed and the block
> free is deferred in a particular tx. If we crash before that tx commits,
> then obviously nothing changes and we're fine. If we crash after that tx
> commits, EFI recovery frees the block and the AGFL reserve pool
> adjustment is irrelevant as the in-core res counters are initialized
> from the current state of the fs after log recovery has completed (so
> even if we knew this was an agfl block, attempting reservation
> adjustments at recovery time would probably be wrong).
> 
> That aside, looking through the perag res code had me a bit curious
> about why we reserve all agfl blocks in the first place. IIUC, the AGFL
> reserve pool actually serves the rmapbt, since that (and that alone) is
> what the mount time reservation is based on. AGFL blocks can be used for
> other purposes, however, and the current runtime reservation is adjusted
> based on all AGFL activity. Is there a reason this reserve pool does not
> specifically target rmapbt allocations? Doesn't not doing so allow some
> percentage of the rmapbt reserved blocks to be consumed by other
> structures (alloc btrees) until/unless the fs is remounted? I'm
> wondering if specifically there's a risk of something like the
> following:
> 
> - mount fs with some number N of AGFL reserved blocks based on current
>   rmapbt state. Suppose the size of the rmapbt is R.
> - A bunch of agfl blocks are used over time (U). Suppose 50% of those go
>   to the rmapbt and the other 50% to alloc btrees and whatnot.
>   ar_reserved is reduced by U, but R only increases by U/2.
> - a bunch more unrelated physical allocations occur and consume all
>   non-reserved space
> - the fs unmounts/mounts and the perag code looks for the remaining U/2
>   blocks to reserve for the rmapbt, but not all of those blocks are
>   available because we depleted the reserved pool faster than the rmapbt
>   grew.
> 
> Darrick, hm? FWIW, a quick test to allocate 100% of an AG to a file,
> punch out every other block for a few minutes and then remount kind of
> shows what I'm wondering about:
> 
>  mount-3215  [002] ...1  1642.401846: xfs_ag_resv_init: dev 253:3 agno 0 resv 2 freeblks 259564 flcount 6 resv 3193 ask 3194 len 3194
> ...
>  <...>-28260 [000] ...1  1974.946866: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36317 flcount 9 resv 2936 ask 3194 len 1
>  <...>-28428 [002] ...1  1976.371830: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36483 flcount 9 resv 2935 ask 3194 len 1
>  <...>-28490 [002] ...1  1976.898147: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36544 flcount 9 resv 2934 ask 3194 len 1
>  <...>-28491 [002] ...1  1976.907967: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36544 flcount 9 resv 2933 ask 3194 len 1
> umount-28664 [002] ...1  1983.335444: xfs_ag_resv_free: dev 253:3 agno 0 resv 2 freeblks 36643 flcount 10 resv 2932 ask 3194 len 0
>  mount-28671 [000] ...1  1991.396640: xfs_ag_resv_init: dev 253:3 agno 0 resv 2 freeblks 36643 flcount 10 resv 3038 ask 3194 len 3194

Yep, that's a bug.  Our current agfl doesn't have much of a way to
signal to the perag reservation code that it's using blocks for the
rmapbt vs. any other use, so we use up the reservation and then pull
from the non-reserved free space after that, with the result that
sometimes we can blow the assert in xfs_ag_resv_init.  I've not been
able to come up with a convincing way to fix this problem, largely
because of:

> We consume the res as we go, unmount with some held reservation value,
> immediately remount and the associated reservation has jumped by 100
> blocks or so. (Granted, whether this can manifest into a tangible
> problem may be another story altogether.).

It's theoretically possible -- even with the perag reservation
functioning perfectly we still can run the ag totally out of blocks if
the rmap expands beyond our assumed max rmapbt size of max(1 record per
block, 1% of the ag).

Say you have agblocks = 11000, allocate 50% of the AG to a file, then
reflink that extent into 10000 other files.  Then dirty every other
block in each of the 10000 reflink copies.  Even if all the dirtied
blocks end up in some other AG, we've still expanded the number of rmap
records from 10001 to 5000 * 10000 == 50,000,000, which is much bigger
than our original estimation.

Unfortunately the options here aren't good -- we can't reserve enough
blocks to cover the maximal rmapbt when reflink is enabled, so the best
we could do is fail write_begin with ENOSPC if the shared extent's AG is
critically low on reservation, but that kinda sucks because at that
point the CoW /should/ be moving blocks (and therefore mappings) away
from the full AG.

(We already cut off reflinking when the perag reservations are
critically low, but that's only because we assume the caller will fall
back to a physical copy and that the physical copy will land in some
other AG.)

--D

> 
> Brian
> 
> > Brian
> > 
> > > Cheers,
> > > 
> > > Dave.
> > > -- 
> > > Dave Chinner
> > > david@fromorbit.com
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available
  2018-01-09 20:43         ` Darrick J. Wong
@ 2018-01-10 12:58           ` Brian Foster
  2018-01-10 19:08             ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2018-01-10 12:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Tue, Jan 09, 2018 at 12:43:15PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 08, 2018 at 04:56:03PM -0500, Brian Foster wrote:
> > cc Darrick
> > 
> > On Fri, Dec 08, 2017 at 09:16:30AM -0500, Brian Foster wrote:
> > > On Fri, Dec 08, 2017 at 09:41:26AM +1100, Dave Chinner wrote:
> > > > On Thu, Dec 07, 2017 at 01:58:08PM -0500, Brian Foster wrote:
> > ...
> > > > 
> > > > Ok, so it uses the EFI/EFD to make sure that the block freeing is
> > > > logged and replayed. So my question is:
> > > > 
> > > > > +/*
> > > > > + * AGFL blocks are accounted differently in the reserve pools and are not
> > > > > + * inserted into the busy extent list.
> > > > > + */
> > > > > +STATIC int
> > > > > +xfs_agfl_free_finish_item(
> > > > > +	struct xfs_trans		*tp,
> > > > > +	struct xfs_defer_ops		*dop,
> > > > > +	struct list_head		*item,
> > > > > +	void				*done_item,
> > > > > +	void				**state)
> > > > > +{
> > > > 
> > > > How does this function get called by log recovery when processing
> > > > the EFI as there is no flag in the EFI that says this was a AGFL
> > > > block?
> > > > 
> > > 
> > > It doesn't...
> > > 
> > > > That said, I haven't traced through whether this matters or not,
> > > > but I suspect it does because freelist frees use XFS_AG_RESV_AGFL
> > > > and that avoids accounting the free to the superblock counters
> > > > because the block is already accounted as free space....
> > > > 
> > > 
> > > I don't think it does matter. I actually tested log recovery precisely
> > > for this question, to see whether the traditional EFI recovery path
> > > would disrupt accounting or anything and I didn't reproduce any problems
> > > (well, except for that rmap record cleanup failure thing).
> > > 
> > > However, I do still need to trace through and understand why that is, to
> > > know for sure that there aren't any problems lurking here (and if not, I
> > > should probably document it), but I suspect the reason is that the
> > > differences between how agfl and regular blocks are handled here only
> > > affect in-core state of the AG reservation pools. These are all
> > > reinitialized from zero on a subsequent mount based on the on-disk state
> > > (... but good point, and I will try to confirm that before posting a
> > > non-RFC variant).
> > > 
> > 
> > After catching back up with this and taking a closer look at the code, I
> > can confirm that generic EFI recovery works fine for deferred AGFL block
> > frees. What happens is essentially that the slot is freed and the block
> > free is deferred in a particular tx. If we crash before that tx commits,
> > then obviously nothing changes and we're fine. If we crash after that tx
> > commits, EFI recovery frees the block and the AGFL reserve pool
> > adjustment is irrelevant as the in-core res counters are initialized
> > from the current state of the fs after log recovery has completed (so
> > even if we knew this was an agfl block, attempting reservation
> > adjustments at recovery time would probably be wrong).
> > 
> > That aside, looking through the perag res code had me a bit curious
> > about why we reserve all agfl blocks in the first place. IIUC, the AGFL
> > reserve pool actually serves the rmapbt, since that (and that alone) is
> > what the mount time reservation is based on. AGFL blocks can be used for
> > other purposes, however, and the current runtime reservation is adjusted
> > based on all AGFL activity. Is there a reason this reserve pool does not
> > specifically target rmapbt allocations? Doesn't not doing so allow some
> > percentage of the rmapbt reserved blocks to be consumed by other
> > structures (alloc btrees) until/unless the fs is remounted? I'm
> > wondering if specifically there's a risk of something like the
> > following:
> > 
> > - mount fs with some number N of AGFL reserved blocks based on current
> >   rmapbt state. Suppose the size of the rmapbt is R.
> > - A bunch of agfl blocks are used over time (U). Suppose 50% of those go
> >   to the rmapbt and the other 50% to alloc btrees and whatnot.
> >   ar_reserved is reduced by U, but R only increases by U/2.
> > - a bunch more unrelated physical allocations occur and consume all
> >   non-reserved space
> > - the fs unmounts/mounts and the perag code looks for the remaining U/2
> >   blocks to reserve for the rmapbt, but not all of those blocks are
> >   available because we depleted the reserved pool faster than the rmapbt
> >   grew.
> > 
> > Darrick, hm? FWIW, a quick test to allocate 100% of an AG to a file,
> > punch out every other block for a few minutes and then remount kind of
> > shows what I'm wondering about:
> > 
> >  mount-3215  [002] ...1  1642.401846: xfs_ag_resv_init: dev 253:3 agno 0 resv 2 freeblks 259564 flcount 6 resv 3193 ask 3194 len 3194
> > ...
> >  <...>-28260 [000] ...1  1974.946866: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36317 flcount 9 resv 2936 ask 3194 len 1
> >  <...>-28428 [002] ...1  1976.371830: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36483 flcount 9 resv 2935 ask 3194 len 1
> >  <...>-28490 [002] ...1  1976.898147: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36544 flcount 9 resv 2934 ask 3194 len 1
> >  <...>-28491 [002] ...1  1976.907967: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36544 flcount 9 resv 2933 ask 3194 len 1
> > umount-28664 [002] ...1  1983.335444: xfs_ag_resv_free: dev 253:3 agno 0 resv 2 freeblks 36643 flcount 10 resv 2932 ask 3194 len 0
> >  mount-28671 [000] ...1  1991.396640: xfs_ag_resv_init: dev 253:3 agno 0 resv 2 freeblks 36643 flcount 10 resv 3038 ask 3194 len 3194
> 
> Yep, that's a bug.  Our current agfl doesn't have much of a way to
> signal to the perag reservation code that it's using blocks for the
> rmapbt vs. any other use, so we use up the reservation and then pull
> from the non-reserved free space after that, with the result that
> sometimes we can blow the assert in xfs_ag_resv_init.  I've not been
> able to come up with a convincing way to fix this problem, largely
> because of:
> 

Ok..

> > We consume the res as we go, unmount with some held reservation value,
> > immediately remount and the associated reservation has jumped by 100
> > blocks or so. (Granted, whether this can manifest into a tangible
> > problem may be another story altogether.).
> 
> It's theoretically possible -- even with the perag reservation
> functioning perfectly we still can run the ag totally out of blocks if
> the rmap expands beyond our assumed max rmapbt size of max(1 record per
> block, 1% of the ag).
> 
> Say you have agblocks = 11000, allocate 50% of the AG to a file, then
> reflink that extent into 10000 other files.  Then dirty every other
> block in each of the 10000 reflink copies.  Even if all the dirtied
> blocks end up in some other AG, we've still expanded the number of rmap
> records from 10001 to 5000 * 10000 == 50,000,000, which is much bigger
> than our original estimation.
> 

Yeah, though I think the effectiveness of the maximum sized rmapbt
estimation is a separate issue from the accuracy of the reservation
accounting system. The latter makes me worry that certain, sustained
workloads could amplify the divergence between runtime accounting and
reality such that the reservation system is ineffective even in cases
that don't test the worst case estimation. That may be less likely in
the current situation where the AGFL consumers are semi-related, but
it's hard to characterize.

> Unfortunately the options here aren't good -- we can't reserve enough
> blocks to cover the maximal rmapbt when reflink is enabled, so the best
> we could do is fail write_begin with ENOSPC if the shared extent's AG is
> critically low on reservation, but that kinda sucks because at that
> point the CoW /should/ be moving blocks (and therefore mappings) away
> from the full AG.
> 

Two potential options came to mind when reading the code:

- Factor all possible AGFL consumers into the perag AGFL reservation
  calculation to address the impedence mismatch between the calculation
  and runtime accounting.
- Refactor the RESV_AGFL reservation into an RESV_RMAPBT reservation
  that sits on top of the AGFL rather than behind it.

... neither of which is fully thought out from a sanity perspective.

The former strikes me as significantly more complicated as it would
require the reservation calculation (and estimation) to account for all
possible consumers of the AGFL. That comes along with all the expected
future maintenance cost for future AGFL consumers. Add to that the fact
that the reservation is really only needed for the rmapbt at this point
in time and this seems like an undesirable option.

The latter more simply moves the accounting to where rmapbt blocks are
consumed/freed, irrespective of where the blocks are allocated from. I
_think_ that should allow for accurate reservation accounting without
losing any major capability (i.e., the reservation value is still an
estimation in the end), but I could be missing something in the bowels
of the res. code. I do wonder a bit about the reservation
over-protecting blocks in the free space btrees based on the current
AGFL population, but it's not immediately clear to me if or how much
that matters (and perhaps is something that could be addressed,
anyways). Thoughts?

Brian

> (We already cut off reflinking when the perag reservations are
> critically low, but that's only because we assume the caller will fall
> back to a physical copy and that the physical copy will land in some
> other AG.)
> 
> --D
> 
> > 
> > Brian
> > 
> > > Brian
> > > 
> > > > Cheers,
> > > > 
> > > > Dave.
> > > > -- 
> > > > Dave Chinner
> > > > david@fromorbit.com
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available
  2018-01-10 12:58           ` Brian Foster
@ 2018-01-10 19:08             ` Darrick J. Wong
  2018-01-10 20:32               ` Brian Foster
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2018-01-10 19:08 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Wed, Jan 10, 2018 at 07:58:59AM -0500, Brian Foster wrote:
> On Tue, Jan 09, 2018 at 12:43:15PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 08, 2018 at 04:56:03PM -0500, Brian Foster wrote:
> > > cc Darrick
> > > 
> > > On Fri, Dec 08, 2017 at 09:16:30AM -0500, Brian Foster wrote:
> > > > On Fri, Dec 08, 2017 at 09:41:26AM +1100, Dave Chinner wrote:
> > > > > On Thu, Dec 07, 2017 at 01:58:08PM -0500, Brian Foster wrote:
> > > ...
> > > > > 
> > > > > Ok, so it uses the EFI/EFD to make sure that the block freeing is
> > > > > logged and replayed. So my question is:
> > > > > 
> > > > > > +/*
> > > > > > + * AGFL blocks are accounted differently in the reserve pools and are not
> > > > > > + * inserted into the busy extent list.
> > > > > > + */
> > > > > > +STATIC int
> > > > > > +xfs_agfl_free_finish_item(
> > > > > > +	struct xfs_trans		*tp,
> > > > > > +	struct xfs_defer_ops		*dop,
> > > > > > +	struct list_head		*item,
> > > > > > +	void				*done_item,
> > > > > > +	void				**state)
> > > > > > +{
> > > > > 
> > > > > How does this function get called by log recovery when processing
> > > > > the EFI as there is no flag in the EFI that says this was a AGFL
> > > > > block?
> > > > > 
> > > > 
> > > > It doesn't...
> > > > 
> > > > > That said, I haven't traced through whether this matters or not,
> > > > > but I suspect it does because freelist frees use XFS_AG_RESV_AGFL
> > > > > and that avoids accounting the free to the superblock counters
> > > > > because the block is already accounted as free space....
> > > > > 
> > > > 
> > > > I don't think it does matter. I actually tested log recovery precisely
> > > > for this question, to see whether the traditional EFI recovery path
> > > > would disrupt accounting or anything and I didn't reproduce any problems
> > > > (well, except for that rmap record cleanup failure thing).
> > > > 
> > > > However, I do still need to trace through and understand why that is, to
> > > > know for sure that there aren't any problems lurking here (and if not, I
> > > > should probably document it), but I suspect the reason is that the
> > > > differences between how agfl and regular blocks are handled here only
> > > > affect in-core state of the AG reservation pools. These are all
> > > > reinitialized from zero on a subsequent mount based on the on-disk state
> > > > (... but good point, and I will try to confirm that before posting a
> > > > non-RFC variant).
> > > > 
> > > 
> > > After catching back up with this and taking a closer look at the code, I
> > > can confirm that generic EFI recovery works fine for deferred AGFL block
> > > frees. What happens is essentially that the slot is freed and the block
> > > free is deferred in a particular tx. If we crash before that tx commits,
> > > then obviously nothing changes and we're fine. If we crash after that tx
> > > commits, EFI recovery frees the block and the AGFL reserve pool
> > > adjustment is irrelevant as the in-core res counters are initialized
> > > from the current state of the fs after log recovery has completed (so
> > > even if we knew this was an agfl block, attempting reservation
> > > adjustments at recovery time would probably be wrong).
> > > 
> > > That aside, looking through the perag res code had me a bit curious
> > > about why we reserve all agfl blocks in the first place. IIUC, the AGFL
> > > reserve pool actually serves the rmapbt, since that (and that alone) is
> > > what the mount time reservation is based on. AGFL blocks can be used for
> > > other purposes, however, and the current runtime reservation is adjusted
> > > based on all AGFL activity. Is there a reason this reserve pool does not
> > > specifically target rmapbt allocations? Doesn't not doing so allow some
> > > percentage of the rmapbt reserved blocks to be consumed by other
> > > structures (alloc btrees) until/unless the fs is remounted? I'm
> > > wondering if specifically there's a risk of something like the
> > > following:
> > > 
> > > - mount fs with some number N of AGFL reserved blocks based on current
> > >   rmapbt state. Suppose the size of the rmapbt is R.
> > > - A bunch of agfl blocks are used over time (U). Suppose 50% of those go
> > >   to the rmapbt and the other 50% to alloc btrees and whatnot.
> > >   ar_reserved is reduced by U, but R only increases by U/2.
> > > - a bunch more unrelated physical allocations occur and consume all
> > >   non-reserved space
> > > - the fs unmounts/mounts and the perag code looks for the remaining U/2
> > >   blocks to reserve for the rmapbt, but not all of those blocks are
> > >   available because we depleted the reserved pool faster than the rmapbt
> > >   grew.
> > > 
> > > Darrick, hm? FWIW, a quick test to allocate 100% of an AG to a file,
> > > punch out every other block for a few minutes and then remount kind of
> > > shows what I'm wondering about:
> > > 
> > >  mount-3215  [002] ...1  1642.401846: xfs_ag_resv_init: dev 253:3 agno 0 resv 2 freeblks 259564 flcount 6 resv 3193 ask 3194 len 3194
> > > ...
> > >  <...>-28260 [000] ...1  1974.946866: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36317 flcount 9 resv 2936 ask 3194 len 1
> > >  <...>-28428 [002] ...1  1976.371830: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36483 flcount 9 resv 2935 ask 3194 len 1
> > >  <...>-28490 [002] ...1  1976.898147: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36544 flcount 9 resv 2934 ask 3194 len 1
> > >  <...>-28491 [002] ...1  1976.907967: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36544 flcount 9 resv 2933 ask 3194 len 1
> > > umount-28664 [002] ...1  1983.335444: xfs_ag_resv_free: dev 253:3 agno 0 resv 2 freeblks 36643 flcount 10 resv 2932 ask 3194 len 0
> > >  mount-28671 [000] ...1  1991.396640: xfs_ag_resv_init: dev 253:3 agno 0 resv 2 freeblks 36643 flcount 10 resv 3038 ask 3194 len 3194
> > 
> > Yep, that's a bug.  Our current agfl doesn't have much of a way to
> > signal to the perag reservation code that it's using blocks for the
> > rmapbt vs. any other use, so we use up the reservation and then pull
> > from the non-reserved free space after that, with the result that
> > sometimes we can blow the assert in xfs_ag_resv_init.  I've not been
> > able to come up with a convincing way to fix this problem, largely
> > because of:
> > 
> 
> Ok..
> 
> > > We consume the res as we go, unmount with some held reservation value,
> > > immediately remount and the associated reservation has jumped by 100
> > > blocks or so. (Granted, whether this can manifest into a tangible
> > > problem may be another story altogether.).
> > 
> > It's theoretically possible -- even with the perag reservation
> > functioning perfectly we still can run the ag totally out of blocks if
> > the rmap expands beyond our assumed max rmapbt size of max(1 record per
> > block, 1% of the ag).
> > 
> > Say you have agblocks = 11000, allocate 50% of the AG to a file, then
> > reflink that extent into 10000 other files.  Then dirty every other
> > block in each of the 10000 reflink copies.  Even if all the dirtied
> > blocks end up in some other AG, we've still expanded the number of rmap
> > records from 10001 to 5000 * 10000 == 50,000,000, which is much bigger
> > than our original estimation.
> > 
> 
> Yeah, though I think the effectiveness of the maximum sized rmapbt
> estimation is a separate issue from the accuracy of the reservation
> accounting system. The latter makes me worry that certain, sustained
> workloads could amplify the divergence between runtime accounting and
> reality such that the reservation system is ineffective even in cases
> that don't test the worst case estimation. That may be less likely in
> the current situation where the AGFL consumers are semi-related, but
> it's hard to characterize.

Agreed, on both points.

> > Unfortunately the options here aren't good -- we can't reserve enough
> > blocks to cover the maximal rmapbt when reflink is enabled, so the best
> > we could do is fail write_begin with ENOSPC if the shared extent's AG is
> > critically low on reservation, but that kinda sucks because at that
> > point the CoW /should/ be moving blocks (and therefore mappings) away
> > from the full AG.
> > 
> 
> Two potential options came to mind when reading the code:
> 
> - Factor all possible AGFL consumers into the perag AGFL reservation
>   calculation to address the impedence mismatch between the calculation
>   and runtime accounting.
> - Refactor the RESV_AGFL reservation into an RESV_RMAPBT reservation
>   that sits on top of the AGFL rather than behind it.
> 
> ... neither of which is fully thought out from a sanity perspective.
> 
> The former strikes me as significantly more complicated as it would
> require the reservation calculation (and estimation) to account for all
> possible consumers of the AGFL. That comes along with all the expected
> future maintenance cost for future AGFL consumers. Add to that the fact
> that the reservation is really only needed for the rmapbt at this point
> in time and this seems like an undesirable option.

<nod>  I also don't know that we're not going to someday add another
consumer of agfl blocks that also needs its own perag reservation, in
which case we'd have to have more accounting.  Or...

> The latter more simply moves the accounting to where rmapbt blocks are
> consumed/freed, irrespective of where the blocks are allocated from. I
> _think_ that should allow for accurate reservation accounting without
> losing any major capability (i.e., the reservation value is still an
> estimation in the end), but I could be missing something in the bowels
> of the res. code. I do wonder a bit about the reservation
> over-protecting blocks in the free space btrees based on the current
> AGFL population, but it's not immediately clear to me if or how much
> that matters (and perhaps is something that could be addressed,
> anyways). Thoughts?

Hmmm.  Right now the perag code reserves some number of blocks for
future use by the rmapbt.  The agfl (via fix_freelist) refreshes from
the perag pool; and the bnobt/rmapbt get blocks from the agfl.  IOWs, we
use the agfl reservation (sloppily) as a backstop for the agfl, and
RESV_AGFL gets charged for any agfl allocation, even if it ultimately
goes to something that isn't the rmapbt, even though the rmapbt code put
that reservation there for its own use.

You propose moving the xfs_ag_resv_{alloc,free}_extent accounting bits
to xfs_rmapbt_{alloc,free}_block so that only rmapbt allocations can
drain from RESV_AGFL (or put back to the pool).  This fixes the
accounting problem, since only those who set up RESV_AGFL reservations
actually get to account from them, i.e. we no longer lose RESV_AGFL
blocks to the bnobt.  Therefore, fix_freelist is no longer responsible
for passing RESV_AGFL into the extent allocation/free routines ... but
does the rmapbt directly allocate its own blocks now?  (I don't think
this is possible, because we'd then have to create an OWN_FS rmap record
for the new rmapbt blocks.)  Or does it still draw from the agfl, in
which case we have to figure out how to get reserved blocks to the
rmapbt if the agfl can't supply any blocks?

--D

> Brian
> 
> > (We already cut off reflinking when the perag reservations are
> > critically low, but that's only because we assume the caller will fall
> > back to a physical copy and that the physical copy will land in some
> > other AG.)
> > 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > > Brian
> > > > 
> > > > > Cheers,
> > > > > 
> > > > > Dave.
> > > > > -- 
> > > > > Dave Chinner
> > > > > david@fromorbit.com
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available
  2018-01-10 19:08             ` Darrick J. Wong
@ 2018-01-10 20:32               ` Brian Foster
  0 siblings, 0 replies; 17+ messages in thread
From: Brian Foster @ 2018-01-10 20:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Wed, Jan 10, 2018 at 11:08:23AM -0800, Darrick J. Wong wrote:
> On Wed, Jan 10, 2018 at 07:58:59AM -0500, Brian Foster wrote:
> > On Tue, Jan 09, 2018 at 12:43:15PM -0800, Darrick J. Wong wrote:
> > > On Mon, Jan 08, 2018 at 04:56:03PM -0500, Brian Foster wrote:
> > > > cc Darrick
> > > > 
> > > > On Fri, Dec 08, 2017 at 09:16:30AM -0500, Brian Foster wrote:
> > > > > On Fri, Dec 08, 2017 at 09:41:26AM +1100, Dave Chinner wrote:
> > > > > > On Thu, Dec 07, 2017 at 01:58:08PM -0500, Brian Foster wrote:
> > > > ...
> > > > > > 
> > > > > > Ok, so it uses the EFI/EFD to make sure that the block freeing is
> > > > > > logged and replayed. So my question is:
> > > > > > 
> > > > > > > +/*
> > > > > > > + * AGFL blocks are accounted differently in the reserve pools and are not
> > > > > > > + * inserted into the busy extent list.
> > > > > > > + */
> > > > > > > +STATIC int
> > > > > > > +xfs_agfl_free_finish_item(
> > > > > > > +	struct xfs_trans		*tp,
> > > > > > > +	struct xfs_defer_ops		*dop,
> > > > > > > +	struct list_head		*item,
> > > > > > > +	void				*done_item,
> > > > > > > +	void				**state)
> > > > > > > +{
> > > > > > 
> > > > > > How does this function get called by log recovery when processing
> > > > > > the EFI as there is no flag in the EFI that says this was a AGFL
> > > > > > block?
> > > > > > 
> > > > > 
> > > > > It doesn't...
> > > > > 
> > > > > > That said, I haven't traced through whether this matters or not,
> > > > > > but I suspect it does because freelist frees use XFS_AG_RESV_AGFL
> > > > > > and that avoids accounting the free to the superblock counters
> > > > > > because the block is already accounted as free space....
> > > > > > 
> > > > > 
> > > > > I don't think it does matter. I actually tested log recovery precisely
> > > > > for this question, to see whether the traditional EFI recovery path
> > > > > would disrupt accounting or anything and I didn't reproduce any problems
> > > > > (well, except for that rmap record cleanup failure thing).
> > > > > 
> > > > > However, I do still need to trace through and understand why that is, to
> > > > > know for sure that there aren't any problems lurking here (and if not, I
> > > > > should probably document it), but I suspect the reason is that the
> > > > > differences between how agfl and regular blocks are handled here only
> > > > > affect in-core state of the AG reservation pools. These are all
> > > > > reinitialized from zero on a subsequent mount based on the on-disk state
> > > > > (... but good point, and I will try to confirm that before posting a
> > > > > non-RFC variant).
> > > > > 
> > > > 
> > > > After catching back up with this and taking a closer look at the code, I
> > > > can confirm that generic EFI recovery works fine for deferred AGFL block
> > > > frees. What happens is essentially that the slot is freed and the block
> > > > free is deferred in a particular tx. If we crash before that tx commits,
> > > > then obviously nothing changes and we're fine. If we crash after that tx
> > > > commits, EFI recovery frees the block and the AGFL reserve pool
> > > > adjustment is irrelevant as the in-core res counters are initialized
> > > > from the current state of the fs after log recovery has completed (so
> > > > even if we knew this was an agfl block, attempting reservation
> > > > adjustments at recovery time would probably be wrong).
> > > > 
> > > > That aside, looking through the perag res code had me a bit curious
> > > > about why we reserve all agfl blocks in the first place. IIUC, the AGFL
> > > > reserve pool actually serves the rmapbt, since that (and that alone) is
> > > > what the mount time reservation is based on. AGFL blocks can be used for
> > > > other purposes, however, and the current runtime reservation is adjusted
> > > > based on all AGFL activity. Is there a reason this reserve pool does not
> > > > specifically target rmapbt allocations? Doesn't not doing so allow some
> > > > percentage of the rmapbt reserved blocks to be consumed by other
> > > > structures (alloc btrees) until/unless the fs is remounted? I'm
> > > > wondering if specifically there's a risk of something like the
> > > > following:
> > > > 
> > > > - mount fs with some number N of AGFL reserved blocks based on current
> > > >   rmapbt state. Suppose the size of the rmapbt is R.
> > > > - A bunch of agfl blocks are used over time (U). Suppose 50% of those go
> > > >   to the rmapbt and the other 50% to alloc btrees and whatnot.
> > > >   ar_reserved is reduced by U, but R only increases by U/2.
> > > > - a bunch more unrelated physical allocations occur and consume all
> > > >   non-reserved space
> > > > - the fs unmounts/mounts and the perag code looks for the remaining U/2
> > > >   blocks to reserve for the rmapbt, but not all of those blocks are
> > > >   available because we depleted the reserved pool faster than the rmapbt
> > > >   grew.
> > > > 
> > > > Darrick, hm? FWIW, a quick test to allocate 100% of an AG to a file,
> > > > punch out every other block for a few minutes and then remount kind of
> > > > shows what I'm wondering about:
> > > > 
> > > >  mount-3215  [002] ...1  1642.401846: xfs_ag_resv_init: dev 253:3 agno 0 resv 2 freeblks 259564 flcount 6 resv 3193 ask 3194 len 3194
> > > > ...
> > > >  <...>-28260 [000] ...1  1974.946866: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36317 flcount 9 resv 2936 ask 3194 len 1
> > > >  <...>-28428 [002] ...1  1976.371830: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36483 flcount 9 resv 2935 ask 3194 len 1
> > > >  <...>-28490 [002] ...1  1976.898147: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36544 flcount 9 resv 2934 ask 3194 len 1
> > > >  <...>-28491 [002] ...1  1976.907967: xfs_ag_resv_alloc_extent: dev 253:3 agno 0 resv 2 freeblks 36544 flcount 9 resv 2933 ask 3194 len 1
> > > > umount-28664 [002] ...1  1983.335444: xfs_ag_resv_free: dev 253:3 agno 0 resv 2 freeblks 36643 flcount 10 resv 2932 ask 3194 len 0
> > > >  mount-28671 [000] ...1  1991.396640: xfs_ag_resv_init: dev 253:3 agno 0 resv 2 freeblks 36643 flcount 10 resv 3038 ask 3194 len 3194
> > > 
> > > Yep, that's a bug.  Our current agfl doesn't have much of a way to
> > > signal to the perag reservation code that it's using blocks for the
> > > rmapbt vs. any other use, so we use up the reservation and then pull
> > > from the non-reserved free space after that, with the result that
> > > sometimes we can blow the assert in xfs_ag_resv_init.  I've not been
> > > able to come up with a convincing way to fix this problem, largely
> > > because of:
> > > 
> > 
> > Ok..
> > 
> > > > We consume the res as we go, unmount with some held reservation value,
> > > > immediately remount and the associated reservation has jumped by 100
> > > > blocks or so. (Granted, whether this can manifest into a tangible
> > > > problem may be another story altogether.).
> > > 
> > > It's theoretically possible -- even with the perag reservation
> > > functioning perfectly we still can run the ag totally out of blocks if
> > > the rmap expands beyond our assumed max rmapbt size of max(1 record per
> > > block, 1% of the ag).
> > > 
> > > Say you have agblocks = 11000, allocate 50% of the AG to a file, then
> > > reflink that extent into 10000 other files.  Then dirty every other
> > > block in each of the 10000 reflink copies.  Even if all the dirtied
> > > blocks end up in some other AG, we've still expanded the number of rmap
> > > records from 10001 to 5000 * 10000 == 50,000,000, which is much bigger
> > > than our original estimation.
> > > 
> > 
> > Yeah, though I think the effectiveness of the maximum sized rmapbt
> > estimation is a separate issue from the accuracy of the reservation
> > accounting system. The latter makes me worry that certain, sustained
> > workloads could amplify the divergence between runtime accounting and
> > reality such that the reservation system is ineffective even in cases
> > that don't test the worst case estimation. That may be less likely in
> > the current situation where the AGFL consumers are semi-related, but
> > it's hard to characterize.
> 
> Agreed, on both points.
> 
> > > Unfortunately the options here aren't good -- we can't reserve enough
> > > blocks to cover the maximal rmapbt when reflink is enabled, so the best
> > > we could do is fail write_begin with ENOSPC if the shared extent's AG is
> > > critically low on reservation, but that kinda sucks because at that
> > > point the CoW /should/ be moving blocks (and therefore mappings) away
> > > from the full AG.
> > > 
> > 
> > Two potential options came to mind when reading the code:
> > 
> > - Factor all possible AGFL consumers into the perag AGFL reservation
> >   calculation to address the impedence mismatch between the calculation
> >   and runtime accounting.
> > - Refactor the RESV_AGFL reservation into an RESV_RMAPBT reservation
> >   that sits on top of the AGFL rather than behind it.
> > 
> > ... neither of which is fully thought out from a sanity perspective.
> > 
> > The former strikes me as significantly more complicated as it would
> > require the reservation calculation (and estimation) to account for all
> > possible consumers of the AGFL. That comes along with all the expected
> > future maintenance cost for future AGFL consumers. Add to that the fact
> > that the reservation is really only needed for the rmapbt at this point
> > in time and this seems like an undesirable option.
> 
> <nod>  I also don't know that we're not going to someday add another
> consumer of agfl blocks that also needs its own perag reservation, in
> which case we'd have to have more accounting.  Or...
> 
> > The latter more simply moves the accounting to where rmapbt blocks are
> > consumed/freed, irrespective of where the blocks are allocated from. I
> > _think_ that should allow for accurate reservation accounting without
> > losing any major capability (i.e., the reservation value is still an
> > estimation in the end), but I could be missing something in the bowels
> > of the res. code. I do wonder a bit about the reservation
> > over-protecting blocks in the free space btrees based on the current
> > AGFL population, but it's not immediately clear to me if or how much
> > that matters (and perhaps is something that could be addressed,
> > anyways). Thoughts?
> 
> Hmmm.  Right now the perag code reserves some number of blocks for
> future use by the rmapbt.  The agfl (via fix_freelist) refreshes from
> the perag pool; and the bnobt/rmapbt get blocks from the agfl.  IOWs, we
> use the agfl reservation (sloppily) as a backstop for the agfl, and
> RESV_AGFL gets charged for any agfl allocation, even if it ultimately
> goes to something that isn't the rmapbt, even though the rmapbt code put
> that reservation there for its own use.
> 
> You propose moving the xfs_ag_resv_{alloc,free}_extent accounting bits
> to xfs_rmapbt_{alloc,free}_block so that only rmapbt allocations can
> drain from RESV_AGFL (or put back to the pool).  This fixes the
> accounting problem, since only those who set up RESV_AGFL reservations
> actually get to account from them, i.e. we no longer lose RESV_AGFL
> blocks to the bnobt.  Therefore, fix_freelist is no longer responsible
> for passing RESV_AGFL into the extent allocation/free routines ... but
> does the rmapbt directly allocate its own blocks now?  (I don't think
> this is possible, because we'd then have to create an OWN_FS rmap record
> for the new rmapbt blocks.)  Or does it still draw from the agfl, in
> which case we have to figure out how to get reserved blocks to the
> rmapbt if the agfl can't supply any blocks?
> 

I would expect the rmapbt to continue to allocate blocks from the agfl.
The idea is to refactor the reservation accounting only and not try to
mess with how the broader rmap mechanism works. With regard to the agfl,
shouldn't _fix_freelist() always populate the AGFL such that rmapbt
allocs tied to the upcoming operation are always satisfied?

FWIW, I hacked up the following earlier that I think illustrates the
idea. This essentially just pushes the accounting down into the rmap
btree block allocator functions via a RESV_RMAPBT pool. The AGFL tag
stays around because we apparently don't make any sb changes for AGFL
blocks and thus we still need a way to tell the perag code to do nothing
in xfs_alloc_fix_freelist().

This is obviously crap code that just remaps the existing RESV_AGFL pool
to RESV_RMAPBT and repurposes RESV_AGFL to an accounting no-op for
experimental purposes. It does seem to allow the simple "alloc some
rmapbt blocks -> unmount -> mount" test to work as one would expect, but
I haven't tested anything beyond that.

Brian

---8<---

diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
index 2291f4224e24..9db2259557b1 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.c
+++ b/fs/xfs/libxfs/xfs_ag_resv.c
@@ -325,8 +325,11 @@ xfs_ag_resv_alloc_extent(
 	trace_xfs_ag_resv_alloc_extent(pag, type, args->len);
 
 	switch (type) {
-	case XFS_AG_RESV_METADATA:
 	case XFS_AG_RESV_AGFL:
+		return;
+	case XFS_AG_RESV_RMAPBT:
+		type = XFS_AG_RESV_AGFL;
+	case XFS_AG_RESV_METADATA:
 		resv = xfs_perag_resv(pag, type);
 		break;
 	default:
@@ -351,6 +354,20 @@ xfs_ag_resv_alloc_extent(
 				-((int64_t)args->len - len));
 }
 
+void
+xfs_ag_resv_rmapbt_alloc(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_alloc_arg	args = {0};
+	struct xfs_perag	*pag;
+
+	args.len = 1;
+	pag = xfs_perag_get(mp, agno);
+	xfs_ag_resv_alloc_extent(pag, XFS_AG_RESV_RMAPBT, &args);
+	xfs_perag_put(pag);
+}
+
 /* Free a block to the reservation. */
 void
 xfs_ag_resv_free_extent(
@@ -365,8 +382,11 @@ xfs_ag_resv_free_extent(
 	trace_xfs_ag_resv_free_extent(pag, type, len);
 
 	switch (type) {
-	case XFS_AG_RESV_METADATA:
 	case XFS_AG_RESV_AGFL:
+		return;
+	case XFS_AG_RESV_RMAPBT:
+		type = XFS_AG_RESV_AGFL;
+	case XFS_AG_RESV_METADATA:
 		resv = xfs_perag_resv(pag, type);
 		break;
 	default:
@@ -387,3 +407,15 @@ xfs_ag_resv_free_extent(
 	if (len > leftover)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, len - leftover);
 }
+
+void
+xfs_ag_resv_rmapbt_free(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_perag	*pag;
+
+	pag = xfs_perag_get(mp, agno);
+	xfs_ag_resv_free_extent(pag, XFS_AG_RESV_RMAPBT, NULL, 1);
+	xfs_perag_put(pag);
+}
diff --git a/fs/xfs/libxfs/xfs_ag_resv.h b/fs/xfs/libxfs/xfs_ag_resv.h
index 8d6c687deef3..071b9152bdbf 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.h
+++ b/fs/xfs/libxfs/xfs_ag_resv.h
@@ -32,4 +32,7 @@ void xfs_ag_resv_alloc_extent(struct xfs_perag *pag, enum xfs_ag_resv_type type,
 void xfs_ag_resv_free_extent(struct xfs_perag *pag, enum xfs_ag_resv_type type,
 		struct xfs_trans *tp, xfs_extlen_t len);
 
+void xfs_ag_resv_rmapbt_alloc(struct xfs_mount *, xfs_agnumber_t);
+void xfs_ag_resv_rmapbt_free(struct xfs_mount *, xfs_agnumber_t);
+
 #endif	/* __XFS_AG_RESV_H__ */
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index e829c3e489ea..8560091554e0 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -130,6 +130,8 @@ xfs_rmapbt_alloc_block(
 	be32_add_cpu(&agf->agf_rmap_blocks, 1);
 	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_RMAP_BLOCKS);
 
+	xfs_ag_resv_rmapbt_alloc(cur->bc_mp, cur->bc_private.a.agno);
+
 	XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
 	*stat = 1;
 	return 0;
@@ -158,6 +160,8 @@ xfs_rmapbt_free_block(
 			      XFS_EXTENT_BUSY_SKIP_DISCARD);
 	xfs_trans_agbtree_delta(cur->bc_tp, -1);
 
+	xfs_ag_resv_rmapbt_free(cur->bc_mp, cur->bc_private.a.agno);
+
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e0792d036be2..9a235c7c6f1b 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -328,6 +328,7 @@ enum xfs_ag_resv_type {
 	XFS_AG_RESV_NONE = 0,
 	XFS_AG_RESV_METADATA,
 	XFS_AG_RESV_AGFL,
+	XFS_AG_RESV_RMAPBT,
 };
 
 struct xfs_ag_resv {

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-01-10 20:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-07 18:58 [PATCH RFC 0/4] xfs: defer agfl block frees Brian Foster
2017-12-07 18:58 ` [PATCH RFC 1/4] xfs: create agfl block free helper function Brian Foster
2017-12-07 22:24   ` Dave Chinner
2017-12-07 18:58 ` [PATCH RFC 2/4] xfs: defer agfl block frees when dfops is available Brian Foster
2017-12-07 22:41   ` Dave Chinner
2017-12-07 22:54     ` Dave Chinner
2017-12-08 14:17       ` Brian Foster
2017-12-08 14:16     ` Brian Foster
2018-01-08 21:56       ` Brian Foster
2018-01-09 20:43         ` Darrick J. Wong
2018-01-10 12:58           ` Brian Foster
2018-01-10 19:08             ` Darrick J. Wong
2018-01-10 20:32               ` Brian Foster
2017-12-07 18:58 ` [PATCH RFC 3/4] xfs: defer agfl block frees on extent frees Brian Foster
2017-12-07 22:49   ` Dave Chinner
2017-12-08 14:20     ` Brian Foster
2017-12-07 18:58 ` [PATCH RFC 4/4] xfs: defer agfl frees on inobt allocs during chunk removal Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.